A Markov chain is “a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event”.

 Space or time can be either discrete(𝑋_𝑡:t=0, 1, 2,…) or continuous(𝑋_𝑡:t≥0). (we will focus on Markov chains in discrete space an time)
Example for Markov chain:

HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states.
State: x = (x1, x2, x3) ; 
Observation: y = (y1, y2, y3);
Transition matrix: A = (aij); 
Emission matrix: B = (bij)

Example:
 
 
Q1: how to evaluate the matching degree between the model and the observation ? ( forward algorithm)

Q2: how to infer the hidden state from the observation ? (Viterbi algorithm)
Observation 𝒚=(𝑦 
     
      
       
        
         
        
          1 
         
        
       
      
        _1 
       
      
    1, 𝑦 
     
      
       
        
         
        
          2 
         
        
       
      
        _2 
       
      
    2,…, 𝑦 
     
      
       
        
         
        
          𝑇 
         
        
       
      
        _𝑇 
       
      
    T), initial prob. 𝝅=(𝜋 
     
      
       
        
         
        
          1 
         
        
       
      
        _1 
       
      
    1,𝜋 
     
      
       
        
         
        
          2 
         
        
       
      
        _2 
       
      
    2,…, 𝜋 
     
      
       
        
         
        
          𝐾 
         
        
       
      
        _𝐾 
       
      
    K), transition matrix 𝐴, emission matrix 𝐵.
 
 Viterbi algorithm(optimal solution) backtracking method; It retains the optimal solution of each choice in the previous step and finds the optimal selection path through the backtracking method.
Example:
 
 

Q3: how to train the model to describe the observation more accurately ? (Baum-Welch algorithm)
The Baum–Welch algorithm uses the well known EM algorithm to find the maximum likelihood estimate of the parameters of a hidden Markov model given a set of observed feature vectors.
 
 
 Baum–Welch algorithm(forward-backward alg.) It approximates the optimal parameters through iteration.

use Lagrangian multiplier method and take the partial derivative of Lagrangian funcition。
CpG island:
 
 

reference:
未完待续…