A Novel Method for Decoding Any High-Order Hidden Markov Model
Abstract
This paper proposes a novel method for decoding any high-order hidden Markov model. First, the high-order hidden Markov model is transformed into an equivalent first-order hidden Markov model by Hadar’s transformation. Next, the optimal state sequence of the equivalent first-order hidden Markov model is recognized by the existing Viterbi algorithm of the first-order hidden Markov model. Finally, the optimal state sequence of the high-order hidden Markov model is inferred from the optimal state sequence of the equivalent first-order hidden Markov model. This method provides a unified algorithm framework for decoding hidden Markov models including the first-order hidden Markov model and any high-order hidden Markov model.
1. Introduction
Hidden Markov models are powerful tools for modeling and analyzing sequential data. For several decades, hidden Markov models have been used in many fields including handwriting recognition [1–3], speech recognition [4, 5], computational biology [6, 7], and longitudinal data analysis [8, 9]. Past and current developments on hidden Markov models are well documented in [10, 11]. A hidden Markov model comprises an underlying Markov chain and an observed process, where the observed process is a probabilistic function of the underlying Markov chain [12]. Given a hidden Markov model, an efficient procedure for finding the optimal state sequence is of great interest in real-world applications. In the traditional first-order hidden Markov model, the Viterbi algorithm is used to recognize the optimal state sequence [13]. Like the Kalman filter, the Viterbi algorithm tracks the optimal state sequence recursively.
In recent years, the theory and applications of high-order hidden Markov models have been substantially advanced, and high-order hidden Markov models are known to be more powerful than the first-order hidden Markov model. There are two basic approaches to studying algorithms for high-order hidden Markov models. The first, called the extended approach, directly extends the existing algorithms of the first-order hidden Markov model to high-order hidden Markov models [14–16]. The second, called the model reduction method, transforms a high-order hidden Markov model into an equivalent first-order hidden Markov model by some means and then establishes the algorithms of the high-order hidden Markov model by using the standard techniques applicable to the first-order hidden Markov model [17–20].
In this paper, we propose a novel method for decoding any high-order hidden Markov model. First, the high-order hidden Markov model is transformed into an equivalent first-order hidden Markov model by Hadar’s transformation. Next, the optimal state sequence of the equivalent first-order hidden Markov model is recognized by the existing Viterbi algorithm of the first-order hidden Markov model. Finally, the optimal state sequence of the high-order hidden Markov model is inferred from the optimal state sequence of the equivalent first-order hidden Markov model.
2. High-Order Hidden Markov Model and Hadar’s Transformation
Initially, suppose that two processes $\{x_t\}$ and $\{o_t\}$ are defined on some probability space $(\Omega, \mathcal{F}, P)$, where $t$ is an integer index. $\{x_t\}$ takes values in a finite state set $\mathcal{X}$, and $o_t$ takes values in a finite observation set $\mathcal{V} = \{v_1, \ldots, v_M\}$. Without loss of generality, the elements of $\mathcal{X}$ can be denoted by $\{0, 1, \ldots, N-1\}$. A high-order hidden Markov model is defined as follows.
Definition 1 (see [18, 20]). A high-order hidden Markov model is a doubly stochastic process with an underlying state process that is not directly observable but can be observed only through another stochastic process, called the observation process. The observation process is governed by the hidden state process and produces the observation sequence. The state process and the observation process satisfy the following two conditions, respectively.
- (a) The hidden state process $\{x_t\}$ is a homogeneous Markov chain of order $n$, that is, a stochastic process that satisfies
\[
P(x_{t+1} \mid x_t, x_{t-1}, \ldots, x_1) = P(x_{t+1} \mid x_t, x_{t-1}, \ldots, x_{t-n+1}), \quad t \geq n.
\]
- (b) The observation process $\{o_t\}$ is governed by the hidden state process according to a set of probability distributions that satisfy
\[
P(o_t \mid x_t, x_{t-1}, \ldots, x_1, o_{t-1}, o_{t-2}, \ldots, o_1) = P(o_t \mid x_t, x_{t-1}, \ldots, x_{t-m+1}), \quad t \geq m.
\]
Such a model, denoted by $\lambda$, is specified by the following three elements; a code sketch of this parameterization is given after the list.
- (1) State transition probability distribution:
\[
a_{i_1 i_2 \cdots i_n j} = P(x_{t+1} = j \mid x_t = i_n, x_{t-1} = i_{n-1}, \ldots, x_{t-n+1} = i_1), \quad i_1, \ldots, i_n, j \in \mathcal{X}.
\]
- (2) Symbol emission probability distribution:
\[
b_{i_1 i_2 \cdots i_m}(k) = P(o_t = v_k \mid x_t = i_m, x_{t-1} = i_{m-1}, \ldots, x_{t-m+1} = i_1), \quad i_1, \ldots, i_m \in \mathcal{X}, \; 1 \leq k \leq M.
\]
- (3) Initial state probability distribution:
\[
\pi_{i_1 i_2 \cdots i_n} = P(x_1 = i_1, x_2 = i_2, \ldots, x_n = i_n), \quad i_1, \ldots, i_n \in \mathcal{X}.
\]
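To make the parameterization concrete, the following is a minimal sketch (our own illustration, not from the paper; all names are hypothetical) of one way to store these three elements in code, with dictionary keys listing the conditioning states oldest first:

```python
from dataclasses import dataclass, field

@dataclass
class HighOrderHMM:
    """Container for high-order HMM parameters (hypothetical layout).

    Keys are tuples of states, oldest first, so A[(i_1, ..., i_n)][j]
    mirrors a_{i_1 ... i_n j} and B[(i_1, ..., i_m)][k] mirrors b_{i_1 ... i_m}(k).
    """
    N: int   # number of states, labelled 0, 1, ..., N - 1
    M: int   # number of observation symbols v_1, ..., v_M
    n: int   # order of the hidden Markov chain
    m: int   # order of the emission dependence
    A: dict = field(default_factory=dict)   # (i_1,...,i_n) -> list of N probabilities
    B: dict = field(default_factory=dict)   # (i_1,...,i_m) -> list of M probabilities
    pi: dict = field(default_factory=dict)  # (i_1,...,i_n) -> P(x_1=i_1, ..., x_n=i_n)
```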
Definition 2 (see [18]). Let $r = \max(m, n)$, and define the process $\{q_t\}$ by
\[
q_t = f([x_{t-r+1}, x_{t-r+2}, \ldots, x_t]), \quad t \geq r,
\]
where the function $f$ is given by
\[
f([i_1, i_2, \ldots, i_r]) = \sum_{k=1}^{r} i_k N^{k-1}, \quad i_1, \ldots, i_r \in \mathcal{X}.
\]
Definition 3 (see [17]). Any two models $\lambda_1$ and $\lambda_2$ are defined to be equivalent if
\[
P(O \mid \lambda_1) = P(O \mid \lambda_2)
\]
for every observation sequence $O = o_1 o_2 \cdots o_T$.
Remark 4. The function $f$ is a one-to-one correspondence between the set $\mathcal{X}^r$ and the set $S = \{0, 1, \ldots, N^r - 1\}$.
Proposition 5 (see [18]). Let $a_{ij} = P(q_{t+1} = j \mid q_t = i)$ for any $i, j \in S$. If $\lfloor i/N \rfloor \neq j - \lfloor j/N^{r-1} \rfloor N^{r-1}$, then $a_{ij} = 0$; that is, a transition from $q_t = i$ to $q_{t+1} = j$ is impossible.
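As a quick illustration of the encoding and of Proposition 5, here is a small sketch (our own, written under the conventions of Definition 2; the function names are ours) of $f$, its inverse, and the admissibility test:

```python
def f(window, N):
    """Encode a state window [i_1, ..., i_r] (oldest first) as an integer."""
    code = 0
    for k, state in enumerate(window):   # state i_{k+1} gets weight N**k
        code += state * N**k
    return code

def f_inv(code, N, r):
    """Decode an integer in {0, ..., N**r - 1} back to its state window."""
    window = []
    for _ in range(r):
        window.append(code % N)
        code //= N
    return window                        # oldest first

def admissible(i, j, N, r):
    """Proposition 5: q_t = i -> q_{t+1} = j is possible only if the two
    windows overlap, i.e. floor(i/N) == j mod N**(r-1)."""
    return i // N == j % N**(r - 1)

# Example: N = 2 states, r = 3.  Window [0, 1, 1] encodes to 0 + 2 + 4 = 6;
# sliding the window to [1, 1, 0] gives 1 + 2 + 0 = 3, an admissible move.
assert f([0, 1, 1], 2) == 6 and f_inv(6, 2, 3) == [0, 1, 1]
assert admissible(6, 3, 2, 3)
```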
Lemma 6. Let $q_t = f([x_{t-r+1}, \ldots, x_t])$ for $t \geq r$; then the process $\{q_t\}$ forms a first-order homogeneous Markov chain.
Proof. Without loss of generality, we may assume that $q_t = i$ and $q_{t+1} = j$, where $i, j \in S$.

First, we consider the case in which $\lfloor i/N \rfloor \neq j - \lfloor j/N^{r-1} \rfloor N^{r-1}$. By Proposition 5, it is easy to see that
\[
P(q_{t+1} = j \mid q_t = i, q_{t-1}, \ldots, q_r) = 0 = P(q_{t+1} = j \mid q_t = i).
\]

Next, we consider the case in which $\lfloor i/N \rfloor = j - \lfloor j/N^{r-1} \rfloor N^{r-1}$. Since $q_t = i$ and $q_{t+1} = j$, it follows from Definition 2 that the event $\{q_t = i\}$ determines the values of $x_{t-r+1}, \ldots, x_t$ and that $x_{t+1} = \lfloor j/N^{r-1} \rfloor$.

Through the above analysis, together with the $n$th-order Markov property of $\{x_t\}$ and the fact that $r \geq n$, we derive that
\[
P(q_{t+1} = j \mid q_t = i, q_{t-1}, \ldots, q_r) = P(x_{t+1} = \lfloor j/N^{r-1} \rfloor \mid x_t, x_{t-1}, \ldots, x_1) = P(x_{t+1} = \lfloor j/N^{r-1} \rfloor \mid x_t, \ldots, x_{t-n+1}) = P(q_{t+1} = j \mid q_t = i).
\]

Analogously, it is easy to see that these transition probabilities do not depend on $t$, because the chain $\{x_t\}$ is homogeneous. Therefore, the process $\{q_t\}$ forms a first-order homogeneous Markov chain.
Lemma 7. The two processes $\{q_t\}$ and $\{o_t\}$ form a first-order hidden Markov model.
Proof. Without loss of generality, we may assume that $o_t = v_k$ and $q_t = i$, where $v_k \in \mathcal{V}$ and $i \in S$. Since $q_t = i$, it follows from Definition 2 that the event $\{q_t = i\}$ determines the values of $x_{t-r+1}, \ldots, x_t$; hence, by condition (b) of Definition 1 and the fact that $r \geq m$,
\[
P(o_t = v_k \mid q_t = i, q_{t-1}, \ldots, q_r, o_{t-1}, \ldots, o_r) = P(o_t = v_k \mid x_t, x_{t-1}, \ldots, x_{t-m+1}) = P(o_t = v_k \mid q_t = i).
\]
Analogously, it is easy to see that these emission probabilities do not depend on $t$. Combining these with Lemma 6, we prove that the two processes $\{q_t\}$ and $\{o_t\}$ form a first-order hidden Markov model.
Remark 8. Hadar and Messer [18] also mentioned the fact that the two processes $\{q_t\}$ and $\{o_t\}$ form a first-order hidden Markov model, but they did not discuss or prove it in detail.
The equivalent first-order hidden Markov model $\{q_t, o_t\}$ is specified by the following three elements; a code sketch of their construction is given after the list.
- (1) State transition probability distribution:
\[
a_{ij} = P(q_{t+1} = j \mid q_t = i), \quad i, j \in S.
\]
- (2) Symbol emission probability distribution:
\[
b_i(k) = P(o_t = v_k \mid q_t = i), \quad i \in S, \; 1 \leq k \leq M.
\]
- (3) Initial state probability distribution:
\[
\pi_i = P(q_r = i), \quad i \in S.
\]
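The following sketch (ours, not the paper's; it assumes $m \leq n$, so that $r = n$, and reuses the dictionary layout sketched in the parameterization above) shows how these three elements could be assembled from the high-order parameters, with the nonzero transition entries given by Proposition 9 below:

```python
import numpy as np
from itertools import product

def build_first_order(A_high, B_high, pi_high, N, M, n, m):
    """Assemble (A, B, pi) of {q_t, o_t} from the high-order parameters.

    Assumes m <= n, so r = n; dictionary keys are state tuples, oldest first.
    """
    r = n
    S = N**r
    A = np.zeros((S, S))       # a_{ij} = P(q_{t+1} = j | q_t = i)
    B = np.zeros((S, M))       # b_i(k) = P(o_t = v_k | q_t = i)
    pi = np.zeros(S)           # pi_i = P(q_r = i)
    for window in product(range(N), repeat=r):
        i = sum(s * N**k for k, s in enumerate(window))    # i = f(window)
        for nxt_state in range(N):
            nxt = window[1:] + (nxt_state,)                # slide the window
            j = sum(s * N**k for k, s in enumerate(nxt))
            A[i, j] = A_high[window][nxt_state]            # Proposition 9
        B[i] = B_high[window[-m:]]   # emission depends on the newest m states
        pi[i] = pi_high[window]      # P(q_r = i) = P(x_1, ..., x_r)
    return A, B, pi
```

All transitions that do not slide the window remain zero, which realizes Proposition 5.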
Proposition 9 (see [18]). Let $i = f([i_1, \ldots, i_r])$ and $j = f([i_0, \ldots, i_{r-1}])$ for any $i_0, i_1, \ldots, i_r \in \mathcal{X}$; then
\[
a_{ji} = P(q_{t+1} = i \mid q_t = j) = a_{i_{r-n} i_{r-n+1} \cdots i_{r-1} i_r}.
\]
Lemma 10. Let $O = o_1 \cdots o_T$ be an arbitrary observation sequence; then
\[
P(O \mid \lambda) = P(O \mid \{q_t, o_t\});
\]
that is, the high-order hidden Markov model $\lambda$ is equivalent to the first-order hidden Markov model $\{q_t, o_t\}$.
Remark 11. Hadar and Messer [18] also mentioned the fact that the high-order hidden Markov model $\lambda$ is equivalent to the first-order hidden Markov model $\{q_t, o_t\}$, but they did not discuss or prove it in detail.
3. Methodology
Theorem 12. Let $O = o_1 \cdots o_T$ be any given observation sequence, and assume that $q_t = f([x_{t-r+1}, \ldots, x_t])$ for $r \leq t \leq T$; then
\[
P(x_1 x_2 \cdots x_T \mid O, \lambda) = P(q_r q_{r+1} \cdots q_T \mid O, \{q_t, o_t\}).
\]
Proof. Without loss of generality, let $q_t = f([x_{t-r+1}, \ldots, x_t]) = i^{(t)}$ for $r \leq t \leq T$, where $i^{(t)} \in S$. According to Proposition 9, the state transition, symbol emission, and initial probabilities of the two models coincide factor by factor along these sequences, so we have
\[
P(q_r q_{r+1} \cdots q_T, O \mid \{q_t, o_t\}) = P(x_1 x_2 \cdots x_T, O \mid \lambda).
\]
On the other hand, it is easy to see that
\[
P(x_1 \cdots x_T \mid O, \lambda) = \frac{P(x_1 \cdots x_T, O \mid \lambda)}{P(O \mid \lambda)}, \qquad
P(q_r \cdots q_T \mid O, \{q_t, o_t\}) = \frac{P(q_r \cdots q_T, O \mid \{q_t, o_t\})}{P(O \mid \{q_t, o_t\})}.
\]
Hence, by Lemma 10, we have
\[
P(x_1 x_2 \cdots x_T \mid O, \lambda) = P(q_r q_{r+1} \cdots q_T \mid O, \{q_t, o_t\}).
\]
Theorem 13. Let $O = o_1 \cdots o_T$ be any given observation sequence, and assume that the state sequence $x_1^* x_2^* \cdots x_T^*$ satisfies
\[
P(x_1^* x_2^* \cdots x_T^* \mid O, \lambda) = \max_{x_1 x_2 \cdots x_T} P(x_1 x_2 \cdots x_T \mid O, \lambda);
\]
then the state sequence $q_r^* q_{r+1}^* \cdots q_T^*$, where $q_t^* = f([x_{t-r+1}^*, \ldots, x_t^*])$, satisfies
\[
P(q_r^* q_{r+1}^* \cdots q_T^* \mid O, \{q_t, o_t\}) = \max_{q_r q_{r+1} \cdots q_T} P(q_r q_{r+1} \cdots q_T \mid O, \{q_t, o_t\}).
\]
Proof. By Theorem 12, it is easy to see that
\[
P(q_r^* \cdots q_T^* \mid O, \{q_t, o_t\}) = P(x_1^* \cdots x_T^* \mid O, \lambda) = \max_{x_1 \cdots x_T} P(x_1 \cdots x_T \mid O, \lambda) = \max_{q_r \cdots q_T} P(q_r \cdots q_T \mid O, \{q_t, o_t\}),
\]
where the last equality holds because, by Proposition 5 and Remark 4, every state sequence of $\{q_t, o_t\}$ with positive conditional probability corresponds to exactly one state sequence of $\lambda$.
According to Theorem 13, some optimal state sequence of the high-order hidden Markov model $\lambda$ is mapped to some optimal state sequence of the first-order hidden Markov model $\{q_t, o_t\}$. Similarly, we can draw the following conclusion.
Theorem 14. Let $O = o_1 \cdots o_T$ be any given observation sequence, and assume that the state sequence $q_r^* q_{r+1}^* \cdots q_T^*$ satisfies
\[
P(q_r^* q_{r+1}^* \cdots q_T^* \mid O, \{q_t, o_t\}) = \max_{q_r q_{r+1} \cdots q_T} P(q_r q_{r+1} \cdots q_T \mid O, \{q_t, o_t\});
\]
then the state sequence $x_1^* x_2^* \cdots x_T^*$ determined by $f([x_{t-r+1}^*, \ldots, x_t^*]) = q_t^*$ for $r \leq t \leq T$ satisfies
\[
P(x_1^* x_2^* \cdots x_T^* \mid O, \lambda) = \max_{x_1 x_2 \cdots x_T} P(x_1 x_2 \cdots x_T \mid O, \lambda).
\]
Remark 15. Combining Theorem 13 with Theorem 14, it is known that there exists a one-to-one correspondence between the optimal state sequences of the high-order hidden Markov model $\lambda$ and the optimal state sequences of the first-order hidden Markov model $\{q_t, o_t\}$.
To decode any high-order hidden Markov model $\lambda$, we first transform it into an equivalent first-order hidden Markov model $\{q_t, o_t\}$ by Hadar's transformation and then proceed as follows; a code sketch of the complete procedure is given at the end of this section.
Step 1. Determine some optimal state sequence of the first-order hidden Markov model $\{q_t, o_t\}$ by using the Viterbi algorithm. Without loss of generality, let this sequence be $q_r^* q_{r+1}^* \cdots q_T^*$.
Step 2. For $1 \leq k \leq r$, using the transformations
\[
x_k^* = \lfloor q_r^* / N^{k-1} \rfloor \bmod N,
\]
recover the first $r$ states $x_1^*, x_2^*, \ldots, x_r^*$.
Step 3. For $r + 1 \leq t \leq T$, using the transformation
\[
x_t^* = \lfloor q_t^* / N^{r-1} \rfloor,
\]
recover the remaining states $x_{r+1}^*, \ldots, x_T^*$.
According to Theorem 14, the resulting state sequence $x_1^* x_2^* \cdots x_T^*$ is some optimal state sequence of the high-order hidden Markov model $\lambda$.
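The following is a minimal sketch of Steps 1–3 (our own illustration, not the paper's code). It assumes that the equivalent parameters $(A, B, \pi)$ over $S = N^r$ super-states were built as in Section 2 and, for brevity, that the observation array is aligned with $t = r, \ldots, T$ (the treatment of $o_1, \ldots, o_{r-1}$ is assumed to be absorbed into the initial distribution):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Standard first-order Viterbi; returns one optimal super-state path."""
    S, T = A.shape[0], len(obs)
    delta = np.zeros((T, S))            # best path probability ending in each state
    psi = np.zeros((T, S), dtype=int)   # argmax back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j] over transitions
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                 # trace the back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]                             # q_r*, ..., q_T*

def decode_high_order(A, B, pi, obs, N, r):
    """Steps 1-3: Viterbi on {q_t, o_t}, then invert Hadar's transformation."""
    q = viterbi(A, B, pi, obs)                                # Step 1
    x = [(q[0] // N**(k - 1)) % N for k in range(1, r + 1)]   # Step 2
    x += [q_t // N**(r - 1) for q_t in q[1:]]                 # Step 3
    return x                                                  # x_1*, ..., x_T*
```

In practice the recursion would be carried out in log space to avoid numerical underflow for long observation sequences.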
4. Conclusions
In this paper, a novel method for decoding any high-order hidden Markov model is given. Based on this method, the optimal state sequence of any high-order hidden Markov model can be inferred by the existing Viterbi algorithm of the first-order hidden Markov model. The method thus provides a unified algorithmic framework that covers both the first-order hidden Markov model and any high-order hidden Markov model. For instance, the Viterbi algorithm of the first-order hidden Markov model can be derived as the special case of our conclusion in which $m = n = 1$.
The method analyzed here is practical and valuable in its own right. Future research could apply it to handwriting recognition, speech recognition, speaker recognition, emotion recognition, and so forth.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the Major Program of the National Natural Science Foundation of China (no. 71390521), the Postdoctoral Science Foundation of China (no. 2014M551565), and the Scientific Research Foundation of Tongling University (no. 2012tlxyrc04).