A Semantic and Detection‐Based Approach to Speech and Language Processing - Semantic Computing - Wiley Online Library

This chapter presents a new formulation that tightly integrates the detection-based algorithm into the maximum a posteriori (MAP) decision. The key to this formulation is to implement the sequential detection algorithm and to apply the sequential probability ratio test recurrently in a time-synchronous, single-pass decoding framework. The chapter shows that realizing detection-based recognition in a single-pass architecture is feasible. It also provides an overview of the mathematical foundation of this approach, which serves as an introduction to the general detection-based approach to computer processing of speech and language. The overview starts with conventional fixed-sample-size detection and extends naturally to sequential detection theory. Finally, the chapter presents a comprehensive case study in which the sequential detection technique is applied successfully to a speech understanding task related to personal information management.
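
To make the sequential probability ratio test mentioned above concrete, the following is a minimal sketch of Wald's classical test on a toy problem. It is not the chapter's decoder: the Gaussian observation model, the function name `sprt_gaussian`, and all parameter names and default values are assumptions chosen only for illustration. Unlike a fixed-sample-size detector, the test stops as soon as the accumulated log-likelihood ratio crosses one of two thresholds.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_gaussian(
    samples: Iterable[float],
    mu0: float = 0.0,     # hypothetical mean under H0 (assumption)
    mu1: float = 1.0,     # hypothetical mean under H1 (assumption)
    sigma: float = 1.0,   # known standard deviation (assumption)
    alpha: float = 0.01,  # target false-alarm probability
    beta: float = 0.01,   # target miss probability
) -> Tuple[Optional[str], int]:
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1.

    Returns (decision, samples_used); decision is None when the
    input ends before either threshold is crossed.
    """
    # Wald's approximate thresholds on the log-likelihood ratio.
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))

    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, n
```

Calling `sprt_gaussian(observations)` on any iterable of floats returns the decision together with the number of observations consumed, so the test length adapts to how informative the data are.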

Controlled Vocabulary Terms

natural language processing; speech processing

