Speech Coding
Channasandra Ravishankar, Spiros Dimolitsas
Channasandra Ravishankar
Hughes Network Systems, Germantown, MD
Spiros Dimolitsas
Lawrence Livermore National Laboratory, Livermore, CA
Abstract
The sections in this article are
- 1 Waveform Coders
- 2 Parametric Speech Coders
- 3 Excitation Modeling
- 4 Transformed-Domain Speech Coding
- 5 Speech Coding Standards
- 6 Speech Coder Performance Assessment
- 7 Concatenated Speech Coding
- 8 Future Trends in Speech Coding
- 9 Conclusions
Bibliography
- 1 J. Flanagan Speech Analysis, Synthesis and Perception, New York: Springer-Verlag, 1972. doi:10.1007/978-3-662-01562-9
- 2 H. Dudley Remaking speech, J. Acoust. Soc. Amer., 11: 169–177, 1939. doi:10.1121/1.1916020
- 3 G. Fant Acoustic Theory of Speech Production, Gravenhage, The Netherlands: Mouton, 1960.
- 4 W. Koenig H. K. Dunn L. Y. Lacy The sound spectrograph, J. Acoust. Soc. Amer., 17: 19–49, 1946. doi:10.1121/1.1916342
- 5 P. E. Papamichalis Practical Approaches to Speech Coding, Englewood Cliffs, NJ: Prentice-Hall, 1987.
- 6 ITU-T Recommendation G.711, Pulse Code Modulation (PCM) for Voice Frequencies, (Red Book), Malaga-Torremolinos, 1984.
- 7 ITU-T Recommendation G.726, 40-, 32-, 24-, and 16-kb/s Adaptive Differential Pulse Code Modulation (Blue Book), Geneva, 1990.
- 8 ITU-T Recommendation G.727, 5-, 4-, 3-, and 2-bits per sample Embedded Adaptive Differential Pulse Code Modulation (Blue Book), Geneva, 1991.
- 9 L. R. Rabiner R. W. Schafer Digital Processing of Speech Signals, Englewood Cliffs, NJ: Prentice-Hall, 1978.
- 10 W. P. LeBlanc et al. Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kbps speech coding, IEEE Trans. Speech Audio Process., 1: 373–385, 1993.
- 11 P. Kabal R. Ramachandran The computation of line spectral frequencies using Chebyshev polynomials, IEEE Trans. Acoust., Speech Signal Process., ASSP-34: 1419–1426, 1986.
- 12 K. Paliwal B. Atal Efficient vector quantization of LPC parameters at 24 bits/frame, Proc. Int. Conf. Acoust., Speech Signal Process., 1991, pp. 661–663.
- 13 C. S. Ravishankar B. R. U. Bhaskar S. Dimolitsas A 1200 bps voice coder based upon split VQ of line spectral frequencies, Proc. 1993 IEEE Speech Coding Workshop, St. Adele, 1993, pp. 37–38.
- 14 R. Schafer J. Markel Speech Analysis, New York: IEEE Press, 1979.
- 15 W. Hess Pitch Determination of Speech Signals, New York: Springer-Verlag, 1983. doi:10.1007/978-3-642-81926-1
- 16 B. S. Atal L. R. Rabiner A pattern recognition approach to voiced–unvoiced–silence classification with applications to speech recognition, IEEE Trans. Acoust., Speech Signal Process., ASSP-24: 201–212, 1976.
- 17 J. P. Campbell T. E. Tremain Voiced/unvoiced classification of speech with applications to US Government LPC-10E algorithm, Proc. ICASSP, Tokyo, 1986, pp. 472–476.
- 18 C. K. Un D. T. Magill The residual-excited linear prediction vocoder with transmission below 9.6 kb/s, IEEE Trans. Commun., COM-23: 1466–1473, 1975.
- 19 J. Makhoul et al. A mixed source model for speech compression and synthesis, J. Acoust. Soc. Amer., 64: 1577–1581, 1978.
- 20 A. McCree T. Barnwell, III A new mixed excitation LPC vocoder, Proc. ICASSP, 1991, pp. 593–596.
- 21 V. Viswanathan et al. A harmonic deviations linear predictive vocoder for improved narrowband speech transmission, Proc. ICASSP, 1982, pp. 610–613.
- 22 C. S. Ravishankar B. R. U. Bhaskar S. Dimolitsas A 1200 bps voice coder based upon alternate transmission of LPC and residual information, Proc. 1995 IEEE Speech Coding Workshop, Annapolis, MD, 1995, pp. 111–112.
- 23 J. B. Allen S. T. Neely Micromechanical models of the cochlea, Phys. Today, 45 (7): 40–47, 1992.
- 24 B. Atal J. Remde A new model for LPC excitation for producing natural sounding speech at low bit rates, Proc. Int. Conf. Acoust., Speech Signal Process., 1982, pp. 614–617.
- 25 P. Kroon E. F. Deprettere R. J. Sluyter Regular-pulse excitation—A novel approach to effective and efficient multipulse coding of speech, IEEE Trans. Acoust. Speech Signal Process., ASSP-34: 1054–1063, 1986.
- 26 M. R. Schroeder B. S. Atal Code excited linear prediction (CELP): High quality speech at very low bit rates, Proc. Int. Conf. Acoust., Speech Signal Process., 1985, pp. 937–940.
- 27 R. Crochiere L. Rabiner Multirate Digital Signal Processing, Englewood Cliffs, NJ: Prentice-Hall, 1983. doi:10.1016/0165-1684(83)90013-0
- 28 P. P. Vaidyanathan Quadrature mirror filter banks, M-band extensions and perfect reconstruction techniques, IEEE ASSP Mag., 4 (3): 4–20, 1987.
- 29 R. V. Cox et al. New directions in sub-band coding, IEEE Trans. Sel. Areas Commun., 6: 391–409, 1988.
- 30 D. W. Griffin J. S. Lim A new model based speech analysis/synthesis system, Proc. Int. Conf. Acoust., Speech Signal Process., 1985, pp. 513–516.
- 31 S. Dimolitsas et al. Evaluation of voice codec performance for the Inmarsat mini-M system, Proc., 10th Int. Digital Satellite Conf., Brighton, England, 1995.
- 32 S. Dimolitsas F. L. Corcoran C. Ravishankar Voice transmission quality of mobile satellite communications systems, Int. J. Satellite Commun., 12: 361–368, 1994.
- 33 R. McAulay T. Quatieri Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoust. Speech Signal Process., ASSP-34: 744, 1986.
- 34 R. Zelinski P. Noll Adaptive transform coding of speech signals, IEEE Trans. Acoust. Speech Signal Process., ASSP-25: 299–309, 1977.
- 35 C. S. Ravishankar S. Dimolitsas Voice coding technology for digital aeronautical communications, Air Traffic Control Q., 4 (3): 197–221, 1997. doi:10.2514/atcq.4.3.197
- 36 S. Dimolitsas F. L. Corcoran C. Ravishankar Correlation between headphone and telephone-handset listener opinion scores for single stimulus voice coder assessments, IEEE Lett. Signal Process., 2 (3): 41–43, 1995.
- 37 ITU-T Recommendation G.763, Digital Circuit Multiplication Equipment Using 32 kb/s ADPCM and Digital Speech Interpolation, Geneva, 1991.
- 38 J. H. Chen R. Cox The creation and evolution of 16 kbps LD-CELP: From concept to standard, in Speech Communication, Amsterdam: Elsevier/North-Holland, 1993, pp. 103–111.
- 39 Y. Linde A. Buzo A. Gray An algorithm for vector quantizer design, IEEE Trans. Commun., COM-28: 84–95, 1980.
- 40 J. R. B. De Marca N. S. Jayant An algorithm for assigning binary indices to the codevectors of a multi-dimensional quantizer, Proc. Int. Conf. Commun., Seattle, WA, pp. 1128–1132, 1987.
- 41 R. Salami et al. Description of the proposed ITU-T 8 kb/s speech coding standard, Proc. 1995 IEEE Speech Coding Workshop, Annapolis, MD, 1995, pp. 3–5.
- 42 S. Dimolitsas C. S. Ravishankar G. Schröder Current objectives for 4 kbit/s wireline-quality speech coding standardization, IEEE Lett. Signal Process., 1 (11): 157–159, 1994.
- 43 I. Gerson M. Jasiuk Vector sum excited linear prediction (VSELP) speech coding at 8 Kb/s, Proc. Int. Conf. Acoust., Speech Signal Process., Albuquerque, NM, 1990, pp. 461–464.
- 44 K. Hellwig et al. Speech coder for the European mobile radio system, Proc. GLOBECOM, Dallas, TX, 1989, pp. 1065–1069.
- 45 J. P. Campbell, Jr. T. E. Tremain V. C. Welch The DoD 4.8 kb/s standard (the proposed federal standard FS1016), in B. S. Atal, V. Cuperman, and A. Gersho (eds.), Advances in Speech Coding, Norwell, MA: Kluwer, 1991, pp. 121–133.
- 46 A. McCree et al. A 2.4 kbps MELP coder candidate for the new US federal standard, Proc. Int. Conf. Acoust., Speech Signal Process., 1996, pp. 200–203.
- 47 T. E. Tremain The government standard linear predictive coding algorithm: LPC-10, Speech Technol., 1 (2): 40–49, 1982.
- 48 D. L. Richards Telecommunications by Speech, New York: Wiley, 1973.
- 49 S. Dimolitsas Subjective assessment methods for the measurement of digital speech coder quality, in B. S. Atal, V. Cuperman, and A. Gersho (eds.), Speech and Audio Coding for Wireless Applications, Norwell, MA: Kluwer, 1992.
- 50 ITU-T Recommendation G.131, Stability and Echo (Red Book), Malaga-Torremolinos, 1984, Vol. III.1, pp. 183–194.
- 51 ITU-T Recommendation G.164, Echo Suppressors (Red Book), Malaga-Torremolinos, 1984, Vol. III.1, pp. 225–258.
- 52 IEEE Recommended Practice for Speech Quality Measurements, IEEE Trans. Audio Electroacoust., AU-17: 225–246, 1969.
- 53 S. Dimolitsas F. L. Corcoran M. Baraniecki Transmission quality of North American cellular, personal communications, and public switched telephone networks, IEEE Trans. Veh. Technol., 32: 245–251, 1994.
- 54 S. Dimolitsas F. L. Corcoran C. Ravishankar Voice quality of interconnected PCS, Japanese cellular, and public switched telephone networks, Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Detroit, MI, 1995, pp. 273–276.
- 55 S. Dimolitsas F. L. Corcoran C. Ravishankar Voice quality of interconnected North American cellular, European cellular, and public switched telephone networks, Proc. IEEE Veh. Technol. Conf., VTC'95, Chicago, 1995, pp. 719–722.
- 56 W. B. Kleijn Encoding speech using prototype waveforms, IEEE Trans. Speech Audio Process., 1: 386–399, 1993.
- 57 P. Lupini V. Cuperman Vector quantization of harmonic magnitudes for low-rate speech coders, Proc. GLOBECOM, 1994, pp. 858–862.
- 58 V. Vaishampayan N. Farvardin Joint design of block source codes and modulation signal sets, IEEE Trans. Inf. Theory, 38: 1230–1248, 1992.
- 59 S. Hong P. K. M. Ho V. Cuperman Combined speech and channel coding for mobile radio communications, IEEE Trans. Veh. Technol., 43: 1078–1087, 1994. doi:10.1109/25.330171
- 60 A. Husain V. Cuperman Reconstruction of missing packets for CELP based speech coders, Proc. Int. Conf. Acoust., Speech Signal Process., 1995, pp. 245–248.
- 61 ITU-T Recommendation G.722, 7 kHz Audio Coding within 64 kbps, Melbourne (Blue Book), 1988.
- 62 ITU-T Recommendation P.800, Methods of Subjective Determination of Transmission Quality, 1996.
- 63 ITU-T Recommendation P.861, Objective Quality Measurement of Telephone Band (300–3400 Hz) Speech Coders, 1996.
- 64 N. S. Jayant P. Noll Digital Coding of Waveforms, Englewood Cliffs, NJ: Prentice Hall, 1984.
- 65 J. Makhoul Linear prediction: A tutorial review, Proc. IEEE, 63 (4): 561–580, 1975.
- 66 B. S. Atal S. L. Hanauer Speech analysis and synthesis by linear prediction of the speech wave, J. Acoustical Society of America, 50: 637–655, 1971.
- 67 F. I. Itakura S. Saito Analysis-synthesis telephony based on the maximum likelihood method, Proc. 6th Int. Cong. Acoust., Tokyo, Japan, 1968, pp. C17–20.
- 68 B. S. Atal M. R. Schroeder Stochastic coding of speech signals at very low bit rates, Proc. Int. Conf. Commun., 1984, pp. 1610–1613.
- 69 W. B. Kleijn D. J. Krasinski R. H. Ketchum Improved speech quality and efficient vector quantization in SELP, Proc. Int. Conf. Acoust., Speech Signal Process., 1988, pp. 155–158.
- 70 D. Lin New approaches to stochastic coding of speech sources at very low bit rates, in Signal Processing III: Theories and Applications, Elsevier-North Holland, 1986.
- 71 G. Davidson A. Gersho Complexity reduction methods for vector excitation coding, Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 1986, pp. 3055–3058.
- 72 J. P. Adoul et al. Fast CELP Coding Based on Algebraic Codes, Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 1987, pp. 1957–1960.
- 73 A. Gersho R. M. Gray Vector Quantization and Signal Compression, Norwell, MA: Kluwer, 1992. doi:10.1007/978-1-4615-3626-0
Wiley Encyclopedia of Electrical and Electronics Engineering