Characterization of speech waveforms

by Sang Huy Ta

Institution: California State University – Northridge
Department: Department of Engineering
Degree: MS
Year: 1979
Keywords: Speech Acoustics; Dissertations, Academic – CSUN – Engineering
Record ID: 1545324
Full text PDF: http://hdl.handle.net/10211.3/125142


A speech signal is a random process that can be voiced or unvoiced, and oral and/or nasal. The smallest element of speech is the phoneme, and phonemes are combined to form meaningful speech sounds. Speech can be recorded and reproduced on a disc or on magnetic tape, but efficient storage, compression, and transmission; automatic speech verification and identification; voice response systems; speech synthesis; and other digital processing all require that the speech waveform be characterized.

Among the various characterization methods, such as mathematical, statistical, physical, and analysis-by-synthesis methods, characterization by linear predictive coefficients is judged to be the best. This method is considered in detail, together with the concept of linear prediction and its formulation. The linear predictor is essentially a recursive (all-pole) digital filter whose coefficients are estimated by a least-squares method, minimizing the mean squared prediction error, so that the filter simulates speech. The number of coefficients, or poles, of the filter is determined by the desired accuracy of the simulation and is shown to be related to the vocal tract length. To estimate the linear predictive coefficients, the autocorrelation function of each speech segment must first be computed as a sample autocorrelation function. An iterative algorithm is developed to compute the predictive coefficients. The choice of an appropriate window function and speech frame size is discussed, as is the derivation of other characterizing parameters, such as voice pitch and formant frequencies, from the linear predictive coefficients. Finally, an experiment that uses a microprocessor to compute the linear predictive coefficients is considered.
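The analysis steps described above (windowing a frame, computing its sample autocorrelation, and iteratively solving for the predictor coefficients) can be sketched in Python. This is a minimal sketch under stated assumptions, not the program from the thesis: it assumes a Hamming window, the biased sample autocorrelation, and the Levinson-Durbin recursion as the iterative algorithm, and the function names (`hamming`, `autocorrelation`, `levinson_durbin`, `lpc_analyze`, `lpc_order`) are hypothetical. The `lpc_order` helper encodes a common lossless-tube rule of thumb, assumed here rather than taken from the thesis: for a tract of length L and sound speed c, formants are spaced about c/(2L) Hz apart, so the band up to fs/2 holds about fs·L/c formants, each needing two poles. The default 0.17 m and 340 m/s are assumed typical values.

```python
import math

def hamming(n):
    """Hamming window of length n (n >= 2); an assumed window choice."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(frame, max_lag):
    """Biased sample autocorrelation r[k] = sum_n x[n] * x[n + k], k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Iteratively solve sum_j a[j] * r[|i - j|] = r[i] for i = 1..order.

    The predictor is x_hat[n] = sum_{k=1..order} a[k] * x[n - k].
    Returns (a[1..order], final squared prediction error).
    """
    if r[0] == 0.0:  # silent frame: nothing to predict
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # frame already predicted perfectly; stop early
            break
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error shrinks at every stage
    return a[1:], err

def lpc_analyze(frame, order):
    """Window one frame, then estimate its linear predictive coefficients."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    return levinson_durbin(autocorrelation(windowed, order), order)

def lpc_order(fs, tract_length_m=0.17, c=340.0):
    """Rule-of-thumb order: two poles per formant, about fs*L/c formants below fs/2."""
    return int(round(2.0 * fs * tract_length_m / c))
```

With these assumed values, `lpc_order(10000)` gives 10 poles for a 10 kHz sampling rate, and `lpc_analyze(frame, 10)` returns ten coefficients plus the residual energy of the windowed frame.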
Each windowed speech frame is represented by a set of filter coefficients that, strictly speaking, should make the speech simulated at the filter output during resynthesis indistinguishable from natural speech, even though the filter input is only a train of impulses (for voiced sounds) or white noise (for unvoiced sounds).
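The resynthesis described above can be sketched as an all-pole recursive filter driven by an impulse train or by white noise. This is a hedged illustration, not the implementation from the thesis: it assumes a predictor of the form s_hat[n] = sum_{k=1..p} a_k * s[n-k], a per-frame gain supplied by the caller (for example the square root of the frame's prediction error, which is itself an assumption), and hypothetical function names `excitation` and `synthesize`.

```python
import random

def excitation(n_samples, pitch_period=None, seed=None):
    """Unit impulse every pitch_period samples (voiced),
    or zero-mean, unit-variance Gaussian white noise (unvoiced)."""
    if pitch_period:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def synthesize(a, gain, u, state=None):
    """All-pole synthesis: s[n] = gain * u[n] + sum_k a[k] * s[n - 1 - k].

    a[0] multiplies s[n-1], a[1] multiplies s[n-2], and so on.
    Returns the output samples and the final filter state for the next frame.
    """
    p = len(a)
    hist = list(state) if state is not None else [0.0] * p  # hist[k] = s[n - 1 - k]
    out = []
    for x in u:
        s = gain * x + sum(a[k] * hist[k] for k in range(p))
        out.append(s)
        hist = ([s] + hist)[:p]  # shift the newest output into the delay line
    return out, hist
```

For a voiced frame one might call `synthesize(a, gain, excitation(n, pitch_period=T))` with T the pitch period in samples; for an unvoiced frame, omit `pitch_period`. Returning the filter state lets consecutive frames be joined without restarting the recursion from zero.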