Linear Predictive Coding (LPC)-based vowel speech synthesis using the source-filter model of speech production, implemented in MATLAB.
This project was completed as part of an MSc Artificial Intelligence programme at the University of Surrey.
The source-filter model of speech production treats the vocal tract as a time-varying linear filter excited by a periodic source. This project implements that model by:
- Loading real male and female vowel recordings ("heed")
- Extracting a ~100 ms quasi-stationary segment from the vowel nucleus
- Estimating the mean fundamental frequency F0 using MATLAB's `pitch()` function
- Fitting an all-pole LPC filter to the segment using the autocorrelation method (`lpc()`)
- Extracting the first three formant frequencies from the LPC filter roots
- Generating a periodic impulse train at frequency F0 as the synthetic source
- Synthesizing speech by filtering the impulse train through the LPC filter
- Comparing the synthesized signal with the original in both time and frequency domains
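The first two steps can be illustrated without any toolbox. The sketch below is not the project's code. It assumes a 16 kHz sampling rate, substitutes a synthetic 200 Hz harmonic signal for the recordings, takes the 100 ms segment from the exact middle of the signal, and replaces MATLAB's `pitch()` (Audio Toolbox) with a plain autocorrelation peak search over a hypothetical 60-400 Hz range.

```matlab
% Toolbox-free sketch: 100 ms mid-signal segment + autocorrelation F0 estimate.
% Assumptions (not taken from the project scripts): Fs = 16 kHz, a synthetic
% 200 Hz harmonic signal stands in for heed_f.wav / heed_m.wav, and the
% segment is centred on the signal. This is NOT the pitch() algorithm.
Fs = 16000;                              % assumed sampling rate (Hz)
t  = (0:Fs-1)' / Fs;                     % 1 s time axis
x  = sin(2*pi*200*t) + 0.5*sin(2*pi*400*t) + 0.25*sin(2*pi*600*t);

% Extract ~100 ms centred on the middle of the signal
segLen = round(0.100 * Fs);              % 1600 samples
mid    = floor(numel(x) / 2);
first  = mid - floor(segLen/2) + 1;
seg    = x(first : first + segLen - 1);

% Autocorrelation F0 estimate, searching periods for 60-400 Hz only
minLag = floor(Fs / 400);                % shortest period considered (samples)
maxLag = ceil(Fs / 60);                  % longest period considered (samples)
r = zeros(maxLag, 1);
for lag = 1:maxLag
    r(lag) = sum(seg(1:end-lag) .* seg(1+lag:end));
end
[~, rel] = max(r(minLag:maxLag));        % strongest periodicity in range
bestLag  = minLag + rel - 1;
f0 = Fs / bestLag;
fprintf('Estimated F0: %.1f Hz\n', f0);
```

Here the autocorrelation peak falls at a lag of 80 samples, i.e. 16000/80 = 200 Hz. Real recordings are noisier and their pitch drifts, which is one reason a dedicated estimator such as `pitch()` is preferable in practice.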
    LPC-Speech-Synthesizer/
    ├── speech_synthesis_female.m   # LPC synthesis pipeline for female voice
    ├── speech_synthesis_male.m     # LPC synthesis pipeline for male voice
    └── README.md
Note: Speech sample `.wav` files are not included. To reproduce the results, download male and female vowel recordings (e.g. from the IViE Corpus) and place `heed_f.wav` and `heed_m.wav` in the same directory as the scripts.
LPC models the vocal tract as an all-pole (AR) filter. Each speech sample is approximated as a linear combination of its previous k samples, weighted by prediction coefficients estimated with the autocorrelation (Yule-Walker) method. The LPC order k sets how many past samples, and therefore how many filter poles, are used; higher orders can resolve more spectral detail.
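As a sketch of the autocorrelation method, the code below computes r(0..k), solves the Yule-Walker equations with the Levinson-Durbin recursion, and cross-checks the result with a direct Toeplitz solve. It assumes a test signal that is the impulse response of a known two-pole filter, A(z) = 1 - 1.3z^-1 + 0.8z^-2, so the recovered coefficients can be compared with the truth; it is an illustration, not the project's own code.

```matlab
% Sketch: autocorrelation (Yule-Walker) LPC via the Levinson-Durbin recursion.
% Assumption: the test signal is the impulse response of a known 2-pole filter
% A(z) = 1 - 1.3 z^-1 + 0.8 z^-2, so the estimate can be checked.
aTrue = [1 -1.3 0.8];
x = filter(1, aTrue, [1; zeros(1999, 1)]);  % 2000-sample impulse response

k = 2;                                   % LPC order
N = numel(x);
r = zeros(k+1, 1);                       % autocorrelation r(0..k), stored as r(1..k+1)
for lag = 0:k
    r(lag+1) = sum(x(1:N-lag) .* x(1+lag:N));
end

% Levinson-Durbin: solves the Toeplitz normal equations in O(k^2) operations
a = 1;                                   % order-0 predictor polynomial
E = r(1);                                % order-0 prediction error
for m = 1:k
    acc  = r(m+1) + a(2:end) * r(m:-1:2);   % correlation not yet explained
    refl = -acc / E;                        % reflection coefficient
    a    = [a, 0] + refl * [0, fliplr(a)];  % order-m coefficient update
    E    = E * (1 - refl^2);                % updated prediction error
end

% Cross-check against a direct solve of the Yule-Walker equations R*a = -r
R   = toeplitz(r(1:k));
aYW = [1; -(R \ r(2:k+1))]';
disp(a); disp(aYW);                      % both should be close to aTrue
```

With the Signal Processing Toolbox installed, `lpc(x, 2)` should return essentially the same vector, since it also uses the autocorrelation method and the same `[1 a1 ... ak]` sign convention.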
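Formant frequencies (the resonances of the vocal tract) are extracted from the poles of the LPC filter, i.e. the roots of its denominator polynomial. Each complex-conjugate pole pair corresponds to one resonance: the pole angle gives its frequency and the pole radius gives its bandwidth (poles closer to the unit circle produce narrower, sharper peaks). Roots with a bandwidth below 400 Hz and a frequency above 90 Hz are classified as formants. The first three formants (F1, F2, F3) characterise the vowel quality.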
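The root-to-formant conversion can be sketched as follows. To make the output checkable, the LPC polynomial is built from three hypothetical resonances (300, 2200 and 2900 Hz) at an assumed Fs of 16 kHz rather than fitted to a recording. The bandwidth formula B = -(Fs/π)·ln|z| is a common convention and an assumption here; the project's scripts may scale it differently.

```matlab
% Sketch: formant frequencies and bandwidths from LPC polynomial roots.
% Assumptions: Fs = 16 kHz; the polynomial is built from hypothetical
% resonances; bandwidth uses the convention B = -(Fs/pi)*ln|z|.
Fs    = 16000;
Ftrue = [300 2200 2900];                 % hypothetical formants (Hz)
Btrue = [60 100 150];                    % hypothetical bandwidths (Hz)
p = exp(-pi*Btrue/Fs) .* exp(1j*2*pi*Ftrue/Fs);  % upper-half-plane poles
a = real(poly([p, conj(p)]));            % 6th-order all-pole denominator

rts = roots(a);
rts = rts(imag(rts) >= 0);               % keep one root of each conjugate pair
frq = angle(rts) * Fs / (2*pi);          % pole angle  -> frequency (Hz)
bw  = -(Fs/pi) * log(abs(rts));          % pole radius -> bandwidth (Hz)

isFormant = frq > 90 & bw < 400;         % thresholds described above
formants  = sort(frq(isFormant));
fprintf('F%d = %.0f Hz\n', [1:numel(formants); formants']);
```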
A 1-second periodic impulse train is generated at the estimated F0 and passed through the LPC all-pole filter. The filter acts as a model of the vocal tract: it shapes the flat spectral envelope of the impulse train (equal-amplitude harmonics at multiples of F0) into a vowel-like spectrum. The result approximates the original vowel recording.
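A minimal sketch of the synthesis step, assuming Fs = 16 kHz, F0 = 220 Hz and a hypothetical two-pole coefficient vector in place of the `lpc()` output. Rounding the period to a whole number of samples means the realised pitch is Fs/73 ≈ 219.2 Hz rather than exactly 220 Hz.

```matlab
% Sketch: 1 s impulse train at F0 filtered through an all-pole LPC model.
% Assumptions: Fs = 16 kHz, F0 = 220 Hz, and hypothetical coefficients
% a = [1 -1.3 0.8] standing in for the output of lpc().
Fs = 16000;
F0 = 220;
a  = [1 -1.3 0.8];

N = Fs;                                  % 1 second of samples
e = zeros(N, 1);
period = round(Fs / F0);                 % 73 samples -> realised rate ~219.2 Hz
e(1:period:N) = 1;                       % periodic impulse train (source)

y = filter(1, a, e);                     % all-pole "vocal tract" filter
y = y / max(abs(y));                     % normalise to avoid clipping on playback
% sound(y, Fs);                          % uncomment to listen
```

Scaling by the peak amplitude is one simple choice; matching the RMS level of the original segment instead would make a time-domain comparison with the recording more direct.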
| Parameter | Female ("heed") | Male ("heed") |
|---|---|---|
| Mean F0 (Hz) | ~220 | ~130 |
| F1 (Hz) | ~350 | ~300 |
| F2 (Hz) | ~2200 | ~2100 |
| F3 (Hz) | ~2900 | ~2700 |
| LPC order used | 40 | 40 |
These values are broadly consistent with published formant data for the /iː/ vowel (Peterson & Barney, 1952).
- MATLAB R2019b or later
- Signal Processing Toolbox (for `lpc`)
- Audio Toolbox (for `pitch`)
- Clone this repository:

      git clone https://github.com/BilalAhmadSami/LPC-Speech-Synthesizer.git
      cd LPC-Speech-Synthesizer

- Place your input `.wav` files (`heed_f.wav`, `heed_m.wav`) in the project directory, or update the `AUDIO_FILE` variable at the top of each script.
- Open MATLAB and run:

      % For female voice:
      run('speech_synthesis_female.m')
      % For male voice:
      run('speech_synthesis_male.m')

- When prompted, enter the LPC order `k`. A value of 14 is a good starting point; try 40 or 100 to observe how spectral resolution changes.
- The script prints the estimated F0 and formant frequencies, generates five plots, and saves the original and synthesized audio as `.wav` files.
| LPC order k | Effect |
|---|---|
| ~14 (rule of thumb: Fs/1000 + 2, with Fs in Hz) | Captures the main formants; smooth spectral envelope |
| 40 | Finer spectral detail; more formant peaks resolved |
| 100+ | Models fine spectral structure (e.g. individual F0 harmonics); risk of overfitting noise |
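The rule of thumb in the first row is plain arithmetic on the sampling rate. The sampling rates below are illustrative assumptions, not those of the project's recordings; note that k ≈ 14 corresponds to Fs = 12 kHz under this rule.

```matlab
% Rule-of-thumb LPC order k = Fs/1000 + 2 (Fs in Hz) for a few sampling rates.
% The sampling rates are illustrative assumptions; check your own .wav files.
Fs = [8000 12000 16000 44100];           % Hz
k  = round(Fs / 1000) + 2;               % rule-of-thumb order: 10 14 18 46
maxFormants = floor(k / 2);              % one conjugate pole pair per resonance
fprintf('Fs = %5d Hz -> k = %2d (up to %2d resonances)\n', [Fs; k; maxFormants]);
```

Since each resonance needs one conjugate pole pair, an order of 40 allows up to 20 resonances, far more than the three formants used here; the surplus poles are what begin to model finer spectral structure.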
Each script produces five figures:
- Speech Waveforms — full signal and the extracted quasi-stationary segment
- Impulse Train — the periodic source signal at frequency F0
- Filter Response vs. Amplitude Spectrum — LPC filter response overlaid with the segment's dB spectrum
- Pole-Zero Diagram — poles of the LPC filter in the z-plane
- Synthesized vs. Original — time-domain comparison of the synthesized and original signals
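For the filter-response figure, the LPC envelope can be evaluated directly from the coefficients with an FFT of the zero-padded denominator, which avoids needing `freqz`. This is a sketch assuming Fs = 16 kHz and hypothetical two-pole coefficients; overlaying it on the segment's spectrum would additionally need a gain term to align the levels.

```matlab
% Sketch: LPC all-pole magnitude response in dB, computed without freqz.
% Assumptions: Fs = 16 kHz and hypothetical coefficients a = [1 -1.3 0.8].
Fs   = 16000;
a    = [1 -1.3 0.8];
nfft = 1024;

A   = fft(a, nfft);                      % A(e^{jw}) at nfft equally spaced bins
H   = 1 ./ A(1:nfft/2+1);                % keep 0..Fs/2; all-pole response 1/A
f   = (0:nfft/2)' * Fs / nfft;           % bin frequencies (Hz)
HdB = 20 * log10(abs(H));                % magnitude in dB

[~, iPk] = max(HdB);
fPeak = f(iPk);                          % resonance peak, near the pole angle
fprintf('Envelope peak at about %.0f Hz\n', fPeak);
% plot(f, HdB); xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');
```

With the Signal Processing Toolbox, `freqz(1, a, 512, Fs)` returns an equivalent response on 512 frequency points.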
- Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24(2), 175–184.
- Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561–580.
- Rabiner, L. R., & Schafer, R. W. (2010). Theory and Applications of Digital Speech Processing. Pearson.
MSc Artificial Intelligence, University of Surrey