Investigating the effect of speech features and the number of HMM mixtures on the quality of HMM-based synthesizers

Document Type: Original Article

Authors

1 Modern Academy.

2 Military Technical College.

3 Faculty of Computer and Information Sciences, Ain Shams University.

Abstract

Statistical parametric speech synthesis based on hidden Markov models
(HMMs) has grown in popularity over the last few years. In this approach, the system
simultaneously models the spectrum, excitation, and duration of speech using context-dependent
HMMs, and generates speech waveforms from the HMMs themselves. This
paper describes the HMM-based speech synthesis system and applies it to the Arabic
language, using a small training speech database as an example, and shows that the
resulting model database has the advantage of being small (it can be less than 1 MB).
Experiments show that using mel-cepstral coefficients as the spectral parameters of speech
waveforms for training gives better results than using LPC or PARCOR coefficients.
Experiments also show that, with this relatively small training set, increasing the number
of Gaussian mixtures leads to poor generalization of the HMMs, which in turn causes
perceivable discontinuities and clicks in the synthesized speech.
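As a minimal sketch (not the authors' implementation), the Python code below shows how two of the spectral parameter sets compared in the abstract, LPC and PARCOR (reflection) coefficients, are both obtained from a frame's autocorrelation by the Levinson-Durbin recursion. The function names, the pure-Python style, and the sign convention (predictor x[n] ≈ Σ a_j x[n−j], with PARCOR k_i taken as the reflection coefficient in that convention) are assumptions for illustration. Mel-cepstral analysis is not shown.

```python
from typing import List, Tuple


def autocorrelation(frame: List[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of one analysis frame (no windowing)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve the LPC normal equations for x[n] ~ sum_{j=1..p} a[j] * x[n-j].

    Returns (LPC coefficients a[1..p], PARCOR coefficients k[1..p],
    final prediction-error energy). Assumed sign convention, see lead-in.
    """
    a: List[float] = []          # a[j-1] holds a_j of the current order
    k: List[float] = []          # reflection (PARCOR) coefficients
    err = r[0]                   # zeroth-order prediction error
    for i in range(1, order + 1):
        # Correlation not yet explained by the order-(i-1) predictor.
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        ki = acc / err
        k.append(ki)
        # Order update: a_j <- a_j - k_i * a_{i-j}, then append a_i = k_i.
        a = [a[j] - ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
        err *= 1.0 - ki * ki     # error shrinks as long as |k_i| < 1
    return a, k, err


def parcor_to_lpc(k: List[float]) -> List[float]:
    """Step-up recursion: rebuild LPC coefficients from PARCOR coefficients."""
    a: List[float] = []
    for i, ki in enumerate(k, start=1):
        a = [a[j] - ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a


if __name__ == "__main__":
    r = autocorrelation([1.0, 2.0, 3.0], order=2)   # [14.0, 8.0, 3.0]
    lpc, parcor, e = levinson_durbin(r, order=2)
    print("LPC:", lpc, "PARCOR:", parcor, "error:", e)
    print("step-up:", parcor_to_lpc(parcor))
```

The two parameter sets carry the same spectral-envelope information and convert into each other exactly. A commonly cited practical difference is that a predictor is stable when every |k_i| < 1, which is easy to check on PARCOR coefficients but not on raw LPC coefficients.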

Keywords