An HMM (Hidden Markov Model) can model a sequence of feature vectors quite accurately, whereas a GMM (Gaussian Mixture Model) takes only a single feature vector corresponding to a single frame. HMMs are efficient for text-dependent tasks, while GMMs are well suited to text-independent tasks. The pattern matching is shown above in Fig. 2.3.
First, a number of classes (such as voiced or unvoiced) is defined for each speech frame; vector quantization is then used to group the feature vectors according to their similarity, and the last phase uses information about the speech sounds.
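The vector-quantization step described above can be sketched with a minimal k-means clustering routine that groups feature vectors by similarity. This is an illustrative sketch only: the data, the function names, and the choice of k-means as the quantizer are assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of the vector-quantization step: group feature
# vectors by similarity using a minimal k-means (all names illustrative).
import random

def kmeans(vectors, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        # Assign each feature vector to its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            groups[dists.index(min(dists))].append(v)
        # Re-estimate each centroid as the mean of its group.
        for i, g in enumerate(groups):
            if g:
                centroids[i] = tuple(sum(x) / len(g) for x in zip(*g))
    return centroids, groups

# Toy "feature vectors": two clearly separated clusters.
feats = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (5.1, 5.0), (4.9, 5.2)]
centroids, groups = kmeans(feats, k=2)
```

Each resulting group collects mutually similar frames, which is the grouping the text describes before the speech-sound information is brought in.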
In the training phase, the pattern matching algorithm uses the whole set of training feature vectors to build the speaker models. One model is created per voice and per speaker. This model, called the initial model architecture, is then re-estimated and its values updated accordingly.
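The per-speaker training step can be sketched as follows. For simplicity each speaker model here is a single diagonal-covariance Gaussian (an M=1 GMM) estimated from that speaker's training vectors; the speaker names and data are illustrative assumptions, not from the thesis.

```python
# Hedged sketch of the training phase: build one simple model per speaker
# from that speaker's training feature vectors (M=1 GMM stand-in).

def train_model(vectors):
    """Estimate a mean and diagonal variance from training vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # Floor the variance to avoid degenerate zero-variance dimensions.
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, 1e-6)
           for d in range(dim)]
    return mean, var

# One model per speaker, built from that speaker's feature vectors.
training_data = {
    "alice": [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1)],
    "bob":   [(4.0, 0.5), (4.2, 0.4), (3.9, 0.6)],
}
models = {spk: train_model(feats) for spk, feats in training_data.items()}
```

In a full GMM-based system the same per-speaker loop would fit an M-component mixture (e.g. via EM) rather than a single Gaussian.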
Lastly, in the testing phase, the feature vectors are evaluated against the previously trained models, and a likelihood score, i.e. the probability that the given voice sound arises from the speaker model, is output for each speaker [72].
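The scoring step above can be sketched as computing a log-likelihood of the test frames under each trained speaker model and reporting the best-scoring speaker. The models, speaker names, and test frames below are illustrative assumptions.

```python
# Illustrative sketch of the testing phase: score test feature vectors
# against each trained speaker model; the highest log-likelihood wins.
import math

def log_likelihood(vectors, model):
    """Sum of per-frame log densities under a diagonal-Gaussian model."""
    mean, var = model
    total = 0.0
    for v in vectors:
        for x, m, s2 in zip(v, mean, var):
            total += -0.5 * (math.log(2 * math.pi * s2) + (x - m) ** 2 / s2)
    return total

# Hypothetical trained models: (mean, diagonal variance) per speaker.
models = {
    "alice": ([1.0, 2.0], [0.02, 0.02]),
    "bob":   ([4.0, 0.5], [0.02, 0.02]),
}
test_vectors = [(1.1, 2.0), (0.95, 1.9)]  # frames from the unknown voice
scores = {spk: log_likelihood(test_vectors, m) for spk, m in models.items()}
best = max(scores, key=scores.get)
```

The dictionary of scores corresponds to the per-speaker likelihood output the text describes; identification picks the argmax, while verification would compare the claimed speaker's score against a threshold.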
The pattern matching algorithm used in this thesis operates on text-dependent data. Thus, stochastic models are used that combine GMMs and HMMs. Some speaker-independent sequential information, which is quite useful for text-dependent speaker identification and verification, will be extracted from the random feature vector sequence.
2.5 Gaussian Mixture Model
The GMM was first introduced by Rose and Reynolds (1995) and is the most widely used speaker model because of its ability to model arbitrarily shaped probability density functions (pdfs) using a superposition of multivariate Gaussians. This remains true for diagonal covariance matrices, where the loss in expressiveness caused by restricting each Gaussian to an axis-aligned shape can be compensated by using more Gaussians. Using diagonal covariances also helps to boost recognition performance: since the model has fewer parameters, they can be estimated more reliably from the limited training data. The main reason for choosing such a model formulation is that each mixture component models an underlying broad class of speech sounds present in a speaker's voice.
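A diagonal-covariance GMM density of the kind discussed above can be sketched as a weighted sum of axis-aligned Gaussians whose weights sum to 1. All numeric values below are illustrative assumptions.

```python
# Minimal sketch of a diagonal-covariance GMM density:
#   p(x) = sum_i w_i * N(x; mu_i, Sigma_i), with sum_i w_i = 1.
import math

def diag_gauss(x, mean, var):
    """D-dimensional Gaussian pdf with a diagonal covariance."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2 * math.pi * vi)
    return p

def gmm_pdf(x, weights, means, variances):
    """Weighted superposition of diagonal-covariance Gaussians."""
    return sum(w * diag_gauss(x, m, v)
               for w, m, v in zip(weights, means, variances))

weights = [0.6, 0.4]              # mixture weights sum to 1
means = [(0.0, 0.0), (3.0, 3.0)]  # one mean vector per component
variances = [(1.0, 1.0), (0.5, 0.5)]
density = gmm_pdf((0.0, 0.0), weights, means, variances)
```

Restricting each component to a diagonal covariance keeps the parameter count at 2D + 1 values per component (mean, variance, weight), which is the estimation advantage the text mentions.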
A GMM consists of a mixture of M Gaussians, where M depends non-linearly on the context and on the size of the training data provided by the user. A typical value of M is 32 for feature dimensions in the range of 12 to 26. Each mixture component employs a D-dimensional mean vector μ_i and a diagonal covariance vector Σ_i, weighted by a factor w_i so that the overall mass is 1 and the model forms a proper distribution. The