An HMM (Hidden Markov Model) can model a sequence of feature vectors quite accurately, whereas a GMM (Gaussian Mixture Model) takes only a single feature vector corresponding to a single frame. HMMs are therefore efficient for text-dependent tasks, while GMMs are well suited to text-independent tasks. The pattern matching stage is shown above in Fig. 2.3.

First, a number of classes is defined for each speech frame, such as voiced or unvoiced. Vector quantization is then used to group the feature vectors according to their similarity, and the last phase uses information about the speech sounds themselves.
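The grouping step described above can be sketched as a minimal k-means vector quantizer. This is an illustrative implementation, not code from the thesis; the function name and the toy 13-dimensional "MFCC-like" frames are assumptions made for the example.

```python
import numpy as np

def vector_quantize(features, n_clusters=4, n_iters=20, seed=0):
    """Group feature vectors into clusters by similarity (simple k-means).

    features: (n_frames, dim) array of per-frame feature vectors.
    Returns (codebook, labels): the cluster centroids and the cluster
    index assigned to each frame.
    """
    rng = np.random.default_rng(seed)
    # Initialise the codebook with randomly chosen frames.
    codebook = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(n_iters):
        # Assign each frame to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate each centroid as the mean of its assigned frames.
        for k in range(n_clusters):
            if np.any(labels == k):
                codebook[k] = features[labels == k].mean(axis=0)
    return codebook, labels

# Toy usage: 100 random 13-dimensional feature frames, 4 clusters.
frames = np.random.default_rng(1).normal(size=(100, 13))
codebook, labels = vector_quantize(frames)
```

Each frame ends up labelled with the index of its nearest codebook entry, which is the "grouping by similarity" the text refers to.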

In the training phase, the pattern matching algorithm uses all of the training feature vectors to build the speaker models; one model is created per speaker. An initial model is constructed first, and its parameter values are then re-estimated accordingly.
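The training step can be sketched as fitting one diagonal-covariance GMM per speaker, where EM starts from an initial model and re-estimates its parameters. This is a minimal sketch assuming scikit-learn's `GaussianMixture`; the speaker names and data shapes are illustrative, not from the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(training_data, n_components=8, seed=0):
    """Fit one GMM per speaker from that speaker's training feature vectors.

    training_data: dict mapping speaker id -> (n_frames, dim) array.
    EM starts from an initial model and re-estimates the mixture
    parameters until convergence.
    """
    models = {}
    for speaker, feats in training_data.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              random_state=seed)
        models[speaker] = gmm.fit(feats)
    return models

# Toy usage: two hypothetical speakers with 13-dimensional features.
rng = np.random.default_rng(0)
data = {"alice": rng.normal(0.0, 1.0, size=(200, 13)),
        "bob": rng.normal(2.0, 1.0, size=(200, 13))}
models = train_speaker_models(data, n_components=2)
```

After fitting, each model's mixture weights sum to one, so each speaker model is a valid probability distribution over feature vectors.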

Lastly, in the testing phase, the feature vectors are evaluated against the previously trained models, and a likelihood score, i.e. the probability that the given voice sound arises from a speaker model, is output for each speaker [72].
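The scoring step can be sketched as follows: each speaker model assigns a log-likelihood to the test utterance, and the speaker whose model scores highest is selected. This sketch again assumes scikit-learn's `GaussianMixture`; the speakers and data are synthetic placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def identify_speaker(models, test_features):
    """Score test feature vectors against each trained speaker model.

    Returns the per-speaker mean log-likelihood per frame and the
    speaker whose model most probably generated the utterance.
    """
    scores = {spk: gmm.score(test_features)   # mean log-likelihood per frame
              for spk, gmm in models.items()}
    best = max(scores, key=scores.get)
    return scores, best

# Toy setup: two speakers with well-separated feature distributions.
rng = np.random.default_rng(0)
models = {}
for spk, mean in [("alice", 0.0), ("bob", 5.0)]:
    gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
    models[spk] = gmm.fit(rng.normal(mean, 1.0, size=(300, 13)))

test = rng.normal(5.0, 1.0, size=(50, 13))   # utterance drawn near "bob"
scores, best = identify_speaker(models, test)
```

Because the test utterance is drawn from the distribution the "bob" model was trained on, its likelihood under that model is the highest.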

The pattern matching algorithm used in this thesis operates on text-dependent data. Thus, stochastic models are used as combinations of GMMs and HMMs. Some speaker-independent sequential information, which is quite useful for text-dependent speaker identification and verification, can be extracted from the random feature vector sequence.

2.5 Gaussian Mixture Model (GMM)

The GMM was first introduced by Rose and Reynolds (1995) and is a widely used speaker model because it is able to model arbitrarily shaped probability density functions (pdfs) using a superposition of multivariate Gaussians. This holds even for diagonal covariance matrices, where the loss in expressiveness induced by restricting each Gaussian to axis-aligned contours can be compensated by using more Gaussians. Moreover, using diagonal covariances helps to boost recognition performance, since the smaller number of model parameters can be estimated more reliably from the limited training data. The main reason for choosing such a model formulation is that each mixture models an underlying broad class of speech sounds present in a speaker's voice.

A GMM consists of a mixture of M Gaussians, where M depends non-linearly on the context and on the size of the training data provided by the user.
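The density such a model defines can be written out directly as a weighted sum of Gaussians. Below is a pure-NumPy sketch of a diagonal-covariance GMM pdf; all parameter values are illustrative, not taken from the thesis.

```python
import numpy as np

def gmm_pdf(x, weights, means, variances):
    """Evaluate p(x | lambda) = sum_i w_i * N(x; mu_i, Sigma_i)
    for a GMM with diagonal covariance matrices.

    x: (D,) feature vector; weights: (M,), must sum to 1;
    means: (M, D); variances: (M, D) diagonal covariance entries.
    """
    D = x.shape[0]
    # Per-mixture diagonal Gaussian density, computed in log space
    # for numerical stability, then combined with the mixture weights.
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    return float(np.sum(weights * np.exp(log_norm + log_exp)))

# M = 2 mixtures in D = 3 dimensions; the weights sum to 1,
# so the model forms a valid probability distribution.
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
variances = np.ones((2, 3))
p = gmm_pdf(np.zeros(3), weights, means, variances)
```

The constraint that the weights sum to one is exactly what makes the overall probability mass equal to 1, as the text notes.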

A typical value of M is 32, for feature dimensions in the range of 12 to 26. Each of the M mixtures employs a D-dimensional mean vector μ_i and a diagonal covariance vector Σ_i, weighted by a mixture weight w_i so that the overall probability mass is 1 and the model forms a valid distribution. The