Continuous speech recognition

Introduction

Speech recognition is a key technology for human-computer interaction and has progressed rapidly over the past few decades. Traditional acoustic modeling is built on the hidden Markov model (HMM) framework, with the probability distribution of the acoustic features of speech described by a Gaussian mixture model (GMM). Because the HMM is a typical shallow learning structure, with only a single layer that transforms the raw input signal into features for a specific problem space, its performance is limited when large amounts of data are available.
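For illustration, the log-likelihood that a GMM assigns to one acoustic feature vector x is log p(x) = log sum_k w_k N(x; mu_k, Sigma_k). The minimal Python sketch below assumes diagonal covariances (a common simplification) and uses the log-sum-exp trick for numerical stability; the function names are hypothetical and not taken from any particular recognizer.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) = log sum_k w_k * N(x; mean_k, diag(var_k)).

    Computed with log-sum-exp so that very small component densities
    do not underflow to zero.
    """
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

In a GMM-HMM system, one such mixture is typically attached to each HMM state and scores every feature frame against that state.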

Continuous speech recognition refers to recognizing a continuous audio stream (i.e., speech captured directly, or audio from telephone calls and other audio and video sources) and automatically converting it into text. The system must reliably locate speech in the incoming sound, excluding silence, background noise, music, and the like, determine whether the speaker is male or female, and pass the speech to the recognition decoder in real time.
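The silence-exclusion step can be illustrated with a simple short-time energy detector. This is a hypothetical sketch: it assumes 16 kHz samples scaled to [-1, 1], 25 ms frames with a 10 ms hop, and a fixed -40 dB threshold, all illustrative choices. A threshold on energy only separates silence from louder sound; rejecting noise or music and classifying the speaker's gender require trained models and are not shown.

```python
import math


def frame_energies(samples, frame_len=400, hop=160):
    """Short-time energy (mean squared amplitude) of each frame.

    frame_len=400 and hop=160 correspond to 25 ms and 10 ms at an
    assumed 16 kHz sampling rate.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def energy_vad(samples, threshold_db=-40.0, frame_len=400, hop=160):
    """Label each frame True (sound present) if its energy, in dB relative
    to full scale, exceeds threshold_db; silent frames are labeled False."""
    labels = []
    for e in frame_energies(samples, frame_len, hop):
        db = 10 * math.log10(e) if e > 0 else float("-inf")
        labels.append(db > threshold_db)
    return labels
```

Frames labeled True would be forwarded to the decoder; in practice the threshold is usually adapted to the measured noise floor rather than fixed.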

Method

1. Hidden Markov model


The hidden Markov model is a statistical model that describes a Markov process with hidden (unobserved) states and unknown parameters, and it is widely used in speech recognition. A typical system first trains speech models on a large amount of speech data, then extracts acoustic features from the input, and obtains the recognition result by Viterbi decoding.
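Viterbi decoding finds the single most likely hidden state sequence by dynamic programming, usually in the log domain to avoid underflow. The sketch below uses a toy HMM with discrete observation symbols for brevity; in a GMM-HMM recognizer, `log_emit[s][o]` would instead be the GMM log-likelihood of an acoustic feature frame under state `s`. All names and probabilities used with it are hypothetical.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_path, best_log_score) for an observation sequence.

    log_start[s], log_trans[s][t] and log_emit[s][o] are log probabilities.
    """
    # delta[s]: best log score of any path that ends in state s at this step
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []  # one dict per later step: state -> best previous state
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for t in states:
            best_prev = max(states, key=lambda s: delta[s] + log_trans[s][t])
            new_delta[t] = delta[best_prev] + log_trans[best_prev][t] + log_emit[t][o]
            ptr[t] = best_prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

The cost is O(T * N^2) for T frames and N states, which is why large-vocabulary decoders add beam pruning on top of this recursion.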

2. Methods based on convolutional neural networks

A convolutional neural network (CNN) analyzes local features with its convolutional layers, makes the extracted features more robust through its pooling layers, and finally combines the information in fully connected layers to produce the classification result. Because a CNN first observes local patterns through its convolutional layers and then integrates that information in its fully connected layers, its structure has a clearer physical interpretation than that of a plain deep neural network.
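The convolution, pooling, and fully connected stages described above can be sketched in plain Python over a small time-by-frequency feature map (for example, log filter-bank energies). This is an illustrative forward pass only, with no training; the function names are hypothetical, and, as in common CNN frameworks, the "convolution" is computed as cross-correlation.

```python
def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation of a time x frequency map with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(m):
    """Element-wise nonlinearity applied after convolution."""
    return [[max(0.0, v) for v in row] for row in m]


def max_pool(m, size=2):
    """Non-overlapping max pooling: a small shift of a peak within one
    window does not change the output, which adds robustness."""
    return [[max(m[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(m[0]) - size + 1, size)]
            for i in range(0, len(m) - size + 1, size)]


def fully_connected(features, weights, biases):
    """Class scores from the flattened pooled map, one weight row per class."""
    flat = [v for row in features for v in row]
    return [sum(w * f for w, f in zip(w_row, flat)) + b
            for w_row, b in zip(weights, biases)]
```

A full forward pass is `fully_connected(max_pool(relu(conv2d_valid(x, k))), W, b)`, and the class with the highest score is taken as the result; real systems stack several such layers and learn `k`, `W`, and `b` from data.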

Application

1. Security and education: in the security field, the relevant government departments have put forward requirements tailored to their services; in education, Mandarin proficiency tests and oral assessments urgently need objective, automatic evaluation technology;

2. Telecommunications: both domestic and foreign speech recognition technologies and companies have entered the Chinese market;

3. Embedded markets: speech recognition is also growing in embedded products such as mobile phones and car navigation systems;

4. Human-machine interaction: speech recognition is widely used in voice assistants, voice search on mobile devices, and similar applications.

Speech recognition, as a very important means of human-computer interaction, therefore has very broad prospects.
