Master Thesis
Connectionist Temporal Classification for Robust Speech Recognition Applications
Advisors: Prof. Hervé Bourlard & Dr. Wilhelm Hagg

Abstract
Automatic Speech Recognition (ASR) is the task of converting a speech signal into the sequence of words it contains, using an algorithm implemented on computers or computerized devices. Today, ASR systems find applications in a wide range of domains, including automatic call processing in telephone networks, automatic query-based information systems, human-to-machine interaction, and many others.

One of the most popular approaches in ASR systems is to model the temporal variability of speech signals using hidden Markov model (HMM) states. In recent years, significant progress has been made in the acoustic modelling of these states by training context-dependent deep neural networks (DNNs) on observed feature vectors extracted from the signals. Training a traditional DNN-based acoustic model first requires the state alignments of the input feature frames with the HMM phoneme states, because traditional objective functions need a network output target value for every input feature vector.

Connectionist Temporal Classification (CTC) overcomes this requirement. CTC is a cost function that is well suited to sequence labelling tasks, where long sequences of input data (e.g. the input feature vectors of a speech signal) are transcribed with shorter sequences of discrete labels (e.g. the phoneme labels spoken in the speech stream). Separately, a major factor that degrades the performance of current ASR systems is background noise and reverberation in the recorded speech.
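To make the alignment-free idea behind CTC concrete, the following is a minimal sketch and not the thesis implementation. It assumes the usual CTC convention of an extra blank symbol: a frame-level path is mapped to a label sequence by merging repeated symbols and then dropping blanks, and the cost is the negative log of the summed probability of all paths that map to the target. The names `BLANK`, `collapse`, and `ctc_neg_log_likelihood` are hypothetical, and the brute-force enumeration is for illustration only; practical systems compute the same sum with a dynamic-programming forward-backward recursion.

```python
import itertools
import math

BLANK = "-"  # hypothetical blank symbol, chosen for this illustration


def collapse(path):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return "".join(out)


def ctc_neg_log_likelihood(probs, symbols, target):
    """Brute-force CTC cost for a tiny example.

    probs[t][k] is the network output probability of symbols[k] at frame t.
    Sums the probability of every path that collapses to `target`.
    Assumes at least one such path has non-zero probability.
    """
    num_frames = len(probs)
    total = 0.0
    for path in itertools.product(range(len(symbols)), repeat=num_frames):
        if collapse([symbols[k] for k in path]) == target:
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return -math.log(total)


# Two frames, two symbols (blank and "a"), uniform outputs.
symbols = [BLANK, "a"]
probs = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_neg_log_likelihood(probs, symbols, "a")
```

In this toy case three of the four possible paths ("-a", "a-", "aa") collapse to "a", so no per-frame alignment target is needed to score the transcription.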