Integration of DNN Based Speech Enhancement and ASR

Ramon F. Astudillo, Joana Correia, Isabel Trancoso

In Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech 2015), pp. 3576-3580, 2015

Abstract:
Speech enhancement employing Deep Neural Networks (DNNs) is gaining strength as a data-driven alternative to classical Minimum Mean Square Error (MMSE) enhancement approaches. In the past, Observation Uncertainty approaches that integrate MMSE speech enhancement with Automatic Speech Recognition (ASR) have yielded good results as a lightweight alternative for robust ASR. In this paper we therefore explore the integration of DNN-based speech enhancement with ASR by employing Observation Uncertainty techniques. For this purpose, we explore various techniques and approximations that allow propagating the uncertainty of the DNN's inference into the feature domain. This uncertainty can then be used to dynamically compensate the ASR model using techniques such as uncertainty decoding. We test the proposed techniques on the AURORA4 corpus and show that notable improvements can be attained over the already effective DNN enhancement.
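As a rough illustration of the uncertainty decoding idea mentioned in the abstract, the sketch below scores an uncertain feature against a single diagonal-covariance Gaussian by adding the feature variance to the model variance. This is the textbook form of uncertainty decoding, not necessarily the formulation used in the paper. The diagonal Gaussian acoustic model and the function names `log_gauss_diag` and `uncertainty_decoding_loglik` are assumptions made for this example.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var):
    """Score an uncertain feature against one Gaussian of the acoustic model.

    Assumption: the enhanced feature is described by a mean and a per-dimension
    variance, and the model is a diagonal Gaussian. The feature variance is
    added to the model variance, so unreliable dimensions are penalized less.
    """
    total_var = np.asarray(model_var, dtype=float) + np.asarray(feat_var, dtype=float)
    return log_gauss_diag(feat_mean, model_mean, total_var)


# A feature far from the model mean, scored with and without uncertainty.
plain = uncertainty_decoding_loglik([3.0, 3.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
compensated = uncertainty_decoding_loglik([3.0, 3.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

With zero feature variance the score reduces to the ordinary Gaussian log-likelihood. With non-zero variance, a mismatched feature is penalized less, which is the dynamic compensation effect the abstract refers to.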