On the Use of Empirically Determined Impulse Responses for Improving Distant Talking Speech Recognition

T. Plötz and G. A. Fink
Proc. Joint Workshop on Hands-free Speech Communication and Microphone Arrays, pages 156-159, Trento, Italy, 2008.

The effectiveness of distant talking speech recognition relies substantially on the quality of the recorded signals. Since the original signal is usually distorted by additive noise or reverberation, speech enhancement is an important pre-processing step. In dynamic human-machine interaction scenarios, where acoustic conditions change frequently and speakers utter only short portions of speech from different locations, standard approaches to speech enhancement (namely cepstral mean normalization or blind deconvolution) might fail. Addressing distant talking speech recognition in a smart house, we investigate the effectiveness of empirically determined impulse responses for improving recognition. Within an interaction scenario, users are asked to produce an impulse-like signal (clapping, snapping, etc.) prior to every utterance. The recorded response to the detected impulse is then used for deconvolution-based speech enhancement. In an experimental evaluation we investigate the effectiveness of several variants of the approach.
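To illustrate the general idea of deconvolution with an empirically measured impulse response, the following is a minimal sketch, not the authors' method. It assumes that the recorded response to the clap or snap approximates the room impulse response, and it swaps in a generic regularized (Tikhonov-style) inverse filter in the frequency domain. The function name `deconvolve` and the regularization parameter `reg` are hypothetical choices for illustration.

```python
import numpy as np


def deconvolve(reverberant, impulse_response, reg=1e-3):
    """Hypothetical sketch: regularized frequency-domain inverse filtering.

    Assumes `impulse_response` is the recorded response to an impulse-like
    signal (e.g. a clap) and approximates the room impulse response.
    """
    reverberant = np.asarray(reverberant, dtype=float)
    impulse_response = np.asarray(impulse_response, dtype=float)

    # FFT length long enough that circular convolution equals linear convolution.
    n = len(reverberant) + len(impulse_response) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two

    Y = np.fft.rfft(reverberant, nfft)
    H = np.fft.rfft(impulse_response, nfft)

    # Regularized inverse conj(H) / (|H|^2 + eps); eps keeps the filter
    # from blowing up at frequencies where |H| is close to zero.
    power = np.abs(H) ** 2
    eps = reg * power.max()
    G = np.conj(H) / (power + eps)

    # Apply the inverse filter and trim back to the input length.
    enhanced = np.fft.irfft(Y * G, nfft)
    return enhanced[: len(reverberant)]


# Usage with a synthetic, exponentially decaying "room" response.
rng = np.random.default_rng(0)
dry = rng.standard_normal(1000)
room = 0.6 ** np.arange(20)
wet = np.convolve(dry, room)
restored = deconvolve(wet, room)
```

The regularization term trades off accuracy against robustness: a pure inverse `1 / H` would amplify noise wherever the measured response has spectral dips, which is likely with real claps and snaps that are only approximately impulsive.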

