Abstract
Many speech recognition systems use mel-frequency cepstral coefficient (MFCC) feature extraction as a front end. In the MFCC algorithm, a speech spectrum passes through a bank of mel-spaced triangular filters, and the filter output energies are log-compressed and transformed to the cepstral domain by the discrete cosine transform (DCT). The spacing of the filter center frequencies mimics the known warped-frequency characteristics of the human auditory system, yet the filter bandwidths are not chosen through biological inspiration. Instead, they are set by aligning the endpoints of each triangle with the center frequencies of its neighbors, and the triangle is itself an arbitrary shape. It is surprising that, for such a popular speech recognition front end, the filter bandwidths have not been properly analyzed or optimized. Complex cochlear models use realistic filter shapes that more closely approximate critical bands, and these filters are considerably wider than MFCC filters and overlap more with their neighbors. We have extended this filter characteristic to the MFCC algorithm and found that the increased filter bandwidth improves recognition performance on clean speech and adds noise robustness as well.
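The MFCC pipeline summarized in the abstract (mel-spaced triangular filters, log compression, DCT) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the mel formula 2595·log10(1 + f/700), the naive DFT, the log floor, the unnormalized DCT-II, and the `width_scale` parameter are all assumptions. With `width_scale=1.0` each triangle's endpoints sit at the neighboring center frequencies, as in a conventional MFCC filter bank; values above 1.0 simply stretch each triangle about its center. That is a hypothetical knob for seeing what wider, more overlapping filters look like, and it is not the bandwidth rule used in the paper.

```python
import math


def hz_to_mel(f_hz):
    # Common mel-scale formula (an assumption; several variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangular_filterbank(n_filters, n_fft, sample_rate, f_low=0.0, f_high=None, width_scale=1.0):
    """Mel-spaced triangular filters over the n_fft // 2 + 1 one-sided DFT bins.

    width_scale=1.0: each triangle's endpoints sit at the neighboring center
    frequencies (conventional MFCC). width_scale > 1.0 stretches each triangle
    about its center in Hz (hypothetical knob, not the paper's rule).
    """
    if f_high is None:
        f_high = sample_rate / 2.0
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 equally spaced mel points: two outer edges plus one center per filter.
    points_hz = [mel_to_hz(m_low + i * (m_high - m_low) / (n_filters + 1))
                 for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        center = points_hz[j]
        left = center - width_scale * (center - points_hz[j - 1])
        right = center + width_scale * (points_hz[j + 1] - center)
        weights = []
        for f in bin_hz:
            if left < f <= center:
                weights.append((f - left) / (center - left))  # rising slope
            elif center < f < right:
                weights.append((right - f) / (right - center))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def power_spectrum(frame, n_fft):
    # Naive one-sided DFT of the zero-padded frame; use an FFT in practice.
    x = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[t] * math.cos(2.0 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = -sum(x[t] * math.sin(2.0 * math.pi * k * t / n_fft) for t in range(n_fft))
        spec.append(re * re + im * im)
    return spec


def mfcc_from_power(power, bank, n_ceps, floor=1e-10):
    # Filter output energies.
    energies = [sum(w * p for w, p in zip(weights, power)) for weights in bank]
    # Log compression, floored to avoid log(0).
    log_e = [math.log(max(e, floor)) for e in energies]
    n = len(log_e)
    # Unnormalized DCT-II to the cepstral domain.
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
            for i in range(n_ceps)]


if __name__ == "__main__":
    sr, n_fft = 8000, 256
    # Hamming-windowed 440 Hz tone as a stand-in for a 25 ms speech frame.
    frame = [math.sin(2.0 * math.pi * 440.0 * t / sr)
             * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / 199)) for t in range(200)]
    spec = power_spectrum(frame, n_fft)
    for scale in (1.0, 2.0):
        bank = triangular_filterbank(20, n_fft, sr, width_scale=scale)
        ceps = mfcc_from_power(spec, bank, n_ceps=13)
        print(scale, [round(c, 2) for c in ceps[:4]])
```

With `width_scale=1.0`, the weights of adjacent filters sum to one at every bin between the first and last center frequencies, so each bin is shared by at most two filters. Stretching the triangles breaks that property and makes each bin contribute to more filters, which is the kind of increased overlap the abstract attributes to cochlear-model filters.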
Recommended Citation
M. D. Skowronski and J. G. Harris, "Increased MFCC Filter Bandwidth For Noise-robust Phoneme Recognition," ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 1, p. I/804, Institute of Electrical and Electronics Engineers, Jan 2002.
The definitive version is available at https://doi.org/10.1109/icassp.2002.5743839
Department(s)
Electrical and Computer Engineering
International Standard Serial Number (ISSN)
1520-6149
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2025 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
01 Jan 2002
