We give a general definition of multiresolution elementary tonotopic features (ETFs) and present specific functions and decompositions for computing them. When cast as local, fixed-weight FIR neural networks, these decompositions have well-defined architectures. Used as front-end inputs to a speaker-independent continuous-speech phoneme recognizer, they yield encouraging results. We also analyze how recognition performance depends on the individual ETFs at different levels of resolution.
Proc. International Conference on Neural Networks, pp. 575-579, June 1997