Multiresolution Elementary Tonotopic Features for Speech Perception

We give a general definition of multiresolution elementary tonotopic features (ETFs) and present specific functions and decompositions for computing them. When cast as local, fixed-weight FIR neural networks, these decompositions have definite architectures. Results obtained with the ETFs as front-end inputs to a speaker-independent, continuous-speech phoneme recognizer are encouraging. We also analyze how recognition performance depends on the various ETFs at different levels of resolution.

