The speech signal of a word is a combination of frequencies that can produce specific frequency-transition shapes.
These shapes can be regarded as text written in some unknown ‘script’. Before attempting to read the speech
spectrogram image using image processing techniques, we first need to define the properties of the spectrogram
image, reduce the clutter in it, and select the methods to be employed
for image matching.
Methods to convert the speech signal to a spectrogram image are therefore employed first, followed by reduction of the
noise in the signal by capturing the energy associated with the formants of the speech signal. The size of the image
and its resolution in both the frequency and time axes are then normalised. Finally, template
matching methods are employed to recognise portions of text and isolated words. The paper describes the pre-processing
methods employed and outlines the use of normalised grey-level correlation for the recognition of words.
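
The stages named above (spectrogram conversion, size normalisation, and template matching by normalised grey-level correlation) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the frame length and hop, the 64 x 64 target size, and the nearest-neighbour resampling are all assumptions chosen for illustration, and the formant-based noise reduction step is not shown.

```python
import numpy as np


def spectrogram_image(signal, frame_len=256, hop=128):
    """Log-magnitude short-time spectrum as a 2-D array.

    Rows are frequency bins and columns are time frames.
    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)


def resize_nearest(image, shape=(64, 64)):
    """Normalise image size on both axes by nearest-neighbour resampling (assumed)."""
    rows = np.linspace(0, image.shape[0] - 1, shape[0]).round().astype(int)
    cols = np.linspace(0, image.shape[1] - 1, shape[1]).round().astype(int)
    return image[np.ix_(rows, cols)]


def normalised_correlation(a, b):
    """Normalised grey-level correlation of two equal-sized images, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def recognise(image, templates):
    """Return the label of the best-correlating template and all scores."""
    scores = {
        label: normalised_correlation(image, template)
        for label, template in templates.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic stand-ins for two "words": pure tones at different frequencies.
    fs = 8000
    t = np.arange(4000) / fs
    tone_a = np.sin(2 * np.pi * 500 * t)
    tone_b = np.sin(2 * np.pi * 1500 * t)

    templates = {
        "a": resize_nearest(spectrogram_image(tone_a)),
        "b": resize_nearest(spectrogram_image(tone_b)),
    }

    # A noisy repetition of the first tone should match template "a".
    rng = np.random.default_rng(0)
    noisy = tone_a + 0.1 * rng.standard_normal(len(tone_a))
    label, scores = recognise(resize_nearest(spectrogram_image(noisy)), templates)
    print(label, scores)
```

Subtracting the mean and dividing by the product of the norms makes the score insensitive to overall brightness and contrast differences between the test image and the template, which is the usual reason for preferring a normalised correlation over a raw one.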