Automatic lip reading by using multimodal visual features
3 February 2014
Proceedings Volume 9025, Intelligent Robots and Computer Vision XXXI: Algorithms and Techniques; 902508 (2014) https://doi.org/10.1117/12.2038375
Event: IS&T/SPIE Electronic Imaging, 2014, San Francisco, California, United States
Abstract
Speech recognition has been researched for a long time, but it does not work well in noisy places such as cars or trains. In addition, people who are hearing-impaired or have difficulty hearing cannot benefit from speech recognition. Visual information is also important for recognizing speech automatically: people understand speech not only from audio information but also from visual information such as temporal changes in lip shape. A vision-based speech recognition method could work well in noisy places and could also be useful for people with hearing disabilities. In this paper, we propose an automatic lip-reading method that recognizes speech by using multimodal visual information, without using any audio information of the kind used in conventional speech recognition. First, an ASM (Active Shape Model) is used to detect and track the face and lips in a video sequence. Second, shape, optical flow, and spatial frequency features are extracted from the lip region detected by the ASM. Next, the extracted multimodal features are ordered chronologically, and a Support Vector Machine is trained on them to learn and classify the spoken words. Experiments on classifying several words show promising results for the proposed method.
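The feature-ordering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no feature dimensions, frame counts, or normalization, so everything below is an assumption made for illustration. The function names (`spatial_frequency_features`, `frame_features`, `sequence_features`), the use of a mean flow vector, the 4 x 4 block of low-frequency FFT magnitudes, and the synthetic inputs are all hypothetical. The sketch assumes that lip landmarks, a dense flow field, and a grayscale lip patch are already available for each frame.

```python
import numpy as np


def spatial_frequency_features(lip_patch, k=4):
    """Magnitudes of the k x k lowest-index 2-D FFT coefficients of a lip patch.

    The choice of k and of the FFT as the frequency transform is an assumption.
    """
    spectrum = np.abs(np.fft.fft2(lip_patch))
    return spectrum[:k, :k].ravel()


def frame_features(landmarks, flow, lip_patch, k=4):
    """Concatenate shape, optical-flow, and spatial-frequency features for one frame."""
    # Shape: landmark coordinates centered on their mean (assumed normalization).
    shape_vec = (landmarks - landmarks.mean(axis=0)).ravel()
    # Optical flow: mean (dx, dy) over the lip region (assumed summary statistic).
    flow_vec = flow.mean(axis=(0, 1))
    freq_vec = spatial_frequency_features(lip_patch, k)
    return np.concatenate([shape_vec, flow_vec, freq_vec])


def sequence_features(frames):
    """Join per-frame feature vectors in chronological order into one vector."""
    return np.concatenate([frame_features(*frame) for frame in frames])


# Synthetic stand-in data: 5 frames, 8 lip landmarks, 16 x 16 lip region.
rng = np.random.default_rng(0)
frames = [
    (
        rng.normal(size=(8, 2)),       # landmarks: (x, y) per point
        rng.normal(size=(16, 16, 2)),  # dense flow: (dx, dy) per pixel
        rng.random(size=(16, 16)),     # grayscale lip patch
    )
    for _ in range(5)
]

word_vector = sequence_features(frames)
print(word_vector.shape)
```

A vector built this way has a fixed length only when every utterance is resampled to the same number of frames, which is a further assumption here. Such vectors, one per spoken word, could then be passed to an off-the-shelf SVM implementation for training and classification.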
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Shohei Takahashi and Jun Ohya "Automatic lip reading by using multimodal visual features", Proc. SPIE 9025, Intelligent Robots and Computer Vision XXXI: Algorithms and Techniques, 902508 (3 February 2014); https://doi.org/10.1117/12.2038375
KEYWORDS
Laser induced plasma spectroscopy, Visualization, Information visualization, Optical flow, Speech recognition, Feature extraction, Spatial frequencies