Construction of a new database for Japanese lip-reading
25 March 2024
Towa Yamabe, Takeshi Saitoh
Proceedings Volume 13089, Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023); 1308911 (2024) https://doi.org/10.1117/12.3021119
Event: Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023), 2023, Suzhou, China
Abstract
In recent years, lip-reading techniques, which estimate speech content from visual information alone without audio, have been actively researched. Large databases are available for English, but those for other languages remain insufficient. This paper therefore constructs a new database to improve the accuracy of Japanese lip-reading. In our previous research, we asked collaborators to record utterance scenes to build a database; in this paper, we use YouTube videos instead. We downloaded weather forecast video from the “Weathernews” YouTube channel and applied video and audio processing to construct a database that can be used for lip-reading. Furthermore, we selected 50 Japanese words from our database and applied an existing deep-learning model, obtaining a word recognition rate of 66%. We have thus established a method for constructing a lip-reading database from YouTube, although problems remain with the scale of the database and the recognition accuracy.
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Towa Yamabe and Takeshi Saitoh "Construction of a new database for Japanese lip-reading", Proc. SPIE 13089, Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023), 1308911 (25 March 2024); https://doi.org/10.1117/12.3021119