In recent years, lip-reading, which estimates speech content from visual information alone without any audio, has been actively researched. Large databases are available for English, but far fewer exist for other languages. This paper therefore constructs a new database aimed at improving the accuracy of Japanese lip-reading. In our previous research, we built a database by asking collaborators to record their own utterance scenes; in this paper, we instead use YouTube videos. We downloaded a weather forecast video from the “Weathernews” YouTube channel and, by applying video and audio processing, constructed a database that can be used for lip-reading. We then selected 50 Japanese words from this database and applied an existing deep-learning model, obtaining a word recognition rate of 66%. These results establish a method for constructing a lip-reading database from YouTube, although issues remain in both the scale of the database and the recognition accuracy.
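As a rough illustration of the collection step summarized above, the sketch below downloads one video and cuts a word-level clip. The yt-dlp and ffmpeg tools, the placeholder URL, and the time stamps are assumptions made for illustration only; the paper does not specify the exact tooling.

```python
# Minimal sketch of the video collection and clipping step, assuming the
# yt-dlp and ffmpeg command-line tools are installed. The URL and the
# start/duration values are placeholders, not taken from the paper.
import subprocess
from pathlib import Path

VIDEO_URL = "https://www.youtube.com/watch?v=<placeholder>"  # hypothetical ID
RAW = Path("raw/weathernews.mp4")
RAW.parent.mkdir(parents=True, exist_ok=True)
Path("clips").mkdir(exist_ok=True)

# 1. Download one weather-forecast video as MP4.
subprocess.run(
    ["yt-dlp", "-f", "mp4", "-o", str(RAW), VIDEO_URL],
    check=True,
)

# 2. Separate the audio track (16 kHz mono WAV) so word-level time stamps
#    can later be obtained from the speech.
subprocess.run(
    ["ffmpeg", "-y", "-i", str(RAW), "-vn", "-ac", "1", "-ar", "16000",
     "raw/weathernews.wav"],
    check=True,
)

# 3. Cut one word-level utterance clip without audio (times are illustrative).
subprocess.run(
    ["ffmpeg", "-y", "-i", str(RAW), "-ss", "12.40", "-t", "0.85",
     "-c:v", "libx264", "-an", "clips/word_0001.mp4"],
    check=True,
)
```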