In recent years, lip-reading techniques, which estimate speech content from visual information alone without audio, have been actively researched. Large databases are available for English, but not for other languages. This paper therefore constructs a new database to improve the accuracy of Japanese lip-reading. In our previous research, we asked collaborators to record utterance scenes to build a database; in this paper, we instead use YouTube videos. We downloaded a weather forecast video from the “Weathernews” YouTube channel and constructed a database usable for lip-reading by applying video and audio processing. Furthermore, we selected 50 Japanese words from our database and applied an existing deep-learning model, obtaining a word recognition rate of 66%. We have thus established a method for constructing a lip-reading database from YouTube, although issues remain in the scale of the database and the recognition accuracy.
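The construction pipeline above can be illustrated with a minimal sketch: download a video with yt-dlp and crop a mouth region from each frame with OpenCV. The video URL placeholder, output paths, and the "lower half of the detected face" heuristic are illustrative assumptions, not the exact procedure used in the paper.

```python
# Sketch of the database-construction pipeline: download, then mouth-ROI extraction.
import subprocess
import cv2

def download_video(url: str, out_path: str = "weathernews.mp4") -> str:
    # yt-dlp is a widely used command-line downloader for YouTube videos.
    subprocess.run(["yt-dlp", "-o", out_path, url], check=True)
    return out_path

def extract_mouth_rois(video_path: str):
    """Yield a cropped mouth-region image for every frame with a detected face."""
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            # Crude heuristic: take the lower half of the face box as the mouth ROI.
            yield frame[y + h // 2 : y + h, x : x + w]
    cap.release()

if __name__ == "__main__":
    path = download_video("https://www.youtube.com/watch?v=<VIDEO_ID>")  # placeholder URL
    for i, roi in enumerate(extract_mouth_rois(path)):
        cv2.imwrite(f"roi_{i:06d}.png", roi)
```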
Lip-reading technology has the advantage that it can be used even in noisy environments and has been actively studied in recent years. In this paper, we develop a navigation application, "KuchiNavi," as a new application of lip-reading technology. The core technology is word-level lip-reading, which utilizes an existing deep-learning model. To quantitatively evaluate lip-reading accuracy, we selected words for navigation, collected utterance scenes independently, built an original dataset, and conducted recognition experiments. Specifically, 101 Japanese words were selected, utterance scenes were collected from 15 people, and recognition experiments were conducted on a speaker-independent recognition task using the leave-one-person-out method. As a result, an average recognition rate of 88.2% was obtained. In addition, we developed an iOS app and conducted an in-car demonstration to confirm its effectiveness.
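The leave-one-person-out evaluation protocol can be sketched as follows using scikit-learn's LeaveOneGroupOut splitter. The feature array, speaker labels, and the simple classifier are placeholders; the paper's actual model is a deep lip-reading network, and the 88.2% figure refers to that model, not this sketch.

```python
# Sketch of speaker-independent evaluation with leave-one-person-out cross-validation.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_samples, n_words, n_speakers = 1515, 101, 15
X = rng.normal(size=(n_samples, 64))                  # placeholder utterance features
y = rng.integers(0, n_words, size=n_samples)          # word labels (101 classes)
groups = rng.integers(0, n_speakers, size=n_samples)  # speaker ID per utterance

accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    # Train on 14 speakers, test on the held-out speaker.
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

# The final score is averaged over the 15 held-out speakers.
print(f"mean accuracy over speakers: {np.mean(accuracies):.3f}")
```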
Patients with intractable neurological diseases may use mouth-shape character (MSC)-based communication as an alternative means of communication when speech, writing, and PC input are unavailable. MSC-based communication requires not only a speaker but also a supporter, and reading the mouth shape may be difficult depending on the supporter's skill. A communication support system that automatically recognizes the speaker's mouth shape is therefore desired. This study aims to develop such a support system as a whole, and this paper addresses mouth shape recognition. We introduce 3DCNN-based mouth shape recognition. As input to the CNN, we use not only color images but also flow images obtained by applying optical flow, and the outputs of the two CNN models are integrated. We collected speech scenes from eight patients with intractable neurological diseases and conducted recognition experiments, obtaining an average recognition rate of 77.1%. Excluding two patients, one whose mouth moved too little for even a human to recognize the mouth shape and one whose recording was problematic, the average recognition rate was 86.6%. These results demonstrate the effectiveness of the proposed method.
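The two-stream idea, one 3D CNN on color frames and a second on optical-flow frames with the class scores integrated, can be sketched as below in PyTorch. The layer sizes, the number of mouth-shape classes, and the score-averaging fusion are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch of a two-stream 3D CNN with late fusion of class scores.
import torch
import torch.nn as nn

class Small3DCNN(nn.Module):
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        return self.classifier(self.features(x).flatten(1))

num_classes = 6                        # one class per mouth shape (assumed count)
color_net = Small3DCNN(in_channels=3, num_classes=num_classes)
flow_net = Small3DCNN(in_channels=2, num_classes=num_classes)  # 2-channel optical flow

color_clip = torch.randn(1, 3, 16, 64, 64)   # dummy 16-frame color clip
flow_clip = torch.randn(1, 2, 16, 64, 64)    # dummy optical-flow clip

# Late fusion: average the softmax scores of the two networks.
scores = (color_net(color_clip).softmax(dim=1) +
          flow_net(flow_clip).softmax(dim=1)) / 2
print("predicted mouth shape:", scores.argmax(dim=1).item())
```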
In the lip-reading field, speaker-independent recognition tasks are typically addressed by collecting speech scenes from many speakers, but this data collection is time-consuming. This paper proposes a method to solve this problem. First Order Motion Model (FOMM) is a deep generative model that generates a video sequence from a source image according to a driving video. Our idea is to apply FOMM to all speech scenes in the dataset so that they appear to be recorded from a single speaker. In other words, we propose a preprocessing method that replaces the speaker-independent recognition task with a speaker-dependent one by applying FOMM. We applied the proposed method to two publicly available databases, OuluVS and CUAVE, and confirmed that recognition accuracy was improved on both.
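The preprocessing idea can be sketched as a loop in which every speech scene is used as the FOMM driving video and a single reference-speaker image is used as the source, so that all generated clips share one speaker's appearance. Here `fomm_transfer` is a hypothetical placeholder for the actual FOMM inference call (e.g., from a first-order-model implementation); its name and signature are assumptions, and the no-op stub must be replaced with real generation.

```python
# Sketch of FOMM-based speaker normalization as a dataset preprocessing step.
from pathlib import Path
import cv2

def fomm_transfer(source_image, driving_frames):
    """Placeholder for the FOMM inference call; here it returns the driving
    frames unchanged.  Replace with actual FOMM generation."""
    return driving_frames

def load_frames(path):
    cap = cv2.VideoCapture(str(path))
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

def save_frames(path, frames, fps=25.0):
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(str(path), cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)
    writer.release()

def normalize_dataset(clip_paths, reference_image):
    # Every clip drives FOMM with the same reference-speaker source image,
    # turning the speaker-independent task into a speaker-dependent one.
    for path in clip_paths:
        driving = load_frames(path)
        generated = fomm_transfer(reference_image, driving)
        out_path = Path(path).with_name(Path(path).stem + "_normalized.mp4")
        save_frames(out_path, generated)
```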
In this research, we study image-based identification of tree species that can be seen everywhere. In our previous study, we showed that a convolutional neural network (CNN) can recognize tree species from a region-of-interest (ROI) image of bark. However, the bark region was extracted manually from the natural bark image. This paper solves this problem using semantic segmentation and proposes automatic tree species identification from natural bark images. The proposed method was evaluated on a bark image dataset that we collected independently, and we confirmed its effectiveness.
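The pipeline above can be sketched as: a semantic-segmentation model predicts a bark mask from the natural image, the bounding box of that mask is cropped as the ROI, and a CNN classifies the species from the ROI. The `segmentation_model` and `species_classifier` below are tiny placeholder networks standing in for the trained models; thresholds and input sizes are assumptions.

```python
# Sketch of segmentation-driven ROI extraction followed by species classification.
import cv2
import numpy as np
import torch
import torch.nn as nn

# Placeholder networks standing in for the trained segmentation and classification models.
segmentation_model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=1), nn.Sigmoid())
species_classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 6))

def identify_species(image_bgr: np.ndarray) -> int:
    x = torch.from_numpy(image_bgr).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        mask = (segmentation_model(x)[0, 0].numpy() > 0.5).astype(np.uint8)
    # Crop the bounding box of the predicted bark region as the ROI.
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("no bark region detected")
    roi = image_bgr[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    roi = cv2.resize(roi, (224, 224))
    r = torch.from_numpy(roi).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        return int(species_classifier(r).argmax(dim=1).item())
```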
This paper proposes a convolutional neural network (CNN)-based tree species identification method from bark images. The proposed method uses a well-known CNN model. The difficulty of our problem is that the images contain a colorful tag stuck on the bark, and the tag is irrelevant to the species. To recognize the species with the CNN, it is necessary to extract a region of interest (ROI) that excludes the tag. This paper therefore proposes an ROI extraction method, and the extracted ROI is fed to the CNN. We evaluated the proposed method on six tree species, carried out experiments under various conditions, and found an optimal condition for our problem.
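One way to realize the tag-excluding ROI extraction is sketched below: the colorful tag is detected by its high saturation in HSV space, and the square patch containing the fewest tag pixels is taken as the ROI. The saturation threshold and the sliding-window search are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of ROI extraction that avoids the colorful tag on the bark.
import cv2
import numpy as np

def extract_bark_roi(image_bgr: np.ndarray, roi_size: int = 224) -> np.ndarray:
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    tag_mask = (hsv[:, :, 1] > 120).astype(np.uint8)   # colorful tag = high saturation (assumed)
    tag_mask = cv2.dilate(tag_mask, np.ones((15, 15), np.uint8))

    h, w = tag_mask.shape
    best, best_clean = None, -1
    # Slide a roi_size x roi_size window and keep the one with the fewest tag pixels.
    for y in range(0, h - roi_size + 1, roi_size // 2):
        for x in range(0, w - roi_size + 1, roi_size // 2):
            window = tag_mask[y:y + roi_size, x:x + roi_size]
            clean = roi_size * roi_size - int(window.sum())
            if clean > best_clean:
                best_clean, best = clean, (y, x)
    if best is None:
        raise ValueError("image smaller than the requested ROI size")
    y, x = best
    return image_bgr[y:y + roi_size, x:x + roi_size]
```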
This research presents a character input system for people with disabilities based on a wearable-camera gaze estimation system (GES). The GES estimates the user's gaze direction. The proposed system uses an inside-out camera to capture the user's gaze image, which is used to classify characters and compose words. The system consists of two processes: a gaze estimation process and a character recognition process. We use a convolutional neural network (CNN) to achieve good accuracy. The trained model was evaluated through experiments, and the results show that the proposed system achieves better accuracy than previous research.
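The two-process structure can be sketched as a small CNN that classifies the eye image into a gaze direction, followed by a lookup that turns a sequence of gaze directions into characters. The 3x3 gaze grid, network size, and character layout below are illustrative assumptions, not the system's actual design.

```python
# Sketch of gaze-direction classification followed by character lookup.
import torch
import torch.nn as nn

NUM_DIRECTIONS = 9  # e.g. a 3x3 gaze grid (assumption)

gaze_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, NUM_DIRECTIONS),
)

# Hypothetical mapping from gaze directions to characters.
DIRECTION_TO_CHAR = {0: "a", 1: "i", 2: "u", 3: "e", 4: "o",
                     5: "k", 6: "s", 7: "t", 8: "n"}

def classify_gaze(eye_image: torch.Tensor) -> int:
    """eye_image: a (3, H, W) tensor captured by the inside-out camera."""
    with torch.no_grad():
        return int(gaze_cnn(eye_image.unsqueeze(0)).argmax(dim=1).item())

def gaze_sequence_to_word(eye_images) -> str:
    # Each gaze fixation selects one character; the sequence forms a word.
    return "".join(DIRECTION_TO_CHAR[classify_gaze(img)] for img in eye_images)
```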