KEYWORDS: Independent component analysis, 3D modeling, Head, Visualization, Principal component analysis, 3D acquisition, Solid modeling, Mouth, Data modeling, Transplantation
Efficient, realistic face animation is still a challenge. A system is proposed that yields realistic animations for speech. It starts from real 3D face dynamics, observed at a frame rate of 25 fps for thousands of points on the faces of speaking actors. When asked to animate a face, it replicates the visemes that it has learned and adds the necessary coarticulation effects. The speech animation could be based on as few as 16 modes, extracted through Independent Component Analysis from the observed face dynamics. Faces for which only a static, neutral 3D model is available can also be animated. Rather than animating via verbatim copying of other faces’ deformation fields, the visemes are adapted to the shape of the new face. By localising the new face in a Face Space, in which the locations of the example faces are also known, the visemes are adapted automatically according to its relative distances to those examples. The animation tool proposes a good speech-based face animation as a point of departure for animators, whom the system then supports in making further changes as desired.
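As a rough sketch of the mode-extraction step: the abstract quotes the 25 fps rate, the thousands of tracked points, and the 16 modes, but the data layout, matrix sizes, and the use of scikit-learn's FastICA below are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical layout: one row per video frame (25 fps recording), with
# the (x, y, z) coordinates of all tracked points flattened into it.
n_frames, n_points = 750, 2000                   # placeholder sizes
X = np.random.randn(n_frames, 3 * n_points)      # stand-in for real face dynamics

# Extract 16 independent modes, the count quoted in the abstract.
ica = FastICA(n_components=16, random_state=0)
activations = ica.fit_transform(X)   # per-frame activation of each mode
modes = ica.mixing_.T                # each row: one 3D deformation mode

# Any frame can then be re-synthesised as a mixture of the modes:
X_hat = ica.inverse_transform(activations)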
KEYWORDS: 3D modeling, Laser induced plasma spectroscopy, Solid modeling, Visualization, Motion models, Principal component analysis, Eye models, 3D image processing, 3D acquisition, Video
We are all experts in the perception and interpretation of faces and their dynamics. This makes facial animation a particularly demanding area of graphics. Increasingly, computer vision is brought to bear, and 3D models and their motions are learned from observations. This paper subscribes to that strand for the 3D modeling of human speech. The approach follows a kind of bootstrap procedure. First, 3D shape statistics are learned from faces carrying a few markers: a 3D reconstruction of a speaking face is produced for each video frame, and a topological mask of the lower half of the face is fitted to the motion. The 3D shape statistics are extracted, and principal component analysis reduces the dimensionality of the mask space. The final speech tracker can then work without markers, as it is only allowed to roam this constrained space of masks. Once the different visemes are represented in this space, speech or text can be used as input for animation.
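A minimal sketch of how such a constrained mask space might be built and used, assuming each fitted lower-face mask is flattened to a row of 3D vertex coordinates; the sizes, the retained dimension, and the projection step are assumptions for illustration rather than the paper's pipeline.

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical training set: each fitted mask as one flattened row.
n_masks, n_vertices = 300, 1000                  # placeholder sizes
masks = np.random.randn(n_masks, 3 * n_vertices) # stand-in for fitted masks

pca = PCA(n_components=10)   # the retained dimension is an assumption
pca.fit(masks)

def constrain(candidate):
    # Project a candidate shape onto the learned subspace, so the tracker
    # can only produce masks that lie in the constrained mask space.
    coeffs = pca.transform(candidate.reshape(1, -1))
    return pca.inverse_transform(coeffs).ravel()

tracked_mask = constrain(np.random.randn(3 * n_vertices))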