Visually impaired people avoid dangers by using a white cane. Because they use it as an extension of their arm, we consider it a part of their body. Our goal is to recognize visually impaired people using a white cane in images and how they use it. We propose a method to estimate the posture of a person with a white cane by extending an existing posture estimation model: OpenPose. In our method, we incorporate a white cane as a part of the human skeleton model. We constructed a database of images of visually impaired people with a white cane to train the network for the extended human skeleton model. We develop a method to determine the left or right hand that holds the white cane in the training images because it is necessary to train right-handed and left-handed users separately. We can analyze the motion of the white cane by the result of posture estimation. We focus on the angle of the white cane and analyze its swing frequency. Throughout our experiments, we confirmed that our preliminary system successfully estimated the human posture with the white cane and the swing frequency of the white cane.
On detecting people in a video of a wide area, there are three problems. First, the apparent sizes of people are likely to be very different due to their positions in the scene. Second, the density of people is unevenly distributed in the scene. Detecting the entire image implies a high video processing cost. Third, there are regions where people do not permanently appear. These regions should not be processed. To solve these problems, we aim to detect people regardless of people's positions and sizes, focusing on the regions where people could appear. To achieve this aim, we propose a video dividing method based on people's positions and sizes in the video for high-resolution video detection and a method to eliminate the divided regions where people will not appear. We adopt a pyramid representation for video division and integration. We skip processing the regions where no people are detected for a certain amount of time. Furthermore, we use semantic segmentation to eliminate regions where people do not appear permanently. We choose 4K high-resolution videos looking down wide areas from a fixed camera for the experiment. We test the proposed method on PANDA dataset and the videos taken at Tsukuba.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.