Visually Impaired People (VIP) have difficulty localizing themselves accurately in daily life, so developing an efficient algorithm to address their localization needs is crucial. Visual Place Recognition (VPR) uses image retrieval algorithms to determine the location of a query image within a database, which makes it a promising way to help VIP solve their localization problems. However, the accuracy of VPR is directly affected by changes in scene appearance such as illumination, season and viewpoint. Therefore, extracting image descriptors that remain robust under such appearance changes is one of the most critical tasks in current VPR research. In this paper, we propose a VPR approach to assist the localization and navigation of visually impaired pedestrians. The core of our proposal is a combination of descriptors at multiple levels: the whole image, local regions and key-points, aiming to enhance the robustness of VPR. The matching procedure between query images and database images consists of three steps. Firstly, we obtain Convolutional Neural Network (CNN) features of the whole images from a pre-trained GoogLeNet, and compute the Euclidean distances between the query images and the database images to determine the top-10 candidate matches. Secondly, local salient regions are detected in the top-10 best-matched images, with Non-Maximum Suppression (NMS) used to control the number of bounding boxes. Thirdly, we detect SIFT key-points within the local salient regions, extract their GeoDesc descriptors, and determine the best match among the top-10 candidates. To verify our approach, a comprehensive set of experiments has been conducted on datasets with challenging environmental changes, such as the GardensPointWalking dataset.
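A minimal sketch of the coarse retrieval step described above: whole-image descriptors from a pre-trained GoogLeNet and Euclidean-distance ranking of the top-10 candidates. The region re-detection and SIFT/GeoDesc re-ranking stages are omitted, and the pooling choice, input size and function names are illustrative assumptions rather than the authors' exact pipeline.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()            # keep the 1024-d pooled feature
backbone = backbone.to(device).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def global_descriptor(path):
    """Whole-image CNN descriptor, L2-normalized."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    return torch.nn.functional.normalize(backbone(x), dim=1).squeeze(0)

def top_k_matches(query_path, db_paths, k=10):
    """Rank database images by Euclidean distance to the query descriptor."""
    q = global_descriptor(query_path)
    db = torch.stack([global_descriptor(p) for p in db_paths])
    dists = torch.cdist(q.unsqueeze(0), db).squeeze(0)
    return [db_paths[i] for i in torch.argsort(dists)[:k].tolist()]
```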
For Visually Impaired People (VIP), it is very difficult to perceive their surroundings. To address this problem, we propose a scene understanding system to aid VIP in indoor and outdoor environments. Semantic segmentation performance is generally sensitive to environment and illumination changes, including the change between indoor and outdoor environments and the change across different weather conditions. Meanwhile, most existing methods focus on either accuracy or efficiency rather than the balance between the two. In the proposed system, the training dataset is preprocessed with an illumination-invariant transformation to weaken the impact of illumination changes and improve the robustness of the semantic segmentation network. Regarding the network structure, lightweight networks such as MobileNetV2 and ShuffleNet V2 are employed as the backbone of DeepLabv3+ to improve accuracy with little increase in computation, which makes the approach suitable for mobile assistance devices. We evaluate the robustness of the segmentation model across different environments on the Gardens Point Walking dataset, and demonstrate the strongly positive effect of the illumination-invariant pre-transformation in challenging real-world domains. The network achieves a relatively high accuracy on the ADE20K dataset relabeled into 20 classes. The frame rate of the proposed system reaches 83 FPS on a 1080Ti GPU.
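A sketch of the kind of illumination-invariant pre-transformation applied before training, following the widely used log-chromaticity form I = 0.5 + log(G) − α·log(B) − (1 − α)·log(R). The abstract does not state which transform is used, and the α value is camera-dependent, so both are assumptions here.

```python
import numpy as np
import cv2

def illumination_invariant(bgr: np.ndarray, alpha: float = 0.48) -> np.ndarray:
    """Map a BGR uint8 image to a single-channel illumination-invariant image."""
    img = bgr.astype(np.float32) / 255.0 + 1e-6          # avoid log(0)
    b, g, r = cv2.split(img)
    ii = 0.5 + np.log(g) - alpha * np.log(b) - (1.0 - alpha) * np.log(r)
    # Rescale to [0, 255] so the result can be stored like an ordinary image.
    ii = cv2.normalize(ii, None, 0, 255, cv2.NORM_MINMAX)
    return ii.astype(np.uint8)

# Example: preprocess one training frame before feeding the DeepLabv3+ pipeline.
# invariant = illumination_invariant(cv2.imread("frame_000123.png"))
```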
It is very difficult for visually impaired people (VIP) to perceive and avoid obstacles at a distance. To address this problem, we propose a sensor fusion system, which combines an RGB-Depth (RGB-D) sensor and a millimeter-wave (MMW) radar sensor, to perceive surrounding obstacles. The position and velocity of multiple targets are detected by the MMW radar based on the principle of frequency-modulated continuous wave. The depth and position of the obstacles are verified by the RGB-D sensor based on the MeanShift algorithm. Data fusion based on the joint probabilistic data association algorithm and a Kalman filter enables the navigation assistance system to obtain more accurate state estimates than using either sensor alone. A non-semantic stereophonic interface transfers the obstacle detection results to the VIP. The experimental results show that multiple objects at different ranges and angles are detected by the radar and the RGB-D sensor. The effective detection range is expanded up to 80 m compared to using only the RGB-D sensor. Moreover, the measurement results are stable under diverse illumination conditions. As a wearable system, the sensor fusion system offers versatility, portability, and cost-effectiveness.
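A minimal sketch of the state-estimation step: a constant-velocity Kalman filter that fuses a radar range/velocity measurement with an RGB-D range measurement for a single obstacle. The JPDA association of multiple targets is omitted, and the noise covariances below are illustrative assumptions, not calibrated values.

```python
import numpy as np

dt = 0.05                                   # update period in seconds (assumed)
F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition for [range, velocity]
Q = np.diag([0.01, 0.05])                   # process noise (assumption)

H_radar = np.eye(2)                         # radar observes range and velocity
R_radar = np.diag([0.25, 0.10])
H_rgbd = np.array([[1.0, 0.0]])             # RGB-D observes range only
R_rgbd = np.array([[0.04]])

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# One fusion cycle: predict, then correct with each sensor in turn.
x, P = np.array([5.0, 0.0]), np.eye(2)
x, P = predict(x, P)
x, P = update(x, P, np.array([4.8, -0.3]), H_radar, R_radar)   # radar measurement
x, P = update(x, P, np.array([4.9]), H_rgbd, R_rgbd)           # RGB-D measurement
```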
Visually Impaired (VI) people around the world have difficulties in socializing and traveling due to the limitations of traditional assistive tools. In recent years, practical assistance systems for scene text detection and recognition have allowed VI people to obtain text information from surrounding scenes. However, real-world scene text features complex backgrounds, low resolution, variable fonts and irregular arrangement, which make robust scene text detection and recognition difficult. In this paper, a scene text recognition system to help VI people is proposed. Firstly, we propose a high-performance neural network to detect and track objects, which is applied to specific scenes to obtain Regions of Interest (ROI). To achieve real-time detection, a lightweight deep neural network has been built using depth-wise separable convolutions, which enables the system to be integrated into mobile devices with limited computational resources. Secondly, we train the neural network on textural features to improve the precision of text detection. Our algorithm suppresses the effects of spatial transformations (including translation, scaling, rotation and other geometric transformations) based on spatial transformer networks. An open-source optical character recognition (OCR) engine is trained on scene text separately to improve the accuracy of text recognition. The interactive system eventually conveys the number and distance information of inbound buses to visually impaired people. Finally, a comprehensive set of experiments on several benchmark datasets demonstrates that our algorithm achieves a favorable trade-off between precision and resource usage.
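A sketch of the depth-wise separable convolution block of the kind used to keep the detection backbone lightweight: a per-channel (depth-wise) 3x3 convolution followed by a 1x1 point-wise convolution. The channel sizes and layer composition are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # groups=in_ch makes each filter see only one input channel (depth-wise).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# A standard 3x3 conv from 64 to 128 channels costs 64*128*9 multiplies per pixel;
# the separable version costs 64*9 + 64*128, roughly an 8x reduction here.
block = DepthwiseSeparableConv(64, 128)
features = block(torch.randn(1, 64, 60, 80))
```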
With the increasing demands of visually impaired people, developing assistive technology to help them travel effectively and safely has become a research hotspot. The Red, Green, Blue and Depth (RGB-D) sensor has been widely used to help visually impaired people, but the detection and recognition of glass objects remains a challenge because the depth information of glass cannot be obtained correctly. To overcome this limitation, we put forward a method to detect glass objects in natural indoor scenes, based on the fusion of an ultrasonic sensor and an RGB-D sensor on a wearable prototype. Meanwhile, the erroneous depth map of glass objects computed by the RGB-D sensor can also be densely recovered. In addition, under some special circumstances, such as facing a mirror or an obstacle within the minimum detectable range of the RGB-D sensor, we use a similar processing method to regain depth information in the invalid areas of the original depth map. The experimental results show that the detection range and precision of the RGB-D sensor are significantly improved with the aid of the ultrasonic sensor. The proposed method is shown to detect and recognize common glass obstacles for visually impaired people in real time, making it suitable for real-world indoor navigation assistance.
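A sketch of the depth-recovery idea: where the RGB-D depth map is invalid (zero, as is typical for glass) but the forward-facing ultrasonic sensor reports a nearby return, the invalid region is filled with the ultrasonic range. The region of interest, thresholds and function name are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def recover_glass_depth(depth_mm: np.ndarray,
                        ultrasonic_mm: float,
                        max_fill_mm: float = 3000.0) -> np.ndarray:
    """Densely fill invalid depth pixels in the central field of view."""
    recovered = depth_mm.copy()
    h, w = depth_mm.shape
    # Central window roughly matching the ultrasonic beam (assumed geometry).
    roi = (slice(h // 4, 3 * h // 4), slice(w // 4, 3 * w // 4))
    invalid = recovered[roi] == 0
    # Only trust the ultrasonic return when it indicates a close-range obstacle
    # and a substantial fraction of the window has no valid depth.
    if 0.0 < ultrasonic_mm < max_fill_mm and invalid.mean() > 0.2:
        recovered[roi][invalid] = ultrasonic_mm
    return recovered
```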
According to data from the World Health Organization, 285 million people are estimated to be visually impaired worldwide, and 39 million of them are blind. It is very difficult for visually impaired people to perceive and avoid obstacles at a distance while travelling. To address this problem, we propose a sensor fusion system, which combines an RGB-Depth sensor and a millimeter-wave radar sensor, to detect surrounding obstacles. The range and velocity of multiple obstacles are acquired by the millimeter-wave radar based on the principle of frequency-modulated continuous wave. The positions of the obstacles are verified by the RGB-Depth sensor based on contour extraction and the MeanShift algorithm. A data fusion algorithm based on particle filters obtains accurate state estimates by fusing RGB-Depth data with millimeter-wave radar data. The experimental results show that multiple obstacles at different ranges and angles are successfully detected by the proposed system. The measurement uncertainties are reduced by the data fusion system, while the effective detection range is expanded compared to detection with only the RGB-Depth sensor. Moreover, the measurement results remain stable when the illumination varies. As a wearable prototype, the sensor fusion system offers versatility, portability and cost-effectiveness, making it well suited for blind navigation applications.
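A sketch of the particle-filter fusion for a single obstacle range: particles are propagated with a simple motion model and weighted by both the radar and RGB-D range likelihoods before resampling. The noise levels, measurement values and resampling scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(0.5, 80.0, N)       # candidate ranges in metres
weights = np.full(N, 1.0 / N)

def gaussian_likelihood(z, particles, sigma):
    return np.exp(-0.5 * ((z - particles) / sigma) ** 2) / sigma

def step(particles, weights, z_radar, z_rgbd, sigma_radar=0.5, sigma_rgbd=0.2):
    # Predict: diffuse particles with process noise.
    particles = particles + rng.normal(0.0, 0.1, particles.size)
    # Update: multiply weights by both sensor likelihoods (the fusion step).
    weights = weights * gaussian_likelihood(z_radar, particles, sigma_radar)
    weights = weights * gaussian_likelihood(z_rgbd, particles, sigma_rgbd)
    weights = weights / weights.sum()
    # Resample (a simple multinomial draw in place of systematic resampling).
    idx = rng.choice(particles.size, particles.size, p=weights)
    return particles[idx], np.full(particles.size, 1.0 / particles.size)

particles, weights = step(particles, weights, z_radar=4.2, z_rgbd=4.1)
estimate = particles.mean()                 # fused range estimate in metres
```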
Detecting crosswalks at urban intersections and reminding users of them is one of the most important demands of people with visual impairments. A real-time crosswalk detection algorithm, adaptive extraction and consistency analysis (AECA), is proposed. Compared with existing algorithms, which detect crosswalks in ideal scenarios, the AECA algorithm performs better in challenging scenarios, such as crosswalks at far distances, low-contrast crosswalks, pedestrian occlusion, varying illumination, and the limited resources of portable PCs. Bright stripes of crosswalks are extracted by adaptive thresholding and are gathered into crosswalks by consistency analysis. On the testing dataset, the proposed algorithm achieves a precision of 84.6% and a recall of 60.1%, which are higher than those of the bipolarity-based algorithm. The position and orientation of crosswalks are conveyed to users by voice prompts so that they can align themselves with crosswalks and walk along them. Field tests carried out in various practical scenarios prove the effectiveness and reliability of the proposed navigation approach.
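A sketch of the stripe-extraction stage: adaptive thresholding picks out bright crosswalk stripes, and candidate regions are filtered by simple geometric consistency (elongated shape, similar orientation). The thresholds below are assumptions and are far simpler than the full AECA consistency analysis.

```python
import cv2
import numpy as np

def extract_stripe_candidates(gray: np.ndarray):
    """Return (cx, cy, angle) of elongated bright regions in a grayscale frame."""
    # Adaptive thresholding adapts to local brightness, helping low-contrast stripes.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, blockSize=31, C=-10)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    stripes = []
    for c in contours:
        if cv2.contourArea(c) < 200:                     # discard small speckle
            continue
        (cx, cy), (w, h), angle = cv2.minAreaRect(c)
        if max(w, h) / (min(w, h) + 1e-6) > 3.0:         # keep elongated regions
            stripes.append((cx, cy, angle))
    return stripes

def is_crosswalk(stripes, min_stripes=3, angle_tol=15.0):
    """Consistency check: a crosswalk needs several stripes of similar orientation."""
    if len(stripes) < min_stripes:
        return False
    angles = np.array([s[2] % 90 for s in stripes])
    return np.ptp(angles) < angle_tol
```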