Shuohao Li, Anqi Han, Xu Chen, Xiaoqing Yin, Jun Zhang
Journal of Electronic Imaging, Vol. 26, Issue 05, 053023, (October 2017) https://doi.org/10.1117/1.JEI.26.5.053023
Recognizing text in images captured in the wild is a fundamental preprocessing task for many computer vision and machine learning applications and has gained significant attention in recent years. This paper proposes an end-to-end trainable deep review neural network for scene text recognition that combines four stages: feature extraction, feature reviewing, feature attention, and sequence recognition. The model generates the predicted text directly, without any segmentation or grouping algorithm. Because the attention model in the feature attention stage lacks the ability to model global context, a review network is applied in the feature reviewing stage to extract the global context of the sequence data. We evaluate the model on a number of standard benchmarks, including the IIIT5K, SVT, ICDAR03, and ICDAR13 datasets. Experimental results show that our model performs comparably to or better than state-of-the-art techniques.
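The flow of data through the four stages can be illustrated with a minimal, untrained sketch. It is not the authors' implementation. It makes several assumptions: a CNN has already produced a sequence of feature vectors, and the learned LSTM cells and attention parameters are replaced by parameter-free stand-ins (dot-product attention and a `tanh` update). It also assumes the decoder attends over the reviewer's "thought vectors" rather than over the raw features. All function names (`review`, `decode`, `step`) are hypothetical.

```python
# Hypothetical, untrained sketch of a review-then-attend recognizer.
# Assumptions: features come from a CNN (given here as toy lists), and
# learned LSTM/attention weights are replaced by parameter-free stand-ins.
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, memory: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention: returns (weights over memory, weighted context)."""
    weights = softmax([dot(query, m) for m in memory])
    dim = len(memory[0])
    context = [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
    return weights, context


def step(state: Vector, context: Vector) -> Vector:
    """Toy recurrent update with no learned weights (stands in for an LSTM cell)."""
    return [math.tanh(s + c) for s, c in zip(state, context)]


def review(features: List[Vector], num_steps: int) -> List[Vector]:
    """Feature reviewing: each step attends over the WHOLE feature sequence,
    so every thought vector can summarize global context."""
    state = [0.0] * len(features[0])
    thoughts = []
    for _ in range(num_steps):
        _, context = attend(state, features)
        state = step(state, context)
        thoughts.append(state)
    return thoughts


def decode(thoughts: List[Vector], class_embeddings: List[Vector], max_len: int) -> List[int]:
    """Feature attention + sequence recognition: greedy decoding that attends
    over the thought vectors and picks the highest-scoring class per step."""
    state = [0.0] * len(thoughts[0])
    labels = []
    for _ in range(max_len):
        _, context = attend(state, thoughts)
        state = step(state, context)
        scores = [dot(state, e) for e in class_embeddings]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy run: 5 feature columns of dimension 3, 2 review steps, 3 character classes.
features = [[0.1 * (t + d) for d in range(3)] for t in range(5)]
thoughts = review(features, num_steps=2)
classes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(decode(thoughts, classes, max_len=4))
```

The design point the sketch tries to show is this: every review step attends over the entire feature sequence before any character is emitted. The decoder therefore queries global summaries instead of relying only on its local, per-step attention. Because the weights are not trained, the printed labels carry no meaning. Only the shapes and the order of operations are illustrative.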