An OCR based approach for word spotting in Devanagari documents
28 January 2008
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150O (2008) https://doi.org/10.1117/12.767289
Event: Electronic Imaging, 2008, San Jose, California, United States
Abstract
This paper describes an OCR-based technique for word spotting in printed Devanagari documents. The system accepts a Devanagari word as input and returns a list of word images ranked by their similarity to the query. The methodology involves line and word separation, pre-processing of document words, word recognition using OCR, and similarity matching. We demonstrate a Block Adjacency Graph (BAG) based document cleanup in the pre-processing phase. During word recognition, multiple recognition hypotheses are generated for each document word using a font-independent Devanagari OCR. The similarity matching phase uses a cost-based model to match the word entered by the user against the OCR results. Experiments are conducted on document images from the publicly available ILT and Million Book Project datasets. The technique achieves an average precision of 80% for 10 queries and 67% for 20 queries on a set of 64 documents containing 5780 word images. The paper also presents a comparison of our method with template-based word spotting techniques.
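The matching step described above can be illustrated with a minimal sketch. The abstract does not specify the cost model, so this example substitutes plain Levenshtein edit distance with unit costs, and scores each word image by its best-matching OCR hypothesis. The function names (`edit_distance`, `rank_word_images`), the image identifiers, and the sample hypotheses are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs.

    Assumption: this stands in for the paper's cost-based model,
    whose actual costs are not given in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (ca != cb)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def rank_word_images(
    query: str, hypotheses: Dict[str, List[str]]
) -> List[Tuple[str, int]]:
    """Rank word images by the cheapest match among their OCR hypotheses."""
    scored = [
        (image_id, min(edit_distance(query, h) for h in hyps))
        for image_id, hyps in hypotheses.items()
        if hyps  # skip images for which the OCR produced no hypothesis
    ]
    # Lowest cost first; ties are broken by image id for a stable order.
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Hypothetical OCR output: several hypotheses per word image.
ocr_hypotheses = {
    "img_001": ["भारत", "भरत"],
    "img_002": ["भारती"],
    "img_003": ["कमल"],
}
ranked = rank_word_images("भारत", ocr_hypotheses)
```

Taking the minimum over hypotheses means a word image is retrieved if any one of its OCR readings is close to the query, which is one simple way to benefit from multiple hypotheses. The distance here is computed over Unicode code points, so a dependent vowel sign counts as its own symbol; a real system might compare at the level of characters or OCR classes instead.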
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Anurag Bhardwaj, Suryaprakash Kompalli, Srirangaraj Setlur, and Venu Govindaraju "An OCR based approach for word spotting in Devanagari documents", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150O (28 January 2008); https://doi.org/10.1117/12.767289
KEYWORDS
Optical character recognition
Image segmentation
Image filtering
Feature extraction
Image quality
Biometrics
Electronic imaging