Document management systems have become important with the growing popularity of electronic document filing and of
scanning books, magazines, manuals, etc., with a scanner or a digital camera, for storage or for reading
on a PC or an electronic book reader. Text information acquired by optical character recognition (OCR) is usually added to the
electronic documents for document retrieval. Since texts generated by OCR generally include character recognition
errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval
method that is robust against both character segmentation and recognition errors. In the proposed method, allowing noise
characters to be inserted and characters to be dropped during keyword matching provides robustness against character
segmentation errors, while allowing each keyword character to be replaced by one of the OCR recognition candidates for that
position, or by any other character, provides robustness against character recognition errors. The recall rate of the proposed method was 15% higher
than that of the conventional method. However, the precision rate was 64% lower.
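
The abstract does not specify the matching algorithm, so the following is only a minimal sketch of the general idea, assuming a standard approximate substring match by dynamic programming. The function names (`substitution_cost`, `best_match_cost`, `matches`), the cost values, and the representation of OCR output as a ranked candidate list per character position are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def substitution_cost(keyword_char: str, candidates: Sequence[str],
                      candidate_cost: float = 0.5,
                      other_cost: float = 1.0) -> float:
    """Cost of aligning one keyword character with one OCR position.

    `candidates` is the ranked list of recognition candidates for that
    position (assumed format; the first entry is the top OCR result).
    """
    if candidates and keyword_char == candidates[0]:
        return 0.0              # agrees with the top OCR result
    if keyword_char in candidates:
        return candidate_cost   # agrees with a lower-ranked candidate
    return other_cost           # replaced by any other character


def best_match_cost(keyword: str, ocr: List[Sequence[str]],
                    insert_cost: float = 1.0,
                    drop_cost: float = 1.0) -> float:
    """Minimum cost of aligning `keyword` with any substring of `ocr`."""
    n = len(ocr)
    # Row for the empty keyword prefix: a match may start anywhere for free.
    prev = [0.0] * (n + 1)
    for ch in keyword:
        curr = [prev[0] + drop_cost] + [0.0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j - 1] + substitution_cost(ch, ocr[j - 1]),
                prev[j] + drop_cost,        # keyword character missing in OCR text
                curr[j - 1] + insert_cost,  # noise character inserted in OCR text
            )
        prev = curr
    # The match may also end anywhere in the OCR text.
    return min(prev)


def matches(keyword: str, ocr: List[Sequence[str]],
            max_cost: float = 1.0) -> bool:
    """Report a hit when the best alignment cost is within the threshold."""
    return best_match_cost(keyword, ocr) <= max_cost


if __name__ == "__main__":
    # "retr1eval": the OCR top result for the fifth character is "1",
    # but "i" appears among its lower-ranked candidates.
    ocr_text = ([[c] for c in "retr"] + [["1", "i", "l"]]
                + [[c] for c in "eval"])
    print(best_match_cost("retrieval", ocr_text))
    print(matches("retrieval", ocr_text))
```

In a sketch like this, raising `max_cost` admits more distorted matches, which would tend to raise recall at the expense of precision; this is the same kind of trade-off the reported results describe.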