24 January 2011 Unsupervised method to generate page templates
Hervé Déjean
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 78740M (2011) https://doi.org/10.1117/12.873160
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Abstract
In this paper, we propose a method for automatically inferring the different page templates used to lay out document content. The first step of the method consists of performing a logical analysis of the document. Depending on the coverage of this step, a given number of document elements will be labeled. Geometric relations are then computed between these labeled elements, and page template candidates are generated from frequently related elements. A fuzzy matching operation then selects the most frequent and relevant page templates for a given document. Such page templates can be used to correct errors produced during the earlier steps of document analysis: zoning, OCR, and logical analysis. The method was evaluated on the INEX book track collection.
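The pipeline outlined in the abstract (labeled elements, geometric relations, frequent relations as template candidates, fuzzy matching) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the element representation `(label, x, y)`, the grid quantization `step`, the `min_support` threshold, the `tolerance` parameter, and the function names `relations`, `frequent_relations`, and `fuzzy_score` are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations

# Assumed representation: a page is a list of (label, x, y) tuples, where the
# label comes from a prior logical analysis and (x, y) is the element position.

def relations(page, step=10.0):
    """Quantized offsets between every pair of labeled elements on a page."""
    rels = set()
    for (la, xa, ya), (lb, xb, yb) in combinations(sorted(page), 2):
        rels.add((la, lb, round((xb - xa) / step), round((yb - ya) / step)))
    return frozenset(rels)

def frequent_relations(pages, step=10.0, min_support=2):
    """Template candidate: relations seen on at least `min_support` pages."""
    counts = Counter(rel for page in pages for rel in relations(page, step))
    return frozenset(rel for rel, n in counts.items() if n >= min_support)

def fuzzy_score(page, template, step=10.0, tolerance=1):
    """Fraction of template relations found on the page, allowing each
    quantized offset to differ by up to `tolerance` grid cells."""
    if not template:
        return 0.0
    page_rels = relations(page, step)
    hits = sum(
        any(
            la == pa and lb == pb
            and abs(dx - px) <= tolerance and abs(dy - py) <= tolerance
            for pa, pb, px, py in page_rels
        )
        for la, lb, dx, dy in template
    )
    return hits / len(template)
```

In this sketch, a page that scores below 1.0 against a frequent template is a candidate for correction: for example, a page with a header and body in the usual arrangement but no detected page number matches only part of the template, which hints at a zoning or OCR miss.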
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hervé Déjean "Unsupervised method to generate page templates", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740M (24 January 2011); https://doi.org/10.1117/12.873160
KEYWORDS
Chemical elements
Fuzzy logic
Optical character recognition
Error analysis
Detection and tracking algorithms
Systems modeling
Cerium