Segmentation of continuous document flow by a modified backward-forward algorithm
19 January 2009
Th. Meilender, A. Belaïd
Proceedings Volume 7247, Document Recognition and Retrieval XVI; 724705 (2009) https://doi.org/10.1117/12.805646
Event: IS&T/SPIE Electronic Imaging, 2009, San Jose, California, United States
Abstract
This paper describes a method for segmenting a continuous document flow. A document flow is a sequence of successive scanned pages, fed into a production chain, that contains several documents with no explicit separation mark between them. To separate the documents for recognition, the content of successive pages must be analyzed and the boundary pages of each document identified. The proposed method is similar to the variable horizon models (VHM), or multigrams, used in speech recognition. It consists of maximizing the likelihood of the flow given the Markov models of all its constituent elements. Because computing this likelihood over the whole flow is NP-complete, it is instead computed over windows containing a reduced number of observations. First results on homogeneous flows of invoices reach more than 75% precision and 90% recall.
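The likelihood-maximization idea in the abstract can be illustrated with a small dynamic-programming sketch. It is not the authors' modified backward-forward algorithm; it is a Viterbi-style search built on several assumptions. Each page is assumed to be already reduced to a hypothetical label (`F`, `M`, `L` for first, middle and last page). Each document class is assumed to be a first-order Markov chain over those labels, with made-up probabilities. A candidate document is assumed to span at most `max_len` pages, which stands in for the reduced observation window. The names `Chain`, `doc_log_likelihood`, `segment_flow` and `INVOICE` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical document model: (initial, transition, final) probabilities of a
# first-order Markov chain over page labels such as "F", "M", "L".
Chain = Tuple[Dict[str, float], Dict[Tuple[str, str], float], Dict[str, float]]


def _log(p: float) -> float:
    # log(0) becomes -inf so impossible segments are simply never chosen.
    return math.log(p) if p > 0.0 else -math.inf


def doc_log_likelihood(pages: Sequence[str], model: Chain) -> float:
    """Log-likelihood that `pages` forms one complete document under `model`."""
    init, trans, final = model
    total = _log(init.get(pages[0], 0.0))
    for prev, cur in zip(pages, pages[1:]):
        total += _log(trans.get((prev, cur), 0.0))
    return total + _log(final.get(pages[-1], 0.0))


def segment_flow(
    pages: Sequence[str], models: Dict[str, Chain], max_len: int
) -> List[Tuple[int, int, str]]:
    """Split a page flow into documents maximizing the total log-likelihood.

    Candidate documents are limited to `max_len` pages, so at most
    len(pages) * max_len * len(models) segments are scored.
    Returns (start, end, model_name) triples with `end` exclusive.
    """
    n = len(pages)
    best = [-math.inf] * (n + 1)  # best[i]: best score for pages[:i]
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Only documents of length <= max_len may end at page `end`.
        for start in range(max(0, end - max_len), end):
            if best[start] == -math.inf:
                continue
            for name, model in models.items():
                score = best[start] + doc_log_likelihood(pages[start:end], model)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, name)
    if best[n] == -math.inf:
        raise ValueError("no segmentation has non-zero likelihood")
    # Follow back-pointers from the last page to recover the boundaries.
    segments: List[Tuple[int, int, str]] = []
    end = n
    while end > 0:
        start, name = back[end]
        segments.append((start, end, name))
        end = start
    return segments[::-1]


# Made-up invoice model: starts on "F", ends on "L", optional middle pages.
INVOICE: Chain = (
    {"F": 1.0},
    {("F", "M"): 0.5, ("F", "L"): 0.5, ("M", "M"): 0.3, ("M", "L"): 0.7},
    {"L": 1.0},
)

print(segment_flow(["F", "L", "F", "M", "L"], {"invoice": INVOICE}, max_len=4))
# → [(0, 2, 'invoice'), (2, 5, 'invoice')]
```

Bounding `max_len` is what keeps this sketch tractable: at each page, only documents ending there and at most `max_len` pages long are scored, rather than every possible way of cutting the flow.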
© (2009) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Th. Meilender and A. Belaïd "Segmentation of continuous document flow by a modified backward-forward algorithm", Proc. SPIE 7247, Document Recognition and Retrieval XVI, 724705 (19 January 2009); https://doi.org/10.1117/12.805646
CITATIONS
Cited by 7 scholarly publications.
KEYWORDS
Chemical elements
Image segmentation
Chronology
Databases
Electroluminescence
Optical character recognition
Speech recognition