Paper
1 April 1998 Table structure recognition based on robust block segmentation
Thomas G. Kieninger
Author Affiliations +
Proceedings Volume 3305, Document Recognition V; (1998) https://doi.org/10.1117/12.304642
Event: Photonics West '98 Electronic Imaging, 1998, San Jose, CA, United States
Abstract
This paper presents an efficient approach to identify tabular structures within either electronic or paper documents. The resulting T-Recs system takes word bounding box information as input, and outputs the corresponding logical text block units. Starting with an arbitrary word as block seed the algorithm recursively expands this block to all words that interleave with their vertical neighbors. Since even smallest gaps of table columns prevent their words from mutual interleaving, this initial segmentation is able to identify and isolate such columns. In order to deal with some inherent segmentation errors caused by isolated lines, overhanging words, or cells spawning more than one column, a series of postprocessing steps is added. These steps benefit form a very simple distinction between type 1 and type 2 blocks: type 1 blocks are those of at most one word per line, all others are of type 2. This distinction allows the selective application of heuristics to each group of blocks. The conjoint decomposition of column blocks into subsets of table cells leads to the final block segmentation of a homogeneous abstraction level. These segments serve the final layout analysis which identifies table environments and cells that are stretching over several rows and/or columns.
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Thomas G. Kieninger "Table structure recognition based on robust block segmentation", Proc. SPIE 3305, Document Recognition V, (1 April 1998); https://doi.org/10.1117/12.304642
Lens.org Logo
CITATIONS
Cited by 88 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Image segmentation

Optical character recognition

Visualization

Analytical research

Inspection

Human-machine interfaces

Information operations

RELATED CONTENT


Back to Top