Reflowing-driven paragraph recognition for electronic books in PDF
24 January 2011
Jing Fang, Zhi Tang, Liangcai Gao
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 78740U (2011) https://doi.org/10.1117/12.872289
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Abstract
When electronic books are read on handheld devices, their content sometimes needs to be reflowed and recomposed to fit small screens. Given people's reading habits, it is reasonable to reflow text content paragraph by paragraph. This paper addresses that requirement and proposes a set of novel methods for paragraph recognition in PDF electronic books. The proposed methods consist of three steps: physical structure analysis, paragraph segmentation, and reading order detection. We exploit the locally ordered property of PDF documents and the layout style of books to improve on traditional page recognition results. In addition, we use optimal bipartite graph matching to detect the reading order of paragraphs. Experiments show that our methods achieve high accuracy. Notably, the research has been applied in a commercial software package for Chinese e-book production.
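The abstract names optimal bipartite matching for reading order but does not say how the graph is built or weighted. The sketch below assumes one plausible formulation, not the authors' method: the left side holds "predecessor" slots (a virtual START plus every paragraph), the right side holds "successor" slots (every paragraph plus a virtual END), and each edge is weighted by a hypothetical cost of one paragraph being read directly after another. A minimum-cost assignment, solved here with the Hungarian (Kuhn-Munkres) algorithm, then gives each paragraph one successor. The box format `(x0, y0, x1, y1)`, the helpers `follow_cost` and `reading_order`, and the constants `COLUMN_JUMP` and `LAST_COST` are all illustrative assumptions.

```python
"""Sketch: paragraph reading order as an optimal bipartite assignment.

Assumptions (hypothetical, not from the paper): boxes are (x0, y0, x1, y1)
with y growing downward, and the cost terms below are illustrative only.
"""

BIG = 1e9           # stands in for "edge not allowed"
COLUMN_JUMP = 500   # hypothetical penalty for moving to the next column
LAST_COST = 1000    # hypothetical cost of a paragraph ending the page


def hungarian(cost):
    """Minimum-cost assignment on a square matrix (Kuhn-Munkres, O(n^3)).

    Returns row_to_col, where row_to_col[r] is the column assigned to row r.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j]: row currently matched to column j, 0 = free
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip matched/unmatched edges along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [0] * n
    for j in range(1, n + 1):
        row_to_col[p[j] - 1] = j - 1
    return row_to_col


def follow_cost(a, b):
    """Hypothetical cost of paragraph b being read directly after paragraph a."""
    overlap = min(a[2], b[2]) - max(a[0], b[0])
    if overlap > 0 and b[1] >= a[3]:
        return b[1] - a[3]                         # same column, b below a
    if b[0] >= a[2]:
        return COLUMN_JUMP + b[1] + (b[0] - a[2])  # b lies in a column to the right
    return BIG


def reading_order(boxes):
    """Return paragraph indices in estimated reading order.

    Rows are predecessors (START plus each paragraph); columns are successors
    (each paragraph plus END). The assignment gives every row one successor.
    It does not forbid cycles, so paragraphs trapped in a cycle are left out
    of the chain; callers should check len(result) == len(boxes).
    """
    n = len(boxes)
    cost = [[BIG] * (n + 1) for _ in range(n + 1)]
    for j, b in enumerate(boxes):
        cost[0][j] = b[0] + b[1]            # START -> b: prefer top-left paragraphs
    for i, a in enumerate(boxes):
        cost[i + 1][n] = LAST_COST          # a -> END
        for j, b in enumerate(boxes):
            if i != j:
                cost[i + 1][j] = follow_cost(a, b)
    succ = hungarian(cost)
    order, row = [], 0
    while succ[row] != n:                   # follow successor links from START to END
        order.append(succ[row])
        row = succ[row] + 1
    return order


if __name__ == "__main__":
    # Two-column page: A, B in the left column, C, D in the right column.
    page = [(0, 0, 100, 50), (0, 60, 100, 120), (120, 0, 220, 40), (120, 50, 220, 110)]
    print(reading_order(page))  # → [0, 1, 2, 3]
```

The START row and END column make the chain a square assignment problem, which is why a standard assignment solver applies. The trade-off is that plain assignment cannot rule out closed cycles among paragraphs; a real system would need either costs that make cycles expensive or an extra repair step.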
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jing Fang, Zhi Tang, and Liangcai Gao "Reflowing-driven paragraph recognition for electronic books in PDF", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740U (24 January 2011); https://doi.org/10.1117/12.872289
CITATIONS
Cited by 9 scholarly publications and 1 patent.
KEYWORDS
Image segmentation
Probability theory
Image processing
Mobile devices
Pattern recognition
Visualization
Applied research
