Font group identification using reconstructed fonts
24 January 2011
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 78740N (2011) https://doi.org/10.1117/12.873398
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Abstract
Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed, highly readable, and faithful to the original. In theory, these goals can be achieved by applying OCR and font recognition and then re-typesetting the document text in the original fonts. However, OCR and font recognition remain hard problems, and many historical documents use fonts that are not available in digital form. It is therefore desirable to reconstruct fonts with vector glyphs that approximate the shapes of the letters that make up a font. In this work, we address the grouping of tokens in a token-compressed document into candidate fonts. This permits us to incorporate font information into token-compressed images even when the original fonts are unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing and evaluating an algorithm that assigns a font to every character within a document, a necessary step toward representing a scanned document image with a reconstructed font. Using our evaluation method, we measured 98.4% accuracy for the assignment of letters to candidate fonts in multi-font documents.
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Michael P. Cutter, Joost van Beusekom, Faisal Shafait, and Thomas M. Breuel "Font group identification using reconstructed fonts", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740N (24 January 2011); https://doi.org/10.1117/12.873398
KEYWORDS
Reconstruction algorithms
Optical character recognition
Prototyping
Visualization
Control systems
Databases
Electromagnetic coupling