Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed,
highly readable, and faithful to the original. In theory, these goals can be achieved by applying OCR and font
recognition and then re-typesetting the document text in the original fonts. However, OCR and font recognition remain
hard problems, and many historical documents use fonts that are not available in digital form. It is therefore desirable
to reconstruct such fonts as sets of vector glyphs that approximate the shapes of their letters.
In this work, we address the grouping of tokens in a token-compressed document into candidate fonts.
This permits us to incorporate font information into token-compressed images even when the original fonts are
unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing
and evaluating an algorithm that assigns a font to every character within a document. This assignment is a necessary step
toward representing a scanned document image with a reconstructed font. Using our evaluation method, we
measured an accuracy of 98.4% for the assignment of letters to candidate fonts in multi-font documents.
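
As an illustration of how an accuracy figure of this kind could be computed, the following minimal Python sketch scores a letter-to-candidate-font assignment against ground-truth font labels. It is not the evaluation method used in this work, which the abstract does not describe. The function name `assignment_accuracy`, the parameters `true_fonts` and `candidate_fonts`, the example font labels, and the majority-vote mapping from candidate fonts to true fonts are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def assignment_accuracy(true_fonts, candidate_fonts):
    """Return the fraction of letters assigned to a matching candidate font.

    Assumption for illustration: each candidate font is mapped to the true
    font that occurs most often among its letters (majority vote), and a
    letter counts as correct when its true font equals that majority font.
    """
    if len(true_fonts) != len(candidate_fonts):
        raise ValueError("label sequences must have the same length")
    if not true_fonts:
        raise ValueError("no letters to evaluate")

    # Count the true-font labels that fall into each candidate font.
    members = defaultdict(Counter)
    for truth, candidate in zip(true_fonts, candidate_fonts):
        members[candidate][truth] += 1

    # Letters that agree with the majority label of their candidate font.
    correct = sum(counts.most_common(1)[0][1] for counts in members.values())
    return correct / len(true_fonts)


# Hypothetical example: five letters, two true fonts, two candidate fonts.
true_fonts = ["Times", "Times", "Times", "Helvetica", "Helvetica"]
candidate_fonts = [0, 0, 1, 1, 1]
accuracy = assignment_accuracy(true_fonts, candidate_fonts)
```

In this example one Times letter is grouped with the Helvetica letters, so four of the five letters are counted as correct. A majority-vote mapping rewards over-segmentation (one candidate font per letter would score perfectly), so a real evaluation would likely also constrain or report the number of candidate fonts.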