KEYWORDS: Detection and tracking algorithms, Databases, Tablets, Computing systems, Current controlled current source, Genetic algorithms, Visualization, Legal, Data storage, Diffusion
Online handwritten data, produced with Tablet PCs or digital pens, consists of a sequence of points (x, y). As
the amount of data available in this form increases, algorithms for retrieval of online data are needed. Word
spotting is a common approach used for the retrieval of handwriting. However, from an information retrieval
(IR) perspective, word spotting is a primitive keyword-based matching and retrieval strategy. We propose a
framework for handwriting retrieval where an arbitrary word spotting method is used, and then a manifold
ranking algorithm is applied on the initial retrieval scores. Experimental results on a database of more than
2,000 handwritten newswires show that our method can improve the performance of a state-of-the-art word
spotting system by more than 10%.
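
The re-ranking step can be illustrated with a minimal sketch. The abstract does not say which manifold ranking variant or which document similarity is used, so the code below assumes the common iterative formulation f ← αSf + (1 − α)y, where y holds the initial word spotting scores and S is a symmetrically normalised affinity matrix. The function name `manifold_rank`, the value of `alpha`, and the toy affinity values are all illustrative assumptions, not the authors' settings.

```python
import math

def manifold_rank(affinity, initial_scores, alpha=0.9, iterations=200):
    """Diffuse initial retrieval scores over a document similarity graph.

    affinity: symmetric matrix (list of lists) of pairwise similarities.
    initial_scores: one word spotting score per document.
    """
    n = len(initial_scores)
    # Zero the diagonal so a document does not reinforce its own score.
    w = [[0.0 if i == j else affinity[i][j] for j in range(n)] for i in range(n)]
    # Symmetric normalisation: S = D^(-1/2) W D^(-1/2).
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in w]
    s = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    f = list(initial_scores)
    for _ in range(iterations):
        # Each document takes score from its neighbours and keeps part of its own.
        f = [alpha * sum(s[i][j] * f[j] for j in range(n))
             + (1 - alpha) * initial_scores[i] for i in range(n)]
    return f

# Toy graph (assumed values): documents 0 and 1 are near-duplicates, as are 2 and 3.
toy_affinity = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
# Word spotting found the query strongly in document 0 and weakly in document 2.
toy_scores = [1.0, 0.0, 0.2, 0.0]
ranked = manifold_rank(toy_affinity, toy_scores)
```

In this toy case document 1, which word spotting missed entirely, ends up ranked above document 2 because it is a close neighbour of the top hit. That is the kind of effect the re-ranking is meant to produce.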
KEYWORDS: Detection and tracking algorithms, Infrared imaging, Databases, Data fusion, Systems modeling, Image quality, Visualization, Computing systems, Electronic imaging, Current controlled current source
In this work, we propose to combine two quite different approaches for retrieving handwritten documents. Our
hypothesis is that different retrieval algorithms should retrieve different sets of documents for the same query.
Therefore, significant improvements in retrieval performance can be expected. The first approach is based on
information retrieval techniques carried out on the noisy texts obtained through handwriting recognition, while
the second approach is recognition-free and uses a word spotting algorithm. Results show that for texts having
a word error rate (WER) lower than 23%, the performance obtained with the combined system is close to
the performance obtained on clean digital texts. In addition, for poorly recognized texts (WER > 52%), an
improvement of nearly 17% can be observed with respect to the best available baseline method.
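
One simple way to combine two retrieval systems is late score fusion. The abstract does not state how the two result lists are merged, so the sketch below is only an assumption: it min-max normalises each system's scores and takes a weighted sum. The names `min_max` and `fuse`, the `weight` parameter, and the document identifiers are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(ir_scores, spotting_scores, weight=0.5):
    """Return document ids ranked by a weighted sum of normalised scores.

    A document missing from one system contributes 0 for that system.
    """
    ir, ws = min_max(ir_scores), min_max(spotting_scores)
    fused = {doc: weight * ir.get(doc, 0.0) + (1 - weight) * ws.get(doc, 0.0)
             for doc in set(ir) | set(ws)}
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

# Toy scores (assumed): text-based IR on recognised text vs. word spotting.
ir_scores = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
spotting_scores = {"d2": 0.9, "d4": 0.8, "d1": 0.1}
fused_ranking = fuse(ir_scores, spotting_scores)
```

Here `d4` is retrieved only by word spotting and `d3` only by the text-based system, yet both appear in the fused list. This reflects the hypothesis that the two approaches retrieve different sets of documents for the same query.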
KEYWORDS: Detection and tracking algorithms, Algorithm development, Optical character recognition, Feature extraction, Signal processing, Error analysis, Systems modeling, Visualization, Electronic imaging, Current controlled current source
As innovative devices that accept or produce on-line documents emerge, facilities for managing these
kinds of documents, such as topic spotting, are required. This means that we should be able to perform text
categorization of on-line documents. The textual data available in on-line documents can be extracted through on-line
recognition, a process which introduces noise, i.e. errors, into the resulting text. This work reports experiments
on categorization of on-line handwritten documents based on their textual contents. We analyze the effect of the
word recognition rate on categorization performance by comparing the performance of a categorization
system over the texts obtained through on-line handwriting recognition and the same texts available as ground
truth. Two categorization algorithms (kNN and SVM) are compared in this work. A subset of the Reuters-21578
corpus consisting of more than 2000 handwritten documents has been collected for this study. Results show that
accuracy loss is not significant, and precision loss is only significant for recall values of 60%-80% depending on
the noise levels.
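
To illustrate why categorization can tolerate recognition errors, here is a minimal kNN sketch over bag-of-words vectors with cosine similarity. It is not the system evaluated in the paper: the feature representation, the value of `k`, the training sentences, and the simulated recognition errors are all invented for illustration. Only the category names `earn` and `grain` are borrowed from Reuters-21578.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_classify(text, training, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = bag_of_words(text)
    ranked = sorted(training,
                    key=lambda item: cosine(query, bag_of_words(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy training set.
training = [
    ("company reports higher quarterly profit", "earn"),
    ("net profit and dividend rise", "earn"),
    ("quarterly earnings beat forecast", "earn"),
    ("wheat harvest and grain exports fall", "grain"),
    ("corn and wheat prices rise", "grain"),
    ("grain shipments delayed at port", "grain"),
]
clean_text = "quarterly profit and dividend higher"
noisy_text = "quartely profit and dividerd higher"  # two simulated recognition errors

clean_label = knn_classify(clean_text, training)
noisy_label = knn_classify(noisy_text, training)
```

Two of the five words are misrecognised in the noisy version, but the surviving words still place the text closest to the same category. This is consistent in spirit with the small accuracy loss reported above.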