1 April 1998 Recognition of printed Arabic text using machine learning
Adnan Amin
Proceedings Volume 3305, Document Recognition V; (1998) https://doi.org/10.1117/12.304645
Event: Photonics West '98 Electronic Imaging, 1998, San Jose, CA, United States
Abstract
Many papers have addressed the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, across several different languages, write with Arabic characters, little research progress has been made toward the automatic recognition of Arabic characters, either on-line or off-line. This is a result of the lack of adequate support in terms of funding and of other resources such as Arabic text databases and dictionaries, and of course of the cursive nature of Arabic writing rules. The main theme of this paper is the automatic recognition of printed Arabic text using the C4.5 machine learning algorithm. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors, each of which includes a label identifying the class to which the example belongs. The output of the algorithm is a set of rules that classifies unseen examples by generalizing from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation and presented to a machine learning algorithm, which creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and produces a symbolic representation that is easy to understand. The technique can be divided into three major steps. First, in preprocessing, the original image, acquired with a 300 dpi scanner, is transformed into a binary image and its connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within the subword, and the number and position of the complementary character. Finally, C4.5 is used for character classification, generating a decision tree.
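As an illustration of two of the steps named in the abstract, the following minimal Python sketch labels connected components in a binary image and computes the gain ratio, the split criterion that C4.5 uses to choose attributes when building a decision tree. It is not the paper's implementation. The function names, the choice of 8-connectivity, the toy image, and the feature names `peaks` and `dots` with class labels `A` and `B` are all assumptions made for the example, and the sketch covers only discrete attributes, omitting C4.5's handling of continuous values and pruning.

```python
from collections import Counter, deque
from math import log2


def label_components(image):
    """Label 8-connected foreground pixels (value 1) with a breadth-first search.

    Returns (labels, count): labels has the same shape as image, with 0 for
    background and 1..count for each component.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c]:
                continue  # background, or already assigned to a component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(samples, feature):
    """Gain ratio of a discrete feature over (feature_dict, class_label) pairs."""
    class_labels = [label for _, label in samples]
    groups = {}
    for features, label in samples:
        groups.setdefault(features[feature], []).append(label)
    # Expected entropy of the class labels after splitting on the feature.
    remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    gain = entropy(class_labels) - remainder
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy([features[feature] for features, _ in samples])
    return gain / split_info if split_info > 0 else 0.0


# Toy binary image (assumed data) containing two separate components.
image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
labels, count = label_components(image)

# Toy labelled feature vectors (assumed data, not taken from the paper).
samples = [
    ({"peaks": 1, "dots": 1}, "A"),
    ({"peaks": 1, "dots": 2}, "B"),
    ({"peaks": 2, "dots": 1}, "A"),
    ({"peaks": 2, "dots": 2}, "B"),
]
best_feature = max(("peaks", "dots"), key=lambda f: gain_ratio(samples, f))
```

In this toy data the `dots` feature separates the two classes perfectly while `peaks` carries no class information, so a tree builder using this criterion would split on `dots` first.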
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Adnan Amin "Recognition of printed Arabic text using machine learning", Proc. SPIE 3305, Document Recognition V, (1 April 1998); https://doi.org/10.1117/12.304645
CITATIONS
Cited by 7 scholarly publications.
KEYWORDS
Machine learning
Detection and tracking algorithms
Optical character recognition
Binary data
Feature extraction
Tablets
Associative arrays