Model-based document categorization employing semantic pattern analysis and local structure clustering
28 January 2008
Kosei Fume, Yasuto Ishitani
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150R (2008) https://doi.org/10.1117/12.765422
Event: Electronic Imaging, 2008, San Jose, California, United States
Abstract
We propose a document categorization method based on a document model that can be defined externally for each task. The method assigns Web content or business documents to a target category according to their similarity to the model. Its main feature is that it extracts semantics from an input document in two ways: the semantics of terms are extracted by semantic pattern analysis, and the implicit meanings of document substructures are identified by a bottom-up text clustering technique that focuses on the similarity of text line attributes. We have built a trial system based on the proposed method. Experimental results show that the system achieves more than 80% classification accuracy when categorizing Web content and business documents into 15 or 70 categories.
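The abstract does not specify the clustering procedure, so the following is only a minimal Python sketch of what bottom-up clustering of text lines by attribute similarity could look like. The attributes (`indent`, `is_bullet`, `length`), the similarity function, the `threshold` value, and all function names are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLine:
    text: str
    indent: int      # leading spaces (hypothetical attribute)
    is_bullet: bool  # starts with a list marker (hypothetical attribute)
    length: int      # character count (hypothetical attribute)


def make_line(raw: str) -> TextLine:
    """Derive the illustrative attributes from a raw text line."""
    stripped = raw.lstrip()
    return TextLine(
        text=stripped,
        indent=len(raw) - len(stripped),
        is_bullet=stripped.startswith(("-", "*", "•")),
        length=len(stripped),
    )


def line_similarity(a: TextLine, b: TextLine) -> float:
    """Average of three attribute agreements, in [0, 1] (assumed measure)."""
    same_indent = 1.0 if a.indent == b.indent else 0.0
    same_bullet = 1.0 if a.is_bullet == b.is_bullet else 0.0
    length_ratio = min(a.length, b.length) / max(a.length, b.length, 1)
    return (same_indent + same_bullet + length_ratio) / 3.0


def cluster_lines(lines: list[TextLine], threshold: float = 0.8) -> list[list[TextLine]]:
    """Bottom-up clustering: start with one cluster per line, then repeatedly
    merge the most similar pair of adjacent clusters until no pair reaches
    the threshold. Adjacent clusters are compared at their boundary lines."""
    clusters = [[line] for line in lines]
    while len(clusters) > 1:
        scores = [
            line_similarity(clusters[i][-1], clusters[i + 1][0])
            for i in range(len(clusters) - 1)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < threshold:
            break
        clusters[best:best + 2] = [clusters[best] + clusters[best + 1]]
    return clusters
```

Under these assumptions, a heading followed by three similar bullet lines and an indented closing line would be grouped into three blocks, which a later step could label as document substructures.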
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kosei Fume and Yasuto Ishitani "Model-based document categorization employing semantic pattern analysis and local structure clustering", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150R (28 January 2008); https://doi.org/10.1117/12.765422
KEYWORDS
Model-based design
Classification systems
Document management
Video
Associative arrays
Feature extraction
Morphological analysis
