Historical Tibetan documents hold a significant place within the surviving ancient literature of China because they record Tibetan culture and history. These documents were all handwritten and therefore suffer from problems such as stroke adhesion, blurred handwriting, and background stains. Segmenting syllables from the text is a crucial step in analyzing images of Historical Tibetan documents, and it must be completed before syllable recognition. Syllable segmentation is made difficult by pre-syllable and post-syllable stroke sticking, tsheg sticking, and syllable stroke sticking. In this paper, we enhance the K-Net model to improve its efficacy in segmenting syllables in Historical Tibetan document texts. The main contributions are: (1) to better classify sticking syllables, we change the backbone network to ResNeXt; (2) before kernel learning, we combine the feature mask and the segmentation mask through a convolution to raise the accuracy of mask prediction; (3) we add information proofreading to each kernel update step, in which the mask prediction obtained from kernel learning is convolved with the feature mask to obtain a new, more accurate mask prediction; and (4) we streamline part of the instance segmentation code to make the network model lighter. The experimental results demonstrate that the proposed technique effectively handles syllable-to-syllable stroke sticking and tsheg-to-syllable stroke sticking during syllable segmentation. On Historical Tibetan documents, syllable segmentation attains an mAcc of 95.66% and an mIoU of 76.59%.
KEYWORDS: Image segmentation, Feature extraction, Education and training, Detection and tracking algorithms, Data modeling, Image processing algorithms and systems, Convolution, Deep learning, Stochastic processes, Semantics
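The abstract above reports mAcc and mIoU without defining them. The sketch below shows how these two metrics are commonly computed for segmentation from a per-class confusion matrix. It is an illustration under assumptions, not the authors' evaluation code: the function names `confusion_matrix` and `mean_acc_and_iou`, the two-class labelling (0 = background, 1 = syllable), and the toy pixel labels are all hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_acc_and_iou(matrix: List[List[int]]) -> Tuple[float, float]:
    """Return (mAcc, mIoU), averaged over the classes that actually occur."""
    n = len(matrix)
    accs, ious = [], []
    for c in range(n):
        tp = matrix[c][c]                                  # correctly labelled pixels
        truth_total = sum(matrix[c])                       # pixels whose true class is c
        pred_total = sum(matrix[r][c] for r in range(n))   # pixels predicted as c
        union = truth_total + pred_total - tp
        if truth_total > 0:
            accs.append(tp / truth_total)                  # per-class accuracy
        if union > 0:
            ious.append(tp / union)                        # per-class intersection over union
    return sum(accs) / len(accs), sum(ious) / len(ious)


# Hypothetical flattened pixel labels: 0 = background, 1 = syllable.
truth = [0, 0, 0, 1, 1, 1, 1, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
macc, miou = mean_acc_and_iou(confusion_matrix(truth, pred, num_classes=2))
print(f"mAcc={macc:.2f} mIoU={miou:.2f}")
```

Because IoU also penalizes pixels wrongly assigned to a class, mIoU is never higher than mAcc under these definitions, which is consistent with the gap between the two reported figures.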
To address the false detections and missed detections caused by the random distribution and the varied scales and shapes of Tibetan text in natural scenes, this paper proposes a natural scene Tibetan text detection algorithm based on feature enhancement with a spatial attention mechanism. The spatial attention mechanism is introduced into the pyramid network module used for feature extraction, so that richer local and global information is extracted and the feature extraction ability is enhanced. Feature kernel clustering better distinguishes adjacent text instances, and the predicted similarity vectors are used to aggregate text pixels accurately to their corresponding text kernels, which further improves the accuracy of scene Tibetan detection and effectively reduces false and missed detections. The model is evaluated on the TCSD scene Tibetan dataset, and the results show that the method reaches an F-measure of 81.09%, which is better than the previous scene Tibetan detection algorithm.
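The F-measure reported above is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the function name `f_measure` and the counts in the usage example are hypothetical, and the rule for deciding when a detected text region matches a ground-truth region (typically an overlap threshold) is not stated in the abstract, so it is left outside the sketch.

```python
from typing import Tuple


def f_measure(true_positives: int, num_detections: int,
              num_ground_truth: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) from detection counts."""
    # Precision: share of detections that match a ground-truth text instance.
    precision = true_positives / num_detections if num_detections else 0.0
    # Recall: share of ground-truth text instances that were detected.
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical counts: 100 detections, 90 ground-truth instances, 80 matches.
precision, recall, f = f_measure(80, 100, 90)
print(f"P={precision:.3f} R={recall:.3f} F={f:.3f}")
```

False detections lower precision and missed detections lower recall, so a method that reduces both raises the F-measure.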