21 July 2023 Research on Chinese word segmentation model based on RoBERTa and conditional random field
Wenfeng Cao
Proceedings Volume 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023); 127173G (2023) https://doi.org/10.1117/12.2684633
Event: 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 2023, Wuhan, China
Abstract
This paper proposes a Chinese word segmentation model based on RoBERTa and a conditional random field (CRF). The model treats Chinese word segmentation as a sequence labeling task. During training, the character sequence of a sentence is taken as input, mapped to integer indices according to the vocabulary, and fed into the RoBERTa layer. The output of the RoBERTa layer is converted into a probability distribution over word-position labels by a linear transformation followed by a softmax activation function. The predicted word-position label sequence is then obtained by Viterbi decoding. In experiments on SIGHAN's PKU and MSR corpora, the RoBERTa + CRF model achieves accuracy, precision, recall, and F1 scores of 96.33%, 95.75%, 95.00%, and 95.37%, respectively (the highest score is reported), which indicates preliminary application value with some room for improvement.
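The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a B/M/E/S word-position tag set (the abstract does not name one), uses made-up per-character label probabilities in place of real RoBERTa outputs, and replaces learned CRF transition scores with hard allowed/forbidden transitions. The names `viterbi`, `tags_to_words`, `ALLOWED`, and `demo_probs` are hypothetical.

```python
import math

# Assumed tag set: Begin, Middle, End of a multi-character word, or Single-character word.
TAGS = ["B", "M", "E", "S"]
NEG_INF = float("-inf")

# Assumed hard constraints standing in for learned CRF transition scores:
# ALLOWED[prev] is the set of tags that may follow prev.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}
START_TAGS = ("B", "S")  # a sentence must open a word
END_TAGS = ("E", "S")    # a sentence must close a word


def viterbi(emissions):
    """Return the best valid tag sequence.

    emissions: one dict per character, mapping each tag to a log-score.
    """
    if not emissions:
        return []
    # score[t]: best log-score of any valid path that ends in tag t so far.
    score = {t: (emissions[0][t] if t in START_TAGS else NEG_INF) for t in TAGS}
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in TAGS:
            # Best predecessor among tags that are allowed to precede cur.
            prev = max((p for p in TAGS if cur in ALLOWED[p]), key=lambda p: score[p])
            pointers[cur] = prev
            new_score[cur] = score[prev] + emit[cur]
        backpointers.append(pointers)
        score = new_score
    # Pick the best valid final tag, then follow the back-pointers.
    path = [max(END_TAGS, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def tags_to_words(chars, tags):
    """Cut the character sequence after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # trailing characters of an unfinished word, if any
        words.append(current)
    return words


# Made-up softmax outputs for the four characters of "我爱北京".
demo_chars = list("我爱北京")
demo_probs = [
    {"B": 0.20, "M": 0.05, "E": 0.05, "S": 0.70},
    {"B": 0.30, "M": 0.05, "E": 0.05, "S": 0.60},
    {"B": 0.50, "M": 0.10, "E": 0.10, "S": 0.30},
    {"B": 0.05, "M": 0.40, "E": 0.35, "S": 0.20},
]
demo_emissions = [{t: math.log(p[t]) for t in TAGS} for p in demo_probs]

demo_tags = viterbi(demo_emissions)
demo_words = tags_to_words(demo_chars, demo_tags)
print(demo_tags, demo_words)
```

In this toy input, choosing the most probable label independently for the last character would give M, which cannot end a sentence. Decoding over the whole sequence avoids such invalid label sequences, which is the usual motivation for pairing a per-character classifier with CRF-style Viterbi decoding.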
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Wenfeng Cao, "Research on Chinese word segmentation model based on RoBERTa and conditional random field", Proc. SPIE 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 127173G (21 July 2023); https://doi.org/10.1117/12.2684633
KEYWORDS
Education and training

Machine learning

Neural networks

Attenuation

Mathematical optimization
