Research on Chinese word segmentation model based on RoBERTa and conditional radom field

Wenfeng Cao

doi:10.1117/12.2684633

Published: 21 July 2023

Proceedings Volume 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023); 127173G (2023) https://doi.org/10.1117/12.2684633
Event: 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 2023, Wuhan, China

Abstract

This paper proposes a Chinese word segmentation model based on RoBERTa and a Conditional Random Field (CRF). The model treats Chinese word segmentation as a sequence labeling task. During training, the character sequence of a sentence is taken as input, mapped to numeric IDs according to the vocabulary, and fed into the RoBERTa layer. The output of the RoBERTa layer is converted into a probability distribution over word-position labels by a linear transformation followed by a softmax activation function. The predicted word-position label sequence is then obtained by Viterbi decoding. In experiments on SIGHAN's PKU and MSR corpora, the accuracy, precision, recall, and F1 score of the RoBERTa + CRF model reach 96.33%, 95.75%, 95.00%, and 95.37%, respectively (taking the highest score for each metric). The model therefore has preliminary application value, with some room for improvement.
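The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a B/M/E/S word-position tag set (the abstract does not name the label scheme), uses hand-written log-probabilities in place of the RoBERTa + linear + softmax output, and replaces learned CRF transition scores with hard constraints that only forbid invalid tag pairs. The names `viterbi`, `tags_to_words`, and `make_transitions` are hypothetical.

```python
import math

# Assumed word-position tag set: Begin, Middle, End, Single.
TAGS = ["B", "M", "E", "S"]
NEG = -1e9  # stands in for "forbidden"

# Assumption: hard constraints instead of learned CRF transition scores.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}


def make_transitions():
    """Score 0 for a valid tag pair, NEG for an invalid one."""
    return [[0.0 if (a, b) in ALLOWED else NEG for b in TAGS] for a in TAGS]


def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions[t][j] is the log-score of tag j for character t.
    """
    k = len(TAGS)
    # A sentence can only start with B or S.
    score = [emissions[0][j] + (0.0 if TAGS[j] in ("B", "S") else NEG)
             for j in range(k)]
    back = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # A sentence can only end with E or S.
    last = max(range(k),
               key=lambda j: score[j] + (0.0 if TAGS[j] in ("E", "S") else NEG))
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [TAGS[j] for j in path]


def tags_to_words(chars, tags):
    """Cut the character sequence after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical per-character probabilities over (B, M, E, S).
probs = [[0.1, 0.1, 0.1, 0.7],
         [0.2, 0.1, 0.1, 0.6],
         [0.7, 0.1, 0.1, 0.1],
         [0.1, 0.2, 0.6, 0.1]]
emissions = [[math.log(p) for p in row] for row in probs]
tags = viterbi(emissions, make_transitions())
words = tags_to_words("我爱北京", tags)
```

With learned transition scores, as in a trained CRF layer, the same dynamic program applies; only the contents of the transition matrix change.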

Citation

Wenfeng Cao "Research on Chinese word segmentation model based on RoBERTa and conditional radom field", Proc. SPIE 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 127173G (21 July 2023); https://doi.org/10.1117/12.2684633

Proceedings paper, 9 pages

KEYWORDS

Education and training

Machine learning

Neural networks

Attenuation

Mathematical optimization
