22 February 2023
Automatic keyword extraction based on dependency parsing and BERT semantic weighting
HuiXin Liu
Proceedings Volume 12587, Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022); 125871C (2023) https://doi.org/10.1117/12.2667242
Event: Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), 2022, Shanghai, China
Abstract
The classic TextRank algorithm has difficulty differentiating the degree of association between candidate keyword nodes. It also tends to ignore long-distance syntactic relations and topic-level semantic information between words when extracting keywords from a document. To address these problems, we propose an improved TextRank algorithm that uses lexical, grammatical, and semantic features to extract keywords from Chinese academic text. First, we construct a word graph of candidate keywords after text preprocessing. Second, we integrate multidimensional features of the candidate words into the calculation of the transition probability matrix. To this end, our approach mines the full text to extract a collection of grammatical and morphological features (such as part of speech, word position, long-distance dependencies, and BERT dynamic semantic information). Introducing the dependency syntax of long sentences clearly improves the algorithm's ability to identify low-frequency topic keywords. In addition, external semantic information is imported through the word embedding model. The merged feature-based matrix is then used to calculate the influence of all candidate keyword nodes with the iterative formula of PageRank. We obtain the final keywords by ranking the candidate nodes according to their comprehensive influence scores and selecting the top N. We use public data sets to verify the effectiveness of the proposed algorithm. With 4 extracted keywords, our approach achieves a 5.5% improvement in F-score over the classic TextRank algorithm. The experimental results demonstrate that our approach better differentiates the degree of association between nodes by mining combined features of long text. The results also show that the proposed algorithm is more promising, and its extraction performance more robust, than previously studied ensemble methods.
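The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: the function name `rank_keywords`, the use of a co-occurrence matrix as the word graph, and the idea that the merged features (part of speech, position, dependency, BERT similarity) have already been collapsed into one hypothetical weight per candidate word. Only the overall shape (a feature-weighted transition matrix, PageRank iteration, top-N selection) follows the text.

```python
import numpy as np


def rank_keywords(words, cooccurrence, node_weights, top_n=4,
                  damping=0.85, tol=1e-8, max_iter=200):
    """Rank candidate words with a feature-weighted PageRank iteration.

    node_weights is a hypothetical merged feature score per word; how the
    paper actually combines its features is not stated in the abstract.
    """
    adjacency = np.asarray(cooccurrence, dtype=float)
    weights = np.asarray(node_weights, dtype=float)
    n = len(words)

    # Scale edge i -> j by the feature weight of the target word j,
    # so strongly weighted candidates attract more transition probability.
    weighted = adjacency * weights[np.newaxis, :]

    # Normalise each row into a probability distribution; rows without
    # outgoing weight fall back to a uniform distribution.
    row_sums = weighted.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, weighted / safe_sums, 1.0 / n)

    # Standard PageRank power iteration with damping.
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1.0 - damping) / n + damping * (transition.T @ scores)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break

    # Keep the top-N candidates by influence score.
    order = np.argsort(-scores)[:top_n]
    return [(words[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    candidates = ["graph", "keyword", "model", "corpus"]
    cooc = [[0, 1, 1, 1],
            [1, 0, 1, 1],
            [1, 1, 0, 1],
            [1, 1, 1, 0]]
    # Hypothetical merged feature weights, one per candidate word.
    for word, score in rank_keywords(candidates, cooc, [1.0, 3.0, 1.0, 1.0]):
        print(f"{word}\t{score:.4f}")
```

Weighting edges by the target node is only one plausible way to fold features into the transition matrix; an edge-level weight (for example, a dependency-based association between two specific words) would replace `weights[np.newaxis, :]` with a full matrix of the same shape as the adjacency matrix.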
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
HuiXin Liu "Automatic keyword extraction based on dependency parsing and BERT semantic weighting", Proc. SPIE 12587, Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), 125871C (22 February 2023); https://doi.org/10.1117/12.2667242
KEYWORDS
Semantics

Dielectrophoresis

Feature extraction

Statistical modeling

Lab on a chip

Data modeling

Mining
