|
|
Research on Core Word Extraction Algorithm Based on Contextual Concept
Shi Jin1, Han Jin2, Zhao Xiaoke1, Liu Qianli1 |
1. School of Information Management, Nanjing University, Nanjing 210023; 2. College of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044
|
|
Abstract Because there has been little domestic or international research on context core word extraction (researchers have mainly focused on keyword extraction), this paper proposes a context-based dependency grammar analysis algorithm. It first proves that the dependency parsing problem is equivalent to two sub-problems: splitting a sentence into minimum-scale contexts and finding the core word of each minimum-scale context. To solve these sub-problems, the paper proposes two context core word extraction algorithms, one based on entropy comparison and one based on comparison of indegree sums, together with a minimum-context solving algorithm used to construct the dependency grammar tree. For testing, data from 1152 valid papers published in the Journal of the China Society for Scientific and Technical Information between 2007 and 2018 was collected, and the extracted words were compared with the keywords produced by the classic keyword extraction algorithms TF-IDF, TextRank, and LDA. The experimental results show that the context-based dependency grammar analysis algorithm has a positive effect on keyword extraction.
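To make the two selection criteria concrete, the sketch below shows one plausible reading of them on a toy dependency structure: the entropy-based variant scores each candidate by the entropy of its neighbour distribution, and the indegree-based variant scores it by the number of dependency edges pointing at it, with the highest-scoring candidate taken as the core word. The edge list, the scoring rules, and the helper names (context_entropy, indegree_sum) are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch, assuming one reading of the two criteria named in the
# abstract; it is NOT the authors' implementation.
import math
from collections import Counter

# Hypothetical dependency edges (dependent -> head) gathered from several
# assumed minimum-scale contexts sharing the same candidate words.
edges = [
    ("keyword", "extraction"), ("core", "extraction"),
    ("extraction", "algorithm"), ("context", "algorithm"),
    ("grammar", "algorithm"), ("minimum", "algorithm"),
]
candidates = {word for edge in edges for word in edge}

def context_entropy(word):
    """Entropy of the word's neighbour distribution (an assumed scoring rule)."""
    neighbours = Counter(h for d, h in edges if d == word)   # heads of the word
    neighbours.update(d for d, h in edges if h == word)      # dependents of the word
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())

def indegree_sum(word):
    """Total number of dependents attached to the word across the contexts."""
    return sum(1 for d, h in edges if h == word)

# Each criterion nominates the candidate it scores highest as the core word.
print(max(candidates, key=context_entropy))   # entropy comparison
print(max(candidates, key=indegree_sum))      # sum-of-indegrees comparison
```

On this toy input both criteria select "algorithm", but in general the two rankings can differ, which is why the paper evaluates them separately.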
|
Received: 10 December 2018
|
|
|
|
1 冯志伟. 特思尼耶尔的从属关系语法[J]. 国外语言学, 1983(1): 63-65, 57.
2 聂卉, 杜嘉忠. 依存句法模板下的商品特征标签抽取研究[J]. 现代图书情报技术, 2014(12): 44-50.
3 Luhn H P. The automatic creation of literature abstracts[J]. IBM Journal of Research and Development, 1958, 2(2): 159-165.
4 Matsuo Y, Ishizuka M. Keyword extraction from a single document using word co-occurrence statistical information[J]. International Journal on Artificial Intelligence Tools, 2004, 13(1): 157-169.
5 Ercan G, Cicekli I. Using lexical chains for keyword extraction[J]. Information Processing & Management, 2007, 43(6): 1705-1714.
6 Litvak M, Last M. Graph-based keyword extraction for single-document summarization[C]// Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization. Stroudsburg: Association for Computational Linguistics, 2008: 17-24.
7 Li X H, Wu X D, Hu X G, et al. Keyword extraction based on lexical chains and word co-occurrence for Chinese news web pages[C]// Proceedings of IEEE International Conference on Data Mining Workshops. New York: IEEE, 2008: 744-751.
8 Xie F, Wu X D, Hu X G. Keyphrase extraction based on semantic relatedness[C]// Proceedings of IEEE International Conference on Cognitive Informatics. New York: IEEE, 2010: 308-312.
9 陈荣. 语境研究的认知转向[J]. 贵阳学院学报(社会科学版), 2011, 6(2): 88-92.
10 许力生. 语言学研究的语境理论构建[J]. 浙江大学学报(人文社会科学版), 2006, 36(4): 158-165.
11 汪徽, 张辉. van Dijk的多学科语境理论述评[J]. 外国语(上海外国语大学学报), 2014, 37(2): 78-85.
12 Context model[EB/OL]. Wikipedia. [2019-11-10]. https://en.wikipedia.org/w/index.php?title=Context_model&oldid=796885540.
13 Gebhardt J, Kruse R. The context model: An integrating view of vagueness and uncertainty[J]. International Journal of Approximate Reasoning, 1993, 9(3): 283-314.
14 Klein D, Manning C D. A generative constituent-context model for improved grammar induction[C]// Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2002: 128-135.
15 Gu T, Wang X H, Pung H K, et al. An ontology-based context model in intelligent environments[C]// Proceedings of Communication Networks and Distributed Systems Modeling and Simulation Conference, 2004: 270-275.
16 Wang X H, Zhang D Q, Gu T, et al. Ontology based context modeling and reasoning using OWL[C]// Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops. New York: IEEE, 2004: 18-22.
17 Cheng H, Bouman C A. Multiscale Bayesian segmentation using a trainable context model[J]. IEEE Transactions on Image Processing, 2001, 10(4): 511-525.
18 Choi M J, Torralba A, Willsky A S. A tree-based context model for object recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(2): 240-252.
19 Han J, Schmidtke H R, Xie X, et al. Adaptive content recommendation for mobile users: Ordering recommendations using a hierarchical context model with granularity[J]. Pervasive and Mobile Computing, 2014, 13: 85-98.
20 Wang X Y, Ji Q. A hierarchical context model for event recognition in surveillance video[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2014: 2561-2568.
21 Liu X P, Wang X N, Xue L, et al. Enforcing context model based service-oriented architecture policies and policy engine: US8788566[P]. 2014.
22 Robinson J J. Dependency structures and transformational rules[J]. Language, 1968, 46(2): 259-285.
23 Ramos J. Using TF-IDF to determine word relevance in document queries[C]// Proceedings of the First Instructional Conference on Machine Learning, 2003, 242: 133-142.
24 Page L, Brin S, Motwani R, et al. The PageRank citation ranking: Bringing order to the web[R]. Stanford InfoLab, 1999.
25 Mihalcea R, Tarau P. TextRank: Bringing order into text[C]// Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2004.
26 Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
27 Lin C Y, Hovy E. Automatic evaluation of summaries using N-gram co-occurrence statistics[C]// Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. Stroudsburg: Association for Computational Linguistics, 2003: 71-78.
28 Niesler T R, Woodland P C. A variable-length category-based n-gram language model[C]// Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. New York: IEEE, 1996, 1: 164-167.
29 Mikolov T, Karafiát M, Burget L, et al. Recurrent neural network based language model[C]// Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010: 1045-1048.
|
|
|