Construction and Analysis of Semantic-Enhanced Full-Text Co-Occurrence Network
Zhao Yiming1,2,3, Yin Jiaying2,4
1. Center for Studies of Information Resources, Wuhan University, Wuhan 430072
2. School of Information Management, Wuhan University, Wuhan 430072
3. Big Data Institute, Wuhan University, Wuhan 430072
4. National Demonstration Center for Experimental Library and Information Science Education, Wuhan University, Wuhan 430072
Abstract Co-occurrence networks are an important tool for investigating linguistic phenomena, and the semantic features of co-occurring words carry important tacit knowledge. Examining the semantic relationships and characteristics of such words can strengthen co-occurrence network research from a semantic perspective and enrich existing analysis methods with semantic knowledge. This study proposes a method for constructing and analyzing a semantic-enhanced co-occurrence network, which enriches the properties of network nodes and edges along three dimensions: co-occurrence features, semantic features, and network features. A semantic-enhanced co-occurrence network is then constructed experimentally from more than 140,000 news texts. Analysis of the semantic features of the co-occurring words reveals the value of these features for computational linguistics and industry applications. From the semantic perspective, this study extends the methods for constructing and analyzing co-occurrence word networks, describes the semantic characteristics of co-occurring words, and experimentally verifies the asymmetric and transitive properties of semantic relations, which provides empirical support for the classification and derivation of semantic relations. The semantic-enhanced co-occurrence network can supply semantic knowledge for semantic disambiguation, word meaning understanding, and legal applications.
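The three feature dimensions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the `RELATIONS` lookup table (a stand-in for a knowledge base query), the relation labels, and the function name `build_network` are all hypothetical. The sketch counts sentence-level co-occurrence, attaches a directed semantic relation to each edge where one is known, and records node degree as a simple network feature.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, already tokenized; the study itself uses news text.
SENTENCES = [
    ["court", "judge", "ruling"],
    ["judge", "ruling", "appeal"],
    ["court", "appeal"],
]

# Hypothetical directed semantic relations, standing in for a knowledge base lookup.
RELATIONS = {
    ("judge", "court"): "AtLocation",
    ("ruling", "judge"): "CreatedBy",
}


def build_network(sentences, relations):
    """Return (edges, degree) for a sentence-level co-occurrence network."""
    # Co-occurrence feature: number of sentences in which both words appear.
    cooc = Counter()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[(a, b)] += 1

    # Network feature: node degree in the undirected co-occurrence graph.
    degree = Counter()
    for a, b in cooc:
        degree[a] += 1
        degree[b] += 1

    # Semantic feature: relation label in each direction, or None if unknown.
    edges = {}
    for (a, b), count in cooc.items():
        edges[(a, b)] = {
            "cooccurrence": count,
            "semantic": {
                "forward": relations.get((a, b)),
                "backward": relations.get((b, a)),
            },
        }
    return edges, dict(degree)


edges, degree = build_network(SENTENCES, RELATIONS)
```

Storing the relation separately for each direction keeps asymmetry visible: in this toy data the pair `("court", "judge")` has a relation only in the backward direction, so the co-occurrence edge is undirected while its semantic annotation is not.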
Received: 02 November 2022