A Study on the Stability of Semantic Representation of Entities in the Technology Domain: Comparison of Multiple Word Embedding Models
Chen Guo¹, Xu Zan¹, Hong Siqi¹, Wu Jiahuan¹, Xiao Lu²
1. School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094
2. School of Journalism, Nanjing University of Finance & Economics, Nanjing 210023
Chen Guo, Xu Zan, Hong Siqi, Wu Jiahuan, Xiao Lu. A Study on the Stability of Semantic Representation of Entities in the Technology Domain: Comparison of Multiple Word Embedding Models[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2024, 43(12): 1440-1452.