Network Characterization of Domain Knowledge Clusters Based on Heterogeneous Information Network
Yang Xinyi (1, 4, 5), Yang Jianlin (2, 4, 5), Ye Wenhao (3, 4, 5)
1. School of Journalism and Communication, Shaanxi Normal University, Xi’an 710119
2. School of Information Management, Nanjing University, Nanjing 210023
3. College of Information Management, Nanjing Agricultural University, Nanjing 210095
4. Key Laboratory of Data Engineering and Knowledge Services in Provincial Universities (Nanjing University), Nanjing 210023
5. National Security Development Research Institute of Nanjing University, Nanjing 210023
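Abstract: Domain knowledge clustering that involves multiple types of actors can present the structure of domain knowledge along several dimensions, macro and micro, content and structure, and is therefore important for understanding domain knowledge as a complete system. This study uses a heterogeneous information network (HIN) to model multiple types of knowledge entities, such as scholars, papers, and journals, together with the relations among them, forming a heterogeneous information network of knowledge. For network clustering, a graph neural network (GNN) framework is introduced that fuses network structural features with textual content features to learn vector representations of nodes; these node representations are then used to update edge weights, and a network community detection algorithm, combined with community merging and splitting strategies, identifies the domain knowledge clusters. Finally, the clusters are analyzed from two perspectives, textual content and network features, to understand how domain knowledge is composed. In an empirical study on a dataset from the database/data mining/content retrieval (DBDMIR) domain, the proposed clustering pipeline improved clustering quality and identified domain knowledge clusters with clear semantics and pronounced community structure. The textual features of the clusters describe the research topics within the domain, while their topological features reflect how the clusters formed and developed. For example, star-shaped clusters formed by paper-published-in-journal relations reveal the research focus of the domain's major journals; mesh-like clusters with dense citation relations represent relatively mature research directions; and mesh-like clusters with sparse citation relations, held together mainly by heterogeneous author-paper relations, represent emerging research directions. Inter-cluster association analysis shows that preferential connections between clusters divide the domain into several subfields, and the preferences among heterogeneous connection types reveal how knowledge is exchanged between clusters. Taken together, the textual and network features give a complete picture of how domain knowledge develops and demonstrate the potential of multi-actor domain knowledge clusters for predicting emerging topics.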
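The clustering steps summarized above (learn node vectors, use them to reweight edges, detect communities, then merge small ones) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the GNN itself is not reproduced, so the node vectors below are toy stand-ins; the reweighting rule (non-negative cosine similarity of endpoint vectors) is assumed; the abstract does not name its community detection algorithm, so deterministic weighted label propagation is swapped in; and `merge_small` with its `min_size` parameter is a hypothetical merge rule (no split step is shown).

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reweight(edges, emb):
    """Assumed rule: weight each edge by the non-negative cosine similarity of its endpoints."""
    return {(u, v): max(cosine(emb[u], emb[v]), 0.0) for u, v in edges}


def label_propagation(weights, max_iter=20):
    """Deterministic weighted label propagation, used here as a stand-in detector."""
    adj = defaultdict(lambda: defaultdict(float))
    for (u, v), w in weights.items():
        adj[u][v] += w
        adj[v][u] += w
    labels = {n: n for n in adj}
    for _ in range(max_iter):
        changed = False
        for n in sorted(adj):  # fixed visiting order keeps the result reproducible
            score = defaultdict(float)
            for m, w in adj[n].items():
                score[labels[m]] += w
            best = max(score.values())
            if score.get(labels[n]) == best:
                continue  # keep the current label on ties to avoid oscillation
            labels[n] = min(lab for lab, s in score.items() if s == best)
            changed = True
        if not changed:
            break
    return labels


def merge_small(labels, weights, min_size=2):
    """Hypothetical merge rule: fold each community smaller than min_size into
    the neighboring community it shares the most edge weight with."""
    labels = dict(labels)
    sizes = Counter(labels.values())
    for lab in sorted(c for c, size in sizes.items() if size < min_size):
        members = {n for n, c in labels.items() if c == lab}
        link = defaultdict(float)
        for (u, v), w in weights.items():
            if (u in members) != (v in members):  # edge leaves the community
                outside = v if u in members else u
                link[labels[outside]] += w
        if link:
            target = max(sorted(link), key=link.get)
            for n in members:
                labels[n] = target
    return labels


def communities(labels):
    """Group nodes by label into a sorted list of sorted member lists."""
    groups = defaultdict(list)
    for n, lab in labels.items():
        groups[lab].append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy stand-ins for GNN output vectors (authors a*, papers p*).
emb = {
    "a1": [1.0, 0.0], "p1": [0.9, 0.1], "p2": [1.0, 0.2],
    "a2": [0.0, 1.0], "p3": [0.1, 0.9], "p4": [0.2, 1.0],
}
edges = [("a1", "p1"), ("a1", "p2"), ("p1", "p2"),
         ("a2", "p3"), ("a2", "p4"), ("p3", "p4"),
         ("p2", "p3")]  # a cross-topic citation between dissimilar papers
w = reweight(edges, emb)
print(communities(merge_small(label_propagation(w), w)))
# [['a1', 'p1', 'p2'], ['a2', 'p3', 'p4']]
```

The point of the reweighting step shows up on the bridging edge `("p2", "p3")`: its weight drops to about 0.30 while within-topic edges stay near 1.0, so the detector separates the two groups even though they are structurally connected.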
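The topological reading of clusters described above (journal-centered star clusters, citation-dense meshes, and citation-sparse meshes held together by author-paper edges) can also be sketched. The relation names `writes`, `cites` and `published_in`, the use of Freeman degree centralization as a star indicator, the undirected citation-density measure, and the cut-off values in `classify` are all assumptions for illustration; the abstract does not state which indicators or thresholds the study used.

```python
from collections import Counter


def degree_centralization(nodes, edges):
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(nodes)
    if n < 3:
        return 0.0
    deg = Counter()
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    top = max(deg[x] for x in nodes)
    return sum(top - deg[x] for x in nodes) / ((n - 1) * (n - 2))


def profile(nodes, edges, kind):
    """Summarize one cluster. edges are (u, v, relation) triples inside the cluster;
    kind maps each node to 'author', 'paper' or 'journal' (hypothetical labels)."""
    rel = Counter(r for _, _, r in edges)
    papers = sum(1 for x in nodes if kind[x] == "paper")
    pairs = papers * (papers - 1) / 2  # possible undirected paper-paper links
    m = len(edges)
    return {
        "centralization": degree_centralization(nodes, edges),
        "citation_density": rel["cites"] / pairs if pairs else 0.0,
        "author_share": rel["writes"] / m if m else 0.0,
        "journal_share": rel["published_in"] / m if m else 0.0,
    }


def classify(p, star_cut=0.8, dense_cut=0.5):
    """Hypothetical cut-offs, chosen only for this illustration."""
    if p["centralization"] >= star_cut and p["journal_share"] >= 0.5:
        return "star, journal-centered"
    if p["citation_density"] >= dense_cut:
        return "mesh, citation-dense"
    if p["author_share"] >= 0.5:
        return "mesh, citation-sparse, author-linked"
    return "mesh, citation-sparse"


kind = {"J": "journal", "a1": "author", "a2": "author"}
kind.update({x: "paper" for x in ["p1", "p2", "p3", "p4", "q1", "q2", "q3", "q4", "r1", "r2", "r3"]})
qs = ["q1", "q2", "q3", "q4"]
clusters = {
    "star": (["J", "p1", "p2", "p3", "p4"],
             [(p, "J", "published_in") for p in ["p1", "p2", "p3", "p4"]]),
    "dense": (qs, [(u, v, "cites") for i, u in enumerate(qs) for v in qs[i + 1:]]),
    "sparse": (["a1", "a2", "r1", "r2", "r3"],
               [("a1", "r1", "writes"), ("a1", "r2", "writes"), ("a2", "r2", "writes"),
                ("a2", "r3", "writes"), ("r1", "r3", "cites")]),
}
for name, (nodes, edges) in clusters.items():
    print(name, classify(profile(nodes, edges, kind)))
```

Centralization separates the star from the meshes, and citation density then separates the two kinds of mesh. The resulting labels are descriptive only; whether a citation-sparse, author-linked cluster is actually an emerging direction still has to be judged from its content and its development over time.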
Yang Xinyi, Yang Jianlin, Ye Wenhao. Network Characterization of Domain Knowledge Clusters Based on Heterogeneous Information Network[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2025, 44(9): 1128-1143.