Network Characterization of Domain Knowledge Clusters Based on Heterogeneous Information Network
Yang Xinyi1,4,5, Yang Jianlin2,4,5, Ye Wenhao3,4,5
1. School of Journalism and Communication, Shaanxi Normal University, Xi’an 710119
2. School of Information Management, Nanjing University, Nanjing 210023
3. College of Information Management, Nanjing Agricultural University, Nanjing 210095
4. Key Laboratory of Data Engineering and Knowledge Services in Provincial Universities (Nanjing University), Nanjing 210023
5. National Security Development Research Institute of Nanjing University, Nanjing 210023
Abstract Domain knowledge clusters that involve multiple entity types can reveal the structure of domain knowledge from macro and micro perspectives of both content and topology, which is critical for understanding the domain knowledge system as a whole. This study uses a heterogeneous information network (HIN) to model multiple knowledge entities, such as authors, papers, and journals, together with their relationships, forming an HIN of domain knowledge. For network clustering, a graph neural network framework integrates network structural features and textual content features to learn node vector representations, and these representations are then used to recalculate the link weights. Network community detection algorithms, combined with a community splitting and merging strategy, are then used to identify domain knowledge clusters. Finally, the domain knowledge clusters are analyzed in terms of textual content and network characteristics to better understand their composition. In an empirical study on a dataset covering the fields of database, data mining, and information retrieval (DBDMIR), the framework improves clustering results by identifying domain knowledge clusters with clear semantics and a strong community structure. The textual features of domain knowledge clusters represent research topics, while their topological features reflect their formation mechanism and development status. In particular, star clusters formed by the relationships between papers and the journals in which they are published reveal the research focus of key journals in the field. In contrast, networked clusters formed by dense citation relationships represent relatively mature research directions, whereas networked clusters with sparse citation relationships, connected mainly through heterogeneous author-paper relationships, represent emerging research directions. Inter-cluster connectivity analysis shows that preferential connections between domain knowledge clusters divide the domain into subdomains, while heterogeneous connection preferences show how knowledge is exchanged between clusters. An analysis that combines textual and topological features provides an overview of domain knowledge content and development, suggesting that domain knowledge clusters containing multiple entities have the potential to predict emerging topics.
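The link reweighting step and the link-type reading of cluster topology described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how node representations are turned into link weights, so cosine similarity clipped at zero is an assumption made here, and the toy nodes, link types (`writes`, `cites`, `published_in`), and two-dimensional vectors are all hypothetical stand-ins for embeddings that a graph neural network would produce.

```python
from collections import Counter
from math import sqrt

# Hypothetical typed links of a toy HIN (author a1, papers p1-p3, journal j1).
edges = [
    ("a1", "p1", "writes"),
    ("a1", "p2", "writes"),
    ("p1", "p2", "cites"),
    ("p1", "j1", "published_in"),
    ("p3", "j1", "published_in"),
]

# Stand-in node vectors; assumed here, not learned by any model.
embedding = {
    "a1": [1.0, 0.0],
    "p1": [1.0, 0.0],
    "p2": [0.6, 0.8],
    "p3": [0.0, 1.0],
    "j1": [0.8, 0.6],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def reweight(edges, embedding):
    """Assumed rule: link weight = cosine similarity of its endpoints, clipped at 0."""
    return {
        (u, v, t): max(0.0, cosine(embedding[u], embedding[v]))
        for u, v, t in edges
    }


def edge_type_profile(cluster, edges):
    """Share of each link type among links with both endpoints inside the cluster."""
    inside = [t for u, v, t in edges if u in cluster and v in cluster]
    counts = Counter(inside)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


weights = reweight(edges, embedding)
profile = edge_type_profile({"p1", "p3", "j1"}, edges)
```

In this toy example the cluster `{"p1", "p3", "j1"}` is held together only by `published_in` links, which corresponds to the star-shaped, journal-centered pattern; a cluster whose profile is dominated by `cites` or by `writes` links would correspond to the two networked patterns. The reweighted links could then be passed to any weighted community detection algorithm.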
Received: 22 November 2024