Research on Data Recommendation Based on Community Detection of Citation Network
Li Chengzan¹,², Li Jianhui¹, Wang Xuezhi¹, Shen Zhihong¹, Du Yi¹,²
1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100049
Abstract: Scientific data are both the input and the output of scientific research and a core driver of scientific and technological innovation, and their value can be fully realized only through open sharing and wide dissemination. However, the utilization rate and dissemination efficiency of published data are currently low. To accelerate the dissemination and reuse of scientific data and to make open data sharing more effective, this paper proposes a data recommendation method based on community detection in a citation network. An association network linking data sets, papers, and authors is first constructed, and the Louvain algorithm is applied to detect communities under three association modes: co-authorship, co-citation, and bibliographic coupling. The similarity between each data set and each academic paper is then calculated by combining TF-IDF weighting with cosine similarity, and each data set is linked to the communities in which the similar papers are located; these links are used to make data recommendations. Experimental results show that the method can effectively identify papers and authors likely to be interested in a given data set. In addition, in terms of both contribution to and stability of the recommendations, community detection based on the coupling relationship performs best, followed by the co-authorship relationship, whereas the co-citation relationship is strongly affected by publication time and citation counts.
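The abstract names three association modes but does not define their edge weights. The sketch below assumes the common textbook definitions: a bibliographic coupling weight is the number of references two papers share, a co-citation weight is the number of papers that cite both items, and a co-authorship weight is the number of papers two authors wrote together. Co-citation and co-authorship are both pair counts over groups, so one helper serves both. It also computes Newman-Girvan modularity, the quantity the Louvain algorithm greedily maximizes; it does not implement Louvain itself. All function names and toy data are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable

# Hypothetical input format: paper ID -> set of IDs of the papers it cites.
Citations = dict[str, set[str]]


def coupling_weights(cites: Citations) -> dict[tuple[str, str], int]:
    """Bibliographic coupling: weight = number of references two papers share."""
    weights: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(cites), 2):
        shared = len(cites[a] & cites[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def cooccurrence_weights(groups: Iterable[Iterable[str]]) -> dict[tuple[str, str], int]:
    """Count how often two items appear in the same group.

    Reference lists as groups -> co-citation weights.
    Author lists as groups    -> co-authorship weights.
    """
    weights: dict[tuple[str, str], int] = defaultdict(int)
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights: dict[tuple[str, str], int], community: dict[str, int]) -> float:
    """Newman-Girvan modularity Q of a partition of an undirected weighted graph.

    Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ], where L_c is the
    edge weight inside c, d_c the total strength of c's nodes, m the total weight.
    """
    m = sum(weights.values())
    strength: dict[str, float] = defaultdict(float)
    internal: dict[int, float] = defaultdict(float)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
        if community[a] == community[b]:
            internal[community[a]] += w
    total: dict[int, float] = defaultdict(float)
    for node, k in strength.items():
        total[community[node]] += k
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)
```

For the detection step itself, an off-the-shelf implementation such as `networkx.community.louvain_communities(G, weight="weight")` (NetworkX 2.8 or later) could be run on a graph built from any of these weight dictionaries, and `modularity` used to compare the resulting partitions across the three association modes.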
|
Received: 16 May 2019
|
|
|
|
|
|
|