Smap: Visualization of Scientific Knowledge Landscape Based on Document Semantics
Zhang Shuang1,2, Liu Feifan1,2, Luo Shuangling3, Xia Haoxiang1,2
1. Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
2. Research Center for Big Data and Intelligent Decision-Making, Dalian University of Technology, Dalian 116024
3. School of Maritime Economics and Management, Dalian Maritime University, Dalian 116026
Abstract Given the explosive growth of academic literature, the continuous cross-fusion of knowledge, and the expanding scale and increasing complexity of scientific research, clearly visualizing the knowledge structure buried in massive amounts of literature and grasping its development trends have drawn widespread attention. Based on document representation learning and manifold learning algorithms, we propose a method for constructing a semantic map (Smap). First, Doc2Vec is adopted to capture the high-dimensional semantic features of documents; then, UMAP (uniform manifold approximation and projection) is used to perform non-linear dimensionality reduction based on the semantic proximity of documents. Finally, kernel density estimation is employed to characterize the knowledge structure according to the heterogeneity of the document distribution. The empirical experiments cover four scientific domains, ranging in size from thousands to millions of documents. We then construct an Smap, identify the hierarchical knowledge structure, and analyze its dynamic evolution. Furthermore, using the classification system provided by Microsoft Academic Graph (MAG), citation relations, and keywords, we quantify the local purity of the document distribution on the Smap and the correlation between map distance and research difference to verify the effectiveness of the proposed method. Comparison with control experiments further demonstrates that this effectiveness is significant. This study extends current visualization methods for the scientific field and provides an alternative visualization method for scientific and technological information services.
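The final step of the pipeline described above, kernel density estimation over the two-dimensional document layout, can be illustrated with a minimal sketch. It assumes the Doc2Vec embedding and UMAP projection have already been computed elsewhere; the coordinates, the isotropic Gaussian kernel, the bandwidth value, and the function name `gaussian_kde_2d` are illustrative assumptions, not the authors' implementation or settings.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Build a density estimator from 2D points using an isotropic Gaussian kernel.

    `points` is a list of (x, y) pairs; `bandwidth` is the kernel standard
    deviation. Both are hypothetical inputs chosen for illustration.
    """
    n = len(points)
    # Normalisation so that the estimated density integrates to 1 over the plane.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            squared_distance = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical map coordinates standing in for projected documents:
# four documents in one tight cluster and two in a smaller, distant cluster.
coords = [
    (0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.05, 0.05),
    (3.0, 3.0), (3.1, 2.9),
]

density = gaussian_kde_2d(coords, bandwidth=0.5)

dense_peak = density(0.0, 0.0)   # centre of the larger cluster
sparse_peak = density(3.0, 3.0)  # centre of the smaller cluster
valley = density(1.5, 1.5)       # empty region between the clusters
```

In this toy layout the larger cluster yields a higher density peak than the smaller one, and the empty region between them forms a valley; heterogeneity of this kind is what a density-based landscape uses to separate areas of the map.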
Received: 06 December 2021