|
|
2020 Vol. 39, No. 10
Published: 2020-10-28 |
|
|
|
|
|
|
|
1011 |
Frontier Detection in Interdisciplinary Research from the Perspective of Altmetrics: Taking Medical Informatics as an Example |
|
|
Wang Feifei, Liu Ming |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.001 |
|
|
This paper combines altmetrics data gathered from online sources with traditional frontier detection methods to compensate for the shortcomings of the latter. To this end, the study used the Web of Science database and the Altmetric.com platform to obtain traditional metrics and altmetrics data. Five evaluation indicators were constructed for frontier detection: immediacy, growth, scientific influence, social attention, and intersectionality. Specifically, 55 research topics were extracted from the experimental data using the LDA algorithm, and the frontier detection indicator scores for each topic were calculated. Next, principal component analysis, the entropy weight method, and the gray correlation degree method were used to comprehensively evaluate each topic on the five indicators. Kendall’s coefficient of concordance across the three evaluation methods verified that their results are consistent and acceptable. Finally, four topics with better frontier properties were selected as cutting-edge topics according to the skyline method. The experimental results show that introducing altmetrics data can supplement traditional frontier detection methods, and that the final extracted results are satisfactory and in line with actual development needs. |
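The skyline step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which dimensions were fed to the skyline method, so this example assumes the five indicator scores are used directly; the topic names and score values are hypothetical, and the function names `dominates` and `skyline` are illustrative rather than the authors' code.

```python
def dominates(a, b):
    """True if a is at least as good as b on every indicator and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(scores):
    """Return the names of topics that no other topic dominates."""
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other_vec, vec)
            for other_name, other_vec in scores.items()
            if other_name != name
        )
    ]


# Hypothetical scores (assumption, not data from the paper), in the order:
# immediacy, growth, scientific influence, social attention, intersectionality
topic_scores = {
    "topic_a": (0.9, 0.4, 0.7, 0.2, 0.5),
    "topic_b": (0.5, 0.8, 0.6, 0.9, 0.4),
    "topic_c": (0.4, 0.3, 0.6, 0.1, 0.4),  # dominated by topic_a
    "topic_d": (0.3, 0.7, 0.5, 0.8, 0.3),  # dominated by topic_b
}

frontier = skyline(topic_scores)
print(frontier)
```

A skyline keeps every topic that is not beaten on all indicators at once, so it needs no indicator weights; that is one plausible reason to apply it after the three weighted evaluations.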
|
|
2020 Vol. 39 (10): 1011-1020 |
|
|
|
1021 |
Mapping the Subject Structure of Library and Information Science through Overlapping Community Detection in Citation Network |
|
|
Wang Wei, Yang Jianlin |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.002 |
|
|
Mapping the subject structure can promote theoretical and methodological research, further accelerating a discipline’s innovation and development. This paper proposes visualizing subject structure with an overlapping community detection algorithm, measuring the knowledge flow between topics through directed h-degree and revealing the roles of overlapping nodes through analysis of topological structure and content. We apply this method to the field of library and information science. The results show that the method can uncover fine-grained research topics. These topics, which include theoretical research in information science and library science, knowledge management, citation analysis, smart library, and library subject service, are closely related to other topics. Overlapping nodes are divided into core nodes, which have a higher betweenness centrality and degree, and edge nodes, which have a lower betweenness centrality and degree. The boundary between information science and library science is fuzzy, and the research subject is both homogeneous and differentiated. A deep discipline integration, which must be based on facilitating the theoretical and practical research of the two disciplines, will accelerate the development of library and information science. |
|
|
2020 Vol. 39 (10): 1021-1033 |
|
|
|
1034 |
Comparisons of Data Set Construction Methods in Domain Analysis of Science and Technology: Consistency and Reliability |
|
|
Chen Guo, Shao Yu, Wang Yuefen |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.003 |
|
|
In scientific and technical intelligence analysis, documents can be collected by several methods; however, the differences between these methods and their reliability are not well understood. Exploring document collection methods that guarantee reliable analysis results is one way to advance scientific intelligence analysis. This study therefore addresses the question quantitatively through an empirical analysis. First, the field of artificial intelligence is taken as an example, along with various task scenarios of scientific and technical intelligence analysis (including different analysis elements, element numbers, and sorting methods). Three progressive experiments are then designed, addressing in turn: (1) the differences among the mainstream methods used to collect documents; (2) the reliability of these methods; and (3) the reliability of the final analysis results when the different methods are combined. The experimental results demonstrate that, in intelligence analysis tasks on coarse-grained elements such as countries, the collection methods differ only slightly and are highly reliable. However, when the intelligence analysis tasks can be implemented by different researchers, the results of the existing collection methods are evidently different and their reliability is low. For different intelligence analysis tasks, a relatively suitable document collection method can be selected. Combining different methods to collect documents does not evidently improve reliability. In other words, optimizing the collection of documents is crucial for the analysis of scientific and technical intelligence. Finally, based on quantitative indicators, this paper presents a method to improve the reliability of scientific and technical intelligence analysis in terms of document collection, along with directions for future work. |
|
|
2020 Vol. 39 (10): 1034-1045 |
|
|
|
1046 |
Automatic Discipline Classification for Scientific Papers Based on a Deep Pre-training Language Model |
|
|
Luo Pengcheng, Wang Yibo, Wang Jimin |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.004 |
|
|
To support discipline-related intelligence and literature services, this paper explores the use of a deep pre-training language model to automatically classify scientific papers for the Ministry of Education. Based on BERT and ERNIE, we constructed a literature classification model. The model was verified using a dataset of about 100,000 journal papers from 21 first-level disciplines in the humanities and social sciences. We compared our model with traditional machine learning methods (such as Naïve Bayes and Support Vector Machines) and typical deep learning methods (Convolutional Neural Networks and Recurrent Neural Networks). The results showed that the method based on the deep pre-training language model works best, with the top-1 and top-2 accuracy of ERNIE reaching 75.56% and 89.35%, respectively. The classifier that simultaneously used the title, keywords, and abstract of the papers as input achieved the best result. Relatively independent disciplines achieved good classification accuracy. For example, the F1 score of Sports Science was 0.98. Other disciplines showed poor accuracy owing to their relatively high intersection with other disciplines. For example, the F1 scores of Theoretical Economics and Applied Economics were around 0.6. In addition, this paper discusses disciplinary intersection, model application, and optimization. |
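As a reading aid for the top-1 and top-2 accuracy figures, the following minimal sketch shows how top-k accuracy is commonly computed from ranked predictions. It is not the authors' evaluation code; the function name `top_k_accuracy`, the four sample papers, and every discipline label other than Sports Science, Theoretical Economics, and Applied Economics are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label appears among the k highest-ranked predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical classifier output (assumption): labels sorted by predicted probability
ranked = [
    ["Sports Science", "Education", "Psychology"],
    ["Applied Economics", "Theoretical Economics", "Management"],
    ["Law", "Political Science", "Sociology"],
    ["Theoretical Economics", "Applied Economics", "Statistics"],
]
truth = ["Sports Science", "Theoretical Economics", "Sociology", "Theoretical Economics"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
print(top1, top2)
```

The second sample shows why top-2 accuracy can be much higher than top-1 for overlapping disciplines: the correct label is ranked second, just behind a closely related one.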
|
|
2020 Vol. 39 (10): 1046-1059 |
|
|
|
1060 |
Research on the Information Producer Distribution of a Network Q&A Community: An Empirical Analysis Based on the Music Portion of MetaFilter Data |
|
|
Yang Jinqing, Ye Guanghui |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.005 |
|
|
To explore information producer distribution under the UGC (user-generated content) mode, this study took the network Q&A community as its research object. First, the comment, post, user, and timestamp data items of the music portion of the MetaFilter Q&A community were obtained as experimental data. Next, we verified the applicability of Lotka’s law to the distribution of comment frequency and number of users in the network Q&A community from the aspects of information productivity and social correlation strength, and explored the new characteristics of the user distribution law of the user’s social network. Finally, based on the power-law correlation of the user distribution phenomenon from the perspective of information productivity and social relevance strength, we explored the relationship between the social relevance strength of the user and information productivity. The experimental results show that Lotka’s law is still applicable to the UGC mode and that the social network of users in the network Q&A community is a scale-free network. Users with strong social relevance may not produce large amounts of information, but there is a positive correlation between social relevance strength and information production, which can provide a reference for the formulation of management strategies for the network Q&A community. |
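Lotka’s law states that the number of producers with n contributions falls off roughly as C / n^alpha. One simple way to check it, sketched below, is an ordinary least-squares fit on log-transformed counts. This is an illustrative estimator and may differ from the fitting procedure the authors used; the function name `fit_lotka` and the comment counts are hypothetical.

```python
import math


def fit_lotka(user_counts):
    """Fit y = C / n**alpha by least squares on log-log data.

    user_counts maps n (comments per user) to y (number of users with n comments).
    Returns (alpha, C).
    """
    xs = [math.log(n) for n in user_counts]
    ys = [math.log(y) for y in user_counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Hypothetical counts (assumption) generated from y = 3600 / n**2
counts = {1: 3600, 2: 900, 3: 400, 4: 225, 5: 144}

alpha, c = fit_lotka(counts)
print(round(alpha, 3), round(c, 1))
```

An exponent near 2 corresponds to the classic form of Lotka’s law; on real community data the fit would normally be followed by a goodness-of-fit test before claiming the law holds.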
|
|
2020 Vol. 39 (10): 1060-1068 |
|
|
|
1069 |
Meta-path-Based Research on Institution Name Normalization |
|
|
Yang Zhao |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.006 |
|
|
To address data governance and name standardization in the big data environment, and given the diversity and complexity of institution name data, this paper uses the co-occurrence perspective and heterogeneous network mining to explore data-driven institution name normalization, which can improve the quality of document network construction, mining, and application. From the perspective of the co-occurrence institution identification method, a triple heterogeneous co-occurrence network model is constructed, consisting of a superior institution, an institution, and a subordinate institution. The institution name normalization problem is transformed into a heterogeneous co-occurrence network mining problem, and a meta-path-based framework model of institution name normalization is constructed. Topological features and recognition tools based on meta-paths are systematically designed to identify hidden semantic relationships by mining the text attributes, geographic attributes, and relationship attributes of the heterogeneous co-occurrence networks. Taking the normalization of institution names in WoS (Web of Science) bibliographic data for Shanghai Jiaotong University from 2008 to 2018 as an example, the experimental results verify the effectiveness of the method. |
|
|
2020 Vol. 39 (10): 1069-1080 |
|
|
|
1081 |
Distribution Features of News Altmetrics |
|
|
Yu Houqiang, Cao Xueting, Wang Yuefen |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.007 |
|
|
Propagation tracing of scholarly output in news has provided the basis for news altmetrics. The latest research shows that news altmetrics could reflect the societal impact of scholarly output. This study systematically revealed the distribution features of news altmetrics based on statistical and comparative analysis of over 4.27 million news altmetrics data records. The results showed that comprehensive and medical news platforms mentioned the highest number of scholarly outputs, the top three being The Conversation, EurekAlert!, and MedicalXpress. News altmetrics had a 56% immediacy rate, higher than that of policy document altmetrics but lower than that of Weibo and Twitter altmetrics. Distribution at the article level was highly concentrated, as measured by the number of unique users; 20% of the scholarly outputs analyzed contributed to 65% of news mentions, with 3.5 news mentions on average. Distribution at the source level was in accordance with Bradford’s law; 76 core sources were identified, the top three being The Conversation, Nature, and PLoS ONE. At the disciplinary level, medical and health sciences dominate the distribution, followed by biological science and psychology and cognitive sciences. The results of this study will help researchers better comprehend and apply news altmetrics. |
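The article-level concentration figure (20% of outputs accounting for 65% of news mentions) is a top-share statistic. The minimal sketch below shows one way to compute such a share; the function name `top_share` and the mention counts are hypothetical, chosen only so that the arithmetic is easy to follow, and are not data from the study.

```python
def top_share(mention_counts, fraction):
    """Share of all mentions received by the top `fraction` of outputs, ranked by mentions."""
    ranked = sorted(mention_counts, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical news-mention counts for ten scholarly outputs (assumption)
mentions = [50, 15, 10, 8, 5, 4, 3, 2, 2, 1]

share = top_share(mentions, fraction=0.2)
print(share)
```

Here the two most-mentioned outputs (20% of ten) hold 65 of 100 mentions, which mirrors the kind of skew the abstract reports.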
|
|
2020 Vol. 39 (10): 1081-1092 |
|
|
|
1093 |
Key Generic Technology Identification Based on Patent Mining |
|
|
Ma Yonghong, Kong Lingkai, Lin Chaoran, Yang Xiaomeng |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.008 |
|
|
The identification of key generic technology affects both the resources governments provide in uncertain market environments and the direction of enterprise research, and thereby constrains the development of the manufacturing industry. Accurately identifying key generic technology is an urgent problem for governments and enterprises. Based on patent data and the LDA topic model, latent technology topics are extracted and high-intensity technology topics are selected as research objects; their generality is analyzed through technology co-occurrence ratios, and topic activity and value are combined to identify generic technology topics. Using social network analysis methods, technology topics are treated as nodes and topic co-occurrence intensity as edge weights; the weights of the technology topics are then quantified, key nodes are screened, and key generic technology is identified. Finally, taking the new materials field as an example, the results demonstrate that the preparation of high-property aluminum alloys, nano powder, films, and high-strength, high-hardness ceramic molds, as well as the preparation and application of metal powder, are realized by employing key generic technology in the field of new materials. |
|
|
2020 Vol. 39 (10): 1093-1103 |
|
|
|
1120 |
A Review of Knowledge Flow Research |
|
|
Dong Kun, Xu Haiyun, Cui Bin |
|
|
DOI: 10.3772/j.issn.1000-0135.2020.10.011 |
|
|
Knowledge flow is the key to facilitating the generation and diffusion of innovation. Tracking and predicting the progress and trends of knowledge flow in a timely manner is an important prerequisite for comprehensively analyzing the knowledge flow mechanism and for promoting knowledge in ways that maximize innovation value. This paper reviews the research on knowledge flow and summarizes its connotations and related concepts, forms and processes, models, influencing factors, and performance evaluation. It also explores knowledge flow network construction and the application of social network analysis in this research, and outlines prospects for future research. The results show that we need to consider the complexity of knowledge flow, further optimize the knowledge flow model, facilitate the flow of tacit knowledge on the basis of fully understanding its function and value, and strengthen the organic combination and application of social network, scientific measurement, and traditional organizational management methods. In addition, we need to explore network construction methods based on multiple relationships and pay more attention to the practical application of knowledge flow theory in the future. |
|
|
2020 Vol. 39 (10): 1120-1132 |
|
|
|