Full Abstracts

2022 Vol. 41, No. 10
Published: 2022-10-24

1003 A Co-word Analysis Method Based on Semantic Relevance and Fuzzy Clustering
Lu Quan, Cao Yue, Chen Jing
DOI: 10.3772/j.issn.1000-0135.2022.10.001
Co-word analysis is an important basic method for text content analysis; however, existing co-word analysis methods have two shortcomings. One is that the semantic relevance of word pairs is not considered when the keyword co-word matrix is constructed; the other is that the cluster analysis of the co-word matrix does not allow a word to belong to more than one topic. This study proposes a co-word analysis method based on semantic relevance and fuzzy clustering. Domain keywords are extracted based on Donohue's formula and the g-index of word frequency. The semantic vector representation of keywords is learned by a word embedding model. Subsequently, a semantic weighted co-word matrix is constructed that combines co-occurrence features and semantic relevance to measure the correlation between word pairs. Combining the fuzzy C-means clustering algorithm and factor dimensionality reduction, the semantic weighted co-word matrix is used for fuzzy clustering of keywords to overcome the oversimplification of word topic attribution in hard clustering, which can improve the information quality of clusters and reveal the relationships between clusters. Experiments are conducted on journals on infectious diseases to verify the effectiveness and superiority of the method.
2022 Vol. 41 (10): 1003-1014
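The semantic weighted co-word matrix described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the linear blend below (the `alpha` parameter), the Ochiai normalization of co-occurrence counts, the function names, and the toy keyword vectors are all assumptions made for illustration, not the authors' method.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_weighted_matrix(docs, vectors, alpha=0.5):
    """Blend co-occurrence strength with embedding similarity for each keyword pair.

    Assumed weighting (hypothetical, not taken from the paper):
        weight = alpha * ochiai(a, b) + (1 - alpha) * cosine(vec_a, vec_b)
    Only pairs that co-occur at least once receive an entry.
    """
    freq = {}    # number of documents containing each keyword
    counts = {}  # number of documents containing both keywords of a pair
    for keywords in docs:
        unique = sorted(set(keywords))
        for k in unique:
            freq[k] = freq.get(k, 0) + 1
        for a, b in combinations(unique, 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1

    weights = {}
    for (a, b), c in counts.items():
        ochiai = c / math.sqrt(freq[a] * freq[b])  # normalized co-occurrence
        weights[(a, b)] = alpha * ochiai + (1 - alpha) * cosine(vectors[a], vectors[b])
    return weights


# Toy data: three documents and two-dimensional stand-ins for word embeddings.
docs = [["virus", "vaccine"], ["virus", "vaccine", "policy"], ["virus", "policy"]]
vectors = {"virus": [1.0, 0.0], "vaccine": [1.0, 0.0], "policy": [0.0, 1.0]}
weights = semantic_weighted_matrix(docs, vectors)
```

In this toy data, "virus" co-occurs equally often with "vaccine" and with "policy", but the pair with similar vectors ends up with the larger weight, which is the effect the abstract attributes to semantic weighting. The resulting matrix would then be the input to a fuzzy clustering step, which is not shown here.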
1015 Examining Novelty of Technological Topics Based on Combination Probabilities
Sun Xiaoling, Chen Na, Ding Kun
DOI: 10.3772/j.issn.1000-0135.2022.10.002
Technological novelty is considered an important driving force for breakthrough innovation. Comprehensively measuring the novelty of technological topics can help identify novel patents as early as possible and reduce the risk of delayed identification of emerging key technologies. As knowledge elements of technology, subject headings can adequately represent the subject content and methods of technological inventions. This study proposes a method to measure the novelty of technological topics from the perspective of combination probability, which integrates the number of direct combinations, the indirect combination probability, and the semantic similarity of patent subject words. Taking invention patents in the field of artificial intelligence as an example, the study verifies that the method can capture the potential distance between subject word combinations and identify more novel combinations than a single indicator can. The findings indicate that patents that are both highly novel and highly conventional exhibit a higher average number of citations, and that highly novel patents exhibit the highest probability of becoming highly cited patents.
2022 Vol. 41 (10): 1015-1023
1024 Academic Output Distribution in Authors of Highly Cited Papers among Different City-University Clusters
Zhang Guilan, Pan Yuntao, Zheng Chuhua, Wang Haiyan, Ma Zheng
DOI: 10.3772/j.issn.1000-0135.2022.10.003
In an open scientific research ecosystem, researchers are to a certain extent self-selecting and self-organizing in their growth and development. The integrated development of universities and cities constitutes the external ecological environment for researchers, which further affects their growth and development. Based on the economic level of cities and the academic level of universities, this study proposes city-university clusters at different levels and examines the distribution of researchers' academic output among these clusters. The authors of highly cited papers in artificial intelligence are used as examples. Their basic and work information, project data, paper output data, and patent output data were obtained comprehensively through data mining. We used statistical analysis and propensity score matching (PSM) to explore the distribution, and we also examined the combined influence of city and university on academic output. We found that the authors of highly cited papers were mainly concentrated in top universities, and that the relationship between university ranking and number of authors followed a power function distribution with a negative exponent a. Certain differences were identified in the academic output distribution of authors of highly cited papers among different city-university clusters. The academic output of higher-level city-university clusters was significantly higher, and its degree of dispersion was larger. Furthermore, university and city had a dual influence on authors’ academic output, and the influence of the university was greater than that of the city. A high-quality university platform could compensate for the influence of city economic level on researchers’ academic output.
2022 Vol. 41 (10): 1024-1033
1034 Multi Granularity Knowledge Organization Model for User Generated Content
Wang Zhongyi, Zheng Xin, Wang Keying
DOI: 10.3772/j.issn.1000-0135.2022.10.004
As an important type of online text resource in the era of big data, user generated content (UGC) has received increasing attention from scholars in various fields. Compared with traditional texts, massive and fragmented UGC texts are more difficult to organize. To address the fragmentation of UGC text, this study proposes a multi-granularity knowledge organization model based on the knowledge element. By extracting fragmented UGC knowledge elements and establishing multi-granularity associations and a multi-granularity index, the fragmented UGC is organized from point to surface and from part to whole. In the empirical part, fragmented UGC texts related to “retrieval” are used to complete multi-granularity knowledge organization, and a user interface is provided to deliver the knowledge retrieval service. In addition, comparative experiments demonstrate the effectiveness and soundness of the proposed multi-granularity knowledge organization model.
2022 Vol. 41 (10): 1034-1043
1044 The Challenges of Webometrics and Altmetrics and the Evaluation of Robustmetric and Non-robustmetric in Societal Impact
Liu Tingyuan, Liu Shuman
DOI: 10.3772/j.issn.1000-0135.2022.10.005
As evidence of the impact of scientific achievements becomes increasingly extensive, the challenges facing webometrics, altmetrics, and their evaluation of societal impact are growing. Web-altmetric data on societal impact typically show a high share of zero values (left), multiple outliers (right), and an extremely right-skewed distribution. As a result, the authenticity and rationality of the data sets, and the resistance to bias and error, reliability, and stability of informetric methods and results, face many unique challenges. In this study, for high zero values, the quartile zero-value scaling-down method is used for verification, and the proposed exact calculation formula shows good consistency and resistance to bias and error, which is an important basis for the reasonable correction of outliers and for their robustmetric treatment. The quartile zero-value rate is defined and derived based on the inter-quartile range, and the actual risk rate of its maximum scaling-down is low, which makes it an ideal estimation point for the location parameter. For multiple outliers, a robust winsorizing method is used for correction, and compared with the non-robust method, the corrected data set has more tolerance and reliability. For the extremely right-skewed distribution, the linear proportional method based on the tailed mean is adopted for nondimensionalization, so that the results of mapping and transformation are more stable and consistent than those of the linear proportional method based on the mean. The weight coefficients are obtained by integrating subjective weights into an objective weighting method: the weight set of the G1 method (subjective), the objective G1 method, and the semi-objective G1 method is treated as a triangular fuzzy number and defuzzified, so that the weight values reflect both subjective and objective information, thereby improving the reliability and stability of the comprehensive evaluation results. Compared with non-robustmetric evaluation methods, the stability, reliability, and resistance to bias and error of robustmetric evaluation are greatly improved, which is conducive to promoting the development of informetrics and evaluation science towards complexity precision science.
2022 Vol. 41 (10): 1044-1058
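The winsorizing step mentioned in the abstract above can be sketched in a few lines. The abstract does not state the cut-off rule, so the percentile-based limits, the one-sided default (only the upper tail is clipped, to suit zero-heavy, right-skewed counts), the function names, and the sample data are assumptions for illustration only.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def winsorize(values, lower_q=0.0, upper_q=0.9):
    """Clip values to the [lower_q, upper_q] percentile range instead of dropping them.

    The percentile limits are hypothetical defaults, not the paper's rule.
    """
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


# Toy altmetric-style counts: many zeros and one extreme value on the right.
mentions = [0, 0, 0, 1, 2, 3, 5, 8, 13, 400]
clipped = winsorize(mentions)
```

Unlike trimming, winsorizing keeps every observation and only pulls the extreme value back to the chosen limit, so the sample size and the zero values are unchanged.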
1059 Imbalanced Classification of Emerging Technologies Identification: Based on Cost-sensitive Random Forest
Lu Xiaobin, Zhang Yangyi, Yang Guancan, Xing Jiaxin
DOI: 10.3772/j.issn.1000-0135.2022.10.006
Automated forward-looking forecasting based on large patent data and patent characteristics has gradually become the research focus of emerging technologies identification. In addition, the introduction of machine learning has drawn attention to the low probability of discovering emerging technologies among the massive number of technological inventions represented by patents, which constitutes a typical imbalanced classification problem. This study aims to reduce the classification bias toward the majority class caused by imbalanced datasets in emerging technologies identification, and proposes a comprehensive imbalanced classification optimization framework that integrates three levels: data, algorithm, and evaluation. The framework is verified on a binary classification task that takes as its example whether patents in the cancer drugs field can be authorized by the Food and Drug Administration to become new drugs, which are treated as emerging technologies. The specific improvements are as follows: progressive resampling is verified at the data level; cost-sensitive learning is introduced, and three cost matrix setting methods suited to a lack of expert experience are studied, at the evaluation level; and the cost-sensitive random forest is constructed at the algorithm level. The results show that the cost-sensitive random forest based on 1∶2 undersampling and the ROC (receiver operating characteristic)-Youden index threshold cost matrix can predict 82.8% of the emerging technologies and 81.6% of the common technologies, which is significantly better than the control group and existing related results. The study offers a reference for future exploration of the nature of the imbalanced classification problem in emerging technologies identification.
2022 Vol. 41 (10): 1059-1070
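The Youden index named in the abstract above is defined as J = sensitivity + specificity - 1, and the ROC threshold that maximizes J is a common way to pick an operating point on imbalanced data. The sketch below only shows that threshold search; how the paper turns the threshold into a cost matrix is not described in the abstract and is not attempted here. The function name and the toy scores are hypothetical, and both classes are assumed to be present.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= threshold.
    Assumes labels are 0/1 and that both classes occur at least once.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # keep the lowest threshold on ties
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Toy classifier scores for six patents; label 1 marks the minority class.
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.8]
labels = [0, 0, 0, 1, 0, 1]
threshold, j = youden_threshold(scores, labels)
```

Because J weights sensitivity and specificity equally, the chosen threshold does not drift toward the majority class the way an accuracy-maximizing threshold would.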
1071 Framework of the Government-data Collaborative Governance Platform Based on the Alliance Blockchain: Considering the National Carbon Emissions Trading Market as an Example
Zheng Rong, Gao Zhihao, Wei Mingzhu, Sun Yanfei
DOI: 10.3772/j.issn.1000-0135.2022.10.007
Collaborative governance of government data under the national security concept has become an important part of social and national governance. Based on literature analysis and current research results, and guided by synergy theory, this study uses alliance blockchain technology to build a government-data collaborative governance platform that maximizes the synergy of government data governance across multiple agents and multi-source data, and ensures the symbiosis, sharing, co-governance, security, and stability of government data governance elements. The study focuses on the dilemma of collaborative governance of government data, adopts the research paradigm of “technical framework construction-platform model construction-analysis of operating mechanism,” and uses the national carbon emissions trading market as the application scenario to discuss the value of the platform in the collaborative governance of government data. The analysis shows that the platform can realize collaboration between government data governance entities and government data, break data barriers, improve data security, credibility, and traceability, and clarify data standards and ownership rights in collaborative data governance. The platform thus provides support and technical guarantees for the security and value-added use of government data.
2022 Vol. 41 (10): 1071-1084
1085 Risk Identification and Early Warning Model of Social Media Network Public Opinion in Emergencies
Li Yueqi, Wang Xiwei, Wang Nan'axue, Wang Xiaotian
DOI: 10.3772/j.issn.1000-0135.2022.10.008
Natural disasters and public health emergencies occur frequently worldwide, and the risk of online public opinion crises in social media is gradually increasing. A key issue in emergency management is how to effectively identify and warn of the risk of online public opinion in social media during emergencies. Based on the ISM-BN model, this study constructs a risk identification and early warning model for social media public opinion in emergencies, in order to strengthen the judgment of key issues and of the need for risk early warning. We built a knowledge database of social media risks in emergencies through a knowledge map. We used interpretative structural modeling (ISM) to identify the causal paths and hierarchical relationships among the risk factors of social media public opinion in emergencies. The Bayesian network (BN) model was used to issue early warnings of social media risks in emergencies. Together, these steps form a closed-loop decision-making process of emergency risk knowledge acquisition, knowledge analysis, and early warning decision. This study provides a new theory and research method for social media risk management in the emergency environment, and offers decision support for public opinion risk identification and early warning to the relevant public opinion supervision structures.
2022 Vol. 41 (10): 1085-1099
1100 Perspective of the Development of Intelligence Studies and Intelligence Service in the Era of Data Intelligence
Xu Xin, Ye Dingling
DOI: 10.3772/j.issn.1000-0135.2022.10.009
As the era of data intelligence transforms the core of intelligence studies and intelligence services, it has become necessary to systematically examine the changes within such studies and services to provide reference and developmental guidance for realizing the common advantages of intelligence studies and services. Starting from the intelligence process, this study presents a systematic analysis of the impact of big data, cloud computing, artificial intelligence, blockchain, and 5G technologies, and of the integration of such technologies, on intelligence studies and services across the stages of demand and planning, retrieval and collection, integration and organization, analysis and condensation, and presentation and transmission. Further, certain viewpoints are put forward on the relationship between data intelligence technology and intelligence studies, on the development of data intelligence technology and digital intelligence technology, and on a few theories of intelligence studies.
2022 Vol. 41 (10): 1100-1110