Full Abstracts

2020 Vol. 39, No. 6
Published: 2020-06-28

565 A Method for Institution Name Normalization Based on Institution-Author Vectors
Lyu Dongqing, Lu Hongru, Cheng Ying, Sun Haixia
DOI: 10.3772/j.issn.1000-0135.2020.06.001
Institutional change is one reason for the variety in institution names. Normalizing institution names improves both the recall of information retrieval and the reliability of bibliometric research results. This paper therefore proposes a method for institution name normalization based on the observation that the personnel of an academic institution remain stable in the short term. Specifically, institution-author and institution-annual vectors are constructed for each academic institution, and the similarity of the integrated institution-author vectors, the number of co-authors, and mapping rules are used to identify transition relationships between two institutions, including renaming, merger, split, and reorganization. The method was tested using data from the CSSCI database between 1999 and 2015. After controlling for the impact of personnel turnover and homonymous authors, the proposed method demonstrated excellent performance in both accuracy and recall.
2020 Vol. 39 (6): 565-578
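The vector comparison described in the abstract above can be illustrated with a minimal sketch. It assumes each institution name is represented by per-author paper counts and that vectors are compared with cosine similarity; the abstract does not state the weighting or the similarity measure, so both are assumptions, and the function names, author names, and records are hypothetical.

```python
from collections import Counter
from math import sqrt


def author_vector(records):
    """Build an institution-author vector: paper count per author.

    `records` is a list of papers, each given as a list of author names
    published under one institution name (hypothetical input format).
    """
    return Counter(author for authors in records for author in authors)


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[a] * v[a] for a in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical data: two name variants sharing staff, plus an unrelated one.
old_name = author_vector([["Author A", "Author B"], ["Author A", "Author C"]])
new_name = author_vector([["Author A", "Author B"], ["Author C"]])
unrelated = author_vector([["Author X", "Author Y"]])

similarity_renamed = cosine_similarity(old_name, new_name)
similarity_unrelated = cosine_similarity(old_name, unrelated)
shared_authors = len(old_name.keys() & new_name.keys())
```

A high similarity together with many shared authors would suggest a candidate transition relationship, which the paper then filters with mapping rules not reproduced here.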
579 Research on Terror-Related Sensitive Entity Recognition Model of a Heterogeneous Social Network Based on Broad Learning
Huang Wei, Tong Qingyun, Li Yuefeng
DOI: 10.3772/j.issn.1000-0135.2020.06.002
Currently, sensitive entities are active in social network systems, where they use social networks to spread extremism and contact target groups. Identifying sensitive entities is therefore the primary issue in cyber-security governance. This paper proposes a broad learning-based model for identifying sensitive entities in heterogeneous social networks, which can inform China's practice of network information security governance in the new era. Two large experimental data sets from heterogeneous social networks (Twitter and Facebook), namely the user nodes and tweet nodes, are processed by broad learning-based network embedding technology, combined, and embedded into the same low-dimensional feature space. The results are integrated into a matrix factorization framework to identify multi-source heterogeneous sensitive entities. A comparison of the experimental results on multiple data sets with those on a single data set shows that the proposed model performs better.
2020 Vol. 39 (6): 579-588
589 Automatic Recognition of Research Methods from the Full-text of Academic Articles
Zhang Chengzhi, Zhang Yingyi
DOI: 10.3772/j.issn.1000-0135.2020.06.003
The degree of standardization of research methods marks the maturity of a discipline's development. In information science, theoretical analysis and normative research have gradually started to attract the attention of researchers. However, there is a lack of quantitative analyses of research methods. In addition, when a research method appears in an academic article, either the method is used in the article or it is merely cited for analysis or comparison. Research methods allow researchers to quickly understand the key contents of academic articles, and summarizing the research methods cited by academic papers helps clarify their evolution and development in the field. Thus, this paper divides research methods into those reported by and those cited in academic articles. First, it compares a variety of automatic named entity recognition methods, such as BiLSTM (bi-directional long short-term memory), to select the best model for identifying research method entities. The experimental results show that the character vector-based BiLSTM joint training model combined with a CRF (conditional random field) yields the best performance. The paper then analyzes the use of research methods in information science through the extracted research method entities. The statistical analysis shows that experimental methods have the highest usage and citation frequency in information science.
2020 Vol. 39 (6): 589-600
601 Research of Multilingual Author-Topic Model for Profiling Researcher Interests
Li Yan, Liu Zhihui, Gao Yingfan
DOI: 10.3772/j.issn.1000-0135.2020.06.004
Against the background of big data and globalization, automatically mining latent topics and accurately profiling researchers' interests from massive multilingual literature are key issues in knowledge-oriented information services and cross-language information retrieval. Currently, the methods adopted to describe researchers' interests are mostly based on literature in a single language and are therefore not applicable to multi-language datasets. This study proposes the JointAT (joint author-topic) model, built on the author-topic model and the multilingual topic model, to profile researchers' interests from multilingual datasets. Moreover, a Gibbs sampling method is proposed to estimate the parameters of the JointAT model. The experimental results indicate that the JointAT model exhibits better generalization ability than the author-topic model.
2020 Vol. 39 (6): 601-608
609 Research on Constructing a Model of Correlation Discrimination Between Funds and Funded Papers Based on Siamese Network
Ye Wenhao, Wang Dongbo, Shen Si, Su Xinning
DOI: 10.3772/j.issn.1000-0135.2020.06.005
To explore the phenomenon of mislabeled fund projects in research papers, this study proposes a deep learning model to calculate the correlation between a fund and its sponsored paper. Taking the National Social Science Fund Project and its sponsored papers as the data source, the similarity between the fund title and the title and abstract of the paper is calculated based on the word2vec model. The text-similarity correlation scores show that there are differences between the fund content and its sponsored papers. By manually reviewing the low-similarity data pairs, we confirm that some funds are mislabeled. Finally, the correlation model between the fund and its sponsored papers is developed. This model is effective in detecting papers with mislabeled fund projects, with a precision of over 99%. The recall and F-score of the model that uses Transformer as the encoder are 89.13% and 94.22%, respectively. This model can help curb the mislabeling of funds at both the author submission and journal review stages.
2020 Vol. 39 (6): 609-618
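The three metrics reported in the abstract above can be cross-checked with a short sketch. It assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, given R and F1."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision exists for these values")
    return f1 * recall / denominator


# Values reported in the abstract for the Transformer-encoder model.
recall = 0.8913
f1 = 0.9422

# Precision implied by the two reported values, under the F1 assumption.
precision = implied_precision(recall, f1)
```

Under that assumption the implied precision falls between 0.99 and 1.0, which agrees with the stated precision of over 99%.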
619 The Credibility of Business Public Opinion Based on Model Checking
Wu Peng, Xiao Weicong, Chu Rongzhen
DOI: 10.3772/j.issn.1000-0135.2020.06.006
Credibility assessment of business public opinion affects both enterprise development and investors' interests. To identify the criteria of credibility and judge their accuracy, this paper designs a credibility detection framework based on model checking technology. A decision tree algorithm is employed instead of the traditional manual induction process to construct the rules for judging the credibility of business public opinion, and the rules are formalized in CTL (computation tree logic). A business public opinion database is constructed as the model to be checked, based on temporal logic relationships and represented by a Kripke structure. The model checker NuSMV automatically verifies the rules against this model, determines whether the model conforms to the credibility detection rules, and outputs the nonconforming paths as counter-examples (i.e., untrustworthy business public opinion detection paths). The proposed framework was validated through empirical research, which shows that it can quickly and effectively automate the detection of business public opinion credibility. This can help investors to analyze and predict the authenticity of business public opinion.
2020 Vol. 39 (6): 619-629
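The idea of checking a rule against a Kripke structure and returning a counter-example path, as described in the abstract above, can be illustrated with a toy sketch. This is not NuSMV and not the authors' rule set: it is a small explicit-state check of a single CTL property of the form AG p ("p holds in every reachable state"), and the state names, transitions, and the proposition `consistent` are hypothetical.

```python
from collections import deque


def check_ag(initial, transitions, labels, prop):
    """Check the CTL property AG prop from `initial`.

    Returns (True, None) if every reachable state is labeled with `prop`,
    otherwise (False, path), where path is a shortest counter-example
    from the initial state to a violating state.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if prop not in labels.get(state, set()):
            # Rebuild the path back to the initial state.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return False, path[::-1]
        for nxt in transitions.get(state, ()):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None


# Hypothetical Kripke structure: s3 is reachable and lacks "consistent".
transitions = {"s0": ["s1"], "s1": ["s2", "s3"], "s2": ["s2"], "s3": ["s3"]}
labels = {
    "s0": {"consistent"},
    "s1": {"consistent"},
    "s2": {"consistent"},
    "s3": set(),
}

holds, counterexample = check_ag("s0", transitions, labels, "consistent")
```

Here the check fails and the returned path plays the role of the nonconforming path that a model checker reports as a counter-example.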
630 Topic Mining of Online Reviews Based on Gaussian Latent Dirichlet Allocation
Guo Xianda, Zhao Narisa, Gao Huan, Yang Xinyi
DOI: 10.3772/j.issn.1000-0135.2020.06.007
This study proposes a method based on Gaussian latent Dirichlet allocation (LDA) for online comments to overcome the limitations of current topic mining methods, such as the sparseness and semantic incoherence of generated topics, which result in poor applicability. The word vectors of online comments are obtained by word2vec training, and the topic distribution of online comments is obtained with the Gaussian LDA model. The topic distribution is then used to calculate the similarity matrix of comments, and the affinity propagation clustering algorithm is employed to cluster the comments. Topics are discovered by analyzing the clustering results. Finally, the TextRank algorithm is used to extract the key sentences of each topic and generate a topic summary that completes the description of the topic. The proposed method effectively alleviates the information overload caused by consumers' online comments. Its effectiveness and practical value have been established through experiments on online product reviews from seven platforms, including Taobao, Jingdong, and Douban.
2020 Vol. 39 (6): 630-639
640 Methodological and Automatic Sentence Extraction from Academic Article's Full-text
Zhang Yingyi, Zhang Chengzhi
DOI: 10.3772/j.issn.1000-0135.2020.06.008
Research methods are essential in the scientific literature. These include methods, tools, or techniques for solving problems in the field. The description of a research method is usually presented through sentences. Summarizing these scattered sentences in the scientific literature can help researchers to quickly explore appropriate research methods. According to the method's purpose in the research paper, research method sentences are further divided into method-used and method-cited sentences. A method-used sentence describes a research method used in the paper, and a method-cited sentence describes a research method cited by the paper. In this study, a variety of neural network-based sentence classification models are used to extract method sentences from the full text of the scientific literature. At the word vector representation layer, the study uses two word vector models: BERT and word2vec. In the feature selection layer, three different networks are utilized: convolutional neural network (CNN), bidirectional LSTM (BiLSTM), and attention mechanism network. In addition, the study uses two model training methods: a single-level structure and a two-level structure. The experimental results show that the BERT-based BiLSTM model with the single-level structure achieves the best performance. This paper analyzes the distribution of research method sentences extracted from the Journal of The China Society for Scientific and Technical Information. The analysis indicates that this journal paid more attention to the theoretical developments of information science; in addition, the journal also focused on constructing theoretical systems for this discipline.
2020 Vol. 39 (6): 640-650
651 Review of Research Progress on Emerging Technologies Identification Based on Quantitative and Evolutionary Perspectives
Lu Xiaobin, Yang Guancan, Xu Shuo, Zhang Yangyi
DOI: 10.3772/j.issn.1000-0135.2020.06.009
Emerging technologies identification has always been a focus of scientific and technological innovation management, science and technology policy-making, and technology competitive intelligence research. Although substantial academic research has been performed in this field, the conceptual definition of “emerging technology” has seriously restricted its development, because the conceptual boundary of emerging technologies has been expanded from two different cognitive perspectives: quantitative and evolutionary. Therefore, the basis for clarifying the concept of emerging technologies identification is to first understand the characteristics and application scenarios of the two perspectives. This paper first proposes a framework consisting of three parts: characteristics, data representations, and methods of emerging technologies identification. This framework comprehensively covers the practical progress of emerging technologies identification from the current quantitative perspective. Subsequently, a comparison of the literature shows that the rationale for the evolutionary perspective lies in the fact that the proposed framework cannot explain the following four issues: radical innovation based on technological recombination, the fusion effect of disciplines and technological networks, the driving effect of technological practicability and efficiency, and disruptive innovation. These issues drive the transformation of data representation and recognition methods under the evolutionary perspective. Finally, preliminary research prospects for the development of data representation and recognition methods in emerging technologies identification are described. By comparing how the two perspectives understand and practice emerging technologies identification, this research provides a reference for the further development of the field.
2020 Vol. 39 (6): 651-661
662 Origin, Application, and Development of Message Framing Theory in Foreign Health Behavior Research
Yang Mengqing, Zhao Yuxiang, Song Shijie, Zhu Qinghua
DOI: 10.3772/j.issn.1000-0135.2020.06.010
This paper reviews the origin, application, and development of message framing theory in foreign health behavior research to provide a theoretical reference for health information behavior research in China. By discussing the main characteristics of the different application phases of message framing theory in health behavior studies, we first describe the evolution of this research topic. Then, a detailed analysis of the application of message framing theory is presented from two perspectives, namely framing and cognition. Finally, we put forward possible breakthroughs for future research based on the results of the review. The development of message framing theory in health behavior research can be divided into three stages: theme formation, theme development, and theme maturity. Each stage has distinct characteristics. Based on different framing design ideas, the related research can be divided into three categories: gain- and loss-framed messages, temporal framing, and narrative. Self-efficacy, information processing, behavioral motivation, and health beliefs are common cognitive perspectives when discussing the framing effect in health behaviors. At the end of the paper, in light of research themes in information management, we propose that new research can be carried out in three directions: information tailoring in online health communities, framing context in the adoption of health information applications, and misinformation in health information dissemination.
2020 Vol. 39 (6): 662-674