Corpus Construction for Citation Sentiment in Chinese Literature
Xu Linhong1,2, Ding Kun1, Chen Na1, Li Bing1
1. WISE Lab, Institute of Science of Science and Technology Management, Dalian University of Technology, Dalian 116024
2. Software Institute, Dalian University of Foreign Languages, Dalian 116044
Abstract Content-based citation sentiment analysis overcomes the citation assimilation problem of traditional frequency-based analysis and is an important research hotspot in citation content analysis. However, citation sentiment analysis relies on annotated datasets, and the lack of a large-scale, high-quality citation sentiment corpus seriously restricts progress in this field. Therefore, based on an analysis of how citation sentiment is expressed, this paper proposes an annotation scheme for such expressions and describes the techniques and methods used to construct the corpus. A large-scale citation sentiment corpus of Chinese literature was built with a human-computer interactive annotation strategy, supported by a comprehensive citation annotation system. The statistical results show that positive and negative citations account for 22% and 6% of the corpus, respectively, and that the kappa value for citation sentiment annotation reached 0.852. This indicates that the corpus objectively reflects the authors' sentiments and can provide data support for research in related fields such as paper evaluation, citation network analysis, and sentiment analysis.
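The kappa value reported above measures annotator agreement corrected for chance. The abstract does not state which kappa variant was used, so the following is only a minimal sketch that assumes the two-annotator Cohen form, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels; this is an
    illustration, not the authors' actual evaluation code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy citation sentiment labels (hypothetical, not from the corpus).
annotator_1 = ["pos", "pos", "neu", "neu", "neu", "neg", "neu", "neu", "pos", "neu"]
annotator_2 = ["pos", "neu", "neu", "neu", "neu", "neg", "neu", "neu", "pos", "neu"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 9 of 10 items, but because neutral citations dominate, a large part of that agreement is expected by chance, so kappa comes out lower than the raw agreement rate. The same effect is why a chance-corrected measure is informative for a corpus in which most citations are neither positive nor negative.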
Received: 21 February 2019