Automatic Recognition and Bibliometric Analysis of Cited Books
Huang Shuiqing 1,2, Zhou Hao 1,2, Peng Qiuru 1,2, Wang Dongbo 1,2
1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095
2. Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095
Abstract Ancient books contain citations of other works, which are called cited books. Current citation analysis focuses mostly on modern texts, and the academic community has paid less attention to citation in ancient books. In this paper, we apply citation analysis to ancient books and calculate and analyze the citation indicators of cited books in order to establish a preliminary framework for bibliometric research on cited books. We take Lunyu Zhushu, Maoshi Zhengyi, and Chunqiu Zuozhuan Zhengyi from Notes of Thirteen Classics as the sample. First, citation items in these books are automatically recognized using CRF (conditional random field), Bi-LSTM (bidirectional long short-term memory), and Bi-LSTM-CRF models, and the features extracted by the models are compared. Then, citation analysis is used to calculate and analyze citation indexes for the three books in order to examine the knowledge correlation between ancient books and to discuss the citation behavior of ancient scholars. The results show that machine learning models recognize citation items well overall. The two deep learning models perform clearly better than the CRF model, and of the two, Bi-LSTM-CRF is slightly better than Bi-LSTM. The scale of cited books is affected by various factors. Classic documents account for the highest proportion of citations, especially the ritual documents among them. In addition, ancient scholars' citation behavior was influenced by multiple factors, such as the purpose of the book, the scholar's knowledge background, and the difficulty of obtaining the cited documents.
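As a rough illustration of the citation indicators mentioned above, the following Python sketch counts how often each cited book appears and what share of all citations it accounts for. The (citing book, cited book) pairs, the placeholder cited-book names, and the function names are all hypothetical stand-ins for recognizer output; they are not data or code from the paper.

```python
from collections import Counter

# Hypothetical (citing book, cited book) pairs standing in for the
# output of a citation-item recognizer. Placeholder names, not real data.
citations = [
    ("Lunyu Zhushu", "cited_book_A"),
    ("Lunyu Zhushu", "cited_book_B"),
    ("Maoshi Zhengyi", "cited_book_A"),
    ("Maoshi Zhengyi", "cited_book_C"),
    ("Chunqiu Zuozhuan Zhengyi", "cited_book_A"),
]


def cited_counts(pairs):
    """Number of times each cited book appears across all citing books."""
    return Counter(cited for _, cited in pairs)


def cited_shares(pairs):
    """Proportion of all citations that each cited book accounts for."""
    counts = cited_counts(pairs)
    total = sum(counts.values())
    return {book: n / total for book, n in counts.items()}


counts = cited_counts(citations)
shares = cited_shares(citations)
```

Grouping the same pairs by citing book instead would give the number of cited books per source, which is one way to compare the scale of cited books across the three samples.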
Received: 08 July 2020