Multi-view Fusion DJ-TextRCNN for the Theme Recommendation of Ancient Texts
Wu Shuai1, Yang Xiuzhang2,3, He Lin1
1. College of Information Management, Nanjing Agricultural University, Nanjing 211800
2. School of Cyber Science and Engineering, Wuhan University, Wuhan 430072
3. School of Information, Guizhou University of Finance and Economics, Guiyang 550025
Abstract Progress in digital humanities research on ancient texts is hindered by low working efficiency, blurred boundaries between cataloging topics, heavy reliance on expert knowledge, insufficient mining of the semantics of ancient texts, and the difficulty of recommending topics accurately in a way that reflects the characteristics of ancient book texts. To address these problems, this study aims to recommend text theme content accurately, in line with researchers' needs and the characteristics of ancient book corpora. First, ancient book corpus data previously annotated by the research group were selected for subject category labeling and view classification. Second, a semantic mining model was constructed by integrating a pretrained BERT model with an improved convolutional neural network, recurrent neural network, and multi-head attention mechanism. Finally, the “subject-relationship-object” multi-view semantic enhancement model was integrated to build the DJ-TextRCNN model, enabling finer-grained, deeper, and multi-dimensional semantic mining of classic texts. The DJ-TextRCNN model achieved the best accuracy on the ancient book theme recommendation task across the different views. Under the “subject-relationship-object” view, it reached an accuracy of 88.54%, a preliminary realization of accurate theme recommendation for ancient texts. The model can help guide in-depth, fine-grained semantic mining of Chinese culture.
Received: 11 April 2023