Academic Paper Profiling Based on CCA: A Theoretical Framework and Empirical Study
Wang Xinyue, Zhao Danqun
Department of Information Management, Peking University, Beijing 100871
Abstract Academic paper profiling falls into two types: static profiling based on a paper’s full text and dynamic profiling based on its citation corpus. This study focuses on the latter and introduces the citation content analysis (CCA) method to characterize a paper’s academic impact dynamically by collecting and analyzing the citing texts that peers write after its publication. Such a dynamic profile provides critical support for downstream tasks such as paper evaluation and the retrieval and recommendation of academic resources. First, a three-dimensional framework for profiling a paper’s academic impact, “impact power, impact field & impact path” (Im-PFP), is developed through theoretical analysis. Then, an empirical study is conducted on a self-compiled citation corpus, taking a single highly cited paper, “Co-citation in the scientific literature: a new measure of the relationship between two documents” (Small, 1973), as the case. The empirical results support the generality and practical value of the Im-PFP framework, which can be extended to profiling single or multiple highly cited papers across different disciplines. The profiling outcomes enable both the classification of a paper’s lifecycle stages and the analysis of how its impact power, impact field, and impact path evolve over time.
Received: 05 June 2025
1 王东, 李青, 张志刚, 等. 科研人员画像构建方法研究[J]. 情报学报, 2022, 41(8): 812-821.
2 董文慧, 熊回香, 杜瑾, 等. 基于学者画像的科研合作者推荐研究[J]. 数据分析与知识发现, 2022, 6(10): 20-34.
3 王心玥, 赵丹群. 数智时代的学者画像研究: 问题与进路[J]. 图书情报知识, 2026, 43(1): 88-97, 123.
4 郭红梅, 曾建勋. 基于本体的科研机构标签体系研究[J]. 情报学报, 2022, 41(6): 574-583.
5 胡潜, 吴茜, 董寒宇. 基于文献数据化的科研机构历时画像构建研究[J]. 情报理论与实践, 2024, 47(7): 88-96.
6 耿海英, 张建东, 杨立英, 等. 不同学科分类体系下学科结构画像的对比分析[J]. 情报科学, 2023, 41(10): 83-90, 120.
7 胡正银, 刘蕾蕾, 代冰, 等. 基于领域知识图谱的生命医学学科知识发现探析[J]. 数据分析与知识发现, 2020, 4(11): 1-14.
8 Zhang G, Ding Y, Milojević S. Citation content analysis (CCA): a framework for syntactic and semantic analysis of citation content[J]. Journal of the American Society for Information Science and Technology, 2013, 64(7): 1490-1503.
9 Zhao D Q, Guo Q Y, Chen H P, et al. Corpus construction and mining for citation context analysis[J]. Data Science and Informetrics, 2021, 1(1): 96-114.
10 丁堃, 赵昕航, 林原, 等. 面向学术评价的论文画像研究[J]. 情报理论与实践, 2022, 45(9): 94-101.
11 吴江. 基于论文画像的科研论文影响力评价方法研究[J]. 四川图书馆学报, 2022(3): 52-56.
12 张吉玉, 张均胜, 乔晓东. 辅助新颖性评估的科技论文评述画像构建方法[J]. 情报理论与实践, 2023, 46(1): 159-167.
13 Sudolska A, Lis A, Chodorek M. Research profiling for responsible and sustainable innovations[J]. Sustainability, 2019, 11(23): 6553.
14 Camara Viana L F, Hoffmann V E, da Silva Miranda Junior N. Regional resilience and innovation: paper profiles and research agenda[J]. Innovation & Management Review, 2023, 20(2): 119-131.
15 Anderson M H, Lemken R K. Citation context analysis as a method for conducting rigorous and impactful literature reviews[J]. Organizational Research Methods, 2023, 26(1): 77-106.
16 陈翀, 李楠, 梁冰, 等. 基于成果特征的学者学术专长识别方法[J]. 图书情报工作, 2019, 63(20): 96-103.
17 徐曾旭林, 谢靖, 于倩倩. 人才多元评价模型设计方法研究[J]. 数据分析与知识发现, 2021, 5(8): 122-131.
18 Meng L, Wu B. Core discovery and relation extraction in organization profiling[C]// Proceedings of the 13th International Conference on Semantics, Knowledge and Grids (SKG). Piscataway: IEEE, 2017: 219-222.
19 田瑞强, 潘云涛. 全面画像视角下的世界一流科技期刊研究[J]. 中国科技期刊研究, 2021, 32(9): 1111-1119.
20 潘飞, 孙文礼, 王骁龙, 等. “中国科技期刊卓越行动计划”资助期刊群体画像构建与分析——以领军期刊与重点期刊为例[J]. 中国科技期刊研究, 2024, 35(6): 831-840.
21 Lu C, Ding Y, Zhang C Z. Understanding the impact change of a highly cited article: a content-based citation analysis[J]. Scientometrics, 2017, 112(2): 927-945.
22 祝清松, 冷伏海. 基于引文内容分析的高被引论文主题识别研究[J]. 中国图书馆学报, 2014, 40(1): 39-49.
23 王心玥, 赵丹群. 引文情感识别研究进展及评述[J]. 情报理论与实践, 2024, 47(1): 173-181, 189.
24 Athar A. Sentiment analysis of citations using sentence structure-based features[C]// Proceedings of the ACL 2011 Student Session. Stroudsburg: Association for Computational Linguistics, 2011: 81-87.
25 Sula C A, Miller M. Citations, contexts, and humanistic discourse: toward automatic extraction and classification[J]. Literary and Linguistic Computing, 2014, 29(3): 452-464.
26 Raza H, Faizan M, Hamza A, et al. Scientific text sentiment analysis using machine learning techniques[J]. International Journal of Advanced Computer Science and Applications, 2019, 10(12): 157-165.
27 Lauscher A, Glavaš G, Ponzetto S P, et al. Investigating convolutional networks and domain-specific embeddings for semantic classification of citations[C]// Proceedings of the 6th International Workshop on Mining Scientific Publications. New York: ACM Press, 2017: 24-28.
28 Cohan A, Ammar W, Van Zuylen M, et al. Structural scaffolds for citation intent classification in scientific publications[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: Association for Computational Linguistics, 2019: 3586-3596.
29 Jha R, Jbara A A, Qazvinian V, et al. NLP-driven citation analysis for scientometrics[J]. Natural Language Engineering, 2017, 23(1): 93-130.
30 李铮, 邓三鸿, 孔嘉, 等. 学者学术影响力识别研究——基于引文全数据的视角[J]. 图书情报工作, 2020, 64(12): 87-94.
31 魏绪秋, 姜召昊, 常霞, 等. 基于引证意图的学术论文创新性评价研究[J]. 情报理论与实践, 2023, 46(9): 24-30, 46.
32 王剑, 高峰, 满芮, 等. 基于引用频次和内容分析的引文分布与动机关系研究[J]. 情报杂志, 2013, 32(9): 100-103.
33 Web of Science. Citation network[EB/OL]. [2025-11-27]. https://webofscience.clarivate.cn/wos/woscc/full-record/WOS:A1990DX15600001.
34 刘盛博, 王博, 唐德龙, 等. 基于引用内容的论文影响力研究——以诺贝尔奖获得者论文为例[J]. 图书情报工作, 2015, 59(24): 109-114.
35 Al-Jamimi H A, BinMakhashen G M, Bornmann L. Use of bibliometrics for research evaluation in emerging markets economies: a review and discussion of bibliometric indicators[J]. Scientometrics, 2022, 127(10): 5879-5930.
36 Gou Z Y, Meng F, Chinchilla-Rodríguez Z, et al. Encoding the citation life-cycle: the operationalization of a literature-aging conceptual model[J]. Scientometrics, 2022, 127(8): 5027-5052.
37 Marres N, de Rijcke S. From indicators to indicating interdisciplinarity: a participatory mapping methodology for research communities in-the-making[J]. Quantitative Science Studies, 2020, 1(3): 1041-1055.
38 Small H. Co-citation in the scientific literature: a new measure of the relationship between two documents[J]. Journal of the American Society for Information Science, 1973, 24(4): 265-269.
39 Scientometrics[EB/OL]. [2025-11-27]. https://link.springer.com/journal/11192/articles.
40 Hummon N P, Doreian P. Connectivity in a citation network: the development of DNA theory[J]. Social Networks, 1989, 11(1): 39-63.
41 Liu J S, Lu L Y Y, Ho M H. A few notes on main path analysis[J]. Scientometrics, 2019, 119(1): 379-391.
42 Yu D J, Yan Z P. Main path analysis considering citation structure and content: case studies in different domains[J]. Journal of Informetrics, 2023, 17(1): 101381.
43 Van Raan A F J. Comments on Henry Small, recipient of the 1987 Derek de Solla Price award[J]. Scientometrics, 1988, 14(5): 361-363.
44 OECD. Making open science a reality[R]. Paris: OECD Publishing, 2015: 25.
45 Kwok K L. A probabilistic theory of indexing and similarity measure based on cited and citing documents[J]. Journal of the American Society for Information Science, 1985, 36(5): 342-351.
46 Li P, Liu H Y, Yu J X, et al. Fast single-pair simrank computation[C]// Proceedings of the 2010 SIAM International Conference on Data Mining. Philadelphia: Society for Industrial and Applied Mathematics, 2010: 571-582.
47 Yu W R, McCann J, Zhang C Y, et al. Scaling high-quality pairwise link-based similarity retrieval on billion-edge graphs[J]. ACM Transactions on Information Systems, 2022, 40(4): Article No.78.
48 Belter C W. A bibliometric analysis of NOAA’s office of ocean exploration and research[J]. Scientometrics, 2013, 95(2): 629-644.
49 Small H, Griffith B C. The structure of scientific literatures Ⅰ: identifying and graphing specialties[J]. Science Studies, 1974, 4(1): 17-40.
50 Griffith B C, Small H G, Stonehill J A, et al. The structure of scientific literatures Ⅱ: toward a macro- and microstructure for science[J]. Science Studies, 1974, 4(4): 339-365.
51 White H D, Griffith B C. Author cocitation: a literature measure of intellectual structure[J]. Journal of the American Society for Information Science, 1981, 32(3): 163-171.
52 Culnan M J. Mapping the intellectual structure of MIS, 1980-1985: a co-citation analysis[J]. MIS Quarterly, 1987, 11(3): 341-353.
53 Batistič S, Černe M, Vogel B. Just how multi-level is leadership research? A document co-citation analysis 1980–2013 on leadership constructs and outcomes[J]. The Leadership Quarterly, 2017, 28(1): 86-103.
54 Batistič S, van der Laken P. History, evolution and future of big data and analytics: a bibliometric analysis of its relationship to performance in organizations[J]. British Journal of Management, 2019, 30(2): 229-251.
55 Zupic I, Čater T. Bibliometric methods in management and organization[J]. Organizational Research Methods, 2015, 18(3): 429-472.
56 Cobo M J, López-Herrera A G, Herrera-Viedma E, et al. Science mapping software tools: review, analysis, and cooperative study among tools[J]. Journal of the American Society for Information Science and Technology, 2011, 62(7): 1382-1402.
57 McCain K W. The author cocitation structure of macroeconomics[J]. Scientometrics, 1983, 5(5): 277-289.
58 McCain K W. Longitudinal author cocitation mapping: the changing structure of macroeconomics[J]. Journal of the American Society for Information Science, 1984, 35(6): 351-359.
59 Zitt M, Bassecoulard E. Reassessment of co-citation methods for science indicators: effect of methods improving recall rates[J]. Scientometrics, 1996, 37(2): 223-244.
60 Schneider J W, Borlund P. Introduction to bibliometrics for construction and maintenance of thesauri: methodical considerations[J]. Journal of Documentation, 2004, 60(5): 524-549.
61 Cobo M J, López-Herrera A G, Herrera-Viedma E, et al. SciMAT: a new science mapping analysis software tool[J]. Journal of the American Society for Information Science and Technology, 2012, 63(8): 1609-1630.