|
|
Research on Constructing a Model of Correlation Discrimination Between Funds and Funded Papers Based on a Siamese Network
Ye Wenhao1,2, Wang Dongbo3, Shen Si4, Su Xinning1,2 |
1. School of Information Management, Nanjing University, Nanjing 210023
2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
3. College of Information and Technology, Nanjing Agricultural University, Nanjing 210095
4. School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094
|
|
Abstract To address the mislabeling of fund projects in research papers, this study proposes a deep learning model that measures the correlation between a fund and the papers it sponsors. Taking National Social Science Fund projects and their sponsored papers as the data source, we first compute the similarity between each fund title and the title and abstract of its paper using a word2vec model. The distribution of similarity scores shows that the content of some funds diverges from that of their sponsored papers, and manual review of the low-similarity pairs confirms that some fund labels are indeed incorrect. On this basis, we build a Siamese-network correlation model between funds and their sponsored papers. The model detects papers with mislabeled fund projects with a precision of over 99%; with a Transformer encoder, its recall and F-score reach 89.13% and 94.22%, respectively. The model can help curb fund mislabeling at both the author submission and the journal review stage.
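The abstract describes the architecture only at a high level. The sketch below is an illustrative reconstruction, not the authors' released code, of the Transformer-encoder variant of the Siamese matcher: two weight-sharing encoders embed the fund title and the paper title plus abstract, and a cosine similarity between the pooled representations is thresholded to flag possibly mislabeled funds. All names and hyperparameters (SiameseTransformerMatcher, d_model=256, vocab_size=30000, mean pooling, the decision threshold) are assumptions.

```python
# Minimal sketch of a Siamese matcher with a shared Transformer encoder (PyTorch).
# Illustrative only; hyperparameters and names are assumptions, not the paper's settings.
import torch
import torch.nn as nn


class SiameseTransformerMatcher(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        self.pos = nn.Embedding(max_len, d_model)  # learned positional embedding
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=512, batch_first=True
        )
        # One encoder instance, reused for both inputs: this is the weight sharing
        # that makes the architecture "Siamese".
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def encode(self, ids):
        # ids: (batch, seq_len) token ids, with 0 used as padding
        positions = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        x = self.embed(ids) + self.pos(positions)
        pad_mask = ids.eq(0)  # True at padded positions
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        h = h.masked_fill(pad_mask.unsqueeze(-1), 0.0)
        # Mean-pool over non-padded tokens to get one vector per text
        return h.sum(dim=1) / (~pad_mask).sum(dim=1, keepdim=True).clamp(min=1)

    def forward(self, fund_ids, paper_ids):
        # Cosine similarity between the two branch representations, in [-1, 1]
        return nn.functional.cosine_similarity(
            self.encode(fund_ids), self.encode(paper_ids)
        )


# Toy usage: similarity scores below a tuned threshold would flag a possibly
# mislabeled fund for manual review.
model = SiameseTransformerMatcher(vocab_size=30000)
fund = torch.randint(1, 30000, (2, 20))     # tokenized fund titles
paper = torch.randint(1, 30000, (2, 200))   # tokenized paper title + abstract
print(model(fund, paper))
```

As a rough consistency check on the reported numbers, with recall R = 89.13% and precision P ≈ 99.9%, the F-score F = 2PR/(P + R) ≈ 94.2%, in line with the 94.22% quoted in the abstract.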
|
Received: 23 July 2019
|
|
|
|
|
|
|