Research on an Automatic Method for Identifying Authors' Affiliated Institutions Based on Class-Center Vectors

doi:10.3772/j.issn.1000-0135.2019.07.006

Journal of the China Society for Scientific and Technical Information (情报学报)

2019, Vol. 38, Issue 7: 716-721


Auto-Identification of Authors' Affiliation Based on Class-Center Vectors

He Tao, Wang Guifang, and Ma

Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071


Abstract: Analyzing large volumes of scientific and technical literature requires identifying each author's affiliation. A key step in this task is matching the author's address to the corresponding institution. Authors from the same institution often write their affiliation in many different English forms, so string-matching methods yield unsatisfactory results. This paper proposes a machine learning method based on “class-center vectors” that exploits the characteristics of author addresses to solve this problem. Unlike traditional methods, it does not require matching rules to be written manually. Experimental results on author address data sets from the Chinese Academy of Sciences (CAS) illustrate the feasibility of the method.
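
The idea of a class-center classifier can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the abstract does not state the features, term weighting, or similarity measure used, so the TF-IDF weighting, the cosine similarity, the `CenterClassifier` name, the institution labels, and the sample addresses are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(address):
    """Lowercase an address string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", address.lower())


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit length; empty vectors stay empty."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class CenterClassifier:
    """Hypothetical nearest class-center classifier over TF-IDF vectors."""

    def fit(self, addresses, labels):
        docs = [tokenize(a) for a in addresses]
        n_docs = len(docs)
        doc_freq = Counter(t for d in docs for t in set(d))
        # Smoothed inverse document frequency (an assumed weighting scheme).
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        sums = defaultdict(lambda: defaultdict(float))
        for tokens, label in zip(docs, labels):
            for t, w in self._vectorize(tokens).items():
                sums[label][t] += w
        # A class center is the normalized sum of that class's vectors.
        self.centers = {label: l2_normalize(vec) for label, vec in sums.items()}
        return self

    def _vectorize(self, tokens):
        # Tokens never seen during training are ignored.
        tf = Counter(t for t in tokens if t in self.idf)
        return l2_normalize({t: c * self.idf[t] for t, c in tf.items()})

    def predict(self, address):
        vec = self._vectorize(tokenize(address))

        def cosine(center):
            # Both vectors are unit length, so the dot product is the cosine.
            return sum(w * center.get(t, 0.0) for t, w in vec.items())

        return max(self.centers, key=lambda label: cosine(self.centers[label]))


# Made-up address variants, for illustration only.
train_addresses = [
    "Wuhan Inst Virol, Chinese Acad Sci, Wuhan 430071, Peoples R China",
    "Chinese Academy of Sciences, Wuhan Institute of Virology, Wuhan, China",
    "Inst Phys, Chinese Acad Sci, Beijing 100190, Peoples R China",
    "Institute of Physics, Chinese Academy of Sciences, Beijing, China",
]
train_labels = ["WIV", "WIV", "IOP", "IOP"]

clf = CenterClassifier().fit(train_addresses, train_labels)
print(clf.predict("CAS, Wuhan Inst. of Virology, Wuhan, China"))
```

No matching rules are written by hand in this sketch: the labeled addresses alone determine each institution's center, and a new address is assigned to the institution whose center it most resembles.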

Key words: author's address; institution name; class-center vectors; machine learning

Received: 13 August 2018


Cite this article:

He Tao, Wang Guifang, and Ma. Auto-Identification of Authors' Affiliation Based on Class-Center Vectors[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(7): 716-721.

URL:

https://qbxb.istic.ac.cn/EN/10.3772/j.issn.1000-0135.2019.07.006 OR https://qbxb.istic.ac.cn/EN/Y2019/V38/I7/716

