Auto-Identification of Authors' Affiliations Based on Class-Center Vectors
He Tao, Wang Guifang, and Ma
Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071
Abstract When analyzing a large body of scientific and technical literature, it is often necessary to identify each author's affiliation. A key step in this task is matching the author's address to the corresponding institution. Authors from the same institution often write their affiliation in English in many different forms, so string-matching methods yield unsatisfactory results. In this paper, we propose a machine learning method based on “class-center vectors” that exploits the characteristics of author addresses to solve this problem. Unlike traditional methods, our method does not require matching rules to be written manually. Experimental results on author address data sets from the Chinese Academy of Sciences (CAS) demonstrate the feasibility of the method.
Received: 13 August 2018