Auto-Identification of Authors Affiliation Based on Class-Center Vectors
He Tao, Wang Guifang, and Ma
Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071

Abstract: When analyzing a large body of scientific and technical literature, identifying the author's affiliation is always necessary. A key step in this task is matching the author's address to the corresponding institution. Authors from the same institution often state their affiliation in different forms in English, so string-matching methods yield unsatisfactory results. In this paper, we propose a machine learning method based on “class-center vectors” that exploits the characteristics of author addresses to solve this problem. Unlike traditional methods, our method does not require matching rules to be written manually. Experimental results on author address data sets from the Chinese Academy of Sciences (CAS) illustrate the feasibility of our method.
Received: 13 August 2018
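
The abstract names the method but does not spell out its details. The following is a minimal sketch of one common reading of class-center vectors, not the authors' implementation. It assumes that each institution is represented by the mean term-frequency vector of its known addresses, and that a new address is assigned to the institution whose center is closest by cosine similarity. The function names (`tokenize`, `train_class_centers`, `identify`) and the sample addresses are hypothetical and are not taken from the paper's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(address):
    """Lowercase an address string and keep only its alphabetic tokens."""
    return re.findall(r"[a-z]+", address.lower())


def train_class_centers(labeled_addresses):
    """Build one center per institution: the mean term-frequency vector
    of all training addresses labeled with that institution."""
    sums = defaultdict(Counter)
    counts = Counter()
    for address, institution in labeled_addresses:
        sums[institution].update(tokenize(address))
        counts[institution] += 1
    return {
        inst: {term: freq / counts[inst] for term, freq in vec.items()}
        for inst, vec in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def identify(address, centers):
    """Return the institution whose class center is most similar to the address."""
    query = Counter(tokenize(address))
    return max(centers, key=lambda inst: cosine(query, centers[inst]))


# Hypothetical training pairs: (address as written by an author, institution label).
training = [
    ("Wuhan Inst Virol, Chinese Acad Sci, Wuhan, Peoples R China",
     "Wuhan Institute of Virology"),
    ("Chinese Academy of Sciences, Wuhan Institute of Virology, Wuhan, China",
     "Wuhan Institute of Virology"),
    ("Inst Hydrobiol, Chinese Acad Sci, Wuhan, Peoples R China",
     "Institute of Hydrobiology"),
    ("Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China",
     "Institute of Hydrobiology"),
]
centers = train_class_centers(training)
```

Under these assumptions, `identify("CAS, Wuhan Inst Virol, Wuhan, China", centers)` selects the virology label even though that exact string never appears in the training data, which is the property the abstract contrasts with string matching. A fuller version would likely weight terms (for example with TF*IDF) so that words shared by every address, such as "Wuhan" or "China", contribute less.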
													
																												  