A Method of Event Subject Words Filtering Based on Nonlinear Programming Theory
GAO Yingfan¹, SU Na², ZHANG Yunliang¹, HAN Hongqi¹
1. Institute of Scientific and Technical Information of China, Beijing 100038; 2. Institute of Science and Development, Chinese Academy of Sciences, Beijing 100190
Abstract: This paper presents a method for filtering event subject words based on nonlinear programming theory. We identify the boundaries of subject phrases by computing their left and right adjacent entropies, which helps select more informative phrases as candidate keywords. We then count the frequency of each candidate keyword by searching the original document sets, which filters out some noise words. Finally, a function based on nonlinear programming theory is used to filter out noisy phrases. Experimental results show that the proposed method performs better than the classical TF-IDF filtering method.
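The first two steps of the abstract, boundary detection by adjacent entropy and frequency counting over the documents, can be illustrated with a minimal Python sketch. It assumes the common branching-entropy definition: the left adjacent entropy of a phrase w is H_L(w) = -Σ p(a|w) log p(a|w) over the characters a that immediately precede w, and H_R(w) is defined symmetrically for following characters. A phrase is treated as more likely to be complete when both entropies are high. The function names, the thresholds `min_entropy` and `min_freq`, and the toy documents are hypothetical, and the paper's nonlinear-programming filter is not reproduced here.

```python
import math
from collections import Counter


def _entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log(1/p) form avoids returning -0.0 for a single neighbour
    return sum((c / total) * math.log(total / c) for c in counts.values())


def boundary_stats(phrase: str, docs: list[str]) -> tuple[float, float, int]:
    """Return (left entropy, right entropy, occurrence count) of `phrase`.

    Overlapping matches are counted; an occurrence at the start or end of a
    document contributes no left or right neighbour, respectively.
    """
    left, right = Counter(), Counter()
    freq = 0
    for doc in docs:
        start = doc.find(phrase)
        while start != -1:
            freq += 1
            if start > 0:
                left[doc[start - 1]] += 1
            end = start + len(phrase)
            if end < len(doc):
                right[doc[end]] += 1
            start = doc.find(phrase, start + 1)
    return _entropy(left), _entropy(right), freq


def filter_candidates(candidates, docs, min_entropy=1.0, min_freq=2):
    """Keep candidates with varied neighbours on both sides and enough hits.

    `min_entropy` and `min_freq` are hypothetical thresholds for illustration.
    """
    kept = []
    for phrase in candidates:
        h_left, h_right, freq = boundary_stats(phrase, docs)
        if min(h_left, h_right) >= min_entropy and freq >= min_freq:
            kept.append(phrase)
    return kept


if __name__ == "__main__":
    docs = ["四川发生地震了", "昨天地震造成损失", "云南地震救援"]
    print(filter_candidates(["地震", "震", "地震造"], docs))  # ['地震']
```

In this toy corpus "地震" (earthquake) has a different neighbour on each side in every document, so both entropies equal ln 3 ≈ 1.10 and it is kept. "震" is always preceded by "地" (left entropy 0), so it is not a phrase boundary, and "地震造" occurs only once.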
Received: 15 March 2017