Patent Keyword Extraction Driven by Claim Features
Yu Yan1,2, Shang Mingjie1, Zhao Naixuan1
1. Institute of the Information Management and Technology, Nanjing Tech University, Nanjing 210009
2. School of Electronics and Computer, Chengxian College, Southeast University, Nanjing 211816
Abstract Current patent keyword extraction relies primarily on general-purpose text keyword extraction methods and does not take patent-specific features into account. This paper therefore proposes a patent keyword extraction method driven by the features of patent claims. The method selects candidate keywords based on the longest common substring, removes redundant candidates based on the information gain ratio, and integrates a specificity degree to weight the remaining candidates. Experiments on real patent data demonstrate the effectiveness and feasibility of the proposed method.
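As a rough illustration of the first step only, the sketch below finds the longest common substring of two strings by dynamic programming and collects pairwise results from a list of claims as candidate keywords. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `longest_common_substring` and `candidate_keywords`, the pairwise comparison of claims, the character-level matching, and the `min_len` threshold are all hypothetical choices made for the example.

```python
from typing import List, Set


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b.

    On ties, the match that ends earliest in `a` is returned.
    """
    best_len = 0
    best_end = 0  # exclusive end index of the best match in `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def candidate_keywords(claims: List[str], min_len: int = 2) -> Set[str]:
    """Collect the longest common substring of every pair of claims.

    Assumption: candidates come from pairwise comparison of claims;
    `min_len` is a hypothetical filter for trivially short matches.
    """
    candidates: Set[str] = set()
    for i in range(len(claims)):
        for j in range(i + 1, len(claims)):
            s = longest_common_substring(claims[i], claims[j]).strip()
            if len(s) >= min_len:
                candidates.add(s)
    return candidates


if __name__ == "__main__":
    print(longest_common_substring("abcdef", "zcdez"))  # prints "cde"
    print(candidate_keywords([
        "a wireless charging coil",
        "the wireless charging pad",
    ]))
```

Character-level matching of this kind can return short, meaningless fragments, which is consistent with the abstract's later steps of removing redundant candidates and weighting the rest; those steps are not sketched here.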
Received: 20 April 2020