Artificial Intelligence Technology: Novel Strategy for Patent Dataset Creation Based on Machine Learning
Chen Yue1, Song Kai1, Liu Anrong2, Cao Xiaoyang2
1. Institution of Science of Science and S&T Management & WISE Lab, Dalian University of Technology, Dalian 116024; 2. Chinese Academy of Engineering Innovation Strategy, Beijing 100089