An Approach to Identify Emerging Technologies Using Machine Learning: A Case Study of Robotics
Zhou Yuan¹, Liu Yufei², Xue Lan¹
1. School of Public Policy and Management, Tsinghua University, Beijing 100084; 2. The Center for Strategic Studies, Chinese Academy of Engineering, Beijing 100036
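Abstract: Using literature data to improve the reliability and validity of technology foresight research has drawn growing attention in foresight methodology, both in China and abroad. However, traditional bibliometrics cannot process data at high throughput, does not take the semantic information of documents into account, and cannot effectively embed the domain knowledge and judgment of technical experts, all of which limit its applicability and effectiveness. This paper therefore proposes a method for identifying and foreseeing emerging technologies based on machine-learning topic models. Through high-throughput, fused processing of full-sample paper and patent data for a technology field, the method mines the semantic information of papers and patents, which improves the comprehensiveness of technology identification and the consistency of its granularity. On this basis, the domain knowledge and judgment of the foresight expert panel are incorporated into the machine learning process, improving both the accuracy of the learning and its ability to identify emerging technologies. In addition, the annual citation rates of papers and patents are used as indicators to analyze the potential emergence patterns of sub-technologies within the field. Taking robotics as an example, the study extracts more than one hundred thousand records covering the whole field from the Web of Science (WoS) paper database and the Thomson Innovation (TI) patent database, identifies emerging technology clusters in robotics, and further distinguishes two patterns of emergence: disruption by entirely new technologies and emergence driven by cross-domain technology convergence. The results provide useful support for foreseeing the development trajectories of emerging technologies.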
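The abstract names annual citation rate as the indicator used to look for emergence patterns but does not give a formula. The following is a minimal sketch of one plausible reading, assuming the rate is a record's citation count divided by its age in years, averaged per topic. The function name `citations_per_year`, the record fields, and the toy topics and numbers are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def citations_per_year(records, current_year):
    """Average citations per year since publication, grouped by topic.

    Assumption (not from the paper): rate = citations / age in years,
    with age floored at 1 so same-year records do not divide by zero.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        age = max(current_year - rec["year"], 1)
        totals[rec["topic"]] += rec["citations"] / age
        counts[rec["topic"]] += 1
    return {topic: totals[topic] / counts[topic] for topic in totals}


# Hypothetical toy records; topics and counts are illustrative only.
records = [
    {"topic": "soft robotics", "year": 2015, "citations": 30},
    {"topic": "soft robotics", "year": 2016, "citations": 10},
    {"topic": "industrial arms", "year": 2000, "citations": 36},
]

rates = citations_per_year(records, current_year=2018)
# Topics ordered from highest to lowest average yearly citation rate.
ranked = sorted(rates, key=rates.get, reverse=True)
print(ranked)
```

Under this reading, a topic made up of recent records that already attract citations quickly ranks above an older topic with a similar raw citation total, which is the kind of signal an emergence analysis would look for.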
Zhou Yuan, Liu Yufei, Xue Lan. An Approach to Identify Emerging Technologies Using Machine Learning: A Case Study of Robotics[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2018, 37(9): 939-955.