Analysis of the Evolutionary Trend of Technical Topics in Patents Based on LDA and HMM: Taking Marine Diesel Engine Technology as an Example
Chen Wei1, Lin Chaoran1, Li Jinqiu1, Yang Zaoli2 |
1. School of Economics and Management, Harbin Engineering University, Harbin 150001; 2. School of Economics and Management, Beijing University of Technology, Beijing 100124
|
Abstract: Identifying potential research hotspots among large numbers of patents is a crucial strategic issue for both enterprises and countries. Current patent analysis suffers from problems such as the poor reproducibility of manual classification and the failure of natural language processing tools to recognize specialized vocabulary. To address these problems, we propose a combined method. First, the Viterbi algorithm is used to identify specialized terms in patent documents. Second, the latent Dirichlet allocation (LDA) algorithm from machine learning is introduced to capture latent topic clusters in the patent documents. Third, a hidden Markov model (HMM), which is a doubly stochastic process, is used to analyze the distribution and evolution of existing technology topics and to predict future technical trends. Finally, we apply the combined method to marine diesel engine technology, analyzing its topic distribution, evolutionary pattern, and future trend. The experimental results show that the proposed method achieves better performance.
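The abstract does not specify how the Viterbi algorithm is applied to term identification. One common framing, assumed here, treats it as sequence labeling over a hidden Markov model: each token carries a hidden tag (B = begins a term, I = inside a term, O = outside any term), and Viterbi decoding recovers the most probable tag sequence, from which multi-token terms are read off. The tag set, the toy probabilities, the English tokens (standing in for the Chinese words or characters of a real patent corpus), and the helper names `viterbi` and `extract_terms` are hypothetical illustrations, not the authors' model; in practice the probabilities would be estimated from a labeled corpus or with the Baum-Welch algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy HMM. Emission rows list only the toy vocabulary;
# the remaining probability mass would belong to other words.
STATES = ["B", "I", "O"]
START_P = {"B": 0.4, "I": 0.0, "O": 0.6}  # a sentence cannot start inside a term
TRANS_P = {
    "B": {"B": 0.1, "I": 0.7, "O": 0.2},
    "I": {"B": 0.1, "I": 0.4, "O": 0.5},
    "O": {"B": 0.4, "I": 0.0, "O": 0.6},  # O -> I is forbidden
}
EMIT_P = {
    "B": {"diesel": 0.5, "engine": 0.08, "the": 0.01, "runs": 0.01},
    "I": {"diesel": 0.3, "engine": 0.6, "the": 0.01, "runs": 0.01},
    "O": {"diesel": 0.05, "engine": 0.05, "the": 0.5, "runs": 0.4},
}


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    floor: float = 1e-12,
) -> Tuple[float, List[str]]:
    """Return (log-probability, most likely hidden-state path) for an HMM."""

    def log(p: float) -> float:
        # Work in log space; floor zero probabilities to avoid log(0).
        return math.log(max(p, floor))

    # delta[t][s]: best log-probability of any path ending in state s at step t.
    delta: List[Dict[str, float]] = [{}]
    back: List[Dict[str, str]] = [{}]
    for s in states:
        delta[0][s] = log(start_p.get(s, 0.0)) + log(emit_p[s].get(observations[0], 0.0))
        back[0][s] = ""
    for t in range(1, len(observations)):
        delta.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for entering s at step t.
            prev, score = max(
                ((r, delta[t - 1][r] + log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            delta[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return delta[-1][last], path


def extract_terms(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Join each B followed by zero or more I tags into one term."""
    terms: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


if __name__ == "__main__":
    tokens = ["the", "diesel", "engine", "runs"]
    score, tags = viterbi(tokens, STATES, START_P, TRANS_P, EMIT_P)
    print(tags, extract_terms(tokens, tags))  # ['O', 'B', 'I', 'O'] ['diesel engine']
```

Decoding in log space keeps long patent sentences from underflowing; the forbidden O to I transition ensures every decoded term begins with a B tag.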
Received: 27 December 2017
|
|
|
|
|
|
|