Text Mining on the Government Work Reports of the State Council (1954-2017) and Social Transformation Research
Wei Wei, Guo Chonghui, Chen Jingfeng
Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
Abstract The government work report of the State Council is a comprehensive policy text. This paper applies text mining to carry out a multi-granularity, multi-level quantitative analysis of the government work reports from 1954 to 2017. Such an analysis is of practical value in helping relevant personnel understand how the content of each policy domain has evolved and in discovering the laws of social transformation. First, the texts are preprocessed with a Chinese word segmentation tool combined with three dictionaries that we constructed: a domain dictionary, a stop word dictionary, and a thesaurus dictionary. Next, frequent words, hot words, and new words are redefined, and a feature mining method is proposed for each. A quantitative measure of social vitality is proposed based on the new words, and the feature words, each represented as a time series, are clustered with a popular clustering method. Using the document information of the government work reports, we divide the period from 1954 to 2017 into stages and combine these stages with the clustering results to discover patterns among the feature words. Our findings show that the selected frequent words, hot words, and new words indicate, respectively, the common problems, the hot issues and their evolution patterns, and the changes in social vitality over the years. From the feature word clustering results and the stage division of the whole period, we obtain nine specific patterns of feature words.
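The abstract does not give the paper's formal definitions of frequent words, new words, or social vitality, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that a new word is one whose first occurrence in the corpus falls in a given year, that vitality can be approximated by the share of a year's distinct words that are new, and that a frequent word is one appearing in a minimum share of the reports. The function names, the `min_share` parameter, and the toy corpus are all hypothetical.

```python
from collections import Counter


def new_words_by_year(tokens_by_year):
    """Map each year to the words whose first occurrence is in that year.

    Assumption: "new" means not seen in any earlier report.
    """
    seen = set()
    result = {}
    for year in sorted(tokens_by_year):
        vocab = set(tokens_by_year[year])
        result[year] = vocab - seen
        seen |= vocab
    return result


def vitality_by_year(tokens_by_year):
    """Assumed vitality proxy: share of a year's distinct words that are new."""
    new_words = new_words_by_year(tokens_by_year)
    return {
        year: len(new_words[year]) / len(set(tokens)) if tokens else 0.0
        for year, tokens in tokens_by_year.items()
    }


def frequent_words(tokens_by_year, min_share=1.0):
    """Words that occur in at least `min_share` of the reports (assumed rule)."""
    doc_freq = Counter()
    for tokens in tokens_by_year.values():
        doc_freq.update(set(tokens))  # count each word once per report
    n_reports = len(tokens_by_year)
    return {w for w, c in doc_freq.items() if c / n_reports >= min_share}


# Toy, already-segmented corpus (hypothetical data, not taken from the reports).
reports = {
    1954: ["agriculture", "industry", "plan"],
    1978: ["agriculture", "reform", "opening"],
    2017: ["reform", "innovation", "internet", "agriculture"],
}

new_words = new_words_by_year(reports)
vitality = vitality_by_year(reports)
common = frequent_words(reports)
```

Under this definition every word in the earliest report counts as new, so the first year's value is not informative and would normally be excluded when comparing years.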
Received: 27 July 2017