The Study of Company Screening Method Based on Automatic Taxonomy Construction
Huang Wenbin, Bai Haodong
Department of Information Management, Peking University, Beijing 100871
Abstract: In the equity trading market, investors increasingly need a systematic and effective way to identify groups of companies engaged in specific businesses on the New Third Board (the National Equities Exchange and Quotations, NEEQ). Companies listed on this market typically have narrow business scopes, a high degree of innovation, and extensive cross-industry overlap, which makes it difficult for investors to find comparable companies with similar main businesses. This study proposes a method, based on automatic taxonomy construction, for deriving a hierarchical grouping of companies. First, a weakly supervised classification method is used to extract business terms from the main-business descriptions disclosed in annual reports. Second, the extracted terms are clustered according to their pairwise similarity to build a business taxonomy, and each company is mapped onto this taxonomy according to the terms that appear in its report. Experimental results show that the proposed method can help investors discover new business concepts in the financial market, understand the underlying connections between concepts and business models, compare companies within specific fields, and identify investment targets.
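The abstract only outlines the clustering and company-mapping steps, so the following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes each business term already has a numeric vector (the 2-D `TERM_VECTORS` and the company names in `COMPANY_TERMS` are hypothetical toy data), measures similarity with cosine similarity, and builds the taxonomy with plain average-linkage agglomerative clustering. That is a generic, swapped-in hierarchical clustering technique and not necessarily the algorithm used in this study. The weakly supervised term-extraction step is not reproduced; the per-company term lists stand in for its output. The helper names `agglomerate`, `map_companies`, and `companies_by_group` are illustrative only.

```python
import math
from itertools import combinations

# Hypothetical term vectors (assumption): in practice these might come from
# word embeddings or term co-occurrence statistics over annual reports.
TERM_VECTORS = {
    "cloud computing": (0.9, 0.1),
    "big data": (0.8, 0.2),
    "software outsourcing": (0.7, 0.3),
    "pharmaceuticals": (0.1, 0.9),
    "medical devices": (0.2, 0.9),
}

# Hypothetical stand-in for the term-extraction output: business terms
# found in each company's main-business text.
COMPANY_TERMS = {
    "Company A": ["cloud computing", "big data"],
    "Company B": ["medical devices"],
    "Company C": ["software outsourcing", "pharmaceuticals"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerate(vectors, num_groups):
    """Average-linkage agglomerative clustering of terms (hypothetical helper).

    Repeatedly merges the two clusters with the highest mean pairwise cosine
    similarity until a single root remains. Returns (groups, tree): `groups`
    is the flat partition at the moment `num_groups` clusters existed, and
    `tree` is the full merge hierarchy as nested tuples whose leaves are terms.
    Assumes `vectors` is non-empty and num_groups >= 1.
    """
    clusters = [([term], term) for term in vectors]  # (member terms, subtree)
    groups = [m for m, _ in clusters] if len(clusters) <= num_groups else None
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [cosine(vectors[a], vectors[b])
                    for a in clusters[i][0] for b in clusters[j][0]]
            score = sum(sims) / len(sims)
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        if groups is None and len(clusters) == num_groups:
            groups = [m for m, _ in clusters]
    return groups, clusters[0][1]


def map_companies(company_terms, groups):
    """Map each company to the indices of the taxonomy groups containing its terms."""
    index = {term: g for g, members in enumerate(groups) for term in members}
    return {company: sorted({index[t] for t in terms if t in index})
            for company, terms in company_terms.items()}


def companies_by_group(company_groups):
    """Invert the mapping: list the companies that fall under each group."""
    peers = {}
    for company, group_ids in company_groups.items():
        for g in group_ids:
            peers.setdefault(g, []).append(company)
    return peers


if __name__ == "__main__":
    groups, tree = agglomerate(TERM_VECTORS, num_groups=2)
    print("taxonomy:", tree)
    mapping = map_companies(COMPANY_TERMS, groups)
    for g, companies in sorted(companies_by_group(mapping).items()):
        print(groups[g], "->", companies)
```

With these toy vectors, cutting the hierarchy at two groups separates the information-technology terms from the healthcare terms, and Company C, whose terms span both, is listed as a peer in both groups. The naive pairwise search is cubic in the number of terms and is meant only to make the idea concrete; a real corpus would call for an optimized clustering library.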
Received: 28 February 2019
|
|
|
|
|
|
|