A Study on Seal Recognition Method Based on Data Augmentation and Vision Transformer

doi:10.3772/j.issn.1000-0135.2024.03.007

Journal of the China Society for Scientific and Technical Information (情报学报)

2024, Vol. 43, Issue (3): 327-338

Section: Information Technology and Applications


Zhang Zhijian^1,2,3, Xia Sudi^4, Liu Zhenghao^1,2,3, Wang Wenhui^1,2,3, Chen Shuaipu^1,2,3, Huo Chaoguang^5

1. School of Information Management, Wuhan University, Wuhan 430072
2. Big Data Institute, Wuhan University, Wuhan 430072
3. The Center for Studies of Information Resources, Wuhan University, Wuhan 430072
4. School of Health Economics and Management, Nanjing University of Chinese Medicine, Nanjing 210023
5. School of Information Resource Management, Renmin University of China, Beijing 100872


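Abstract: Seal recognition is difficult because seal images are hard to collect and annotate and are often degraded. Data augmentation can alleviate the shortage of data, and combining it with the ViT (vision transformer) model, which extracts global features of a seal, can improve seal recognition in complex contexts. The method first analyzes the characteristics of the contexts in which seals appear and formulates data augmentation strategies from that analysis to expand the training set; the seal images are then fed into the ViT model for feature extraction and seal recognition. This study collected and annotated 1,259 seals from 16 works of calligraphy and painting, including the Lanting Xu (《兰亭序》); after processing by 11 data augmentation modules, the training set contained 127,159 seal images. Compared with the baseline ResNet50 model, the ViT model raised the F1 score by 12.17 percentage points; when the augmented data were removed, none of the models converged. With little annotated data, data augmentation combined with the ViT model can therefore recognize seal images accurately. The method still lacks semantic reasoning ability and cannot recognize seals that do not appear in the training set.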
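The augmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the 11 augmentation modules, so the three degradations below (ink fading, noise, occlusion) and all function names are illustrative assumptions rather than the authors' implementation. The sketch only shows how a small labeled set can be expanded with context-motivated variants while keeping each label.

```python
import random

Image = list[list[int]]  # grayscale image as rows of 0-255 pixel values


def fade(img: Image, factor: float) -> Image:
    """Hypothetical module: simulate ink fading by pulling pixels toward white."""
    return [[round(p + (255 - p) * factor) for p in row] for row in img]


def add_noise(img: Image, rng: random.Random, sigma: float) -> Image:
    """Hypothetical module: Gaussian noise, clipped to the 0-255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def erase_patch(img: Image, rng: random.Random, size: int) -> Image:
    """Hypothetical module: occlude a random size x size square with white."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    out = [row[:] for row in img]  # copy so the original image is not mutated
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = 255
    return out


def augment_dataset(samples, copies_per_image, seed=0):
    """Return the originals plus `copies_per_image` degraded variants of each."""
    rng = random.Random(seed)
    out = list(samples)
    for img, label in samples:
        patch = max(1, min(len(img), len(img[0])) // 4)
        for _ in range(copies_per_image):
            aug = fade(img, rng.uniform(0.0, 0.5))
            aug = add_noise(aug, rng, sigma=10.0)
            aug = erase_patch(aug, rng, size=patch)
            out.append((aug, label))  # the variant keeps the source label
    return out
```

For example, `augment_dataset(samples, copies_per_image=100)` turns each annotated seal into 101 training images. That is roughly the expansion ratio the abstract reports (1,259 annotated seals to 127,159 training images), although the paper's actual per-module counts are not given here.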


Keywords: seal recognition, deep learning, data augmentation, digital humanities

Received: 2023-05-15

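Funding: National Social Science Fund of China, special research program "Accelerating the Construction of the Disciplinary, Academic, and Discourse Systems of Philosophy and Social Sciences with Chinese Characteristics," project "Research on Basic Theoretical Issues of Library and Information Science with Chinese Characteristics in the New Era" (19VXK09).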

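About the authors: Zhang Zhijian, male, born in 1994, PhD candidate; main research interests: digital humanities and knowledge organization. Xia Sudi, male, born in 1995, PhD, lecturer; main research interests: information services and users, informetrics. Liu Zhenghao, male, born in 1997, PhD candidate; main research interests: knowledge organization and knowledge services. Wang Wenhui, female, born in 2001, master's student; main research interest: digital humanities. Chen Shuaipu, male, born in 1998, PhD candidate; main research interests: knowledge organization and academic text mining. Huo Chaoguang (corresponding author), male, born in 1990, PhD, associate professor; main research interests: policy informatics and text mining; E-mail: huochaoguang@126.com.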

Cite this article:

Zhang Zhijian, Xia Sudi, Liu Zhenghao, Wang Wenhui, Chen Shuaipu, Huo Chaoguang. A Study on Seal Recognition Method Based on Data Augmentation and Vision Transformer[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2024, 43(3): 327-338.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2024.03.007 or https://qbxb.istic.ac.cn/CN/Y2024/V43/I3/327
