A Study on Seal Recognition Method Based on Data Augmentation and Vision Transformer
Zhang Zhijian 1,2,3, Xia Sudi 4, Liu Zhenghao 1,2,3, Wang Wenhui 1,2,3, Chen Shuaipu 1,2,3, Huo Chaoguang 5
1. School of Information Management, Wuhan University, Wuhan 430072; 2. Big Data Institute, Wuhan University, Wuhan 430072; 3. The Center for Studies of Information Resources, Wuhan University, Wuhan 430072; 4. School of Health Economics and Management, Nanjing University of Chinese Medicine, Nanjing 210023; 5. School of Information Resource Management, Renmin University of China, Beijing 100872
Abstract Seal recognition is challenging because of difficulties in data collection and annotation, and because of image degradation. This study alleviates data scarcity through data augmentation and improves recognition in complex scenarios by using the Vision Transformer (ViT) model to extract global features. First, the contextual characteristics of seals are analyzed, and data augmentation strategies are designed on that basis to expand the training set. Seal images are then fed into the ViT model for feature extraction and recognition. We collected and annotated 1,259 seals from 16 calligraphy and painting works, such as the "Lanting Xu." After applying 11 data augmentation modules, the training set expanded to 127,159 seal images. Compared with the ResNet50 baseline, the F1 score improved by 12.17%. When the augmented data are removed, none of the models converge. However, the proposed method lacks semantic reasoning ability and cannot recognize seals absent from the training set. In scenarios with limited annotated data, combining data augmentation with the ViT model enables accurate seal image recognition.
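The dataset-expansion arithmetic in the abstract can be checked directly: 1,259 annotated seals growing to 127,159 training images is consistent with each seal contributing 101 images in total. The split between original and augmented copies per seal is an assumption inferred from the reported totals, not stated in the abstract; a minimal sketch:

```python
# Hedged sketch: verifies the dataset-expansion arithmetic reported in the
# abstract. The per-seal variant count (101 images per seal, e.g. 1 original
# plus 100 augmented copies) is inferred from 127,159 / 1,259, not stated
# in the paper.
ORIGINAL_SEALS = 1_259      # annotated seals from 16 calligraphy/painting works
VARIANTS_PER_SEAL = 101     # images per seal after augmentation (assumption)

def expanded_size(n_seals: int, variants_per_seal: int) -> int:
    """Total training images after applying the augmentation pipeline."""
    return n_seals * variants_per_seal

total = expanded_size(ORIGINAL_SEALS, VARIANTS_PER_SEAL)
print(total)  # 127159, matching the reported training-set size
```

This roughly hundredfold expansion explains the convergence result: with only the 1,259 original images, the models described here have far too few samples per class to train.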
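The ViT feature extraction mentioned above starts by splitting each seal image into fixed-size patches, each flattened into a token vector for the transformer. The sizes below follow the standard ViT-Base configuration (224x224 input, 16x16 patches); the paper's exact input resolution and patch size are not given in the abstract, so these are illustrative assumptions:

```python
# Hedged sketch of the ViT patch-tokenization step: an H x W x C image is
# divided into non-overlapping P x P patches, giving (H/P)*(W/P) tokens of
# dimension P*P*C. Sizes assume the common ViT-Base setup, not necessarily
# the configuration used in the paper.
def patchify(height: int, width: int, channels: int, patch: int) -> tuple[int, int]:
    """Return (number of patch tokens, flattened dimension of each token)."""
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    n_tokens = (height // patch) * (width // patch)
    token_dim = patch * patch * channels
    return n_tokens, token_dim

tokens, dim = patchify(224, 224, 3, 16)
print(tokens, dim)  # 196 768
```

Because every token attends to every other token, the model can relate strokes on opposite sides of a degraded seal, which is the global-feature advantage over the local receptive fields of a CNN baseline such as ResNet50.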
Received: 15 May 2023