25 March 2024
A novel self-learning network integrating contrastive learning, perceptual learning and masked image modelling
Yingxian Chen, Rui Yang, Rushi Lan
Proceedings Volume 13089, Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023); 1308906 (2024) https://doi.org/10.1117/12.3021579
Event: Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023), 2023, Suzhou, China
Abstract
Unsupervised learning methods in computer vision have achieved remarkable success, in some settings exceeding the performance of supervised learning methods. Current unsupervised learning methods share notable similarities, particularly in their data augmentation techniques. Masking, one such augmentation, can serve both contrastive learning and masked image modelling. This paper presents a deep learning approach to unsupervised visual learning that integrates contrastive learning, perceptual learning, self-distillation and masked image modelling. In our method, the network that processes the original images acts as the teacher, and the network that processes the masked images acts as the student. The student network uses the representations extracted by its projection head for contrastive learning, while the features generated by its decoder are used for masked image modelling. Self-knowledge distillation is carried out through perceptual learning between the teacher and student networks. The model follows the main idea of contrastive learning, which pulls similar images closer together while pushing dissimilar images further apart. At the same time, it reflects the main idea of masked image modelling, which extracts semantic information through large-scale reconstruction of masked pixels. Additionally, we compare how different self-supervised methods affect the performance of the model. Our results show that with only 75 epochs of fine-tuning, our 29M-parameter model achieves 78.5% top-1 accuracy on the ImageNet-1k dataset.
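To make the combination of objectives concrete, the following is a minimal sketch of how the three loss terms described in the abstract might be combined. It is not the authors' implementation: the abstract does not specify the loss functions, so the InfoNCE-style contrastive term, the mean squared error terms, the equal weights, and every name used here (`random_mask`, `info_nce`, `masked_mse`, `feature_mse`, `total_loss`, `temperature`, `w_con`, `w_mim`, `w_per`) are assumptions chosen for illustration. Plain Python lists stand in for network outputs.

```python
import math
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return booleans; True marks a patch hidden from the student (assumed scheme)."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(student, teacher, temperature=0.2):
    """Contrastive term (assumed InfoNCE): student[i] should match teacher[i]
    and differ from teacher[j] for j != i."""
    s = [_normalize(v) for v in student]
    t = [_normalize(v) for v in teacher]
    loss = 0.0
    for i, si in enumerate(s):
        logits = [sum(a * b for a, b in zip(si, tj)) / temperature for tj in t]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(s)


def masked_mse(pred, target, mask):
    """Masked image modelling term: reconstruction error on masked patches only."""
    errors = [
        sum((p - q) ** 2 for p, q in zip(pred[i], target[i])) / len(target[i])
        for i, hidden in enumerate(mask)
        if hidden
    ]
    return sum(errors) / len(errors) if errors else 0.0


def feature_mse(student_feats, teacher_feats):
    """Perceptual-style term (assumed MSE) between student and teacher features."""
    total = sum((a - b) ** 2 for a, b in zip(student_feats, teacher_feats))
    return total / len(teacher_feats)


def total_loss(l_con, l_mim, l_per, w_con=1.0, w_mim=1.0, w_per=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return w_con * l_con + w_mim * l_mim + w_per * l_per


if __name__ == "__main__":
    rng = random.Random(0)
    mask = random_mask(num_patches=16, mask_ratio=0.75, rng=rng)
    embeddings = [[1.0, 0.0], [0.0, 1.0]]  # toy projection-head outputs
    print(sum(mask), info_nce(embeddings, embeddings))
```

In a real training loop the teacher outputs would typically be treated as fixed targets for the student, but whether and how the paper does this is not stated in the abstract.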
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yingxian Chen, Rui Yang, and Rushi Lan "A novel self-learning network integrating contrastive learning, perceptual learning and masked image modelling", Proc. SPIE 13089, Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023), 1308906 (25 March 2024); https://doi.org/10.1117/12.3021579