A novel self-learning network integrating contrastive learning, perceptual learning and masked image modelling

Yingxian Chen; Rui Yang; Rushi Lan

doi:10.1117/12.3021579

Published: 25 March 2024

Proceedings Volume 13089, Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023); 1308906 (2024) https://doi.org/10.1117/12.3021579
Event: Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023), 2023, Suzhou, China

Abstract

Unsupervised learning methods in computer vision have achieved remarkable success, exceeding the performance of supervised learning methods. Notably, current unsupervised methods share certain similarities, particularly in their data augmentation techniques. Masking, one such augmentation, can be used for both contrastive learning and masked image modelling. This paper presents a novel deep learning approach to unsupervised visual representation learning that integrates several existing techniques: contrastive learning, perceptual learning, self-distillation and masked image modelling. In our method, the network that processes the original images serves as the teacher network, and the network that processes the masked images serves as the student network. The student network uses the representations extracted by its projection head for contrastive learning and the features generated by its decoder for masked image modelling. Self-knowledge distillation is achieved through perceptual learning between the teacher and student networks. The model therefore follows the main idea of contrastive learning, which is to pull similar images closer together while pushing dissimilar images further apart. At the same time, it reflects the main idea of masked image modelling, which is to learn semantic information through large-scale masked pixel reconstruction. Additionally, we compare the effect of different self-supervised methods on the performance of the model. Our results show that with only 75 epochs of fine-tuning, our 29M-parameter model achieves 78.5% top-1 accuracy on the ImageNet-1k dataset.
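The abstract does not state the exact loss definitions, masking ratio or loss weights, so the following is only a minimal pure-Python sketch, under stated assumptions, of how the three training signals described above could be combined. It assumes: a random patch mask for the student input (a 75% ratio, borrowed from MAE, is an assumption); an InfoNCE contrastive loss on projection-head embeddings; an MSE reconstruction loss computed over masked patches only; and a feature-matching MSE standing in for the perceptual self-distillation term. All function names, the temperature `tau` and the equal loss weights are hypothetical and are not the authors' formulation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def random_mask(patches: List[Vector], ratio: float,
                rng: random.Random) -> Tuple[List[Vector], List[int]]:
    """Zero out a random subset of patches to form the student's input.

    Returns the masked patch list and the sorted indices of masked patches.
    """
    n_mask = round(ratio * len(patches))
    masked_idx = sorted(rng.sample(range(len(patches)), n_mask))
    hidden = set(masked_idx)
    masked = [[0.0] * len(p) if i in hidden else list(p)
              for i, p in enumerate(patches)]
    return masked, masked_idx


def info_nce(anchor: Vector, positive: Vector,
             negatives: List[Vector], tau: float = 0.2) -> float:
    """InfoNCE loss: pull anchor toward positive, push it from negatives."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


def masked_mse(pred: List[Vector], target: List[Vector],
               masked_idx: List[int]) -> float:
    """MAE-style reconstruction loss averaged over masked patches only."""
    errs = [(a - b) ** 2 for i in masked_idx
            for a, b in zip(pred[i], target[i])]
    return sum(errs) / len(errs) if errs else 0.0


def feature_mse(student: Vector, teacher: Vector) -> float:
    """Feature-matching ('perceptual') distillation loss.

    The teacher features are a fixed target here; in a real framework they
    would be detached (stop-gradient), which is an assumption of this sketch.
    """
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def total_loss(l_con: float, l_mim: float, l_perc: float,
               w: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three losses; the weights are hypothetical."""
    return w[0] * l_con + w[1] * l_mim + w[2] * l_perc


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[float(i), float(i % 3)] for i in range(8)]  # 8 toy 2-D "patches"
    student_in, idx = random_mask(image, ratio=0.75, rng=rng)
    recon = student_in  # placeholder for a decoder's output
    l_con = info_nce([0.9, 0.1], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
    l_mim = masked_mse(recon, image, idx)
    l_perc = feature_mse([0.9, 0.1], [1.0, 0.0])
    print(len(idx), total_loss(l_con, l_mim, l_perc))
```

In a real training loop each vector would be a batch of network outputs (teacher embeddings from the original image, student embeddings and decoder reconstructions from the masked image) and the sum would be backpropagated through the student only. Keeping the three terms separate with explicit weights makes it easy to ablate each self-supervised component, in the spirit of the comparison the abstract mentions.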

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Yingxian Chen, Rui Yang, and Rushi Lan "A novel self-learning network integrating contrastive learning, perceptual learning and masked image modelling", Proc. SPIE 13089, Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023), 1308906 (25 March 2024); https://doi.org/10.1117/12.3021579
