Big Self-Supervised Models are Strong Semi-Supervised Learners
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton arXiv 2020
PDF, SSL | By Seonghoon Yu, July 26th, 2021
Summary
This paper achieves SOTA semi-supervised performance by combining a model pre-trained with self-supervised learning (SSL) and knowledge distillation. Specifically, they show that using an SSL pre-trained model as a teacher for training a student model yields a large improvement. In addition, they observe that bigger models produce larger improvements when fewer labeled examples are available. They propose SimCLRv2, which improves upon SimCLR in three major ways.
Step 1: Self-supervised pretraining with SimCLRv2
SimCLRv2 uses a bigger model: a 152-layer ResNet that is 3x wider and uses selective kernels (SK). Instead of throwing away the projection head g() after pretraining as in SimCLR, they fine-tune from a middle layer of it. They also incorporate the memory mechanism from MoCo, which designates a memory network (a moving average of the weights) whose outputs are buffered as negative examples.
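A minimal PyTorch sketch of the encoder with the deeper, 3-layer projection head. The class name and the torchvision ResNet-50 backbone are stand-ins (the paper's ResNet-152 3x+SK is not in torchvision), and the head omits details such as batch norm:

```python
import torch.nn as nn
import torchvision

class SimCLRv2Encoder(nn.Module):
    """Backbone + 3-layer projection head g(). Names are illustrative."""
    def __init__(self, feature_dim=2048, proj_dim=128):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)  # stand-in backbone
        backbone.fc = nn.Identity()                            # drop the supervised classifier
        self.backbone = backbone
        # SimCLRv2 deepens the projection head from 2 to 3 layers.
        self.projection = nn.Sequential(
            nn.Linear(feature_dim, feature_dim), nn.ReLU(),    # block kept for fine-tuning
            nn.Linear(feature_dim, feature_dim), nn.ReLU(),
            nn.Linear(feature_dim, proj_dim),
        )

    def forward(self, x):
        h = self.backbone(x)    # representation h used for downstream tasks
        z = self.projection(h)  # z is used only in the contrastive loss during pretraining
        return h, z
```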
Step 2: Fine-tuning
Fine-tuning is a common way to adapt the task-agnostically pretrained network to a specific task. In SimCLRv2, fine-tuning starts from the first layer of the projection head rather than from the backbone output alone, as sketched below.
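A sketch of fine-tuning from a middle layer of the projection head, reusing the hypothetical SimCLRv2Encoder above: the first linear + ReLU block of g() is kept, a new task classifier is placed on top, and all parameters are updated on the labeled subset:

```python
import torch.nn as nn

class FineTunedClassifier(nn.Module):
    """Keeps the first block of g() instead of discarding the whole head."""
    def __init__(self, pretrained: "SimCLRv2Encoder", num_classes=1000):
        super().__init__()
        self.backbone = pretrained.backbone
        self.proj_block1 = pretrained.projection[:2]      # first Linear + ReLU of g()
        self.classifier = nn.Linear(2048, num_classes)    # new task-specific head

    def forward(self, x):
        h = self.backbone(x)
        return self.classifier(self.proj_block1(h))

# Fine-tune end-to-end on the labeled subset (e.g. 1% or 10% of ImageNet labels);
# a linear-eval baseline would instead freeze self.backbone and self.proj_block1.
```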
Step 3: Self-training/knowledge distillation via unlabeled examples
Use the fine-tuned model as a teacher to impute labels for training a student network.
When the number of labeled examples is significant, one can also combine the distillation loss with the ground-truth labels using a weighted combination, as in the sketch below.
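A hedged sketch of that objective: the teacher's temperature-scaled probabilities supervise the student on unlabeled images, and an alpha-weighted cross-entropy term on ground-truth labels is added when enough labels exist. Function names and the alpha/tau defaults are illustrative, not from an official implementation:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    """Cross-entropy between teacher and student distributions at temperature tau."""
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    student_log_probs = F.log_softmax(student_logits / tau, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

def combined_loss(student_logits_u, teacher_logits_u,
                  student_logits_l=None, labels_l=None, alpha=0.9, tau=1.0):
    """Weighted sum of the distillation (unlabeled) and supervised (labeled) terms."""
    loss = alpha * distillation_loss(student_logits_u, teacher_logits_u, tau)
    if student_logits_l is not None:  # only when labeled examples are plentiful
        loss = loss + (1.0 - alpha) * F.cross_entropy(student_logits_l, labels_l)
    return loss
```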
Experiment
- Top-1 accuracy of fine-tuning SimCLRv2 models
- Projection head
- The effect of distillation
What I like about the paper
- Achieves SOTA performance on semi-supervised learning by combining a pre-trained SSL model with knowledge distillation
My GitHub about what I read