
[Paper Review] DeiT (2020), Training data-efficient image transformers & distillation through attention


Training data-efficient image transformers & distillation through attention

Hugo Touvron, Matthieu Cord, Matthijs Douze, et al., arXiv 2020

 

PDF, Classification | By Seonghoon Yu, August 4th, 2021

 

Summary

 

DeiT is a model that applies knowledge distillation to ViT by adding a distillation token.

 

The probability obtained by applying the head to the class token is used for the cross-entropy loss, and the probability obtained by applying the dist_head to the distillation token is used for the KD loss, as in the sketch below.
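A minimal PyTorch sketch of this two-head setup, assuming the backbone already returns the encoded token sequence (class token at index 0, distillation token at index 1). The class and attribute names here are hypothetical, not the official implementation:

import torch.nn as nn

class DeiTHeads(nn.Module):
    # Hypothetical sketch: two independent linear heads on top of a ViT-style encoder.
    def __init__(self, embed_dim=768, num_classes=1000):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)       # class-token head -> cross-entropy loss
        self.dist_head = nn.Linear(embed_dim, num_classes)  # distillation-token head -> KD loss

    def forward(self, tokens):                      # tokens: (B, N, embed_dim) from the encoder
        cls_logits = self.head(tokens[:, 0])        # prediction from the class token
        dist_logits = self.dist_head(tokens[:, 1])  # prediction from the distillation token
        return cls_logits, dist_logits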

 

There are two types of KD loss (shown below), and hard-label distillation outperforms soft distillation.

 

Soft distillation
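The soft distillation objective from the paper, with notation reconstructed here: Z_s and Z_t are the student and teacher logits, ψ the softmax, τ the distillation temperature, λ the mixing coefficient, and KL the Kullback-Leibler divergence.

\mathcal{L}_{\text{global}} = (1-\lambda)\,\mathcal{L}_{\text{CE}}(\psi(Z_s), y) + \lambda \tau^2 \,\mathrm{KL}\big(\psi(Z_s/\tau) \,\|\, \psi(Z_t/\tau)\big)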

 

Hard-label distillation
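Hard-label distillation replaces the KL term with a cross-entropy against the teacher's hard decision y_t = \operatorname{argmax}_c Z_t(c); with the distillation token, the second term is computed from the dist_head output:

\mathcal{L}_{\text{global}}^{\text{hardDistill}} = \tfrac{1}{2}\,\mathcal{L}_{\text{CE}}(\psi(Z_s), y) + \tfrac{1}{2}\,\mathcal{L}_{\text{CE}}(\psi(Z_s), y_t)

A minimal loss sketch under the same assumptions as the head sketch above (the function name is hypothetical):

import torch.nn.functional as F

def hard_label_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    # Class-token head learns from the true labels, distillation-token head from
    # the teacher's argmax predictions, with equal 1/2 weights.
    teacher_labels = teacher_logits.argmax(dim=1)                # y_t = argmax_c Z_t(c)
    loss_true = F.cross_entropy(cls_logits, targets)             # L_CE with ground-truth labels
    loss_teacher = F.cross_entropy(dist_logits, teacher_labels)  # L_CE with teacher hard labels
    return 0.5 * loss_true + 0.5 * loss_teacher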

 

Experiment

 

 

What I like about the paper

  • Why does the distillation token work so well for the KD loss? It is surprising.

 


My GitHub repository for the papers I read

 

Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch

I review papers for study purposes and reimplement them in PyTorch. (github.com/Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch)

 
