
[Paper Review] DeiT (2020), Training data-efficient image transformers & distillation through attention


Training data-efficient image transformers & distillation through attention

Hugo Touvron, Matthieu Cord, Matthijs Douze, et al., arXiv 2020

 

PDF, Classification | By Seonghoon Yu, August 4th, 2021

 

Summary

 

DeiT is a model that applies knowledge distillation to ViT by adding a distillation token.

 

The probability obtained by applying the head to the class token is used for the cross-entropy loss, and the probability obtained by applying the dist_head to the distillation token is used for the KD loss, as in the sketch below.
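A minimal PyTorch sketch of this two-head setup, assuming the backbone already returns the encoded token sequence (class token at index 0, distillation token at index 1). The class and attribute names here are hypothetical, not the official implementation:

import torch.nn as nn

class DeiTHeads(nn.Module):
    # Hypothetical sketch: two independent linear heads on top of a ViT-style encoder.
    def __init__(self, embed_dim=768, num_classes=1000):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)       # class-token head -> cross-entropy loss
        self.dist_head = nn.Linear(embed_dim, num_classes)  # distillation-token head -> KD loss

    def forward(self, tokens):                      # tokens: (B, N, embed_dim) from the encoder
        cls_logits = self.head(tokens[:, 0])        # prediction from the class token
        dist_logits = self.dist_head(tokens[:, 1])  # prediction from the distillation token
        return cls_logits, dist_logits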

 

There are two types of KD loss (shown below), and hard-label distillation outperforms soft distillation.

 

Soft distillation
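The soft distillation objective from the paper, with notation reconstructed here: Z_s and Z_t are the student and teacher logits, ψ the softmax, τ the distillation temperature, λ the mixing coefficient, and KL the Kullback-Leibler divergence.

\mathcal{L}_{\text{global}} = (1-\lambda)\,\mathcal{L}_{\text{CE}}(\psi(Z_s), y) + \lambda \tau^2 \,\mathrm{KL}\big(\psi(Z_s/\tau) \,\|\, \psi(Z_t/\tau)\big)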

 

Hard-label distillation
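Hard-label distillation replaces the KL term with a cross-entropy against the teacher's hard decision y_t = \operatorname{argmax}_c Z_t(c); with the distillation token, the second term is computed from the dist_head output:

\mathcal{L}_{\text{global}}^{\text{hardDistill}} = \tfrac{1}{2}\,\mathcal{L}_{\text{CE}}(\psi(Z_s), y) + \tfrac{1}{2}\,\mathcal{L}_{\text{CE}}(\psi(Z_s), y_t)

A minimal loss sketch under the same assumptions as the head sketch above (the function name is hypothetical):

import torch.nn.functional as F

def hard_label_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    # Class-token head learns from the true labels, distillation-token head from
    # the teacher's argmax predictions, with equal 1/2 weights.
    teacher_labels = teacher_logits.argmax(dim=1)                # y_t = argmax_c Z_t(c)
    loss_true = F.cross_entropy(cls_logits, targets)             # L_CE with ground-truth labels
    loss_teacher = F.cross_entropy(dist_logits, teacher_labels)  # L_CE with teacher hard labels
    return 0.5 * loss_true + 0.5 * loss_teacher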

 

Experiment

 

 

What I like about the paper

  • Why does the distillation token work so well for the KD loss? It is surprising.

 


My GitHub repository for the papers I read

 

Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch

I review papers for study purposes and reimplement them in PyTorch. (github.com/Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch)

 
