[Paper Reading] ALIGN (2021), Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision


AI 꿈나무 2021. 12. 1. 13:03

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Pre-trained representations are becoming crucial for many NLP and perception tasks. While representation learning in NLP has transitioned to training on raw text without human annotations, visual and vision-language representations still rely heavily on cu

arxiv.org

Summary

To scale up model size, it is necessary to learn from raw data.

There are two ways to learn from raw data: self-supervised learning, which uses only visual data, and zero-shot learning, which uses vision-language data.

This paper is a zero-shot learning paper that uses vision-language data.

The paper builds its own dataset. Since a model can only be scaled up when there is enough data, the authors build a very large one.

The preprocessing is kept very simple. The resulting dataset is therefore very noisy but very large; in other words, it is cheap to build. Dataset quality is traded away for scale. The final dataset contains 1.8B image-text pairs.
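To make "very simple preprocessing" concrete, here is a minimal sketch of a text-side filter. The post does not spell out the actual rules, so everything here is an assumption for illustration: the function names `keep_pair` and `filter_pairs` are hypothetical, and the thresholds (3 to 20 words, at most 10 images sharing the same text) are illustrative defaults rather than values confirmed by this post.

```python
from collections import Counter


def keep_pair(alt_text, min_words=3, max_words=20):
    """Return True if the text passes a simple length check.

    The thresholds are illustrative assumptions, not values from the post.
    """
    n_words = len(alt_text.split())
    return min_words <= n_words <= max_words


def filter_pairs(pairs, max_text_freq=10):
    """Keep (image, text) pairs whose text is neither too short/long nor
    shared by more than `max_text_freq` images (likely boilerplate text)."""
    counts = Counter(text for _, text in pairs)
    return [
        (image, text)
        for image, text in pairs
        if keep_pair(text) and counts[text] <= max_text_freq
    ]


if __name__ == "__main__":
    raw = [
        ("a.jpg", "a photo of a dog"),
        ("b.jpg", "img"),  # too short, dropped
        ("c.jpg", "a photo of a dog"),
    ]
    print(filter_pairs(raw))
```

The point of a filter like this is that it needs no human labeling: it only looks at cheap statistics of the raw text, which is what lets the dataset grow to billions of pairs.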

The dataset of image-text pairs is used to train an image encoder and a text encoder. The loss function is a contrastive loss.
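As a rough illustration of what a contrastive loss over image-text pairs computes, here is a small sketch in plain Python (no deep learning framework, so it is not a training implementation). It assumes the common symmetric form: matched pairs sit on the diagonal of a similarity matrix, and a softmax cross-entropy is applied along both rows (image to text) and columns (text to image). The function names and the fixed `temperature=0.1` are assumptions for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean softmax cross-entropy where the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(image_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # sim[i][j] = cosine similarity of image i and text j, scaled by temperature
    sim = [[sum(a * b for a, b in zip(x, y)) / temperature for y in txt] for x in img]
    sim_t = [list(col) for col in zip(*sim)]  # transpose for the text-to-image term
    return cross_entropy_diag(sim) + cross_entropy_diag(sim_t)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_loss(images, aligned))   # small: matched pairs agree
    print(contrastive_loss(images, shuffled))  # large: matched pairs disagree
```

The loss is small when each image is most similar to its own text and large when it is more similar to another text in the batch, which is what pulls matched pairs together and pushes the rest apart.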

The difference from CLIP lies in how the dataset is constructed.

Below is the loss function.
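The original figure with the equation is not preserved here, so the following is a reconstruction of the loss as I understand it from the paper, not a copy of the figure. The notation is an assumption: $x_i$ and $y_j$ are the L2-normalized embeddings of the $i$-th image and the $j$-th text, $N$ is the batch size, and $\sigma$ is the temperature.

```latex
\begin{align*}
L_{i2t} &= -\frac{1}{N} \sum_{i=1}^{N} \log
  \frac{\exp\left(x_i^{\top} y_i / \sigma\right)}
       {\sum_{j=1}^{N} \exp\left(x_i^{\top} y_j / \sigma\right)} \\
L_{t2i} &= -\frac{1}{N} \sum_{i=1}^{N} \log
  \frac{\exp\left(y_i^{\top} x_i / \sigma\right)}
       {\sum_{j=1}^{N} \exp\left(y_i^{\top} x_j / \sigma\right)} \\
L &= L_{i2t} + L_{t2i}
\end{align*}
```

The first term classifies the correct text for each image and the second classifies the correct image for each text; the other pairs in the batch act as negatives.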

The image encoder is EfficientNet and the text encoder is BERT. A fully-connected projection layer is added on top of BERT so that its output matches the dimension of the image output feature. Both encoders are trained from scratch.
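The dimension-matching step can be sketched as a single linear map followed by normalization. This is a toy illustration in plain Python under stated assumptions: `project_text_feature` is a hypothetical name, the weights are hand-written rather than learned, and the 3-to-2 dimensions are only for the example.

```python
import math


def linear(x, weight, bias):
    """Apply y = W x + b, where `weight` has one row per output dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_text_feature(text_feat, weight, bias):
    """Map a text feature to the image feature dimension, then normalize it."""
    return l2_normalize(linear(text_feat, weight, bias))


if __name__ == "__main__":
    text_feat = [3.0, 4.0, 5.0]                  # toy 3-dim text feature
    weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy 2x3 projection
    bias = [0.0, 0.0]
    print(project_text_feature(text_feat, weight, bias))
```

After this step the text and image embeddings live in the same space, so their dot product can be used directly as the similarity in the contrastive loss.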

Experiment

my github

Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch

I review papers for study purposes and re-implement them in PyTorch. Contribute to Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch development by creating an account on GitHub.

github.com
