Scaling Up Vision-Language Representation Learning with Noisy Text Supervision
https://arxiv.org/abs/2102.05918

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Pre-trained representations are becoming crucial for many NLP and perception tasks. While representation learning in NLP has transitioned to training on raw text without human annotations, visual ..