[Paper Reading] X-ViT (2021), Space-time Mixing Attention for Video Transformer

AI 꿈나무 2021. 9. 21. 13:25

Space-time Mixing Attention for Video Transformer

PDF, Video, Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos, arXiv 2021

Summary

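This paper applies ViT to video. The main point of interest is how it brings the cost of self-attention down to O(TS^2), where T is the number of frames and S is the number of patches per frame, compared with O(T^2S^2) for full space-time attention. Accuracy is strong, and the model has a large advantage in FLOPs.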
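To make the complexity claim concrete, here is a small counting sketch. It only counts query-key pairs, not real FLOPs, and the function name `attention_pair_counts`, the `window` parameter, and the example sizes (8 frames, 196 patches) are assumptions chosen for illustration, not values taken from the paper.

```python
def attention_pair_counts(T, S, window=1):
    """Count query-key pairs for a clip of T frames with S patches per frame."""
    # Joint space-time attention: every token attends to every token.
    full = (T * S) ** 2
    # Spatial-only attention: each token attends only within its own frame.
    spatial_only = T * S * S
    # Local temporal window: each token attends to frames t-window..t+window
    # (clip boundaries are ignored, so this is an upper bound).
    local_window = T * S * (2 * window + 1) * S
    return {"full": full, "spatial_only": spatial_only, "local_window": local_window}


counts = attention_pair_counts(T=8, S=196)
print(counts)
# The gap between full and spatial-only attention is exactly a factor of T.
print(counts["full"] // counts["spatial_only"])
```

The O(TS^2) figure matches the spatial-only count, which is why the cost grows linearly rather than quadratically with the number of frames.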

The Method section is hard to follow on a first read. I will probably need to go through the code to understand it properly.
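For orientation, here is a toy sketch of the space-time mixing step as the paper describes it at a high level: some channels of each token's key and value are taken from neighbouring frames, and ordinary per-frame spatial attention is then applied to the mixed tokens. This is not the official implementation. The function name `mix_channels_over_time`, the `shift_channels` parameter, the one-frame window, and the zero padding at clip boundaries are all assumptions made for illustration.

```python
def mix_channels_over_time(x, shift_channels=1):
    """Toy temporal channel mixing.

    x[t][s] is a list of C channel values for patch s of frame t.
    Returns a copy in which the first `shift_channels` channels of every token
    come from frame t-1 and the last `shift_channels` channels come from frame
    t+1 (zeros where that frame does not exist). Assumes 2 * shift_channels <= C.
    """
    T, C = len(x), len(x[0][0])
    out = []
    for t in range(T):
        frame = []
        for s in range(len(x[t])):
            token = list(x[t][s])  # start from the token's own channels
            for c in range(shift_channels):
                # Leading channels are borrowed from the previous frame.
                token[c] = x[t - 1][s][c] if t > 0 else 0.0
                # Trailing channels are borrowed from the next frame.
                token[C - 1 - c] = x[t + 1][s][C - 1 - c] if t < T - 1 else 0.0
            frame.append(token)
        out.append(frame)
    return out


# Toy clip: 3 frames, 1 patch, 4 channels; every value encodes its frame index.
clip = [[[float(t)] * 4] for t in range(3)]
mixed = mix_channels_over_time(clip)
print(mixed[1][0])  # middle frame now carries channels from frames 0, 1 and 2
```

Because the mixing is only an indexing operation, the attention that follows can stay within a single frame, which is consistent with the O(TS^2) cost mentioned in the summary.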

my github

Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch

For study purposes, I review papers and reimplement them in PyTorch.

github.com
