

[Paper Review] Centroid Transformer(2021)

Centroid Transformers: Learning to Abstract with Attention Lemeng Wu, Xingchao Liu, Qiang Liu, arXiv 2021 PDF, Transformer By SeonghoonYu August 2nd, 2021 Summary The Centroid Transformer summarizes N inputs into M elements, discarding unneeded information and reducing the Transformer's computational complexity to O(MN). The M elements can be viewed as the centroids of a clustering, so the key idea is how these M centroids are selected. To select them, the similarity between the inputs x and the centroids is measured and a loss function is designed..
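A minimal NumPy sketch of the abstraction step described above: M centroid queries attend over N inputs, so the cost is O(MN) rather than the O(N²) of full self-attention (shapes and names are illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def centroid_attention(x, c):
    """Summarize N inputs into M centroids with one attention step.
    x: (N, d) inputs, c: (M, d) centroid queries.
    Cost is O(M*N*d), versus O(N*N*d) for full self-attention."""
    d = x.shape[-1]
    w = softmax(c @ x.T / np.sqrt(d))  # (M, N) centroid-to-input similarity
    return w @ x                       # (M, d) updated centroids

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 32))  # N=64 tokens
c = rng.normal(size=(8, 32))   # M=8 centroids
out = centroid_attention(x, c)
print(out.shape)  # (8, 32)
```

Each output row is a convex combination of the inputs, which matches the clustering-centroid reading in the summary.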

[Paper Review] ACT(2020), End-to-End Object Detection with Adaptive Clustering Transformer

End-to-End Object Detection with Adaptive Clustering Transformer Minghang Zheng, Peng Gao, Xiaogang Wang, Hongsheng Li, Hao Dong, arXiv 2020 PDF, Object Detection By SeonghoonYu July 31st, 2021 Summary This paper improves the computational complexity of DETR by replacing the self-attention module in DETR with ACT (adaptive clustering transformer). They also present MTKD (Multi-Task Knowledge Distillati..
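A rough sketch of the clustering idea (not the paper's actual ACT implementation): queries are grouped around a few prototypes, attention is computed only for the prototypes, and each query reuses its prototype's output:

```python
import numpy as np

def clustered_attention(q, k, v, prototype_idx):
    """Attention where each query reuses the output of its cluster prototype.
    Only P*N scores are computed (P prototypes) instead of N*N."""
    d = q.shape[-1]
    protos = q[prototype_idx]                                   # (P, d)
    # assign every query to its nearest prototype (one hard E-step)
    dist = ((q[:, None, :] - protos[None, :, :]) ** 2).sum(-1)  # (N, P)
    assign = dist.argmin(axis=1)                                # (N,)
    scores = protos @ k.T / np.sqrt(d)                          # (P, N)
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)
    proto_out = attn @ v                                        # (P, d)
    return proto_out[assign]                                    # broadcast back to all N queries

rng = np.random.default_rng(1)
q = rng.normal(size=(32, 16))
k, v = q.copy(), rng.normal(size=(32, 16))
out = clustered_attention(q, k, v, np.arange(4))
print(out.shape)  # (32, 16)
```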

[Paper Review] Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution Yunpeng Chen, Haoqi Fan, Bing Xu, Facebook AI, arXiv 2019 PDF, Video By SeonghoonYu July 31st, 2021 Summary Drop an Octave is motivated by the idea that information is conveyed at different frequencies, where higher frequencies are usually encoded with fine details and lower frequencies are usu..
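The frequency split can be sketched as follows: a high-frequency branch runs at full resolution, a low-frequency branch at half resolution, and each branch receives a resampled copy of the other. This toy version uses identity "convolutions" for brevity, so only the resolution exchange of Octave Convolution is shown:

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on a (C, H, W) feature map (H, W even)."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbor 2x upsampling on (C, H, W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def octave_mix(h, l):
    """One information-exchange step between the two branches:
    h: (C, H, W) high-frequency map, l: (C, H/2, W/2) low-frequency map."""
    h_out = h + upsample2(l)  # high <- high + upsampled low
    l_out = l + avg_pool2(h)  # low  <- low + pooled high
    return h_out, l_out

h, l = np.ones((4, 8, 8)), np.ones((4, 4, 4))
h2, l2 = octave_mix(h, l)
print(h2.shape, l2.shape)  # (4, 8, 8) (4, 4, 4)
```

Storing part of the channels at half resolution is where the spatial-redundancy savings come from.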

[Paper Review] Invariant Information Clustering for Unsupervised Image Classification and Segmentation(2018)

Invariant Information Clustering for Unsupervised Image Classification and Segmentation Xu Ji, Joao F. Henriques, Andrea Vedaldi, arXiv 2018 PDF, Clustering By SeonghoonYu July 30th, 2021 Summary This paper presents the IIC model, which achieves SOTA performance on image clustering and image segmentation by maximizing the mutual information between the original image and the transformed image from orig..
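The mutual-information objective can be sketched directly from the soft cluster assignments of the two views (a minimal version; batch handling and regularization in the paper differ):

```python
import numpy as np

def iic_loss(p1, p2, eps=1e-12):
    """Negative mutual information between soft cluster assignments.
    p1, p2: (B, K) softmax outputs for an image and its transformed view."""
    P = p1.T @ p2 / p1.shape[0]   # (K, K) empirical joint distribution
    P = (P + P.T) / 2             # symmetrize
    Pi = P.sum(axis=1, keepdims=True)  # marginals
    Pj = P.sum(axis=0, keepdims=True)
    mi = (P * (np.log(P + eps) - np.log(Pi + eps) - np.log(Pj + eps))).sum()
    return -mi                    # minimizing this maximizes MI

# perfectly consistent one-hot assignments give high MI (low loss) ...
onehot = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
# ... while uninformative uniform assignments give MI near zero
uniform = np.full((4, 2), 0.5)
print(iic_loss(onehot, onehot) < iic_loss(uniform, uniform))  # True
```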

[Paper Review] GCNet(2019), Non-local Networks Meet Squeeze-Excitation Networks and Beyond

GCNet, Non-local Networks Meet Squeeze-Excitation Networks and Beyond Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu, arXiv 2019 PDF, Video By SeonghoonYu July 27th, 2021 Summary This paper observes that the global contexts modeled by a non-local network are almost the same for different query positions within an image. They calculate the global context for only one query because calculat..
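The observation above leads to a query-independent simplification, sketched here: one shared attention map pools the whole feature map into a single context vector, which is then added to every position (weight shapes are illustrative; the GC block in the paper also adds a bottleneck transform):

```python
import numpy as np

def global_context_block(x, wk, wv):
    """Simplified GC block. x: (C, N) flattened feature map,
    wk: (C,) key weights, wv: (C, C) channel transform.
    One attention map is shared by all query positions, versus
    non-local attention which computes a map per query."""
    scores = wk @ x                        # (N,) single attention map
    scores = np.exp(scores - scores.max())
    attn = scores / scores.sum()
    context = x @ attn                     # (C,) global context vector
    return x + (wv @ context)[:, None]     # broadcast-add to every position

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 20))
out = global_context_block(x, rng.normal(size=8), rng.normal(size=(8, 8)))
print(out.shape)  # (8, 20)
```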

[Paper Review] SimCLRv2(2020), Big Self-Supervised Models are Strong Semi-Supervised Learners

Big Self-Supervised Models are Strong Semi-Supervised Learners Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton arXiv 2020 PDF, SSL By SeonghoonYu July 26th, 2021 Summary This paper achieves SOTA performance by combining a model pre-trained with self-supervised learning and knowledge distillation. Namely, they show that using the SSL pre-trained model as a teacher model fo..
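The distillation step can be sketched as a standard soft-target cross-entropy: the teacher's temperature-softened predictions on unlabeled data become the targets for the student (a generic formulation, not SimCLRv2's exact training code):

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions, averaged over the batch. Inputs: (B, K) logits."""
    def softmax_t(z):
        z = z / T
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p_teacher = softmax_t(teacher_logits)
    log_p_student = np.log(softmax_t(student_logits) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# a student matching the teacher scores no worse than a contradicting one
t = np.array([[2.0, 0.0, 0.0]])
print(distillation_loss(t, t) <= distillation_loss(-t, t))  # True
```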

[Paper Review] Set Transformer(2018), A Framework for Attention-based Permutation-Invariant Neural Networks

Set Transformer, A Framework for Attention-based Permutation-Invariant Neural Networks Juho Lee, Yoonho Lee, Jungtaek Kim arXiv 2018 PDF, Transformer By SeonghoonYu July 25th, 2021 Summary The Set Transformer is a permutation-invariant function over the elements of a set. The model consists of an encoder and a decoder, both of which rely on attention. This model lea..
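The permutation invariance can be demonstrated with the pooling idea from the paper, sketched here as single-head attention from learned seed vectors over the set (a simplified stand-in for the full multihead block):

```python
import numpy as np

def pma(x, seeds):
    """Pooling by attention, single head: k seed vectors attend over the
    set x (N, d). Reordering the rows of x leaves the output unchanged."""
    d = x.shape[-1]
    s = seeds @ x.T / np.sqrt(d)
    s = np.exp(s - s.max(axis=1, keepdims=True))
    a = s / s.sum(axis=1, keepdims=True)
    return a @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 16))
seeds = rng.normal(size=(1, 16))
out1 = pma(x, seeds)
out2 = pma(x[::-1], seeds)          # same set, reversed order
print(np.allclose(out1, out2))      # True: permutation invariant
```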

[Paper Review] Unsupervised Learning of Visual Representations using Videos(2015)

Unsupervised Learning of Visual Representations using Videos Xiaolong Wang, Abhinav Gupta, arXiv 2015 PDF, Video By SeonghoonYu July 23rd, 2021 Summary This paper uses hundreds of thousands of unlabeled videos from the web to learn visual representations. They use the first frame and the last frame of the same video as positive samples and a random frame from a different video as a negative sample. They ..
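One plausible ranking-loss formulation of the sampling scheme described above (a generic triplet loss over frame embeddings; the margin value is illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Frame embeddings from the same video (anchor, positive) should be
    closer than a frame from a different video (negative). Inputs: (B, d)."""
    d_pos = ((anchor - positive) ** 2).sum(axis=-1)
    d_neg = ((anchor - negative) ** 2).sum(axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

a = np.array([[0.0, 0.0]])      # first-frame embedding
p = a.copy()                    # last frame of the same video
n = np.array([[3.0, 4.0]])      # frame from a different video
print(triplet_loss(a, p, n))    # 0.0: positive already closer by the margin
```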

[Paper Review] TSM(2018), Temporal Shift Module for Efficient Video Understanding

TSM: Temporal Shift Module for Efficient Video Understanding Ji Lin, Chuang Gan, Song Han, arXiv 2018 PDF, Video By SeonghoonYu July 23rd, 2021 Summary This paper presents a 2D-conv-based video model with TSM (Temporal Shift Module). TSM can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. It shifts the channels along the temporal dimension both forwar..
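The shift operation itself is small enough to sketch in full: a fraction of channels is shifted one step forward in time, another fraction one step backward, and the rest stay put, with no learnable parameters (the 1/8 fold ratio mirrors the paper's default; tensor layout here is illustrative):

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift channel folds along the temporal axis. x: (T, C, H, W)."""
    C = x.shape[1]
    fold = C // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # shift backward in time
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # shift forward in time
    out[:, 2 * fold:] = x[:, 2 * fold:]              # remaining channels untouched
    return out

x = np.arange(4 * 8 * 2 * 2, dtype=float).reshape(4, 8, 2, 2)
out = temporal_shift(x)
print(out.shape)  # (4, 8, 2, 2)
```

Because the shift only moves memory, a following 2D convolution mixes information across neighboring frames for free.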

[Paper Review] BERT(2018), Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, arXiv 2018 PDF, NLP By SeonghoonYu July 22nd, 2021 Summary BERT is a multi-layer bidirectional Transformer encoder that learns word embeddings from unlabeled data. The learned embeddings are then fine-tuned with labeled data from downstre..
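The unlabeled-data pretraining relies on masked-language-model corruption, sketched here (the 15%/80%/10%/10% split follows the paper; token ids and the `mask_id` value are placeholders):

```python
import numpy as np

def mask_tokens(token_ids, mask_id, vocab_size, p=0.15, rng=None):
    """BERT-style MLM corruption: ~15% of positions become prediction targets;
    of those, 80% -> [MASK], 10% -> random token, 10% kept unchanged."""
    if rng is None:
        rng = np.random.default_rng(0)
    ids = token_ids.copy()
    labels = np.full_like(ids, -100)      # -100 = position ignored by the loss
    pick = rng.random(ids.shape) < p      # positions to predict
    labels[pick] = ids[pick]
    r = rng.random(ids.shape)
    ids[pick & (r < 0.8)] = mask_id                       # 80%: mask token
    rand = pick & (r >= 0.8) & (r < 0.9)                  # 10%: random token
    ids[rand] = rng.integers(0, vocab_size, rand.sum())
    return ids, labels                                    # 10%: left unchanged

toks = np.arange(10, 1000, 7)
ids, labels = mask_tokens(toks, mask_id=0, vocab_size=1000)
print(ids.shape == toks.shape)  # True
```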

Paper Review/NLP 2021.07.22