
Paper Reading/Video Recognition (17)

[Paper Review] STM(2019), Spatio Temporal and Motion Encoding for Action Recognition

STM: Spatio Temporal and Motion Encoding for Action Recognition Boyuan Jiang, MengMeng Wang, Weihao Gan, arXiv 2019 PDF, Video By SeonghoonYu August 3rd, 2021 Summary STM consists of the Channel-wise SpatioTemporal Module (CSTM) and the Channel-wise Motion Module (CMM). CSTM encodes spatiotemporal features across different timestamps, and CMM encodes motion features between neighboring frames. ..
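The excerpt only says that CMM encodes motion between neighboring frames. A minimal sketch, assuming motion is approximated by the plain difference between the feature maps of adjacent frames (the learned convolutions of the actual module are left out), could look like this; `adjacent_frame_motion` is a hypothetical name:

```python
import numpy as np

def adjacent_frame_motion(features: np.ndarray) -> np.ndarray:
    """Simplified stand-in for motion encoding between neighboring frames.

    features: array of shape (T, C, H, W), one feature map per frame.
    Returns an array of the same shape where step t holds F[t+1] - F[t];
    the last step stays zero so the temporal length is preserved.
    Hypothetical simplification: no learned convolution is applied.
    """
    motion = np.zeros_like(features)
    motion[:-1] = features[1:] - features[:-1]
    return motion

# Toy clip: 4 frames, 2 channels, 3x3 spatial maps
clip = np.arange(4 * 2 * 3 * 3, dtype=np.float32).reshape(4, 2, 3, 3)
m = adjacent_frame_motion(clip)
print(m.shape)        # (4, 2, 3, 3)
print(m[0, 0, 0, 0])  # 18.0, since consecutive frames are offset by 2*3*3 = 18
```

Because the difference is taken at the feature level rather than on raw pixels, no separate optical-flow stream is needed in this sketch.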

[Paper Review] Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution Yunpeng Chen, Haoqi Fan, Bing Xu, Facebook AI, arXiv 2019 PDF, Video By SeonghoonYu July 31st, 2021 Summary Drop an Octave is motivated by the idea that information is conveyed at different frequencies, where higher frequencies usually encode fine details and lower frequencies are usu..

[Paper Review] GCNet(2019), Non-local Networks Meet Squeeze-Excitation Networks and Beyond

GCNet, Non-local Networks Meet Squeeze-Excitation Networks and Beyond Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu, arXiv 2019 PDF, Video By SeonghoonYu July 27th, 2021 Summary This paper observes that the global contexts modeled by a non-local network are almost the same for different query positions within an image. They therefore calculate the global context for only one query because calculat..
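Since the attention map is nearly the same for every query, a single query-independent attention map can be computed once and shared. A minimal sketch of this context-modeling step, assuming a GC-block-style design (a 1x1 conv producing one logit per position, softmax over positions, weighted sum of features); `global_context` and `w_k` are illustrative names:

```python
import numpy as np

def global_context(x: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """Query-independent global context pooling (sketch).

    x:   feature map of shape (C, H, W)
    w_k: weights of a 1x1 conv producing one attention logit per position, shape (C,)
    Returns one context vector of shape (C,), shared by every query position.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)           # (C, N) with N = H*W positions
    logits = w_k @ flat                  # (N,) one score per position
    logits -= logits.max()               # numerical stability for softmax
    attn = np.exp(logits) / np.exp(logits).sum()
    return flat @ attn                   # attention-weighted sum over positions -> (C,)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w_k = rng.standard_normal(8)
ctx = global_context(x, w_k)
print(ctx.shape)  # (8,)
```

In the GC block this vector is then passed through a bottleneck transform and added to every position; that step is omitted here. With all-zero `w_k` the attention is uniform and the result reduces to global average pooling.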

[Paper Review] TSM(2018), Temporal Shift Module for Efficient Video Understanding

TSM: Temporal Shift Module for Efficient Video Understanding Ji Lin, Chuang Gan, Song Han, arXiv 2018 PDF, Video By SeonghoonYu July 23rd, 2021 Summary This paper presents a 2D-Conv-based video model built on TSM (Temporal Shift Module), which can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. TSM shifts the channels along the temporal dimension both forwar..
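The shift itself is only a memory move, which is why it adds no parameters or multiplications. A minimal sketch on a (T, C, H, W) tensor, assuming 1/8 of the channels are shifted in each direction (`fold_div=8`) and the clip ends are zero-padded; `temporal_shift` is an illustrative name:

```python
import numpy as np

def temporal_shift(x: np.ndarray, fold_div: int = 8) -> np.ndarray:
    """Shift part of the channels along time (zero-padded), no parameters.

    x: features of shape (T, C, H, W).
    The first C // fold_div channels take values from the next frame,
    the next C // fold_div channels take values from the previous frame,
    and the remaining channels are left untouched.
    """
    c = x.shape[1]
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # pull from the next frame
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # pull from the previous frame
    out[:, 2 * fold:] = x[:, 2 * fold:]              # unshifted channels
    return out

# 3 frames, 8 channels, 1x1 spatial; value = 10 * frame + channel
x = (10 * np.arange(3)[:, None] + np.arange(8)[None, :]).reshape(3, 8, 1, 1).astype(float)
y = temporal_shift(x)
print(y[:, 0, 0, 0])  # channel 0 becomes 10, 20, 0 (from the next frame)
print(y[:, 1, 0, 0])  # channel 1 becomes 0, 1, 11 (from the previous frame)
```

After the shift, an ordinary 2D convolution over the channels mixes information from neighboring frames.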

[Paper Review] SlowFast Networks for Video Recognition(2018)

SlowFast Networks for Video Recognition Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He, arXiv 2018 PDF, Video By SeonghoonYu July 20th, 2021 Summary They present a two-pathway SlowFast model for video recognition. The two pathways work separately at low and high temporal resolutions. (1) One is the Slow pathway, designed to capture semantic information that can be given by a few sparse f..
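The difference in temporal resolution comes from how each pathway samples the raw clip: the Fast pathway sees alpha times more frames than the Slow one. A minimal sketch of the sampling only, assuming a temporal stride `tau` for the Slow pathway and `tau // alpha` for the Fast pathway (the values 64, 16, and 8 below are example settings); `slowfast_indices` is an illustrative name:

```python
def slowfast_indices(num_frames: int, tau: int, alpha: int):
    """Return the raw-frame indices seen by each pathway (sketch).

    Slow pathway: one frame every `tau` frames (low temporal resolution).
    Fast pathway: one frame every `tau // alpha` frames, i.e. alpha times denser.
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha")
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, tau // alpha))
    return slow, fast

slow, fast = slowfast_indices(num_frames=64, tau=16, alpha=8)
print(slow)       # [0, 16, 32, 48]
print(len(fast))  # 32
```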

[Paper Review] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset(2017)

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset Joao Carreira, Andrew Zisserman, arXiv 2017 PDF, Video By SeonghoonYu July 17th, 2021 Summary They achieve SOTA performance in video action recognition using two methods. (1) Apply an ImageNet-pretrained 2D Conv model to a 3D Conv model for video classification by repeating the weights of the 2D filters N times along the time dimensi..
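Repeating a 2D kernel N times along time and dividing it by N makes the inflated 3D filter respond to a "boring" video (one image repeated N times) exactly as the 2D filter responds to that image. A minimal sketch of this weight inflation; `inflate_2d_kernel` is an illustrative name and the shapes are arbitrary:

```python
import numpy as np

def inflate_2d_kernel(w2d: np.ndarray, n: int) -> np.ndarray:
    """Inflate a 2D kernel (C_out, C_in, kH, kW) into 3D (C_out, C_in, n, kH, kW).

    The 2D weights are repeated n times along a new time axis and divided by n.
    """
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], n, axis=2)
    return w3d / n

rng = np.random.default_rng(0)
w2d = rng.standard_normal((4, 3, 3, 3))
img_patch = rng.standard_normal((3, 3, 3))                     # (C_in, kH, kW)
video_patch = np.repeat(img_patch[:, np.newaxis], 5, axis=1)   # same image for 5 frames
w3d = inflate_2d_kernel(w2d, 5)

resp2d = np.einsum("ocij,cij->o", w2d, img_patch)
resp3d = np.einsum("octij,ctij->o", w3d, video_patch)
print(np.allclose(resp2d, resp3d))  # True
```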

[Paper Reading] (2014) Learning Spatiotemporal Features with 3D Convolutional Networks

Hello, the paper I read today is Learning Spatiotemporal Features with 3D Convolutional Networks. One-line summary: It achieves SOTA performance on video tasks using 3D convolution and 3D pooling. Motivation: The authors aim to develop an effective video descriptor with four properties: (1) generic, (2) compact, (3) efficient, (4) simple. Contribution: (1) 3D Conv learns good features by capturing appearance and motion simultaneously. (2) They find experimentally that a 3x3x3 Conv architecture works well. (3) 4 tasks and ..
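Unlike a 2D filter applied frame by frame, a 3x3x3 kernel also spans the time axis, so it can respond to change between frames. A minimal sketch of a naive single-channel 3D convolution with stride 1 and padding 1, using a hand-picked kernel (hypothetical; C3D learns its kernels) that compares the next and previous frames:

```python
import numpy as np

def conv3d_single(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Naive single-channel 3D cross-correlation, stride 1, zero padding 1.

    x: clip of shape (T, H, W); k: kernel of shape (3, 3, 3) over (time, height, width).
    With a 3x3x3 kernel, padding 1 keeps the output shape equal to the input shape.
    """
    assert k.shape == (3, 3, 3)
    xp = np.pad(x, 1)                      # zero-pad every axis by 1
    out = np.zeros_like(x, dtype=float)
    t_len, h_len, w_len = x.shape
    for t in range(t_len):
        for i in range(h_len):
            for j in range(w_len):
                out[t, i, j] = np.sum(xp[t:t + 3, i:i + 3, j:j + 3] * k)
    return out

# Hand-picked kernel: next frame minus previous frame at the same pixel
k = np.zeros((3, 3, 3))
k[2, 1, 1], k[0, 1, 1] = 1.0, -1.0

static = np.ones((4, 5, 5))                                  # nothing changes over time
moving = np.arange(4)[:, None, None] * np.ones((4, 5, 5))    # brightness grows each frame
print(np.abs(conv3d_single(static, k)[1:3]).max())  # 0.0
print(conv3d_single(moving, k)[1:3].min())          # 2.0
```

Only interior frames are compared, since the first and last frames see zero padding along time.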
