[Paper Review] TSM(2018), Temporal Shift Module for Efficient Video Understanding


AI 꿈나무 2021. 7. 23. 20:03

TSM: Temporal Shift Module for Efficient Video Understanding

Ji Lin, Chuang Gan, Song Han, arXiv 2018

By SeonghoonYu, July 23rd, 2021

Summary

This paper presents a video model built on 2D convolutions. The authors propose the Temporal Shift Module (TSM), which can be inserted into 2D CNNs to achieve temporal modeling with zero extra computation and zero extra parameters. TSM shifts part of the channels along the temporal dimension, both forward and backward, so that each frame exchanges information with its neighboring frames.

Earlier video models applied 2D CNNs directly to individual frames, but a 2D CNN on individual frames cannot model temporal information well. 3D CNNs can jointly learn spatial and temporal features, but their computation cost is large. Several works try to trade off temporal modeling against computation.

This paper offers a new perspective on efficient temporal modeling in video understanding by proposing the Temporal Shift Module (TSM).
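To make the cost gap concrete, here is a rough back-of-the-envelope sketch in Python. The layer sizes (64 channels, a 56x56 feature map, 8 frames, kernel size 3) are made-up values for illustration and are not taken from the paper. It counts multiply-accumulate operations (MACs) for one convolution layer, assuming stride 1 and "same" padding.

```python
def conv2d_macs(c_in, c_out, k, h, w, t):
    """MACs for a k x k 2D convolution applied independently to each of t frames."""
    return t * h * w * c_in * c_out * k * k


def conv3d_macs(c_in, c_out, k, h, w, t):
    """MACs for a k x k x k 3D convolution over a t-frame clip (stride 1, same padding)."""
    return t * h * w * c_in * c_out * k * k * k


# Hypothetical layer sizes, chosen only for illustration.
macs_2d = conv2d_macs(c_in=64, c_out=64, k=3, h=56, w=56, t=8)
macs_3d = conv3d_macs(c_in=64, c_out=64, k=3, h=56, w=56, t=8)

print(macs_2d, macs_3d, macs_3d / macs_2d)
```

Under these assumptions the 3D layer costs k times as many MACs as the 2D layer (3x for a kernel size of 3), and its weight tensor grows by the same factor. A shift, by contrast, only moves data and adds no multiplications.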

(1) Partial Shift

TSM shifts only a small portion of the channels (1/4 in total, 1/8 in each direction), because this limits the latency overhead to only 3%. The partial shift strategy therefore brings down the memory movement cost.
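The partial shift can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code: the function name `temporal_shift`, the parameter `fold_div`, and the (T, C, H, W) layout are assumptions made for the example. One fold of C // fold_div channels is pulled from the next frame, a second fold from the previous frame, and the rest stay in place. Positions with no neighbor are zero-padded.

```python
import numpy as np


def temporal_shift(x, fold_div=8):
    """Bi-directional partial shift on features of shape (T, C, H, W).

    With fold_div=8, 1/8 of the channels come from the next frame and
    another 1/8 from the previous frame, so 1/4 are shifted in total.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                    # frame i receives channels from frame i+1
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]    # frame i receives channels from frame i-1
    out[:, 2 * fold:] = x[:, 2 * fold:]               # remaining channels are left untouched
    return out


# Toy input: 4 frames, 8 channels, 2x2 spatial size.
x = np.arange(4 * 8 * 2 * 2, dtype=np.float32).reshape(4, 8, 2, 2)
y = temporal_shift(x)
print(y.shape)  # (4, 8, 2, 2)
```

The operation contains no multiplications and no learnable weights, which matches the "zero computation, zero parameters" claim. Its only cost is moving data in memory.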

(2) Residual TSM

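They put the TSM inside the residual branch of a residual block. An in-place shift harms the spatial feature learning capability of the backbone model, because the information stored in the shifted channels is lost for the current frame. With the shift inside the residual branch, the original activation of the current frame is still available through the identity mapping.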
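The difference between the two placements can be illustrated with a toy block. This is a simplified NumPy sketch under stated assumptions: the residual branch is reduced to a single 1x1 convolution (channel mixing), and the names `temporal_shift`, `pointwise_conv`, `inplace_tsm_block`, and `residual_tsm_block` are hypothetical helpers, not the paper's code.

```python
import numpy as np


def temporal_shift(x, fold_div=8):
    """Bi-directional partial shift on (T, C, H, W) features, zero-padded at the ends."""
    fold = x.shape[1] // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out


def pointwise_conv(x, weight):
    """1x1 convolution as channel mixing: weight is (C_out, C_in), x is (T, C_in, H, W)."""
    return np.einsum("oc,tchw->tohw", weight, x)


def inplace_tsm_block(x, weight):
    """Shift first, then run the whole block: the identity path carries shifted features."""
    shifted = temporal_shift(x)
    return shifted + pointwise_conv(shifted, weight)


def residual_tsm_block(x, weight):
    """Shift only inside the residual branch: the identity path keeps the original x."""
    return x + pointwise_conv(temporal_shift(x), weight)


x = np.arange(1, 4 * 8 * 2 * 2 + 1, dtype=np.float32).reshape(4, 8, 2, 2)
zero_weight = np.zeros((8, 8), dtype=np.float32)  # switch the branch off to isolate the identity path

print(np.array_equal(residual_tsm_block(x, zero_weight), x))  # True
print(np.array_equal(inplace_tsm_block(x, zero_weight), x))   # False
```

With the branch switched off, the residual variant returns the input unchanged, so every channel of the current frame survives. The in-place variant has already overwritten the shifted channels before the block runs.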

(3) Online models with uni-directional TSM

Offline TSM shifts part of the channels bi-directionally, which requires features from future frames to replace features in the current frame. If we only shift features from previous frames to the current frame, we can achieve online recognition with a uni-directional TSM.

During inference, for each frame, we save the first 1/8 of the feature maps of each residual block and cache them in memory. For the next frame, we replace the first 1/8 of the current feature maps with the cached feature maps. We use the combination of 7/8 current feature maps and 1/8 old feature maps to generate the next layer, and repeat.
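The caching procedure above can be sketched as a small stateful helper. This is a minimal NumPy illustration under assumptions: the class name `OnlineShift` is hypothetical, features have shape (C, H, W) for a single frame, and the very first frame, which has no predecessor, gets zeros in the shifted channels.

```python
import numpy as np


class OnlineShift:
    """Uni-directional shift for streaming inference, one instance per residual block."""

    def __init__(self, fold_div=8):
        self.fold_div = fold_div
        self.cache = None  # first 1/fold_div of the previous frame's channels

    def __call__(self, feat):
        # feat: (C, H, W) features of the current frame
        fold = feat.shape[0] // self.fold_div
        out = feat.copy()
        if self.cache is None:
            out[:fold] = 0           # assumption: zero-fill when no earlier frame exists
        else:
            out[:fold] = self.cache  # 1/8 old features replace the current ones
        self.cache = feat[:fold].copy()  # save the current frame's slice for the next call
        return out


shift = OnlineShift()
frame0 = np.full((8, 2, 2), 1.0, dtype=np.float32)
frame1 = np.full((8, 2, 2), 2.0, dtype=np.float32)

out0 = shift(frame0)
out1 = shift(frame1)
print(out1[0, 0, 0], out1[1, 0, 0])  # 1.0 2.0
```

Only the cached slice is kept between frames, so the memory cost per block is 1/8 of one feature map and no future frame is ever needed.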

Experiment

What I like about the paper

It proposes a new approach to efficient temporal modeling in video understanding: the Temporal Shift Module (TSM).
It shifts part of the channels along the temporal dimension to exchange temporal information with neighboring frames.

My GitHub repository for the papers I read

Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch

I review papers for study purposes and re-implement them in PyTorch.

github.com
