STM: SpatioTemporal and Motion Encoding for Action Recognition
Boyuan Jiang, MengMeng Wang, Weihao Gan, arXiv 2019
PDF, Video By SeonghoonYu August 3rd, 2021
Summary
STM consists of the Channel-wise SpatioTemporal Module (CSTM) and the Channel-wise Motion Module (CMM). CSTM encodes spatiotemporal features from different timestamps, and CMM encodes motion features between neighboring frames. An STM block assembles the two modules so that the information encoded by each is combined.
STM blocks can be inserted into existing ResNet architectures by replacing the original residual blocks, which forms the STM network.
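To make the summary concrete, here is a rough PyTorch sketch of how I picture an STM block. It is not the authors' code. The class names, kernel sizes, the `reduction` and `bottleneck` ratios, the zero padding of the last frame, and the choice to sum the two branches are all my assumptions from reading the paper.

```python
import torch
import torch.nn as nn


class CSTM(nn.Module):
    """Sketch of CSTM: channel-wise temporal 1D conv, then a 2D spatial conv."""

    def __init__(self, channels, num_frames):
        super().__init__()
        self.num_frames = num_frames
        # groups=channels makes the temporal conv channel-wise (assumed kernel size 3).
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1,
                                  groups=channels, bias=False)
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):  # x: (N*T, C, H, W)
        nt, c, h, w = x.shape
        t = self.num_frames
        n = nt // t
        # (N*T, C, H, W) -> (N, T, C, H, W) -> (N, H, W, C, T) -> (N*H*W, C, T)
        y = x.view(n, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(n * h * w, c, t)
        y = self.temporal(y)
        # Back to (N*T, C, H, W) for the spatial conv.
        y = y.view(n, h, w, c, t).permute(0, 4, 3, 1, 2).reshape(nt, c, h, w)
        return self.spatial(y)


class CMM(nn.Module):
    """Sketch of CMM: feature-level difference between neighboring frames."""

    def __init__(self, channels, num_frames, reduction=16):
        super().__init__()
        self.num_frames = num_frames
        mid = max(channels // reduction, 1)  # assumed channel reduction ratio
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.transform = nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                                   groups=mid, bias=False)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1, bias=False)

    def forward(self, x):  # x: (N*T, C, H, W)
        nt, _, h, w = x.shape
        t = self.num_frames
        n = nt // t
        r = self.reduce(x)
        k = self.transform(r).view(n, t, -1, h, w)
        r = r.view(n, t, -1, h, w)
        # Motion at step t: transformed frame t+1 minus frame t.
        diff = k[:, 1:] - r[:, :-1]
        # The last frame has no successor, so pad it with zeros (assumption).
        diff = torch.cat([diff, torch.zeros_like(r[:, :1])], dim=1)
        return self.expand(diff.reshape(nt, -1, h, w))


class STMBlock(nn.Module):
    """Sketch of a residual block whose middle conv is replaced by CSTM + CMM."""

    def __init__(self, channels, num_frames, bottleneck=4):
        super().__init__()
        mid = channels // bottleneck
        self.conv1 = nn.Sequential(nn.Conv2d(channels, mid, 1, bias=False),
                                   nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.cstm = CSTM(mid, num_frames)
        self.cmm = CMM(mid, num_frames)
        self.conv2 = nn.Sequential(nn.Conv2d(mid, channels, 1, bias=False),
                                   nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (N*T, C, H, W)
        y = self.conv1(x)
        y = self.cstm(y) + self.cmm(y)  # combine the two encodings by summation
        return self.relu(x + self.conv2(y))
```

For example, `STMBlock(channels=64, num_frames=8)` applied to a tensor of shape `(2 * 8, 64, 14, 14)` returns a tensor of the same shape, so in this sketch the block can stand in for a ResNet bottleneck block without touching the layers around it.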
Experiment
What I like about the paper
- It encodes spatiotemporal and motion features together in a unified 2D CNN.
- The architecture is simple: replacing the original residual blocks in a ResNet with STM blocks builds the STM network.
- Spatiotemporal features and motion features are both important in video, and this paper integrates the two in one implementation!! Mind-blowing.
My GitHub repository of papers I have read