
[Paper Review] Deep InfoMax (2018), Learning Deep Representations by Mutual Information Estimation and Maximization

AI 꿈나무 2021. 7. 21. 19:57

Learning Deep Representations by Mutual Information Estimation and Maximization

R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, et al., arXiv 2018

 

PDF, SSL By SeonghoonYu, July 21st, 2021

 

Summary

This paper updates the model's parameters by maximizing the mutual information between intermediate feature maps and the flattened final feature vector produced by a ConvNet encoder. To do this, they use the Jensen-Shannon divergence (JSD) to estimate mutual information, and combine a global mutual information term, a local mutual information term, and a prior-matching term into the loss function.

 

 

We maximize the mutual information between the MxM feature map and the global feature vector using JSD.

 

(1) Use JSD to estimate mutual information

 

T is the discriminator, x is the m x m intermediate feature map, E is the encoder, sp is the softplus function, and x' is an m x m intermediate feature map from a different image in the batch.
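With these symbols, the JSD-based lower bound on mutual information that they maximize can be written roughly as follows (my reconstruction, so the notation may differ slightly from the paper):

\hat{I}^{(\mathrm{JSD})}(X; E(X)) := \mathbb{E}_{\mathbb{P}}\big[-\mathrm{sp}\big(-T(x, E(x))\big)\big] - \mathbb{E}_{\mathbb{P} \times \tilde{\mathbb{P}}}\big[\mathrm{sp}\big(T(x', E(x))\big)\big]

where \mathrm{sp}(z) = \log(1 + e^{z}).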

 

I can implement the JSD objective as the following PyTorch pseudo-code.

 

import torch.nn.functional as F

Ej = -F.softplus(-self.discriminator(Y, M)).mean()     # joint pairs: Y with its own feature map M
Em = F.softplus(self.discriminator(Y, M_fake)).mean()  # marginal pairs: Y with feature maps from other images
loss = Em - Ej  # minimizing this maximizes the JSD lower bound on MI (Ej - Em)

 

(2) Deep InfoMax with global MI

 

Pass both the high-level global feature vector (Y) and the lower-level MxM feature map through a discriminator to get a score. An MxM feature map taken from a different image in the batch is the fake sample, and it is used when estimating the global mutual information, as in the sketch below.
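A minimal PyTorch sketch of such a global discriminator (the class name and the feature dimensions here are my own assumptions, not values from the paper): flatten the MxM feature map, concatenate it with Y, and score the pair with a small MLP.

import torch
import torch.nn as nn

class GlobalDiscriminator(nn.Module):
    # Hypothetical dimensions: a 128-channel 8x8 feature map and a 64-dim global vector.
    def __init__(self, feat_channels=128, map_size=8, global_dim=64):
        super().__init__()
        in_dim = feat_channels * map_size * map_size + global_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, Y, M):
        h = torch.cat((Y, M.flatten(start_dim=1)), dim=1)  # (B, global_dim + C*M*M)
        return self.net(h)                                  # (B, 1): one score per (Y, M) pair

Feeding (Y, M) from the same image gives the real score and (Y, M_fake) from a different image gives the fake score, which plug into the Ej/Em loss above.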

 

 

(3) Deep InfoMax with local MI

 

They summarize the local MxM feature map into a global feature vector using a 1x1 convolution or a fully-connected layer.

 

 

Then, they maximize the average MI between the global feature and the local features at every location of the MxM map, using a discriminator like the sketch below.
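A minimal PyTorch sketch of a per-location local discriminator in the concat-and-convolve style (the class name and dimensions are my own assumptions): the global vector Y is broadcast to every spatial position of the MxM map, and each (local feature, Y) pair is scored with 1x1 convolutions.

import torch
import torch.nn as nn

class LocalDiscriminator(nn.Module):
    # Hypothetical dimensions: a 128-channel feature map and a 64-dim global vector.
    def __init__(self, feat_channels=128, global_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_channels + global_dim, 512, kernel_size=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=1), nn.ReLU(),
            nn.Conv2d(512, 1, kernel_size=1),
        )

    def forward(self, Y, M):
        _, _, H, W = M.shape
        Y_map = Y.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, H, W)  # broadcast Y over all MxM locations
        return self.net(torch.cat((Y_map, M), dim=1))               # (B, 1, M, M): one score per location

The Ej/Em JSD loss from (1) is then averaged over all MxM location scores.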

 

(4) Matching representations to a prior distribution

They train a discriminator to tell samples drawn from a prior distribution (treated as 'real') apart from representations produced by the encoder (treated as 'fake'). The discriminator and the encoder are updated adversarially with the following divergence, so that the encoder's output distribution matches the prior.
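Concretely, writing D_\phi for the prior discriminator, \mathbb{V} for the prior distribution and \mathbb{U}_{\psi,\mathbb{P}} for the distribution of encoder outputs, the divergence has roughly the standard GAN form (my reconstruction):

\hat{\mathcal{D}}_\phi(\mathbb{V} \,\|\, \mathbb{U}_{\psi,\mathbb{P}}) = \mathbb{E}_{\mathbb{V}}\big[\log D_\phi(y)\big] + \mathbb{E}_{\mathbb{P}}\big[\log\big(1 - D_\phi(E_\psi(x))\big)\big]

and the encoder is trained adversarially via \arg\min_\psi \arg\max_\phi \hat{\mathcal{D}}_\phi(\mathbb{V} \,\|\, \mathbb{U}_{\psi,\mathbb{P}}), which pushes the representation toward the prior.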

 

 

(5) Total Loss

 

All three objectives (global and local MI maximization, and prior matching) can be used together.
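With weights \alpha, \beta, \gamma on the global MI, local MI and prior-matching terms, the complete objective is roughly (my reconstruction of the paper's final objective):

\arg\max_{\omega_1, \omega_2, \psi} \Big( \alpha\, \hat{I}_{\omega_1,\psi}\big(X; E_\psi(X)\big) + \frac{\beta}{M^2} \sum_{i=1}^{M^2} \hat{I}_{\omega_2,\psi}\big(X^{(i)}; E_\psi(X)\big) \Big) + \arg\min_\psi \arg\max_\phi\, \gamma\, \hat{\mathcal{D}}_\phi\big(\mathbb{V} \,\|\, \mathbb{U}_{\psi,\mathbb{P}}\big)

where X^{(i)} is the local feature at the i-th of the M^2 spatial locations; emphasizing the global or the local term corresponds to the paper's DIM(G) and DIM(L) variants.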

 

Experiment

 Comparison with other methods

 

 

 

JSD is less sensitive to batch size than the InfoNCE loss.

 

 

What I like about the paper

  • Interesting method: maximizing the mutual information between the input data and the output of the encoder

 


My GitHub about what I read

 

Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch

I review papers for study purposes and re-implement them in PyTorch. Contribute to Seonghoon-Yu/Paper_Review_and_Implementation_in_PyTorch development by creating an account on GitHub.

github.com

 
