Temporal Ensembling for Semi-Supervised Learning
Samuli Laine, Timo Aila, arXiv 2016
PDF, Semi-Supervised Learning, by Seonghoon Yu, July 18th, 2021
Summary
They propose the $\Pi$-model and temporal ensembling for a semi-supervised learning setting in which only a small portion of the training data is labeled.
During training, the $\Pi$-model evaluates each training input $x_i$ twice, resulting in prediction vectors $z_i$ and $\hat{z}_i$. Because of dropout, the two evaluations give different results even under the same network parameters.
The loss function consists of two components. The first is the standard cross-entropy loss, evaluated for labeled inputs only. The second, evaluated for all inputs, penalizes different predictions for the same training input $x_i$ by taking the mean squared difference between the prediction vectors $z_i$ and $\hat{z}_i$.
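A minimal PyTorch-style sketch of this two-term loss, assuming the two stochastic forward passes differ only via dropout; the names `pi_model_loss` and `labeled_mask` are my own, and the paper additionally applies different random augmentations to the two evaluations, which is omitted here:

```python
import torch
import torch.nn.functional as F

def pi_model_loss(model, x, y, labeled_mask, w_t):
    """Pi-model loss: supervised cross-entropy on labeled inputs plus a
    consistency (MSE) penalty between two stochastic evaluations."""
    # Two forward passes of the same batch; dropout makes z and z_hat
    # differ even though the parameters are identical.
    z = model(x)
    z_hat = model(x)

    # Supervised term: cross-entropy, evaluated only where labels exist.
    sup = F.cross_entropy(z[labeled_mask], y[labeled_mask])

    # Unsupervised term: mean squared difference between the two
    # prediction vectors, evaluated for all inputs.
    unsup = F.mse_loss(F.softmax(z, dim=1), F.softmax(z_hat, dim=1))

    # w_t is the time-dependent weight on the unsupervised term.
    return sup + w_t * unsup
```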
To combine the supervised and unsupervised loss terms, the latter is scaled by a time-dependent weighting function $w(t)$. The weight $w(t)$ ramps up, starting from zero, along a Gaussian curve during the first 80 training epochs, so in the beginning the total loss is dominated by the supervised component.
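A sketch of a Gaussian ramp-up of this kind; the maximum weight `w_max` here is a placeholder (the paper scales its maximum by the fraction of labeled examples):

```python
import numpy as np

def rampup_weight(epoch, rampup_length=80, w_max=1.0):
    """Gaussian ramp-up for the unsupervised loss weight w(t):
    zero at epoch 0, rising to w_max over `rampup_length` epochs."""
    if epoch >= rampup_length:
        return w_max
    t = epoch / rampup_length                      # progress in [0, 1)
    return w_max * float(np.exp(-5.0 * (1.0 - t) ** 2))
```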
The $\Pi$-model has the problem that its training targets can be expected to be noisy, since they are obtained from a single evaluation of the network.
Temporal ensembling alleviates this problem by aggregating the predictions of multiple previous network evaluations into an ensemble prediction. It also lets the network be evaluated only once per input during training, giving an approximate 2x speedup over the $\Pi$-model.
After every training epoch, the network outputs $z_i$ are accumulated into ensemble outputs $Z_i$ via the update $Z_i \leftarrow \alpha Z_i + (1-\alpha) z_i$.
Next, $Z_i$ is divided by the factor $(1-\alpha^t)$ to form the training targets $\tilde{z}_i$. This is a bias correction for the zero initialization of $Z_i$.
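A small sketch of this accumulation step, assuming $\alpha = 0.6$ as reported in the paper's experiments; the function name and the 0-indexed `epoch` argument are my own conventions:

```python
import torch

def update_ensemble_targets(Z, z_epoch, epoch, alpha=0.6):
    """Accumulate this epoch's predictions z into the ensemble Z
    (exponential moving average), then apply startup bias correction."""
    Z = alpha * Z + (1.0 - alpha) * z_epoch        # Z_i <- a*Z_i + (1-a)*z_i
    z_tilde = Z / (1.0 - alpha ** (epoch + 1))     # correct for zero init
    return Z, z_tilde
```

The corrected targets `z_tilde` then replace the second network evaluation $\hat{z}_i$ in the unsupervised MSE term during the next epoch.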
Experiment
- They achieve SOTA performance compared to other semi-supervised learning methods on the CIFAR-10 and SVHN datasets.
- Temporal ensembling is more tolerant to incorrect labels than standard supervised learning, because the accumulated prediction vectors give the model smoother, more generalized outputs across the different classes.
What I like about the paper
- Makes the model produce more generalized prediction vectors by taking a moving average over previous predictions.
- An interesting method for exploiting unlabeled training inputs.
My GitHub about what I read