
I understand that BatchNorm (Batch Normalization) normalizes the input to a layer to (mean, std) = (0, 1), and then optionally scales it (with $ \gamma $) and offsets it (with $ \beta $). BatchNorm follows this formula:

Vanilla BatchNorm (from arXiv 1502.03167):

$$ \hat{x}_i = \frac{x_i - \mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta $$

where $ \mu_\mathcal{B} $ and $ \sigma_\mathcal{B}^2 $ are the mean and variance computed over the current mini-batch $ \mathcal{B} $.
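For concreteness, the per-feature normalization can be sketched in NumPy (a minimal illustration of the formula, not code from the paper; `batch_norm` is my own name):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features); normalize each feature over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # gamma scales, beta offsets the normalized activations
    return gamma * x_hat + beta

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(y.mean(axis=0))  # ≈ 0 per feature
print(y.std(axis=0))   # ≈ 1 per feature
```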

However, when it comes to 'adaptive BatchNorm', I don't understand what the difference is. What is adaptive BatchNorm doing differently? It is described as follows:

Adaptive BatchNorm (AdaBN, from arXiv 1603.04779): each neuron's response is standardized with the mean $ \mu_j^t $ and standard deviation $ \sigma_j^t $ computed over the target domain $ t $:

$$ y_j = \gamma_j \, \frac{x_j - \mu_j^t}{\sigma_j^t} + \beta_j $$

DaveTheAl

1 Answer


I think the original batch normalization paper proposes to use the mean and standard deviation estimated on the training set at inference time. Adaptive batch normalization simply re-estimates them on the target domain (which could be the test set, or some unlabeled data from the target domain).
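A minimal NumPy sketch of that difference (illustrative code under my assumptions, not from either paper; `bn_inference` and the synthetic data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Source ("train") data, and target-domain data with a domain shift
source = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
target = rng.normal(loc=2.0, scale=3.0, size=(1000, 4))

gamma, beta, eps = np.ones(4), np.zeros(4), 1e-5

def bn_inference(x, mu, var):
    # Apply BN with fixed statistics (mu, var), as done at test time
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

# Vanilla BN at test time: statistics estimated on the source/train set
mu_s, var_s = source.mean(axis=0), source.var(axis=0)
y_vanilla = bn_inference(target, mu_s, var_s)

# Adaptive BN: re-estimate statistics on the target domain,
# while keeping the learned gamma and beta unchanged
mu_t, var_t = target.mean(axis=0), target.var(axis=0)
y_adabn = bn_inference(target, mu_t, var_t)

print(y_vanilla.mean(axis=0))  # far from 0: the domain shift leaks through
print(y_adabn.mean(axis=0))    # ≈ 0: target features are re-centered
```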

Please correct me if I am wrong.

paraba
    After having this question for a long time, I also think this is the only difference. Thanks a lot for the answer! :) – DaveTheAl Jun 13 '17 at 09:53