
validation accuracy during training is different from validate.py #235


Closed
AmnonGeifman opened this issue Sep 7, 2020 · 2 comments

@AmnonGeifman

First, thanks for the great repo!
I'm training mobilenetv2-140, and it seems I get an accuracy of 76.35 during training (see the line from summary.csv below):

epoch | train_loss | eval_loss | eval_top1 | eval_top5
456 | 2.249339835984370 | 1.0110450189209000 | 76.3519998852539 | 92.94199993652340

The problem is that when I use the validate.py script I don't get the same accuracy (I get 76.0).
Do you have any idea why this gap is happening?
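
For reference, I'm running it roughly like this (the dataset and checkpoint paths below are placeholders, not my exact ones):

```bash
# hypothetical paths; point these at your own dataset and training output
python validate.py /path/to/imagenet \
    --model mobilenetv2_140 \
    --checkpoint output/train/<your_run>/model_best.pth.tar
```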

Thanks

@rwightman
Collaborator

@AmnonGeifman If you are doing distributed training without sync-bn, there can be a fairly noticeable gap between the validation accuracy logged during training and post-training validation. One way to reduce the gap is to train with one of the --dist-bn modes, which synchronize just the batch norm stats between the distributed training processes before eval. That has less overhead than using sync-bn.
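
For anyone curious what the 'reduce' mode does conceptually, here is a minimal sketch: average every BatchNorm layer's running statistics across workers before running eval. This is an illustration of the idea, not the exact timm code (the actual helper lives in timm.utils as distribute_bn); it assumes torch.distributed is already initialized and that all workers call it at the same point:

```python
import torch
import torch.distributed as dist

def reduce_bn_stats(model: torch.nn.Module, world_size: int) -> None:
    """Average BatchNorm running stats across distributed workers (sketch)."""
    for module in model.modules():
        # Matches BatchNorm1d/2d/3d via their common base class.
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            # Sum each stat across all workers, then divide by the
            # worker count to get the cross-worker average.
            dist.all_reduce(module.running_mean, op=dist.ReduceOp.SUM)
            module.running_mean /= world_size
            dist.all_reduce(module.running_var, op=dist.ReduceOp.SUM)
            module.running_var /= world_size
```

In practice you just pass the flag at train time, e.g. something like `./distributed_train.sh 4 /path/to/imagenet --model mobilenetv2_140 --dist-bn reduce` (the GPU count and dataset path are placeholders).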

@AmnonGeifman
Author

Thanks very much! That probably explains the gap.
