
validation accuracy during training is different from validate.py #235


Closed
AmnonGeifman opened this issue Sep 7, 2020 · 2 comments

@AmnonGeifman

First, thanks for the great repo!
I'm training mobilenetv2-140, and it seems I get an accuracy of 76.35 during training (see the line from summary.csv below):

epoch | train_loss | eval_loss | eval_top1 | eval_top5
456 | 2.249339835984370 | 1.0110450189209000 | 76.3519998852539 | 92.94199993652340

The problem is that when I use the validate.py script I don't get the same accuracy (I get 76.0).
Do you have any idea why this gap is happening?
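
For reference, I'm running it roughly like this (the dataset and checkpoint paths below are placeholders, not my exact ones):

```bash
# hypothetical paths; point these at your own dataset and training output
python validate.py /path/to/imagenet \
    --model mobilenetv2_140 \
    --checkpoint output/train/<your_run>/model_best.pth.tar
```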

Thanks

@rwightman
Collaborator

@AmnonGeifman If you are doing distributed training without sync-bn, there can be a fairly noticeable gap between the validation accuracy logged during training and post-training validation. One way to reduce the gap is to train with one of the --dist-bn modes, which synchronize just the batch norm stats between the distributed training processes before eval. That has less overhead than using sync-bn.
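
For anyone curious what the 'reduce' mode does conceptually, here is a minimal sketch: average every BatchNorm layer's running statistics across workers before running eval. This is an illustration of the idea, not the exact timm code (the actual helper lives in timm.utils as distribute_bn); it assumes torch.distributed is already initialized and that all workers call it at the same point:

```python
import torch
import torch.distributed as dist

def reduce_bn_stats(model: torch.nn.Module, world_size: int) -> None:
    """Average BatchNorm running stats across distributed workers (sketch)."""
    for module in model.modules():
        # Matches BatchNorm1d/2d/3d via their common base class.
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            # Sum each stat across all workers, then divide by the
            # worker count to get the cross-worker average.
            dist.all_reduce(module.running_mean, op=dist.ReduceOp.SUM)
            module.running_mean /= world_size
            dist.all_reduce(module.running_var, op=dist.ReduceOp.SUM)
            module.running_var /= world_size
```

In practice you just pass the flag at train time, e.g. something like `./distributed_train.sh 4 /path/to/imagenet --model mobilenetv2_140 --dist-bn reduce` (the GPU count and dataset path are placeholders).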

@AmnonGeifman
Author

Thanks very much! That probably explains the gap.
