A2 config from the article #917

mrT23 · 2021-10-17T07:11:06Z

Hi @rwightman
for a future article, i am trying to use A2 configuration as a baseline

here is the config i am using, following Table 2 from the article:

python -u -m torch.distributed.launch --nproc_per_node=8 \
    --nnodes=1 \
    --node_rank=0 \
    ./train.py \
    /data/imagenet/ \
    --amp \
    -b=256 \
    --epochs=300 \
    --drop-path=0.05 \
    --opt=lamb \
    --weight-decay=0.02 \
    --sched='cosine' \
    --lr=5e-3 \
    --warmup-epochs=5 \
    --model=resnet50 \
    --aa=rand-m7-mstd0.5-inc1 \
    -j=16 \
    --reprob=0.0 \
    --remode='pixel' \
    --mixup=0.1 \
    --cutmix=1.0 \
    --aug_repeats \
    --bce-loss

i am getting 79.6% ± 0.05%
This is very close to the article (79.8%, a 0.2% gap), but a bit lower.

Small differences like this can sometimes come even from different versions of PyTorch/NGC (mixed-precision implementation).
Still, do you see something that stands out and is missing in my config?

p.s.
my motivation for using A2: i think that improvements should be presented on a strong baseline. it is much easier to show an improvement for an under-trained resnet50 at 76%, and these kinds of improvements don't always generalize to strong training schemes.
my current improvement increased resnet50 from 79.6% to 80.1%, so that gives me a good feeling about its general applicability

Answered by rwightman


rwightman · 2021-10-17T15:47:39Z

Maintainer

@mrT23 to match the paper's A2 run more closely, add --bce-target-thresh 0.2 --aug-repeats 3 ... my LAMB impl works, but you can also use FusedLAMB from apex if you're able; most of the experiments were run with FusedLAMB for the slight throughput / memory gains on GPU

remode doesn't need to be specified if prob is 0
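For readers wondering what a BCE target threshold does in combination with mixup/cutmix, here is a minimal pure-Python sketch. It rests on the assumption that the threshold binarizes the soft (mixed) targets, so that every class whose mixed weight exceeds the threshold becomes a positive label. The helper names `mix_one_hot` and `binarize` are hypothetical and are not timm functions; this illustrates the idea, not the library's implementation.

```python
def mix_one_hot(class_a, class_b, lam, num_classes):
    """Soft target for a mixup/cutmix pair: weight lam on class_a, 1 - lam on class_b."""
    target = [0.0] * num_classes
    target[class_a] += lam
    target[class_b] += 1.0 - lam
    return target


def binarize(target, thresh):
    """Assumed thresholding behaviour: entries strictly above thresh become 1, the rest 0."""
    return [1.0 if t > thresh else 0.0 for t in target]


# A 70/30 mix of classes 0 and 2: both weights exceed 0.2, so both become positives.
soft = mix_one_hot(0, 2, lam=0.7, num_classes=4)
hard = binarize(soft, thresh=0.2)
print(soft)
print(hard)

# A 90/10 mix: the minority class falls below 0.2 and is dropped from the target.
print(binarize(mix_one_hot(0, 2, lam=0.9, num_classes=4), thresh=0.2))
```

Under this reading, a threshold of 0.2 turns each mixed sample into a multi-label target with one or two positive classes rather than a soft distribution.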

3 replies

rwightman Oct 17, 2021
Maintainer

Also, the train script that launched the sweeps on the FB infra fixes the train interpolation at bicubic, while the timm train script defaults to random selection.

rwightman Oct 17, 2021
Maintainer

Attached are the args for a reproduction run I did locally on a 4x V100 setup (./distributed_train.sh 4 -config <>) ... I actually messed up the # of warmup epochs in this one, but it was within .02 of the paper's seed 0. While the weight init across seeds was verified by us to be the same, different local versions of various libs etc. cause the aug / training randomness to be different enough that the best seeds aren't necessarily the same across different environments...

_79_83-fusedlamb-cosine-lr0.00500-wd0.020000-n0-rand-m7-mstd0.5-inc1-m0.1-sd0.1-d0.0-ls0.0-301-299-resnet50-args.yaml.txt

mrT23 Oct 18, 2021
Author

thanks, i think i have all the info I need

(one more difference is the label smoothing, the default is 0.1 so I need to set it to 0)
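The adjustments mentioned in this thread can be collected into one place. The sketch below only assembles and prints a launch command; it does not start training. The flag spellings `--train-interpolation` and `--smoothing` are not quoted in the thread and are assumed from timm's train.py, and the helper `build_a2_command` is hypothetical. Treat the result as a starting point to check against the attached args file, not as the paper's exact configuration.

```python
import shlex


def build_a2_command(data_dir="/data/imagenet/", gpus=8):
    """Assemble the original command plus the changes suggested in the thread."""
    launcher = [
        "python", "-u", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", "--nnodes=1", "--node_rank=0",
        "./train.py", data_dir,
    ]
    # Settings carried over unchanged from the original post.
    original = [
        "--amp", "-b=256", "--epochs=300", "--drop-path=0.05",
        "--opt=lamb", "--weight-decay=0.02", "--sched=cosine", "--lr=5e-3",
        "--warmup-epochs=5", "--model=resnet50", "--aa=rand-m7-mstd0.5-inc1",
        "-j=16", "--reprob=0.0", "--mixup=0.1", "--cutmix=1.0", "--bce-loss",
    ]
    # Changes raised in the replies; --remode is dropped because reprob is 0.
    adjustments = [
        "--bce-target-thresh=0.2",         # suggested by the maintainer
        "--aug-repeats=3",                 # suggested by the maintainer
        "--train-interpolation=bicubic",   # assumed flag name; sweeps fixed bicubic
        "--smoothing=0.0",                 # assumed flag name; label smoothing off
    ]
    return launcher + original + adjustments


print(shlex.join(build_a2_command()))
```

Swapping `--opt=lamb` for the apex fused variant is left out here because the exact optimizer string is not given in the thread.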

EthanChen1234 · 2022-01-24T14:25:06Z

Hi, @mrT23

You improved on the A2 procedure, from 79.6% to 80.1%?
That's cool.

I hope you can share some configs or tricks to reproduce the result.

1 reply

mrT23 Jan 24, 2022
Author

i used the ml-decoder classification head instead of the regular linear head:
https://github.com/Alibaba-MIIL/ML_Decoder
