@ydhongHIT looks like it is indeed wrong, but it was a simpler mistake/fix. I left off the negatives on the transpose indices; I believe it should be `transpose(-1, -2)`, i.e. swap the last two dims before the existing reshape. I'll see if this improves the training; the bottleneck transformer was not working very well compared to halo and I hadn't found the time to analyse it closely.
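A minimal sketch of what that fix would look like, assuming the shapes in the linked `bottleneck_attn.py` (where `attn_out @ v` comes out as `B, num_heads, H * W, dim_head` and `dim_out = num_heads * dim_head`); this is a reconstruction of the change described above, not the exact committed line:

```python
# Sketch, not the committed code: negative indices swap the last two dims,
# so the channel axis (num_heads * dim_head) lines up with the reshape below.
attn_out = (attn_out @ v).transpose(-1, -2).reshape(B, self.dim_out, H, W)  # B, dim_out, H, W
```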
In https://github.com/rwightman/pytorch-image-models/blob/3f9959cdd28cb959980abf81fc4cf34f32e18399/timm/models/layers/bottleneck_attn.py#L125, I think it should be `attn_out = (attn_out @ v).transpose(1, 2).reshape(B, H, W, self.dim_out).permute(0, 3, 1, 2)`.
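As a standalone sanity check of the two variants discussed in this thread (illustrative snippet, not code from the repo; the tensor and sizes are made up), the reshape/permute form above and the negative-index transpose from the reply produce the same layout, while the original `transpose(1, 2)` does not:

```python
import torch

# Hypothetical sizes, chosen only to exercise the reshapes (not from the repo)
B, num_heads, dim_head, H, W = 2, 4, 8, 7, 7
dim_out = num_heads * dim_head
x = torch.randn(B, num_heads, H * W, dim_head)  # stand-in for (attn_out @ v)

# Original line: transpose(1, 2) swaps the head and spatial axes, so the
# following reshape mixes heads and spatial positions into the channel dim.
buggy = x.transpose(1, 2).reshape(B, dim_out, H, W)

# Fix proposed in this thread: go channels-last, then permute back to NCHW.
fix_permute = x.transpose(1, 2).reshape(B, H, W, dim_out).permute(0, 3, 1, 2)

# Simpler fix from the reply: negative indices swap the last two dims instead.
fix_negative = x.transpose(-1, -2).reshape(B, dim_out, H, W)

print(torch.allclose(fix_permute, fix_negative))  # True: the two fixes agree
print(torch.allclose(buggy, fix_negative))        # False: original layout is scrambled
```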