@ydhongHIT looks like it is indeed wrong, but it was a simpler mistake/fix. I left off the negatives on the transpose indices; I believe it should be `transpose(-1, -2)`, i.e. swap the last two dims before the existing reshape. I'll see if this improves the training; the bottleneck transformer was not working very well compared to halo and I hadn't found the time to analyse it closely.
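A minimal sketch of what that fix would look like, assuming the shapes in the linked `bottleneck_attn.py` (where `attn_out @ v` comes out as `B, num_heads, H * W, dim_head` and `dim_out = num_heads * dim_head`); this is a reconstruction of the change described above, not the exact committed line:

```python
# Sketch, not the committed code: negative indices swap the last two dims,
# so the channel axis (num_heads * dim_head) lines up with the reshape below.
attn_out = (attn_out @ v).transpose(-1, -2).reshape(B, self.dim_out, H, W)  # B, dim_out, H, W
```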
In https://github.com/rwightman/pytorch-image-models/blob/3f9959cdd28cb959980abf81fc4cf34f32e18399/timm/models/layers/bottleneck_attn.py#L125, I think it should be `attn_out = (attn_out @ v).transpose(1, 2).reshape(B, H, W, self.dim_out).permute(0, 3, 1, 2)`.
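As a standalone sanity check of the two variants discussed in this thread (illustrative snippet, not code from the repo; the tensor and sizes are made up), the reshape/permute form above and the negative-index transpose from the reply produce the same layout, while the original `transpose(1, 2)` does not:

```python
import torch

# Hypothetical sizes, chosen only to exercise the reshapes (not from the repo)
B, num_heads, dim_head, H, W = 2, 4, 8, 7, 7
dim_out = num_heads * dim_head
x = torch.randn(B, num_heads, H * W, dim_head)  # stand-in for (attn_out @ v)

# Original line: transpose(1, 2) swaps the head and spatial axes, so the
# following reshape mixes heads and spatial positions into the channel dim.
buggy = x.transpose(1, 2).reshape(B, dim_out, H, W)

# Fix proposed in this thread: go channels-last, then permute back to NCHW.
fix_permute = x.transpose(1, 2).reshape(B, H, W, dim_out).permute(0, 3, 1, 2)

# Simpler fix from the reply: negative indices swap the last two dims instead.
fix_negative = x.transpose(-1, -2).reshape(B, dim_out, H, W)

print(torch.allclose(fix_permute, fix_negative))  # True: the two fixes agree
print(torch.allclose(buggy, fix_negative))        # False: original layout is scrambled
```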