Has anyone been successful with quantized vision transformers? #674
Unanswered
alexander-soare asked this question in General

Today I managed to run FX quantization on vit_deit_base_distilled_patch16_384 with a few minor tweaks. I get a 2.5x speed-up on CPU, but accuracy plummets :( Wondering if anyone has had experience with doing Quantization Aware Training on a vision transformer. Were you successful?
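For reference, a minimal sketch of what such an FX post-training static quantization run can look like, assuming a recent PyTorch (`torch.ao.quantization.quantize_fx`) and timm; the calibration loop below is a placeholder, and the "few minor tweaks" needed to make the ViT symbolically traceable are not shown.

```python
# Sketch: FX post-training static quantization of a timm ViT for x86 CPU.
# Assumes a recent PyTorch and timm; calibration data here is random placeholder input.
import torch
import timm
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Model name as given in the post; newer timm releases register it as
# deit_base_distilled_patch16_384.
model = timm.create_model("vit_deit_base_distilled_patch16_384", pretrained=True)
model.eval()

# fbgemm is the x86 server CPU backend (per-channel int8 weights, per-tensor activations).
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 3, 384, 384),)

prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibration: replace the random tensors with a few hundred representative images.
with torch.inference_mode():
    for _ in range(32):
        prepared(torch.randn(8, 3, 384, 384))

quantized = convert_fx(prepared)  # int8 model for CPU inference
```

For ARM/mobile targets, qnnpack would be the backend choice instead of fbgemm.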
Replies: 1 comment 1 reply
- @alexander-soare I have not done this, but I'd pay particular attention to what happens to the GELU activations during quantization: how are they approximated? Also the LayerNorm mean/std: is overflow possible? What's the precision of the accumulator? Despite the annoyances of BatchNorm, it is great for inference/quantization compared to GN, LN, etc., which must always compute activation stats in the forward pass.
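One way to probe these points (a sketch, assuming PyTorch's FX quantization `QConfigMapping` API and timm) is to quantize only the Linear layers and leave GELU, LayerNorm, and Softmax in fp32, then compare accuracy against the fully quantized model to see how much of the drop the non-Linear ops account for.

```python
# Sketch: quantize only nn.Linear and keep GELU/LayerNorm/Softmax in fp32,
# so activation stats and the GELU approximation run at full precision.
# Assumes a recent PyTorch (torch.ao.quantization) and timm.
import torch
import torch.nn as nn
import timm
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = timm.create_model("vit_deit_base_distilled_patch16_384", pretrained=True).eval()

qconfig_mapping = (
    QConfigMapping()
    .set_global(None)                                            # fp32 by default
    .set_object_type(nn.Linear, get_default_qconfig("fbgemm"))   # int8 Linear only
)

example_inputs = (torch.randn(1, 3, 384, 384),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibration: replace the random tensors with real, representative images.
with torch.inference_mode():
    for _ in range(32):
        prepared(torch.randn(8, 3, 384, 384))

quantized_linear_only = convert_fx(prepared)
# Evaluate quantized_linear_only against the fully quantized model to see how
# much of the accuracy drop comes from quantizing GELU, LayerNorm, etc.
```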