Omit Normalization Layer from MLP Class #1852

ChristophReich1996 · 2023-06-16T18:17:00Z

This pull request fixes issue #1851. The unused normalization layer is removed from the Mlp class, and usage of the norm_layer argument is removed as well. Neither BEiT(-v2) nor EVA(-02) models are affected by this change. EVA(-02) passes the norm_layer argument, but since all of its configurations use SwiGLU instead of Mlp, existing model weights are not affected. BEiT(-v2) sets scale_mlp=False in all configurations, meaning norm_layer was set to None, so there is also no effect on existing models/weights.
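To illustrate the problem being described, here is a minimal sketch of an MLP block whose normalization layer is constructed but never called. The class `MlpWithUnusedNorm` and its layer sizes are hypothetical and simplified for illustration; this is not the library's actual code.

```python
import torch
from torch import nn


class MlpWithUnusedNorm(nn.Module):
    """Hypothetical, simplified stand-in for the Mlp discussed above."""

    def __init__(self, in_features, hidden_features, norm_layer=None):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = nn.GELU()
        # The layer is constructed and registered as a submodule ...
        self.norm = norm_layer(hidden_features) if norm_layer is not None else nn.Identity()
        self.fc2 = nn.Linear(hidden_features, in_features)

    def forward(self, x):
        # ... but never called here, so it has no effect on the output.
        return self.fc2(self.act(self.fc1(x)))


mlp = MlpWithUnusedNorm(8, 16, norm_layer=nn.LayerNorm)
mlp(torch.randn(2, 8)).sum().backward()

# Parameters that never received a gradient are the ones forward() ignores.
unused = [name for name, p in mlp.named_parameters() if p.grad is None]
print(unused)  # ['norm.weight', 'norm.bias']
```

In this sketch the norm parameters still appear in the module's `state_dict()` even though they never influence the output, which is why dropping or wiring up such a layer matters for checkpoint compatibility.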

HuggingFaceDocBuilderDev · 2023-06-16T18:22:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

rwightman · 2023-06-17T00:14:54Z

@ChristophReich1996 my bad on communication, I was assuming this'd go the other way, call the .norm() in forward() in case it's set, as this is a modification as per NormFormer https://arxiv.org/abs/2110.09456 ... as you noted it's only used in combo with the swiglu right now, but can be used in a non-gated MLP too
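The alternative suggested here, calling the norm in `forward()` when it is set, could look roughly like the sketch below. The class name `MlpWithNorm` is hypothetical, and placing the norm after the activation and before the second linear layer is an assumption made for illustration, not a confirmed description of the final library code.

```python
import torch
from torch import nn


class MlpWithNorm(nn.Module):
    """Hypothetical non-gated MLP with an optional norm on the hidden features."""

    def __init__(self, in_features, hidden_features, norm_layer=None):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = nn.GELU()
        # Identity keeps forward() branch-free when no norm layer is requested.
        self.norm = norm_layer(hidden_features) if norm_layer is not None else nn.Identity()
        self.fc2 = nn.Linear(hidden_features, in_features)

    def forward(self, x):
        x = self.act(self.fc1(x))
        x = self.norm(x)  # assumed placement: between activation and fc2
        return self.fc2(x)


torch.manual_seed(0)
plain = MlpWithNorm(8, 16)                            # norm_layer=None -> Identity
normed = MlpWithNorm(8, 16, norm_layer=nn.LayerNorm)  # norm is applied in forward()

# Share the linear weights so that any output difference comes from the norm alone.
normed.load_state_dict(plain.state_dict(), strict=False)

x = torch.randn(2, 8)
differs = not torch.allclose(plain(x), normed(x))
```

With `norm_layer=None` the `nn.Identity` has no parameters, so in this sketch the `state_dict()` keys match those of an MLP without any norm attribute.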

Omit unused norm_layer from Mlp class

3c15862

rwightman closed this in 76d1669 Aug 3, 2023
