Skip to content

[X86] Target feature implication mismatch with GCC #136209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sayantn opened this issue Apr 17, 2025 · 7 comments
Open

[X86] Target feature implication mismatch with GCC #136209

sayantn opened this issue Apr 17, 2025 · 7 comments

Comments

@sayantn
Copy link

sayantn commented Apr 17, 2025

There is a mismatch between which other features avx512f and avx512fp16 imply.

For avx512f, GCC implies only avx2, where LLVM also implies fma and f16c(see here).
Also for avx512fp16, GCC implies only avx512bw, where LLVM also implies avx512dq and avx512vl (see here).

gcc -mavx512f -Q --help=target | grep enabled

  -m128bit-long-double                  [enabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -malign-stringops                     [enabled]
  -mavx                                 [enabled]
  -mavx2                                [enabled]
  -mavx512f                             [enabled]
  -mcrc32                               [enabled]
  -mdirect-extern-access                [enabled]
  -mevex512                             [enabled]
  -mfancy-math-387                      [enabled]
  -mfp-ret-in-387                       [enabled]
  -mfxsr                                [enabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mieee-fp                             [enabled]
  -mlong-double-80                      [enabled]
  -mmmx                                 [enabled]
  -mmwait                               [enabled]
  -mpartial-vector-fp-math              [enabled]
  -mpopcnt                              [enabled]
  -mpush-args                           [enabled]
  -mred-zone                            [enabled]
  -msse                                 [enabled]
  -msse2                                [enabled]
  -msse3                                [enabled]
  -msse4                                [enabled]
  -msse4.1                              [enabled]
  -msse4.2                              [enabled]
  -mssse3                               [enabled]
  -mstv                                 [enabled]
  -mtls-direct-seg-refs                 [enabled]
  -mxsave                               [enabled]

gcc -mavx512fp16 -Q --help=target | grep enabled

  -m128bit-long-double                  [enabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -malign-stringops                     [enabled]
  -mavx                                 [enabled]
  -mavx2                                [enabled]
  -mavx512bw                            [enabled]
  -mavx512f                             [enabled]
  -mavx512fp16                          [enabled]
  -mcrc32                               [enabled]
  -mdirect-extern-access                [enabled]
  -mevex512                             [enabled]
  -mfancy-math-387                      [enabled]
  -mfp-ret-in-387                       [enabled]
  -mfxsr                                [enabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mieee-fp                             [enabled]
  -mlong-double-80                      [enabled]
  -mmmx                                 [enabled]
  -mmwait                               [enabled]
  -mpartial-vector-fp-math              [enabled]
  -mpopcnt                              [enabled]
  -mpush-args                           [enabled]
  -mred-zone                            [enabled]
  -msse                                 [enabled]
  -msse2                                [enabled]
  -msse3                                [enabled]
  -msse4                                [enabled]
  -msse4.1                              [enabled]
  -msse4.2                              [enabled]
  -mssse3                               [enabled]
  -mstv                                 [enabled]
  -mtls-direct-seg-refs                 [enabled]
  -mxsave                               [enabled]

gcc --version

gcc (GCC) 14.2.1 20250207
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I do not know what Intel specifies about implies features, but I believe this mismatch needs to be corrected.

related: rust-lang/rust#138940
related: rust-lang/rust#111137

@llvmbot
Copy link
Member

llvmbot commented Apr 17, 2025

@llvm/issue-subscribers-backend-x86

Author: Sayantan Chakraborty (sayantn)

There is a mismatch between which other features `avx512f` and `avx512fp16` imply.

For avx512f, GCC implies only avx2, where LLVM also implies fma and f16c(see here).
Also for avx512fp16, GCC implies only avx512bw, where LLVM also implies avx512dq and avx512vl (see here).

gcc -mavx512f -Q --help=target | grep enabled

  -m128bit-long-double                  [enabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -malign-stringops                     [enabled]
  -mavx                                 [enabled]
  -mavx2                                [enabled]
  -mavx512f                             [enabled]
  -mcrc32                               [enabled]
  -mdirect-extern-access                [enabled]
  -mevex512                             [enabled]
  -mfancy-math-387                      [enabled]
  -mfp-ret-in-387                       [enabled]
  -mfxsr                                [enabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mieee-fp                             [enabled]
  -mlong-double-80                      [enabled]
  -mmmx                                 [enabled]
  -mmwait                               [enabled]
  -mpartial-vector-fp-math              [enabled]
  -mpopcnt                              [enabled]
  -mpush-args                           [enabled]
  -mred-zone                            [enabled]
  -msse                                 [enabled]
  -msse2                                [enabled]
  -msse3                                [enabled]
  -msse4                                [enabled]
  -msse4.1                              [enabled]
  -msse4.2                              [enabled]
  -mssse3                               [enabled]
  -mstv                                 [enabled]
  -mtls-direct-seg-refs                 [enabled]
  -mxsave                               [enabled]

gcc -mavx512fp16 -Q --help=target | grep enabled

  -m128bit-long-double                  [enabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -malign-stringops                     [enabled]
  -mavx                                 [enabled]
  -mavx2                                [enabled]
  -mavx512bw                            [enabled]
  -mavx512f                             [enabled]
  -mavx512fp16                          [enabled]
  -mcrc32                               [enabled]
  -mdirect-extern-access                [enabled]
  -mevex512                             [enabled]
  -mfancy-math-387                      [enabled]
  -mfp-ret-in-387                       [enabled]
  -mfxsr                                [enabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mieee-fp                             [enabled]
  -mlong-double-80                      [enabled]
  -mmmx                                 [enabled]
  -mmwait                               [enabled]
  -mpartial-vector-fp-math              [enabled]
  -mpopcnt                              [enabled]
  -mpush-args                           [enabled]
  -mred-zone                            [enabled]
  -msse                                 [enabled]
  -msse2                                [enabled]
  -msse3                                [enabled]
  -msse4                                [enabled]
  -msse4.1                              [enabled]
  -msse4.2                              [enabled]
  -mssse3                               [enabled]
  -mstv                                 [enabled]
  -mtls-direct-seg-refs                 [enabled]
  -mxsave                               [enabled]

gcc --version

gcc (GCC) 14.2.1 20250207
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I do not know what Intel specifies about implies features, but I believe this mismatch needs to be corrected.

related: rust-lang/rust#138940
related: rust-lang/rust#111137

@topperc
Copy link
Collaborator

topperc commented Apr 17, 2025

If I recall correctly, the avx512f implying fma and fp16 is intentional to make the assembler behavior legal. avx512f and fma/fp16 use the same mnemonic names for some instructions. The assembler will prioritize the VEX encoding from fma over the EVEX encoding from avx512f if only registers xmm0-xmm15 are used and no EVEX only things like masking are used. If the VEX encoding can't be assumed to exist with avx512f then the assembler is broken.

@sayantn
Copy link
Author

sayantn commented Apr 17, 2025

Ok that makes sense, what about avx512fp16? Also, Intel doesn't specify anything about fma/f16c with avx512f right? So (theoretically) there can be CPUs that have avx512f, but not fma or f16c.

@topperc
Copy link
Collaborator

topperc commented Apr 17, 2025

The line in X86.td you linked to for avx512fp16 has two FIXMEs above it that describe the reason those dependencies are there.

@sayantn
Copy link
Author

sayantn commented Apr 17, 2025

Those comments seem strange. The types v8f16 or v8i64 should be guarded under sse2 and avx512f only (due to the register size), because the cpu doesn't care about the actual representation of the registers. Only the instructions should be guarded against some target features (e.g. vaddsh under avx512fp16).

If the same logic is followed, then _mm_add_round_ss should also be guarded by avx512vl (as it is EVEX encoded and has a 128-bit operand). But Intel specifies that it doesn't require avx512vl, only avx512f.

@topperc
Copy link
Collaborator

topperc commented Apr 18, 2025

The types v8f16 or v8i64 should be guarded under sse2 and avx512f only (due to the register size), because the cpu doesn't care about the actual representation of the registers.

Making a type legal in llvm but not having operations available requires a bunch of changes in LLVM to emulate or scalarize those operations. So the v8f16 comment was correct at the time it was written in 2021. The X86 backend only made v8f16 legal with VLX+avx512fpf16. It was made legal with SSE2 in 2022 f187948

The avx512dq comment doesn't make sense to me. v8i64 should have always been legal with avx512f.

@sayantn
Copy link
Author

sayantn commented Apr 18, 2025

So, if I am getting this right, atp both of them are fixed, and avx512fp16 should only imply avx512bw now? Then I believe this should be done soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants