COMPAT/BLD: rolling failed on Arm64 and ppc64le Linux #38921
Comments
@fangchenli, I can reproduce the exact same problem on amd64. It seems to be dependent on |
@mgorny could you give more info about your OS and CPU? I couldn't reproduce this on Intel Mac with the |
It's Gentoo Linux.
Scratch that. I didn't notice the file's in C++ and I didn't update CXXFLAGS. Apparently you need to have In any case, I've been able to pin it down to My results:
Not sure how much help that is, but I was able to make
So I guess the |
@mgorny Thank you so much for this input. @pandas-dev/pandas-core Any thoughts on this?
Does it always fail? Wondering if these expect a particular alignment.
Yes, it fails reliably. I don't think it's alignment-related, as it produces the wrong result rather than crashing. I'm wondering if it could be a GCC optimization bug.
It definitely sounds like a compiler bug. Unfortunately, it is really hard to do things like add pragmas or ifdefs in Cython from what I can tell. It might be possible to tell the compiler to disable optimizations in these functions using the C99 _Pragma operator, but that is heavy-handed. It is probably simpler to just pass specific compiler flags for this file on ARM so that it is less aggressively optimized (e.g. -O1).
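The per-file flag idea above could look roughly like the following setuptools sketch. It is only an illustration under stated assumptions: the helper names `extra_compile_args` and `make_aggregations_extension` and the set of machine names are hypothetical, and this is not how pandas' build is actually configured. It relies on the fact that with GCC the last `-O` option on the command line is the one that takes effect.

```python
import platform

# Machine names for which this sketch lowers the optimization level.
# Hypothetical grouping for illustration; taken from the platforms named in this issue.
REDUCED_OPT_MACHINES = {"aarch64", "arm64", "ppc64le"}


def extra_compile_args(machine=None):
    """Return extra compiler flags for the given machine name."""
    machine = (machine or platform.machine()).lower()
    if machine in REDUCED_OPT_MACHINES:
        # Suggestion from the thread: optimize this one file less aggressively.
        return ["-O1"]
    return []


def make_aggregations_extension():
    """Build an Extension object carrying the per-file flags (sketch only)."""
    # Imported lazily so the flag helper above works without setuptools installed.
    from setuptools import Extension

    return Extension(
        "pandas._libs.window.aggregations",
        sources=["pandas/_libs/window/aggregations.pyx"],
        language="c++",
        extra_compile_args=extra_compile_args(),
    )


print(extra_compile_args("aarch64"))
print(extra_compile_args("x86_64"))
```

If the cause turns out to be floating-point contraction rather than the optimization level, GCC and Clang also accept `-ffp-contract=off`; whether either flag is appropriate for pandas is not settled in this thread.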
Anyone want to build with clang with the same flags rather than gcc to see if it reproduces?
Will do in a minute. I can also try gcc-9, in case version matters (I'm running gcc 10.2.0).
gcc-9.3.0 suffers the same problem, clang++ too. However, note that on clang I had to pass Note that I'm only testing on AMD64. I don't know if AArch64 implements an FMA equivalent (as part of NEON, maybe?). But it smells a bit suspicious that two compilers would have the same bug on two different architectures.
Heh, I've just looked at bug #37051 and now I feel stupid. It's not a bug but actually a 'bugfix'. FWICS the test is supposed to test for artifacts due to precision loss, so it obviously fails when the calculations are done with a better precision which is probably what's happening here. Independently of compiler flags for pandas, I can reproduce it with:
i.e. this is clearly due to FMA application here, and the compiler is applying it correctly.
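As a rough illustration of why a fused multiply-add (FMA) can change a result, the sketch below compares `a * b + c` computed with two roundings against the same expression rounded only once. It is not the reproducer from the comment above (which is not shown here). The helper names and input values are chosen for this example, and the single rounding is emulated with exact `fractions.Fraction` arithmetic rather than a hardware FMA instruction.

```python
from fractions import Fraction


def mul_add_two_roundings(a, b, c):
    """Plain float arithmetic: the product is rounded, then the sum is rounded."""
    return a * b + c


def mul_add_one_rounding(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round to a float once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so the exact product 1 - 2**-60 is not representable as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

two = mul_add_two_roundings(a, b, c)
one = mul_add_one_rounding(a, b, c)

print(two == 0.0)          # the product rounds to 1.0, so the sum is exactly zero
print(one == -(2.0**-60))  # the single rounding keeps the small residual
```

The fused form is the more accurate of the two, which fits the observation above that a test written to expect precision-loss artifacts can fail when the compiler emits FMA instructions.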
It wasn't just the build method, but also the environment. I changed the environment when switching to meson. This is the old environment below.

INSTALLED VERSIONS
commit : 43f1bc8
pandas : 2.1.0.dev0+874.g43f1bc8fb6

I haven't checked what's making the difference.
So currently we have:
I'd like to try the setuptools build on M2 to see what happens there. @topper-123, how did you reset your environment and build folder for that? I don't want to miss a step...
The old environment still exists, it's just that I couldn't get it to build with meson, so I created a completely new one. In the new environment I get the failures both with meson and (after doing If you're interested I could provide some logs from the old environment from the meson build failure. I created the new environment because I gave up trying to figure out why it was failing, so I don't think I can explain what was happening, but you're welcome to take a look.
If meson and setuptools yield different results then they must not be compiling the same way. Can you check the flags sent to gcc in both methods and see what may be different? Also, are there any build warnings?
I am not proficient in gcc or gcc flags, or what to look for in the gcc logs, sorry. If you can give some hints on what to look for I could give it a shot, but I'm not sure I will make sense of it.
No problem. I'm pretty sure that this test hits the aggregations.pyx file in pandas/_libs/window/aggregations.pyx. So you are looking for a gcc command that generates aggregations.o from the Cython-generated aggregations.cpp file.

setuptools prints the commands it uses for compilation directly to stdout. So when you do the traditional setup.py build_ext, look for something like this in your terminal:

/home/willayd/mambaforge/envs/pandas-dev/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare
-DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/willayd/mambaforge/envs/pandas-dev/include -fPIC -O2
-isystem /home/willayd/mambaforge/envs/pandas-dev/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC
-fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe
-isystem /home/willayd/mambaforge/envs/pandas-dev/include -DNDEBUG -D_FORTIFY_SOURCE=2
-O2 -isystem /home/willayd/mambaforge/envs/pandas-dev/include -fPIC -DNPY_NO_DEPRECATED_API=0
-Ipandas/_libs/include -I/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/include
-I/home/willayd/mambaforge/envs/pandas-dev/include/python3.10 -c pandas/_libs/window/aggregations.cpp
-o build/temp.linux-x86_64-cpython-310/pandas/_libs/window/aggregations.o

In the above command you can see I am using the x86_64-conda-linux-gnu-cc compiler and passing it compilation options like https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html

As far as meson is concerned, it doesn't print out the gcc command by default, but you can specify the compile-args as verbose with

As a complete example, you can try

/home/willayd/mambaforge/envs/pandas-dev/bin/x86_64-conda-linux-gnu-c++ -shared -Wl,--allow-shlib-undefined
-Wl,-rpath,/home/willayd/mambaforge/envs/pandas-dev/lib -Wl,-rpath-link,/home/willayd/mambaforge/envs/pandas-dev/lib
-L/home/willayd/mambaforge/envs/pandas-dev/lib -Wl,--allow-shlib-undefined
-Wl,-rpath,/home/willayd/mambaforge/envs/pandas-dev/lib -Wl,-rpath-link,/home/willayd/mambaforge/envs/pandas-dev/lib
-L/home/willayd/mambaforge/envs/pandas-dev/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now
-Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined
-Wl,-rpath,/home/willayd/mambaforge/envs/pandas-dev/lib
-Wl,-rpath-link,/home/willayd/mambaforge/envs/pandas-dev/lib
-L/home/willayd/mambaforge/envs/pandas-dev/lib -march=nocona -mtune=haswell -ftree-vectorize
-fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe
-isystem /home/willayd/mambaforge/envs/pandas-dev/include -DNDEBUG
-D_FORTIFY_SOURCE=2 -O2
-isystem /home/willayd/mambaforge/envs/pandas-dev/include build/temp.linux-x86_64-cpython-310/pandas/_libs/window/aggregations.o
-o build/lib.linux-x86_64-cpython-310/pandas/_libs/window/aggregations.cpython-310-x86_64-linux-gnu.so

Arguments that follow the pattern

Re: build warnings, definitely keep an eye out for them. If you are compiling serially then the warning should appear directly below the compilation command being used, but in parallel compilation you may need to look further down the output. When in doubt, paste the entire output of the compilations and we can look at them here.
Actually an even easier way to see what meson did is to look at the |
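Once both compile commands have been captured, comparing their flags can be done mechanically. The following is a minimal sketch using only the Python standard library; the helper names `flags_of` and `diff_flags` are hypothetical, and the two command strings are shortened, made-up stand-ins rather than real output from either build.

```python
import shlex


def flags_of(command):
    """Split a shell command line and return its option tokens as a set."""
    tokens = shlex.split(command)
    # Skip the compiler executable itself; keep only tokens that look like options.
    return {tok for tok in tokens[1:] if tok.startswith("-")}


def diff_flags(cmd_a, cmd_b):
    """Return (flags only in cmd_a, flags only in cmd_b), each sorted."""
    a, b = flags_of(cmd_a), flags_of(cmd_b)
    return sorted(a - b), sorted(b - a)


# Made-up example commands; paste the real ones from your build logs instead.
setuptools_cmd = "cc -O2 -fPIC -march=nocona -mtune=haswell -c aggregations.cpp -o aggregations.o"
meson_cmd = "cc -O2 -fPIC -ffp-contract=off -c aggregations.cpp -o aggregations.o"

only_setuptools, only_meson = diff_flags(setuptools_cmd, meson_cmd)
print("only in setuptools build:", only_setuptools)
print("only in meson build:", only_meson)
```

This ignores flag order and drops the separate argument of options such as `-isystem <dir>`, so it is a first pass for spotting differences rather than a full comparison.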
I have had this failing for months; this is not solely related to meson.
test_rolling.py is unreliable on aarch64-darwin depending on the compiler used and possibly optimization settings. Disabling these tests allows clang 16 to build Pandas on Darwin successfully. See pandas-dev/pandas#38921.
We test on more architectures, so upstream's xfails are not always correct everywhere. On those known to fail:
arm64 xfail -> all non-x86 xfail
x86 or unconditional strict xfail -> unconditional nonstrict xfail
pandas/tests/window/test_rolling.py also gets an i386 xfail for rounding error that may be x87 excess precision
Author: Rebecca N. Palmer <[email protected]>
Bug: pandas-dev/pandas#38921, pandas-dev/pandas#38798, pandas-dev/pandas#41740, numpy/numpy#19146
Forwarded: no
Gbp-Pq: Name fix_overly_arch_specific_xfails.patch
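For readers unfamiliar with the terms in this patch description, a non-strict xfail that applies to all non-x86 machines might be written along these lines. This is a sketch only and not the contents of the patch: the helper `is_x86`, the marker name `xfail_non_x86`, the list of machine names, and the placeholder test body are all assumptions made for illustration.

```python
import platform

import pytest


def is_x86(machine=None):
    """Return True for common x86 machine names (list chosen for this sketch)."""
    machine = (machine or platform.machine()).lower()
    return machine in {"x86_64", "amd64", "i386", "i486", "i586", "i686"}


# strict=False means an unexpected pass is reported as XPASS instead of failing
# the run, which suits platforms where the outcome depends on compiler and flags.
xfail_non_x86 = pytest.mark.xfail(
    not is_x86(),
    reason="rolling var result differs on some non-x86 builds; see pandas-dev/pandas#38921",
    strict=False,
)


@xfail_non_x86
def test_placeholder_numerical_result():
    # Placeholder body; the real test lives in pandas/tests/window/test_rolling.py.
    assert 0.5 + 0.25 == 0.75
```

With `strict=True` the same marker would turn an unexpected pass into a failure, which is why the description above moves strict xfails to non-strict ones.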
The test is failing on riscv64. The failure is reproducible under qemu and on real hardware (VisionFive 2). Updates pandas-dev#38921
This test is failing on riscv64 as well. I see the failures on a VisionFive2 running Ubuntu 23.10 and also on qemu-user. I've submitted a patch to add riscv64 to the list of platforms for which the test is disabled.
The test
pandas/tests/window/test_rolling.py::test_rolling_var_numerical_issues
has failed on arm64 build.