Skip to content
This repository was archived by the owner on Aug 30, 2024. It is now read-only.

update submodules to latest versions #73

Merged
merged 5 commits into from
Mar 7, 2020

Conversation

mattip
Copy link
Contributor

@mattip mattip commented Feb 17, 2020

test latest version of tools before adding new platforms, also check openblas is properly installed

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

The 32 bit builds are failing the new test_umath_accuracy.py with 8 ULP instead of 4. I wonder if this is a problem with gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 in the manylinux2010 image. The azure builds on numpy/numpy don't report which gcc is in use, but they use i386/ubuntu:bionic and then download gcc7 as well. Since the azure build neither skips nor xfails the test, I assume it passes but maybe a more verbose log would help.

I can try to reproduce the failure locally using the manylinux2010 32-bit image

@charris
Copy link
Contributor

charris commented Feb 17, 2020

IIRC, the tests are run in a docker image. I wonder if the correct image is being used?

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

@r-devulap do you have an idea why the log test is returning more than 8 ULP difference on a travis docker run with the manylinux2010 image?

Edit: @charris is correct, this is with the matthew-brett 32 bit image using wheelhouse/numpy-1.19.0.dev0+4edd437-cp36-cp36m-manylinux2010_i686.wh produced by the build-wheel step

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

@charris is that wheel uploaded anywhere, or is it only local to the CI run?

@charris
Copy link
Contributor

charris commented Feb 17, 2020

@mattip Which wheel?

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

The wheels from this PR: specifically wheelhouse/numpy-1.19.0.dev0+4edd437-cp36-cp36m-manylinux2010_i686.whl

@charris
Copy link
Contributor

charris commented Feb 17, 2020

No, the wheels generated in PRs are not uploaded, the rackspace access key isn't decoded in PR's. I don't know how to access them otherwise. We can change that as a travis option, I think, but then the rackspace repository would be contaminated. I wonder if we could send wheels to someplace like dropbox?

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

Not a big deal, I can build it locally since both the build and the test are scripted runs.

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

The 32-bit docker that fails and the 64-bit that pass have the same __cpu_features__, so they must be using the same loops. Maybe it is some cpu floating point register that is set differently?

@pv
Copy link

pv commented Feb 17, 2020

mfpmath=sse vs mfpmath=387?

@r-devulap
Copy link

The 32 bit builds are failing the new test_umath_accuracy.py with 8 ULP instead of 4.

@mattip the ULP difference is not 8, it is 8.38861e+06 ULP. Something is off (like computing ULP difference between 2 NAN's or something). Can you point me to a docker image I can try running locally to debug?

@r-devulap
Copy link

I see this in the error message: core/tests/test_numeric.py::TestClip::test_NaT_propagation[arr0-amin0-amax0]

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

The same docker that runs the azure tests successfully fails to properly use the wheel built here. I am now going to try to both build and test with the azure script from numpy/numpy. Note that this repo builds a wheel in the multilinux2010 docker and then tests it in a different one.

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

ULP difference is not 8, it is 8.38861e+06 ULP

Ahh, so maybe it is failing to properly catch the NAN error conditions in these dockers, and somehow Azure does something special?

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

mfpmath=sse vs mfpmath=387

Do we set that at all in the gcc flags, or is there an -march setting that toggles one or the other? In numpy/numpy#15585 I changed the platform.uname().machine: on the Azure build it is x86_64 but on a local build it is i686. So maybe this is a bug in the __cpu_feature__ detection inside a i686 docker.

@mattip
Copy link
Contributor Author

mattip commented Feb 17, 2020

@r-devulap : I can run the manylinux docker like

sudo docker run -it --rm -v/path/to/git/checkout/of/numpy:/build_dir \
quay.io/pypa/manylinux2010_i686  /bin/bash

Once in the docker, you will want to cd /build_dir, which will put you at your external numpy checkout. I find it helpful to adduser and su so I am not root but YMMV. The pythons are at /opt/python/cp*, so I usually /opt/python/cp36-cp36m/bin/python -venv /tmp/venv, then I can /tmp/venv/bin/python -mpip install -r test_requirements.txt and /tmp/venv/bin/python runtests.py.

The problem only shows up when compiling inside the docker of the manylinux image.

@r-devulap
Copy link

Thanks @mattip, I did confirm the failure and it happens for a NAN to NAN comparison, not sure what the difference is between this and other linux Glibc that it fails for this. Will dig in further tomorrow.

@mattip
Copy link
Contributor Author

mattip commented Feb 18, 2020

As always, there is more than one problem.

First, HAVE_ATTRIBUTE_TARGET_AVX512F_WITH_INTRINSICS NPY_HAVE_SSE2_INTRINSICS is not defined on this platform, even though the __cpu_features__ say it should be. So the FLOAT_log_avx512f FLOAT_log_fma loop is substituted in (because the feature should be there), the test is not skipped (because the feature should be there), and instead of the avx function, glibc's npy_log is being used (because the feature is not there).

Second, npy_log(-1e-45) is returning NAN, just not the NAN the test asks for. The test wants 0xffc00000, the function is returning 0x7fc00000. Both are NAN, but nulp_diff is very literal:

>>> import numpy as np, numpy.testing
>>> x = np.int32(0xffc00000).view(np.float32)
>>> y = np.int32(0x7fc00000).view(np.float32)
>>> print(x, y)
nan nan
>>> numpy.testing._private.utils.nulp_diff(y, x)
8388608.0

Edit: loop name, proper define

@mattip
Copy link
Contributor Author

mattip commented Feb 18, 2020

For completeness, this is what __cpu_features__ reports, so at least the feature detection disagrees with check_gcc_function_attribute_with_intrinsics:

>>> np.core._multiarray_umath.__cpu_features__['AVX512F']
False
>>> np.core._multiarray_umath.__cpu_features__['FMA3']
True
>>> np.core._multiarray_umath.__cpu_features__['AVX2']
True
>>> np.core._multiarray_umath.__cpu_features__['SSE2']
True

Edit: SSE2 is detected, but NPY_HAVE_SSE2_INTRINSICS is not defined

@mattip
Copy link
Contributor Author

mattip commented Feb 18, 2020

Edited: the first problem is SSE2 is detected via __cpu_features__, but NPY_HAVE_SSE2_INTRINSICS is not defined

@mattip
Copy link
Contributor Author

mattip commented Feb 18, 2020

xref numpy/numpy#15593 for the use of a too-specific NAN

For the dissonance between SSE2 detection methods, ping @seiko2plus

@mattip
Copy link
Contributor Author

mattip commented Feb 18, 2020

The log problem is solved by added -msse2 to the compile flags, otherwise the check for _mm_load_pd fails which fails the NPY_HAVE_SSE2_INTRINSICS define in npy_common.h

@seiko2plus
Copy link
Contributor

@mattip, The short answer, NPY_HAVE_SSE2_INTRINSICS check during the build if the compiler has support for SSE2, while __cpu_features__["SSE2"] and NPY_CPU_HAVE(SSE2) check if the running CPU has support for SSE2 during the runtime. we discussed a fix for this situation in numpy/numpy#15558.

@mattip
Copy link
Contributor Author

mattip commented Feb 18, 2020

Now better, but there are still 32-bit failures. We should probably add a manylinux201 i686 run to the regular CI to pick these up.

Is there an accepted way to skip test failures in certain builds? Or perhaps we should disable 32-bit wheel builds until we can work out the kinks.

Edit: add "wheel" to last line

@mattip mattip closed this Feb 26, 2020
@mattip mattip reopened this Feb 26, 2020
@mattip
Copy link
Contributor Author

mattip commented Feb 26, 2020

Nope, still getting a segfault on linux32.

@mattip
Copy link
Contributor Author

mattip commented Mar 2, 2020

Close reopen to upgrade openblas to 0.3.9 beta

@mattip mattip closed this Mar 2, 2020
@mattip mattip reopened this Mar 2, 2020
@mattip mattip force-pushed the update-subrepos branch from 8e3cb1e to c9ab82e Compare March 6, 2020 13:55
exclude:
# Exclude the default Python 3.6 build
- python: 3.6
jobs:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines 19-here update the confusing travis.yml: we don't have any python3.6 jobs to exclude, sudo is no longer needed, matrix should be replaced by jobs.

@@ -7,44 +7,39 @@ function build_wheel {
local lib_plat=$PLAT
if [ -n "$IS_OSX" ]; then
install_gfortran
else
# For manylinux2010 builds with manylinux1 openblas builds
$use_sudo yum install -y libgfortran-4.4.7
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to use the manylinux2010 openblas builds on the manylinux1 wheel builds. libfortran will be statically linked (uses the *.a file) so the wheels are still valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like the comment is backward. Maybe "Needed for manylinux1 wheels using libraries built for manylinux2010."

@mattip
Copy link
Contributor Author

mattip commented Mar 6, 2020

Builds are green

config.sh Outdated
(cd / && $use_sudo tar zxf $tar_path)
# Use the same incantation as numpy/tools/travis-before-install.sh
python -mpip install urllib3
target=$(python numpy/tools/openblas_support.py)
Copy link
Contributor

@charris charris Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange to call a command a target. Maybe explicitly use python on the next line. Where does the file come from?

EDIT: nvm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strange construction, it could use a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like what it does is download all the libraries into /var/tmp/openblas/. This could become rather inefficient if the libraries become numerous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comment and a todo to use site.cfg instead of copying.

# Environment variables for build
OPENBLAS_VERSION="v0.3.7"
MACOSX_DEPLOYMENT_TARGET=10.9
CFLAGS="-msse2 -std=c99 -fno-strict-aliasing"
Copy link
Contributor

@charris charris Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is comes from env_vars.sh, correct? I assume the extra bit is -msse2. Is the latest multibuild is updated to use this. Is it documented in the README?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The needed bit is the -msse2. Commenting here, do you have a place in the README you would like to see it documented as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a line to the README, as well as explaining the OpenBLAS download

}

function get_test_cmd {
local extra_argv=${1:-$EXTRA_ARGV}
echo "import sys; import numpy; \
sys.exit(not numpy.test('full', \
extra_argv=[${extra_argv}]))"
extra_argv=['-vv', ${extra_argv}]))"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@charris
Copy link
Contributor

charris commented Mar 7, 2020

LGTM, I'll put this in when the tests finish. Thanks Matti.

@charris
Copy link
Contributor

charris commented Mar 7, 2020

Is changing .appveyor.yml to download from anaconda cloud next?

@mattip
Copy link
Contributor Author

mattip commented Mar 7, 2020

Updated appyveyor too

@mattip mattip force-pushed the update-subrepos branch from 6c5beef to 1e6dae6 Compare March 7, 2020 18:53
@charris charris merged commit 0e39420 into MacPython:master Mar 7, 2020
@charris
Copy link
Contributor

charris commented Mar 7, 2020

Merged.

@charris
Copy link
Contributor

charris commented Mar 7, 2020

Hmm, two of the 32 bit builds segfaulted in the merge.

@mattip
Copy link
Contributor Author

mattip commented Mar 7, 2020

Grrr. I was hoping that the move to xenial would fix those pesky 32-bit segfaults. Maybe we should revert the -msse2 flag addition, and not use sse loops for the 32-bit builds? It would require adjusting the NumPy tests. Some of the tests assume if the CPU supports sse2 then the compiler will also.

@charris
Copy link
Contributor

charris commented Mar 7, 2020

Grrr. I was hoping that the move to xenial would fix those pesky 32-bit segfaults.

Me too :) SSE2 has been around for 20 years, in fact I think NumPy requires it and we made that decision some time ago. Remind me what the compiler problem is.

@mattip
Copy link
Contributor Author

mattip commented Mar 7, 2020

If we compile with sse2 we get segfaults. If we compile without it, we get test failures on 32-bit since

  • numpy compiles without support for the hand-written intrinsic loops for exp, reciprical and more because gcc does not use sse2
  • tests for accuracy of those ufunc loops fail since glibc is less accurate than the hand-written loops.

@charris
Copy link
Contributor

charris commented Mar 7, 2020

Hmm. Is the problem only in manylinux2010 and in manylinux1, or is it something new we are testing for in master.

@mattip
Copy link
Contributor Author

mattip commented Mar 8, 2020

It is something new in master: we now have sse-specific loops for many ufuncs. The latest crop of sse/avx loops added accuracy tests. This pointed out a deficiency in the glibc version of some of the functions (particularily reciprocal), which does not pass the accuracy tests on linux32. That led to the discovery that we need -msse2 on 32-bits to avoid using glibc and use the sse versions, which was added in PR 15600 about 3 weeks ago. Since the current numpy-wheels builds do not use that flag, they do not fail.

I see that on Azure the builds also do not fail. There the tests are run in the multibuild2010_i686 image on top of Ubuntu 18.04, here we run in a Ubuntu 16.04 (xenial) image on top of Ubuntu 18.04 (bionic). Things to try:

  • removing the msse2 flag
  • running tests in a multibuild2010_i686 image
  • move the 32-bit linux and appveyor tests to Azure, like in numpy/numpy

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants