BUG: FloatingArray(float16data) #44715

jbrockmendel · 2021-12-02T02:47:18Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Broken off from #43930 then fleshed out with targeted tests.

jreback · 2021-12-03T16:31:23Z

================================== FAILURES ===================================
____________________ test_floating_array_disallows_float16 ____________________
[gw0] win32 -- Python 3.9.7 C:\Miniconda\envs\pandas-dev\python.exe
[XPASS(strict)] numpy does not raise on np.dtype('Float16')
============================== warnings summary ===============================
pandas/tests/extension/test_floating.py::Test2DCompat::test_reductions_2d_axis_none[Float32Dtype-prod]
pandas/tests/extension/test_floating.py::Test2DCompat::test_reductions_2d_axis1[Float32Dtype-prod]
C:\Miniconda\envs\pandas-dev\lib\site-packages\numpy\core\fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

pandas/tests/frame/methods/test_shift.py::TestDataFrameShift::test_shift_dt64values_axis1_invalid_fill[Float32-True]
C:\Miniconda\envs\pandas-dev\lib\site-packages_pytest_io\saferepr.py:58: FutureWarning: Index.ravel returning ndarray is deprecated; in a future version this will return a view on self.
s = repr(x)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

looks like an xpass

jreback · 2021-12-03T16:32:37Z

i guess user facing if you can add a note

jbrockmendel · 2021-12-03T18:46:17Z

i guess user facing if you can add a note

has one

jbrockmendel · 2021-12-03T18:47:07Z

pandas/tests/frame/methods/test_shift.py::TestDataFrameShift::test_shift_dt64values_axis1_invalid_fill[Float32-True]
C:\Miniconda\envs\pandas-dev\lib\site-packages_pytest_io\saferepr.py:58: FutureWarning: Index.ravel returning ndarray is deprecated; in a future version this will return a view on self.

Yah I've seen that locally too, haven't found a good way to suppress it.

jbrockmendel · 2021-12-03T18:56:04Z

Opened numpy/numpy#20512 about the xpass

jbrockmendel · 2021-12-04T00:37:08Z

greenish

burnpanck · 2021-12-18T14:35:06Z

This PR has broken the tests for me as a non-US user. The new test here adds a pytest.mark.xfail for me because I'm not on the en_US locale, however, the test does in fact XPASS, which causes the tests to fail in the standard (strict) configuration. Potentially related to #44625?

jbrockmendel · 2021-12-18T16:22:19Z

What locale are you using? Let’s update the condition to be more specific

burnpanck · 2021-12-19T10:50:10Z

My locale is

LANG=en_GB.UTF-8
LC_ALL=de_CH.UTF-8
LC_CTYPE=UTF-8

that is the locale is de_CH but I expect my OS to talk to me in en_GB.

However, I believe this is going to be a rabbit hole: There will always be a locale that hasn't been considered causing trouble for somebody else - that is exactly what happened in #44625. Either an API is locale-dependent, or it isn't. If it is, we shouldn't try to make any assumptions about the value unless we know exactly what the locale is. Either the locale should be set within the context of the test (difficult, because there is no API for a context-local locale change), or any outcome should be accepted unless the current locale happens to match a specific known one. In the case here, any unknown locale should be directed to pytest.mark.skip, not .xfail.

jbrockmendel · 2021-12-19T21:03:02Z

Either an API is locale-dependent, or it isn't

Not sure where this cases fits in this dichotomy: numpy's behavior clearly shouldn't depend on the locale, but in practice does.

or any outcome should be accepted unless the current locale happens to match a specific known one. In the case here, any unknown locale should be directed to pytest.mark.skip, not .xfail.

I've been making a push to change skips to xfails because many of the skipped tests get fixed without us being alerted to that fact. I recognize this is a pain point for you, but I think the solution here is to figure out the actually-correct condition under which numpy gives the weird behavior and only xfail in that case.

difficult, because there is no API for a context-local locale change

pd._config.localization.set_locale?

burnpanck · 2021-12-20T06:48:33Z

Not sure where this cases fits in this dichotomy: numpy's behavior clearly shouldn't depend on the locale, but in practice does.

You are right of course, there is a third option: An API that is not documented can do anything it wants. The common cases here is of course the exception path: Take any documented API and use it in an unsupported way - in python we expect to get an Exception, but there is no specification how that exception looks like. The thing is, both here and in #44625 we acknowledge that the exception message is in fact locale-dependent, as we modify the test behaviour based on the current locale.

I've been making a push to change skips to xfails because many of the skipped tests get fixed without us being alerted to that fact. I recognize this is a pain point for you, but I think the solution here is to figure out the actually-correct condition under which numpy gives the weird behavior and only xfail in that case.

If you want to mark a test-case to fail due to an upstream bug that is out of control xfail is of course the right thing to do. However, in the case here (that is the new test test_floating_array_disallows_float16), the xfail does not concern the numpy behaviour! The xfail marks concerns the behaviour of the test-case. As it stands, the test-case here expects numpy to throw a specific exception with a specific message iff the locale is en_US - with that xfail, it also expects every other locale in the world not to throw that exception. That assumption is just plain simply wrong, and thus the test effectively broken.

I don't know about you, but I have no clue how I would be able to predict the behaviour for every possible locale. Unless you can find a reasonably efficient way to identify the behaviour of a locale-dependent API for every conceivable locale (and the CI tests cover every such locale), I stand by my assessment: We should not make any assumptions about unknown locales.

IMHO, the best fix for both issues (that is the test this PR broke, as well as #44625) is to only check the specific exception message when the locale is among a small restricted set of known locales (such as en_US only), and for every other locale just check the exception type. I'm willing to create a PR to that end. The other option would be to force the locale within those tests, but changing the locale (even temporarily) is often frowned upon...:

pd._config.localization.set_locale

Didn't know that one, thanks for the pointer! It appears to be a context manager that uses locale.setlocale, which is documented as not thread-safe; it still changes the locale globally (process-wide) - and has no understanding of PEP567 contexts. In general, it would be a very hard task to implement such a context-local locale change: The locale influences some C-library functions provided by the system and used by CPython. One would have to avoid any locale-dependent C-library function in CPython, because the C-library does not specify any method even for a thread-local change to the locale.

jbrockmendel · 2021-12-21T22:45:41Z

I don't know about you, but I have no clue how I would be able to predict the behaviour for every possible locale. Unless you can find a reasonably efficient way to identify the behaviour of a locale-dependent API for every conceivable locale (and the CI tests cover every such locale)

#44972?

As it stands, the test-case here expects numpy to throw a specific exception with a specific message iff the locale is en_US - with that xfail, it also expects every other locale in the world not to throw that exception. That assumption is just plain simply wrong, and thus the test effectively broken.

The process is an iterative one. We'll fix the test in your locale, then wait for someone else to inform us the test is broken in their locale and recurse.

I recognize that is an annoyance, but the alternative of skipping would mean that functionality would be untested in e.g. your locale.

which is documented as not thread-safe [...]

Good to know, thanks!

burnpanck · 2021-12-22T10:56:49Z

I recognize that is an annoyance, but the alternative of skipping would mean that functionality would be untested in e.g. your locale.

It's a compromise between false negatives and false positives. I would have assumed that false negatives are rare - it's hard to break something just for single individual locales if there is no public locale-dependent API/functionality at all. False positives however will make it difficult for non-regular contributors like me to contribute - if unrelated tests fail, we usually don't know if the test is broken or the code is broken.

I realise however that I might really be misunderstanding the issue that is being tested in the particular test here. So we should really deal with each issue separately. I have thus opened #45013.

simonjayhawkins · 2021-12-22T12:15:52Z

The process is an iterative one. We'll fix the test in your locale, then wait for someone else to inform us the test is broken in their locale and recurse.

FWIW i see local failures too.

LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8

___________________________________________________ test_floating_array_disallows_float16 ___________________________________________________
[gw2] linux -- Python 3.8.12 /home/simon/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] numpy does not raise on np.dtype('Float16')

jbrockmendel · 2021-12-22T17:19:31Z

I don't have the bandwidth to continue with this. Please make a PR.

jbrockmendel added 5 commits December 1, 2021 18:46

BUG: FloatingArray(float16data)

ee001a9

GH ref

3524338

xfail

fe0a002

Merge branch 'master' into bug-nullable-16

b28b0be

xfail

d3a3fa3

jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Dec 3, 2021

jreback added this to the 1.4 milestone Dec 3, 2021

jreback added the Bug label Dec 3, 2021

Merge branch 'master' into bug-nullable-16

25c1ddd

jbrockmendel mentioned this pull request Dec 3, 2021

BUG: np.dtype("Float16") raising inconsistently depending on locale numpy/numpy#20512

Closed

jbrockmendel added 3 commits December 3, 2021 12:58

troubleshoot

a515817

only on newer numpy

26ecaae

troubleshoot

1e963be

jreback merged commit 5302d1b into pandas-dev:master Dec 4, 2021

jbrockmendel deleted the bug-nullable-16 branch December 5, 2021 00:09

burnpanck mentioned this pull request Dec 18, 2021

ENH: [Draft] Fix issue #35131 Identify zero-dimensional duck arrays as non-iterable #44626

Merged

4 tasks

burnpanck mentioned this pull request Dec 22, 2021

BUG: pandas/tests/arrays/floating/test_construction.py:test_floating_array_disallows_float16 fails on de_CH locale #45013

Closed

3 tasks

Uh oh!

BUG: FloatingArray(float16data) #44715

BUG: FloatingArray(float16data) #44715

Uh oh!

Conversation

jbrockmendel commented Dec 2, 2021

Uh oh!

jreback commented Dec 3, 2021

Uh oh!

jreback commented Dec 3, 2021

Uh oh!

jbrockmendel commented Dec 3, 2021

Uh oh!

jbrockmendel commented Dec 3, 2021

Uh oh!

jbrockmendel commented Dec 3, 2021

Uh oh!

jbrockmendel commented Dec 4, 2021

Uh oh!

burnpanck commented Dec 18, 2021

Uh oh!

jbrockmendel commented Dec 18, 2021

Uh oh!

burnpanck commented Dec 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbrockmendel commented Dec 19, 2021

Uh oh!

burnpanck commented Dec 20, 2021

Uh oh!

jbrockmendel commented Dec 21, 2021

Uh oh!

burnpanck commented Dec 22, 2021

Uh oh!

simonjayhawkins commented Dec 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbrockmendel commented Dec 22, 2021

Uh oh!

Uh oh!

burnpanck commented Dec 19, 2021 •

edited

Loading

simonjayhawkins commented Dec 22, 2021 •

edited

Loading