Skip to content

BUG: FloatingArray(float16data) #44715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 4, 2021

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Broken off from #43930 then fleshed out with targeted tests.

@jreback
Copy link
Contributor

jreback commented Dec 3, 2021

================================== FAILURES ===================================
____________________ test_floating_array_disallows_float16 ____________________
[gw0] win32 -- Python 3.9.7 C:\Miniconda\envs\pandas-dev\python.exe
[XPASS(strict)] numpy does not raise on np.dtype('Float16')
============================== warnings summary ===============================
pandas/tests/extension/test_floating.py::Test2DCompat::test_reductions_2d_axis_none[Float32Dtype-prod]
pandas/tests/extension/test_floating.py::Test2DCompat::test_reductions_2d_axis1[Float32Dtype-prod]
C:\Miniconda\envs\pandas-dev\lib\site-packages\numpy\core\fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

pandas/tests/frame/methods/test_shift.py::TestDataFrameShift::test_shift_dt64values_axis1_invalid_fill[Float32-True]
C:\Miniconda\envs\pandas-dev\lib\site-packages_pytest_io\saferepr.py:58: FutureWarning: Index.ravel returning ndarray is deprecated; in a future version this will return a view on self.
s = repr(x)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

looks like an xpass

@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Dec 3, 2021
@jreback jreback added this to the 1.4 milestone Dec 3, 2021
@jreback jreback added the Bug label Dec 3, 2021
@jreback
Copy link
Contributor

jreback commented Dec 3, 2021

i guess user facing if you can add a note

@jbrockmendel
Copy link
Member Author

i guess user facing if you can add a note

has one

@jbrockmendel
Copy link
Member Author

pandas/tests/frame/methods/test_shift.py::TestDataFrameShift::test_shift_dt64values_axis1_invalid_fill[Float32-True]
C:\Miniconda\envs\pandas-dev\lib\site-packages_pytest_io\saferepr.py:58: FutureWarning: Index.ravel returning ndarray is deprecated; in a future version this will return a view on self.

Yah I've seen that locally too, haven't found a good way to suppress it.

@jbrockmendel
Copy link
Member Author

Opened numpy/numpy#20512 about the xpass

@jbrockmendel
Copy link
Member Author

greenish

@jreback jreback merged commit 5302d1b into pandas-dev:master Dec 4, 2021
@jbrockmendel jbrockmendel deleted the bug-nullable-16 branch December 5, 2021 00:09
@burnpanck
Copy link
Contributor

This PR has broken the tests for me as a non-US user. The new test here adds a pytest.mark.xfail for me because I'm not on the en_US locale, however, the test does in fact XPASS, which causes the tests to fail in the standard (strict) configuration. Potentially related to #44625?

@jbrockmendel
Copy link
Member Author

What locale are you using? Let’s update the condition to be more specific

@burnpanck
Copy link
Contributor

burnpanck commented Dec 19, 2021

My locale is

LANG=en_GB.UTF-8
LC_ALL=de_CH.UTF-8
LC_CTYPE=UTF-8

that is the locale is de_CH but I expect my OS to talk to me in en_GB.

However, I believe this is going to be a rabbit hole: There will always be a locale that hasn't been considered causing trouble for somebody else - that is exactly what happened in #44625. Either an API is locale-dependent, or it isn't. If it is, we shouldn't try to make any assumptions about the value unless we know exactly what the locale is. Either the locale should be set within the context of the test (difficult, because there is no API for a context-local locale change), or any outcome should be accepted unless the current locale happens to match a specific known one. In the case here, any unknown locale should be directed to pytest.mark.skip, not .xfail.

@jbrockmendel
Copy link
Member Author

Either an API is locale-dependent, or it isn't

Not sure where this cases fits in this dichotomy: numpy's behavior clearly shouldn't depend on the locale, but in practice does.

or any outcome should be accepted unless the current locale happens to match a specific known one. In the case here, any unknown locale should be directed to pytest.mark.skip, not .xfail.

I've been making a push to change skips to xfails because many of the skipped tests get fixed without us being alerted to that fact. I recognize this is a pain point for you, but I think the solution here is to figure out the actually-correct condition under which numpy gives the weird behavior and only xfail in that case.

difficult, because there is no API for a context-local locale change

pd._config.localization.set_locale?

@burnpanck
Copy link
Contributor

Not sure where this cases fits in this dichotomy: numpy's behavior clearly shouldn't depend on the locale, but in practice does.

You are right of course, there is a third option: An API that is not documented can do anything it wants. The common cases here is of course the exception path: Take any documented API and use it in an unsupported way - in python we expect to get an Exception, but there is no specification how that exception looks like. The thing is, both here and in #44625 we acknowledge that the exception message is in fact locale-dependent, as we modify the test behaviour based on the current locale.

I've been making a push to change skips to xfails because many of the skipped tests get fixed without us being alerted to that fact. I recognize this is a pain point for you, but I think the solution here is to figure out the actually-correct condition under which numpy gives the weird behavior and only xfail in that case.

If you want to mark a test-case to fail due to an upstream bug that is out of control xfail is of course the right thing to do. However, in the case here (that is the new test test_floating_array_disallows_float16), the xfail does not concern the numpy behaviour! The xfail marks concerns the behaviour of the test-case. As it stands, the test-case here expects numpy to throw a specific exception with a specific message iff the locale is en_US - with that xfail, it also expects every other locale in the world not to throw that exception. That assumption is just plain simply wrong, and thus the test effectively broken.

I don't know about you, but I have no clue how I would be able to predict the behaviour for every possible locale. Unless you can find a reasonably efficient way to identify the behaviour of a locale-dependent API for every conceivable locale (and the CI tests cover every such locale), I stand by my assessment: We should not make any assumptions about unknown locales.

IMHO, the best fix for both issues (that is the test this PR broke, as well as #44625) is to only check the specific exception message when the locale is among a small restricted set of known locales (such as en_US only), and for every other locale just check the exception type. I'm willing to create a PR to that end. The other option would be to force the locale within those tests, but changing the locale (even temporarily) is often frowned upon...:

pd._config.localization.set_locale

Didn't know that one, thanks for the pointer! It appears to be a context manager that uses locale.setlocale, which is documented as not thread-safe; it still changes the locale globally (process-wide) - and has no understanding of PEP567 contexts. In general, it would be a very hard task to implement such a context-local locale change: The locale influences some C-library functions provided by the system and used by CPython. One would have to avoid any locale-dependent C-library function in CPython, because the C-library does not specify any method even for a thread-local change to the locale.

@jbrockmendel
Copy link
Member Author

I don't know about you, but I have no clue how I would be able to predict the behaviour for every possible locale. Unless you can find a reasonably efficient way to identify the behaviour of a locale-dependent API for every conceivable locale (and the CI tests cover every such locale)

#44972?

As it stands, the test-case here expects numpy to throw a specific exception with a specific message iff the locale is en_US - with that xfail, it also expects every other locale in the world not to throw that exception. That assumption is just plain simply wrong, and thus the test effectively broken.

The process is an iterative one. We'll fix the test in your locale, then wait for someone else to inform us the test is broken in their locale and recurse.

I recognize that is an annoyance, but the alternative of skipping would mean that functionality would be untested in e.g. your locale.

which is documented as not thread-safe [...]

Good to know, thanks!

@burnpanck
Copy link
Contributor

I recognize that is an annoyance, but the alternative of skipping would mean that functionality would be untested in e.g. your locale.

It's a compromise between false negatives and false positives. I would have assumed that false negatives are rare - it's hard to break something just for single individual locales if there is no public locale-dependent API/functionality at all. False positives however will make it difficult for non-regular contributors like me to contribute - if unrelated tests fail, we usually don't know if the test is broken or the code is broken.

I realise however that I might really be misunderstanding the issue that is being tested in the particular test here. So we should really deal with each issue separately. I have thus opened #45013.

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Dec 22, 2021

The process is an iterative one. We'll fix the test in your locale, then wait for someone else to inform us the test is broken in their locale and recurse.

FWIW i see local failures too.

LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8
___________________________________________________ test_floating_array_disallows_float16 ___________________________________________________
[gw2] linux -- Python 3.8.12 /home/simon/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] numpy does not raise on np.dtype('Float16')

@jbrockmendel
Copy link
Member Author

I don't have the bandwidth to continue with this. Please make a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants