Skip to content

CI: failing timedelta64 test on Linux py37_locale build #33890

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jorisvandenbossche opened this issue Apr 30, 2020 · 15 comments
Open

CI: failing timedelta64 test on Linux py37_locale build #33890

jorisvandenbossche opened this issue Apr 30, 2020 · 15 comments
Labels
CI Continuous Integration Timedelta Timedelta data type

Comments

@jorisvandenbossche
Copy link
Member

This is seen on a few PRs (eg https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=34508&view=logs&jobId=c4100758-acc4-5cd0-72d8-90828968cf00&j=c4100758-acc4-5cd0-72d8-90828968cf00)

=================================== FAILURES ===================================
___ TestSeriesDtypes.test_astype_generic_timestamp_no_frequency[timedelta64] ___
[gw0] linux -- Python 3.7.6 /home/vsts/miniconda3/envs/pandas-dev/bin/python

self = <pandas.tests.series.test_dtypes.TestSeriesDtypes object at 0x7f3d21a6c050>
dtype = <class 'numpy.timedelta64'>

    @pytest.mark.parametrize("dtype", [np.datetime64, np.timedelta64])
    def test_astype_generic_timestamp_no_frequency(self, dtype):
        # see gh-15524, gh-15987
        data = [1]
        s = Series(data)
    
        msg = (
            fr"The '{dtype.__name__}' dtype has no unit\. "
            fr"Please pass in '{dtype.__name__}\[ns\]' instead."
        )
        with pytest.raises(ValueError, match=msg):
>           s.astype(dtype)
E           Failed: DID NOT RAISE <class 'ValueError'>

pandas/tests/series/test_dtypes.py:395: Failed
@jorisvandenbossche jorisvandenbossche added the CI Continuous Integration label Apr 30, 2020
@jbrockmendel
Copy link
Member

xref #33898 looks like in this circumstance np.dtype(np.timedelta64) is giving np.dtype("m8[ns]") instead of np.dtype("m8")

        d = np.dtype(dtype)
>       assert d.name in ["datetime64", "timedelta64"]  # troubleshooting CI
E       AssertionError: assert 'timedelta64[ns]' in ['datetime64', 'timedelta64']
E        +  where 'timedelta64[ns]' = dtype('<m8[ns]').name

@jbrockmendel
Copy link
Member

@seberg any guess why np.dtype(np.timedelta64) would behave differently with a locale set?

@seberg
Copy link
Contributor

seberg commented Apr 30, 2020

@jbrockmendel I somewhat thought we had removed locale stuff, but not sure, I will look into it. The astype may try to guess the unit based on the data here. I do not see why it should end up with ns or different things depending on locale, but I will look at those branches.

Just to be sure, is the failure newer, or just something that came up now randomly?

@jbrockmendel
Copy link
Member

Just to be sure, is the failure newer, or just something that came up now randomly?

I've been seeing it for a few days, but its not in the npdev build. Its py37, Linux, numpy 1.18.1

@seberg
Copy link
Contributor

seberg commented Apr 30, 2020

@jbrockmendel you have a setup where you can reproduce this, right? For you np.dtype(np.timedelta64) will keep returning the [ns] variant, right?

The current dtypes work by having a C-side singleton instance. I assume that singleton instance is modified somewhere before this test is executed. I.e. you are not allowed to modify that singleton, but somewhere it does get modified. Seems tricky to track down where it happens before this test if this is the case.

Is there a minimal test run I can do to reproduce this?

@jbrockmendel
Copy link
Member

you have a setup where you can reproduce this, right?

I haven't been able to reproduce this locally; it is in the build that sets the locale to "zh_CN.utf8", which my machine wont let me do

@seberg
Copy link
Contributor

seberg commented Apr 30, 2020

I poked at it for a while, and I honestly have no clue where to search and I am unable to get things running far enough to hope to reproduce it. I see there were a couple of changes e.g. in the json code a few weeks back. Could it be related to that? I tried looking at them and did not see anything obvious, but its pretty tricky and complex C code...

@jbrockmendel
Copy link
Member

Thanks for taking a look. I think we're going to end up xfailing it for the time being.

@seberg
Copy link
Contributor

seberg commented May 1, 2020

I ran the pandas tests through valgrind over night. There is one fishy spot in that json code, although I doubt that it caused this. I will open a PR for that I think, does the test fail fairly reliably on CI?

There were two or so other slightly fishy things seen in valgrind. One a possible one in take, which surprised me (but such errors can also always come through some np.empty() like initialization somewhere).

What worried me a bit more is:

==46670==  Address 0xa33408e2 is 50 bytes inside a block of size 57 free'd
==46670==    at 0x48379AB: free (vg_replace_malloc.c:540)
==46670==    by 0x1DFDC38C: _Py_DECREF (object.h:478)
==46670==    by 0x1DFDC38C: __pyx_pf_6pandas_5_libs_9hashtable_15StringHashTable_16_unique.isra.0 (hashtable.c:22087)
==46670==    by 0x1DFDD141: __pyx_pw_6pandas_5_libs_9hashtable_15StringHashTable_17_unique (hashtable.c:21705)

in the hashtable implementation. Although likely also just an immediate use-after-free which is unlikely to cause actual problems in practice, it may be nice to clean up?

@jbrockmendel
Copy link
Member

I will open a PR for that I think, does the test fail fairly reliably on CI?

it was failing pretty often, is now xfailed.

One a possible one in take, which surprised me

Can you describe this?

Although likely also just an immediate use-after-free which is unlikely to cause actual problems in practice, it may be nice to clean up?

I'll take a look at this, though it is likely past the limits of my C-fu.

@seberg
Copy link
Contributor

seberg commented May 1, 2020

Hmmm, the take one (sorry, I did not use a debug python version here, so the output is not super useful:

.local/lib/python3.8/site-packages/pandas/tests/test_sorting.py s...s....==46670== Invalid read of size 8
==46670==    at 0x1E99C84B: __pyx_pf_6pandas_5_libs_5algos_162take_1d_int64_int64 (algos.c:105992)
==46670==    by 0x1E99C84B: __pyx_pw_6pandas_5_libs_5algos_163take_1d_int64_int64 (algos.c:105872)
==46670==    by 0x5EEF54: UnknownInlinedFun (call.c:742)
<snip>
==46670==  Address 0x8346f190 is 0 bytes after a block of size 48 alloc'd
==46670==    at 0x4838D7B: realloc (vg_replace_malloc.c:836)
==46670==    by 0x53762B: UnknownInlinedFun (obmalloc.c:703)
==46670==    by 0x53762B: _PyObject_GC_Resize (gcmodule.c:2032)
==46670==    by 0x5A7B91: _PyTuple_Resize (tupleobject.c:915)
==46670==    by 0x5AB89C: UnknownInlinedFun (abstract.c:1923)
==46670==    by 0x5AB89C: UnknownInlinedFun (tupleobject.c:707)
==46670==    by 0x5AB89C: tuple_new.lto_priv.0 (tupleobject.c.h:92)
==46670==    by 0x5EF2CA: UnknownInlinedFun (typeobject.c:971)
==46670==    by 0x5EF2CA: _PyObject_MakeTpCall (call.c:159)
==46670==    by 0x56B755: UnknownInlinedFun (abstract.h:125)
==46670==    by 0x56B755: UnknownInlinedFun (abstract.h:115)
==46670==    by 0x56B755: UnknownInlinedFun (ceval.c:4987)
==46670==    by 0x56B755: _PyEval_EvalFrameDefault (ceval.c:3500)
==46670==    by 0x564A21: UnknownInlinedFun (ceval.c:741)
==46670==    by 0x564A21: _PyEval_EvalCodeWithName (ceval.c:4298)
==46670==    by 0x5F1F74: _PyFunction_Vectorcall (call.c:435)
==46670==    by 0x5EEA4D: UnknownInlinedFun (call.c:199)
==46670==    by 0x5EEA4D: PyObject_Call (call.c:227)
==46670==    by 0x6525BAF: PyUFunc_CheckOverride (override.c:666)
==46670==    by 0x64FEA8B: ufunc_generic_call (ufunc_object.c:4693)

I had missed the section part, which seems to occur within __array_ufunc__ usage! Although I wonder if there is some "not correctly holding on to references in NumPy". Not sure if this whole issue may not be related to even to a NumPy bug, hehe.

@seberg
Copy link
Contributor

seberg commented May 1, 2020

Wow, some other random change I made in NumPy made me run into a segfault in a semi-recent change to NumPy, which seems like it is probably related. Still have to investigate more, but I think the bug is probably in NumPy, just so hidden we did not notice it yet.

EDIT: Sorry, false flag :(, it just happened that there was a header issue in the datetime file due to my change...

@jbrockmendel
Copy link
Member

Closable?

@jorisvandenbossche
Copy link
Member Author

This just stopped failing somehow? Or did we actually do something?

@jbrockmendel
Copy link
Member

This just stopped failing somehow? Or did we actually do something?

xfailed the affected test in the relevant conditions

@mroeschke mroeschke added the Timedelta Timedelta data type label Feb 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

4 participants