
CI: Fix tests broken by np 1.18 sorting change #29877

Merged: 2 commits, Nov 27, 2019

@jbrockmendel (Member) commented Nov 27, 2019

Longer-term, do we want to use numpy's new behavior?

cc @TomAugspurger

Update: a little more background. Consider two ndarrays in numpy < 1.18:

>>> import numpy as np
>>> arr1 = np.array([1, 2, np.nan, 3, 4])
>>> arr2 = np.array([1, 2, np.datetime64("NaT"), 3, 4], dtype="datetime64[ns]")

>>> np.sort(arr1)
array([ 1.,  2.,  3.,  4., nan])
>>> np.sort(arr2)
array([                          'NaT', '1970-01-01T00:00:00.000000001',
       '1970-01-01T00:00:00.000000002', '1970-01-01T00:00:00.000000003',
       '1970-01-01T00:00:00.000000004'], dtype='datetime64[ns]')

In numpy 1.18, the behavior of np.sort(arr2) is changing to put the NaT at the end instead of at the beginning, to behave more like NaN. This breaks a few tests, which this PR fixes.
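Because the position of NaT in `np.sort` output depends on the installed numpy version, code that needs a fixed order can place NaT itself. The following is a minimal sketch, not the fix used in this PR; `sort_nat_last` is a hypothetical helper name, and it assumes a numpy recent enough to provide `np.isnat`. It sorts only the valid values and appends the NaT entries, so the result is the same before and after 1.18.

```python
import numpy as np


def sort_nat_last(arr):
    """Hypothetical helper: return a sorted copy of a datetime64/timedelta64
    array with every NaT placed at the end, regardless of numpy version."""
    mask = np.isnat(arr)  # True where the element is NaT
    # Sort only the valid values, then append the NaTs (dtype is preserved).
    return np.concatenate([np.sort(arr[~mask]), arr[mask]])


arr2 = np.array([1, 2, np.datetime64("NaT"), 3, 4], dtype="datetime64[ns]")
print(sort_nat_last(arr2))  # the NaT ends up in the last position
```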

Side-note: as of now, 1.18 changes the behavior for datetime64, but not timedelta64.
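Since datetime64 and timedelta64 may behave differently depending on the numpy release, a test suite could probe the installed numpy at runtime instead of hard-coding one order. A sketch under that assumption (`nat_sorts_last` is a hypothetical name, not a pandas or numpy API); it relies on NaT being stored as the minimum int64 value, so viewing an int64 array as datetime64/timedelta64 yields a NaT in the middle position.

```python
import numpy as np


def nat_sorts_last(dtype):
    """Hypothetical runtime probe: True if the installed numpy's np.sort
    puts NaT after the valid values for the given datetime-like dtype."""
    # NaT is stored as the minimum int64, so this becomes [valid, NaT, valid].
    raw = np.array([1, np.iinfo(np.int64).min, 2], dtype=np.int64)
    result = np.sort(raw.view(dtype))
    return bool(np.isnat(result[-1]))


for dt in ("datetime64[ns]", "timedelta64[ns]"):
    print(dt, nat_sorts_last(dt))
```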

@jbrockmendel changed the title from "Cifix2" to "CI: Fix tests broken by np 1.18 sorting change" Nov 27, 2019
@gfyoung added the "Dependencies" (Required and optional dependencies) and "Testing" (pandas testing functions or related to the test suite) labels Nov 27, 2019
@jorisvandenbossche (Member) commented:

> Longer-term, do we want to use numpy's new behavior?

We already do? (put NaT last)

In [5]: pd.Series(np.array([1, 2, np.datetime64("NaT"), 3, 4], dtype="datetime64[ns]")).sort_values()  
Out[5]: 
0   1970-01-01 00:00:00.000000001
1   1970-01-01 00:00:00.000000002
3   1970-01-01 00:00:00.000000003
4   1970-01-01 00:00:00.000000004
2                             NaT
dtype: datetime64[ns]

In [6]: np.sort(np.array([1, 2, np.datetime64("NaT"), 3, 4], dtype="datetime64[ns]")) 
Out[6]: 
array([                          'NaT', '1970-01-01T00:00:00.000000001',
       '1970-01-01T00:00:00.000000002', '1970-01-01T00:00:00.000000003',
       '1970-01-01T00:00:00.000000004'], dtype='datetime64[ns]')

@jorisvandenbossche (Member) commented:

> We already do? (put NaT last)

Ah, I tested for Series. But for Index we indeed follow numpy's behaviour and put it first.
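The Series output above reorders the labels as 0, 1, 3, 4, 2, i.e. it works from an indexer with NaT last, while the Index result matches numpy's NaT-first order. As an illustration only (this is not pandas' internal implementation, and `argsort_nat` is a hypothetical helper), an indexer with an explicit `na_position` makes either convention reproducible independent of numpy's choice:

```python
import numpy as np


def argsort_nat(arr, na_position="last"):
    """Hypothetical helper: indexer that sorts a datetime64/timedelta64 array
    with NaT placed explicitly first or last, independent of numpy."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    positions = np.arange(len(arr))
    mask = np.isnat(arr)
    # Stable argsort of the valid values, mapped back to original positions.
    valid_sorted = positions[~mask][np.argsort(arr[~mask], kind="stable")]
    nat_positions = positions[mask]
    if na_position == "last":
        return np.concatenate([valid_sorted, nat_positions])
    return np.concatenate([nat_positions, valid_sorted])


arr2 = np.array([1, 2, np.datetime64("NaT"), 3, 4], dtype="datetime64[ns]")
print(argsort_nat(arr2))                       # [0 1 3 4 2], like the Series labels
print(argsort_nat(arr2, na_position="first"))  # [2 0 1 3 4], NaT first
```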

@jorisvandenbossche (Member) commented:

Going to merge this to get CI green again.

> Longer-term, do we want to use numpy's new behavior?

@jbrockmendel can you open an issue for this?

> Side-note: as of now, 1.18 changes the behavior for datetime64, but not timedelta64.

Known, or something to report to numpy?

@jorisvandenbossche merged commit f855025 into pandas-dev:master Nov 27, 2019
@jorisvandenbossche added this to the 1.0 milestone Nov 27, 2019
@jbrockmendel deleted the cifix2 branch November 27, 2019 16:23
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

3 participants