BUG: Series.median() treats NaT as INT_MIN #8617

ischwabacher · 2014-10-23T18:19:04Z

Fixing this will be easier once numpy/numpy#5222 is fixed.

In [1]: import pandas as pd

In [2]: pd.Series([0, pd.tslib.iNaT], dtype='m8[ns]')
Out[2]: 
0   0 days
1      NaT
dtype: timedelta64[ns]

In [3]: _.median()
Out[3]: 
0   -53375 days, 23:53:38.427388
dtype: timedelta64[ns]
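
For reference, the odd value in Out[3] is consistent with NaT's integer sentinel (the smallest int64) being averaged with 0 as if it were an ordinary value. A minimal sketch of that arithmetic in plain Python, assuming nanosecond resolution and a two-element median computed as the mean of both values; `split_ns` is a hypothetical helper, not a pandas function:

```python
# Assumption: NaT is stored as the smallest int64 (the iNaT sentinel).
INT64_MIN = -(2**63)

# The median of two values is their mean; exact with Python integers.
median_ns = (0 + INT64_MIN) // 2


def split_ns(total_ns):
    """Break a non-negative nanosecond count into (days, h, m, s, ns)."""
    days, rem = divmod(total_ns, 86_400 * 10**9)
    hours, rem = divmod(rem, 3_600 * 10**9)
    minutes, rem = divmod(rem, 60 * 10**9)
    seconds, nanos = divmod(rem, 10**9)
    return days, hours, minutes, seconds, nanos


magnitude = split_ns(-median_ns)
print(magnitude)  # (53375, 23, 53, 38, 427387904)
```

The magnitude matches the 53375 days, 23:53:38.427388 shown above once the nanoseconds are rounded to microseconds.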

jreback · 2014-10-23T18:44:32Z

nothing to do with numpy, this is an error here: https://github.com/pydata/pandas/blob/master/pandas/core/nanops.py#L281 (just take out the mask and it should work); the mask is getting overwritten

ischwabacher · 2014-10-23T19:02:07Z

Yeah, it's not the fix that's the problem (though you do have to carefully propagate the reshaping of the input to the mask; you can't just close it into get_median and expect things to work). The reason I care about the upstream problems is that when I added a bunch of all- and partial NaT arrays to test_nanops.py and extended check_funs to use them appropriately, I got a pile of tests that failed because the numpy ufuncs we're testing against mishandle NaT.
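
A rough sketch of what a mask-aware median could look like for the 1-D case, assuming a `timedelta64[ns]` input and a NumPy version that provides `np.isnat`. The name `nat_aware_median` is hypothetical and this is not the code in nanops.py; it only illustrates computing the mask once and applying it before any integer arithmetic touches the NaT sentinel:

```python
import numpy as np


def nat_aware_median(td):
    """Median of a 1-D timedelta64[ns] array, ignoring NaT (illustrative only)."""
    mask = np.isnat(td)              # compute the mask once, never overwrite it
    valid = td[~mask]
    if valid.size == 0:
        return np.timedelta64("NaT", "ns")
    i8 = np.sort(valid.view("i8"))   # safe: the NaT sentinel is already removed
    mid = i8.size // 2
    if i8.size % 2:
        med = i8[mid]
    else:
        # Midpoint of the two central values, written to avoid int64 overflow.
        med = i8[mid - 1] + (i8[mid] - i8[mid - 1]) // 2
    return np.timedelta64(int(med), "ns")


# Same data as the report: 0 ns and the iNaT sentinel viewed as timedelta64.
arr = np.array([0, np.iinfo(np.int64).min], dtype="i8").view("m8[ns]")
result = nat_aware_median(arr)
```

A 2-D version would need the mask reshaped in step with the values, which is the propagation concern raised in the comment above.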

jreback · 2014-10-23T19:11:34Z

@ischwabacher I fixed all of this for 0.15.0 (aside from this one), which I guess was just not completely tested. You can fix upstream if you want, but these tests don't rely on numpy very much.

ischwabacher · 2014-10-23T21:57:55Z

It looks like max and min with skipna=False are still treating NaT as INT_MIN as well, and argmax and argmin are returning nan. Numpy behavior with floats (which appears to be intended) is to return the index of the first nan, so that arr[argmax(arr)].equals(max(arr)). Or would if np.ndarray had an equals method.

In [81]: pd.Series([0, pd.tslib.iNaT], dtype='M8[ns]').max(skipna=False)
Out[81]: Timestamp('1970-01-01 00:00:00')

In [82]: pd.Series([0, pd.tslib.iNaT], dtype='m8[ns]').max(skipna=False)
Out[82]: Timedelta('0 days 00:00:00')

In [83]: pd.Series([0, pd.tslib.iNaT], dtype='M8[ns]').min(skipna=False)
Out[83]: NaT

In [84]: pd.Series([0, pd.tslib.iNaT], dtype='m8[ns]').min(skipna=False)
Out[84]: NaT
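
The float behaviour referred to above can be checked with a small NumPy snippet. This is only an illustration of NaN propagation in `np.max` and `np.argmax` on a float array; it says nothing about how pandas handles NaT:

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

# np.max propagates NaN, and np.argmax points at the first NaN,
# so indexing with the argmax yields the same (NaN) value as the max.
idx = int(np.argmax(arr))
max_is_nan = bool(np.isnan(np.max(arr)))
picked_is_nan = bool(np.isnan(arr[idx]))

print(idx, max_is_nan, picked_is_nan)  # 1 True True
```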

jreback · 2014-10-24T12:27:30Z

for these dtypes skipna=False is not well defined

by definition you are not using a mask so the values are the values
the above look right to me (the min/max) for skipna=False

ischwabacher · 2014-10-24T15:01:34Z

Really? Why aren't we treating NaT the same as NaN in that case?

Also, if NDFrame behaves differently from ndarray it makes things harder to test. But I'll look at the tests for floats to see how they're handled, since currently pandas treats NaN differently from the way numpy treats it.

jreback · 2014-10-24T15:18:06Z

so the min/max for skipna=False are wrong above (81,82).

pandas is NOT numpy, so treatment can be different, though not w/o a good reason (e.g. numpy doesn't handle anything NaT-related correctly). numpy is usually pretty good with NaN.

Don't test directly with numpy. Write a function that returns the correct result.
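
One way such a reference function might look for min/max, as a sketch only. It assumes a 1-D `timedelta64[ns]` input and assumes the semantics argued for in this thread, namely that with skipna=False any NaT makes the result NaT, mirroring NaN. The name `expected_extreme` is hypothetical and not part of the pandas test suite:

```python
import numpy as np


def expected_extreme(td, op, skipna=True):
    """Reference min/max for a 1-D timedelta64[ns] array (illustrative only).

    `op` is a reduction such as np.min or np.max; it is applied only to
    the valid int64 values, so the NaT sentinel never reaches it.
    """
    mask = np.isnat(td)
    if mask.all() or (not skipna and mask.any()):
        return np.timedelta64("NaT", "ns")
    i8 = td[~mask].view("i8")
    return np.timedelta64(int(op(i8)), "ns")


# 0 ns, NaT (the iNaT sentinel), 5 ns
td = np.array([0, np.iinfo(np.int64).min, 5], dtype="i8").view("m8[ns]")
max_skip = expected_extreme(td, np.max)
max_noskip = expected_extreme(td, np.max, skipna=False)
min_skip = expected_extreme(td, np.min)
```

A test could then compare the pandas result against this helper instead of against the raw NumPy reduction on the NaT-containing array.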

ischwabacher · 2014-10-24T15:30:15Z

> Don't test directly with numpy. Write a function that returns the correct result.

And then merge it into numpy. :)

I really think that getting the NaT handling working correctly in numpy is the first step to implementing NA_integer_, since that's effectively what NaT is.

jreback · 2014-10-24T15:45:13Z

well you can touch numpy if you would like :).

We are going to bypass numpy to support integer NA, using libdynd, essentially a much better numpy (with the same exact API).

ischwabacher · 2014-10-24T15:51:13Z

Welp, I guess that's a thing.

facaiy · 2016-08-13T09:01:18Z

@jreback It seems that the bug has been fixed.

In [9]: s = pd.Series([0, pd.tslib.iNaT], dtype='m8[ns]')

In [10]: s.median()
Out[10]: Timedelta('0 days 00:00:00')

In [11]: s.min()
Out[11]: Timedelta('0 days 00:00:00')

In [12]: s.max()
Out[12]: Timedelta('0 days 00:00:00')

In [13]: s = pd.Series([0, pd.NaT], dtype='m8[ns]')

In [14]: s.median()
Out[14]: Timedelta('0 days 00:00:00')

In [15]: s.min()
Out[15]: Timedelta('0 days 00:00:00')

In [16]: s.max()
Out[16]: Timedelta('0 days 00:00:00')

facaiy · 2016-08-13T09:03:24Z

More tests may be needed.

jreback · 2016-08-15T11:39:24Z

xref #12992. Can you do a PR for some tests?

jreback added the Bug, Timedelta, and Missing-data labels Oct 23, 2014

jreback added this to the 0.15.1 milestone Oct 23, 2014

jreback modified the milestones: 0.15.2, 0.16.0 Nov 29, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback mentioned this issue Apr 21, 2016

Operations on NaT returning float instead of datetime64[ns] #12941

Closed

jreback modified the milestones: 0.18.2, Next Major Release Apr 21, 2016

jreback mentioned this issue Aug 6, 2016

BUG: agg() function on groupby dataframe changes dtype of datetime64[ns] column to float64 #12992

Closed

jorisvandenbossche added the Testing label Aug 27, 2016

jorisvandenbossche mentioned this issue Aug 31, 2016

TST: confirming tests for some fixed issues #14117

Merged

jorisvandenbossche closed this as completed in #14117 Sep 1, 2016
