Skip to content

BUG: Series/Index results in datetime/timedelta incorrectly if inputs are all nan/nat like #13477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 11, 2016

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Jun 18, 2016

@sinhrks sinhrks added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Clean labels Jun 18, 2016
@sinhrks sinhrks added this to the 0.18.2 milestone Jun 18, 2016
@codecov-io
Copy link

codecov-io commented Jun 18, 2016

Current coverage is 84.33%

Merging #13477 into master will not change coverage

@@             master     #13477   diff @@
==========================================
  Files           138        138          
  Lines         51092      51092          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43087      43087          
  Misses         8005       8005          
  Partials          0          0          

Powered by Codecov. Last updated by c2cc68d...a017a8c

@@ -1,5 +1,6 @@
import sys
cimport util
from tslib cimport is_timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might move this here as well (though may cause a circular import issue).

@jreback
Copy link
Contributor

jreback commented Jun 18, 2016

@sinhrks puzzled why this fixed things as all you did was move something

@sinhrks
Copy link
Member Author

sinhrks commented Jun 19, 2016

ATM, it doesn't affect to the behavior. However, I found some inconsistency when I adding the test. Updated in #13467.

@sinhrks sinhrks force-pushed the nat_nan branch 3 times, most recently from 32b86f9 to 03c016c Compare June 19, 2016 05:02

if util.is_datetime64_object(val) or val is NaT:
# same logic as is_null_datetimelike except for integer nat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its the same logic, then let's just add a flag that includes/excludes it rather then replicating the logic
(or make another function sitting below it that does that).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback The logic should be updated after a decision in #13467. Could you check?

@sinhrks sinhrks changed the title CLN/TST: Add tests for nan/nat mixed input API: Add tests for nan/nat mixed input Jun 24, 2016
@jreback
Copy link
Contributor

jreback commented Jun 24, 2016

ok as I commented on the other issue, I think all -> NaT/nan should be M8.

@jreback
Copy link
Contributor

jreback commented Jun 30, 2016

@sinhrks ready to merge? or is there still an open question on this

@sinhrks
Copy link
Member Author

sinhrks commented Jul 2, 2016

@jreback Updated based on #13467, and found that there is a test which expect np.timedelta('nat') is regarded as timedelta dtype (I think it's natural)

How about skipping only np.nan and pd.NaT in infer_dtype? I mean:

  • [pd.NaT, np.datetime64('nat')] -> datetime
  • [pd.NaT, np.timedelta64('nat')] -> timedelta
  • [np.datetime64('nat'), np.timedelta64('nat')] -> mixed (object)

@jreback
Copy link
Contributor

jreback commented Jul 2, 2016

yeah that looks right

the problem is we can't know definitively the inferred type if we only have pd.NaT (or just np.nan for that matter)

so I suppose object is correct in those cases (though might break things)

@sinhrks
Copy link
Member Author

sinhrks commented Jul 2, 2016

Yeah. pd.NaT should be ragarded as datetime in all nan-like case, if not specified otherwise (no np.datetime or np.timedelta).

  • [pd.NaT, np.nan] -> datetime (it results in timedelta on current master, NG)
  • [np.nan, pd.NaT] -> datetime (OK on current master)
  • [pd.NaT, pd.NaT] -> datetime

@sinhrks sinhrks changed the title API: Add tests for nan/nat mixed input BUG: Series/Index results in datetime/timedelta incorrectly if inputs are all nan/nat like Jul 2, 2016
@sinhrks
Copy link
Member Author

sinhrks commented Jul 2, 2016

Updated based on the above discussion.

@sinhrks sinhrks force-pushed the nat_nan branch 3 times, most recently from e539f34 to cd859a4 Compare July 5, 2016 19:28
if is_datetime64_array(values):
return 'datetime64'
elif is_timedelta_or_timedelta64_array(values):
return 'timedelta'

elif util.is_integer_object(val):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I thought this was needed maybe on py3? (the integer object check and then check for timedelta)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though test has been passed, will revert it for safety.



cdef inline bint is_null_datetime64(v):
# determine if we have a null for a datetime (or integer versions)x,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to export these (as in tslib.pxd)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed ATM because tslib doesn't use is_null_datetimelike.

@jreback
Copy link
Contributor

jreback commented Jul 6, 2016

@sinhrks looks really good. esp all of the added tests! just a couple of very minor questions. ping when ready.

tm.assert_series_equal(Series([pd.NaT, pd.NaT]), exp)
tm.assert_series_equal(Series(np.array([pd.NaT, pd.NaT])), exp)

exp = Series(pd.DatetimeIndex([pd.NaT, pd.NaT]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference with the exp above (exp = Series([pd.NaT, pd.NaT]))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed because of the bug fix. Removed.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 7, 2016

Now all green.

if is_datetime64_array(values):
return 'datetime64'
elif is_timedelta_or_timedelta64_array(values):
return 'timedelta'

elif is_timedelta(val):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be moved here. Otherwise, timedelta and object mixed data is regarded as "mixed-integer".

@sinhrks sinhrks force-pushed the nat_nan branch 2 times, most recently from e6dc033 to 9dfda77 Compare July 10, 2016 21:40
@jreback
Copy link
Contributor

jreback commented Jul 10, 2016

IIRC this was green. but rebase and ping.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 11, 2016

@jreback @jorisvandenbossche rebased, and all green.

@jorisvandenbossche jorisvandenbossche merged commit 5605f99 into pandas-dev:master Jul 11, 2016
@jorisvandenbossche
Copy link
Member

@sinhrks Thanks!

@sinhrks sinhrks deleted the nat_nan branch July 11, 2016 07:52
nateGeorge pushed a commit to nateGeorge/pandas that referenced this pull request Aug 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Clean Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: NaT/nan mixed input coerces to timedelta64
4 participants