Make NatType more Timestamp-like and less datetime.datetime-like #12976

Closed
rs2 opened this issue Apr 24, 2016 · 5 comments
Labels
API Design Datetime Datetime data dtype Internals Related to non-user accessible pandas implementation Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Info Clarification about behavior needed to assess issue
Comments

@rs2
Contributor

rs2 commented Apr 24, 2016

In [127]: isinstance(pd.Timestamp(np.nan), pd.tslib.Timestamp)
Out[127]: False

In [128]: isinstance(pd.Timestamp('2015'), pd.tslib.Timestamp)
Out[128]: True

In [129]: isinstance(pd.Timestamp(np.nan), datetime.datetime)
Out[129]: True

In [130]: isinstance(pd.Timestamp('2015'), datetime.datetime)
Out[130]: True
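
For anyone reproducing this on a recent pandas, where `pd.tslib` is no longer available, the same four checks can be sketched against the public `pd.Timestamp` and `pd.NaT` names. This is only a minimal sketch: it uses `pd.NaT` directly as a stand-in for `pd.Timestamp(np.nan)`, and it prints the `datetime.datetime` result for `NaT` rather than assuming it, since that may depend on the installed version.

```python
import datetime

import pandas as pd

nat = pd.NaT               # stand-in for pd.Timestamp(np.nan) from the report above
ts = pd.Timestamp("2015")

# Same four isinstance checks as in the report, using public names only.
checks = {
    "NaT is a Timestamp": isinstance(nat, pd.Timestamp),
    "Timestamp is a Timestamp": isinstance(ts, pd.Timestamp),
    "NaT is a datetime": isinstance(nat, datetime.datetime),
    "Timestamp is a datetime": isinstance(ts, datetime.datetime),
}
for label, result in checks.items():
    print(f"{label}: {result}")
```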
@sinhrks
Member

sinhrks commented Apr 25, 2016

Can you describe expected output?

@sinhrks sinhrks added Datetime Datetime data dtype Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Info Clarification about behavior needed to assess issue labels Apr 25, 2016
@jreback
Contributor

jreback commented Apr 25, 2016

So this is a bit of an API design issue actually. The cython implementation of Timestamp is a sub-class of datetime.datetime (though this itself is a bit tricky). NaT is not technically a sub-class of Timestamp, but of the 'base' class here (datetime.datetime).

However, a lot has changed since the beginning, and now we use NaT as the missing value repr for Timedelta as well as Period. Furthermore, np.nan is the float missing value, but it also subs in for strings and booleans.

So we could take several paths here:

  1. missing values are base-classed on their own and don't have a specific dtype
  2. the missing value base class diverges to datetimelike (e.g. like NaT) and non-datetimelike (e.g. np.nan), kind of like now.
  3. We have base classes like above AND have specific dtypes for missing values. E.g. you would effectively have a NaT for M8[ns] and for m8[ns].

In 1) and 2) you could make the objects respond to isinstance checks (e.g. NaT could be both a sub-class of Timestamp AND Timedelta).

So I am not sure that 3) really gains you a lot except more complexity, though you do get the ability to then construct things that don't default e.g. Series([pd.NaT_m8]) then would be dtypes m8[ns] rather than M8[ns].
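
The defaulting behaviour mentioned here can be sketched as follows. Note that `pd.NaT_m8` in the comment above is a hypothetical name, not an existing pandas object; the sketch assumes the usual workaround of stating the dtype on the container instead.

```python
import pandas as pd

# A bare NaT carries no dtype of its own, so the container has to infer one.
inferred = pd.Series([pd.NaT])

# To get a timedelta column, the dtype is stated explicitly on the container.
as_timedelta = pd.Series([pd.NaT], dtype="timedelta64[ns]")

print(inferred.dtype)      # a datetime64 dtype
print(as_timedelta.dtype)  # a timedelta64 dtype
```

This is essentially the "inference based on the dtype of the container object" idea from option 1), done by hand.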

1) is rather clean, with the inference based on the dtype of the container object; we can't really support this ATM, not until ENH/INT: libpandas refactor #11970, where all of this is abstracted away.

2) may be possible with some meta-class hackery / reorg, at least for datetimelikes. Again, we prob can't really change np.nan much.
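
As a rough illustration of the kind of meta-class hackery meant here, a single missing-value sentinel can be made to pass `isinstance` checks against several classes by overriding `__instancecheck__` on a shared metaclass. This is a toy sketch in plain Python, not pandas code: `MissingAware`, `Stamp`, `Delta` and `MISSING` are all hypothetical names, and the real `Timestamp` is a Cython subclass of `datetime.datetime`, which would make this harder in practice.

```python
class MissingAware(type):
    """Metaclass whose classes also accept the shared MISSING sentinel in isinstance()."""

    def __instancecheck__(cls, obj):
        # The sentinel counts as an instance of every class using this metaclass;
        # everything else falls back to the normal check.
        return obj is MISSING or super().__instancecheck__(obj)


class Stamp(metaclass=MissingAware):  # hypothetical stand-in for Timestamp
    def __init__(self, value):
        self.value = value


class Delta(metaclass=MissingAware):  # hypothetical stand-in for Timedelta
    def __init__(self, value):
        self.value = value


class _MissingType:
    def __repr__(self):
        return "MISSING"


MISSING = _MissingType()  # hypothetical stand-in for NaT

print(isinstance(MISSING, Stamp), isinstance(MISSING, Delta))
print(isinstance(Stamp(1), Delta))
```

The sentinel does not actually inherit from either class, so it gains none of their methods; only the `isinstance` answer changes.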

thoughts @wesm @sinhrks @jorisvandenbossche

@jreback jreback added API Design Internals Related to non-user accessible pandas implementation labels Apr 25, 2016
@wesm
Member

wesm commented Apr 28, 2016

My preference would be to limit the amount of effort we're expending in advance of the internals overhaul (which may be able to yield a single, global pandas.NA value) we've been discussing. Though there may be some temporary things that will make things more pleasant for users (without adding new and possibly onerous API contracts).

@shoyer
Member

shoyer commented Apr 28, 2016

I agree that the existing behavior is strange, but also agree with @wesm that it should be deferred to the internals overhaul.

@jreback
Contributor

jreback commented May 6, 2017

Actually going to close this. NaT mimics almost all methods/attributes of datetimes ATM, though many of them raise (on purpose). So a user has to be aware of missing values when working with these objects outside of pandas.
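
In practice, being aware of missing values usually means checking for them before calling a datetime method. A minimal sketch, assuming a hypothetical helper named `to_date` (not part of pandas):

```python
import datetime

import pandas as pd


def to_date(value):
    """Return a datetime.date, or None when the value is missing (e.g. pd.NaT)."""
    # Check for missingness first instead of relying on NaT's own methods.
    if pd.isna(value):
        return None
    return value.date()


print(to_date(pd.Timestamp("2015-01-01")))
print(to_date(pd.NaT))
```

Checking with `pd.isna` first avoids depending on which NaT methods raise and which return a placeholder.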

@jreback jreback closed this as completed May 6, 2017
5 participants