Skip to content

BUG: Cast pd.NA to pd.NaT in to_datetime #32214

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 26, 2020
Merged

BUG: Cast pd.NA to pd.NaT in to_datetime #32214

merged 7 commits into from
Feb 26, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Feb 24, 2020

@jorisvandenbossche
Copy link
Member

I am not fully sure we should do this, since pd.NA and pd.NaT behave differently (and we probably want to have a timestamp dtype that uses pd.NA at some point)

@dsaxton
Copy link
Member Author

dsaxton commented Feb 24, 2020

I am not fully sure we should do this, since pd.NA and pd.NaT behave differently (and we probably want to have a timestamp dtype that uses pd.NA at some point)

Ok, I'll close the PR, but should probably leave the issue open for now?

@dsaxton dsaxton closed this Feb 24, 2020
@dsaxton dsaxton deleted the na-date branch February 24, 2020 13:58
@dsaxton dsaxton restored the na-date branch February 24, 2020 17:27
@jreback jreback reopened this Feb 24, 2020
@jreback jreback added this to the 1.0.2 milestone Feb 24, 2020
@jbrockmendel
Copy link
Member

if the user specifically calls to_datetime, it makes sense that pd.NA would cast to NaT

@jorisvandenbossche
Copy link
Member

@jbrockmendel so the reason I initially didn't add this check to checknull_with_nat is because this adds a dependency of tslibs on _libs/missing.pyx, while now the dependency is actually the other way around.

We might want to make missing.pyx independent on the rest of _libs, so tslibs can easily depend on it.

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Datetime Datetime data dtype labels Feb 26, 2020
@@ -22,6 +22,8 @@ from pandas._libs.tslibs.util cimport (
get_nat, is_integer_object, is_float_object, is_datetime64_object,
is_timedelta64_object)

from pandas._libs.missing cimport C_NA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel is this a problem? (the dependency)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid it if possible. The property "tslibs doesnt rely on anything except _config" is really nice for reasoning-about-code purposes.

@dsaxton is it feasible to put pd.NA into a higher-level check than checknull_with_nat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it may be but I think it would have to be before https://github.com/pandas-dev/pandas/blob/master/pandas/core/arrays/datetimes.py#L1844, do you think that looks possible? There's also the "trick" of not importing NA at all and checking for (val == val) is (val != val) in checknull_with_nat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, going to merge this as-is. @jbrockmendel or @dsaxton if you can figure a way to decouple this a bit would be great (though since missing is pretty dependency free i actually don't mind this)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsaxton im having trouble figuring out which usage of checknull_with_nat is the relevant one here, can you point it out? The only ones I see when grepping are in libmissing.checknull (which already checks for C_NA, which this renders redundant), and in Period.__new__, which seems like it shouldn't be relevant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like i was confusing checknull_with_nat with is_null_datetimelike, never mind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this suggests we may have too many null-checking-like functions)

@jbrockmendel
Copy link
Member

We might want to make missing.pyx independent on the rest of _libs, so tslibs can easily depend on it.

Maybe a file with just NAType, as that doesn't depend on most of the things in libmissing (many of which do depend on tslibs). This seems like it will be necessary if we want e.g. pd.Timestamp(pd.NA) to give pd.NaT instead of raising

@jbrockmendel
Copy link
Member

Is this something we need for 1.0.2, or can it wait for 1.1.0? if the latter id like to revert and take more time to figure out the dependency structure

@jreback
Copy link
Contributor

jreback commented Feb 26, 2020

Is this something we need for 1.0.2, or can it wait for 1.1.0? if the latter id like to revert and take more time to figure out the dependency structure

this is a usability issue in 1.0.2

happy to change for 1.1 to a proper structure (this is already merged and backport ed)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pandas.to_datetime raises when given pd.NA
4 participants