-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: PeriodIndex fails to handle NA, rather than putting NaT in its place (#46673) #47780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -446,7 +446,7 @@ def test_astype_string_to_extension_dtype_roundtrip( | |||
self, data, dtype, request, nullable_string_dtype | |||
): | |||
if dtype == "boolean" or ( | |||
dtype in ("period[M]", "datetime64[ns]", "timedelta64[ns]") and NaT in data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially addresses #40566 by fixing handling of pd.NA for conversion to period types.
pandas/_libs/tslibs/nattype.pyx
Outdated
@@ -1217,7 +1218,7 @@ cdef inline bint checknull_with_nat(object val): | |||
""" | |||
Utility to check if a value is a nat or not. | |||
""" | |||
return val is None or util.is_nan(val) or val is c_NaT | |||
return val is None or util.is_nan(val) or val is c_NaT or val is C_NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel I guess NaT is NA-like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depends on context. NA has different semantics than NaN/NaT, so it isn't obvious we should accept it everywhere.
changing it here in particular changes behavior in a bunch of places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel
Thank you for the input. I see how this function is used in a lot of different places, not just in converting to PeriodIndex which is the main thing I was trying to get at. I will try an approach that more directly tackles the specific case of #46673 (assuming that the expected behavior listed in the issue is valid, i.e.
Expected Behavior
The return value of
pd.PeriodIndex(["2022-04-06", "2022-04-07", pd.NA], freq='D')
should be the same as
pd.PeriodIndex(["2022-04-06", "2022-04-07", None], freq='D')
that is:
PeriodIndex(['2022-04-06', '2022-04-07', 'NaT'], dtype='period[D]')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming that the expected behavior listed in the issue is valid
Unfortunately it isn't entirely obvious that this should be the expected behavior. bc pd.NA has different semantics than pd.NaT, we might want to distinguish between them in this context and do some kind of NullableArray[Period]
This also would introduce an inconsistency between PeriodIndex vs the Period constructor, which is a whole other can of worms (and then we get to the same issues for Timestamp/Timedelta...)
Then again, I'm an outlier in crankiness on pd.NA-consistency-headaches, so if @mroeschke is fine with the proposed behavior, i won't make a stink about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM cc @jbrockmendel for another look
Thanks @kapiliyer |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.