-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Special-case setting nan into integer series #54527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Special-case setting nan into integer series #54527
Conversation
def has_only_ints_or_nan(floating[:] arr) -> bool: | ||
cdef: | ||
floating val | ||
intp_t i | ||
|
||
for i in range(len(arr)): | ||
val = arr[i] | ||
if (val != val) or (val == <int64_t>val): | ||
continue | ||
else: | ||
return False | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So one thing that this doesn't yet do, is checking for the proper range for lower bitwidth integers.
For example setting [1000.0, np.nan] into a int8 Series should still raise the warning because 1000 is too big, but this helper function will now say it are only integers or NaN.
pandas/core/internals/blocks.py
Outdated
# setting `nan` into an integer series won't raise. | ||
if is_scalar(other) and is_integer_dtype(self.values.dtype): | ||
try: | ||
is_nan = bool(np.isnan(other)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check for all NA values since we hopefully end up with NA only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting pd.NA or None also cast to float with NaN nowadays, so agreed that those should be catched here as well.
But setting pd.NaT currently casts to object dtype, and that one is maybe not something we want to allow? (i.e. continue to warn for that is fine?) So maybe add a and other is not pd.NaT
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I had that earlier, but @phofl brought up that in the future that would also be treated as NA here and so wouldn't upcast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but before you were only checking for nan with np.isnan
, I think? (plus checking for not being NaT)
To be clear I think we should also check in general with isna
so we also check for None and pd.NA. I would maybe just continue disallowing NaT (that currently upcasts to object dtype, not to float)
@MarcoGorelli - looks like you recently updated the title of this PR which seems to be causing other PRs to fail doc builds with:
|
Seems like it doesn’t even allow the “(pdep6)” at the end of the title |
I think it's just the label - I've removed that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes look good to me.
One remaining question that I commented above on the helper function that I added, about a combination of NaN with int overflow, which should still warn (didn't check if that's the case with this PR). But I think that's also a cornercase that could certainly be fixed in a follow-up
Agreed this is not a blocker and can be done later. And I personally also have no strong opinion. I suppose if we follow the same logic, it also shouldn't warn? |
thanks - merging then, have opened #54790 regarding the interval case |
…o integer series) (#54791) Backport PR #54527: BUG: Special-case setting nan into integer series Co-authored-by: Marco Edward Gorelli <[email protected]>
* wip * add has_only_ints_or_nan cython helper * wip * take care of nat * fixup more tests * catch interval[int64, right] warning * just use isna * fixup tests * noop * exclude NaT --------- Co-authored-by: Joris Van den Bossche <[email protected]>
totally wrong, please ignore, will updatehave updated, maybe it's not totally wrong anymore, but I'm off to bed now🛏️have woken up, and I think this is alright, so ready for reviews
Should integer interval dtype also not warn when setting
np.nan
? I've left that one out for now, but interval dtype is far less used than integer so I wouldn't consider that part a blocker to 2.1Closes #54660