Skip to content

BUG: Special-case setting nan into integer series #54527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 28, 2023

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Aug 13, 2023

totally wrong, please ignore, will update

have updated, maybe it's not totally wrong anymore, but I'm off to bed now🛏️

have woken up, and I think this is alright, so ready for reviews

Should integer interval dtype also not warn when setting np.nan? I've left that one out for now, but interval dtype is far less used than integer so I wouldn't consider that part a blocker to 2.1

Closes #54660

Comment on lines +535 to +546
def has_only_ints_or_nan(floating[:] arr) -> bool:
cdef:
floating val
intp_t i

for i in range(len(arr)):
val = arr[i]
if (val != val) or (val == <int64_t>val):
continue
else:
return False
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one thing that this doesn't yet do, is checking for the proper range for lower bitwidth integers.

For example setting [1000.0, np.nan] into a int8 Series should still raise the warning because 1000 is too big, but this helper function will now say it are only integers or NaN.

@MarcoGorelli MarcoGorelli marked this pull request as ready for review August 22, 2023 10:17
@MarcoGorelli MarcoGorelli requested a review from WillAyd as a code owner August 22, 2023 10:17
@MarcoGorelli MarcoGorelli changed the title wip special-case setting nan into integer series (pdep6) Special-case setting nan into integer series (pdep6) Aug 22, 2023
@MarcoGorelli MarcoGorelli changed the title Special-case setting nan into integer series (pdep6) PDEP: Special-case setting nan into integer series (pdep6) Aug 22, 2023
@MarcoGorelli MarcoGorelli added the PDEP pandas enhancement proposal label Aug 22, 2023
@MarcoGorelli MarcoGorelli added this to the 2.1 milestone Aug 22, 2023
# setting `nan` into an integer series won't raise.
if is_scalar(other) and is_integer_dtype(self.values.dtype):
try:
is_nan = bool(np.isnan(other))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check for all NA values since we hopefully end up with NA only?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting pd.NA or None also cast to float with NaN nowadays, so agreed that those should be catched here as well.

But setting pd.NaT currently casts to object dtype, and that one is maybe not something we want to allow? (i.e. continue to warn for that is fine?) So maybe add a and other is not pd.NaT or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I had that earlier, but @phofl brought up that in the future that would also be treated as NA here and so wouldn't upcast

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but before you were only checking for nan with np.isnan, I think? (plus checking for not being NaT)
To be clear I think we should also check in general with isna so we also check for None and pd.NA. I would maybe just continue disallowing NaT (that currently upcasts to object dtype, not to float)

@lukemanley
Copy link
Member

@MarcoGorelli - looks like you recently updated the title of this PR which seems to be causing other PRs to fail doc builds with:

Traceback (most recent call last):
  File "/home/runner/work/pandas/pandas/web/pandas_web.py", line 475, in <module>
    sys.exit(main(args.source_path, args.target_path))
  File "/home/runner/work/pandas/pandas/web/pandas_web.py", line 427, in main
    context = get_context(config_fname, target_path=target_path)
  File "/home/runner/work/pandas/pandas/web/pandas_web.py", line 382, in get_context
    context = preprocessor(context)
  File "/home/runner/work/pandas/pandas/web/pandas_web.py", line 334, in roadmap_pdeps
    for pdep in sorted(pdeps["items"], key=sort_pdep)
  File "/home/runner/work/pandas/pandas/web/pandas_web.py", line 328, in sort_pdep
    raise ValueError(msg)
ValueError: Could not find PDEP number in 'PDEP: Special-case setting nan into integer series (pdep6)'. Please make sure to
                write the title as: 'PDEP-num: PDEP: Special-case setting nan into integer series (pdep6)'.

@MarcoGorelli MarcoGorelli changed the title PDEP: Special-case setting nan into integer series (pdep6) BUG: Special-case setting nan into integer series (pdep6) Aug 22, 2023
@lukemanley
Copy link
Member

Seems like it doesn’t even allow the “(pdep6)” at the end of the title

@MarcoGorelli MarcoGorelli changed the title BUG: Special-case setting nan into integer series (pdep6) BUG: Special-case setting nan into integer series Aug 22, 2023
@MarcoGorelli MarcoGorelli added Bug and removed PDEP pandas enhancement proposal labels Aug 22, 2023
@MarcoGorelli
Copy link
Member Author

I think it's just the label - I've removed that

@MarcoGorelli MarcoGorelli requested a review from phofl August 25, 2023 14:07
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good to me.

One remaining question that I commented above on the helper function that I added, about a combination of NaN with int overflow, which should still warn (didn't check if that's the case with this PR). But I think that's also a cornercase that could certainly be fixed in a follow-up

@jorisvandenbossche
Copy link
Member

Should integer interval dtype also not warn when setting np.nan? I've left that one out for now, but interval dtype is far less used than integer so I wouldn't consider that part a blocker to 2.1

Agreed this is not a blocker and can be done later. And I personally also have no strong opinion. I suppose if we follow the same logic, it also shouldn't warn?

@MarcoGorelli
Copy link
Member Author

thanks - merging then, have opened #54790 regarding the interval case

@MarcoGorelli MarcoGorelli merged commit 5adafd6 into pandas-dev:main Aug 28, 2023
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 28, 2023
phofl pushed a commit that referenced this pull request Aug 28, 2023
…o integer series) (#54791)

Backport PR #54527: BUG: Special-case setting nan into integer series

Co-authored-by: Marco Edward Gorelli <[email protected]>
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Sep 11, 2023
* wip

* add has_only_ints_or_nan cython helper

* wip

* take care of nat

* fixup more tests

* catch interval[int64, right] warning

* just use isna

* fixup tests

* noop

* exclude NaT

---------

Co-authored-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PDEP-6: special case (allow) upcasting to float when setting NaN in integer series
4 participants