-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API/BUG: always try to operate inplace when setting with loc/iloc[foo, bar] #39163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
One potential issue here is that we don't have a nice way of doing a not-inplace |
this lgtm. can you merge master and add a whatsnew sub-section (that we can update later for other issues). this is very subtle and need to make it clear what is happening. |
…i-setitem-inplace
…i-setitem-inplace
…i-setitem-inplace
…i-setitem-inplace
rebased + green |
…i-setitem-inplace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. any additional comments cc @pandas-dev/pandas-core
Only looked at the whatsnew note for now, will try to take a look at the actual code tomorrow. I find the whatsnew note a bit hard to follow. Currently it focuses on the aspect of how it impacts a potential viewing array on the data. But that's already a quite advanced use case that I think many users won't follow/do. Doesn't have the change impact on the actual dtypes (something more visible). Eg setting ints in a float column preserves the float dtype? (according to your example in the top post). Starting the whatsnew note with such an example might make it easier to grasp what it's about. |
updated the whatsnew, is that clearer @jorisvandenbossche ? |
Thanks, yes, I think that's clearer! |
Looking at the changed tests, I think one potentially problematic aspect is that object dtype gets preserved now when you start from an "empty" (all-NaN object dtype) DataFrame. |
Just checking, when you say "empty", you dont mean What would you want to do here? Could special-case the all-NaN-object case i guess |
looks fine, can you rebase and ping on green |
Sorry for the late answer here. But yes, not size=0, but all NaN (eg That might indeed require special casing all-NaN object. |
thanks @jbrockmendel |
@jbrockmendel can you do the all-NaN object case as a follow-up? @jreback yes, I know, I just commented and didn't mark it with "request changes", but it would be good if you can read the last comment before merging |
@jorisvandenbossche i would suggest you request changes we have a lot of PRa |
sure. let's double-check we're on the same page about what the issue is. The example case is |
Yes, is the column that is being set is all-NA object dtype (regardless whether other columns are not all-NA or not), it got inferred to a new dtype:
|
im OK with this special case. any objections @jreback @TomAugspurger@phofl before i implement? |
(The PR title should have a caveat "in cases that make it to
BlockManager.setitem
)Discussed briefly on today's call. The main idea, as discussed in #38896 and #33457, is that
df[col] = ser
should not alter the existingdf[col]._values
, whiledf.iloc[:, num] = ser
should always try to operate inplace before falling back to casting. This PR focuses on the latter case.ATM, restricting to cases where we a) get to Block.setitem, b) could do the setitem inplace, and c) are setting all the entries in the array, we have 4 different behaviors. Examples are posted at the bottom of this post.
This PR changes Block.setitem so that in the _can_hold_element case we always operate inplace, always retain the same underlying array.
Existing Behavior
new_values
being set are categorical, we overwrite existing values and then discard them, do not get a view on new_values. (only can replicate with Series until BUG: setting categorical values into object dtype DataFrame #39136)