-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: df[col] = arr should not overwrite data in df[col] #35417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-12-29 22:43:33 UTC |
What's the summary of the behavior change from 1.0.5? |
@jbrockmendel I attempted a whatsnew in e600237, if you could take a look. |
Going through the failing tests
Looking into |
One question on desired behavior: In [2]: df = pd.DataFrame({"A": [1, 2, 3]})
In [3]: df2 = df.iloc[:]
In [4]: df._mgr.blocks[0].values is df2._mgr.blocks[0].values
Out[4]: True
In [5]: df.loc[:, "A"] = 0
In [6]: df2
Out[6]:
A
0 1
1 2
2 3 |
Agreed. I think thats the behavior with the FIXME comment in test_block_internals. |
The relevant part of this test reads:
I think with the new behavior, the last assertion is now incorrect, as we expect the setitem to create a new array. My question is: do we still need the SettingWithCopyError? |
Sorry, but "OK" I meant OK with updating the tests for the new behavior.
I was wondering this as well. If the warning is about whether or not |
Down to two failing tests:
|
@jbrockmendel 1.2 whatsnew now merged. also you can add back the release note that was deleted in #35271? |
I am not comfortable with merging this right before the RC. (and sorry for my slow response here (I have been quite absent the last weeks for several reasons), which is part of why it's delayed until right before the RC ...) That said, in addition I am also not yet convinced that this PR is worth the 1) subtle breaking changes and 2) performance implications.
I think so, yes. The main point is that a setitem operation, which before this PR could be fully inplace, can now trigger a copy of the (almost) full DataFrame (triggered directly or a potential delayed until a next consolidation). This additional data copy has an inherent cost. And I don't think it can be avoided while preserving the actual original intent of this PR (i.e. not overriding existing data). But it's not only the performance implications, it are also the subtle API changes. And of course it's always a trade-off. In many case a slight performance degradation and some behaviour changes can be worth it for the benefit of the change. I am just not convinced that the benefits of this PR are big enough. |
So we're roughly where we've always been: I think fixing this inconsistency is a bugfix, Joris thinks its an API change. |
there's a dev meeting tomorrow where this can be discussed. |
pandas/core/indexing.py
Outdated
@@ -693,8 +694,23 @@ def _ensure_listlike_indexer(self, key, axis=None, value=None): | |||
# GH#38148 | |||
keys = self.obj.columns.union(key, sort=False) | |||
|
|||
# Try to get the right dtype when we do this reindex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't infer_dtype_from do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that infers a dtype, but we need to pass a scalar fill_value to reindex_axis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this edit as no longer necessary. i think at this point all the controversial bits are gone and we're down to just the bugfix.
closing in favor of #45352 |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
cc @jorisvandenbossche this still fails 7 tests locally and there's one more (commented in-line) test that looks fishy. Extra eyeballs would be welcome.
xref #35271, #35266