API: Series and DataFrame constructors to return shallow copy (i.e. don't share index) from another Series/DataFrame #50539
…nstructor-shallow-copy
doc/source/whatsnew/v2.0.0.rst
Outdated
@@ -596,6 +596,11 @@ Other API changes
or :attr:`~DataFrame.iloc` (thus, ``df.loc[:, :]`` or ``df.iloc[:, :]``) now returns a
new DataFrame (shallow copy) instead of the original DataFrame, consistent with other
methods to get a full slice (for example ``df.loc[:]`` or ``df[:]``) (:issue:`49469`)
- The :class:`Series` and :class:`DataFrame` constructors will now return a shallow copy
(i.e. share data, but not attributes) when passed a Series and DataFrame, respectively,
and with the default of ``copy=False`` (and if no other triggers a copy). Previously,
"and if no other triggers a copy" is there a word missing here?
Yes, I suppose I meant "no other keyword triggers a copy", e.g. by passing a dtype, or by passing index/columns that cause a reindex.
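The "other keywords" mentioned above can be checked by looking at whether the underlying data is still shared. The following is a minimal sketch, assuming pandas >= 2.0 and NumPy-backed integer data; the helper `shares_data` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Default copy=False and no other keywords: expected to be a shallow copy.
plain = pd.Series(s)
# Passing a dtype that requires a cast produces new data.
cast = pd.Series(s, dtype="float64")
# Passing a different index causes a reindex, which also produces new data.
reindexed = pd.Series(s, index=[2, 1, 0])


def shares_data(a, b):
    """Hypothetical helper: do the two Series view the same buffer?"""
    return np.shares_memory(a.to_numpy(), b.to_numpy())


print("plain:", shares_data(s, plain))
print("cast:", shares_data(s, cast))
print("reindexed:", shares_data(s, reindexed))
```

Only the plain call is expected to report shared memory; the cast and the reindex each allocate a new array.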
doc/source/whatsnew/v2.0.0.rst
Outdated
(i.e. share data, but not attributes) when passed a Series and DataFrame, respectively,
and with the default of ``copy=False`` (and if no other triggers a copy). Previously,
the new Series or DataFrame would share the index attribute (e.g. ``df.index = ...``
would also update the index of the parent or child) (:issue:`49523`)
IIUC it is setting df.index.name = ..., not setting df.index itself?
No, actually the index itself, see the code snippet in the issue: #49523
This is because right now the returned Series actually shares the same manager, so setting df.index = ... updates the index attribute of the manager in place, and since they share the same manager, the other Series/DataFrame's index gets updated as well.
I suppose that even after this change, with a proper shallow copy, they still share the actual Index object, so mutating an attribute of the index (e.g. df.index.name = ...) will still propagate.
makes sense, thanks
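The behavior discussed in this thread, where assigning a new index to the child no longer touches the parent, can be sketched as follows. This assumes a pandas version that includes this change (2.0 or later); on older versions the parent's index would be replaced as well, as described in #49523.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
s2 = pd.Series(s)  # shallow copy: shares data, but has its own manager

# Replace the index on the child only.
s2.index = ["a", "b", "c"]

print(list(s.index))
print(list(s2.index))
```

With the shallow copy in place, `s` is expected to keep its original default index while `s2` shows the new labels.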
Just to confirm: as expected, mutating the index's attributes still propagates (since the Index object is shared, because it is immutable):
In [1]: s = pd.Series([1, 2, 3])
   ...: s2 = pd.Series(s)

In [2]: s2.index.name = "test!"

In [3]: s
Out[3]:
test!
0    1
1    2
2    3
dtype: int64
We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.
> We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.
+1
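Until something like the proposal above exists, a user who wants to rename the index on the child without affecting the parent can give the child its own Index object first. This is only a user-level sketch of the idea, not what the PR implements; it assumes pandas >= 2.0 and relies on `Index.copy()`, which is shallow by default (`deep=False`).

```python
import pandas as pd

s = pd.Series([1, 2, 3])
s2 = pd.Series(s)

# Give the child its own Index object before mutating its attributes.
s2.index = s2.index.copy()
s2.index.name = "test!"

print(s.index.name)
print(s2.index.name)
```

Because `s2` now holds a distinct Index object, the name set on it is not expected to show up on `s`.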
@jbrockmendel good to go?
LGTM, merge on green
This changes the Series and DataFrame constructors to return a proper shallow copy (i.e. only share data, not attributes like the index) instead of returning a new Series/DataFrame object but with the same manager, for the case of Series(s) or DataFrame(df) (so no reindexing or casting, and with the default of copy=False).
See #49523 for an example.
Strictly speaking this is a breaking change (or notable bug fix, not sure if this was done intentionally in the past).