API: Series and DataFrame constructors to return shallow copy (i.e. don't share index) from another Series/DataFrame #50539

jorisvandenbossche
Member

For the case of Series(s) or DataFrame(df) (so no reindexing or casting, and with the default of copy=False), this changes the Series and DataFrame constructors to return a proper shallow copy (i.e. sharing only the data, not attributes like the index) instead of a new Series/DataFrame object that reuses the same manager.

See #49523 for an example.

Strictly speaking this is a breaking change (or notable bug fix, not sure if this was done intentionally in the past).
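A minimal sketch of the behaviour described above, assuming a pandas version that already includes this change. The variable names are only illustrative; the expectation that the parent's index stays untouched follows from the PR description, not from a documented guarantee.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])
s2 = pd.Series(s)  # default copy=False, no dtype or index passed

# The two objects are expected to still view the same underlying values ...
shares_data = np.shares_memory(s.to_numpy(), s2.to_numpy())

# ... but replacing the index on the child should no longer touch the parent
# (assumption: this change is present in the installed pandas).
s2.index = ["a", "b", "c"]
parent_index = list(s.index)
child_index = list(s2.index)

print(shares_data, parent_index, child_index)
```

On a pandas version without this change, both objects share one manager, so `parent_index` would be expected to show the new labels as well.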

@@ -596,6 +596,11 @@ Other API changes
or :attr:`~DataFrame.iloc` (thus, ``df.loc[:, :]`` or ``df.iloc[:, :]``) now returns a
new DataFrame (shallow copy) instead of the original DataFrame, consistent with other
methods to get a full slice (for example ``df.loc[:]`` or ``df[:]``) (:issue:`49469`)
- The :class:`Series` and :class:`DataFrame` constructors will now return a shallow copy
(i.e. share data, but not attributes) when passed a Series and DataFrame, respectively,
and with the default of ``copy=False`` (and if no other triggers a copy). Previously,
Member

"and if no other triggers a copy" is there a word missing here?

Member Author

Yes, I suppose I meant "no other keyword triggers a copy", e.g. passing a dtype, or passing index/columns that cause a reindex.
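To illustrate what "another keyword triggering a copy" can look like, here is a small sketch using `np.shares_memory` as a rough probe. The helper `shares` and the variable names are made up for this example, and the memory check only covers simple NumPy-backed data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

plain = pd.Series(s)                          # no extra keywords: shallow copy
cast = pd.Series(s, dtype="float64")          # dtype forces a cast to new memory
reindexed = pd.Series(s, index=[0, 1, 2, 3])  # extra label forces a reindex


def shares(a: pd.Series, b: pd.Series) -> bool:
    """Hypothetical helper: do the two Series view the same buffer?"""
    return np.shares_memory(a.to_numpy(), b.to_numpy())


plain_shares = shares(s, plain)
cast_shares = shares(s, cast)
reindexed_shares = shares(s, reindexed)

print(plain_shares, cast_shares, reindexed_shares)
```

Only the first call falls under the case discussed here; the other two already produce new data, so the shallow-copy question does not arise for them.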

(i.e. share data, but not attributes) when passed a Series and DataFrame, respectively,
and with the default of ``copy=False`` (and if no other triggers a copy). Previously,
the new Series or DataFrame would share the index attribute (e.g. ``df.index = ...``
would also update the index of the parent or child) (:issue:`49523`)
Member

IIUC it is setting df.index.name = ..., not setting df.index itself?

Member Author

No, actually the index itself, see the code snippet in the issue: #49523

This is because, right now, the returned Series actually shares the same manager, so setting df.index = .. updates the index attribute of the manager in place; since both objects share that manager, the other Series/DataFrame's index gets updated as well.

I suppose that even after this change, which does a proper shallow copy, they still share the actual Index object, and so mutating an attribute of the index (e.g. df.index.name = ..) will still propagate.

Member

makes sense, thanks

Member Author

Just to confirm: as expected, mutating the index's attributes still propagates (the Index object itself is shared, because it is treated as immutable):

In [1]: s = pd.Series([1, 2, 3])
   ...: s2 = pd.Series(s)

In [2]: s2.index.name = "test!"

In [3]: s
Out[3]: 
test!
0    1
1    2
2    3
dtype: int64

We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.
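The idea of a "shallow copy" of an index can be sketched at the Index level with the existing public `Index.copy()` method (which is shallow by default). This only illustrates the concept of a new Index object over the same values; it is not how the constructors behave in this PR, and the names are illustrative.

```python
import pandas as pd

idx = pd.Index([10, 20, 30], name="original")

# Index.copy() defaults to deep=False: a new Index object is created,
# which is expected to reuse the existing values rather than duplicate them.
shallow = idx.copy()

# Renaming the new object does not affect the original one.
shallow.name = "renamed"

print(idx.name, shallow.name, shallow is idx, shallow.equals(idx))
```

Because the metadata lives on the new object, a rename no longer reaches the original, while the labels themselves still compare equal.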

Member

> We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.

+1

@jorisvandenbossche
Member Author

@jbrockmendel good to go?

@jbrockmendel
Member

LGTM, merge on green

jorisvandenbossche merged commit 180d81f into pandas-dev:main Feb 10, 2023
jorisvandenbossche deleted the series-dataframe-constructor-shallow-copy branch February 10, 2023 19:02
Labels
API Design Constructors Series/DataFrame/Index/pd.array Constructors
Development

Successfully merging this pull request may close these issues.

API: should creating a Series from a Series return a shallow copy? (share data, not attributes)
2 participants