API: Series and DataFrame constructors to return shallow copy (i.e. don't share index) from another Series/DataFrame #50539

5 changes: 5 additions & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -596,6 +596,11 @@ Other API changes
   or :attr:`~DataFrame.iloc` (thus, ``df.loc[:, :]`` or ``df.iloc[:, :]``) now returns a
   new DataFrame (shallow copy) instead of the original DataFrame, consistent with other
   methods to get a full slice (for example ``df.loc[:]`` or ``df[:]``) (:issue:`49469`)
+- The :class:`Series` and :class:`DataFrame` constructors will now return a shallow copy
+  (i.e. share data, but not attributes) when passed a Series and DataFrame, respectively,
+  and with the default of ``copy=False`` (and if no other triggers a copy). Previously,
Member

"and if no other triggers a copy" is there a word missing here?

Member Author

Yes, I suppose I meant "no other keyword triggers a copy", e.g. passing a dtype, or passing index/columns that cause a reindex.
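
To make the "other keywords" concrete, here is a minimal sketch that checks which constructor calls still share the underlying data. The helper `shares_data` is a hypothetical name introduced only for this illustration, and the memory-sharing check relies on `to_numpy()` returning a view for a plain int64 Series, which is an assumption about the installed pandas version.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])


def shares_data(a: pd.Series, b: pd.Series) -> bool:
    """Hypothetical helper: True if both Series are backed by overlapping memory."""
    return np.shares_memory(a.to_numpy(), b.to_numpy())


shallow = pd.Series(s)                      # no other keywords: data is shared
explicit = pd.Series(s, copy=True)          # explicit copy requested
cast = pd.Series(s, dtype="float64")        # dtype conversion allocates new data
reindexed = pd.Series(s, index=[2, 1, 0])   # reindexing allocates new data

for label, other in [
    ("default", shallow),
    ("copy=True", explicit),
    ("dtype", cast),
    ("index", reindexed),
]:
    print(label, shares_data(s, other))
```

Only the first call is a shallow copy in the sense of the whatsnew entry; the other three end up with their own data.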

+  the new Series or DataFrame would share the index attribute (e.g. ``df.index = ...``
+  would also update the index of the parent or child) (:issue:`49523`)
Member

IIUC it is setting df.index.name = ..., not setting df.index itself?

Member Author

No, actually the index itself, see the code snippet in the issue: #49523

This is because the returned Series currently shares the same manager as the original. Setting df.index = .. updates the manager's index attribute in place, and since both objects share that manager, the other Series/DataFrame's index is updated as well.

I suppose that even after this change, with a proper shallow copy, they will still share the actual Index object, so mutating an attribute of the index (e.g. df.index.name = ..) will still propagate.
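
The mechanism described here can be sketched with a toy model. `ToyManager` and `ToySeries` are hypothetical classes written only for this illustration; they are not pandas internals and only mimic the "wrapper around a manager" shape discussed above.

```python
from dataclasses import dataclass


@dataclass
class ToyManager:
    """Hypothetical stand-in for a manager: holds the index and the data."""

    index: list
    values: list

    def shallow_copy(self) -> "ToyManager":
        # New manager object that points at the same `values` list.
        return ToyManager(index=self.index, values=self.values)


class ToySeries:
    """Hypothetical stand-in for Series: a thin wrapper around a manager."""

    def __init__(self, mgr: ToyManager) -> None:
        self._mgr = mgr

    @property
    def index(self) -> list:
        return self._mgr.index

    @index.setter
    def index(self, new_index) -> None:
        # Rebinds the attribute on the manager itself.
        self._mgr.index = list(new_index)


# Sharing the manager: assigning the child's index leaks into the parent.
parent = ToySeries(ToyManager(index=[0, 1, 2], values=[1, 2, 3]))
child = ToySeries(parent._mgr)
child.index = ["a", "b", "c"]
leaked_index = parent.index

# Shallow-copying the manager: the index assignment stays local,
# while the data list is still shared.
parent2 = ToySeries(ToyManager(index=[0, 1, 2], values=[1, 2, 3]))
child2 = ToySeries(parent2._mgr.shallow_copy())
child2.index = ["a", "b", "c"]
```

In the toy model, as in the comment above, the shallow copy still hands out the same index object until it is reassigned, so mutating that object in place would remain visible from both wrappers.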

Member

makes sense, thanks

Member Author

Just to confirm: as expected, mutating the index's attributes still propagates (the Index object is shared, since its data is immutable):

In [1]: s = pd.Series([1, 2, 3])
   ...: s2 = pd.Series(s)

In [2]: s2.index.name = "test!"

In [3]: s
Out[3]: 
test!
0    1
1    2
2    3
dtype: int64

We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.
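
As a rough illustration of what "a new Index but viewing the same data" could mean, the sketch below uses the existing `Index.view()` method by hand. This is only an assumption about how such a shallow copy might behave; it is not what the constructors do in this PR.

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30], name="original")

# A new Index object over the same underlying values.
idx_view = idx.view()

# Renaming the view only touches the new object.
idx_view.name = "renamed"

print(idx.name, idx_view.name)
print(np.shares_memory(idx.to_numpy(), idx_view.to_numpy()))
```

If the constructors handed out such a view instead of the original Index, setting `s2.index.name` would no longer show up on `s`.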

Member

> We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.

+1

- Disallow computing ``cumprod`` for :class:`Timedelta` object; previously this returned incorrect values (:issue:`50246`)
- Loading a JSON file with duplicate columns using ``read_json(orient='split')`` renames columns to avoid duplicates, as :func:`read_csv` and the other readers do (:issue:`50370`)
- :func:`to_datetime` with ``unit`` of either "Y" or "M" will now raise if a sequence contains a non-round ``float`` value, matching the ``Timestamp`` behavior (:issue:`50301`)
2 changes: 2 additions & 0 deletions pandas/core/frame.py
@@ -641,6 +641,8 @@ def __init__(

         if isinstance(data, DataFrame):
             data = data._mgr
+            if not copy:
+                data = data.copy(deep=False)

         if isinstance(data, (BlockManager, ArrayManager)):
             # first check if a Manager is passed without any other arguments
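
From the user's side, the effect of this hunk can be sketched as follows. The example assumes a pandas version that includes this change (2.0 or later); on older versions the axis assignments on `df2` would also show up on `df`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# No copy/dtype/index/columns arguments: a shallow copy is returned.
df2 = pd.DataFrame(df)

# Reassigning the axes of the new frame leaves the original untouched.
df2.index = ["x", "y", "z"]
df2.columns = ["A", "B"]

print(list(df.index), list(df.columns))
print(list(df2.index), list(df2.columns))
```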
5 changes: 1 addition & 4 deletions pandas/core/series.py
@@ -426,10 +426,7 @@ def __init__(
             elif isinstance(data, Series):
                 if index is None:
                     index = data.index
-                    if using_copy_on_write():
-                        data = data._mgr.copy(deep=False)
-                    else:
-                        data = data._mgr
+                    data = data._mgr.copy(deep=False)
                 else:
                     data = data.reindex(index, copy=copy)
                     copy = False
11 changes: 11 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -2100,6 +2100,17 @@ def test_constructor_frame_copy(self, float_frame):
         assert (cop["A"] == 5).all()
         assert not (float_frame["A"] == 5).all()

+    def test_constructor_frame_shallow_copy(self, float_frame):
+        # constructing a DataFrame from DataFrame with copy=False should still
+        # give a "shallow" copy (share data, not attributes)
+        # https://github.com/pandas-dev/pandas/issues/49523
+        orig = float_frame.copy()
+        cop = DataFrame(float_frame)
+        assert cop._mgr is not float_frame._mgr
+        # Overwriting index of copy doesn't change original
+        cop.index = np.arange(len(cop))
+        tm.assert_frame_equal(float_frame, orig)
+
     def test_constructor_ndarray_copy(self, float_frame, using_array_manager):
         if not using_array_manager:
             df = DataFrame(float_frame.values)
12 changes: 12 additions & 0 deletions pandas/tests/series/test_constructors.py
@@ -721,6 +721,18 @@ def test_constructor_limit_copies(self, index):
         # we make 1 copy; this is just a smoke test here
         assert s._mgr.blocks[0].values is not index

+    def test_constructor_shallow_copy(self):
+        # constructing a Series from Series with copy=False should still
+        # give a "shallow" copy (share data, not attributes)
+        # https://github.com/pandas-dev/pandas/issues/49523
+        s = Series([1, 2, 3])
+        s_orig = s.copy()
+        s2 = Series(s)
+        assert s2._mgr is not s._mgr
+        # Overwriting index of s2 doesn't change s
+        s2.index = ["a", "b", "c"]
+        tm.assert_series_equal(s, s_orig)
+
     def test_constructor_pass_none(self):
         s = Series(None, index=range(5))
         assert s.dtype == np.float64