-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Add clearer info when copy is False but memory shared only for certain objects #41514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@github-actions pre-commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping to clarify this behavior @mdhsieh! Left some comments below
pandas/core/series.py
Outdated
@@ -251,6 +254,36 @@ class Series(base.IndexOpsMixin, generic.NDFrame): | |||
Note that the Index is first build with the keys from the dictionary. | |||
After this the Series is reindexed with the given Index values, hence we | |||
get all NaN as a result. | |||
|
|||
Constructing Series from an array with `copy=False`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this a list, since the difference between an array and list in this context is important in terms of ability to share memory.
pandas/core/series.py
Outdated
1 2 | ||
dtype: int64 | ||
|
||
The Series returns a `copy` of the original data, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by returns here? It's more like the data backing the series does not share the same memory as the input r
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, by "returns" a copy I meant the Series created had a copy of the data. I guess "returns" is not the correct word to use.
pandas/core/series.py
Outdated
|
||
Constructing Series from an array with `copy=False`. | ||
|
||
>>> r = [1,2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> r = [1,2] | |
>>> r = [1, 2] |
pandas/core/series.py
Outdated
|
||
Constructing Series from a `numpy.array` with `copy=False`. | ||
|
||
>>> r = np.array([1,2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> r = np.array([1,2]) | |
>>> r = np.array([1, 2]) |
pandas/core/series.py
Outdated
@@ -223,7 +223,10 @@ class Series(base.IndexOpsMixin, generic.NDFrame): | |||
name : str, optional | |||
The name to give to the Series. | |||
copy : bool, default False | |||
Copy input data. | |||
Copy input data. If False and the Series returns a `copy` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on usage of "returns" as below.
Generally I think this description might be too verbose for a parameter description and could be shortened to basically just get the idea across that a copy may still be made if necessary (as in the case of a list where we cannot share memory). Then the examples below should suffice to explain the different possible cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html has a good example of this)
pandas/core/series.py
Outdated
dtype: int32 | ||
|
||
The Series returns a `view` on the original data, so | ||
the original data is changed as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above on wording of returns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good start, very similar comments to @mzeitlin11 and suggetsed changes.
pandas/core/series.py
Outdated
Copy input data. If False and the Series returns a `copy` | ||
of the data, the memory location for the values is not shared. | ||
If False and the Series returns a `view` on the data, | ||
the memory location for the values is shared. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy input data. If False and the Series returns a `copy` | |
of the data, the memory location for the values is not shared. | |
If False and the Series returns a `view` on the data, | |
the memory location for the values is shared. | |
Copy input data. Only affects Series or 1d ndarray input. See examples. |
pandas/core/series.py
Outdated
@@ -251,6 +254,36 @@ class Series(base.IndexOpsMixin, generic.NDFrame): | |||
Note that the Index is first build with the keys from the dictionary. | |||
After this the Series is reindexed with the given Index values, hence we | |||
get all NaN as a result. | |||
|
|||
Constructing Series from an array with `copy=False`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constructing Series from an array with `copy=False`. | |
Constructing Series from a list with `copy=False`. |
pandas/core/series.py
Outdated
1 2 | ||
dtype: int64 | ||
|
||
The Series returns a `copy` of the original data, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Series returns a `copy` of the original data, so | |
Due to input data type the Series is created with a `copy` even though `copy=False`, so |
pandas/core/series.py
Outdated
1 2 | ||
dtype: int32 | ||
|
||
The Series returns a `view` on the original data, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Series returns a `view` on the original data, so | |
Due to input data type here the Series represents a `view` on the original data, so |
…1423). Remove return, replace with has. Remove array, replace with list. Fix array spacing to match other Series examples. Replace long param info with only affects sentence, based off DataFrame documentation. Reword example explanation. Change numpy.array to 1d ndarray to match other documentation, like DataFrame.
…-series-copy" This reverts commit 5eab8f9, reversing changes made to 671cf86. Need to revert merge commit to find out why build is failing after pushing branches. These were the commands run to do the revert, at previous commit: check parent branches of the merge commit git cat-file -p 5eab8f9 then revert commit git revert --no-commit -m 2 5eab8f9 Then accepted incoming change in series.py, which removed my info and examples added.
…#41423). Remove whitespace on blank lines and line break sentences to avoid style errors. Previous commit had 1 failing check.
a4f512e
to
52ce3ba
Compare
pandas/core/series.py
Outdated
>>> r = np.array([1, 2]) | ||
>>> ser = pd.Series(r, copy=False) | ||
>>> ser.iloc[0] = 999 | ||
>>> r | ||
array([999, 2]) | ||
>>> ser | ||
0 999 | ||
1 2 | ||
dtype: int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this test is failing in CI:
=================================== FAILURES ===================================
_____________________ [doctest] pandas.core.series.Series ______________________
269 the data is unchanged.
270
271 Constructing Series from a 1d ndarray with `copy=False`.
272
273 >>> r = np.array([1, 2])
274 >>> ser = pd.Series(r, copy=False)
275 >>> ser.iloc[0] = 999
276 >>> r
277 array([999, 2])
278 >>> ser
Differences (unified diff with -expected +actual):
@@ -1,3 +1,3 @@
0 999
1 2
-dtype: int32
+dtype: int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh got it, I didn't see that. Appreciate the help.
Thanks @mzeitlin11 @attack68 . I've tried rewording as suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks clearer to me. thanks.
thanks @mdhsieh |
Added to Series section more info in
copy
constructor parameter and 2 examples.Ran validation script:
Output: