Skip to content

DOC: Add clearer info when copy is False but memory shared only for certain objects #41514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 21, 2021

Conversation

mdhsieh
Copy link
Contributor

@mdhsieh mdhsieh commented May 16, 2021

Added to Series section more info in copy constructor parameter and 2 examples.

Ran validation script:

python scripts/validate_docstrings.py pandas.Series

Output:

################################################################################
########################## Docstring (pandas.Series)  ##########################
################################################################################

One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).

Operations between Series (+, -, /, *, **) align values based on their
associated index values-- they need not be the same length. The result
index will be the sorted union of the two indexes.

Parameters
----------
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. If data is a dict, argument order is
    maintained.
index : array-like or Index (1d)
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to
    RangeIndex (0, 1, 2, ..., n) if not provided. If data is dict-like
    and index is None, then the keys in the data are used as the index. If the
    index is not None, the resulting Series is reindexed with the index values.
dtype : str, numpy.dtype, or ExtensionDtype, optional
    Data type for the output Series. If not specified, this will be
    inferred from `data`.
    See the :ref:`user guide <basics.dtypes>` for more usages.
name : str, optional
    The name to give to the Series.
copy : bool, default False
    Copy input data. If False and the Series returns a `copy`
    of the data, the memory location for the values is not shared.
    If False and the Series returns a `view` on the data,
    the memory location for the values is shared.

Examples
--------
Constructing Series from a dictionary with an Index specified

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['a', 'b', 'c'])
>>> ser
a   1
b   2
c   3
dtype: int64

The keys of the dictionary match with the Index values, hence the Index
values have no effect.

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['x', 'y', 'z'])
>>> ser
x   NaN
y   NaN
z   NaN
dtype: float64

Note that the Index is first build with the keys from the dictionary.
After this the Series is reindexed with the given Index values, hence we
get all NaN as a result.

Constructing Series from an array with `copy=False`.

>>> r = [1,2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0    999
1      2
dtype: int64

The Series returns a `copy` of the original data, so
the original data is unchanged.

Constructing Series from a `numpy.array` with `copy=False`.

>>> r = np.array([1,2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999,   2])
>>> ser
0    999
1      2
dtype: int32

The Series returns a `view` on the original data, so
the original data is changed as well.

################################################################################
################################## Validation ##################################
################################################################################

3 Errors found:
        Parameters {'fastpath'} not documented
        See Also section not found
        flake8 error: E902 PermissionError: [Errno 13] Permission denied: 'C:\\Users\\mdhsi\\AppData\\Local\\Temp\\tmpxz50buet'

mdhsieh added 3 commits May 14, 2021 15:46
…v#41423).

Original data unchanged when Series is a Copy,
but original data is changed when Series is a View.
…v#41423).

After building documentation, found the code examples were not displayed correctly.
Added newlines and rebuilt to fix.
@pep8speaks
Copy link

pep8speaks commented May 16, 2021

Hello @mdhsieh! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-20 00:41:26 UTC

@mdhsieh
Copy link
Contributor Author

mdhsieh commented May 16, 2021

@github-actions pre-commit

Copy link
Member

@mzeitlin11 mzeitlin11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping to clarify this behavior @mdhsieh! Left some comments below

@@ -251,6 +254,36 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
Note that the Index is first build with the keys from the dictionary.
After this the Series is reindexed with the given Index values, hence we
get all NaN as a result.

Constructing Series from an array with `copy=False`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this a list, since the difference between an array and list in this context is important in terms of ability to share memory.

1 2
dtype: int64

The Series returns a `copy` of the original data, so
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by returns here? It's more like the data backing the series does not share the same memory as the input r.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, by "returns" a copy I meant the Series created had a copy of the data. I guess "returns" is not the correct word to use.


Constructing Series from an array with `copy=False`.

>>> r = [1,2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>> r = [1,2]
>>> r = [1, 2]


Constructing Series from a `numpy.array` with `copy=False`.

>>> r = np.array([1,2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>> r = np.array([1,2])
>>> r = np.array([1, 2])

@@ -223,7 +223,10 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
name : str, optional
The name to give to the Series.
copy : bool, default False
Copy input data.
Copy input data. If False and the Series returns a `copy`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on usage of "returns" as below.

Generally I think this description might be too verbose for a parameter description and could be shortened to basically just get the idea across that a copy may still be made if necessary (as in the case of a list where we cannot share memory). Then the examples below should suffice to explain the different possible cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype: int32

The Series returns a `view` on the original data, so
the original data is changed as well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above on wording of returns

@mzeitlin11 mzeitlin11 added Constructors Series/DataFrame/Index/pd.array Constructors Copy / view semantics Docs labels May 17, 2021
Copy link
Contributor

@attack68 attack68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good start, very similar comments to @mzeitlin11 and suggetsed changes.

Comment on lines 226 to 229
Copy input data. If False and the Series returns a `copy`
of the data, the memory location for the values is not shared.
If False and the Series returns a `view` on the data,
the memory location for the values is shared.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Copy input data. If False and the Series returns a `copy`
of the data, the memory location for the values is not shared.
If False and the Series returns a `view` on the data,
the memory location for the values is shared.
Copy input data. Only affects Series or 1d ndarray input. See examples.

@@ -251,6 +254,36 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
Note that the Index is first build with the keys from the dictionary.
After this the Series is reindexed with the given Index values, hence we
get all NaN as a result.

Constructing Series from an array with `copy=False`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Constructing Series from an array with `copy=False`.
Constructing Series from a list with `copy=False`.

1 2
dtype: int64

The Series returns a `copy` of the original data, so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Series returns a `copy` of the original data, so
Due to input data type the Series is created with a `copy` even though `copy=False`, so

1 2
dtype: int32

The Series returns a `view` on the original data, so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Series returns a `view` on the original data, so
Due to input data type here the Series represents a `view` on the original data, so

mdhsieh added 6 commits May 17, 2021 23:24
…1423).

Remove return, replace with has.
Remove array, replace with list.
Fix array spacing to match other Series examples.
Replace long param info with only affects sentence, based off DataFrame documentation.
Reword example explanation.
Change numpy.array to 1d ndarray to match other documentation, like DataFrame.
…-series-copy"

This reverts commit 5eab8f9, reversing
changes made to 671cf86.

Need to revert merge commit to find out why build is failing after pushing branches.

These were the commands run to do the revert, at previous commit:
check parent branches of the merge commit

git cat-file  -p 5eab8f9

then revert commit

git revert --no-commit -m 2 5eab8f9

Then accepted incoming change in series.py, which removed my info and examples added.
…#41423).

Remove whitespace on blank lines and line break sentences to avoid style errors.
Previous commit had 1 failing check.
@mdhsieh mdhsieh force-pushed the 41423-doc-series-copy branch from a4f512e to 52ce3ba Compare May 18, 2021 21:05
Comment on lines 273 to 281
>>> r = np.array([1, 2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999, 2])
>>> ser
0 999
1 2
dtype: int32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this test is failing in CI:

=================================== FAILURES ===================================
_____________________ [doctest] pandas.core.series.Series ______________________
269     the data is unchanged.
270 
271     Constructing Series from a 1d ndarray with `copy=False`.
272 
273     >>> r = np.array([1, 2])
274     >>> ser = pd.Series(r, copy=False)
275     >>> ser.iloc[0] = 999
276     >>> r
277     array([999,   2])
278     >>> ser
Differences (unified diff with -expected +actual):
    @@ -1,3 +1,3 @@
     0    999
     1      2
    -dtype: int32
    +dtype: int64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh got it, I didn't see that. Appreciate the help.

@mdhsieh
Copy link
Contributor Author

mdhsieh commented May 20, 2021

Thanks @mzeitlin11 @attack68 . I've tried rewording as suggested.

Copy link
Contributor

@attack68 attack68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks clearer to me. thanks.

@jreback jreback added this to the 1.3 milestone May 21, 2021
@jreback jreback merged commit 3980696 into pandas-dev:master May 21, 2021
@jreback
Copy link
Contributor

jreback commented May 21, 2021

thanks @mdhsieh

TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Constructors Series/DataFrame/Index/pd.array Constructors Copy / view semantics Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
6 participants