Skip to content

DOC: update the pandas.Series.shift docstring #20472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 4, 2018

Conversation

ZackStone
Copy link
Contributor

@ZackStone ZackStone commented Mar 23, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
####################### Docstring (pandas.Series.shift)  #######################
################################################################################

Shift index by desired number of periods with an optional time freq.

When freq is not passed, shift the index without realign the data.
If freq is passed (in this case, the index must be date or datetime),
the index will be increased using the periods and the freq.

Parameters
----------
periods : int
    Number of periods to move, can be positive or negative.
freq : DateOffset, timedelta, or time rule string, optional
    Increment to use from the tseries module or time rule (e.g. 'EOM').
    See Notes.
axis : int or str
    Shift direction. {0, 'index'}.

Notes
-----
If freq is specified then the index values are shifted but the data
is not realigned. That is, use freq if you would like to extend the
index when shifting and preserve the original data.

Returns
-------
shifted : Series

Examples
--------
>>> df = pd.DataFrame({'Col1': [10, 20, 30, 20, 15, 30, 45],
...                    'Col2': [13, 23, 33, 23, 18, 33, 48],
...                    'Col3': [17, 27, 37, 27, 22, 37, 52]})
>>> print(df)
   Col1  Col2  Col3
0    10    13    17
1    20    23    27
2    30    33    37
3    20    23    27
4    15    18    22
5    30    33    37
6    45    48    52

>>> df.shift(periods=3)
   Col1  Col2  Col3
0   NaN   NaN   NaN
1   NaN   NaN   NaN
2   NaN   NaN   NaN
3  10.0  13.0  17.0
4  20.0  23.0  27.0
5  30.0  33.0  37.0
6  20.0  23.0  27.0

>>> df.shift(periods=1, axis=1)
   Col1  Col2  Col3
0   NaN  10.0  13.0
1   NaN  20.0  23.0
2   NaN  30.0  33.0
3   NaN  20.0  23.0
4   NaN  15.0  18.0
5   NaN  30.0  33.0
6   NaN  45.0  48.0

>>> import datetime
>>> names = ['João', 'Maria', 'Emanuel', 'Jussara', 'José']
>>> dates = [datetime.datetime(2018, 3, 1, 11, 15),
...          datetime.datetime(2018, 3, 5, 11, 15),
...          datetime.datetime(2018, 3, 10, 11, 15),
...          datetime.datetime(2018, 3, 15, 11, 15),
...          datetime.datetime(2018, 3, 20, 11, 15)]
>>> df = pd.DataFrame(data={'names': names}, index=dates)
>>> print(df)
                       names
2018-03-01 11:15:00     João
2018-03-05 11:15:00    Maria
2018-03-10 11:15:00  Emanuel
2018-03-15 11:15:00  Jussara
2018-03-20 11:15:00     José

>>> df.shift(periods=2, freq='D')
                       names
2018-03-03 11:15:00     João
2018-03-07 11:15:00    Maria
2018-03-12 11:15:00  Emanuel
2018-03-17 11:15:00  Jussara
2018-03-22 11:15:00     José

>>> df.shift(periods=75, freq='min')
                       names
2018-03-01 12:30:00     João
2018-03-05 12:30:00    Maria
2018-03-10 12:30:00  Emanuel
2018-03-15 12:30:00  Jussara
2018-03-20 12:30:00     José

See Also
--------
slice_shift: Equivalent to shift without copying data.
tshift: Shift the time index, using the index’s frequency if available.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.shift" correct. :)

@@ -6815,16 +6815,21 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
errors=errors)

_shared_docs['shift'] = ("""
Shift index by desired number of periods with an optional time freq
Shift index by desired number of periods with an optional time freq.
Copy link
Member

@WillAyd WillAyd Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References to parameters in the descriptions / notes should be put in backticks, so `freq` wherever applicable (couple of other instances below)

Shift index by desired number of periods with an optional time freq
Shift index by desired number of periods with an optional time freq.

When freq is not passed, shift the index without realign the data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"realigning" instead of "realign"

Shift index by desired number of periods with an optional time freq.

When freq is not passed, shift the index without realign the data.
If freq is passed (in this case, the index must be date or datetime),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if freq is passed and the index is not datetime-like? Perhaps this should have a Raises section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the docs to "Raises section"? I could not find it in https://python-sprints.github.io/pandas/guide/pandas_docstring.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would look something like:

Returns
--------
same_class : yada yada

Raises
------
ValueError : something went terribly horribly wrong

freq : DateOffset, timedelta, or time rule string, optional
Increment to use from the tseries module or time rule (e.g. 'EOM').
See Notes.
axis : %(axes_single_arg)s
axis : int or str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The axis parameter has it's own standard (see the guide):

axis : {0 orindex’, 1 orcolumns’, None}, default None


Examples
--------
>>> df = pd.DataFrame({'Col1': [10, 20, 30, 20, 15, 30, 45],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad example but I think we could save some space - could you limit the example to 5 rows of data? Shouldn't drastically affect your examples below

>>> df = pd.DataFrame({'Col1': [10, 20, 30, 20, 15, 30, 45],
... 'Col2': [13, 23, 33, 23, 18, 33, 48],
... 'Col3': [17, 27, 37, 27, 22, 37, 52]})
>>> print(df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm well your DataFrame may be simple enough that you don't need to explicitly print it. That said, if you do need to print you can just say df instead of print(df).

I'll leave it as a judgment call on your end if you want this, but if so definitely get rid of the surrounding print statement


See Also
--------
slice_shift: Equivalent to shift without copying data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need whitespace before each colon in this section


Parameters
----------
periods : int
Number of periods to move, can be positive or negative
Number of periods to move, can be positive or negative.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor but you want a semicolon instead of a comma here. Perhaps saying "shift" instead of move would be better

@jreback jreback added Docs Datetime Datetime data dtype labels Mar 26, 2018

See Also
--------
slice_shift: Equivalent to shift without copying data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't list slice_shift, this is being deprecated / removed

@codecov
Copy link

codecov bot commented Mar 26, 2018

Codecov Report

Merging #20472 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20472      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51187    51197      +10     
==========================================
+ Hits        47209    47220      +11     
+ Misses       3978     3977       -1
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 42.27% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/generic.py 96.81% <100%> (ø) ⬆️
pandas/core/arrays/timedeltas.py 93.75% <0%> (-0.56%) ⬇️
pandas/core/series.py 93.87% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.74% <0%> (+0.12%) ⬆️
pandas/core/indexes/frozen.py 92.1% <0%> (+0.43%) ⬆️
pandas/core/arrays/period.py 98.08% <0%> (+0.59%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f71755...4d9c3c0. Read the comment docs.

@ZackStone
Copy link
Contributor Author

@jreback, @WillAyd: updated.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more changes. Can you also add a Raises section to describe what happens when freq is provided for something other than a DateTimeIndex?


>>> import datetime
>>> names = ['João', 'Maria', 'Emanuel', 'Jussara', 'José']
>>> dates = [datetime.datetime(2018, 3, 1, 11, 15),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be done in a more succinct manner by using pd.date_range. Trying something like:

pd.date_range(start='3/1/18 11:15:00', freq='3D', periods=5)

Can modify date / freq to whatever you want. Can get rid of the date time import then as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one!


Notes
-----
If freq is specified then the index values are shifted but the data
is not realigned. That is, use freq if you would like to extend the
If `freq` is specified then the index values are shifted but the data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reinforce this concept it would be good to add into the examples where appropriate (see below)

3 NaN 30.0 33.0
4 NaN 45.0 48.0

>>> import datetime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to above comment, this example could use an introduction along the lines of "When using the freq parameter with a :class:pandas.DateTimeIndex the index values are shifted but the data is not realigned."

If freq is specified then the index values are shifted but the data
is not realigned. That is, use freq if you would like to extend the
If `freq` is specified then the index values are shifted but the data
is not realigned. That is, use `freq` if you would like to extend the
index when shifting and preserve the original data.

Returns
-------
shifted : %(klass)s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are just returning a single element then you do not need to list the name of that variable in the returns, so here get rid of "shifted :" The returned object also needs a description indented on the subsequent line


Parameters
----------
periods : int
Number of periods to move, can be positive or negative
Number of periods to shift; can be positive or negative.
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from the tseries module or time rule (e.g. 'EOM').
See Notes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the entire Notes section is specific to freq, I'd prefer to remove it and move the content here.


Parameters
----------
periods : int
Number of periods to move, can be positive or negative
Number of periods to shift; can be positive or negative.
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from the tseries module or time rule (e.g. 'EOM').
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "Increment" supposed to be a DateOffest? If so, let's use that term consistently.

Could you also link to the frequency aliases in timeseries.rst? timeseries.offset_aliases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't get it... Can you explain in other words?

4 NaN 45.0 48.0

>>> import datetime
>>> names = ['João', 'Maria', 'Emanuel', 'Jussara', 'José']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-ASCII characters will cause encoding issues on Py2 without an encoding declaration. Probably best to stcik to ASCII till we're py3 only.

2018-03-10 11:15:00 Emanuel
2018-03-15 11:15:00 Jussara
2018-03-20 11:15:00 José

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a bit of an explanation between these? Especially explaining the freq argument, as people may not know what 'D' or 'min' refer to.

2018-03-15 12:30:00 Jussara
2018-03-20 12:30:00 José

See Also
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Also goes before Examples

@datapythonista
Copy link
Member

@YitzhakAndrade seems like you've got other changes besides yours in this PR.

Can you do the next in your branch:

  • git fetch upstream
  • git merge upstream/master
  • git reset --soft upstream/master
  • git commit -m "DOC: update the pandas.Series.shift docstring"

Thanks!

Copy link
Contributor Author

@ZackStone ZackStone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOC: update the pandas.Series.shift docstring

@datapythonista
Copy link
Member

Thanks for the update @YitzhakAndrade. Seems like you still have unrelated changes in this PR. Could you try what I said in my last comment?

For the next time, I think the problem can be related to using master for the changes, it's better to use a branch.

@jreback
Copy link
Contributor

jreback commented Nov 1, 2018

can you merge master and update

@pep8speaks
Copy link

Hello @YitzhakAndrade! Thanks for updating the PR.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel one more if you don't mind.

@datapythonista datapythonista mentioned this pull request Nov 3, 2018
4 tasks
PeriodIndex.shift : Shift values of PeriodIndex.
Number of periods to shift. Can be positive or negative.
freq : DateOffset, timedelta, or str, optional
Offset to use from the tseries module or time rule (e.g. 'EOM').
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth specifying tseries.offsets?

If `freq` is specified then the index values are shifted but the
data is not realigned. That is, use `freq` if you would like to
extend the index when shifting and preserve the original data.
axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we usually avoid fancy quotes?

@jbrockmendel
Copy link
Member

Couple of comments, agreement with points from Will and Tom. Generally looks nice.

…licate content with the freq param description
@datapythonista
Copy link
Member

Thanks for the review @jbrockmendel, very useful. The docstring can surely be improved, with the raises section, and more examples, but I prefer to leave that for someone else in a separate PR, and just merge this as it is, as the PR is abandoned.

@jreback jreback added this to the 0.24.0 milestone Nov 4, 2018
@jreback jreback merged commit 471fb0e into pandas-dev:master Nov 4, 2018
@jreback
Copy link
Contributor

jreback commented Nov 4, 2018

thanks @YitzhakAndrade

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants