Docstring shift #21039

Closed
wants to merge 24 commits into from
Changes from 13 commits

Commits
77f2a2b added an example to shift (PyJay, May 14, 2018)
c935896 fixing pep8 errors (PyJay, May 14, 2018)
a6ac960 Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, May 15, 2018)
c892655 Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, May 26, 2018)
3639e6c Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, May 31, 2018)
61a4ae1 Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, Jun 2, 2018)
206b2da making changes per feedback (PyJay, Jun 2, 2018)
c31d1a1 fixing pep8 errors (PyJay, Jun 2, 2018)
73a640f fix trailing whitespace (PyJay, Jun 2, 2018)
5bee1e3 Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, Jun 28, 2018)
de0b7a1 WIP: adding more detail in docstring (PyJay, Jun 28, 2018)
739f8a1 Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, Jul 1, 2018)
a9550fa displaying difference between shift with/without freq (PyJay, Jul 1, 2018)
67b6ee2 Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, Jul 15, 2018)
ab4e2f5 wip: updating docstring (PyJay, Jul 22, 2018)
e7d6f0a Merge branch 'master' of https://github.com/pandas-dev/pandas into do… (PyJay, Jul 22, 2018)
5bcae81 wip: updating docstring (PyJay, Jul 22, 2018)
17f55d5 wip: updating docstring (PyJay, Jul 22, 2018)
4dd276a Docstring updated (PyJay, Jul 22, 2018)
0b4f41b fixing pep8 issues (PyJay, Jul 22, 2018)
7c15b08 fix whitespace in frames (PyJay, Jul 22, 2018)
950cc2e fixing whitespace (PyJay, Jul 22, 2018)
9ec5ed7 fixing whitespace (PyJay, Jul 22, 2018)
888b97b fixing pep8 errors (PyJay, Jul 22, 2018)
84 changes: 82 additions & 2 deletions pandas/core/generic.py
@@ -7793,12 +7793,12 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
errors=errors)

_shared_docs['shift'] = ("""
-Shift index by desired number of periods with an optional time freq
+Shift index by desired number of periods with an optional time freq.

Parameters
----------
periods : int
Review comment (Member): Can you add the default of 1?

-Number of periods to move, can be positive or negative
+Number of periods to move, can be positive or negative.
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from the tseries module or time rule (e.g. 'EOM').
See Notes.
@@ -7810,6 +7810,86 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
is not realigned. That is, use freq if you would like to extend the
index when shifting and preserve the original data.

Examples
--------
Compute the difference between a column in a dataframe
Review comment (Member): Whenever you reference it, be sure to capitalize the word DataFrame appropriately.

Review comment (Member): This first line isn't necessary; you can simply delete it.

Review comment (Author, @PyJay, Jul 15, 2018): Which first line? This one? "Compute the difference between a column in a dataframe with grouped data, and its shifted version." Or the blank line before "Examples"?

Review comment (Contributor): I believe @WillAyd was referring to the "Compute the difference..." line.

with grouped data, and its shifted version.

>>> data = pd.DataFrame({'myvalue': [1, 2, 3, 4, 5, 6],
Review comment (Member): Use the variable name df instead of data for consistency.

... 'group': ['A', 'A', 'A', 'B', 'B', 'B']},
... index=pd.DatetimeIndex(['2016-06-06',
... '2016-06-08',
... '2016-06-09',
... '2016-06-10',
... '2016-06-12',
... '2016-06-13'],
... name='mydate'))

>>> data
myvalue group
mydate
2016-06-06 1 A
2016-06-08 2 A
2016-06-09 3 A
2016-06-10 4 B
2016-06-12 5 B
2016-06-13 6 B

For the groups compute the difference between current `myvalue` and
Review comment (Member): Maybe I'm missing the point, but I don't think you need this sentence.

`myvalue` shifted forward by 1 day.

If the dataframe is shifted without passing a freq argument than the
Review comment (Member): Simple typo: "then" instead of "than" here.

values simply move down

>>> data[data.group=='A'].myvalue.shift(1)
Review comment (Member): Instead of filtering to group A, you'd be better served working with the entire frame.

Review comment (Contributor): Agreed. The group stuff seems to be a distraction from shift at this point.

mydate
2016-06-06 NaN
Review comment (Member): Others might have a differing opinion, but I find these examples rather confusing, as you really need to think through how the data is indexed.

I mentioned it before, but I think the cleanest approach would be the reindex / fill set of operations I posted in the original PR. Is there any reason why we can't use that here instead of the UDF? I think it explains the situation more clearly, and it will certainly scale better on larger datasets, which is why I'd rather we suggest that type of usage in the documentation.

Review comment (Author, @PyJay, Jul 15, 2018): Sure, I can use your suggestion instead. You suggested

    dt_rng = pd.date_range(data.index.min(), data.index.max())
    data = data.reindex(dt_rng)
    data['group'] = data['group'].ffill()
    data.groupby('group')['myvalue'].transform(lambda x: x - x.shift())

I think I can avoid the UDF by doing

    df['myvalue'] - df.groupby('group')['myvalue'].shift(1)

I get the same answer; I just wanted to confirm that it's the correct thing to do here. Thanks.

Review comment (Member): That’s correct.

2016-06-08 1.0
2016-06-09 2.0
Name: myvalue, dtype: float64

What we want, however, is to shift `myvalue` forward by one day in order
to compute the difference.

>>> data[data.group=='A'].myvalue.shift(1, freq=pd.Timedelta('1 days'))
mydate
2016-06-07 1
2016-06-09 2
2016-06-10 3
Name: myvalue, dtype: int64

After considering the grouping, we can calculate the difference
as follows:

>>> result = data.groupby('group').myvalue.apply(
... lambda x: x - x.shift(1, pd.Timedelta('1 days')))
>>> result
group mydate
A 2016-06-06 NaN
2016-06-07 NaN
2016-06-08 NaN
2016-06-09 1.0
2016-06-10 NaN
B 2016-06-10 NaN
2016-06-11 NaN
2016-06-12 NaN
2016-06-13 1.0
2016-06-14 NaN
Name: myvalue, dtype: float64

Merge the result into the original data as a column named `delta`.

>>> result.name = 'delta'
>>> pd.merge(data, result.to_frame(), on=['mydate', 'group'])
myvalue group delta
mydate
2016-06-06 1 A NaN
2016-06-08 2 A NaN
2016-06-09 3 A 1.0
2016-06-10 4 B NaN
2016-06-12 5 B NaN
2016-06-13 6 B 1.0

Returns
-------
shifted : %(klass)s
Review comment (Member): Can you move Returns before the examples?
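A minimal sketch, runnable outside the docstring, of the contrast the proposed examples are after: shifting without `freq` moves the values down one row, while shifting with `freq` moves every index label forward by one day instead. It reuses the example data under the name `df`, as requested in review, and replaces the filter to group A with a whole-frame `groupby(...).shift`, along the lines the reviewers suggested. The names `by_row`, `by_day` and `within_group` are illustrative only, and a reasonably recent pandas version is assumed.

```python
import pandas as pd

# Same data as the proposed docstring example, renamed to ``df``.
df = pd.DataFrame(
    {"myvalue": [1, 2, 3, 4, 5, 6],
     "group": ["A", "A", "A", "B", "B", "B"]},
    index=pd.DatetimeIndex(
        ["2016-06-06", "2016-06-08", "2016-06-09",
         "2016-06-10", "2016-06-12", "2016-06-13"],
        name="mydate"),
)

# Without freq: the index is unchanged and the values move down one row,
# so the first row becomes NaN and the column is upcast to float.
by_row = df["myvalue"].shift(1)

# With freq: the values keep their order and every index label moves
# forward by one day, so no value is dropped and the integer dtype is kept.
by_day = df["myvalue"].shift(1, freq=pd.Timedelta("1 days"))

# Shifting within each group on the whole frame (no filtering to group A):
# the first row of every group becomes NaN.
within_group = df.groupby("group")["myvalue"].shift(1)

print(by_row, by_day, within_group, sep="\n\n")
```

Because `by_day` keeps all six values on a new index, subtracting it from the original column relies on index alignment, which is why the dates that do not line up come out as NaN in the proposed example.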

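A second sketch checks the point settled in the review thread on the UDF: the per-group UDF from the proposed docstring, the reindex / fill approach the reviewer suggested, and the author's UDF-free variant should give the same differences on the original dates. Like the reindex / fill approach itself, it assumes that each group's dates form one contiguous, sorted block, so that forward-filling the `group` label over the inserted calendar days is correct. The names `udf`, `full` and `delta_*` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"myvalue": [1, 2, 3, 4, 5, 6],
     "group": ["A", "A", "A", "B", "B", "B"]},
    index=pd.DatetimeIndex(
        ["2016-06-06", "2016-06-08", "2016-06-09",
         "2016-06-10", "2016-06-12", "2016-06-13"],
        name="mydate"),
)

# 1) Docstring approach: per-group UDF subtracting the value shifted forward
#    by one calendar day. The result is indexed by (group, mydate).
udf = df.groupby("group")["myvalue"].apply(
    lambda x: x - x.shift(1, freq=pd.Timedelta("1 days")))
# Look up the (group, date) pair of every original row.
delta_udf = udf.loc[list(zip(df["group"], df.index))].to_numpy()

# 2) Reindex / fill approach: insert the missing calendar days,
#    forward-fill the group label, then take a plain row shift per group.
full = df.reindex(pd.date_range(df.index.min(), df.index.max(), name="mydate"))
full["group"] = full["group"].ffill()
delta_reindex = (
    full.groupby("group")["myvalue"]
    .transform(lambda x: x - x.shift())
    .loc[df.index]
    .to_numpy()
)

# 3) UDF-free variant on the same reindexed frame.
delta_vectorized = (
    full["myvalue"] - full.groupby("group")["myvalue"].shift(1)
).loc[df.index].to_numpy()

print(delta_udf, delta_reindex, delta_vectorized, sep="\n")
```

The UDF result is keyed by (group, mydate), so it is looked up pair by pair here rather than merged; the two reindexed variants are indexed by calendar day and can be sliced with the original index directly.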