Docstring shift #21039

Closed
wants to merge 24 commits into from
Commits
77f2a2b
added an example to shift
PyJay May 14, 2018
c935896
fixing pep8 errors
PyJay May 14, 2018
a6ac960
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay May 15, 2018
c892655
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay May 26, 2018
3639e6c
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay May 31, 2018
61a4ae1
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay Jun 2, 2018
206b2da
making changes per feedback
PyJay Jun 2, 2018
c31d1a1
fixing pep8 errors
PyJay Jun 2, 2018
73a640f
fix trailing whitespace
PyJay Jun 2, 2018
5bee1e3
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay Jun 28, 2018
de0b7a1
WIP: adding more detail in docstring
PyJay Jun 28, 2018
739f8a1
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay Jul 1, 2018
a9550fa
displaying difference between shift with/without freq
PyJay Jul 1, 2018
67b6ee2
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay Jul 15, 2018
ab4e2f5
wip: updating docstring
PyJay Jul 22, 2018
e7d6f0a
Merge branch 'master' of https://github.com/pandas-dev/pandas into do…
PyJay Jul 22, 2018
5bcae81
wip: updating docstring
PyJay Jul 22, 2018
17f55d5
wip: updating docstring
PyJay Jul 22, 2018
4dd276a
Docstring updated
PyJay Jul 22, 2018
0b4f41b
fixing pep8 issues
PyJay Jul 22, 2018
7c15b08
fix whitespace in frames
PyJay Jul 22, 2018
950cc2e
fixing whitespace
PyJay Jul 22, 2018
9ec5ed7
fixing whitespace
PyJay Jul 22, 2018
888b97b
fixing pep8 errors
PyJay Jul 22, 2018
60 changes: 60 additions & 0 deletions pandas/core/generic.py
@@ -7807,6 +7807,66 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
is not realigned. That is, use freq if you would like to extend the
index when shifting and preserve the original data.

Examples
--------
Compute the difference between a column in a dataframe and
its shifted version
Contributor

End with a .

Author

noted


>>> data = pd.DataFrame({'mydate': [pd.to_datetime('2016-06-06'),
Contributor

maybe use pd.Timestamp('2016-06-06') or pd.to_datetime(['2016-06-06', ...]) to convey that to_datetime should be used for arrays.

Contributor

Move mydate to index= and remove the set_index below.

Author

It is useful to have a named index as the label (mydate) is used later in the process (for the left join). Would you prefer if I use index= and then set the label using data.index.name = ?

Contributor @TomAugspurger, May 26, 2018

Agreed. How about

index=pd.DatetimeIndex(['2016-06-08', ... '2016-06-13'], name='mydate')
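
A minimal sketch of the construction suggested in this thread, assuming the six dates from the docstring example are used rather than the abbreviated list in the comment. It builds the frame with a named `DatetimeIndex` directly, so no `set_index` call is needed afterwards.

```python
import pandas as pd

# Dates are copied from the docstring example under review (assumption:
# the abbreviated list in the comment stands for these same dates).
index = pd.DatetimeIndex(
    ['2016-06-06', '2016-06-08', '2016-06-09',
     '2016-06-10', '2016-06-12', '2016-06-13'],
    name='mydate')

# The index carries the 'mydate' label from the start.
data = pd.DataFrame(
    {'myvalue': [1, 2, 3, 4, 5, 6],
     'group': ['A', 'A', 'A', 'B', 'B', 'B']},
    index=index)
```

The index name stays available for the later join on `mydate`, which was the author's concern.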

... pd.to_datetime('2016-06-08'),
... pd.to_datetime('2016-06-09'),
... pd.to_datetime('2016-06-10'),
... pd.to_datetime('2016-06-12'),
... pd.to_datetime('2016-06-13')],
... 'myvalue': [1, 2, 3, 4, 5, 6],
... 'group': ['A', 'A', 'A', 'B', 'B', 'B']})

>>> data.set_index('mydate', inplace=True)
>>> data
            myvalue group
mydate
Contributor

Is this indented too far? I thought the leftmost output should be directly below the leftmost >.

Author

noted

2016-06-06        1     A
2016-06-08        2     A
2016-06-09        3     A
2016-06-10        4     B
2016-06-12        5     B
2016-06-13        6     B

For the groups compute the difference between current *myvalue* and
*myvalue* shifted forward by 1 day

>>> result = data.groupby('group').myvalue.apply(
... lambda x: x - x.shift(1, pd.Timedelta('1 days')))
>>> result
group  mydate
A      2016-06-06    NaN
       2016-06-07    NaN
       2016-06-08    NaN
       2016-06-09    1.0
       2016-06-10    NaN
B      2016-06-10    NaN
       2016-06-11    NaN
       2016-06-12    NaN
       2016-06-13    1.0
       2016-06-14    NaN
Name: myvalue, dtype: float64

Merge result as a column named *delta* to the original data

>>> result.name = 'delta'
>>> data.reset_index().merge(
Contributor

I think the reset_index and set_index can be avoided in 0.23+ (can join on mix of columns and index names).

Author @PyJay, May 26, 2018

I have had a play, and it looks like the reset_index and set_index calls are necessary. reset_index converts the delta (shifted values) from a Series to a DataFrame and exposes mydate as a column on the original data, which the left join needs. The set_index call then restores mydate as the index, as in the original dataset.

Contributor

#21220 for that.

I think pd.merge(data, result.to_frame(), on=['mydate', 'group']) may work

In [45]: pd.merge(data, result.to_frame(), on=['group', 'mydate'])
Out[45]:
            myvalue_x group  myvalue_y
mydate
2016-06-06          1     A        NaN
2016-06-08          2     A        NaN
2016-06-09          3     A        1.0
2016-06-10          4     B        NaN
2016-06-12          5     B        NaN
2016-06-13          6     B        1.0
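
The suggestion above can be sketched end to end as follows. This is an illustration under assumptions, not the final docstring: it assumes a pandas version (0.23 or later, per this thread) that accepts a mix of index level names and column names in `on`, renames the result to `delta` before merging, and adds `how='left'` to match the docstring's join.

```python
import pandas as pd

index = pd.DatetimeIndex(
    ['2016-06-06', '2016-06-08', '2016-06-09',
     '2016-06-10', '2016-06-12', '2016-06-13'],
    name='mydate')
data = pd.DataFrame(
    {'myvalue': [1, 2, 3, 4, 5, 6],
     'group': ['A', 'A', 'A', 'B', 'B', 'B']},
    index=index)

# Per-group difference between myvalue and myvalue shifted forward one day.
result = data.groupby('group')['myvalue'].apply(
    lambda x: x - x.shift(1, freq=pd.Timedelta('1 days')))
result.name = 'delta'

# 'mydate' is an index level on both sides; 'group' is a column on the
# left and an index level on the right. No reset_index/set_index is used.
merged = pd.merge(data, result.to_frame(), how='left',
                  on=['mydate', 'group'])
```

If the mixed index/column merge behaves differently on a given pandas version, the `reset_index` and `set_index` route shown in the diff remains a fallback.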

... result.reset_index(),
... how='left',
... on=['mydate', 'group']).set_index('mydate')
            myvalue group  delta
mydate
2016-06-06        1     A    NaN
2016-06-08        2     A    NaN
2016-06-09        3     A    1.0
2016-06-10        4     B    NaN
2016-06-12        5     B    NaN
2016-06-13        6     B    1.0

Returns
-------
shifted : %(klass)s
Member

Can you move Return before the examples?
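
For reference, the context lines at the top of this hunk say to use `freq` to extend the index when shifting while preserving the original data. A small sketch of that difference, using a three-row series whose values and dates are taken from group A of the docstring example (the variable names are illustrative only):

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(
        ['2016-06-06', '2016-06-08', '2016-06-09'], name='mydate'))

# Without freq: the values move down one position, the index is unchanged,
# and the first row becomes NaN.
by_position = s.shift(1)

# With freq: the values are unchanged and every index label moves forward
# by one day.
by_freq = s.shift(1, freq=pd.Timedelta('1 days'))
```

Subtracting `by_freq` from `s` aligns on the date labels, which is why the grouped example yields a non-NaN delta only where consecutive calendar days both exist.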
