Docstring shift #21039
Changes from 13 commits
@@ -7793,12 +7793,12 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
 errors=errors)

 _shared_docs['shift'] = ("""
-Shift index by desired number of periods with an optional time freq
+Shift index by desired number of periods with an optional time freq.

 Parameters
 ----------
 periods : int
-Number of periods to move, can be positive or negative
+Number of periods to move, can be positive or negative.
 freq : DateOffset, timedelta, or time rule string, optional
 Increment to use from the tseries module or time rule (e.g. 'EOM').
 See Notes.
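For readers following the `periods` wording in this hunk, here is a minimal sketch of what "positive or negative" means in practice. The integer-indexed Series `s` is a hypothetical example and is not part of this PR. It assumes standard pandas behaviour, where vacated slots are filled with NaN.

```python
import pandas as pd

# Hypothetical data, not taken from the PR.
s = pd.Series([10, 20, 30, 40])

# Positive periods move values toward later positions;
# the vacated leading slots become NaN.
forward = s.shift(2)

# Negative periods move values toward earlier positions;
# the vacated trailing slot becomes NaN.
backward = s.shift(-1)

print(forward)
print(backward)
```

In both cases the index itself is unchanged; only the values move.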
@@ -7810,6 +7810,86 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
 is not realigned. That is, use freq if you would like to extend the
 index when shifting and preserve the original data.

+Examples
+--------
+Compute the difference between a column in a dataframe
Review comment: Whenever you reference it be sure to capitalize appropriately the word
Review comment: This first line isn't necessary - can simply delete
Review comment: Which first line? This one? "Compute the difference between a column in a dataframe
Review comment: I believe @WillAyd was referring to the "Compute the difference..." line.
+with grouped data, and its shifted version.
+
+>>> data = pd.DataFrame({'myvalue': [1, 2, 3, 4, 5, 6],
Review comment: Use the variable name
+... 'group': ['A', 'A', 'A', 'B', 'B', 'B']},
+... index=pd.DatetimeIndex(['2016-06-06',
+... '2016-06-08',
+... '2016-06-09',
+... '2016-06-10',
+... '2016-06-12',
+... '2016-06-13'],
+... name='mydate'))
+
+>>> data
+myvalue group
+mydate
+2016-06-06 1 A
+2016-06-08 2 A
+2016-06-09 3 A
+2016-06-10 4 B
+2016-06-12 5 B
+2016-06-13 6 B
+
+For the groups compute the difference between current `myvalue` and
Review comment: Maybe I'm missing the point but I don't think you need this sentence
+`myvalue` shifted forward by 1 day.
+
+If the dataframe is shifted without passing a freq argument than the
Review comment: Simple typo - "then" instead of "than" here
+values simply move down
+
+>>> data[data.group=='A'].myvalue.shift(1)
Review comment: Instead of filtering to group
Review comment: Agreed. The group stuff seems to be a distraction from
+mydate
+2016-06-06 NaN
Review comment: Others might have a differing opinion but I find these examples rather confusing as you really need to think through how the data is indexed. I mentioned it before but I think the cleanest approach would be the
Review comment: Sure - I can use your suggestion instead. You suggested
I think I can avoid the UDF by doing
I get the same answer, just wanted to confirm that it's the correct thing to do here?
Review comment: That’s correct
+2016-06-08 1.0
+2016-06-09 2.0
+Name: myvalue, dtype: float64
+
+What we want however, is to shift myvalue forward by one day in order
+to compute the difference.
+
+>>> data[data.group=='A'].myvalue.shift(1, freq=pd.Timedelta('1 days'))
+mydate
+2016-06-07 1
+2016-06-09 2
+2016-06-10 3
+Name: myvalue, dtype: int64
+
+After considering the grouping we can calculate the difference
+as follows
+
+>>> result = data.groupby('group').myvalue.apply(
+... lambda x: x - x.shift(1, pd.Timedelta('1 days')))
+>>> result
+group mydate
+A 2016-06-06 NaN
+2016-06-07 NaN
+2016-06-08 NaN
+2016-06-09 1.0
+2016-06-10 NaN
+B 2016-06-10 NaN
+2016-06-11 NaN
+2016-06-12 NaN
+2016-06-13 1.0
+2016-06-14 NaN
+Name: myvalue, dtype: float64
+
+Merge result as a column named `delta` to the original data
+
+>>> result.name = 'delta'
+>>> pd.merge(data, result.to_frame(), on=['mydate', 'group'])
+myvalue group delta
+mydate
+2016-06-06 1 A NaN
+2016-06-08 2 A NaN
+2016-06-09 3 A 1.0
+2016-06-10 4 B NaN
+2016-06-12 5 B NaN
+2016-06-13 6 B 1.0
+
 Returns
 -------
 shifted : %(klass)s
Review comment: Can you move
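Since several reviewers found the docstring example hard to follow, the core contrast it draws can be reduced to a small sketch. It uses only the three group-A rows from the example and is not a proposed replacement for the docstring. One assumption to note: the string alias `'D'` stands in for the `pd.Timedelta('1 days')` used in the diff, on the understanding that both move this daily, timezone-naive index by one day.

```python
import pandas as pd

# The three group-A rows from the docstring example.
idx = pd.DatetimeIndex(['2016-06-06', '2016-06-08', '2016-06-09'],
                       name='mydate')
s = pd.Series([1, 2, 3], index=idx, name='myvalue')

# Without freq: the index stays the same and the values move down one row.
positional = s.shift(1)

# With freq: the values stay attached to their rows and every index label
# moves one day later. 'D' is an assumed stand-in for pd.Timedelta('1 days').
by_label = s.shift(1, freq='D')

# Subtraction aligns on index labels, so only dates present in both
# series produce a non-NaN difference.
delta = s - by_label

print(positional)
print(by_label)
print(delta)
```

Only 2016-06-09 has a value exactly one day earlier (2016-06-08), which is why the docstring's grouped result contains a single non-NaN difference per group.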
Review comment: Can you add the default 1
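The default this comment refers to can be checked directly. A minimal sketch, assuming a hypothetical Series that is not part of the PR: calling `shift()` with no arguments should give the same result as `shift(1)`.

```python
import pandas as pd

# Hypothetical data, not taken from the PR.
s = pd.Series([1, 2, 3])

# shift() with no arguments uses periods=1, so both calls should agree.
# Series.equals treats NaN values in matching positions as equal.
same = s.shift().equals(s.shift(1))
print(same)
```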