Docstring shift #21039

PyJay · 2018-05-14T21:35:24Z

closes issue when shifting with Timedelta in a groupby #20492
tests passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2018-05-15T02:18:00Z

Codecov Report

Merging #21039 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21039   +/-   ##
=======================================
  Coverage   91.99%   91.99%           
=======================================
  Files         167      167           
  Lines       50578    50578           
=======================================
  Hits        46530    46530           
  Misses       4048     4048

Flag	Coverage Δ
#multiple	`90.4% <ø> (ø)`	⬆️
#single	`42.17% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.47% <ø> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0828c25...888b97b. Read the comment docs.

PyJay · 2018-05-15T14:52:11Z

@TomAugspurger @jreback for a check if you have the chance

WillAyd · 2018-05-15T19:01:14Z

I think this is missing the main purpose of that thread, namely that shift will not do any implicit inference of ordering or continuity for you.

While the solution you provided "answers" that question, I think it would be more appropriate to document shift with and without reindexing beforehand, to contrast the difference.

PyJay · 2018-05-15T20:02:31Z

@WillAyd Thanks for the feedback.

I am not sure how I could document the case without re-indexing without resulting in an error. I decided to print the intermediary result in an attempt to make this clear. Do you suggest I show the column with the shift applied instead of having it hide behind the lambda operation?

WillAyd · 2018-05-15T20:08:45Z

I don't think whatever you show should have any lambda (UDFs should typically be a "last resort").

I would suggest doing a shift with the base frame, pointing out that values from 06-06 simply get moved down to 06-08 without any regard for the fact that there is a two day gap in between those. Then contrast that by reindexing then shifting to show how to move from 06-06 to 06-07
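A minimal sketch of the contrast suggested above, assuming a small hypothetical frame that uses the dates mentioned in this thread (2016-06-07 is missing). The names `df`, `as_is`, `daily` and `by_day` are illustrative and do not come from the PR.

```python
import pandas as pd

# Hypothetical data: note the two day gap between 06-06 and 06-08.
df = pd.DataFrame(
    {"myvalue": [1, 2, 3]},
    index=pd.DatetimeIndex(
        ["2016-06-06", "2016-06-08", "2016-06-09"], name="mydate"
    ),
)

# Plain shift: values move down one row and the dates are ignored,
# so the value from 06-06 lands on 06-08.
as_is = df["myvalue"].shift(1)

# Reindex to a continuous daily range first, then shift:
# the value from 06-06 now lands on 06-07.
daily = df.reindex(pd.date_range(df.index.min(), df.index.max(), name="mydate"))
by_day = daily["myvalue"].shift(1)

print(as_is)
print(by_day)
```

In the reindexed version, 06-08 ends up as NaN because the row inserted for 06-07 has no value to pass down.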

PyJay · 2018-05-15T20:43:52Z

Okay - will change this to remove usage of lambda.

Doing a shift with the base frame without any re-indexing, data.shift(1, pd.Timedelta('1 days')), moves values from 06 to 07, but I believe the re-indexing and left join are needed because of the grouping and the index mismatch between the original and resulting DataFrame after the shift.
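A small sketch of the index mismatch described here, on a hypothetical ungrouped Series. It passes the string alias `"D"` as `freq` rather than the `pd.Timedelta('1 days')` used in the comment; the names `s`, `moved` and `aligned` are assumptions for illustration.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(
        ["2016-06-06", "2016-06-08", "2016-06-09"], name="mydate"
    ),
    name="myvalue",
)

# With freq, the index labels move forward one day; the values stay put.
moved = s.shift(1, freq="D")

# The shifted index no longer matches the original one. Aligning back onto
# the original dates leaves NaN wherever the previous day was not in the data.
aligned = moved.reindex(s.index)

print(moved)
print(aligned)
```

This is why some alignment step (a reindex or a join) is needed before the shifted values can sit next to the original column.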

TomAugspurger · 2018-05-17T12:48:23Z

pandas/core/generic.py

+        Compute the difference between a column in a dataframe and
+        its shifted version
+
+        >>> data = pd.DataFrame({'mydate': [pd.to_datetime('2016-06-06'),


maybe use pd.Timestamp('2016-06-06') or pd.to_datetime(['2016-06-06', ...]) to convey that to_datetime should be used for arrays.

Move mydate to index= and remove the set_index below.

It is useful to have a named index as the label (mydate) is used later in the process (for the left join). Would you prefer if I use index= and then set the label using data.index.name = ?

Agreed. How about

index=pd.DatetimeIndex(['2016-06-08', ... '2016-06-13'], name='mydate')
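As a sketch of that suggestion, the frame could be built with a named index directly, which avoids the later set_index call. The dates, values and groups below are taken from the output quoted elsewhere in this thread; the elided list in the suggestion above is not reproduced, so treat the exact contents as an assumption.

```python
import pandas as pd

# A named DatetimeIndex passed as index= makes set_index('mydate') unnecessary.
data = pd.DataFrame(
    {
        "myvalue": [1, 2, 3, 4, 5, 6],
        "group": ["A", "A", "A", "B", "B", "B"],
    },
    index=pd.DatetimeIndex(
        ["2016-06-06", "2016-06-08", "2016-06-09",
         "2016-06-10", "2016-06-12", "2016-06-13"],
        name="mydate",
    ),
)

print(data.index.name)
```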

TomAugspurger · 2018-05-17T12:48:29Z

pandas/core/generic.py

+        Examples
+        --------
+        Compute the difference between a column in a dataframe and
+        its shifted version


End with a .

TomAugspurger · 2018-05-17T12:49:49Z

pandas/core/generic.py

+        >>> data.set_index('mydate', inplace=True)
+        >>> data
+                        myvalue group
+            mydate


Is this indented too far? I thought the leftmost output should be directly below the leftmost >.

TomAugspurger · 2018-05-17T12:50:49Z

pandas/core/generic.py

+        Merge result as a column named *delta* to the original data
+
+        >>> result.name = 'delta'
+        >>> data.reset_index().merge(


I think the reset_index and set_index can be avoided in 0.23+ (can join on mix of columns and index names).

I have had a play and it looks like the reset_index and set_index are necessary. reset_index changes the delta (shifted values) from a Series to a DataFrame and exposes mydate as a column on the original data, which is needed for the left join. The set_index then restores mydate as the index, as in the original dataset.

#21220 for that.

I think pd.merge(data, result.to_frame(), on=['mydate', 'group']) may work

In [45]: pd.merge(data, result.to_frame(), on=['group', 'mydate'])
Out[45]:
            myvalue_x group  myvalue_y
mydate
2016-06-06          1     A        NaN
2016-06-08          2     A        NaN
2016-06-09          3     A        1.0
2016-06-10          4     B        NaN
2016-06-12          5     B        NaN
2016-06-13          6     B        1.0

pep8speaks · 2018-06-02T17:04:38Z

Hello @PyJay! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 22, 2018 at 22:04 Hours UTC

PyJay · 2018-06-02T17:38:49Z

@TomAugspurger I have updated the docs as per the feedback.

WillAyd · 2018-06-02T22:31:41Z

I still think this is missing the point. Can you show:

1. The "as is shift", calling out that the values are simply moved down (i.e. 06-06 value goes to 06-08)
2. A reindexed version where the value from 06-06 subsequently moves to 06-07
3. Please remove the use of the anonymous function

PyJay · 2018-07-01T07:43:06Z

@WillAyd I have updated the docstring to address your first two points. I am still unsure about removing the lambda, since the only alternative I can think of is a one-line function customized to the use case that does the same thing. Maybe I'm missing a better way here?

WillAyd · 2018-07-02T22:55:08Z

pandas/core/generic.py

@@ -7810,6 +7810,86 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
        is not realigned. That is, use freq if you would like to extend the
        index when shifting and preserve the original data.

+        Examples
+        --------
+        Compute the difference between a column in a dataframe


Whenever you reference it, be sure to capitalize the word DataFrame appropriately.

WillAyd · 2018-07-02T22:55:36Z

pandas/core/generic.py

@@ -7810,6 +7810,86 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
        is not realigned. That is, use freq if you would like to extend the
        index when shifting and preserve the original data.

+        Examples
+        --------
+        Compute the difference between a column in a dataframe


This first line isn't necessary - can simply delete

Which first line? This one? "Compute the difference between a column in a dataframe
with grouped data, and its shifted version." Or the blank line before "Examples"?

I believe @WillAyd was referring to the "Compute the difference..." line.

WillAyd · 2018-07-02T22:55:49Z

pandas/core/generic.py

+        Compute the difference between a column in a dataframe
+        with grouped data, and its shifted version.
+
+        >>> data = pd.DataFrame({'myvalue': [1, 2, 3, 4, 5, 6],


Use the variable name df instead of data for consistency

WillAyd · 2018-07-02T22:56:42Z

pandas/core/generic.py

+        If the dataframe is shifted without passing a freq argument than the
+        values simply move down
+
+        >>> data[data.group=='A'].myvalue.shift(1)


Instead of filtering to group A you'd be better served to work with the entire frame

Agreed. The group stuff seems to be a distraction from shift at this point.

WillAyd · 2018-07-02T22:58:41Z

pandas/core/generic.py

+        For the groups compute the difference between current `myvalue` and
+        `myvalue` shifted forward by 1 day.
+
+        If the dataframe is shifted without passing a freq argument than the


Simple typo - "then" instead of "than" here

WillAyd · 2018-07-02T23:02:09Z

pandas/core/generic.py

+
+        >>> data[data.group=='A'].myvalue.shift(1)
+        mydate
+        2016-06-06    NaN


Others might have a differing opinion but I find these examples rather confusing as you really need to think through how the data is indexed.

I mentioned it before but I think the cleanest approach would be the reindex / fill set of operations I posted in the original PR - is there any reason why we can't use that here instead of the UDF? I think it more clearly explains the situation and it will certainly scale better on larger datasets, hence why I'd rather we suggest that type of usage in the documentation.

Sure - I can use your suggestion instead. You suggested

dt_rng = pd.date_range(data.index.min(), data.index.max())
data = data.reindex(dt_rng)
data['group'] = data['group'].ffill()
data.groupby('group')['myvalue'].transform(lambda x: x-x.shift())

I think I can avoid the UDF by doing

df['myvalue'] - df.groupby('group')['myvalue'].shift(1)

I get the same answer, just wanted to confirm that it's the correct thing to do here?
Thanks.

That’s correct
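The equivalence confirmed here can be checked with a short sketch. It reuses the reindex and ffill steps quoted above on a frame rebuilt from the values in this thread; the variable names `with_udf` and `without_udf` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"group": ["A", "A", "A", "B", "B", "B"], "myvalue": [1, 2, 3, 4, 5, 6]},
    index=pd.DatetimeIndex(
        ["2016-06-06", "2016-06-08", "2016-06-09",
         "2016-06-10", "2016-06-12", "2016-06-13"],
        name="mydate",
    ),
)

# Fill the calendar gaps, then carry the group label into the new rows.
dt_rng = pd.date_range(df.index.min(), df.index.max())
df = df.reindex(dt_rng)
df["group"] = df["group"].ffill()

# Version with a user-defined function (the lambda).
with_udf = df.groupby("group")["myvalue"].transform(lambda x: x - x.shift())

# Version without a UDF: subtract the group-wise shifted column.
without_udf = df["myvalue"] - df.groupby("group")["myvalue"].shift(1)

print(without_udf)
```

Both versions give 1.0 on 2016-06-09 and 2016-06-13 and NaN elsewhere, which matches the delta column shown in the merge output earlier in the thread.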

PyJay · 2018-07-22T22:07:16Z

@WillAyd @TomAugspurger Changes made as per feedback.

WillAyd

Almost there!

WillAyd · 2018-07-23T00:27:42Z

pandas/core/generic.py

+        2016-06-13    1.0
+        Freq: D, Name: myvalue, dtype: float64
+
+        Concatenate result as a column named `delta` to the original data


I appreciate the thoroughness you are aiming for here, but let's get rid of this section to the example. concat will have a separate docstring with examples to teach that piece to users should they want to know

WillAyd · 2018-07-23T00:29:04Z

pandas/core/generic.py

+        Examples
+        --------
+
+        >>> df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],


This probably isn't enforceable and maybe it's just a personal preference but I think using key and val would be preferable to group and myvalue. I've seen the former used in quite a few more places so for consistency would be better to use those names.

@datapythonista maybe something to think about

WillAyd · 2018-07-23T00:29:52Z

pandas/core/generic.py

+        2016-06-12	B	5
+        2016-06-13	B	6
+
+        For the groups compute the difference between current `myvalue` and


Maybe I'm missing the point but I don't think you need this sentence

WillAyd · 2018-07-23T00:32:04Z

pandas/core/generic.py

+        For the groups compute the difference between current `myvalue` and
+        `myvalue` shifted forward by 1 day.
+
+        If `myvalue` is shifted then the values will simply move down.


Good start but I think the point to stress here is that "move down" makes no consideration of the dates being in order. So maybe to emphasize better can append something like "...move down one row, regardless of the chronology of the dates"

WillAyd · 2018-07-23T00:33:34Z

pandas/core/generic.py

+        2016-06-13    5.0
+        Name: myvalue, dtype: float64
+
+        We only want to shift `myvalue` forward by one day before computing


Just for clearer wording I'd say "If instead you wanted to shift values forward by a day you can do this by reindexing first and filling key"

WillAyd · 2018-07-23T00:34:37Z

pandas/core/generic.py

+        After considering the grouping we can calculate the difference
+        as follows
+
+        >>> result = df['myvalue'] - df.groupby('group')['myvalue'].shift(1)


Not sure what you are aiming for with the subtraction but I think it's clearer if you just do df.groupby('key').shift() here and show the result
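A minimal sketch of what that call returns, using the key and val column names proposed earlier in the review on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"key": ["A", "A", "B", "B"], "val": [1, 2, 3, 4]})

# Each group is shifted independently, so the first row of every group
# becomes NaN and no value crosses from group A into group B.
shifted = df.groupby("key").shift()

print(shifted)
```

The grouping column itself is not part of the result; only the remaining columns are shifted.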

PyJay · 2018-07-23T07:54:49Z

Ahh, thanks for the feedback @WillAyd. I realise this has dragged on a bit! I will hopefully be able to finalise this weekend. @datapythonista were you thinking of taking this work over?

datapythonista · 2018-07-23T12:21:12Z

pandas/core/generic.py

+        2016-06-11	 B	   NaN	     NaN
+        2016-06-12	 B	   5.0	     NaN
+        2016-06-13	 B	   6.0	     1.0
+
        Returns
        -------
        shifted : %(klass)s


Can you move Returns before the examples?

datapythonista · 2018-07-23T12:21:41Z

pandas/core/generic.py

@@ -8012,12 +8012,12 @@ def mask(self, cond, other=np.nan, inplace=False, axis=None, level=None,
                          errors=errors)

    _shared_docs['shift'] = ("""
-        Shift index by desired number of periods with an optional time freq
+        Shift index by desired number of periods with an optional time freq.

        Parameters
        ----------
        periods : int


Can you add the default, 1?

datapythonista · 2018-07-23T12:31:31Z

pandas/core/generic.py

+        Examples
+        --------
+
+        >>> df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],


In general I think it makes things clearer if the example is with real world data. In this case, if we just show basic examples of shift (and we avoid using ffill, reindexing, groupby and concat), I don't think we need the column group.

I think the examples are quite good for the cookbook, or the time series documentation, but for the docstring, I think we should show the basic stuff. In this case what I'd do:

- Create a simple dataframe (one column and 4 rows for example)
- Show .shift() with default arguments
- Show how periods can change the shifted periods
- Show how freq can be used

And I don't know exactly what axis does in this method, it could be shown too.

For all the rest of the stuff, feel free to contribute it to the other documents.
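The outline above could look roughly like the following sketch. The frame, its column names and the chosen dates are hypothetical; a second column is included only so that `axis=1` has something to move.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]},
    index=pd.date_range("2016-06-06", periods=4, freq="D"),
)

# Default arguments: periods=1, values move down one row, first row is NaN.
default = df.shift()

# periods controls how many rows the values move; negative values move them up.
two_down = df.shift(periods=2)
one_up = df.shift(periods=-1)

# With freq, the index is shifted instead and the data is left untouched.
with_freq = df.shift(periods=1, freq="D")

# axis=1 shifts across columns: "b" receives the values of "a".
across = df.shift(axis=1)

print(default, two_down, one_up, with_freq, across, sep="\n\n")
```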

datapythonista · 2018-11-03T06:40:11Z

Duplicate of #20472

PyJay added 2 commits May 14, 2018 17:09

added an example to shift

77f2a2b

fixing pep8 errors

c935896

PyJay mentioned this pull request May 14, 2018

issue when shifting with Timedelta in a groupby #20492

Open

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

a6ac960

…cstring_shift

TomAugspurger reviewed May 17, 2018

gfyoung added the Docs label May 21, 2018

PyJay added 4 commits May 26, 2018 17:14

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

c892655

…cstring_shift

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

3639e6c

…cstring_shift

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

61a4ae1

…cstring_shift

making changes per feedback

206b2da

PyJay added 2 commits June 2, 2018 18:33

fixing pep8 errors

c31d1a1

fix trailing whitespace

73a640f

PyJay added 4 commits June 28, 2018 22:56

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

5bee1e3

…cstring_shift

WIP: adding more detail in docstring

de0b7a1

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

739f8a1

…cstring_shift

displaying difference between shift with/without freq

a9550fa

WillAyd requested changes Jul 2, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

67b6ee2

…cstring_shift

datapythonista self-assigned this Jul 22, 2018

wip: updating docstring

ab4e2f5

PyJay added 9 commits July 22, 2018 19:51

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

e7d6f0a

…cstring_shift

wip: updating docstring

5bcae81

wip: updating docstring

17f55d5

Docstring updated

4dd276a

fixing pep8 issues

0b4f41b

fix whitespace in frames

7c15b08

fixing whitespace

950cc2e

fixing whitespace

9ec5ed7

fixing pep8 errors

888b97b

WillAyd requested changes Jul 23, 2018

datapythonista reviewed Jul 23, 2018

datapythonista marked this as a duplicate of #20472 Nov 3, 2018

datapythonista closed this Nov 3, 2018
