ENH: allow .rolling / .expanding as groupby methods #12743

jreback · 2016-03-30T03:38:55Z

closes #12738
closes #12486
closes #12363

more tests (other methods)

~~- [ ] doc section in groupby~~ will do later

In [3]: pd.options.display.max_rows=10

In [4]: df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

In [5]: df
Out[5]: 
    A   B
0   1   0
1   1   1
2   1   2
3   1   3
4   1   4
.. ..  ..
35  3  35
36  3  36
37  3  37
38  3  38
39  3  39

[40 rows x 2 columns]

In [6]: df.groupby('A').apply(lambda x: x.rolling(4).B.mean())
Out[6]: 
A    
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
         ... 
3  35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, dtype: float64

In [7]: df.groupby('A').rolling(4).B.mean()
Out[7]: 
A    
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
         ... 
3  35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, dtype: float64
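A minimal sketch of the equivalence shown in In [6] and In [7], assuming a recent pandas release. It selects the column before calling `.rolling` (`df.groupby("A")["B"]`) rather than using the `.B` attribute access from the session, and checks the result against a plain per-group loop stitched together with `pd.concat` instead of `apply`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})

# Groupby-native rolling: result is indexed by (group key, original row label)
result = df.groupby("A")["B"].rolling(4).mean()

# Reference: roll each group on its own, then stack the pieces under the group key
pieces = {key: grp["B"].rolling(4).mean() for key, grp in df.groupby("A")}
expected = pd.concat(pieces, names=["A", None])

# Same index and same values (NaN for the first 3 rows of every group)
assert result.index.tolist() == expected.index.tolist()
np.testing.assert_allclose(result.to_numpy(), expected.to_numpy())
print(result.head(5))
```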

In [9]: df.index = pd.date_range('20130101',freq='s',periods=40)

In [10]: df
Out[10]: 
                     A   B
2013-01-01 00:00:00  1   0
2013-01-01 00:00:01  1   1
2013-01-01 00:00:02  1   2
2013-01-01 00:00:03  1   3
2013-01-01 00:00:04  1   4
...                 ..  ..
2013-01-01 00:00:35  3  35
2013-01-01 00:00:36  3  36
2013-01-01 00:00:37  3  37
2013-01-01 00:00:38  3  38
2013-01-01 00:00:39  3  39

[40 rows x 2 columns]

In [11]: df.groupby('A').apply(lambda x: x.resample('4s').mean())
Out[11]: 
                         A     B
A                               
1 2013-01-01 00:00:00  1.0   1.5
  2013-01-01 00:00:04  1.0   5.5
  2013-01-01 00:00:08  1.0   9.5
  2013-01-01 00:00:12  1.0  13.5
  2013-01-01 00:00:16  1.0  17.5
2 2013-01-01 00:00:20  2.0  21.5
  2013-01-01 00:00:24  2.0  25.5
  2013-01-01 00:00:28  2.0  29.5
3 2013-01-01 00:00:32  3.0  33.5
  2013-01-01 00:00:36  3.0  37.5

In [12]: df.groupby('A').resample('4s').mean()
Out[12]: 
                         A     B
A                               
1 2013-01-01 00:00:00  1.0   1.5
  2013-01-01 00:00:04  1.0   5.5
  2013-01-01 00:00:08  1.0   9.5
  2013-01-01 00:00:12  1.0  13.5
  2013-01-01 00:00:16  1.0  17.5
2 2013-01-01 00:00:20  2.0  21.5
  2013-01-01 00:00:24  2.0  25.5
  2013-01-01 00:00:28  2.0  29.5
3 2013-01-01 00:00:32  3.0  33.5
  2013-01-01 00:00:36  3.0  37.5

jreback · 2016-03-30T03:39:49Z

cc @lminer

jreback · 2016-03-30T15:01:08Z

pandas/core/groupby.py

@@ -794,6 +794,7 @@ def _concat_objects(self, keys, values, not_indexed_same=False):

            if isinstance(result, Series):
                result = result.reindex(ax)
+                result.name = self.name


@sinhrks this fixed a couple of tests in groupby below, but I don't know if we have a related issue. Any idea?

related (but did not fix) #12363

Not sure; I can't find an open issue.

jreback · 2016-03-31T14:53:48Z

OK, this is ready. Comments?

@shoyer @jorisvandenbossche @TomAugspurger @sinhrks

jorisvandenbossche · 2016-03-31T19:49:04Z

Is the example at the top still up to date?
I am wondering whether the grouper should be in the result or not.

jorisvandenbossche · 2016-03-31T19:51:44Z

Also, what does the result look like if you have a non-sorted grouper?
Currently, apply sorts the groups (but also includes the grouper in the index).
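A small sketch of the non-sorted case, assuming a recent pandas with the default `sort=True`; it illustrates current behavior, not necessarily what this PR produced in 0.18.1. With interleaved keys, rows come out gathered group by group, and each keeps its original row label as the second index level, so `droplevel(0).sort_index()` recovers the input order.

```python
import numpy as np
import pandas as pd

# Interleaved (non-sorted) group keys
df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [1.0, 2.0, 3.0, 4.0]})

result = df.groupby("A")["B"].rolling(2).mean()
# Rows are grouped by key (1 before 2 with the default sort=True),
# each tagged with its original row label
print(result)

# Dropping the key level and sorting by the original labels restores input order
restored = result.droplevel(0).sort_index()
print(restored)
```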

jreback · 2016-04-02T15:10:48Z

@jorisvandenbossche all updated. I had to work through some issues, but it is much more thoroughly tested now. The new syntax basically replicates what .apply does, by actually using apply under the hood. In theory we could have a more efficient implementation, but I will leave that for later.

Also, this PR cleans up a bunch of cases where the name was not set on the result of a groupby, so there is much more consistency now.
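The PR title covers `.expanding` as well. A minimal sketch, assuming a recent pandas: calling `.expanding()` on a groupby returns a window object that computes nothing until an aggregation is called, and a grouped expanding sum should equal a grouped running total (`cumsum`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})

# Calling .expanding() on the groupby returns a window object, not a result yet
window = df.groupby("A")["B"].expanding()
print(type(window).__name__)  # name of the grouped expanding window class

# Aggregating it runs the expanding sum within each group
result = window.sum()

# Within each group an expanding sum is a running total, i.e. a grouped cumsum
running = df.groupby("A")["B"].cumsum()
np.testing.assert_allclose(result.to_numpy(), running.to_numpy())
```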

jreback · 2016-04-02T15:41:23Z

pandas/core/groupby.py

@@ -339,16 +342,23 @@ def __init__(self, obj, keys=None, axis=0, level=None,
        self.sort = sort
        self.group_keys = group_keys
        self.squeeze = squeeze
+        self.mutated = kwargs.pop('mutated', False)


This is a bit of a hack; well, hack is the wrong word. It's a way of informing the groupby that we want to force the MultiIndex construction path, which is normally taken only when things are mutated. It's not an externally visible keyword.

jreback · 2016-04-10T22:50:30Z

@jorisvandenbossche if you'd have a look

jorisvandenbossche · 2016-04-11T09:59:29Z

Will take a look later today!

jorisvandenbossche · 2016-04-12T12:01:33Z

doc/source/whatsnew/v0.18.1.txt

+                      'B': np.arange(40)})
+   df
+
+You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another object where you operate.


"These return another object where you operate" -> this is not really a clear sentence. What exactly do you want to say?

TomAugspurger · 2016-04-19T12:41:55Z

@jreback one more lost index name, when you group a Series by another Series (using the original DataFrame):

In [27]: df['B'].groupby(df.A).rolling(4).mean()
Out[27]:
A  idx
1  0       NaN
   1       NaN
   2       NaN
   3       1.5
   4       2.5
          ...
3  35     33.5
   36     34.5
   37     35.5
   38     36.5
   39     37.5
dtype: float64

Let me know if you want to fix that here, otherwise I'll open an issue.
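A quick check of this case, as a sketch assuming a recent pandas and assuming the original frame's index was named `idx` (as the printed output suggests). After the fix, the result should carry the Series name `B` and both index level names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
df.index.name = "idx"  # assumed, matching the index name shown in the output above

# Group one Series by another Series taken from the same frame
result = df["B"].groupby(df["A"]).rolling(4).mean()

print(result.name)         # name of the rolled Series
print(result.index.names)  # grouper name, then the original index name
```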

TomAugspurger · 2016-04-19T12:45:49Z

And I'm (correctly) seeing a MultiIndex for df.groupby('A').resample('3s').mean()

jorisvandenbossche · 2016-04-19T12:56:24Z

Will try again (I just fetched the head of this PR and didn't rebase it myself, as I thought you had already done that).

jorisvandenbossche · 2016-04-19T13:11:08Z

@TomAugspurger It's the rolling case where I do not see a MultiIndex (df.groupby('A').rolling(3).mean()); resample worked for me as well.

@jreback Tested it again (fetched this PR, rebased on latest master myself, rebuilt pandas to be certain), and I'm still getting the same result as I showed in #12743 (comment) (using Python 2.7, numpy 1.10.1, Windows).

jreback · 2016-04-19T13:38:13Z

@jorisvandenbossche hmm, OK, I tried on 2.7/1.10.4 (on Windows) and it doesn't look right. Odd.

jreback · 2016-04-19T13:42:04Z

OK, this has been failing on Windows on my branch: https://ci.appveyor.com/project/jreback/pandas/build/1.0.2077/job/qfrsv7p46f6ns6xg

jreback · 2016-04-19T13:44:17Z

OK, this is not a 2.7/numpy issue, but something specific to Windows.

jreback · 2016-04-19T14:08:35Z

OK, pushed a commit to fix. This comes back to some of the 'figuring out what the shape of the return is' logic. IOW, it's a heuristic that depends on whether something is mutated (this is the standard case). But when we are doing a chained window/resample, we need to force this.

jreback · 2016-04-19T14:48:50Z

Pushed a new version; there were some flake/formatting issues on Windows.

jreback · 2016-04-19T16:21:16Z

@TomAugspurger I picked up that last name fix (was a bug in the concat step).


jreback · 2016-04-22T19:24:23Z

@TomAugspurger @jorisvandenbossche any more comments?

TomAugspurger · 2016-04-23T19:50:49Z

Nothing else from me 👍

jreback · 2016-04-25T14:49:21Z

OK, will merge shortly unless @jorisvandenbossche has any further comments.

jorisvandenbossche · 2016-04-26T22:52:56Z

Tested on master, and now indeed the inconsistency is solved! Thanks

ibigquant · 2017-08-11T11:15:58Z

@jreback, thanks.

I want to do a groupby, then rolling, then corr on two columns; the code below takes 18s+ on 1M rows:

df.groupby('name', as_index=False, sort=False, group_keys=False).apply(
        lambda x: x['_a'].rolling(d).corr(other=x['_b'], pairwise=True))

Is there a faster way to do this? I tried the following code, but it does not work (it raises exceptions):

_g = df.groupby('name', as_index=False, sort=False, group_keys=False)
_g['_a'].rolling(5).corr(other=_g['_b'], pairwise=False)

Another question: how can I drop the group key (just keep the original index) for the code below?

_g['_a'].rolling(d).min()
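One way to handle the last question, sketched under assumptions: the frame below is hypothetical data using the column names from the question (`name`, `_a`), with a window of `d = 2`, and it uses a plain `groupby("name")` rather than the poster's options. Dropping the group-key level with `reset_index(level=0, drop=True)` leaves only the original row labels, so the result aligns back onto the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names follow the question
df = pd.DataFrame({"name": ["x", "y", "x", "y", "x"],
                   "_a": [5.0, 1.0, 3.0, 2.0, 4.0]})
d = 2

rolled = df.groupby("name")["_a"].rolling(d).min()

# Drop the group-key level so only the original row labels remain,
# then sort by those labels to get back the input row order
aligned = rolled.reset_index(level=0, drop=True).sort_index()

# Assignment aligns on the original index
df["_a_min"] = aligned
print(df)
```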

jreback added the Enhancement, Groupby, and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Mar 30, 2016

jreback added this to the 0.18.1 milestone Mar 30, 2016

jreback force-pushed the expand branch from d65afdd to 292e05f March 30, 2016 14:55

jreback reviewed Mar 30, 2016

jreback force-pushed the expand branch from 292e05f to 167843c March 31, 2016 14:46

jreback changed the title ~~[WIP] ENH: allow .rolling / .expanding as groupby methods~~ ENH: allow .rolling / .expanding as groupby methods Mar 31, 2016

jreback mentioned this pull request Apr 1, 2016: BUG: Resample loses PeriodIndex name #12769 (closed)

jreback force-pushed the expand branch from 9a5409b to 37bc0dd April 2, 2016 15:06

jreback force-pushed the expand branch from 37bc0dd to b4bd4d6 April 2, 2016 15:39

jreback reviewed Apr 2, 2016

jreback mentioned this pull request Apr 3, 2016: Using rolling method call on a pandas.core.groupby.DataFrameGroupBy object results in an AttributeError, previous rolling methods are deprecated #12782 (closed)

jreback force-pushed the expand branch 4 times, most recently from ba7f228 to 102bfad April 6, 2016 23:18

jreback force-pushed the expand branch from 102bfad to 4e94805 April 10, 2016 22:50

jorisvandenbossche reviewed Apr 12, 2016

jreback force-pushed the expand branch 4 times, most recently from 18256d1 to 2525374 April 18, 2016 17:23

TomAugspurger mentioned this pull request Apr 19, 2016: Lots of unexpected behavior using resample after groupby #12923 (closed)

jreback force-pushed the expand branch from 8678bdb to 4964113 April 19, 2016 14:48

jreback force-pushed the expand branch 3 times, most recently from 209c013 to 190ecd0 April 19, 2016 16:18

jreback force-pushed the expand branch from 190ecd0 to c37940e April 20, 2016 01:06

ENH: allow .rolling / .expanding as groupby methods (commit f98e6f8)

closes pandas-dev#12738
BUG: allow df.groupby(...).resample(...) to return a Resampler groupby object
closes pandas-dev#12486
BUG: consistency of name of returned groupby
closes pandas-dev#12363

jreback force-pushed the expand branch from c37940e to f98e6f8 April 21, 2016 23:15

jreback closed this in 6994240 Apr 26, 2016

adneu mentioned this pull request May 9, 2016: BUG: GH12824 fixed apply() returns different result depending on whet… #12977 (closed)

jorisvandenbossche mentioned this pull request May 12, 2016: why resample with group by not pad null value #13151 (closed)
