-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: allow .rolling / .expanding as groupby methods #12743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @lminer |
@@ -794,6 +794,7 @@ def _concat_objects(self, keys, values, not_indexed_same=False): | |||
|
|||
if isinstance(result, Series): | |||
result = result.reindex(ax) | |||
result.name = self.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, I can find no open issue.
ok, this is ready. comments. |
Is the example at the top still up to date? |
Also, how does the result looks like if you have a non-sorted grouper? |
@jorisvandenbossche all updated. Had to work thru some issues. But much more fully tested now. This basically replicates what Also this PR cleans up a bunch of cases where the name is not returned for groupbys. So much more consistency now. |
@@ -339,16 +342,23 @@ def __init__(self, obj, keys=None, axis=0, level=None, | |||
self.sort = sort | |||
self.group_keys = group_keys | |||
self.squeeze = squeeze | |||
self.mutated = kwargs.pop('mutated', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a massive hack using. well hack is the wrong word. Its a way of informing the groupby that we want to force the multiindex construction path which is normally taken only when things are mutated. its not an external visible kw.
ba7f228
to
102bfad
Compare
@jorisvandenbossche if you'd have a look |
Will take a look later today! |
'B': np.arange(40)}) | ||
df | ||
|
||
You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another object where you operate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"These return another object where you operate" -> this is not really a clear sentence. What do exactly want to say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
18256d1
to
2525374
Compare
@jreback one more lost index name, when you group a Series by another (using the original dataframe) In [27]: df['B'].groupby(df.A).rolling(4).mean()
Out[27]:
A idx
1 0 NaN
1 NaN
2 NaN
3 1.5
4 2.5
...
3 35 33.5
36 34.5
37 35.5
38 36.5
39 37.5
dtype: float64 Let me know if you want to fix that here, otherwise I'll open an issue. |
And I'm (correctly) seeing a MultiIndex for the |
Will try again (I just fetched the head of this PR, didn't rebase myself as I thought you already did that) |
@TomAugspurger It's with the rolling case I do not see a MultiIndex ( @jreback Tested it again (fetched this PR, rebased on latest master myself, rebuild pandas to be certain), and still getting the same result as I showed in #12743 (comment) (using python 2.7, numpy 1.10.1, Windows) |
@jorisvandenbossche hmm, ok I did on 2.7/1.10.4 and it doesn't look right. odd. (on windows) |
ok, this has been failing on windows on my branch : https://ci.appveyor.com/project/jreback/pandas/build/1.0.2077/job/qfrsv7p46f6ns6xg |
ok, this is not a 2.7/numpy issue, but something specific to windows. |
ok pushed a commit to fix. This comes back to some of the 'figuring what the shape of the return' is logic. IOW its a heuristic depending on whether something is mutated (this is the standard case). But when we are doing a chained window/resample we need to force this. |
pushed a new version, had some flake/formatting issues on windows. |
209c013
to
190ecd0
Compare
@TomAugspurger I picked up that last name fix (was a bug in the concat step). |
closes pandas-dev#12738 BUG: allow df.groupby(...).resample(...) to return a Resampler groupby object closes pandas-dev#12486 BUG: consistency of name of returned groupby closes pandas-dev#12363
@TomAugspurger @jorisvandenbossche any more comments. |
Nothing else from me 👍 |
ok will merge shortly unless @jorisvandenbossche has any further comments. |
Tested on master, and now indeed the inconsistency is solved! Thanks |
@jreback, thanks. I want to do grouby then rolling then corr on two columns, code below takes 18s+ on 1M rows:
Is there a faster way to do this? i tried code but it does not work (got exceptions):
Another question: how to ignore the group key (just keep the original index) for below code:
|
closes #12738
closes #12486
closes #12363
- [ ] doc section in groupbywill do later