ENH Consistent apply output when grouping with freq #12362
@@ -72,15 +72,81 @@ API changes

- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception`` (:issue:`12551`)

- ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)

- Using ``apply`` on resampling groupby operations (e.g. ``df.groupby(pd.TimeGrouper(freq='M', key='date')).apply(...)``) now has the same output types as similar ``apply``s on other groupby operations (e.g. ``df.groupby(pd.Grouper(key='color')).apply(...)``). (:issue:`11742`).

Previous behavior:

.. code-block:: python

   In [1]: df = pd.DataFrame({'date': pd.to_datetime(['10/10/2000', '11/10/2000']), 'value': [10, 13]})

   In [2]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x.value.sum())
   Out[2]:
   ...
   TypeError: cannot concatenate a non-NDFrame object

   In [3]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum())
   Out[3]:
   date
   2000-10-31  value    10
   2000-11-30  value    13
   dtype: int64

   In [3]: type(df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum()))
Review comment: I don't think you also need to show the types; instead, put a comment right above each example. This will shorten this section a bit (just 3 examples in the previous block), and you can put the same comments in the new block. Imagine you are the user reading this, and try to be as simple (but as complete) as possible.
   Out[3]: pandas.core.series.Series

   In [4]: df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum())
   Out[4]:
   date
   2000-10-10    10
   2000-11-10    13
   dtype: int64

   In [5]: type(df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum()))
   Out[5]: pandas.core.series.Series

   In [6]: df.groupby(pd.Grouper(key='date')).apply(lambda x: x[['value']].sum())
   Out[6]:
               value
   date
   2000-10-10     10
   2000-11-10     13

   In [7]: type(df.groupby(pd.Grouper(key='date')).apply(lambda x: x[['value']].sum()))
   Out[7]: pandas.core.frame.DataFrame

New Behavior:

.. code-block:: python
Review comment: Make the new example an ipython block (so the code runs).

   In [1]: df = pd.DataFrame({'date': pd.to_datetime(['10/10/2000', '11/10/2000']), 'value': [10, 13]})

   In [2]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x.value.sum())
   Out[2]:
   date
   2000-10-31    10
   2000-11-30    13
   Freq: M, dtype: int64

   In [3]: type(df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x.value.sum()))
   Out[3]: pandas.core.series.Series

   In [4]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum())
   Out[4]:
               value
   date
   2000-10-31     10
   2000-11-30     13

   In [5]: type(df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum()))
   Out[5]: pandas.core.frame.DataFrame

.. _whatsnew_0181.deprecations:

@@ -4824,6 +4824,40 @@ def test_timegrouper_get_group(self):
        result = grouped.get_group(dt)
        assert_frame_equal(result, expected)

    def test_timegrouper_apply_return_type_series(self):
        # Using `apply` with the `TimeGrouper` should give the
Review comment: Put the issue number here as a comment.
        # same return type as an `apply` with a `Grouper`.
        df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'],
                           'value': [10, 13]})
        df_dt = df.copy()
        df_dt['date'] = pd.to_datetime(df_dt['date'])

Review comment: You can basically put one of these test examples in the whatsnew.
        def sumfunc_series(x):
            return pd.Series([x['value'].sum()], ('sum',))

        expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_series)
        result = (df_dt.groupby(pd.TimeGrouper(freq='M', key='date'))
                  .apply(sumfunc_series))
        assert_frame_equal(result.reset_index(drop=True),
                           expected.reset_index(drop=True))

    def test_timegrouper_apply_return_type_value(self):
        # Using `apply` with the `TimeGrouper` should give the
Review comment: Put the issue number in the comment.
        # same return type as an `apply` with a `Grouper`.
        df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'],
                           'value': [10, 13]})
        df_dt = df.copy()
        df_dt['date'] = pd.to_datetime(df_dt['date'])

        def sumfunc_value(x):
            return x.value.sum()

        expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_value)
        result = (df_dt.groupby(pd.TimeGrouper(freq='M', key='date'))
                  .apply(sumfunc_value))
        assert_series_equal(result.reset_index(drop=True),
                            expected.reset_index(drop=True))

    def test_cumcount(self):
        df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'])
        g = df.groupby('A')

Review comment: Move [1] to an ipython block above (so it runs), and show df.
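
For readers who want to check the behavior this PR describes, the following is a minimal sketch rather than code from the PR itself. It assumes a later pandas release in which `pd.TimeGrouper` is no longer available, so `pd.Grouper` with a month-end offset stands in for `pd.TimeGrouper(freq='M')`. The names `monthly`, `scalar_result`, and `frame_result` are illustrative only, and the dates are written in ISO form to avoid month/day ambiguity.

```python
import pandas as pd

# Same toy data as the whatsnew example: one row in October, one in November.
df = pd.DataFrame({
    'date': pd.to_datetime(['2000-10-10', '2000-11-10']),
    'value': [10, 13],
})

# Assumption: a Grouper with a month-end offset replaces the
# pd.TimeGrouper(key='date', freq='M') call used in the PR.
monthly = pd.Grouper(key='date', freq=pd.offsets.MonthEnd())

# A function that returns a scalar per group should give a Series.
scalar_result = df.groupby(monthly).apply(lambda g: g['value'].sum())

# A function that returns a one-element Series per group should give a DataFrame.
frame_result = df.groupby(monthly).apply(lambda g: g[['value']].sum())

print(type(scalar_result).__name__)
print(type(frame_result).__name__)
```

If the two printed type names match what the same functions produce under a plain `pd.Grouper(key='date')`, the output types are consistent in the sense the whatsnew entry describes. Some pandas versions may emit a deprecation warning about `apply` operating on grouping columns; that does not affect the result here, since only the `value` column is used.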