Commit 2f667db

DOC: update groupby docs for using pd.Grouper
1 parent a7b19f9 commit 2f667db


4 files changed, +56 -6 lines changed

doc/source/groupby.rst

+48-2
@@ -707,6 +707,52 @@ can be used as group keys. If so, the order of the levels will be preserved:

    data.groupby(factor).mean()

+.. _groupby.specify:
+
+Grouping with a Grouper specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You may need to specify a bit more information to group properly, such as a
+grouping frequency or a specific index level. ``pd.Grouper`` provides this local control.
+
+.. ipython:: python
+
+   import datetime as DT
+
+   df = DataFrame({
+            'Branch' : 'A A A A A A A B'.split(),
+            'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
+            'Quantity': [1,3,5,1,8,1,9,3],
+            'Date' : [
+                DT.datetime(2013,1,1,13,0),
+                DT.datetime(2013,1,1,13,5),
+                DT.datetime(2013,10,1,20,0),
+                DT.datetime(2013,10,2,10,0),
+                DT.datetime(2013,10,1,20,0),
+                DT.datetime(2013,10,2,10,0),
+                DT.datetime(2013,12,2,12,0),
+                DT.datetime(2013,12,2,14,0),
+                ]})
+
+   df
+
+Group by a specific column at the desired frequency. This is similar to resampling.
+
+.. ipython:: python
+
+   df.groupby([pd.Grouper(freq='1M',key='Date'),'Buyer']).sum()
+
+When a named index and a column could both serve as the grouper, the specification is
+ambiguous; use ``key`` to group by the column or ``level`` to group by the index level.
+
+.. ipython:: python
+
+   df = df.set_index('Date')
+   df['Date'] = df.index + pd.offsets.MonthEnd(2)
+   df.groupby([pd.Grouper(freq='6M',key='Date'),'Buyer']).sum()
+
+   df.groupby([pd.Grouper(freq='6M',level='Date'),'Buyer']).sum()
+

 Taking the first rows of each group
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
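The following is a minimal sketch of the `key`/`level` distinction described in the new section, written for a current pandas install. The data is made up for illustration. It uses the `'MS'` (month-start) frequency instead of the docs' `'1M'`/`'6M'` because recent pandas releases deprecate the `'M'` alias in favour of `'ME'`. The last two lines check the "similar to resampling" remark by comparing a frequency `Grouper` on a column with `resample` on the same column.

```python
import pandas as pd

# Illustrative data (not from the docs): an index and a column both named "Date"
df = pd.DataFrame(
    {"Buyer": ["Carl", "Mark", "Carl", "Joe"], "Quantity": [1, 3, 5, 8]},
    index=pd.DatetimeIndex(
        ["2013-01-01", "2013-01-02", "2013-02-01", "2013-02-03"], name="Date"
    ),
)
# The "Date" column holds the index dates shifted forward by 30 days
df["Date"] = df.index + pd.Timedelta(days=30)

# key= groups by the "Date" column; level= groups by the "Date" index level.
# "MS" bins by calendar month, labelled with the month-start date.
by_column = df.groupby([pd.Grouper(key="Date", freq="MS"), "Buyer"])["Quantity"].sum()
by_level = df.groupby([pd.Grouper(level="Date", freq="MS"), "Buyer"])["Quantity"].sum()

# With no second key, grouping a column by frequency matches resampling it
monthly = df.groupby(pd.Grouper(key="Date", freq="MS"))["Quantity"].sum()
resampled = df.set_index("Date")["Quantity"].resample("MS").sum()

print(by_column, by_level, monthly, sep="\n\n")
```

The two groupers give different month buckets for the same rows. `level` sees the January/February index dates, while `key` sees the shifted column dates, which run from January to March.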
@@ -753,7 +799,7 @@ To select from a DataFrame or Series the nth item, use the nth method:

    g.nth(-1)

-If you want to select the nth not-null method, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all' just like you would pass to dropna, for a Series this just needs to be truthy.
+If you want to select the nth non-null item, use the ``dropna`` kwarg. For a DataFrame this should be either 'any' or 'all', just as you would pass to ``dropna``; for a Series it just needs to be truthy.

 .. ipython:: python

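Here is a minimal sketch of the `dropna` option from this hunk, assuming a current pandas release and made-up data. Since pandas 2.0, `nth` returns the selected rows with their original index instead of indexing them by group key. For that reason the sketch relies only on the values in column `B`.

```python
import numpy as np
import pandas as pd

# Illustrative frame: the first row of group "a" is missing a value in B
df = pd.DataFrame({"A": ["a", "a", "b", "b"], "B": [np.nan, 1.0, 2.0, 3.0]})
g = df.groupby("A")

# Without dropna, position 0 is simply the first row of each group, NaN included
first_row = g.nth(0)

# dropna="any" discards rows containing any NaN before counting positions
first_complete = g.nth(0, dropna="any")

print(first_row, first_complete, sep="\n\n")
```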
@@ -808,7 +854,7 @@ column index name will be used as the name of the inserted column:
          'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
          'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
          'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
-   })
+   })

    def compute_metrics(x):
        result = {'b_sum': x['b'].sum(), 'c_mean': x['c'].mean()}

doc/source/release.rst

+2
@@ -125,6 +125,8 @@ API Changes
   ``DataFrame.stack`` operations where the name of the column index is used as
   the name of the inserted column containing the pivoted data.

+- Allow specification of a more complex groupby via ``pd.Grouper`` (:issue:`3794`)
+
 Experimental Features
 ~~~~~~~~~~~~~~~~~~~~~

doc/source/v0.14.0.txt

+2
@@ -90,6 +90,8 @@ These are out-of-bounds selections

    g.nth(0, dropna='any') # similar to old behaviour

+- Allow specification of a more complex groupby via ``pd.Grouper``; see :ref:`the docs <groupby.specify>` (:issue:`3794`).
+
 - Local variable usage has changed in
   :func:`pandas.eval`/:meth:`DataFrame.eval`/:meth:`DataFrame.query`
   (:issue:`5987`). For the :class:`~pandas.DataFrame` methods, two things have

pandas/tests/test_groupby.py

+4-4
@@ -2933,7 +2933,7 @@ def test_timegrouper_with_reg_groups(self):
                 DT.datetime(2013,12,31,0,0),
                 DT.datetime(2013,12,31,0,0),
                 ]}).set_index(['Date','Buyer'])
-        result = df.groupby([pd.TimeGrouper('A'),'Buyer']).sum()
+        result = df.groupby([pd.Grouper(freq='A'),'Buyer']).sum()
         assert_frame_equal(result,expected)

         expected = DataFrame({
@@ -2946,7 +2946,7 @@ def test_timegrouper_with_reg_groups(self):
                 DT.datetime(2013,7,1,0,0),
                 ]}).set_index(['Date','Buyer'])

-        result = df.groupby([pd.TimeGrouper('6MS'),'Buyer']).sum()
+        result = df.groupby([pd.Grouper(freq='6MS'),'Buyer']).sum()
         assert_frame_equal(result,expected)

         df = DataFrame({
@@ -2975,10 +2975,10 @@ def test_timegrouper_with_reg_groups(self):
                 DT.datetime(2013,10,2,0,0),
                 ]}).set_index(['Date','Buyer'])

-        result = df.groupby([pd.TimeGrouper('1D'),'Buyer']).sum()
+        result = df.groupby([pd.Grouper(freq='1D'),'Buyer']).sum()
         assert_frame_equal(result,expected)

-        result = df.groupby([pd.TimeGrouper('1M'),'Buyer']).sum()
+        result = df.groupby([pd.Grouper(freq='1M'),'Buyer']).sum()
         expected = DataFrame({
             'Buyer': 'Carl Joe Mark'.split(),
             'Quantity': [10,18,3],
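The test changes swap `pd.TimeGrouper(freq)` for `pd.Grouper(freq=...)`. When neither `key` nor `level` is given, `Grouper` bins the frame's `DatetimeIndex`, which is the job `TimeGrouper` did. Below is a minimal sketch with made-up data, assuming a current pandas release. `pd.TimeGrouper` has since been deprecated and removed from pandas, so only the `Grouper` spelling runs there.

```python
import pandas as pd

# Illustrative data indexed by a DatetimeIndex named "Date"
df = pd.DataFrame(
    {"Buyer": ["Carl", "Mark", "Carl", "Carl"], "Quantity": [1, 3, 5, 1]},
    index=pd.DatetimeIndex(
        ["2013-10-01 13:00", "2013-10-01 13:05", "2013-10-01 20:00", "2013-10-02 10:00"],
        name="Date",
    ),
)

# No key/level: Grouper(freq=...) bins the index into daily buckets,
# then each bucket is split further by Buyer
daily = df.groupby([pd.Grouper(freq="1D"), "Buyer"])["Quantity"].sum()
print(daily)
```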
