
Commit 361f703

Merge pull request #6516 from jreback/time_grouper
BUG/API: allow TimeGrouper with other columns in a groupby (GH3794)
2 parents 4216178 + 5e965e9 commit 361f703

9 files changed (+435, -67 lines)


doc/source/groupby.rst

+56-2
@@ -707,6 +707,54 @@ can be used as group keys. If so, the order of the levels will be preserved:

    data.groupby(factor).mean()

+.. _groupby.specify:
+
+Grouping with a Grouper specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You may need to specify a bit more data to properly group. You can
+use ``pd.Grouper`` to provide this local control.
+
+.. ipython:: python
+
+   import datetime as DT
+
+   df = DataFrame({
+            'Branch' : 'A A A A A A A B'.split(),
+            'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
+            'Quantity': [1,3,5,1,8,1,9,3],
+            'Date' : [
+                DT.datetime(2013,1,1,13,0),
+                DT.datetime(2013,1,1,13,5),
+                DT.datetime(2013,10,1,20,0),
+                DT.datetime(2013,10,2,10,0),
+                DT.datetime(2013,10,1,20,0),
+                DT.datetime(2013,10,2,10,0),
+                DT.datetime(2013,12,2,12,0),
+                DT.datetime(2013,12,2,14,0),
+                ]})
+
+   df
+
+Group by a specific column with the desired frequency. This is like resampling.
+
+.. ipython:: python
+
+   df.groupby([pd.Grouper(freq='1M',key='Date'),'Buyer']).sum()
+
+The specification is ambiguous when a named index and a column could both be
+potential groupers. Use ``key`` or ``level`` to say which one you mean.
+
+.. ipython:: python
+
+   df = df.set_index('Date')
+   df['Date'] = df.index + pd.offsets.MonthEnd(2)
+   df.groupby([pd.Grouper(freq='6M',key='Date'),'Buyer']).sum()
+
+   df.groupby([pd.Grouper(freq='6M',level='Date'),'Buyer']).sum()
+
+
+.. _groupby.nth:

 Taking the first rows of each group
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
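A minimal, standalone sketch of the column-plus-frequency grouping described in the new section, assuming a current pandas where ``pd.Grouper`` is importable at top level. It rebuilds the same frame with an explicit ``pd.DataFrame`` and swaps in the month-start alias ``'MS'`` for ``'1M'`` (our substitution, because newer pandas releases deprecate ``'M'`` in favor of ``'ME'``). Selecting ``['Quantity']`` keeps the string column ``Branch`` out of the sum.

```python
import datetime as DT

import pandas as pd

# Same rows as the documentation example
df = pd.DataFrame({
    'Branch': 'A A A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 1, 8, 1, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

# Bin the 'Date' column by month start ('MS' is an assumption, not the doc's '1M')
# and group by 'Buyer' at the same time.
monthly = (
    df.groupby([pd.Grouper(key='Date', freq='MS'), 'Buyer'])['Quantity']
      .sum()
)
print(monthly)
```

The result is indexed by (month start, buyer); for example, Carl's two October purchases (5 and 1) collapse into a single row.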
@@ -753,7 +801,7 @@ To select from a DataFrame or Series the nth item, use the nth method:

    g.nth(-1)

-If you want to select the nth not-null method, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all' just like you would pass to dropna, for a Series this just needs to be truthy.
+If you want to select the nth not-null method, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all' just like you would pass to dropna, for a Series this just needs to be truthy.

 .. ipython:: python

@@ -787,6 +835,9 @@ To see the order in which each row appears within its group, use the
 Examples
 --------

+Regrouping by factor
+~~~~~~~~~~~~~~~~~~~~
+
 Regroup columns of a DataFrame according to their sum, and sum the aggregated ones.

 .. ipython:: python
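A sketch of the "regroup columns by their sum" example, using a small hypothetical frame. The documented call is ``df.groupby(df.sum(), axis=1).sum()``; since pandas 2.1 deprecates ``axis=1`` in ``groupby``, this version transposes, groups the rows, and transposes back, which is our substitution rather than the original docs' approach.

```python
import pandas as pd

# Hypothetical data: columns a, b, c each sum to 1; column d sums to 9
df = pd.DataFrame({'a': [1, 0, 0], 'b': [0, 1, 0],
                   'c': [1, 0, 0], 'd': [2, 3, 4]})

# The per-column sums become the group labels for the columns
col_sums = df.sum()

# groupby(..., axis=1) is deprecated in pandas 2.1+, so group the transposed
# frame's rows (the original columns) and transpose the result back
regrouped = df.T.groupby(col_sums).sum().T
print(regrouped)
```

Columns ``a``, ``b`` and ``c`` share the label 1 and are added together row by row, while ``d`` stays alone under label 9.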
@@ -796,6 +847,9 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
    df.groupby(df.sum(), axis=1).sum()


+Returning a Series to propagate names
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Group DataFrame columns, compute a set of metrics and return a named Series.
 The Series name is used as the name for the column index. This is especially
 useful in conjunction with reshaping operations such as stacking in which the
@@ -808,7 +862,7 @@ column index name will be used as the name of the inserted column:
                      'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
                      'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
                      'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
-                     })
+                     })

    def compute_metrics(x):
        result = {'b_sum': x['b'].sum(), 'c_mean': x['c'].mean()}
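A self-contained sketch of the named-Series pattern. The frame is hypothetical, and the line ``return pd.Series(result, name='metrics')`` is our assumption about how ``compute_metrics`` finishes, since the hunk stops at the ``result`` dict. When every group returns a Series with the same name, that name becomes the name of the result's column index, and ``stack`` then carries it over as an index level name.

```python
import pandas as pd

# Hypothetical data; only 'b' and 'c' feed the metrics
df = pd.DataFrame({
    'a': [0, 0, 1, 1],
    'b': [0, 1, 1, 1],
    'c': [1, 0, 1, 0],
})

def compute_metrics(x):
    result = {'b_sum': x['b'].sum(), 'c_mean': x['c'].mean()}
    # Assumed return: the Series name becomes the column index name
    return pd.Series(result, name='metrics')

# Select the value columns explicitly so the grouping column is not passed to apply
metrics_df = df.groupby('a')[['b', 'c']].apply(compute_metrics)
print(metrics_df)

# Stacking moves the named column index into the row index as the 'metrics' level
stacked = metrics_df.stack()
print(stacked)
```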

doc/source/release.rst

+2
@@ -131,6 +131,8 @@ API Changes
   ``FutureWarning`` is raised to alert that the old ``rows`` and ``cols`` arguments
   will not be supported in a future release (:issue:`5505`)

+- Allow specification of a more complex groupby, via ``pd.Grouper`` (:issue:`3794`)
+
 Experimental Features
 ~~~~~~~~~~~~~~~~~~~~~

doc/source/v0.14.0.txt

+6-1
@@ -11,6 +11,7 @@ Highlights include:

 - MultIndexing Using Slicers
 - Joining a singly-indexed DataFrame with a multi-indexed DataFrame
+- More flexible groupby specifications

 API changes
 ~~~~~~~~~~~
@@ -82,7 +83,7 @@ These are out-of-bounds selections
      g[['B']].head(1)

 - groupby ``nth`` now filters by default, with optional dropna argument to ignore
-  NaN (to replicate the previous behaviour.)
+  NaN (to replicate the previous behaviour). See :ref:`the docs <groupby.nth>`.

   .. ipython:: python

@@ -92,6 +93,9 @@ These are out-of-bounds selections

      g.nth(0, dropna='any') # similar to old behaviour

+- Allow specification of a more complex groupby via ``pd.Grouper``, such as grouping
+  by a time and a string field simultaneously. See :ref:`the docs <groupby.specify>` (:issue:`3794`).
+
 - Local variable usage has changed in
   :func:`pandas.eval`/:meth:`DataFrame.eval`/:meth:`DataFrame.query`
   (:issue:`5987`). For the :class:`~pandas.DataFrame` methods, two things have
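To illustrate the time-plus-string grouping and the key/level ambiguity the linked docs describe, here is a sketch on a small hypothetical frame that has both a ``Date`` index level and a ``Date`` column. ``key=`` selects the column and ``level=`` selects the index level. Month-start bins (``'MS'``, our choice) make the two groupings visibly different, because the column is shifted by ``MonthEnd(2)`` as in the docs.

```python
import datetime as DT

import pandas as pd

# Hypothetical frame with an index level named 'Date'
idx = pd.DatetimeIndex([DT.datetime(2013, 1, 1), DT.datetime(2013, 10, 1),
                        DT.datetime(2013, 12, 2)], name='Date')
df = pd.DataFrame({'Buyer': ['Carl', 'Joe', 'Carl'],
                   'Quantity': [1, 8, 3]}, index=idx)

# Add a column also named 'Date', shifted two month-ends past the index
# (2013-02-28, 2013-11-30, 2014-01-31)
df['Date'] = df.index + pd.offsets.MonthEnd(2)

# key= bins the 'Date' column
by_column = df.groupby([pd.Grouper(key='Date', freq='MS'), 'Buyer'])['Quantity'].sum()

# level= bins the 'Date' index level
by_level = df.groupby([pd.Grouper(level='Date', freq='MS'), 'Buyer'])['Quantity'].sum()

print(by_column)
print(by_level)
```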
@@ -123,6 +127,7 @@ These are out-of-bounds selections
   .. ipython:: python

      i[[0,1,2]].astype(np.int_)
+
 - ``set_index`` no longer converts MultiIndexes to an Index of tuples. For example,
   the old behavior returned an Index in this case (:issue:`6459`):

pandas/core/api.py

+1
@@ -6,6 +6,7 @@
 from pandas.core.algorithms import factorize, match, unique, value_counts
 from pandas.core.common import isnull, notnull
 from pandas.core.categorical import Categorical
+from pandas.core.groupby import Grouper
 from pandas.core.format import set_eng_float_format
 from pandas.core.index import Index, Int64Index, Float64Index, MultiIndex
