Skip to content

BUG/API: allow TimeGrouper with other columns in a groupby (GH3794) #6516

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 15, 2014
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 56 additions & 2 deletions doc/source/groupby.rst
Original file line number Diff line number Diff line change
Expand Up @@ -707,6 +707,54 @@ can be used as group keys. If so, the order of the levels will be preserved:

data.groupby(factor).mean()

.. _groupby.specify:

Grouping with a Grouper specification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your may need to specify a bit more data to properly group. You can
use the ``pd.Grouper`` to provide this local control.

.. ipython:: python

import datetime as DT

df = DataFrame({
'Branch' : 'A A A A A A A B'.split(),
'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
'Quantity': [1,3,5,1,8,1,9,3],
'Date' : [
DT.datetime(2013,1,1,13,0),
DT.datetime(2013,1,1,13,5),
DT.datetime(2013,10,1,20,0),
DT.datetime(2013,10,2,10,0),
DT.datetime(2013,10,1,20,0),
DT.datetime(2013,10,2,10,0),
DT.datetime(2013,12,2,12,0),
DT.datetime(2013,12,2,14,0),
]})

df

Groupby a specific column with the desired frequency. This is like resampling.

.. ipython:: python

df.groupby([pd.Grouper(freq='1M',key='Date'),'Buyer']).sum()

You have an ambiguous specification in that you have a named index and a column
that could be potential groupers.

.. ipython:: python

df = df.set_index('Date')
df['Date'] = df.index + pd.offsets.MonthEnd(2)
df.groupby([pd.Grouper(freq='6M',key='Date'),'Buyer']).sum()

df.groupby([pd.Grouper(freq='6M',level='Date'),'Buyer']).sum()


.. _groupby.nth:

Taking the first rows of each group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down Expand Up @@ -753,7 +801,7 @@ To select from a DataFrame or Series the nth item, use the nth method:

g.nth(-1)

If you want to select the nth not-null method, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all' just like you would pass to dropna, for a Series this just needs to be truthy.
If you want to select the nth not-null method, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all' just like you would pass to dropna, for a Series this just needs to be truthy.

.. ipython:: python

Expand Down Expand Up @@ -787,6 +835,9 @@ To see the order in which each row appears within its group, use the
Examples
--------

Regrouping by factor
~~~~~~~~~~~~~~~~~~~~

Regroup columns of a DataFrame according to their sum, and sum the aggregated ones.

.. ipython:: python
Expand All @@ -796,6 +847,9 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
df.groupby(df.sum(), axis=1).sum()


Returning a Series to propogate names
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Group DataFrame columns, compute a set of metrics and return a named Series.
The Series name is used as the name for the column index. This is especially
useful in conjunction with reshaping operations such as stacking in which the
Expand All @@ -808,7 +862,7 @@ column index name will be used as the name of the inserted column:
'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
})
})

def compute_metrics(x):
result = {'b_sum': x['b'].sum(), 'c_mean': x['c'].mean()}
Expand Down
2 changes: 2 additions & 0 deletions doc/source/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,8 @@ API Changes
``DataFrame.stack`` operations where the name of the column index is used as
the name of the inserted column containing the pivoted data.

- Allow specification of a more complex groupby, via ``pd.Groupby`` (:issue:`3794`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pdf.Groupby -> pd.Grouper?


Experimental Features
~~~~~~~~~~~~~~~~~~~~~

Expand Down
7 changes: 6 additions & 1 deletion doc/source/v0.14.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ Highlights include:

- MultIndexing Using Slicers
- Joining a singly-indexed DataFrame with a multi-indexed DataFrame
- More flexible groupby specifications

API changes
~~~~~~~~~~~
Expand Down Expand Up @@ -80,7 +81,7 @@ These are out-of-bounds selections
g[['B']].head(1)

- groupby ``nth`` now filters by default, with optional dropna argument to ignore
NaN (to replicate the previous behaviour.)
NaN (to replicate the previous behaviour.), See :ref:`the docs <groupby.nth>`.

.. ipython:: python

Expand All @@ -90,6 +91,9 @@ These are out-of-bounds selections

g.nth(0, dropna='any') # similar to old behaviour

- Allow specification of a more complex groupby via ``pd.Groupby``, such as grouping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here

by a Time and a string field simultaneously. See :ref:`the docs <groupby.specify>`. (:issue:`3794`)

- Local variable usage has changed in
:func:`pandas.eval`/:meth:`DataFrame.eval`/:meth:`DataFrame.query`
(:issue:`5987`). For the :class:`~pandas.DataFrame` methods, two things have
Expand Down Expand Up @@ -121,6 +125,7 @@ These are out-of-bounds selections
.. ipython:: python

i[[0,1,2]].astype(np.int_)

- ``set_index`` no longer converts MultiIndexes to an Index of tuples. For example,
the old behavior returned an Index in this case (:issue:`6459`):

Expand Down
1 change: 1 addition & 0 deletions pandas/core/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from pandas.core.algorithms import factorize, match, unique, value_counts
from pandas.core.common import isnull, notnull
from pandas.core.categorical import Categorical
from pandas.core.groupby import Grouper
from pandas.core.format import set_eng_float_format
from pandas.core.index import Index, Int64Index, Float64Index, MultiIndex

Expand Down
Loading