Commit da7b406

Author: Jon M. Mease
Added example for grouping by combination of index level and column
1 parent 8281982 commit da7b406

2 files changed, +77 -15 lines changed

doc/source/groupby.rst

+64-15
@@ -105,9 +105,9 @@ consider the following DataFrame:
 .. versionadded:: 0.20

 A string passed to ``groupby`` may refer to either a column or an index level.
-If a string matches both a column and an index level then a warning is issued
-and the column takes precedence. This will result in an ambiguity error in a
-future version.
+If a string matches both a column name and an index level name then a warning is
+issued and the column takes precedence. This will result in an ambiguity error
+in a future version.

 .. ipython:: python

@@ -247,17 +247,6 @@ the length of the ``groups`` dict, so it is largely just a convenience:
    gb.aggregate gb.count gb.cumprod gb.dtype gb.first gb.groups gb.hist gb.max gb.min gb.nth gb.prod gb.resample gb.sum gb.var
    gb.apply gb.cummax gb.cumsum gb.fillna gb.gender gb.head gb.indices gb.mean gb.name gb.ohlc gb.quantile gb.size gb.tail gb.weight

-
-.. ipython:: python
-   :suppress:
-
-   df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
-                             'foo', 'bar', 'foo', 'foo'],
-                      'B' : ['one', 'one', 'two', 'three',
-                             'two', 'two', 'one', 'three'],
-                      'C' : np.random.randn(8),
-                      'D' : np.random.randn(8)})
-
 .. _groupby.multiindex:

 GroupBy with MultiIndex
@@ -299,7 +288,9 @@ chosen level:

    s.sum(level='second')

-Also as of v0.6, grouping with multiple levels is supported.
+.. versionadded:: 0.6
+
+Grouping with multiple levels is supported.

 .. ipython:: python
    :suppress:
@@ -316,15 +307,73 @@ Also as of v0.6, grouping with multiple levels is supported.
    s
    s.groupby(level=['first', 'second']).sum()

+.. versionadded:: 0.20
+
+Index level names may be supplied as keys.
+
+.. ipython:: python
+
+   s.groupby(['first', 'second']).sum()
+
 More on the ``sum`` function and aggregation later.

+Grouping DataFrame with Index Levels and Columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A DataFrame may be grouped by a combination of columns and index levels by
+specifying the column names as strings and the index levels as ``pd.Grouper``
+objects.
+
+.. ipython:: python
+
+   arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
+             ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
+
+   index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
+
+   df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
+                      'B': np.arange(8)},
+                     index=index)
+
+   df
+
+The following example groups ``df`` by the ``second`` index level and
+the ``A`` column.
+
+.. ipython:: python
+
+   df.groupby([pd.Grouper(level=1), 'A']).sum()
+
+Index levels may also be specified by name.
+
+.. ipython:: python
+
+   df.groupby([pd.Grouper(level='second'), 'A']).sum()
+
+.. versionadded:: 0.20
+
+Index level names may be specified as keys directly to ``groupby``.
+
+.. ipython:: python
+
+   df.groupby(['second', 'A']).sum()
+
 DataFrame column selection in GroupBy
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Once you have created the GroupBy object from a DataFrame, for example, you
 might want to do something different for each of the columns. Thus, using
 ``[]`` similar to getting a column from a DataFrame, you can do:

+.. ipython:: python
+   :suppress:
+
+   df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
+                             'foo', 'bar', 'foo', 'foo'],
+                      'B' : ['one', 'one', 'two', 'three',
+                             'two', 'two', 'one', 'three'],
+                      'C' : np.random.randn(8),
+                      'D' : np.random.randn(8)})
+
 .. ipython:: python

    grouped = df.groupby(['A'])

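The groupby.rst changes describe three ways to group by the ``second`` index level together with the ``A`` column. The sketch below is a standalone illustration and is not part of the commit. It rebuilds the example frame and checks that the three spellings give the same result. It assumes pandas 0.20 or later, and the names `by_position`, `by_name`, and `by_key` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same example data as in the documentation change.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)},
                  index=index)

# Index level given by position through pd.Grouper, column given as a string.
by_position = df.groupby([pd.Grouper(level=1), 'A']).sum()

# Index level given by name through pd.Grouper.
by_name = df.groupby([pd.Grouper(level='second'), 'A']).sum()

# Index level name passed directly as a key (the 0.20 addition).
by_key = df.groupby(['second', 'A']).sum()

# All three spellings should describe the same grouping.
print(by_position.equals(by_name) and by_name.equals(by_key))
print(by_key)
```

The direct-key form is the shortest, while the ``pd.Grouper`` forms state explicitly that an index level is meant.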
doc/source/whatsnew/v0.20.0.txt

+13
@@ -31,6 +31,19 @@ Other enhancements
 ^^^^^^^^^^^^^^^^^^
 - Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)

+.. ipython:: python
+
+   arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
+             ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
+
+   index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
+
+   df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
+                      'B': np.arange(8)},
+                     index=index)
+
+   df.groupby(['second', 'A']).sum()
+

 .. _whatsnew_0200.api_breaking:

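Because a string key can now refer to either a column or an index level, a name shared by both is ambiguous. The sketch below is not part of the commit. It shows one way to sidestep the ambiguity by avoiding the bare string: `level=` for the index level, and the column Series itself for the column. The frame and the names `key`, `val`, `by_level`, and `by_column` are hypothetical.

```python
import pandas as pd

# Hypothetical frame in which 'key' names both the index and a column.
df = pd.DataFrame(
    {'key': ['x', 'x', 'y', 'y'], 'val': [1, 2, 3, 4]},
    index=pd.Index(['a', 'b', 'a', 'b'], name='key'),
)

# level= refers to the index level, never to a column.
by_level = df.groupby(level='key')['val'].sum()

# Passing the column as a Series groups by the column values.
by_column = df.groupby(df['key'])['val'].sum()

print(by_level)
print(by_column)
```

Neither call depends on how a given pandas version resolves the bare string `'key'`, so the result should not change between the warning behavior described in the diff and a later ambiguity error.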