Commit 7aee071

ENH: add .ngroup() method to groupby objects (#14026)
Closes #11642
1 parent f5b7bcb commit 7aee071

5 files changed: +299 -60 lines changed

doc/source/api.rst

+1
@@ -1706,6 +1706,7 @@ Computations / Descriptive Stats
    GroupBy.mean
    GroupBy.median
    GroupBy.min
+   GroupBy.ngroup
    GroupBy.nth
    GroupBy.ohlc
    GroupBy.prod

doc/source/groupby.rst

+34
@@ -1087,6 +1087,23 @@ To see the order in which each row appears within its group, use the

    df.groupby('A').cumcount(ascending=False)  # kwarg only

+Enumerate groups
+~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.20.0
+
+To see the ordering of the groups themselves, you can use the ``ngroup``
+method:
+
+.. ipython:: python
+
+   df = pd.DataFrame(list('aaabba'), columns=['A'])
+   df
+
+   df.groupby('A').ngroup()
+
+   df.groupby('A').ngroup(ascending=False)  # kwarg only
+
 Plotting
 ~~~~~~~~

@@ -1178,3 +1195,20 @@ column index name will be used as the name of the inserted column:
    result

    result.stack()
+
+Multi-column factorization
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By using ``.ngroup()``, we can extract information about the groups in a
+way similar to ``pd.factorize()``, but which applies naturally to multiple
+columns of mixed type and different sources:
+
+.. ipython:: python
+
+   df = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")})
+
+   df
+
+   df.groupby(["A", "B"]).ngroup()
+
+   df.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()
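The factorization claim can be checked directly. The sketch below assumes a current pandas release (not necessarily the 0.20 line this commit targets) and reuses the data from the docs example: for a single key, ``ngroup()`` with the default ``sort=True`` should agree with ``pd.factorize(..., sort=True)``, and the same call extends to a pair of columns or to a column combined with a plain list aligned with the rows.

```python
import pandas as pd

# Data from the "Multi-column factorization" docs example.
df = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")})

# Single key: ngroup() (default sort=True) numbers groups in sorted key
# order, which is also what pd.factorize(..., sort=True) produces.
codes, uniques = pd.factorize(df["A"], sort=True)
single = df.groupby("A").ngroup()

# Two columns of mixed type: each distinct (A, B) pair gets one number.
pairs = df.groupby(["A", "B"]).ngroup()

# Mixed sources: a column name plus a plain list aligned with the rows.
mixed = df.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()

print(codes.tolist(), single.tolist())  # [0, 0, 1, 2, 1] [0, 0, 1, 2, 1]
print(pairs.tolist())                   # [0, 0, 1, 2, 1]
print(mixed.tolist())                   # [0, 0, 1, 3, 2]
```

Unlike ``pd.factorize``, which takes a single array, the groupby form accepts any combination of keys that ``groupby`` itself accepts.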

doc/source/whatsnew/v0.20.0.txt

+17 -2
@@ -56,8 +56,8 @@ fixed-width text files, and :func:`read_excel` for parsing Excel files.

 .. _whatsnew_0200.enhancements.groupby_access:

-Groupby Enhancements
-^^^^^^^^^^^^^^^^^^^^
+Groupby Access Enhancements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)

@@ -75,6 +75,21 @@ Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now refere

    df.groupby(['second', 'A']).sum()

+.. _whatsnew_0200.enhancements.groupby_ngroup:
+
+Groupby Group Numbers
+^^^^^^^^^^^^^^^^^^^^^
+
+A new groupby method ``ngroup``, parallel to the existing ``cumcount``, has been added to return the group order (:issue:`11642`).
+
+.. ipython:: python
+
+   df = pd.DataFrame({"A": [1, 1, 2, 3, 3], "B": list("aaaba")})
+
+   df.groupby("A").ngroup()
+
+   df.groupby(["A", "B"]).ngroup()
+
 .. _whatsnew_0200.enhancements.compressed_urls:

 Better support for compressed URLs in ``read_csv``
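The whatsnew entry describes ``ngroup`` as parallel to ``cumcount``. A minimal sketch of the difference, assuming a recent pandas and reusing the whatsnew data: ``ngroup()`` gives each row the number of its group, ``cumcount()`` gives the row's position within its group, and ``ascending=False`` reverses the group numbers.

```python
import pandas as pd

# Data from the whatsnew example.
df = pd.DataFrame({"A": [1, 1, 2, 3, 3], "B": list("aaaba")})
gb = df.groupby("A")

summary = pd.DataFrame({
    "A": df["A"],
    "ngroup": gb.ngroup(),                      # number of the row's group
    "cumcount": gb.cumcount(),                  # row's position inside its group
    "ngroup_desc": gb.ngroup(ascending=False),  # ngroups - 1 - ngroup()
})
print(summary)
```

Here ``ngroup`` yields 0, 0, 1, 2, 2 while ``cumcount`` yields 0, 1, 0, 0, 1: one counts groups, the other counts rows inside a group.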

pandas/core/groupby.py

+56
@@ -1363,6 +1363,62 @@ def nth(self, n, dropna=None):

         return result

+    @Substitution(name='groupby')
+    @Appender(_doc_template)
+    def ngroup(self, ascending=True):
+        """
+        Number each group from 0 to the number of groups - 1.
+
+        This is the enumerative complement of cumcount.  Note that the
+        numbers given to the groups match the order in which the groups
+        would be seen when iterating over the groupby object, not the
+        order they are first observed.
+
+        .. versionadded:: 0.20.0
+
+        Parameters
+        ----------
+        ascending : bool, default True
+            If False, number in reverse, from number of groups - 1 to 0.
+
+        Examples
+        --------
+
+        >>> df = pd.DataFrame({"A": list("aaabba")})
+        >>> df
+           A
+        0  a
+        1  a
+        2  a
+        3  b
+        4  b
+        5  a
+        >>> df.groupby('A').ngroup()
+        0    0
+        1    0
+        2    0
+        3    1
+        4    1
+        5    0
+        dtype: int64
+        >>> df.groupby('A').ngroup(ascending=False)
+        0    1
+        1    1
+        2    1
+        3    0
+        4    0
+        5    1
+        dtype: int64
+        """
+
+        self._set_group_selection()
+
+        index = self._selected_obj.index
+        result = Series(self.grouper.group_info[0], index)
+        if not ascending:
+            result = self.ngroups - 1 - result
+        return result
+
     @Substitution(name='groupby')
     @Appender(_doc_template)
     def cumcount(self, ascending=True):
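The docstring says the numbers follow the order in which iterating over the groupby object yields the groups. The sketch below checks that claim with a hypothetical helper, ``ngroup_via_iteration``, which is not part of pandas and assumes a unique row index and no missing group keys. It numbers groups as iteration visits them, applies the same ``ngroups - 1 - result`` reversal as the committed code, and is compared against the real ``ngroup()`` of a recent pandas under both ``sort=True`` and ``sort=False``.

```python
import pandas as pd


def ngroup_via_iteration(df, by, ascending=True, sort=True):
    """Hypothetical helper (not a pandas API): number each row by the
    position of its group in GroupBy iteration order.

    Assumes a unique row index and no missing group keys.
    """
    gb = df.groupby(by, sort=sort)
    out = pd.Series(-1, index=df.index, dtype="int64")
    for number, (_, group) in enumerate(gb):
        # Every row of the k-th group yielded by iteration gets number k.
        out.loc[group.index] = number
    if not ascending:
        # Same reversal as the committed code: ngroups - 1 - result.
        out = gb.ngroups - 1 - out
    return out


# Keys deliberately unsorted: "b" is observed before "a".
df = pd.DataFrame({"A": list("bbaca")})

for sort in (True, False):
    for ascending in (True, False):
        expected = df.groupby("A", sort=sort).ngroup(ascending=ascending)
        rebuilt = ngroup_via_iteration(df, "A", ascending=ascending, sort=sort)
        print(sort, ascending, rebuilt.tolist() == expected.tolist())
```

With ``sort=False`` the iteration order is the order of first appearance, so in that case the numbering does follow first observation; the docstring's caveat describes the default ``sort=True``.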
