Commit 5bb3321

pdpark committed
Moved note about exclusion of Decimal columns from agg functions to automatic-exclusion-of-nuisance-columns section
1 parent 155e85e commit 5bb3321

File tree

1 file changed

+36
-22
lines changed

doc/source/groupby.rst

+36 -22
@@ -497,28 +497,6 @@ index are the group names and whose values are the sizes of each group.
 
 ``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`
 
-Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.
-
-If you do wish to aggregate them you must do so explicitly:
-
-.. ipython:: python
-
-   from decimal import Decimal
-   dec = pd.DataFrame(
-       {'name': ['foo', 'bar', 'foo', 'bar'],
-        'title': ['boo', 'far', 'boo', 'far'],
-        'id': [123, 456, 123, 456],
-        'int_column': [1, 2, 3, 4],
-        'dec_column1': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')],
-        'dec_column2': [Decimal('0.20'), Decimal('0.30'), Decimal('0.55'), Decimal('0.60')]
-       },
-       columns=['name','title','id','int_column','dec_column1','dec_column2']
-   )
-
-   dec.groupby(['name', 'title', 'id'], as_index=False).sum()
-
-   dec.groupby(['name', 'title', 'id'], as_index=False).agg({'dec_column1': 'sum', 'dec_column2': 'sum'})
-
 .. _groupby.aggregate.multifunc:
 
 Applying multiple functions at once
@@ -977,6 +955,42 @@ will be (silently) dropped. Thus, this does not pose any problems:
 
    df.groupby('A').std()
 
+.. note::
+   Decimal columns are also "nuisance" columns. They are excluded from aggregate functions automatically in groupby.
+
+   If you do wish to include decimal columns in the aggregation, you must do so explicitly:
+
+.. ipython:: python
+
+   from decimal import Decimal
+   dec = pd.DataFrame(
+       {'name': ['foo', 'bar', 'foo', 'bar'],
+        'title': ['boo', 'far', 'boo', 'far'],
+        'id': [123, 456, 123, 456],
+        'int_column': [1, 2, 3, 4],
+        'dec_column1': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')],
+        'dec_column2': [Decimal('0.20'), Decimal('0.30'), Decimal('0.55'), Decimal('0.60')]
+       },
+       columns=['name','title','id','int_column','dec_column1','dec_column2']
+   )
+
+   dec.head()
+
+   dec.dtypes
+
+   # Decimal columns excluded from sum by default
+   dec.groupby(['name', 'title', 'id'], as_index=False).sum()
+
+   # Decimal columns can be sum'd explicitly by themselves...
+   dec.groupby(['name', 'title', 'id'], as_index=False)['dec_column1','dec_column2'].sum()
+
+   # ...but cannot be combined with standard data types or they will be excluded
+   dec.groupby(['name', 'title', 'id'], as_index=False)['int_column','dec_column1','dec_column2'].sum()
+
+   # Use .agg function to aggregate over standard and "nuisance" data types at the same time
+   dec.groupby(['name', 'title', 'id'], as_index=False).agg({'int_column': 'sum', 'dec_column1': 'sum', 'dec_column2': 'sum'})
+
+
 .. _groupby.missing:
 
 NA and NaT group handling
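
As a cross-check on the values the final `.agg` call in the added example is meant to produce, the per-group sums can be computed by hand. The following is a minimal sketch using only the standard library, not pandas; it reuses the sample data from the `dec` frame in the diff, and the helper name `group_sum` is hypothetical (it does not appear in the commit or in pandas).

```python
from collections import defaultdict
from decimal import Decimal

# Same sample rows as the `dec` frame in the diff:
# (name, title, id, int_column, dec_column1, dec_column2)
rows = [
    ("foo", "boo", 123, 1, Decimal("0.50"), Decimal("0.20")),
    ("bar", "far", 456, 2, Decimal("0.15"), Decimal("0.30")),
    ("foo", "boo", 123, 3, Decimal("0.25"), Decimal("0.55")),
    ("bar", "far", 456, 4, Decimal("0.40"), Decimal("0.60")),
]


def group_sum(rows):
    """Hypothetical helper: sum the three value columns per (name, title, id) key."""
    totals = defaultdict(lambda: [0, Decimal("0"), Decimal("0")])
    for name, title, id_, int_col, dec1, dec2 in rows:
        acc = totals[(name, title, id_)]
        acc[0] += int_col  # plain integer sum
        acc[1] += dec1     # exact Decimal arithmetic, no float rounding
        acc[2] += dec2
    return {key: tuple(vals) for key, vals in totals.items()}


expected = group_sum(rows)
for key, sums in sorted(expected.items()):
    print(key, sums)
```

Each group key maps to one integer sum and two exact `Decimal` sums, which is the shape of result the note describes when Decimal columns are named explicitly in the aggregation.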
