Commit cf12120

Patrick Park authored and Pingviinituutti committed
DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953)
1 parent 2fdc5d2 commit cf12120

1 file changed (+27, -0 lines changed)

doc/source/groupby.rst

@@ -995,6 +995,33 @@ Note that ``df.groupby('A').colname.std().`` is more efficient than
 is only interesting over one column (here ``colname``), it may be filtered
 *before* applying the aggregation function.
 
+.. note::
+   Any object column, even if it contains numerical values such as ``Decimal``
+   objects, is considered a "nuisance" column. Such columns are excluded from
+   aggregate functions automatically in groupby.
+
+   If you do wish to include decimal or object columns in an aggregation with
+   other non-nuisance data types, you must do so explicitly.
+
+.. ipython:: python
+
+    from decimal import Decimal
+    df_dec = pd.DataFrame(
+        {'id': [1, 2, 1, 2],
+         'int_column': [1, 2, 3, 4],
+         'dec_column': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')]
+        }
+    )
+
+    # Decimal columns can be summed explicitly by themselves...
+    df_dec.groupby(['id'])[['dec_column']].sum()
+
+    # ...but when combined with standard data types, they are excluded
+    df_dec.groupby(['id'])[['int_column', 'dec_column']].sum()
+
+    # Use the .agg function to aggregate over standard and "nuisance" data types at the same time
+    df_dec.groupby(['id']).agg({'int_column': 'sum', 'dec_column': 'sum'})
+
 .. _groupby.observed:
 
 Handling of (un)observed Categorical values
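A minimal, self-contained sketch of the explicit aggregation the note recommends, assuming pandas is installed and reusing the same toy `df_dec` frame. The `select_dtypes(include="object")` check is an illustrative addition, not part of this commit; it only lists the object-dtype columns that groupby may treat as nuisance columns. Because the default handling of object columns in a plain `.sum()` has changed across pandas versions, the sketch relies only on the per-column `.agg` mapping, which names the Decimal column explicitly.

```python
from decimal import Decimal

import pandas as pd

# Same toy frame as in the note: an integer column and an object column
# holding Decimal values.
df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [Decimal("0.50"), Decimal("0.15"), Decimal("0.25"), Decimal("0.40")],
    }
)

# Illustrative check (not from the commit): object-dtype columns are the
# candidates for being treated as "nuisance" columns by groupby reductions.
object_cols = df_dec.select_dtypes(include="object").columns.tolist()
print("object columns:", object_cols)

# Naming every column in an explicit mapping makes the aggregation include
# the Decimal column alongside the integer column.
result = df_dec.groupby("id").agg({"int_column": "sum", "dec_column": "sum"})
print(result)
```

The Decimal sums stay as `Decimal` objects in an object-dtype column, so exact decimal arithmetic is preserved rather than converted to floats.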
