Commit b5b04e0

BUG: boolean groupby performance issue. close #2692
1 parent bb6d9b9 commit b5b04e0

3 files changed, 13 additions and 1 deletion

RELEASE.rst

Lines changed: 2 additions & 0 deletions
@@ -78,6 +78,7 @@ pandas 0.10.1
   - Exclude non-numeric data from DataFrame.quantile by default (GH2625_)
   - Fix a Cython C int64 boxing issue causing read_csv to return incorrect
     results (GH2599_)
+  - Fix groupby summing performance issue on boolean data (GH2692_)
 
 **API Changes**
 
@@ -98,6 +99,7 @@ pandas 0.10.1
 .. _GH2625: https://github.com/pydata/pandas/issues/2625
 .. _GH2643: https://github.com/pydata/pandas/issues/2643
 .. _GH2637: https://github.com/pydata/pandas/issues/2637
+.. _GH2692: https://github.com/pydata/pandas/issues/2692
 
 pandas 0.10.0
 =============

pandas/core/common.py

Lines changed: 1 addition & 1 deletion
@@ -708,7 +708,7 @@ def _default_index(n):
 
 
 def ensure_float(arr):
-    if issubclass(arr.dtype.type, np.integer):
+    if issubclass(arr.dtype.type, (np.integer, np.bool_)):
         arr = arr.astype(float)
 
     return arr
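
The one-line change matters because NumPy's `np.bool_` is not a subclass of `np.integer`, so before the patch a boolean array passed through `ensure_float` unchanged. The sketch below is a standalone copy of the patched helper, written for illustration outside of pandas; the sample arrays are assumptions, not part of the commit.

```python
import numpy as np


def ensure_float(arr):
    # Standalone copy of the patched helper, for illustration only.
    # Integer and boolean arrays are cast to float; everything else
    # is returned untouched.
    if issubclass(arr.dtype.type, (np.integer, np.bool_)):
        arr = arr.astype(float)
    return arr


# Hypothetical sample inputs, chosen to exercise each branch.
bools = np.array([True, False, True])
ints = np.array([1, 2, 3])
floats = np.array([1.5, 2.5])
strings = np.array(["a", "b"], dtype=object)

# np.bool_ does not inherit from np.integer, which is why the old
# single-type check skipped boolean arrays.
print(issubclass(np.bool_, np.integer))

print(ensure_float(bools).dtype)    # cast by the patched check
print(ensure_float(ints).dtype)     # cast before and after the patch
print(ensure_float(floats) is floats)
print(ensure_float(strings).dtype)  # left as object
```

How the converted array is consumed further down the groupby sum path is not shown in this diff, so the sketch covers only the dtype check itself.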

vb_suite/groupby.py

Lines changed: 10 additions & 0 deletions
@@ -254,3 +254,13 @@ def f(g):
 
 groupbym_frame_apply = Benchmark("df.groupby(['key', 'key2']).apply(f)", setup,
                                  start_date=datetime(2011, 10, 1))
+
+#----------------------------------------------------------------------
+# Sum booleans #2692
+
+setup = common_setup + """
+N = 500
+df = DataFrame({'ii':range(N),'bb':[True for x in range(N)]})
+"""
+
+groupby_sum_booleans = Benchmark("df.groupby('ii').sum()", setup)
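
For readers without the `vb_suite` harness, the same workload can be approximated with the standard library's `timeit`. This is a rough sketch, not the project's benchmark: `common_setup` is not shown in the diff, so it is replaced here by an explicit pandas import, and the repeat count of 10 is an arbitrary assumption.

```python
import timeit

import pandas as pd

# Same data shape as the benchmark setup: N groups of one row each,
# with a boolean column to be summed.
N = 500
df = pd.DataFrame({"ii": range(N), "bb": [True] * N})

# Run the benchmarked statement once to inspect the result.
result = df.groupby("ii").sum()

# Time the statement; 10 repetitions is an arbitrary choice.
elapsed = timeit.timeit(lambda: df.groupby("ii").sum(), number=10)
print(f"10 runs of df.groupby('ii').sum(): {elapsed:.4f} s")
```

Absolute timings depend on the pandas version and the machine, so the number is only meaningful when comparing builds with and without the fix.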
