Commit 8ea3d49

WillAyd authored and Mateusz Górski committed
Fixed segfaults and incorrect results in GroupBy.quantile with NA Values in Grouping (pandas-dev#29173)
1 parent 16cbd02 commit 8ea3d49

3 files changed: +27 -0 lines changed

doc/source/whatsnew/v1.0.0.rst

+1
@@ -411,6 +411,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.groupby` not offering selection by column name when ``axis=1`` (:issue:`27614`)
 - Bug in :meth:`DataFrameGroupby.agg` not able to use lambda function with named aggregation (:issue:`27519`)
 - Bug in :meth:`DataFrame.groupby` losing column name information when grouping by a categorical column (:issue:`28787`)
+- Bug in :meth:`DataFrameGroupBy.quantile` where NA values in the grouping could cause segfaults or incorrect results (:issue:`28882`)
 
 Reshaping
 ^^^^^^^^^

pandas/_libs/groupby.pyx

+3
@@ -766,6 +766,9 @@ def group_quantile(ndarray[float64_t] out,
     with nogil:
         for i in range(N):
             lab = labels[i]
+            if lab == -1:  # NA group label
+                continue
+
             counts[lab] += 1
             if not mask[i]:
                 non_na_counts[lab] += 1
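
The counting loop in this hunk can be sketched in plain Python to show what the new guard does. This is only an illustration, not the pandas implementation: the function name `count_group_members` and its arguments are hypothetical, and it assumes that rows with an NA grouping key carry the label `-1`, as the comment in the diff indicates. In NumPy, `counts[-1]` would silently update the last group; in compiled Cython code with bounds checking and negative-index wraparound disabled (presumably the case here, given the reported segfaults), the same index would be an out-of-bounds write.

```python
import numpy as np


def count_group_members(labels, mask, ngroups):
    """Count rows and non-NA values per group, skipping NA group labels.

    Hypothetical helper mirroring the loop in the diff above.
    labels[i] is the group index of row i, or -1 for an NA grouping key.
    mask[i] is True when the value in row i is itself NA.
    """
    counts = np.zeros(ngroups, dtype=np.int64)
    non_na_counts = np.zeros(ngroups, dtype=np.int64)
    for i in range(len(labels)):
        lab = labels[i]
        if lab == -1:  # NA group label: the row belongs to no group
            continue
        counts[lab] += 1
        if not mask[i]:
            non_na_counts[lab] += 1
    return counts, non_na_counts


# Keys [1.0, NaN, 3.0, NaN] would be labelled [0, -1, 1, -1].
labels = np.array([0, -1, 1, -1])
mask = np.array([False, False, False, False])
counts, non_na_counts = count_group_members(labels, mask, ngroups=2)
```

With the guard in place, each of the two real groups is counted once and the two NA-keyed rows are ignored.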

pandas/tests/groupby/test_function.py

+23
@@ -1373,6 +1373,29 @@ def test_quantile_out_of_bounds_q_raises():
         g.quantile(-1)
 
 
+def test_quantile_missing_group_values_no_segfaults():
+    # GH 28662
+    data = np.array([1.0, np.nan, 1.0])
+    df = pd.DataFrame(dict(key=data, val=range(3)))
+
+    # Random segfaults; would have been guaranteed in loop
+    grp = df.groupby("key")
+    for _ in range(100):
+        grp.quantile()
+
+
+def test_quantile_missing_group_values_correct_results():
+    # GH 28662
+    data = np.array([1.0, np.nan, 3.0, np.nan])
+    df = pd.DataFrame(dict(key=data, val=range(4)))
+
+    result = df.groupby("key").quantile()
+    expected = pd.DataFrame(
+        [1.0, 3.0], index=pd.Index([1.0, 3.0], name="key"), columns=["val"]
+    )
+    tm.assert_frame_equal(result, expected)
+
+
 # pipe
 # --------------------------------
 
