Commit f9a0b87

BUG: GroupBy.quantile implicitly sorts index.levels (#53049)
* fixed groupby quantile index.levels
* changelog added
* updated changelog
* modified test to test whole frame - not sure but I think it is not checking index levels like this
* updated tests to check index levels separately
* avoid performance regression when not multiindex
* tried to combine code blocks
1 parent 0826720 commit f9a0b87

3 files changed, 43 lines added, 3 lines removed

doc/source/whatsnew/v2.1.0.rst (+1)

@@ -414,6 +414,7 @@ Groupby/resample/rolling
   the function operated on the whole index rather than each element of the index. (:issue:`51979`)
 - Bug in :meth:`DataFrameGroupBy.apply` causing an error to be raised when the input :class:`DataFrame` was subset as a :class:`DataFrame` after groupby (``[['a']]`` and not ``['a']``) and the given callable returned :class:`Series` that were not all indexed the same. (:issue:`52444`)
 - Bug in :meth:`GroupBy.groups` with a datetime key in conjunction with another key produced incorrect number of group keys (:issue:`51158`)
+- Bug in :meth:`GroupBy.quantile` may implicitly sort the result index with ``sort=False`` (:issue:`53009`)
 - Bug in :meth:`GroupBy.var` failing to raise ``TypeError`` when called with datetime64, timedelta64 or :class:`PeriodDtype` values (:issue:`52128`, :issue:`53045`)
 -
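The scenario behind the changelog entry can be reproduced with the series built in the new regression test (`test_groupby_quantile_nonmulti_levels_order`). This is a minimal sketch: the values and the row order (keys in first-appearance order with `sort=False`) are what it checks, while the level values, which GH 53009 is about, are only printed because they depend on whether the installed pandas includes this fix.

```python
import pandas as pd

# Same data as the regression test: "cat1" alternates B, A, so B is seen first.
ind = pd.MultiIndex.from_tuples(
    [(s, c0, c1) for s in (0, 1) for c0 in ("a", "b") for c1 in ("B", "A")],
    names=["sample", "cat0", "cat1"],
)
ser = pd.Series(range(8), index=ind)

# sort=False keeps group keys in order of first appearance (B, then A).
result = ser.groupby(level="cat1", sort=False).quantile([0.2, 0.8])
print(result)

# With the fix, the first level stays ["B", "A"]; before it, the levels
# were sorted to ["A", "B"] even though the rows were not reordered.
print(result.index.levels[0])
```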

pandas/core/groupby/groupby.py (+12, -3)

@@ -70,7 +70,10 @@ class providing the base-class of operations.
 )
 from pandas.util._exceptions import find_stack_level
 
-from pandas.core.dtypes.cast import ensure_dtype_can_hold_na
+from pandas.core.dtypes.cast import (
+    coerce_indexer_dtype,
+    ensure_dtype_can_hold_na,
+)
 from pandas.core.dtypes.common import (
     is_bool_dtype,
     is_float_dtype,
@@ -4309,13 +4312,19 @@ def _insert_quantile_level(idx: Index, qs: npt.NDArray[np.float64]) -> MultiInde
     MultiIndex
     """
     nqs = len(qs)
+    lev_codes, lev = Index(qs).factorize()
+    lev_codes = coerce_indexer_dtype(lev_codes, lev)
 
     if idx._is_multi:
         idx = cast(MultiIndex, idx)
-        lev_codes, lev = Index(qs).factorize()
         levels = list(idx.levels) + [lev]
         codes = [np.repeat(x, nqs) for x in idx.codes] + [np.tile(lev_codes, len(idx))]
         mi = MultiIndex(levels=levels, codes=codes, names=idx.names + [None])
     else:
-        mi = MultiIndex.from_product([idx, qs])
+        nidx = len(idx)
+        idx_codes = coerce_indexer_dtype(np.arange(nidx), idx)
+        levels = [idx, lev]
+        codes = [np.repeat(idx_codes, nqs), np.tile(lev_codes, nidx)]
+        mi = MultiIndex(levels=levels, codes=codes, names=[idx.name, None])
+
     return mi
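To see why the `else` branch changed, the sketch below rebuilds it with public pandas API only. The helper name `insert_quantile_level_sketch` is hypothetical, the private `coerce_indexer_dtype` step (which, per its docstring, coerces the codes to the smallest integer dtype possible) is left out, and `idx` is assumed to hold unique keys, as a groupby result index does. `MultiIndex.from_product` sorts each level when the values are sortable, whereas passing explicit `levels` and `codes` to the `MultiIndex` constructor keeps the keys in the order they arrived.

```python
import numpy as np
import pandas as pd


def insert_quantile_level_sketch(idx: pd.Index, qs) -> pd.MultiIndex:
    """Hypothetical stand-in for the patched non-MultiIndex branch.

    Appends a quantile level to a flat index of unique group keys
    without reordering the key level.
    """
    nqs = len(qs)
    # Index.factorize keeps order of appearance unless sort=True is passed.
    lev_codes, lev = pd.Index(qs).factorize()
    nidx = len(idx)
    # Key codes repeat once per quantile: 0, 0, 1, 1, ...
    idx_codes = np.repeat(np.arange(nidx), nqs)
    # Quantile codes cycle once per key: 0, 1, 0, 1, ...
    q_codes = np.tile(lev_codes, nidx)
    return pd.MultiIndex(
        levels=[idx, lev], codes=[idx_codes, q_codes], names=[idx.name, None]
    )


keys = pd.Index(["B", "A"], name="cat1")
qs = np.array([0.2, 0.8])

old = pd.MultiIndex.from_product([keys, qs])  # pre-fix construction
new = insert_quantile_level_sketch(keys, qs)

# The tuples and their order are identical in both indexes.
print(list(old) == list(new))
# Only the stored key level differs: from_product sorted it, the sketch did not.
print(list(old.levels[0]), list(new.levels[0]))
```

Building the codes by hand also avoids the extra factorization of `idx` that `from_product` performs, which fits the commit message's note about avoiding a performance regression in the non-MultiIndex case.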

pandas/tests/groupby/test_quantile.py (+30)

@@ -471,3 +471,33 @@ def test_groupby_quantile_dt64tz_period():
     expected.index = expected.index.astype(np.int_)
 
     tm.assert_frame_equal(result, expected)
+
+
+def test_groupby_quantile_nonmulti_levels_order():
+    # Non-regression test for GH #53009
+    ind = pd.MultiIndex.from_tuples(
+        [
+            (0, "a", "B"),
+            (0, "a", "A"),
+            (0, "b", "B"),
+            (0, "b", "A"),
+            (1, "a", "B"),
+            (1, "a", "A"),
+            (1, "b", "B"),
+            (1, "b", "A"),
+        ],
+        names=["sample", "cat0", "cat1"],
+    )
+    ser = pd.Series(range(8), index=ind)
+    result = ser.groupby(level="cat1", sort=False).quantile([0.2, 0.8])
+
+    qind = pd.MultiIndex.from_tuples(
+        [("B", 0.2), ("B", 0.8), ("A", 0.2), ("A", 0.8)], names=["cat1", None]
+    )
+    expected = pd.Series([1.2, 4.8, 2.2, 5.8], index=qind)
+
+    tm.assert_series_equal(result, expected)
+
+    # We need to check that index levels are not sorted
+    expected_levels = pd.core.indexes.frozen.FrozenList([["B", "A"], [0.2, 0.8]])
+    tm.assert_equal(result.index.levels, expected_levels)
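The test checks the levels separately because the expected index, built with `from_tuples`, has sorted levels yet still compares equal to the result. It does so by comparing against a `FrozenList` from the internal path `pd.core.indexes.frozen`. A minimal alternative sketch using only public attributes checks that every level lists its values in order of first appearance. The helper name `levels_in_appearance_order` is hypothetical, and it assumes the index carries no unused level values (after slicing, `MultiIndex.remove_unused_levels` would be needed first).

```python
import pandas as pd


def levels_in_appearance_order(mi: pd.MultiIndex) -> bool:
    """Hypothetical helper: True if each level is ordered by first appearance."""
    for i, level in enumerate(mi.levels):
        # Index.unique preserves the order in which values first occur.
        seen = mi.get_level_values(i).unique()
        if list(level) != list(seen):
            return False
    return True


tuples = [("B", 0.2), ("B", 0.8), ("A", 0.2), ("A", 0.8)]
sorted_levels = pd.MultiIndex.from_tuples(tuples)  # from_tuples sorts levels
kept_levels = pd.MultiIndex(
    levels=[["B", "A"], [0.2, 0.8]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
)
print(levels_in_appearance_order(sorted_levels), levels_in_appearance_order(kept_levels))
```

In a test like the one above, `assert levels_in_appearance_order(result.index)` would express the same intent as the `FrozenList` comparison without reaching into `pd.core`.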
