BUG: Fixed groupby quantile for listlike q #27827

TomAugspurger · 2019-08-08T20:36:24Z

Closes #27526

Just a WIP right now. This is under-tested for:

* groupby(sort=False)
* non-sorted q
* groupby(multiple levels)


TomAugspurger · 2019-08-13T16:07:28Z

doc/source/whatsnew/v0.25.1.rst

@@ -118,6 +118,7 @@ Plotting
 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^

+- Fixed regression in :meth:`pands.core.groupby.DataFrameGroupBy.quantile` raising when multiple quantiles are given (:issue:`27526`)


Typo: pandas

Multiple -> list-like

TomAugspurger · 2019-08-13T16:11:08Z

pandas/core/groupby/groupby.py

+                )
+                for qi in q
+            ]
+            # fix levels to place quantiles on the inside


Would welcome improvements to this. Essentially, we have two things to do after concatenating the results:

* Reorder the levels. By default concat places the keys on the outermost level; we need them on the innermost.

* Re-sort things. Initially we're sorted by quantile (the q-th quantile for every group is together, then the next quantile). We need to be grouped by group key, then by quantile. We can't just use sort_index, since we need to preserve the user-provided group key order with sort=False and the order of the quantile list.
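The two steps above can be sketched on toy data. This is a minimal illustration rather than the code in the PR: `groups`, `results`, and the values are made up, and the interleaving positions are computed with a reshape and transpose, which is only one possible way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: one result per quantile, each indexed by group key
# in the user's (unsorted) order, as with groupby(sort=False).
groups = pd.Index(["b", "a", "c"], name="key")
q = [0.75, 0.25]  # deliberately not sorted
results = [
    pd.Series([10 * qi + i for i in range(len(groups))], index=groups)
    for qi in q
]

# concat places the keys (the quantiles) on the outermost index level.
stacked = pd.concat(results, axis=0, keys=q)

# Step 1: rotate the levels so the quantile level becomes the innermost.
nlevels = stacked.index.nlevels
reordered = stacked.reorder_levels(list(range(1, nlevels)) + [0])

# Step 2: rows are laid out as [all groups for q[0], all groups for q[1], ...].
# Interleave the positions so each group's quantiles sit together, keeping
# both the group order and the order of q.
ngroups = len(groups)
indices = np.arange(len(q) * ngroups).reshape(len(q), ngroups).T.ravel()
final = reordered.take(indices)

print(final)
```

Computing the positions in one vectorized step avoids accumulating offsets across loop iterations, which is easy to get wrong once there are three or more groups.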

yah this isn't fun. just checking: we can have mixed dtypes here? if we had a single dtype then stack/unstack would be cheap and there might be a prettier workaround

concat(results, axis=1, keys=q).stack(0) gives the same answer for at least one of the test cases

Unfortunately that sorts the axis. In theory, we should be able to follow it up with a loc to restore the correct order on that level:

diff --git a/pandas/core/groupby/groupby.py b/pandas/core/groupby/groupby.py
index c00bd8398..617466b8d 100644
--- a/pandas/core/groupby/groupby.py
+++ b/pandas/core/groupby/groupby.py
@@ -1928,23 +1928,10 @@ class GroupBy(_GroupBy):
                 for qi in q
             ]
             # fix levels to place quantiles on the inside
-            result = concat(results, axis=0, keys=q)
-            order = np.roll(list(range(result.index.nlevels)), -1)
-            result = result.reorder_levels(order)
-            result = result.reindex(q, level=-1)
-
-            # fix order.
-            hi = len(q) * self.ngroups
-            arr = np.arange(0, hi, self.ngroups)
-            arrays = []
-
-            for i in range(self.ngroups):
-                arr = arr + i
-                arrays.append(arr)
-
-            indices = np.concatenate(arrays)
-            assert len(indices) == len(result)
-            return result.take(indices)
+            result = concat(results, axis=1, keys=q).stack(0)
+            slices = [slice(None)] * result.index.ndim
+            result = result.loc[tuple(slices), :]
+            return result
 
     @Substitution(name="groupby")
     def ngroup(self, ascending=True):

But that runs into the (I think known) issue with loc & a MultiIndex:

In [21]: s = pd.Series(list(range(12)), index=pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2, 3, 4]]))

In [22]: s.loc[pd.IndexSlice[:, [3, 1, 2, 4]]]
Out[22]:
a  1     0
   2     1
   3     2
   4     3
b  1     4
   2     5
   3     6
   4     7
c  1     8
   2     9
   3    10
   4    11
dtype: int64
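For comparison, here is a small sketch of one way to get the inner level into a requested order without relying on loc. It is not what the PR adopted; `wanted`, `outer`, and `target` are names introduced here for illustration. The idea is to spell out the full target MultiIndex and reindex against it, so the row order is explicit.

```python
import pandas as pd

s = pd.Series(
    range(12),
    index=pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3, 4]]),
)
wanted = [3, 1, 2, 4]

# Label-based selection, as in the session above. Whether the inner level
# follows `wanted` may depend on the pandas version, so it is only printed.
print(s.loc[pd.IndexSlice[:, wanted]])

# Reindexing against a fully spelled-out target index leaves no ambiguity:
# the result follows the target's order row by row.
outer = s.index.get_level_values(0).unique()  # order of first appearance
target = pd.MultiIndex.from_product([outer, wanted])
reordered = s.reindex(target)
print(reordered)
```

Building the full target index costs memory proportional to the result, so this is a readability trade-off rather than a performance improvement.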

Darn. Can you add a my-idea-doesnt-work case to the tests
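A test along these lines might look like the sketch below. It is not the test added in the PR: the frame and its values are invented, and the check is deliberately order-independent, comparing each (group, quantile) row against the quantile computed directly on that group's rows. The row order, which is what this thread is about, is only printed for inspection.

```python
import pandas as pd

# Toy frame; group keys first appear in the order b, a.
df = pd.DataFrame(
    {"key": ["b", "a", "b", "a", "b", "a"], "val": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
)
q = [0.75, 0.25]  # list-like and not sorted

result = df.groupby("key", sort=False).quantile(q)

# Every (group, quantile) row should match the value computed directly
# on that group's rows.
for (key, qi), row in result.iterrows():
    expected = df.loc[df["key"] == key, "val"].quantile(qi)
    assert abs(row["val"] - expected) < 1e-9

# Inspect the row order: the PR aims for group order first, then q order.
print(result.index.tolist())
```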

TomAugspurger · 2019-08-19T02:56:14Z

@WillAyd @jbrockmendel do you have time to take a look at this?

jbrockmendel · 2019-08-19T03:00:51Z

will look in AM


TomAugspurger · 2019-08-20T14:25:08Z

@jbrockmendel I added your alternative implementation as a TODO once the MultiIndex issue (#10710) is fixed.

> Darn. Can you add a my-idea-doesnt-work case to the tests

It's covered.

TomAugspurger · 2019-08-20T16:17:52Z

CI failure is just codecov.

TomAugspurger · 2019-08-21T18:43:17Z

Merging in 1 hour if there aren't any objections (only CI failure is codecov).


BUG: Fixed groupby quantile for listlike q

0236046

Closes pandas-dev#27526

TomAugspurger added this to the 0.25.1 milestone Aug 8, 2019

TomAugspurger added the Groupby label Aug 8, 2019

TomAugspurger added 6 commits August 9, 2019 11:01

Merge remote-tracking branch 'upstream' into 27526-groupby-quantile-a…

c1a447f

…rray

bad

ca2411a

Merge remote-tracking branch 'upstream/master' into 27526-groupby-qua…

e060d1d

…ntile-array

wip

f66a67b

Merge remote-tracking branch 'upstream/master' into 27526-groupby-qua…

421c80f

…ntile-array

Remove dtype selection

624d33b

TomAugspurger commented Aug 13, 2019


TomAugspurger added 3 commits August 19, 2019 15:07

Merge remote-tracking branch 'upstream/master' into 27526-groupby-qua…

a9fb4f6

…ntile-array

Merge remote-tracking branch 'upstream/master' into 27526-groupby-qua…

945648e

…ntile-array

add future implementation

dc5147a

TomAugspurger merged commit 8f6118c into pandas-dev:master Aug 22, 2019

TomAugspurger deleted the 27526-groupby-quantile-array branch August 22, 2019 11:28

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 22, 2019

Backport PR pandas-dev#27827: BUG: Fixed groupby quantile for listlike q

cefc8a4

meeseeksmachine mentioned this pull request Aug 22, 2019

Backport PR #27827 on branch 0.25.x (BUG: Fixed groupby quantile for listlike q) #28085

Merged

TomAugspurger pushed a commit that referenced this pull request Aug 22, 2019

Backport PR #27827: BUG: Fixed groupby quantile for listlike q (#28085)

91aa2f5

galuhsahid pushed a commit to galuhsahid/pandas that referenced this pull request Aug 25, 2019

BUG: Fixed groupby quantile for listlike q (pandas-dev#27827)

8091bd7

* BUG: Fixed groupby quantile for listlike q Closes pandas-dev#27526
