BUG: transform and filter misbehave when grouping on categorical data #9994

evanpw · 2015-04-26T19:04:19Z

Fixes GH #9921
closes #9700

evanpw · 2015-04-26T19:08:11Z

pandas/core/groupby.py

-        names = result.columns
-        result = obj.merge(result, how='outer', left_index=True, right_index=True).iloc[:,-result.shape[1]:]
-        result.columns = names
-        return result


This section was almost unreachable (see the conditions above). In fact, it's no longer reached by test case that motivated it. And it's still not locked down enough: it can only work correctly when the initial groupby was on the index, which transform doesn't know. I have an idea on how to make something like this work in the general case, but I'll wait until this PR goes through.

probably ; pls run a perf test to validate

There is about a 25% slowdown in some cases:

------------------------------------------------------------------------------- Test name | head[ms] | base[ms] | ratio | ------------------------------------------------------------------------------- groupby_transform_series2 | 124.4840 | 124.6049 | 0.9990 | groupby_transform | 152.5413 | 151.6960 | 1.0056 | groupby_transform_series | 20.0484 | 19.6927 | 1.0181 | groupby_transform_ufunc | 112.1867 | 110.1057 | 1.0189 | groupby_transform_multi_key4 | 156.6887 | 138.9330 | 1.1278 | groupby_transform_multi_key3 | 918.0430 | 758.8653 | 1.2098 | groupby_transform_multi_key2 | 57.8043 | 46.5036 | 1.2430 | groupby_transform_multi_key1 | 87.1673 | 69.4257 | 1.2555 |

but that's because of the rest of the change, not this particular piece. (Because anything with a MultiIndex already takes the slow path). The groupby_transform_ufunc test is the relevant one, and there's no change there.

This piece of the change also fixes GH #9700. I'll add a test for that.

ok if u can profile and see if u can fix would be gr8

evanpw · 2015-04-27T01:12:51Z

That small change fixed most of the performance difference:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
groupby_transform_series2                    | 124.1800 | 124.5937 |   0.9967 |
groupby_transform_series                     |  19.8309 |  19.8836 |   0.9974 |
groupby_transform                            | 151.5836 | 151.8770 |   0.9981 |
groupby_transform_ufunc                      | 112.3510 | 110.2444 |   1.0191 |
groupby_transform_multi_key4                 | 143.6467 | 137.7186 |   1.0430 |
groupby_transform_multi_key3                 | 802.0917 | 753.2163 |   1.0649 |
groupby_transform_multi_key2                 |  49.4637 |  46.3487 |   1.0672 |
groupby_transform_multi_key1                 |  73.4100 |  68.7597 |   1.0676 |

jreback · 2015-04-28T00:37:59Z

pandas/core/groupby.py

@@ -3126,8 +3128,8 @@ def filter(self, func, dropna=True, *args, **kwargs):
            # interpret the result of the filter
            if (isinstance(res, (bool, np.bool_)) or


change to is_bool_dtype

jreback · 2015-04-28T00:40:23Z

lgtm. pls squash, and ping when green.

… (GH 9921)

evanpw · 2015-04-28T12:02:25Z

Conditional is cleaned up, squashed, and tests are green.

BUG: transform and filter misbehave when grouping on categorical data

jreback · 2015-04-29T10:32:00Z

@evanpw thanks!

evanpw reviewed Apr 26, 2015
View reviewed changes

jreback added Bug Categorical Categorical Data Type labels Apr 28, 2015

jreback added this to the 0.16.1 milestone Apr 28, 2015

jreback reviewed Apr 28, 2015
View reviewed changes

jreback added the Groupby label Apr 28, 2015

BUG: transform and filter misbehave when grouping on categorical data…

3d73550

… (GH 9921)

evanpw force-pushed the issue_9921 branch from 2e93037 to 3d73550 Compare April 28, 2015 10:44

jreback added a commit that referenced this pull request Apr 29, 2015

Merge pull request #9994 from evanpw/issue_9921

c9d1ef9

BUG: transform and filter misbehave when grouping on categorical data

jreback merged commit c9d1ef9 into pandas-dev:master Apr 29, 2015

jreback mentioned this pull request Apr 29, 2015

groupby transform misbehaving with categoricals #9921

Closed

evanpw deleted the issue_9921 branch June 10, 2015 13:41

evanpw restored the issue_9921 branch September 19, 2015 00:34

evanpw deleted the issue_9921 branch September 19, 2015 00:38

jreback mentioned this pull request Mar 29, 2016

PERF: DataFrame groupby with fast transform #12737

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: transform and filter misbehave when grouping on categorical data #9994

BUG: transform and filter misbehave when grouping on categorical data #9994

evanpw commented Apr 26, 2015

evanpw Apr 26, 2015

jreback Apr 26, 2015

evanpw Apr 26, 2015

jreback Apr 26, 2015

evanpw commented Apr 27, 2015

jreback Apr 28, 2015

jreback commented Apr 28, 2015

evanpw commented Apr 28, 2015

jreback commented Apr 29, 2015

		@@ -3126,8 +3128,8 @@ def filter(self, func, dropna=True, args, *kwargs):
		# interpret the result of the filter
		if (isinstance(res, (bool, np.bool_)) or

BUG: transform and filter misbehave when grouping on categorical data #9994

BUG: transform and filter misbehave when grouping on categorical data #9994

Conversation

evanpw commented Apr 26, 2015

evanpw Apr 26, 2015

Choose a reason for hiding this comment

jreback Apr 26, 2015

Choose a reason for hiding this comment

evanpw Apr 26, 2015

Choose a reason for hiding this comment

jreback Apr 26, 2015

Choose a reason for hiding this comment

evanpw commented Apr 27, 2015

jreback Apr 28, 2015

Choose a reason for hiding this comment

jreback commented Apr 28, 2015

evanpw commented Apr 28, 2015

jreback commented Apr 29, 2015