BUG/ENH: categorical returned during a transform #8065
cc @JanSchulz, maybe we need some inference on the returned categoricals.
OK, there are multiple issues here:
One problem is that, in the case of categoricals, IMO the best way would be to construct a Series when we have the first …

@jreback: can you merge the fixup PR? Then I can submit a new PR for this one without having to constantly rebase the whole PR.
@JanSchulz OK, fixups merged. Yeah, the …
@JanSchulz if you get to this before 0.15, OK; otherwise I will push it (I may take a look as well).
Let's see if I find time tomorrow; if not, it will have to wait until after my holidays :-/

The first point is addressed (qcut returns an ordered categorical).
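The ordered-categorical return mentioned above can be checked directly. A minimal sketch, assuming a recent pandas; the sample values are made up for illustration:

```python
import pandas as pd

# pd.qcut on a plain list returns a Categorical whose categories are
# Interval bins, ordered from the lowest bin to the highest.
values = [1, 2, 3, 4, 5, 6]
binned = pd.qcut(values, 3)

print(binned.ordered)          # True: qcut produces an ordered categorical
print(len(binned.categories))  # 3 quantile bins
print(binned.codes.tolist())   # [0, 0, 1, 1, 2, 2]
```

The integer codes index into the ordered categories, so comparisons such as `binned < binned.categories[2]` respect bin order.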
This is not easy. This is the current code:
But this is not going to work with categoricals, because it stuffs the new values together with the old ones, which always converts the categoricals to a NumPy dtype. There are three possible ways:
The "throw an error" variant is not always "correct": one could return a categorical with the same categories for every group, and that should actually succeed.
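To illustrate the dtype problem from the last two comments, here is a small sketch. It assumes a recent pandas and uses `pd.concat` as a stand-in for "stuffing the results together"; the actual transform internals are not shown in this thread. Categoricals that share identical categories keep the `category` dtype, while differing categories or a mix with plain values fall back to a NumPy dtype:

```python
import pandas as pd

cats = [1, 2, 3]

# Two per-group results that share exactly the same (ordered) categories.
g1 = pd.Series(pd.Categorical([1, 3], categories=cats, ordered=True))
g2 = pd.Series(pd.Categorical([2, 1], categories=cats, ordered=True))
same = pd.concat([g1, g2], ignore_index=True)
print(same.dtype)  # category: identical categories survive concatenation

# A per-group result whose categories differ from the others.
g3 = pd.Series(pd.Categorical([10, 20]))
mixed_cats = pd.concat([g1, g3], ignore_index=True)
print(mixed_cats.dtype)  # int64: falls back to the categories' NumPy dtype

# Categorical values stuffed together with plain (non-categorical) values.
plain = pd.Series([0.5, 1.5])
mixed_kinds = pd.concat([g1, plain], ignore_index=True)
print(mixed_kinds.dtype)  # float64
```

This is why a blanket error would be too strict: the first case combines cleanly.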
@jreback can you comment here?
@JanSchulz let me take a look.
@JanSchulz this is related to #7883. The issue is that I took a shortcut to make Series transformations fast, but that doesn't work well with multiple dtypes. So we have to defer this (you can use the workaround I suggested in this issue). We can fix it for 0.15.1.
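The workaround suggested in this issue isn't reproduced in the thread. As one possible alternative (a sketch, assuming a recent pandas; the `df`, `grp`, and `x` names are illustrative), asking `pd.qcut` for integer bin codes with `labels=False` sidesteps the categorical dtype entirely, so the per-group results combine into a plain integer Series:

```python
import pandas as pd

# Illustrative data: two groups, each with six distinct values.
df = pd.DataFrame({
    "grp": ["a"] * 6 + ["b"] * 6,
    "x": [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60],
})

# labels=False makes qcut return integer bin codes (0, 1, 2) instead of a
# categorical of Intervals, so transform only has to stitch integers together.
codes = df.groupby("grp")["x"].transform(lambda s: pd.qcut(s, 3, labels=False))
print(codes.tolist())  # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```

Each group needs enough distinct values for three quantile bins; otherwise `pd.qcut` raises a `ValueError` about non-unique bin edges unless `duplicates='drop'` is passed.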
This now returns an object Series of Intervals (soon to be an IntervalArray). I think the original issue is fixed.

```
In [140]: df.groupby('grp')['x'].transform(pd.qcut, 3)
Out[140]:
0    (0.796, 0.945]
1    (0.571, 0.796]
2    (0.571, 0.796]
```
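The `df` used in the session above isn't shown in the thread. A self-contained sketch of the same call, assuming a recent pandas and made-up data, checks that every element of the result is a `pd.Interval` and that each group gets its own three bins. The exact result dtype (object or interval) may depend on the pandas version, so the sketch doesn't rely on it:

```python
import pandas as pd

# Made-up data standing in for the df used in the session above.
df = pd.DataFrame({
    "grp": ["a"] * 6 + ["b"] * 6,
    "x": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 10, 20, 30, 40, 50, 60],
})

# Extra positional args to transform are forwarded to pd.qcut, so this
# computes 3 quantile bins separately within each group.
binned = df.groupby("grp")["x"].transform(pd.qcut, 3)

for name in ["a", "b"]:
    part = binned[df["grp"] == name]
    # Each group contributes its own set of three distinct Interval bins.
    print(name, len(set(part)))
```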
http://stackoverflow.com/questions/25372877/python-pandas-groupby-and-qcut-doesnt-work-in-0-14-1
This worked in 0.13 (but returned a 2-element Series containing a list).
much cleaner.