-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Respect Filtered Categorical in Crosstab #13073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this should put |
Why would we want Also, not sure how best to efficiently implement that since you would have to iterate to check which categories are not represented and then replace them all with |
|
of course the default is |
|
5b9952b
to
efd24a1
Compare
Current coverage is 84.14%@@ master #13073 diff @@
==========================================
Files 138 137 -1
Lines 50380 50231 -149
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 42380 42265 -115
+ Misses 8000 7966 -34
Partials 0 0
|
@gfyoung I know you keep updating this, but:
so need to map out what the output should actually be. haven't really though this thru, but like others to weigh in here. |
@jreback : for some reason, there was a merge conflict on my branch locally that wasn't showing up a GitHub, hence the update. In any case, I think your point of view hinges essentially on whether or not preserving |
1cd038d
to
84e5e33
Compare
@jorisvandenbossche @JanSchulz : any thoughts on what @jreback has said? |
Closes pandas-devgh-12298. [ci skip]
84e5e33
to
31db27e
Compare
def _make_list(arr): | ||
# see gh-12298 | ||
if com.is_categorical(arr): | ||
arr = Series(list(arr), name=arr.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a more efficient way to do that (than convert to list in between), by using the dtype of the categories.
I'm firmly in that camp: if I have categoricals and some of them are not used, I want counts for that category show up as "0". So from my perspective #12298 works as intended and I would reject this premise (#12298 (comment)) and the row should not be dropped:
The only exception I can think of is if there is any way to distinguish between a empty categorical because of the filter and simply a not used category in the original data. But I don't think this is possible. |
@JanSchulz : I wouldn't have major issues leaving the behaviour as is, for I was in the same camp as you are when I saw this issue as well. We could then just close this PR as well as the issue if there is enough consensus that this is indeed "correct" behaviour. |
I follow @JanSchulz in this one |
@gfyoung I tell you what. How about a nice example and explanation in the docs where cross tab is? We'll close this one and the issue. |
@jreback : Sounds good. Will do. |
Follow-on to #13073 by explaining the `Categorical` behaviour in the documentation. Author: gfyoung <[email protected]> Closes #13177 from gfyoung/crosstab-categorical-explain and squashes the following commits: 11ebb94 [gfyoung] DOC: Clarify Categorical Crosstab Behaviour
Closes #12298 by stripping the
Categorical
elements of their original categories so that only their included values are considered during the counting / aggregation.