BUG: Respect Filtered Categorical in Crosstab #13073

gfyoung · 2016-05-03T19:44:07Z

Closes #12298 by stripping the Categorical elements of their original categories so that only their included values are considered during the counting / aggregation.

jreback · 2016-05-03T19:53:04Z

this should put NaN where the categories are not represented, then dropna can control this.

gfyoung · 2016-05-03T19:57:02Z

Why would we want dropna to control this? What advantage would you have by counting unrepresented categories as NaN?

Also, not sure how best to efficiently implement that since you would have to iterate to check which categories are not represented and then replace them all with NaN? Gymnastics much?

jreback · 2016-05-03T20:01:20Z

In [19]: result = pd.crosstab(df_filtered.col0, df_filtered.col1)

In [20]: result.iloc[2] = np.nan

In [21]: result
Out[21]: 
col1    1    2    3
col0               
a     2.0  0.0  0.0
b     1.0  1.0  0.0
c     NaN  NaN  NaN

jreback · 2016-05-03T20:01:42Z

of course the default is dropna to maybe this should simply drop all of these (but if dropna=False) then should leave.

gfyoung · 2016-05-04T15:02:01Z

Don't see an explanation as to why keeping NaN is useful.
Your code won't work. When you call df.pivot_table, there will be no NaN's to fill at that point. The data you will see is {rowname: Categorical,...}

codecov-io · 2016-05-05T15:53:47Z

Current coverage is 84.14%

Merging #13073 into master will increase coverage by +<.01%

@@             master     #13073   diff @@
==========================================
  Files           138        137     -1   
  Lines         50380      50231   -149   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42380      42265   -115   
+ Misses         8000       7966    -34   
  Partials          0          0

4 files (not in diff) in pandas/tseries were modified. more
- Misses -4
- Hits -26
3 files (not in diff) in pandas/io were modified. more
- Misses +54
6 files (not in diff) in pandas/core were modified. more
- Misses -16
- Hits -69
3 files (not in diff) in pandas were modified. more
- Misses -2
- Hits -24
File pandas/io/s3.py (not in diff) was deleted. more

Powered by Codecov. Last updated by 4de83d2...5b9952b

jreback · 2016-05-05T17:32:40Z

@gfyoung I know you keep updating this, but:

I think you need to deal with dropna=False and
simply converting to object (and listifying) is not a satisfying soln

so need to map out what the output should actually be. haven't really though this thru, but
I think categorical actually means something, you DO want to propogate it. (or at least we do in some situations). So need to figure out if this is one of those.

like others to weigh in here.

@jorisvandenbossche @JanSchulz

gfyoung · 2016-05-05T18:21:17Z

@jreback : for some reason, there was a merge conflict on my branch locally that wasn't showing up a GitHub, hence the update. In any case, I think your point of view hinges essentially on whether or not preserving Categorical information is needed. As a contributor, I generally would side with the user (in this case, whomever filed the issue) unless there is a strong argument against, which IMHO I have not seen either here or in the issue.

gfyoung · 2016-05-12T16:41:12Z

@jorisvandenbossche @JanSchulz : any thoughts on what @jreback has said?

Closes pandas-devgh-12298. [ci skip]

jankatins · 2016-05-12T22:02:40Z

pandas/tools/pivot.py

+    def _make_list(arr):
+        # see gh-12298
+        if com.is_categorical(arr):
+            arr = Series(list(arr), name=arr.name)


There should be a more efficient way to do that (than convert to list in between), by using the dtype of the categories.

jankatins · 2016-05-12T22:11:00Z

I think categorical actually means something, you DO want to propogate it. (or at least we do in some situations).

I'm firmly in that camp: if I have categoricals and some of them are not used, I want counts for that category show up as "0". So from my perspective #12298 works as intended and I would reject this premise (#12298 (comment)) and the row should not be dropped:

if we accept the premise that we don't want to always represent categorical results (IOW we treat them like regular results), then a filtered row should be dropped.

The only exception I can think of is if there is any way to distinguish between a empty categorical because of the filter and simply a not used category in the original data. But I don't think this is possible.

gfyoung · 2016-05-12T22:15:36Z

@JanSchulz : I wouldn't have major issues leaving the behaviour as is, for I was in the same camp as you are when I saw this issue as well. We could then just close this PR as well as the issue if there is enough consensus that this is indeed "correct" behaviour.

@jreback ? @jorisvandenbossche ?

jorisvandenbossche · 2016-05-12T22:34:05Z

I follow @JanSchulz in this one

gfyoung · 2016-05-12T22:36:51Z

Alright, seems like consensus is to close this PR as well as #12298, which is perfectly fine with me. @jreback (or someone else with write access) can do the honours?

jreback · 2016-05-13T00:20:25Z

@gfyoung I tell you what. How about a nice example and explanation in the docs where cross tab is?

We'll close this one and the issue.

gfyoung · 2016-05-13T00:43:14Z

@jreback : Sounds good. Will do.

Follow-on to #13073 by explaining the `Categorical` behaviour in the documentation. Author: gfyoung <[email protected]> Closes #13177 from gfyoung/crosstab-categorical-explain and squashes the following commits: 11ebb94 [gfyoung] DOC: Clarify Categorical Crosstab Behaviour

jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Categorical Categorical Data Type labels May 3, 2016

gfyoung force-pushed the crosstab-categorical branch from 5b9952b to efd24a1 Compare May 5, 2016 15:53

gfyoung force-pushed the crosstab-categorical branch 11 times, most recently from 1cd038d to 84e5e33 Compare May 11, 2016 23:14

BUG: Respect filtered Categorical in crosstab

31db27e

Closes pandas-devgh-12298. [ci skip]

gfyoung force-pushed the crosstab-categorical branch from 84e5e33 to 31db27e Compare May 12, 2016 21:22

jankatins reviewed May 12, 2016
View reviewed changes

jreback closed this May 13, 2016

jreback added this to the No action milestone May 13, 2016

gfyoung deleted the crosstab-categorical branch May 13, 2016 00:46

gfyoung mentioned this pull request May 14, 2016

DOC: Clarify Categorical Crosstab Behaviour #13177

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Respect Filtered Categorical in Crosstab #13073

BUG: Respect Filtered Categorical in Crosstab #13073

gfyoung commented May 3, 2016

jreback commented May 3, 2016

gfyoung commented May 3, 2016

jreback commented May 3, 2016

jreback commented May 3, 2016

gfyoung commented May 4, 2016

codecov-io commented May 5, 2016 •

edited

Loading

jreback commented May 5, 2016 •

edited

Loading

gfyoung commented May 5, 2016 •

edited

Loading

gfyoung commented May 12, 2016

jankatins May 12, 2016

jankatins commented May 12, 2016 •

edited

Loading

gfyoung commented May 12, 2016 •

edited

Loading

jorisvandenbossche commented May 12, 2016

gfyoung commented May 12, 2016

jreback commented May 13, 2016

gfyoung commented May 13, 2016

BUG: Respect Filtered Categorical in Crosstab #13073

BUG: Respect Filtered Categorical in Crosstab #13073

Conversation

gfyoung commented May 3, 2016

jreback commented May 3, 2016

gfyoung commented May 3, 2016

jreback commented May 3, 2016

jreback commented May 3, 2016

gfyoung commented May 4, 2016

codecov-io commented May 5, 2016 • edited Loading

Current coverage is 84.14%

jreback commented May 5, 2016 • edited Loading

gfyoung commented May 5, 2016 • edited Loading

gfyoung commented May 12, 2016

jankatins May 12, 2016

Choose a reason for hiding this comment

jankatins commented May 12, 2016 • edited Loading

gfyoung commented May 12, 2016 • edited Loading

jorisvandenbossche commented May 12, 2016

gfyoung commented May 12, 2016

jreback commented May 13, 2016

gfyoung commented May 13, 2016

codecov-io commented May 5, 2016 •

edited

Loading

jreback commented May 5, 2016 •

edited

Loading

gfyoung commented May 5, 2016 •

edited

Loading

jankatins commented May 12, 2016 •

edited

Loading

gfyoung commented May 12, 2016 •

edited

Loading