API/BUG: inconsistent plotting of new CategoricalIndex #10254
Comments
@TomAugspurger Can you look at this? (I won't have time until the second half of next week.) Or maybe @sinhrks? |
If I have a chance I will this weekend.
|
@jorisvandenbossche what do you think about the following two rules when plotting Categoricals (Series or Index):
|
Another drawback is that when you have integer categories, this can look odd (eg my example I use in the notebook: with categories [1,2,4] but codes [0,1,2]). |
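A minimal sketch of the integer-category case mentioned above, assuming the plot would place each category at its code and label the tick with the category value. The variable names (`tick_positions`, `tick_labels`) are illustrative only and are not part of the pandas plotting code.

```python
import pandas as pd

# Integer categories that are not evenly spaced, as in the comment above.
cat = pd.Categorical([1, 2, 4, 2], categories=[1, 2, 4], ordered=True)

# Codes are positions into cat.categories, not the category values themselves.
codes = cat.codes.tolist()

# Hypothetical tick layout: one tick per category, placed at its code.
tick_positions = list(range(len(cat.categories)))
tick_labels = cat.categories.tolist()

print(codes)           # positions used for the data
print(tick_positions)  # where the ticks would sit
print(tick_labels)     # what the ticks would say
```

With this layout the tick labelled 4 sits at position 2, which is the "odd" look being described: the spacing on the axis no longer reflects the numeric values.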
Yes, I think these are definitely the right rules. This is also consistent with
Actually, the way that Categorical works, codes are always in the same order as the categories -- note that |
In [1]: s = pd.Series(pd.Categorical([0, 2, 1], ordered=True))
In [2]: s.cat.codes
Out[2]:
0 0
1 2
2 1
dtype: int8
Seems you're correct. I'm not sure what I was seeing yesterday then. That's good. I think this means we push until 0.17 for this? |
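To make the point about codes and category order easier to check, here is a small sketch with an explicitly chosen category order. The example values are made up for illustration; only `pd.Categorical`, `.cat.codes` and `.cat.categories` are used.

```python
import pandas as pd

# Categories given in a non-alphabetical order on purpose.
s = pd.Series(
    pd.Categorical(["b", "a", "c"], categories=["c", "b", "a"], ordered=True)
)

codes = s.cat.codes.tolist()
categories = s.cat.categories.tolist()

# Each code is the index of the value within the categories list,
# so decoding through the categories recovers the original values.
decoded = [categories[c] for c in codes]

print(codes)
print(decoded)
```

Because a code is just an index into the categories, sorting by code and sorting by category order give the same result.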
I also agree with the rules, the discussion point is more how to put them into place:
|
I see... I'd vote for |
Yes, that's a fair point, but right now it is also completely broken (the way it is plotted now really doesn't make sense). And this release will only be the latest one for a few months, but a lot of people may keep using it for a few years. |
Other option would be: d) only change it for CategoricalIndex (broken in 0.16.1 anyway) and leave Categorical/Series as is and only change this in 0.17 |
I was just going to suggest that :) It's a bit weird because of the inconsistency between CategoricalIndex v. CategoricalSeries, but I suppose it's better for now. |
Ok, I'll at least get a PR together for that, hopefully by Wednesday. To summarize
|
One disadvantage of your rules above is that using pandas plotting machinery or matplotlib will result in a different output then:
So these two otherwise rather similar calls will end up differently. |
Looks good rule for
|
I'm still working on this. I have the positioning using |
Ok... this is more messy than my original rules. Right now, for BarPlot absolutely makes sense. I think I have that "working" for a CategoricalIndex. Things are still a bit strange since we don't actually use the This is consistent with how regular Indexes work with barplots when there are dupes. |
I'm just going around in circles at this point... I think our plotting is fine. The only case I see for plotting a categorical is Maybe I'm missing something. |
@TomAugspurger I think line plots make sense for categorical variables if you use |
is the conclusion that we don't do anything for 0.16.2 but make an API change for 0.17.0? |
@jorisvandenbossche @TomAugspurger moving this to 0.17.0 unless you have something imminent. |
Nothing imminent.
|
very exciting by mpl! this will be in 2.0? |
@jorisvandenbossche @TomAugspurger we have a bunch of categorical plotting issues |
This is aimed at 2.1 |
Well, this is kinda "fixed" since we drop non-numeric data before plotting, and AFAICT, categorical is always considered non-numeric, regardless of the underlying type. This can be improved, but isn't a blocker. |
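A rough illustration of the "categorical is always considered non-numeric" observation, using public pandas helpers. This is only a sketch: it does not claim to reproduce the internal code path the plotting machinery uses, and the frame and column names are invented.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# A categorical column whose categories happen to be integers.
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0],
        "c": pd.Categorical([1, 2, 4]),
    }
)

# Selecting numeric columns leaves the categorical column out,
# even though its underlying categories are integers.
numeric_only = df.select_dtypes(include="number")

print(list(numeric_only.columns))
print(is_numeric_dtype(df["c"]))            # the categorical itself
print(is_numeric_dtype(df["c"].cat.codes))  # its integer codes
```

If a categorical column needs to appear in a numeric plot, one option is to plot its codes explicitly and set the tick labels from the categories.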
To record the issue we discussed yesterday, to be solved for 0.16.2
Plotting of categorical:
range(len(cat))
x data
Overview: http://nbviewer.ipython.org/gist/jorisvandenbossche/992d9d34dbfcfd8bc326
Disclaimer: didn't yet look into the code to see why this is like this.
Way forward: