BUG: pivot table bug with Categorical indexes, #10993 #11371
TST: add test case from Issue pandas-dev#10989
603e99b to 76d7c9a
My argument was always that Categorical should not gain any more public methods, as I think the "normal" case for working with Likert scales and similar things is IMO already satisfied. Reading #10989, I think this is something which can't be solved without problems: converting to an object index will lose the order/... but keeping the categorical index while appending a value will also change the underlying data type: if I work with the underlying "data" (aka the Given these two cases, I would opt for the first one, as it is similar to what happens with the integer ones:
A third solution would be to add a check that margins do not work with categoricals.
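The two outcomes weighed above can be sketched with current pandas APIs. This is a minimal illustration, not code from this PR; the example values and the label "All" (the default margins_name of pivot_table) are assumptions chosen for the example.

```python
import pandas as pd

# Illustrative ordered categorical (values chosen for the example).
cat = pd.Categorical(["low", "high"], categories=["low", "mid", "high"], ordered=True)

# The integer analogue: appending a string to an int64 Index upcasts to object.
ints = pd.Index([1, 2]).append(pd.Index(["All"]))

# Case 1: convert to object. "All" fits, but the categories and their order are lost.
as_object = pd.Index(list(cat) + ["All"], dtype=object)

# Case 2: stay categorical. "All" has to become a new category, so the dtype changes.
extended = cat.add_categories(["All"])
as_categorical = pd.CategoricalIndex(list(extended) + ["All"], dtype=extended.dtype)

print(ints.dtype, as_object.dtype, as_categorical.dtype == cat.dtype)
```

In case 2 the result is still ordered, but its dtype no longer compares equal to the original one, which is the "underlying data type changes" problem described above.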
I think adding a new category is useful, but I have no good idea how
Appending ordered=True and ordered=True results in ordered=True; this looks OK.
However, there can be cases where the category orders conflict. Should such a case result in ordered=False?
Appending ordered=True and ordered=False results in ordered=True. I think this should be ordered=False, because we can't know the order of the second categories.
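The rules proposed above for the ordered flag can be written as a small helper. append_categoricals is hypothetical (not a pandas function); it assumes the left categories keep their positions, the right side's unseen categories go to the end, and "conflicting" means the merged order disagrees with the right side's own category order.

```python
import pandas as pd


def append_categoricals(left, right):
    """Hypothetical helper (not a pandas API) applying the proposed ordered rules."""
    left_cats = list(left.categories)
    seen = set(left_cats)
    # Keep left's category order; right's unseen categories go to the end.
    merged = left_cats + [c for c in right.categories if c not in seen]
    # The merged order must agree with right's own order, otherwise the orders conflict.
    right_set = set(right.categories)
    consistent = [c for c in merged if c in right_set] == list(right.categories)
    # ordered + ordered (no conflict) -> ordered; any unordered input or conflict -> unordered.
    ordered = bool(left.ordered and right.ordered and consistent)
    return pd.Categorical(list(left) + list(right), categories=merged, ordered=ordered)


a = pd.Categorical(["x", "y"], categories=["x", "y"], ordered=True)
b = pd.Categorical(["z"], categories=["x", "y", "z"], ordered=True)  # compatible order
c = pd.Categorical(["y"], categories=["y", "x"], ordered=True)  # conflicting order
d = pd.Categorical(["x"], categories=["x", "y"], ordered=False)  # unordered

print(append_categoricals(a, b).ordered,
      append_categoricals(a, c).ordered,
      append_categoricals(a, d).ordered)
```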
I think about each instance of a
-> It's converted to an index type that can hold both data types. IMO one could argue that appending two categoricals should "upcast" to object, as that is at least deterministic:
@JanSchulz not gaining any methods here at all. It doesn't make sense to Now upcasting to object is quite a reasonable argument, though we currently allow index preservation when the new values ARE in the categories (IOW it's still a CI). Further, pandas in general just works, rarely raising these types of conversion errors (which is why this should do something logical and deterministic). So we need to allow index insert/append; otherwise CIs are crippled compared to other indexes. Since we are creating new indexes this is simply like To make this somewhat simple, we need a convention.
So it appears @JanSchulz is for a). Anyone else?
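A minimal sketch of the convention being discussed: keep a CategoricalIndex when the inserted value is already a category, and otherwise upcast to object (option a). insert_or_upcast is a hypothetical helper for illustration; it does not describe the behavior of any particular pandas version.

```python
import pandas as pd


def insert_or_upcast(ci, loc, item):
    """Hypothetical helper: keep the CategoricalIndex if `item` is already a
    category, otherwise fall back to an object Index (option a)."""
    values = list(ci)
    values.insert(loc, item)
    if item in ci.categories:
        # Same dtype as before: categories and `ordered` are untouched.
        return pd.CategoricalIndex(values, dtype=ci.dtype, name=ci.name)
    # New value: upcast deterministically to object, like other Index types do.
    return pd.Index(values, dtype=object, name=ci.name)


ci = pd.CategoricalIndex(["a", "b"], categories=["a", "b", "c"])
print(type(insert_or_upcast(ci, 1, "c")).__name__)
print(insert_or_upcast(ci, 0, "z").dtype)
```

The appeal of this rule is that the result type depends only on whether the value is a known category, so it never silently rewrites the categories.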
@jreback And there is also option e): raise an error (the current situation). I agree with @JanSchulz that we should not just change the categories of an object by appending or inserting. The categories are a fundamental part of the dtype, and IMHO should not be changed lightly by some operation (unless it is explicitly doing that like From that position, I would be for either converting to object or raising an error (and 'fixing the symptoms' in places where this leads to errors, as in @jakevdp's approach for pivot_table).
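Option e) can be sketched the same way: refuse to change the categories implicitly and require an explicit step instead. strict_insert is a hypothetical helper, and using CategoricalIndex.add_categories as the explicit step is an assumption made for this example.

```python
import pandas as pd


def strict_insert(ci, loc, item):
    """Hypothetical helper for option e): never change categories implicitly."""
    if item not in ci.categories:
        raise TypeError(f"cannot insert {item!r}: not one of the categories; add it explicitly first")
    values = list(ci)
    values.insert(loc, item)
    return pd.CategoricalIndex(values, dtype=ci.dtype, name=ci.name)


ci = pd.CategoricalIndex(["a", "b"], categories=["a", "b"])
try:
    strict_insert(ci, 2, "All")
except TypeError as err:
    print("raised:", err)

# The explicit route: add the category first, then insert.
widened = strict_insert(ci.add_categories(["All"]), 2, "All")
print(list(widened), list(widened.categories))
```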
76d7c9a to f5442ba
I tried to make these auto-coerce to ok, so I reverted to a modified version of @jakevdp's solution; there was an embedded bug, which is fixed as well. Thanks for the commentary @JanSchulz @jorisvandenbossche.
f5442ba to 97043da
…ed CategoricalIndexes
Signed-off-by: Jeff Reback
97043da to 7ca878e
BUG: pivot table bug with Categorical indexes, #10993
@jreback What do you mean by "doesn't quite work"?
Oh, I tried the strategy of having
OK!
closes #10993
replaces #10989
So issue #10993 involves the insertion of a key into a multi-index that has a CategoricalIndex as one of its levels. This causes the semantics to break down because we are inserting a new key.
Existing
New
The only slightly controversial point is that .insert will retain the ordered attribute (and new categories go to the end), while .append will always have ordered=False. In theory we could do the same, but .append is generally used to append another CategoricalIndex, so the 'ordered' categories could end up interleaved (IOW, if self has [1,2,3] and other has [3,2,1,4], then the result will be [1,2,3,4]). We could raise if we have mixed ordering (e.g. self is ordered=False, other is ordered=True). Of course the user is free to reorder and such, but the default should be intuitive.
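The .insert and .append conventions described above can be mimicked with public pandas APIs. This is a sketch assuming the semantics stated in the text, not the PR's implementation: insert_keep_order and append_unordered are hypothetical helpers, and union_categoricals (from pandas.api.types) is used because it combines differing categories as the first input's categories followed by the other's new ones, which reproduces the [1,2,3] plus [3,2,1,4] giving [1,2,3,4] example.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def insert_keep_order(ci, loc, item):
    """Hypothetical helper mirroring the .insert convention: keep `ordered`,
    new categories go to the end."""
    if item not in ci.categories:
        # add_categories appends the new category at the last (highest) position.
        ci = ci.add_categories([item])
    values = list(ci)
    values.insert(loc, item)
    return pd.CategoricalIndex(values, dtype=ci.dtype, name=ci.name)


def append_unordered(left, right):
    """Hypothetical helper mirroring the .append convention: result is always ordered=False."""
    # Dropping `ordered` first lets union_categoricals combine differing categories;
    # the result's categories are left's categories followed by right's new ones.
    combined = union_categoricals([left.as_unordered(), right.as_unordered()])
    return pd.CategoricalIndex(combined, name=left.name)


ci = pd.CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=True)
other = pd.CategoricalIndex([3, 2, 1, 4], categories=[3, 2, 1, 4], ordered=True)

inserted = insert_keep_order(ci, 1, 4)
appended = append_unordered(ci, other)
print(list(inserted.categories), inserted.ordered)
print(list(appended.categories), appended.ordered)
```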
Further, note that we can now concat pandas objects with CategoricalIndexes (I don't think this was specified before, and it was certainly not tested), e.g.
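The example that originally followed "e.g." was not preserved. As a stand-in, this sketch (written against current pandas, not taken from the PR) concatenates two DataFrames whose CategoricalIndexes share the same dtype; the data and category names are illustrative.

```python
import pandas as pd

# Illustrative shared dtype; both frames use it for their index.
dtype = pd.CategoricalDtype(["a", "b", "c"], ordered=True)
df1 = pd.DataFrame({"x": [1, 2]}, index=pd.CategoricalIndex(["a", "b"], dtype=dtype))
df2 = pd.DataFrame({"x": [3]}, index=pd.CategoricalIndex(["c"], dtype=dtype))

result = pd.concat([df1, df2])
# With identical categorical dtypes the combined index stays a CategoricalIndex.
print(type(result.index).__name__, result.index.dtype == dtype)
```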