ENH/BUG: support Categorical in to_panel reshaping (GH8704) #8705
cc @immerrr, can you have a look? I think … So the problem is you can have multiple return blocks (e.g. … lmk what you think.
cc @shoyer, you might also find this interesting, as you have brought this up before (e.g. a 2-D Categorical type).
My hope was that we'll switch categoricals to dynd before too long, which would give us N-D categoricals for free. It looks like this works, although the fix is awkward. That said, making the current categoricals fully N-dimensional really does not look that bad; I was most of the way there in #8012.
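A rough sketch of the 2-D categorical idea, assuming it is modeled as one shared set of categories plus a 2-D array of integer codes (the same split a 1-D Categorical uses). `encode_2d` and `decode_2d` are hypothetical helpers, not pandas API; only `pd.Categorical` and `Categorical.from_codes` are real.

```python
import numpy as np
import pandas as pd


def encode_2d(values: np.ndarray):
    """Hypothetical helper: split a 2-D array into one shared set of
    categories plus a 2-D array of integer codes."""
    flat = pd.Categorical(values.ravel())
    return flat.categories, flat.codes.reshape(values.shape)


def decode_2d(categories: pd.Index, codes: np.ndarray) -> np.ndarray:
    """Hypothetical helper: rebuild the 2-D values (code -1 would become NaN)."""
    flat = pd.Categorical.from_codes(codes.ravel(), categories=categories)
    return np.asarray(flat, dtype=object).reshape(codes.shape)


values = np.array([["a", "b"], ["b", "c"]], dtype=object)
categories, codes = encode_2d(values)
print(list(categories))  # ['a', 'b', 'c']
print(codes.tolist())    # [[0, 1], [1, 2]]
print(decode_2d(categories, codes).tolist() == values.tolist())  # True
```

The categories are stored once for the whole 2-D array, so only the codes need to grow with the number of dimensions.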
I am not a fan of NonConsolidatable blocks in general. The idea may be great for getting 90% of use cases working with low effort, but it adds a lot of semantic load: you have to think through every tiny bit of the API to make it work properly (which I guess is the issue you hit with this PR). I suppose we could make categorical a proper dtype (or try to take one from dynd) to convey the category information. That would render the NonConsolidatable hack unnecessary, because categorical blocks would stop consolidating with object ones due to the dtype mismatch. As a bonus, categorical blocks that actually share the category info could be merged, and N-D categoricals would no longer be an issue.
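For reference, pandas later made the categories part of the dtype via `pd.CategoricalDtype` (this did not exist when this PR was written). A minimal illustration, assuming a current pandas: dtypes with different categories compare unequal, and concatenating data with mismatched categories no longer yields a categorical, which is the same dtype-mismatch rule that would keep such blocks from consolidating.

```python
import pandas as pd

ab = pd.CategoricalDtype(["a", "b"])
ac = pd.CategoricalDtype(["a", "c"])

# The categories are part of the dtype, so these two dtypes differ.
print(ab == ac)                               # False
print(ab == pd.CategoricalDtype(["a", "b"]))  # True

s1 = pd.Series(["a", "b"], dtype=ab)
s2 = pd.Series(["b", "a"], dtype=ab)
s3 = pd.Series(["a", "c"], dtype=ac)

# Identical categorical dtypes: the concatenated result stays categorical.
print(pd.concat([s1, s2]).dtype)  # category
# Mismatched categories: the result falls back to a non-categorical dtype.
print(isinstance(pd.concat([s1, s3]).dtype, pd.CategoricalDtype))  # False
```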
An interesting generalization of the categorical dtype would be a record-like categorical, where each item along one of the axes can have its own categorical type, since that is basically a MultiIndex.
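A MultiIndex already stores each level as a levels/codes pair, the same categories/codes split a Categorical uses, which is why a record-like categorical looks so much like one. A quick check of that correspondence, assuming a current pandas (`.codes` was called `.labels` before 0.24):

```python
import pandas as pd

letters = ["a", "a", "b", "b"]
numbers = [1, 2, 1, 2]

mi = pd.MultiIndex.from_arrays([letters, numbers], names=["letter", "number"])

for level, values in enumerate([letters, numbers]):
    cat = pd.Categorical(values)
    # Each level plays the role of one categorical's categories ...
    assert mi.levels[level].equals(cat.categories)
    # ... and each codes array plays the role of that categorical's codes.
    assert (mi.codes[level] == cat.codes).all()

print(mi.levels[0].tolist(), mi.codes[0].tolist())  # ['a', 'b'] [0, 0, 1, 1]
```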
So a) we could disallow this until we fix it (e.g. you can create a panel, but selecting from it just doesn't work)?
My point is exactly to put the category information into the dtype definition, so that dtypes with different categories compare unequal.
@immerrr I agree with you, but I don't think that's trivial / possible ATM (or maybe ever: how does one define in a dtype-like object that these categories are different from others? Maybe hash them). The shape and sub-dtype are simply not good enough. I think we have to work within the existing framework. What do you think about the other ideas?
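One way to read "maybe hash them": a dtype-like object whose identity includes its categories, so that equality and hashing both see them. `CatDtypeSketch` is a hypothetical stand-in, not a pandas class, and this sketch treats category order as significant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatDtypeSketch:
    """Hypothetical dtype-like object: the categories (and whether they are
    ordered) are fields, so the generated __eq__ and __hash__ depend on them."""
    categories: tuple
    ordered: bool = False


a = CatDtypeSketch(("x", "y"))
b = CatDtypeSketch(("x", "y"))
c = CatDtypeSketch(("x", "z"))

print(a == b, hash(a) == hash(b))  # True True
print(a == c)                      # False

# Because it is hashable, it can key a mapping of blocks that may be merged.
blocks = {a: "block 1"}
print(blocks[b])                   # block 1
```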
Of those I like b) the most. Intentionally or not, to me it sounds pretty much like making categorical blocks "consolidatable in some way".
To begin with, requiring a full match should be OK, e.g.
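A sketch of option b) with the "full match" rule, assuming a block may only hold columns whose categories are identical (same values, same order, same `ordered` flag). `group_by_full_match` is a hypothetical helper; it only shows which columns would be allowed to share a block.

```python
import pandas as pd


def group_by_full_match(columns):
    """Hypothetical helper: group categorical columns whose categories match
    exactly. Only columns in the same group could share one 2-D block."""
    groups = {}
    for name, cat in columns.items():
        key = (tuple(cat.categories), cat.ordered)
        groups.setdefault(key, []).append(name)
    return list(groups.values())


cols = {
    "x": pd.Categorical(["a", "b"], categories=["a", "b"]),
    "y": pd.Categorical(["b", "b"], categories=["a", "b"]),
    "z": pd.Categorical(["b", "a"], categories=["b", "a"]),  # same set, other order
    "w": pd.Categorical(["a", "c"], categories=["a", "c"]),
}
print(group_by_full_match(cols))  # [['x', 'y'], ['z'], ['w']]
```

Requiring an exact match is deliberately conservative: `z` has the same set of categories as `x` and `y`, but its codes mean something different, so it stays in its own group.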
Branch updated from 373bb33 to d210bdb
ENH/BUG: support Categorical in to_panel reshaping (GH8704)
CLN: move block2d_to_blocknd support code to core/internal.py
TST/BUG: support Categorical reshaping via .unstack
closes #8704
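`Panel` and `to_panel` have since been removed from pandas (in 0.25), but the `.unstack` half of this change can still be exercised. A small check, assuming a current pandas, that unstacking a categorical Series keeps the categorical dtype on the resulting columns:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["r1", "r2"], ["x", "y"]], names=["row", "col"])
s = pd.Series(pd.Categorical(["a", "b", "b", "a"], categories=["a", "b"]), index=idx)

# Move the "col" level into the columns; each column should stay categorical.
wide = s.unstack("col")
print(wide)
print(wide.dtypes)  # both columns: category
```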