WIP: generalize categorical to N-dimensions #8012

Closed
wants to merge 3 commits into from

shoyer
Member

@shoyer shoyer commented Aug 13, 2014

No description provided.

@shoyer
Member Author

shoyer commented Aug 13, 2014

closing until I make more progress

@@ -269,10 +269,23 @@ def __init__(self, values, levels=None, ordered=None, name=None, fastpath=False,
        self.levels = levels
        self.name = name

    def _replace_codes(self, codes):
Contributor

This is used more like a _constructor(...).

@jankatins
Contributor

Maybe put this in the _finalize... thingies that are used in pandas/numpy? Each method would then need a change to call finalize on new categoricals...
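
One way to read the two review comments above, as a hedged sketch rather than the code in this PR: `_replace_codes` behaves like a constructor-style helper that returns a new object with the same levels and name but a new codes array. The class `MiniCategorical`, its `reshape` method, and the body of `_replace_codes` are hypothetical; only the method name and the `levels` and `name` attributes appear in the diff.

```python
import numpy as np


class MiniCategorical:
    """Hypothetical stand-in for Categorical; this is not the pandas class."""

    def __init__(self, codes, levels, name=None):
        self.codes = np.asarray(codes)
        self.levels = list(levels)
        self.name = name

    def _replace_codes(self, codes):
        # Assumed behaviour: build a new object of the same type that keeps
        # the levels and name but swaps in the new codes array.
        return type(self)(codes, self.levels, name=self.name)

    def reshape(self, *shape):
        # A shape-changing method can delegate to the helper so that
        # levels and name are carried over in one place.
        return self._replace_codes(self.codes.reshape(*shape))


cat = MiniCategorical([0, 1, 1, 0], levels=["a", "b"], name="x")
cat2d = cat.reshape(2, 2)
```

Routing every derived object through one helper is what makes it "constructor-like": subclasses and metadata propagation only need to be handled there.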

@jreback
Contributor

jreback commented Aug 13, 2014

@shoyer so you want an n-dim codes array (rather than 1-d). How would this be used?
Conceptually this is the 'same' as a DataFrame of categorical Series (that maybe share levels, for 2-d). What is your use case for this (aside from xray)?
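
To make the comparison concrete, here is a minimal NumPy sketch of a 2-d table whose columns share one set of levels, stored as a single 2-d integer codes array. The values are illustrative and this does not reflect pandas internals.

```python
import numpy as np

values = np.array([["a", "b"],
                   ["b", "c"],
                   ["a", "a"]])

# Encode against one shared set of levels. Raveling first keeps the
# inverse array 1-d, so the reshape back to 2-d is explicit.
levels, flat_codes = np.unique(values.ravel(), return_inverse=True)
codes = flat_codes.reshape(values.shape)

# Decoding the whole table is a single indexing step.
decoded = levels[codes]
```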

@shoyer
Member Author

shoyer commented Aug 13, 2014

@jreback I expect that allowing n-dimensional categoricals could lead to much higher performance for a DataFrame with multiple columns of the same type, in the same way that multi-dimensional arrays improve performance for other dtypes.

E.g., you could call unstack on a Categorical series essentially for free, since it's just a numpy reshape under the covers (though I don't know the pandas Block system well enough to be sure about this).
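
A rough NumPy illustration of the "essentially for free" point, assuming the series covers every (row, column) pair in row-major order with nothing missing. The names and values are made up for the example, and the pandas Block machinery is not modelled.

```python
import numpy as np

levels = np.array(["low", "mid", "high"])

# 1-d codes for a series indexed by every (row, column) pair, row-major.
n_rows, n_cols = 2, 3
codes_1d = np.array([0, 1, 2, 2, 1, 0])

# "Unstacking" is then a reshape, which returns a view rather than a copy.
codes_2d = codes_1d.reshape(n_rows, n_cols)
shares = np.shares_memory(codes_1d, codes_2d)

# The levels are untouched; only the shape of the codes changes.
unstacked = levels[codes_2d]
```

If the index were incomplete or unsorted, the codes would first have to be reordered or filled, so the reshape would no longer be free.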

@jreback
Contributor

jreback commented Aug 13, 2014

@shoyer ok, that is reasonable. And in fact right now Categoricals are kept in completely separate blocks regardless of their internal structure. It is possible to combine them, so your solution would make sense.

3 participants