Add arithmetic to categoricals? #8629

Closed
shoyer opened this issue Oct 25, 2014 · 6 comments
Labels
API Design · Categorical (Categorical Data Type) · Numeric Operations (Arithmetic, Comparison, and Logical operations)
Comments

@shoyer
Member

shoyer commented Oct 25, 2014

I'm not entirely sure why (perhaps just the trouble of implementing it), but categoricals do not currently support arithmetic:

In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (additions, divisions, ...) are not possible.

Similarly to string operations (#8627), arithmetic with scalars could very efficiently transform categoricals into new categoricals.

In contrast, array + array operations should probably just return a normal array.
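
A minimal sketch of what the two cases above might look like, assuming the scalar operation is applied to the categories only and the integer codes are reused as-is. `add_scalar` is a hypothetical helper written for illustration, not a pandas API, and the sample values are made up:

```python
import numpy as np
import pandas as pd


def add_scalar(cat, scalar):
    """Hypothetical helper (not part of pandas): add a scalar to a numeric
    Categorical by transforming its categories and keeping its codes."""
    # Only the k distinct categories are touched, not all n values.
    new_categories = cat.categories + scalar
    return pd.Categorical.from_codes(
        cat.codes, categories=new_categories, ordered=cat.ordered
    )


cat = pd.Categorical([10, 20, 10, 30, 20])
shifted = add_scalar(cat, 5)
print(shifted)

# array + array: fall back to plain NumPy arrays, as suggested above.
# Assumes there are no missing values in either operand.
total = np.asarray(cat) + np.asarray(shifted)
print(total)
```

This only works for operations that keep the categories unique; something like multiplying by zero would collapse them, and `from_codes` would reject the result.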

CC @JanSchultz @jreback

@jreback
Contributor

jreback commented Oct 25, 2014

I believe the thinking was that Categoricals were primarily for strings. But for a numeric (sub-dtype), this is certainly possible.

Can you update with an example of the use case, please?

@jreback added the Categorical (Categorical Data Type) and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Oct 25, 2014
@jreback added this to the 0.16.0 milestone Oct 25, 2014
@shoyer
Member Author

shoyer commented Oct 25, 2014

Yeah, I'm actually not entirely sure this is a good idea, given that numbers can already be represented just as efficiently, in essentially the same space as the categorical codes.

@shoyer
Member Author

shoyer commented Oct 25, 2014

Actually going to close this, I was getting ahead of myself here. I'm struggling to think of an actual use case for this.

@shoyer shoyer closed this as completed Oct 25, 2014
@jreback
Contributor

jreback commented Oct 25, 2014

Hmm, maybe for a limited set of values (e.g. < 128) it would save some memory.
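
A rough way to check the memory argument, as a sketch under made-up data (100,000 integers with 100 distinct values; the sizes and seed are arbitrary assumptions). pandas stores the codes as `int8` when there are fewer than 128 categories, so the categorical is much smaller than the `int64` original, but a plain `int8` column takes essentially the same space, which is the earlier point about numbers already fitting in the space of the codes:

```python
import numpy as np
import pandas as pd

# 100,000 integers drawn from only 100 distinct values (fewer than 128).
rng = np.random.default_rng(0)
values = pd.Series(rng.integers(0, 100, size=100_000), dtype="int64")

as_category = values.astype("category")
as_int8 = values.astype("int8")  # only valid because the values fit in int8

print(as_category.cat.codes.dtype)             # dtype pandas picked for the codes
print(values.memory_usage(index=False))        # bytes for the int64 values
print(as_category.memory_usage(index=False))   # bytes for codes plus categories
print(as_int8.memory_usage(index=False))       # bytes for the plain int8 values
```

The categorical only wins over a small integer dtype when the values themselves do not fit in that dtype (for example, a few large or widely spaced integers).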

@jankatins
Contributor

Categoricals represent "categories", so in "most" cases there is no defined difference between values (needed for '-' and '+') and no zero (so no '*' or '/'), even if we have integers as a representation for the categories...

If we really want to have some memory efficient whatever, we can, but then please make that a different class.

@jankatins
Contributor

@shoyer Just FYI: Schulz without 'tz', I just found this issue :-)

3 participants