ENH: Add categorize method to Categoricals #10374
Comments
Hmm, just saw so
>>> s.cat.categorize(new_codes)
# rather than
>>> pd.Categorical.from_codes(new_codes, s.cat.categories)
This is how I would do it (there is currently a bug, reported in #10324).

@jreback the bug has been resolved, so your proposed solution works just fine. Should this be closed? Not sure there's a need for an additional API for this.

Yep, this looks good.
I may be missing an existing method here. Didn't see anything though...

Given a Categorical s and some codes,

new_codes = pd.Series([0, 1, 0, 2], index=[4, 5, 6, 7])

I'd like to easily categorize the new codes. I think this is essentially

That's not quite the same (what if new_codes doesn't have every code from s? Then it's "missing"? We need to make sure the categories / codes are identical), but the basic idea is there.

My use-case here is going from Categorical -> codes -> scikit-learn classifier -> prediction (codes). It'd be nice if the column could easily transform that result back.
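The round trip described here can be sketched with the existing pd.Categorical.from_codes constructor mentioned in the comments. The contents of s below are made up for illustration (the original example values are not shown in this thread), and new_codes stands in for the classifier's predictions.

```python
import pandas as pd

# Assumed example data: the issue does not show the actual values of s.
s = pd.Series(list("abca"), dtype="category")

# Codes as a model might return them, carrying their own index.
new_codes = pd.Series([0, 1, 0, 2], index=[4, 5, 6, 7])

# Rebuild a categorical from the codes, reusing s's categories and ordering,
# then wrap it in a Series so the index of new_codes is kept.
decoded = pd.Series(
    pd.Categorical.from_codes(
        new_codes.to_numpy(),
        categories=s.cat.categories,
        ordered=s.cat.ordered,
    ),
    index=new_codes.index,
)

print(decoded)
```

Because the categories are passed explicitly, the result has the same categories as s even when new_codes does not contain every code.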
Naming-wise, I'd say categorize or decode. Should we also add the symmetrical encode method for category to code?

If new_codes contains a previously unseen code, I'd say raise, or maybe insert NaN (perhaps an option here?).
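A rough sketch of the raise-or-NaN option, written as a standalone helper rather than a pandas method. The name decode_codes and the on_unseen parameter are hypothetical and not part of pandas; the helper only relies on pd.Categorical.from_codes, which treats a code of -1 as missing.

```python
import numpy as np
import pandas as pd


def decode_codes(codes, categories, on_unseen="raise"):
    """Hypothetical helper: map integer codes back to categories.

    on_unseen="raise" raises on any code outside 0..len(categories)-1;
    on_unseen="nan" replaces such codes with a missing value instead.
    """
    codes = pd.Series(codes)
    values = codes.to_numpy()
    # Assumption: any negative code (including -1) counts as unseen here.
    unseen = (values < 0) | (values >= len(categories))
    if unseen.any():
        if on_unseen == "raise":
            bad = sorted(set(values[unseen].tolist()))
            raise ValueError(f"codes not present in categories: {bad}")
        if on_unseen != "nan":
            raise ValueError("on_unseen must be 'raise' or 'nan'")
        # from_codes interprets -1 as a missing value.
        values = np.where(unseen, -1, values)
    cat = pd.Categorical.from_codes(values, categories=categories)
    return pd.Series(cat, index=codes.index)


cats = pd.Index(["a", "b", "c"])
result = decode_codes(pd.Series([0, 3], index=[10, 11]), cats, on_unseen="nan")
print(result)
```

With the default on_unseen="raise", the same call fails with a ValueError naming the offending code, which matches the stricter behaviour suggested above.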