-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Rename categories with Series #17982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
b551b28
19af62a
f659e78
6460bc5
506f9ba
4f6a1dd
cbcc723
aec380d
0e086ca
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -866,11 +866,6 @@ def set_categories(self, new_categories, ordered=None, rename=False, | |
def rename_categories(self, new_categories, inplace=False): | ||
""" Renames categories. | ||
|
||
The new categories can be either a list-like dict-like object. | ||
If it is list-like, all items must be unique and the number of items | ||
in the new categories must be the same as the number of items in the | ||
old categories. | ||
|
||
Raises | ||
------ | ||
ValueError | ||
|
@@ -879,8 +874,26 @@ def rename_categories(self, new_categories, inplace=False): | |
|
||
Parameters | ||
---------- | ||
new_categories : Index-like or dict-like (>=0.21.0) | ||
The renamed categories. | ||
new_categories : list-like or dict-like | ||
The categories end up with | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this seems like an incomplete sentence ? |
||
|
||
.. versionchanged:: 0.21.0 | ||
|
||
new_categories may now also be dict-like, in which case it | ||
specifies a mapping from old-categories to new. | ||
|
||
If it is list-like, all items must be unique and the number of | ||
items in the new categories must match the existing number of | ||
categories. | ||
|
||
If dict-like, categories not contained in the mapping are passed | ||
through. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I find this a bit unstructured with the versionchanged and the further explanation splitted. Suggestion (just copy paste, not edited sentences):
(only not fully sure how the versionchanges works in a list) |
||
|
||
.. warning:: | ||
|
||
Currently, Series are considered list like. In a future version | ||
of pandas they'll be considered dict-like. | ||
|
||
inplace : boolean (default: False) | ||
Whether or not to rename the categories inplace or return a copy of | ||
this categorical with renamed categories. | ||
|
@@ -896,11 +909,41 @@ def rename_categories(self, new_categories, inplace=False): | |
remove_categories | ||
remove_unused_categories | ||
set_categories | ||
|
||
Examples | ||
-------- | ||
>>> c = Categorical(['a', 'a', 'b']) | ||
>>> c.rename_categories([0, 1]) | ||
[0, 0, 1] | ||
Categories (2, int64): [0, 1] | ||
|
||
For dict-like ``new_categories``, extra keys are ignored and | ||
categories not in the dictionary are passed through | ||
|
||
>>> c.rename_categories({'a': 'A', 'c': 'C'}) | ||
[A, A, b] | ||
Categories (2, object): [A, b] | ||
|
||
Series are considered list-like here, so the *values* are used | ||
instead of the *index* | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we actually want this behaviour? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure. It’ll be a backwards incompatible change if we don’t treat Series as arrays so I think we should at least do this for now, maybe with a warning. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, I think this is fine. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback which part to you agree with? Warning that it'll change to dict-like in the future? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no I agree the current behavior is correct. we handle list-like the same. no warning is needed as this is expected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually, looking at this again. a Series should be just like a dict. This is a perf issue yes?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please revert the warning There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you explain? |
||
|
||
>>> c.rename_categories(pd.Series([0, 1], index=['a', 'b'])) | ||
[0, 0, 1] | ||
Categories (2, int64): [0, 1] | ||
""" | ||
inplace = validate_bool_kwarg(inplace, 'inplace') | ||
cat = self if inplace else self.copy() | ||
|
||
if is_dict_like(new_categories): | ||
is_series = isinstance(new_categories, ABCSeries) | ||
|
||
if is_series: | ||
msg = ("Treating Series 'new_categories' as a list-like and using " | ||
"the values. In a future version, 'rename_categories' will " | ||
"treat Series like a dictionary.\n" | ||
"For dict-like, use 'new_categories.to_dict()'\n" | ||
"For list-like, use 'new_categories.values'.") | ||
warn(msg, FutureWarning, stacklevel=2) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe convert the series to array (list-like), so then the rest of the code does not need to take care of it being a series or not There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sorry I think that was an example I added in the first commit of this PR, before we decided to treat Series as list-like. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this a breaking change at all? we simply did not support this before There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The old behavior required a list-like, and cat.rename(Series([0, 1])) to work, since it did! But we have a new feature that changes the behavior. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok I c, I think this was accidently supported before. ok so fine on the FutureWarning. |
||
if is_dict_like(new_categories) and not is_series: | ||
cat.categories = [new_categories.get(item, item) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just use map |
||
for item in cat.categories] | ||
else: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"lookup up" -> ""looked up"? (not sure what is the correct english conjugation)