Rename categories with Series #17982

Merged
merged 9 commits into from
Oct 26, 2017
Changes from 4 commits
31 changes: 30 additions & 1 deletion doc/source/whatsnew/v0.21.0.txt
@@ -239,6 +239,36 @@ Now, to find prices per store/product, we can simply do:
.pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
.unstack().round(2))


.. _whatsnew_0210.enhancements.reanme_categories:

``Categorical.rename_categories`` accepts a dict-like
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`Categorical.rename_categories` now accepts a dict-like argument for
``new_categories``. The previous categories are lookup up in the dictionary's
Review comment (Member):
"lookup up" -> "looked up"? (not sure what the correct English conjugation is)

keys and replaced if found. The behavior of missing and extra keys is the same
as in :meth:`DataFrame.rename`.

.. ipython:: python

c = pd.Categorical(['a', 'a', 'b'])
c.rename_categories({"a": "eh", "b": "bee"})

.. warning::

To assist with upgrading pandas, ``rename_categories`` treats ``Series`` as
list-like. Typically, they are considered to be dict-like, and in a future
version of pandas ``rename_categories`` will change to treat them as
dict-like.

.. ipython:: python
:okwarning:

c.rename_categories(pd.Series([0, 1], index=['a', 'c']))

Follow the warning message's recommendations.

See the :ref:`documentation <groupby.pipe>` for more.

.. _whatsnew_0210.enhancements.other:
@@ -267,7 +297,6 @@ Other Enhancements
- :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and is lazy in all cases. (:issue:`13918`, :issue:`17213`)
- :func:`Styler.where` has been implemented as a convenience for :func:`Styler.applymap`. (:issue:`17474`)
- :func:`MultiIndex.is_monotonic_decreasing` has been implemented. Previously returned ``False`` in all cases. (:issue:`16554`)
- :func:`Categorical.rename_categories` now accepts a dict-like argument as ``new_categories`` and only updates the categories found in that dict. (:issue:`17336`)
- :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names. (:issue:`14207`)
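The dict-like behavior described in the whatsnew entry above can be sketched as follows. This is a minimal illustration that assumes a pandas version at or after 0.21.0; the variable name `renamed` is chosen here for illustration and does not come from the PR.

```python
import pandas as pd

c = pd.Categorical(["a", "a", "b"])

# "a" is a key in the mapping, so it is renamed.
# "b" is not a key, so it is passed through unchanged.
# "c" is not an existing category, so that entry is ignored.
renamed = c.rename_categories({"a": "A", "c": "C"})

print(list(renamed.categories))  # ['A', 'b']
print(list(renamed))             # ['A', 'A', 'b']
print(list(c.categories))        # ['a', 'b'] (the original is not modified)
```

This mirrors the handling of missing and extra keys that the entry compares to `DataFrame.rename`.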
59 changes: 51 additions & 8 deletions pandas/core/categorical.py
@@ -866,11 +866,6 @@ def set_categories(self, new_categories, ordered=None, rename=False,
def rename_categories(self, new_categories, inplace=False):
""" Renames categories.

The new categories can be either a list-like dict-like object.
If it is list-like, all items must be unique and the number of items
in the new categories must be the same as the number of items in the
old categories.

Raises
------
ValueError
@@ -879,8 +874,26 @@ def rename_categories(self, new_categories, inplace=False):

Parameters
----------
new_categories : Index-like or dict-like (>=0.21.0)
The renamed categories.
new_categories : list-like or dict-like
The categories end up with
Review comment (Member):
this seems like an incomplete sentence ?


.. versionchanged:: 0.21.0

new_categories may now also be dict-like, in which case it
specifies a mapping from old-categories to new.

If it is list-like, all items must be unique and the number of
items in the new categories must match the existing number of
categories.

If dict-like, categories not contained in the mapping are passed
through.
Review comment (Member):
I find this a bit unstructured, with the versionchanged and the further explanation split apart.

Suggestion (just copy paste, not edited sentences):

- list-like: If it is list-like, all items must be unique and the number of
  items in the new categories must match the existing number of
  categories.
- .. versionadded:: 0.21.1 dict-like, in which case it specifies a mapping from old-categories to new. Categories not contained in the mapping are passed through.

.. warning::

   about series

(I'm just not fully sure how versionchanged works inside a list.)


.. warning::

Currently, Series are considered list like. In a future version
of pandas they'll be considered dict-like.

inplace : boolean (default: False)
Whether or not to rename the categories inplace or return a copy of
this categorical with renamed categories.
@@ -896,11 +909,41 @@ def rename_categories(self, new_categories, inplace=False):
remove_categories
remove_unused_categories
set_categories

Examples
--------
>>> c = Categorical(['a', 'a', 'b'])
>>> c.rename_categories([0, 1])
[0, 0, 1]
Categories (2, int64): [0, 1]

For dict-like ``new_categories``, extra keys are ignored and
categories not in the dictionary are passed through

>>> c.rename_categories({'a': 'A', 'c': 'C'})
[A, A, b]
Categories (2, object): [A, b]

Series are considered list-like here, so the *values* are used
instead of the *index*
Review comment (Member):
Do we actually want this behaviour?
Eg for Series.rename, a Series is seen as a dict-like ...

Review comment (Contributor Author):
Not sure. It’ll be a backwards incompatible change if we don’t treat Series as arrays so I think we should at least do this for now, maybe with a warning.

Review comment (Contributor):
I agree, I think this is fine.

Review comment (Contributor Author):
@jreback which part do you agree with? Warning that it'll change to dict-like in the future?

Review comment (Contributor):
no I agree the current behavior is correct. we handle list-like the same. no warning is needed as this is expected.

Review comment (Contributor):
actually, looking at this again.

a Series should be just like a dict. This is a perf issue yes?
do this. (once Index.map works we could simplify a bit)

In [12]: cat = pd.Categorical(['a', 'b', 'c', 'd'])
    ...: res = cat.rename_categories(pd.Series({'a': 4, 'b': 3, 'c': 2, 'd': 1}))
    ...: 
    ...: 

In [13]: cat
Out[13]: 
[a, b, c, d]
Categories (4, object): [a, b, c, d]

In [14]: res
Out[14]: 
[4, 3, 2, 1]
Categories (4, int64): [4, 3, 2, 1]

In [17]: pd.Series(cat.categories).map({'a': 4, 'b': 3, 'c': 2, 'd': 1}).values
Out[17]: array([4, 3, 2, 1])

In [19]: pd.Series(cat.categories).map({'a': 4, 'b': 3, 'c': 2, 'd': 1}).values
Out[19]: array([4, 3, 2, 1])

Review comment (Contributor):
please revert the warning

Review comment (Member):
Can you explain?
If we go for "Series -> dict-like" behaviour, this is a breaking change, and we need to use a warning for that.


>>> c.rename_categories(pd.Series([0, 1], index=['a', 'b']))
[0, 0, 1]
Categories (2, int64): [0, 1]
"""
         inplace = validate_bool_kwarg(inplace, 'inplace')
         cat = self if inplace else self.copy()

-        if is_dict_like(new_categories):
+        is_series = isinstance(new_categories, ABCSeries)
+
+        if is_series:
+            msg = ("Treating Series 'new_categories' as a list-like and using "
+                   "the values. In a future version, 'rename_categories' will "
+                   "treat Series like a dictionary.\n"
+                   "For dict-like, use 'new_categories.to_dict()'\n"
+                   "For list-like, use 'new_categories.values'.")
+            warn(msg, FutureWarning, stacklevel=2)
Review comment (Member):
Maybe convert the series to array (list-like), so then the rest of the code does not need to take care of it being a series or not

Review comment (Contributor Author):
If we go for "Series -> dict-like" behaviour, this is a breaking change, and we need to use a warning for that.

Sorry I think that was an example I added in the first commit of this PR, before we decided to treat Series as list-like.

Review comment (Contributor):
why is this a breaking change at all? we simply did not support this before

Review comment (Contributor Author):
The old behavior required a list-like, and Series are list like. It's not unreasonable for a user to expect

cat.rename_categories(Series([0, 1]))

to work, since it did! But we have a new feature that changes the behavior.

Review comment (Contributor):
ok I see, I think this was accidentally supported before. ok so fine on the FutureWarning.
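The two conversions recommended by the warning message can be sketched as below. This is an illustration rather than code from the PR: it assumes a pandas version at or after 0.21.0, and it never passes the Series itself, so the result should not depend on whether the installed pandas treats a Series as list-like or dict-like. The names `s`, `as_mapping` and `as_list` are chosen for illustration.

```python
import pandas as pd

c = pd.Categorical(["a", "a", "b"])
s = pd.Series([0, 1], index=["a", "c"])

# Dict-like intent: the Series index supplies the old category names.
# Only "a" matches an existing category; "b" is passed through.
as_mapping = c.rename_categories(s.to_dict())
print(list(as_mapping.categories))  # [0, 'b']

# List-like intent: the values replace the categories positionally,
# so the index of the Series is ignored.
as_list = c.rename_categories(s.values)
print(list(as_list.categories))     # [0, 1]
```

Being explicit in this way states which interpretation the caller wants, which is the ambiguity the thread above is debating.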

+        if is_dict_like(new_categories) and not is_series:
             cat.categories = [new_categories.get(item, item)
Review comment (Contributor):
just use map

                               for item in cat.categories]
         else:
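One point relevant to the "just use map" suggestion: a plain `Series.map` with a dict does not pass unmapped items through, whereas the `.get(item, item)` comprehension in the diff does. The sketch below compares the two on a partial mapping; the names `categories`, `mapping`, `via_get` and `via_map` are chosen for illustration and are not taken from the PR.

```python
import pandas as pd

categories = pd.Index(["a", "b", "c"])
mapping = {"a": 4, "b": 3}  # no entry for "c"

# The comprehension from the diff: unmapped items fall back to themselves.
via_get = [mapping.get(item, item) for item in categories]
print(via_get)  # [4, 3, 'c']

# Series.map with a plain dict: unmapped items become NaN.
via_map = pd.Series(categories).map(mapping)
print(via_map.isna().tolist())  # [False, False, True]
```

A map-based implementation would therefore need an extra step to restore the pass-through behavior the docstring describes.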
12 changes: 12 additions & 0 deletions pandas/tests/test_categorical.py
@@ -1203,6 +1203,18 @@ def test_rename_categories(self):
         with pytest.raises(ValueError):
             cat.rename_categories([1, 2])

+    def test_rename_categories_series(self):
+        # https://github.com/pandas-dev/pandas/issues/17981
+        c = pd.Categorical(['a', 'b'])
+        xpr = "Treating Series 'new_categories' as a list-like "
+        with tm.assert_produces_warning(FutureWarning) as rec:
+            result = c.rename_categories(pd.Series([0, 1]))
+
+        assert len(rec) == 1
+        assert xpr in str(rec[0].message)
+        expected = pd.Categorical([0, 1])
+        tm.assert_categorical_equal(result, expected)
+
     def test_rename_categories_dict(self):
         # GH 17336
         cat = pd.Categorical(['a', 'b', 'c', 'd'])
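The pattern exercised by `test_rename_categories_series` (emit a `FutureWarning`, then fall back to the values) can be reproduced outside the pandas test suite with the standard `warnings` module. The helper `rename_categories_compat` below is hypothetical: it is a stand-in that mimics the behavior added in this PR and is not part of pandas.

```python
import warnings

import pandas as pd


def rename_categories_compat(cat, new_categories):
    """Hypothetical stand-in for the PR's handling of Series input."""
    if isinstance(new_categories, pd.Series):
        warnings.warn(
            "Treating Series 'new_categories' as a list-like and using "
            "the values.",
            FutureWarning,
            stacklevel=2,
        )
        # Fall back to the positional (list-like) interpretation.
        new_categories = new_categories.tolist()
    return cat.rename_categories(new_categories)


c = pd.Categorical(["a", "b"])
with warnings.catch_warnings(record=True) as rec:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = rename_categories_compat(c, pd.Series([0, 1]))

messages = [str(w.message) for w in rec if issubclass(w.category, FutureWarning)]
print(any("Treating Series" in m for m in messages))  # True
print(list(result.categories))                        # [0, 1]
```

The check looks for the message among the recorded warnings instead of asserting an exact count, since other libraries may emit unrelated warnings in the same block.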