
DOC: Improve the docstrings of CategoricalIndex.map #20286

Merged
Changes from 18 commits
57 changes: 50 additions & 7 deletions pandas/core/arrays/categorical.py
@@ -1080,20 +1080,63 @@ def remove_unused_categories(self, inplace=False):
return cat

def map(self, mapper):
"""Apply mapper function to its categories (not codes).
"""
Map categories using input correspondence (dict, Series, or function).

Maps the categories to new categories. If the mapping correspondence is
Member

This is simplified to the point that I think you can now just say "If the mapping correspondence is one-to-one the result is a ..." in the second sentence

a bijection (maps each original category to a different new category)
Member

Maybe we can use something like "one-to-one mapping" instead?

the result is a :class:`~pandas.Categorical` which has the same order
property as the original, otherwise a :class:`~pandas.Index` is
returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
Member

NaN should be in backticks, so `NaN`

be returned.

Parameters
----------
mapper : callable
Function to be applied. When all categories are mapped
to different categories, the result will be Categorical which has
the same order property as the original. Otherwise, the result will
be np.ndarray.
mapper : function, dict, or Series
Mapping correspondence.

Returns
-------
applied : Categorical or Index.
pandas.Categorical or pandas.Index
Mapped categorical.

See Also
--------
CategoricalIndex.map : Apply a mapping correspondence on a
:class:`~pandas.CategoricalIndex`.
Index.map : Apply a mapping correspondence on an
:class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
:class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
:class:`~pandas.Series`.

Examples
--------
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
[a, b, c]
Categories (3, object): [a, b, c]
>>> cat.map(lambda x: x.upper())
[A, B, C]
Categories (3, object): [A, B, C]
>>> cat.map({'a': 'first', 'b': 'second', 'c': 'third'})
[first, second, third]
Categories (3, object): [first, second, third]

If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
Member

Similar comment with 'NaN'

the result is an :class:`~pandas.Index`:

>>> cat.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')
"""
new_categories = self.categories.map(mapper)
try:
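
The rule the new docstring describes (one-to-one mapping keeps a categorical result, anything else falls back to a plain `Index`) can be modelled in a few lines of plain Python. This is only a sketch of the documented behaviour, not the pandas implementation: `map_categories`, `is_one_to_one` and `result_kind` are hypothetical helper names, and a `Series` mapper is not handled here.

```python
import math


def map_categories(categories, mapper):
    """Hypothetical helper: apply a callable or a dict to each category."""
    if callable(mapper):
        return [mapper(c) for c in categories]
    # dict: categories missing from the dict become NaN, as the docstring says
    return [mapper.get(c, math.nan) for c in categories]


def is_one_to_one(new_categories):
    """True when every new category is distinct and none is NaN."""
    has_nan = any(isinstance(c, float) and math.isnan(c) for c in new_categories)
    return not has_nan and len(set(new_categories)) == len(new_categories)


def result_kind(categories, mapper):
    """Name the result type the docstring promises for this mapper."""
    new_categories = map_categories(categories, mapper)
    return "Categorical" if is_one_to_one(new_categories) else "Index"


cats = ["a", "b", "c"]
upper_kind = result_kind(cats, str.upper)
collapsed_kind = result_kind(cats, {"a": "first", "b": "second", "c": "first"})
unmapped_kind = result_kind(cats, {"a": "first", "b": "second"})
```

The three calls mirror the three docstring examples: a one-to-one function, a dict that collapses two categories, and a dict that leaves one category unmapped.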
7 changes: 4 additions & 3 deletions pandas/core/indexes/base.py
@@ -3352,22 +3352,23 @@ def groupby(self, values):
return result

def map(self, mapper, na_action=None):
"""Map values of Series using input correspondence
"""
Map values using input correspondence (a dict, Series, or function).

Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping function
mapping correspondence.

Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.

Member

It would be good to add a small example here

Contributor Author

This is outside the original scope of the contribution, see first PR comment.

"""

from .multi import MultiIndex
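
A small example of the kind requested in the review for `Index.map` might look like the following. It is a sketch rather than text from the PR; the variable names are illustrative, and it assumes a pandas version in which `Index.map` accepts a dict, as the updated `mapper` description states.

```python
import pandas as pd

idx = pd.Index([1, 2, 3])

# A function is applied to every value of the index.
doubled = idx.map(lambda x: x * 2)

# A dict is used as a lookup table keyed by the index values.
labelled = idx.map({1: "a", 2: "b", 3: "c"})

# A function returning tuples with more than one element gives a MultiIndex.
pairs = idx.map(lambda x: (x, x * 2))
```

The last call illustrates the sentence in the Returns section about tuples producing a `MultiIndex`.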
55 changes: 48 additions & 7 deletions pandas/core/indexes/category.py
@@ -660,20 +660,61 @@ def is_dtype_equal(self, other):
take_nd = take

def map(self, mapper):
"""Apply mapper function to its categories (not codes).
"""
Map values using input correspondence (a dict, Series, or function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence is a bijection (maps each
original category to a different new category) the result is a
:class:`~pandas.CategoricalIndex` which has the same order property as
the original, otherwise an :class:`~pandas.Index` is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
Member

'NaN'

be returned.

Parameters
----------
mapper : callable
Function to be applied. When all categories are mapped
to different categories, the result will be a CategoricalIndex
which has the same order property as the original. Otherwise,
the result will be a Index.
mapper : function, dict, or Series
Mapping correspondence.

Returns
-------
applied : CategoricalIndex or Index
pandas.CategoricalIndex or pandas.Index
Mapped index.

See Also
Member

Since all of these methods are similar why don't you add a See Also in all of the instances that refer to one another?

Contributor Author

This is outside the original scope of the contribution, see first PR comment.

--------
Index.map : Apply a mapping correspondence on an
:class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
:class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
:class:`~pandas.Series`.

Examples
Member

Good to see Examples here - why not add to other map implementations as well?

Contributor Author

This is outside the original scope of the contribution, see first PR comment.

--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
'second', 'third'], ordered=False, dtype='category')
Member

@jorisvandenbossche @l736x the point I was trying to make was that since we mention that the ordering property gets retained with a mapping, that we should have an example for an ordered CategoricalIndex explicitly showing this. It could be something as simple as:

Note that one-to-one mappings will retain the ordering of the CategoricalIndex
idx = pd.CategoricalIndex(['a', 'b', 'c'], ordered=True)
idx.map({'a': 3, 'b': 2, 'c': 1})

Just my $.02 though @jorisvandenbossche I'm good to go whenever you want to merge

Member

It would indeed be good to have an example for this. But maybe just adding ordered=True to the example above is enough? (without adding another example)

Member

Works for me


If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')
"""
return self._shallow_copy_with_infer(self.values.map(mapper))
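
The ordered case discussed in the review thread can be checked directly. The snippet below is a sketch of that check, not wording from the merged docstring; it reuses the reviewer's suggested mapping and the non-bijective dict from the docstring examples, and the variable names are illustrative.

```python
import pandas as pd

idx = pd.CategoricalIndex(["a", "b", "c"], ordered=True)

# One-to-one mapping: the result is still a CategoricalIndex and keeps
# the ordered flag; the categories are replaced by their mapped values.
mapped = idx.map({"a": 3, "b": 2, "c": 1})

# Two categories collapse onto "first", so a plain Index comes back.
collapsed = idx.map({"a": "first", "b": "second", "c": "first"})
```

Inspecting `mapped.ordered` and `mapped.categories` shows whether the order property survived the mapping.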

17 changes: 9 additions & 8 deletions pandas/core/series.py
@@ -2831,25 +2831,26 @@ def unstack(self, level=-1, fill_value=None):

def map(self, arg, na_action=None):
"""
Map values of Series using input correspondence (which can be
a dict, Series, or function)
Map values of Series using input correspondence (a dict, Series, or
function).
Parameters
----------
arg : function, dict, or Series
Mapping correspondence.
Member

I think we should add here some details on how a dict and Series are handled.

Contributor Author

This is outside the original scope of the contribution, see first PR comment.

na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping function
mapping correspondence.
Returns
-------
y : Series
same index as caller
Same index as caller.
Examples
--------
Map inputs to outputs (both of type `Series`)
Map inputs to outputs (both of type `Series`):
>>> x = pd.Series([1,2,3], index=['one', 'two', 'three'])
>>> x
@@ -2900,9 +2901,9 @@ def map(self, arg, na_action=None):
See Also
--------
Series.apply: For applying more complex functions on a Series
DataFrame.apply: Apply a function row-/column-wise
DataFrame.applymap: Apply a function elementwise on a whole DataFrame
Series.apply : For applying more complex functions on a Series.
DataFrame.apply : Apply a function row-/column-wise.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.
Member

Can you also add Series.replace?

Contributor Author

This is outside the original scope of the contribution, see first PR comment.
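
Two of the review requests on `Series.map` (details on how a dict and a `Series` are handled, and a pointer to `Series.replace`) can be illustrated together. The following is a sketch under the assumption of a default integer index; the sample data and variable names are made up for illustration and do not come from the PR.

```python
import pandas as pd

s = pd.Series(["cat", "dog", None, "rabbit"])

# dict: values that are not keys of the dict become NaN.
by_dict = s.map({"cat": "kitten", "dog": "puppy"})

# Series: values are looked up in the mapper's index, with the same NaN rule.
lookup = pd.Series(["kitten", "puppy"], index=["cat", "dog"])
by_series = s.map(lookup)

# Function with na_action='ignore': NA values are propagated without
# being passed to the function.
by_func = s.map("I am a {}".format, na_action="ignore")

# Series.replace, by contrast, leaves values without a match unchanged.
replaced = s.replace({"cat": "kitten", "dog": "puppy"})
```

The practical difference is visible on `"rabbit"`: `map` turns it into NaN because the dict has no entry for it, while `replace` keeps it as is.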

Notes
-----