Skip to content

DOC: Improve the docstrings of CategoricalIndex.map #20286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

l736x
Copy link
Contributor

@l736x l736x commented Mar 11, 2018

Hi core devs,

I worked on the CategoricalIndex.map docstring during the sprint in Paris.
Although we concentrated on this function, while doing that we couldn't help improving (marginally!) also the docstrings for Categorical.map, Index.map and Series.map.
We probably should have refrained from doing it, knowing that other cities were working on those, but it was difficult to change one and leaving the style of the others incoherent with the first.
As a result I'm proposing here a pull request that combines all the changes.
The validation script passes for CategoricalIndex.map but not for the other cases.
If that is not acceptable I can change that, but I will probably need some help with the git part.

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################### Docstring (pandas.CategoricalIndex.map)  ###################
################################################################################

Map index values using input correspondence (a dict, Series, or
function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence maps each original category
to a different new category the result is a CategoricalIndex which has
the same order property as the original, otherwise an Index is
returned.

If a dictionary or Series is used any unmapped category is mapped to
NA. Note that if this happens an Index will be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
CategoricalIndex or Index
    Mapped index.

See Also
--------
Index.map : Apply a mapping correspondence on an Index.
Series.map : Apply a mapping correspondence on a Series.
Series.apply : Apply more complex functions on a Series.

Examples
--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

If the mapping is not bijective an Index is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a dictionary is used, all unmapped categories are mapped to NA and
the result is an Index:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.CategoricalIndex.map" correct. :)

################################################################################
###################### Docstring (pandas.Categorical.map) ######################
################################################################################

Map categories (not codes) using input correspondence (a dict,
Series, or function).

Maps the categories to new categories. If the mapping
correspondence maps each original category to a different new category
the result is a Categorical which has the same order property as
the original, otherwise an np.ndarray is returned.

If a dictionary or Series is used any unmapped category is mapped to
NA. Note that if this happens an np.ndarray will be returned.

Parameters
----------
mapper : callable
    Function to be applied.

Returns
-------
applied : Categorical or Index.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	See Also section not found
	No examples section found

################################################################################
######################### Docstring (pandas.Index.map) #########################
################################################################################

Map values using input correspondence (a dict, Series, or function).

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.
na_action : {None, 'ignore'}
    If 'ignore', propagate NA values, without passing them to the
    mapping correspondence.

Returns
-------
applied : Union[Index, MultiIndex], inferred
    The output of the mapping function applied to the index.
    If the function returns a tuple with more than one element
    a MultiIndex will be returned.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	See Also section not found
	No examples section found

################################################################################
######################## Docstring (pandas.Series.map)  ########################
################################################################################

Map values of Series using input correspondence (a dict, Series, or
function).

Parameters
----------
arg : function, dict, or Series
    Mapping correspondence.
na_action : {None, 'ignore'}
    If 'ignore', propagate NA values, without passing them to the
    mapping correspondence.

Returns
-------
y : Series
    Same index as caller.

Examples
--------

Map inputs to outputs (both of type `Series`):

>>> x = pd.Series([1,2,3], index=['one', 'two', 'three'])
>>> x
one      1
two      2
three    3
dtype: int64

>>> y = pd.Series(['foo', 'bar', 'baz'], index=[1,2,3])
>>> y
1    foo
2    bar
3    baz

>>> x.map(y)
one   foo
two   bar
three baz

If `arg` is a dictionary, return a new Series with values converted
according to the dictionary's mapping:

>>> z = {1: 'A', 2: 'B', 3: 'C'}

>>> x.map(z)
one   A
two   B
three C

Use na_action to control whether NA values are affected by the mapping
function.

>>> s = pd.Series([1, 2, 3, np.nan])

>>> s2 = s.map('this is a string {}'.format, na_action=None)
0    this is a string 1.0
1    this is a string 2.0
2    this is a string 3.0
3    this is a string nan
dtype: object

>>> s3 = s.map('this is a string {}'.format, na_action='ignore')
0    this is a string 1.0
1    this is a string 2.0
2    this is a string 3.0
3                     NaN
dtype: object

See Also
--------
Series.apply : For applying more complex functions on a Series.
DataFrame.apply : Apply a function row-/column-wise.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

Notes
-----
When `arg` is a dictionary, values in Series that are not in the
dictionary (as keys) are converted to ``NaN``. However, if the
dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
provides a method for default values), then this default is used
rather than ``NaN``:

>>> from collections import Counter
>>> counter = Counter()
>>> counter['bar'] += 1
>>> y.map(counter)
1    0
2    1
3    0
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No summary found (a short summary in a single line should be present at the beginning of the docstring)
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 31, in pandas.Series.map
Failed example:
    y
Expected:
    1    foo
    2    bar
    3    baz
Got:
    1    foo
    2    bar
    3    baz
    dtype: object
**********************************************************************
Line 36, in pandas.Series.map
Failed example:
    x.map(y)
Expected:
    one   foo
    two   bar
    three baz
Got:
    one      foo
    two      bar
    three    baz
    dtype: object
**********************************************************************
Line 46, in pandas.Series.map
Failed example:
    x.map(z)
Expected:
    one   A
    two   B
    three C
Got:
    one      A
    two      B
    three    C
    dtype: object
**********************************************************************
Line 56, in pandas.Series.map
Failed example:
    s2 = s.map('this is a string {}'.format, na_action=None)
Expected:
    0    this is a string 1.0
    1    this is a string 2.0
    2    this is a string 3.0
    3    this is a string nan
    dtype: object
Got nothing
**********************************************************************
Line 63, in pandas.Series.map
Failed example:
    s3 = s.map('this is a string {}'.format, na_action='ignore')
Expected:
    0    this is a string 1.0
    1    this is a string 2.0
    2    this is a string 3.0
    3                     NaN
    dtype: object
Got nothing

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

to different categories, the result will be Categorical which has
the same order property as the original. Otherwise, the result will
be np.ndarray.
Function to be applied.

Returns
-------
applied : Categorical or Index.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a period at the end of the type. Also should namespace pandas types (except for DataFrame and Series) so pandas.Categorial or pandas.Index

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the original scope of the contribution, see first PR comment.

@@ -1081,20 +1081,26 @@ def remove_unused_categories(self, inplace=False):
return cat

def map(self, mapper):
"""Apply mapper function to its categories (not codes).
"""
Map categories (not codes) using input correspondence (a dict,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to fit on one line - can you condense?


Maps the categories to new categories. If the mapping
correspondence maps each original category to a different new category
the result is a Categorical which has the same order property as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can link to Categorical here so :class:`~pandas.Categorical'

Map categories (not codes) using input correspondence (a dict,
Series, or function).

Maps the categories to new categories. If the mapping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is mostly a copy / paste of the original but I find the wording rather confusing - any thoughts on how to make the explanation clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, this is not very clear but I don't know how to make it better. In my understanding a Categorical resembles in some respect a MultiIndex: the categories are a little bit like the levels and the codes are like the labels. Without that analogy in mind, any wording is going to be confusing.
What about something like that (a bit verbose):

"""
Maps the categories (the values of the elements of the array, not the
codes that represent the positions) to new categories. If the mapping
correspondence ...
"""

I added backticks to the codes to make it clear that we are talking about an attribute and not just a generic word.
What do you think? I'm open to suggestions!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd What is the aspect you find confusing? The first "Maps the categories to new categories" or the explanation after it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have news on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line still need to be consolidated to one line.

As to the Extended Summary I think saying things like "mapping correspondence maps each original category to a different new category" is just very confusing. Could we not be more concise and just say something like "Maps the categories to new categories, retaining any ordering. Missing entries will be replaced with an np.ndarray containing NaN."?

I think that is the what the summary is trying to say now but it's not entirely clear. Adding Examples to show how the mapping works and how missing data is handled would make things a lot clearer.

the result is a Categorical which has the same order property as
the original, otherwise an np.ndarray is returned.

If a dictionary or Series is used any unmapped category is mapped to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When referencing Python types use the actual name and place in backticks, so `dict` here. For pandas types can also link to the class so :class:`~pandas.Series`

the original, otherwise an np.ndarray is returned.

If a dictionary or Series is used any unmapped category is mapped to
NA. Note that if this happens an np.ndarray will be returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but use `NaN` instead of NA

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please notice that NA is widely used across the pandas documentation:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html#pandas.Series.sum

We asked ourselves whether NA or NaN was more appropriate and decided for NA that seems more used than NaN.
What does the official guideline say on the subject?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the good answer here, that is something we should discuss (in some places I see we use "NA/NaN" ..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, shall I go for NA or NaN?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use NA when referring to the concept of missing data and NaN to refer to the actual missing value. Here you want the latter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified it.
But please notice that when we have timestamps the missing value is a NaT.

In [42]: idx = pd.CategoricalIndex(['a', 'b', 'c'])

In [43]: cat = idx.values

In [44]: type(cat)
Out[44]: pandas.core.arrays.categorical.Categorical

In [45]: cat.map({'a': pd.Timestamp('2000-01-03'), 'b': pd.Timestamp('2000-01-02')})
Out[45]: DatetimeIndex(['2000-01-03', '2000-01-02', 'NaT'], dtype='datetime64[ns]', freq=None)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But please notice that when we have timestamps the missing value is a NaT.

Yes, we need some better guidelines on this, but you can ignore that inconsistency for now


See Also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all of these methods are similar why don't you add a See Also in all of the instances that refer to one another?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the original scope of the contribution, see first PR comment.

Series.map : Apply a mapping correspondence on a Series.
Series.apply : Apply more complex functions on a Series.

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see Examples here - why not add to other map implementations as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the original scope of the contribution, see first PR comment.

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a dictionary is used, all unmapped categories are mapped to NA and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as before (may be a few other places) use `dict` instead of dictionary and `NaN` instead of NA

@WillAyd
Copy link
Member

WillAyd commented Mar 11, 2018

Make sure you review the validation script - there are definitely some errors with your examples in Series.map that need to be accounted for

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Added some additional comments

@@ -661,20 +661,58 @@ def is_dtype_equal(self, other):
take_nd = take

def map(self, mapper):
"""Apply mapper function to its categories (not codes).
"""
Map index values using input correspondence (a dict, Series, or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you leave out 'index' it fits on one line?


Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a small example here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the original scope of the contribution, see first PR comment.

DataFrame.applymap: Apply a function elementwise on a whole DataFrame
Series.apply : For applying more complex functions on a Series.
DataFrame.apply : Apply a function row-/column-wise.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add Series.replace ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the original scope of the contribution, see first PR comment.


Parameters
----------
arg : function, dict, or Series
Mapping correspondence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add here some details on how a dict and Series are handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the original scope of the contribution, see first PR comment.

@l736x
Copy link
Contributor Author

l736x commented Mar 12, 2018

Thanks for the comments. I will work on them this evening probably.
I would like to point out though that, as stated in the topmost comment of the pull request, I worked on CategoricalIndex.map docstring. The other docstrings were just marginally improved on the way, sometimes in very trivial ways, like, for example, the first sentence being on the same line as """ or a dot in the wrong place.
I never wanted to fully amend the docs of the other three methods (Categorical.map, Index.map, Series.map). The time to do that goes beyond what I have at the moment. Not considering that some of the methods we are talking about were supposed to be improved by other participants to the sprint.
I made sure that the validation script runs correctly on the CategoricalIndex.map, but I didn't require that the script run on the others.
Can we consider that the present pull request makes the CategoricalIndex.map compliant with the guidelines and incidentally improves (but does not make compliant) the other docstrings?
I can change the title if you want.

@jorisvandenbossche jorisvandenbossche changed the title DOC: Improve the docstrings of CategoricalIndex.map(), Categorical.map(), Index.map(), Series.map() DOC: Improve the docstrings of CategoricalIndex.map Mar 12, 2018
@jorisvandenbossche
Copy link
Member

@l736x Ah, sorry, I didn't fully read that comment at the top. I have renamed the PR to make that more clear :)

@jorisvandenbossche
Copy link
Member

For me it's fine both ways: keep it focused on CategoricalIndex (and you can ignore the comments in the others pars), or it's also fine to expand the scope of the PR to try to fo all map methods. As you prefer.

@l736x
Copy link
Contributor Author

l736x commented Mar 12, 2018

Just to clarify: I'm not saying that the pull request cannot be improved, I completely agree that also Series.map, Index.map and Categorical.map need a rework.
I'm just saying that this goes beyond the scope of the task we worked on during the sprint and that I can not commit to this larger rework at the moment.

@l736x
Copy link
Contributor Author

l736x commented Mar 12, 2018

I'm more willing to work on the other map methods in a separate pull request to avoid delaying this one, if you are all ok with this.

@jorisvandenbossche
Copy link
Member

That's fine!

@l736x
Copy link
Contributor Author

l736x commented Mar 17, 2018

Ok, I thing I'm getting out of my mind.
I was editing the pull request to take into account your comments and all of a sudden I cannot run anymore the examples I wrote in the docstring of CategoricalIndex.map.
I honestly don't understand, although I suspect I'm doing something stupid somewhere.
I need help!

Can you tell me if at the moment this code runs correctly?:

idx = pd.CategoricalIndex(['a', 'b', 'c'])
idx.map({'a': 'first', 'b': 'second', 'c': 'third'})

According to the examples it should give:

CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

I swear it was working last saturday, but at the moment I get:

In [1]: import pandas as pd

In [2]: idx = pd.CategoricalIndex(['a', 'b', 'c'])

In [3]: idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c78c3647d1e7> in <module>()
----> 1 idx.map({'a': 'first', 'b': 'second', 'c': 'third'})

~/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/indexes/category.py in map(self, mapper)
    639 
    640         """
--> 641         return self._shallow_copy_with_infer(self.values.map(mapper))
    642 
    643     def delete(self, loc):

~/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/categorical.py in map(self, mapper)
   1138 
   1139         """
-> 1140         new_categories = self.categories.map(mapper)
   1141         try:
   1142             return self.from_codes(self._codes.copy(),

~/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/indexes/base.py in map(self, mapper)
   2882         """
   2883         from .multi import MultiIndex
-> 2884         mapped_values = self._arrmap(self.values, mapper)
   2885         attributes = self._get_attributes_dict()
   2886         if mapped_values.size and isinstance(mapped_values[0], tuple):

pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.arrmap_object()

TypeError: 'dict' object is not callable

By inspecting the code in algos_common_helper.pxi I would guess that it should indeed fail. (The file did not change recently as far as I can see.)

The troubling thing is that even a simple Index is not working with .map(dict), while from the docstring it should.

idx = pd.Index(['a', 'b', 'c'])
idx.map({'a': 'first', 'b': 'second', 'c': 'third'})

Any thoughts?

@WillAyd
Copy link
Member

WillAyd commented Mar 17, 2018

Typically if you get errors from the Cython layer you need to rebuild the C extensions, so try running

python setup.py build_ext --inplace

And see if that resolves

@l736x
Copy link
Contributor Author

l736x commented Mar 17, 2018

It says everything is up-to-date and does not rebuild anything. :(
Can you check if the last two commands I wrote in the previous comment work for you?

@l736x
Copy link
Contributor Author

l736x commented Mar 17, 2018

Ok, I got it: for a reason that I don't understand I was picking up the anaconda default pandas instead of the branch I was editing. Problem solved, back to my pull request.
Sorry for the noise!

@l736x
Copy link
Contributor Author

l736x commented Mar 17, 2018

I updated the pull request.
In the end I added also the missing sections for Categorical.map. In the current state also that method is now docstring-compliant.

About your proposal for the Extended Summary: "Maps the categories to new categories, retaining any ordering. Missing entries will be replaced with an np.ndarray containing NaN."
To me it is not clear what ordering the sentence is referring to: the categorical or the properties?
It is not clear either where the entries are missing: in the categorical? in the mapping?
Moreover we are not replacing single missing entries with ndarrays, we are changing the type of the output according to whether there are missing entries or not.
Personally I don't see the reason for being too concise in the Extended Summary, I think that if there is a catch in the function it is better to put it there rather than in the example section that is far below and might not be read at all.

Let me know if you like this version better.

Here the new validation script outputs:

################################################################################
################### Docstring (pandas.CategoricalIndex.map)  ###################
################################################################################

Map values using input correspondence (a dict, Series, or function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence is a bijection (maps each
original category to a different new category) the result is a
:class:`~pandas.CategoricalIndex` which has the same order property as
the original, otherwise an :class:`~pandas.Index` is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.CategoricalIndex or pandas.Index
    Mapped index.

See Also
--------
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.CategoricalIndex.map" correct. :)
################################################################################
###################### Docstring (pandas.Categorical.map) ######################
################################################################################

Map categories using input correspondence (dict, Series, or function).

Maps the categories to new categories. If the mapping correspondence is
a bijection (maps each original category to a different new category)
the result is a :class:`~pandas.Categorical` which has the same order
property as the original, otherwise a :class:`~pandas.Index` is
returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.Categorical or pandas.Index
    Mapped categorical.

See Also
--------
CategoricalIndex.map : Apply a mapping correspondence on a
    :class:`~pandas.CategoricalIndex`.
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
[a, b, c]
Categories (3, object): [a, b, c]
>>> cat.map(lambda x: x.upper())
[A, B, C]
Categories (3, object): [A, B, C]
>>> cat.map({'a': 'first', 'b': 'second', 'c': 'third'})
[first, second, third]
Categories (3, object): [first, second, third]

If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> cat.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Categorical.map" correct. :)

@WillAyd
Copy link
Member

WillAyd commented Mar 17, 2018

I would also say that if the documentation is not clear to you as the writer (or in this case editor) then it would not be clear to the general reader either, hence the request for an update. I agree with you that examples would be nice - given the docstring before was certainly trying to call out something about how the ordering and missing values were being handled can you add examples to illustrate those caveats? Perhaps then a better wording will become clear to you.

FWIW I wouldn't use words that fall under a specific technical domain like "bijection" - I think the concept(s) can be explained in general terms so again try adding at least one example each highlighting the sorting and missing value nuances and see what that yields

@l736x
Copy link
Contributor Author

l736x commented Mar 18, 2018

@WillAyd I did! Did you read the new docstring?

Better explanation:

If the mapping correspondence is
a bijection (maps each original category to a different new category) ...

Example (in both Categorical and CategoricalIndex):

If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

Personally I'm happy with this version. I took into consideration your remarks, I explained why I don't agree with the correction that you proposed and I gave you a new version that to me is clear enough.
If you have other comments, please be sure to read the current version first.

[edited for a misspelling]

@codecov
Copy link

codecov bot commented Mar 18, 2018

Codecov Report

Merging #20286 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #20286   +/-   ##
=======================================
  Coverage    91.8%    91.8%           
=======================================
  Files         152      152           
  Lines       49215    49215           
=======================================
  Hits        45181    45181           
  Misses       4034     4034
Flag Coverage Δ
#multiple 90.18% <ø> (ø) ⬆️
#single 41.84% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 96.21% <ø> (ø) ⬆️
pandas/core/series.py 93.84% <ø> (ø) ⬆️
pandas/core/indexes/category.py 97.3% <ø> (ø) ⬆️
pandas/core/indexes/base.py 96.68% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01882ba...ecbaca0. Read the comment docs.

@jorisvandenbossche
Copy link
Member

FWIW I wouldn't use words that fall under a specific technical domain like "bijection" - I think the concept(s) can be explained in general terms so again try adding at least one example each highlighting the sorting and missing value nuances and see what that yields

I agree with trying to avoid terms like "bijection", I don't think many people know what it it (although you directly explain it here ..).
For the ordering, I find the current version clear (it is not about the ordering of the mapping, it is the returned categorical that keeps the ordering property).

@WillAyd if there is still something else unclear to you in the current version, can you add some inline comments?

Map categories using input correspondence (dict, Series, or function).

Maps the categories to new categories. If the mapping correspondence is
a bijection (maps each original category to a different new category)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can use something like "one-to-one mapping" instead ?

ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
'second', 'third'], ordered=False, dtype='category')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche @l736x the point I was trying to make was that since we mention that the ordering property gets retained with a mapping, that we should have an example for an ordered CategoricalIndex explicitly showing this. It could be something as simple as:

Note that one-to-one mappings will retain the ordering of the CategoricalIndex
idx = pd.CategoricalIndex(['a,'b','c'], ordered=True)
idx.map({'a': 3, 'b': 2, 'c': 1})

Just my $.02 though @jorisvandenbossche I'm good to go whenever you want to merge

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would indeed be good to have an example for this. But maybe just adding ordered=True to the example above is enough? (without adding another example)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

@l736x
Copy link
Contributor Author

l736x commented Mar 20, 2018

I replaced the word "bijection" with "one-to-one mapping" as suggested. You are both right, this is better.

@WillAyd, sorry I didn't understand that your comment was about a new example. I agree it's a good idea so I added one for both Categorical.map and CategoricalIndex.map.

Let me know if you are ok with the present version or you need other fixes.

################################################################################
################### Docstring (pandas.CategoricalIndex.map)  ###################
################################################################################

Map values using input correspondence (a dict, Series, or function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence is a one-to-one mapping (maps
each original category to a different new category) the result is a
:class:`~pandas.CategoricalIndex` which has the same order property as
the original, otherwise an :class:`~pandas.Index` is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.CategoricalIndex or pandas.Index
    Mapped index.

See Also
--------
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

The ordering of the categories is preserved by the map:

>>> idx = pd.CategoricalIndex(['a', 'b', 'c'], ordered=True)
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=True, dtype='category')
>>> idx.map({'a': 3, 'b': 2, 'c': 1})
CategoricalIndex([3, 2, 1], categories=[3, 2, 1], ordered=True,
                 dtype='category')

If the mapping is not one-to-one an :class:`~pandas.Index` is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.CategoricalIndex.map" correct. :)
################################################################################
###################### Docstring (pandas.Categorical.map) ######################
################################################################################

Map categories using input correspondence (dict, Series, or function).

Maps the categories to new categories. If the mapping correspondence is
a one-to-one mapping (maps each original category to a different new
category) the result is a :class:`~pandas.Categorical` which has the
same order property as the original, otherwise a :class:`~pandas.Index`
is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.Categorical or pandas.Index
    Mapped categorical.

See Also
--------
CategoricalIndex.map : Apply a mapping correspondence on a
    :class:`~pandas.CategoricalIndex`.
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
[a, b, c]
Categories (3, object): [a, b, c]
>>> cat.map(lambda x: x.upper())
[A, B, C]
Categories (3, object): [A, B, C]
>>> cat.map({'a': 'first', 'b': 'second', 'c': 'third'})
[first, second, third]
Categories (3, object): [first, second, third]

The ordering of the categories is preserved by the map:

>>> cat = pd.Categorical(['a', 'b', 'c'], ordered=True)
>>> cat
[a, b, c]
Categories (3, object): [a < b < c]
>>> cat.map({'a': 3, 'b': 2, 'c': 1})
[3, 2, 1]
Categories (3, int64): [3 < 2 < 1]

If the mapping is not one-to-one an :class:`~pandas.Index` is returned:

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> cat.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Categorical.map" correct. :)

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! I like the format but just have some slight suggestions on how to improve wording.
Please be sure to also surround the NaN references in backticks (think this was called out in earlier review)

"""
Map categories using input correspondence (dict, Series, or function).

Maps the categories to new categories. If the mapping correspondence is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simplified to the point that I think you can now just say "If the mapping correspondence is one-to-one the result is a ..." in the second sentence

is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN should be in backticks, so 'NaN'

[first, second, third]
Categories (3, object): [first, second, third]

The ordering of the categories is preserved by the map:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, but to just contrast a little better with what you have below maybe say "If the mapping is one-to-one the ordering is preserved" (we also wouldn't want the reader who stops here to inadvertently think that ordering is always preserved)

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment with 'NaN'

Map values using input correspondence (a dict, Series, or function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence is a one-to-one mapping (maps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

the original, otherwise an :class:`~pandas.Index` is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'NaN'

@l736x
Copy link
Contributor Author

l736x commented Mar 21, 2018

Thanks for the improvements. I missed the NaN because I introduced them later to replace NA.

@WillAyd
Copy link
Member

WillAyd commented Mar 21, 2018

Thanks for the updates lgtm. @jorisvandenbossche I'm good with this when you are

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 22, 2018
@jorisvandenbossche jorisvandenbossche merged commit 6621cb6 into pandas-dev:master Mar 22, 2018
@jorisvandenbossche
Copy link
Member

Thanks @l736x for the PR and @WillAyd for the feedback!

@l736x l736x deleted the docstring_categoricalindex_map branch March 29, 2018 11:55
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants