DOC: Improve the docstrings of CategoricalIndex.map #20286

l736x · 2018-03-11T17:16:41Z

Hi core devs,

I worked on the CategoricalIndex.map docstring during the sprint in Paris.
Although we concentrated on this function, while doing that we couldn't help improving (marginally!) also the docstrings for Categorical.map, Index.map and Series.map.
We probably should have refrained from doing it, knowing that other cities were working on those, but it was difficult to change one and leaving the style of the others incoherent with the first.
As a result I'm proposing here a pull request that combines all the changes.
The validation script passes for CategoricalIndex.map but not for the other cases.
If that is not acceptable I can change that, but I will probably need some help with the git part.

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################### Docstring (pandas.CategoricalIndex.map)  ###################
################################################################################

Map index values using input correspondence (a dict, Series, or
function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence maps each original category
to a different new category the result is a CategoricalIndex which has
the same order property as the original, otherwise an Index is
returned.

If a dictionary or Series is used any unmapped category is mapped to
NA. Note that if this happens an Index will be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
CategoricalIndex or Index
    Mapped index.

See Also
--------
Index.map : Apply a mapping correspondence on an Index.
Series.map : Apply a mapping correspondence on a Series.
Series.apply : Apply more complex functions on a Series.

Examples
--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

If the mapping is not bijective an Index is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a dictionary is used, all unmapped categories are mapped to NA and
the result is an Index:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.CategoricalIndex.map" correct. :)

################################################################################
###################### Docstring (pandas.Categorical.map) ######################
################################################################################

Map categories (not codes) using input correspondence (a dict,
Series, or function).

Maps the categories to new categories. If the mapping
correspondence maps each original category to a different new category
the result is a Categorical which has the same order property as
the original, otherwise an np.ndarray is returned.

If a dictionary or Series is used any unmapped category is mapped to
NA. Note that if this happens an np.ndarray will be returned.

Parameters
----------
mapper : callable
    Function to be applied.

Returns
-------
applied : Categorical or Index.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	See Also section not found
	No examples section found

################################################################################
######################### Docstring (pandas.Index.map) #########################
################################################################################

Map values using input correspondence (a dict, Series, or function).

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.
na_action : {None, 'ignore'}
    If 'ignore', propagate NA values, without passing them to the
    mapping correspondence.

Returns
-------
applied : Union[Index, MultiIndex], inferred
    The output of the mapping function applied to the index.
    If the function returns a tuple with more than one element
    a MultiIndex will be returned.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	See Also section not found
	No examples section found

################################################################################
######################## Docstring (pandas.Series.map)  ########################
################################################################################

Map values of Series using input correspondence (a dict, Series, or
function).

Parameters
----------
arg : function, dict, or Series
    Mapping correspondence.
na_action : {None, 'ignore'}
    If 'ignore', propagate NA values, without passing them to the
    mapping correspondence.

Returns
-------
y : Series
    Same index as caller.

Examples
--------

Map inputs to outputs (both of type `Series`):

>>> x = pd.Series([1,2,3], index=['one', 'two', 'three'])
>>> x
one      1
two      2
three    3
dtype: int64

>>> y = pd.Series(['foo', 'bar', 'baz'], index=[1,2,3])
>>> y
1    foo
2    bar
3    baz

>>> x.map(y)
one   foo
two   bar
three baz

If `arg` is a dictionary, return a new Series with values converted
according to the dictionary's mapping:

>>> z = {1: 'A', 2: 'B', 3: 'C'}

>>> x.map(z)
one   A
two   B
three C

Use na_action to control whether NA values are affected by the mapping
function.

>>> s = pd.Series([1, 2, 3, np.nan])

>>> s2 = s.map('this is a string {}'.format, na_action=None)
0    this is a string 1.0
1    this is a string 2.0
2    this is a string 3.0
3    this is a string nan
dtype: object

>>> s3 = s.map('this is a string {}'.format, na_action='ignore')
0    this is a string 1.0
1    this is a string 2.0
2    this is a string 3.0
3                     NaN
dtype: object

See Also
--------
Series.apply : For applying more complex functions on a Series.
DataFrame.apply : Apply a function row-/column-wise.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

Notes
-----
When `arg` is a dictionary, values in Series that are not in the
dictionary (as keys) are converted to ``NaN``. However, if the
dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
provides a method for default values), then this default is used
rather than ``NaN``:

>>> from collections import Counter
>>> counter = Counter()
>>> counter['bar'] += 1
>>> y.map(counter)
1    0
2    1
3    0
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No summary found (a short summary in a single line should be present at the beginning of the docstring)
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 31, in pandas.Series.map
Failed example:
    y
Expected:
    1    foo
    2    bar
    3    baz
Got:
    1    foo
    2    bar
    3    baz
    dtype: object
**********************************************************************
Line 36, in pandas.Series.map
Failed example:
    x.map(y)
Expected:
    one   foo
    two   bar
    three baz
Got:
    one      foo
    two      bar
    three    baz
    dtype: object
**********************************************************************
Line 46, in pandas.Series.map
Failed example:
    x.map(z)
Expected:
    one   A
    two   B
    three C
Got:
    one      A
    two      B
    three    C
    dtype: object
**********************************************************************
Line 56, in pandas.Series.map
Failed example:
    s2 = s.map('this is a string {}'.format, na_action=None)
Expected:
    0    this is a string 1.0
    1    this is a string 2.0
    2    this is a string 3.0
    3    this is a string nan
    dtype: object
Got nothing
**********************************************************************
Line 63, in pandas.Series.map
Failed example:
    s3 = s.map('this is a string {}'.format, na_action='ignore')
Expected:
    0    this is a string 1.0
    1    this is a string 2.0
    2    this is a string 3.0
    3                     NaN
    dtype: object
Got nothing

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

…icalindex_map

WillAyd · 2018-03-11T18:50:26Z

pandas/core/arrays/categorical.py

-            to different categories, the result will be Categorical which has
-            the same order property as the original. Otherwise, the result will
-            be np.ndarray.
+            Function to be applied.

        Returns
        -------
        applied : Categorical or Index.


No need for a period at the end of the type. Also should namespace pandas types (except for DataFrame and Series) so pandas.Categorial or pandas.Index

This is outside the original scope of the contribution, see first PR comment.

WillAyd · 2018-03-11T18:50:59Z

pandas/core/arrays/categorical.py

@@ -1081,20 +1081,26 @@ def remove_unused_categories(self, inplace=False):
            return cat

    def map(self, mapper):
-        """Apply mapper function to its categories (not codes).
+        """
+        Map categories (not codes) using input correspondence (a dict,


Needs to fit on one line - can you condense?

WillAyd · 2018-03-11T18:53:20Z

pandas/core/arrays/categorical.py

+
+        Maps the categories to new categories. If the mapping
+        correspondence maps each original category to a different new category
+        the result is a Categorical which has the same order property as


Can link to Categorical here so :class:`~pandas.Categorical'

WillAyd · 2018-03-11T18:53:54Z

pandas/core/arrays/categorical.py

+        Map categories (not codes) using input correspondence (a dict,
+        Series, or function).
+
+        Maps the categories to new categories. If the mapping


I realize this is mostly a copy / paste of the original but I find the wording rather confusing - any thoughts on how to make the explanation clearer?

I agree with you, this is not very clear but I don't know how to make it better. In my understanding a Categorical resembles in some respect a MultiIndex: the categories are a little bit like the levels and the codes are like the labels. Without that analogy in mind, any wording is going to be confusing.
What about something like that (a bit verbose):

"""
Maps the categories (the values of the elements of the array, not the
codes that represent the positions) to new categories. If the mapping
correspondence ...
"""

I added backticks to the codes to make it clear that we are talking about an attribute and not just a generic word.
What do you think? I'm open to suggestions!

@WillAyd What is the aspect you find confusing? The first "Maps the categories to new categories" or the explanation after it?

Do we have news on this?

The first line still need to be consolidated to one line.

As to the Extended Summary I think saying things like "mapping correspondence maps each original category to a different new category" is just very confusing. Could we not be more concise and just say something like "Maps the categories to new categories, retaining any ordering. Missing entries will be replaced with an np.ndarray containing NaN."?

I think that is the what the summary is trying to say now but it's not entirely clear. Adding Examples to show how the mapping works and how missing data is handled would make things a lot clearer.

WillAyd · 2018-03-11T18:54:59Z

pandas/core/arrays/categorical.py

+        the result is a Categorical which has the same order property as
+        the original, otherwise an np.ndarray is returned.
+
+        If a dictionary or Series is used any unmapped category is mapped to


When referencing Python types use the actual name and place in backticks, so `dict` here. For pandas types can also link to the class so :class:`~pandas.Series`

WillAyd · 2018-03-11T18:55:39Z

pandas/core/arrays/categorical.py

+        the original, otherwise an np.ndarray is returned.
+
+        If a dictionary or Series is used any unmapped category is mapped to
+        NA. Note that if this happens an np.ndarray will be returned.


Minor but use `NaN` instead of NA

Please notice that NA is widely used across the pandas documentation:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html#pandas.Series.sum

We asked ourselves whether NA or NaN was more appropriate and decided for NA that seems more used than NaN.
What does the official guideline say on the subject?

I don't have the good answer here, that is something we should discuss (in some places I see we use "NA/NaN" ..)

So, shall I go for NA or NaN?

Use NA when referring to the concept of missing data and NaN to refer to the actual missing value. Here you want the latter

I modified it.
But please notice that when we have timestamps the missing value is a NaT.

In [42]: idx = pd.CategoricalIndex(['a', 'b', 'c']) In [43]: cat = idx.values In [44]: type(cat) Out[44]: pandas.core.arrays.categorical.Categorical In [45]: cat.map({'a': pd.Timestamp('2000-01-03'), 'b': pd.Timestamp('2000-01-02')}) Out[45]: DatetimeIndex(['2000-01-03', '2000-01-02', 'NaT'], dtype='datetime64[ns]', freq=None)

But please notice that when we have timestamps the missing value is a NaT.

Yes, we need some better guidelines on this, but you can ignore that inconsistency for now

WillAyd · 2018-03-11T18:57:08Z

pandas/core/indexes/category.py


+        See Also


Since all of these methods are similar why don't you add a See Also in all of the instances that refer to one another?

This is outside the original scope of the contribution, see first PR comment.

WillAyd · 2018-03-11T18:57:54Z

pandas/core/indexes/category.py

+        Series.map : Apply a mapping correspondence on a Series.
+        Series.apply : Apply more complex functions on a Series.
+
+        Examples


Good to see Examples here - why not add to other map implementations as well?

This is outside the original scope of the contribution, see first PR comment.

WillAyd · 2018-03-11T18:58:42Z

pandas/core/indexes/category.py

+        >>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
+        Index(['first', 'second', 'first'], dtype='object')
+
+        If a dictionary is used, all unmapped categories are mapped to NA and


Same comments as before (may be a few other places) use `dict` instead of dictionary and `NaN` instead of NA

WillAyd · 2018-03-11T19:01:21Z

Make sure you review the validation script - there are definitely some errors with your examples in Series.map that need to be accounted for

jorisvandenbossche

Thanks for the PR! Added some additional comments

jorisvandenbossche · 2018-03-12T10:31:37Z

pandas/core/indexes/category.py

@@ -661,20 +661,58 @@ def is_dtype_equal(self, other):
    take_nd = take

    def map(self, mapper):
-        """Apply mapper function to its categories (not codes).
+        """
+        Map index values using input correspondence (a dict, Series, or


I think if you leave out 'index' it fits on one line?

jorisvandenbossche · 2018-03-12T10:32:07Z

pandas/core/indexes/base.py


        Returns
        -------
        applied : Union[Index, MultiIndex], inferred
            The output of the mapping function applied to the index.
            If the function returns a tuple with more than one element
            a MultiIndex will be returned.
-


It would be good to add a small example here

This is outside the original scope of the contribution, see first PR comment.

jorisvandenbossche · 2018-03-12T10:33:26Z

pandas/core/series.py

-        DataFrame.applymap: Apply a function elementwise on a whole DataFrame
+        Series.apply : For applying more complex functions on a Series.
+        DataFrame.apply : Apply a function row-/column-wise.
+        DataFrame.applymap : Apply a function elementwise on a whole DataFrame.


Can you also add Series.replace ?

This is outside the original scope of the contribution, see first PR comment.

jorisvandenbossche · 2018-03-12T10:35:48Z

pandas/core/series.py


        Parameters
        ----------
        arg : function, dict, or Series
+            Mapping correspondence.


I think we should add here some details on how a dict and Series are handled.

This is outside the original scope of the contribution, see first PR comment.

l736x · 2018-03-12T12:50:09Z

Thanks for the comments. I will work on them this evening probably.
I would like to point out though that, as stated in the topmost comment of the pull request, I worked on CategoricalIndex.map docstring. The other docstrings were just marginally improved on the way, sometimes in very trivial ways, like, for example, the first sentence being on the same line as """ or a dot in the wrong place.
I never wanted to fully amend the docs of the other three methods (Categorical.map, Index.map, Series.map). The time to do that goes beyond what I have at the moment. Not considering that some of the methods we are talking about were supposed to be improved by other participants to the sprint.
I made sure that the validation script runs correctly on the CategoricalIndex.map, but I didn't require that the script run on the others.
Can we consider that the present pull request makes the CategoricalIndex.map compliant with the guidelines and incidentally improves (but does not make compliant) the other docstrings?
I can change the title if you want.

jorisvandenbossche · 2018-03-12T12:54:19Z

@l736x Ah, sorry, I didn't fully read that comment at the top. I have renamed the PR to make that more clear :)

jorisvandenbossche · 2018-03-12T12:56:29Z

For me it's fine both ways: keep it focused on CategoricalIndex (and you can ignore the comments in the others pars), or it's also fine to expand the scope of the PR to try to fo all map methods. As you prefer.

l736x · 2018-03-12T13:00:00Z

Just to clarify: I'm not saying that the pull request cannot be improved, I completely agree that also Series.map, Index.map and Categorical.map need a rework.
I'm just saying that this goes beyond the scope of the task we worked on during the sprint and that I can not commit to this larger rework at the moment.

l736x · 2018-03-12T13:02:04Z

I'm more willing to work on the other map methods in a separate pull request to avoid delaying this one, if you are all ok with this.

jorisvandenbossche · 2018-03-12T13:06:02Z

That's fine!

l736x · 2018-03-17T21:27:12Z

Ok, I thing I'm getting out of my mind.
I was editing the pull request to take into account your comments and all of a sudden I cannot run anymore the examples I wrote in the docstring of CategoricalIndex.map.
I honestly don't understand, although I suspect I'm doing something stupid somewhere.
I need help!

Can you tell me if at the moment this code runs correctly?:

idx = pd.CategoricalIndex(['a', 'b', 'c'])
idx.map({'a': 'first', 'b': 'second', 'c': 'third'})

According to the examples it should give:

CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

I swear it was working last saturday, but at the moment I get:

In [1]: import pandas as pd

In [2]: idx = pd.CategoricalIndex(['a', 'b', 'c'])

In [3]: idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c78c3647d1e7> in <module>()
----> 1 idx.map({'a': 'first', 'b': 'second', 'c': 'third'})

~/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/indexes/category.py in map(self, mapper)
    639 
    640         """
--> 641         return self._shallow_copy_with_infer(self.values.map(mapper))
    642 
    643     def delete(self, loc):

~/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/categorical.py in map(self, mapper)
   1138 
   1139         """
-> 1140         new_categories = self.categories.map(mapper)
   1141         try:
   1142             return self.from_codes(self._codes.copy(),

~/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/indexes/base.py in map(self, mapper)
   2882         """
   2883         from .multi import MultiIndex
-> 2884         mapped_values = self._arrmap(self.values, mapper)
   2885         attributes = self._get_attributes_dict()
   2886         if mapped_values.size and isinstance(mapped_values[0], tuple):

pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.arrmap_object()

TypeError: 'dict' object is not callable

By inspecting the code in algos_common_helper.pxi I would guess that it should indeed fail. (The file did not change recently as far as I can see.)

The troubling thing is that even a simple Index is not working with .map(dict), while from the docstring it should.

idx = pd.Index(['a', 'b', 'c'])
idx.map({'a': 'first', 'b': 'second', 'c': 'third'})

Any thoughts?

WillAyd · 2018-03-17T21:30:20Z

Typically if you get errors from the Cython layer you need to rebuild the C extensions, so try running

python setup.py build_ext --inplace

And see if that resolves

l736x · 2018-03-17T21:37:04Z

It says everything is up-to-date and does not rebuild anything. :(
Can you check if the last two commands I wrote in the previous comment work for you?

l736x · 2018-03-17T21:43:28Z

Ok, I got it: for a reason that I don't understand I was picking up the anaconda default pandas instead of the branch I was editing. Problem solved, back to my pull request.
Sorry for the noise!

…icalindex_map

l736x · 2018-03-17T22:32:47Z

I updated the pull request.
In the end I added also the missing sections for Categorical.map. In the current state also that method is now docstring-compliant.

About your proposal for the Extended Summary: "Maps the categories to new categories, retaining any ordering. Missing entries will be replaced with an np.ndarray containing NaN."
To me it is not clear what ordering the sentence is referring to: the categorical or the properties?
It is not clear either where the entries are missing: in the categorical? in the mapping?
Moreover we are not replacing single missing entries with ndarrays, we are changing the type of the output according to whether there are missing entries or not.
Personally I don't see the reason for being too concise in the Extended Summary, I think that if there is a catch in the function it is better to put it there rather than in the example section that is far below and might not be read at all.

Let me know if you like this version better.

Here the new validation script outputs:

################################################################################
################### Docstring (pandas.CategoricalIndex.map)  ###################
################################################################################

Map values using input correspondence (a dict, Series, or function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence is a bijection (maps each
original category to a different new category) the result is a
:class:`~pandas.CategoricalIndex` which has the same order property as
the original, otherwise an :class:`~pandas.Index` is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.CategoricalIndex or pandas.Index
    Mapped index.

See Also
--------
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.CategoricalIndex.map" correct. :)

################################################################################
###################### Docstring (pandas.Categorical.map) ######################
################################################################################

Map categories using input correspondence (dict, Series, or function).

Maps the categories to new categories. If the mapping correspondence is
a bijection (maps each original category to a different new category)
the result is a :class:`~pandas.Categorical` which has the same order
property as the original, otherwise a :class:`~pandas.Index` is
returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.Categorical or pandas.Index
    Mapped categorical.

See Also
--------
CategoricalIndex.map : Apply a mapping correspondence on a
    :class:`~pandas.CategoricalIndex`.
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
[a, b, c]
Categories (3, object): [a, b, c]
>>> cat.map(lambda x: x.upper())
[A, B, C]
Categories (3, object): [A, B, C]
>>> cat.map({'a': 'first', 'b': 'second', 'c': 'third'})
[first, second, third]
Categories (3, object): [first, second, third]

If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> cat.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Categorical.map" correct. :)

WillAyd · 2018-03-17T22:39:59Z

I would also say that if the documentation is not clear to you as the writer (or in this case editor) then it would not be clear to the general reader either, hence the request for an update. I agree with you that examples would be nice - given the docstring before was certainly trying to call out something about how the ordering and missing values were being handled can you add examples to illustrate those caveats? Perhaps then a better wording will become clear to you.

FWIW I wouldn't use words that fall under a specific technical domain like "bijection" - I think the concept(s) can be explained in general terms so again try adding at least one example each highlighting the sorting and missing value nuances and see what that yields

l736x · 2018-03-18T00:39:43Z

@WillAyd I did! Did you read the new docstring?

Better explanation:

If the mapping correspondence is
a bijection (maps each original category to a different new category) ...

Example (in both Categorical and CategoricalIndex):

If the mapping is not bijective an :class:`~pandas.Index` is returned:

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

Personally I'm happy with this version. I took into consideration your remarks, I explained why I don't agree with the correction that you proposed and I gave you a new version that to me is clear enough.
If you have other comments, please be sure to read the current version first.

[edited for a misspelling]

codecov · 2018-03-18T07:02:23Z

Codecov Report

Merging #20286 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #20286   +/-   ##
=======================================
  Coverage    91.8%    91.8%           
=======================================
  Files         152      152           
  Lines       49215    49215           
=======================================
  Hits        45181    45181           
  Misses       4034     4034

Flag	Coverage Δ
#multiple	`90.18% <ø> (ø)`	⬆️
#single	`41.84% <ø> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/arrays/categorical.py	`96.21% <ø> (ø)`	⬆️
pandas/core/series.py	`93.84% <ø> (ø)`	⬆️
pandas/core/indexes/category.py	`97.3% <ø> (ø)`	⬆️
pandas/core/indexes/base.py	`96.68% <ø> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01882ba...ecbaca0. Read the comment docs.

jorisvandenbossche · 2018-03-19T23:05:49Z

FWIW I wouldn't use words that fall under a specific technical domain like "bijection" - I think the concept(s) can be explained in general terms so again try adding at least one example each highlighting the sorting and missing value nuances and see what that yields

I agree with trying to avoid terms like "bijection", I don't think many people know what it it (although you directly explain it here ..).
For the ordering, I find the current version clear (it is not about the ordering of the mapping, it is the returned categorical that keeps the ordering property).

@WillAyd if there is still something else unclear to you in the current version, can you add some inline comments?

jorisvandenbossche · 2018-03-19T23:08:54Z

pandas/core/arrays/categorical.py

+        Map categories using input correspondence (dict, Series, or function).
+
+        Maps the categories to new categories. If the mapping correspondence is
+        a bijection (maps each original category to a different new category)


maybe we can use something like "one-to-one mapping" instead ?

WillAyd · 2018-03-20T00:18:38Z

pandas/core/indexes/category.py

+                         ordered=False, dtype='category')
+        >>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
+        CategoricalIndex(['first', 'second', 'third'], categories=['first',
+                         'second', 'third'], ordered=False, dtype='category')


@jorisvandenbossche @l736x the point I was trying to make was that since we mention that the ordering property gets retained with a mapping, that we should have an example for an ordered CategoricalIndex explicitly showing this. It could be something as simple as:

Note that one-to-one mappings will retain the ordering of the CategoricalIndex idx = pd.CategoricalIndex(['a,'b','c'], ordered=True) idx.map({'a': 3, 'b': 2, 'c': 1})

Just my $.02 though @jorisvandenbossche I'm good to go whenever you want to merge

It would indeed be good to have an example for this. But maybe just adding ordered=True to the example above is enough? (without adding another example)

Works for me

…icalindex_map

l736x · 2018-03-20T22:23:40Z

I replaced the word "bijection" with "one-to-one mapping" as suggested. You are both right, this is better.

@WillAyd, sorry I didn't understand that your comment was about a new example. I agree it's a good idea so I added one for both Categorical.map and CategoricalIndex.map.

Let me know if you are ok with the present version or you need other fixes.

################################################################################
################### Docstring (pandas.CategoricalIndex.map)  ###################
################################################################################

Map values using input correspondence (a dict, Series, or function).

Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence is a one-to-one mapping (maps
each original category to a different new category) the result is a
:class:`~pandas.CategoricalIndex` which has the same order property as
the original, otherwise an :class:`~pandas.Index` is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.CategoricalIndex or pandas.Index
    Mapped index.

See Also
--------
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
                 'second', 'third'], ordered=False, dtype='category')

The ordering of the categories is preserved by the map:

>>> idx = pd.CategoricalIndex(['a', 'b', 'c'], ordered=True)
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
                 ordered=True, dtype='category')
>>> idx.map({'a': 3, 'b': 2, 'c': 1})
CategoricalIndex([3, 2, 1], categories=[3, 2, 1], ordered=True,
                 dtype='category')

If the mapping is not one-to-one an :class:`~pandas.Index` is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.CategoricalIndex.map" correct. :)

################################################################################
###################### Docstring (pandas.Categorical.map) ######################
################################################################################

Map categories using input correspondence (dict, Series, or function).

Maps the categories to new categories. If the mapping correspondence is
a one-to-one mapping (maps each original category to a different new
category) the result is a :class:`~pandas.Categorical` which has the
same order property as the original, otherwise a :class:`~pandas.Index`
is returned.

If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to NaN. Note that if this happens an :class:`~pandas.Index` will
be returned.

Parameters
----------
mapper : function, dict, or Series
    Mapping correspondence.

Returns
-------
pandas.Categorical or pandas.Index
    Mapped categorical.

See Also
--------
CategoricalIndex.map : Apply a mapping correspondence on a
    :class:`~pandas.CategoricalIndex`.
Index.map : Apply a mapping correspondence on an
    :class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
    :class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
    :class:`~pandas.Series`.

Examples
--------
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
[a, b, c]
Categories (3, object): [a, b, c]
>>> cat.map(lambda x: x.upper())
[A, B, C]
Categories (3, object): [A, B, C]
>>> cat.map({'a': 'first', 'b': 'second', 'c': 'third'})
[first, second, third]
Categories (3, object): [first, second, third]

The ordering of the categories is preserved by the map:

>>> cat = pd.Categorical(['a', 'b', 'c'], ordered=True)
>>> cat
[a, b, c]
Categories (3, object): [a < b < c]
>>> cat.map({'a': 3, 'b': 2, 'c': 1})
[3, 2, 1]
Categories (3, int64): [3 < 2 < 1]

If the mapping is not one-to-one an :class:`~pandas.Index` is returned:

>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a `dict` is used, all unmapped categories are mapped to NaN and
the result is an :class:`~pandas.Index`:

>>> cat.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Categorical.map" correct. :)

WillAyd

Thanks for the updates! I like the format but just have some slight suggestions on how to improve wording.
Please be sure to also surround the NaN references in backticks (think this was called out in earlier review)

WillAyd · 2018-03-21T05:39:27Z

pandas/core/arrays/categorical.py

+        """
+        Map categories using input correspondence (dict, Series, or function).
+
+        Maps the categories to new categories. If the mapping correspondence is


This is simplified to the point that I think you can now just say "If the mapping correspondence is one-to-one the result is a ..." in the second sentence

WillAyd · 2018-03-21T05:40:03Z

pandas/core/arrays/categorical.py

+        is returned.
+
+        If a `dict` or :class:`~pandas.Series` is used any unmapped category is
+        mapped to NaN. Note that if this happens an :class:`~pandas.Index` will


NaN should be in backticks, so 'NaN'

WillAyd · 2018-03-21T05:42:13Z

pandas/core/arrays/categorical.py

+        [first, second, third]
+        Categories (3, object): [first, second, third]
+
+        The ordering of the categories is preserved by the map:


Nice job, but to just contrast a little better with what you have below maybe say "If the mapping is one-to-one the ordering is preserved" (we also wouldn't want the reader who stops here to inadvertently think that ordering is always preserved)

WillAyd · 2018-03-21T05:43:31Z

pandas/core/arrays/categorical.py

+        >>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
+        Index(['first', 'second', 'first'], dtype='object')
+
+        If a `dict` is used, all unmapped categories are mapped to NaN and


Similar comment with 'NaN'

WillAyd · 2018-03-21T05:43:47Z

pandas/core/indexes/category.py

+        Map values using input correspondence (a dict, Series, or function).
+
+        Maps the values (their categories, not the codes) of the index to new
+        categories. If the mapping correspondence is a one-to-one mapping (maps


Same comment as above

WillAyd · 2018-03-21T05:43:57Z

pandas/core/indexes/category.py

+        the original, otherwise an :class:`~pandas.Index` is returned.
+
+        If a `dict` or :class:`~pandas.Series` is used any unmapped category is
+        mapped to NaN. Note that if this happens an :class:`~pandas.Index` will


l736x · 2018-03-21T21:31:38Z

Thanks for the improvements. I missed the NaN because I introduced them later to replace NA.

WillAyd · 2018-03-21T23:37:28Z

Thanks for the updates lgtm. @jorisvandenbossche I'm good with this when you are

jorisvandenbossche · 2018-03-22T08:39:18Z

Thanks @l736x for the PR and @WillAyd for the feedback!

l736x added 15 commits March 10, 2018 14:15

Applied docstring convention.

4c6d29e

Improve Series.map docstring plus conventions.

a2bf1c2

Improved Index.map docstring.

bf0b870

Docstring convention.

dba4d1f

Docstring convention.

d13f83c

Improve the CategoricalIndex.map docstring.

9366e39

Docstring convention.

e42fd0c

Docstring convention.

8240278

Improve Categorical.map docstring.

9087677

Improve CategoricalIndex.map docstring.

da84d5f

Improved CategoricalIndex.map and Categorical.map docstrings.

848d960

Improved CategoricalIndex.map docstring.

2af44df

Final version, pandas docstring convention compliant.

1ad38e9

Bug fix in docstring.

1a8040d

Merge remote-tracking branch 'upstream/master' into docstring_categor…

4fdb6a8

…icalindex_map

WillAyd requested changes Mar 11, 2018

View reviewed changes

jorisvandenbossche reviewed Mar 12, 2018

View reviewed changes

jorisvandenbossche added the Docs label Mar 12, 2018

jorisvandenbossche changed the title ~~DOC: Improve the docstrings of CategoricalIndex.map(), Categorical.map(), Index.map(), Series.map()~~ DOC: Improve the docstrings of CategoricalIndex.map Mar 12, 2018

l736x added 3 commits March 17, 2018 23:10

DOC: Improved CategoricalIndex.map and Categorical.map docstrings.

986e1dd

DOC: Bugfix.

0fc2c48

Merge remote-tracking branch 'upstream/master' into docstring_categor…

9e25133

…icalindex_map

jorisvandenbossche reviewed Mar 19, 2018

View reviewed changes

WillAyd reviewed Mar 20, 2018

View reviewed changes

l736x added 2 commits March 20, 2018 23:01

Merge remote-tracking branch 'upstream/master' into docstring_categor…

1cd4c38

…icalindex_map

DOC: Added examples to Categorical.map and CategoricalIndex.map docs.

a76a4b5

WillAyd requested changes Mar 21, 2018

View reviewed changes

DOC: Improved Categorical.map and CategoricalIndex.map.

ecbaca0

WillAyd approved these changes Mar 21, 2018

View reviewed changes

jorisvandenbossche approved these changes Mar 22, 2018

View reviewed changes

jorisvandenbossche added this to the 0.23.0 milestone Mar 22, 2018

jorisvandenbossche merged commit 6621cb6 into pandas-dev:master Mar 22, 2018

l736x deleted the docstring_categoricalindex_map branch March 29, 2018 11:55

javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018

DOC: Improve the docstrings of CategoricalIndex.map (pandas-dev#20286)

e6980f3

dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018

DOC: Improve the docstrings of CategoricalIndex.map (pandas-dev#20286)

30afa66

DOC: Improve the docstrings of CategoricalIndex.map #20286

DOC: Improve the docstrings of CategoricalIndex.map #20286

Conversation

l736x commented Mar 11, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WillAyd commented Mar 11, 2018

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

l736x commented Mar 12, 2018

jorisvandenbossche commented Mar 12, 2018

jorisvandenbossche commented Mar 12, 2018

l736x commented Mar 12, 2018

l736x commented Mar 12, 2018

jorisvandenbossche commented Mar 12, 2018

l736x commented Mar 17, 2018

WillAyd commented Mar 17, 2018

l736x commented Mar 17, 2018

l736x commented Mar 17, 2018

l736x commented Mar 17, 2018

WillAyd commented Mar 17, 2018

l736x commented Mar 18, 2018 • edited Loading

codecov bot commented Mar 18, 2018 • edited Loading

Codecov Report

jorisvandenbossche commented Mar 19, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

l736x commented Mar 20, 2018

WillAyd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

l736x commented Mar 21, 2018

WillAyd commented Mar 21, 2018

jorisvandenbossche commented Mar 22, 2018

l736x commented Mar 11, 2018 •

edited

Loading

l736x commented Mar 18, 2018 •

edited

Loading

codecov bot commented Mar 18, 2018 •

edited

Loading