DOC: Improve the docstrings of CategoricalIndex.map #20286
Changes from 15 commits
4c6d29e
a2bf1c2
bf0b870
dba4d1f
d13f83c
9366e39
e42fd0c
8240278
9087677
da84d5f
848d960
2af44df
1ad38e9
1a8040d
4fdb6a8
986e1dd
0fc2c48
9e25133
1cd4c38
a76a4b5
ecbaca0
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1081,20 +1081,26 @@ def remove_unused_categories(self, inplace=False): | |
return cat | ||
|
||
def map(self, mapper): | ||
"""Apply mapper function to its categories (not codes). | ||
""" | ||
Map categories (not codes) using input correspondence (a dict, | ||
Series, or function). | ||
|
||
Maps the categories to new categories. If the mapping | ||
I realize this is mostly a copy/paste of the original, but I find the wording rather confusing. Any thoughts on how to make the explanation clearer?
I agree with you, this is not very clear, but I don't know how to make it better. In my understanding a Categorical resembles a MultiIndex in some respects: the categories are a little bit like the levels and the codes are like the labels. Without that analogy in mind, any wording is going to be confusing. """ I added backticks to the
@WillAyd Which aspect do you find confusing? The first "Maps the categories to new categories" or the explanation after it?
Do we have news on this?
The first line still needs to be consolidated to one line. As to the Extended Summary, I think saying things like "mapping correspondence maps each original category to a different new category" is just very confusing. Could we not be more concise and just say something like "Maps the categories to new categories, retaining any ordering. Missing entries will be replaced with an I think that is what the summary is trying to say now, but it's not entirely clear. Adding Examples to show how the mapping works and how missing data is handled would make things a lot clearer.
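A possible sketch of the kind of examples requested in this thread, assuming a pandas version in which `Categorical.map` accepts a function or a `dict`. The variable names are illustrative only and are not part of the PR:

```python
import warnings

import pandas as pd

cat = pd.Categorical(['a', 'b', 'c'], ordered=True)

with warnings.catch_warnings():
    # Some pandas 2.x releases warn about a changing ``na_action`` default.
    warnings.simplefilter('ignore', FutureWarning)

    # One-to-one mapping: the result stays categorical and keeps ``ordered``.
    upper = cat.map(lambda x: x.upper())

    # Two categories collapse into one: the result is no longer categorical.
    merged = cat.map({'a': 'first', 'b': 'second', 'c': 'first'})

    # 'c' has no entry in the dict: it becomes missing in the result.
    partial = cat.map({'a': 'first', 'b': 'second'})

print(type(upper).__name__, upper.ordered)
print(type(merged).__name__)
print(pd.isna(partial).tolist())
```

Printing the result types rather than their full reprs keeps the sketch independent of repr changes between pandas versions.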
|
||
correspondence maps each original category to a different new category | ||
the result is a Categorical which has the same order property as | ||
Can link to Categorical here so :class:`~pandas.Categorical` |
||
the original, otherwise an np.ndarray is returned. | ||
|
||
If a dictionary or Series is used any unmapped category is mapped to | ||
When referencing Python types, use the actual name and place it in backticks, so `dict` here. For pandas types you can also link to the class, so :class:`~pandas.Series` |
||
NA. Note that if this happens an np.ndarray will be returned. | ||
Minor, but use `NaN` instead of NA
Please notice that NA is widely used across the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html#pandas.Series.sum We asked ourselves whether NA or NaN was more appropriate and decided on NA, which seems to be used more than NaN.
I don't have a good answer here; that is something we should discuss (in some places I see we use "NA/NaN" ..)
So, shall I go for NA or NaN?
Use NA when referring to the concept of missing data and
I modified it.
Yes, we need some better guidelines on this, but you can ignore that inconsistency for now |
||
|
||
Parameters | ||
---------- | ||
mapper : callable | ||
Function to be applied. When all categories are mapped | ||
to different categories, the result will be Categorical which has | ||
the same order property as the original. Otherwise, the result will | ||
be np.ndarray. | ||
Function to be applied. | ||
|
||
Returns | ||
------- | ||
applied : Categorical or Index. | ||
No need for a period at the end of the type. Also, pandas types (except for DataFrame and Series) should be namespaced, so
This is outside the original scope of the contribution, see first PR comment. |
||
|
||
""" | ||
new_categories = self.categories.map(mapper) | ||
try: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3121,22 +3121,23 @@ def groupby(self, values): | |
return result | ||
|
||
def map(self, mapper, na_action=None): | ||
"""Map values of Series using input correspondence | ||
""" | ||
Map values using input correspondence (a dict, Series, or function). | ||
|
||
Parameters | ||
---------- | ||
mapper : function, dict, or Series | ||
Mapping correspondence. | ||
na_action : {None, 'ignore'} | ||
If 'ignore', propagate NA values, without passing them to the | ||
mapping function | ||
mapping correspondence. | ||
|
||
Returns | ||
------- | ||
applied : Union[Index, MultiIndex], inferred | ||
The output of the mapping function applied to the index. | ||
If the function returns a tuple with more than one element | ||
a MultiIndex will be returned. | ||
|
||
It would be good to add a small example here
This is outside the original scope of the contribution, see first PR comment. |
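For reference, a small example of the kind asked for here might look like the sketch below. It assumes only the behaviour described in the docstring above (function or `dict` mapper, tuple results giving a MultiIndex). The variable names are made up for illustration:

```python
import pandas as pd

idx = pd.Index([1, 2, 3])

# A function is applied to every element of the index.
doubled = idx.map(lambda x: x * 2)

# A dict is looked up per element; 3 has no entry, so it maps to NaN.
looked_up = idx.map({1: 'a', 2: 'b'})

# A function returning tuples with more than one element gives a MultiIndex.
pairs = idx.map(lambda x: (x, x ** 2))

print(doubled.tolist())
print(looked_up.isna().tolist())
print(type(pairs).__name__)
```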
||
""" | ||
|
||
from .multi import MultiIndex | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -661,20 +661,58 @@ def is_dtype_equal(self, other): | |
take_nd = take | ||
|
||
def map(self, mapper): | ||
"""Apply mapper function to its categories (not codes). | ||
""" | ||
Map index values using input correspondence (a dict, Series, or | ||
I think if you leave out 'index' it fits on one line? |
||
function). | ||
|
||
Maps the values (their categories, not the codes) of the index to new | ||
categories. If the mapping correspondence maps each original category | ||
to a different new category the result is a CategoricalIndex which has | ||
the same order property as the original, otherwise an Index is | ||
returned. | ||
|
||
If a dictionary or Series is used any unmapped category is mapped to | ||
NA. Note that if this happens an Index will be returned. | ||
|
||
Parameters | ||
---------- | ||
mapper : callable | ||
Function to be applied. When all categories are mapped | ||
to different categories, the result will be a CategoricalIndex | ||
which has the same order property as the original. Otherwise, | ||
the result will be a Index. | ||
mapper : function, dict, or Series | ||
Mapping correspondence. | ||
|
||
Returns | ||
------- | ||
applied : CategoricalIndex or Index | ||
CategoricalIndex or Index | ||
Mapped index. | ||
|
||
See Also | ||
Since all of these methods are similar, why don't you add a See Also in all of the instances that refer to one another?
This is outside the original scope of the contribution, see first PR comment. |
||
-------- | ||
Index.map : Apply a mapping correspondence on an Index. | ||
Series.map : Apply a mapping correspondence on a Series. | ||
Series.apply : Apply more complex functions on a Series. | ||
|
||
Examples | ||
Good to see Examples here - why not add to other
This is outside the original scope of the contribution, see first PR comment. |
||
-------- | ||
>>> idx = pd.CategoricalIndex(['a', 'b', 'c']) | ||
>>> idx | ||
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'], | ||
ordered=False, dtype='category') | ||
>>> idx.map(lambda x: x.upper()) | ||
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'], | ||
ordered=False, dtype='category') | ||
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'}) | ||
CategoricalIndex(['first', 'second', 'third'], categories=['first', | ||
'second', 'third'], ordered=False, dtype='category') | ||
@jorisvandenbossche @l736x The point I was trying to make was that, since we mention that the ordering property gets retained with a mapping, we should have an example for an ordered Note that one-to-one mappings will retain the ordering of the CategoricalIndex
idx = pd.CategoricalIndex(['a', 'b', 'c'], ordered=True)
idx.map({'a': 3, 'b': 2, 'c': 1})
Just my $.02 though. @jorisvandenbossche I'm good to go whenever you want to merge
It would indeed be good to have an example for this. But maybe just adding
Works for me |
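The ordered case suggested in this thread can be checked with a short sketch along these lines. It reuses the snippet from the comment, and inspecting `ordered` and `categories` is just one assumed way to verify the claim:

```python
import pandas as pd

idx = pd.CategoricalIndex(['a', 'b', 'c'], ordered=True)

# Each category maps to a distinct new value, so the result is expected
# to remain a CategoricalIndex with ``ordered=True``.
mapped = idx.map({'a': 3, 'b': 2, 'c': 1})

print(type(mapped).__name__)
print(mapped.ordered)
print(list(mapped.categories))
```

Note that the new categories follow the order of the original categories, not the numeric order of the mapped values.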
||
|
||
If the mapping is not bijective an Index is returned: | ||
|
||
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'}) | ||
Index(['first', 'second', 'first'], dtype='object') | ||
|
||
If a dictionary is used, all unmapped categories are mapped to NA and | ||
Same comments as before (may be a few other places): use `dict` instead of dictionary and `NaN` instead of NA |
||
the result is an Index: | ||
|
||
>>> idx.map({'a': 'first', 'b': 'second'}) | ||
Index(['first', 'second', nan], dtype='object') | ||
""" | ||
return self._shallow_copy_with_infer(self.values.map(mapper)) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2423,25 +2423,26 @@ def unstack(self, level=-1, fill_value=None): | |
|
||
def map(self, arg, na_action=None): | ||
""" | ||
Map values of Series using input correspondence (which can be | ||
a dict, Series, or function) | ||
Map values of Series using input correspondence (a dict, Series, or | ||
function). | ||
|
||
Parameters | ||
---------- | ||
arg : function, dict, or Series | ||
Mapping correspondence. | ||
I think we should add here some details on how a dict and Series are handled.
This is outside the original scope of the contribution, see first PR comment. |
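To make the dict and Series handling concrete, here is a rough sketch of how `Series.map` treats each kind of mapper. The sample data and variable names are invented for illustration and do not come from the PR:

```python
import pandas as pd

s = pd.Series(['cat', 'dog', None, 'rabbit'])

# dict: keys are looked up per value; values without a key become NaN.
from_dict = s.map({'cat': 'kitten', 'dog': 'puppy'})

# Series: the index of the mapping Series plays the role of the dict keys.
lookup = pd.Series(['kitten', 'puppy'], index=['cat', 'dog'])
from_series = s.map(lookup)

# Function with na_action='ignore': missing values are propagated
# without being passed to the function.
upper = s.map(lambda x: x.upper(), na_action='ignore')

print(from_dict.tolist())
print(from_series.tolist())
print(upper.tolist())
```

Without `na_action='ignore'`, the lambda would be called on the missing value and `x.upper()` would fail, which is why the flag matters for function mappers.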
||
na_action : {None, 'ignore'} | ||
If 'ignore', propagate NA values, without passing them to the | ||
mapping function | ||
mapping correspondence. | ||
|
||
Returns | ||
------- | ||
y : Series | ||
same index as caller | ||
Same index as caller. | ||
|
||
Examples | ||
-------- | ||
|
||
Map inputs to outputs (both of type `Series`) | ||
Map inputs to outputs (both of type `Series`): | ||
|
||
>>> x = pd.Series([1,2,3], index=['one', 'two', 'three']) | ||
>>> x | ||
|
@@ -2492,9 +2493,9 @@ def map(self, arg, na_action=None): | |
|
||
See Also | ||
-------- | ||
Series.apply: For applying more complex functions on a Series | ||
DataFrame.apply: Apply a function row-/column-wise | ||
DataFrame.applymap: Apply a function elementwise on a whole DataFrame | ||
Series.apply : For applying more complex functions on a Series. | ||
DataFrame.apply : Apply a function row-/column-wise. | ||
DataFrame.applymap : Apply a function elementwise on a whole DataFrame. | ||
Can you also add Series.replace?
This is outside the original scope of the contribution, see first PR comment. |
||
|
||
Notes | ||
----- | ||
|
Needs to fit on one line - can you condense?