Skip to content

Fix the docstring of xs in pandas/core/generic.py #22892 #23913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 2, 2018

Conversation

tlentali
Copy link
Contributor

@tlentali tlentali commented Nov 25, 2018

Hello @datapythonista !
As requested in #22892 , I corrected the docstring of the pandas/core/generic.py xs function to pass correctly the following command :

./scripts/validate_docstrings.py pandas.Series.xs

And I also validate the PEP-8 according to :

flake8 --doctests pandas/core/generic.py

Is it ok for you ?

@pep8speaks
Copy link

Hello @tlentali! Thanks for submitting the PR.

@datapythonista datapythonista added Docs Indexing Related to indexing on series/frames, not to indexes themselves labels Nov 25, 2018
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just couple of comments on things that IMO would make the docstring better.


Returns a cross-section (row(s) or column(s))
from the Series/DataFrame.
Defaults to cross-section on the rows (axis=0).

Parameters
----------
key : object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename object to label here


Parameters
----------
key : object
Some label contained in the index, or partially in a MultiIndex
Some label contained in the index, or partially in a MultiIndex.
axis : int, default 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do a grep of axis : and see what we usually use for the axis type?, it's something like {0 or 'index', 1 or 'columns'}, default 0

If False, returns object with same levels as self.

Returns
-------
xs : Series or DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
xs : Series or DataFrame
Series or DataFrame

We don't name the output anymore, just say the type. But in the next line we should have a short description on what's returned.


Notes
-----
xs is only for getting, not setting values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
xs is only for getting, not setting values.
`xs` can not be used to set values.

xs is only for getting, not setting values.

MultiIndex Slicers is a generic way to get/set values on any level or
levels. It is a superset of xs functionality, see
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
levels. It is a superset of xs functionality, see
levels. It is a superset of `xs` functionality, see


MultiIndex Slicers is a generic way to get/set values on any level or
levels. It is a superset of xs functionality, see
:ref:`MultiIndex Slicers <advanced.mi_slicers>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:ref:`MultiIndex Slicers <advanced.mi_slicers>`
:ref:`MultiIndex Slicers <advanced.mi_slicers>`.

Examples
--------
>>> d = {'A': [4, 4, 9], 'B': [5, 0, 7], 'C': [2, 9, 3]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's be nice to use a more real-world example. I think it makes things more complicated that the values used are arbitrary.

We've been using examples with animals, so for example df.xs(('mammal', 'cat')) returning num_legs 4 makes it very clear what is going on.

@codecov
Copy link

codecov bot commented Nov 25, 2018

Codecov Report

Merging #23913 into master will decrease coverage by 49.84%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #23913       +/-   ##
===========================================
- Coverage   92.29%   42.45%   -49.85%     
===========================================
  Files         161      161               
  Lines       51498    51561       +63     
===========================================
- Hits        47530    21888    -25642     
- Misses       3968    29673    +25705
Flag Coverage Δ
#multiple ?
#single 42.45% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/generic.py 39.34% <100%> (-57.51%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-97.69%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.24%) ⬇️
... and 120 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 209e7f5...3f11180. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I just added couple of comments about formatting, and about making the examples simpler/clearer (IMO). With them it should be ready to be merged.

Thanks @tlentali


Returns a cross-section (row(s) or column(s))
from the Series/DataFrame.
Defaults to cross-section on the rows (axis=0).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit repetitive of the short summary. Can you try to explain in an easier way what the method does. I don't think a person who hasn't used it can understand by just reading the description.

axis : int, default 0
Axis to retrieve cross-section on
key : label
Some label contained in the index, or partially in a MultiIndex.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think more than one level can be provided, right? Can you make it clear here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to understand this one, do you want me to precise the key ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can dodf.xs(key=('mammal', 'dog')), and if that's the case, the type label seems incorrect (or not clear). label or tuple of label and an a description that explains what is expected would be better

Returns
-------
xs : Series or DataFrame
Series or DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a short description on what is being returned.

See Also
--------
DataFrame.loc : Access a group of rows and columns
by label(s) or a boolean array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right indentation is 4 spaces to the right of the D in DataFrame.loc. Also, in the previous line, can you continue the description until it's close to the maximum of 79 characters?

Same things in the next item too.


Notes
-----
xs is only for getting, not setting values.
'xs' can not be used to set values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'xs' can not be used to set values.
`xs` can not be used to set values.

the quoting of xs should be backticks, no single quotes. Same in the next case.

... 'num_arms': [0, 0, 0, 0],
... 'num_wings': [0, 0, 2, 0],
... 'num_spec_seen': [9, 2, 7, 3],
... 'class': ['mammal', 'mammal', 'sauropsida', 'sauropsida'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably use mammal and bird (and change turtle by a bird). The idea is to show what xs does in the simplest possible way, so anything that adds complexity, like using a specialized concept like sauropsida, is something I'd try to avoid.

I'd also remove num_arms, and num_spec_seen, as they are not used, and they just add "noise" that makes focusing on what xs does more difficult.

... 'num_spec_seen': [9, 2, 7, 3],
... 'class': ['mammal', 'mammal', 'sauropsida', 'sauropsida'],
... 'animal': ['cat', 'dog', 'duck', 'turtle'],
... 'area': [1, 1, 2, 3]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if instead of area (which for what I understand is arbitrary), we use something like "flies", "walks" (you can use bat and penguin, so you have the examples crossed). I think this way the last example will be clearer.

@tlentali
Copy link
Contributor Author

I am on it 👍

@tlentali
Copy link
Contributor Author

I corrected the points you highlighted, is it ok for you ? Again, thanks for you time and your help !

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review, the notification for this got lost in my inbox.

Looks great, couple of comments, but almost ready.

Also, it could make sense to leave a blank line between some of the examples, so they are rendered in blocks. If you want to do a ./doc/make.py html --single=pandas.DataFrame.xs, and see how it looks better (a single code block with all the examples, or couple of code blocks). And if you divide the examples, and you think that it would be helpful to have a comment explaining what we are doing...

Series/DataFrame. Defaults to cross-section on the rows (axis=0).
Return cross-section from the Series/DataFrame.

This methode takes a level argument to select data at a particular
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This methode takes a level argument to select data at a particular
This method takes a `key` argument to select data at a particular

Return cross-section from the Series/DataFrame.

This methode takes a level argument to select data at a particular
level of a MultiIndex easier.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
level of a MultiIndex easier.
level of a MultiIndex.

Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
Axis to retrieve cross-section on
key : label or tuple of label, e.g. ['a', 'b', 'c']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
key : label or tuple of label, e.g. ['a', 'b', 'c']
key : label or tuple of label

Returns
-------
xs : Series or DataFrame
Series or DataFrame of the cross-section (row(s) or column(s)) from
the original Series/DataFrame.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format of the returns section is:

Series or DataFrame  # type or types in a way that they can be parsed (e.g. `bool, int or str` -> `['bool', 'int', 'str']`)
    Cross-section from the original Series or DataFrame corresponding to the selected index levels.
    # ^ Description of what is being returned

num_legs num_wings
locomotion
walks 4 0
>>> df.xs('num_wings', axis=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're repeating this, right? See two commands above.

bat flies 2
bird penguin walks 2
Name: num_wings, dtype: int64
>>> df.xs(('mammal', 'bat'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this shows the same use case than df.xs(('mammal', 'dog')), I think with just one should be enough.

num_legs num_wings
class locomotion
mammal walks 4 0
>>> df.xs(('bird', 'walks'), level=[0, 'locomotion'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice example. And the previous too. Not sure if it'd make it clearer for the readers with examples that returns more than one value? like df.xs('walks', level=2) and df.xs(('mammal', 'walks'), level=[0, 'locomotion'])

Up to you, whatever you think would help more a newbie to pandas.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @tlentali

@tlentali
Copy link
Contributor Author

tlentali commented Dec 2, 2018

Thank you very much for your help @datapythonista !

@jreback jreback added this to the 0.24.0 milestone Dec 2, 2018
@jreback jreback merged commit 212a1d1 into pandas-dev:master Dec 2, 2018
@jreback
Copy link
Contributor

jreback commented Dec 2, 2018

thanks @tlentali

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Fix the docstring of xs in pandas/core/generic.py
4 participants