-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix the docstring of xs in pandas/core/generic.py #22892 #23913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @tlentali! Thanks for submitting the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just couple of comments on things that IMO would make the docstring better.
pandas/core/generic.py
Outdated
|
||
Returns a cross-section (row(s) or column(s)) | ||
from the Series/DataFrame. | ||
Defaults to cross-section on the rows (axis=0). | ||
|
||
Parameters | ||
---------- | ||
key : object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename object
to label
here
pandas/core/generic.py
Outdated
|
||
Parameters | ||
---------- | ||
key : object | ||
Some label contained in the index, or partially in a MultiIndex | ||
Some label contained in the index, or partially in a MultiIndex. | ||
axis : int, default 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do a grep of axis :
and see what we usually use for the axis
type?, it's something like {0 or 'index', 1 or 'columns'}, default 0
pandas/core/generic.py
Outdated
If False, returns object with same levels as self. | ||
|
||
Returns | ||
------- | ||
xs : Series or DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xs : Series or DataFrame | |
Series or DataFrame |
We don't name the output anymore, just say the type. But in the next line we should have a short description on what's returned.
pandas/core/generic.py
Outdated
|
||
Notes | ||
----- | ||
xs is only for getting, not setting values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xs is only for getting, not setting values. | |
`xs` can not be used to set values. |
pandas/core/generic.py
Outdated
xs is only for getting, not setting values. | ||
|
||
MultiIndex Slicers is a generic way to get/set values on any level or | ||
levels. It is a superset of xs functionality, see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
levels. It is a superset of xs functionality, see | |
levels. It is a superset of `xs` functionality, see |
pandas/core/generic.py
Outdated
|
||
MultiIndex Slicers is a generic way to get/set values on any level or | ||
levels. It is a superset of xs functionality, see | ||
:ref:`MultiIndex Slicers <advanced.mi_slicers>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:ref:`MultiIndex Slicers <advanced.mi_slicers>` | |
:ref:`MultiIndex Slicers <advanced.mi_slicers>`. |
pandas/core/generic.py
Outdated
Examples | ||
-------- | ||
>>> d = {'A': [4, 4, 9], 'B': [5, 0, 7], 'C': [2, 9, 3]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's be nice to use a more real-world example. I think it makes things more complicated that the values used are arbitrary.
We've been using examples with animals, so for example df.xs(('mammal', 'cat'))
returning num_legs 4
makes it very clear what is going on.
Codecov Report
@@ Coverage Diff @@
## master #23913 +/- ##
===========================================
- Coverage 92.29% 42.45% -49.85%
===========================================
Files 161 161
Lines 51498 51561 +63
===========================================
- Hits 47530 21888 -25642
- Misses 3968 29673 +25705
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I just added couple of comments about formatting, and about making the examples simpler/clearer (IMO). With them it should be ready to be merged.
Thanks @tlentali
pandas/core/generic.py
Outdated
|
||
Returns a cross-section (row(s) or column(s)) | ||
from the Series/DataFrame. | ||
Defaults to cross-section on the rows (axis=0). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit repetitive of the short summary. Can you try to explain in an easier way what the method does. I don't think a person who hasn't used it can understand by just reading the description.
pandas/core/generic.py
Outdated
axis : int, default 0 | ||
Axis to retrieve cross-section on | ||
key : label | ||
Some label contained in the index, or partially in a MultiIndex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think more than one level can be provided, right? Can you make it clear here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure to understand this one, do you want me to precise the key ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can dodf.xs(key=('mammal', 'dog'))
, and if that's the case, the type label
seems incorrect (or not clear). label or tuple of label
and an a description that explains what is expected would be better
Returns | ||
------- | ||
xs : Series or DataFrame | ||
Series or DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a short description on what is being returned.
pandas/core/generic.py
Outdated
See Also | ||
-------- | ||
DataFrame.loc : Access a group of rows and columns | ||
by label(s) or a boolean array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right indentation is 4 spaces to the right of the D
in DataFrame.loc
. Also, in the previous line, can you continue the description until it's close to the maximum of 79 characters?
Same things in the next item too.
pandas/core/generic.py
Outdated
|
||
Notes | ||
----- | ||
xs is only for getting, not setting values. | ||
'xs' can not be used to set values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'xs' can not be used to set values. | |
`xs` can not be used to set values. |
the quoting of xs should be backticks, no single quotes. Same in the next case.
pandas/core/generic.py
Outdated
... 'num_arms': [0, 0, 0, 0], | ||
... 'num_wings': [0, 0, 2, 0], | ||
... 'num_spec_seen': [9, 2, 7, 3], | ||
... 'class': ['mammal', 'mammal', 'sauropsida', 'sauropsida'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably use mammal and bird (and change turtle by a bird). The idea is to show what xs
does in the simplest possible way, so anything that adds complexity, like using a specialized concept like sauropsida, is something I'd try to avoid.
I'd also remove num_arms, and num_spec_seen, as they are not used, and they just add "noise" that makes focusing on what xs
does more difficult.
pandas/core/generic.py
Outdated
... 'num_spec_seen': [9, 2, 7, 3], | ||
... 'class': ['mammal', 'mammal', 'sauropsida', 'sauropsida'], | ||
... 'animal': ['cat', 'dog', 'duck', 'turtle'], | ||
... 'area': [1, 1, 2, 3]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if instead of area (which for what I understand is arbitrary), we use something like "flies", "walks" (you can use bat and penguin, so you have the examples crossed). I think this way the last example will be clearer.
I am on it 👍 |
I corrected the points you highlighted, is it ok for you ? Again, thanks for you time and your help ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review, the notification for this got lost in my inbox.
Looks great, couple of comments, but almost ready.
Also, it could make sense to leave a blank line between some of the examples, so they are rendered in blocks. If you want to do a ./doc/make.py html --single=pandas.DataFrame.xs
, and see how it looks better (a single code block with all the examples, or couple of code blocks). And if you divide the examples, and you think that it would be helpful to have a comment explaining what we are doing...
pandas/core/generic.py
Outdated
Series/DataFrame. Defaults to cross-section on the rows (axis=0). | ||
Return cross-section from the Series/DataFrame. | ||
|
||
This methode takes a level argument to select data at a particular |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This methode takes a level argument to select data at a particular | |
This method takes a `key` argument to select data at a particular |
pandas/core/generic.py
Outdated
Return cross-section from the Series/DataFrame. | ||
|
||
This methode takes a level argument to select data at a particular | ||
level of a MultiIndex easier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
level of a MultiIndex easier. | |
level of a MultiIndex. |
pandas/core/generic.py
Outdated
Some label contained in the index, or partially in a MultiIndex | ||
axis : int, default 0 | ||
Axis to retrieve cross-section on | ||
key : label or tuple of label, e.g. ['a', 'b', 'c'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key : label or tuple of label, e.g. ['a', 'b', 'c'] | |
key : label or tuple of label |
pandas/core/generic.py
Outdated
Returns | ||
------- | ||
xs : Series or DataFrame | ||
Series or DataFrame of the cross-section (row(s) or column(s)) from | ||
the original Series/DataFrame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format of the returns section is:
Series or DataFrame # type or types in a way that they can be parsed (e.g. `bool, int or str` -> `['bool', 'int', 'str']`)
Cross-section from the original Series or DataFrame corresponding to the selected index levels.
# ^ Description of what is being returned
pandas/core/generic.py
Outdated
num_legs num_wings | ||
locomotion | ||
walks 4 0 | ||
>>> df.xs('num_wings', axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're repeating this, right? See two commands above.
pandas/core/generic.py
Outdated
bat flies 2 | ||
bird penguin walks 2 | ||
Name: num_wings, dtype: int64 | ||
>>> df.xs(('mammal', 'bat')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this shows the same use case than df.xs(('mammal', 'dog'))
, I think with just one should be enough.
pandas/core/generic.py
Outdated
num_legs num_wings | ||
class locomotion | ||
mammal walks 4 0 | ||
>>> df.xs(('bird', 'walks'), level=[0, 'locomotion']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice example. And the previous too. Not sure if it'd make it clearer for the readers with examples that returns more than one value? like df.xs('walks', level=2)
and df.xs(('mammal', 'walks'), level=[0, 'locomotion'])
Up to you, whatever you think would help more a newbie to pandas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @tlentali
Thank you very much for your help @datapythonista ! |
thanks @tlentali |
git diff upstream/master -u -- "*.py" | flake8 --diff
Hello @datapythonista !
As requested in #22892 , I corrected the docstring of the pandas/core/generic.py xs function to pass correctly the following command :
And I also validate the PEP-8 according to :
Is it ok for you ?