DOC: update NDFrame.squeeze docstring #20269
pandas/core/generic.py
Outdated
|
||
Squeezing is even more effective when used with DataFrames. | ||
|
||
>>> df = pd.DataFrame([[1,2], [3, 4]], columns=['a', 'b']) |
Minor edit - need space after first comma
pandas/core/generic.py
Outdated
1 3 4 | ||
|
||
Slicing a single column will produce a DataFrame with one of the | ||
axis having only 1 dimension: |
axes instead of axis
pandas/core/generic.py
Outdated
|
||
Parameters | ||
---------- | ||
axis : None, integer or string axis name, optional | ||
The axis to squeeze if 1-sized. | ||
axis : integer or string, optional |
Convention here is axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
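To illustrate how the values in that convention behave, here is a minimal sketch. The frame `one_row` is a made-up example for this comment, not something taken from the docstring under review.

```python
import pandas as pd

# Hypothetical frame for illustration: one row, two columns.
one_row = pd.DataFrame([[1, 2]], columns=['a', 'b'])

# The index axis has length 1, so it can be squeezed away: the result is a Series.
by_index = one_row.squeeze(axis='index')      # equivalent to axis=0

# The columns axis has length 2, so nothing is squeezed: still a DataFrame.
by_columns = one_row.squeeze(axis='columns')  # equivalent to axis=1

# axis=None (the default) squeezes every axis that has length 1.
by_default = one_row.squeeze()

print(type(by_index).__name__, type(by_columns).__name__, type(by_default).__name__)
```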
pandas/core/generic.py
Outdated
Objects along the column are 1 dimensional, so they can be squeezed | ||
into scalars: | ||
|
||
>>> df_a.squeeze('columns') |
I think the first example is great but this one could use a little work. With how it's presented it wouldn't make sense to use `squeeze` instead of just selecting the desired `Series`. Perhaps using a predicate here of say `'a' == 1` and squeezing the result of that would showcase the utility better?
Makes sense, in most cases squeeze will be necessary because of indirect slicing rather than specific axis selections.
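A rough sketch of what the predicate-based example could look like, reusing the `df` from the docstring. The names `matches` and `row` are placeholders chosen here, not the wording that ended up in the PR.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# Filtering with a predicate returns a DataFrame, even when only one row matches.
matches = df[df['a'] == 1]

# The single matching row can then be squeezed into a Series indexed by column name.
row = matches.squeeze()

print(matches.shape)
print(row)
```

Here the caller does not know in advance how many rows will match, which is the indirect-slicing situation where `squeeze` is useful.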
pandas/core/generic.py
Outdated
Slicing a single row from a single column will produce a single | ||
scalar DataFrame: | ||
|
||
>>> df_0a = df[['a']].iloc[[0]] |
Stylistically I think using a predicate rather than selecting a set of columns would be clearer, so something like `df[df.index < 1][['a']]`
Yep, basically the same point as the one before. Will work on it.
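For reference, a small sketch of how the suggested expression behaves when squeezed along each axis. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# The suggested selection yields a DataFrame with one row and one column.
cell = df[df.index < 1][['a']]

value = cell.squeeze()               # both axes squeezed: the contained value
as_column = cell.squeeze('columns')  # only columns squeezed: Series named 'a'
as_row = cell.squeeze('index')       # only index squeezed: Series named 0

print(cell.shape, value, as_column.name, as_row.name)
```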
Great pull request, added some comments/ideas.
pandas/core/generic.py
Outdated
If the given axis consists of one dimensional objects, they are turned | ||
into scalars. In case no axis is specified, all axes are subject to | ||
squeezing. In any case, objects in axes that can't be squeezed are | ||
left unchanged. |
I think it's a good description, but personally I find that being a bit less technical would make it easier to understand. For example "If applied in a 1x1 DataFrame, it returns the contained value. In a DataFrame with one column, it converts it to a Series...". Feel free to disagree, but IMO something like this is a bit faster to understand.
The hard part of describing squeeze is that it's too general, but perhaps I lost touch with practicality because I was thinking of squeezing N-dimensional objects, even though pandas deals mainly with 1D (Series) and 2D (Frames).
I hoped to make it concrete in the examples, but I'm afraid people will just give up reading after a paragraph like this.
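The less technical phrasing suggested above maps onto three concrete cases, sketched here under the assumption that the docstring's 2x2 `df` is used. The `results` dictionary is just a device for this comment.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

results = {
    '1x1': df.loc[[0], ['a']].squeeze(),  # returns the contained value
    '2x1': df[['a']].squeeze(),           # one column becomes a Series
    '2x2': df.squeeze(),                  # nothing to squeeze, frame is unchanged
}

for shape, result in results.items():
    print(shape, '->', type(result).__name__)
```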
pandas/core/generic.py
Outdated
|
||
.. versionadded:: 0.20.0 | ||
|
||
Returns | ||
------- | ||
scalar if 1-sized, else original object | ||
DataFrame, Series or scalar |
Not sure if scalar is what we use when it can return anything contained in the values of the dataframe. It's ok with me, just pointing out in case someone knows of another terminology.
I couldn't find anything on the guidelines about it, but I'm taking other docs as guide: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html
See Also | ||
-------- | ||
Series.iloc : Integer-location based indexing for selecting scalars | ||
DataFrame.iloc : Integer-location based indexing for selecting Series |
Does it make sense to add `DataFrame.to_series`? I think they do the same in the case of 1 column DataFrame, right?
It is somewhat, I think squeezing is most useful in slicing scenarios but perhaps someone might find that a direct conversion is what they really wanted. Will add it too.
I don't think `DataFrame.to_series` exists?
hehe, that's a good reason to not add it... ;) not sure what I was thinking about, I think I got confused with `Index.to_series`, sorry
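A quick way to check which of the two methods exists, plus the `squeeze` call that covers the one-column case discussed above. The frame `one_col` is a hypothetical example.

```python
import pandas as pd

# DataFrame has no to_series method; Index does.
print(hasattr(pd.DataFrame, 'to_series'))
print(hasattr(pd.Index, 'to_series'))

# For a one-column DataFrame, squeezing the columns axis gives the Series.
one_col = pd.DataFrame({'a': [1, 3]})
as_series = one_col.squeeze('columns')
print(as_series)
```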
pandas/core/generic.py
Outdated
>>> even_primes.squeeze() | ||
2 | ||
|
||
Squeezing objects with more than 1 dimension does nothing: |
technically speaking "squeezing `Series` with more than 1 dimension does nothing", squeezing a dataframe with 1D in one of the axes does something.
Sorry, "objects" became ambiguous. The objects inside that DataFrame axis would be 1D, that's what I meant. Will try to clarify.
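The distinction being discussed can be sketched as follows. The `primes` values and the frame `one_row` are assumptions made for this comment, loosely following the `even_primes` example in the diff.

```python
import pandas as pd

primes = pd.Series([2, 3, 5, 7])

# A Series with more than one element has nothing to squeeze.
unchanged = primes.squeeze()

# A Series left with a single element after slicing squeezes to that element.
even_primes = primes[primes % 2 == 0]
two = even_primes.squeeze()

# A DataFrame with a single row still has a length-1 axis, so squeezing
# does change it: the result is a Series.
one_row = pd.DataFrame([[1, 2]], columns=['a', 'b'])
row = one_row.squeeze()

print(type(unchanged).__name__, two, type(row).__name__)
```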
Squeezing all axes will project directly into a scalar: | ||
|
||
>>> df_0a.squeeze() | ||
1 |
I feel like this example using `df` covers what's shown first with the primes. I'd leave just this one, personally I find it really good, and enough to not have to list the previous.
I wanted to have examples with both Series and DataFrames because both classes share this docstring, so it would be a bit weird to read the docs from `Series.squeeze` and find examples only of `DataFrame.squeeze`. But I think I could "chain" those examples, since in the middle of the `df` example I may squeeze some `Series` as well.
I'll try to merge both, because the Series example is more concrete and related to slicing (the most likely use case IMO), but the second covers both classes.
[ci skip]
Updated, if anyone wants to take a look. Merging tomorrow otherwise. I went with @datapythonista's suggestion on how to phrase the extended summary. Pandas only has to worry about |
I was going to implement a few of the discussed changes next Saturday, but you can merge this one and if I actually get to it I'll make a new PR. |
No rush. Feel free to pull my changes and push them here.
…On Wed, Mar 14, 2018 at 3:29 PM, Victor Villas ***@***.***> wrote:
I was going to implement a few of the discussed changes next Saturday, but
you can merge this one and if I actually get to it I'll make a new PR.
Merging this as-is, since it looks like this is good. @villasv : Thanks! Happy to take any additional updates as a separate PR, if you're still interested in implementing additional changes. |
Series.squeeze and DataFrame.squeeze are inherited from NDFrame
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
The `.. versionadded` rst macro already adds the dot, so I can't just put another one after it.