DOC: update NDFrame.squeeze docstring #20269

Merged
merged 3 commits into from
Jul 7, 2018

Conversation

villasv
Contributor

@villasv villasv commented Mar 11, 2018

Series.squeeze and DataFrame.squeeze are inherited from NDFrame

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.squeeze) #####################
################################################################################

Squeeze 1 dimensional axis objects into scalars.

If the given axis consists of one dimensional objects, they are turned
into scalars. In case no axis is specified, all axes are subject to
squeezing. In any case, objects in axes that can't be squeezed are
left unchanged.

Parameters
----------
axis : integer or string, optional
    A specific axis to squeeze.

    .. versionadded:: 0.20.0

Returns
-------
DataFrame, Series or scalar
    The projection after squeezing axis.

See Also
--------
Series.iloc : Integer-location based indexing for selecting scalars
DataFrame.iloc : Integer-location based indexing for selecting Series

Examples
--------
>>> primes = pd.Series([2, 3, 5, 7])

Slicing might produce a Series with a single value:

>>> even_primes = primes[primes % 2 == 0]
>>> even_primes
0    2
dtype: int64
>>> even_primes.squeeze()
2

Squeezing objects with more than 1 dimension does nothing:

>>> odd_primes = primes[primes % 2 == 1]
>>> odd_primes
1    3
2    5
3    7
dtype: int64
>>> odd_primes.squeeze()
1    3
2    5
3    7
dtype: int64

Squeezing is even more effective when used with DataFrames.

>>> df = pd.DataFrame([[1,2], [3, 4]], columns=['a', 'b'])
>>> df
   a  b
0  1  2
1  3  4

Slicing a single column will produce a DataFrame with one of the
axis having only 1 dimension:

>>> df_a = df[['a']]
>>> df_a
   a
0  1
1  3

Objects along the column are 1 dimensional, so they can be squeezed
into scalars:

>>> df_a.squeeze('columns')
0    1
1    3
Name: a, dtype: int64

Slicing a single row from a single column will produce a single
scalar DataFrame:

>>> df_0a = df[['a']].iloc[[0]]
>>> df_0a
   a
0  1

Squeezing along the rows produces a single scalar Series:

>>> df_0a.squeeze('rows')
a    1
Name: 0, dtype: int64

Squeezing all axes wil project directly into a scalar:

>>> df_0a.squeeze()
1

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Errors in parameters section
                Parameter "axis" description should finish with "."

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

The .. versionadded rst macro already adds the dot, so I can't just put another one after it.


Squeezing is even more effective when used with DataFrames.

>>> df = pd.DataFrame([[1,2], [3, 4]], columns=['a', 'b'])
Member

Minor edit - need space after first comma

1 3 4

Slicing a single column will produce a DataFrame with one of the
axis having only 1 dimension:
Member

axes instead of axis


Parameters
----------
axis : None, integer or string axis name, optional
The axis to squeeze if 1-sized.
axis : integer or string, optional
Member

Convention here is axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
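
As a rough illustration of the values that convention lists, the sketch below passes each spelling of `axis` to `squeeze` on a small frame. The frame and variable names are made up for the example and are not part of the PR.

```python
import pandas as pd

# Illustrative frame (not from the PR): 1 row, 2 columns.
df = pd.DataFrame([[1, 2]], columns=["a", "b"])

# 0 and 'index' name the same axis, so both squeeze the single row away.
by_number = df.squeeze(axis=0)
by_name = df.squeeze(axis="index")

# The column axis has length 2, so asking to squeeze it changes nothing.
unchanged = df.squeeze(axis="columns")

# axis=None (the default) squeezes every axis that has length 1.
default = df.squeeze()
```

Here only the row axis is squeezable, so `by_number`, `by_name`, and `default` are the same Series, while `unchanged` is still a DataFrame.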

Objects along the column are 1 dimensional, so they can be squeezed
into scalars:

>>> df_a.squeeze('columns')
Member

I think the first example is great but this one could use a little work. With how it's presented it wouldn't make sense to use squeeze instead of just selecting the desired Series. Perhaps using a predicate here of say 'a' == 1 and squeezing the result of that would showcase the utility better?

Contributor Author

Makes sense, in most cases squeeze will be necessary because of indirect slicing rather than specific axis selections.
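
One possible reading of the predicate suggestion, sketched under the assumption that the filter is applied to the rows of the same `df` used in the docstring: the filter returns a DataFrame even when a single row matches, which is where `squeeze` becomes useful.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# Filtering with a predicate returns a DataFrame even if only one row matches.
matches = df[df["a"] == 1]

# Squeezing the length-1 row axis turns the single matching row into a Series.
row = matches.squeeze(axis="index")
```

The caller does not know in advance how many rows the predicate keeps, so the squeeze is not something a direct `df['a']` style selection could replace.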

Slicing a single row from a single column will produce a single
scalar DataFrame:

>>> df_0a = df[['a']].iloc[[0]]
Member

Stylistically I think using a predicate and then selecting a set of columns would be clearer, so something like df[df.index < 1][['a']]

Contributor Author

Yep, basically the same point as the one before. Will work on it.
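
For reference, a minimal sketch of how the suggested expression behaves with the docstring's `df`. The variable names `as_series` and `as_scalar` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# Row predicate first, then a list of columns: the result is a 1x1 DataFrame.
df_0a = df[df.index < 1][["a"]]

# Squeezing only the rows leaves a one-element Series labelled by column name.
as_series = df_0a.squeeze(axis="index")

# Squeezing every axis returns the single contained value.
as_scalar = df_0a.squeeze()
```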

Member

@datapythonista datapythonista left a comment

Great pull request, added some comments/ideas.

If the given axis consists of one dimensional objects, they are turned
into scalars. In case no axis is specified, all axes are subject to
squeezing. In any case, objects in axes that can't be squeezed are
left unchanged.
Member

I think it's a good description, but personally I find that being a bit less technical would make it easier to understand. For example "If applied in a 1x1 DataFrame, it returns the contained value. In a DataFrame with one column, it converts it to a Series...". Feel free to disagree, but IMO something like this is a bit faster to understand.

Contributor Author

The hard part of describing squeeze is that it's too general, but perhaps I lost touch with practicality because I was thinking of squeezing N-dimensional objects, even though pandas deals mainly with 1D (Series) and 2D (Frames).

I hoped to make it concrete in the examples, but I'm afraid people will just give up reading after a paragraph like this.
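
The shape-by-shape phrasing proposed in this thread can be checked with a short sketch. `squeezed_kind` is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd


def squeezed_kind(frame):
    """Return the type name of frame.squeeze() (hypothetical helper)."""
    return type(frame.squeeze()).__name__


df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

one_by_one = df.iloc[[0], [0]]  # 1 row, 1 column
one_column = df[["a"]]          # 2 rows, 1 column
one_row = df.iloc[[0]]          # 1 row, 2 columns

value = one_by_one.squeeze()                # the contained value
column_kind = squeezed_kind(one_column)     # 'Series'
row_kind = squeezed_kind(one_row)           # 'Series'
full_kind = squeezed_kind(df)               # 'DataFrame', nothing to squeeze
```

Since pandas objects here are at most two-dimensional, these four shapes cover every case the description has to explain for a DataFrame.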


.. versionadded:: 0.20.0

Returns
-------
scalar if 1-sized, else original object
DataFrame, Series or scalar
Member

Not sure if scalar is what we use when it can return anything contained in the values of the dataframe. It's ok with me, just pointing out in case someone knows of another terminology.

Contributor Author

I couldn't find anything in the guidelines about it, but I'm taking other docs as a guide: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html

See Also
--------
Series.iloc : Integer-location based indexing for selecting scalars
DataFrame.iloc : Integer-location based indexing for selecting Series
Member

Does it make sense to add DataFrame.to_series? I think they do the same in the case of 1 column DataFrame, right?

Contributor Author

It is somewhat, I think squeezing is most useful in slicing scenarios but perhaps someone might find that a direct conversion is what they really wanted. Will add it too.

Member

I don't think DataFrame.to_series exists?

Member

hehe, that's a good reason to not add it... ;) not sure what I was thinking about, I think I got confused with Index.to_series, sorry
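
The point settled in this thread is easy to confirm. The sketch below checks which class exposes `to_series` and shows `squeeze` doing the one-column conversion that was being discussed; the sample data is made up for the example.

```python
import pandas as pd

# DataFrame does not define to_series; Index does.
frame_has_it = hasattr(pd.DataFrame, "to_series")
index_has_it = hasattr(pd.Index, "to_series")

# Index.to_series returns a Series whose index and values are both the labels.
labels = pd.Index(["x", "y"]).to_series()

# For a one-column DataFrame, squeezing the column axis yields that column
# as a Series, keeping the column label as the Series name.
df = pd.DataFrame({"a": [1, 3]})
col = df.squeeze(axis="columns")
```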

>>> even_primes.squeeze()
2

Squeezing objects with more than 1 dimension does nothing:
Member

Technically speaking, "squeezing Series with more than 1 dimension does nothing"; squeezing a DataFrame with 1D in one of the axes does something.

Contributor Author

Sorry, "objects" became ambiguous. The objects inside that DataFrame axis would be 1D, that's what I meant. Will try to clarify.
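
The ambiguity here seems to be between the number of dimensions of an object and the length of an axis. A small sketch, using names from the docstring plus a made-up one-column frame, separates the two:

```python
import pandas as pd

primes = pd.Series([2, 3, 5, 7])
odd_primes = primes[primes % 2 == 1]

# A Series always has exactly one dimension; what varies is its length.
series_ndim = odd_primes.ndim
series_len = len(odd_primes)

# Its only axis has length 3, so there is nothing to squeeze.
same_series = odd_primes.squeeze()

# A DataFrame has two dimensions. Here the column axis has length 1,
# so squeezing drops it and leaves a one-dimensional Series.
df_a = pd.DataFrame({"a": [1, 3]})
squeezed = df_a.squeeze()
```

In other words, `squeeze` acts on axes of length 1, regardless of how many elements lie along the other axis.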

Squeezing all axes wil project directly into a scalar:

>>> df_0a.squeeze()
1
Member

I feel like this example using df covers what's shown first with the primes. I'd leave just this one; personally I find it really good, and enough to not have to list the previous ones.

Contributor Author

@villasv villasv Mar 14, 2018

I wanted to have examples with both Series and DataFrames because both classes share this docstring, so it would be a bit weird to read the docs from Series.squeeze and find examples only of DataFrame.squeeze. But I think I could "chain" those examples, since in the middle of the df example I may squeeze some Series as well.

I'll try to merge both, because the Series example is more concrete and related to slicing (the most likely use case IMO), but the second covers both classes.

@TomAugspurger
Contributor

Updated, if anyone wants to take a look. Merging tomorrow otherwise.

I went with @datapythonista's suggestion on how to phrase the extended summary. Pandas only has to worry about n <= 2, so let's take advantage of that in the docs.

@villasv
Contributor Author

villasv commented Mar 14, 2018

I was going to implement a few of the discussed changes next Saturday, but you can merge this one and if I actually get to it I'll make a new PR.

@TomAugspurger
Contributor

TomAugspurger commented Mar 14, 2018 via email

@jschendel
Member

Merging this as-is, since it looks like this is good.

@villasv : Thanks! Happy to take any additional updates as a separate PR, if you're still interested in implementing additional changes.

@jschendel jschendel merged commit a82d779 into pandas-dev:master Jul 7, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
6 participants