DOC: update NDFrame.squeeze docstring #20269

villasv · 2018-03-11T02:03:28Z

Series.squeeze and DataFrame.squeeze are inherited from NDFrame

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.squeeze) #####################
################################################################################

Squeeze 1 dimensional axis objects into scalars.

If the given axis consists of one dimensional objects, they are turned
into scalars. In case no axis is specified, all axes are subject to
squeezing. In any case, objects in axes that can't be squeezed are
left unchanged.

Parameters
----------
axis : integer or string, optional
    A specific axis to squeeze.

    .. versionadded:: 0.20.0

Returns
-------
DataFrame, Series or scalar
    The projection after squeezing axis.

See Also
--------
Series.iloc : Integer-location based indexing for selecting scalars
DataFrame.iloc : Integer-location based indexing for selecting Series

Examples
--------
>>> primes = pd.Series([2, 3, 5, 7])

Slicing might produce a Series with a single value:

>>> even_primes = primes[primes % 2 == 0]
>>> even_primes
0    2
dtype: int64
>>> even_primes.squeeze()
2

Squeezing objects with more than 1 dimension does nothing:

>>> odd_primes = primes[primes % 2 == 1]
>>> odd_primes
1    3
2    5
3    7
dtype: int64
>>> odd_primes.squeeze()
1    3
2    5
3    7
dtype: int64

Squeezing is even more effective when used with DataFrames.

>>> df = pd.DataFrame([[1,2], [3, 4]], columns=['a', 'b'])
>>> df
   a  b
0  1  2
1  3  4

Slicing a single column will produce a DataFrame with one of the
axis having only 1 dimension:

>>> df_a = df[['a']]
>>> df_a
   a
0  1
1  3

Objects along the column are 1 dimensional, so they can be squeezed
into scalars:

>>> df_a.squeeze('columns')
0    1
1    3
Name: a, dtype: int64

Slicing a single row from a single column will produce a single
scalar DataFrame:

>>> df_0a = df[['a']].iloc[[0]]
>>> df_0a
   a
0  1

Squeezing along the rows produces a single scalar Series:

>>> df_0a.squeeze('rows')
a    1
Name: 0, dtype: int64

Squeezing all axes wil project directly into a scalar:

>>> df_0a.squeeze()
1

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Errors in parameters section
                Parameter "axis" description should finish with "."

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

The .. versionadded:: reST directive already renders a trailing period, so I can't just add another one after it.

WillAyd · 2018-03-11T02:33:05Z

pandas/core/generic.py

+
+        Squeezing is even more effective when used with DataFrames.
+
+        >>> df = pd.DataFrame([[1,2], [3, 4]], columns=['a', 'b'])


Minor edit: add a space after the first comma.

WillAyd · 2018-03-11T02:33:33Z

pandas/core/generic.py

+        1  3  4
+
+        Slicing a single column will produce a DataFrame with one of the
+        axis having only 1 dimension:


Use "axes" instead of "axis".

WillAyd · 2018-03-11T02:34:48Z

pandas/core/generic.py


        Parameters
        ----------
-        axis : None, integer or string axis name, optional
-            The axis to squeeze if 1-sized.
+        axis : integer or string, optional


The convention here is axis : {0 or 'index', 1 or 'columns', None}, default None
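For reference, a minimal sketch of how the integer and string spellings of axis relate, assuming a recent pandas release. The frame and the variable names are made up for illustration and are not part of the PR.

```python
import pandas as pd

# Illustrative frame with one row and two columns.
df = pd.DataFrame([[1, 2]], columns=["a", "b"])

# The integer and string spellings name the same axis.
by_number = df.squeeze(axis=0)
by_name = df.squeeze(axis="index")
assert by_number.equals(by_name)  # both are a Series indexed by 'a', 'b'

# The column axis has length 2, so squeezing it leaves a DataFrame.
unchanged = df.squeeze(axis="columns")
assert isinstance(unchanged, pd.DataFrame)

# axis=None (the default) squeezes every length-1 axis, here only the rows.
assert df.squeeze().equals(by_name)
```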

WillAyd · 2018-03-11T02:42:29Z

pandas/core/generic.py

+        Objects along the column are 1 dimensional, so they can be squeezed
+        into scalars:
+
+        >>> df_a.squeeze('columns')


I think the first example is great, but this one could use a little work. As presented, there is no reason to use squeeze instead of just selecting the desired Series. Perhaps filtering with a predicate, say 'a' == 1, and squeezing the result would showcase the utility better?

Makes sense. In most cases squeeze is needed because of indirect slicing rather than explicit axis selection.
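One way the predicate-based variant could look, as a sketch only: it reuses the df from the docstring, and the names match and row are illustrative rather than taken from the PR.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# A boolean mask always returns a DataFrame, even when only one row matches,
# so the caller cannot know the shape without inspecting the data.
match = df[df["a"] == 1]
assert match.shape == (1, 2)

# Squeezing the row axis turns the single matching row into a Series.
row = match.squeeze("index")
assert isinstance(row, pd.Series)
assert row["b"] == 2
```

Here the single-row shape comes from the data rather than from an explicit selection, which is the situation where squeeze is more natural than direct indexing.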

WillAyd · 2018-03-11T02:50:11Z

pandas/core/generic.py

+        Slicing a single row from a single column will produce a single
+        scalar DataFrame:
+
+        >>> df_0a = df[['a']].iloc[[0]]


Stylistically, I think using a predicate rather than selecting a set of columns would be clearer, so something like df[df.index < 1][['a']]

Yep, basically the same point as the one before. Will work on it.
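A quick check of the suggested expression against the original one, assuming the same df as in the docstring; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# Reviewer's suggestion: filter rows with a predicate, then keep column 'a'.
by_predicate = df[df.index < 1][["a"]]

# Original docstring form: select column 'a', then take the first row by position.
by_position = df[["a"]].iloc[[0]]

# Both are 1x1 DataFrames, and both squeeze down to the same scalar.
assert by_predicate.shape == (1, 1)
assert by_position.shape == (1, 1)
assert by_predicate.squeeze() == by_position.squeeze() == 1
```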

datapythonista

Great pull request, added some comments/ideas.

datapythonista · 2018-03-13T23:18:07Z

pandas/core/generic.py

+        If the given axis consists of one dimensional objects, they are turned
+        into scalars. In case no axis is specified, all axes are subject to
+        squeezing. In any case, objects in axes that can't be squeezed are
+        left unchanged.


I think it's a good description, but personally I find that being a bit less technical would make it easier to understand. For example "If applied to a 1x1 DataFrame, it returns the contained value. In a DataFrame with one column, it converts it to a Series...". Feel free to disagree, but IMO something like this is a bit faster to understand.

The hard part of describing squeeze is that it's too general, but perhaps I lost touch with practicality because I was thinking of squeezing N-dimensional objects, even though pandas deals mainly with 1D (Series) and 2D (Frames).

I hoped to make it concrete in the examples, but I'm afraid people will just give up reading after a paragraph like this.
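The shape-by-shape phrasing can be checked with a small sketch. The helper squeezed_kind and the labels are hypothetical, written only for this illustration.

```python
import pandas as pd


def squeezed_kind(obj):
    """Hypothetical helper: name the kind of object that obj.squeeze() returns."""
    result = obj.squeeze()
    if isinstance(result, pd.DataFrame):
        return "DataFrame"
    if isinstance(result, pd.Series):
        return "Series"
    return "scalar"


cases = {
    "1x1 DataFrame": pd.DataFrame([[1]], columns=["a"]),
    "2x1 DataFrame": pd.DataFrame([[1], [3]], columns=["a"]),
    "1x2 DataFrame": pd.DataFrame([[1, 2]], columns=["a", "b"]),
    "2x2 DataFrame": pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"]),
    "length-1 Series": pd.Series([2]),
    "length-3 Series": pd.Series([3, 5, 7]),
}

kinds = {label: squeezed_kind(obj) for label, obj in cases.items()}
for label, kind in kinds.items():
    print(f"{label:>16} -> {kind}")
```

With only Series and DataFrame to cover, the whole behaviour fits in six cases, which supports describing it by shape instead of by general axis rules.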

datapythonista · 2018-03-13T23:19:46Z

pandas/core/generic.py


            .. versionadded:: 0.20.0

        Returns
        -------
-        scalar if 1-sized, else original object
+        DataFrame, Series or scalar


Not sure if scalar is what we use when it can return anything contained in the values of the dataframe. It's ok with me, just pointing out in case someone knows of another terminology.

I couldn't find anything in the guidelines about it, but I'm taking other docs as a guide: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html
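To make the terminology question concrete, here is a small sketch of what "scalar" covers in practice: squeeze returns whatever single element is stored, so the concrete type depends on the dtype. The sample values are illustrative.

```python
import numpy as np
import pandas as pd

# Integer data comes back as a NumPy integer scalar.
int_value = pd.Series([2]).squeeze()
assert isinstance(int_value, np.integer)

# String data comes back as a plain Python str.
str_value = pd.Series(["x"]).squeeze()
assert isinstance(str_value, str)

# Datetime data comes back as a pandas Timestamp.
ts_value = pd.Series(pd.to_datetime(["2018-03-11"])).squeeze()
assert isinstance(ts_value, pd.Timestamp)
```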

datapythonista · 2018-03-13T23:20:44Z

pandas/core/generic.py

+        See Also
+        --------
+        Series.iloc : Integer-location based indexing for selecting scalars
+        DataFrame.iloc : Integer-location based indexing for selecting Series


Does it make sense to add DataFrame.to_series? I think they do the same in the case of 1 column DataFrame, right?

It does, somewhat. I think squeezing is most useful in slicing scenarios, but perhaps someone might find that a direct conversion is what they really wanted. Will add it too.

I don't think DataFrame.to_series exists?

hehe, that's a good reason to not add it... ;) not sure what I was thinking about, I think I got confused with Index.to_series, sorry
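A short sketch confirming the point, assuming a recent pandas release: DataFrame has no to_series, Index does, and Series.to_frame is the conversion that squeeze can undo. The sample data is illustrative.

```python
import pandas as pd

# DataFrame has no to_series method; Index does.
assert not hasattr(pd.DataFrame, "to_series")
idx_series = pd.Index(["a", "b"]).to_series()
assert isinstance(idx_series, pd.Series)

# Series.to_frame goes from 1D to 2D, and squeezing the columns goes back.
s = pd.Series([1, 3], name="a")
round_trip = s.to_frame().squeeze("columns")
assert round_trip.equals(s)
assert round_trip.name == "a"
```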

datapythonista · 2018-03-13T23:22:36Z

pandas/core/generic.py

+        >>> even_primes.squeeze()
+        2
+
+        Squeezing objects with more than 1 dimension does nothing:


Technically speaking, "squeezing Series with more than 1 dimension does nothing"; squeezing a DataFrame with 1D in one of the axes does something.

Sorry, "objects" became ambiguous. The objects inside that DataFrame axis would be 1D, that's what I meant. Will try to clarify.
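The ambiguity is between the number of dimensions of the object and the length of each axis. A minimal sketch of that distinction, reusing the data from the docstring examples:

```python
import pandas as pd

primes = pd.Series([2, 3, 5, 7])
odd_primes = primes[primes % 2 == 1]

# A Series always has ndim == 1; what matters for squeeze is its length.
assert odd_primes.ndim == 1 and len(odd_primes) == 3
assert odd_primes.squeeze().equals(odd_primes)  # length 3, so unchanged

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df_a = df[["a"]]

# This DataFrame has ndim == 2, but its column axis has length 1,
# so squeeze drops that axis and returns a Series.
assert df_a.ndim == 2 and df_a.shape == (2, 1)
squeezed = df_a.squeeze()
assert squeezed.ndim == 1
assert squeezed.equals(df["a"])
```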

datapythonista · 2018-03-13T23:25:45Z

pandas/core/generic.py

+        Squeezing all axes wil project directly into a scalar:
+
+        >>> df_0a.squeeze()
+        1


I feel like this example using df covers what is shown first with the primes. I'd leave just this one; personally I find it really good, and enough to not have to list the previous.

I wanted to have examples with both Series and DataFrames because both classes share this docstring, so it would be a bit weird to read the docs from Series.squeeze and find examples only of DataFrame.squeeze. But I think I could "chain" those examples, since in the middle of the df example I may squeeze some Series as well.

I'll try to merge both, because the Series example is more concrete and related to slicing (the most likely use case IMO), but the second covers both classes.
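A rough sketch of what a chained example might look like, squeezing a DataFrame first and then the resulting Series. The column names and data are invented for illustration and are not what was merged.

```python
import pandas as pd

# Illustrative data: the primes from the Series example, with a label column.
df = pd.DataFrame(
    [[2, "even"], [3, "odd"], [5, "odd"], [7, "odd"]],
    columns=["prime", "parity"],
)

# Filtering rows and keeping one column yields a 1x1 DataFrame.
even = df[df["parity"] == "even"][["prime"]]
assert even.shape == (1, 1)

# DataFrame.squeeze on the column axis gives a length-1 Series ...
as_series = even.squeeze("columns")
assert isinstance(as_series, pd.Series) and len(as_series) == 1

# ... and Series.squeeze on that result gives the scalar.
value = as_series.squeeze()
assert value == 2
```

This exercises both DataFrame.squeeze and Series.squeeze in one flow, so the shared docstring would read sensibly from either class.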

[ci skip]

TomAugspurger · 2018-03-14T19:50:53Z

Updated, if anyone wants to take a look. Merging tomorrow otherwise.

I went with @datapythonista's suggestion on how to phrase the extended summary. Pandas only has to worry about n <= 2, so let's take advantage of that in the docs.

villasv · 2018-03-14T20:29:05Z

I was going to implement a few of the discussed changes next Saturday, but you can merge this one and if I actually get to it I'll make a new PR.

TomAugspurger · 2018-03-14T20:31:53Z

No rush. Feel free to pull my changes and push them here.


jschendel · 2018-07-07T17:36:54Z

Merging this as-is, since it looks like this is good.

@villasv : Thanks! Happy to take any additional updates as a separate PR, if you're still interested in implementing additional changes.

DOC: update NDFrame.squeeze docstring

d6ad5b1

WillAyd requested changes Mar 11, 2018


jorisvandenbossche added Docs and removed Docs labels Mar 11, 2018

datapythonista reviewed Mar 13, 2018


TomAugspurger added 2 commits March 14, 2018 14:39

Updated

c9069ee

Updates [ci skip]

d512d93


jschendel merged commit a82d779 into pandas-dev:master Jul 7, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: update NDFrame.squeeze docstring (pandas-dev#20269)

993915a



