DOC: update NDFrame.squeeze docstring #20269

Merged
merged 3 commits into from
Jul 7, 2018
Changes from 1 commit
90 changes: 86 additions & 4 deletions pandas/core/generic.py
@@ -699,18 +699,100 @@ def pop(self, item):

def squeeze(self, axis=None):
"""
Squeeze length 1 dimensions.
Squeeze 1 dimensional axis objects into scalars.

If the given axis consists of one dimensional objects, they are turned
into scalars. In case no axis is specified, all axes are subject to
squeezing. In any case, objects in axes that can't be squeezed are
left unchanged.
Member

I think it's a good description, but personally I find that being a bit less technical would make it easier to understand. For example: "If applied to a 1x1 DataFrame, it returns the contained value. In a DataFrame with one column, it converts it to a Series...". Feel free to disagree, but IMO something like this is a bit faster to understand.

Contributor Author

The hard part of describing squeeze is that it's too general, but perhaps I lost touch with practicality because I was thinking of squeezing N-dimensional objects, even though pandas deals mainly with 1D (Series) and 2D (DataFrame) objects.

I hoped to make it concrete in the examples, but I'm afraid people will just give up reading after a paragraph like this.
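
The plainer wording suggested in this thread can be illustrated with a short sketch. It assumes a recent pandas release; the variable names are made up for illustration and are not part of the PR.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

one_cell = df.loc[[0], ["a"]]   # 1x1 DataFrame
one_column = df[["a"]]          # 2x1 DataFrame
one_row = df.loc[[0]]           # 1x2 DataFrame

# With no axis given, every axis of length 1 is squeezed away.
cell_value = one_cell.squeeze()       # the contained value
column_series = one_column.squeeze()  # Series named "a"
row_series = one_row.squeeze()        # Series indexed by the column labels
```

Under these assumptions, the 1x1 frame collapses to the value it holds, while the single-column and single-row frames each collapse to a Series.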


Parameters
----------
axis : None, integer or string axis name, optional
The axis to squeeze if 1-sized.
axis : integer or string, optional
Member

Convention here is axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
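
As a rough illustration of the axis values named in that convention, the sketch below squeezes a 1x1 DataFrame along each axis in turn. The frame and variable names are hypothetical, and a recent pandas release is assumed.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1]})  # 1x1 DataFrame, index [0], columns ["a"]

# axis=0 and axis="index" are two spellings of the same axis.
by_index = frame.squeeze(axis="index")
by_number = frame.squeeze(axis=0)

# axis=1 and axis="columns" likewise refer to the same axis.
by_columns = frame.squeeze(axis="columns")

# axis=None (the default) squeezes every axis of length 1.
everything = frame.squeeze()
```

Squeezing the index leaves a Series labelled by the columns; squeezing the columns leaves a Series labelled by the index; squeezing both leaves the single value.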

A specific axis to squeeze.

.. versionadded:: 0.20.0

Returns
-------
scalar if 1-sized, else original object
DataFrame, Series or scalar
Member

Not sure if scalar is what we use when it can return anything contained in the values of the dataframe. It's ok with me, just pointing out in case someone knows of another terminology.

Contributor Author

I couldn't find anything in the guidelines about it, but I'm taking other docs as a guide: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html

The projection after squeezing axis.

See Also
--------
Series.iloc : Integer-location based indexing for selecting scalars
DataFrame.iloc : Integer-location based indexing for selecting Series
Member

Does it make sense to add DataFrame.to_series? I think they do the same in the case of 1 column DataFrame, right?

Contributor Author

It does, somewhat. I think squeezing is most useful in slicing scenarios, but perhaps someone might find that a direct conversion is what they really wanted. Will add it too.

Member

I don't think DataFrame.to_series exists?

Member

hehe, that's a good reason to not add it... ;) not sure what I was thinking about, I think I got confused with Index.to_series, sorry
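
The mix-up resolved in this thread can be checked quickly. The sketch below assumes a recent pandas release and uses made-up variable names: `Index.to_series` exists, `DataFrame` has no `to_series` method, and plain column selection is the direct way to get a Series from a one-column frame.

```python
import pandas as pd

index = pd.Index(["x", "y"])
as_series = index.to_series()  # Series whose index and values both come from `index`

# DataFrame defines no to_series method, as pointed out above.
frame_has_to_series = hasattr(pd.DataFrame, "to_series")

df = pd.DataFrame({"a": [1, 3]})
selected = df["a"]                          # direct column selection
squeezed = df[["a"]].squeeze(axis="columns")  # same data via squeeze
```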


Examples
--------
>>> primes = pd.Series([2, 3, 5, 7])

Slicing might produce a Series with a single value:

>>> even_primes = primes[primes % 2 == 0]
>>> even_primes
0 2
dtype: int64
>>> even_primes.squeeze()
2

Squeezing objects with more than 1 dimension does nothing:
Member

Technically speaking, "squeezing Series with more than 1 dimension does nothing"; squeezing a DataFrame with 1D in one of the axes does something.

Contributor Author

Sorry, "objects" became ambiguous. The objects inside that DataFrame axis would be 1D, that's what I meant. Will try to clarify.
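
To make the distinction in this exchange concrete, here is a small sketch of when squeezing is a no-op and when it is not. It assumes a recent pandas release; the names are illustrative only.

```python
import pandas as pd

# A Series with more than one element has no length-1 axis: nothing happens.
odd_primes = pd.Series([3, 5, 7], index=[1, 2, 3])
same_series = odd_primes.squeeze()

# A 2x2 DataFrame has no length-1 axis either.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
same_frame = df.squeeze()

# A 2x1 DataFrame: only the columns axis has length 1.
df_a = df[["a"]]
still_frame = df_a.squeeze(axis="index")   # index has length 2, so no change
now_series = df_a.squeeze(axis="columns")  # columns axis is squeezed away
```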


>>> odd_primes = primes[primes % 2 == 1]
>>> odd_primes
1 3
2 5
3 7
dtype: int64
>>> odd_primes.squeeze()
1 3
2 5
3 7
dtype: int64

Squeezing is even more effective when used with DataFrames.

>>> df = pd.DataFrame([[1,2], [3, 4]], columns=['a', 'b'])
Member

Minor edit - need space after first comma

>>> df
a b
0 1 2
1 3 4

Slicing a single column will produce a DataFrame with one of the
axis having only 1 dimension:
Member

axes instead of axis


>>> df_a = df[['a']]
>>> df_a
a
0 1
1 3

Objects along the column are 1 dimensional, so they can be squeezed
into scalars:

>>> df_a.squeeze('columns')
Member

I think the first example is great but this one could use a little work. With how it's presented it wouldn't make sense to use squeeze instead of just selecting the desired Series. Perhaps using a predicate here of say 'a' == 1 and squeezing the result of that would showcase the utility better?

Contributor Author

Makes sense, in most cases squeeze will be necessary because of indirect slicing rather than specific axis selections.

0 1
1 3
Name: a, dtype: int64

Slicing a single row from a single column will produce a single
scalar DataFrame:

>>> df_0a = df[['a']].iloc[[0]]
Member

Stylistically, I think using a predicate rather than selecting a set of columns would be clearer, so something like df[df.index < 1][['a']]

Contributor Author

Yep, basically the same point as the one before. Will work on it.
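
The predicate-based style suggested in the last two threads might look like the sketch below. It reuses the `df` from the docstring examples; the other names are hypothetical, and a recent pandas release is assumed.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# Filtering by a predicate returns a DataFrame whose row count depends on the data.
matches = df[df["a"] == 1]  # exactly one row matches here
row = matches.squeeze()     # so the result is a Series with labels "a" and "b"

# The reviewer's expression: a predicate on the index, then a column list.
cell = df[df.index < 1][["a"]]  # 1x1 DataFrame
value = cell.squeeze()          # the single contained value
```

Had the predicate matched zero rows or several rows, no axis would have length 1 along the index and `squeeze()` would leave that axis in place, which is why squeezing is mainly useful when the caller expects a single match.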

>>> df_0a
a
0 1

Squeezing along the rows produces a single scalar Series:

>>> df_0a.squeeze('rows')
a 1
Name: 0, dtype: int64

Squeezing all axes will project directly into a scalar:

>>> df_0a.squeeze()
1
Member

I feel like this example using df covers what it's shown first with the primes. I'd leave just this one, personally I find it really good, and enough to not have to list the previous.

Contributor Author (@villasv, Mar 14, 2018)

I wanted to have examples with both Series and DataFrames because both classes share this docstring, so it would be a bit weird to read the docs from Series.squeeze and find examples only of DataFrame.squeeze. But I think I could "chain" those examples, since in the middle of the df example I may squeeze some Series as well.

I'll try to merge both, because the Series example is more concrete and related to slicing (the most likely use case IMO), but the second covers both classes.

"""
axis = (self._AXIS_NAMES if axis is None else
(self._get_axis_number(axis),))