DOC: Added MultiIndex Example for Series Min #23338

Merged
merged 32 commits into from
Dec 25, 2018

Conversation

vadakattu
Contributor

Corollary to #23298

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

@pep8speaks

pep8speaks commented Oct 25, 2018

Hello @vadakattu! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 22, 2018 at 18:21 Hours UTC

@codecov

codecov bot commented Oct 25, 2018

Codecov Report

Merging #23338 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23338      +/-   ##
==========================================
+ Coverage    92.3%    92.3%   +<.01%     
==========================================
  Files         162      162              
  Lines       51875    51879       +4     
==========================================
+ Hits        47883    47888       +5     
+ Misses       3992     3991       -1
Flag Coverage Δ
#multiple 90.71% <100%> (ø) ⬆️
#single 42.99% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 96.62% <100%> (ø) ⬆️
pandas/util/testing.py 87.84% <0%> (+0.09%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e0358d...8ec816a. Read the comment docs.

@jreback jreback added this to the 0.24.0 milestone Oct 26, 2018
@jreback
Contributor

jreback commented Oct 26, 2018

cc @datapythonista @WillAyd

Member

@datapythonista datapythonista left a comment

lgtm, but we should avoid repeating this for every stats method. I'd prefer a template over repeating these examples many times.

@jreback
Contributor

jreback commented Oct 26, 2018

@vadakattu can you template this as @datapythonista suggests?

Member

@datapythonista datapythonista left a comment

Unfortunately we need to keep compatibility with Python 2.7 and 3.5 for some more time, so we can't use f-strings yet.

@vadakattu
Contributor Author

Of course! ...that was very absent minded of me.

Do you have any suggestions for a better way to do the templating? Otherwise I'll go ahead and switch it to old-style formatting.

@datapythonista
Member

You can do the same, but without f-strings. If you need a reference, upper, lower... in strings.py also share a docstring.
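
For readers following along, here is a minimal sketch of the kind of docstring sharing being suggested. It is not the code from strings.py or generic.py; `_stat_doc`, `_make_stat_function`, `minimum` and `maximum` are hypothetical names chosen for illustration, and the reductions work on plain lists rather than on a Series.

```python
# Hypothetical shared template; the field names are illustrative only.
_stat_doc = """
Return the {desc} of the values.

Parameters
----------
skipna : bool, default True
    Exclude None values when computing the result.

Returns
-------
{name} : scalar
"""


def _make_stat_function(name, desc, func):
    """Build a reduction whose docstring is filled in from the template."""

    def stat_func(values, skipna=True):
        if skipna:
            # Drop missing entries (None stands in for NA in this sketch).
            values = [v for v in values if v is not None]
        return func(values)

    stat_func.__name__ = name
    # str.format runs on Python 2.7 and 3.5; f-strings need 3.6 or later.
    stat_func.__doc__ = _stat_doc.format(name=name, desc=desc)
    return stat_func


minimum = _make_stat_function("min", "minimum", min)
maximum = _make_stat_function("max", "maximum", max)
```

Each generated function gets its own filled-in docstring, so the example text lives in one place instead of being copied into every stat method.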

@datapythonista
Member

@vadakattu can you please merge master into your branch and push, and make sure the CI is green, so we can merge this. Thanks!

Member

@datapythonista datapythonista left a comment

Added a comment; please apply it to all cases.

@datapythonista
Member

The idea was to change the variables to the new format too (e.g. %s -> {})
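
As a rough illustration of that change (a sketch with made-up template strings, not the actual pandas source), the same placeholder can be written in both styles. One assumption worth flagging: with `str.format`, literal braces such as the ones in `axis : {index (0)}` have to be doubled.

```python
# Old-style %-formatting with a named placeholder.
old_template = "Return the %(desc)s of the values for the requested axis."
old_text = old_template % {"desc": "minimum"}

# New-style str.format with the same placeholder name.
new_template = "Return the {desc} of the values for the requested axis."
new_text = new_template.format(desc="minimum")

# Literal braces must be escaped as {{ and }} in a str.format template.
# 'axis_descr' is a hypothetical field name used only for this sketch.
axis_template = "axis : {{{axis_descr}}}"
axis_text = axis_template.format(axis_descr="index (0)")
```

Both styles produce the same sentence here; the brace escaping is the main thing to watch for when converting a numpydoc-style docstring that lists allowed values in braces.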

@vadakattu
Contributor Author

Done.

Member

@datapythonista datapythonista left a comment

Thanks @vadakattu, Looks good, couple of small things to make this standard with the rest of the docs.

Also, if you can run ./scripts/validate_docstrings.py pandas.Series.min... and check that there is nothing wrong with the docstrings based on the script, that would be great.

@datapythonista
Member

I think this is modifying some of the same things as #22554. @vadakattu @Roald87, can you coordinate and avoid duplicate work?

@vadakattu
Contributor Author

Have now restructured _num_doc and incorporated the minimal changes necessary across the stat functions (as desired in #22554), such that min, max & sum have fully compliant docstrings.

./scripts/validate_docstrings.py pandas.Series.min

################################################################################
######################## Docstring (pandas.Series.min)  ########################
################################################################################

Return the minimum of the values for the requested axis.

            If you want the *index* of the minimum, use ``idxmin``. This is
            the equivalent of the ``numpy.ndarray`` method ``argmin``.

Parameters
----------
axis : {index (0)}
    Axis for the function to be applied on.
skipna : bool, default True
    Exclude NA/null values when computing the result.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a scalar.
numeric_only : bool, default None
    Include only float, int, boolean columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
**kwargs
    Additional keyword arguments to be passed to the function.

Returns
-------
min : scalar or Series (if level specified)

Examples
--------

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.min()
0

Min using level names, as well as indices.

>>> s.min(level='blooded')
blooded
warm    2
cold    0
Name: legs, dtype: int64

>>> s.min(level=0)
blooded
warm    2
cold    0
Name: legs, dtype: int64

See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.min" correct. :)
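
To make the `level` behaviour in the example above concrete, here is a plain-Python sketch of what a level-wise minimum computes. It only mimics the grouping logic with a dict keyed by tuples; `legs` and `min_by_level` are hypothetical names, and this is not how pandas implements the reduction.

```python
# Same data as the docstring example: (blooded, animal) -> legs.
legs = {
    ("warm", "dog"): 4,
    ("warm", "falcon"): 2,
    ("cold", "fish"): 0,
    ("cold", "spider"): 8,
}


def min_by_level(data, level):
    """Group tuple keys by one position and keep the smallest value per group."""
    result = {}
    for key, value in data.items():
        label = key[level]
        if label not in result or value < result[label]:
            result[label] = value
    return result


overall = min(legs.values())        # minimum over every row
by_blooded = min_by_level(legs, 0)  # minimum within each 'blooded' label
```

Here `overall` is 0 and `by_blooded` maps warm to 2 and cold to 0, matching the docstring output. As far as I know, later pandas releases dropped the `level` keyword on reductions in favour of `s.groupby(level='blooded').min()`, so treat the docstring as describing the API at the time of this PR.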

./scripts/validate_docstrings.py pandas.Series.max

################################################################################
######################## Docstring (pandas.Series.max)  ########################
################################################################################

Return the maximum of the values for the requested axis.

            If you want the *index* of the maximum, use ``idxmax``. This is
            the equivalent of the ``numpy.ndarray`` method ``argmax``.

Parameters
----------
axis : {index (0)}
    Axis for the function to be applied on.
skipna : bool, default True
    Exclude NA/null values when computing the result.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a scalar.
numeric_only : bool, default None
    Include only float, int, boolean columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
**kwargs
    Additional keyword arguments to be passed to the function.

Returns
-------
max : scalar or Series (if level specified)

Examples
--------

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.max()
8

Max using level names, as well as indices.

>>> s.max(level='blooded')
blooded
warm    4
cold    8
Name: legs, dtype: int64

>>> s.max(level=0)
blooded
warm    4
cold    8
Name: legs, dtype: int64

See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.max" correct. :)

./scripts/validate_docstrings.py pandas.Series.sum

################################################################################
######################## Docstring (pandas.Series.sum)  ########################
################################################################################

Return the sum of the values for the requested axis.

            This is equivalent to the method ``numpy.sum``.

Parameters
----------
axis : {index (0)}
    Axis for the function to be applied on.
skipna : bool, default True
    Exclude NA/null values when computing the result.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a scalar.
numeric_only : bool, default None
    Include only float, int, boolean columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
min_count : int, default 0
    The required number of valid values to perform the operation. If fewer than
    ``min_count`` non-NA values are present the result will be NA.

    .. versionadded :: 0.22.0

       Added with the default being 0. This means the sum of an all-NA
       or empty Series is 0, and the product of an all-NA or empty
       Series is 1.
**kwargs
    Additional keyword arguments to be passed to the function.

Returns
-------
sum : scalar or Series (if level specified)

Examples
--------

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.sum()
14

Sum using level names, as well as indices.

>>> s.sum(level='blooded')
blooded
warm    6
cold    8
Name: legs, dtype: int64

>>> s.sum(level=0)
blooded
warm    6
cold    8
Name: legs, dtype: int64

By default, the sum of an empty or all-NA Series is ``0``.

>>> pd.Series([]).sum()  # min_count=0 is the default
0.0

This can be controlled with the ``min_count`` parameter. For example, if
you'd like the sum of an empty series to be NaN, pass ``min_count=1``.

>>> pd.Series([]).sum(min_count=1)
nan

Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
empty series identically.

>>> pd.Series([np.nan]).sum()
0.0

>>> pd.Series([np.nan]).sum(min_count=1)
nan

See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.sum" correct. :)
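
The interaction between `skipna` and `min_count` described in the sum docstring can be sketched in plain Python. This is an illustration of the documented semantics under my reading of the docstring, not the pandas implementation; `sum_with_min_count` is a hypothetical helper operating on a list of floats.

```python
import math


def sum_with_min_count(values, skipna=True, min_count=0):
    """Sum floats, returning NaN when too few non-NaN values are present."""
    non_na = [v for v in values if not math.isnan(v)]
    # Fewer valid values than required: the result is NA.
    if len(non_na) < min_count:
        return math.nan
    if skipna:
        return float(sum(non_na))
    # Without skipna, any NaN in the input propagates into the sum.
    return float(sum(values))


empty_default = sum_with_min_count([])                       # 0.0
empty_strict = sum_with_min_count([], min_count=1)           # nan
all_na_default = sum_with_min_count([math.nan])              # 0.0
all_na_strict = sum_with_min_count([math.nan], min_count=1)  # nan
```

Because NaN values are dropped before the count is checked, an all-NA input and an empty input are treated the same way, which is the point the docstring makes.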

@datapythonista
Member

@vadakattu can you fix the conflicts? We merged the PR that I mentioned was touching the same code as this one.

@vadakattu
Contributor Author

I believe I dealt with it last night -- the single CircleCI failure seems unrelated.
Pulling master again to re-test.

=================================== FAILURES ===================================
___________________________ test_tick_add_sub[Milli] ___________________________

cls = <class 'pandas.tseries.offsets.Milli'>

    @pytest.mark.parametrize('cls', tick_classes)
>   @example(n=2, m=3)
    @example(n=800, m=300)
    @example(n=1000, m=5)
    @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
    def test_tick_add_sub(cls, n, m):

pandas/tests/tseries/offsets/test_ticks.py:42: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/conda/envs/pandas-dev/lib/python3.6/site-packages/hypothesis/core.py:552: in execute
    ) % (test.__name__, text_repr[0],))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <hypothesis.core.StateForActualGivenExecution object at 0x7fb7828ac5f8>
message = 'Hypothesis test_tick_add_sub(cls=Milli, n=0, m=100) produces unreliable results: Falsified on the first call but did not on a subsequent one'

    def __flaky(self, message):
        if len(self.falsifying_examples) <= 1:
>           raise Flaky(message)
E           hypothesis.errors.Flaky: Hypothesis test_tick_add_sub(cls=Milli, n=0, m=100) produces unreliable results: Falsified on the first call but did not on a subsequent one

@vadakattu
Contributor Author

Tests pass, and the docstrings are fully compliant per validate_docstrings.

@Roald87

Roald87 commented Dec 9, 2018

@datapythonista seems as if @vadakattu already made the full docs, so there's no point in merging my PR I guess?

@datapythonista
Member

I didn't check in detail how much overlap exists in both PRs. But as pointed out in #23338 (comment), you should talk with @vadakattu and see if it makes sense to have all the changes in one and close the other, or what.

@Roald87

Roald87 commented Dec 9, 2018

@vadakattu what do you think? I'm fine with taking yours; I don't think mine adds much.

@vadakattu
Contributor Author

👍🏽 Sounds good.

@datapythonista
Member

@vadakattu can you fix the conflicts please. Sorry for the delay.

Krishna added 2 commits December 22, 2018 18:04
# Conflicts:
#	pandas/core/generic.py
@vadakattu
Contributor Author

Conflicts resolved.

Member

@datapythonista datapythonista left a comment

lgtm, thanks @vadakattu

@WillAyd WillAyd merged commit 159772d into pandas-dev:master Dec 25, 2018
@WillAyd
Member

WillAyd commented Dec 25, 2018

Thanks @vadakattu

@vadakattu vadakattu deleted the series-min branch December 31, 2018 15:14
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019