-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: update the pandas.DataFrame.cummax docstring #20336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 7 commits
5ccedc2
04f70dd
aec6084
4acf753
1214c93
a88e95a
fe94dad
f73b52f
33e5337
15b38dd
3c30d18
0cb3168
9d46623
5d502cb
e1e190f
aa34ea0
94fc1b3
657feac
77789a8
463eef7
b03c32a
9b05313
1147a0d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8246,17 +8246,17 @@ def compound(self, axis=None, skipna=None, level=None): | |
cls.cummin = _make_cum_function( | ||
cls, 'cummin', name, name2, axis_descr, "cumulative minimum", | ||
lambda y, axis: np.minimum.accumulate(y, axis), "min", | ||
np.inf, np.nan) | ||
np.inf, np.nan, '') | ||
cls.cumsum = _make_cum_function( | ||
cls, 'cumsum', name, name2, axis_descr, "cumulative sum", | ||
lambda y, axis: y.cumsum(axis), "sum", 0., np.nan) | ||
lambda y, axis: y.cumsum(axis), "sum", 0., np.nan, '') | ||
cls.cumprod = _make_cum_function( | ||
cls, 'cumprod', name, name2, axis_descr, "cumulative product", | ||
lambda y, axis: y.cumprod(axis), "prod", 1., np.nan) | ||
lambda y, axis: y.cumprod(axis), "prod", 1., np.nan, '') | ||
cls.cummax = _make_cum_function( | ||
cls, 'cummax', name, name2, axis_descr, "cumulative max", | ||
cls, 'cummax', name, name2, axis_descr, "cumulative maximum", | ||
lambda y, axis: np.maximum.accumulate(y, axis), "max", | ||
-np.inf, np.nan) | ||
-np.inf, np.nan, _cummax_examples) | ||
|
||
cls.sum = _make_min_count_stat_function( | ||
cls, 'sum', name, name2, axis_descr, | ||
|
@@ -8518,24 +8518,198 @@ def _doc_parms(cls): | |
""" | ||
|
||
_cnum_doc = """ | ||
Return %(desc)s over a DataFrame or Series axis. | ||
|
||
Returns a DataFrame or Series of the same size containing the %(desc)s. | ||
|
||
Parameters | ||
---------- | ||
axis : %(axis_descr)s | ||
axis : {0 or 'index', 1 or 'columns'}, default 0 | ||
skipna : boolean, default True | ||
Exclude NA/null values. If an entire row/column is NA, the result | ||
will be NA | ||
will be NA. | ||
*args : default None | ||
**kwargs : default None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As |
||
Additional keywords have no effect but might be accepted for | ||
compatibility with NumPy. | ||
|
||
Returns | ||
------- | ||
%(outname)s : %(name1)s\n | ||
|
||
|
||
%(outname)s : %(name1)s or %(name2)s\n | ||
%(examples)s | ||
See also | ||
-------- | ||
pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality | ||
core.window.Expanding.%(accum_func_name)s : Similar functionality | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sorry to go back and forth, but for this one we want to keep the |
||
but ignores ``NaN`` values. | ||
Series.%(outname)s : Return %(desc)s over Series axis. | ||
DataFrame.cummax : Return cumulative maximum over DataFrame axis. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @datapythonista @jorisvandenbossche Is it a good idea to also add: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that might be a good idea, if you can automatically make it link in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This also There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I implement these changes, we won't get a reference to any corresponding I agree that we don't really need them because we have examples of both Just out of curiosity, is there a simple way to determine whether a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The thing is that the docstring for both Series and DataFrame (of the same method) will be exactly the same (apart from some of the links here in see also). So linking to the same method but on the other object, is not that important I think, as the other one does not give you more information
Yes, the |
||
DataFrame.cummin : Return cumulative minimum over DataFrame axis. | ||
DataFrame.cumsum : Return cumulative sum over DataFrame axis. | ||
DataFrame.cumprod : Return cumulative product over DataFrame axis. | ||
""" | ||
|
||
_cummax_examples = """\ | ||
Examples | ||
-------- | ||
**DataFrame** | ||
|
||
>>> df = pd.DataFrame([[7, 1], | ||
... [3, 4], | ||
... [8, 0]], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd use smaller values. When illustrating Also, if you use a nan in one of the columns, you can reuse this example for the next section. |
||
... columns=list('AB')) | ||
>>> df | ||
A B | ||
0 7 1 | ||
1 3 4 | ||
2 8 0 | ||
|
||
**axis** | ||
|
||
axis=None : Iterates over rows and finds the cumulative value in each column. | ||
If value is different from the previous one, updates it: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd personally simply say something like "By default, cumulative functions work on the index axis, meaning that row each row, they accumulate the values from the previous". The second comment is only right for |
||
|
||
>>> df.cummax(axis=None) | ||
A B | ||
0 7 1 | ||
1 7 4 | ||
2 8 4 | ||
>>> df.cummin(axis=None) | ||
A B | ||
0 7 1 | ||
1 3 1 | ||
2 3 0 | ||
>>> df.cumsum(axis=None) | ||
A B | ||
0 7 1 | ||
1 10 5 | ||
2 18 5 | ||
>>> df.cumprod(axis=None) | ||
A B | ||
0 7 1 | ||
1 21 4 | ||
2 168 0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
axis=1 : Iterates over columns and finds the cumulative value in each row. | ||
If value is different from the previous one, updates it: | ||
|
||
>>> df.cummax(axis=1) | ||
A B | ||
0 7 7 | ||
1 3 4 | ||
2 8 8 | ||
>>> df.cummin(axis=1) | ||
A B | ||
0 7 1 | ||
1 3 3 | ||
2 8 0 | ||
>>> df.cumsum(axis=1) | ||
A B | ||
0 7 8 | ||
1 3 7 | ||
2 8 8 | ||
>>> df.cumprod(axis=1) | ||
A B | ||
0 7 7 | ||
1 3 12 | ||
2 8 0 | ||
|
||
**skipna** | ||
|
||
skipna=True : Ignores NaN values during operation: | ||
|
||
>>> df = pd.DataFrame([[7, np.nan], | ||
... [np.nan, 4], | ||
... [8, 0]], | ||
... columns=list('AB')) | ||
>>> df | ||
A B | ||
0 7.0 NaN | ||
1 NaN 4.0 | ||
2 8.0 0.0 | ||
|
||
>>> df.cummax(skipna=True) | ||
A B | ||
0 7.0 NaN | ||
1 NaN 4.0 | ||
2 8.0 4.0 | ||
>>> df.cummin(skipna=True) | ||
A B | ||
0 7.0 NaN | ||
1 NaN 4.0 | ||
2 7.0 0.0 | ||
>>> df.cumsum(skipna=True) | ||
A B | ||
0 7.0 NaN | ||
1 NaN 4.0 | ||
2 15.0 4.0 | ||
>>> df.cumprod(skipna=True) | ||
A B | ||
0 7.0 NaN | ||
1 NaN 4.0 | ||
2 56.0 0.0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
skipna=False : Includes NaN values: | ||
|
||
>>> df.cummax(skipna=False) | ||
A B | ||
0 7.0 NaN | ||
1 NaN NaN | ||
2 NaN NaN | ||
>>> df.cummin(skipna=False) | ||
A B | ||
0 7.0 NaN | ||
1 NaN NaN | ||
2 NaN NaN | ||
>>> df.cumsum(skipna=False) | ||
A B | ||
0 7.0 NaN | ||
1 NaN NaN | ||
2 NaN NaN | ||
>>> df.cumprod(skipna=False) | ||
A B | ||
0 7.0 NaN | ||
1 NaN NaN | ||
2 NaN NaN | ||
|
||
**Series** | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As with all the functions this is getting very long, I'd probably avoid having examples for If you keep them, personally I'd have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we decide to have separate string examples for each method, we can keep the examples for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
+ 1 Another suggestion would be to start with Series to just illustrate the concept of "cumulative max", as this will make the examples a little bit easier, and show the effect of NaNs. And then show DataFrame, saying that by default the same happens for each column of the DataFrame, and optionally use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the suggestion. I agree, it is definitely easier to see what is going on with NaNs if we use a Series example instead of DataFrame. I will change it this way. |
||
>>> s = pd.Series([2,np.nan,5,0,-1]) | ||
>>> s | ||
0 2.0 | ||
1 NaN | ||
2 5.0 | ||
3 0.0 | ||
4 -1.0 | ||
dtype: float64 | ||
|
||
>>> s.cummax() | ||
0 2.0 | ||
1 NaN | ||
2 5.0 | ||
3 5.0 | ||
4 5.0 | ||
dtype: float64 | ||
>>> s.cummin() | ||
0 2.0 | ||
1 NaN | ||
2 2.0 | ||
3 0.0 | ||
4 -1.0 | ||
dtype: float64 | ||
>>> s.cumsum() | ||
0 2.0 | ||
1 NaN | ||
2 7.0 | ||
3 7.0 | ||
4 6.0 | ||
dtype: float64 | ||
>>> s.cumprod() | ||
0 2.0 | ||
1 NaN | ||
2 10.0 | ||
3 0.0 | ||
4 -0.0 | ||
dtype: float64 | ||
""" | ||
|
||
_any_see_also = """\ | ||
|
@@ -8732,11 +8906,11 @@ def stat_func(self, axis=None, skipna=None, level=None, ddof=1, | |
|
||
|
||
def _make_cum_function(cls, name, name1, name2, axis_descr, desc, | ||
accum_func, accum_func_name, mask_a, mask_b): | ||
accum_func, accum_func_name, mask_a, mask_b, examples): | ||
@Substitution(outname=name, desc=desc, name1=name1, name2=name2, | ||
axis_descr=axis_descr, accum_func_name=accum_func_name) | ||
@Appender("Return {0} over requested axis.".format(desc) + | ||
_cnum_doc) | ||
axis_descr=axis_descr, accum_func_name=accum_func_name, | ||
examples=examples) | ||
@Appender(_cnum_doc) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's all right like this, but may be it'd be simpler to leave this as it was, and have the examples in Another option would be to have a different string for each method example, in that case, something similar to this would make more sense. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think having separate string examples for each method makes everything clearer, especially when showing examples for use of The disadvantage is user will only see examples for the method they’re checking, but I think this is ok because we are referencing all methods in the ‘See also’ section, which comes before 'Examples'. In these PRs #20216 and #20217 examples for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I am also in favor of splitting up the examples. |
||
def cum_func(self, axis=None, skipna=True, *args, **kwargs): | ||
skipna = nv.validate_cum_func_with_skipna(skipna, args, kwargs, name) | ||
if axis is None: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not technically right that default is
0
, I think it'sNone
, which I guess it's equivalent to0
.Can you double check, and and change it if that's right. Something like
{0 or 'index', 1 or 'columns'} or None, default None
would probably be the most standard way if that's right. And a description about the axis would be useful (pointing out thatNone
meansindex
if that's the case).If you check recent PRs there are some with a an axis parameter that you can check for reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is right,
cum_func
(i.e. function corresponding to all cumulative methods) is defined withaxis=None
as default argument.I also found this regarding the correct format of axis parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it is technically None, in practice it is 0 for Series/DataFrame, so I would keep the documentation like this.
The technical reason is because for Panel it is 1, but Panel is deprecated and I think we should not care about them in the documentation.