DOC: Updating operators docstrings #20415

Merged
merged 32 commits into from
Dec 2, 2018
acab08e
DOC: Add examples for DataFrame.gt() and DataFrame.ge()
ParfaitG Mar 11, 2018
1818aeb
Merge branch 'master' of github.com:pandas-dev/pandas into docstring_gt
ParfaitG Mar 12, 2018
86cfd56
Updated latest ops.py
ParfaitG Mar 17, 2018
b68b61f
Merge branch 'master' of github.com:pandas-dev/pandas into docstring_gt
ParfaitG Mar 20, 2018
13fed5f
DOC: Add examples to docstring of DataFrame.ge() and .gt()
ParfaitG Mar 20, 2018
8bdbc14
DOC: Add examples to docstring of DataFrame.ge() and .gt()
ParfaitG Mar 20, 2018
4668c5f
DOC: Update ops.py to add docstring, parameters, and examples to comp…
ParfaitG Jul 22, 2018
e6eb9b9
DOC: Update ops.py for operator methods - cleaning up whitespace
ParfaitG Jul 22, 2018
db143c4
DOC: Update ops.py to extend docstrings for comparison methods
ParfaitG Jul 30, 2018
33ff1e4
DOC: Create single, generalized docstring for comparison methods
ParfaitG Aug 5, 2018
e138d92
DOC: Examples and summary updates to comparison operators
ParfaitG Aug 12, 2018
50e9d98
DOC: further update to parameters and examples for comparison methods
ParfaitG Aug 16, 2018
aa016fd
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Aug 16, 2018
c2cc037
DOC: Adjusted notes and examples for comparison methods
ParfaitG Aug 17, 2018
644273b
DOC: Adjusted _flex_comp_doc_FRAME assignment logic
ParfaitG Aug 22, 2018
240a502
DOC: Extended arithmetic operator docstring to resemble comparison op…
ParfaitG Aug 23, 2018
bbcdcbe
DOC: Updated df arithmetic operators, extended series arithmetic and …
ParfaitG Aug 24, 2018
a33f003
Revert "DOC: Updated df arithmetic operators, extended series arithme…
ParfaitG Aug 25, 2018
70950c0
DOC: Update DataFrame arithmetic docstring
ParfaitG Aug 25, 2018
6bcb9b9
DOC: Updated examples in arithmetic operators
ParfaitG Sep 25, 2018
e7da1e9
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Sep 27, 2018
20cbec1
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Sep 27, 2018
722ae81
Updated doctests with core/ops.py
ParfaitG Sep 27, 2018
4580f7a
Resetting doctests and setup files
ParfaitG Sep 28, 2018
49c7b82
Updated arithmetic doctring to use equiv variable
ParfaitG Sep 29, 2018
1e4e450
Remove df_info.txt generated from doctests
ParfaitG Sep 29, 2018
ec71a04
Updated _gen_eval_kwargs docstring in ops.py to avoid pytest skip
ParfaitG Sep 30, 2018
25129ff
Resolve doctest conflict and get latest upstream changes
ParfaitG Oct 13, 2018
d344688
Update docstrings to conform to PEP 8 syntax
ParfaitG Oct 14, 2018
eaaee0d
Slight indentation fixes
ParfaitG Oct 16, 2018
e777e87
DOC: merge with master, resolved conflicts
ParfaitG Nov 24, 2018
6879e89
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Dec 2, 2018
172 changes: 163 additions & 9 deletions pandas/core/ops.py
@@ -436,27 +436,33 @@ def _get_op_name(op, special):
'eq': {'op': '==',
'desc': 'Equal to',
'reverse': None,
'df_examples': None},
'df_examples': None,
'others': None},
Member

If we don't use reverse, df_examples and others, can we get rid of them instead of setting them all to None?

Contributor Author

df_examples was kept from the original docs and not removed, for code consistency in case it is used elsewhere. I can remove it and test the docstrings.

Contributor Author

reverse is needed for compatibility with the arithmetic operators, as they are included in the same dictionary, _op_descriptions.

'ne': {'op': '!=',
'desc': 'Not equal to',
'reverse': None,
'df_examples': None},
'df_examples': None,
'others': None},
'lt': {'op': '<',
'desc': 'Less than',
'reverse': None,
'df_examples': None},
'df_examples': None,
'others': None},
'le': {'op': '<=',
'desc': 'Less than or equal to',
'reverse': None,
'df_examples': None},
'reverse': 'gt',
'df_examples': None,
'others': None},
'gt': {'op': '>',
'desc': 'Greater than',
'reverse': None,
'df_examples': None},
'df_examples': None,
'others': None},
'ge': {'op': '>=',
'desc': 'Greater than or equal to',
'reverse': None,
'df_examples': None}}
'df_examples': None,
'others': None}}
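
The review thread above notes that `reverse` stays in the comparison entries because arithmetic and comparison operators share one description table. A minimal sketch of that idea follows, assuming a trimmed-down table and a hypothetical `summary_line` helper; neither is the actual pandas code.

```python
# Hypothetical, trimmed-down stand-in for a shared description table.
# Arithmetic entries have a reversed counterpart, comparison entries do not.
op_descriptions = {
    'add': {'op': '+', 'desc': 'Addition', 'reverse': 'radd'},
    'eq': {'op': '==', 'desc': 'Equal to', 'reverse': None},
}


def summary_line(name):
    """Build a one-line summary from the shared table (illustrative only)."""
    entry = op_descriptions[name]
    line = "{desc} (binary operator `{op}`)".format(
        desc=entry['desc'], op=entry['op'])
    # Code that walks the whole table can rely on 'reverse' always being
    # present and only has to check whether it is None.
    if entry['reverse'] is not None:
        line += "; reverse version: `{}`".format(entry['reverse'])
    return line
```

Keeping the same keys in every entry means a loop over the table never needs a `KeyError` guard, at the cost of a few `None` placeholders.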

_op_names = list(_op_descriptions.keys())
for key in _op_names:
@@ -582,6 +588,150 @@ def _get_op_name(op, special):
DataFrame.{reverse}
"""

_flex_comp_doc_FRAME = """
Flexible wrappers to comparison operators (`eq`, `ne`, `le`, `lt`, `ge`, `gt`).

Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose
Member

I'm a bit confused... In the previous dictionaries, you define op, desc... and you also have name, which is the key. Shouldn't we use them here, so that each page explains the documented method and not all of them?

Contributor Author

To be consistent with examples that run through several of these sibling methods, the intro lists all of them. It seems counterintuitive to have an intro that describes one method with examples from several methods.

axis (rows or columns) and level for comparison.

Parameters
----------
other : constant, list, tuple, ndarray, Series, or DataFrame
Member

I think scalar is preferred over constant. For list, tuple and ndarray I think you can abbreviate them as sequence.

Contributor Author

Will do.

Any single or multiple element data structure, or list-like object.
axis : int or str, optional
Axis to target. Can be either the axis name ('index', 'rows',
'columns') or number (0, 1).
Member

To be consistent, I'd document the axis as we usually do. See this docstring: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html

level : int or name
Member

Better to use object than name, as in this line we document the type.

Contributor Author

Will do.

Broadcast across a level, matching Index values on the passed
MultiIndex level.

Returns
-------
result : DataFrame of bool
Member

We don't need result here; you can leave just the type, DataFrame of bool.

Contributor Author

Will do.

Result of the comparison.

See also
Member

I think the A should be upper case

Contributor Author

I see no current docs page that has Also as title case.

--------
DataFrame.eq : Compare DataFrames for equality elementwise
DataFrame.ne : Compare DataFrames for inequality elementwise
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise

Notes
-----
Mismatched indices will be unioned together.

Examples
--------
>>> df1 = pd.DataFrame({'company': ['A', 'B', 'C'],
... 'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]})
>>> df1
company cost revenue
0 A 250 100
1 B 150 250
2 C 100 300
>>> df2 = pd.DataFrame({'company': ['A', 'B', 'C', 'D'],
... 'revenue': [300, 250, 100, 150]})
>>> df2
company revenue
0 A 300
1 B 250
2 C 100
3 D 150
Member

Instead of defining all the DataFrames first, I'd define the first, show the examples that don't need the others, and define the others immediately before they are used.

For the naming, I'd name the first one df and the second one, which it is compared against, other. The last one could be something like df_multiindex. I think that will make things easier for the users.

I also think having smaller DataFrames (like 2x2) would make things easier for the user. These big True/False DataFrame results are not immediately understandable.

Contributor Author

Understood on names. As for lengths, df1 and df2 have different numbers of rows and columns to show results across different shapes. It is just one more row and one more column, so surely not that big. I can add a note mentioning the results for different lengths.
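
To make the point about differently shaped frames concrete, here is a minimal sketch, assuming numeric-only frames and the hypothetical names `df` and `other` (the `company` column is moved into the index to keep the comparison purely numeric). It contrasts the flex method, which aligns on the union of labels, with the plain operator.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])

# The flex method first aligns both frames on the union of their row and
# column labels; positions missing on either side compare as False.
flex_result = df.gt(other)

# The plain operator refuses to compare frames whose labels differ.
try:
    df > other
    operator_raised = False
except ValueError:
    operator_raised = True
```

The result of `df.gt(other)` has four rows and two columns, one more row than `df`, which is the behaviour the mismatched shapes in the docstring are meant to show.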

>>> df3 = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                     'revenue': [100, 250, 300, 200, 175, 225]},
...                    index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                           ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df3
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225

Compare to a constant and operator version
Member

Could you explain that both are equivalent in this case? And maybe finish the sentence with a period?

Contributor Author

Will do.
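
The equivalence the reviewer asks to spell out can be checked directly. The sketch below assumes a small numeric frame (the variable names are illustrative) and compares the operator form with the method form against a scalar.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# Operator form and method form of the same scalar comparison.
via_operator = df == 100
via_method = df.eq(100)

# Both produce the same boolean frame.
same = via_operator.equals(via_method)
```

The method form only becomes necessary once `axis` or `level` needs to be passed, since the operator has no way to accept extra arguments.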


>>> df1 == 100
company cost revenue
0 False False True
1 False False False
2 False True False
>>> df1.eq(100)
company cost revenue
0 False False True
1 False False False
2 False True False

Compare to a Series by axis and operator version

>>> df1 != [100, 250, 300]
company cost revenue
0 True True False
1 True True False
2 True True False
>>> df1.ne([100, 250, 300], axis=0)
Member

I prefer index and columns for the axis values, it's more explicit and less confusing.

Contributor Author

Will do.

company cost revenue
0 True True False
1 True True False
2 True True False
>>> df1 != pd.Series([100, 250, 300])
company cost revenue 0 1 2
0 True True True True True True
1 True True True True True True
2 True True True True True True
>>> df1.ne(pd.Series([100, 250, 300]), axis=1)
company cost revenue 0 1 2
0 True True True True True True
1 True True True True True True
2 True True True True True True
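
As a sketch of the review suggestion to spell the axis as `'index'` or `'columns'`, the example below assumes a numeric frame and Series whose labels match the axis being compared, so that no extra labels are introduced by alignment. The names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# axis='index': the Series labels are matched against the row labels,
# so every value in row 'A' is compared with 100, row 'B' with 250, etc.
per_row = pd.Series([100, 250, 300], index=['A', 'B', 'C'])
by_index = df.ne(per_row, axis='index')

# axis='columns': the Series labels are matched against the column labels,
# so every value in 'cost' and in 'revenue' is compared with 250.
per_column = pd.Series({'cost': 250, 'revenue': 250})
by_columns = df.ne(per_column, axis='columns')
```

Here `by_index` is `True` throughout `cost` and `False` throughout `revenue`, because `revenue` happens to equal `per_row` row by row.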

Compare to a DataFrame by axis and operator version

>>> df1.reindex(['company', 'revenue'], axis='columns') > df2.iloc[:-1]
company revenue
0 False False
1 False False
2 False True
Member

It's not obvious to me what we're showing here.

>>> df1.gt(df2, axis=0)
company cost revenue
0 False False False
1 False False False
2 False False True
3 False False False
>>> df1.gt(df2, axis=1)
company cost revenue
0 False False False
1 False False False
2 False False True
3 False False False
Member

I'd use axis='index' and axis='columns' instead of 0 and 1. Or maybe even better, add to the text explaining the example that the axis parameter is not used when comparing two DataFrame objects.
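
The reviewer's observation that `axis` has no effect when both operands are DataFrames can be illustrated with a short check. This is a sketch under the assumption that two frames are always aligned on both axes; the frames and names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])

# With a DataFrame on both sides, rows and columns are both aligned,
# so the axis argument does not change the outcome.
along_index = df.gt(other, axis='index')
along_columns = df.gt(other, axis='columns')
axis_is_irrelevant = along_index.equals(along_columns)
```

This is why the two `df1.gt(df2, axis=...)` outputs in the docstring are identical.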


Compare to a MultiIndex by level and operator version

>>> df1.set_index('company') <= df3.loc['Q1']
cost revenue
company
A True True
B True True
C True True
Member

I think here we're comparing two DataFrame objects. This is more useful to show how .loc works with a multiindex than how operators behave. I'd get rid of it.

>>> df1.set_index('company').le(df3, level=1)
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
"""

Member

I think we need to explain what's going on here. And I'd probably set company as the index when creating the DataFrame, to avoid adding extra complexity here. I don't think it's a problem for other examples.

The blank line after the docstring is not required.

Member

Has this been resolved?

Member

Can you also add this explanation, so users can more easily understand what we are showing here?
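
One way to read the `level` example under discussion: the single-level frame is broadcast across the outer level of the MultiIndex frame, matching on the inner level's labels. A minimal sketch, assuming `company` is already the index (as the reviewer suggests) and using the illustrative names `df` and `df_multiindex`:

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
df_multiindex = pd.DataFrame(
    {'cost': [250, 150, 100, 150, 300, 220],
     'revenue': [100, 250, 300, 200, 175, 225]},
    index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
           ['A', 'B', 'C', 'A', 'B', 'C']])

# level=1 matches df's index ('A', 'B', 'C') against the second level of
# df_multiindex, so each row of df is reused for both 'Q1' and 'Q2'.
result = df.le(df_multiindex, level=1)
```

Row `('Q2', 'A')` therefore compares `df.loc['A']` (250, 100) with (150, 200), giving `False` for `cost` and `True` for `revenue`.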

_flex_doc_PANEL = """
{desc} of series and other, element-wise (binary operator `{op_name}`).
Equivalent to ``{equiv}``.
@@ -1546,8 +1696,12 @@ def na_op(x, y):
result = mask_cmp_op(x, y, op, (np.ndarray, ABCSeries))
return result

@Appender('Wrapper for flexible comparison methods {name}'
.format(name=op_name))
if op_name in _op_descriptions:
doc = _flex_comp_doc_FRAME
else:
doc = "Flexible wrappers to comparison methods"

@Appender(doc)
def f(self, other, axis=default_axis, level=None):

other = _align_method_FRAME(self, other, axis)
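
The hunk above picks a docstring first and then attaches it with a decorator. The sketch below mimics that pattern with a hypothetical `append_doc` decorator and placeholder strings. It is not the pandas `Appender` implementation, only an illustration of the control flow.

```python
def append_doc(doc):
    """Hypothetical decorator that appends ``doc`` to a function's docstring."""
    def decorator(func):
        func.__doc__ = (func.__doc__ or '') + doc
        return func
    return decorator


# Placeholder stand-ins for the names used in the diff.
DETAILED_DOC = "Flexible wrappers to comparison operators."
FALLBACK_DOC = "Flexible wrappers to comparison methods"
KNOWN_OPS = {'eq', 'ne', 'lt', 'le', 'gt', 'ge'}


def make_flex_method(op_name):
    # Choose the docstring before defining the function, so the decorator
    # line itself stays a simple call.
    if op_name in KNOWN_OPS:
        doc = DETAILED_DOC
    else:
        doc = FALLBACK_DOC

    @append_doc(doc)
    def f(self, other):
        raise NotImplementedError  # placeholder body for the sketch

    f.__name__ = op_name
    return f
```

Calling `make_flex_method('eq').__doc__` returns the detailed text, while an unrecognized name falls back to the generic one-liner.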