BUG: Ignore versionadded directive when checking for periods at docstring end #22423

Merged
76 changes: 75 additions & 1 deletion pandas/tests/scripts/test_validate_docstrings.py
@@ -193,6 +193,53 @@ def contains(self, pat, case=True, na=np.nan):
"""
pass

def mode(self, axis=0, numeric_only=False, dropna=True):
Member

I think a smaller, more directed example would be preferable here (it doesn't need to match the original docstring and probably won't over time anyway). Can you strip this down to just what's important?

Contributor Author

Sure thing, sounds good.

"""
Get the mode(s) of each element along the axis selected. Adds a row
for each mode per label, fills in gaps with nan.

This test is to ensure that directives don't affect the tests for
periods at the end of parameters.
Note that there could be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
Describe axis.

.. versionchanged:: 0.1.2

numeric_only : boolean, default False
Describes numeric_only.

.. versionadded:: 0.1.2
.. deprecated:: 0.00.0
Member

I'm not sure about versionadded and versionchanged, but deprecated can have a description after it, for example:

          .. deprecated:: 0.21.0
              Use :func:`pandas.read_csv` instead.

It can even span multiple lines. Do you mind adding a test for that? I'm not sure whether this works with the current implementation.
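A test fixture for that case might look like the sketch below. The function name, its parameters, and the description wording are hypothetical and are not taken from the pandas test suite; only the directive syntax mirrors the example above.

```python
def read_table_legacy(path, sep=","):
    """
    Read a delimited file (hypothetical fixture, not pandas code).

    Parameters
    ----------
    path : str
        Location of the file.
    sep : str, default ','
        Field delimiter.

        .. deprecated:: 0.21.0
            Use :func:`pandas.read_csv` instead. This sentence wraps
            onto a second line and still belongs to the directive.
    """
    return path, sep


# The raw docstring keeps the directive and both of its indented lines.
doc = read_table_legacy.__doc__
```

The period check should look at "Field delimiter." here, not at the last character of the directive text.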

Member

Also, if you check the convert_datetime64 parameter of to_records, there are cases where the directives come before the description. I'm happy if we consider directives valid in only one place (before or after the description). But can we make the script generate a descriptive error for it? I guess with the current implementation we'll report that the parameter has no description.
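One way such a check could be sketched, assuming the description is still available as a list of lines: the helper name, the sample description text, and the error wording below are all hypothetical and are not part of validate_docstrings.py.

```python
def directive_precedes_description(desc_lines):
    """Return True if the first non-blank line is a Sphinx directive."""
    for line in desc_lines:
        stripped = line.strip()
        if stripped:
            return stripped.startswith(".. ")
    return False


# Hypothetical description lines with the directive placed first.
lines = [".. deprecated:: 0.21.0", "Whether to convert the index."]
message = None
if directive_precedes_description(lines):
    # Hypothetical wording for a more descriptive error.
    message = ('Parameter "{}" has a directive before its description'
               .format("convert_datetime64"))
```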

Contributor Author

I've added a test case for multi-line descriptions.
Directive positioning is a bit trickier. Enforcing a single position would help, but the problem is determining whether text after the directive is the directive's description or just generic parameter description. We need to make this distinction in order to produce a helpful error message.
This is made harder by the fact that we're currently working with doc_parameters, which smooshes the whole description into one single-line string.
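To illustrate the ambiguity, here is a small sketch. It assumes the description lines are stripped and joined with spaces, which may differ from what doc_parameters actually does; the sample lines are made up.

```python
# Description lines as they might appear in a docstring (hypothetical).
desc_lines = [
    "Describes numeric_only.",
    "",
    ".. deprecated:: 0.21.0",
    "    Use :func:`pandas.read_csv` instead.",
    "Trailing text that belongs to the parameter.",
]

# Assumed flattening: strip each line and join into one string.
flattened = " ".join(line.strip() for line in desc_lines if line.strip())

# After flattening, the indentation that marked the directive body is
# gone, so nothing separates it from the trailing parameter text.
directive_start = flattened.index(".. deprecated")
after_directive = flattened[directive_start:]
```

In `after_directive`, the directive's own sentence and the trailing parameter sentence sit side by side with no marker between them.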

Member

I think enforcement after description is fine. I think @datapythonista is correct in that it will generate an error, albeit with the wrong message. If we wanted to clean that up I'd suggest a separate PR, though @datapythonista, I'll leave that decision up to you.


dropna : boolean, default True
This param tests that the versionadded directive doesn't break the
checks for the ending period.
Don't consider counts of NaN/NaT.

.. versionadded:: 0.24.0

Returns
-------
modes : DataFrame (sorted)

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3]})
>>> df.mode()
A
0 1
1 2
"""
pass


class BadGenericDocStrings(object):
"""Everything here has a bad docstring
@@ -374,6 +421,31 @@ def no_description_period(self, kind):
Doesn't end with a dot
"""

def no_description_period_with_directive(self, kind):
"""
Forgets to add a period, and also includes a directive.

Parameters
----------
kind : str
Doesn't end with a dot

.. versionadded:: 0.00.0
"""

def no_description_period_with_directives(self, kind):
"""
Forgets to add a period, and also includes multiple directives.

Parameters
----------
kind : str
Doesn't end with a dot

.. versionchanged:: 0.00.0
.. deprecated:: 0.00.0
"""

def parameter_capitalization(self, kind):
"""
Forgets to capitalize the description.
@@ -495,7 +567,7 @@ def test_good_class(self):

@pytest.mark.parametrize("func", [
'plot', 'sample', 'random_letters', 'sample_values', 'head', 'head1',
'contains'])
'contains', 'mode'])
def test_good_functions(self, func):
assert validate_one(self._import_path( # noqa: F821
klass='GoodDocStrings', func=func)) == 0
@@ -531,6 +603,8 @@ def test_bad_generic_functions(self, func):
'Parameter "kind: str" has no type')),
('BadParameters', 'no_description_period',
('Parameter "kind" description should finish with "."',)),
('BadParameters', 'no_description_period_with_directive',
('Parameter "kind" description should finish with "."',)),
('BadParameters', 'parameter_capitalization',
('Parameter "kind" description should start with a capital letter',)),
pytest.param('BadParameters', 'blank_lines', ('No error yet?',),
13 changes: 12 additions & 1 deletion scripts/validate_docstrings.py
@@ -42,6 +42,7 @@


PRIVATE_CLASSES = ['NDFrame', 'IndexOpsMixin']
DIRECTIVES = ['.. versionadded', '.. versionchanged', '.. deprecated']
Member

AFAIK, all Sphinx directives start with `..`, so I think it's better to have just the names here.
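A minimal sketch of that suggestion, assuming the `.. ` prefix and `::` suffix are added where the names are used. `DIRECTIVE_MARKERS` and `starts_directive` are hypothetical names, not part of the script.

```python
# Only the directive names are stored.
DIRECTIVES = ['versionadded', 'versionchanged', 'deprecated']

# Hypothetical derived markers: the full text that opens each directive.
DIRECTIVE_MARKERS = tuple('.. {}::'.format(name) for name in DIRECTIVES)


def starts_directive(line):
    """Return True if the line opens one of the known directives."""
    return line.strip().startswith(DIRECTIVE_MARKERS)
```

`str.startswith` accepts a tuple of prefixes, so no explicit loop over the names is needed.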



def _load_obj(obj_name):
@@ -465,7 +466,17 @@ def validate_one(func_name):
param_errs.append('Parameter "{}" description '
'should start with a '
'capital letter'.format(param))
if doc.parameter_desc(param)[-1] != '.':

period_check_index = -1
for directive in DIRECTIVES:
if directive in doc.parameter_desc(param):
# Get index of character before start of directive
index = doc.parameter_desc(param).index(directive) - 1
# If this directive is closest to the description, use it.
if index < period_check_index or period_check_index == -1:
period_check_index = index

if doc.parameter_desc(param)[period_check_index] != '.':
Member

What do you think about creating a new property in the class `Docstring` that returns the `parameter_desc` but without directives?

I think the code will be much more readable if we have this logic there, and in this part of the code, where all validations happen, we simply have something like `if doc.parameter_desc_without_directives(param)[-1] != '.':`
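A rough sketch of what that helper could do, written as a standalone function on an already flattened description string rather than as a member of `Docstring`. The function body is an assumption, not the code that was eventually merged.

```python
DIRECTIVES = ['versionadded', 'versionchanged', 'deprecated']


def parameter_desc_without_directives(desc):
    """Return desc cut off before the first known directive (sketch)."""
    cut = len(desc)
    for name in DIRECTIVES:
        pos = desc.find('.. {}::'.format(name))
        if pos != -1:
            # Keep the earliest directive position seen so far.
            cut = min(cut, pos)
    return desc[:cut].rstrip()


good = parameter_desc_without_directives(
    "Describes numeric_only. .. versionadded:: 0.1.2 .. deprecated:: 0.00.0")
bad = parameter_desc_without_directives(
    "Doesn't end with a dot .. versionadded:: 0.00.0")
```

With this in place the validation reduces to checking the last character of the returned string: `good` ends with a period and `bad` does not.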

Contributor Author

Yes, I agree that would be more useful and readable; I'll make a property.

param_errs.append('Parameter "{}" description '
'should finish with "."'.format(param))
if param_errs: