Fix SS03 issues in docstrings #30733

galuhsahid · 2020-01-06T12:15:37Z

Fix the docstrings where summary does not end with a period.

Current errors found:

$ ./scripts/validate_docstrings.py --errors=SS03
None:None:SS03:pandas.tseries.offsets.DateOffset.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BusinessDay.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BusinessHour.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.CustomBusinessDay.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.CustomBusinessHour.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.MonthOffset.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.MonthEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.MonthBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BusinessMonthEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BusinessMonthBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.CustomBusinessMonthEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.CustomBusinessMonthBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.SemiMonthOffset.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.SemiMonthEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.SemiMonthBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Week.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.WeekOfMonth.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.LastWeekOfMonth.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.QuarterOffset.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BQuarterEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BQuarterBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.QuarterEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.QuarterBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.YearOffset.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BYearEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BYearBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.YearEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.YearBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.FY5253.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.FY5253Quarter.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Easter.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Tick.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Day.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Hour.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Minute.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Second.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Milli.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Micro.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.Nano.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BDay.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BMonthEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.BMonthBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.CBMonthEnd.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.CBMonthBegin.normalize:Summary does not end with a period
None:None:SS03:pandas.tseries.offsets.CDay.normalize:Summary does not end with a period
None:None:SS03:pandas.Timestamp.isoweekday:Summary does not end with a period
None:None:SS03:pandas.Timestamp.weekday:Summary does not end with a period
None:None:SS03:pandas.DatetimeIndex.freqstr:Summary does not end with a period
None:None:SS03:pandas.PeriodIndex.freqstr:Summary does not end with a period
pandas/pandas/core/window/indexers.py:34:SS03:pandas.api.indexers.BaseIndexer:Summary does not end with a period
None:None:SS03:pandas.io.formats.style.Styler.loader:Summary does not end with a period

The text was updated successfully, but these errors were encountered:

galuhsahid · 2020-01-06T12:15:47Z

I'll be working on this

datapythonista · 2020-01-06T13:03:31Z

I think all those are false positives. The docstring being analyzed is not in pandas, but a the one of a Python object. This should be already fixed in the script, but not in the pandas version, but in the version moved to numpydoc. Which we still need to integrate into pandas.

Feel free to have a look and double check.

galuhsahid · 2020-01-06T16:38:31Z

I've randomly sampled some examples and yes seems like those are all false positives. Is the revised script ready to use? If so, where can one find it? Or shall we wait until it is integrated into pandas? Thank you!

datapythonista · 2020-01-06T17:34:14Z

I'm on it, just opened the PR, #30746, hopefully it merged in few days, and we can start using the new validation. In the meantime, feel free to fetch my branch locally, and run it, so get the new list of errors.

galuhsahid · 2020-01-06T18:52:41Z

I fetched your branch locally & tried running the new script but I get the same list of errors as previously. I'm not sure if I'm missing something, though.

datapythonista · 2020-01-07T09:54:59Z

There were some changes implemented to the script, that were supposed to help get the docstring of the correct object. I'll have a look and find that PR, and see if everything is working as expected. I guess it's not.

datapythonista · 2020-01-08T11:38:20Z

I was checking, I think this is the PR that was expected to fix this: #29012

Which was added to numpydoc: https://github.com/numpy/numpydoc/blob/master/numpydoc/validate.py#L130

So, my branch shouldn't report those errors. So, I guess the unwraps function is failing in the same way as our custom function.

Do you want to have a look and see if you can find the problem and a fix for it? Thanks!

galuhsahid · 2020-01-08T12:18:46Z

Sure, I will take a look at it later!

galuhsahid · 2020-01-09T17:25:25Z

OK, so I just noticed that all of these false positives have None as their source code file and line number. And the validation script says:

            # In some cases the object is something complex like a cython
            # object that can't be easily introspected. An it's better to
            # return the source code file of the object as None, than crash

I'm guessing the errors returned here are too difficult to be introspected, so they are still reported but the source code file returns None instead. Is my understanding correct? If so do you think a fix is possible or should we just let it be?

datapythonista · 2020-01-09T17:36:00Z

I think you're mixing two different things. That comment refers to knowing where is the docstring being written in the code (in which file, and in which line number). That in some cases it's impossible to know. Imagine something like:

def foo():
   pass
foo.__doc__ = open('/tmp/docstring').read()

And there could be even trickier things, that won't let us know where to find the docstring. This is why we return None for the file.

For those false positives, we don't care about where the docstring is (we care in the json, to help contributors find where they need to make the changes). What we care is about the content of the docstring, and this should be always possible to obtain with:

foo.__doc__

The only problem is that if foo is a property, a decorated function... we don't really want the docstring of foo, but the docstring of the object that foo is wrapping. So, we have the "magic" you saw, with the unwraps and the previous _to_original_callable to deal with this. But that doesn't seem to be working well. If I'm not wrong, those false positives are of objects that return a string, and the docstring we're finally validating is the one of the Python str object.

galuhsahid · 2020-01-11T06:04:59Z

I see, noted. I assumed there missing source file and the false positives are always related, but now I understand that they aren't.

I checked again and some of the errors reported were indeed true positives (e.g. pandas.DatetimeIndex.freqstr). Will open a PR for these.

But there are some that I'm not sure whether they are false positives or not, e.g. all the errors that end with normalize such as pandas.tseries.offsets.CustomBusinessMonthBegin.normalize. I somehow can't find the related docstrings in pandas/tseries/offsets.py. I thought they were false positives but now I'm not so sure. If they are also true positives it means the script is reporting the correct errors.

xref: #27977 #30733

datapythonista · 2020-01-13T15:17:04Z

I think I answered via email, but the problem with normalize and most false positives are with the unwraps and how objects are being fetched.

I think the best would be to find a minimal example when instead of obtaining the right object, we validate something else. For example:

class MyClass:
    @property
    def my_property(self):
        """We should validate this."""
        return 'foo'  # we are validating the docstring of the Python type str instead

Then, add a unit test in numpydoc, and fix it. I think it's probably not easy, but I think it's doable.

galuhsahid · 2020-01-17T13:50:20Z

Closing this issue since I've fixed SS03 errors in #30939; will open a separate issue regarding the validation script

datapythonista added Clean Docs labels Jan 6, 2020

galuhsahid closed this as completed Jan 6, 2020

galuhsahid reopened this Jan 6, 2020

galuhsahid mentioned this issue Jan 12, 2020

DOC: Fix SS03 docstring error #30939

Merged

5 tasks

gfyoung pushed a commit that referenced this issue Jan 12, 2020

DOC: Fix SS03 docstring error (#30939)

044559a

xref: #27977 #30733

sthagen mentioned this issue Jan 12, 2020

DOC: Fix SS03 docstring error (#30939) sthagen/pandas-dev-pandas#5

Merged

5 tasks

galuhsahid closed this as completed Jan 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix SS03 issues in docstrings #30733

Fix SS03 issues in docstrings #30733

galuhsahid commented Jan 6, 2020 •

edited

Loading

galuhsahid commented Jan 6, 2020

datapythonista commented Jan 6, 2020

galuhsahid commented Jan 6, 2020

datapythonista commented Jan 6, 2020

galuhsahid commented Jan 6, 2020

datapythonista commented Jan 7, 2020

datapythonista commented Jan 8, 2020

galuhsahid commented Jan 8, 2020

galuhsahid commented Jan 9, 2020

datapythonista commented Jan 9, 2020

galuhsahid commented Jan 11, 2020

datapythonista commented Jan 13, 2020

galuhsahid commented Jan 17, 2020

Fix SS03 issues in docstrings #30733

Fix SS03 issues in docstrings #30733

Comments

galuhsahid commented Jan 6, 2020 • edited Loading

galuhsahid commented Jan 6, 2020

datapythonista commented Jan 6, 2020

galuhsahid commented Jan 6, 2020

datapythonista commented Jan 6, 2020

galuhsahid commented Jan 6, 2020

datapythonista commented Jan 7, 2020

datapythonista commented Jan 8, 2020

galuhsahid commented Jan 8, 2020

galuhsahid commented Jan 9, 2020

datapythonista commented Jan 9, 2020

galuhsahid commented Jan 11, 2020

datapythonista commented Jan 13, 2020

galuhsahid commented Jan 17, 2020

galuhsahid commented Jan 6, 2020 •

edited

Loading