Skip to content

BUG: Partial fix for docstring validation for parameters #28765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ChiefMilesEdgeworth
Copy link
Contributor

This PR addresses #20298, does not close.

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

This changes the method used to find arguments from inspect.getfullargspec to inspect.signature and changes the handling accordingly. It also now checks for "*args, **kwargs" and other variations as a doc parameter, and splits them up so they can be checked against.

Using signature allows the validation to get to the root function when a function is decorated. This reduced a significant amount of the errors in the string.py file
Check for variations on "*args, **kwargs" while loading in the documentation parameters, and load them in separately if found.
Pep8 conformity
Using signature allows the validation to get to the root function when a function is decorated. This reduced a significant amount of the errors in the string.py file
Check for variations on "*args, **kwargs" while loading in the documentation parameters, and load them in separately if found.
Pep8 conformity
Those functions cause issues during the testing. This has most likely been covered by some PR merged already.
@ChiefMilesEdgeworth ChiefMilesEdgeworth changed the title Fix docstring validation BUG: Partial fix for docstring validation for parameters Oct 3, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 3, 2019

There are tests for the doctoring validator - if making changes there also will need a test case(s)

@ChiefMilesEdgeworth
Copy link
Contributor Author

Thank you! I'm working out the bugs introduced right now.

Desc was being loaded in without being turned into a string. This was causing issues when indexes were used to check capitalization and the final period.
@ChiefMilesEdgeworth
Copy link
Contributor Author

ChiefMilesEdgeworth commented Oct 3, 2019

@WillAyd Would you like me to write tests for decorated functions? That was the main point addressed by this PR. If so, just let me know, and I can do that before this is merged.

Also, I'm having trouble figuring out why I'm not passing the automated linting check. Do you have any ideas?

For some reason black didn't catch the validation script the first time. It should be fixed now.
@ChiefMilesEdgeworth
Copy link
Contributor Author

Never mind, I found it. Sorry for all the really little commits, I just recently started working with pandas, so I'm getting used to all the small things that need to be done first.

Tests added to test_validate_docstrings that check that it handles decorated functions as predicted.
@pep8speaks
Copy link

pep8speaks commented Oct 4, 2019

Hello @Nabel0721! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-13 19:10:49 UTC

Fixed trailing whitespace on two lines.
@ChiefMilesEdgeworth
Copy link
Contributor Author

I added tests for decorators. Look over it and see if there are any more you'd like me to add. If I have an idea of what to make I should be able to do it no problem, I'm just not sure what exactly would need to be added. The changes I made to the validation script were just to account for oversights, not to add functionality, so there weren't any new features to test.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

There are two unrelated changes here, one changing how we obtain the paramters from the docstring, and one from the signature. Do you mind addressing them in separate PRs? The one of the docstrings is quite simple I think, but I don't quite understand the other one. And in case we change our mind and we want to revert, we won't want to revert both.

Regarding inspect.signature, can you point me to an example where this change is fixing a validation error. I'm sure you know what you're doing, but would like to understand what's going on, and I fail to see if we're fixing bugs, or just using a nicer API.

Thanks!

@ChiefMilesEdgeworth
Copy link
Contributor Author

ChiefMilesEdgeworth commented Oct 7, 2019

So, here's a code example to show what the difference is between getfullargspec and signature.

from functools import wraps
from inspect import getfullargspec, signature


def decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        """
        This is hidden documentation that we don't want
        """
        print("The wrapper is doing stuff!")
        return func(*args, **kwargs)
    return wrapper

@decorator
def add(a, b, *, absolute=False):
    """
    This method adds a and b, with the absolute value optionally taken
    """
    return abs(a+b) if absolute else a+b

    
print(f"Using getfullargspec: {getfullargspec(add)}")
print(f"Using signature {signature(add)}")

The following script prints out:

Using getfullargspec: FullArgSpec(args=[], varargs='args', varkw='kwargs', defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})
Using signature (a, b, *, absolute=False)

getfullargspec only finds the signature of the decorator, since it doesn't search any deeper. However, signature checks if the object has the __wrapped__ attribute (which is assigned in the by the @wraps decorator to the decorated function), and if it does it checks the signature of that object instead. Using signature allows us to properly get the signature of decorated functions, which is important since we have decorators everywhere. For instance, there is a decorator for deprecating a specific kwarg. That was causing trouble before, because it overrode the signature of the function below it. This would solve that issue.

@ChiefMilesEdgeworth
Copy link
Contributor Author

I'll put together a separate PR for the splitting on ", ". You're right that these two issues are pretty distinctly separate, and should be kept that way.

@datapythonista
Copy link
Member

Thanks a lot for the detail explanation on signature vs getfullargsspec. Is there any case in pandas where this is an issue right now, and that it's fixed with this PR?

We've got _to_original_callable which I think it's taking care of this. I guess it could be simplified with this change, but it'd be nice to know how this change is affecting the reported docstrings.

@ChiefMilesEdgeworth
Copy link
Contributor Author

Most of the PR02 errors we're getting from pandas.Series.str are the result of this. There are other places too, I remember finding them as I was looking around at PR02 errors. They usually pop up where the PR02 error says variables aren't defined that are in the signature of the function. If it's decorated, then it gets *args, **kwargs instead of whatever is supposed to be there.

We could run the check on the object returned by to_original_callable, but if we do that we should just remove that function and use inspect.unwrap instead. It does the same thing but covers more cases than we currently do. I was thinking of submitting that idea later anyways, since it cuts out a whole function and would result in better functionality (catching more cases).

@datapythonista
Copy link
Member

Ah, I see. Do you think it makes sense to add one of the Series.str methods to the unit tests? Also, it'd probably make sense to generate the errors json with and without these changes, and find the differences, and see if the result is the expected one.

@ChiefMilesEdgeworth
Copy link
Contributor Author

Yeah, I did that when I first made the changes, but I can send two files with the differences.

@ChiefMilesEdgeworth
Copy link
Contributor Author

One file is taken from current master, so there'll be some changes that were fixed by other PRs. The main ones to look at are under Series.str.
decorator_fix.txt
master.txt

A lot of the errors are gone after applying the fix.

Reverting back to master's version so the issue can be addressed in another PR.
@datapythonista
Copy link
Member

Sounds great. Let's split the two different things, and merge them then.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just couple of last comments

I added a parameter to the test so that it's more visible, and changed the validation script to use a generator straight into a tuple. I also switched off of using `items`, since it's a set and therefore might give different orders for the parameters.
Fixed line length and whitespace requirements
Remove a test that doesn't quite examine the error we'd like.
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one last comment, to make things simpler, and we're good to go

Switched to using a builtin decorator
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @Nabel0721, nice improvement

@WillAyd
Copy link
Member

WillAyd commented Oct 14, 2019

Read through history but not entirely clear - what does this end up improving upon?

@ChiefMilesEdgeworth
Copy link
Contributor Author

ChiefMilesEdgeworth commented Oct 14, 2019

Whenever we use a decorator on a function that follows normal decorator conventions (wrapping the original function, passing in the arguments and returning the result with some added functionality) we were losing the signature of the original function. The result was that the script would look for *args, **kwargs or whatever the signature was of the wrapper instead of the actual function's signature, and that would be used in the comparison to the docstring parameters. This resolves that by using a more up-to-date function from the builtin library inspect. It checks if a function has the __wrapped__ attribute, and if so calls itself recursively on the function in that attribute. So, with this change we'll see most of the errors due to decorators go away, although a few of the decorators need a bit of work since they don't use the @wraps decorator on their wrapper function (which sets the __wrapped__ attribute along with a few other things). If you need me to go into more specifics on any part of this, just let me know. At this point I've got a really good grasp on how decorators work and what the inspect.signature does.

@WillAyd WillAyd added this to the 1.0 milestone Oct 14, 2019
@WillAyd WillAyd merged commit 2f80feb into pandas-dev:master Oct 14, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 14, 2019

Makes sense. I think generally we are inconsistent in how we maintain signatures, annotations, etc... with wrapped functions, so would welcome general ideas you have for that

@datapythonista
Copy link
Member

We were doing it manually until now with _to_original_callable. @Nabel0721 I guess you'll see if it can be fully removed now in a follow up, right?

@ChiefMilesEdgeworth
Copy link
Contributor Author

ChiefMilesEdgeworth commented Oct 14, 2019 via email

@datapythonista
Copy link
Member

Great, please tag me when you open the PR. Thanks!

@ChiefMilesEdgeworth
Copy link
Contributor Author

Ideally, we should only be working with signatures and wrapped functions using the builtin tools from the inspect library whenever possible. I'm fairly certain that this is what numpy uses as well, so it will create consistency with the imported docstring stuff from them.

@datapythonista
Copy link
Member

I didn't see that Python had that unwapping functionality, that's why I implemented _to_original_callable. If we can get the actual function just calling the Python sdlib, that would be great, less code to maintain.

@ChiefMilesEdgeworth ChiefMilesEdgeworth deleted the fix_docstring_validation branch October 15, 2019 20:43
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants