Skip to content

DOC: Validator + converting array_like to array-like in docstrings #41295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

kemcbride
Copy link
Contributor

@kemcbride kemcbride commented May 4, 2021

First try at any open source stuff! This one looked pretty doable, hopefully it's there's nothing wrong with it/let me know!
I'm not sure whether I should be checking these boxes, or a reviewer should. I believe this fixes the issue since it adds the validator, though it's not a particularly clever validator.

It technically is an added test, and it technically works - I re-added one of the entries and got a successful (correct) result for its failure... (amongst the infinite other validation failures 😢 )
I assume linting will run after the PR is submitted, but I'll look into running it locally, too.

...
None:None:ES01:pandas.IndexSlice:No extended summary foundNone:None:EX03:pandas.IndexSlice:flake8 error: E231 missing whitespace after ',' (4 times)
.../pandas/pandas/core/indexes/multi.py:434:ES01:pandas.MultiIndex.from_arrays:No extended summary found
.../pandas/pandas/core/indexes/multi.py:434:GL05:pandas.MultiIndex.from_arrays:Use 'array-like' rather than 'array_like' in docstrings.  <<<--- Message for added issue.
.../pandas/pandas/core/indexes/multi.py:500:ES01:pandas.MultiIndex.from_tuples:No extended summary found
.../pandas/pandas/core/indexes/multi.py:567:ES01:pandas.MultiIndex.from_product:No extended summary found
None:None:ES01:pandas.MultiIndex.names:No extended summary found
...

More on the validator:

  • I chose GL as the code after looking through the numpy docstring validation guide and the pandas docstring validation guide - I think it stands for "General" and that seems like the best categorization for this message.
  • I used raw_doc for this but there was another property like clean_doc as well. I could just iterate through each section, too. 🤷
  • I didn't add any extra detail/context to the error message. The line and file are already given, so I think there's nothing extra that's needed.

Using the validator to pick up any missed docstring changes:

$ time ./validate_docstrings.py 2>&1 | tee validate_docstrings.log  ### (It takes ~5m to run)
$ grep GL05 validate_docstrings.log
.../pandas/pandas/core/frame.py:10220:GL05:pandas.DataFrame.resample:Use 'array-like' rather than 'array_like' in docstrings.
.../pandas/pandas/core/series.py:5171:GL05:pandas.Series.resample:Use 'array-like' rather than 'array_like' in docstrings.

Actually, looking into those two remaining results - those are docstrings generated by lines like:
10220 @doc(NDFrame.resample, **_shared_doc_kwargs)

Looking in more detail, I'm not exactly sure how this happens, but it seems like this decorator must be generating text with "array_like" in it. I kind of want to declare it out of scope for this PR, but I can dig into it more if needed.

I suspect maybe the Substitution decorator tool can help, possibly. It seems kind of cheesy to add something that will substitute array_like for array-like in a docstring - eg, it might be the result of a function whose param is actually called array_like, which, I'm pretty sure they do exist. I'm open to feedback on that.

@kemcbride kemcbride marked this pull request as ready for review May 4, 2021 00:17
@MarcoGorelli MarcoGorelli self-requested a review May 4, 2021 07:22
@MarcoGorelli
Copy link
Member

Hi @kemcbride - solid effort here, nice! I'll take a close look later in the week

@jreback jreback added Docs Code Style Code style, linting, code_checks labels May 5, 2021
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @kemcbride

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking in more detail, I'm not exactly sure how this happens, but it seems like this decorator must be generating text with "array_like" in it. I kind of want to declare it out of scope for this PR, but I can dig into it more if needed.

Here's the part of the docstring which uses array_like:

pandas/pandas/core/generic.py

Lines 8100 to 8101 in 67c9385

>>> def custom_resampler(array_like):
... return np.sum(array_like) + 5

So, the check should probably exclude examples

@MarcoGorelli
Copy link
Member

So, the check should probably exclude examples

Or just use a different variable name in the example

@MarcoGorelli
Copy link
Member

Nice, thanks for updating! From a quick look it looks good, will check again at the weekend

@MarcoGorelli MarcoGorelli self-requested a review May 12, 2021 19:02
@kemcbride
Copy link
Contributor Author

Boom. I think this is it!
It took me a while to figure out how to run the pytest command on scripts dir to actually get the tests to run, but some googling/seeing past PRs helped out there.

Looks like I only needed to fix the one use of array_like in core/generic.py to resolve both of the examples issues.

Re-running the validator thing, using the script that's used for ci: https://gist.github.com/kemcbride/0f769423475aeb3aa4f92122cb307cdd, Looks good, return code = 0

Running the validator tests: https://gist.github.com/kemcbride/93adecdef7717fd0d4ceb689e87775ee

Thanks for the quick review!

@kemcbride
Copy link
Contributor Author

Pinging just as a reminder @MarcoGorelli

@MarcoGorelli
Copy link
Member

Hey @kemcbride - I'm a bit busy ATM with trying to get some things in to 1.3, as that's probably the last minor version before 2.0 and so that last chance for a while (2 years, maybe) to get some deprecations in, mind if I come back to this once 1.3 is in? Should just be a few weeks at most

@MarcoGorelli MarcoGorelli added this to the 1.4 milestone Jun 12, 2021
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kemcbride , I just did a slight fixup to make this more consistent with other checks

If you fancy trying a similar issue, there's #41719 still open

@MarcoGorelli MarcoGorelli merged commit a1412a0 into pandas-dev:master Jun 12, 2021
@simonjayhawkins
Copy link
Member

urgh. @MarcoGorelli please don't merge to master

@MarcoGorelli
Copy link
Member

ah so sorry @simonjayhawkins - shall I revert these?

@simonjayhawkins
Copy link
Member

I don't think that will make much difference. may need to start again as we have a commit for the release and a commit for the new tag and I doubt these will now push to master. These are created locally and only pushed once tests and checks on the documentation build have completed. normally need a 1-2 hour window with no commits to master.

We could maybe change the scripts to not created the commits, and manually tag the hash where we want to branch and then manually tag the first commit after with the dev tag.

The sdist has been built on a tagged branch with an extra commit, but that shouldn't matter for the sdist.

@simonjayhawkins simonjayhawkins mentioned this pull request Jun 12, 2021
@MarcoGorelli
Copy link
Member

Ouch...

normally need a 1-2 hour window with no commits to master.

OK, this was completely my fault, but for next time could there be a group mention somewhere advising to not merge anything? I won't make this mistake again, but if any new committers are added there's a risk they might

For what it's worth, I'm free most of today to help out with anything if needed, though I understand there's probably not much I could do here at this point

@simonjayhawkins
Copy link
Member

OK, this was completely my fault,

no worries. vary rarely goes smoothly.

but for next time could there be a group mention somewhere

That's my fault. I didn't realize i'd left the group mention out of the 1.3 RLS issue. (and I've done the same for 1.4) so that core/triage would get the notifications.

@simonjayhawkins
Copy link
Member

For what it's worth, I'm free most of today to help out with anything if needed, though I understand there's probably not much I could do here at this point

Thanks. It might get messy when we get to the conda stage. I might need help on that.

@simonjayhawkins simonjayhawkins modified the milestones: 1.4, 1.3 Jun 12, 2021
@simonjayhawkins
Copy link
Member

@MarcoGorelli when merging PRs that close issues can you also add the milestone to the closed issue.

@kemcbride kemcbride deleted the arraylike_hyphen_in_docstrings_not_underscore branch June 19, 2021 22:37
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
…andas-dev#41295)

* Converting array_like to array-like in docstrings & comments

* Add docstring validation check for array_like vs. array-like

* Add unit test for arraylike validator, and fix example in core/generic.py

* make method of PandasDocstring

Co-authored-by: Marco Gorelli <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: inconsistency in doc argument types
5 participants