DOC: Validate consistency of title capitalization #26941

datapythonista · 2019-06-19T13:16:37Z

In #26933, we're making the capitalization of the section titles consistent. We used to have many titles capitalized as "This is the Section Title", and we changed all of them (probably a few were missed) to "This is the section title".

To keep this consistency, we should validate in the CI that the capitalization is correct. This can be done by extracting all the titles and making sure that only the first letter of each title is uppercase, except for words defined in a short list, like Series, DataFrame,...

I think this can be done in two ways:

1. As a Sphinx extension that validates the titles as they are processed and generates warnings for incorrect ones (this will automatically fail the CI).
2. As an independent script.

The first option should be simpler, if Sphinx allows implementing this as an extension, but I'm not sure if that's the case.


jreback · 2019-06-21T01:18:21Z

The main thing to note here is proper names, e.g. PyTables, Python and IPython (probably some others).

martinagvilas · 2019-08-28T10:50:30Z

I would like to give this a try, if that's ok.

tonywu1999 · 2020-01-12T22:21:09Z

Is this still an open issue?

datapythonista · 2020-01-13T01:31:42Z

Yes, still pending, and would be great to get this fixed. Thanks!

tonywu1999 · 2020-01-14T15:48:02Z

When you mention validating as a Sphinx extension, do you mean creating a custom extension (like the one in 'doc/sphinxext/contributors.py')?

I'm also a little confused about the type of extension to create. I've read about Sphinx roles, directives, and builders, but I'm not sure if there's a specific one I should choose for this situation.

datapythonista · 2020-01-14T15:51:46Z

I don't know much about Sphinx extensions, and I find Sphinx itself very confusing. But yes, I assumed that since Sphinx is already parsing all the files, it could be possible to validate the titles in a custom extension like the contributors.py one.

But an independent script that parses all the files, extracts all the titles, and reports any with an unwanted capitalization is also an option.
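As a rough illustration of the independent-script option, here is a sketch that finds titles using only the Python standard library. It is an approximation, not the script that was eventually merged: rather than building a docutils doctree, it assumes that any unindented text line followed by a line of one repeated punctuation character, at least as long as the text, is a section title. `find_titles` and `ADORNMENT_CHARS` are hypothetical names, and the heuristic can misfire on edge cases such as underline-like lines inside literal blocks.

```python
import string
from typing import Iterator, Tuple

# reStructuredText accepts any printable ASCII punctuation character as a section adornment
ADORNMENT_CHARS = set(string.punctuation)


def _is_adornment(line: str) -> bool:
    """Return True if the line is one punctuation character repeated, e.g. '-----'."""
    stripped = line.rstrip()
    return (
        bool(stripped)
        and stripped[0] in ADORNMENT_CHARS
        and stripped == stripped[0] * len(stripped)
    )


def find_titles(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, title) pairs from rst text, using 1-based line numbers."""
    lines = text.splitlines()
    for i in range(len(lines) - 1):
        candidate = lines[i].rstrip()
        # Skip blank lines, indented lines and the overline/underline lines themselves
        if not candidate or candidate[0].isspace() or _is_adornment(candidate):
            continue
        underline = lines[i + 1]
        # A title's underline must reach at least as far as the title text
        if _is_adornment(underline) and len(underline.rstrip()) >= len(candidate):
            yield i + 1, candidate
```

Working from source lines instead of a doctree keeps the reported line numbers trivially correct, at the cost of re-implementing a small part of the reStructuredText grammar.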

tonywu1999 · 2020-01-14T18:18:08Z

take

tonywu1999 · 2020-01-15T03:08:36Z

At this point, I have been able to create a Python script that, given a .rst file, parses it, identifies the titles in the produced doctree, and determines which titles do not follow the capitalization convention mentioned above.

I've been using the doc/source/development/contributing.rst file as a test file to check that my code works. When testing, I noticed my code flagged these titles as not following the capitalization convention:

Code Base:
Pre-Commit
Type Hints
Style Guidelines
Pandas-specific Types
Validating Type Hints

Before moving on, I was wondering if these titles are special in any way (e.g. proper names) or if they simply do not follow the capitalization convention.

Also, is there anywhere I can easily find the proper names? I thought of looking through the pandas API reference (https://pandas.pydata.org/pandas-docs/stable/reference/index.html), but I wasn't sure how to approach finding proper names in that document.

Thanks!

datapythonista · 2020-01-15T10:56:12Z

Thanks @tonywu1999, that sounds great. Those titles don't have anything special, and should be changed.

We don't have a list of proper names we want capitalized; we'll have to build that list dynamically as we validate titles. Jeff mentioned a few as examples: PyTables, Python and IPython, though I'm not even sure those appear in titles.

I think the way to move forward is to open a PR with your script and use it to validate a couple of files from ci/code_checks.sh. You'll have to fix the titles in those files for the CI to pass, so we can merge the PR.

Once your PR is merged, we can open issues to fix and validate the rest of the files in the docs. Other people can help with this; there are a significant number of titles to change.

What I'd do is make your script accept a file, a list of files, or a directory in which to look for files recursively. So these cases would all be valid:

./scripts/validate_rst_title_capitalization.py doc/source/index.rst
./scripts/validate_rst_title_capitalization.py doc/source/index.rst doc/source/ecosystem.rst
./scripts/validate_rst_title_capitalization.py doc/source/

Initially, we'll validate just a subset of files, and when we're done we'll just call the last command.
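A minimal sketch of that command-line behaviour, assuming only the standard library's `argparse` and `pathlib`. `collect_rst_files` and `parse_args` are hypothetical names, not taken from the pandas script: explicitly named files are kept as given, and directories are searched recursively for `*.rst` files.

```python
import argparse
from pathlib import Path
from typing import Iterable, List, Optional


def collect_rst_files(paths: Iterable[str]) -> List[Path]:
    """Expand a mix of files and directories into a sorted, de-duplicated list of files."""
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Directories are walked recursively, keeping only .rst files
            found.update(p for p in path.rglob("*.rst") if p.is_file())
        elif path.is_file():
            found.add(path)
        else:
            raise FileNotFoundError(f"{raw} is not a file or directory")
    return sorted(found)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Validate the capitalization of titles in .rst files"
    )
    # nargs="+" covers one file, several files, or directories
    parser.add_argument("paths", nargs="+", help="rst files and/or directories to check")
    return parser.parse_args(argv)
```

Because the paths always come from the command line and are resolved relative to wherever the script is invoked, nothing here assumes the script runs from the repository root; `collect_rst_files(parse_args(["doc/source/"]).paths)` would correspond to the last command above.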

For the exceptions, I think the easiest approach is for your script to have something like:

CAPITALIZATION_EXCEPTIONS = [
    "pandas",
    "NumFOCUS",
    "Python",
    # ... more proper names, added as titles are validated
]

The words on the list must have exactly the capitalization defined there, whether they are the first word of the title or a later one. The rest of the words should follow the pattern Xxxxx xxxxx xxxxx.
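One way to express that rule in code, as a hypothetical sketch: `expected_capitalization` and `is_correctly_capitalized` are illustrative names, and the exception list only contains names mentioned in this thread. Words are split on whitespace and compared without their surrounding punctuation, so "Base:" is checked as "Base". A hyphenated compound such as "Pandas-specific" is treated as a single word, so it is not matched against "pandas".

```python
import re
from typing import Iterable

# Only names mentioned in this thread; the real list would grow as titles are validated
CAPITALIZATION_EXCEPTIONS = [
    "pandas", "NumFOCUS", "Python", "IPython", "PyTables", "Series", "DataFrame",
]


def expected_capitalization(
    title: str, exceptions: Iterable[str] = CAPITALIZATION_EXCEPTIONS
) -> str:
    """Rewrite a title as 'Xxxxx xxxxx xxxxx', keeping exception words exactly as listed."""
    # Map the lowercase form to the required spelling, so "PYTHON" becomes "Python"
    lookup = {word.lower(): word for word in exceptions}
    # Split on whitespace but keep the separators, so spacing is preserved
    tokens = re.split(r"(\s+)", title.strip())
    result = []
    seen_first_word = False
    for token in tokens:
        # Separate surrounding punctuation, e.g. "Base:" -> ("", "Base", ":")
        lead, core, trail = re.match(r"^(\W*)(.*?)(\W*)$", token).groups()
        if not core:
            result.append(token)  # whitespace or pure punctuation
            continue
        if core.lower() in lookup:
            core = lookup[core.lower()]  # exact spelling, even as the first word
        elif not seen_first_word:
            core = core[0].upper() + core[1:].lower()
        else:
            core = core.lower()
        seen_first_word = True
        result.append(lead + core + trail)
    return "".join(result)


def is_correctly_capitalized(title: str) -> bool:
    return title.strip() == expected_capitalization(title)
```

For example, `expected_capitalization("Validating Type Hints")` returns `"Validating type hints"`, and `expected_capitalization("Installing Python and Pandas")` returns `"Installing Python and pandas"`. Returning the corrected title, rather than just a boolean, lets the error message show the expected form.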

Does all this make sense?

tonywu1999 · 2020-01-15T22:59:21Z

I have a couple questions regarding your comment:

1. Would the Python script always be executed from the root of the repository? And is the script supposed to run without putting "python" in front of the command?
(Ex: python ./scripts/validate_rst_title_capitalization.py doc/source/index.rst)
2. What should I do when my script catches a title that is improperly formatted? Should I print the title? Output a warning/error message?
3. What do you mean by validating a couple of titles from ci/code_checks.sh? I looked inside that file and I'm not exactly sure what's going on in it.

datapythonista · 2020-01-16T01:44:39Z

For (1), I'd use the scripts in scripts/validate_*.py as a reference. I'd prefer not to assume the script is executed from any particular directory (it's easy to avoid that assumption). And I'd make the script executable, which is also easy.

For (2), you should output a message in the terminal (CI logs) that is as descriptive as possible, so that someone who finds it in the CI can easily understand and fix the problem. You also need to make the script return a non-zero exit code, so the process fails and the CI fails. Again, you can use the mentioned scripts for reference.

In (3), I meant that once you have the script and open the PR, you can add a couple of calls to your script in ci/code_checks.sh to start validating the first files, like calling scripts/validate_capitalization.py doc/source/getting_started/install.rst, and maybe another file. This way we can see the script in action in the CI, together with the code in the PR. I'll also ask you to initially validate a file with errors (titles with wrong capitalization), so we can see in the CI that the script works as expected, and what the error messages look like with real examples. After we see that, you'll have to fix the errors in the file, or remove the file from the validation, so the CI is green and we can merge.
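For (1) and (2), a hypothetical reporting step might look like the following. It assumes the titles have already been checked, prints one `path:line:` message per bad title, and turns the result into the exit code; `BadTitle` and `report` are illustrative names and the message wording is made up. To run from any directory, such a script would take all paths as arguments rather than hardcoding them, start with a `#!/usr/bin/env python3` line, be marked executable with `chmod +x`, and end with `sys.exit(report(errors))`.

```python
import sys
from typing import List, NamedTuple, Optional, TextIO


class BadTitle(NamedTuple):
    path: str
    line: int
    found: str
    expected: str


def report(errors: List[BadTitle], out: Optional[TextIO] = None) -> int:
    """Print one descriptive line per bad title and return the process exit code."""
    # Look up sys.stdout at call time so redirection (e.g. in tests) is respected
    out = out if out is not None else sys.stdout
    for err in errors:
        # "path:line:" lets whoever reads the CI log jump straight to the heading
        print(
            f'{err.path}:{err.line}: title "{err.found}" has incorrect '
            f'capitalization, expected "{err.expected}"',
            file=out,
        )
    # Any error must produce a non-zero status so the CI job fails
    return 1 if errors else 0
```

Including the expected title in the message means the fix can usually be copied straight from the CI log.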

Thanks!


tonywu1999 · 2020-01-18T15:05:37Z

Hi, I recently committed the new script and opened a pull request (#31114), but I've run into several issues.

One big issue I'm having is suppressing the output of the helper functions I imported. In my script, I created a context manager to suppress output, which worked when I ran the script on my local machine but not on GitHub when code_checks.sh ran. Is there any way I can suppress the output of these helper functions?
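A hedged sketch of one approach, not something confirmed in this thread: the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr` temporarily replace `sys.stdout` and `sys.stderr`. They only affect code that looks those attributes up while the block is active; output written directly to the underlying file descriptors (for example by C extensions or subprocesses), or through a stream object a library stored earlier, still gets through, which is one possible reason for behaviour differing between a local run and CI. `captured_output` is a hypothetical name.

```python
import contextlib
import io
from typing import Iterator


@contextlib.contextmanager
def captured_output() -> Iterator[io.StringIO]:
    """Collect anything written via sys.stdout or sys.stderr into one buffer."""
    sink = io.StringIO()
    # Both redirections are undone automatically when the block exits
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        yield sink
```

Keeping the captured text in a buffer, instead of discarding it, makes it possible to print it later if the validation itself fails. If the noise comes from docutils warnings, lowering docutils' reporting level (its `report_level` setting, passed via `settings_overrides`) may be more direct than redirecting streams; the docutils configuration documentation lists the accepted values.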


datapythonista added the Docs and CI (Continuous Integration) labels Jun 19, 2019

datapythonista mentioned this issue Jun 19, 2019 in DOC: Make section title capitalization consistent #26830 (closed)

datapythonista mentioned this issue Aug 28, 2019 in Validate that titles in pandas are correctly capitalized python-sprints/pandas-mentoring#155 (open)

datapythonista added the good first issue label Nov 13, 2019

github-actions bot assigned tonywu1999 Jan 14, 2020

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 17, 2020: Validate consistency of title capitalization in documentation script added (pandas-dev#26941) (11556b7)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 17, 2020: Adding script to validate consistency of title capitalization (pandas-dev#26941) (9fc312a)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 17, 2020: Adding validate_rst_title_capitalization.py (pandas-dev#26941) (635163d)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 18, 2020: Testing validate_rst_capitalization.py script (pandas-dev#26941) (c4ff8bd)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 18, 2020: Edited validate script (pandas-dev#26941) (83f778c)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 18, 2020: Edited validate_rst_title_capitalization.py for review (pandas-dev#26941) (b7c0bfd)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 19, 2020: Checking if stderr output will be suppressed (pandas-dev#26941) (7ea58df)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 19, 2020: Simplified validate_rst_title_capitalization.py to print correctly (pandas-dev#26941) (60d8db9)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 19, 2020: Testing script on doc/source/development/contributing.rst (pandas-dev#26941) (0e344ad)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 19, 2020: validate_rst_title_capitalization.py MomIsBestFriend edits (pandas-dev#26941) (3757712)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 21, 2020: Created method to correct title capitalization (pandas-dev#26941) (56bfc44)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 21, 2020: Ran black on validate_rst_title_capitalization (pandas-dev#26941) (deddc2d)

tonywu1999 pushed a commit to tonywu1999/pandas that referenced this issue Jan 22, 2020: Simplified validate_rst_title_capitalization main method (pandas-dev#26941) (df01730)

tonywu1999 mentioned this issue Feb 7, 2020 in CI: Adding script to validate consistent and correct capitalization among headings in documentation (#26941) #31114 (merged)

datapythonista closed this as completed in #31114 Mar 7, 2020

datapythonista pushed a commit that referenced this issue Mar 7, 2020: CI: Adding script to validate consistent and correct capitalization among headings in documentation (#26941) (#31114) (2a2258d)

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this issue Mar 22, 2020: CI: Adding script to validate consistent and correct capitalization among headings in documentation (pandas-dev#26941) (pandas-dev#31114) (a779def)
