CI: Adding script to validate consistent and correct capitalization among headings in documentation (#26941) #31114
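The script in this PR treats a heading as correct when it equals its `str.capitalize()` form (only the first letter upper case), after the spellings listed in `CAPITALIZATION_EXCEPTIONS` are restored. Below is a minimal standalone sketch of that rule. It mirrors the approach of `correct_title_capitalization` but is not the PR's code: the name `sentence_case` and the trimmed exception set are illustrative, and it omits the script's step of ignoring words inside `<https://...>` links.

```python
import re

# Illustrative subset of the script's CAPITALIZATION_EXCEPTIONS set.
EXCEPTIONS = {"pandas", "Python", "DataFrame", "GitHub", "NumPy"}
EXCEPTIONS_BY_LOWER = {word.lower(): word for word in EXCEPTIONS}


def sentence_case(title: str) -> str:
    """Upper-case only the first letter, then restore exception spellings."""
    # Drop leading non-word characters, then lower-case everything except
    # the first letter.
    result = re.sub(r"^\W*", "", title).capitalize()
    for word in re.split(r"\W", result):
        if word.lower() in EXCEPTIONS_BY_LOWER:
            # Put back the canonical spelling, e.g. "dataframe" -> "DataFrame".
            result = re.sub(rf"\b{word}\b", EXCEPTIONS_BY_LOWER[word.lower()], result)
    return result


for heading in ["Working With DataFrame Objects", "Contributing to pandas", "Using GITHUB"]:
    expected = sentence_case(heading)
    verdict = "ok" if expected == heading else f"should be {expected!r}"
    print(f"{heading!r}: {verdict}")
```

Because `capitalize()` would turn a leading "pandas" into "Pandas", the exception pass is what keeps headings such as "pandas overview" valid.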
Merged
Changes from 60 commits
63 commits
85e3fe6 changed name
e2d5354 adding sphinx extension
f089c0c Starting builder
2dd5791 experimenting with builder
2294331 before running build
c06c951 126 warnings?
2ffeee0 contributors
1364f86 experimenting
bb535ae update
6b51df6 parser created
d6198a6 italics working
21693b6 found a way to collect all heading strings from doctree
0810c09 testing script
30c4f8c modified validation script
aabd136 command line arguments possible for validation script
4c83edb validation script needs better commenting
50661c3 added line number to validation script
f513f29 edited code_checks.sh
2d3cfe7 argument parser correctly implemented
4ceea5e Added comments
11556b7 Validate consistency of title capitalization in documentation script …
e55776f Merge remote-tracking branch 'upstream/master' into new-feature
9fc312a Adding script to validate consistency of title capitalization (#26941)
635163d Adding validate_rst_title_capitalization.py (#26941)
c4ff8bd Testing validate_rst_capitalization.py script (#26941)
927e3ed Merge remote-tracking branch 'upstream/master' into new-feature
83f778c Edited validate script (#26941)
1907d45 Added parameter and return value information in docstrings
de06ec8 Edited validate_rst_title_capitalization.py for review (#29641)
b7c0bfd Edited validate_rst_title_capitalization.py for review (#26941)
3d3a7f4 Merge remote-tracking branch 'upstream/master' into new-feature
7ea58df Checking if stderr output will be suppressed (#26941)
d71be41 Merge remote-tracking branch 'upstream/master' into new-feature
60d8db9 Simplified validate_rst_title_capitalization.py to print correctly (#…
0e344ad Testing script on doc/source/development/contributing.rst (#26941)
3757712 validate_rst_title_capitalization.py MomIsBestFriend edits (#26941)
3d95777 Merge remote-tracking branch 'upstream/master' into new-feature
56bfc44 Created method to correct title capitalization (#26941)
0ec38e2 Merge remote-tracking branch 'upstream/master' into new-feature
deddc2d Ran black on validate_rst_title_capitalization (#26941)
0311fe0 Edit: titles with non-word character as first character are not valid
3256615 Merge remote-tracking branch 'upstream/master' into new-feature
df01730 Simplified validate_rst_title_capitalization main method (#26941)
c1e3abb Merge remote-tracking branch 'upstream/master' into new-feature
9a9a57a Edited parameter and return value description of main function
bafbf96 Merge remote-tracking branch 'upstream/master' into new-feature
dd5c983 Merge remote-tracking branch 'upstream/master' into new-feature
5f0f84a Added glob module to script 01-27-2020
2fc019f Merge remote-tracking branch 'upstream/master' into new-feature
ee45f98 Edited len(line) != 0 correction
88dfc46 Merge remote-tracking branch 'upstream/master' into new-feature
78a49c1 Edited find_titles method
95d3488 Merge remote-tracking branch 'upstream/master' into new-feature
1c7de87 Merge remote-tracking branch 'upstream/master' into new-feature
f4ffd32 edited contributing.rst to have no errors
3d2e9ce Merge remote-tracking branch 'upstream/master' into new-feature
687053f modified validation script
ed3cdc6 Merge remote-tracking branch 'upstream/master' into new-feature
c690281 fix linting errors
c9775cc black pandas-dev change
66c651a modified changes to comments
ac5c5b7 Merge remote-tracking branch 'upstream/master' into new-feature
ceedac5 Merge remote-tracking branch 'upstream/master' into new-feature
@@ -0,0 +1,206 @@
#!/usr/bin/env python
"""
Validate that the titles in the rst files follow the proper capitalization convention.

Print the titles that do not follow the convention.

Usage::
    ./scripts/validate_rst_title_capitalization.py doc/source/development/contributing.rst
    ./scripts/validate_rst_title_capitalization.py doc/source/

"""
import argparse
import sys
import re
import os
from typing import Tuple, Generator, List
import glob


CAPITALIZATION_EXCEPTIONS = {
    "pandas",
    "Python",
    "IPython",
    "PyTables",
    "Excel",
    "JSON",
    "HTML",
    "SAS",
    "SQL",
    "BigQuery",
    "STATA",
    "Interval",
    "PEP8",
    "Period",
    "Series",
    "Index",
    "DataFrame",
    "C",
    "Git",
    "GitHub",
    "NumPy",
    "Apache",
    "Arrow",
    "Parquet",
    "MultiIndex",
    "NumFOCUS",
    "sklearn",
    "Docker",
}

CAP_EXCEPTIONS_DICT = {word.lower(): word for word in CAPITALIZATION_EXCEPTIONS}

err_msg = "Heading capitalization formatted incorrectly. Please correctly capitalize"

symbols = ("*", "=", "-", "^", "~", "#", '"')


def correct_title_capitalization(title: str) -> str:
    """
    Algorithm to create the correct capitalization for a given title.

    Parameters
    ----------
    title : str
        Heading string to correct.

    Returns
    -------
    str
        Correctly capitalized heading.
    """

    # Strip all non-word characters from the beginning of the title to the
    # first word character.
    correct_title: str = re.sub(r"^\W*", "", title).capitalize()

    # Remove any URL from the title. We do this because words in a URL must
    # stay lowercase, even if they are a capitalization exception.
    removed_https_title = re.sub(r"<https?:\/\/.*[\r\n]*>", "", correct_title)

    # Split the title into a list of words, using non-word characters as delimiters.
    word_list = re.split(r"\W", removed_https_title)

    for word in word_list:
        if word.lower() in CAP_EXCEPTIONS_DICT:
            correct_title = re.sub(
                rf"\b{word}\b", CAP_EXCEPTIONS_DICT[word.lower()], correct_title
            )

    return correct_title


def find_titles(rst_file: str) -> Generator[Tuple[str, int], None, None]:
    """
    Algorithm to identify particular text that should be considered headings in an
    RST file.

    See <https://thomas-cokelaer.info/tutorials/sphinx/rest_syntax.html> for details
    on what constitutes a string as a heading in RST.

    Parameters
    ----------
    rst_file : str
        RST file to scan through for headings.

    Yields
    ------
    title : str
        A heading found in the rst file.
    line_number : int
        The corresponding line number of the heading.
    """

    with open(rst_file, "r") as fd:
        previous_line = ""
        for i, line in enumerate(fd):
            # Drop only the trailing newline; ``line[:-1]`` would also cut the
            # last character of a final line that has no newline.
            line = line.rstrip("\n")
            line_chars = set(line)
            if (
                len(line_chars) == 1
                and line_chars.pop() in symbols
                and len(line) == len(previous_line)
            ):
                # ``i`` is the 0-based index of the underline, which equals the
                # 1-based line number of the title above it.
                yield re.sub(r"[`\*_]", "", previous_line), i
            previous_line = line


def find_rst_files(source_paths: List[str]) -> Generator[str, None, None]:
    """
    Given the file and directory paths passed on the command line, yield the
    path of every .rst file they refer to.

    Parameters
    ----------
    source_paths : list of str
        Files or directories to validate, provided through command line arguments.

    Yields
    ------
    str
        Path of an .rst file passed directly, or found recursively under one of
        the given directories.
    """

    for directory_address in source_paths:
        if not os.path.exists(directory_address):
            raise ValueError(
                "Please enter a valid path, pointing to a valid file/directory."
            )
        elif directory_address.endswith(".rst"):
            yield directory_address
        else:
            for filename in glob.glob(
                pathname=f"{directory_address}/**/*.rst", recursive=True
            ):
                yield filename


def main(source_paths: List[str], output_format: str) -> int:
    """
    The main method to print all headings with incorrect capitalization.

    Parameters
    ----------
    source_paths : list of str
        Files or directories to validate, provided through command line arguments.
    output_format : str
        Format string for each reported heading. It may use the fields
        ``source_path``, ``line_number``, ``msg``, ``heading`` and
        ``correct_heading``.

    Returns
    -------
    int
        Number of incorrect headings found overall.
    """

    number_of_errors: int = 0

    for filename in find_rst_files(source_paths):
        for title, line_number in find_titles(filename):
            correct_title = correct_title_capitalization(title)
            if title != correct_title:
                print(
                    output_format.format(
                        source_path=filename,
                        line_number=line_number,
                        msg=err_msg,
                        heading=title,
                        correct_heading=correct_title,
                    )
                )
                number_of_errors += 1

    return number_of_errors


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Validate heading capitalization")

    parser.add_argument(
        "paths", nargs="+", help="Source paths of file/directory to check."
    )

    parser.add_argument(
        "--format",
        "-f",
        default="{source_path}:{line_number}:{msg}:{heading}:{correct_heading}",
        help="Output format of incorrectly capitalized titles",
    )

    args = parser.parse_args()

    # Exit with status 1 when any heading is wrong. Exit statuses are truncated
    # to 8 bits on POSIX, so passing the raw count (e.g. 256) could look like success.
    sys.exit(1 if main(args.paths, args.format) else 0)
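Heading detection in `find_titles` relies on the RST underline rule: a text line followed by a line made of one repeated punctuation character. The sketch below applies the same heuristic to an in-memory string instead of a file, so it is easy to try out. It is an illustration, not part of the PR; the names `titles_in_text`, `SYMBOLS` and `sample` are hypothetical.

```python
import re
from typing import Iterator, Tuple

# Underline characters accepted by the script's ``symbols`` tuple.
SYMBOLS = ("*", "=", "-", "^", "~", "#", '"')


def titles_in_text(text: str) -> Iterator[Tuple[str, int]]:
    """Yield (title, 1-based line number) for each underlined heading in ``text``."""
    previous_line = ""
    for i, line in enumerate(text.splitlines()):
        chars = set(line)
        # A heading is a line followed by an underline made of one repeated
        # symbol, with exactly the same length as the title line.
        if (
            len(chars) == 1
            and chars.pop() in SYMBOLS
            and len(line) == len(previous_line)
        ):
            # ``i`` indexes the underline (0-based), which is the 1-based
            # number of the title line above it. Inline markup is stripped.
            yield re.sub(r"[`\*_]", "", previous_line), i
        previous_line = line


sample = "\n".join(
    [
        "Getting Started",
        "=" * len("Getting Started"),
        "",
        "Some text.",
        "",
        "Reading ``CSV`` files",
        "-" * len("Reading ``CSV`` files"),
    ]
)

for title, line_number in titles_in_text(sample):
    print(line_number, title)
```

Like the script, this only recognizes underlines that are exactly as long as the title. RST itself also accepts longer underlines, so headings written that way are skipped by this check rather than reported.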
Since we're using this just once, I'd move it to the function where it's being used.