CI: Adding script to validate consistent and correct capitalization among headings in documentation (#26941) #31114

Merged on Mar 7, 2020 (63 commits)
Changes from 55 commits
Commits (63)
85e3fe6  Jan 13, 2020  changed name
e2d5354  Jan 14, 2020  adding sphinx extension
f089c0c  Jan 14, 2020  Starting builder
2dd5791  Jan 14, 2020  experimenting with builder
2294331  Jan 14, 2020  before running build
c06c951  Jan 14, 2020  126 warnings?
2ffeee0  Jan 14, 2020  contributors
1364f86  Jan 14, 2020  experimenting
bb535ae  Jan 14, 2020  update
6b51df6  Jan 14, 2020  parser created
d6198a6  Jan 14, 2020  italics working
21693b6  Jan 14, 2020  found a way to collect all heading strings from doctree
0810c09  Jan 15, 2020  testing script
30c4f8c  Jan 15, 2020  modified validation script
aabd136  Jan 15, 2020  command line arguments possible for validation script
4c83edb  Jan 16, 2020  validation script needs better commenting
50661c3  Jan 17, 2020  added line number to validation script
f513f29  Jan 17, 2020  edited code_checks.sh
2d3cfe7  Jan 17, 2020  argument parser correctly implemented
4ceea5e  Jan 17, 2020  Added comments
11556b7  Jan 17, 2020  Validate consistency of title capitalization in documentation script …
e55776f  Jan 17, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
9fc312a  Jan 17, 2020  Adding script to validate consistency of title capitalization (#26941)
635163d  Jan 17, 2020  Adding validate_rst_title_capitalization.py (#26941)
c4ff8bd  Jan 18, 2020  Testing validate_rst_capitalization.py script (#26941)
927e3ed  Jan 18, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
83f778c  Jan 18, 2020  Edited validate script (#26941)
1907d45  Jan 18, 2020  Added parameter and return value information in docstrings
de06ec8  Jan 18, 2020  Edited validate_rst_title_capitalization.py for review (#29641)
b7c0bfd  Jan 18, 2020  Edited validate_rst_title_capitalization.py for review (#26941)
3d3a7f4  Jan 18, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
7ea58df  Jan 19, 2020  Checking if stderr output will be suppressed (#26941)
d71be41  Jan 19, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
60d8db9  Jan 19, 2020  Simplified validate_rst_title_capitalization.py to print correctly (#…
0e344ad  Jan 19, 2020  Testing script on doc/source/development/contributing.rst (#26941)
3757712  Jan 19, 2020  validate_rst_title_capitalization.py MomIsBestFriend edits (#26941)
3d95777  Jan 19, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
56bfc44  Jan 21, 2020  Created method to correct title capitalization (#26941)
0ec38e2  Jan 21, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
deddc2d  Jan 21, 2020  Ran black on validate_rst_title_capitalization (#26941)
0311fe0  Jan 22, 2020  Edit: titles with non-word character as first character are not valid
3256615  Jan 22, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
df01730  Jan 22, 2020  Simplified validate_rst_title_capitalization main method (#26941)
c1e3abb  Jan 22, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
9a9a57a  Jan 22, 2020  Edited parameter and return value description of main function
bafbf96  Jan 22, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
dd5c983  Jan 23, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
5f0f84a  Jan 27, 2020  Added glob module to script 01-27-2020
2fc019f  Jan 27, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
ee45f98  Feb 4, 2020   Edited len(line) != 0 correction
88dfc46  Feb 4, 2020   Merge remote-tracking branch 'upstream/master' into new-feature
78a49c1  Feb 4, 2020   Edited find_titles method
95d3488  Feb 22, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
1c7de87  Feb 26, 2020  Merge remote-tracking branch 'upstream/master' into new-feature
f4ffd32  Feb 26, 2020  edited contributing.rst to have no errors
3d2e9ce  Mar 2, 2020   Merge remote-tracking branch 'upstream/master' into new-feature
687053f  Mar 6, 2020   modified validation script
ed3cdc6  Mar 6, 2020   Merge remote-tracking branch 'upstream/master' into new-feature
c690281  Mar 6, 2020   fix linting errors
c9775cc  Mar 6, 2020   black pandas-dev change
66c651a  Mar 6, 2020   modified changes to comments
ac5c5b7  Mar 6, 2020   Merge remote-tracking branch 'upstream/master' into new-feature
ceedac5  Mar 7, 2020   Merge remote-tracking branch 'upstream/master' into new-feature

ci/code_checks.sh (4 changes: 4 additions & 0 deletions)
@@ -320,6 +320,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03,SA05
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Validate correct capitalization among titles in documentation' ; echo $MSG
$BASE_DIR/scripts/validate_rst_title_capitalization.py $BASE_DIR/doc/source/development/contributing.rst
RET=$(($RET + $?)) ; echo $MSG "DONE"

fi

### DEPENDENCIES ###
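The CI step above adds each script's exit status to `RET`, so any non-zero return from validate_rst_title_capitalization.py fails the docstrings check. The Python sketch below mirrors that accumulation. `run_checks` and `exit_status` are hypothetical helpers and are not part of pandas.

The sketch also illustrates a caveat. The script passes its raw error count to `sys.exit`, and a POSIX exit status keeps only the low 8 bits. A count of exactly 256 would therefore be reported as success. Collapsing the count to 0 or 1 is one possible way to avoid this.

```python
import subprocess
import sys
from typing import List, Sequence


def run_checks(commands: Sequence[List[str]]) -> int:
    """Hypothetical helper: run each check and sum return codes, like RET=$(($RET + $?))."""
    ret = 0
    for cmd in commands:
        # Any non-zero return code makes the running total non-zero.
        ret += subprocess.run(cmd).returncode
    return ret


def exit_status(number_of_errors: int) -> int:
    """Hypothetical helper: collapse an error count to 0/1 so it survives the 8-bit exit status."""
    return 1 if number_of_errors else 0


if __name__ == "__main__":
    checks = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    print(run_checks(checks))
    print(exit_status(256))
```

With this approach, the script's last line would become `sys.exit(exit_status(main(...)))`. The detailed count is still printed line by line, and CI only needs to know whether anything failed.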
doc/source/development/contributing.rst (14 changes: 7 additions & 7 deletions)
@@ -146,7 +146,7 @@ requires a C compiler and Python environment. If you're making documentation
changes, you can skip to :ref:`contributing.documentation` but you won't be able
to build the documentation locally before pushing your changes.

-Using a Docker Container
+Using a Docker container
~~~~~~~~~~~~~~~~~~~~~~~~

Instead of manually setting up a development environment, you can use Docker to
@@ -754,7 +754,7 @@ You can then verify the changes look ok, then git :ref:`commit <contributing.com

.. _contributing.pre-commit:

-Pre-Commit
+Pre-commit
~~~~~~~~~~

You can run many of these styling checks manually as we have described above. However,
@@ -822,12 +822,12 @@ See :ref:`contributing.warnings` for more.

.. _contributing.type_hints:

-Type Hints
+Type hints
----------

*pandas* strongly encourages the use of :pep:`484` style type hints. New development should contain type hints and pull requests to annotate existing code are accepted as well!

-Style Guidelines
+Style guidelines
~~~~~~~~~~~~~~~~

Types imports should follow the ``from typing import ...`` convention. So rather than
@@ -903,7 +903,7 @@ The limitation here is that while a human can reasonably understand that ``is_nu

With custom types and inference this is not always possible so exceptions are made, but every effort should be exhausted to avoid ``cast`` before going down such paths.

-Pandas-specific Types
+pandas-specific types
~~~~~~~~~~~~~~~~~~~~~

Commonly used types specific to *pandas* will appear in `pandas._typing <https://github.com/pandas-dev/pandas/blob/master/pandas/_typing.py>`_ and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
@@ -919,7 +919,7 @@ For example, quite a few functions in *pandas* accept a ``dtype`` argument. This

This module will ultimately house types for repeatedly used concepts like "path-like", "array-like", "numeric", etc... and can also hold aliases for commonly appearing parameters like `axis`. Development of this module is active so be sure to refer to the source for the most up to date list of available types.

-Validating Type Hints
+Validating type hints
~~~~~~~~~~~~~~~~~~~~~

*pandas* uses `mypy <http://mypy-lang.org>`_ to statically analyze the code base and type hints. After making any change you can ensure your type hints are correct by running
@@ -1539,7 +1539,7 @@ The branch will still exist on GitHub, so to delete it there do::
.. _Gitter: https://gitter.im/pydata/pandas


-Tips for a successful Pull Request
+Tips for a successful pull request
==================================

If you have made it to the `Review your code`_ phase, one of the core contributors may
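Each heading change above matches what the new script's `correct_title_capitalization` produces. Below is a trimmed-down sketch of that algorithm. It assumes `EXCEPTIONS`, an illustrative subset of `CAPITALIZATION_EXCEPTIONS`, and omits the URL-stripping step. `sentence_case` is a hypothetical name.

The sketch maps each old heading to its new form. It also confirms that the new headings are left unchanged, which is what lets the CI check pass on the edited file.

```python
import re

# Illustrative subset of CAPITALIZATION_EXCEPTIONS; the script's real set is larger.
EXCEPTIONS = {w.lower(): w for w in ("pandas", "Python", "GitHub", "Docker")}


def sentence_case(title: str) -> str:
    # Strip leading non-word characters, then lowercase everything after the first letter.
    corrected = re.sub(r"^\W*", "", title).capitalize()
    # Restore the canonical spelling of any exception word, e.g. "docker" -> "Docker".
    for word in set(re.split(r"\W+", corrected)):
        if word.lower() in EXCEPTIONS:
            corrected = re.sub(
                rf"\b{re.escape(word)}\b", EXCEPTIONS[word.lower()], corrected
            )
    return corrected


# Old heading -> new heading, as changed in contributing.rst above.
CHANGED_HEADINGS = {
    "Using a Docker Container": "Using a Docker container",
    "Pre-Commit": "Pre-commit",
    "Type Hints": "Type hints",
    "Style Guidelines": "Style guidelines",
    "Pandas-specific Types": "pandas-specific types",
    "Validating Type Hints": "Validating type hints",
    "Tips for a successful Pull Request": "Tips for a successful pull request",
}

for old, new in CHANGED_HEADINGS.items():
    print(f"{old!r} -> {sentence_case(old)!r}")
```

"Pandas-specific Types" shows why the exception step runs after `str.capitalize()`. Capitalizing yields "Pandas-specific types", and the exception lookup then restores the lowercase project name.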
scripts/validate_rst_title_capitalization.py (211 changes: 211 additions & 0 deletions)
@@ -0,0 +1,211 @@
#!/usr/bin/env python
"""
Validate that the titles in the rst files follow the proper capitalization convention.

Print the titles that do not follow the convention.

Usage::
    ./scripts/validate_rst_title_capitalization.py doc/source/development/contributing.rst
    ./scripts/validate_rst_title_capitalization.py doc/source/

"""
import argparse
import sys
import re
import os
from typing import Tuple, Generator, List
import glob


CAPITALIZATION_EXCEPTIONS = {
    "pandas",
    "Python",
    "IPython",
    "PyTables",
    "Excel",
    "JSON",
    "HTML",
    "SAS",
    "SQL",
    "BigQuery",
    "STATA",
    "Interval",
    "PEP8",
    "Period",
    "Series",
    "Index",
    "DataFrame",
    "C",
    "Git",
    "GitHub",
    "NumPy",
    "Apache",
    "Arrow",
    "Parquet",
    "MultiIndex",
    "NumFOCUS",
    "sklearn",
    "Docker",
}

CAP_EXCEPTIONS_DICT = {word.lower(): word for word in CAPITALIZATION_EXCEPTIONS}

err_msg = "Heading capitalization formatted incorrectly. Please correctly capitalize"
Review comment (Member): Since we're using this just once, I'd move it to the function where it's being used.



def correct_title_capitalization(title: str) -> str:
    """
    Algorithm to create the correct capitalization for a given title.

    Parameters
    ----------
    title : str
        Heading string to correct.

    Returns
    -------
    correct_title : str
Review comment (Member), suggested change:
-    correct_title : str
+    str

        Correctly capitalized heading.

    """

    # Function to strip all non-word characters from the beginning of the title to the
    # first word character.
    correct_title: str = re.sub(r"^\W*", "", title).capitalize()

    # Function to remove a URL from the title. We do this because words in a URL must
    # stay lowercase, even if they are a capitalization exception.
    removed_https_title = re.sub(r"<https?:\/\/.*[\r\n]*>", "", correct_title)

    # Function to split a title into a list using non-word character delimiters.
    word_list = re.split(r"\W", removed_https_title)

    for word in word_list:
        if word.lower() in CAP_EXCEPTIONS_DICT:
            correct_title = re.sub(
                rf"\b{word}\b", CAP_EXCEPTIONS_DICT[word.lower()], correct_title
            )

    return correct_title


def find_titles(rst_file: str) -> Generator[Tuple[str, int], None, None]:
    """
    Algorithm to identify particular text that should be considered headings in an
    RST file.

    See <https://thomas-cokelaer.info/tutorials/sphinx/rest_syntax.html> for details
    on what constitutes a string as a heading in RST.

    Parameters
    ----------
    rst_file : str
        RST file to scan through for headings.

    Yields
    ------
    title : str
        A heading found in the rst file.

Review comment (Member): No blank line here

    line_number : int
        The corresponding line number of the heading.

    """

    with open(rst_file, "r") as file_obj:
        lines = file_obj.read().split("\n")

    symbols = ("*", "=", "-", "^", "~", "#", '"')

    table = str.maketrans("", "", "*`_")

    for i, line in enumerate(lines):
        line_chars = set(line)
        if (
            len(line_chars) == 1
            and line_chars.pop() in symbols
            and len(line) == len(lines[i - 1])
        ):
            yield lines[i - 1].translate(table), i


def find_rst_files(source_paths: List[str]) -> Generator[str, None, None]:
    """
    Given the command line arguments of directory paths, this method
    yields the strings of the .rst file directories that these paths contain.

    Parameters
    ----------
    source_paths : list of str
        List of directories to validate, provided through command line arguments.

    Yields
    ------
    directory_address : str
        Directory address of a .rst file found in command line argument directories.

Review comment (Member): Better remove this blank line (we're not validating this docstring, but this blank line would make it fail if we do).

    """

    for directory_address in source_paths:
        if not os.path.exists(directory_address):
            raise ValueError(
                "Please enter a valid path, pointing to a valid file/directory."
            )
        elif directory_address.endswith(".rst"):
            yield directory_address
        else:
            for filename in glob.glob(
                pathname=f"{directory_address}/**/*.rst", recursive=True
            ):
                yield filename


def main(source_paths: List[str], output_format: str) -> int:
    """
    The main method to print all headings with incorrect capitalization.

    Parameters
    ----------
    source_paths : list of str
        List of directories to validate, provided through command line arguments.
    output_format : str
        Output format of the script.

    Returns
    -------
    number_of_errors : int
        Number of incorrect headings found overall.

    """

    number_of_errors: int = 0

    for filename in find_rst_files(source_paths):
        for title, line_number in find_titles(filename):
            if title != correct_title_capitalization(title):
                print(
                    f"""{filename}:{line_number}:{err_msg} "{title}" to "{
                    correct_title_capitalization(title)}" """
                )
                number_of_errors += 1

    return number_of_errors


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Validate heading capitalization")

    parser.add_argument(
        "paths", nargs="+", default=".", help="Source paths of file/directory to check."
    )

    parser.add_argument(
        "--format",
        "-f",
        default="{source_path}:{line_number}:{msg}:{heading}:{correct_heading}",
        help="Output format of incorrectly capitalized titles",
    )

    args = parser.parse_args()

    sys.exit(main(args.paths, args.format))
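`find_titles` compares each symbol-only line with `lines[i - 1]`. At `i == 0` this looks at the last line of the file. It also requires the underline to be exactly as long as the title, while reStructuredText accepts an underline that extends past the title text.

Below is a hypothetical variant of the loop, not the merged code. It works on an in-memory list of lines, starts at index 1, and requires the line above to contain word characters. It also accepts over-long underlines. Like the original, it yields the underline's 0-based index, which equals the title's 1-based line number. `find_titles_in_lines` is an illustrative name.

```python
import re
from typing import Iterator, List, Tuple

# Same underline characters the script accepts.
SYMBOLS = set('*=-^~#"')
# Same markup characters the script strips from titles.
TABLE = str.maketrans("", "", "*`_")


def find_titles_in_lines(lines: List[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical variant: yield (title, 1-based line number) for underlined headings."""
    # Start at 1 so lines[i - 1] never wraps around to lines[-1].
    for i in range(1, len(lines)):
        line, prev = lines[i], lines[i - 1]
        chars = set(line)
        if (
            len(chars) == 1
            and chars <= SYMBOLS
            and re.search(r"\w", prev)  # the line above must contain text
            and len(line) >= len(prev.rstrip())  # over-long underlines are valid RST
        ):
            yield prev.translate(TABLE), i


text = "=====\nTitle\n=====\n\nIntro text\n\nSub section\n-------------\n"
print(list(find_titles_in_lines(text.split("\n"))))
```

The "Sub section" heading has a 13-character underline under an 11-character title. The variant reports it, but the merged `len(line) == len(lines[i - 1])` check would skip it.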