Commit a779def

tonywu1999, SeeminSyed
tonywu1999 authored and committed
CI: Adding script to validate consistent and correct capitalization among headings in documentation (pandas-dev#26941) (pandas-dev#31114)
1 parent 88a54b8 commit a779def

3 files changed: +217 -7 lines changed

ci/code_checks.sh (+4)

@@ -320,6 +320,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
     $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03,SA05
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    MSG='Validate correct capitalization among titles in documentation' ; echo $MSG
+    $BASE_DIR/scripts/validate_rst_title_capitalization.py $BASE_DIR/doc/source/development/contributing.rst
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
 fi
 
 ### DEPENDENCIES ###
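
The hunk above relies on the new script's exit status: the script passes its error count to `sys.exit`, and the shell adds `$?` to `RET`. The following is only a rough Python sketch of that accumulation pattern, not part of the commit. The helper `run_check` and the simulated statuses are hypothetical stand-ins for real checks.

```python
import subprocess
import sys


def run_check(exit_code: int) -> int:
    """Run a stand-in check (hypothetical) that exits with exit_code."""
    completed = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    )
    return completed.returncode


# Mirror RET=$(($RET + $?)): any non-zero status makes the total non-zero.
ret = 0
for simulated_status in (0, 3, 0):
    ret += run_check(simulated_status)

print(f"accumulated status: {ret}")
```

One consequence of reporting a count this way: on POSIX systems only the low 8 bits of an exit status reach the parent process, so a count of exactly 256 would be seen as 0.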

doc/source/development/contributing.rst (+7 -7)

@@ -146,7 +146,7 @@ requires a C compiler and Python environment. If you're making documentation
 changes, you can skip to :ref:`contributing.documentation` but you won't be able
 to build the documentation locally before pushing your changes.
 
-Using a Docker Container
+Using a Docker container
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
 Instead of manually setting up a development environment, you can use Docker to
@@ -754,7 +754,7 @@ You can then verify the changes look ok, then git :ref:`commit <contributing.com
 
 .. _contributing.pre-commit:
 
-Pre-Commit
+Pre-commit
 ~~~~~~~~~~
 
 You can run many of these styling checks manually as we have described above. However,
@@ -822,12 +822,12 @@ See :ref:`contributing.warnings` for more.
 
 .. _contributing.type_hints:
 
-Type Hints
+Type hints
 ----------
 
 *pandas* strongly encourages the use of :pep:`484` style type hints. New development should contain type hints and pull requests to annotate existing code are accepted as well!
 
-Style Guidelines
+Style guidelines
 ~~~~~~~~~~~~~~~~
 
 Types imports should follow the ``from typing import ...`` convention. So rather than
@@ -903,7 +903,7 @@ The limitation here is that while a human can reasonably understand that ``is_nu
 
 With custom types and inference this is not always possible so exceptions are made, but every effort should be exhausted to avoid ``cast`` before going down such paths.
 
-Pandas-specific Types
+pandas-specific types
 ~~~~~~~~~~~~~~~~~~~~~
 
 Commonly used types specific to *pandas* will appear in `pandas._typing <https://github.com/pandas-dev/pandas/blob/master/pandas/_typing.py>`_ and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
@@ -919,7 +919,7 @@ For example, quite a few functions in *pandas* accept a ``dtype`` argument. This
 
 This module will ultimately house types for repeatedly used concepts like "path-like", "array-like", "numeric", etc... and can also hold aliases for commonly appearing parameters like `axis`. Development of this module is active so be sure to refer to the source for the most up to date list of available types.
 
-Validating Type Hints
+Validating type hints
 ~~~~~~~~~~~~~~~~~~~~~
 
 *pandas* uses `mypy <http://mypy-lang.org>`_ to statically analyze the code base and type hints. After making any change you can ensure your type hints are correct by running
@@ -1539,7 +1539,7 @@ The branch will still exist on GitHub, so to delete it there do::
 .. _Gitter: https://gitter.im/pydata/pandas
 
 
-Tips for a successful Pull Request
+Tips for a successful pull request
 ==================================
 
 If you have made it to the `Review your code`_ phase, one of the core contributors may
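
The renames above follow a sentence-case rule with a list of proper-noun exceptions. The following is a simplified sketch of that rule, not the commit's code: the name `sentence_case` and the reduced exception set are illustrative assumptions, and the URL handling done by the commit's script is left out.

```python
import re

# Assumed subset of the exception list, enough for the headings in this diff.
EXCEPTIONS = {"pandas", "Docker", "GitHub", "Python"}
EXCEPTIONS_BY_LOWER = {word.lower(): word for word in EXCEPTIONS}


def sentence_case(title: str) -> str:
    """Return the title in sentence case, keeping exception words as spelled."""
    # Drop leading non-word characters, then lowercase all but the first letter.
    corrected = re.sub(r"^\W*", "", title).capitalize()
    # Restore the canonical spelling of any word on the exception list.
    for word in re.split(r"\W", corrected):
        canonical = EXCEPTIONS_BY_LOWER.get(word.lower())
        if canonical is not None:
            corrected = re.sub(rf"\b{re.escape(word)}\b", canonical, corrected)
    return corrected


OLD_HEADINGS = [
    "Using a Docker Container",
    "Pre-Commit",
    "Pandas-specific Types",
    "Tips for a successful Pull Request",
]

for old in OLD_HEADINGS:
    print(f"{old!r} -> {sentence_case(old)!r}")
```

Because exceptions are matched case-insensitively and then rewritten to their canonical form, a heading that starts with `pandas` stays lowercase even though every other heading starts with a capital letter.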
scripts/validate_rst_title_capitalization.py (+206)

@@ -0,0 +1,206 @@
+#!/usr/bin/env python
+"""
+Validate that the titles in the rst files follow the proper capitalization convention.
+
+Print the titles that do not follow the convention.
+
+Usage::
+./scripts/validate_rst_title_capitalization.py doc/source/development/contributing.rst
+./scripts/validate_rst_title_capitalization.py doc/source/
+
+"""
+import argparse
+import sys
+import re
+import os
+from typing import Tuple, Generator, List
+import glob
+
+
+CAPITALIZATION_EXCEPTIONS = {
+    "pandas",
+    "Python",
+    "IPython",
+    "PyTables",
+    "Excel",
+    "JSON",
+    "HTML",
+    "SAS",
+    "SQL",
+    "BigQuery",
+    "STATA",
+    "Interval",
+    "PEP8",
+    "Period",
+    "Series",
+    "Index",
+    "DataFrame",
+    "C",
+    "Git",
+    "GitHub",
+    "NumPy",
+    "Apache",
+    "Arrow",
+    "Parquet",
+    "MultiIndex",
+    "NumFOCUS",
+    "sklearn",
+    "Docker",
+}
+
+CAP_EXCEPTIONS_DICT = {word.lower(): word for word in CAPITALIZATION_EXCEPTIONS}
+
+err_msg = "Heading capitalization formatted incorrectly. Please correctly capitalize"
+
+symbols = ("*", "=", "-", "^", "~", "#", '"')
+
+
+def correct_title_capitalization(title: str) -> str:
+    """
+    Algorithm to create the correct capitalization for a given title.
+
+    Parameters
+    ----------
+    title : str
+        Heading string to correct.
+
+    Returns
+    -------
+    str
+        Correctly capitalized heading.
+    """
+
+    # Strip all non-word characters from the beginning of the title to the
+    # first word character.
+    correct_title: str = re.sub(r"^\W*", "", title).capitalize()
+
+    # Remove a URL from the title. We do this because words in a URL must
+    # stay lowercase, even if they are a capitalization exception.
+    removed_https_title = re.sub(r"<https?:\/\/.*[\r\n]*>", "", correct_title)
+
+    # Split a title into a list using non-word character delimiters.
+    word_list = re.split(r"\W", removed_https_title)
+
+    for word in word_list:
+        if word.lower() in CAP_EXCEPTIONS_DICT:
+            correct_title = re.sub(
+                rf"\b{word}\b", CAP_EXCEPTIONS_DICT[word.lower()], correct_title
+            )
+
+    return correct_title
+
+
+def find_titles(rst_file: str) -> Generator[Tuple[str, int], None, None]:
+    """
+    Algorithm to identify particular text that should be considered headings in an
+    RST file.
+
+    See <https://thomas-cokelaer.info/tutorials/sphinx/rest_syntax.html> for details
+    on what constitutes a string as a heading in RST.
+
+    Parameters
+    ----------
+    rst_file : str
+        RST file to scan through for headings.
+
+    Yields
+    -------
+    title : str
+        A heading found in the rst file.
+
+    line_number : int
+        The corresponding line number of the heading.
+    """
+
+    with open(rst_file, "r") as fd:
+        previous_line = ""
+        for i, line in enumerate(fd):
+            line = line[:-1]
+            line_chars = set(line)
+            if (
+                len(line_chars) == 1
+                and line_chars.pop() in symbols
+                and len(line) == len(previous_line)
+            ):
+                yield re.sub(r"[`\*_]", "", previous_line), i
+            previous_line = line
+
+
+def find_rst_files(source_paths: List[str]) -> Generator[str, None, None]:
+    """
+    Given the command line arguments of directory paths, this method
+    yields the strings of the .rst file directories that these paths contain.
+
+    Parameters
+    ----------
+    source_paths : str
+        List of directories to validate, provided through command line arguments.
+
+    Yields
+    -------
+    str
+        Directory address of a .rst files found in command line argument directories.
+    """
+
+    for directory_address in source_paths:
+        if not os.path.exists(directory_address):
+            raise ValueError(
+                "Please enter a valid path, pointing to a valid file/directory."
+            )
+        elif directory_address.endswith(".rst"):
+            yield directory_address
+        else:
+            for filename in glob.glob(
+                pathname=f"{directory_address}/**/*.rst", recursive=True
+            ):
+                yield filename
+
+
+def main(source_paths: List[str], output_format: str) -> bool:
+    """
+    The main method to print all headings with incorrect capitalization.
+
+    Parameters
+    ----------
+    source_paths : str
+        List of directories to validate, provided through command line arguments.
+    output_format : str
+        Output format of the script.
+
+    Returns
+    -------
+    int
+        Number of incorrect headings found overall.
+    """
+
+    number_of_errors: int = 0
+
+    for filename in find_rst_files(source_paths):
+        for title, line_number in find_titles(filename):
+            if title != correct_title_capitalization(title):
+                print(
+                    f"""{filename}:{line_number}:{err_msg} "{title}" to "{
+                    correct_title_capitalization(title)}" """
+                )
+                number_of_errors += 1
+
+    return number_of_errors
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Validate heading capitalization")
+
+    parser.add_argument(
+        "paths", nargs="+", default=".", help="Source paths of file/directory to check."
+    )
+
+    parser.add_argument(
+        "--format",
+        "-f",
+        default="{source_path}:{line_number}:{msg}:{heading}:{correct_heading}",
+        help="Output format of incorrectly capitalized titles",
+    )
+
+    args = parser.parse_args()
+
+    sys.exit(main(args.paths, args.format))
