Skip to content

CI, ENH: Check each minimum dependency is enforced in *.yaml and environment.yml #51189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Feb 23, 2023
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
4a15e02
Add pin_min_versions_to_ci_deps()
kathleenhang Feb 1, 2023
f71a869
Fix typos
kathleenhang Feb 1, 2023
e4b90a2
Remove debug statements
kathleenhang Feb 1, 2023
527bd37
Update YAML package versions
kathleenhang Feb 2, 2023
cbd87e5
Replace files with updated min versions
kathleenhang Feb 3, 2023
c0c4ffc
Add exclusions + edge cases
kathleenhang Feb 6, 2023
e4789d2
Revert file to original
kathleenhang Feb 6, 2023
1c8c8b4
Revert environment.yml
kathleenhang Feb 7, 2023
6d6c391
Fix bugs
kathleenhang Feb 9, 2023
7b3f50a
Merge branch 'main' into kathleenhang-50207
kathleenhang Feb 9, 2023
a4b4bdb
Add helper function
kathleenhang Feb 9, 2023
e2bb536
Ran pre-commit run validate-min-versions-in-sync --all-files
kathleenhang Feb 9, 2023
8397c94
Merge branch 'kathleenhang-50207' of https://github.com/kathleenhang/…
kathleenhang Feb 9, 2023
3a73573
Fix merge issue
kathleenhang Feb 9, 2023
ed4827b
Merge branch 'main' into kathleenhang-50207
kathleenhang Feb 9, 2023
0968c10
add parameter for custom toml dependency
kathleenhang Feb 15, 2023
6d30e63
Add dummy test files for test_validate_min_versions_in_sync.py
kathleenhang Feb 15, 2023
f923b98
Add unit test: test_update_yaml_file_version()
kathleenhang Feb 15, 2023
49e00c6
Merge branch 'kathleenhang-50207' of https://github.com/kathleenhang/…
kathleenhang Feb 15, 2023
536450f
Merge branch 'main' into kathleenhang-50207
kathleenhang Feb 15, 2023
1431ae8
Add dummy test files for edge cases
kathleenhang Feb 15, 2023
d2d5d73
Add parametrization to unit test
kathleenhang Feb 15, 2023
1af6771
Run pre-commit
kathleenhang Feb 15, 2023
3b6123f
Fix tzdata bug
kathleenhang Feb 15, 2023
2c8c746
Merge branch 'kathleenhang-50207' of https://github.com/kathleenhang/…
kathleenhang Feb 15, 2023
4af5291
Fix tzdata issue
kathleenhang Feb 15, 2023
6e8b0ef
Merge branch 'main' into kathleenhang-50207
kathleenhang Feb 15, 2023
5bd4908
Refactor
kathleenhang Feb 20, 2023
78114fb
Merge branch 'kathleenhang-50207' of https://github.com/kathleenhang/…
kathleenhang Feb 20, 2023
04dda2e
Update ci deps
kathleenhang Feb 20, 2023
db0feae
Remove pyarrow
kathleenhang Feb 20, 2023
4bd6e7a
Revert tzdata
kathleenhang Feb 20, 2023
bec1dd1
Update pyarrow
kathleenhang Feb 20, 2023
3c4e800
Revert pyarrow
kathleenhang Feb 20, 2023
964e201
Add docstrings
kathleenhang Feb 21, 2023
e89dd4d
Merge branch 'main' into kathleenhang-50207
kathleenhang Feb 21, 2023
47179f0
Remove psutil
kathleenhang Feb 22, 2023
2c39a81
Fix file rewrite issue
kathleenhang Feb 22, 2023
554ed85
Refactor test data
kathleenhang Feb 22, 2023
b2a672b
Merge branch 'kathleenhang-50207' of https://github.com/kathleenhang/…
kathleenhang Feb 22, 2023
985730f
minor fixups
Feb 23, 2023
0b25f3a
Merge remote-tracking branch 'upstream/main'
Feb 23, 2023
07c68b3
fixup
Feb 23, 2023
74965c2
improve typing
Feb 23, 2023
635c055
Merge branch 'main' into kathleenhang-50207
MarcoGorelli Feb 23, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ dependencies:
- lxml
- matplotlib>=3.6.1
- numba>=0.53.1
- numexpr>=2.8.0 # pin for "Run checks on imported code" job
- numexpr>=2.8.0 # pin for "Run checks on imported code" job
- openpyxl
- odfpy
- py
Expand Down Expand Up @@ -85,12 +85,12 @@ dependencies:
- ruff=0.0.215

# documentation
- gitpython # obtain contributors from git for whatsnew
- gitpython # obtain contributors from git for whatsnew
- gitdb
- natsort # DataFrame.sort_values doctest
- natsort # DataFrame.sort_values doctest
- numpydoc
- pydata-sphinx-theme<0.11
- pytest-cython # doctest
- pytest-cython # doctest
- sphinx
- sphinx-panels
- sphinx-copybutton
Expand All @@ -109,7 +109,7 @@ dependencies:
- ipykernel

# web
- jinja2 # in optional dependencies, but documented here as needed
- jinja2 # in optional dependencies, but documented here as needed
- markdown
- feedparser
- pyyaml
Expand Down
85 changes: 84 additions & 1 deletion scripts/validate_min_versions_in_sync.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,12 @@
"""
from __future__ import annotations

import os
import pathlib
import sys

import yaml

if sys.version_info >= (3, 11):
import tomllib
else:
Expand All @@ -28,6 +31,8 @@
)
CODE_PATH = pathlib.Path("pandas/compat/_optional.py").resolve()
SETUP_PATH = pathlib.Path("pyproject.toml").resolve()
YAML_PATH = pathlib.Path("ci/deps")
ENV_PATH = pathlib.Path("environment.yml")
EXCLUDE_DEPS = {"tzdata", "blosc"}
# pandas package is not available
# in pre-commit environment
Expand All @@ -41,6 +46,84 @@
import _optional


def pin_min_versions_to_ci_deps():
exclusion_list = {
"python=3.8[build=*_pypy]": None,
}
toml_dependencies = get_versions_from_toml()
all_yaml_files = list(YAML_PATH.iterdir())
all_yaml_files.append(ENV_PATH)
for curr_file in all_yaml_files:
with open(curr_file) as yaml_f:
data = yaml_f.read()
yaml_file = yaml.safe_load(data)
yaml_deps = yaml_file["dependencies"]
res = []
[res.append(x) for x in yaml_deps if x not in res]
yaml_deps = res
for dep_line in yaml_deps:
replace_text = operator = yaml_left = ""
search_text = str(dep_line)
if str(dep_line) in exclusion_list:
continue
if ">=" in dep_line:
operator = ">="
elif "==" in dep_line:
operator = "=="
elif "=" in dep_line:
operator = "="
elif "<" in dep_line:
operator = "<"
elif ">" in dep_line:
operator = ">"
else:
operator = ""
if operator == "":
yaml_package, yaml_version = str(dep_line).strip(), None
yaml_left = yaml_package
else:
yaml_package, yaml_version = str(dep_line).strip().split(operator)
if operator == "<" or operator == ">":
if yaml_package in toml_dependencies:
if version.parse(yaml_version) <= version.parse(
toml_dependencies[yaml_package]
):
yaml_left = yaml_package
else:
yaml_left = str(dep_line) + ", "
else:
yaml_left = yaml_package + operator
if yaml_package in toml_dependencies:
if ">" in yaml_left or "<" in yaml_left:
if "," in yaml_left:
# ex: "numpy<1.24.0," + ">=" + "1.2"
replace_text = (
yaml_left + ">=" + toml_dependencies[yaml_package]
)
else:
replace_text = yaml_left + toml_dependencies[yaml_package]
# update yaml package version to TOML min version
elif yaml_version is not None:
if version.parse(
toml_dependencies[yaml_package]
) > version.parse(yaml_version):
# ex: "hypothesis>=" + "6.34.2"
replace_text = yaml_left + toml_dependencies[yaml_package]
elif version.parse(
toml_dependencies[yaml_package]
) == version.parse(yaml_version):
replace_text = dep_line
else:
# ex: "hypothesis + ">=" + 6.34.2"
replace_text = (
yaml_package + ">=" + toml_dependencies[yaml_package]
)
data = data.replace(search_text, replace_text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, but it's getting quite complicated 😄 how about taking this part out to its own function (which accepts yaml_file as an argument), and then unit-testing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I've separated this portion of code into its own function. I'm writing unit tests now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

BTW if you notice some CI failures for some unrelated jobs, that's probably exactly what they are (i.e. unrelated)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a new unit test file at scripts/tests/test_validate_min_versions_in_sync.py.

It comes with 3 additional files under scripts/tests/:

- dummy_toml_file.toml
- dummy_yaml_expected_file_1.yaml
- dummy_yaml_unmodified_file_1.yaml

The .toml file can be adjusted to change the baselines for the minimum dependencies. The dummy_yaml_unmodified_file_1.yaml can be adjusted to test different edge cases, however, the dummy_yaml_expected_file_1.yaml should be changed accordingly to reflect the proper results for the previously mentioned file, since the two should be in sync as a testing set.

I was thinking of creating more sets of expected/unmodified YAML files so that each set can test its own edge case like duplicates, same version, no version, range, etc. That way, each unit test can be smaller and more meaningful.

Currently, I think the dataset is too large to provide meaningful test assertion results.

@MarcoGorelli What are your thoughts? Feedback appreciated!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! this isn't user-facing, so arguably doesn't need too extensive testing. but yes, smaller test files would make it easier to reason about what the script is doing

os.remove(curr_file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary (the next line will overwrite the file)

with open(curr_file, "w") as f:
f.write(data)


def get_versions_from_code() -> dict[str, str]:
"""Min versions for checking within pandas code."""
install_map = _optional.INSTALL_MAPPING
Expand Down Expand Up @@ -108,11 +191,11 @@ def get_versions_from_toml() -> dict[str, str]:

for item in EXCLUDE_DEPS:
optional_dependencies.pop(item, None)

return optional_dependencies


def main():
pin_min_versions_to_ci_deps()
with open(CI_PATH, encoding="utf-8") as f:
_, ci_optional = get_versions_from_ci(f.readlines())
code_optional = get_versions_from_code()
Expand Down