Skip to content

ENH: Add typing tests according to the issue 45252 #45594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

zkrolikowski-vl
Copy link
Contributor

@zkrolikowski-vl zkrolikowski-vl commented Jan 24, 2022

Typing test stubs originally authored by Joanna Sendorek, Zbigniew Królikowski and pandas_stubs contributors: https://github.com/VirtusLab/pandas-stubs/graphs/contributors

Adapted to pandas needs by Torsten Wörtwein

Issue reference:
#45252

Summary of files and changes:

  • Included a number of test files under pandas/tests/typing/valid that contain functions testing different areas of pandas API. There are no assertions there - their only aim to check argument and declarations that should be allowed according to the documentation and popular use-cases.
  • Included a dedicated README under pandas/tests/typing/REAMDE.md that describes the purpose of origin of those files.
  • Included a file called STUBS_LICENSE under LICENSES. It contain the Copyright and license (MIT) of the original project.
  • Modified pyproject.toml to enable mypy and pyright on the directory.

Related and out-of-scope aspects

Test stubs originally authored by Joanna Sendorek, Zbigniew Królikowski and pandas_stubs contributors: https://github.com/VirtusLab/pandas-stubs/graphs/contributors

Adapted to pandas needs by Torsten Wörtwein
@twoertwein
Copy link
Member

Except for running isort you will need to skip a few tests depending on required dependencies to make the CI greener :) (or wait first for consensus before going forward)

df.plot.hist() in pandas/tests/typing/valid/test_frame.py:534 requires matplotlib
df4: pd.DataFrame = pd.DataFrame({"a": [1], "b": [1]}).style.pipe(foo) in pandas/tests/typing/valid/test_frame.py:733 seems to require jinja2

The easiest is probably to skip the entire test function with the existing decorator:

from pandas.util import _test_decorators as td

@td.skip_if_no("matplotlib")
def test_bar():
    ...

Probably wouldn't hurt to also have a small readme in the typing/valid folder to mention where these files came from and state the main contributors. Probably would also need someone with expertise with licenses (Pandas is BSD-3 but the stubs are MIT).

@twoertwein twoertwein added the Typing type annotations, mypy/pyright type checking label Jan 24, 2022
@zkrolikowski-vl
Copy link
Contributor Author

zkrolikowski-vl commented Jan 25, 2022

@twoertwein It's impossible to make both isort and black pass at the same time. They expect different order of imports. I was forced to push without the pre-commit hooks. Hopefully it works correctly on the CI.

EDIT: No it doesn't. I'll continue working on that. There might be a workaround.

I've added a small README but haven't mentioned the contributors - pandas repository seems devoid of such and I've deemed to continue with this approach.

@zkrolikowski-vl
Copy link
Contributor Author

@twoertwein Only available option was to configure isort to black. This might be a bit controversial.

@twoertwein
Copy link
Member

@twoertwein Only available option was to configure isort to black. This might be a bit controversial.

I think isort is already configured to be compatible with black:

profile = "black"

## Origins and attribution

The tests come from the [pandas-stubs](https://github.com/VirtusLab/pandas-stubs)
repository originally released under the MIT license.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer You seem to often reply to license compatibility questions :)

Is it okay to integrate this MIT-licened code into pandas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pandas is happy to reuse MIT licensed code (e.g., see existing examples in https://github.com/pandas-dev/pandas/tree/main/LICENSES), but you absolutely need to preserve original copyright notices. The general rule (not sure if pandas is 100% strict about it, but it's a good practice) is to keep the original copyright notice as a top level comment inside each file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer Should I include any open source contributors, who added their code since the library was originally created, in the copyright notice?


def test_types_rank() -> None:
s = pd.Series([1, 1, 2, 5, 6, np.nan, "million"])
s.rank()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work with 1.4.0 (FutureWarning) but on main this results in a TypeError
TypeError: '<' not supported between instances of 'str' and 'int'

[I have no idea why some of the tempfile.NamedTemporaryFile-tests create PermissionErrors on Windows]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards the opinion that all deprecated features shouldn't be tested - if pandas would ship with type information (which is the ultimate aim AFAIK) users should be discouraged to use them. Looks like the CI won't allow those either way.

Apparently NamedTemporaryFile cannot have two handles on Windows. Fixed that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would maybe be a good first invalid type test (but that is probably too much for this already very large PR) @simonjayhawkins

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to remove the deprecated use-cases.

@zkrolikowski-vl zkrolikowski-vl changed the title Add typing tests according to the issue 45252 ENH: Add typing tests according to the issue 45252 Jan 28, 2022
Copy link
Member

@twoertwein twoertwein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks @zkrolikowski-vl

There are still a few open points to discuss:

I'm not sure whether this PR needs to wait for all of the above.

@zkrolikowski-vl zkrolikowski-vl marked this pull request as ready for review February 22, 2022 19:10
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@simonjayhawkins
Copy link
Member

@Dr-Irv we should close this now and open these changes against https://github.com/pandas-dev/pandas-stubs in the first instance?

@Dr-Irv
Copy link
Contributor

Dr-Irv commented May 3, 2022

@Dr-Irv we should close this now and open these changes against https://github.com/pandas-dev/pandas-stubs in the first instance?

We can close it. The current pandas-stubs has an evolved version of the tests that are in this PR. No need to copy anything over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Add set of tests to type check the public APIs
6 participants