Skip to content

TYP: return type of read_csv/table #45610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 2, 2022
Merged

TYP: return type of read_csv/table #45610

merged 1 commit into from
Mar 2, 2022

Conversation

twoertwein
Copy link
Member

Return type of read_csv/table (union for now, overload can be used when positional keywords are deprecated), type TextFileReader, and type some random arguments of read_csv/table

@twoertwein twoertwein added the Typing type annotations, mypy/pyright type checking label Jan 25, 2022
@twoertwein
Copy link
Member Author

twoertwein commented Jan 25, 2022

It might be good to wait until #45594 is merged, as this PR will create new failure cases in the future type tests. Mypy is currently "happy" with read_csv because it returns "Any".

@jreback
Copy link
Contributor

jreback commented Jan 27, 2022

It might be good to wait until #45594 is merged, as this PR will create new failure cases in the future type tests. Mypy is currently "happy" with read_csv because it returns "Any".

got it , can you put in draft status until then?

@twoertwein twoertwein marked this pull request as draft January 27, 2022 22:57
@Dr-Irv
Copy link
Contributor

Dr-Irv commented Feb 4, 2022

You might want to take a look at what I did for the Microsoft type stubs for read_csv() to use @overload to get the types right. See this PR microsoft/python-type-stubs#143 and the resulting file https://github.com/microsoft/python-type-stubs/blob/5ad4e1ab76ddc5c6d4f376579a97b69ba1f72e8d/pandas/io/parsers.pyi

@twoertwein
Copy link
Member Author

You might want to take a look at what I did for the Microsoft type stubs for read_csv() to use @overload to get the types right.

It would be good to discuss whether overload-signatures should already require keyword-only arguments, even thought the implementation still accepts positional arguments. I think the only downside is that people who use positional arguments would get some mypy/pyright errors (since no overload matches; might actually be a "feature" since positional arguments are anyway being deprecated). I would be happy to use overloads while positional arguments are still in the deprecation phase.

@simonjayhawkins @phofl

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Feb 7, 2022

It would be good to discuss whether overload-signatures should already require keyword-only arguments, even thought the implementation still accepts positional arguments. I think the only downside is that people who use positional arguments would get some mypy/pyright errors (since no overload matches; might actually be a "feature" since positional arguments are anyway being deprecated). I would be happy to use overloads while positional arguments are still in the deprecation phase.

Interesting point. With a method like read_csv(), there are so many arguments, I don't think that people would use a positional call (after the first argument - the file name/buffer). I think (but I'm not sure) that you can move the * down to later in the parameter list if you want to allow certain positional arguments all the time.

@twoertwein
Copy link
Member Author

twoertwein commented Feb 8, 2022

I think (but I'm not sure) that you can move the * down to later in the parameter list if you want to allow certain positional arguments all the time.

I like this approach, but I'm not sure how strict pandas is about semver, technically an API change.

Do people have opinions about keyword-only in overloads while positional arguments are still in the deprecation phase? Here is an example:

from __future__ import annotations

from typing import Literal, overload


@overload
def test(a: int, *, switch: Literal[True]) -> None:
    ...


@overload
def test(a: int, *, switch: Literal[False]) -> int:
    # explicit default case
    ...


@overload
def test(a: int, *, switch: Literal[False] = ...) -> int:
    # default case (same as second overload but with '= ...')
    ...


@positional_deprecation_warning_decorator
def test(a: int, switch: bool = False) -> None | int:
    ...


reveal_type(test(1))  # int
reveal_type(test(1, switch=False))  # int
reveal_type(test(1, switch=True))  # None
test(1, False)  # mypy/pyright error, but works at runtime
test(1, True)  # mypy/pyright error, but works at runtime

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Feb 8, 2022

Do people have opinions about keyword-only in overloads while positional arguments are still in the deprecation phase?

Is that a decision that has been made for the complete pandas API or just certain methods?

In your example, you use @positional_deprecation_warning_decorator . Is that something yet to be written?

@twoertwein
Copy link
Member Author

Is that a decision that has been made for the complete pandas API or just certain methods?

I think currently, we don't do that at all (for example #44678 and #44896): we use Unions until we have keyword-only argument to then use overloads in the future. Personally, I think we could use keyword-only overloads for all functions/methods where we currently return a Union and where at the same time the positional keywords are being deprecated.

In your example, you use @positional_deprecation_warning_decorator . Is that something yet to be written?

Sorry, I meant the existing @deprecate_nonkeyword_arguments(version=None, allowed_args=["a"])

@twoertwein
Copy link
Member Author

I changed the PR to use overloads for read_csv and read_table. I like @Dr-Irv's approach of using overloads (which require keyword-only arguments) while the implementation still supports positional arguments (but already emits a deprecation warning).

It seems that only four overloads are needed for the two read functions:

# iterator=True -> TextFileReader
@overload
def read_csv(
    buffer: str, *, iterator: Literal[True], chunksize: int | None = ...
) -> TextFileReader:
    ...


# chunksize=int -> TextFileReader
@overload
def read_csv(buffer: str, *, iterator: bool = ..., chunksize: int) -> TextFileReader:
    ...


# default -> DataFrame
@overload
def read_csv(
    buffer, *, iterator: Literal[False] = ..., chunksize: None = ...
) -> DataFrame:
    ...


# Union -> DataFrame | TextFileReader
@overload
def read_csv(
    buffer, *, iterator: bool = ..., chunksize: int | None = ...
) -> DataFrame | TextFileReader:
    ...

tested with mypy&pyright

from pandas import read_csv

# DataFrame
reveal_type(read_csv("file.csv"))
reveal_type(read_csv("file.csv", iterator=False))
reveal_type(read_csv("file.csv", iterator=False, chunksize=None))
reveal_type(read_csv("file.csv", chunksize=None))

# TextFileReader
reveal_type(read_csv("file.csv", iterator=True))
reveal_type(read_csv("file.csv", iterator=True, chunksize=None))
reveal_type(read_csv("file.csv", iterator=False, chunksize=1))
reveal_type(read_csv("file.csv", chunksize=1))


# DataFrame | TextFileReader
def int_None() -> int | None:
    ...


def bool_() -> bool:
    ...

reveal_type(read_csv("file.csv", chunksize=int_None()))
reveal_type(read_csv("file.csv", iterator=bool_()))
reveal_type(read_csv("file.csv", iterator=bool_(), chunksize=int_None()))

@twoertwein twoertwein marked this pull request as ready for review February 12, 2022 19:23
@Dr-Irv
Copy link
Contributor

Dr-Irv commented Feb 12, 2022

I changed the PR to use overloads for read_csv and read_table. I like @Dr-Irv's approach of using overloads (which require keyword-only arguments) while the implementation still supports positional arguments (but already emits a deprecation warning).

@twoertwein Glad you like the approach. :-)

@twoertwein
Copy link
Member Author

It might be good to wait until #45594 is merged, as this PR will create new failure cases in the future type tests.

This PR should theoretically not interfere with #45594 anymore (unless I messed the overloads up).

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Feb 13, 2022

This PR should theoretically not interfere with #45594 anymore (unless I messed the overloads up).

Separately, in the https://github.com/microsoft/python-type-stubs repo, these overloads are used, and tests for them are included. I think the plan is to take the tests (and stubs) that are there are put them in here, once we have converged to a reasonable set of tests over there.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @twoertwein lgtm

@mroeschke mroeschke added this to the 1.5 milestone Feb 22, 2022
@jreback jreback merged commit 31c553f into pandas-dev:main Mar 2, 2022
@jreback
Copy link
Contributor

jreback commented Mar 2, 2022

thanks @twoertwein

@twoertwein twoertwein deleted the read_csv branch March 9, 2022 02:56
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants