-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: return type of read_csv/table #45610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It might be good to wait until #45594 is merged, as this PR will create new failure cases in the future type tests. Mypy is currently "happy" with read_csv because it returns "Any". |
got it , can you put in draft status until then? |
You might want to take a look at what I did for the Microsoft type stubs for |
It would be good to discuss whether overload-signatures should already require keyword-only arguments, even thought the implementation still accepts positional arguments. I think the only downside is that people who use positional arguments would get some mypy/pyright errors (since no overload matches; might actually be a "feature" since positional arguments are anyway being deprecated). I would be happy to use overloads while positional arguments are still in the deprecation phase. |
Interesting point. With a method like |
I like this approach, but I'm not sure how strict pandas is about semver, technically an API change. Do people have opinions about keyword-only in overloads while positional arguments are still in the deprecation phase? Here is an example: from __future__ import annotations
from typing import Literal, overload
@overload
def test(a: int, *, switch: Literal[True]) -> None:
...
@overload
def test(a: int, *, switch: Literal[False]) -> int:
# explicit default case
...
@overload
def test(a: int, *, switch: Literal[False] = ...) -> int:
# default case (same as second overload but with '= ...')
...
@positional_deprecation_warning_decorator
def test(a: int, switch: bool = False) -> None | int:
...
reveal_type(test(1)) # int
reveal_type(test(1, switch=False)) # int
reveal_type(test(1, switch=True)) # None
test(1, False) # mypy/pyright error, but works at runtime
test(1, True) # mypy/pyright error, but works at runtime |
Is that a decision that has been made for the complete pandas API or just certain methods? In your example, you use |
I think currently, we don't do that at all (for example #44678 and #44896): we use Unions until we have keyword-only argument to then use overloads in the future. Personally, I think we could use keyword-only overloads for all functions/methods where we currently return a Union and where at the same time the positional keywords are being deprecated.
Sorry, I meant the existing |
I changed the PR to use overloads for It seems that only four overloads are needed for the two read functions: # iterator=True -> TextFileReader
@overload
def read_csv(
buffer: str, *, iterator: Literal[True], chunksize: int | None = ...
) -> TextFileReader:
...
# chunksize=int -> TextFileReader
@overload
def read_csv(buffer: str, *, iterator: bool = ..., chunksize: int) -> TextFileReader:
...
# default -> DataFrame
@overload
def read_csv(
buffer, *, iterator: Literal[False] = ..., chunksize: None = ...
) -> DataFrame:
...
# Union -> DataFrame | TextFileReader
@overload
def read_csv(
buffer, *, iterator: bool = ..., chunksize: int | None = ...
) -> DataFrame | TextFileReader:
... tested with mypy&pyright from pandas import read_csv
# DataFrame
reveal_type(read_csv("file.csv"))
reveal_type(read_csv("file.csv", iterator=False))
reveal_type(read_csv("file.csv", iterator=False, chunksize=None))
reveal_type(read_csv("file.csv", chunksize=None))
# TextFileReader
reveal_type(read_csv("file.csv", iterator=True))
reveal_type(read_csv("file.csv", iterator=True, chunksize=None))
reveal_type(read_csv("file.csv", iterator=False, chunksize=1))
reveal_type(read_csv("file.csv", chunksize=1))
# DataFrame | TextFileReader
def int_None() -> int | None:
...
def bool_() -> bool:
...
reveal_type(read_csv("file.csv", chunksize=int_None()))
reveal_type(read_csv("file.csv", iterator=bool_()))
reveal_type(read_csv("file.csv", iterator=bool_(), chunksize=int_None())) |
@twoertwein Glad you like the approach. :-) |
Separately, in the https://github.com/microsoft/python-type-stubs repo, these overloads are used, and tests for them are included. I think the plan is to take the tests (and stubs) that are there are put them in here, once we have converged to a reasonable set of tests over there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @twoertwein lgtm
thanks @twoertwein |
Return type of read_csv/table (union for now, overload can be used when positional keywords are deprecated), type TextFileReader, and type some random arguments of read_csv/table