TYP: return type of read_csv/table #45610

twoertwein · 2022-01-25T02:34:26Z

Annotate the return type of read_csv/read_table (a Union for now; overloads can be used once positional arguments are deprecated), type TextFileReader, and type a few assorted arguments of read_csv/read_table.

twoertwein · 2022-01-25T20:56:31Z

It might be good to wait until #45594 is merged, as this PR will create new failure cases in the future type tests. Mypy is currently "happy" with read_csv because it returns "Any".
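To illustrate why a type checker stays silent when a function returns "Any", here is a minimal sketch. The two loader functions are hypothetical stand-ins, not pandas code; the point is only that misuse of an "Any" result is accepted statically and surfaces at runtime instead.

```python
from typing import Any


def load_untyped(path: str) -> Any:
    # Hypothetical loader annotated with Any: a type checker accepts
    # any attribute access or call on the returned value.
    return [path]


def load_typed(path: str) -> list[str]:
    # Same body with a precise return type: a type checker can now
    # flag misuse such as calling a method that list does not have.
    return [path]


rows = load_untyped("file.csv")
try:
    rows.to_numpy()  # passes type checking because of Any, fails at runtime
except AttributeError as exc:
    error = str(exc)
```

Replacing `load_untyped` with `load_typed` in the call above is the kind of change that makes a checker start reporting errors that were previously hidden.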

jreback · 2022-01-27T22:22:54Z

It might be good to wait until #45594 is merged, as this PR will create new failure cases in the future type tests. Mypy is currently "happy" with read_csv because it returns "Any".

Got it. Can you put this in draft status until then?

Dr-Irv · 2022-02-04T20:35:35Z

You might want to take a look at what I did for the Microsoft type stubs for read_csv() to use @overload to get the types right. See this PR microsoft/python-type-stubs#143 and the resulting file https://github.com/microsoft/python-type-stubs/blob/5ad4e1ab76ddc5c6d4f376579a97b69ba1f72e8d/pandas/io/parsers.pyi

twoertwein · 2022-02-05T02:59:09Z

You might want to take a look at what I did for the Microsoft type stubs for read_csv() to use @overload to get the types right.

It would be good to discuss whether overload signatures should already require keyword-only arguments, even though the implementation still accepts positional arguments. I think the only downside is that people who use positional arguments would get some mypy/pyright errors (since no overload matches; this might actually be a "feature", since positional arguments are being deprecated anyway). I would be happy to use overloads while positional arguments are still in the deprecation phase.

@simonjayhawkins @phofl

Dr-Irv · 2022-02-07T14:17:29Z

It would be good to discuss whether overload signatures should already require keyword-only arguments, even though the implementation still accepts positional arguments. I think the only downside is that people who use positional arguments would get some mypy/pyright errors (since no overload matches; this might actually be a "feature", since positional arguments are being deprecated anyway). I would be happy to use overloads while positional arguments are still in the deprecation phase.

Interesting point. With a method like read_csv(), there are so many arguments, I don't think that people would use a positional call (after the first argument - the file name/buffer). I think (but I'm not sure) that you can move the * down to later in the parameter list if you want to allow certain positional arguments all the time.
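A minimal sketch of what moving the `*` further down the parameter list does at runtime. The function name and parameters below are hypothetical and only loosely modeled on read_csv(); parameters before the bare `*` stay positional, everything after it is keyword-only.

```python
def read_rows(path, sep=",", *, header=0, nrows=None):
    # Hypothetical signature: `path` and `sep` may be passed positionally,
    # while `header` and `nrows` must be passed by keyword.
    return {"path": path, "sep": sep, "header": header, "nrows": nrows}


# Two positional arguments plus a keyword argument are accepted.
ok = read_rows("file.csv", ";", header=1)

try:
    read_rows("file.csv", ";", 1)  # a third positional argument is rejected
except TypeError as exc:
    rejected = str(exc)
```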

twoertwein · 2022-02-08T03:23:25Z

I think (but I'm not sure) that you can move the * down to later in the parameter list if you want to allow certain positional arguments all the time.

I like this approach, but I'm not sure how strict pandas is about semver, since this is technically an API change.

Do people have opinions about keyword-only in overloads while positional arguments are still in the deprecation phase? Here is an example:

from __future__ import annotations

from typing import Literal, overload


@overload
def test(a: int, *, switch: Literal[True]) -> None:
    ...


@overload
def test(a: int, *, switch: Literal[False]) -> int:
    # explicit default case
    ...


@overload
def test(a: int, *, switch: Literal[False] = ...) -> int:
    # default case (same as second overload but with '= ...')
    ...


@positional_deprecation_warning_decorator
def test(a: int, switch: bool = False) -> None | int:
    ...


reveal_type(test(1))  # int
reveal_type(test(1, switch=False))  # int
reveal_type(test(1, switch=True))  # None
test(1, False)  # mypy/pyright error, but works at runtime
test(1, True)  # mypy/pyright error, but works at runtime

Dr-Irv · 2022-02-08T13:22:04Z

Do people have opinions about keyword-only in overloads while positional arguments are still in the deprecation phase?

Is that a decision that has been made for the complete pandas API or just certain methods?

In your example, you use @positional_deprecation_warning_decorator . Is that something yet to be written?

twoertwein · 2022-02-08T13:48:13Z

Is that a decision that has been made for the complete pandas API or just certain methods?

I think that currently we don't do that at all (for example #44678 and #44896): we use Unions until the arguments are keyword-only, and only then switch to overloads. Personally, I think we could use keyword-only overloads for all functions/methods where we currently return a Union and where, at the same time, positional arguments are being deprecated.

In your example, you use @positional_deprecation_warning_decorator . Is that something yet to be written?

Sorry, I meant the existing @deprecate_nonkeyword_arguments(version=None, allowed_args=["a"])
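For readers unfamiliar with this kind of decorator, the general idea can be sketched as follows. This is not the pandas implementation of deprecate_nonkeyword_arguments; `warn_on_positional` and its `allowed` parameter are hypothetical names used only for illustration, and the sketch simply counts positional arguments.

```python
import functools
import warnings


def warn_on_positional(allowed: int):
    """Hypothetical decorator: warn if more than `allowed` positional args are passed."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > allowed:
                warnings.warn(
                    f"Passing more than {allowed} positional argument(s) to "
                    f"{func.__name__} is deprecated; use keywords instead.",
                    FutureWarning,
                    stacklevel=2,
                )
            # The call still goes through unchanged during the deprecation phase.
            return func(*args, **kwargs)

        return wrapper

    return decorator


@warn_on_positional(allowed=1)
def pick(a, switch=False):
    # Returns None when switch is True, otherwise the input unchanged.
    return None if switch else a


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    keyword_result = pick(1, switch=False)  # no warning
    positional_result = pick(1, False)  # emits FutureWarning, still works
```

With keyword-only overloads on top of such a decorator, the positional call would be flagged by the type checker and warned about at runtime, while continuing to work.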

twoertwein · 2022-02-12T19:21:37Z

I changed the PR to use overloads for read_csv and read_table. I like @Dr-Irv's approach of using overloads (which require keyword-only arguments) while the implementation still supports positional arguments (but already emits a deprecation warning).

It seems that only four overloads are needed for the two read functions:

# iterator=True -> TextFileReader
@overload
def read_csv(
    buffer: str, *, iterator: Literal[True], chunksize: int | None = ...
) -> TextFileReader:
    ...


# chunksize=int -> TextFileReader
@overload
def read_csv(buffer: str, *, iterator: bool = ..., chunksize: int) -> TextFileReader:
    ...


# default -> DataFrame
@overload
def read_csv(
    buffer: str, *, iterator: Literal[False] = ..., chunksize: None = ...
) -> DataFrame:
    ...


# Union -> DataFrame | TextFileReader
@overload
def read_csv(
    buffer: str, *, iterator: bool = ..., chunksize: int | None = ...
) -> DataFrame | TextFileReader:
    ...

Tested with mypy and pyright:

from pandas import read_csv

# DataFrame
reveal_type(read_csv("file.csv"))
reveal_type(read_csv("file.csv", iterator=False))
reveal_type(read_csv("file.csv", iterator=False, chunksize=None))
reveal_type(read_csv("file.csv", chunksize=None))

# TextFileReader
reveal_type(read_csv("file.csv", iterator=True))
reveal_type(read_csv("file.csv", iterator=True, chunksize=None))
reveal_type(read_csv("file.csv", iterator=False, chunksize=1))
reveal_type(read_csv("file.csv", chunksize=1))


# DataFrame | TextFileReader
def int_None() -> int | None:
    ...


def bool_() -> bool:
    ...

reveal_type(read_csv("file.csv", chunksize=int_None()))
reveal_type(read_csv("file.csv", iterator=bool_()))
reveal_type(read_csv("file.csv", iterator=bool_(), chunksize=int_None()))
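The same four-overload pattern can be tried on a self-contained toy function that runs without pandas. Everything here is an assumption made for illustration: `read_rows` is a hypothetical function, a list of lines stands in for DataFrame, an iterator of chunks stands in for TextFileReader, and the chunk size of 1 used when only `iterator=True` is given is a choice of this toy, not pandas behavior.

```python
from __future__ import annotations

from typing import Iterator, Literal, overload


# iterator=True -> iterator of chunks
@overload
def read_rows(
    text: str, *, iterator: Literal[True], chunksize: int | None = ...
) -> Iterator[list[str]]: ...


# chunksize=int -> iterator of chunks
@overload
def read_rows(
    text: str, *, iterator: bool = ..., chunksize: int
) -> Iterator[list[str]]: ...


# default -> full list
@overload
def read_rows(
    text: str, *, iterator: Literal[False] = ..., chunksize: None = ...
) -> list[str]: ...


# fallback when the argument types are not known statically
@overload
def read_rows(
    text: str, *, iterator: bool = ..., chunksize: int | None = ...
) -> list[str] | Iterator[list[str]]: ...


def read_rows(text, *, iterator=False, chunksize=None):
    rows = text.splitlines()
    if not iterator and chunksize is None:
        return rows
    # Toy assumption: without an explicit chunksize, yield one row per chunk.
    size = 1 if chunksize is None else chunksize
    return (rows[i : i + size] for i in range(0, len(rows), size))


everything = read_rows("a\nb\nc")
chunks = list(read_rows("a\nb\nc", chunksize=2))
singles = list(read_rows("a\nb", iterator=True))
```

Running mypy or pyright on calls like the three at the bottom should select the third, second, and first overload respectively, mirroring the reveal_type checks above.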

Dr-Irv · 2022-02-12T21:32:43Z

I changed the PR to use overloads for read_csv and read_table. I like @Dr-Irv's approach of using overloads (which require keyword-only arguments) while the implementation still supports positional arguments (but already emits a deprecation warning).

@twoertwein Glad you like the approach. :-)

twoertwein · 2022-02-12T23:50:31Z

It might be good to wait until #45594 is merged, as this PR will create new failure cases in the future type tests.

This PR should theoretically not interfere with #45594 anymore (unless I messed the overloads up).

Dr-Irv · 2022-02-13T17:47:31Z

This PR should theoretically not interfere with #45594 anymore (unless I messed the overloads up).

Separately, in the https://github.com/microsoft/python-type-stubs repo, these overloads are used, and tests for them are included. I think the plan is to take the tests (and stubs) that are there and put them in here, once we have converged on a reasonable set of tests over there.

simonjayhawkins

Thanks @twoertwein lgtm

jreback · 2022-03-02T01:55:42Z

thanks @twoertwein

twoertwein added the Typing type annotations, mypy/pyright type checking label Jan 25, 2022

twoertwein marked this pull request as draft January 27, 2022 22:57

twoertwein marked this pull request as ready for review February 12, 2022 19:23

simonjayhawkins approved these changes Feb 20, 2022

mroeschke added this to the 1.5 milestone Feb 22, 2022

TYP: return type of read_csv/table

a2e4d0b

jreback merged commit 31c553f into pandas-dev:main Mar 2, 2022

twoertwein deleted the read_csv branch March 9, 2022 02:56

twoertwein mentioned this pull request Apr 5, 2022

TYP: tighten return type in function any #46638

Merged

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022

TYP: return type of read_csv/table (pandas-dev#45610)

9d43a89
