
ENH: Add dtype-support for pandas' type-hinting #34248

Closed
erezinman opened this issue May 19, 2020 · 5 comments
Labels
Enhancement Typing type annotations, mypy/pyright type checking

Comments

@erezinman

It would be great if you could annotate a series, for example:

def process_series(ser: pd.Series[int, str]) -> pd.Series[int, int]:

to indicate that the function should accept a string-valued series with an integer index and output an int-valued series with an integer index.
It would be even better if you built on that so that the types of the series' members could be inferred using TypeVars (for example, pd.Series[int, str].iloc would accept an integer and return a string), but that's not necessary, just a bonus or a later milestone.

To do that, you could add a new module (maybe "pandas.typing"?) containing these type hints, which would require minimal integration (if any) into pandas' infrastructure. There's a similar package for NumPy, nptyping, which lives outside NumPy itself and could be used as a reference.
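A minimal sketch of the TypeVar idea, using only the standard library. It assumes a toy class: `TypedSeries`, `_ILocIndexer`, `IndexT` and `ValueT` are hypothetical names, not part of pandas or pandas-stubs. The value type parameter is threaded through `iloc`, so a type checker such as mypy should infer that `ser.iloc[0]` on a `TypedSeries[int, str]` is a `str`.

```python
from collections.abc import Sequence
from typing import Generic, TypeVar

# Hypothetical type variables: index label type and value type.
IndexT = TypeVar("IndexT")
ValueT = TypeVar("ValueT")


class _ILocIndexer(Generic[ValueT]):
    """Positional accessor whose item type follows the series' value type."""

    def __init__(self, values: Sequence[ValueT]) -> None:
        self._values = values

    def __getitem__(self, position: int) -> ValueT:
        # Accepts an integer position and returns a ValueT, as proposed above.
        return self._values[position]


class TypedSeries(Generic[IndexT, ValueT]):
    """Toy stand-in for the proposed pd.Series[IndexT, ValueT]; not the pandas API."""

    def __init__(self, index: Sequence[IndexT], values: Sequence[ValueT]) -> None:
        if len(index) != len(values):
            raise ValueError("index and values must have the same length")
        self.index: list[IndexT] = list(index)
        self._values: list[ValueT] = list(values)

    def __len__(self) -> int:
        return len(self._values)

    @property
    def iloc(self) -> _ILocIndexer[ValueT]:
        return _ILocIndexer(self._values)


def process_series(ser: TypedSeries[int, str]) -> TypedSeries[int, int]:
    # ser.iloc[i] is typed as str, so len() type-checks; the result holds ints.
    return TypedSeries(ser.index, [len(ser.iloc[i]) for i in range(len(ser))])


s = TypedSeries([0, 1, 2], ["a", "bb", "ccc"])
print(s.iloc[1])                  # bb
print(process_series(s).iloc[2])  # 3
```

Because `TypedSeries` subclasses `Generic`, the annotation `TypedSeries[int, str]` is also valid at runtime; the type parameters themselves are only checked statically.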

@erezinman erezinman added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels May 19, 2020
@simonjayhawkins simonjayhawkins added Typing type annotations, mypy/pyright type checking and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 19, 2020
@simonjayhawkins
Member

Thanks @erezinman for the report. Adding this would certainly be easier with stubs, but we have not yet reached a decision on stubs, xref #28142.

To do this with type annotations in the code, we would need to use typing.Generic.
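As a rough illustration of what `typing.Generic` provides at runtime, here is a standard-library-only sketch with a hypothetical `MiniSeries` class (not pandas code). Subscripting the class works and its parameters can be introspected, but nothing is enforced at runtime; checking the element type is left to a static checker such as mypy or pyright.

```python
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class MiniSeries(Generic[T]):
    """Hypothetical stand-in for a Series that is generic in its value type."""

    def __init__(self, values: list[T]) -> None:
        self.values = values


# Subscription works at runtime because Generic provides __class_getitem__.
alias = MiniSeries[int]
print(get_origin(alias) is MiniSeries)  # True
print(get_args(alias))                  # (<class 'int'>,)

# The parameter is not enforced at runtime; only a type checker would flag this.
s = alias(["not", "ints"])
print(type(s) is MiniSeries)            # True
```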

Thanks for the link. I suspect we would follow NumPy conventions for the type parameters. https://github.com/numpy/numpy-stubs

This might be as simple as writing np.ndarray[np.float64], but will need a decision about appropriate syntax for shape typing to ensure that this is forwards compatible with typing shapes.

In pandas we would also need to allow Series to be backed by pandas Extension Arrays for the values and index, so I suspect the type parameters would be NumPy/pandas extension dtypes and not Python types.

So maybe something like pd.Series[pd.Int64Dtype(), pd.StringDtype()].

This would be cumbersome, so having pd.Series[int, str] represent the same thing would certainly be welcome from a user perspective.

Further investigation and PRs welcome.

@simonjayhawkins
Member

Fixed in pandas-dev/pandas-stubs@ba7aa5f.

@erezinman
Author

Hi @simonjayhawkins,
Could you please elaborate on how this commit solves the issue? I looked at the referenced commit and couldn't see how.
Thanks.

@simonjayhawkins
Member

IIUC Series and Index are now generic wrt dtype in https://github.com/pandas-dev/pandas-stubs

@Dr-Irv is this documented anywhere how to use these?

For DataFrame (not explicitly mentioned in this issue), there is an open issue: pandas-dev/pandas-stubs#295.

@Dr-Irv
Contributor

Dr-Irv commented Feb 23, 2023

> IIUC Series and Index are now generic wrt dtype in https://github.com/pandas-dev/pandas-stubs

Right now, only Series is generic. We have open issues to investigate making Index generic.

> @Dr-Irv is this documented anywhere how to use these?

What we have written up is here: https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#use-of-generic-types

Note that if you want to specify an annotation like def f(x: Series[int]) -> None, you will probably have to surround the type declaration with quotes, i.e., def f(x: "Series[int]") -> None, because the generic declaration is unknown at runtime.
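A small standard-library demonstration of why the quotes help. It uses a plain `Series` stand-in class (hypothetical, not imported from pandas) that, like a class that is generic only in its stubs, has no runtime `__class_getitem__`. Subscripting it raises `TypeError`, which is what an unquoted annotation triggers on Python versions that evaluate annotations eagerly; the quoted annotation is stored as a string and is not evaluated when the function is defined.

```python
class Series:
    """Stand-in for a class that is generic in its stubs but not at runtime."""


# Evaluating the subscript directly mirrors what an unquoted annotation such as
# `def f(x: Series[int]) -> None` does at definition time on eager-annotation Pythons.
try:
    Series[int]  # type: ignore[misc]
    subscript_failed = False
except TypeError:
    subscript_failed = True


def f_quoted(x: "Series[int]") -> None:
    # The annotation is kept as the string "Series[int]"; it is only
    # interpreted by a type checker and is not evaluated here.
    pass


print(subscript_failed)               # True
print(f_quoted.__annotations__["x"])  # Series[int]
```

Placing `from __future__ import annotations` at the top of a module (PEP 563) has the same effect for every annotation in that module, without writing the quotes by hand.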

It's worth mentioning that the pandera project supports the generic types both at the typing level and at runtime. See https://pandera.readthedocs.io/en/stable/schema_models.html. I haven't used this, but the pandera authors made me aware of it when they reported some issues with pandas-stubs.
