Skip to content

TYPING: more type hints for io.common #27732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 63 additions & 5 deletions pandas/io/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@
import mmap
import os
import pathlib
from typing import IO, AnyStr, BinaryIO, Optional, TextIO, Type
from pathlib import Path
from typing import IO, AnyStr, BinaryIO, Optional, TextIO, Tuple, Type, Union, overload
from urllib.error import URLError # noqa
from urllib.parse import ( # noqa
urlencode,
Expand Down Expand Up @@ -128,9 +129,33 @@ def _validate_header_arg(header) -> None:
)


def _stringify_path(
filepath_or_buffer: FilePathOrBuffer[AnyStr]
) -> FilePathOrBuffer[AnyStr]:
# Overload *variants* for '_stringify_path'.
# These variants give extra information to the type checker.
# They are ignored at runtime.


@overload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could be less strict and use casts or ignores instead where necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be hiding a bug then, these signatures should all be the same no? why do we need different signatures?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the signatures are the same. the types are different.

buf passed through unchanged (as typevar)

Path converted to string

so overload is so that if a Path is given a string is returned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see so this is essentially doing a Union of 2 TypeVars then. ok, can you give a couple of comments before the use of overload then to explain what is happening. my hesitation is that the next person to look at this will not understand w/o some hints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from typing import overload


class Foo:
    pass


class Bar:
    pass


@overload
def foo(x: Foo, y: Bar) -> str:
    ...


@overload
def foo(x: Bar, y: Foo) -> int:
    ...


def foo(x, y):
    if isinstance(x, Foo) and isinstance(y, Bar):
        return "foo"
    elif isinstance(x, Bar) and isinstance(y, Foo):
        return 42
    else:
        raise TypeError(x, y)


foo(Foo(), Bar())
foo(Bar(), Foo())
foo(Foo(), Foo())
foo(Bar(), Bar())
$ mypy test.py
test.py:33: error: No overload variant of "foo" matches argument types "Foo", "Foo"
test.py:33: note: Possible overload variants:
test.py:33: note:     def foo(x: Foo, y: Bar) -> str
test.py:33: note:     def foo(x: Bar, y: Foo) -> int
test.py:34: error: No overload variant of "foo" matches argument types "Bar", "Bar"
test.py:34: note: Possible overload variants:
test.py:34: note:     def foo(x: Foo, y: Bar) -> str
test.py:34: note:     def foo(x: Bar, y: Foo) -> int

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see so this is essentially doing a Union of 2 TypeVars then. ok, can you give a couple of comments before the use of overload then to explain what is happening. my hesitation is that the next person to look at this will not understand w/o some hints.

@jreback I imagine we could be using overload quite a bit. I'll add comments if you think necessary, but it may end up adding quite a bit of duplication with the docstrings over time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a small section in the contributing docs then of why / how we are using it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contributing Guide for Type Hints #27050 is not yet merged. added comments for now using comments from https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading as boilerplate.

def _stringify_path(filepath_or_buffer: Union[str, Path]) -> str:
...


@overload
def _stringify_path(filepath_or_buffer: IO[AnyStr]) -> IO[AnyStr]:
...


# The actual *implementation* of '_stringify_path'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm somewhat confused by this comment. So does the overload fall back to maintaining the "generic" nature of items in the TypeVar that haven't been overloaded? I think that would be desirable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is copy and pasted from https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading

When adding the overloads I removed the type annotations from the actual implementation to avoid duplication.

I guess this was the wrong thing to do. will add them back.

# The implementation contains the actual runtime logic.
#
# It may or may not have type hints. If it does, mypy
# will check the body of the implementation against the
# type hints.
#
# Mypy will also check and make sure the signature is
# consistent with the provided variants.


def _stringify_path(filepath_or_buffer):
"""Attempt to convert a path-like object to a string.

Parameters
Expand Down Expand Up @@ -176,11 +201,44 @@ def is_gcs_url(url) -> bool:
return False


# Overload *variants* for 'get_filepath_or_buffer'.
# These variants give extra information to the type checker.
# They are ignored at runtime.


@overload
def get_filepath_or_buffer(
filepath_or_buffer: FilePathOrBuffer,
filepath_or_buffer: IO[AnyStr],
encoding: Optional[str] = None,
compression: Optional[str] = None,
mode: Optional[str] = None,
) -> Tuple[IO[AnyStr], Optional[str], Optional[str], bool]:
...


@overload
def get_filepath_or_buffer(
filepath_or_buffer: Union[str, Path],
encoding: Optional[str] = None,
compression: Optional[str] = None,
mode: Optional[str] = None,
) -> Tuple[Union[str, IO], Optional[str], Optional[str], bool]:
...


# The actual *implementation* of 'get_filepath_or_buffer'.
# The implementation contains the actual runtime logic.
#
# It may or may not have type hints. If it does, mypy
# will check the body of the implementation against the
# type hints.
#
# Mypy will also check and make sure the signature is
# consistent with the provided variants.


def get_filepath_or_buffer(
filepath_or_buffer, encoding=None, compression=None, mode=None
):
"""
If the filepath_or_buffer is a url, translate and return the buffer.
Expand Down
2 changes: 1 addition & 1 deletion pandas/io/parsers.py
Original file line number Diff line number Diff line change
Expand Up @@ -458,7 +458,7 @@ def _read(filepath_or_buffer: FilePathOrBuffer, kwds):
finally:
parser.close()

if should_close:
if should_close and not isinstance(fp_or_buf, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here and not correctly coming from get_filepath_or_buffer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_filepath_or_buffer can return a str.

as stated before, my preference would be to always add #type:ignore rather than change code unnecessarily. The code is currently type safe due to the try/except block, but mypy doesn't see that.

without the isinstance check, pandas\io\parsers.py:463: error: Item "str" of "Union[str, IO[Any]]" has no attribute "close"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is currently type safe due to the try/except block, but mypy doesn't see that.

sorry i'm wrong here.

it would be type safe if we used except (AttributeError, ValueError) but mypy would still report the same error and would need either an ignore or the isinstance check to pass.

at the moment should_close is never True when fp_or_buf is a string. but then again the purpose of type checking is to prevent bugs that may be added in the future.

maybe get_filepath_or_buffer should be converted to a context manager and handle the closing.

try:
fp_or_buf.close()
except ValueError:
Expand Down