-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYPING: more type hints for io.common #27732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
718884e
bc4f22c
9d1f8dc
011494f
7ee921a
d83a754
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
import mmap | ||
import os | ||
import pathlib | ||
from pathlib import Path | ||
from typing import ( | ||
IO, | ||
Any, | ||
|
@@ -21,6 +22,7 @@ | |
Tuple, | ||
Type, | ||
Union, | ||
overload, | ||
) | ||
from urllib.error import URLError # noqa | ||
from urllib.parse import ( # noqa | ||
|
@@ -140,9 +142,33 @@ def _validate_header_arg(header) -> None: | |
) | ||
|
||
|
||
def _stringify_path( | ||
filepath_or_buffer: FilePathOrBuffer[AnyStr] | ||
) -> FilePathOrBuffer[AnyStr]: | ||
# Overload *variants* for '_stringify_path'. | ||
# These variants give extra information to the type checker. | ||
# They are ignored at runtime. | ||
|
||
|
||
@overload | ||
def _stringify_path(filepath_or_buffer: Union[str, Path]) -> str: | ||
... | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
@overload | ||
def _stringify_path(filepath_or_buffer: IO[AnyStr]) -> IO[AnyStr]: | ||
WillAyd marked this conversation as resolved.
Show resolved
Hide resolved
|
||
... | ||
|
||
|
||
# The actual *implementation* of '_stringify_path'. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm somewhat confused by this comment. So does the overload fall back to maintaining the "generic" nature of items in the TypeVar that haven't been overloaded? I think that would be desirable There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment is copy and pasted from https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading When adding the overloads I removed the type annotations from the actual implementation to avoid duplication. I guess this was the wrong thing to do. will add them back. |
||
# The implementation contains the actual runtime logic. | ||
# | ||
# It may or may not have type hints. If it does, mypy | ||
# will check the body of the implementation against the | ||
# type hints. | ||
# | ||
# Mypy will also check and make sure the signature is | ||
# consistent with the provided variants. | ||
|
||
|
||
def _stringify_path(filepath_or_buffer: FilePathOrBuffer): | ||
"""Attempt to convert a path-like object to a string. | ||
|
||
Parameters | ||
|
@@ -188,6 +214,42 @@ def is_gcs_url(url) -> bool: | |
return False | ||
|
||
|
||
# Overload *variants* for 'get_filepath_or_buffer'. | ||
# These variants give extra information to the type checker. | ||
# They are ignored at runtime. | ||
|
||
|
||
@overload | ||
def get_filepath_or_buffer( | ||
filepath_or_buffer: IO[AnyStr], | ||
encoding: Optional[str] = None, | ||
compression: Optional[str] = None, | ||
mode: Optional[str] = None, | ||
) -> Tuple[IO[AnyStr], Optional[str], Optional[str], bool]: | ||
... | ||
|
||
|
||
@overload | ||
def get_filepath_or_buffer( | ||
filepath_or_buffer: Union[str, Path], | ||
encoding: Optional[str] = None, | ||
compression: Optional[str] = None, | ||
mode: Optional[str] = None, | ||
) -> Tuple[Union[str, IO], Optional[str], Optional[str], bool]: | ||
... | ||
|
||
|
||
# The actual *implementation* of 'get_filepath_or_buffer'. | ||
# The implementation contains the actual runtime logic. | ||
# | ||
# It may or may not have type hints. If it does, mypy | ||
# will check the body of the implementation against the | ||
# type hints. | ||
# | ||
# Mypy will also check and make sure the signature is | ||
# consistent with the provided variants. | ||
|
||
|
||
def get_filepath_or_buffer( | ||
filepath_or_buffer: FilePathOrBuffer, | ||
encoding: Optional[str] = None, | ||
|
@@ -213,10 +275,10 @@ def get_filepath_or_buffer( | |
compression, str, | ||
should_close, bool) | ||
""" | ||
filepath_or_buffer = _stringify_path(filepath_or_buffer) | ||
fp_or_buf = _stringify_path(filepath_or_buffer) | ||
|
||
if isinstance(filepath_or_buffer, str) and _is_url(filepath_or_buffer): | ||
req = urlopen(filepath_or_buffer) | ||
if isinstance(fp_or_buf, str) and _is_url(fp_or_buf): | ||
req = urlopen(fp_or_buf) | ||
content_encoding = req.headers.get("Content-Encoding", None) | ||
if content_encoding == "gzip": | ||
# Override compression based on Content-Encoding header | ||
|
@@ -225,28 +287,28 @@ def get_filepath_or_buffer( | |
req.close() | ||
return reader, encoding, compression, True | ||
|
||
if is_s3_url(filepath_or_buffer): | ||
if is_s3_url(fp_or_buf): | ||
from pandas.io import s3 | ||
|
||
return s3.get_filepath_or_buffer( | ||
filepath_or_buffer, encoding=encoding, compression=compression, mode=mode | ||
fp_or_buf, encoding=encoding, compression=compression, mode=mode | ||
) | ||
|
||
if is_gcs_url(filepath_or_buffer): | ||
if is_gcs_url(fp_or_buf): | ||
from pandas.io import gcs | ||
|
||
return gcs.get_filepath_or_buffer( | ||
filepath_or_buffer, encoding=encoding, compression=compression, mode=mode | ||
fp_or_buf, encoding=encoding, compression=compression, mode=mode | ||
) | ||
|
||
if isinstance(filepath_or_buffer, (str, bytes, mmap.mmap)): | ||
return _expand_user(filepath_or_buffer), None, compression, False | ||
if isinstance(fp_or_buf, (str, bytes, mmap.mmap)): | ||
return _expand_user(fp_or_buf), None, compression, False | ||
|
||
if not is_file_like(filepath_or_buffer): | ||
if not is_file_like(fp_or_buf): | ||
msg = "Invalid file path or buffer object type: {_type}" | ||
raise ValueError(msg.format(_type=type(filepath_or_buffer))) | ||
raise ValueError(msg.format(_type=type(fp_or_buf))) | ||
|
||
return filepath_or_buffer, None, compression, False | ||
return fp_or_buf, None, compression, False | ||
|
||
|
||
def file_path_to_url(path: str) -> str: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -458,7 +458,7 @@ def _read(filepath_or_buffer: FilePathOrBuffer, kwds): | |
finally: | ||
parser.close() | ||
|
||
if should_close: | ||
if should_close and not isinstance(fp_or_buf, str): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this here and not correctly coming from get_filepath_or_buffer? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
as stated before, my preference would be to always add without the isinstance check, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
sorry i'm wrong here. it would be type safe if we used at the moment maybe |
||
try: | ||
fp_or_buf.close() | ||
except ValueError: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://stackoverflow.com/questions/37553551/3-5-type-hinting-and-method-overloading
is there a reason we need overload here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could be less strict and use casts or ignores instead where necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be hiding a bug then, these signatures should all be the same no? why do we need different signatures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the signatures are the same. the types are different.
buf passed through unchanged (as typevar)
Path converted to string
so overload is so that if a Path is given a string is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see so this is essentially doing a Union of 2 TypeVars then. ok, can you give a couple of comments before the use of overload then to explain what is happening. my hesitation is that the next person to look at this will not understand w/o some hints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I imagine we could be using overload quite a bit. I'll add comments if you think necessary, but it may end up adding quite a bit of duplication with the docstrings over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a small section in the contributing docs then of why / how we are using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contributing Guide for Type Hints #27050 is not yet merged. added comments for now using comments from https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading as boilerplate.