-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: do not stringify file-like objects #38141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could try for 1.2 for this
pandas/io/common.py
Outdated
# Union[IO[bytes], RawIOBase, BufferedIOBase, TextIOBase, TextIOWrapper, | ||
# mmap]]]", expected "Union[str, Union[IO[bytes], RawIOBase, BufferedIOBase, | ||
# TextIOBase, TextIOWrapper, mmap]]") [return-value] | ||
return filepath_or_buffer # type: ignore[return-value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you not add an assert here (e.g. that its a Buffer)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buffer
is a generic type, it seems that isinstance
does work with generics. Unfortunately, we cannot do not isinstane(filepath_or_buffer, os.PathLike)
. I could add a cast
but that is probably as good as the ignore statement.
pandas/io/formats/format.py
Outdated
# "str"; expected "Union[bytes, bytearray]" [arg-type] | ||
# Argument 1 to "write" of "_mmap" has incompatible type "str"; | ||
# expected "bytes" [arg-type] | ||
handles.handle.write(string) # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you not assert that this is a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be possible to partially bind the input buffer type of get_handle
and its output using generics (string stays string (needed for this case), bytes might stay bytes, and bytes might be wrapped to string).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way to approach this typing issue would be to overload get_handle (one w/wo is_text
) to then define a different return type when is_text
is present (IOHandlesStr
, otherwise IOHanldesBytes
). I think the issue this PR is addressing a is quite a corner case (I couldn't find another example). I' happy to look into the typing part (which will take me some time), even if it doesn't make it for 1.2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this part from the PR. Except for having two functions get_handle_string
and get_handle_bytes
, there is no good way to grantee the caller to get a string/bytes handles. Even if we have two get_handles, it adds more code while removing only very few mypy-ignore statements.
can you merge master as merged the PathLike change |
Any other object is passed through unchanged, which includes bytes, | ||
strings, buffers, or anything else that's not even path-like. | ||
""" | ||
if not convert_file_like and is_file_like(filepath_or_buffer): | ||
# GH 38125: some fsspec objects implement os.PathLike but have already opened a | ||
# file. This prevents opening the file a second time. infer_compression calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a followup if can try to remove the incompatible return issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether that is possible: the function is supposed to not return os.PathLike
. But there are objects (I only know of one) that are IO-like and os.PathLike
. We cannot assert that it isn't os.PathLike
and at the same time we cannot assert it is a Buffer
(isinstance
doesn't like that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileOrBuffer[AnyStr]: is the return type. I just find all of the mypy warnigns really distracting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry: isinstance
still doesn't like FileOrBuffer[AnyStr]
. I could simply cast it to FileOrBuffer[AnyStr]
(the return value is FileOrBuffer[AnyStr]
) and keep/remove the comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah would not like to add the typing comment (so pls update) and ping on green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I thought I saw some issue to add these comments whenever mypy warning are ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure but ideally we do not want to ignore
instead do the proper asserts that mypy is happy
putting in the commrnts is just a temporary patch
Any other object is passed through unchanged, which includes bytes, | ||
strings, buffers, or anything else that's not even path-like. | ||
""" | ||
if not convert_file_like and is_file_like(filepath_or_buffer): | ||
# GH 38125: some fsspec objects implement os.PathLike but have already opened a | ||
# file. This prevents opening the file a second time. infer_compression calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileOrBuffer[AnyStr]: is the return type. I just find all of the mypy warnigns really distracting.
@jreback ping |
thanks @twoertwein |
@meeseeksdev backport 1.2.x |
Co-authored-by: Torsten Wörtwein <[email protected]>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I will finish this PR (fix the mypy issues) after #38018 is merged.