Skip to content

BUG: do not stringify file-like objects #38141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 14, 2020
Merged

BUG: do not stringify file-like objects #38141

merged 1 commit into from
Dec 14, 2020

Conversation

twoertwein
Copy link
Member

I will finish this PR (fix the mypy issues) after #38018 is merged.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could try for 1.2 for this

# Union[IO[bytes], RawIOBase, BufferedIOBase, TextIOBase, TextIOWrapper,
# mmap]]]", expected "Union[str, Union[IO[bytes], RawIOBase, BufferedIOBase,
# TextIOBase, TextIOWrapper, mmap]]") [return-value]
return filepath_or_buffer # type: ignore[return-value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you not add an assert here (e.g. that its a Buffer)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer is a generic type, it seems that isinstance does work with generics. Unfortunately, we cannot do not isinstane(filepath_or_buffer, os.PathLike). I could add a cast but that is probably as good as the ignore statement.

# "str"; expected "Union[bytes, bytearray]" [arg-type]
# Argument 1 to "write" of "_mmap" has incompatible type "str";
# expected "bytes" [arg-type]
handles.handle.write(string) # type: ignore[arg-type]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you not assert that this is a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be possible to partially bind the input buffer type of get_handle and its output using generics (string stays string (needed for this case), bytes might stay bytes, and bytes might be wrapped to string).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way to approach this typing issue would be to overload get_handle (one w/wo is_text) to then define a different return type when is_text is present (IOHandlesStr, otherwise IOHanldesBytes). I think the issue this PR is addressing a is quite a corner case (I couldn't find another example). I' happy to look into the typing part (which will take me some time), even if it doesn't make it for 1.2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this part from the PR. Except for having two functions get_handle_string and get_handle_bytes, there is no good way to grantee the caller to get a string/bytes handles. Even if we have two get_handles, it adds more code while removing only very few mypy-ignore statements.

@jreback jreback added Bug IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label labels Nov 29, 2020
@jreback
Copy link
Contributor

jreback commented Dec 2, 2020

can you merge master as merged the PathLike change

@twoertwein twoertwein marked this pull request as ready for review December 2, 2020 05:41
@jreback jreback added this to the 1.2 milestone Dec 12, 2020
Any other object is passed through unchanged, which includes bytes,
strings, buffers, or anything else that's not even path-like.
"""
if not convert_file_like and is_file_like(filepath_or_buffer):
# GH 38125: some fsspec objects implement os.PathLike but have already opened a
# file. This prevents opening the file a second time. infer_compression calls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a followup if can try to remove the incompatible return issue here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether that is possible: the function is supposed to not return os.PathLike. But there are objects (I only know of one) that are IO-like and os.PathLike. We cannot assert that it isn't os.PathLike and at the same time we cannot assert it is a Buffer (isinstance doesn't like that)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileOrBuffer[AnyStr]: is the return type. I just find all of the mypy warnigns really distracting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry: isinstance still doesn't like FileOrBuffer[AnyStr]. I could simply cast it to FileOrBuffer[AnyStr] (the return value is FileOrBuffer[AnyStr]) and keep/remove the comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah would not like to add the typing comment (so pls update) and ping on green

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I thought I saw some issue to add these comments whenever mypy warning are ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure but ideally we do not want to ignore

instead do the proper asserts that mypy is happy

putting in the commrnts is just a temporary patch

Any other object is passed through unchanged, which includes bytes,
strings, buffers, or anything else that's not even path-like.
"""
if not convert_file_like and is_file_like(filepath_or_buffer):
# GH 38125: some fsspec objects implement os.PathLike but have already opened a
# file. This prevents opening the file a second time. infer_compression calls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileOrBuffer[AnyStr]: is the return type. I just find all of the mypy warnigns really distracting.

@twoertwein
Copy link
Member Author

@jreback ping

@jreback jreback merged commit ef75719 into pandas-dev:master Dec 14, 2020
@jreback
Copy link
Contributor

jreback commented Dec 14, 2020

thanks @twoertwein

@jreback
Copy link
Contributor

jreback commented Dec 14, 2020

@meeseeksdev backport 1.2.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: to/read_* do not use user-provided file handle if handle implements os.PathLike and also opened the file
2 participants