BUG: to/read_* do not use user-provided file handle if handle implements os.PathLike and also opened the file #38125

Closed
2 of 3 tasks
twoertwein opened this issue Nov 27, 2020 · 0 comments · Fixed by #38141
Labels: Bug, IO Data (IO issues that don't fit into a more specific label)
  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas (versions <1.2 are theoretically affected as well; my example does not work there because it needs a binary file handle, but other examples should trigger this bug in <1.2).

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import os

import fsspec
import pandas as pd

# create a 'normal' file handle
with open("abc.test", mode="w") as open_obj:
    assert not isinstance(open_obj, os.PathLike)  # is not converted to a string
    position = open_obj.tell()

    # let to_csv write to the opened file
    pd.DataFrame({"a": [1, 2, 3]}).to_csv(open_obj)

    # the position of the file buffer should have changed if to_csv used it
    assert open_obj.tell() != position


# create a file handle that also implements os.PathLike/has __fspath__
fsspec_obj = fsspec.open("file://abc.test", mode="wb").open()
with fsspec_obj:
    assert isinstance(fsspec_obj, os.PathLike)  # is converted to a string
    position = fsspec_obj.tell()

    # let to_csv write to the opened file
    pd.DataFrame({"a": [1, 2, 3]}).to_csv(fsspec_obj)

    # the position of the file buffer should have changed if to_csv used it
    assert fsspec_obj.tell() != position  # fails

Problem description

get_filepath_or_buffer (<1.2) or get_handle (1.2) calls stringify_path to convert pathlib.Path and other os.PathLike objects to a string. This string is later opened. It seems that there is at least one file object that implements os.PathLike but has also already opened the file. In this case, all to/read_* functions that use get_handle (or get_filepath_or_buffer in <1.2) extract the string and then open the file again, even though the user has already opened it.
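
The mechanism can be reproduced without fsspec or pandas. The following is a minimal sketch, not pandas code: `PathLikeHandle` and `write_via_path` are hypothetical names introduced only for illustration, and `write_via_path` merely imitates the "stringify, then open" behaviour described above.

```python
import os
import tempfile


class PathLikeHandle:
    """Hypothetical wrapper: an already-open file that also has __fspath__."""

    def __init__(self, path, mode="w"):
        self._path = path
        self._file = open(path, mode)

    def __fspath__(self):
        return self._path

    def write(self, data):
        return self._file.write(data)

    def tell(self):
        return self._file.tell()

    def close(self):
        self._file.close()


def write_via_path(target, text):
    """Imitate the reported behaviour: stringify first, then open a new handle."""
    if isinstance(target, os.PathLike):
        target = os.fspath(target)
    if isinstance(target, str):
        # A second, independent handle is opened; the caller's one is ignored.
        with open(target, "w") as fresh:
            fresh.write(text)
    else:
        target.write(text)


tmp_dir = tempfile.mkdtemp()
handle = PathLikeHandle(os.path.join(tmp_dir, "abc.test"))
start = handle.tell()
write_via_path(handle, "a\n1\n2\n3\n")
moved = handle.tell() != start  # stays False: the user's handle was bypassed
handle.close()
```

Because any object with `__fspath__` passes the `isinstance(..., os.PathLike)` check, the open handle is treated as a plain path and its position never changes.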

I'm not sure whether there are other examples. I will look into how to fix this.
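
One possible direction, sketched under assumptions, is to check for an open handle before converting os.PathLike objects to a string. This is not necessarily what the linked fix (#38141) does. `is_open_handle`, `resolve_target`, and `PathLikeBuffer` are hypothetical names, and the `read`/`write` attribute check is only a heuristic.

```python
import io
import os
import pathlib


def is_open_handle(obj):
    """Heuristic (an assumption): anything exposing read or write is a handle."""
    return hasattr(obj, "read") or hasattr(obj, "write")


def resolve_target(path_or_handle):
    """Return open handles untouched; stringify only pure path objects."""
    if is_open_handle(path_or_handle):
        return path_or_handle
    if isinstance(path_or_handle, os.PathLike):
        return os.fspath(path_or_handle)
    return path_or_handle


class PathLikeBuffer(io.StringIO):
    """Hypothetical object that is both a buffer and os.PathLike."""

    def __fspath__(self):
        return "abc.test"


buf = PathLikeBuffer()
same = resolve_target(buf) is buf  # the handle is passed through unchanged
as_str = resolve_target(pathlib.PurePosixPath("abc.test"))  # pure path becomes a str
```

With a check of this kind, pathlib-style objects would still be converted to strings, while objects that are already open would be used as provided.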

@twoertwein added the Bug and Needs Triage labels on Nov 27, 2020
@alimcmaster1 added the IO Data label and removed the Needs Triage label on Nov 28, 2020
@jreback added this to the 1.2 milestone on Dec 12, 2020