
BUG: DataFrame.to_pickle(bytes_io_buffer) is automatically closed internally #35679


Closed
le1nux opened this issue Aug 11, 2020 · 2 comments · Fixed by #35686
Labels
Bug IO Pickle read_pickle, to_pickle

le1nux commented Aug 11, 2020

Code Sample

import pandas as pd
import io

# create example DataFrame
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

# pickle the DataFrame into an in-memory BytesIO buffer
buffer = io.BytesIO()
df.to_pickle(path=buffer)

# the buffer should still be open; this assertion fails because to_pickle closes it
assert not buffer.closed

Problem description

Instead of writing the pickled DataFrame to disk (e.g., by passing a file path or a file handle), I want to store its byte stream in an in-memory BytesIO buffer, as shown above. Unfortunately, to_pickle closes the BytesIO stream internally, which renders it useless: the contents of a BytesIO buffer are discarded when it is closed. As far as I know, the Python io API does not let you reopen a stream once it has been closed.
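
To illustrate why a closed buffer is useless, here is a minimal sketch that uses only the standard library. The explicit `buffer.close()` call stands in for the close that happens inside to_pickle, and the written bytes are a placeholder rather than real pickle output.

```python
import io

buffer = io.BytesIO()
buffer.write(b"placeholder for the pickled bytes")

# Stand-in for the close that to_pickle performs internally.
buffer.close()

# Any further access to a closed BytesIO raises ValueError,
# so the bytes that were written can no longer be retrieved.
try:
    buffer.getvalue()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(buffer.closed, error_message)
```

There is no method on the buffer to undo the close, so the caller has no way to get at the data afterwards.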

Expected Output

In my opinion, it makes more sense to leave it to the user to decide when to close the buffer, e.g., by using a context manager:

with io.BytesIO() as f:
    df.to_pickle(path=f)
    # do something with f
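
The caller-owned pattern described here can be sketched with the standard library alone. This is an illustration under assumptions, not pandas code: `pickle.dump` stands in for the writer, and the plain dict `payload` stands in for the DataFrame.

```python
import io
import pickle

# Stand-in for the DataFrame from the code sample.
payload = {"col1": [1, 2], "col2": [3, 4]}

with io.BytesIO() as f:
    pickle.dump(payload, f)   # the writer leaves f open
    still_open = not f.closed
    f.seek(0)                 # rewind before reading the bytes back
    restored = pickle.load(f)

# Only leaving the with-block closes the buffer.
closed_after_block = f.closed
```

The writer never closes `f`, so the bytes can be read back inside the block, and the context manager decides when the buffer goes away.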

le1nux added the Bug and Needs Triage labels Aug 11, 2020

twoertwein (Member) commented

Thank you for providing a minimal example!

You are right that to_pickle shouldn't close the file object (to_csv, for comparison, leaves it open). Unless you want to work on a PR for that, I will look into it.

twoertwein added the IO Pickle (read_pickle, to_pickle) label and removed the Needs Triage label Aug 11, 2020

le1nux (Author) commented Aug 12, 2020

I looked into the issue for a bit last night and figured that some adjustments would be needed around here. After the call to get_handle, the variable f refers to the same object as fp_or_buf, and f is closed at the end. Since I'm not very familiar with the codebase and various modules seem to depend on get_handle, I suggest that you just go ahead with your PR.
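
One way to think about the adjustment is that a writer should close only the handles it opened itself. The sketch below is a simplified illustration of that idea, not the actual pandas code or the change made in the fixing PR; `write_bytes` and `owns_handle` are hypothetical names.

```python
import io
import os
import tempfile


def write_bytes(path_or_buf, payload):
    """Write payload to a file path or to a caller-supplied binary buffer."""
    # The function owns the handle only if it has to open a file itself.
    owns_handle = isinstance(path_or_buf, (str, os.PathLike))
    handle = open(path_or_buf, "wb") if owns_handle else path_or_buf
    try:
        handle.write(payload)
    finally:
        # Buffers passed in by the caller are left open.
        if owns_handle:
            handle.close()


# Caller-supplied buffer: still open and readable afterwards.
buffer = io.BytesIO()
write_bytes(buffer, b"abc")
buffer_contents = buffer.getvalue()

# File path: the function opens and closes the file on its own.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.bin")
    write_bytes(target, b"abc")
    with open(target, "rb") as fh:
        file_contents = fh.read()
```

Tracking ownership at the point where the handle is created keeps the cleanup logic in one place, which matters when several modules share the same helper.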
