Skip to content

REGR: SpooledTemporaryFile support in read_csv #43447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 10, 2021
Merged

REGR: SpooledTemporaryFile support in read_csv #43447

merged 3 commits into from
Sep 10, 2021

Conversation

twoertwein
Copy link
Member

Probably should throw a warning, if we cannot wrap the handle with io.TextIOWrapper.

To avoid the try-except, we could check whether the handle has the readable attribute, but there might be other attribute errors that might occur inside TextIOWrapper.

@pep8speaks
Copy link

pep8speaks commented Sep 7, 2021

Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-09 21:59:11 UTC

@simonjayhawkins
Copy link
Member

Thanks @twoertwein. Is this targeting 1.3.x?

@twoertwein
Copy link
Member Author

Thanks @twoertwein. Is this targeting 1.3.x?

It should be, I'll probably have time this evening to finish it up.

I'm not yet sure whether try-except is a good solution here. Another option (maybe more for 1.4), would be to add a new keyword to read_csv to set the mode (to_csv has that currently). Then a user could overwrite the binary/text auto-detection by specifying mode="rt" or mode="rb".

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions IO CSV read_csv, to_csv labels Sep 9, 2021
@jreback jreback added this to the 1.3.3 milestone Sep 9, 2021
@twoertwein
Copy link
Member Author

Instead of using a try-except block, I added tempfile.SpooledTemporaryFile to the list of classes that should always be treated as text handles.

I'm not sure whether #43439 is a regression: yes, it worked previously, but it only worked because of inconsistencies between the c-engine and python-engine. This PR allows this inconsistency to continue (I'm not a fan of this). It is great that the c-engine is not only faster than the python-engine, but also doesn't seem to need theio.TextIOWrapper (skipping the wrapper for the c-engine might actually give a slight performance improvement).

I would be slightly in favor of not fixing #43439 for 1.3.3 and instead adding a new keyword argument ("mode", same as in to_csv) in 1.4 for read_csv which allows the user to over-write the text/binary auto-detection. (And a PR for 1.4 to make the c-engine "dumber": if it is told to get strings, but it gets bytes, it should fail.).

If it is preferred to fix #43439 for 1.3.3, I'm happy to make this happen (I think this PR should be ready).

@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

oh new commit way nicer

@twoertwein twoertwein marked this pull request as ready for review September 9, 2021 23:00
@jreback jreback merged commit c4b43cc into pandas-dev:master Sep 10, 2021
@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

very nice @twoertwein

@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app
Copy link

lumberbot-app bot commented Sep 10, 2021

Something went wrong ... Please have a look at my logs.

simonjayhawkins pushed a commit that referenced this pull request Sep 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: SpooledTemporaryFile no longer working with read_csv
5 participants