BUG: Help python csv engine read binary buffers #27925

Merged
merged 10 commits into from
Aug 19, 2019

fiendish

The file buffer given to read_csv could have been opened in
binary mode, but the python csv reader errors on binary buffers.

closes #23779
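
The underlying behavior can be reproduced with the standard library alone. The following is a minimal sketch, not the code from this pull request: it assumes a small in-memory CSV payload (the `data` name and contents are made up for illustration) and shows that `csv.reader` fails on a binary buffer but works once the buffer is wrapped in `io.TextIOWrapper`.

```python
import csv
import io

# Hypothetical sample payload, used only for illustration.
data = b"a,b\n1,2\n3,4\n"

# csv.reader expects an iterator of str; a binary buffer yields bytes,
# so iterating over the reader raises csv.Error.
try:
    list(csv.reader(io.BytesIO(data)))
except csv.Error as exc:
    binary_error = str(exc)
else:
    binary_error = None

# Wrapping the binary buffer decodes it to text before the csv module sees it.
# newline="" leaves line endings untouched, as the csv docs recommend.
text_buffer = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
rows = list(csv.reader(text_buffer))

print(binary_error)
print(rows)
```

The encoding here is assumed to be UTF-8; a real caller would pass whatever encoding the file actually uses.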

fiendish commented Aug 15, 2019

What does Linux py37_np_dev know that I don't?

@jreback jreback added the IO CSV read_csv, to_csv label Aug 15, 2019
@@ -105,7 +105,7 @@ I/O
^^^

- Avoid calling ``S3File.s3`` when reading parquet, as this was removed in s3fs version 0.3.0 (:issue:`27756`)
-
- read_csv now accepts binary mode file buffers when using the Python csv engine (:issue:`23779`)

Use :meth:`read_csv`; move this note to the 1.0 release notes.

@@ -296,3 +296,10 @@ def test_malformed_skipfooter(python_parser_only):
    msg = "Expected 3 fields in line 4, saw 5"
    with pytest.raises(ParserError, match=msg):
        parser.read_csv(StringIO(data), header=1, comment="#", skipfooter=1)


def test_binary_buffer(python_parser_only, csv1):

run this for both csv engines

    # see gh-23779
    parser = python_parser_only
    with open(csv1, "rb") as f:
        parser.read_csv(f)

assert the result contents

Move this test to parser/test_common.py; see if you can co-locate it with the other buffer-type tests.

test both csv engines,
assert equality between ascii and binary modes,
colocate with other buffer tests
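
A check along the lines the reviewers asked for might look like the sketch below. It is not the test that was merged: it uses in-memory buffers instead of the `csv1` fixture file, the `CSV_TEXT` payload and the `read_both_modes` helper are hypothetical, and it assumes a pandas version that includes this fix.

```python
from io import BytesIO, StringIO

import pandas as pd

# Hypothetical sample payload, used only for illustration.
CSV_TEXT = "a,b\n1,2\n3,4\n"


def read_both_modes(engine):
    """Parse the same CSV from a text buffer and from a binary buffer."""
    text_result = pd.read_csv(StringIO(CSV_TEXT), engine=engine)
    binary_result = pd.read_csv(BytesIO(CSV_TEXT.encode("utf-8")), engine=engine)
    return text_result, binary_result


# Run the comparison for both CSV engines and assert on the contents,
# not just on the absence of an exception.
for engine in ("c", "python"):
    text_result, binary_result = read_both_modes(engine)
    pd.testing.assert_frame_equal(text_result, binary_result)
```

In the pandas test suite the engine loop would normally come from a parametrized parser fixture rather than an explicit `for` loop.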
@fiendish

Made the requested changes. Do you want a rebase to squash?

@TomAugspurger TomAugspurger left a comment

No need to squash.

fiendish commented Aug 15, 2019

Is it normal for random unrelated errors to fail builds?

@TomAugspurger

Yes, we're debugging it still. Restarted azure.

f = TextIOWrapper(f, encoding=encoding, newline="")
handles.append(f)
g = TextIOWrapper(f, encoding=encoding, newline="")
if not isinstance(f, no_close):

this is hard to follow, can you just use BufferedIOBase here rather than no_close?

@fiendish

this is hard to follow, can you just use BufferedIOBase here rather than no_close?

I figured this way was more general in case there were future issues discovered beyond just BufferedIOBase, but consider it done.
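
The reason the wrapper needs care is that closing a `TextIOWrapper` also closes the buffer it wraps. The sketch below illustrates the idea discussed in this thread using only the standard library; `wrap_binary` is a hypothetical helper, not the pandas implementation, and the `detach()` call is one possible way to release the caller's buffer, not necessarily what pandas does.

```python
from io import BufferedIOBase, BytesIO, TextIOWrapper


def wrap_binary(buf, encoding="utf-8"):
    """Hypothetical helper: wrap buf for text reading.

    Returns the wrapper and the list of handles the reader should close.
    A caller-supplied BufferedIOBase is left out of that list, so the
    reader never closes a buffer it does not own.
    """
    handles = []
    wrapper = TextIOWrapper(buf, encoding=encoding, newline="")
    if not isinstance(buf, BufferedIOBase):
        handles.append(wrapper)
    return wrapper, handles


# The caller's buffer stays open after reading through the wrapper.
raw = BytesIO(b"a,b\n1,2\n")
wrapper, handles = wrap_binary(raw)
first_line = wrapper.readline()
for handle in handles:
    handle.close()
wrapper.detach()  # release the buffer without closing it

# For contrast: closing a wrapper closes the underlying buffer too.
raw2 = BytesIO(b"x")
TextIOWrapper(raw2, encoding="utf-8").close()

print(first_line, raw.closed, raw2.closed)
```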

@TomAugspurger

Thanks @fiendish!

@TomAugspurger TomAugspurger merged commit 6e0ab71 into pandas-dev:master Aug 19, 2019
EunSeop pushed a commit to EunSeop/pandas that referenced this pull request Aug 20, 2019
galuhsahid pushed a commit to galuhsahid/pandas that referenced this pull request Aug 25, 2019
Issue that merging this pull request may close: read_csv c engine accepts binary mode data and python engine rejects it