BUG: Help python csv engine read binary buffers #27925
The file buffer given to read_csv could have been opened in binary mode, but the python csv reader errors on binary buffers. closes #23779
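The failure described above can be reproduced with the standard library alone. The following is a minimal sketch, not the pandas code path: it assumes a small in-memory CSV (`raw` is made up for illustration) and shows that `csv.reader` rejects a binary buffer but accepts the same buffer once it is wrapped in `io.TextIOWrapper`.

```python
import csv
import io

# Made-up sample data standing in for a file opened with mode "rb".
raw = b"a,b\n1,2\n3,4\n"

# csv.reader expects an iterator of str; a binary buffer yields bytes,
# so the reader raises csv.Error on the first line.
try:
    list(csv.reader(io.BytesIO(raw)))
except csv.Error as exc:
    binary_error = exc
else:
    binary_error = None

# Wrapping the binary buffer in a TextIOWrapper decodes it on the fly.
# newline="" leaves line endings untouched, as the csv module recommends.
text_buffer = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text_buffer))
```

The encoding is fixed to UTF-8 here for simplicity; a real caller would pass whatever encoding the user requested.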
doc/source/whatsnew/v0.25.1.rst
Outdated
@@ -105,7 +105,7 @@ I/O
^^^

- Avoid calling ``S3File.s3`` when reading parquet, as this was removed in s3fs version 0.3.0 (:issue:`27756`)
-
- read_csv now accepts binary mode file buffers when using the Python csv engine (:issue:`23779`)
Use :meth:`read_csv`; move this note to 1.0.
@@ -296,3 +296,10 @@ def test_malformed_skipfooter(python_parser_only):
    msg = "Expected 3 fields in line 4, saw 5"
    with pytest.raises(ParserError, match=msg):
        parser.read_csv(StringIO(data), header=1, comment="#", skipfooter=1)


def test_binary_buffer(python_parser_only, csv1):
run this for both csv engines
    # see gh-23779
    parser = python_parser_only
    with open(csv1, "rb") as f:
        parser.read_csv(f)
assert the result contents
Move this test to parser/test_common.py; see if you can co-locate it with the other buffer type tests.
test both csv engines, assert equality between ascii and binary modes, colocate with other buffer tests
Made the requested changes. Do you want a rebase to squash?
No need to squash.
Is it normal for random unrelated errors to fail builds?
Yes, we're debugging it still. Restarted azure.
pandas/io/common.py
Outdated
f = TextIOWrapper(f, encoding=encoding, newline="")
handles.append(f)
g = TextIOWrapper(f, encoding=encoding, newline="")
if not isinstance(f, no_close):
this is hard to follow, can you just use BufferedIOBase here rather than no_close?
I figured this way was more general in case there were future issues discovered beyond just BufferedIOBase, but consider it done.
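For readers following the `no_close` versus `BufferedIOBase` exchange, here is a rough sketch of the shape being discussed. It is not the code in pandas/io/common.py: `wrap_binary_buffer` is a hypothetical helper, and because the body of the `if` is not visible in the diff excerpt, the sketch assumes it registers the wrapper in `handles` for later closing.

```python
import io


def wrap_binary_buffer(f, encoding="utf-8", handles=None):
    """Hypothetical helper: wrap a binary buffer so it can be read as text.

    Assumption: wrappers around buffers that are NOT BufferedIOBase
    instances are tracked in `handles` for later cleanup, while wrappers
    around BufferedIOBase buffers (typically supplied by the caller) are
    not tracked.
    """
    if handles is None:
        handles = []
    g = io.TextIOWrapper(f, encoding=encoding, newline="")
    if not isinstance(f, io.BufferedIOBase):
        # Assumed behaviour: remember the wrapper so it can be closed later.
        handles.append(g)
    return g, handles


# io.BytesIO is a BufferedIOBase subclass, so nothing is tracked for it.
user_buffer = io.BytesIO(b"a,b\n1,2\n")
wrapped, handles = wrap_binary_buffer(user_buffer)
first_line = wrapped.readline()
```

Checking against `io.BufferedIOBase` directly keeps the condition readable at the call site, which is the point of the review comment; a tuple such as `no_close` would only be needed if more types had to be excluded.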
Thanks @fiendish! |
* BUG: Help python csv engine read binary buffers The file buffer given to read_csv could have been opened in binary mode, but the python csv reader errors on binary buffers. closes pandas-dev#23779
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff