ENH Add check for inferred compression before get_filepath_or_buffer
#11074
pls change tests which are incorrect as well |
I made this PR so that it didn't break any tests. Are the parsers ever accessed outside of the |
the infer param can be moved higher up in the stack (eg in the get_filepath_or_buffer) - makes the readers simpler in that respect |
528d4f8 to 4d8c0c6
Found it. I actually didn't need to change any tests. Now the only check for file extensions happens in the |
gr8 |
f217181 to 5f97c7b
Added a test using the new files in s3://pandas-test/. |
Should I do anything else for this PR? |
can you add a whatsnew note for this |
5f97c7b to 9aeb3b2
pls rebase. ping when green. |
f488d81 to 67f134b
…ffer` When reading CSVs, if `compression='infer'`, check the input before calling `get_filepath_or_buffer` in the `_read` function. This way we can catch compression extensions on S3 files. We now attempt to infer compression from an input filename only in the `_read` function, instead of separately in each parser.
67f134b to a49b2cd
@jreback, green! I found I had to tweak |
ENH Add check for inferred compression before `get_filepath_or_buffer`
thanks! |
When reading CSVs, if `compression='infer'`, check the input before calling `get_filepath_or_buffer` in the `_read` function. This way we can catch compression extensions on S3 files. Partially resolves issue #11070.

Checking for the file extension in the `_read` function should make the checks inside the parsers redundant. When I tried to remove them, however, I discovered that there are tests which assume the parsers can take an "infer" compression, so I left their checks.

I also discovered that the URL-reading code has a test which reads a URL ending in "gz" but which appears not to be gzip encoded, so this PR attempts to preserve its verdict in that case.
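
For readers skimming the thread, here is a rough sketch of the idea: resolve `compression='infer'` from the original path string before anything swaps that string for an open buffer. It is not the code from this PR. The helper name `infer_compression`, the extension table, and the example S3 key are all assumptions made for illustration, and the sketch deliberately ignores the special case of a URL that ends in "gz" without being gzip encoded.

```python
import os

# Hypothetical extension table for illustration; the set of extensions
# pandas actually recognises is not taken from this PR.
_EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2"}


def infer_compression(filepath_or_buffer, compression="infer"):
    """Resolve compression='infer' while the original path is still known.

    Hypothetical helper: returns the explicit compression unchanged, the
    inferred codec name for a recognised extension, or None otherwise.
    """
    if compression != "infer":
        # The caller was explicit, so there is nothing to infer.
        return compression
    if not isinstance(filepath_or_buffer, str):
        # An open buffer carries no filename to inspect.
        return None
    _, ext = os.path.splitext(filepath_or_buffer.lower())
    return _EXTENSION_TO_COMPRESSION.get(ext)


# The S3 key below is a made-up example name, not a file from the PR.
print(infer_compression("s3://pandas-test/example.csv.gz"))
print(infer_compression("local_file.csv"))
print(infer_compression("data.csv.gz", compression="bz2"))
```

Doing the check once, at the top of the read path, is what lets remote paths such as S3 keys benefit: after the path has been turned into a buffer, the extension is no longer available to the individual parsers.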