is_file_like requirements are too strict for boto3 S3 objects #16135
Comments
1) Allows for more uniform handling of invalid file buffers to our `read_*` functions. 2) Adds a ton of new documentation to `inference.py` Closes #15337. xref #15895. Author: gfyoung <[email protected]> Closes #15894 from gfyoung/validate-file-like and squashes the following commits: 5a8f8da [gfyoung] DOC: Document all of inference.py 81103f7 [gfyoung] ENH: Add file buffer validation to I/O ops
is_file_like is correct
As far as I can see, read_csv treats the input as a buffer and only uses its read and close methods. It would be nice to have a less strict is_readonly_buffer_like helper function for read validation (and maybe an is_writeonly_buffer_like helper function for write validation).
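The helpers suggested above could look roughly like the sketch below. The names is_readonly_buffer_like and is_writeonly_buffer_like are taken from this comment and are hypothetical, not part of pandas. ReadOnlyBody is a made-up stand-in for a streaming object that exposes only read and close.

```python
def is_readonly_buffer_like(obj):
    """Hypothetical helper: True if obj has callable read() and close()."""
    return all(callable(getattr(obj, name, None)) for name in ("read", "close"))


def is_writeonly_buffer_like(obj):
    """Hypothetical helper: True if obj has callable write() and close()."""
    return all(callable(getattr(obj, name, None)) for name in ("write", "close"))


class ReadOnlyBody:
    """Made-up stand-in for a streaming body: read and close, nothing else."""

    def __init__(self, payload):
        self._payload = payload

    def read(self, amt=None):
        # Return up to amt bytes and drop them from the remaining payload.
        if amt is None:
            amt = len(self._payload)
        chunk, self._payload = self._payload[:amt], self._payload[amt:]
        return chunk

    def close(self):
        self._payload = b""


body = ReadOnlyBody(b"a\n1")
print(is_readonly_buffer_like(body))    # True
print(is_writeonly_buffer_like(body))   # False
```

Checking for callables rather than bare attributes is a design choice of this sketch, not something the thread prescribes.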
Given this breakage, I think we can relax the requirements in two ways:
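>>> from io import StringIO
>>> from pandas import read_csv
>>>
>>> # A text buffer whose tell() and seek() always fail.
>>> class BadStringIO(StringIO):
...     def tell(self):
...         raise NotImplementedError("nope")
...     def seek(self, pos, whence=0):
...         raise NotImplementedError("nope")
...
>>> data = "a\n1"
>>> read_csv(BadStringIO(data), engine="c")
   a
0  1
>>> read_csv(BadStringIO(data), engine="python")
   a
0  1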
I guess we could change
@djlancelot can you provide a test based on one of our S3 buckets that replicates this, e.g. a streaming version of this
Previously, we were requiring that all file-like objects had "read," "write," "seek," and "tell" methods, but that was too strict (e.g. read-only buffers). This commit relaxes those requirements to having EITHER "read" or "write" as attributes. Closes gh-16135.
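A minimal sketch of the relaxed rule as the commit message words it, for illustration only. The name is_file_like_relaxed and the ReadOnlyStream class are hypothetical, and the check that actually landed in pandas may apply further conditions.

```python
import io


def is_file_like_relaxed(obj):
    """Hypothetical name: accept any object exposing "read" OR "write"."""
    return hasattr(obj, "read") or hasattr(obj, "write")


class ReadOnlyStream:
    """Made-up read-only buffer: no write, seek or tell."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)

    def read(self, amt=None):
        return self._buffer.read(amt)


print(is_file_like_relaxed(ReadOnlyStream(b"a\n1")))  # True
print(is_file_like_relaxed(io.StringIO("a\n1")))      # True
print(is_file_like_relaxed("data.csv"))               # False
```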
Code Sample, a copy-pastable example if possible
Problem description
When trying to read CSV files from AWS S3 directly using boto3, the returned botocore.response.StreamingBody object does not have all the methods required by the is_file_like function in pandas.core.dtypes.inference. Although the pd.read_csv call worked flawlessly before, our app has been broken since the commit from 20 days ago (e4e87ec).
See the botocore reference (http://botocore.readthedocs.io/en/latest/reference/response.html) for details on the StreamingBody class.
The issue is caused by the overly strict constraint in the is_file_like function (https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/inference.py#L140).
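To make the failure concrete, here is a rough sketch of a strict check of the kind this thread describes (requiring "read", "write", "seek" and "tell"). The function name strict_is_file_like is hypothetical and the code at the linked line may differ in detail. FakeStreamingBody is a made-up stand-in that assumes the real StreamingBody offers read and close but not the full set of methods.

```python
import io

# The four methods the thread says were required of every file-like object.
REQUIRED_METHODS = ("read", "write", "seek", "tell")


def strict_is_file_like(obj):
    """Hypothetical reconstruction: require all four methods to be present."""
    return all(hasattr(obj, name) for name in REQUIRED_METHODS)


class FakeStreamingBody:
    """Made-up stand-in for a streaming body: read and close only."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)

    def read(self, amt=None):
        return self._buffer.read(amt)

    def close(self):
        self._buffer.close()


print(strict_is_file_like(io.BytesIO(b"a\n1")))         # True
print(strict_is_file_like(FakeStreamingBody(b"a\n1")))  # False
```

The stand-in is perfectly readable, yet the strict check rejects it before any read is attempted.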
Expected Output
No error should be raised on read operations if a read method is available.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.0rc1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.1
Cython: None
numpy: 1.12.0
scipy: 0.19.0
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None