pandas read_csv no longer supports file-like objects from tarfile (pandas 0.20.1) #16530
Comments
Works as expected on windows with 0.20.1.
Which Python? You should include the |
Did you mean to close? If it isn't working for you, please reopen. Could be combination of Python version/OS. |
Oops, sorry. Reopened. |
Working for me as well with Python 3 and master. @gjanvier, can you make a fully reproducible example (including writing the CSV and adding it to the tar)? |
Sure, here is a test case.

import tarfile
import pandas as pd
pd.show_versions()
data = pd.DataFrame(
data=[[1,2], [3,4]],
columns=['col1', 'col2']
)
print "data"
print data
print ""
data.to_csv('mydata.csv', sep="\t", index=False)
tar = tarfile.open('test.tar', 'w')
tar.add('mydata.csv')
tar.close()
tar = tarfile.open('test.tar', 'r')
myfile = tar.extractfile('mydata.csv')
data2 = pd.read_csv(myfile, sep=r'\s+')
print "data2"
print data2
print ""

FYI, I run my tests in a Docker container.

Result with pandas 0.19.2:

root@0a35b054b4da:xxxx# pip install pandas==0.19.2
Collecting pandas==0.19.2
Downloading pandas-0.19.2-cp27-cp27mu-manylinux1_x86_64.whl (17.2MB)
100% |################################| 17.2MB 25kB/s
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->pandas==0.19.2)
Installing collected packages: pandas
Found existing installation: pandas 0.20.1
Uninstalling pandas-0.20.1:
Successfully uninstalled pandas-0.20.1
Successfully installed pandas-0.19.2
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
root@0a35b054b4da:xxxx# python test_tar_pd.py
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: None
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
data
col1 col2
0 1 2
1 3 4
data2
col1 col2
0 1 2
1 3 4

Result with pandas 0.20.1:

root@0a35b054b4da:xxx# pip install pandas==0.20.1
Collecting pandas==0.20.1
Downloading pandas-0.20.1-cp27-cp27mu-manylinux1_x86_64.whl (22.3MB)
100% |################################| 22.3MB 19kB/s
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->pandas==0.20.1)
Installing collected packages: pandas
Found existing installation: pandas 0.19.2
Uninstalling pandas-0.19.2:
Successfully uninstalled pandas-0.19.2
Successfully installed pandas-0.20.1
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
root@0a35b054b4da:xxx# python test_tar_pd.py
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
data
col1 col2
0 1 2
1 3 4
Traceback (most recent call last):
File "test_tar_pd.py", line 23, in <module>
data2 = pd.read_csv(myfile, sep=r'\s+')
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 392, in _read
filepath_or_buffer, encoding, compression)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/common.py", line 210, in get_filepath_or_buffer
raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'tarfile.ExFileObject'> |
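A possible stopgap on an affected install (not proposed in this thread, so treat it as an assumption) is to copy the member's bytes into an `io.BytesIO`, which is an ordinary in-memory buffer and should pass the stricter check. `member_to_buffer` is a hypothetical helper name, and the sketch reads the whole member into memory, so it only suits files that fit in RAM.

```python
import io
import tarfile


def member_to_buffer(tar_path, member_name):
    """Return one tar member's bytes wrapped in an in-memory buffer.

    Hypothetical helper, not part of pandas or tarfile.
    """
    with tarfile.open(tar_path, "r") as tar:
        # extractfile returns None for members that are not regular files.
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError(
                "%s is not a regular file in %s" % (member_name, tar_path)
            )
        # Copy the bytes out before the archive is closed.
        return io.BytesIO(member.read())


# Assumed usage with the test case above (requires pandas):
# data2 = pd.read_csv(member_to_buffer('test.tar', 'mydata.csv'), sep=r'\s+')
```

The buffer is independent of the archive, so it stays usable after the tar file has been closed.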
IIRC @gfyoung fixed this by patching |
hmm, so maybe this IS an issue on py2.7? maybe |
This is indeed a compatibility issue. Turns out I guess we just need to check for the |
yeah maybe relax in the is_file_like only |
tarfile.ExFileObject has no "next" method in Python 2.x, so read_csv rejects it as an invalid file-like object. However, such objects can be read in just fine, which means our file-like check is too strict. This commit relaxes the check to look only for "__iter__". Closes gh-16530.
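To make the difference concrete, here is a minimal sketch of a strict check versus the relaxed one the commit message describes. The function names `strict_file_like` and `relaxed_file_like` and the `LineSource` class are hypothetical and written for illustration; they are not the pandas source, and the exact form of the earlier check is an assumption.

```python
import io


def strict_file_like(obj):
    """Assumed stricter check: has read/write AND is itself an iterator."""
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    # An iterator has __iter__ plus next (Python 2) or __next__ (Python 3).
    has_next = hasattr(obj, "next") or hasattr(obj, "__next__")
    return hasattr(obj, "__iter__") and has_next


def relaxed_file_like(obj):
    """Relaxed check: has read/write and only needs __iter__."""
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    return hasattr(obj, "__iter__")


class LineSource(object):
    """Hypothetical stand-in: readable and iterable, but with no next method."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)

    def __iter__(self):
        # Yield lines until readline returns the empty-bytes sentinel.
        return iter(self._buf.readline, b"")


src = LineSource(b"col1\tcol2\n1\t2\n")
strict_result = strict_file_like(src)    # rejected: no next/__next__
relaxed_result = relaxed_file_like(src)  # accepted: __iter__ is enough
```

Under these assumptions, an object shaped like `LineSource` fails the strict check and passes the relaxed one, while a regular buffer such as `io.BytesIO` passes both.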
@jreback : So the C engine doesn't require that the file-like have a |
Also, I should add reading |
@gfyoung - I hit the same issue when trying to pass Luigi's My understanding is that while this is not required to be true for a file-like:
What is required is that it produce an iterator with a next method.
So if pandas wants to use |
But I'm not sure this is explicitly defined anywhere, so much as it looks like an in-practice kind of thing. |
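The distinction between an object that produces an iterator and one that is an iterator can be sketched as follows. This is an illustration in Python 3, where the method is spelled `__next__` rather than the Python 2 `next` discussed in the thread; the `Lines` class is a hypothetical example, not something from pandas or Luigi.

```python
from collections.abc import Iterable, Iterator


class Lines:
    """Hypothetical iterable: defines __iter__ but no __next__ of its own."""

    def __init__(self, lines):
        self._lines = list(lines)

    def __iter__(self):
        # Hand back a fresh iterator; that object carries __next__.
        return iter(self._lines)


src = Lines(["col1,col2", "1,2"])
is_iterable = isinstance(src, Iterable)  # defines __iter__
is_iterator = isinstance(src, Iterator)  # would also need __next__
it = iter(src)
first = next(it)  # works on the produced iterator, not on src itself
```

A consumer that calls `iter(obj)` first works with both kinds of object, whereas one that calls `next(obj)` directly only works with iterators.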
@jtratner : For an object to be file-like, I am proposing that the object just have an Interesting idea. Worth pursuing once this issue gets resolved. |
@gfyoung - cool I agree with your definition :) - just was reinforcing that the definition of file-like as "has iter" is upheld many places but not having the |
Code Sample, a copy-pastable example if possible
This code generates this error:
Problem description
This code works with pandas 0.19.2 but fails with 0.20.1.
According to pandas doc for read_csv:
I guess the new validations are too restrictive?