I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
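Code Sample, a copy-pastable example

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2],
    "B": ["x", "y"],
    "C": [True, False]
})
df.to_feather("./test_data.feather")

# Columns requested in a different order from the one stored in the file:
# raises ArrowInvalid with pyarrow 0.17.0.
df2 = pd.read_feather("./test_data.feather", columns=['B', 'A'])
```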
Error message

```
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-4-1e23cf201732> in <module>
     15 
     16 
---> 17 df2 = pd.read_feather("/misc/labshare/datasets3/rating/data/preprocessing/tests/test_data.feather", columns=['B', 'A'])

~/.conda/envs/venv/lib/python3.6/site-packages/pandas/io/feather_format.py in read_feather(path, columns, use_threads)
    101     path = stringify_path(path)
    102 
--> 103     return feather.read_feather(path, columns=columns, use_threads=bool(use_threads))

~/.conda/envs/venv/lib/python3.6/site-packages/pyarrow/feather.py in read_feather(source, columns, use_threads, memory_map)
    206     """
    207     _check_pandas_version()
--> 208     return (read_table(source, columns=columns, memory_map=memory_map)
    209             .to_pandas(use_threads=use_threads))
    210 

~/.conda/envs/venv/lib/python3.6/site-packages/pyarrow/feather.py in read_table(source, columns, memory_map)
    237         return reader.read_indices(columns)
    238     elif all(map(lambda t: t == str, column_types)):
--> 239         return reader.read_names(columns)
    240 
    241     column_type_names = [t.__name__ for t in column_types]

~/.conda/envs/venv/lib/python3.6/site-packages/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.read_names()

~/.conda/envs/venv/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Schema at index 0 was different: 
B: string
A: int64
vs
A: int64
B: string
```
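Problem description

We don't always know the order in which the columns are stored in the file. The issue appeared when we updated pyarrow to 0.17.0: passing `columns` in an order that differs from the stored order now raises `ArrowInvalid`.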
This line works fine (columns requested in the order they are stored in the file):

```python
df2 = pd.read_feather("./test_data.feather", columns=['A', 'B'])
```
Should we apply a fix here or in the pyarrow repository?
Expected Output

```python
df2 = pd.DataFrame({
    "A": [1, 2],
    "B": ["x", "y"],
})
```
Output of pd.show_versions()

```
INSTALLED VERSIONS
commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-91-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3
Cython : 0.29.15
pytest : 5.3.2
hypothesis : 5.5.4
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.3
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.0
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.2.3
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0
```
@Benjamin15 Thanks a lot for the report! This is indeed a regression. I opened an issue for this on the Arrow side (since the bug is in the latest pyarrow 0.17 release): https://issues.apache.org/jira/browse/ARROW-8641
Closed by #34883