
Requiring default index for to_feather() is unintuitive #28208


Closed
drkarthi opened this issue Aug 28, 2019 · 13 comments

Comments

@drkarthi

drkarthi commented Aug 28, 2019

Code Sample

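import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
df = df.drop(1)  # index becomes [0, 2, 3], no longer the default RangeIndex
df.to_feather("test.feather")  # raises the ValueError quoted below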

Problem description

I saw that the code requires a default index:

if not df.index.equals(RangeIndex.from_range(range(len(df)))):
    raise ValueError(
        "feather does not support serializing a "
        "non-default index for the index; you "
        "can .reset_index() to make the index "
        "into column(s)"
    )

However, I think it is unintuitive to require the user to have a default index before writing to feather (for example, this is not a requirement for writing to CSV). Why is this a requirement? What are your thoughts on making reindexing the default?
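For readers who hit this error, here is a minimal sketch of what the quoted check compares and the two `.reset_index()` options the message points to. It needs only pandas. The variable names are illustrative, and `pd.RangeIndex(len(df))` is used as an equivalent of the `RangeIndex.from_range(...)` call quoted above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
df = df.drop(1)  # remaining index labels are 0, 2, 3

# The comparison the quoted check performs: is the index exactly 0..len(df)-1?
has_default_index = df.index.equals(pd.RangeIndex(len(df)))

# Option 1: discard the old labels and get a fresh default index.
dropped = df.reset_index(drop=True)

# Option 2: keep the old labels as an ordinary column (named "index" here,
# because the original index has no name).
kept = df.reset_index()
```

Either `dropped` or `kept` passes the check, so either can be handed to `to_feather()`. Only `kept` preserves the original labels.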

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.0
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : None
pytest : 5.1.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 3.8.0
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.8.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 3.8.0
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.14.1
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@jreback
Contributor

jreback commented Aug 29, 2019

Feather (which is just pyarrow under the hood) does not save the index, so round-tripping fails with a non-default index. This is a design decision of the format to keep it simple. Parquet does allow saving the index.
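Since the format does not store the index, one way to round-trip it is to carry the index as a regular column and restore it after reading. The sketch below is an assumption about how a caller might do that, not pandas behavior. The helper names `index_to_column` and `column_to_index` are hypothetical, and the approach assumes the frame has a single-level index and no existing column with the chosen name.

```python
import pandas as pd


def index_to_column(df, name="index"):
    """Return a copy with the index moved into a column called `name`.

    The result has a default RangeIndex, so it passes the to_feather() check.
    """
    return df.rename_axis(name).reset_index()


def column_to_index(df, name="index"):
    """Inverse step: turn the column `name` back into the index."""
    return df.set_index(name)


# With pyarrow installed, the file round trip would look like this:
#   index_to_column(df).to_feather("test.feather")
#   restored = column_to_index(pd.read_feather("test.feather"))
```

The restored index carries the name given in `name`, so an originally unnamed index does not come back unnamed unless the caller clears it afterwards.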

@jreback jreback closed this as completed Aug 29, 2019
@jreback jreback added this to the No action milestone Aug 29, 2019
@drkarthi
Author

Got it, thanks for the info

@jmakov
Contributor

jmakov commented Oct 3, 2021

@jreback would be great to have that in the docs.

@jreback
Contributor

jreback commented Oct 3, 2021

would take a PR to update the docs

@jmakov
Contributor

jmakov commented Oct 3, 2021

Would take a sponsorship from NumFOCUS to send a PR to update the docs.

@jreback
Contributor

jreback commented Oct 3, 2021

@jmakov not for a simple PR

@JohannesWiesner

Would be nice though, if this was possible. It would make the alignment of several data frames more fail-safe.

@mikecoder5

@jreback does this reason still apply? Running the sample code block from above using pyarrow.feather.write_feather() seems to now work as expected. Can we delete the default index check from Pandas feather_format.py?

@jreback
Contributor

jreback commented Dec 2, 2021

@mikecoder5 you are welcome to submit a PR that tries that, and we'll see what the CI says

@mikecoder5

Quick update: pyarrow Feather doesn't seem to work for indexes that aren't a RangeIndex, so I filed an issue on the Apache Arrow project. If the team decides to address that request, then we can move forward with the suggestion to remove the index checks from Pandas.

@kartiksubbarao

@mikecoder5 Can we reopen this issue now that ARROW-15018 has been fixed?

@mikecoder5

mikecoder5 commented Jun 1, 2022

@kartiksubbarao seeing some deltas - commented in ARROW-7914. We'll have to wait until those are resolved

Also please feel free to reopen this issue, I don't think I have edit access here

@dss010101

Is it still the case that, say, a datetime index cannot be serialized?


7 participants