Requiring default index for to_feather() is unintuitive #28208
Comments
Feather (which is just pyarrow under the hood) does not save the index, so round-tripping fails with a non-default index. This is a design decision of the format to keep it simple. Parquet does allow saving the index.
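For readers hitting this, the usual workaround is to move the index into ordinary columns before writing and to restore it after reading. The following is only a minimal sketch of that idea. The helper names (`index_to_columns`, `columns_to_index`, `write_feather_with_index`, `read_feather_with_index`) and the `index_{i}` placeholder for unnamed index levels are hypothetical and not part of pandas. Only `DataFrame.reset_index`, `DataFrame.set_index`, `DataFrame.to_feather` and `pandas.read_feather` are real pandas calls, and the two Feather functions assume pyarrow is installed.

```python
import pandas as pd


def index_to_columns(df):
    """Return (flat_df, names): the index moved into columns, plus its level names.

    Unnamed index levels get a placeholder name (hypothetical convention)
    so that they can be located again after reading the file back.
    """
    names = [
        name if name is not None else f"index_{i}"
        for i, name in enumerate(df.index.names)
    ]
    out = df.copy()
    out.index = df.index.set_names(names)  # returns a new Index; df is untouched
    return out.reset_index(), names


def columns_to_index(df, names):
    """Inverse of index_to_columns: turn the saved columns back into the index."""
    return df.set_index(names)


def write_feather_with_index(df, path):
    """Write df to Feather with its index stored as columns (needs pyarrow)."""
    flat, names = index_to_columns(df)
    flat.to_feather(path)
    return names  # the caller has to keep these to restore the index later


def read_feather_with_index(path, names):
    """Read a file written by write_feather_with_index and restore the index."""
    return columns_to_index(pd.read_feather(path), names)
```

In this sketch the index level names are not stored in the file, so the caller has to carry them from the write call to the read call. Whether `to_feather()` itself rejects a non-default index may depend on the pandas and pyarrow versions in use.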
Got it, thanks for the info.
Would take a PR to update the docs.
Would take a sponsorship from NumFOCUS to send a PR to update the docs.
@jmakov not for a simple PR.
It would be nice if this were possible, though. It would make aligning several data frames more fail-safe.
@jreback does this reason still apply? Running the sample code block from above using
@mikecoder5 you are welcome to submit a PR that tries that, and we'll see what the CI says.
Quick update, pyarrow Feather doesn't seem to work for indexes that aren't a
@mikecoder5 Can we reopen this issue now that ARROW-15018 has been fixed?
@kartiksubbarao seeing some deltas, commented in ARROW-7914. We'll have to wait until those are resolved. Also, please feel free to reopen this issue; I don't think I have edit access here.
Is it still the case that, say, a datetime index cannot be serialized?
Code Sample
Problem description
I saw documentation in the code requiring default index at:
pandas/pandas/io/feather_format.py
Lines 46 to 52 in 794be8c
However, I think it is unintuitive to require the user to have a default index before writing to Feather (for example, this is not a requirement for writing to CSV). Why is this a requirement? What are your thoughts about resetting the index by default?
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.0
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : None
pytest : 5.1.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 3.8.0
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.8.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 3.8.0
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.14.1
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None