RangeIndex is converted to Int64Index on save to HDF5 (to_hdf) #19997

vfilimonov · 2018-03-05T17:17:46Z

Hello, I'm not sure whether this is intended behavior; I did not find any mention of it in the documentation or in the GitHub issue tracker. I'm filing this in case it was not planned to work this way.

Problem description

When a pandas.DataFrame is saved to an HDF5 file, its RangeIndex is converted to an Int64Index, which could add noticeably to the storage size of long tables.
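To give a rough sense of the size difference, the sketch below compares the in-memory footprint of a RangeIndex with that of a materialized int64 index of the same length. It measures memory only, not the on-disk HDF5 size, which also depends on the storage format and compression; `n_rows` is a hypothetical table length picked for illustration.

```python
import numpy as np
import pandas as pd

n_rows = 1_000_000  # hypothetical table length, chosen only for illustration

# A RangeIndex only keeps start, stop and step.
range_index = pd.RangeIndex(n_rows)

# A materialized integer index keeps one int64 value (8 bytes) per row.
int_index = pd.Index(np.arange(n_rows, dtype="int64"))

range_bytes = range_index.nbytes
int_bytes = int_index.nbytes

print(f"RangeIndex:  {range_bytes} bytes in memory")
print(f"int64 index: {int_bytes} bytes in memory")
```

The int64 index grows linearly with the number of rows, while the RangeIndex stays at a small constant size.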

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 2))
df.index

results in RangeIndex(start=0, stop=1000, step=1)

Then

df.to_hdf('tmp.h5', 'df')
df = pd.read_hdf('tmp.h5', 'df')
df.index

results in Int64Index([ 0, 1, ..., 999], dtype='int64', length=1000)
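Until the index type is preserved on write, one possible workaround is to rebuild the RangeIndex after reading. The helper below is a minimal sketch, not part of pandas: `restore_range_index` is a hypothetical name, and it assumes the index that comes back from `read_hdf` is a plain int64 index. It only converts when the values form an evenly spaced sequence, and returns the frame unchanged otherwise.

```python
import numpy as np
import pandas as pd


def restore_range_index(df):
    """Return df with a RangeIndex if its int64 index is evenly spaced.

    Hypothetical helper for illustration; not a pandas API.
    """
    idx = df.index
    # Nothing to do for an existing RangeIndex or a very short index.
    if isinstance(idx, pd.RangeIndex) or len(idx) < 2:
        return df
    # Only handle plain int64 indexes (the type seen after the round trip).
    if idx.dtype != np.dtype("int64"):
        return df
    values = idx.to_numpy()
    step = int(values[1] - values[0])
    # Bail out if the values are not an arithmetic progression.
    if step == 0 or not np.all(np.diff(values) == step):
        return df
    out = df.copy()
    out.index = pd.RangeIndex(
        int(values[0]), int(values[-1]) + step, step, name=idx.name
    )
    return out


# Simulate the index type produced by the round trip, without touching HDF5.
frame = pd.DataFrame(np.random.randn(1000, 2))
frame.index = pd.Index(np.arange(1000, dtype="int64"))
restored = restore_range_index(frame)
print(type(restored.index).__name__)
```

In practice this would be applied to the result of `pd.read_hdf('tmp.h5', 'df')`. It costs one pass over the index values, so it trades a little read time for the smaller in-memory index.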

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

max-sixty · 2018-03-05T18:21:02Z

Is there a more efficient way of representing a range in HDF5?

jreback · 2018-03-05T18:37:16Z

Duplicate of #8319.

It's not worth trying to finesse this; rather, just have an option to turn it off.

jreback · 2018-03-05T18:37:40Z

PRs to fix are welcome!

jreback closed this as completed Mar 5, 2018

jreback added the IO HDF5 (read_hdf, HDFStore) and Duplicate Report (duplicate issue or pull request) labels Mar 5, 2018

jreback added this to the No action milestone Mar 5, 2018
