Add track_times flag for HDFStore put method #32682

Closed
krysma opened this issue Mar 13, 2020 · 3 comments · Fixed by #32700
Labels
IO HDF5 read_hdf, HDFStore
Comments

krysma commented Mar 13, 2020

When adding a table to an HDF file with the put method of HDFStore, it would be good to have an option to set the track_times flag that PyTables accepts in its create_table method. With this flag it is possible to stop tracking the times at which the file was changed. These timestamps cause problems when versioning HDF files and comparing checksums.

For this, the flag needs to be propagated from: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L973 to this line: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L4144
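
The request amounts to threading one keyword argument from the public method down to the low-level call. A minimal sketch of that pattern follows. FakeStore and FakeHandle are hypothetical stand-ins invented for illustration; they are not the pandas or PyTables implementation, and the real signatures may differ.

```python
# Toy stand-ins only: these classes are NOT pandas or PyTables code.
class FakeHandle:
    """Plays the role of the low-level file handle."""

    def create_table(self, group, track_times=True, **options):
        # Return what the low-level call received so it can be inspected.
        return {"group": group, "track_times": track_times, **options}


class FakeStore:
    """Plays the role of the high-level store object."""

    def __init__(self):
        self._handle = FakeHandle()

    def put(self, key, value, track_times=True):
        # Forward the flag explicitly. If it were dropped here, the
        # low-level default (True) would always win.
        return self._handle.create_table(
            key, track_times=track_times, rows=len(value)
        )


store = FakeStore()
default_call = store.put("table", [1, 2, 3])
untracked_call = store.put("table", [1, 2, 3], track_times=False)
```

Keeping True as the default at both levels preserves existing behavior for callers that never pass the flag.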

krysma changed the title from "Add track_times flag for HDFStore" to "Add track_times flag for HDFStore put method" Mar 13, 2020
krysma commented Mar 13, 2020

How to reproduce:
Run this Python code to generate a sample file, a.h5:

import pandas as pd

df = pd.DataFrame([{"x": 1, "y": 1}, {"x": 2, "y": 2}, {"x": 3, "y": 3}, {"x": 3, "y": 4}])

hdf = pd.HDFStore("a.h5", mode="w")
hdf.put("table", df, format="table", data_columns=True, index=None)
hdf.close()

Once you have generated the file at least twice, compare the checksums of the results:

sha256sum a.h5

Because track_times=True is the default, the checksums differ even though the data is identical.
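
The same comparison can be scripted instead of running sha256sum by hand. This is a small sketch using only the standard library; the helper names sha256_of and same_checksum are made up for this example.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checksum(paths):
    """Return True if every file in `paths` has the same SHA-256 digest."""
    return len({sha256_of(p) for p in paths}) == 1
```

For instance, after writing the store into two separate directories, same_checksum(["run1/a.h5", "run2/a.h5"]) (hypothetical paths) reports whether the two files are byte-identical.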

Then change this line: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L4144
to:
table._handle.create_table(table.group, track_times=False, **options)
Generate the HDF files again and rerun the checksum comparison; the checksums are now identical.

rbenes mentioned this issue Mar 14, 2020
jreback added the IO HDF5 read_hdf, HDFStore label Mar 15, 2020
jreback commented Mar 15, 2020

According to https://www.pytables.org/usersguide/libref/file_class.html?highlight=create_table#tables.File.create_table, this flag already defaults to True. Not sure we would ever want to set it to False anyhow.

krysma commented May 12, 2020

I see that @rbenes has implemented the feature in the PR mentioned above. Will it be merged soon, so that we can close this issue?
