Skip to content

Track times #32700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 14, 2020
Merged

Track times #32700

merged 15 commits into from
May 14, 2020

Conversation

rbenes
Copy link
Contributor

@rbenes rbenes commented Mar 14, 2020

I implement feature request #32682 but if someone has more experience with pytables consider these changes.

The h5 files generated with track_times=False has the same md5 hashes, that is covered in the unit test as well as the fact that the hashes are not the same with track_times=True

@jreback
Copy link
Contributor

jreback commented Mar 15, 2020

https://www.pytables.org/usersguide/libref/file_class.html?highlight=create_table#tables.File.create_table this option already defaults to True. is the purpose of this PR not to set it?

@jreback jreback added the IO HDF5 read_hdf, HDFStore label Mar 15, 2020
@rbenes
Copy link
Contributor Author

rbenes commented Mar 15, 2020

https://www.pytables.org/usersguide/libref/file_class.html?highlight=create_table#tables.File.create_table this option already defaults to True. is the purpose of this PR not to set it?

Yes, you are right. The default value is True that means that creation time is stored inside h5 (that is reported as unwanted due to change of SHA hash). I exposed the track_times to put method of HDFStore. I let default value to True in all methods that it goes through to keep current behavior. Finally i pass it to table._handle.create_table.

The possibility to set of track_times can be seen in test on line 325. And on lines 340 and 343 you can see the different or the same check-sums after appropriate setup of track_times.

@rbenes rbenes force-pushed the track_times branch 2 times, most recently from 18654d9 to 88eea0e Compare March 15, 2020 17:39
from tables import __version__ as tables_version
from packaging.version import Version

if Version(tables_version) >= Version("3.4.3"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use LooseVersion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed. Thanks for hint. But, what is the reason for using LooseVersion? I found that packaging.version is more "modern" - https://stackoverflow.com/a/11887885/1386342

@@ -4140,6 +4145,17 @@ def write(
# set the table attributes
table.set_attrs()

if track_times is not None:
from tables import __version__ as tables_version
from packaging.version import Version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this import can be at the top (but use LooseVersion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved. LooseVersion used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to revert

from tables import __version__ as tables_version

back here because of:

 Checks on imported code1s
##[error]Process completed with exit code 2.
Run source activate pandas-dev
Check import. No warnings, and blacklist some optional dependencies
err: pandas should not import: tables, numexpr
Check import. No warnings, and blacklist some optional dependencies DONE
##[error]Process completed with exit code 2.


if Version(tables_version) >= Version("3.4.3"):
options["track_times"] = track_times
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would just ignore it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Ignored. It is now much easier.

# GH 32682
# enables to set track_times (see `pytables` `create_table` documentation)

import hashlib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all imports should be at the top; if you need to skip for a version you can do it with a @td.skip_if_no. add a test for >= 3.4.3 and one fore before that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍

@rbenes
Copy link
Contributor Author

rbenes commented Mar 24, 2020

Hi @jreback . I did changes you requested. But, I found another possibility - not to pass track_times in parameters but put it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html . What do you think?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already require 3.4.2, so ok to simply bump the minimum to 3.4.3 is fine.

@jreback
Copy link
Contributor

jreback commented Apr 10, 2020

Hi @jreback . I did changes you requested. But, I found another possibility - not to pass track_times in parameters but put it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html . What do you think?

i think accepting it like in this pr is ok.

@rbenes rbenes mentioned this pull request Apr 16, 2020
5 tasks
@rbenes rbenes force-pushed the track_times branch 5 times, most recently from 36f1c07 to f134442 Compare April 17, 2020 11:17
@rbenes
Copy link
Contributor Author

rbenes commented Apr 20, 2020

we already require 3.4.2, so ok to simply bump the minimum to 3.4.3 is fine.

I updated requirements in separate PR and updated this PR accordingly.

@rbenes
Copy link
Contributor Author

rbenes commented May 13, 2020

Hi, I see that "krysma" already remind this feature. Please revise changes and decide next steps. I synced it with master, so now it is without conflicts.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, apart from comments

@@ -984,6 +984,7 @@ def put(
data_columns: Optional[List[str]] = None,
encoding=None,
errors: str = "strict",
track_times: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the doc-string, make sure to add a 1.1 versionadded tag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc string added

@rbenes rbenes force-pushed the track_times branch 2 times, most recently from 7c9661c to a5b46df Compare May 14, 2020 07:02
@jreback jreback added this to the 1.1 milestone May 14, 2020
@jreback jreback merged commit 507cb15 into pandas-dev:master May 14, 2020
@jreback
Copy link
Contributor

jreback commented May 14, 2020

thanks @rbenes

@Pierre-Bartet
Copy link

Hi, trying to use track_times=False I add to look at all the options used in the test, so that I found that index=None is needed in order to indeed obtain reproducible files.
Is it expected? If yes it should be said somewhere in the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HDF5 read_hdf, HDFStore
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add track_times flag for HDFStore put method
4 participants