Skip to content

PERF: HDFStore __unicode__ method #16514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 55 commits into from
Closed

PERF: HDFStore __unicode__ method #16514

wants to merge 55 commits into from

Conversation

Kiv
Copy link
Contributor

@Kiv Kiv commented May 26, 2017

HDFStore unicode now only returns file path info. New info() method has the previous behavior of unicode.

import pandas as pd
store = pd.HDFStore('test.h5', 'w')
for i in range(5000):
    store.put('table_{}'.format(i), pd.DataFrame([i]))

# Before
%time str(store)
CPU times: user 26.1 s, sys: 156 ms, total: 26.2 s
Wall time: 26.2 s

# After
%time str(store)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 40.1 µs

@Kiv Kiv changed the title Issue16503 Close #16503 May 26, 2017
@Kiv Kiv changed the title Close #16503 PERF: HDFStore __unicode__ method May 26, 2017
@@ -1161,6 +1136,35 @@ def copy(self, file, mode='w', propindexes=True, keys=None, complib=None,

return new_store

def info(self):
"""return detailed information on the store"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versionadded tag.

@@ -75,7 +75,7 @@ Removal of prior version deprecations/changes

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- :class:`pandas.HDFStore`'s string representation is now faster and less detailed. For the previous behavior, use ``pandas.HDFStore.info()``. (:issue:`16503`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to API breaking changes

def info(self):
"""return detailed information on the store"""
output = '%s\nFile path: %s\n' % (type(self), pprint_thing(self._path))
if self.is_open:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this to api.rst

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, with some minor comment.

@jreback jreback added IO HDF5 read_hdf, HDFStore Performance Memory or execution speed performance labels May 26, 2017
@jreback
Copy link
Contributor

jreback commented May 26, 2017

ping when fixed / green.

@jreback jreback added this to the 0.21.0 milestone May 26, 2017
@codecov
Copy link

codecov bot commented May 26, 2017

Codecov Report

Merging #16514 into master will decrease coverage by <.01%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16514      +/-   ##
==========================================
- Coverage   90.79%   90.79%   -0.01%     
==========================================
  Files         161      161              
  Lines       51063    51065       +2     
==========================================
  Hits        46365    46365              
- Misses       4698     4700       +2
Flag Coverage Δ
#multiple 88.63% <4.76%> (-0.01%) ⬇️
#single 40.15% <85.71%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/pytables.py 92.99% <85.71%> (-0.08%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef487d9...1de16b6. Read the comment docs.

@gfyoung
Copy link
Member

gfyoung commented May 28, 2017

  1. Got a small linting error in pytables.py (hence the Travis failure)

  2. Can you provide some timing info in the PR given that this is a performance change?

@Kiv Kiv force-pushed the issue16503 branch 2 times, most recently from f5dd983 to 97eaae3 Compare May 28, 2017 12:01
@Kiv
Copy link
Contributor Author

Kiv commented May 28, 2017

@jreback I think this is ready to go?

@@ -4371,11 +4373,11 @@ def test_multiple_open_close(self):

# single
store = HDFStore(path)
assert 'CLOSED' not in str(store)
assert 'CLOSED' not in store.info()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you should also test str(store) a bit (u changed all of these to .info())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test that str(store) prints the filename, is that what you mean? There's not much behavior in there anymore to test...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

point to it it's not clear where it is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…avior.

__unicode__ now only returns file path info, not (expensive) details on all existing keys.
@gfyoung
Copy link
Member

gfyoung commented May 29, 2017

@Kiv Some timing info in the PR would be useful for future reference. Plus a benchmark test would be good here.

@jreback
Copy link
Contributor

jreback commented Jun 1, 2017

@Kiv lgtm. if you would add an asv would be great (you can use your original example, just not as long, maybe 20 nodes or something would be fine).

@jreback
Copy link
Contributor

jreback commented Jun 10, 2017

can you add an asv?

cmohl2013 and others added 7 commits June 11, 2017 07:57
* ENH pandas-dev#15972 added margins_name parameter for crosstab

* ENH 15972 minor changes as suggested by reviewers

* ENH 15972 correction in whatsnew

* ENH 15972 style changes in whatsnew
Replaces most uses of implicit global state from matplotlib in
test_datetimelike.py. This was potentially causing random failures
where a figure expected to be on a new, blank figure would instead
plot on an existing axes (that's the guess at least).
rhendric and others added 25 commits June 11, 2017 07:57
* DOC: change doc build to python 3.6

* Remove pinning of pyqt to 4.x

* Remove pinning of openpyxl

* Add xsel to doc build for clipboard
* PERF: vectorize _interp_limit

* CLN: remove old implementation

* fixup! CLN: remove old implementation
…ndas-dev#16444)

* BUG: Handle numpy strings in index names in HDF5 pandas-dev#13492

* REF: refactor to _ensure_str
…ches (pandas-dev#16460)

* pandas-devgh-14671 Check if usecols with type string contains a subset of names, if not throws an error

* tests added for pandas-devgh-14671, expected behavior of simultaneous use of usecols and names unclear so these tests are commented out

* Review comments
Splits extra information about the license and copyright holders to
AUTHORS.md.
* COMPAT: numpy 1.13 test compat

* CI: fix doc build to 1.12
@Kiv Kiv closed this Jun 11, 2017
@Kiv Kiv deleted the issue16503 branch June 11, 2017 11:14
@Kiv
Copy link
Contributor Author

Kiv commented Jun 11, 2017

My git branch got out of date and I don't know how to update it properly. New version with ASV bench is here:

#16666

@Kiv Kiv mentioned this pull request Jun 11, 2017
4 tasks
@jreback
Copy link
Contributor

jreback commented Jun 11, 2017

in the future see: http://pandas.pydata.org/pandas-docs/stable/contributing.html#combining-commits. Much much better to push to the same PR. opening new ones is rarely needed (for the same issue)

you just needed to do:

git rebase -i upstream/master
git push yourremove yourbranch -f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HDF5 read_hdf, HDFStore Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HDFStore __unicode__ very slow for many keys