Pytables selection enhancement & docs update for HDF5 tables #2264
1. added __str__ (to do __repr__)
2. row removal in tables is much faster if rows are consecutive
3. added Term class, refactored Selection (this is backwards compatible)

Term is a concise way of specifying conditions for queries, e.g.

    Term(dict(field = 'index', op = '>', value = '20121114'))
    Term('index', '20121114')
    Term('index', '>', '20121114')
    Term('index', ['20121114','20121114'])
    Term('index', datetime(2012,11,14))
    Term('index>20121114')

updated tests for same

this should close GH pandas-dev#1996
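The Term call forms in this commit message all reduce to a field, an operator, and a value. A minimal sketch of that normalisation follows. It is not the code in this PR: `parse_term` and `ParsedTerm` are hypothetical names, and the default operator of `'='` when none is given is an assumption.

```python
import re
from datetime import datetime
from typing import Any, NamedTuple


class ParsedTerm(NamedTuple):
    """Hypothetical stand-in for illustration; not the pandas Term class."""
    field: str
    op: str
    value: Any


_OPS = ("<=", ">=", "!=", "<", ">", "=")
# Two-character operators are listed first so ">=" is not read as ">".
_EXPR = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.+?)\s*$")


def parse_term(*args: Any) -> ParsedTerm:
    """Normalise the call forms shown above into (field, op, value)."""
    # Form: Term(dict(field=..., op=..., value=...))
    if len(args) == 1 and isinstance(args[0], dict):
        d = args[0]
        return ParsedTerm(d["field"], d.get("op", "="), d["value"])
    # Form: Term('index>20121114')
    if len(args) == 1 and isinstance(args[0], str):
        match = _EXPR.match(args[0])
        if match is None:
            raise ValueError(f"cannot parse term expression: {args[0]!r}")
        return ParsedTerm(*match.groups())
    # Form: Term('index', value), with the operator assumed to be '='
    if len(args) == 2:
        return ParsedTerm(args[0], "=", args[1])
    # Form: Term('index', '>', value)
    if len(args) == 3:
        field, op, value = args
        if op not in _OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        return ParsedTerm(field, op, value)
    raise TypeError("expected 1 to 3 positional arguments")


print(parse_term("index>20121114"))
print(parse_term("index", datetime(2012, 11, 14)))
```

In this sketch values parsed from a string expression stay as strings; any conversion to the column's type would happen later.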
…e (see test_append); this is the result of incompatibility testing on the index_kind
think about doing this automagically for tables
@Thisch thanks, that is fixed. Let me know if anything else is unclear. This functionality has mostly been in pandas for a while, but undocumented, so you can try it out.
…of index columns minimum size
changed pytables version test for indexing around a bit
added Col class to manage the column conversions
added alias to the Term class; you can specify the nominal indexers (e.g. index in DataFrame, major_axis/minor_axis or alias in Panel)
updated docs for pytables to reflect these changes
updated docs for indexing to incorporate whatsnew 0.9.1 for where and mask
@wesm I also have some preliminary work on speeding up table writes (with panels). It currently takes about 9.5s for 1M rows (e.g. a 6 x 1000 x 1000 panel). I made the code a lot simpler and used cython, which brings it down to about 6s. About 1/2 of the overhead is from pytables actually writing the data; the rest is from creating a list of tuples (which is then turned into a recarray by pytables). I will probably be able to PR this next week.
I think you can write whole blocks of data instead of going row by row and go much faster? @John-Colvin has worked on this I think
I had some discussions with John (waiting for some sample timings) but I think there are really 2 cases here
So I suppose some use cases might prefer having faster writing and still have a searching ability. I prefer to write my tables in small batches, but need to preserve searching (and reading is quite fast anyhow). I suppose if there is enough interest we could support both approaches.
closing this - going to put in a new PR soon that is a bit cleaner
added __str__ (to do __repr__)
row removal in tables is much faster if rows are consecutive
added Term class, refactored Selection (this is backwards compatible)
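Term is a concise way of specifying conditions for queries, e.g.

    Term(dict(field = 'index', op = '>', value = '20121114'))
    Term('index', '20121114')
    Term('index', '>', '20121114')
    Term('index', ['20121114','20121114'])
    Term('index', datetime(2012,11,14))
    Term('index>20121114')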
updated tests for same
this should close GH PyTables enhancements for selection #1996
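On the row-removal point: one plausible reason consecutive rows are faster to remove is that a contiguous run can be deleted with a single range operation instead of one call per row. The sketch below only shows the grouping step, under that assumption. `consecutive_runs` is a hypothetical helper, not a function from this PR.

```python
from typing import Iterable, List, Tuple


def consecutive_runs(rows: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse row numbers into half-open (start, stop) runs of consecutive rows."""
    runs: List[Tuple[int, int]] = []
    for row in sorted(set(rows)):
        if runs and runs[-1][1] == row:
            # The row directly follows the current run, so extend that run.
            runs[-1] = (runs[-1][0], row + 1)
        else:
            # A gap was found, so start a new run.
            runs.append((row, row + 1))
    return runs


print(consecutive_runs([3, 4, 5, 9, 10, 20]))
```

Each run could then be handed to a range-removal call (PyTables 3.x exposes `Table.remove_rows(start, stop)`), working from the last run to the first so that earlier row numbers stay valid.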
added docs for HDF5 table in io.html
append on a table that didn't exist was failing (because of testing of the index_kind attribute first - which may not exist)
fixed & added test
added create_table_index method to create indices on tables (which, btw now works quite well as Int64 indices are used as opposed to the Time64Col which has a bug); includes a check on the pytables version requirement
this should close GH Add option to create indexes in HDFStore if user is using PyTables Pro / PyTables 2.3+ #698
added min_itemsize as a parameter to append; allows bigger default indexer columns upon table creation (even if you don't append something that big - but might later, avoid the truncation issue)
incorporated 0.9.1 whatsnew docs for where & mask into Indexing Section of main docs
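On the min_itemsize item above: the truncation issue comes from the column width being fixed when the table is created. The toy model below illustrates that behaviour. `FixedWidthColumn` is a hypothetical class, not PyTables or pandas code, and silent truncation is assumed from the wording of the description.

```python
from typing import Iterable, List


class FixedWidthColumn:
    """Toy fixed-width string column (hypothetical; not PyTables or pandas)."""

    def __init__(self, first_batch: Iterable[str], min_itemsize: int = 0) -> None:
        batch = list(first_batch)
        longest = max((len(s) for s in batch), default=0)
        # The width is decided once, at creation time, and never grows.
        self.itemsize = max(longest, min_itemsize)
        self.values: List[str] = []
        self.append(batch)

    def append(self, batch: Iterable[str]) -> None:
        # Assumed behaviour: anything longer than itemsize is cut to fit.
        self.values.extend(s[: self.itemsize] for s in batch)


narrow = FixedWidthColumn(["ab", "cd"])
narrow.append(["efgh"])            # longer than the width fixed at creation

wide = FixedWidthColumn(["ab", "cd"], min_itemsize=8)
wide.append(["efgh"])              # fits, because the width was reserved up front

print(narrow.values, wide.values)
```

Reserving the width at creation is the design choice the parameter exposes: it costs some space per row but avoids rebuilding the table when longer values arrive later.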