ENH: added support for data column queries #2561
@wesm I think this is ready to merge. Also, if at all possible, once merged, could you put up a dev build for 2.7 amd-64 on the site for testing?
@wesm done adding things, ready to merge when you are
Hey Jeff, one problem here is that the legacy test file is about 10 megs. I don't want to bloat the size of the git repo or source archive if at all possible. Fixing it may not be so bad using the interactive rebase approach described here: http://stackoverflow.com/questions/2100907/how-to-purge-a-huge-file-from-commits-history-in-git If you don't have time I could take a crack at it. Will need a smaller test h5 file, though, I guess.
I can make a smaller one, np. Will repost in a few.
On Dec 28, 2012, at 8:59 AM, Wes McKinney wrote:
Posted a revised file, much smaller now.
…rches on the actual columns of the data); added nan_rep for supporting string columns with nan's in them; performance enhancements on string columns; more tests & docs for data columns
…he same type) e.g. self.store.select('df', [ Term('string', '=', 'foo'), Term('string2=foo'), Term('A>0'), Term('B<0') ])
…rror (in cases of unicode/datetime64/date)
…e passed to append/put)
added parameter chunksize to append; writing now occurs in chunks, significantly reducing memory usage
add expectedrows keyword to append to give pytables an estimate of the total rows in a new table; add start/stop keywords as selection criteria to limit searches to these rows; added multi-index support for dataframes; docs/tests for the above
… examples confusing
…n the results from a selector table. this allows one to potentially put the data you really want to index in a single table, and your actual (wide) data in another to speed queries
renamed keyword 'columns' to 'data_columns' when passed to 'append' (to avoid confusion with 'columns' keyword in select)
changed to use simpler cython routine to avoid copying
…eation at append time
…ndexable or data column w/o selecting the entire table
… make nomenclature consistent for compression. doc updates for compression
ok I did the interactive rebase
Oops. I had already done the rebase. All is good and merged now, closing the PR
data_columns to append, which allows queries on those columns, e.g. store.select('df_dc',[ Term('B>0'), Term('string=foo') ]) (where B and string are columns in the frame)
nan_rep for supporting string columns with nan's in them
unique for fast retrieval of indexables or data columns
index=True to automagically create indices on all indexables and data columns!
chunksize parameter to append to allow write chunking, significantly lowering memory usage on writes
expectedrows parameter to append to allow specification of the TOTAL expected rows in a table (to optimize performance)
start/stop parameters to select to allow limiting of the selection space
append_to_multiple, select_as_multiple and select_as_coordinates methods to support multiple-table creation & selection
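
A minimal sketch of how the features listed above might be used together. It assumes a later pandas release with PyTables installed, where query terms are passed as `where` strings and not as the `Term` objects used in this PR. The function names, file paths, table keys ("selector", "wide") and sample data are made up for illustration.

```python
import pandas as pd


def demo_data_columns(path):
    """Write a frame with data columns, then query it (illustrative only)."""
    df = pd.DataFrame(
        {
            "A": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5],
            "B": [-1.0, 1.0, 2.0, -2.0, 3.0, -3.0, 4.0, 5.0],
            "string": ["foo", "bar"] * 4,
        }
    )
    with pd.HDFStore(path, mode="w") as store:
        # data_columns makes B and string queryable on disk;
        # chunksize writes in batches, expectedrows hints the final table size.
        store.append(
            "df_dc",
            df,
            data_columns=["B", "string"],
            chunksize=4,
            expectedrows=len(df),
        )
        # Query on the data columns (string form of Term('B>0'), Term('string=foo')).
        hits = store.select("df_dc", where='B > 0 & string == "foo"')
        # start/stop restrict the selection to a row range.
        first_rows = store.select("df_dc", start=0, stop=3)
        # Row numbers of the matches, without loading the full rows.
        coords = store.select_as_coordinates("df_dc", "B > 0")
    return hits, first_rows, coords


def demo_multiple_tables(path):
    """Split a frame across a narrow selector table and a wide table."""
    df = pd.DataFrame(
        {
            "A": [1.0, -1.0, 2.0, -2.0],
            "B": [10.0, 20.0, 30.0, 40.0],
            "C": [0.1, 0.2, 0.3, 0.4],
            "D": [5.0, 6.0, 7.0, 8.0],
        }
    )
    with pd.HDFStore(path, mode="w") as store:
        # A and B go to the selector table; None sends the remaining columns to "wide".
        store.append_to_multiple(
            {"selector": ["A", "B"], "wide": None}, df, selector="selector"
        )
        # The query runs against the selector table; matching rows are
        # then pulled from both tables and joined column-wise.
        result = store.select_as_multiple(
            ["selector", "wide"], where="A > 0", selector="selector"
        )
    return result
```

Calling `demo_data_columns("example.h5")` returns the rows matching the query, the first three rows, and the coordinates of the rows where `B > 0`. `demo_multiple_tables("example_multi.h5")` returns the joined rows where `A > 0`.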