ENH: added support for data column queries #2561

Closed
wants to merge 35 commits into from

Conversation

jreback
Contributor

@jreback jreback commented Dec 18, 2012

  • can construct searches on the actual columns of the data, by passing keyword data_columns to append
    e.g. store.select('df_dc',[ Term('B>0'), Term('string=foo') ])
    (where B and string are columns in the frame)
  • added nan_rep for supporting string columns with nan's in them
  • correctly determine dtypes of data columns that we cannot deal with (date,unicode), raise UnImplementedError
  • added support for datetime64 in columns
  • added method unique for fast retrieval of indexables or data columns
  • added keyword index=True to automagically create indices on all indexables and data columns!
  • performance enhancements on string columns
  • added chunksize parameter to append to allow write chunking, significantly lower memory usage on writes
  • added expectedrows parameter to append to allow specification of the TOTAL expected rows in a table (to optimize performance)
  • added start/stop parameters to select to allow limiting of the selection space
  • added multi-index dataframe support (closes GH #1277)
  • added append_to_multiple, select_as_multiple and select_as_coordinates methods to support multiple-table creation & selection
  • removed compression kw from put, replaced with complib (to make more consistent across library)
  • Term parsing a little more robust
  • more tests & docs for data columns
  • added whatsnew 0.10.1 docs
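
The chunksize item above can be pictured with a small, library-free sketch. This is not the HDFStore code from this PR: `iter_chunks` and `append_in_chunks` are hypothetical helper names, and a plain Python list stands in for the on-disk table. The only point it illustrates is that writing in fixed-size slices keeps a single write small, whatever the total number of rows.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[int, float, str]


def iter_chunks(rows: Sequence[Row], chunksize: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most `chunksize` rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


def append_in_chunks(table: List[Row], rows: Sequence[Row], chunksize: int) -> int:
    """Append `rows` to `table` one chunk at a time; return the number of writes."""
    writes = 0
    for chunk in iter_chunks(rows, chunksize):
        table.extend(chunk)  # stand-in for one write call to the real table
        writes += 1
    return writes


# Ten rows written four at a time: two full chunks and one partial chunk.
rows = [(i, i * 0.5, "foo" if i % 2 else "bar") for i in range(10)]
table: List[Row] = []
n_writes = append_in_chunks(table, rows, chunksize=4)
```

In the real store the per-chunk write would also convert each slice to the table's on-disk layout, which is presumably where the memory saving mentioned above comes from.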

@jreback
Contributor Author

jreback commented Dec 20, 2012

@wesm i think this is ready to merge...also...if at all possible, once merged, can put up a dev build for 2.7 amd-64 on the site for testing....

@jreback
Contributor Author

jreback commented Dec 20, 2012

@wesm I believe these changes will allow you to close: GH #512, GH #1277
GH #698 and GH #2397 can also be closed (as of 0.10.0)

@jreback
Contributor Author

jreback commented Dec 23, 2012

@wesm done adding things - ready to merge when u r

@wesm
Member

wesm commented Dec 28, 2012

Hey jeff, one problem here is that the legacy test file is about 10 megs-- I don't want to bloat the size of the git repo or source archive if at all possible. fixing it may not be so bad using the interactive rebase approach described here:

http://stackoverflow.com/questions/2100907/how-to-purge-a-huge-file-from-commits-history-in-git

if you don't have time i could take a crack at it. will need a smaller test h5 file, though, i guess

@jreback
Contributor Author

jreback commented Dec 28, 2012

I can make smaller np

will repost in a few

@jreback
Contributor Author

jreback commented Dec 28, 2012

posted a revised file much smaller now
thxs

…rches on the actual columns of the data)

             added nan_rep for supporting string columns with nan's in them
             performance enhancements on string columns
             more tests & docs for data columns
…he same type)

     e.g. self.store.select('df', [ Term('string', '=', 'foo'), Term('string2=foo'), Term('A>0'), Term('B<0') ])
     added parameter chunksize to append, now writing occurs in chunks, significantly reducing memory usage
     add expectedrows keyword to append to give pytables an estimate of the total rows in a new table
     add start/stop keywords as selection criteria to limit searches to these rows
     added multi-index support for dataframes
     docs/tests for the above
…n the results from a selector table.

     this allows one to potentially put the data you really want to index in a single table, and your actual (wide)
     data in another to speed queries
     renamed keyword 'columns' to 'data_columns' when passed to 'append' (to avoid confusion with 'columns' keyword in select)
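
The selector-table idea in the commit message above can be sketched without any library. This is a conceptual model only, not the implementation in this PR: `select_coordinates` and `select_via_selector` are hypothetical names, the tables are in-memory lists, and it assumes every table stores the same rows in the same order. The narrow selector table is queried for matching row numbers, and those row numbers are then used to pull the corresponding rows from every table.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Table = List[Row]


def select_coordinates(table: Table, predicate: Callable[[Row], bool]) -> List[int]:
    """Return the row numbers of `table` whose rows satisfy `predicate`."""
    return [i for i, row in enumerate(table) if predicate(row)]


def select_via_selector(
    tables: Dict[str, Table], selector: str, predicate: Callable[[Row], bool]
) -> Table:
    """Query only the selector table, then gather the matching rows from all tables."""
    coords = select_coordinates(tables[selector], predicate)
    merged_rows: Table = []
    for i in coords:
        merged: Row = {}
        for table in tables.values():
            merged.update(table[i])  # same row number in every table
        merged_rows.append(merged)
    return merged_rows


# Hypothetical data: a narrow table holding the queried columns, and a wide one.
tables = {
    "df_selector": [{"A": 1.0, "B": -0.5}, {"A": -2.0, "B": 0.3}, {"A": 0.7, "B": -1.1}],
    "df_wide": [{"C": 10.0, "D": 11.0}, {"C": 20.0, "D": 21.0}, {"C": 30.0, "D": 31.0}],
}
result = select_via_selector(
    tables, "df_selector", lambda r: r["A"] > 0 and r["B"] < 0
)
```

The predicate only ever touches the narrow table, so the wide data is read just for the rows that matched.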
@jreback
Contributor Author

jreback commented Dec 28, 2012

ok I did the interactive rebase
then added the new file
had to force update so hopefully merge works ok

wesm added a commit that referenced this pull request Dec 28, 2012
@wesm
Member

wesm commented Dec 28, 2012

Oops. I had already done the rebase. All is good and merged now, closing the PR

@wesm wesm closed this Dec 28, 2012
3 participants