ENH: PyTables Enhancements for future #2391

jreback · 2012-11-29T17:20:14Z

open (not in any particular order)

add support for other dtypes in table columns (datetime,date,unicode)
Implement variable length strings in a parallel VLArray (and synchronize): Support a VLStringCol PyTables/PyTables#198
revisit Term syntax - can we do better / more readability?
3a. implement or in Terms (maybe use pyparsing like syntax)
implement WORMTable
one big area is to test whether data columns really are slower; if they are not, it may make sense to make data_columns=True the default (but not necessarily index them). see https://groups.google.com/forum/m/?fromgroups#!topic/pydata/cmw1F3OFJSc - the perf tests at the end of that post suggest this is prob not a good idea after all
add export function, to export to different PyTables formats (an easy to read table for R (partially done), and output a GenericTable)
provide better access to columns that are data_columns (as we can directly select them) - see read_column, expand this to the entire table (if possible). This allows one to avoid selecting all columns in a table (and then reindexing); it works if the columns argument is provided to select or inferred from the where.
add out-of-core computation support (see my comment about 1/2 down in pandas converts int32 to int64 #622), this is partially supported now that we have an iterator (ENH: support iteration on returned results in select and select_as_multiple in HDFStore #3078)
add a method to create a table structure (create_table?) w/o actually appending, so one doesn't have to add params in each call to append.
Support a better mechanism for table splitting Splitter? that a user can specify how to split (rather than a dict); then store this object, so can automatically recreate the resulting table (enable for both Storer and Table objects)
Optimize table appending, I think we can do better! (GH PERF: HDFStore table writing performance improvements #3537) makes some improvements
allow itemsize='truncate' to allow subsequent appends to proceed with string truncation (on specific columns)
allow where in select_column, return a properly indexed Series, add option to include the index (use_index=True?)
Better deal with a very long list as input to a Term, by running multiple or sub-queries
Add support for column oriented tables, dep is carray, http://carray.pytables.org/docs/manual/
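
The "very long list as input to a Term" item above could be approached by splitting the list into batches and running one sub-query per batch. The following is only a sketch of that idea in plain Python: batched, select_in_batches and run_query are hypothetical names, and run_query stands in for whatever actually performs the select against the store (here a toy in-memory table).

```python
from itertools import islice


def batched(values, size):
    """Yield successive lists of at most `size` items from `values`."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(values)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def select_in_batches(run_query, column, values, size=30):
    """Run one sub-query per batch and concatenate the resulting rows.

    `run_query(column, batch)` is a hypothetical callable; in a real
    setting it would wrap a select with a Term built from `batch`.
    """
    rows = []
    for batch in batched(values, size):
        rows.extend(run_query(column, batch))
    return rows


# Toy in-memory stand-in for a table, used only to exercise the sketch.
table = [{"id": i, "value": i * i} for i in range(100)]


def run_query(column, batch):
    wanted = set(batch)
    return [row for row in table if row[column] in wanted]


result = select_in_batches(run_query, "id", list(range(0, 100, 3)), size=10)
```

Note that if the input list contains the same value in two different batches, the matching rows come back twice, so a real implementation would likely de-duplicate the input first.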

done

DONE (GH Pytables support for hierarchical keys #2401): access store paths via path notation / dot notation (GH BUG: issue in HDFStore with too many selectors in a where #2755)
DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): add to docs (GH Different HDFStores in multiple threads crashes Python #2397) - issues about reading/writing concurrently in threads/processes
http://sourceforge.net/mailarchive/message.php?msg_id=30190886
DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): support panelnd (GH Panelnd #2242)
DONE (GH ENH: added support for data column queries #2561): Should DataFrames be automagically indexed on 'index' (prob yes), but then should have a flag in append/put, and enable passing of the indexing options
DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): Check if create_table_index changes the current index if different options are passed
DONE (GH ENH: added support for data column queries #2561): for writing add chunk keyword to select to provide generator like behavior - each call to return the next chunk of data
DONE (GH ENH: added support for data column queries #2561): support multi indexes on tables
5a. DONE real dtype integration is coming on PR ENH/BUG/DOC: allow propogation and coexistance of numeric dtypes #2708 (eg even though 0.10.1 will actually read/write float32 columns u can't really do much with them w/o having them upcasted) - in any event I think HDFStore will accommodate this already. but more testing needed
DONE iterator support in select, http://stackoverflow.com/questions/14614512/merging-two-tables-with-millions-of-rows-in-python (GH ENH: support iteration on returned results in select and select_as_multiple in HDFStore #3078)
DONE (GH ENH: HDFStore enhancements #3531) support timezones in datelike columns (index should be ok already) (scott?), (GH PyTables dates don't work when you switch to a different time zone #2852)
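
The chunk/iterator items above amount to paging through a table in row ranges, with each step returning the next chunk. A minimal sketch of that paging logic in plain Python, independent of HDFStore; iter_chunks is a hypothetical helper, and the assumption is that each (start, stop) pair would be handed to a select call that accepts a row range:

```python
def iter_chunks(nrows, chunksize):
    """Yield (start, stop) row ranges that cover `nrows` rows in order."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, nrows, chunksize):
        yield start, min(start + chunksize, nrows)


# Example: 10 rows read 4 at a time; the last chunk is shorter.
ranges = list(iter_chunks(10, 4))
```

Because this is a generator, only one chunk's bounds (and, in real use, one chunk's data) need to exist at a time, which is what makes the out-of-core style of processing possible.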


gerigk · 2012-11-29T17:26:54Z

what about allowing creation/access of groups by using "/" in the key.

i.e.,

store.put('some/path/to/df', df)

would create/access the groups some, path, to and finally df.

Right now I can only save the data on one level within an hdf5 file
although HDF5/PyTables supports access by file system like paths.
It would not break anything since the occurrence of a '/' raises an
exception right now.
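
To make the proposal concrete, here is a rough sketch of how a key such as 'some/path/to/df' could be broken into the intermediate groups and the final node name. It is plain Python for illustration only; split_key and group_paths are hypothetical helpers, not part of pandas or PyTables.

```python
def split_key(key):
    """Split a '/'-separated key into (list of group names, leaf name)."""
    parts = [p for p in key.strip("/").split("/") if p]
    if not parts:
        raise ValueError("key must contain at least one name")
    return parts[:-1], parts[-1]


def group_paths(key):
    """Return the absolute path of every intermediate group, outermost first."""
    groups, _leaf = split_key(key)
    paths = []
    current = ""
    for name in groups:
        current = current + "/" + name
        paths.append(current)
    return paths


groups, leaf = split_key("some/path/to/df")
paths = group_paths("some/path/to/df")
```

With this split, a store would create any missing groups in paths order and then write the object under the leaf name inside the last one.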

On Thu, Nov 29, 2012 at 6:20 PM, jreback [email protected] wrote:

add support for other dtypes in table columns
(datetime64,datetime,date,unicode)

support min_itemsize for table columns (currently supported only in
indexers) also might be a better way of doing this (e.g. have the info
attached to a dataframe, or support a global pandas option to provide a
minimum)

revisit Term syntax - can we do better / more readability?

implement WORMTable


jreback · 2012-11-29T17:48:17Z

good idea...shouldn't be too hard to implement

scottkidder · 2013-01-30T01:26:20Z

Here are things that are most interesting/beneficial to my current workload:

Full Float32 support & full pandas dtype support
WORMTable (unsure of implementation or performance gains)
data_columns is very useful and I can do more testing to determine how fast/slow they are.
read_column would also be very useful in many instances.

I like the way Terms work. Is there support for ORing Terms or other logical operations in the Selection?

I can pick up work on any of these issues, but I would absolutely like to discuss some of the details first.

jreback · 2013-01-30T04:22:32Z

Scott send me an email and I'll send u offline so we can correspond
[email protected]

alvorithm · 2013-02-07T10:18:38Z

Term language: perhaps it makes sense to piggyback on existing syntax. SQL comes to mind, but also XESAM (the whole http://xesam.org is down at the moment, but one can get the gist of it here: http://banshee.fm/support/guide/searching/).

alvorithm · 2013-02-07T13:05:18Z

It would be nice if attribute access (e.g. store.df) could be enabled for all the leaves that have suitable names. This might require a big API overhaul, though (store.df.append ...).

jreback · 2013-02-07T13:14:42Z

see #2485, this is actually somewhat easy in HDFStore; the problem is that pandas in general doesn't propagate these attributes. you can easily store/retrieve attributes on the nodes themselves if you want

something like:

s = store.get_storer('df')
s.attrs['my_attribute'] = 1

jreback · 2013-02-07T13:17:04Z

sorry...misunderstood your comment....(thought you meant saving attributes)

attribute access on the store is not a big deal, will add to the list

alvorithm · 2013-02-07T15:51:10Z

Thank you for considering this, dotted access will save my pinky a lot of strain [''] (dead keys b/c need accents...).

Regarding attributes on DFs actually this would preempt a number of cases for specialization of DataFrame (see recent MetaDataFrame PR #2695) and in particular perhaps support the addition for metadata that would facilitate automated merges (foreign keys...).

EDIT: there was a discussion about this topic in the mailing list

jreback · 2013-02-07T17:12:04Z

see #2755 , was pretty easy to add dotted access, so i did!

jreback · 2013-03-12T15:47:01Z

@scottkidder did you get a chance to look at issue 13. #2852

jreback · 2016-07-25T23:37:46Z

dated

jreback mentioned this issue Dec 1, 2012

Pytables support for hierarchical keys #2401

Closed

jreback mentioned this issue Jan 24, 2013

BUG: HDFStore fixes #2675

Merged

jreback mentioned this issue Mar 12, 2013

PyTables dates don't work when you switch to a different time zone #2852

Closed

jreback mentioned this issue Mar 23, 2013

savez method for DataFrame, Series: porting data between python2 and python3 #3151

Closed

jreback mentioned this issue Apr 30, 2013

Cannot append DataFrames with uint dtypes to HDFStore #3493

Closed

jreback closed this as completed Jul 25, 2016

jorisvandenbossche modified the milestones: No action, Someday Jul 26, 2016
