Commit e6e7c0f

committed
Merge PR #2675
2 parents f3a02a4 + cadca37 commit e6e7c0f

8 files changed

+284
-49
lines changed

RELEASE.rst

+10-1
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,7 @@ pandas 0.10.1
5252
- added method ``unique`` to select the unique values in an indexable or data column
5353
- added method ``copy`` to copy an existing store (and possibly upgrade)
5454
- show the shape of the data on disk for non-table stores when printing the store
55+
- added ability to read PyTables flavor tables (allows compatibility with other HDF5 systems)
5556
- Add ``logx`` option to DataFrame/Series.plot (GH2327_, #2565)
5657
- Support reading gzipped data from file-like object
5758
- ``pivot_table`` aggfunc can be anything used in GroupBy.aggregate (GH2643_)
@@ -72,7 +73,9 @@ pandas 0.10.1
7273
- handle correctly ``Term`` passed types (e.g. ``index<1000``, when index
7374
is ``Int64``), (closes GH512_)
7475
- handle Timestamp correctly in data_columns (closes GH2637_)
75-
76+
- contains correctly matches on non-natural names
77+
- correctly store ``float32`` dtypes in tables (if not other float types in
78+
the same table)
7679
- Fix DataFrame.info bug with UTF8-encoded columns. (GH2576_)
7780
- Fix DatetimeIndex handling of FixedOffset tz (GH2604_)
7881
- More robust detection of being in IPython session for wide DataFrame
@@ -100,6 +103,7 @@ pandas 0.10.1
100103
- refactored HDFStore to deal with non-table stores as objects, which will allow future enhancements
101104
- removed keyword ``compression`` from ``put`` (replaced by keyword
102105
``complib`` to be consistent across library)
106+
- warn with ``PerformanceWarning`` if you are attempting to store types that will be pickled by PyTables
103107

104108
.. _GH512: https://github.com/pydata/pandas/issues/512
105109
.. _GH1277: https://github.com/pydata/pandas/issues/1277
@@ -115,12 +119,17 @@ pandas 0.10.1
115119
.. _GH2631: https://github.com/pydata/pandas/issues/2631
116120
.. _GH2633: https://github.com/pydata/pandas/issues/2633
117121
.. _GH2637: https://github.com/pydata/pandas/issues/2637
122+
<<<<<<< HEAD
118123
.. _GH2668: https://github.com/pydata/pandas/issues/2668
119124
.. _GH2690: https://github.com/pydata/pandas/issues/2690
120125
.. _GH2692: https://github.com/pydata/pandas/issues/2692
121126
.. _GH2698: https://github.com/pydata/pandas/issues/2698
122127
.. _GH2699: https://github.com/pydata/pandas/issues/2699
123128
.. _GH2700: https://github.com/pydata/pandas/issues/2700
129+
||||||| merged common ancestors
130+
=======
131+
.. _GH2694: https://github.com/pydata/pandas/issues/2694
132+
>>>>>>> jreback/pytables_update6
124133

125134
pandas 0.10.0
126135
=============

doc/source/io.rst

+30-6
Original file line numberDiff line numberDiff line change
@@ -1211,7 +1211,7 @@ You can create/modify an index for a table with ``create_table_index`` after dat
12111211
12121212
Query via Data Columns
12131213
~~~~~~~~~~~~~~~~~~~~~~
1214-
You can designate (and index) certain columns that you want to be able to perform queries (other than the `indexable` columns, which you can always query). For instance say you want to perform this common operation, on-disk, and return just the frame that matches this query.
1214+
You can designate (and index) certain columns on which you want to be able to perform queries (other than the `indexable` columns, which you can always query). For instance, say you want to perform this common operation on-disk and return just the frame that matches the query. You can specify ``data_columns = True`` to force all columns to be data_columns.
12151215

12161216
.. ipython:: python
12171217
@@ -1260,7 +1260,7 @@ To retrieve the *unique* values of an indexable or data column, use the method `
12601260
12611261
concat([ store.select('df_dc',c) for c in [ crit1, crit2 ] ])
12621262
1263-
**Table Object**
1263+
**Storer Object**
12641264

12651265
If you want to inspect the stored object, retrieve it via ``get_storer``. You could use this programmatically to, say, get the number of rows in an object.
12661266

@@ -1363,17 +1363,40 @@ Notes & Caveats
13631363
# we have provided a minimum minor_axis indexable size
13641364
store.root.wp_big_strings.table
13651365

1366-
Compatibility
1367-
~~~~~~~~~~~~~
1366+
External Compatibility
1367+
~~~~~~~~~~~~~~~~~~~~~~
1368+
1369+
``HDFStore`` writes storer objects in specific formats suitable for producing loss-less roundtrips to pandas objects. For external compatibility, ``HDFStore`` can read native ``PyTables`` format tables. It is possible to write an ``HDFStore`` object that can easily be imported into ``R`` using the ``rhdf5`` library. Create a table format store like this:
1370+
1371+
.. ipython:: python
1372+
1373+
store_export = HDFStore('export.h5')
1374+
store_export.append('df_dc',df_dc,data_columns=df_dc.columns)
1375+
store_export
1376+
1377+
.. ipython:: python
1378+
:suppress:
1379+
1380+
store_export.close()
1381+
import os
1382+
os.remove('export.h5')
1383+
1384+
Backwards Compatibility
1385+
~~~~~~~~~~~~~~~~~~~~~~~
13681386

13691387
0.10.1 of ``HDFStore`` is backwards compatible for reading tables created in a prior version of pandas; however, query terms using the prior (undocumented) methodology are unsupported. ``HDFStore`` will issue a warning if you try to use a prior-version format file. To take advantage of the updates, you must read in the entire file and write it out in the new format using the method ``copy``. The group attribute ``pandas_version`` contains the version information. ``copy`` takes a number of options; please see the docstring.
13701388

13711389

1390+
.. ipython:: python
1391+
:suppress:
1392+
1393+
import os
1394+
legacy_file_path = os.path.abspath('source/_static/legacy_0.10.h5')
1395+
13721396
.. ipython:: python
13731397
13741398
# a legacy store
1375-
import os
1376-
legacy_store = HDFStore('legacy_0.10.h5', 'r')
1399+
legacy_store = HDFStore(legacy_file_path,'r')
13771400
legacy_store
13781401
13791402
# copy (and return the new handle)
@@ -1397,6 +1420,7 @@ Performance
13971420
- You can pass ``chunksize=an integer`` to ``append``, to change the writing chunksize (default is 50000). This will significantly lower your memory usage on writing.
13981421
- You can pass ``expectedrows=an integer`` to the first ``append``, to set the TOTAL number of rows that ``PyTables`` will expect. This will optimize read/write performance.
13991422
- Duplicate rows can be written to tables, but are filtered out in selection (with the last items being selected; thus a table is unique on major, minor pairs)
1423+
- A ``PerformanceWarning`` will be raised if you are attempting to store types that will be pickled by PyTables (rather than stored as native types). See <http://stackoverflow.com/questions/14355151/how-to-make-pandas-hdfstore-put-operation-faster/14370190#14370190> for more information and some solutions.
14001424

14011425
Experimental
14021426
~~~~~~~~~~~~
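
The ``data_columns = True`` option added to the io.rst text in this diff can be sketched as follows. This is a minimal illustration, not code from the commit: it assumes a recent pandas with PyTables installed and uses the string ``where`` syntax of current pandas rather than the ``Term`` objects of 0.10.1. The file name ``example.h5`` and the frame contents are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data: 8 rows, 3 float columns.
df = pd.DataFrame(
    np.random.default_rng(0).standard_normal((8, 3)),
    columns=["A", "B", "C"],
)

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with pd.HDFStore(path, mode="w") as store:
    # data_columns=True makes every column a data column,
    # so each one can be used in an on-disk query.
    store.append("df_dc", df, data_columns=True)
    # Only the rows matching the query are read back from disk.
    result = store.select("df_dc", where="B > 0 & C > 0")

# The same selection done in memory, for comparison.
expected = df[(df.B > 0) & (df.C > 0)]
```

Designating every column as a data column is convenient, but each data column is stored and indexed separately, so it is usually worth restricting ``data_columns`` to the columns you actually query.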

doc/source/v0.10.1.txt

+3
Original file line numberDiff line numberDiff line change
@@ -119,12 +119,15 @@ Multi-table creation via ``append_to_multiple`` and selection via ``select_as_mu
119119

120120
**Enhancements**
121121

122+
- ``HDFStore`` now can read native PyTables table format tables
122123
- You can pass ``nan_rep = 'my_nan_rep'`` to append, to change the default nan representation on disk (which converts to/from `np.nan`), this defaults to `nan`.
123124
- You can pass ``index`` to ``append``. This defaults to ``True``. This will automagically create indices on the *indexables* and *data columns* of the table
124125
- You can pass ``chunksize=an integer`` to ``append``, to change the writing chunksize (default is 50000). This will significantly lower your memory usage on writing.
125126
- You can pass ``expectedrows=an integer`` to the first ``append``, to set the TOTAL number of rows that ``PyTables`` will expect. This will optimize read/write performance.
126127
- ``Select`` now supports passing ``start`` and ``stop`` to provide selection space limiting in selection.
127128

129+
**Bug Fixes**
130+
- ``HDFStore`` tables can now store ``float32`` types correctly (cannot be mixed with ``float64`` however)
128131

129132
See the `full release notes
130133
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker

pandas/core/reshape.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -835,4 +835,4 @@ def block2d_to_blocknd(values, items, shape, labels, ref_items=None):
835835
def factor_indexer(shape, labels):
836836
""" given a tuple of shape and a list of Factor labels, return the expanded label indexer """
837837
mult = np.array(shape)[::-1].cumprod()[::-1]
838-
return np.sum(np.array(labels).T * np.append(mult, [1]), axis=1).T
838+
return com._ensure_platform_int(np.sum(np.array(labels).T * np.append(mult, [1]), axis=1).T)
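
To make the ``factor_indexer`` change concrete, here is a standalone sketch of the same computation with NumPy only. It is an illustration under stated assumptions rather than the pandas implementation: ``com._ensure_platform_int`` is a pandas internal, and the cast to ``np.intp`` below is an assumed stand-in for it. The example labels and the full shape ``(2, 3, 4)`` are hypothetical.

```python
import numpy as np


def factor_indexer(shape, labels):
    """Flatten per-axis integer labels into one row-major position each."""
    # Strides for every axis except the last: reversed cumulative
    # product of the trailing dimensions.
    mult = np.array(shape)[::-1].cumprod()[::-1]
    # The last axis has stride 1.
    strides = np.append(mult, [1])
    flat = np.sum(np.array(labels).T * strides, axis=1).T
    # Assumption: np.intp stands in for com._ensure_platform_int.
    return flat.astype(np.intp)


# Hypothetical labels for a full shape of (2, 3, 4). The function
# receives the trailing dimensions (3, 4), one fewer than the labels.
labels = [np.array([0, 1, 1]), np.array([2, 0, 1]), np.array([3, 1, 0])]
indexer = factor_indexer((3, 4), labels)

# Cross-check against NumPy's own row-major flattening.
reference = np.ravel_multi_index(labels, (2, 3, 4))
```

Here the strides are ``[12, 4, 1]``, so the first row maps to ``0*12 + 2*4 + 3 = 11``. Returning a platform-sized integer array presumably keeps the result usable as an indexer on platforms where the default integer width differs from the pointer width.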
