doc/source/v0.10.1.txt

.. _whatsnew_0101:

v0.10.1 (January ??, 2013)
---------------------------

This is a minor release from 0.10.0 and includes many new features and
enhancements along with a large number of bug fixes. There are also a number of
important API changes that long-time pandas users should pay close attention
to.

API changes
~~~~~~~~~~~

New features
~~~~~~~~~~~~

HDFStore
~~~~~~~~

You may need to upgrade your existing data files. Please visit the **compatibility** section in the main docs.


.. ipython:: python
   :suppress:
   :okexcept:

   os.remove('store.h5')

You can designate (and index) certain columns that you want to be able to perform queries on a table, by passing a list to ``data_columns``

.. ipython:: python

   store = HDFStore('store.h5')
   df = DataFrame(randn(8, 3), index=date_range('1/1/2000', periods=8),
              columns=['A', 'B', 'C'])
   df['string'] = 'foo'
   df.ix[4:6,'string'] = np.nan
   df.ix[7:9,'string'] = 'bar'
   df['string2'] = 'cool'
   df

   # on-disk operations
   store.append('df', df, data_columns = ['B','C','string','string2'])
   store.select('df',[ 'B > 0', 'string == foo' ])

   # this is in-memory version of this type of selection
   df[(df.B > 0) & (df.string == 'foo')]

Retrieving unique values in an indexable or data column.

.. ipython:: python

   store.unique('df','index')
   store.unique('df','string')

You can now store ``datetime64`` in data columns

.. ipython:: python

    df_mixed               = df.copy()
    df_mixed['datetime64'] = Timestamp('20010102')
    df_mixed.ix[3:4,['A','B']] = np.nan

    store.append('df_mixed', df_mixed)
    df_mixed1 = store.select('df_mixed')
    df_mixed1
    df_mixed1.get_dtype_counts()

You can pass ``columns`` keyword to select to filter a list of the return columns, this is equivalent to passing a ``Term('columns',list_of_columns_to_filter)``

.. ipython:: python

   store.select('df',columns = ['A','B'])

``HDFStore`` now serializes multi-index dataframes when appending tables.

.. ipython:: python

   index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                              ['one', 'two', 'three']],
                      labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                              [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                      names=['foo', 'bar'])
   df = DataFrame(np.random.randn(10, 3), index=index,
                  columns=['A', 'B', 'C'])
   df

   store.append('mi',df)
   store.select('mi')

   # the levels are automatically included as data columns
   store.select('mi', Term('foo=bar'))

Multi-table creation via ``append_to_multiple`` and selection via ``select_as_multiple`` can create/select from multiple tables and return a combined result, by using ``where`` on a selector table.

.. ipython:: python

   df_mt = DataFrame(randn(8, 6), index=date_range('1/1/2000', periods=8), 
                                  columns=['A', 'B', 'C', 'D', 'E', 'F'])
   df_mt['foo'] = 'bar'

   # you can also create the tables individually
   store.append_to_multiple({ 'df1_mt' : ['A','B'], 'df2_mt' : None }, df_mt, selector = 'df1_mt')
   store

   # indiviual tables were created
   store.select('df1_mt')
   store.select('df2_mt')
   
   # as a multiple
   store.select_as_multiple(['df1_mt','df2_mt'], where = [ 'A>0','B>0' ], selector = 'df1_mt')

.. ipython:: python
   :suppress:

   store.close()
   import os
   os.remove('store.h5')

**Enhancements**

- ``HDFStore`` now can read native PyTables table format tables
- You can pass ``nan_rep = 'my_nan_rep'`` to append, to change the default nan representation on disk (which converts to/from `np.nan`), this defaults to `nan`.
- You can pass ``index`` to ``append``. This defaults to ``True``. This will automagically create indicies on the *indexables* and *data columns* of the table
- You can pass ``chunksize=an integer`` to ``append``, to change the writing chunksize (default is 50000). This will signficantly lower your memory usage on writing.
- You can pass ``expectedrows=an integer`` to the first ``append``, to set the TOTAL number of expectedrows that ``PyTables`` will expected. This will optimize read/write performance.
- ``Select`` now supports passing ``start`` and ``stop`` to provide selection space limiting in selection.

**Bug Fixes**
- ``HDFStore`` tables can now store ``float32`` types correctly (cannot be mixed with ``float64`` however)

See the `full release notes
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
on GitHub for a complete list.