.. _whatsnew_0130:

v0.13.0 (August ??, 2013)
-------------------------

This is a major release from 0.12.0 and includes several new features and
enhancements along with a large number of bug fixes.

.. warning::

   In 0.13.0 ``Series`` has internally been refactored to no longer subclass ``ndarray``
   but instead subclass ``NDFrame``, similarly to the rest of the pandas containers. This should be
   a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`.

API changes
~~~~~~~~~~~

  - ``read_excel`` now supports an integer in its ``sheetname`` argument giving
    the index of the sheet to read in (:issue:`4301`).
  - The text parser now treats any capitalization of ``inf`` ("inf", "Inf",
    "-Inf", "iNf", etc.) as infinity, affecting ``read_table``, ``read_csv``,
    etc. (:issue:`4220`, :issue:`4219`)
  - ``pandas`` is now Python 2/3 compatible without the need for 2to3, thanks to
    @jtratner. As a result, pandas now uses iterators more extensively. This
    also led to the introduction of substantial parts of Benjamin Peterson's
    ``six`` library into ``pandas.compat``. (:issue:`4384`, :issue:`4375`,
    :issue:`4372`)
  - ``pandas.util.compat`` and ``pandas.util.py3compat`` have been merged into
    ``pandas.compat``. ``pandas.compat`` now includes many functions allowing
    2/3 compatibility. It contains both list and iterator versions of range,
    filter, map and zip, plus other necessary elements for Python 3
    compatibility. ``lmap``, ``lzip``, ``lrange`` and ``lfilter`` all produce
    lists instead of iterators, for compatibility with ``numpy``, subscripting
    and ``pandas`` constructors. (:issue:`4384`, :issue:`4375`, :issue:`4372`)
  - Deprecated ``iterkv``, which will be removed in a future release (it was just
    an alias of ``iteritems`` used to get around ``2to3``'s changes).
    (:issue:`4384`, :issue:`4375`, :issue:`4372`)
  - ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)
  - ``HDFStore``

    - Significant table writing performance improvements
    - added an ``is_open`` property to indicate whether the underlying file handle is open;
      a closed store will now report 'CLOSED' when viewing the store (rather than raising an error)
      (:issue:`4409`)
    - closing an ``HDFStore`` now closes only that instance of the ``HDFStore``;
      the actual file is closed only once the reference count (kept by ``PyTables``) of all
      open handles to it reaches 0. Essentially, each ``HDFStore`` referenced by a variable is a
      local instance: once you close it, it reports closed, while other references to the same
      file continue to operate until they themselves are closed. Performing an action on a
      closed file will raise ``ClosedFileError``

      .. ipython:: python

         path = 'test.h5'
         df = DataFrame(randn(10,2))
         store1 = HDFStore(path)
         store2 = HDFStore(path)
         store1.append('df',df)
         store2.append('df2',df)

         store1
         store2
         store1.close()
         store2
         store2.close()
         store2

    - removed the ``_quiet`` attribute, replaced by a ``DuplicateWarning`` if retrieving
      duplicate rows from a table (:issue:`4367`)
    - removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
      be raised if you try to use ``mode='w'`` with an open file handle (:issue:`4367`)
    - allow a passed locations array or mask as a ``where`` condition (:issue:`4467`).
      See :ref:`here<io.hdf5-where_mask>` for an example.

      .. ipython:: python
         :suppress:

         import os
         os.remove(path)

  - Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
    ``labels``, and ``names``) (:issue:`4039`):

    .. code-block:: python

        # previously, you would have set levels or labels directly
        index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]

        # now, you use the set_levels or set_labels methods
        index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])

        # similarly, you can rename the object with set_names,
        # although setting names directly is not deprecated
        index = index.set_names(["bob", "cranberry"])

        # and all methods take an inplace kwarg
        index.set_names(["bob", "cranberry"], inplace=True)

  - Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna``/``ffill``/``bfill`` (:issue:`4604`)
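
The inf rule for the text parser can be mimicked in plain Python, whose built-in ``float()`` already accepts ``inf`` in any capitalization with an optional sign. The sketch below is not the pandas C tokenizer; ``convert_token`` is a hypothetical helper that only models how a single field would be converted under that rule.

.. code-block:: python

   import csv
   import io


   def convert_token(token):
       """Hypothetical field converter: int, then float, else leave as str."""
       text = token.strip()
       try:
           return int(text)
       except ValueError:
           pass
       try:
           # float() matches "inf"/"infinity" case-insensitively, with a sign
           return float(text)
       except ValueError:
           return text


   reader = csv.reader(io.StringIO("a,b\n1,inf\n2,-Inf\n3,iNf\n"))
   header = next(reader)
   rows = [[convert_token(tok) for tok in row] for row in reader]
   # rows -> [[1, inf], [2, -inf], [3, inf]]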
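
The list-producing helpers in ``pandas.compat`` can be sketched as thin wrappers that materialize the lazy Python 3 builtins. This is a minimal illustration assuming each helper simply calls ``list()`` on the builtin's result; the real module also covers Python 2 and may differ in detail.

.. code-block:: python

   # On Python 3, range/map/zip/filter return lazy objects,
   # so each helper wraps the builtin's result in list().
   def lrange(*args, **kwargs):
       return list(range(*args, **kwargs))


   def lzip(*args, **kwargs):
       return list(zip(*args, **kwargs))


   def lmap(*args, **kwargs):
       return list(map(*args, **kwargs))


   def lfilter(*args, **kwargs):
       return list(filter(*args, **kwargs))


   pairs = lzip("abc", lrange(3))
   # A real list supports indexing and len(), unlike the zip iterator
   first = pairs[0]
   squares = lmap(lambda x: x * x, lrange(4))
   evens = lfilter(lambda x: x % 2 == 0, lrange(6))

Returning lists rather than iterators is what lets the results be subscripted or handed to ``numpy`` and ``pandas`` constructors directly.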
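
The per-instance close behavior of ``HDFStore`` described above is essentially reference counting on the shared file. A stdlib-only toy model of that idea follows; it is not the ``HDFStore`` or ``PyTables`` code, and ``ToyStore``, ``_open_files`` and the local ``ClosedFileError`` class are hypothetical names used only to mirror the described behavior.

.. code-block:: python

   class ClosedFileError(Exception):
       """Local stand-in for the error raised when using a closed store."""


   # Hypothetical shared registry: path -> number of open handles to that file
   _open_files = {}


   class ToyStore:
       def __init__(self, path):
           self.path = path
           self.is_open = True
           _open_files[path] = _open_files.get(path, 0) + 1

       def append(self, key, value):
           if not self.is_open:
               raise ClosedFileError(f"{self.path} is closed")
           return key, value

       def close(self):
           if not self.is_open:
               return  # closing twice is a no-op in this model
           # Closing this instance only decrements the shared count...
           self.is_open = False
           _open_files[self.path] -= 1
           # ...and the file itself is released when no handles remain
           if _open_files[self.path] == 0:
               del _open_files[self.path]

       def __repr__(self):
           state = "OPEN" if self.is_open else "CLOSED"
           return f"ToyStore({self.path!r}, {state})"


   store1 = ToyStore("test.h5")
   store2 = ToyStore("test.h5")
   store1.close()
   file_open_after_first = "test.h5" in _open_files  # store2 still holds it
   store2.close()
   file_open_after_second = "test.h5" in _open_files  # count reached 0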

Enhancements
~~~~~~~~~~~~

  - ``read_html`` now raises a ``URLError`` instead of catching and raising a
    ``ValueError`` (:issue:`4303`, :issue:`4305`)
  - Added a test for ``read_clipboard()`` and ``to_clipboard()`` (:issue:`4282`)
  - Clipboard functionality now works with PySide (:issue:`4282`)
  - Added a more informative error message when plot arguments contain
    overlapping color and style arguments (:issue:`4402`)

  - ``timedelta64[ns]`` operations

    - A Series of dtype ``timedelta64[ns]`` can now be divided by another
      ``timedelta64[ns]`` object to yield a ``float64`` dtyped Series, i.e. a
      frequency conversion. See :ref:`here<timeseries.timedeltas_convert>` for the docs.

      .. ipython:: python

         from datetime import timedelta
         td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))
         td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))
         td[3] = np.nan
         td

         # to days
         td / np.timedelta64(1,'D')

         # to seconds
         td / np.timedelta64(1,'s')

    - Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series is now supported

      .. ipython:: python

         td * -1
         td * Series([1,2,3,4])

    - Absolute ``DateOffset`` objects can act equivalently to ``timedeltas``

      .. ipython:: python

         from pandas import offsets
         td + offsets.Minute(5) + offsets.Milli(5)
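
Outside pandas, the same arithmetic can be seen with the standard library's ``datetime.timedelta``. This is only an analogy under stated assumptions: plain lists of ``timedelta`` objects stand in for a ``timedelta64[ns]`` Series, missing values are not modelled, and ``timedelta(minutes=5)`` / ``timedelta(milliseconds=5)`` stand in for ``offsets.Minute(5)`` / ``offsets.Milli(5)``.

.. code-block:: python

   from datetime import timedelta

   # Same values as the first three rows of ``td`` above (the NaN row is omitted)
   deltas = [
       timedelta(days=31),
       timedelta(days=31),
       timedelta(days=31, minutes=5, seconds=3),
   ]

   # Frequency conversion: dividing by a unit duration yields a float
   in_days = [d / timedelta(days=1) for d in deltas]
   in_seconds = [d / timedelta(seconds=1) for d in deltas]

   # Multiplying or dividing by an integer keeps a timedelta
   negated = [d * -1 for d in deltas]
   scaled = [d * k for d, k in zip(deltas, [1, 2, 3])]
   halved = [d / 2 for d in deltas]

   # Fixed-length offsets such as 5 minutes and 5 milliseconds add like durations
   shifted = [d + timedelta(minutes=5) + timedelta(milliseconds=5) for d in deltas]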

.. _whatsnew_0130.refactoring:

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~

In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
which is currently the base class for ``DataFrame`` and ``Panel``, to unify methods
and behaviors. Series formerly subclassed directly from ``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)

.. warning::

   There are three potential incompatibilities with versions prior to 0.13.0

   - Using certain numpy functions would previously return a ``Series`` if passed a ``Series``
     as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``, and
     ``np.diff``. These now return ``ndarrays``.

     .. ipython:: python

        s = Series([1,2,3,4])

        # numpy usage
        np.ones_like(s)
        np.diff(s)

        # pandonic usage
        Series(1,index=s.index)
        s.diff()

   - Passing a ``Series`` directly to a cython function expecting an ``ndarray`` type will no
     longer work directly; you must pass ``Series.values``. See :ref:`Enhancing Performance<enhancingperf.ndarray>`

   - ``Series(0.5)`` would previously return the scalar ``0.5``; it now returns a 1-element ``Series``

- Refactor of ``series.py``/``frame.py``/``panel.py`` to move common code to ``generic.py``

  - added ``_setup_axes`` to create generic NDFrame structures
  - moved methods

    - ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
    - ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
    - ``convert_objects,as_blocks,as_matrix,values``
    - ``__getstate__,__setstate__`` (compat remains in frame/panel)
    - ``__getattr__,__setattr__``
    - ``_indexed_same,reindex_like,align,where,mask``
    - ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
    - ``filter`` (also added axis argument to selectively filter on a different axis)
    - ``reindex,reindex_axis`` (which was the biggest change to make generic)
    - ``truncate`` (moved to become part of ``NDFrame``)

- These are API changes which make ``Panel`` more consistent with ``DataFrame``

  - ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
  - support attribute access for setting
  - ``filter`` supports the same API as the original ``DataFrame`` ``filter``

- Reindex called with no arguments will now return a copy of the input object

- Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
  There are several minor changes that affect the API.

  - numpy functions that do not support the array interface will now
    return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.ones_like``
  - ``Series(0.5)`` would previously return the scalar ``0.5``; this is no
    longer supported
  - ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
    can be used to distinguish them (if desired)

- Refactor of Sparse objects to use BlockManager

  - Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
    and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
    more methods from their hierarchy (Series/DataFrame), and no longer inherit
    from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
  - The sparse suite now supports integration with non-sparse data. Non-float sparse
    data can be supported (partially implemented)
  - Operations on sparse structures within DataFrames should preserve sparseness;
    merging-type operations will convert to dense (and back to sparse), so they might
    be somewhat inefficient
  - enable setitem on ``SparseSeries`` for boolean/integer/slices
  - ``SparsePanels`` implementation is unchanged (i.e. it does not use BlockManager and needs work)

- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
  whether the underlying data is sparse or dense (as well as the dtype)

- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate various
  values to propagate to a new object from an existing one (e.g. ``name`` in ``Series`` will follow
  more automatically now)

- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
  without having to directly import the klass, courtesy of @jtratner

- Fixed a bug in Series update where the parent frame was not updating its cache based on
  changes (:issue:`4080`) or types (:issue:`3217`), or on ``fillna`` (:issue:`3386`)

- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)

- Refactored ``Series.reindex`` into ``core/generic.py`` (:issue:`4604`, :issue:`4618`), allowing ``method=``
  to work when reindexing a Series
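
The generated-class type checking mentioned above rests on a metaclass that overrides ``__instancecheck__``, so ``isinstance`` inspects a marker attribute on the object rather than its concrete class. The sketch below is a hypothetical, stripped-down version of that technique: the ``_typ`` marker, ``create_abc_type`` and the ``Fake*`` classes are illustrative names, not a description of the exact pandas internals.

.. code-block:: python

   def create_abc_type(name, attr, comp):
       """Build a class whose isinstance() test is ``getattr(obj, attr) in comp``."""

       class _ABCMeta(type):
           def __instancecheck__(cls, inst):
               # Classes themselves are excluded; only instances are matched
               return not isinstance(inst, type) and getattr(inst, attr, None) in comp

       return _ABCMeta(name, (), {})


   # Checks can be declared without importing the concrete classes
   ABCSeries = create_abc_type("ABCSeries", "_typ", ("series",))
   ABCDataFrame = create_abc_type("ABCDataFrame", "_typ", ("dataframe",))
   ABCNDFrame = create_abc_type("ABCNDFrame", "_typ", ("series", "dataframe"))


   # Concrete classes, possibly defined in another module, just set the marker
   class FakeSeries:
       _typ = "series"


   class FakeDataFrame:
       _typ = "dataframe"

Because the check only reads the marker attribute, code can test ``isinstance(obj, ABCSeries)`` without importing the module that defines the concrete class.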
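
The propagation idea behind ``_prop_attributes`` can be illustrated with a small pure-Python sketch. ``ToyNDFrame``, ``ToySeries`` and ``_propagate`` are hypothetical names; the point is only that a class-level list of attribute names lets every operation that builds a new object copy those attributes (such as a ``Series`` ``name``) from the original.

.. code-block:: python

   class ToyNDFrame:
       # Hypothetical: names of attributes that derived objects inherit
       _prop_attributes = []

       def __init__(self, values):
           self.values = list(values)

       def _propagate(self, new):
           # Copy each listed attribute from self onto the new object
           for attr in self._prop_attributes:
               setattr(new, attr, getattr(self, attr, None))
           return new

       def abs(self):
           # Any operation that builds a new object routes through _propagate
           return self._propagate(type(self)([abs(v) for v in self.values]))


   class ToySeries(ToyNDFrame):
       _prop_attributes = ["name"]

       def __init__(self, values, name=None):
           super().__init__(values)
           self.name = name


   s = ToySeries([-1, 2, -3], name="price")
   result = s.abs()
   # result.values -> [1, 2, 3]; result.name -> 'price'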

Bug Fixes
~~~~~~~~~

  - ``HDFStore`` raising an invalid ``TypeError`` rather than ``ValueError`` when appending
    with a different block ordering (:issue:`4096`)

  - The ``by`` argument now works correctly with the ``layout`` argument
    (:issue:`4102`, :issue:`4014`) in ``*.hist`` plotting methods

  - Fixed bug in ``PeriodIndex.map`` where using ``str`` would return the str
    representation of the index (:issue:`4136`)

  - Fixed a bug in ``pivot_table`` where margins were not computed if ``values`` is the index (:issue:`3334`)

  - Fixed test failure ``test_time_series_plot_color_with_empty_kwargs`` when
    using custom matplotlib default colors (:issue:`4345`)

  - Fix running of stata IO tests. Now uses temporary files to write
    (:issue:`4353`)

  - Fixed an issue where ``DataFrame.sum`` was slower than ``DataFrame.mean``
    for integer valued frames (:issue:`4365`)

  - ``read_html`` tests now work with Python 2.6 (:issue:`4351`)

  - Fixed bug where ``network`` testing was throwing ``NameError`` because a
    local variable was undefined (:issue:`4381`)

  - Suppressed a ``DeprecationWarning`` associated with internal calls issued by ``repr()`` (:issue:`4391`)

See the :ref:`full release notes
<release>` or the issue tracker
on GitHub for a complete list.