doc/source/whatsnew/v0.16.2.txt

.. _whatsnew_0162:

v0.16.2 (June 12, 2015)
-----------------------

This is a minor bug-fix release from 0.16.1 and includes a a large number of
bug fixes along several new features, enhancements, and performance improvements.
We recommend that all users upgrade to this version.

Highlights include:

- Documentation on how to use ``numba`` with *pandas*, see :ref:`here <enhancingperf.numba>`
- A new ``pipe`` method, see :ref:`here <whatsnew_0162.enhancements.pipe>`

Check the :ref:`API Changes <whatsnew_0162.api>` before updating.

.. contents:: What's new in v0.16.2
    :local:
    :backlinks: none

.. _whatsnew_0162.enhancements:

New features
~~~~~~~~~~~~

.. _whatsnew_0162.enhancements.pipe:

Pipe
^^^^

We've introduced a new method :meth:`DataFrame.pipe`. As suggested by the name, ``pipe``
should be used to pipe data through a chain of function calls.
The goal is to avoid confusing nested function calls like

  .. code-block:: python

     # df is a DataFrame
     # f, g, and h are functions that take and return DataFrames
     f(g(h(df), arg1=1), arg2=2, arg3=3)

The logic flows from inside out, and function names are separated from their keyword arguments.
This can be rewritten as

  .. code-block:: python

     (df.pipe(h)
        .pipe(g, arg1=1)
        .pipe(f, arg2=2, arg3=3)
     )

Now both the code and the logic flow from top to bottom. Keyword arguments are next to
their functions. Overall the code is much more readable.

In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument.
When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple
of ``(function, keyword)`` indicating where the DataFrame should flow. For example:

.. ipython:: python

   import statsmodels.formula.api as sm

   bb = pd.read_csv('data/baseball.csv', index_col='id')

   # sm.poisson takes (formula, data)
   (bb.query('h > 0')
      .assign(ln_h = lambda df: np.log(df.h))
      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
      .fit()
      .summary()
   )

The pipe method is inspired by unix pipes, which stream text through
processes. More recently dplyr_ and magrittr_ have introduced the
popular ``(%>%)`` pipe operator for R_.

See the :ref:`documentation <basics.pipe>` for more. (:issue:`10129`)

.. _dplyr: https://github.com/hadley/dplyr
.. _magrittr: https://github.com/smbache/magrittr
.. _R: http://www.r-project.org

.. _whatsnew_0162.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

.. _whatsnew_0162.api:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0162.api_breaking:

.. _whatsnew_0162.api_breaking.other:

Other API Changes
^^^^^^^^^^^^^^^^^

- ``Holiday`` now raises ``NotImplementedError`` if both ``offset`` and ``observance`` are used in constructor instead of returning an incorrect result (:issue:`10217`).
- ``axis`` parameter of ``DataFrame.quantile`` now accepts also ``index`` and``column``. (:issue:`9543`)

- Adding empty ``DataFrame``s results in a ``DataFrame`` that ``.equals`` an empty ``DataFrame`` (:issue:`10181`)

.. _whatsnew_0162.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved ``Series.resample`` performance with dtype=datetime64[ns] (:issue:`7754`)
- Increase performance of ``str.split`` when ``expand=True`` (:issue:`10081`)

.. _whatsnew_0162.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug where read_hdf store.select modifies the passed columns list when
  multi-indexed (:issue:`7212`)
- Bug in ``Categorical`` repr with ``display.width`` of ``None`` in Python 3 (:issue:`10087`)

- Bug where some of the nan funcs do not have consistent return dtypes (:issue:`10251`)

- Bug in ``DataFrame.quantile`` on checking that a valid axis was passed (:issue:`9543`)

- Bug in groupby.apply aggregation for Categorical not preserving categories (:issue:`10138`)
- Bug in ``to_csv`` where ``date_format`` is ignored if the ``datetime`` is fractional (:issue:`10209`)
- Bug in ``DataFrame.to_json`` with mixed data types (:issue:`10289`)

- Bug in cache updating when consolidating (:issue:`10264`)

- Bug in ``mean()`` where integer dtypes can overflow (:issue:`10172`)
- Bug where Panel.from_dict does not set dtype when specified (:issue:`10058`)
- Bug in ``Index.union`` raises ``AttributeError`` when passing array-likes. (:issue:`10149`)
- Bug in ``Timestamp``'s' ``microsecond``, ``quarter``, ``dayofyear``, ``week`` and ``daysinmonth`` properties return ``np.int`` type, not built-in ``int``. (:issue:`10050`)
- Bug in ``NaT`` raises ``AttributeError`` when accessing to ``daysinmonth``, ``dayofweek`` properties. (:issue:`10096`)


- Bug in getting timezone data with ``dateutil`` on various platforms ( :issue:`9059`, :issue:`8639`, :issue:`9663`, :issue:`10121`)
- Bug in display datetimes with mixed frequencies uniformly; display 'ms' datetimes to the proper precision. (:issue:`10170`)

- Bug in ``Series`` arithmetic methods may incorrectly hold names (:issue:`10068`)

- Bug in ``GroupBy.get_group`` when grouping on multiple keys, one of which is categorical. (:issue:`10132`)

- Bug in ``DatetimeIndex`` and ``TimedeltaIndex`` names are lost after timedelta arithmetics ( :issue:`9926`)


- Bug in `Series.plot(label="LABEL")` not correctly setting the label (:issue:`10119`)

- Bug in `plot` not defaulting to matplotlib `axes.grid` setting (:issue:`9792`)

- Bug in ``Series.align`` resets ``name`` when ``fill_value`` is specified (:issue:`10067`)
- Bug in ``read_csv`` causing index name not to be set on an empty DataFrame (:issue:`10184`)
- Bug in ``SparseSeries.abs`` resets ``name`` (:issue:`10241`)
- Bug in ``TimedeltaIndex`` slicing may reset freq (:issue:`10292`)

- Bug in GroupBy.get_group raises ValueError when group key contains NaT (:issue:`6992`)
- Bug in ``SparseSeries`` constructor ignores input data name (:issue:`10258`)

- Bug where infer_freq infers timerule (WOM-5XXX) unsupported by to_offset (:issue:`9425`)
- Bug in ``DataFrame.to_hdf()`` where table format would raise a seemingly unrelated error for invalid (non-string) column names. This is now explicitly forbidden. (:issue:`9057`)
- Bug to handle masking empty ``DataFrame``(:issue:`10126`)
- Bug where MySQL interface could not handle numeric table/column names (:issue:`10255`)
- Bug in ``read_csv`` with a ``date_parser`` that returned a ``datetime64`` array of other time resolution than ``[ns]`` (:issue:`10245`).