
.. _whatsnew_0170:

v0.17.0 (July 31, 2015)
-----------------------

This is a major release from 0.16.2 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

.. warning::

   pandas >= 0.17.0 will no longer support Python 3.2 (:issue:`9118`)

Highlights include:

  - Release the Global Interpreter Lock (GIL) on some Cython operations; see :ref:`here <whatsnew_0170.gil>`

Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsnew_0170.deprecations>` before updating.

.. contents:: What's new in v0.17.0
    :local:
    :backlinks: none

.. _whatsnew_0170.enhancements:

New features
~~~~~~~~~~~~

- SQL io functions now accept a SQLAlchemy connectable. (:issue:`7877`)
- Enable writing complex values to HDF stores when using table format (:issue:`10447`)

.. _whatsnew_0170.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^
- Enable ``read_hdf`` to be used without specifying a key when the HDF file contains a single dataset (:issue:`10443`)

- ``.as_blocks`` now takes an optional ``copy`` argument to return a copy of the data; the default is to copy (no change in behavior from prior versions) (:issue:`9607`)

- ``regex`` argument to ``DataFrame.filter`` now handles numeric column names instead of raising ``ValueError`` (:issue:`10384`).
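
As a minimal sketch of the ``DataFrame.filter`` change, assume a frame whose columns mix integer and string labels (the data is illustrative). In recent pandas versions the pattern is matched against the string form of each label, so integer column names can be selected as well:

.. code-block:: python

   import pandas as pd

   # Illustrative frame whose column labels mix integers and strings
   df = pd.DataFrame([[1, 2, 3]], columns=[0, 10, 'a1'])

   # Keep only the columns whose label contains the digit '1'
   result = df.filter(regex='1')
   print(list(result.columns))  # [10, 'a1']
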

.. _whatsnew_0170.api:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0170.api_breaking:

.. _whatsnew_0170.api_breaking.other:

.. _whatsnew_0170.api_breaking.convert_objects:

Changes to convert_objects
^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``DataFrame.convert_objects`` keyword arguments have been shortened. (:issue:`10265`)

=====================   =============
Old                     New
=====================   =============
``convert_dates``       ``datetime``
``convert_numeric``     ``numeric``
``convert_timedelta``   ``timedelta``
=====================   =============
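
Because the renaming is one-to-one, existing call sites can be migrated mechanically. The following is a minimal sketch using a hypothetical helper, ``rename_convert_objects_kwargs``, which is not part of pandas. It only renames keywords according to the table and does not handle the ``'coerce'`` values discussed in the next item:

.. code-block:: python

   # Old convert_objects keyword -> shortened keyword (from the table)
   RENAMED_KEYWORDS = {
       'convert_dates': 'datetime',
       'convert_numeric': 'numeric',
       'convert_timedelta': 'timedelta',
   }


   def rename_convert_objects_kwargs(kwargs):
       """Hypothetical helper: return a copy of kwargs with old names replaced.

       Keywords that do not appear in RENAMED_KEYWORDS pass through unchanged.
       """
       return {RENAMED_KEYWORDS.get(key, key): value
               for key, value in kwargs.items()}


   print(rename_convert_objects_kwargs({'convert_dates': True, 'copy': False}))
   # {'datetime': True, 'copy': False}
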

- Coercing types with ``DataFrame.convert_objects`` now requires the keyword
  argument ``coerce=True``. Previously, types were coerced by passing ``'coerce'``
  as the value of a conversion keyword, as in ``convert_dates='coerce'``.

  .. ipython:: python

    df = pd.DataFrame({'i': ['1', '2'],
                       'f': ['apple', '4.2'],
                       's': ['apple', 'banana']})
    df

  The old usage of ``DataFrame.convert_objects`` passed ``'coerce'`` as the value
  of the type-specific keyword:

  .. code-block:: python

    In [2]: df.convert_objects(convert_numeric='coerce')

  Now the ``coerce`` keyword must be passed explicitly:

  .. ipython:: python

    df.convert_objects(numeric=True, coerce=True)

- In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce
  numeric types when there were no values convertible to a numeric type. For example,

  .. code-block:: python

    In [1]: df = pd.DataFrame({'s': ['a','b']})
    In [2]: df.convert_objects(convert_numeric='coerce')
    Out[2]:
       s
    0  a
    1  b

  returns the original DataFrame with no conversion. This behavior has changed:
  with ``coerce=True``, all non-number-like strings are now converted to ``NaN``.

  .. ipython:: python

    df = pd.DataFrame({'s': ['a','b']})
    df.convert_objects(numeric=True, coerce=True)

- In earlier versions of pandas, the default behavior was to try to convert
  datetimes and timestamps. The new default is for ``DataFrame.convert_objects``
  to do nothing, so at least one conversion target (``datetime``, ``numeric``
  or ``timedelta``) must be passed in the method call.
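
On pandas versions where ``convert_objects`` is no longer available, the coercing numeric conversion described above can be approximated column by column with ``pd.to_numeric(..., errors='coerce')``. This is a sketch of a roughly equivalent approach, not an exact reproduction of ``convert_objects(numeric=True, coerce=True)``; it reuses the example frame from this section:

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({'i': ['1', '2'],
                      'f': ['apple', '4.2'],
                      's': ['apple', 'banana']})

   # Convert every column; strings that do not look like numbers become NaN
   result = df.apply(pd.to_numeric, errors='coerce')
   print(result)

Here column ``i`` becomes integers, ``f`` keeps ``4.2`` and gets ``NaN`` for ``'apple'``, and ``s`` becomes all ``NaN``.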


Other API Changes
^^^^^^^^^^^^^^^^^
- Enable writing Excel files in :ref:`memory <io.excel_writing_buffer>` using StringIO/BytesIO (:issue:`7074`)
- Enable serialization of lists and dicts to strings in ExcelWriter (:issue:`8188`)
- Allow passing ``kwargs`` to the interpolation methods (:issue:`10378`).
- Serialize metadata properties of subclasses of pandas objects (:issue:`10553`).
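
Metadata properties are the attributes a pandas subclass lists in its ``_metadata`` class attribute. Assuming that serialization here refers to pickling, a minimal sketch of the behavior follows; ``TaggedFrame`` and its ``tag`` attribute are hypothetical names chosen for illustration:

.. code-block:: python

   import pickle

   import pandas as pd


   class TaggedFrame(pd.DataFrame):
       # Attributes listed here are treated as metadata properties
       _metadata = ['tag']

       @property
       def _constructor(self):
           # Make pandas operations return TaggedFrame instead of DataFrame
           return TaggedFrame


   df = TaggedFrame({'a': [1, 2, 3]})
   df.tag = 'sensor-a'  # hypothetical metadata value

   # Round-trip through pickle; the metadata property should survive
   restored = pickle.loads(pickle.dumps(df))
   print(type(restored).__name__, restored.tag)
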


.. _whatsnew_0170.deprecations:

Deprecations
^^^^^^^^^^^^

.. _whatsnew_0170.prior_deprecations:

Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _dask: https://dask.readthedocs.org/en/latest/

.. _whatsnew_0170.gil:

Releasing the GIL
~~~~~~~~~~~~~~~~~

We are releasing the global interpreter lock (GIL) on some Cython operations.
This allows other threads to run simultaneously during computation, potentially allowing performance improvements
from multi-threading. Notably, ``groupby`` and some indexing operations benefit from this (:issue:`8882`).

For example, in the following code the GIL is released during the factorization step of the groupby expression (``df.groupby('key')``)
as well as during the ``.sum()`` operation.

.. code-block:: python

   import numpy as np
   import pandas as pd

   N = 1000000
   ngroups = 10  # illustrative number of distinct keys
   df = pd.DataFrame({'key': np.random.randint(0, ngroups, size=N),
                      'data': np.random.randn(N)})
   df.groupby('key')['data'].sum()

Releasing the GIL could benefit applications that use threads for user interaction (e.g. Qt) or that perform multi-threaded computations. A nice example of a library that can handle these types of parallel computations is the dask_ library.
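
A minimal sketch of running the groupby example from several threads at once, using the standard library's ``concurrent.futures.ThreadPoolExecutor``. The frame size, number of keys and worker count are arbitrary choices. Any actual speedup depends on which operations release the GIL in the installed pandas version, so the sketch demonstrates the pattern rather than a measured gain:

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   import numpy as np
   import pandas as pd

   N = 100000    # rows per frame (arbitrary)
   ngroups = 10  # number of distinct keys (arbitrary)
   n_frames = 4  # one frame per worker thread

   frames = [pd.DataFrame({'key': np.random.randint(0, ngroups, size=N),
                           'data': np.random.randn(N)})
             for _ in range(n_frames)]


   def group_sum(df):
       # Factorization and summation are the steps that may release the GIL
       return df.groupby('key')['data'].sum()


   # Each frame is aggregated in its own worker thread
   with ThreadPoolExecutor(max_workers=n_frames) as pool:
       results = list(pool.map(group_sum, frames))
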

.. _whatsnew_0170.performance:


Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Added vbench benchmarks for alternative ExcelWriter engines and reading Excel files (:issue:`7171`)

- 4x improvement in ``timedelta`` string parsing (:issue:`6755`, :issue:`10426`)
- 8x improvement in ``timedelta64`` and ``datetime64`` ops (:issue:`6755`)
- Significantly improved performance of indexing ``MultiIndex`` with slicers (:issue:`10287`)
- Improved performance of ``Series.isin`` for datetimelike/integer Series (:issue:`10287`)

.. _whatsnew_0170.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug in ``DataFrame.apply`` when the function returns a categorical ``Series`` (:issue:`9573`)
- Bug in ``to_datetime`` with invalid dates and formats supplied (:issue:`10154`)

- Bug in ``Index.drop_duplicates`` dropping name(s) (:issue:`10115`)


- Bug in ``pd.Series`` when setting a value on an empty ``Series`` whose index has a frequency. (:issue:`10193`)

- Bug in ``DataFrame.plot`` raising ``ValueError`` when a color name consisting of multiple characters is specified (:issue:`10387`)
- Bug in ``DataFrame.reset_index`` when index contains ``NaT`` (:issue:`10388`)


- Bug in ``ExcelReader`` when worksheet is empty (:issue:`6403`)


- Bug in ``Table.select_column`` where name is not preserved (:issue:`10392`)
- Bug in ``offsets.generate_range`` where ``start`` and ``end`` have finer precision than ``offset`` (:issue:`9907`)


- Bug in ``DataFrame.interpolate`` with ``axis=1`` and ``inplace=True`` (:issue:`10395`)

- Bug in ``io.sql.get_schema`` when specifying multiple columns as primary
  key (:issue:`10385`).


- Bug in ``test_categorical`` on big-endian builds (:issue:`10425`)
- Bug in ``Series.map`` using a categorical ``Series`` raising ``AttributeError`` (:issue:`10324`)
- Bug in ``MultiIndex.get_level_values`` raising ``AttributeError`` when the levels include a ``Categorical`` (:issue:`10460`)

- Bug that caused segfault when resampling an empty Series (:issue:`10228`)
- Bug in ``DatetimeIndex.value_counts`` and ``PeriodIndex.value_counts`` dropping the name from the result while retaining it on the result's ``Index`` (:issue:`10150`)

- Bug in ``pandas.concat`` with ``axis=0`` when column is of dtype ``category`` (:issue:`10177`)

- Bug in ``read_msgpack`` where input type is not always checked (:issue:`10369`)

- Bug in ``pandas.read_csv`` with ``index_col=False`` or with ``index_col=['a', 'b']`` (:issue:`10413`, :issue:`10467`)

- Bug in ``Series.from_csv`` with ``header`` kwarg not setting the ``Series.name`` or the ``Series.index.name`` (:issue:`10483`)

- Bug in ``groupby.var`` which caused variance to be inaccurate for small float values (:issue:`10448`)

- Bug in ``Series.plot(kind='hist')`` where the y-axis label was not informative (:issue:`10485`)