.. _whatsnew_0170:

v0.17.0 (July 31, 2015)
-----------------------

This is a major release from 0.16.2 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

.. warning::

   pandas >= 0.17.0 will no longer support Python 3.2 (:issue:`9118`)

Highlights include:

  - Releasing the Global Interpreter Lock (GIL) on some Cython operations, see :ref:`here <whatsnew_0170.gil>`

Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsnew_0170.deprecations>` before updating.

.. contents:: What's new in v0.17.0
    :local:
    :backlinks: none

.. _whatsnew_0170.enhancements:

New features
~~~~~~~~~~~~

- SQL I/O functions now accept a SQLAlchemy connectable (:issue:`7877`)
- Enable writing complex values to HDF stores when using table format (:issue:`10447`)

.. _whatsnew_0170.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^
- Enable ``read_hdf`` to be used without specifying a key when the HDF file contains a single dataset (:issue:`10443`)

- ``.as_blocks`` now takes an optional ``copy`` argument to return a copy of the data; the default is to copy (no change in behavior from prior versions) (:issue:`9607`)

- ``regex`` argument to ``DataFrame.filter`` now handles numeric column names instead of raising ``ValueError`` (:issue:`10384`).
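
A minimal sketch of the ``DataFrame.filter`` change, assuming a frame with integer column labels (the frame and the pattern are illustrative). The pattern is applied to the string form of each label, so ``1`` and ``10`` match while ``0`` does not:

.. code-block:: python

   import pandas as pd

   # Integer column labels (illustrative data)
   df = pd.DataFrame([[1, 2, 3]], columns=[0, 1, 10])

   # Keep every column whose label contains the character "1"
   result = df.filter(regex='1')
   print(list(result.columns))  # [1, 10]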

.. _whatsnew_0170.api:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0170.api_breaking:

- The equality operator on ``Index`` now behaves similarly to ``Series`` (:issue:`9947`)

Starting in v0.17.0, comparing ``Index`` objects of different lengths will raise
a ``ValueError``. This is to be consistent with the behavior of ``Series``.

Previous behavior:

.. code-block:: python

   In [2]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
   Out[2]: array([ True, False, False], dtype=bool)

   In [3]: pd.Index([1, 2, 3]) == pd.Index([2])
   Out[3]: array([False,  True, False], dtype=bool)

   In [4]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
   Out[4]: False

   In [5]: pd.Series([1, 2, 3]) == pd.Series([1, 4, 5])
   Out[5]:
   0     True
   1    False
   2    False
   dtype: bool

   In [6]: pd.Series([1, 2, 3]) == pd.Series([2])
   ValueError: Series lengths must match to compare

   In [7]: pd.Series([1, 2, 3]) == pd.Series([1, 2])
   ValueError: Series lengths must match to compare

New behavior:

.. code-block:: python

   In [8]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
   Out[8]: array([ True, False, False], dtype=bool)

   In [9]: pd.Index([1, 2, 3]) == pd.Index([2])
   ValueError: Lengths must match to compare

   In [10]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
   ValueError: Lengths must match to compare

   In [11]: pd.Series([1, 2, 3]) == pd.Series([1, 4, 5])
   Out[11]:
   0     True
   1    False
   2    False
   dtype: bool

   In [12]: pd.Series([1, 2, 3]) == pd.Series([2])
   ValueError: Series lengths must match to compare

   In [13]: pd.Series([1, 2, 3]) == pd.Series([1, 2])
   ValueError: Series lengths must match to compare

Note that this is different from the ``numpy`` behavior where a comparison can
be broadcast:

.. ipython:: python

   np.array([1, 2, 3]) == np.array([1])

or it can return ``False`` if broadcasting cannot be done:

.. ipython:: python

   np.array([1, 2, 3]) == np.array([1, 2])
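
The sketch below restates the new ``Index`` behavior as a directly runnable script; the variable names are illustrative. When a single ``True``/``False`` answer about two whole indexes is wanted rather than an element-wise result, ``Index.equals`` provides it:

.. code-block:: python

   import pandas as pd

   left = pd.Index([1, 2, 3])

   # Equal lengths: element-wise comparison, returning a boolean NumPy array
   same_length = left == pd.Index([1, 4, 5])
   print(same_length)  # [ True False False]

   # Different lengths: raises ValueError instead of broadcasting or returning False
   try:
       left == pd.Index([1, 2])
       raised = False
   except ValueError as err:
       raised = True
       print('ValueError:', err)

   # A single True/False answer for the whole Index
   print(left.equals(pd.Index([1, 2, 3])))  # True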

.. _whatsnew_0170.api_breaking.other:

Other API Changes
^^^^^^^^^^^^^^^^^
- Enable writing Excel files in :ref:`memory <io.excel_writing_buffer>` using ``StringIO``/``BytesIO`` (:issue:`7074`)
- Enable serialization of lists and dicts to strings in ``ExcelWriter`` (:issue:`8188`)
- Allow passing ``kwargs`` to the interpolation methods (:issue:`10378`).
- Serialize metadata properties of subclasses of pandas objects (:issue:`10553`).
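
A minimal sketch of the metadata serialization, assuming a hypothetical ``DataFrame`` subclass ``TaggedFrame`` with one extra attribute ``tag``. Attribute names listed in the class-level ``_metadata`` list are the ones pandas treats as subclass metadata; the sketch round-trips the object through ``pickle`` and checks that the attribute survives:

.. code-block:: python

   import pickle

   import pandas as pd


   class TaggedFrame(pd.DataFrame):
       # Hypothetical subclass: names in _metadata are kept as extra attributes
       _metadata = ['tag']

       @property
       def _constructor(self):
           # Operations returning a new frame produce a TaggedFrame
           return TaggedFrame


   df = TaggedFrame({'a': [1, 2, 3]})
   df.tag = 'sensor-a'

   # Round-trip through pickle; the metadata attribute is serialized with the data
   restored = pickle.loads(pickle.dumps(df))
   print(type(restored).__name__, restored.tag)  # TaggedFrame sensor-a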

.. _whatsnew_0170.deprecations:

Deprecations
^^^^^^^^^^^^

.. _whatsnew_0170.prior_deprecations:

Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _dask: https://dask.readthedocs.org/en/latest/

.. _whatsnew_0170.gil:

Releasing the GIL
~~~~~~~~~~~~~~~~~

We are releasing the global interpreter lock (GIL) on some Cython operations.
This will allow other threads to run simultaneously during computation, potentially allowing performance improvements
from multi-threading. Notably, ``groupby`` and some indexing operations benefit from this (:issue:`8882`).

For example, the groupby expression in the following code has the GIL released during the factorization step (``df.groupby('key')``)
as well as during the ``.sum()`` operation.

.. code-block:: python

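   import numpy as np
   import pandas as pd

   N = 1000000
   ngroups = 1000  # number of distinct keys (illustrative value)
   df = pd.DataFrame({'key': np.random.randint(0, ngroups, size=N),
                      'data': np.random.randn(N)})
   df.groupby('key')['data'].sum()  # factorization and sum release the GIL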

Releasing the GIL could benefit an application that uses threads for user interaction (e.g. Qt) or that performs multi-threaded computations. A nice example of a library that can handle these types of parallel computations is the dask_ library.
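
A sketch of the multi-threaded pattern, using only the standard library's ``concurrent.futures.ThreadPoolExecutor``; the helpers ``make_frame`` and ``group_sum``, the frame sizes, and the thread count are illustrative assumptions. Any actual speedup depends on the data size, the machine, and how much of the work runs with the GIL released, so the sketch only checks that the threaded results match a sequential run:

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   import numpy as np
   import pandas as pd


   def make_frame(seed, n=100000, ngroups=100):
       """Build an illustrative frame with random integer keys and float data."""
       rng = np.random.RandomState(seed)
       return pd.DataFrame({'key': rng.randint(0, ngroups, size=n),
                            'data': rng.randn(n)})


   def group_sum(df):
       # Per the note above, the factorization and the sum release the GIL
       return df.groupby('key')['data'].sum()


   frames = [make_frame(seed) for seed in range(4)]

   # Run the four independent aggregations on a small thread pool
   with ThreadPoolExecutor(max_workers=4) as pool:
       threaded = list(pool.map(group_sum, frames))

   # The same computations run one after another, for comparison
   sequential = [group_sum(df) for df in frames]
   print(all(a.equals(b) for a, b in zip(threaded, sequential)))  # True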

.. _whatsnew_0170.performance:


Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Added vbench benchmarks for alternative ``ExcelWriter`` engines and reading Excel files (:issue:`7171`)

- 4x improvement in ``timedelta`` string parsing (:issue:`6755`, :issue:`10426`)
- 8x improvement in ``timedelta64`` and ``datetime64`` ops (:issue:`6755`)
- Significantly improved performance of indexing ``MultiIndex`` with slicers (:issue:`10287`)
- Improved performance of ``Series.isin`` for datetimelike/integer Series (:issue:`10287`)
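
For reference, indexing a ``MultiIndex`` "with slicers" means passing a slice for each level, for example via ``pd.IndexSlice``. A minimal sketch, assuming a small lexsorted two-level index built for illustration:

.. code-block:: python

   import numpy as np
   import pandas as pd

   # A lexsorted two-level index; slicing a MultiIndex expects a sorted index
   idx = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2, 3]],
                                    names=['letter', 'number'])
   df = pd.DataFrame({'value': np.arange(9)}, index=idx)

   # Letters 'a' through 'b' and numbers 2 through 3 (label slices are inclusive)
   subset = df.loc[pd.IndexSlice['a':'b', 2:3], :]
   print(subset['value'].tolist())  # [1, 2, 4, 5]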

.. _whatsnew_0170.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug in ``DataFrame.apply`` when function returns categorical series. (:issue:`9573`)
- Bug in ``to_datetime`` with invalid dates and formats supplied (:issue:`10154`)

- Bug in ``Index.drop_duplicates`` dropping name(s) (:issue:`10115`)


- Bug in ``pd.Series`` when setting a value on an empty ``Series`` whose index has a frequency. (:issue:`10193`)

- Bug in ``DataFrame.plot`` raises ``ValueError`` when color name is specified by multiple characters (:issue:`10387`)
- Bug in ``DataFrame.reset_index`` when index contains ``NaT`` (:issue:`10388`)


- Bug in ``ExcelReader`` when worksheet is empty (:issue:`6403`)


- Bug in ``Table.select_column`` where name is not preserved (:issue:`10392`)
- Bug in ``offsets.generate_range`` where ``start`` and ``end`` have finer precision than ``offset`` (:issue:`9907`)


- Bug in ``DataFrame.interpolate`` with ``axis=1`` and ``inplace=True`` (:issue:`10395`)

- Bug in ``io.sql.get_schema`` when specifying multiple columns as primary
  key (:issue:`10385`).


- Bug in ``test_categorical`` on big-endian builds (:issue:`10425`)
- Bug in ``Series.map`` using categorical ``Series`` raises ``AttributeError`` (:issue:`10324`)
- Bug in ``MultiIndex.get_level_values`` including ``Categorical`` raises ``AttributeError`` (:issue:`10460`)

- Bug that caused segfault when resampling an empty Series (:issue:`10228`)
- Bug in ``DatetimeIndex.value_counts`` and ``PeriodIndex.value_counts`` resetting the name of the result while retaining it on the result's ``Index`` (:issue:`10150`)

- Bug in ``pandas.concat`` with ``axis=0`` when column is of dtype ``category`` (:issue:`10177`)

- Bug in ``read_msgpack`` where input type is not always checked (:issue:`10369`)

- Bug in ``pandas.read_csv`` with ``index_col=False`` or with ``index_col=['a', 'b']`` (:issue:`10413`, :issue:`10467`)

- Bug in ``Series.from_csv`` with ``header`` kwarg not setting the ``Series.name`` or the ``Series.index.name`` (:issue:`10483`)

- Bug in ``groupby.var`` which caused variance to be inaccurate for small float values (:issue:`10448`)

- Bug in ``Series.plot(kind='hist')`` where the y-axis label was not informative (:issue:`10485`)

- Bug in the equality operator on ``Index`` not being consistent with ``Series`` (:issue:`9947`)