
.. _whatsnew_0170:

v0.17.0 (July 31, 2015)
-----------------------

This is a major release from 0.16.2 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

.. warning::

   pandas >= 0.17.0 will no longer support Python 3.2 (:issue:`9118`)

Highlights include:

  - Release the Global Interpreter Lock (GIL) on some Cython operations, see :ref:`here <whatsnew_0170.gil>`
  - Development installed versions of pandas will now have ``PEP440`` compliant version strings (:issue:`9518`)

Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsnew_0170.deprecations>` before updating.

.. contents:: What's new in v0.17.0
    :local:
    :backlinks: none

.. _whatsnew_0170.enhancements:

New features
~~~~~~~~~~~~

- SQL io functions now accept a SQLAlchemy connectable. (:issue:`7877`)
- Enable writing complex values to HDF stores when using table format (:issue:`10447`)

.. _whatsnew_0170.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- Enable ``read_hdf`` to be used without specifying a key when the HDF file contains a single dataset (:issue:`10443`)

- ``DatetimeIndex`` can be instantiated from strings that contain ``NaT`` (:issue:`7599`)
- The string parsing of ``to_datetime``, ``Timestamp`` and ``DatetimeIndex`` has been made consistent (:issue:`7599`)

  Prior to v0.17.0, ``Timestamp`` and ``to_datetime`` could incorrectly parse a year-only datetime string using today's date, whereas ``DatetimeIndex`` used the beginning of the year.
  ``Timestamp`` and ``to_datetime`` could also raise ``ValueError`` for some types of datetime string that ``DatetimeIndex`` can parse, such as quarterly strings.

  Previous Behavior

  .. code-block:: python

     In [1]: Timestamp('2012Q2')
     Traceback
        ...
     ValueError: Unable to parse 2012Q2

     # Results in today's date.
     In [2]: Timestamp('2014')
     Out[2]: 2014-08-12 00:00:00

  v0.17.0 can parse them as below. It works on ``DatetimeIndex`` also.

  New Behavior

  .. ipython:: python

     Timestamp('2012Q2')
     Timestamp('2014')
     DatetimeIndex(['2012Q2', '2014'])

  .. note:: If you want to perform calculations based on today's date, use ``Timestamp.now()`` and ``pandas.tseries.offsets``.

  .. ipython:: python

     import pandas.tseries.offsets as offsets
     Timestamp.now()
     Timestamp.now() + offsets.DateOffset(years=1)

- ``to_datetime`` now accepts the ``yearfirst`` keyword (:issue:`7599`)

- ``.as_blocks`` now takes an optional ``copy`` argument to return a copy of the data; the default is to copy (no change in behavior from prior versions) (:issue:`9607`)

- ``regex`` argument to ``DataFrame.filter`` now handles numeric column names instead of raising ``ValueError`` (:issue:`10384`).
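
As a rough illustration of two of the enhancements above, the following sketch exercises the ``yearfirst`` keyword and the ``regex`` filter with integer column labels. The sample date string, the column labels and the variable names are made up for this example, and the comments describe the behavior we would expect rather than a guarantee for every version.

.. code-block:: python

   import pandas as pd

   # With yearfirst=True the first number of an ambiguous date string
   # is read as the year (here: 10 -> 2010).
   ts = pd.to_datetime('10-11-12', yearfirst=True)

   # Integer column labels are matched by their string form, so the
   # pattern '^1' should select the columns labelled 1 and 10.
   df = pd.DataFrame({1: [1, 2], 10: [3, 4], 2: [5, 6]})
   matched = df.filter(regex='^1')

Without ``yearfirst=True`` the same string would typically be read month-first, so passing the keyword explicitly is the safer choice for ambiguous inputs.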

.. _whatsnew_0170.api:

.. _whatsnew_0170.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0170.api_breaking.convert_objects:

Changes to convert_objects
^^^^^^^^^^^^^^^^^^^^^^^^^^

``DataFrame.convert_objects`` keyword arguments have been shortened. (:issue:`10265`)

  =====================   =============
  Old                     New
  =====================   =============
  ``convert_dates``       ``datetime``
  ``convert_numeric``     ``numeric``
  ``convert_timedelta``   ``timedelta``
  =====================   =============

Coercing types with ``DataFrame.convert_objects`` is now implemented using the
keyword argument ``coerce=True``.  Previously types were coerced by setting a
keyword argument to ``'coerce'`` instead of ``True``, as in ``convert_dates='coerce'``.

.. ipython:: python

   df = pd.DataFrame({'i': ['1','2'],
                      'f': ['apple', '4.2'],
                      's': ['apple','banana']})
   df

The old usage of ``DataFrame.convert_objects`` used ``'coerce'`` along with the
type.

.. code-block:: python

   In [2]: df.convert_objects(convert_numeric='coerce')

Now the ``coerce`` keyword must be explicitly used.

.. ipython:: python

   df.convert_objects(numeric=True, coerce=True)

In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce
numeric types when there were no values convertible to a numeric type, and instead
returned the original DataFrame with no conversion. This behavior has changed so that
all non-number-like strings are converted to ``NaN``.

.. code-block:: python

   In [1]: df = pd.DataFrame({'s': ['a','b']})
   In [2]: df.convert_objects(convert_numeric='coerce')
   Out[2]:
          s
       0  a
       1  b

.. ipython:: python

   df = pd.DataFrame({'s': ['a','b']})
   df.convert_objects(numeric=True, coerce=True)

In earlier versions of pandas, the default behavior was to try to convert
datetimes and timestamps. The new default is for ``DataFrame.convert_objects``
to do nothing, and so it is necessary to pass at least one conversion target
in the method call.

Changes to Index Comparisons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The equality operator on ``Index`` should behave similarly to ``Series`` (:issue:`9947`)

Starting in v0.17.0, comparing ``Index`` objects of different lengths will raise
a ``ValueError``. This is to be consistent with the behavior of ``Series``.

Previous behavior:

.. code-block:: python

   In [2]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
   Out[2]: array([ True, False, False], dtype=bool)

   In [3]: pd.Index([1, 2, 3]) == pd.Index([2])
   Out[3]: array([False,  True, False], dtype=bool)

   In [4]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
   Out[4]: False

   In [5]: pd.Series([1, 2, 3]) == pd.Series([1, 4, 5])
   Out[5]:
   0     True
   1    False
   2    False
   dtype: bool

   In [6]: pd.Series([1, 2, 3]) == pd.Series([2])
   ValueError: Series lengths must match to compare

   In [7]: pd.Series([1, 2, 3]) == pd.Series([1, 2])
   ValueError: Series lengths must match to compare

New behavior:

.. code-block:: python

   In [8]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
   Out[8]: array([ True, False, False], dtype=bool)

   In [9]: pd.Index([1, 2, 3]) == pd.Index([2])
   ValueError: Lengths must match to compare

   In [10]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
   ValueError: Lengths must match to compare

   In [11]: pd.Series([1, 2, 3]) == pd.Series([1, 4, 5])
   Out[11]:
   0     True
   1    False
   2    False
   dtype: bool

   In [12]: pd.Series([1, 2, 3]) == pd.Series([2])
   ValueError: Series lengths must match to compare

   In [13]: pd.Series([1, 2, 3]) == pd.Series([1, 2])
   ValueError: Series lengths must match to compare

Note that this is different from the ``numpy`` behavior where a comparison can
be broadcast:

.. ipython:: python

   np.array([1, 2, 3]) == np.array([1])

or it can return ``False`` if broadcasting cannot be done:

.. ipython:: python

   np.array([1, 2, 3]) == np.array([1, 2])
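
Code that previously relied on comparing ``Index`` objects of different lengths could be adapted along the lines of the sketch below. It assumes that a whole-object check is what is wanted and uses ``Index.equals`` for that, which is not discussed above; the variable names are illustrative only.

.. code-block:: python

   import pandas as pd

   left = pd.Index([1, 2, 3])
   right = pd.Index([1, 2])

   # Element-wise comparison requires both objects to have the same length.
   try:
       left == right
       raised = False
   except ValueError:
       raised = True

   # Index.equals compares the objects as a whole and returns a single
   # boolean, so it can be used when the lengths may differ.
   same = left.equals(right)

Checking ``len(left) == len(right)`` before the element-wise comparison is an alternative when the boolean array itself is needed.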

Changes to Boolean Comparisons vs. None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Boolean comparisons of a ``Series`` vs ``None`` will now be equivalent to comparing with ``np.nan``, rather than raising ``TypeError`` (:issue:`1079`).

.. ipython:: python

   s = Series(range(3))
   s.iloc[1] = None
   s

Previous behavior:

.. code-block:: python

   In [5]: s==None
   TypeError: Could not compare <type 'NoneType'> type with Series

New behavior:

.. ipython:: python

   s==None

Usually you simply want to know which values are null.

.. ipython:: python

   s.isnull()

.. warning::

   You will generally want to use ``isnull/notnull`` for these types of comparisons, as ``isnull/notnull`` tells you which elements are null. Be
   mindful that ``nan`` values don't compare equal, but ``None`` values do. Note that pandas/numpy uses the fact that ``np.nan != np.nan``, and treats ``None`` like ``np.nan``.

   .. ipython:: python

      None == None
      np.nan == np.nan

Other API Changes
^^^^^^^^^^^^^^^^^

- Enable writing Excel files in :ref:`memory <io.excel_writing_buffer>` using StringIO/BytesIO (:issue:`7074`)
- Enable serialization of lists and dicts to strings in ExcelWriter (:issue:`8188`)
- Allow passing ``kwargs`` to the interpolation methods (:issue:`10378`).
- Serialize metadata properties of subclasses of pandas objects (:issue:`10553`).

.. _whatsnew_0170.deprecations:

Deprecations
^^^^^^^^^^^^

.. _whatsnew_0170.prior_deprecations:

Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)

.. _dask: https://dask.readthedocs.org/en/latest/

.. _whatsnew_0170.gil:

Releasing the GIL
~~~~~~~~~~~~~~~~~

We are releasing the global-interpreter-lock (GIL) on some Cython operations.
This will allow other threads to run simultaneously during computation, potentially allowing performance improvements
from multi-threading. Notably ``groupby`` and some indexing operations benefit from this. (:issue:`8882`)

For example, the groupby expression in the following code will have the GIL released during the factorization step (``df.groupby('key')``)
as well as during the ``.sum()`` operation.

.. code-block:: python

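   import numpy as np
   from pandas import DataFrame

   N = 1000000   # size must be an integer
   ngroups = 10  # number of distinct keys (illustrative value)
   df = DataFrame({'key' : np.random.randint(0, ngroups, size=N),
                   'data' : np.random.randn(N)})
   df.groupby('key')['data'].sum()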

Releasing the GIL could benefit an application that uses threads for user interactions (e.g. ``Qt``), or that performs multi-threaded computations. A nice example of a library that can handle these types of computation-in-parallel is the dask_ library.
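
A minimal sketch of the multi-threaded case, assuming the standard-library ``concurrent.futures`` module is available (on Python 2 it is a separate backport). The helper ``group_sum``, the sizes and the worker count are hypothetical, and any actual speed-up depends on the workload and on which operations release the GIL.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   import numpy as np
   import pandas as pd

   def group_sum(seed, n=100000, ngroups=10):
       # Build an independent frame per task so the threads share no state.
       rng = np.random.RandomState(seed)
       df = pd.DataFrame({'key': rng.randint(0, ngroups, size=n),
                          'data': rng.randn(n)})
       return df.groupby('key')['data'].sum()

   # Run four independent groupby-sum tasks on a small thread pool.
   with ThreadPoolExecutor(max_workers=4) as pool:
       results = list(pool.map(group_sum, range(4)))

Each task returns the same result it would produce when run serially; threads only change how the work is scheduled.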


.. _whatsnew_0170.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Added vbench benchmarks for alternative ExcelWriter engines and reading Excel files (:issue:`7171`)

- 4x improvement in ``timedelta`` string parsing (:issue:`6755`, :issue:`10426`)
- 8x improvement in ``timedelta64`` and ``datetime64`` ops (:issue:`6755`)
- Significantly improved performance of indexing ``MultiIndex`` with slicers (:issue:`10287`)
- Improved performance of ``Series.isin`` for datetimelike/integer Series (:issue:`10287`)
- 20x improvement in ``concat`` of Categoricals when categories are identical (:issue:`10587`)


.. _whatsnew_0170.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug in ``DataFrame.apply`` when function returns categorical series. (:issue:`9573`)
- Bug in ``to_datetime`` with invalid dates and formats supplied (:issue:`10154`)
- Bug in ``Index.drop_duplicates`` dropping name(s) (:issue:`10115`)
- Bug in ``pd.Series`` when setting a value on an empty ``Series`` whose index has a frequency. (:issue:`10193`)
- Bug in ``DataFrame.plot`` raises ``ValueError`` when color name is specified by multiple characters (:issue:`10387`)
- Bug in ``DataFrame.reset_index`` when index contains ``NaT``. (:issue:`10388`)
- Bug in ``ExcelReader`` when worksheet is empty (:issue:`6403`)
- Bug in ``Table.select_column`` where name is not preserved (:issue:`10392`)
- Bug in ``offsets.generate_range`` where ``start`` and ``end`` have finer precision than ``offset`` (:issue:`9907`)


- Bug in ``DataFrame.interpolate`` with ``axis=1`` and ``inplace=True`` (:issue:`10395`)
- Bug in ``io.sql.get_schema`` when specifying multiple columns as primary
  key (:issue:`10385`).
- Bug in ``test_categorical`` on big-endian builds (:issue:`10425`)
- Bug in ``Series.map`` using categorical ``Series`` raises ``AttributeError`` (:issue:`10324`)
- Bug in ``MultiIndex.get_level_values`` including ``Categorical`` raises ``AttributeError`` (:issue:`10460`)


- Bug that caused segfault when resampling an empty Series (:issue:`10228`)
- Bug in ``DatetimeIndex`` and ``PeriodIndex.value_counts`` where the name is reset in the result but retained in the result's ``Index``. (:issue:`10150`)
- Bug in ``pandas.concat`` with ``axis=0`` when column is of dtype ``category`` (:issue:`10177`)
- Bug in ``read_msgpack`` where input type is not always checked (:issue:`10369`)
- Bug in ``pandas.read_csv`` with kwargs ``index_col=False``, ``index_col=['a', 'b']`` or ``dtype``
  (:issue:`10413`, :issue:`10467`, :issue:`10577`)
- Bug in ``Series.from_csv`` with ``header`` kwarg not setting the ``Series.name`` or the ``Series.index.name`` (:issue:`10483`)
- Bug in ``groupby.var`` which caused variance to be inaccurate for small float values (:issue:`10448`)
- Bug in ``Series.plot(kind='hist')`` where the y-axis label is not informative (:issue:`10485`)


- Bug where the equality operator on ``Index`` was not consistent with ``Series`` (:issue:`9947`)