
.. _whatsnew_0160:

v0.16.0 (February ??, 2015)
---------------------------

This is a major release from 0.15.2 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

- Check the :ref:`API Changes <whatsnew_0160.api>` and :ref:`deprecations <whatsnew_0160.deprecations>` before updating

- :ref:`Other Enhancements <whatsnew_0160.enhancements>`

- :ref:`Performance Improvements <whatsnew_0160.performance>`

- :ref:`Bug Fixes <whatsnew_0160.bug_fixes>`

New features
~~~~~~~~~~~~

.. _whatsnew_0160.api:

.. _whatsnew_0160.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0160.api_breaking.timedelta:

- In v0.15.0 a new scalar type ``Timedelta`` was introduced as a
  sub-class of ``datetime.timedelta``.
  The v0.15.0 notes :ref:`here <whatsnew_0150.timedeltaindex>` described an API
  change to the ``.seconds`` accessor. The intent was to provide a
  user-friendly set of accessors that give the 'natural' value for that unit,
  e.g. if you had a ``Timedelta('1 day, 10:11:12')``, then ``.seconds`` would
  return 12. However, this is at odds with the definition of
  ``datetime.timedelta``, which defines ``.seconds`` as
  ``10 * 3600 + 11 * 60 + 12 == 36672``.

  So in v0.16.0, we are restoring the API to match that of
  ``datetime.timedelta``. The component values are still available
  through the ``.components`` accessor. This changes the ``.seconds`` and
  ``.microseconds`` accessors, and removes the ``.hours``, ``.minutes`` and
  ``.milliseconds`` accessors. These changes affect ``TimedeltaIndex``
  and the Series ``.dt`` accessor as well. (:issue:`9185`, :issue:`9139`)

  Previous Behavior

  .. code-block:: python

     In [2]: t = pd.Timedelta('1 day, 10:11:12.100123')

     In [3]: t.days
     Out[3]: 1

     In [4]: t.seconds
     Out[4]: 12

     In [5]: t.microseconds
     Out[5]: 123

  New Behavior

  .. ipython:: python

     t = pd.Timedelta('1 day, 10:11:12.100123')
     t.days
     t.seconds
     t.microseconds

  Using ``.components`` gives access to each individual component

  .. ipython:: python

     t.components
     t.components.seconds

- ``Index.duplicated`` now returns ``np.array(dtype=bool)`` rather than ``Index(dtype=object)`` containing ``bool`` values. (:issue:`8875`)
- ``DataFrame.to_json`` now returns accurate type serialisation for each column for frames of mixed dtype (:issue:`9037`)

  Previously data was coerced to a common dtype before serialisation, which for
  example resulted in integers being serialised to floats:

  .. code-block:: python

    In [2]: pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
    Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1.0,"1":2.0}}'

  Now each column is serialised using its correct dtype:

  .. code-block:: python

    In [2]: pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
    Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1,"1":2}}'

- ``DatetimeIndex``, ``PeriodIndex`` and ``TimedeltaIndex.summary`` now output the same format. (:issue:`9116`)
- ``TimedeltaIndex.freqstr`` now outputs the same string format as ``DatetimeIndex``. (:issue:`9116`)


- Bar and horizontal bar plots no longer add a dashed line along the info axis.
  The prior style can be achieved with matplotlib's ``axhline`` or ``axvline``
  methods (:issue:`9088`).
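
The ``.seconds`` definition that ``Timedelta`` now follows can be checked against the standard library.
The following is a minimal sketch using only ``datetime.timedelta``; the ``divmod`` arithmetic illustrates
how the per-unit values relate to the accessors and is not the pandas implementation.

.. code-block:: python

    from datetime import timedelta

    # Same duration as the Timedelta example above: 1 day, 10:11:12.100123
    td = timedelta(days=1, hours=10, minutes=11, seconds=12, microseconds=100123)

    # datetime.timedelta stores days, seconds and microseconds separately,
    # so .seconds covers everything below one day and at or above one second.
    seconds_within_day = td.seconds
    micros_within_second = td.microseconds

    # Recover the per-unit values by hand from the stored fields
    hours, remainder = divmod(td.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    milliseconds, microseconds = divmod(td.microseconds, 1000)

    print(td.days, seconds_within_day, micros_within_second)
    print(hours, minutes, seconds, milliseconds, microseconds)

The hand-derived values correspond to what the old accessors returned for each unit; in v0.16.0 these
per-unit values are read from ``.components`` instead.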

Deprecations
~~~~~~~~~~~~

.. _whatsnew_0160.deprecations:


Enhancements
~~~~~~~~~~~~

.. _whatsnew_0160.enhancements:

- Paths beginning with ``~`` will now be expanded to begin with the user's home directory (:issue:`9066`)
- Added time interval selection in ``get_data_yahoo`` (:issue:`9071`)
- Added ``Series.str.slice_replace()``, which previously raised ``NotImplementedError`` (:issue:`8888`)
- Added ``Timestamp.to_datetime64()`` to complement ``Timedelta.to_timedelta64()`` (:issue:`9255`)
- ``tseries.frequencies.to_offset()`` now accepts ``Timedelta`` as input (:issue:`9064`)
- A ``lag`` parameter was added to the autocorrelation method of ``Series``; it defaults to lag-1 autocorrelation (:issue:`9192`)
- ``Timedelta`` now accepts a ``nanoseconds`` keyword in its constructor (:issue:`9273`)
- SQL code now safely escapes table and column names (:issue:`8986`)

- Added auto-complete for ``Series.str.<tab>``, ``Series.dt.<tab>`` and ``Series.cat.<tab>`` (:issue:`9322`)
- Added ``StringMethods.isalnum()``, ``isalpha()``, ``isdigit()``, ``isspace()``, ``islower()``,
  ``isupper()`` and ``istitle()``, which behave the same as the standard ``str`` methods (:issue:`9282`)


- Added ``StringMethods.ljust()`` and ``rjust()``, which behave the same as the standard ``str`` methods (:issue:`9352`)
- ``StringMethods.pad()`` and ``center()`` now accept a ``fillchar`` option to specify the filling character (:issue:`9352`)
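
The new ``StringMethods`` predicates and justification methods are described above as matching their
standard ``str`` counterparts. As a point of reference, the sketch below shows that standard ``str``
behavior using only built-in strings. The sample values are made up for illustration, and the
element-wise pandas calls themselves are not shown.

.. code-block:: python

    values = ["abc123", "Hello World", "   ", "42"]

    # Predicates: each returns a single bool for the whole string
    alnum = [v.isalnum() for v in values]
    digit = [v.isdigit() for v in values]
    space = [v.isspace() for v in values]
    title = [v.istitle() for v in values]

    # Justification and centering with an explicit fill character
    left = "42".ljust(5, "-")
    right = "42".rjust(5, "-")
    centered = "42".center(6, "*")

    print(alnum, digit, space, title)
    print(left, right, centered)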

Performance
~~~~~~~~~~~

.. _whatsnew_0160.performance:

- Fixed a performance regression for ``.loc`` indexing with an array or list-like (:issue:`9126`).
- ``DataFrame.to_json`` 30x performance improvement for mixed dtype frames. (:issue:`9037`)
- Performance improvements in ``MultiIndex.duplicated`` by working with labels instead of values (:issue:`9125`)
- Improved the speed of ``nunique`` by calling ``unique`` instead of ``value_counts`` (:issue:`9129`, :issue:`7771`)
- Performance improvement of up to 10x in ``DataFrame.count`` and ``DataFrame.dropna`` by taking advantage of homogeneous/heterogeneous dtypes appropriately (:issue:`9136`)
- Performance improvement of up to 20x in ``DataFrame.count`` when using a ``MultiIndex`` and the ``level`` keyword argument (:issue:`9163`)
- Performance and memory usage improvements in ``merge`` when key space exceeds ``int64`` bounds (:issue:`9151`)
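
To make "key space exceeds ``int64`` bounds" concrete, the sketch below assumes that the key space is
the number of possible key combinations, i.e. the product of the number of distinct values in each key
column. That reading is an assumption made for illustration, and the helper names ``key_space_size``
and ``fits_in_int64`` are hypothetical; this is not the pandas implementation.

.. code-block:: python

    from math import prod  # requires Python 3.8+

    INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


    def key_space_size(cardinalities):
        """Number of possible key combinations (assumed definition)."""
        return prod(cardinalities)


    def fits_in_int64(cardinalities):
        """True if every key combination can be numbered with an int64."""
        return key_space_size(cardinalities) <= INT64_MAX


    # Six key columns with 1000 distinct values each: 10**18 combinations
    print(fits_in_int64([1000] * 6))
    # Seven such columns: 10**21 combinations, beyond the int64 range
    print(fits_in_int64([1000] * 7))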

Bug Fixes
~~~~~~~~~

.. _whatsnew_0160.bug_fixes:

- Fixed issue using ``read_csv`` on S3 with Python 3.
- Fixed compatibility issue in ``DatetimeIndex`` affecting architectures where ``numpy.int_`` defaults to ``numpy.int32`` (:issue:`8943`)
- Bug in Panel indexing with an object-like (:issue:`9140`)
- Bug where the index of the returned ``Series.dt.components`` was reset to the default index (:issue:`9247`)


- Fixed bug in ``to_sql`` when mapping a Timestamp object column (datetime
  column with timezone info) to the according sqlalchemy type (:issue:`9085`).
- Fixed bug in ``to_sql`` ``dtype`` argument not accepting an instantiated
  SQLAlchemy type  (:issue:`9083`).


- Fixed bug on big endian platforms which produced incorrect results in ``StataReader`` (:issue:`8688`).

- Bug in ``MultiIndex.has_duplicates`` when having many levels causes an indexer overflow (:issue:`9075`, :issue:`5873`)
- Bug in ``pivot`` and ``unstack`` where ``nan`` values would break index alignment (:issue:`4862`, :issue:`7401`, :issue:`7403`, :issue:`7405`, :issue:`7466`)
- Bug in left ``join`` on multi-index with ``sort=True`` or null values (:issue:`9210`).
- Bug in ``MultiIndex`` where inserting new keys would fail (:issue:`9250`).
- Bug in ``groupby`` when key space exceeds ``int64`` bounds (:issue:`9096`).


- Fixed character encoding bug in ``read_stata`` and ``StataReader`` when loading data from a URL (:issue:`9231`).


- Bug where adding ``offsets.Nano`` to other offsets raised ``TypeError`` (:issue:`9284`)


- Bug in ``DatetimeIndex`` iteration, related to (:issue:`8890`), fixed in (:issue:`9100`)


- Bug where boxplot, scatter and hexbin plots could show an unnecessary warning (:issue:`8877`)


- Bug in using grouper functions that need passed-through arguments (e.g. ``axis``) when using a wrapped function (e.g. ``fillna``) (:issue:`9221`)

- DataFrame now properly supports simultaneous ``copy`` and ``dtype`` arguments in constructor (:issue:`9099`)
- Bug in ``read_csv`` when using ``skiprows`` on a file with CR line endings with the C engine. (:issue:`9079`)
- ``isnull`` now detects ``NaT`` in ``PeriodIndex`` (:issue:`9129`)
- Bug in groupby ``.nth()`` with a multiple column groupby (:issue:`8979`)
- Bug where ``DataFrame.where`` and ``Series.where`` coerced numerics to string incorrectly (:issue:`9280`)
- Bug where ``DataFrame.where`` and ``Series.where`` raised ``ValueError`` when a string list-like was passed. (:issue:`9280`)
- Accessing ``Series.str`` methods with non-string values now raises ``TypeError`` instead of producing incorrect results (:issue:`9184`)

- Fixed division by zero error for ``Series.kurt()`` when all values are equal (:issue:`9197`)


- Fixed issue in the ``xlsxwriter`` engine where it added a default 'General' format to cells if no other format was applied. This prevented other row or column formatting from being applied. (:issue:`9167`)
- Fixed issue with ``index_col=False`` when ``usecols`` is also specified in ``read_csv``. (:issue:`9082`)
- Bug where ``wide_to_long`` would modify the input stubnames list (:issue:`9204`)
- Bug in ``to_sql`` not storing ``float64`` values using double precision. (:issue:`9009`)


- ``SparseSeries`` and ``SparsePanel`` now accept zero argument constructors (same as their non-sparse counterparts) (:issue:`9272`).