.. _whatsnew_0182:

v0.18.2 (July ??, 2016)
-----------------------

This is a minor bug-fix release from 0.18.1 and includes a large number of
bug fixes along with several new features, enhancements, and performance improvements.
We recommend that all users upgrade to this version.

Highlights include:


.. contents:: What's new in v0.18.2
    :local:
    :backlinks: none

.. _whatsnew_0182.new_features:

New features
~~~~~~~~~~~~

.. _whatsnew_0182.enhancements.read_csv_dupe_col_names_support:

``pd.read_csv`` has improved support for duplicate column names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`Duplicate column names <io.dupe_names>` are now supported in ``pd.read_csv()`` whether
they are in the file or passed in as the ``names`` parameter (:issue:`7160`, :issue:`9424`)

.. ipython:: python

   data = '0,1,2\n3,4,5'
   names = ['a', 'b', 'a']

Previous behaviour:

.. code-block:: ipython

   In [2]: pd.read_csv(StringIO(data), names=names)
   Out[2]:
      a  b  a
   0  2  1  2
   1  5  4  5

The first 'a' column contains the same data as the second 'a' column, when it should have
contained the array ``[0, 3]``.

New behaviour:

.. ipython:: python

   pd.read_csv(StringIO(data), names=names)

.. _whatsnew_0182.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, which can be used to coerce nonexistent timestamps to ``NaT``. The default behaviour remains raising a ``NonExistentTimeError`` (:issue:`13057`)

- ``Index`` now supports ``.str.extractall()``, which returns a ``DataFrame``; see :ref:`Extract all matches in each subject (extractall) <text.extractall>` (:issue:`10008`, :issue:`13156`)

  .. ipython:: python

     idx = pd.Index(["a1a2", "b1", "c1"])
     idx.str.extractall(r"[ab](?P<digit>\d)")

- ``.to_hdf()`` and ``read_hdf()`` now accept path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path (:issue:`11773`)

- The ``Timestamp`` constructor now accepts positional and keyword parameters, like :func:`datetime.datetime` (:issue:`10758`, :issue:`11630`)

  .. ipython:: python

    pd.Timestamp(2012, 1, 1)

    pd.Timestamp(year=2012, month=1, day=1, hour=8, minute=30)

- ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``decimal`` option (:issue:`12933`)

- ``Index.astype()`` now accepts an optional boolean argument ``copy``, which allows the copy to be skipped when the requirements on dtype are already satisfied (:issue:`13209`)
- ``Index`` now supports the ``.where()`` method for same-shape indexing (:issue:`13170`)

  .. ipython:: python

     idx = pd.Index(['a', 'b', 'c'])
     idx.where([True, False, True])

- ``Categorical.astype()`` now accepts an optional boolean argument ``copy``, effective when dtype is categorical (:issue:`13209`)
- Consistent with the Python API, ``pd.read_csv()`` will now interpret ``+inf`` as positive infinity (:issue:`13274`)

- ``pd.read_html()`` has gained support for the ``decimal`` option (:issue:`12907`)

- ``DataFrame.to_sql()`` now allows a single value as the SQL type for all columns (:issue:`11886`).
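
A minimal sketch of the single-value ``dtype`` form, assuming an in-memory SQLite database opened with the standard-library ``sqlite3`` module; the table name ``example`` and the data are made up for illustration. With a plain ``sqlite3`` connection, pandas expects SQL type names as strings, so ``'TEXT'`` is used here:

.. code-block:: python

   import sqlite3

   import pandas as pd

   df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

   # In-memory SQLite database; 'example' is a hypothetical table name.
   con = sqlite3.connect(':memory:')

   # A single value for ``dtype`` is applied to every written column.
   df.to_sql('example', con, index=False, dtype='TEXT')

   # PRAGMA table_info yields one row per column: (cid, name, type, ...).
   column_types = {row[1]: row[2]
                   for row in con.execute("PRAGMA table_info('example')")}
   print(column_types)  # {'a': 'TEXT', 'b': 'TEXT'}

Passing a dict instead still sets the type per column; the single value is shorthand for the case where every column should share one type.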
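
The ``decimal`` support in the python engine and the ``+inf`` parsing listed above can be exercised together in a short sketch; the semicolon-separated sample data is invented for illustration and uses a comma as the decimal mark:

.. code-block:: python

   from io import StringIO

   import numpy as np
   import pandas as pd

   # Hypothetical data: ',' is the decimal mark, ';' the field separator.
   data = 'x;y\n1,5;+inf\n2,25;-Inf\n'

   # ``decimal`` is honoured by the python engine as well as the C engine.
   df = pd.read_csv(StringIO(data), sep=';', decimal=',', engine='python')

   print(df['x'].tolist())  # [1.5, 2.25]

   # '+inf' and the mixed-case '-Inf' are read as +/- infinity.
   print(np.isposinf(df['y'].iloc[0]), np.isneginf(df['y'].iloc[1]))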

.. _whatsnew_0182.api:

API changes
~~~~~~~~~~~


- Non-convertible dates in an Excel date column will be returned without conversion and the column will be ``object`` dtype, rather than raising an exception (:issue:`10001`)
- An ``UnsupportedFunctionCall`` error is now raised if NumPy functions such as ``np.mean`` are called on groupby or resample objects (:issue:`12811`)
- Calls to ``.sample()`` will respect the random seed set via ``numpy.random.seed(n)`` (:issue:`13161`)
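
A minimal sketch of the ``.sample()`` change: when no ``random_state`` argument is passed, reseeding NumPy's global generator with the same value (``42`` here is arbitrary) should reproduce the same sample:

.. code-block:: python

   import numpy as np
   import pandas as pd

   s = pd.Series(range(10))

   # No random_state is given, so .sample() draws from NumPy's global state.
   np.random.seed(42)
   first = s.sample(3)

   # Reseeding with the same value reproduces the same draw.
   np.random.seed(42)
   second = s.sample(3)

   print(first.equals(second))  # True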

.. _whatsnew_0182.api.tolist:

``Series.tolist()`` will now return Python types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behaviour (:issue:`10904`)


.. ipython:: python

   s = pd.Series([1, 2, 3])

Previous Behavior:

.. code-block:: ipython

   In [7]: type(s.tolist()[0])
   Out[7]:
    <class 'numpy.int64'>

New Behavior:

.. ipython:: python

   type(s.tolist()[0])

.. _whatsnew_0182.api.promote:

``Series`` type promotion on assignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A ``Series`` will now correctly promote its dtype when it is assigned a value that is incompatible with its current dtype (:issue:`13234`)


.. ipython:: python

   s = pd.Series()

Previous Behavior:

.. code-block:: ipython

   In [2]: s["a"] = pd.Timestamp("2016-01-01")

   In [3]: s["b"] = 3.0
   TypeError: invalid type promotion

New Behavior:

.. ipython:: python

   s["a"] = pd.Timestamp("2016-01-01")
   s["b"] = 3.0
   s
   s.dtype

.. _whatsnew_0182.api.to_datetime_coerce:

``.to_datetime()`` when coercing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A bug has been fixed in ``.to_datetime()`` when passing integers or floats with no ``unit`` and ``errors='coerce'`` (:issue:`13180`).
Previously, if ``.to_datetime()`` encountered a mix of integers/floats and strings, but no datetimes, with ``errors='coerce'``, it would convert everything to ``NaT``.

Previous Behavior:

.. code-block:: ipython

   In [2]: pd.to_datetime([1, 'foo'], errors='coerce')
   Out[2]: DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

This will now convert integers/floats with the default unit of ``ns``.

.. ipython:: python

   pd.to_datetime([1, 'foo'], errors='coerce')

.. _whatsnew_0182.api.merging:

Merging changes
^^^^^^^^^^^^^^^

Merging will now preserve the dtype of the join keys (:issue:`8596`)

.. ipython:: python

   df1 = pd.DataFrame({'key': [1], 'v1': [10]})
   df1
   df2 = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]})
   df2

Previous Behavior:

.. code-block:: ipython

   In [5]: pd.merge(df1, df2, how='outer')
   Out[5]:
      key    v1
   0  1.0  10.0
   1  1.0  20.0
   2  2.0  30.0

   In [6]: pd.merge(df1, df2, how='outer').dtypes
   Out[6]:
   key    float64
   v1     float64
   dtype: object

New Behavior:

The dtype of the join keys is now preserved:

.. ipython:: python

   pd.merge(df1, df2, how='outer')
   pd.merge(df1, df2, how='outer').dtypes

If missing values are introduced, the resulting dtype will still be
upcast (unchanged from the previous behavior).

.. ipython:: python

   pd.merge(df1, df2, how='outer', on='key')
   pd.merge(df1, df2, how='outer', on='key').dtypes

.. _whatsnew_0182.api.other:

Other API changes
^^^^^^^^^^^^^^^^^

- ``Float64Index.astype(int)`` will now raise ``ValueError`` if ``Float64Index`` contains ``NaN`` values (:issue:`13149`)
- ``TimedeltaIndex.astype(int)`` and ``DatetimeIndex.astype(int)`` will now return ``Int64Index`` instead of ``np.array`` (:issue:`13209`)
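
A minimal sketch of both changes. In recent pandas releases ``Float64Index`` and ``Int64Index`` were folded into ``Index``, so the sketch builds the indexes with ``pd.Index``/``pd.DatetimeIndex`` and only checks for a generic ``pd.Index`` result; the sample values are arbitrary:

.. code-block:: python

   import numpy as np
   import pandas as pd

   # A float index containing NaN (a Float64Index in older pandas).
   float_idx = pd.Index([1.0, 2.0, np.nan])

   raised = None
   try:
       float_idx.astype(int)
   except ValueError as exc:
       # NaN has no integer representation, so the cast is refused.
       raised = exc
   print(type(raised).__name__)

   # Casting a datetime index to int64 gives an index of nanoseconds
   # since the epoch rather than a bare NumPy array.
   dt_idx = pd.DatetimeIndex(['2016-01-01'], dtype='datetime64[ns]')
   as_int = dt_idx.astype('int64')
   print(isinstance(as_int, pd.Index), as_int[0])  # True 1451606400000000000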

.. _whatsnew_0182.deprecations:

Deprecations
^^^^^^^^^^^^


.. _whatsnew_0182.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved performance of sparse ``IntIndex.intersect`` (:issue:`13082`)
- Improved performance of sparse arithmetic with ``BlockIndex`` when the number of blocks is large, although using ``IntIndex`` is recommended in such cases (:issue:`13082`)
- Improved performance of ``DataFrame.quantile()``, as it now operates per-block (:issue:`11623`)


- Improved performance of ``DataFrameGroupBy.transform`` (:issue:`12737`)


.. _whatsnew_0182.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug in ``io.json.json_normalize()``, where non-ASCII keys raised an exception (:issue:`13213`)
- Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing may raise ``IndexError`` (:issue:`13144`)
- Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing result may have normal ``Index`` (:issue:`13144`)
- Bug in ``SparseDataFrame`` in which ``axis=None`` did not default to ``axis=0`` (:issue:`13048`)
- Bug in ``SparseSeries`` and ``SparseDataFrame`` creation with ``object`` dtype may raise ``TypeError`` (:issue:`11633`)
- Bug when passing a not-default-indexed ``Series`` as ``xerr`` or ``yerr`` in ``.plot()`` (:issue:`11858`)
- Bug in matplotlib ``AutoDataFormatter``; this restores the second scaled formatting and re-adds microsecond scaled formatting (:issue:`13131`)
- Bug in selection from a ``HDFStore`` with a fixed format and ``start`` and/or ``stop`` specified; the selected range is now returned (:issue:`8287`)


- Bug in ``.groupby(..).resample(..)`` when the same object is called multiple times (:issue:`13174`)
- Bug in ``.to_records()`` when index name is a unicode string (:issue:`13172`)

- Bug in calling ``.memory_usage()`` on object which doesn't implement (:issue:`12924`)

- Regression in ``Series.quantile`` with NaNs (also shows up in ``.median()`` and ``.describe()``); furthermore, the resulting ``Series`` is now named with the quantile (:issue:`13098`, :issue:`13146`)

- Bug in ``SeriesGroupBy.transform`` with datetime values and missing groups (:issue:`13191`)

- Bug in ``Series.str.extractall()`` with ``str`` index raises ``ValueError``  (:issue:`13156`)


- Bug in ``PeriodIndex`` and ``Period`` subtraction raises ``AttributeError`` (:issue:`13071`)
- Bug in ``PeriodIndex`` construction returning a ``float64`` index in some circumstances (:issue:`13067`)
- Bug in ``.resample(..)`` with a ``PeriodIndex`` not changing its ``freq`` appropriately when empty (:issue:`13067`)
- Bug in ``.resample(..)`` with a ``PeriodIndex`` not retaining its type or name with an empty ``DataFrame`` (:issue:`13212`)
- Bug in ``groupby(..).resample(..)`` where passing some keywords would raise an exception (:issue:`13235`)
- Bug in ``pd.read_csv`` in which the ``nrows`` argument was not properly validated for both engines (:issue:`10476`)


- Bug in ``MultiIndex`` slicing where extra elements were returned when level is non-unique (:issue:`12896`)


- Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`)


- Bug in ``Series`` arithmetic raises ``TypeError`` if it contains datetime-like as ``object`` dtype (:issue:`13043`)


- Bug in ``pd.to_datetime()`` when passing invalid datatypes (e.g. bool); will now respect the ``errors`` keyword (:issue:`13176`)
- Bug in extension dtype creation where the created types were not identical (as checked with ``is``) (:issue:`13285`)

- Bug in ``NaT`` - ``Period`` raises ``AttributeError`` (:issue:`13071`)
- Bug in ``Period`` addition raises ``TypeError`` if ``Period`` is on the right-hand side (:issue:`13069`)
- Bug in ``Period`` and ``Series`` or ``Index`` comparison raises ``TypeError`` (:issue:`13200`)
- Bug in ``pd.set_eng_float_format()`` that would prevent NaNs from formatting (:issue:`11981`)
- Bug in ``.unstack`` with ``Categorical`` dtype resets ``.ordered`` to ``True`` (:issue:`13249`)


- Bug in ``groupby`` where ``apply`` returns a different result depending on whether the first result is ``None`` or not (:issue:`12824`)


- Bug in ``Categorical.remove_unused_categories()`` changes ``.codes`` dtype to platform int (:issue:`13261`)
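
A minimal sketch of the ``Categorical.remove_unused_categories()`` fix (:issue:`13261`), assuming a small categorical whose codes pandas stores as ``int8``; the values are arbitrary:

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Three categories, only two of them used; codes are stored compactly.
   cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])
   print(cat.codes.dtype)  # int8

   trimmed = cat.remove_unused_categories()
   print(list(trimmed.categories))  # ['a', 'b']

   # The compact code dtype is kept instead of widening to platform int.
   print(trimmed.codes.dtype)  # int8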