doc/source/whatsnew/v0.20.0.txt

.. _whatsnew_0200:

v0.20.0 (????, 2017)
--------------------

This is a major release from 0.19 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

- Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)
- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_ix>`
- Switched the test framework to `pytest`_ (:issue:`13097`)
- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec, see :ref: `here <whatsnew_0200.enhancements.table_schema>`

.. _pytest: http://doc.pytest.org/en/latest/

Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.

.. contents:: What's new in v0.20.0
    :local:
    :backlinks: none

.. _whatsnew_0200.enhancements:

New features
~~~~~~~~~~~~

- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
- ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)
- ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`)


.. _whatsnew_0200.enhancements.dataio_dtype:

``dtype`` keyword for data IO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns is now supported with the ``'python'`` engine (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.

.. ipython:: python

   data = "a,b\n1,2\n3,4"
   pd.read_csv(StringIO(data), engine='python').dtypes
   pd.read_csv(StringIO(data), engine='python', dtype={'a':'float64', 'b':'object'}).dtypes

The ``dtype`` keyword argument is also now supported in the :func:`read_fwf` function for parsing
fixed-width text files, and :func:`read_excel` for parsing Excel files.

.. ipython:: python

   data = "a  b\n1  2\n3  4"
   pd.read_fwf(StringIO(data)).dtypes
   pd.read_fwf(StringIO(data), dtype={'a':'float64', 'b':'object'}).dtypes

.. _whatsnew_0200.enhancements.groupby_access:

Groupby Enhancements
^^^^^^^^^^^^^^^^^^^^

Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)

.. ipython:: python

   arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
             ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

   index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

   df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                      'B': np.arange(8)},
                     index=index)
   df

   df.groupby(['second', 'A']).sum()

.. _whatsnew_0200.enhancements.compressed_urls:

Better support for compressed URLs in ``read_csv``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The compression code was refactored (:issue:`12688`). As a result, reading
dataframes from URLs in :func:`read_csv` or :func:`read_table` now supports
additional compression methods: ``xz``, ``bz2``, and ``zip`` (:issue:`14570`).
Previously, only ``gzip`` compression was supported. By default, compression of
URLs and paths are now both inferred using their file extensions. Additionally,
support for bz2 compression in the python 2 c-engine improved (:issue:`14874`).

.. ipython:: python

   url = 'https://github.com/{repo}/raw/{branch}/{path}'.format(
       repo = 'pandas-dev/pandas',
       branch = 'master',
       path = 'pandas/tests/io/parser/data/salaries.csv.bz2',
   )
   df = pd.read_table(url, compression='infer')  # default, infer compression
   df = pd.read_table(url, compression='bz2')  # explicitly specify compression
   df.head(2)

.. _whatsnew_0200.enhancements.uint64_support:

UInt64 Support Improved
^^^^^^^^^^^^^^^^^^^^^^^

Pandas has significantly improved support for operations involving unsigned,
or purely non-negative, integers. Previously, handling these integers would
result in improper rounding or data-type casting, leading to incorrect results.
Notably, a new numerical index, ``UInt64Index``, has been created (:issue:`14937`)

.. ipython:: python

   idx = pd.UInt64Index([1, 2, 3])
   df = pd.DataFrame({'A': ['a', 'b', 'c']}, index=idx)
   df.index

- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`, :issue:`14982`)
- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`)
- Bug in ``DataFrame`` construction in which unsigned 64-bit integer elements were being converted to objects (:issue:`14881`)
- Bug in ``pd.read_csv()`` in which unsigned 64-bit integer elements were being improperly converted to the wrong data types (:issue:`14983`)
- Bug in ``pd.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14915`)
- Bug in ``pd.value_counts()`` in which unsigned 64-bit integers were being erroneously truncated in the output (:issue:`14934`)

.. _whatsnew_0200.enhancements.groupy_categorical:

GroupBy on Categoricals
^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueError`` when grouping on a categorical series with some categories not appearing in the data. (:issue:`13179`)

.. ipython:: python

  chromosomes = np.r_[np.arange(1, 23).astype(str), ['X', 'Y']]
  df = pd.DataFrame({
      'A': np.random.randint(100),
      'B': np.random.randint(100),
      'C': np.random.randint(100),
      'chromosomes': pd.Categorical(np.random.choice(chromosomes, 100),
                                    categories=chromosomes,
                                    ordered=True)})
  df

Previous Behavior:

.. code-block:: ipython

  In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
  ---------------------------------------------------------------------------
  ValueError: items in new_categories are not the same as in old categories

New Behavior:

.. ipython:: python

  df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()

.. _whatsnew_0200.enhancements.table_schema

Table Schema Output
^^^^^^^^^^^^^^^^^^^

The new orient ``'table'`` for :meth:`DataFrame.to_json`
will generate a `Table Schema`_ compatible string representation of
the data.

.. ipython:: python

   df = pd.DataFrame(
       {'A': [1, 2, 3],
        'B': ['a', 'b', 'c'],
        'C': pd.date_range('2016-01-01', freq='d', periods=3),
       }, index=pd.Index(range(3), name='idx'))
   df
   df.to_json(orient='table')


See :ref:`IO: Table Schema for more<io.table_schema>`.

Additionally, the repr for ``DataFrame`` and ``Series`` can now publish
this JSON Table schema representation of the Series or DataFrame if you are
using IPython (or another frontend like `nteract`_ using the Jupyter messaging
protocol).
This gives frontends like the Jupyter notebook and `nteract`_
more flexiblity in how they display pandas objects, since they have
more information about the data.
You must enable this by setting the ``display.html.table_schema`` option to True.

.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/
.. _nteract: http://nteract.io/

.. _whatsnew_0200.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)

- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).

- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
- Multiple offset aliases with decimal points are now supported (e.g. '0.5min' is parsed as '30s') (:issue:`8419`)
- ``.isnull()`` and ``.notnull()`` have been added to ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`)
- ``pd.read_gbq`` method now allows query configuration preferences (:issue:`14742`)

- New ``UnsortedIndexError`` (subclass of ``KeyError``) raised when indexing/slicing into an
  unsorted MultiIndex (:issue:`11897`). This allows differentiation between errors due to lack
  of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`
- ``MultiIndex`` has gained a ``.to_frame()`` method to convert to a ``DataFrame`` (:issue:`12397`)
- ``pd.cut`` and ``pd.qcut`` now support datetime64 and timedelta64 dtypes (:issue:`14714`, :issue:`14798`)
- ``pd.qcut`` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`7751`)
- ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`)
- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value  (:issue:`14154`)
- The ``skiprows`` argument in ``pd.read_csv`` now accepts a callable function as a value  (:issue:`10882`)
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`)
- ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
- ``Timedelta.isoformat`` method added for formatting Timedeltas as an `ISO 8601 duration`_. See the :ref:`Timedelta docs <timedeltas.isoformat>` (:issue:`15136`)
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`)
- The ``.to_latex()`` method will now accept ``multicolumn`` and ``multirow`` arguments to use the accompanying LaTeX enhancements
- ``pd.merge_asof()`` gained the option ``direction='backward'|'forward'|'nearest'`` (:issue:`14887`)
- ``Series/DataFrame.asfreq()`` have gained a ``fill_value`` parameter, to fill missing values (:issue:`3715`).
- ``Series/DataFrame.resample.asfreq`` have gained a ``fill_value`` parameter, to fill missing values during resampling (:issue:`3715`).
- ``pandas.tools.hashing`` has gained a ``hash_tuples`` routine, and ``hash_pandas_object`` has gained the ability to hash a ``MultiIndex`` (:issue:`15224`)
- ``Series/DataFrame.squeeze()`` have gained the ``axis`` parameter. (:issue:`15339`)
- ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`)
- HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)
- ``pd.TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`)
- ``pd.types.concat.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information.
- ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)

.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations


.. _whatsnew_0200.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0200.api_breaking.deprecate_ix:

Deprecate .ix
^^^^^^^^^^^^^

The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation are :ref:`here <indexing>`. (:issue:`14218`)


The recommended methods of indexing are:

- ``.loc`` if you want to *label* index
- ``.iloc`` if you want to *positionally* index.

Using ``.ix`` will now show a ``DeprecationWarning`` with a link to some examples of how to convert code :ref:`here <indexing.deprecate_ix>`.


.. ipython:: python

  df = pd.DataFrame({'A': [1, 2, 3],
                     'B': [4, 5, 6]},
                    index=list('abc'))

  df

Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.

.. code-block:: ipython

  In [3]: df.ix[[0, 2], 'A']
  Out[3]:
  a    1
  c    3
  Name: A, dtype: int64

Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.

.. ipython:: python

  df.loc[df.index[[0, 2]], 'A']

Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.

.. ipython:: python

  df.iloc[[0, 2], df.columns.get_loc('A')]


.. _whatsnew.api_breaking.io_compat

Possible incompat for HDF5 formats for pandas < 0.13.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``pd.TimeSeries`` was deprecated officially in 0.17.0, though has only been an alias since 0.13.0. It has
been dropped in favor of ``pd.Series``. (:issue:``15098).

This *may* cause HDF5 files that were created in prior versions to become unreadable if ``pd.TimeSeries``
was used. This is most likely to be for pandas < 0.13.0. If you find yourself in this situation.
You can use a recent prior version of pandas to read in your HDF5 files,
then write them out again after applying the procedure below.

.. code-block:: ipython

   In [2]: s = pd.TimeSeries([1,2,3], index=pd.date_range('20130101', periods=3))

   In [3]: s
   Out[3]:
   2013-01-01    1
   2013-01-02    2
   2013-01-03    3
   Freq: D, dtype: int64

   In [4]: type(s)
   Out[4]: pandas.core.series.TimeSeries

   In [5]: s = pd.Series(s)

   In [6]: s
   Out[6]:
   2013-01-01    1
   2013-01-02    2
   2013-01-03    3
   Freq: D, dtype: int64

   In [7]: type(s)
   Out[7]: pandas.core.series.Series


.. _whatsnew_0200.api_breaking.index_map:

Map on Index types now return other Index types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-  ``map`` on an ``Index`` now returns an ``Index``, not a numpy array (:issue:`12766`)

  .. ipython:: python

     idx = Index([1, 2])
     idx
     mi = MultiIndex.from_tuples([(1, 2), (2, 4)])
     mi

  Previous Behavior:

  .. code-block:: ipython

     In [5]: idx.map(lambda x: x * 2)
     Out[5]: array([2, 4])

     In [6]: idx.map(lambda x: (x, x * 2))
     Out[6]: array([(1, 2), (2, 4)], dtype=object)

     In [7]: mi.map(lambda x: x)
     Out[7]: array([(1, 2), (2, 4)], dtype=object)

     In [8]: mi.map(lambda x: x[0])
     Out[8]: array([1, 2])

  New Behavior:

  .. ipython:: python

      idx.map(lambda x: x * 2)

      idx.map(lambda x: (x, x * 2))

      mi.map(lambda x: x)

      mi.map(lambda x: x[0])


-  ``map`` on a ``Series`` with ``datetime64`` values may return ``int64`` dtypes rather than ``int32``

  .. ipython:: python

    s = Series(date_range('2011-01-02T00:00', '2011-01-02T02:00', freq='H').tz_localize('Asia/Tokyo'))
    s

  Previous Behavior:

  .. code-block:: ipython

    In [9]: s.map(lambda x: x.hour)
    Out[9]:
    0    0
    1    1
    2    2
    dtype: int32


  New Behavior:

  .. ipython:: python

    s.map(lambda x: x.hour)

.. _whatsnew_0200.api_breaking.s3:

S3 File Handling
^^^^^^^^^^^^^^^^

pandas now uses `s3fs <http://s3fs.readthedocs.io/>`_ for handling S3 connections. This shouldn't break
any code. However, since s3fs is not a required dependency, you will need to install it separately, like ``boto``
in prior versions of pandas. (:issue:`11915`).

.. _whatsnew_0200.api_breaking.partial_string_indexing:

Partial String Indexing Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`DatetimeIndex Partial String Indexing <timeseries.partialindexing>` now works as exact match, provided that string resolution coincides with index resolution, including a case when both are seconds (:issue:`14826`). See :ref:`Slice vs. Exact Match <timeseries.slice_vs_exact_match>` for details.

.. ipython:: python

  df = DataFrame({'a': [1, 2, 3]}, DatetimeIndex(['2011-12-31 23:59:59',
                                                  '2012-01-01 00:00:00',
                                                  '2012-01-01 00:00:01']))
Previous Behavior:

.. code-block:: ipython

  In [4]: df['2011-12-31 23:59:59']
  Out[4]:
                         a
  2011-12-31 23:59:59  1

  In [5]: df['a']['2011-12-31 23:59:59']
  Out[5]:
  2011-12-31 23:59:59    1
  Name: a, dtype: int64


New Behavior:

.. code-block:: ipython

  In [4]: df['2011-12-31 23:59:59']
  KeyError: '2011-12-31 23:59:59'

  In [5]: df['a']['2011-12-31 23:59:59']
  Out[5]: 1

.. _whatsnew_0200.api_breaking.gbq:

Pandas Google BigQuery support has moved
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas has split off Google BigQuery support into a separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.
The functionality of ``pd.read_gbq()`` and ``.to_gbq()`` remains the same with the currently released version of ``pandas-gbq=0.1.2``. (:issue:`15347`)
Documentation is now hosted `here <https://pandas-gbq.readthedocs.io/>`__

.. _whatsnew_0200.api_breaking.memory_usage:

Memory Usage for Index is more Accurate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, showing ``.memory_usage()`` on a pandas structure that has an index, would only include actual index values and not include structures that facilitated fast indexing. This will generally be different for ``Index`` and ``MultiIndex`` and less-so for other index types. (:issue:`15237`)

Previous Behavior:

.. code-block:: ipython

   In [8]: index = Index(['foo', 'bar', 'baz'])

   In [9]: index.memory_usage(deep=True)
   Out[9]: 180

   In [10]: index.get_loc('foo')
   Out[10]: 0

   In [11]: index.memory_usage(deep=True)
   Out[11]: 180

New Behavior:

.. code-block:: ipython

   In [8]: index = Index(['foo', 'bar', 'baz'])

   In [9]: index.memory_usage(deep=True)
   Out[9]: 180

   In [10]: index.get_loc('foo')
   Out[10]: 0

   In [11]: index.memory_usage(deep=True)
   Out[11]: 260

.. _whatsnew_0200.api_breaking.groupby_describe:

Groupby Describe Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The output formatting of ``groupby.describe()`` now labels the ``describe()`` metrics in the columns instead of the index.
This format is consistent with ``groupby.agg()`` when applying multiple functions at once. (:issue:`4792`)

Previous Behavior:

.. code-block:: ipython

   In [1]: df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

   In [2]: df.groupby('A').describe()
   Out[2]:
                   B
   A
   1 count  2.000000
     mean   1.500000
     std    0.707107
     min    1.000000
     25%    1.250000
     50%    1.500000
     75%    1.750000
     max    2.000000
   2 count  2.000000
     mean   3.500000
     std    0.707107
     min    3.000000
     25%    3.250000
     50%    3.500000
     75%    3.750000
     max    4.000000

   In [3]: df.groupby('A').agg([np.mean, np.std, np.min, np.max])
   Out[3]:
        B
     mean       std amin amax
   A
   1  1.5  0.707107    1    2
   2  3.5  0.707107    3    4

New Behavior:

.. ipython:: python

   df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

   df.groupby('A').describe()

   df.groupby('A').agg([np.mean, np.std, np.min, np.max])

.. _whatsnew_0200.api_breaking.hdfstore_where:

HDFStore where string comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions most types could be compared to string column in a ``HDFStore``
usually resulting in an invalid comparsion.  These comparisions will now raise a
``TypeError`` (:issue:`15492`)

New Behavior:

.. code-block:: ipython

   In [15]: df = pd.DataFrame({'unparsed_date': ['2014-01-01', '2014-01-01']})

   In [16]: df.dtypes
   Out[16]:
   unparsed_date    object
   dtype: object

   In [17]: df.to_hdf('store.h5', 'key', format='table', data_columns=True)

   In [18]: ts = pd.Timestamp('2014-01-01')

   In [19]: pd.read_hdf('store.h5', 'key', where='unparsed_date > ts')
   TypeError: Cannot compare 2014-01-01 00:00:00 of
   type <class 'pandas.tslib.Timestamp'> to string column


.. _whatsnew_0200.api:

Other API Changes
^^^^^^^^^^^^^^^^^

- ``numexpr`` version is now required to be >= 2.4.6 and it will not be used at all if this requisite is not fulfilled (:issue:`15213`).
- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)
- ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)
- ``DataFrame.applymap()`` with an empty ``DataFrame`` will return a copy of the empty ``DataFrame`` instead of a ``Series`` (:issue:`8222`)
- ``.loc`` has compat with ``.ix`` for accepting iterators, and NamedTuples (:issue:`15120`)
- ``interpolate()`` and ``fillna()`` will raise a ``ValueError`` if the ``limit`` keyword argument is not greater than 0. (:issue:`9217`)
- ``pd.read_csv()`` will now issue a ``ParserWarning`` whenever there are conflicting values provided by the ``dialect`` parameter and the user (:issue:`14898`)
- ``pd.read_csv()`` will now raise a ``ValueError`` for the C engine if the quote character is larger than than one byte (:issue:`11592`)
- ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`)
- ``pandas.api.types.is_datetime64_ns_dtype`` will now report ``True`` on a tz-aware dtype, similar to ``pandas.api.types.is_datetime64_any_dtype``
- ``DataFrame.asof()`` will return a null filled ``Series`` instead the scalar ``NaN`` if a match is not found (:issue:`15118`)
- The :func:`pd.read_gbq` method now stores ``INTEGER`` columns as ``dtype=object`` if they contain ``NULL`` values. Otherwise they are stored as ``int64``. This prevents precision lost for integers greather than 2**53. Furthermore ``FLOAT`` columns with values above 10**4 are no longer casted to ``int64`` which also caused precision loss (:issue:`14064`, :issue:`14305`).
- Reorganization of timeseries development tests (:issue:`14854`)
- Specific support for ``copy.copy()`` and ``copy.deepcopy()`` functions on NDFrame objects (:issue:`15444`)

.. _whatsnew_0200.deprecations:

Deprecations
^^^^^^^^^^^^

- ``SparseArray.to_dense()`` has deprecated the ``fill`` parameter, as that parameter was not being respected (:issue:`14647`)
- ``SparseSeries.to_dense()`` has deprecated the ``sparse_only`` parameter (:issue:`14647`)
- ``Series.repeat()`` has deprecated the ``reps`` parameter in favor of ``repeats`` (:issue:`12662`)
- ``Index.repeat()`` and ``MultiIndex.repeat()`` have deprecated the ``n`` parameter in favor of ``repeats`` (:issue:`12662`)
- ``Categorical.searchsorted()`` and ``Series.searchsorted()`` have deprecated the ``v`` parameter in favor of ``value`` (:issue:`12662`)
- ``TimedeltaIndex.searchsorted()``, ``DatetimeIndex.searchsorted()``, and ``PeriodIndex.searchsorted()`` have deprecated the ``key`` parameter in favor of ``value`` (:issue:`12662`)
- ``DataFrame.astype()`` has deprecated the ``raise_on_error`` parameter in favor of ``errors`` (:issue:`14878`)
- ``Series.sortlevel`` and ``DataFrame.sortlevel`` have been deprecated in favor of ``Series.sort_index`` and ``DataFrame.sort_index`` (:issue:`15099`)
- importing ``concat`` from ``pandas.tools.merge`` has been deprecated in favor of imports from the ``pandas`` namespace. This should only affect explict imports (:issue:`15358`)
- ``Series/DataFrame/Panel.consolidate()`` been deprecated as a public method. (:issue:`15483`)
- The following top-level pandas functions have been deprecated and will be removed in a future version (:issue:`13790`)
  * ``pd.pnow()``, replaced by ``Period.now()``
  * ``pd.Term``, is removed, as it is not applicable to user code. Instead use in-line string expressions in the where clause when searching in HDFStore
  * ``pd.Expr``, is removed, as it is not applicable to user code.
  * ``pd.match()``, is removed.
  * ``pd.groupby()``, replaced by using the ``.groupby()`` method directly on a ``Series/DataFrame``

.. _whatsnew_0200.prior_deprecations:

Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- The ``pandas.rpy`` module is removed. Similar functionality can be accessed
  through the `rpy2 <https://rpy2.readthedocs.io/>`__ project.
  See the :ref:`R interfacing docs <rpy>` for more details.
- The ``pandas.io.ga`` module with a ``google-analytics`` interface is removed (:issue:`11308`).
  Similar functionality can be found in the `Google2Pandas <https://github.com/panalysis/Google2Pandas>`__ package.
- ``pd.to_datetime`` and ``pd.to_timedelta`` have dropped the ``coerce`` parameter in favor of ``errors`` (:issue:`13602`)
- ``pandas.stats.fama_macbeth``, ``pandas.stats.ols``, ``pandas.stats.plm`` and ``pandas.stats.var``, as well as the top-level ``pandas.fama_macbeth`` and ``pandas.ols`` routines are removed. Similar functionaility can be found in the `statsmodels <shttp://www.statsmodels.org/dev/>`__ package. (:issue:`11898`)
- The ``TimeSeries`` and ``SparseTimeSeries`` classes, aliases of ``Series``
  and ``SparseSeries``, are removed (:issue:`10890`, :issue:`15098`).
- ``Series.is_time_series`` is dropped in favor of ``Series.index.is_all_dates`` (:issue:``)
- The deprecated ``irow``, ``icol``, ``iget`` and ``iget_value`` methods are removed
  in favor of ``iloc`` and ``iat`` as explained :ref:`here <whatsnew_0170.deprecations>` (:issue:`10711`).


.. _whatsnew_0200.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved performance of ``pd.wide_to_long()`` (:issue:`14779`)
- Increased performance of ``pd.factorize()`` by releasing the GIL with ``object`` dtype when inferred as strings (:issue:`14859`)
- Improved performance of timeseries plotting with an irregular DatetimeIndex
  (or with ``compat_x=True``) (:issue:`15073`).
- Improved performance of ``groupby().cummin()`` and ``groupby().cummax()`` (:issue:`15048`, :issue:`15109`, :issue:`15561`)
- Improved performance and reduced memory when indexing with a ``MultiIndex`` (:issue:`15245`)
- When reading buffer object in ``read_sas()`` method without specified format, filepath string is inferred rather than buffer object. (:issue:`14947`)
- Improved performance of `rank()` for categorical data (:issue:`15498`)
- Improved performance when using ``.unstack()`` (:issue:`15503`)


.. _whatsnew_0200.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug in ``Timestamp.replace`` now raises ``TypeError`` when incorrect argument names are given; previously this raised ``ValueError`` (:issue:`15240`)
- Bug in ``Index`` power operations with reversed operands (:issue:`14973`)
- Bug in ``TimedeltaIndex`` addition where overflow was being allowed without error (:issue:`14816`)
- Bug in ``TimedeltaIndex`` raising a ``ValueError`` when boolean indexing with ``loc`` (:issue:`14946`)
- Bug in ``DatetimeIndex.round()`` and ``Timestamp.round()`` floating point accuracy when rounding by milliseconds (:issue: `14440`)
- Bug in ``astype()`` where ``inf`` values were incorrectly converted to integers. Now raises error now with ``astype()`` for Series and DataFrames (:issue:`14265`)
- Bug in ``DataFrame(..).apply(to_numeric)`` when values are of type decimal.Decimal. (:issue:`14827`)
- Bug in ``describe()`` when passing a numpy array which does not contain the median to the ``percentiles`` keyword argument (:issue:`14908`)
- Bug in ``DataFrame.sort_values()`` when sorting by multiple columns where one column is of type ``int64`` and contains ``NaT`` (:issue:`14922`)
- Bug in ``DataFrame.reindex()`` in which ``method`` was ignored when passing ``columns`` (:issue:`14992`)
- Bug in ``pd.to_numeric()`` in which float and unsigned integer elements were being improperly casted (:issue:`14941`, :issue:`15005`)
- Cleaned up ``PeriodIndex`` constructor, including raising on floats more consistently (:issue:`13277`)
- Bug in ``pd.read_csv()`` in which the ``dialect`` parameter was not being verified before processing (:issue:`14898`)
- Bug in ``pd.read_fwf`` where the skiprows parameter was not being respected during column width inference (:issue:`11256`)
- Bug in ``pd.read_csv()`` in which missing data was being improperly handled with ``usecols`` (:issue:`6710`)
- Bug in ``pd.read_csv()`` in which a file containing a row with many columns followed by rows with fewer columns would cause a crash (:issue:`14125`)
- Bug in ``pd.tools.hashing.hash_pandas_object()`` in which hashing of categoricals depended on the ordering of categories, instead of just their values. (:issue:`15143`)
- Bug in ``.groupby(..).resample()`` when passed the ``on=`` kwarg. (:issue:`15021`)
- Bug in using ``__deepcopy__`` on empty NDFrame objects (:issue:`15370`)
- Bug in ``DataFrame.loc`` with indexing a ``MultiIndex`` with a ``Series`` indexer (:issue:`14730`, :issue:`15424`)
- Bug in ``DataFrame.loc`` with indexing a ``MultiIndex`` with a numpy array (:issue:`15434`)
- Bug in ``Rolling.quantile`` function that caused a segmentation fault when called with a quantile value outside of the range [0, 1] (:issue:`15463`)


- Bug in the display of ``.info()`` where a qualifier (+) would always be displayed with a ``MultiIndex`` that contains only non-strings (:issue:`15245`)

- Bug in ``.asfreq()``, where frequency was not set for empty ``Series` (:issue:`14320`)

- Bug in ``pd.read_msgpack()`` in which ``Series`` categoricals were being improperly processed (:issue:`14901`)
- Bug in ``Series.ffill()`` with mixed dtypes containing tz-aware datetimes. (:issue:`14956`)

- Bug in ``DataFrame.isin`` comparing datetimelike to empty frame (:issue:`15473`)

- Bug in ``Series.where()`` and ``DataFrame.where()`` where array-like conditionals were being rejected (:issue:`15414`)
- Bug in ``Series`` construction with a datetimetz (:issue:`14928`)
- Bug in output formatting of a ``MultiIndex`` when names are integers (:issue:`12223`, :issue:`15262`)

- Bug in compat for passing long integers to ``Timestamp.replace`` (:issue:`15030`)
- Bug in ``.loc`` that would not return the correct dtype for scalar access for a DataFrame (:issue:`11617`)
- Bug in ``GroupBy.get_group()`` failing with a categorical grouper (:issue:`15155`)
- Bug in ``pandas.tools.utils.cartesian_product()`` with large input can cause overflow on windows (:issue:`15265`)

- Bug in ``DataFrame.to_records()`` with converting a ``DatetimeIndex`` with a timezone (:issue:`13937`)


- Bug in ``.groupby(...).rolling(...)`` when ``on`` is specified and using a ``DatetimeIndex`` (:issue:`15130`)


- Bug in ``to_sql`` when writing a DataFrame with numeric index names (:issue:`15404`).
- Bug in ``Series.iloc`` where a ``Categorical`` object for list-like indexes input was returned, where a ``Series`` was expected. (:issue:`14580`)


- Bug in  groupby operations with timedelta64 when passing ``numeric_only=False`` (:issue:`5724`)


- Bug in ``DataFrame.to_html`` with ``index=False`` and ``max_rows`` raising in ``IndexError`` (:issue:`14998`)

- Bug in ``Categorical.searchsorted()`` where alphabetical instead of the provided categorical order was used (:issue:`14522`)


- Bug in ``resample``, where a non-string ```loffset`` argument would not be applied when resampling a timeseries (:issue:`13218`)


- Bug in ``.rank()`` which incorrectly ranks ordered categories (:issue:`15420`)
- Bug in ``.corr()`` and ``.cov()`` where the column and index were the same object (:issue:`14617`)


- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`)
- Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`)

- Bug in ``.to_json()`` where ``lines=True`` and contents (keys or values) contain escaped characters (:issue:`15096`)
- Bug in ``.to_json()`` causing single byte ascii characters to be expanded to four byte unicode (:issue:`15344`)
- Bug in ``.read_json()`` for Python 2 where ``lines=True`` and contents contain non-ascii unicode characters (:issue:`15132`)
- Bug in ``.rolling/expanding()`` functions where ``count()`` was not counting ``np.Inf``, nor handling ``object`` dtypes (:issue:`12541`)
- Bug in ``.rolling()`` where ``pd.Timedelta`` or ``datetime.timedelta`` was not accepted as a ``window`` argument (:issue:`15440`)
- Bug in ``DataFrame.resample().median()`` if duplicate column names are present (:issue:`14233`)

- Bug in ``DataFrame.groupby().describe()`` when grouping on ``Index`` containing tuples (:issue:`14848`)
- Bug in creating a ``MultiIndex`` with tuples and not passing a list of names; this will now raise ``ValueError`` (:issue:`15110`)
- Bug in ``groupby().nunique()`` with a datetimelike-grouper where bins counts were incorrect (:issue:`13453`)

- Bug in catching an overflow in ``Timestamp`` + ``Timedelta/Offset`` operations (:issue:`15126`)
- Bug in the HTML display with with a ``MultiIndex`` and truncation (:issue:`14882`)


- Bug in ``pd.merge_asof()`` where ``left_index``/``right_index`` together caused a failure when ``tolerance`` was specified (:issue:`15135`)
- Bug in ``DataFrame.pivot_table()`` where ``dropna=True`` would not drop all-NaN columns when the columns was a ``category`` dtype (:issue:`15193`)


- Bug in ``pd.read_hdf()`` passing a ``Timestamp`` to the ``where`` parameter with a non date column (:issue:`15492`)


- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`)
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
- Incorrect dtyped ``Series`` was returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty ``DataFrame`` (:issue:`15077`)
- Bug in ``Series.dt.round`` inconsistent behaviour on NAT's with different arguments (:issue:`14940`)
- Bug in ``DataFrame.fillna()`` where the argument ``downcast`` was ignored when fillna value was of type ``dict`` (:issue:`15277`)
- Bug in ``.reset_index()`` when an all ``NaN`` level of a ``MultiIndex`` would fail (:issue:`6322`)

- Bug in ``pd.read_msgpack()`` when deserializing a ``CategoricalIndex`` (:issue:`15487`)
- Bug in ``pd.DataFrame.to_records()`` which failed with unicode characters in column names (:issue:`11879`)


- Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry is parsed (:issue:`15140`)

- Bug in ``DataFrame.to_stata()`` and ``StataWriter`` which produces incorrectly formatted files to be produced for some locales (:issue:`13856`)
- Bug in ``pd.concat()`` in which concatting with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)
- Bug in ``groupby.agg()`` incorrectly localizing timezone on ``datetime`` (:issue:`15426`, :issue:`10668`, :issue:`13046`)


- Bug in ``.read_csv()`` with ``parse_dates`` when multiline headers are specified (:issue:`15376`)
- Bug in ``groupby.transform()`` that would coerce the resultant dtypes back to the original (:issue:`10972`, :issue:`11444`)

- Bug in ``DataFrame.hist`` where ``plt.tight_layout`` caused an ``AttributeError``  (use ``matplotlib >= 0.2.0``) (:issue:`9351`)
- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
- Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
- Bug in ``pivot_table`` returns ``Series`` in specific circumstance (:issue:`4386`)