doc/source/whatsnew/v0.22.0.txt

.. _whatsnew_0220:

v0.22.0
-------

This is a major release from 0.21.1 and includes a number of API changes,
deprecations, new features, enhancements, and performance improvements along
with a large number of bug fixes. We recommend that all users upgrade to this
version.

.. _whatsnew_0220.enhancements:

New features
~~~~~~~~~~~~

-
-
-


.. _whatsnew_0210.enhancements.get_dummies_dtype:

``get_dummies`` now supports ``dtype`` argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtype for the new columns. The default remains uint8. (:issue:`18330`)

.. ipython:: python

   df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
   pd.get_dummies(df, columns=['c']).dtypes
   pd.get_dummies(df, columns=['c'], dtype=bool).dtypes


.. _whatsnew_0220.enhancements.merge_on_columns_and_levels:

Merging on a combination of columns and index levels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Strings passed to :meth:`DataFrame.merge` as the ``on``, ``left_on``, and ``right_on``
parameters may now refer to either column names or index level names.
This enables merging ``DataFrame`` instances on a combination of index levels
and columns without resetting indexes. See the :ref:`Merge on columns and
levels <merging.merge_on_columns_and_levels>` documentation section.
(:issue:`14355`)

.. ipython:: python

   left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')

   left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'key2': ['K0', 'K1', 'K0', 'K1']},
                       index=left_index)

   right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')

   right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                         'D': ['D0', 'D1', 'D2', 'D3'],
                         'key2': ['K0', 'K0', 'K0', 'K1']},
                        index=right_index)

   left.merge(right, on=['key1', 'key2'])


.. _whatsnew_0220.enhancements.other:

Other Enhancements
^^^^^^^^^^^^^^^^^^

- Better support for :func:`Dataframe.style.to_excel` output with the ``xlsxwriter`` engine. (:issue:`16149`)
- :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`)
- :func:`MultiIndex.unique` now supports the ``level=`` argument, to get unique values from a specific index level (:issue:`17896`)
- :class:`pandas.io.formats.style.Styler` now has method ``hide_index()`` to determine whether the index will be rendered in ouptut (:issue:`14194`)
- :class:`pandas.io.formats.style.Styler` now has method ``hide_columns()`` to determine whether columns will be hidden in output (:issue:`14194`)
- Improved wording of ``ValueError`` raised in :func:`to_datetime` when ``unit=`` is passed with a non-convertible value (:issue:`14350`)
- :func:`Series.fillna` now accepts a Series or a dict as a ``value`` for a categorical dtype (:issue:`17033`)
- :func:`pandas.read_clipboard` updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (:issue:`17722`)
- Improved wording of ``ValueError`` raised in :func:`read_csv` when the ``usecols`` argument cannot match all columns. (:issue:`17301`)

.. _whatsnew_0220.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0220.api_breaking.deps:

Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have updated our minimum supported versions of dependencies (:issue:`15184`).
If installed, we now require:

   +-----------------+-----------------+----------+
   | Package         | Minimum Version | Required |
   +=================+=================+==========+
   | python-dateutil | 2.5.0           |    X     |
   +-----------------+-----------------+----------+
   | openpyxl        | 2.4.0           |          |
   +-----------------+-----------------+----------+


.. _whatsnew_0220.api:

Other API Changes
^^^^^^^^^^^^^^^^^

- :func:`Series.astype` and :func:`Index.astype` with an incompatible dtype will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`18231`)
- ``Series`` construction with an ``object`` dtyped tz-aware datetime and ``dtype=object`` specified, will now return an ``object`` dtyped ``Series``, previously this would infer the datetime dtype (:issue:`18231`)
- A :class:`Series` of ``dtype=category`` constructed from an empty ``dict`` will now have categories of ``dtype=object`` rather than ``dtype=float64``, consistently with the case in which an empty list is passed (:issue:`18515`)
- ``NaT`` division with :class:`datetime.timedelta` will now return ``NaN`` instead of raising (:issue:`17876`)
- All-NaN levels in a ``MultiIndex`` are now assigned ``float`` rather than ``object`` dtype, promoting consistency with ``Index`` (:issue:`17929`).
- :class:`Timestamp` will no longer silently ignore unused or invalid ``tz`` or ``tzinfo`` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid ``freq`` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`)
- `tseries.frequencies.get_freq_group()` and `tseries.frequencies.DAYS` are removed from the public API (:issue:`18034`)
- :func:`Series.truncate` and :func:`DataFrame.truncate` will raise a ``ValueError`` if the index is not sorted instead of an unhelpful ``KeyError`` (:issue:`17935`)
- :func:`Index.map` can now accept ``Series`` and dictionary input objects (:issue:`12756`, :issue:`18482`, :issue:`18509`).
- :func:`Dataframe.unstack` will now default to filling with ``np.nan`` for ``object`` columns. (:issue:`12815`)
- :class:`IntervalIndex` constructor will raise if the ``closed`` parameter conflicts with how the input data is inferred to be closed (:issue:`18421`)
- Inserting missing values into indexes will work for all types of indexes and automatically insert the correct type of missing value (``NaN``, ``NaT``, etc.) regardless of the type passed in (:issue:`18295`)
- Restricted ``DateOffset`` keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`, :issue:`18226`).
- :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`)
- When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
- :func:`pandas.DataFrame.merge` no longer casts a ``float`` column to ``object`` when merging on ``int`` and ``float`` columns (:issue:`16572`)
- The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN``, which impacts methods that mask with NA, such as ``UInt64Index.where()`` (:issue:`18398`)
- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)

.. _whatsnew_0220.deprecations:

Deprecations
~~~~~~~~~~~~

- ``Series.from_array`` and ``SparseSeries.from_array`` are deprecated. Use the normal constructor ``Series(..)`` and ``SparseSeries(..)`` instead (:issue:`18213`).
- ``DataFrame.as_matrix`` is deprecated. Use ``DataFrame.values`` instead (:issue:`18458`).
- ``Series.asobject``, ``DatetimeIndex.asobject``, ``PeriodIndex.asobject`` and ``TimeDeltaIndex.asobject`` have been deprecated. Use '.astype(object)' instead (:issue:`18572`)

.. _whatsnew_0220.prior_deprecations:

Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Warnings against the obsolete usage ``Categorical(codes, categories)``, which were emitted for instance when the first two arguments to ``Categorical()`` had different dtypes, and recommended the use of ``Categorical.from_codes``, have now been removed (:issue:`8074`)
- The ``levels`` and ``labels`` attributes of a ``MultiIndex`` can no longer be set directly (:issue:`4039`).
- ``pd.tseries.util.pivot_annual`` has been  removed (deprecated since v0.19). Use ``pivot_table`` instead (:issue:`18370`)
- ``pd.tseries.util.isleapyear`` has been removed (deprecated since v0.19). Use ``.is_leap_year`` property in Datetime-likes instead (:issue:`18370`)
- ``pd.ordered_merge`` has been removed (deprecated since v0.19). Use ``pd.merge_ordered`` instead (:issue:`18459`)
- The ``SparseList`` class has been removed (:issue:`14007`)
- The ``pandas.io.wb`` and ``pandas.io.data`` stub modules have been removed (:issue:`13735`)
- ``Categorical.from_array`` has been removed (:issue:`13854`)

.. _whatsnew_0220.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Indexers on ``Series`` or ``DataFrame`` no longer create a reference cycle (:issue:`17956`)
- Added a keyword argument, ``cache``, to :func:`to_datetime` that improved the performance of converting duplicate datetime arguments (:issue:`11665`)
- :class`DateOffset` arithmetic performance is improved (:issue:`18218`)
- Converting a ``Series`` of ``Timedelta`` objects to days, seconds, etc... sped up through vectorization of underlying methods (:issue:`18092`)
- Improved performance of ``.map()`` with a ``Series/dict`` input (:issue:`15081`)
- The overriden ``Timedelta`` properties of days, seconds and microseconds have been removed, leveraging their built-in Python versions instead (:issue:`18242`)
- ``Series`` construction will reduce the number of copies made of the input data in certain cases (:issue:`17449`)
- Improved performance of :func:`Series.dt.date` and :func:`DatetimeIndex.date` (:issue:`18058`)
- Improved performance of ``IntervalIndex.symmetric_difference()`` (:issue:`18475`)
- Improved performance of ``DatetimeIndex`` and ``Series`` arithmetic operations with Business-Month and Business-Quarter frequencies (:issue:`18489`)

.. _whatsnew_0220.docs:

Documentation Changes
~~~~~~~~~~~~~~~~~~~~~

-
-
-

.. _whatsnew_0220.bug_fixes:

Bug Fixes
~~~~~~~~~


Conversion
^^^^^^^^^^

- Bug in :class:`Index` constructor with `dtype='uint64'` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
-
-

Indexing
^^^^^^^^

- Bug in :func:`Series.truncate` which raises ``TypeError`` with a monotonic ``PeriodIndex`` (:issue:`17717`)
- Bug in :func:`DataFrame.groupby` where tuples were interpreted as lists of keys rather than as keys (:issue:`17979`, :issue:`18249`)
- Bug in :func:`MultiIndex.remove_unused_levels`` which would fill nan values (:issue:`18417`)
- Bug in :func:`MultiIndex.from_tuples`` which would fail to take zipped tuples in python3 (:issue:`18434`)
- Bug in :class:`Index`` construction from list of mixed type tuples (:issue:`18505`)
- Bug in :class:`IntervalIndex` where empty and purely NA data was constructed inconsistently depending on the construction method (:issue:`18421`)
- Bug in ``IntervalIndex.symmetric_difference()`` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in indexing a datetimelike ``Index`` that raised ``ValueError`` instead of ``IndexError`` (:issue:`18386`).


I/O
^^^

- :func:`read_html` now rewinds seekable IO objects after parse failure, before attempting to parse with a new parser. If a parser errors and the object is non-seekable, an informative error is raised suggesting the use of a different parser (:issue:`17975`)
- Bug in :func:`read_msgpack` with a non existent file is passed in Python 2 (:issue:`15296`)
- Bug in :func:`read_csv` where a ``MultiIndex`` with duplicate columns was not being mangled appropriately (:issue:`18062`)
- Bug in :func:`read_sas` where a file with 0 variables gave an ``AttributeError`` incorrectly. Now it gives an ``EmptyDataError`` (:issue:`18184`)
-
-

Plotting
^^^^^^^^

-
-
-

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^

- Bug when grouping by a single column and aggregating with a class like ``list`` or ``tuple`` (:issue:`18079`)
-
-

Sparse
^^^^^^

-
-
-

Reshaping
^^^^^^^^^

- Bug in :func:`DataFrame.stack` which fails trying to sort mixed type levels under Python 3 (:issue:`18310`)

-
-

Numeric
^^^^^^^

-
-
-

Categorical
^^^^^^^^^^^

-
-
-

Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
- Fixed construction of a :class:`Series` from a ``dict`` containing ``NaN`` as key (:issue:`18480`)
- Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
-