doc/source/whatsnew/v0.18.0.txt

.. _whatsnew_0180:

v0.18.0 (February ??, 2016)
---------------------------

This is a major release from 0.17.1 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

.. warning::

   pandas >= 0.18.0 will no longer support compatibility with Python version 2.6 (:issue:`7718`)

.. warning::

   pandas >= 0.18.0 will no longer support compatibility with Python version 3.3 (:issue:`11273`)

Highlights include:

- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
- ``pd.test()`` top-level nose test runner is available (:issue:`4327`)
- Adding support for a ``RangeIndex`` as a specialized form of the ``Int64Index`` for memory savings, see :ref:`here <whatsnew_0180.enhancements.rangeindex>`.
- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.breaking.resample>`.

Check the :ref:`API Changes <whatsnew_0180.api_breaking>` and :ref:`deprecations <whatsnew_0180.deprecations>` before updating.

.. contents:: What's new in v0.18.0
    :local:
    :backlinks: none

.. _whatsnew_0180.enhancements:

New features
~~~~~~~~~~~~

.. _whatsnew_0180.enhancements.moments:

Window functions are now methods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Window functions have been refactored to be methods on ``Series/DataFrame`` objects, rather than top-level functions, which are now deprecated. This allows these window-type functions, to have a similar API to that of ``.groupby``. See the full documentation :ref:`here <stats.moments>` (:issue:`11603`)

.. ipython:: python

   np.random.seed(1234)
   df = DataFrame({'A' : range(10), 'B' : np.random.randn(10)})
   df

Previous Behavior:

.. code-block:: python

   In [8]: pd.rolling_mean(df,window=3)
           FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
                          DataFrame.rolling(window=3,center=False).mean()
   Out[8]:
       A         B
   0 NaN       NaN
   1 NaN       NaN
   2   1  0.237722
   3   2 -0.023640
   4   3  0.133155
   5   4 -0.048693
   6   5  0.342054
   7   6  0.370076
   8   7  0.079587
   9   8 -0.954504

New Behavior:

.. ipython:: python

   r = df.rolling(window=3)

These show a descriptive repr

.. ipython:: python

   r
with tab-completion of available methods and properties.

.. code-block:: python

   In [9]: r.
   r.A           r.agg         r.apply       r.count       r.exclusions  r.max         r.median      r.name        r.skew        r.sum
   r.B           r.aggregate   r.corr        r.cov         r.kurt        r.mean        r.min         r.quantile    r.std         r.var

The methods operate on the ``Rolling`` object itself

.. ipython:: python

   r.mean()

They provide getitem accessors

.. ipython:: python

   r['A'].mean()

And multiple aggregations

.. ipython:: python

   r.agg({'A' : ['mean','std'],
          'B' : ['mean','std']})

.. _whatsnew_0180.enhancements.rename:

Changes to rename
^^^^^^^^^^^^^^^^^

``Series.rename`` and ``NDFrame.rename_axis`` can now take a scalar or list-like
argument for altering the Series or axis *name*, in addition to their old behaviors of altering labels. (:issue:`9494`, :issue:`11965`)

.. ipython:: python

   s = pd.Series(np.random.randn(5))
   s.rename('newname')

.. ipython:: python

   df = pd.DataFrame(np.random.randn(5, 2))
   (df.rename_axis("indexname")
      .rename_axis("columns_name", axis="columns"))

The new functionality works well in method chains. Previously these methods only accepted functions or dicts mapping a *label* to a new label.
This continues to work as before for function or dict-like values.


.. _whatsnew_0180.enhancements.rangeindex:

Range Index
^^^^^^^^^^^

A ``RangeIndex`` has been added to the ``Int64Index`` sub-classes to support a memory saving alternative for common use cases. This has a similar implementation to the python ``range`` object (``xrange`` in python 2), in that it only stores the start, stop, and step values for the index. It will transparently interact with the user API, converting to ``Int64Index`` if needed.

This will now be the default constructed index for ``NDFrame`` objects, rather than previous an ``Int64Index``. (:issue:`939`, :issue:`12070`, :issue:`12071`, :issue:`12109`)

Previous Behavior:

.. code-block:: python

   In [3]: s = Series(range(1000))

   In [4]: s.index
   Out[4]:
   Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
               ...
               990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=1000)

   In [6]: s.index.nbytes
   Out[6]: 8000


New Behavior:

.. ipython:: python

   s = Series(range(1000))
   s.index
   s.index.nbytes

.. _whatsnew_0180.enhancements.extract:

Changes to str.extract
^^^^^^^^^^^^^^^^^^^^^^

The :ref:`.str.extract <text.extract>` method takes a regular
expression with capture groups, finds the first match in each subject
string, and returns the contents of the capture groups
(:issue:`11386`).

In v0.18.0, the ``expand`` argument was added to
``extract``.

- ``expand=False``: it returns a ``Series``, ``Index``, or ``DataFrame``, depending on the subject and regular expression pattern (same behavior as pre-0.18.0).
- ``expand=True``: it always returns a ``DataFrame``, which is more consistent and less confusing from the perspective of a user.

Currently the default is ``expand=None`` which gives a ``FutureWarning`` and uses ``expand=False``. To avoid this warning, please explicitly specify ``expand``.

.. ipython:: python

   pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)', expand=None)

Extracting a regular expression with one group returns a Series if
``expand=False``.

.. ipython:: python

   pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)', expand=False)

It returns a ``DataFrame`` with one column if ``expand=True``.

.. ipython:: python

   pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)', expand=True)

Calling on an ``Index`` with a regex with exactly one capture group
returns  an ``Index`` if ``expand=False``.

.. ipython:: python

   s = pd.Series(["a1", "b2", "c3"], ["A11", "B22", "C33"])
   s
   s.index.str.extract("(?P<letter>[a-zA-Z])", expand=False)

It returns a ``DataFrame`` with one column if ``expand=True``.

.. ipython:: python

   s.index.str.extract("(?P<letter>[a-zA-Z])", expand=True)

Calling on an ``Index`` with a regex with more than one capture group
raises ``ValueError`` if ``expand=False``.

.. code-block:: python

    >>> s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=False)
    ValueError: only one regex group is supported with Index

It returns a ``DataFrame`` if ``expand=True``.

.. ipython:: python

   s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=True)

In summary, ``extract(expand=True)`` always returns a ``DataFrame``
with a row for every subject string, and a column for every capture
group.

Addition of str.extractall
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _whatsnew_0180.enhancements.extractall:

The :ref:`.str.extractall <text.extractall>` method was added
(:issue:`11386`).  Unlike ``extract`` (which returns only the first
match),

.. ipython:: python

   s = pd.Series(["a1a2", "b1", "c1"], ["A", "B", "C"])
   s
   s.str.extract("(?P<letter>[ab])(?P<digit>\d)", expand=False)

the ``extractall`` method returns all matches.

.. ipython:: python

   s.str.extractall("(?P<letter>[ab])(?P<digit>\d)")

.. _whatsnew_0180.enhancements.rounding:

Datetimelike rounding
^^^^^^^^^^^^^^^^^^^^^

``DatetimeIndex``, ``Timestamp``, ``TimedeltaIndex``, ``Timedelta`` have gained the ``.round()``, ``.floor()`` and ``.ceil()`` method for datetimelike rounding, flooring and ceiling. (:issue:`4314`, :issue:`11963`)

Naive datetimes

.. ipython:: python

   dr = pd.date_range('20130101 09:12:56.1234', periods=3)
   dr
   dr.round('s')

   # Timestamp scalar
   dr[0]
   dr[0].round('10s')

Tz-aware are rounded, floored and ceiled in local times

.. ipython:: python

   dr = dr.tz_localize('US/Eastern')
   dr
   dr.round('s')

Timedeltas

.. ipython:: python

   t = timedelta_range('1 days 2 hr 13 min 45 us',periods=3,freq='d')
   t
   t.round('10min')

   # Timedelta scalar
   t[0]
   t[0].round('2h')


In addition, ``.round()``, ``.floor()`` and ``.ceil()`` will be available thru the ``.dt`` accessor of ``Series``.

.. ipython:: python

   s = Series(dr)
   s
   s.dt.round('D')

Formatting of integer in FloatIndex
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Integers in ``FloatIndex``, e.g. 1., are now formatted with a decimal point
and a ``0`` digit, e.g. ``1.0`` (:issue:`11713`)

This change affects the display in jupyter, but also the output of IO methods
like ``.to_csv`` or ``.to_html``

Previous Behavior:

.. code-block:: python

   In [2]: s = Series([1,2,3], index=np.arange(3.))

   In [3]: s
   Out[3]:
   0    1
   1    2
   2    3
   dtype: int64

   In [4]: s.index
   Out[4]: Float64Index([0.0, 1.0, 2.0], dtype='float64')

   In [5]: print(s.to_csv(path=None))
   0,1
   1,2
   2,3


New Behavior:

.. ipython:: python

   s = Series([1,2,3], index=np.arange(3.))
   s
   s.index
   print(s.to_csv(path=None))

.. _whatsnew_0180.enhancements.xarray:

to_xarray
^^^^^^^^^

In a future version of pandas, we will be deprecating ``Panel`` and other > 2 ndim objects. In order to provide for continuity,
all ``NDFrame`` objects have gained the ``.to_xarray()`` method in order to convert to ``xarray`` objects, which has
a pandas-like interface for > 2 ndim.

See the `xarray full-documentation here <http://xarray.pydata.org/en/stable/>`__.

.. code-block:: python

   In [1]: p = Panel(np.arange(2*3*4).reshape(2,3,4))

   In [2]: p.to_xarray()
   Out[2]:
   <xarray.DataArray (items: 2, major_axis: 3, minor_axis: 4)>
   array([[[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]],

          [[12, 13, 14, 15],
           [16, 17, 18, 19],
           [20, 21, 22, 23]]])
   Coordinates:
     * items       (items) int64 0 1
     * major_axis  (major_axis) int64 0 1 2
     * minor_axis  (minor_axis) int64 0 1 2 3

Latex Representation
^^^^^^^^^^^^^^^^^^^^

``DataFrame`` has gained a ``._repr_latex_()`` method in order to allow for conversion to latex in a ipython/jupyter notebook using nbconvert. (:issue:`11778`)

Note that this must be activated by setting the option ``display.latex.repr`` to ``True`` (issue:`12182`)

For example, if you have a jupyter notebook you plan to convert to latex using nbconvert, place the statement ``pd.set_option('display.latex.repr', True)`` in the first cell to have the contained DataFrame output also stored as latex.

Options ``display.latex.escape`` and ``display.latex.longtable`` have also been added to the configuration and are used automatically by the ``to_latex``
method. See the :ref:`options documentation<options>` for more info.

.. _whatsnew_0180.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- Handle truncated floats in SAS xport files (:issue:`11713`)
- Added option to hide index in ``Series.to_string`` (:issue:`11729`)
- ``read_excel`` now supports s3 urls of the format ``s3://bucketname/filename`` (:issue:`11447`)
- add support for ``AWS_S3_HOST`` env variable when reading from s3 (:issue:`12198`)
- A simple version of ``Panel.round()`` is now implemented (:issue:`11763`)
- For Python 3.x, ``round(DataFrame)``, ``round(Series)``, ``round(Panel)`` will work (:issue:`11763`)
- ``sys.getsizeof(obj)`` returns the memory usage of a pandas object, including the
  values it contains (:issue:`11597`)
- ``Series`` gained an ``is_unique`` attribute (:issue:`11946`)
- ``DataFrame.quantile`` and ``Series.quantile`` now accept ``interpolation`` keyword (:issue:`10174`).
- ``DataFrame.select_dtypes`` now allows the ``np.float16`` typecode (:issue:`11990`)
- ``pivot_table()`` now accepts most iterables for the ``values`` parameter (:issue:`12017`)
- Added Google ``BigQuery`` service account authentication support, which enables authentication on remote servers. (:issue:`11881`). For further details see :ref:`here <io.bigquery_authentication>`
- ``HDFStore`` is now iterable: ``for k in store`` is equivalent to ``for k in store.keys()`` (:issue: `12221`).

.. _whatsnew_0180.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- the leading whitespaces have been removed from the output of ``.to_string(index=False)`` method (:issue:`11833`)
- the ``out`` parameter has been removed from the ``Series.round()`` method. (:issue:`11763`)
- ``DataFrame.round()`` leaves non-numeric columns unchanged in its return, rather than raises. (:issue:`11885`)
- ``DataFrame.head(0)`` and ``DataFrame.tail(0)`` return empty frames, rather than ``self``.  (:issue:`11937`)
- ``Series.head(0)`` and ``Series.tail(0)`` return empty series, rather than ``self``.  (:issue:`11937`)
- ``to_msgpack`` and ``read_msgpack`` encoding now defaults to ``'utf-8'``. (:issue:`12170`)

NaT and Timedelta operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``NaT`` and ``Timedelta`` have expanded arithmetic operations, which are extended to ``Series``
arithmetic where applicable.  Operations defined for ``datetime64[ns]`` or ``timedelta64[ns]``
are now also defined for ``NaT`` (:issue:`11564`).

``NaT`` now supports arithmetic operations with integers and floats.

.. ipython:: python

   pd.NaT * 1
   pd.NaT * 1.5
   pd.NaT / 2
   pd.NaT * np.nan

``NaT`` defines more arithmetic operations with ``datetime64[ns]`` and ``timedelta64[ns]``.

.. ipython:: python

   pd.NaT / pd.NaT
   pd.Timedelta('1s') / pd.NaT

``NaT`` may represent either a ``datetime64[ns]`` null or a ``timedelta64[ns]`` null.
Given the ambiguity, it is treated as a ``timedelta64[ns]``, which allows more operations
to succeed.

.. ipython:: python

   pd.NaT + pd.NaT

   # same as
   pd.Timedelta('1s') + pd.Timedelta('1s')

as opposed to

.. code-block:: python

   In [3]: pd.Timestamp('19900315') + pd.Timestamp('19900315')
   TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'

However, when wrapped in a ``Series`` whose ``dtype`` is ``datetime64[ns]`` or ``timedelta64[ns]``,
the ``dtype`` information is respected.

.. code-block:: python

   In [1]: pd.Series([pd.NaT], dtype='<M8[ns]') + pd.Series([pd.NaT], dtype='<M8[ns]')
   TypeError: can only operate on a datetimes for subtraction,
              but the operator [__add__] was passed

.. ipython:: python

   pd.Series([pd.NaT], dtype='<m8[ns]') + pd.Series([pd.NaT], dtype='<m8[ns]')

``Timedelta`` division by ``floats`` now works.

.. ipython:: python

   pd.Timedelta('1s') / 2.0

Subtraction by ``Timedelta`` in a ``Series`` by a ``Timestamp`` works (:issue:`11925`)

.. ipython:: python

   ser = pd.Series(pd.timedelta_range('1 day', periods=3))
   ser
   pd.Timestamp('2012-01-01') - ser


Signature change for .rank
^^^^^^^^^^^^^^^^^^^^^^^^^^

``Series.rank`` and ``DataFrame.rank`` now have the same signature (:issue:`11759`)

Previous signature

.. code-block:: python

   In [3]: pd.Series([0,1]).rank(method='average', na_option='keep',
                                 ascending=True, pct=False)
   Out[3]:
   0    1
   1    2
   dtype: float64

   In [4]: pd.DataFrame([0,1]).rank(axis=0, numeric_only=None,
                                    method='average', na_option='keep',
                                    ascending=True, pct=False)
   Out[4]:
      0
   0  1
   1  2

New signature

.. ipython:: python

   pd.Series([0,1]).rank(axis=0, method='average', numeric_only=None,
                         na_option='keep', ascending=True, pct=False)
   pd.DataFrame([0,1]).rank(axis=0, method='average', numeric_only=None,
                            na_option='keep', ascending=True, pct=False)


Bug in QuarterBegin with n=0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, the behavior of the QuarterBegin offset was inconsistent
depending on the date when the ``n`` parameter was 0. (:issue:`11406`)

The general semantics of anchored offsets for ``n=0`` is to not move the date
when it is an anchor point (e.g., a quarter start date), and otherwise roll
forward to the next anchor point.

.. ipython:: python

   d = pd.Timestamp('2014-02-01')
   d
   d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
   d + pd.offsets.QuarterBegin(n=0, startingMonth=1)

For the ``QuarterBegin`` offset in previous versions, the date would be rolled
*backwards* if date was in the same month as the quarter start date.

.. code-block:: python

   In [3]: d = pd.Timestamp('2014-02-15')

   In [4]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
   Out[4]: Timestamp('2014-02-01 00:00:00')

This behavior has been corrected in version 0.18.0, which is consistent with
other anchored offsets like ``MonthBegin`` and ``YearBegin``.

.. ipython:: python

   d = pd.Timestamp('2014-02-15')
   d + pd.offsets.QuarterBegin(n=0, startingMonth=2)

.. _whatsnew_0180.breaking.resample:

Resample API
^^^^^^^^^^^^

Like the change in the window functions API :ref:`above <whatsnew_0180.enhancements.moments>`, ``.resample(...)`` is changing to have a more groupby-like API. (:issue:`11732`, :issue:`12702`).

.. ipython:: python

   np.random.seed(1234)
   df = pd.DataFrame(np.random.rand(10,4),
                     columns=list('ABCD'),
                     index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))
   df


**Previous API**:

You would write a resampling operation that immediately evaluates. If a ``how`` parameter was not provided, it
would default to ``how='mean'``.

.. code-block:: python

   In [6]: df.resample('2s')
   Out[6]:
                            A         B         C         D
   2010-01-01 09:00:00  0.485748  0.447351  0.357096  0.793615
   2010-01-01 09:00:02  0.820801  0.794317  0.364034  0.531096
   2010-01-01 09:00:04  0.433985  0.314582  0.424104  0.625733
   2010-01-01 09:00:06  0.624988  0.609738  0.633165  0.612452
   2010-01-01 09:00:08  0.510470  0.534317  0.573201  0.806949

You could also specify a ``how`` directly

.. code-block:: python

   In [7]: df.resample('2s',how='sum')
   Out[7]:
                            A         B         C         D
   2010-01-01 09:00:00  0.971495  0.894701  0.714192  1.587231
   2010-01-01 09:00:02  1.641602  1.588635  0.728068  1.062191
   2010-01-01 09:00:04  0.867969  0.629165  0.848208  1.251465
   2010-01-01 09:00:06  1.249976  1.219477  1.266330  1.224904
   2010-01-01 09:00:08  1.020940  1.068634  1.146402  1.613897

.. warning::

   This new API for resample includes some internal changes for the prior-to-0.18.0 API, to work with a deprecation warning in most cases, as the resample operation returns a deferred object. We can intercept operations and just do what the (pre 0.18.0) API did (with a warning). Here is a typical use case:

   .. code-block:: python

      In [4]: r = df.resample('2s')

      In [6]: r*10
      pandas/tseries/resample.py:80: FutureWarning: .resample() is now a deferred operation
      use .resample(...).mean() instead of .resample(...)

      Out[6]:
                            A         B         C         D
      2010-01-01 09:00:00  4.857476  4.473507  3.570960  7.936154
      2010-01-01 09:00:02  8.208011  7.943173  3.640340  5.310957
      2010-01-01 09:00:04  4.339846  3.145823  4.241039  6.257326
      2010-01-01 09:00:06  6.249881  6.097384  6.331650  6.124518
      2010-01-01 09:00:08  5.104699  5.343172  5.732009  8.069486

   However, getting and assignment operations directly on a ``Resampler`` will raise a ``ValueError``:

   .. code-block:: python

      In [7]: r.iloc[0] = 5
      ValueError: .resample() is now a deferred operation
            use .resample(...).mean() instead of .resample(...)
            assignment will have no effect as you are working on a copy

**New API**:

Now, you can write ``.resample`` as a 2-stage operation like groupby, which
yields a ``Resampler``.

.. ipython:: python


   r = df.resample('2s')
   r

Downsampling
''''''''''''

You can then use this object to perform operations.
These are downsampling operations (going from a lower frequency to a higher one).

.. ipython:: python

   r.mean()

.. ipython:: python

   r.sum()

Furthermore, resample now supports ``getitem`` operations to perform the resample on specific columns.

.. ipython:: python

   r[['A','C']].mean()

and ``.aggregate`` type operations.

.. ipython:: python

   r.agg({'A' : 'mean', 'B' : 'sum'})

These accessors can of course, be combined

.. ipython:: python

   r[['A','B']].agg(['mean','sum'])

Upsampling
''''''''''

.. currentmodule:: pandas.tseries.resample

Upsampling operations take you from a higher frequency to a lower frequency. These are now
performed with the ``Resampler`` objects with :meth:`~Resampler.backfill`,
:meth:`~Resampler.ffill`, :meth:`~Resampler.fillna` and :meth:`~Resampler.asfreq` methods.

.. ipython:: python

   s = Series(np.arange(5,dtype='int64'),
              index=date_range('2010-01-01', periods=5, freq='Q'))
   s

Previously

.. code-block:: python

   In [6]: s.resample('M', fill_method='ffill')
   Out[6]:
   2010-03-31    0
   2010-04-30    0
   2010-05-31    0
   2010-06-30    1
   2010-07-31    1
   2010-08-31    1
   2010-09-30    2
   2010-10-31    2
   2010-11-30    2
   2010-12-31    3
   2011-01-31    3
   2011-02-28    3
   2011-03-31    4
   Freq: M, dtype: int64

New API

.. ipython:: python

   s.resample('M').ffill()

.. note::

   In the new API, you can either downsample OR upsample. The prior implementation would allow you to pass an aggregator function (like ``mean``) even though you were upsampling, providing a bit of confusion.

Changes to eval
^^^^^^^^^^^^^^^

In prior versions, new columns assignments in an ``eval`` expression resulted
in an inplace change to the ``DataFrame``. (:issue:`9297`)

.. ipython:: python

   df = pd.DataFrame({'a': np.linspace(0, 10, 5), 'b': range(5)})
   df.eval('c = a + b')
   df

In version 0.18.0, a new ``inplace`` keyword was added to choose whether the
assignment should be done inplace or return a copy.

.. ipython:: python

   df
   df.eval('d = c - b', inplace=False)
   df
   df.eval('d = c - b', inplace=True)
   df

.. warning::

   For backwards compatability, ``inplace`` defaults to ``True`` if not specified.
   This will change in a future version of pandas. If your code depends on an
   inplace assignment you should update to explicitly set ``inplace=True``

The ``inplace`` keyword parameter was also added the ``query`` method.

.. ipython:: python

   df.query('a > 5')
   df.query('a > 5', inplace=True)
   df

.. warning::

   Note that the default value for ``inplace`` in a ``query``
   is ``False``, which is consistent with prior versions.

``eval`` has also been updated to allow multi-line expressions for multiple
assignments.  These expressions will be evaluated one at a time in order.  Only
assignments are valid for multi-line expressions.

.. ipython:: python

   df
   df.eval("""
   e = d + a
   f = e - 22
   g = f / 2.0""", inplace=True)
   df


.. _whatsnew_0180.api:

Other API Changes
^^^^^^^^^^^^^^^^^

- ``DataFrame.between_time`` and ``Series.between_time`` now only parse a fixed set of time strings. Parsing of date strings is no longer supported and raises a ``ValueError``. (:issue:`11818`)

  .. ipython:: python

     s = pd.Series(range(10), pd.date_range('2015-01-01', freq='H', periods=10))
     s.between_time("7:00am", "9:00am")

  This will now raise.

  .. code-block:: python

     In [2]: s.between_time('20150101 07:00:00','20150101 09:00:00')
     ValueError: Cannot convert arg ['20150101 07:00:00'] to a time.

- ``.memory_usage()`` now includes values in the index, as does memory_usage in ``.info`` (:issue:`11597`)

- ``DataFrame.to_latex()`` now supports non-ascii encodings (eg utf-8) in Python 2 with the parameter ``encoding`` (:issue:`7061`)

- ``pandas.merge()`` and ``DataFrame.merge()`` will show a specific error message when trying to merge with an object that is not of type ``DataFrame`` or a subclass (:issue:`12081`)

- ``DataFrame.unstack`` and ``Series.unstack`` now take ``fill_value`` keyword to allow direct replacement of missing values when an unstack results in missing values in the resulting ``DataFrame``. As an added benefit, specifying ``fill_value`` will preserve the data type of the original stacked data.  (:issue:`9746`)

- As part of the new API for :ref:`window functions <whatsnew_0180.enhancements.moments>` and :ref:`resampling <whatsnew_0180.breaking.resample>`, aggregation functions have been clarified, raising more informative error messages on invalid aggregations. (:issue:`9052`). A full set of examples are presented in :ref:`groupby <groupby.aggregation>`.

.. _whatsnew_0180.deprecations:

Deprecations
^^^^^^^^^^^^

.. _whatsnew_0180.window_deprecations:

- The functions ``pd.rolling_*``, ``pd.expanding_*``, and ``pd.ewm*`` are deprecated and replaced by the corresponding method call. Note that
  the new suggested syntax includes all of the arguments (even if default) (:issue:`11603`)

  .. code-block:: python

     In [1]: s = Series(range(3))

     In [2]: pd.rolling_mean(s,window=2,min_periods=1)
             FutureWarning: pd.rolling_mean is deprecated for Series and
                  will be removed in a future version, replace with
                  Series.rolling(min_periods=1,window=2,center=False).mean()
     Out[2]:
             0    0.0
             1    0.5
             2    1.5
             dtype: float64

     In [3]: pd.rolling_cov(s, s, window=2)
             FutureWarning: pd.rolling_cov is deprecated for Series and
                  will be removed in a future version, replace with
                  Series.rolling(window=2).cov(other=<Series>)
     Out[3]:
             0    NaN
             1    0.5
             2    0.5
             dtype: float64

- The the ``freq`` and ``how`` arguments to the ``.rolling``, ``.expanding``, and ``.ewm`` (new) functions are deprecated, and will be removed in a future version. You can simply resample the input prior to creating a window function. (:issue:`11603`).

  For example, instead of ``s.rolling(window=5,freq='D').max()`` to get the max value on a rolling 5 Day window, one could use ``s.resample('D',how='max').rolling(window=5).max()``, which first resamples the data to daily data, then provides a rolling 5 day window.

- ``pd.tseries.frequencies.get_offset_name`` function is deprecated. Use offset's ``.freqstr`` property as alternative (:issue:`11192`)
- ``pandas.stats.fama_macbeth`` routines are deprecated and will be removed in a future version (:issue:`6077`)
- ``pandas.stats.ols``, ``pandas.stats.plm`` and ``pandas.stats.var`` routines are deprecated and will be removed in a future version (:issue:`6077`)
- show a ``FutureWarning`` rather than a ``DeprecationWarning`` on using long-time deprecated syntax in ``HDFStore.select``, where the ``where`` clause is not a string-like (:issue:`12027`)

.. _whatsnew_0180.prior_deprecations:

Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Removal of ``rolling_corr_pairwise`` in favor of ``.rolling().corr(pairwise=True)`` (:issue:`4950`)
- Removal of ``expanding_corr_pairwise`` in favor of ``.expanding().corr(pairwise=True)`` (:issue:`4950`)
- Removal of ``DataMatrix`` module. This was not imported into the pandas namespace in any event (:issue:`12111`)
- Removal of ``cols`` keyword in favor of ``subset`` in ``DataFrame.duplicated()`` and ``DataFrame.drop_duplicates()`` (:issue:`6680`)
- Removal of the ``read_frame`` and ``frame_query`` (both aliases for ``pd.read_sql``)
  and ``write_frame`` (alias of ``to_sql``) functions in the ``pd.io.sql`` namespace,
  deprecated since 0.14.0 (:issue:`6292`).
- Removal of the ``order`` keyword from ``.factorize()`` (:issue:`6930`)

.. _whatsnew_0180.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved performance of ``andrews_curves`` (:issue:`11534`)
- Improved huge ``DatetimeIndex``, ``PeriodIndex`` and ``TimedeltaIndex``'s ops performance including ``NaT`` (:issue:`10277`)
- Improved performance of ``pandas.concat`` (:issue:`11958`)
- Improved performance of ``StataReader`` (:issue:`11591`)
- Improved performance in construction of ``Categoricals`` with Series of datetimes containing ``NaT`` (:issue:`12077`)


- Improved performance of ISO 8601 date parsing for dates without separators (:issue:`11899`), leading zeros (:issue:`11871`) and with whitespace preceding the time zone (:issue:`9714`)


.. _whatsnew_0180.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug in ``GroupBy.size`` when data-frame is empty. (:issue:`11699`)
- Bug in ``Period.end_time`` when a multiple of time period is requested (:issue:`11738`)
- Regression in ``.clip`` with tz-aware datetimes (:issue:`11838`)
- Bug in ``date_range`` when the boundaries fell on the frequency (:issue:`11804`)
- Bug in consistency of passing nested dicts to ``.groupby(...).agg(...)`` (:issue:`9052`)
- Accept unicode in ``Timedelta`` constructor (:issue:`11995`)
- Bug in value label reading for ``StataReader`` when reading incrementally (:issue:`12014`)
- Bug in vectorized ``DateOffset`` when ``n`` parameter is ``0`` (:issue:`11370`)
- Compat for numpy 1.11 w.r.t. ``NaT`` comparison changes (:issue:`12049`)
- Bug in ``read_csv`` when reading from a ``StringIO`` in threads (:issue:`11790`)
- Bug in not treating ``NaT`` as a missing value in datetimelikes when factorizing & with ``Categoricals`` (:issue:`12077`)
- Bug in getitem when the values of a ``Series`` were tz-aware (:issue:`12089`)
- Bug in ``Series.str.get_dummies`` when one of the variables was 'name' (:issue:`12180`)
- Bug in ``pd.concat`` while concatenating tz-aware NaT series. (:issue:`11693`, :issue:`11755`)
- Bug in ``pd.read_stata`` with version <= 108 files (:issue:`12232`)
- Bug in ``Series.resample`` using a frequency of ``Nano`` when the index is a ``DatetimeIndex`` and contains non-zero nanosecond parts (:issue:`12037`)


- Bug in ``Timedelta.round`` with negative values (:issue:`11690`)
- Bug in ``.loc`` against ``CategoricalIndex`` may result in normal ``Index`` (:issue:`11586`)
- Bug in ``DataFrame.info`` when duplicated column names exist (:issue:`11761`)
- Bug in ``.copy`` of datetime tz-aware objects (:issue:`11794`)
- Bug in ``Series.apply`` and ``Series.map`` where ``timedelta64`` was not boxed (:issue:`11349`)


- Bug in subclasses of ``DataFrame`` where ``AttributeError`` did not propagate (:issue:`11808`)
- Bug groupby on tz-aware data where selection not returning ``Timestamp`` (:issue:`11616`)
- Bug in ``pd.read_clipboard`` and ``pd.to_clipboard`` functions not supporting Unicode; upgrade included ``pyperclip`` to v1.5.15 (:issue:`9263`)
- Bug in ``DataFrame.query`` containing an assignment (:issue:`8664`)

- Bug in ``from_msgpack`` where ``__contains__()`` fails for columns of the unpacked ``DataFrame``, if the ``DataFrame`` has object columns. (:issue:`11880`)
- Bug in ``.resample`` on categorical data with ``TimedeltaIndex`` (:issue:`12169`)


- Bug in timezone info lost when broadcasting scalar datetime to ``DataFrame`` (:issue:`11682`)
- Bug in ``Index`` creation from ``Timestamp`` with mixed tz coerces to UTC (:issue:`11488`)
- Bug in ``to_numeric`` where it does not raise if input is more than one dimension (:issue:`11776`)
- Bug in parsing timezone offset strings with non-zero minutes (:issue:`11708`)
- Bug in ``df.plot`` using incorrect colors for bar plots under matplotlib 1.5+ (:issue:`11614`)
- Bug in the ``groupby`` ``plot`` method when using keyword arguments (:issue:`11805`).
- Bug in ``DataFrame.duplicated`` and ``drop_duplicates`` causing spurious matches when setting ``keep=False`` (:issue:`11864`)
- Bug in ``.loc`` result with duplicated key may have ``Index`` with incorrect dtype (:issue:`11497`)
- Bug in ``pd.rolling_median`` where memory allocation failed even with sufficient memory (:issue:`11696`)
- Bug in ``.style.bar`` may not rendered properly using specific browser (:issue:`11678`)
- Bug in rich comparison of ``Timedelta`` with a ``numpy.array`` of ``Timedelta`` that caused an infinite recursion (:issue:`11835`)
- Bug in ``DataFrame.round`` dropping column index name (:issue:`11986`)
- Bug in ``df.replace`` while replacing value in mixed dtype ``Dataframe`` (:issue:`11698`)
- Bug in ``Index`` prevents copying name of passed ``Index``, when a new name is not provided (:issue:`11193`)
- Bug in ``read_excel`` failing to read any non-empty sheets when empty sheets exist and ``sheetname=None`` (:issue:`11711`)
- Bug in ``read_excel`` failing to raise ``NotImplemented`` error when keywords ``parse_dates`` and ``date_parser`` are provided (:issue:`11544`)
- Bug in ``read_sql`` with ``pymysql`` connections failing to return chunked data (:issue:`11522`)
- Bug in ``.to_csv`` ignoring formatting parameters ``decimal``, ``na_rep``, ``float_format`` for float indexes (:issue:`11553`)
- Bug in ``Int64Index`` and ``Float64Index`` preventing the use of the modulo operator (:issue:`9244`)
- Bug in ``MultiIndex.drop`` for not lexsorted multi-indexes (:issue:`12078`)

- Bug in ``DataFrame`` when masking an empty ``DataFrame`` (:issue:`11859`)


- Bug in ``.plot`` potentially modifying the ``colors`` input when the number of columns didn't match the number of series provided (:issue:`12039`).

- Bug in ``read_excel`` failing to read data with one column when ``squeeze=True`` (:issue:`12157`)
- Bug in ``.groupby`` where a ``KeyError`` was not raised for a wrong column if there was only one row in the dataframe (:issue:`11741`)
- Bug in ``.read_csv`` with dtype specified on empty data producing an error (:issue:`12048`)
- Bug in ``.read_csv`` where strings like ``'2E'`` are treated as valid floats (:issue:`12237`)
- Bug in building *pandas* with debugging symbols (:issue:`12123`)


- Removed ``millisecond`` property of ``DatetimeIndex``. This would always raise a ``ValueError`` (:issue:`12019`).
- Bug in ``Series`` constructor with read-only data (:issue:`11502`)

- Bug in ``.loc`` setitem indexer preventing the use of a TZ-aware DatetimeIndex (:issue:`12050`)
- Bug in ``.style`` indexes and multi-indexes not appearing (:issue:`11655`)

- Bug in ``.skew`` and ``.kurt`` due to roundoff error for highly similar values (:issue:`11974`)

- Bug in ``buffer_rd_bytes`` src->buffer could be freed more than once if reading failed, causing a segfault (:issue:`12098`)
- Bug in ``crosstab`` where arguments with non-overlapping indexes would return a KeyError (:issue:`10291`)