doc/source/whatsnew/v0.18.0.txt

.. _whatsnew_0180:

v0.18.0 (January ??, 2016)
--------------------------

This is a major release from 0.17.1 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

.. warning::

   pandas >= 0.18.0 will no longer support compatibility with Python version 2.6 (:issue:`7718`)

.. warning::

   pandas >= 0.18.0 will no longer support compatibility with Python version 3.3 (:issue:`11273`)

Highlights include:

- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
- ``pd.test()`` top-level nose test runner is available (:issue:`4327`)
- Adding support for a ``RangeIndex`` as a specialized form of the ``Int64Index`` for memory savings, see :ref:`here <whatsnew_0180.enhancements.rangeindex>`.

Check the :ref:`API Changes <whatsnew_0180.api>` and :ref:`deprecations <whatsnew_0180.deprecations>` before updating.

.. contents:: What's new in v0.18.0
    :local:
    :backlinks: none

.. _whatsnew_0180.enhancements:

New features
~~~~~~~~~~~~

.. _whatsnew_0180.enhancements.moments:

Window functions are now methods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Window functions have been refactored to be methods on ``Series/DataFrame`` objects, rather than top-level functions, which are now deprecated. This allows these window-type functions, to have a similar API to that of ``.groupby``. See the full documentation :ref:`here <stats.moments>` (:issue:`11603`)

.. ipython:: python

   np.random.seed(1234)
   df = DataFrame({'A' : range(10), 'B' : np.random.randn(10)})
   df

Previous Behavior:

.. code-block:: python

   In [8]: pd.rolling_mean(df,window=3)
           FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
                          DataFrame.rolling(window=3,center=False).mean()
   Out[8]:
       A         B
   0 NaN       NaN
   1 NaN       NaN
   2   1  0.237722
   3   2 -0.023640
   4   3  0.133155
   5   4 -0.048693
   6   5  0.342054
   7   6  0.370076
   8   7  0.079587
   9   8 -0.954504

New Behavior:

.. ipython:: python

   r = df.rolling(window=3)

These show a descriptive repr

.. ipython:: python

   r
with tab-completion of available methods and properties.

.. code-block:: python

   In [9]: r.
   r.A           r.agg         r.apply       r.count       r.exclusions  r.max         r.median      r.name        r.skew        r.sum
   r.B           r.aggregate   r.corr        r.cov         r.kurt        r.mean        r.min         r.quantile    r.std         r.var

The methods operate on the ``Rolling`` object itself

.. ipython:: python

   r.mean()

They provide getitem accessors

.. ipython:: python

   r['A'].mean()

And multiple aggregations

.. ipython:: python

   r.agg({'A' : ['mean','std'],
          'B' : ['mean','std']})

.. _whatsnew_0180.enhancements.rangeindex:

Range Index
^^^^^^^^^^^

A ``RangeIndex`` has been added to the ``Int64Index`` sub-classes to support a memory saving alternative for common use cases. This has a similar implementation to the python ``range`` object (``xrange`` in python 2), in that it only stores the start, stop, and step values for the index. It will transparently interact with the user API, converting to ``Int64Index`` if needed.

This will now be the default constructed index for ``NDFrame`` objects, rather than previous an ``Int64Index``. (:issue:`939`)

Previous Behavior:

.. code-block:: python

   In [3]: s = Series(range(1000))

   In [4]: s.index
   Out[4]:
   Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
               ...
               990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=1000)

   In [6]: s.index.nbytes
   Out[6]: 8000


New Behavior:

.. ipython:: python

   s = Series(range(1000))
   s.index
   s.index.nbytes

.. _whatsnew_0180.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- Handle truncated floats in SAS xport files (:issue:`11713`)
- Added option to hide index in ``Series.to_string`` (:issue:`11729`)
- ``read_excel`` now supports s3 urls of the format ``s3://bucketname/filename`` (:issue:`11447`)
- A simple version of ``Panel.round()`` is now implemented (:issue:`11763`)
- For Python 3.x, ``round(DataFrame)``, ``round(Series)``, ``round(Panel)`` will work (:issue:`11763`)
- ``DataFrame`` has gained a ``_repr_latex_`` method in order to allow for automatic conversion to latex in a ipython/jupyter notebook using nbconvert. Options ``display.latex.escape`` and ``display.latex.longtable`` have been added to the configuration and are used automatically by the ``to_latex`` method. (:issue:`11778`)
- ``sys.getsizeof(obj)`` returns the memory usage of a pandas object, including the
  values it contains (:issue:`11597`)
- ``Series`` gained an ``is_unique`` attribute (:issue:`11946`)
- ``DataFrame.quantile`` and ``Series.quantile`` now accept ``interpolation`` keyword (:issue:`10174`).
- ``DataFrame.select_dtypes`` now allows the ``np.float16`` typecode (:issue:`11990`)

.. _whatsnew_0180.enhancements.rounding:

Datetimelike rounding
^^^^^^^^^^^^^^^^^^^^^

``DatetimeIndex``, ``Timestamp``, ``TimedeltaIndex``, ``Timedelta`` have gained the ``.round()``, ``.floor()`` and ``.ceil()`` method for datetimelike rounding, flooring and ceiling. (:issue:`4314`, :issue:`11963`)

Naive datetimes

.. ipython:: python

   dr = pd.date_range('20130101 09:12:56.1234', periods=3)
   dr
   dr.round('s')

   # Timestamp scalar
   dr[0]
   dr[0].round('10s')

Tz-aware are rounded, floored and ceiled in local times

.. ipython:: python

   dr = dr.tz_localize('US/Eastern')
   dr
   dr.round('s')

Timedeltas

.. ipython:: python

   t = timedelta_range('1 days 2 hr 13 min 45 us',periods=3,freq='d')
   t
   t.round('10min')

   # Timedelta scalar
   t[0]
   t[0].round('2h')


In addition, ``.round()``, ``.floor()`` and ``.ceil()`` will be available thru the ``.dt`` accessor of ``Series``.

.. ipython:: python

   s = Series(dr)
   s
   s.dt.round('D')

.. _whatsnew_0180.api:

.. _whatsnew_0180.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- the leading whitespaces have been removed from the output of ``.to_string(index=False)`` method (:issue:`11833`)
- the ``out`` parameter has been removed from the ``Series.round()`` method. (:issue:`11763`)
- ``DataFrame.round()`` leaves non-numeric columns unchanged in its return, rather than raises. (:issue:`11885`)
- ``DataFrame.head(0)`` and ``DataFrame.tail(0)`` return empty frames, rather than ``self``.  (:issue:`11937`)
- ``Series.head(0)`` and ``Series.tail(0)`` return empty series, rather than ``self``.  (:issue:`11937`)

NaT and Timedelta operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``NaT`` and ``Timedelta`` have expanded arithmetic operations, which are extended to ``Series``
arithmetic where applicable.  Operations defined for ``datetime64[ns]`` or ``timedelta64[ns]``
are now also defined for ``NaT`` (:issue:`11564`).

``NaT`` now supports arithmetic operations with integers and floats.

.. ipython:: python

   pd.NaT * 1
   pd.NaT * 1.5
   pd.NaT / 2
   pd.NaT * np.nan

``NaT`` defines more arithmetic operations with ``datetime64[ns]`` and ``timedelta64[ns]``.

.. ipython:: python

   pd.NaT / pd.NaT
   pd.Timedelta('1s') / pd.NaT

``NaT`` may represent either a ``datetime64[ns]`` null or a ``timedelta64[ns]`` null.
Given the ambiguity, it is treated as a ``timedelta64[ns]``, which allows more operations
to succeed.

.. ipython:: python

   pd.NaT + pd.NaT

   # same as
   pd.Timedelta('1s') + pd.Timedelta('1s')

as opposed to

.. code-block:: python

   In [3]: pd.Timestamp('19900315') + pd.Timestamp('19900315')
   TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'

However, when wrapped in a ``Series`` whose ``dtype`` is ``datetime64[ns]`` or ``timedelta64[ns]``,
the ``dtype`` information is respected.

.. code-block:: python

   In [1]: pd.Series([pd.NaT], dtype='<M8[ns]') + pd.Series([pd.NaT], dtype='<M8[ns]')
   TypeError: can only operate on a datetimes for subtraction,
              but the operator [__add__] was passed

.. ipython:: python

   pd.Series([pd.NaT], dtype='<m8[ns]') + pd.Series([pd.NaT], dtype='<m8[ns]')

``Timedelta`` division by ``floats`` now works.

.. ipython:: python

   pd.Timedelta('1s') / 2.0

Subtraction by ``Timedelta`` in a ``Series`` by a ``Timestamp`` works (:issue:`11925`)

.. ipython:: python

   ser = pd.Series(pd.timedelta_range('1 day', periods=3))
   ser
   pd.Timestamp('2012-01-01') - ser

Bug in QuarterBegin with n=0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, the behavior of the QuarterBegin offset was inconsistent
depending on the date when the ``n`` parameter was 0. (:issue:`11406`)

The general semantics of anchored offsets for ``n=0`` is to not move the date
when it is an anchor point (e.g., a quarter start date), and otherwise roll
forward to the next anchor point.

.. ipython:: python

   d = pd.Timestamp('2014-02-01')
   d
   d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
   d + pd.offsets.QuarterBegin(n=0, startingMonth=1)

For the ``QuarterBegin`` offset in previous versions, the date would be rolled
*backwards* if date was in the same month as the quarter start date.

.. code-block:: python

   In [3]: d = pd.Timestamp('2014-02-15')

   In [4]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
   Out[4]: Timestamp('2014-02-01 00:00:00')

This behavior has been corrected in version 0.18.0, which is consistent with
other anchored offsets like ``MonthBegin`` and ``YearBegin``.

.. ipython:: python

   d = pd.Timestamp('2014-02-15')
   d + pd.offsets.QuarterBegin(n=0, startingMonth=2)


Other API Changes
^^^^^^^^^^^^^^^^^

- ``DataFrame.between_time`` and ``Series.between_time`` now only parse a fixed set of time strings. Parsing of date strings is no longer supported and raises a ``ValueError``. (:issue:`11818`)

  .. ipython:: python

     s = pd.Series(range(10), pd.date_range('2015-01-01', freq='H', periods=10))
     s.between_time("7:00am", "9:00am")

  This will now raise.

  .. code-block:: python

     In [2]: s.between_time('20150101 07:00:00','20150101 09:00:00')
     ValueError: Cannot convert arg ['20150101 07:00:00'] to a time.

- ``.memory_usage`` now includes values in the index, as does memory_usage in ``.info`` (:issue:`11597`)

- ``DataFrame.to_latex()`` now supports non-ascii encodings (eg utf-8) in Python 2 with the parameter ``encoding`` (:issue:`7061`)


Changes to eval
^^^^^^^^^^^^^^^

In prior versions, new columns assignments in an ``eval`` expression resulted
in an inplace change to the ``DataFrame``. (:issue:`9297`)

.. ipython:: python

   df = pd.DataFrame({'a': np.linspace(0, 10, 5), 'b': range(5)})
   df.eval('c = a + b')
   df

In version 0.18.0, a new ``inplace`` keyword was added to choose whether the
assignment should be done inplace or return a copy.

.. ipython:: python

   df
   df.eval('d = c - b', inplace=False)
   df
   df.eval('d = c - b', inplace=True)
   df

.. warning::

   For backwards compatability, ``inplace`` defaults to ``True`` if not specified.
   This will change in a future version of pandas. If your code depends on an
   inplace assignment you should update to explicitly set ``inplace=True``

The ``inplace`` keyword parameter was also added the ``query`` method.

.. ipython:: python

   df.query('a > 5')
   df.query('a > 5', inplace=True)
   df

.. warning::

   Note that the default value for ``inplace`` in a ``query``
   is ``False``, which is consistent with prior versions.

``eval`` has also been updated to allow multi-line expressions for multiple
assignments.  These expressions will be evaluated one at a time in order.  Only
assignments are valid for multi-line expressions.

.. ipython:: python

   df
   df.eval("""
   e = d + a
   f = e - 22
   g = f / 2.0""", inplace=True)
   df

.. _whatsnew_0180.deprecations:

Deprecations
^^^^^^^^^^^^

.. _whatsnew_0180.window_deprecations:

- The functions ``pd.rolling_*``, ``pd.expanding_*``, and ``pd.ewm*`` are deprecated and replaced by the corresponding method call. Note that
  the new suggested syntax includes all of the arguments (even if default) (:issue:`11603`)

  .. code-block:: python

     In [1]: s = Series(range(3))

     In [2]: pd.rolling_mean(s,window=2,min_periods=1)
             FutureWarning: pd.rolling_mean is deprecated for Series and
                  will be removed in a future version, replace with
                  Series.rolling(min_periods=1,window=2,center=False).mean()
     Out[2]:
             0    0.0
             1    0.5
             2    1.5
             dtype: float64

     In [3]: pd.rolling_cov(s, s, window=2)
             FutureWarning: pd.rolling_cov is deprecated for Series and
                  will be removed in a future version, replace with
                  Series.rolling(window=2).cov(other=<Series>)
     Out[3]:
             0    NaN
             1    0.5
             2    0.5
             dtype: float64

- The the ``freq`` and ``how`` arguments to the ``.rolling``, ``.expanding``, and ``.ewm`` (new) functions are deprecated, and will be removed in a future version. You can simply resample the input prior to creating a window function. (:issue:`11603`).

  For example, instead of ``s.rolling(window=5,freq='D').max()`` to get the max value on a rolling 5 Day window, one could use ``s.resample('D',how='max').rolling(window=5).max()``, which first resamples the data to daily data, then provides a rolling 5 day window.

- ``pd.tseries.frequencies.get_offset_name`` function is deprecated. Use offset's ``.freqstr`` property as alternative (:issue:`11192`)
- ``pandas.stats.fama_macbeth`` routines are deprecated and will be removed in a future version (:issue:`6077`)
- ``pandas.stats.ols``, ``pandas.stats.plm`` and ``pandas.stats.var`` routines are deprecated and will be removed in a future version (:issue:`6077`)
- show a ``FutureWarning`` rather than a ``DeprecationWarning`` on using long-time deprecated syntax in ``HDFStore.select``, where ``where`` clause is not a string-like (:issue:`12027`)

.. _whatsnew_0180.prior_deprecations:

Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Removal of ``rolling_corr_pairwise`` in favor of ``.rolling().corr(pairwise=True)`` (:issue:`4950`)
- Removal of ``expanding_corr_pairwise`` in favor of ``.expanding().corr(pairwise=True)`` (:issue:`4950`)


.. _whatsnew_0180.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved performance of ``andrews_curves`` (:issue:`11534`)
- Improved huge ``DatetimeIndex``, ``PeriodIndex`` and ``TimedeltaIndex``'s ops performance including ``NaT`` (:issue:`10277`)
- Improved performance of ``pandas.concat`` (:issue:`11958`)
- Improved performance of ``StataReader`` (:issue:`11591`)


.. _whatsnew_0180.bug_fixes:

Bug Fixes
~~~~~~~~~

- Bug in ``GroupBy.size`` when data-frame is empty. (:issue:`11699`)
- Bug in ``Period.end_time`` when a multiple of time period is requested (:issue:`11738`)
- Regression in ``.clip`` with tz-aware datetimes (:issue:`11838`)
- Bug in ``date_range`` when the boundaries fell on the frequency (:issue:`11804`)
- Bug in consistency of passing nested dicts to ``.groupby(...).agg(...)`` (:issue:`9052`)
- Accept unicode in ``Timedelta`` constructor (:issue:`11995`)
- Bug in value label reading for ``StataReader`` when reading incrementally (:issue:`12014`)
- Bug in vectorized ``DateOffset`` when ``n`` parameter is ``0`` (:issue:`11370`)


- Bug in ``Timedelta.round`` with negative values (:issue:`11690`)
- Bug in ``.loc`` against ``CategoricalIndex`` may result in normal ``Index`` (:issue:`11586`)
- Bug in ``DataFrame.info`` when duplicated column names exist (:issue:`11761`)
- Bug in ``.copy`` of datetime tz-aware objects (:issue:`11794`)
- Bug in ``Series.apply`` and ``Series.map`` where ``timedelta64`` was not boxed (:issue:`11349`)


- Bug in subclasses of ``DataFrame`` where ``AttributeError`` did not propagate (:issue:`11808`)
- Bug groupby on tz-aware data where selection not returning ``Timestamp`` (:issue:`11616`)
- Bug in ``pd.read_clipboard`` and ``pd.to_clipboard`` functions not supporting Unicode; upgrade included ``pyperclip`` to v1.5.15 (:issue:`9263`)
- Bug in ``DataFrame.query`` containing an assignment (:issue:`8664`)

- Bug in ``from_msgpack`` where ``__contains__()`` fails for columns of the unpacked ``DataFrame``, if the ``DataFrame`` has object columns. (:issue: `11880`)


- Bug in timezone info lost when broadcasting scalar datetime to ``DataFrame`` (:issue:`11682`)
- Bug in ``Index`` creation from ``Timestamp`` with mixed tz coerces to UTC (:issue:`11488`)
- Bug in ``to_numeric`` where it does not raise if input is more than one dimension (:issue:`11776`)
- Bug in parsing timezone offset strings with non-zero minutes (:issue:`11708`)
- Bug in ``df.plot`` using incorrect colors for bar plots under matplotlib 1.5+ (:issue:`11614`)
- Bug in the ``groupby`` ``plot`` method when using keyword arguments (:issue:`11805`).
- Bug in ``DataFrame.duplicated`` and ``drop_duplicates`` causing spurious matches when setting ``keep=False`` (:issue:`11864`)
- Bug in ``.loc`` result with duplicated key may have ``Index`` with incorrect dtype (:issue:`11497`)
- Bug in ``pd.rolling_median`` where memory allocation failed even with sufficient memory (:issue:`11696`)
- Bug in ``.style.bar`` may not rendered properly using specific browser (:issue:`11678`)
- Bug in rich comparison of ``Timedelta`` with a ``numpy.array`` of ``Timedelta`` that caused an infinite recursion (:issue:`11835`)
- Bug in ``DataFrame.round`` dropping column index name (:issue:`11986`)
- Bug in ``df.replace`` while replacing value in mixed dtype ``Dataframe`` (:issue:`11698`)
- Bug in ``Index`` prevents copying name of passed ``Index``, when a new name is not provided (:issue:`11193`)
- Bug in ``read_excel`` failing to read any non-empty sheets when empty sheets exist and ``sheetname=None`` (:issue:`11711`)
- Bug in ``read_excel`` failing to raise ``NotImplemented`` error when keywords ``parse_dates`` and ``date_parser`` are provided (:issue:`11544`)
- Bug in ``read_sql`` with pymysql connections failing to return chunked data (:issue:`11522`)
- Bug in ``.to_csv`` ignoring formatting parameters ``decimal``, ``na_rep``, ``float_format`` for float indexes (:issue:`11553`)
- Bug in ``Int64Index`` and ``Float64Index`` preventing the use of the modulo operator (:issue:`9244`)


- Bug in ``DataFrame`` when masking an empty ``DataFrame`` (:issue:`11859`)


- Bug in ``.plot`` potentially modifying the ``colors`` input when the number
of columns didn't match the number of series provided (:issue:`12039`).


- Removed ``millisecond`` property of ``DatetimeIndex``. This would always raise a ``ValueError`` (:issue:`12019`).
- Bug in ``Series`` constructor with read-only data (:issue:`11502`)