doc/source/v0.15.0.txt

.. _whatsnew_0150:

v0.15.0 (???)
-------------

This is a major release from 0.14.1 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

- Highlights include:

  - The ``Categorical`` type was integrated as a first-class pandas type, see :ref:`here <whatsnew_0150.cat>`
  - Internal refactoring of the ``Index`` class to no longer sub-class ``ndarray``, see :ref:`Internal Refactoring <whatsnew_0150.refactoring>`
  - New datetimelike properties accessor ``.dt`` for Series, see :ref:`Dateimelike Properties <whatsnew_0150.dt>`

- :ref:`Other Enhancements <whatsnew_0150.enhancements>`

- :ref:`API Changes <whatsnew_0150.api>`

- :ref:`Performance Improvements <whatsnew_0150.performance>`

- :ref:`Prior Deprecations <whatsnew_0150.prior_deprecations>`

- :ref:`Deprecations <whatsnew_0150.deprecations>`

- :ref:`Known Issues <whatsnew_0150.knownissues>`

- :ref:`Bug Fixes <whatsnew_0150.bug_fixes>`

.. warning::

   In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray``
   but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This change allows very easy sub-classing and creation of new index types. This should be
   a transparent change with only very limited API implications (See the :ref:`Internal Refactoring <whatsnew_0150.refactoring>`)

.. _whatsnew_0150.api:

API changes
~~~~~~~~~~~

- Passing multiple levels to `DataFrame.stack()` will now work when multiple level
  numbers are passed (:issue:`7660`), and will raise a ``ValueError`` when the
  levels aren't all level names or all level numbers. See
  :ref:`Reshaping by stacking and unstacking <reshaping.stack_multiple>`.

- :func:`set_names`, :func:`set_labels`, and :func:`set_levels` methods now take an optional ``level`` keyword argument to all modification of specific level(s) of a MultiIndex. Additionally :func:`set_names` now accepts a scalar string value when operating on an ``Index`` or on a specific level of a ``MultiIndex`` (:issue:`7792`)

  .. ipython:: python

      idx = pandas.MultiIndex.from_product([['a'], range(3), list("pqr")], names=['foo', 'bar', 'baz'])
      idx.set_names('qux', level=0)
      idx.set_names(['qux','baz'], level=[0,1])
      idx.set_levels(['a','b','c'], level='bar')
      idx.set_levels([['a','b','c'],[1,2,3]], level=[1,2])

- Raise a ``ValueError`` in ``df.to_hdf`` with 'fixed' format, if ``df`` has non-unique columns as the resulting file will be broken (:issue:`7761`)

- :func:`rolling_min`, :func:`rolling_max`, :func:`rolling_cov`, and :func:`rolling_corr`
  now return objects with all ``NaN`` when ``len(arg) < min_periods <= window`` rather
  than raising. (This makes all rolling functions consistent in this behavior), (:issue:`7766`)

  Prior to 0.15.0

  .. ipython:: python

     s = Series([10, 11, 12, 13])

  .. code-block:: python

     In [15]: rolling_min(s, window=10, min_periods=5)
     ValueError: min_periods (5) must be <= window (4)

  New behavior

  .. ipython:: python

     rolling_min(s, window=10, min_periods=5)

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
  now have an optional ``ignore_na`` argument.
  When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
  When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
  (:issue:`7543`)

  .. ipython:: python

     ewma(Series([None, 1., 100.]), com=2.5)
     ewma(Series([1., None, 100.]), com=2.5, ignore_na=True) # pre-0.15.0 behavior
     ewma(Series([1., None, 100.]), com=2.5, ignore_na=False) # default

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
  now set to ``NaN`` the first ``min_periods-1`` entries of the result (for ``min_periods>1``).
  Previously the first ``min_periods`` entries of the result were set to ``NaN``.
  The new behavior accords with the existing documentation. (:issue:`7884`)

- :func:`rolling_max`, :func:`rolling_min`, :func:`rolling_sum`, :func:`rolling_mean`, :func:`rolling_median`,
  :func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, and :func:`rolling_quantile`,
  :func:`rolling_cov`, :func:`rolling_corr`, :func:`rolling_corr_pairwise`,
  :func:`rolling_window`, and :func:`rolling_apply` with ``center=True`` previously would return a result of the same
  structure as the input ``arg`` with ``NaN``s in the final ``(window-1)/2`` entries.
  Now the final ``(window-1)/2`` entries of the result are calculated as if the input ``arg`` were followed
  by ``(window-1)/2`` ``NaN``s. (:issue:`7925`)

  Prior behavior (note final value is ``NaN``):

  .. code-block:: python

    In [7]: rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)
    Out[7]:
    0     1
    1     3
    2     6
    3     9
    4   NaN
    dtype: float64
  
  New behavior (note final value is ``7 = sum([3, 4, NaN])``):

  .. ipython:: python

    rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)

- Removed ``center`` argument from :func:`expanding_max`, :func:`expanding_min`, :func:`expanding_sum`,
  :func:`expanding_mean`, :func:`expanding_median`, :func:`expanding_std`, :func:`expanding_var`,
  :func:`expanding_skew`, :func:`expanding_kurt`, :func:`expanding_quantile`, :func:`expanding_count`,
  :func:`expanding_cov`, :func:`expanding_corr`, :func:`expanding_corr_pairwise`, and :func:`expanding_apply`,
  as the results produced when ``center=True`` did not make much sense. (:issue:`7925`)

- Bug in passing a ``DatetimeIndex`` with a timezone that was not being retained in DataFrame construction from a dict (:issue:`7822`)

  In prior versions this would drop the timezone.

  .. ipython:: python

        i = date_range('1/1/2011', periods=3, freq='10s', tz = 'US/Eastern')
        i
        df = DataFrame( {'a' : i } )
        df
        df.dtypes

  This behavior is unchanged.

  .. ipython:: python

        df = DataFrame( )
        df['a'] = i
        df
        df.dtypes

- ``SettingWithCopy`` raise/warnings (according to the option ``mode.chained_assignment``) will now be issued when setting a value on a sliced mixed-dtype DataFrame using chained-assignment. (:issue:`7845`, :issue:`7950`)

  .. code-block:: python

     In [1]: df = DataFrame(np.arange(0,9), columns=['count'])

     In [2]: df['group'] = 'b'

     In [3]: df.iloc[0:5]['group'] = 'a'
     /usr/local/bin/ipython:1: SettingWithCopyWarning:
     A value is trying to be set on a copy of a slice from a DataFrame.
     Try using .loc[row_indexer,col_indexer] = value instead

     See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

- The ``infer_types`` argument to :func:`~pandas.io.html.read_html` now has no
  effect (:issue:`7762`, :issue:`7032`).

- ``DataFrame.to_stata`` and ``StataWriter`` check string length for
  compatibility with limitations imposed in dta files where fixed-width
  strings must contain 244 or fewer characters.  Attempting to write Stata
  dta files with strings longer than 244 characters raises a ``ValueError``. (:issue:`7858`)

- ``Index.isin`` now supports a ``level`` argument to specify which index level
  to use for membership tests (:issue:`7892`, :issue:`7890`)

  .. code-block:: python

     In [1]: idx = pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']])

     In [2]: idx.values
     Out[2]: array([(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), (1, 'b'), (1, 'c')], dtype=object)

     In [3]: idx.isin(['a', 'c', 'e'], level=1)
     Out[3]: array([ True, False,  True,  True, False,  True], dtype=bool)

- ``tz_localize(None)`` for tz-aware ``Timestamp`` and ``DatetimeIndex`` now removes timezone holding local time,
previously results in ``Exception`` or ``TypeError`` (:issue:`7812`)

  .. ipython:: python

     ts = Timestamp('2014-08-01 09:00', tz='US/Eastern')
     ts
     ts.tz_localize(None)

     didx = DatetimeIndex(start='2014-08-01 09:00', freq='H', periods=10, tz='US/Eastern')
     didx
     didx.tz_localize(None)

- ``DataFrame.tz_localize`` and ``DataFrame.tz_convert`` now accepts an optional ``level`` argument
  for localizing a specific level of a MultiIndex (:issue:`7846`)

- ``Timestamp.__repr__`` displays ``dateutil.tz.tzoffset`` info (:issue:`7907`)
- ``merge``, ``DataFrame.merge``, and ``ordered_merge`` now return the same type
  as the ``left`` argument.  (:issue:`7737`)

.. _whatsnew_0150.dt:

.dt accessor
~~~~~~~~~~~~

``Series`` has gained an accessor to succinctly return datetime like properties for the *values* of the Series, if its a datetime/period like Series. (:issue:`7207`)
This will return a Series, indexed like the existing Series. See the :ref:`docs <basics.dt_accessors>`

.. ipython:: python

   # datetime
   s = Series(date_range('20130101 09:10:12',periods=4))
   s
   s.dt.hour
   s.dt.second
   s.dt.day

This enables nice expressions like this:

.. ipython:: python

   s[s.dt.day==2]

.. ipython:: python

   # period
   s = Series(period_range('20130101',periods=4,freq='D').asobject)
   s
   s.dt.year
   s.dt.day

.. _whatsnew_0150.refactoring:

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~

In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray``
but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This change allows very easy sub-classing and creation of new index types. This should be
a transparent change with only very limited API implications (:issue:`5080`, :issue:`7439`, :issue:`7796`)

- you may need to unpickle pandas version < 0.15.0 pickles using ``pd.read_pickle`` rather than ``pickle.load``. See :ref:`pickle docs <io.pickle>`
- when plotting with a ``PeriodIndex``. The ``matplotlib`` internal axes will now be arrays of ``Period`` rather than a ``PeriodIndex``. (this is similar to how a ``DatetimeIndex`` passes arrays of ``datetimes`` now)
- MultiIndexes will now raise similary to other pandas objects w.r.t. truth testing, See :ref:`here <gotchas.truth>` (:issue:`7897`).

.. _whatsnew_0150.cat:

Categoricals in Series/DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`~pandas.Categorical` can now be included in `Series` and `DataFrames` and gained new
methods to manipulate. Thanks to Jan Schultz for much of this API/implementation. (:issue:`3943`, :issue:`5313`, :issue:`5314`,
:issue:`7444`, :issue:`7839`, :issue:`7848`, :issue:`7864`, :issue:`7914`).

For full docs, see the :ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.

.. ipython:: python

    df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})

    # convert the raw grades to a categorical
    df["grade"] = pd.Categorical(df["raw_grade"])

    # Alternative: df["grade"] = df["raw_grade"].astype("category")
    df["grade"]

    # Rename the levels
    df["grade"].cat.levels = ["very good", "good", "very bad"]

    # Reorder the levels and simultaneously add the missing levels
    df["grade"].cat.reorder_levels(["very bad", "bad", "medium", "good", "very good"])
    df["grade"]
    df.sort("grade")
    df.groupby("grade").size()

- ``pandas.core.group_agg`` and ``pandas.core.factor_agg`` were removed. As an alternative, construct
  a dataframe and use ``df.groupby(<group>).agg(<func>)``.

- Supplying "codes/labels and levels" to the :class:`~pandas.Categorical` constructor is deprecated and does
  not work without supplying ``compat=True``. The default mode now uses "values and levels".
  Please change your code to use the :meth:`~pandas.Categorical.from_codes` constructor.

- The ``Categorical.labels`` attribute was renamed to ``Categorical.codes`` and is read
  only. If you want to manipulate codes, please use one of the
  :ref:`API methods on Categoricals <api.categorical>`.


.. _whatsnew_0150.prior_deprecations:

Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are no prior version deprecations that are taking effect as of 0.15.0.

.. _whatsnew_0150.deprecations:

Deprecations
~~~~~~~~~~~~

.. _whatsnew_0150.knownissues:

Known Issues
~~~~~~~~~~~~

.. _whatsnew_0150.enhancements:

Enhancements
~~~~~~~~~~~~
- Added support for bool, uint8, uint16 and uint32 datatypes in ``to_stata`` (:issue:`7097`, :issue:`7365`)


- ``PeriodIndex`` supports ``resolution`` as the same as ``DatetimeIndex`` (:issue:`7708`)
-``pandas.tseries.holiday`` has added support for additional holidays and ways to observe holidays (:issue: `7070`)
-``pandas.tseries.holiday.Holiday`` now supports a list of offsets in Python3 (:issue: `7070`)


- ``Period`` and ``PeriodIndex`` supports addition/subtraction with ``timedelta``-likes (:issue:`7966`)

  If ``Period`` freq is ``D``, ``H``, ``T``, ``S``, ``L``, ``U``, ``N``, ``timedelta``-like can be added if the result can have same freq. Otherwise, only the same ``offsets`` can be added.

  .. ipython:: python

    idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H')
    idx
    idx + pd.offsets.Hour(2)
    idx + timedelta(minutes=120)
    idx + np.timedelta64(7200, 's')

    idx = pd.period_range('2014-07', periods=5, freq='M')
    idx
    idx + pd.offsets.MonthEnd(3)


.. _whatsnew_0150.performance:

Performance
~~~~~~~~~~~

- Performance improvements in ``DatetimeIndex.__iter__`` to allow faster iteration (:issue:`7683`)
- Performance improvements in ``Period`` creation (and ``PeriodIndex`` setitem) (:issue:`5155`)
- Improvements in Series.transform for significant performance gains (revised) (:issue:`6496`)


.. _whatsnew_0150.experimental:

Experimental
~~~~~~~~~~~~

There are no experimental changes in 0.15.0

.. _whatsnew_0150.bug_fixes:

Bug Fixes
~~~~~~~~~
- Bug in Series 0-division with a float and integer operand dtypes  (:issue:`7785`)
- Bug in ``Series.astype("unicode")`` not calling ``unicode`` on the values correctly (:issue:`7758`)
- Bug in ``DataFrame.as_matrix()`` with mixed ``datetime64[ns]`` and ``timedelta64[ns]`` dtypes (:issue:`7778`)
- Bug in ``HDFStore.select_column()`` not preserving UTC timezone info when selecting a DatetimeIndex (:issue:`7777`)
- Bug in ``to_datetime`` when ``format='%Y%m%d'`` and ``coerce=True`` are specified, where previously an object array was returned (rather than
  a coerced time-series with ``NaT``), (:issue:`7930`)
- Bug in ``DatetimeIndex`` and ``PeriodIndex`` in-place addition and subtraction cause different result from normal one (:issue:`6527`)
- Bug in adding and subtracting ``PeriodIndex`` with ``PeriodIndex`` raise ``TypeError`` (:issue:`7741`)
- Bug in ``combine_first`` with ``PeriodIndex`` data raises ``TypeError`` (:issue:`3367`)
- Bug in multi-index slicing with missing indexers (:issue:`7866`)
- Regression in multi-index indexing with a non-scalar type object (:issue:`7914`)

- Bug in pickles contains ``DateOffset`` may raise ``AttributeError`` when ``normalize`` attribute is reffered internally (:issue:`7748`)

- Bug in pickle deserialization that failed for pre-0.14.1 containers with dup items trying to avoid ambiguity
  when matching block and manager items, when there's only one block there's no ambiguity (:issue:`7794`)


- Bug in repeated timeseries line and area plot may result in ``ValueError`` or incorrect kind (:issue:`7733`)


- Bug in ``offsets.apply``, ``rollforward`` and ``rollback`` may reset nanosecond (:issue:`7697`)
- Bug in ``offsets.apply``, ``rollforward`` and ``rollback`` may raise ``AttributeError`` if ``Timestamp`` has ``dateutil`` tzinfo (:issue:`7697`)


- Bug in ``is_superperiod`` and ``is_subperiod`` cannot handle higher frequencies than ``S`` (:issue:`7760`, :issue:`7772`, :issue:`7803`)

- Bug in ``PeriodIndex.unique`` returns int64 ``np.ndarray`` (:issue:`7540`)

- Bug in ``DataFrame.reset_index`` which has ``MultiIndex`` contains ``PeriodIndex`` or ``DatetimeIndex`` with tz raises ``ValueError`` (:issue:`7746`, :issue:`7793`)


- Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`)
- Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`)
- Bug in ``StataReader`` where strings were always converted to 244 characters-fixed width irrespective of underlying string size (:issue:`7858`)
- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_cov``, ``ewmcov``, and ``ewmcorr``
  returning results with columns sorted by name and producing an error for non-unique columns;
  now handles non-unique columns and returns columns in original order
  (except for the case of two DataFrames with ``pairwise=False``, where behavior is unchanged) (:issue:`7542`)


- Bug in ``DataFrame.plot`` and ``Series.plot`` may ignore ``rot`` and ``fontsize`` keywords (:issue:`7844`)


- Bug in ``DatetimeIndex.value_counts`` doesn't preserve tz  (:issue:`7735`)
- Bug in ``PeriodIndex.value_counts`` results in ``Int64Index`` (:issue:`7735`)


- Bug in ``GroupBy.transform()`` where int groups with a transform that
  didn't preserve the index were incorrectly truncated (:issue:`7972`).

- Bug in ``groupby`` where callable objects without name attributes would take the wrong path,
  and produce a ``DataFrame`` instead of a ``Series`` (:issue:`7929`)


- Bug in ``read_html`` where the ``infer_types`` argument forced coercion of
  date-likes incorrectly (:issue:`7762`, :issue:`7032`).


- Bug in ``Series.str.cat`` with an index which was filtered as to not include the first item (:issue:`7857`)


- Bug in ``Timestamp`` cannot parse ``nanosecond`` from string (:issue:`7878`)
- Bug in ``Timestamp`` with string offset and ``tz`` results incorrect (:issue:`7833`)

- Bug in ``tslib.tz_convert`` and ``tslib.tz_convert_single`` may return different results (:issue:`7798`)
- Bug in ``DatetimeIndex.intersection`` of non-overlapping timestamps with tz raises ``IndexError`` (:issue:`7880`)


- Bug in ``GroupBy.filter()`` where fast path vs. slow path made the filter
  return a non scalar value that appeared valid but wasn't (:issue:`7870`).
- Bug in ``date_range()``/``DatetimeIndex()`` when the timezone was inferred from input dates yet incorrect
  times were returned when crossing DST boundaries (:issue:`7835`, :issue:`7901`).


- ``Period`` and ``PeriodIndex`` addition/subtraction with ``np.timedelta64`` results in incorrect internal representations (:issue:`7740`)