doc/source/v0.15.0.txt

.. _whatsnew_0150:

v0.15.0 (???)
-------------

This is a major release from 0.14.1 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

- Highlights include:

  - The ``Categorical`` type was integrated as a first-class pandas type, see :ref:`here <whatsnew_0150.cat>`

- :ref:`Other Enhancements <whatsnew_0150.enhancements>`

- :ref:`API Changes <whatsnew_0150.api>`

- :ref:`Performance Improvements <whatsnew_0150.performance>`

- :ref:`Prior Deprecations <whatsnew_0150.prior_deprecations>`

- :ref:`Deprecations <whatsnew_0150.deprecations>`

- :ref:`Known Issues <whatsnew_0150.knownissues>`

- :ref:`Bug Fixes <whatsnew_0150.bug_fixes>`

.. _whatsnew_0150.api:

API changes
~~~~~~~~~~~

- Passing multiple levels to `DataFrame.stack()` will now work when multiple level
  numbers are passed (:issue:`7660`), and will raise a ``ValueError`` when the
  levels aren't all level names or all level numbers. See
  :ref:`Reshaping by stacking and unstacking <reshaping.stack_multiple>`.

- Raise a ``ValueError`` in ``df.to_hdf`` with 'fixed' format, if ``df`` has non-unique columns as the resulting file will be broken (:issue:`7761`)

- :func:`rolling_min`, :func:`rolling_max`, :func:`rolling_cov`, and :func:`rolling_corr`
  now return objects with all ``NaN`` when ``len(arg) < min_periods <= window`` rather
  than raising. (This makes all rolling functions consistent in this behavior), (:issue:`7766`)

  Prior to 0.15.0

  .. ipython:: python

     s = Series([10, 11, 12, 13])

  .. code-block:: python

     In [15]: rolling_min(s, window=10, min_periods=5)
     ValueError: min_periods (5) must be <= window (4)

  New behavior

  .. ipython:: python

     rolling_min(s, window=10, min_periods=5)

- :func:`ewma`, :func:`ewmastd`, :func:`ewmavar`, :func:`ewmacorr`, and :func:`ewmacov`
  now have an optional ``ignore_na`` argument.
  When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
  When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
  (:issue:`7543`)

  .. ipython:: python

     ewma(Series([None, 1., 100.]), com=2.5)
     ewma(Series([1., None, 100.]), com=2.5, ignore_na=True) # pre-0.15.0 behavior
     ewma(Series([1., None, 100.]), com=2.5, ignore_na=False) # default

- Bug in passing a ``DatetimeIndex`` with a timezone that was not being retained in DataFrame construction from a dict (:issue:`7822`)

  In prior versions this would drop the timezone.

  .. ipython:: python

        i = date_range('1/1/2011', periods=3, freq='10s', tz = 'US/Eastern')
        i
        df = DataFrame( {'a' : i } )
        df
        df.dtypes

  This behavior is unchanged.

  .. ipython:: python

        df = DataFrame( )
        df['a'] = i
        df
        df.dtypes

.. _whatsnew_0150.cat:

Categoricals in Series/DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`~pandas.Categorical` can now be included in `Series` and `DataFrames` and gained new
methods to manipulate. Thanks to Jan Schultz for much of this API/implementation. (:issue:`3943`, :issue:`5313`, :issue:`5314`, :issue:`7444`).

For full docs, see the :ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.

.. ipython:: python

    df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})

    # convert the raw grades to a categorical
    df["grade"] = pd.Categorical(df["raw_grade"])

    # Alternative: df["grade"] = df["raw_grade"].astype("category")
    df["grade"]

    # Rename the levels
    df["grade"].cat.levels = ["very good", "good", "very bad"]

    # Reorder the levels and simultaneously add the missing levels
    df["grade"].cat.reorder_levels(["very bad", "bad", "medium", "good", "very good"])
    df["grade"]
    df.sort("grade")
    df.groupby("grade").size()

- ``pandas.core.group_agg`` and ``pandas.core.factor_agg`` were removed. As an alternative, construct
  a dataframe and use ``df.groupby(<group>).agg(<func>)``.

- Supplying "codes/labels and levels" to the :class:`~pandas.Categorical` constructor is deprecated and does
  not work without supplying ``compat=True``. The default mode now uses "values and levels".
  Please change your code to use the :meth:`~pandas.Categorical.from_codes` constructor.

- The ``Categorical.labels`` attribute was renamed to ``Categorical.codes`` and is read
  only. If you want to manipulate codes, please use one of the
  :ref:`API methods on Categoricals <api.categorical>`.


.. _whatsnew_0150.prior_deprecations:

Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are no prior version deprecations that are taking effect as of 0.15.0.

.. _whatsnew_0150.deprecations:

Deprecations
~~~~~~~~~~~~

.. _whatsnew_0150.knownissues:

Known Issues
~~~~~~~~~~~~

.. _whatsnew_0150.enhancements:

Enhancements
~~~~~~~~~~~~
- Added support for bool, uint8, uint16 and uint32 datatypes in ``to_stata`` (:issue:`7097`, :issue:`7365`)


- ``PeriodIndex`` supports ``resolution`` as the same as ``DatetimeIndex`` (:issue:`7708`)


.. _whatsnew_0150.performance:

Performance
~~~~~~~~~~~

- Performance improvements in ``DatetimeIndex.__iter__`` to allow faster iteration (:issue:`7683`)


.. _whatsnew_0150.experimental:

Experimental
~~~~~~~~~~~~

There are no experimental changes in 0.15.0

.. _whatsnew_0150.bug_fixes:

Bug Fixes
~~~~~~~~~
- Bug in Series 0-division with a float and integer operand dtypes  (:issue:`7785`)
- Bug in ``Series.astype("unicode")`` not calling ``unicode`` on the values correctly (:issue:`7758`)
- Bug in ``DataFrame.as_matrix()`` with mixed ``datetime64[ns]`` and ``timedelta64[ns]`` dtypes (:issue:`7778`)
- Bug in ``HDFStore.select_column()`` not preserving UTC timezone info when selecting a DatetimeIndex (:issue:`7777`)

- Bug in ``DatetimeIndex`` and ``PeriodIndex`` in-place addition and subtraction cause different result from normal one (:issue:`6527`)
- Bug in adding and subtracting ``PeriodIndex`` with ``PeriodIndex`` raise ``TypeError`` (:issue:`7741`)
- Bug in ``combine_first`` with ``PeriodIndex`` data raises ``TypeError`` (:issue:`3367`)


- Bug in pickles contains ``DateOffset`` may raise ``AttributeError`` when ``normalize`` attribute is reffered internally (:issue:`7748`)

- Bug in pickle deserialization that failed for pre-0.14.1 containers with dup items trying to avoid ambiguity
  when matching block and manager items, when there's only one block there's no ambiguity (:issue:`7794`)


- Bug in repeated timeseries line and area plot may result in ``ValueError`` or incorrect kind (:issue:`7733`)


- Bug in ``is_superperiod`` and ``is_subperiod`` cannot handle higher frequencies than ``S`` (:issue:`7760`, :issue:`7772`, :issue:`7803`)

- Bug in ``DataFrame.reset_index`` which has ``MultiIndex`` contains ``PeriodIndex`` or ``DatetimeIndex`` with tz raises ``ValueError`` (:issue:`7746`, :issue:`7793`)


- Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`)
- Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`)

- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_cov``, ``ewmcov``, and ``ewmcorr``
  returning results with columns sorted by name and producing an error for non-unique columns;
  now handles non-unique columns and returns columns in original order
  (except for the case of two DataFrames with ``pairwise=False``, where behavior is unchanged) (:issue:`7542`)