DOC: some reviewing of the 0.20 whatsnew file #16254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
114 changes: 48 additions & 66 deletions doc/source/whatsnew/v0.20.0.txt
@@ -14,14 +14,13 @@ Highlights include:
- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_ix>`
- ``Panel`` has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_panel>`
- Addition of an ``IntervalIndex`` and ``Interval`` scalar type, see :ref:`here <whatsnew_0200.enhancements.intervalindex>`
- Improved user API when accessing levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
- Improved user API when grouping by index levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
- Improved support for ``UInt64`` dtypes, see :ref:`here <whatsnew_0200.enhancements.uint64_support>`
- A new orient for JSON serialization, ``orient='table'``, that uses the :ref:`Table Schema spec <whatsnew_0200.enhancements.table_schema>`
- Experimental support for exporting ``DataFrame.style`` formats to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec and that enables a more interactive repr in the Jupyter Notebook, see :ref:`here <whatsnew_0200.enhancements.table_schema>`
- Experimental support for exporting styled DataFrames (``DataFrame.style``) to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
- Window binary corr/cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
- Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
- Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here <whatsnew_0200.api_breaking.gbq>`
- Switched the test framework to use `pytest <http://doc.pytest.org/en/latest>`__ (:issue:`13097`)

.. warning::

@@ -41,12 +40,12 @@ New features

.. _whatsnew_0200.enhancements.agg:

``agg`` API
^^^^^^^^^^^
``agg`` API for DataFrame/Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API
from groupby, window operations, and resampling. This allows aggregation operations in a concise
by using :meth:`~DataFrame.agg`, and :meth:`~DataFrame.transform`. The full documentation
from groupby, window operations, and resampling. This allows aggregation operations in a concise way
by using :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`. The full documentation
is :ref:`here <basics.aggregate>` (:issue:`1623`).
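
For orientation before the full sample below, a minimal sketch (the frame and the
aggregations here are hypothetical, not the ones from the docs):

.. code-block:: python

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])
    df.agg('sum')                          # one aggregation per column
    df.agg(['sum', 'min'])                 # a list of aggregations, one row per function
    df.agg({'a': 'sum', 'b': 'min'})       # per-column aggregations via a dict
    df.transform(lambda x: x - x.mean())   # transform returns an object of the same shape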

Here is a sample
@@ -107,22 +106,14 @@ aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`)
``dtype`` keyword for data IO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``'python'`` engine for :func:`read_csv` now accepts the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.
The ``'python'`` engine for :func:`read_csv`, as well as the :func:`read_fwf` function for parsing
fixed-width text files and :func:`read_excel` for parsing Excel files, now accept the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.

.. ipython:: python
:suppress:

from pandas.compat import StringIO

.. ipython:: python

data = "a,b\n1,2\n3,4"
pd.read_csv(StringIO(data), engine='python').dtypes
pd.read_csv(StringIO(data), engine='python', dtype={'a':'float64', 'b':'object'}).dtypes

The ``dtype`` keyword argument is also now supported in the :func:`read_fwf` function for parsing
fixed-width text files, and :func:`read_excel` for parsing Excel files.

.. ipython:: python

data = "a b\n1 2\n3 4"
@@ -135,16 +126,16 @@ fixed-width text files, and :func:`read_excel` for parsing Excel files.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`to_datetime` has gained a new parameter, ``origin``, to define a reference date
from where to compute the resulting ``DatetimeIndex`` when ``unit`` is specified. (:issue:`11276`, :issue:`11745`)
from where to compute the resulting timestamps when parsing numerical values with a specific ``unit`` specified. (:issue:`11276`, :issue:`11745`)

Start with 1960-01-01 as the starting date
For example, with 1960-01-01 as the starting date:

.. ipython:: python

pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))

The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``.
Commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.
The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``, which is
commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.

.. ipython:: python

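# (collapsed in this view) with the default origin='unix', e.g.:
pd.to_datetime([1, 2, 3], unit='D')   # 1970-01-02, 1970-01-03, 1970-01-04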
@@ -156,7 +147,7 @@ Commonly called 'unix epoch' or POSIX time. This was the previous default, so th
Groupby Enhancements
^^^^^^^^^^^^^^^^^^^^

Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names.
Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names. Previously, only column names could be referenced. This makes it easy to group by a column and an index level at the same time. (:issue:`5677`)

.. ipython:: python

@@ -172,8 +163,6 @@ Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now refere

df.groupby(['second', 'A']).sum()

Previously, only column names could be referenced. (:issue:`5677`)


.. _whatsnew_0200.enhancements.compressed_urls:

@@ -203,7 +192,7 @@ support for bz2 compression in the python 2 C-engine improved (:issue:`14874`).
Pickle file I/O now supports compression
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`read_pickle`, :meth:`DataFame.to_pickle` and :meth:`Series.to_pickle`
:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle`
can now read from and write to compressed pickle files. Compression methods
can be an explicit parameter or be inferred from the file extension.
See :ref:`the docs here. <io.pickle.compression>`
@@ -221,33 +210,24 @@ Using an explicit compression type

df.to_pickle("data.pkl.compress", compression="gzip")
rt = pd.read_pickle("data.pkl.compress", compression="gzip")
rt

Inferring compression type from the extension

.. ipython:: python
rt.head()

df.to_pickle("data.pkl.xz", compression="infer")
rt = pd.read_pickle("data.pkl.xz", compression="infer")
rt

The default is to ``infer``:
The default is to infer the compression type from the extension (``compression='infer'``):

.. ipython:: python

df.to_pickle("data.pkl.gz")
rt = pd.read_pickle("data.pkl.gz")
rt
rt.head()
df["A"].to_pickle("s1.pkl.bz2")
rt = pd.read_pickle("s1.pkl.bz2")
rt
rt.head()

.. ipython:: python
:suppress:

import os
os.remove("data.pkl.compress")
os.remove("data.pkl.xz")
os.remove("data.pkl.gz")
os.remove("s1.pkl.bz2")

@@ -293,15 +273,15 @@ In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueErr
ordered=True)})
df

Previous Behavior:
**Previous Behavior**:

.. code-block:: ipython

In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
---------------------------------------------------------------------------
ValueError: items in new_categories are not the same as in old categories

New Behavior:
**New Behavior**:

.. ipython:: python

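# (collapsed in this view) the call that previously raised now succeeds, e.g.:
df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()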
@@ -327,7 +307,7 @@ the data.
df.to_json(orient='table')


See :ref:`IO: Table Schema for more<io.table_schema>`.
See :ref:`IO: Table Schema for more information <io.table_schema>`.

Additionally, the repr for ``DataFrame`` and ``Series`` can now publish
this JSON Table schema representation of the Series or DataFrame if you are
@@ -411,6 +391,11 @@ pandas has gained an ``IntervalIndex`` with its own dtype, ``interval`` as well
notation, specifically as a return type for the categories in :func:`cut` and :func:`qcut`. The ``IntervalIndex`` allows some unique indexing, see the
:ref:`docs <indexing.intervallindex>`. (:issue:`7640`, :issue:`8625`)

.. warning::

These indexing behaviors of the IntervalIndex are provisional and may change in a future version of pandas. Feedback on usage is welcome.


Previous behavior:

The returned categories were strings, representing Intervals
@@ -473,9 +458,8 @@ Other Enhancements
- ``Series.str.replace()`` now accepts a callable as replacement, which is passed to ``re.sub`` (:issue:`15055`)
- ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`); both are shown in the sketch after this list segment
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)
- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
- ``DataFrame`` and ``DataFrame.groupby()`` have gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`, :issue:`15197`).
- ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).
- ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
- Multiple offset aliases with decimal points are now supported (e.g. ``0.5min`` is parsed as ``30s``) (:issue:`8419`)
- ``.isnull()`` and ``.notnull()`` have been added to ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`)
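
A hedged sketch of the two ``Series.str.replace()`` items above (the data is
hypothetical):

.. code-block:: python

    import re
    import pandas as pd

    s = pd.Series(['foo 123', 'bar baz'])
    s.str.replace(r'[a-z]+', lambda m: m.group(0).upper())  # callable replacement, handed to re.sub
    s.str.replace(re.compile(r'\d+'), 'N')                  # compiled regular expression as the pattern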
@@ -506,9 +490,8 @@ Other Enhancements
- ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`)
- ``pd.read_html()`` will parse multiple header rows, creating a MultiIndex header. (:issue:`13434`).
- HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)
- :class:`pandas.io.formats.style.Styler`` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
- :meth:`pandas.io.formats.style.Styler.render` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
- ``pd.io.api.Styler.render`` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
- :class:`pandas.io.formats.style.Styler` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
- :meth:`Styler.render() <pandas.io.formats.style.Styler.render>` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
- Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:`15379`)
- ``TimedeltaIndex`` now has a custom date-tick formatter specifically designed for nanosecond level precision (:issue:`8711`)
- ``pd.api.types.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information; a sketch follows this list segment.
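
A hedged sketch of ``ignore_ordered`` (values are illustrative):

.. code-block:: python

    import pandas as pd
    from pandas.api.types import union_categoricals

    a = pd.Categorical(['a', 'b'], ordered=True)
    b = pd.Categorical(['a', 'b'])
    # mixing ordered and unordered categoricals raises unless the attribute is ignored
    union_categoricals([a, b], ignore_ordered=True)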
@@ -519,7 +502,7 @@ Other Enhancements
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
- ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
- ``pandas.io.json.json_normalize()`` has gained a ``sep`` option that accepts ``str`` to separate joined fields; the default is ".", which is backward compatible. (:issue:`14883`)
- :meth:`~MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels <advanced.shown_levels>`. (:issue:`15694`)
- :meth:`MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels <advanced.shown_levels>`. (:issue:`15694`)
- ``pd.read_csv()`` will now raise a ``ParserError`` whenever any parsing error occurs (:issue:`15913`, :issue:`15925`)
- ``pd.read_csv()`` now supports the ``error_bad_lines`` and ``warn_bad_lines`` arguments for the Python parser (:issue:`15925`)
- The ``display.show_dimensions`` option can now also be used to specify
@@ -542,7 +525,7 @@ Backwards incompatible API changes
Possible incompatibility for HDF5 formats created with pandas < 0.13.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``pd.TimeSeries`` was deprecated officially in 0.17.0, though has only been an alias since 0.13.0. It has
``pd.TimeSeries`` was deprecated officially in 0.17.0, though it had already been an alias since 0.13.0. It has
been dropped in favor of ``pd.Series``. (:issue:`15098`).

This *may* cause HDF5 files that were created in prior versions to become unreadable if ``pd.TimeSeries``
@@ -680,7 +663,7 @@ ndarray, you can always convert explicitly using ``np.asarray(idx.hour)``.
pd.unique will now be consistent with extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In prior versions, using ``Series.unique()`` and :func:`unique` on ``Categorical`` and tz-aware
In prior versions, using :meth:`Series.unique` and :func:`pandas.unique` on ``Categorical`` and tz-aware
data-types would yield different return types. These are now made consistent. (:issue:`15903`)

- Datetime tz-aware
@@ -729,21 +712,21 @@ data-types would yield different return types. These are now made consistent. (:

.. code-block:: ipython

In [1]: pd.Series(pd.Categorical(list('baabc'))).unique()
In [1]: pd.Series(list('baabc'), dtype='category').unique()
Out[1]:
[b, a, c]
Categories (3, object): [b, a, c]

In [2]: pd.unique(pd.Series(pd.Categorical(list('baabc'))))
In [2]: pd.unique(pd.Series(list('baabc'), dtype='category'))
Out[2]: array(['b', 'a', 'c'], dtype=object)

New Behavior:

.. ipython:: python

# returns a Categorical
pd.Series(pd.Categorical(list('baabc'))).unique()
pd.unique(pd.Series(pd.Categorical(list('baabc'))).unique())
pd.Series(list('baabc'), dtype='category').unique()
pd.unique(pd.Series(list('baabc'), dtype='category'))

.. _whatsnew_0200.api_breaking.s3:

@@ -804,16 +787,14 @@ Now the smallest acceptable dtype will be used (:issue:`13247`)
df1 = pd.DataFrame(np.array([1.0], dtype=np.float32, ndmin=2))
df1.dtypes

.. ipython:: python

df2 = pd.DataFrame(np.array([np.nan], dtype=np.float32, ndmin=2))
df2.dtypes

Previous Behavior:

.. code-block:: ipython

In [7]: pd.concat([df1,df2]).dtypes
In [7]: pd.concat([df1, df2]).dtypes
Out[7]:
0 float64
dtype: object
Expand All @@ -822,7 +803,7 @@ New Behavior:

.. ipython:: python

pd.concat([df1,df2]).dtypes
pd.concat([df1, df2]).dtypes

.. _whatsnew_0200.api_breaking.gbq:

@@ -1012,7 +993,7 @@ See the section on :ref:`Windowed Binary Operations <stats.moments.binary>` for
periods=100, freq='D', name='foo'))
df.tail()

Old Behavior:
Previous Behavior:

.. code-block:: ipython

@@ -1228,12 +1209,12 @@ If indicated, a deprecation warning will be issued if you reference these modul
"pandas.algos", "pandas._libs.algos", ""
"pandas.hashtable", "pandas._libs.hashtable", ""
"pandas.indexes", "pandas.core.indexes", ""
"pandas.json", "pandas._libs.json", "X"
"pandas.json", "pandas._libs.json / pandas.io.json", "X"
"pandas.parser", "pandas._libs.parsers", "X"
"pandas.formats", "pandas.io.formats", ""
"pandas.sparse", "pandas.core.sparse", ""
"pandas.tools", "pandas.core.reshape", ""
"pandas.types", "pandas.core.dtypes", ""
"pandas.tools", "pandas.core.reshape", "X"
"pandas.types", "pandas.core.dtypes", "X"
"pandas.io.sas.saslib", "pandas.io.sas._sas", ""
"pandas._join", "pandas._libs.join", ""
"pandas._hash", "pandas._libs.hashing", ""
@@ -1249,11 +1230,12 @@ exposed in the top-level namespace: ``pandas.errors``, ``pandas.plotting`` and
certain functions in the ``pandas.io`` and ``pandas.tseries`` submodules,
these are now the public subpackages.

Further changes:

- The function :func:`~pandas.api.types.union_categoricals` is now importable from ``pandas.api.types``, formerly from ``pandas.types.concat`` (:issue:`15998`)
- The type import ``pandas.tslib.NaTType`` is deprecated and can be replaced by using ``type(pandas.NaT)`` (:issue:`16146`); see the sketch after this list
- The public functions in ``pandas.tools.hashing`` are deprecated from that location, but are now importable from ``pandas.util`` (:issue:`16223`)
- The modules in ``pandas.util``: ``decorators``, ``print_versions``, ``doctools``, `validators``, ``depr_module`` are now private (:issue:`16223`)
- The modules in ``pandas.util``: ``decorators``, ``print_versions``, ``doctools``, ``validators``, ``depr_module`` are now private. Only the functions exposed in ``pandas.util`` itself are public (:issue:`16223`)
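
A minimal sketch of the ``NaTType`` replacement noted above:

.. code-block:: python

    import pandas as pd

    NaTType = type(pd.NaT)       # instead of the deprecated pandas.tslib.NaTType
    isinstance(pd.NaT, NaTType)  # True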

.. _whatsnew_0200.privacy.errors:

@@ -1320,7 +1302,7 @@ Deprecations
Deprecate ``.ix``
^^^^^^^^^^^^^^^^^

The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation are :ref:`here <indexing>`. (:issue:`14218`)
The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here <indexing>`. (:issue:`14218`)

The recommended methods of indexing are:

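- ``.loc`` if you want to *label* index
- ``.iloc`` if you want to *positionally* index

A hedged sketch of migrating away from ``.ix`` (the frame is hypothetical):

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3]}, index=list('abc'))
    df.loc['a', 'A']                     # label-based, replaces df.ix['a', 'A']
    df.iloc[0, df.columns.get_loc('A')]  # position-based, replaces df.ix[0, 0]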
@@ -1368,7 +1350,7 @@ Deprecate Panel

``Panel`` is deprecated and will be removed in a future version. The recommended way to represent 3-D data is
with a ``MultiIndex`` on a ``DataFrame`` via the :meth:`~Panel.to_frame` or with the `xarray package <http://xarray.pydata.org/en/stable/>`__. Pandas
provides a :meth:`~Panel.to_xarray` method to automate this conversion. See the documentation :ref:`Deprecate Panel <dsintro.deprecate_panel>`. (:issue:`13563`).
provides a :meth:`~Panel.to_xarray` method to automate this conversion. For more details see the :ref:`Deprecate Panel <dsintro.deprecate_panel>` documentation. (:issue:`13563`).

.. ipython:: python
:okwarning:
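
# (collapsed in this view) constructing a Panel now emits a DeprecationWarning, e.g.:
p = pd.Panel(np.random.randn(2, 4, 5))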
@@ -1416,7 +1398,7 @@ This is an illustrative example:

Here is a typical useful syntax for computing different aggregations for different columns. This
is a natural and useful syntax. We aggregate from the dict-to-list by taking the specified
columns and applying the list of functions. This returns a ``MultiIndex`` for the columns.
columns and applying the list of functions. This returns a ``MultiIndex`` for the columns (this is *not* deprecated).

.. ipython:: python

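# (collapsed in this view) dict-of-lists aggregation over the selected columns, e.g.:
df.groupby('A').agg({'B': ['sum', 'max'], 'C': ['count', 'min']})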
3 changes: 3 additions & 0 deletions pandas/core/indexes/interval.py
@@ -99,6 +99,9 @@ class IntervalIndex(IntervalMixin, Index):

.. versionadded:: 0.20.0

Warning: the indexing behaviors are provisional and may change in
a future version of pandas.

Attributes
----------
left, right : array-like (1-dimensional)