
DOC: some reviewing of the 0.20 whatsnew file #16254

Merged

Changes from 1 commit
79 changes: 31 additions & 48 deletions doc/source/whatsnew/v0.20.0.txt
@@ -14,14 +14,13 @@ Highlights include:
- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_ix>`
- ``Panel`` has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_panel>`
- Addition of an ``IntervalIndex`` and ``Interval`` scalar type, see :ref:`here <whatsnew_0200.enhancements.intervalindex>`
- Improved user API when accessing levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
- Improved user API when grouping by index levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
- Improved support for ``UInt64`` dtypes, see :ref:`here <whatsnew_0200.enhancements.uint64_support>`
- A new orient for JSON serialization, ``orient='table'``, that uses the :ref:`Table Schema spec <whatsnew_0200.enhancements.table_schema>`
- Experimental support for exporting ``DataFrame.style`` formats to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see :ref:`here <whatsnew_0200.enhancements.table_schema>`
- Experimental support for exporting styled DataFrames (``DataFrame.style``) to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
- Window binary corr/cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
- Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
- Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here <whatsnew_0200.api_breaking.gbq>`
- Switched the test framework to use `pytest <http://doc.pytest.org/en/latest>`__ (:issue:`13097`)

.. warning::

@@ -41,12 +40,12 @@ New features

.. _whatsnew_0200.enhancements.agg:

``agg`` API
^^^^^^^^^^^
``agg`` API for DataFrame/Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API
from groupby, window operations, and resampling. This allows aggregation operations in a concise
by using :meth:`~DataFrame.agg`, and :meth:`~DataFrame.transform`. The full documentation
from groupby, window operations, and resampling. This allows aggregation operations in a concise way
by using :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`. The full documentation
is :ref:`here <basics.aggregate>` (:issue:`1623`).

Here is a sample
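
The sample itself is collapsed in this diff view. A minimal sketch of the new
API, using a hypothetical frame rather than the example from the file:

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame(np.random.randn(4, 3), columns=['A', 'B', 'C'])

   # a single function reduces each column -> Series
   df.agg('sum')

   # a list of functions -> DataFrame with one row per function
   df.agg(['sum', 'min'])

   # a dict of column -> function(s), as with groupby ``.agg()``
   df.agg({'A': ['sum', 'min'], 'B': 'mean'})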
@@ -107,22 +106,14 @@ aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`)
``dtype`` keyword for data IO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``'python'`` engine for :func:`read_csv` now accepts the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.
The ``'python'`` engine for :func:`read_csv`, as well as the :func:`read_fwf` function for parsing
fixed-width text files and :func:`read_excel` for parsing Excel files, now accept the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.

.. ipython:: python
:suppress:

from pandas.compat import StringIO

.. ipython:: python

data = "a,b\n1,2\n3,4"
pd.read_csv(StringIO(data), engine='python').dtypes
pd.read_csv(StringIO(data), engine='python', dtype={'a':'float64', 'b':'object'}).dtypes

The ``dtype`` keyword argument is also now supported in the :func:`read_fwf` function for parsing
fixed-width text files, and :func:`read_excel` for parsing Excel files.

.. ipython:: python

data = "a b\n1 2\n3 4"
@@ -135,16 +126,16 @@
``.to_datetime()`` has gained an ``origin`` parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`to_datetime` has gained a new parameter, ``origin``, to define a reference date
from where to compute the resulting ``DatetimeIndex`` when ``unit`` is specified. (:issue:`11276`, :issue:`11745`)
from where to compute the resulting timestamps when parsing numerical values with a specific ``unit`` specified. (:issue:`11276`, :issue:`11745`)

Start with 1960-01-01 as the starting date
For example, with 1960-01-01 as the starting date:

.. ipython:: python

pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))

The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``.
Commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.
The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``, which is
commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.

.. ipython:: python
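
# (block contents collapsed in the diff view; a plausible sketch of the
# default unix origin)
pd.to_datetime([1, 2, 3], unit='D')  # -> 1970-01-02, 1970-01-03, 1970-01-04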

@@ -156,7 +147,7 @@ Commonly called 'unix epoch' or POSIX time. This was the previous default, so th
Groupby Enhancements
^^^^^^^^^^^^^^^^^^^^

Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names.
Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names. Previously, only column names could be referenced. This makes it possible to group by a column and an index level at the same time. (:issue:`5677`)

.. ipython:: python

@@ -172,8 +163,6 @@ Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now refere

df.groupby(['second', 'A']).sum()

Previously, only column names could be referenced. (:issue:`5677`)
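
The frame construction above is collapsed in the diff view. A self-contained
sketch of grouping by a column and an index level together (hypothetical data,
assuming pandas 0.20):

.. code-block:: python

   import pandas as pd

   arrays = [['bar', 'bar', 'baz', 'baz'],
             ['one', 'two', 'one', 'two']]
   index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
   df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': range(4)}, index=index)

   # 'second' is an index level name, 'A' is a column name
   df.groupby(['second', 'A']).sum()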


.. _whatsnew_0200.enhancements.compressed_urls:

@@ -203,7 +192,7 @@ support for bz2 compression in the python 2 C-engine improved (:issue:`14874`).
Pickle file I/O now supports compression
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`read_pickle`, :meth:`DataFame.to_pickle` and :meth:`Series.to_pickle`
:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle`
can now read from and write to compressed pickle files. Compression methods
can be an explicit parameter or be inferred from the file extension.
See :ref:`the docs here. <io.pickle.compression>`
@@ -221,33 +210,24 @@ Using an explicit compression type

df.to_pickle("data.pkl.compress", compression="gzip")
rt = pd.read_pickle("data.pkl.compress", compression="gzip")
rt
rt.head()

Inferring compression type from the extension

.. ipython:: python

df.to_pickle("data.pkl.xz", compression="infer")
rt = pd.read_pickle("data.pkl.xz", compression="infer")
rt

The default is to ``infer``:
The default is to inferring compression type from the extension (``compression='infer'``):
Contributor comment: "is to inferring" -> "is to infer"


.. ipython:: python

df.to_pickle("data.pkl.gz")
rt = pd.read_pickle("data.pkl.gz")
rt
rt.head()
df["A"].to_pickle("s1.pkl.bz2")
rt = pd.read_pickle("s1.pkl.bz2")
rt
rt.head()

.. ipython:: python
:suppress:

import os
os.remove("data.pkl.compress")
os.remove("data.pkl.xz")
os.remove("data.pkl.gz")
os.remove("s1.pkl.bz2")

@@ -293,15 +273,15 @@ In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueErr
ordered=True)})
df

Previous Behavior:
**Previous Behavior**:

.. code-block:: ipython

In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
---------------------------------------------------------------------------
ValueError: items in new_categories are not the same as in old categories

New Behavior:
**New Behavior**:

.. ipython:: python
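
# (block contents collapsed in the diff view; presumably the same call
# that raised above now succeeds, returning the sum per retained category)
df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()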

@@ -327,7 +307,7 @@ the data.
df.to_json(orient='table')


See :ref:`IO: Table Schema for more<io.table_schema>`.
See :ref:`IO: Table Schema for more information <io.table_schema>`.
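
A minimal sketch of the new orient (hypothetical frame, assuming pandas 0.20):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

   # the payload contains a Table Schema 'schema' field plus the 'data'
   df.to_json(orient='table')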

Additionally, the repr for ``DataFrame`` and ``Series`` can now publish
this JSON Table schema representation of the Series or DataFrame if you are
@@ -411,6 +391,11 @@ pandas has gained an ``IntervalIndex`` with its own dtype, ``interval`` as well
notation, specifically as a return type for the categories in :func:`cut` and :func:`qcut`. The ``IntervalIndex`` allows some unique indexing, see the
:ref:`docs <indexing.intervallindex>`. (:issue:`7640`, :issue:`8625`)

.. warning::

These indexing behaviors of the IntervalIndex are provisional and may change in a future version of pandas. Feedback on usage very welcome.
Contributor comment: "is very welcome" (or just "is welcome")



Previous behavior:

The returned categories were strings, representing Intervals
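
The before/after output is collapsed in the diff view. A short sketch of the
new behavior (hypothetical data, assuming pandas 0.20):

.. code-block:: python

   import pandas as pd

   c = pd.cut([0, 3, 6, 9], bins=3)
   c.categories  # now an IntervalIndex rather than strings

   # IntervalIndex also allows some unique indexing
   s = pd.Series(['a', 'b', 'c'],
                 index=pd.IntervalIndex.from_breaks([0, 1, 2, 3]))
   s.loc[0.5]  # selects the interval containing 0.5 -> 'a'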
@@ -473,9 +458,8 @@ Other Enhancements
- ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)
- ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`); this and the previous item are illustrated in the sketch after this list
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)
- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
- ``DataFrame`` and ``DataFrame.groupby()`` have gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`, :issue:`15197`).
- ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).
- ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
- Multiple offset aliases with decimal points are now supported (e.g. ``0.5min`` is parsed as ``30s``) (:issue:`8419`)
- ``.isnull()`` and ``.notnull()`` have been added to ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`)
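
A short sketch of the two ``Series.str.replace()`` enhancements above
(hypothetical data, assuming pandas 0.20):

.. code-block:: python

   import re

   import pandas as pd

   s = pd.Series(['a1', 'b2', 'c3'])

   # a callable replacement; it receives the match object, as re.sub does
   s.str.replace(r'[a-z]', lambda m: m.group(0).upper())

   # a compiled regular expression as the pattern
   s.str.replace(re.compile(r'\d'), 'X')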
@@ -506,9 +490,8 @@ Other Enhancements
- ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`)
- ``pd.read_html()`` will parse multiple header rows, creating a MultiIndex header. (:issue:`13434`).
- HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)
- :class:`pandas.io.formats.style.Styler`` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
- :meth:`pandas.io.formats.style.Styler.render` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
- ``pd.io.api.Styler.render`` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
- :class:`pandas.io.formats.style.Styler` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
- :meth:`Styler.render() <pandas.io.formats.style.Styler.render>` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
- Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:`15379`)
- ``TimedeltaIndex`` now has a custom date-tick formatter specifically designed for nanosecond level precision (:issue:`8711`)
- ``pd.api.types.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information and the sketch after this list.
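
A short sketch of ``ignore_ordered`` (hypothetical data, assuming pandas 0.20):

.. code-block:: python

   import pandas as pd
   from pandas.api.types import union_categoricals

   a = pd.Categorical(['a', 'b'], ordered=True)
   b = pd.Categorical(['a', 'b', 'c'])

   # without ignore_ordered=True this raises, as the ordered attributes differ
   union_categoricals([a, b], ignore_ordered=True)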
@@ -519,7 +502,7 @@ Other Enhancements
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
- ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
- ``pandas.io.json.json_normalize()`` has gained a ``sep`` option that accepts ``str`` to separate joined fields; the default is ".", which is backward compatible. (:issue:`14883`)
- :meth:`~MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels <advanced.shown_levels>`. (:issue:`15694`)
- :meth:`MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels <advanced.shown_levels>`. (:issue:`15694`)
- ``pd.read_csv()`` will now raise a ``ParserError`` error whenever any parsing error occurs (:issue:`15913`, :issue:`15925`)
- ``pd.read_csv()`` now supports the ``error_bad_lines`` and ``warn_bad_lines`` arguments for the Python parser (:issue:`15925`)
- The ``display.show_dimensions`` option can now also be used to specify
3 changes: 3 additions & 0 deletions pandas/core/indexes/interval.py
@@ -99,6 +99,9 @@ class IntervalIndex(IntervalMixin, Index):

.. versionadded:: 0.20.0

Warning: the indexing behaviors are provisional and may change in
a future version of pandas.

Attributes
----------
left, right : array-like (1-dimensional)