diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt index 61042071a52ec..551c4bd67146b 100644 --- a/doc/source/whatsnew/v0.20.0.txt +++ b/doc/source/whatsnew/v0.20.0.txt @@ -14,14 +14,13 @@ Highlights include: - The ``.ix`` indexer has been deprecated, see :ref:`here ` - ``Panel`` has been deprecated, see :ref:`here ` - Addition of an ``IntervalIndex`` and ``Interval`` scalar type, see :ref:`here ` -- Improved user API when accessing levels in ``.groupby()``, see :ref:`here ` +- Improved user API when grouping by index levels in ``.groupby()``, see :ref:`here ` - Improved support for ``UInt64`` dtypes, see :ref:`here ` -- A new orient for JSON serialization, ``orient='table'``, that uses the :ref:`Table Schema spec ` -- Experimental support for exporting ``DataFrame.style`` formats to Excel, see :ref:`here ` +- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec and enables a more interactive repr in the Jupyter Notebook, see :ref:`here ` +- Experimental support for exporting styled DataFrames (``DataFrame.style``) to Excel, see :ref:`here ` - Window binary corr/cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here ` - Support for S3 handling now uses ``s3fs``, see :ref:`here ` - Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here ` -- Switched the test framework to use `pytest `__ (:issue:`13097`) .. warning:: @@ -41,12 +40,12 @@ New features .. _whatsnew_0200.enhancements.agg: -``agg`` API -^^^^^^^^^^^ +``agg`` API for DataFrame/Series +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API -from groupby, window operations, and resampling. This allows aggregation operations in a concise -by using :meth:`~DataFrame.agg`, and :meth:`~DataFrame.transform`. 
The full documentation +from groupby, window operations, and resampling. This allows aggregation operations in a concise way +by using :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`. The full documentation is :ref:`here ` (:issue:`1623`). Here is a sample @@ -107,22 +106,14 @@ aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`) ``dtype`` keyword for data IO ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The ``'python'`` engine for :func:`read_csv` now accepts the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs ` for more information. +The ``'python'`` engine for :func:`read_csv`, as well as the :func:`read_fwf` function for parsing +fixed-width text files and :func:`read_excel` for parsing Excel files, now accept the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs ` for more information. .. ipython:: python :suppress: from pandas.compat import StringIO -.. ipython:: python - - data = "a,b\n1,2\n3,4" - pd.read_csv(StringIO(data), engine='python').dtypes - pd.read_csv(StringIO(data), engine='python', dtype={'a':'float64', 'b':'object'}).dtypes - -The ``dtype`` keyword argument is also now supported in the :func:`read_fwf` function for parsing -fixed-width text files, and :func:`read_excel` for parsing Excel files. - .. ipython:: python data = "a b\n1 2\n3 4" @@ -135,16 +126,16 @@ fixed-width text files, and :func:`read_excel` for parsing Excel files. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :func:`to_datetime` has gained a new parameter, ``origin``, to define a reference date -from where to compute the resulting ``DatetimeIndex`` when ``unit`` is specified. (:issue:`11276`, :issue:`11745`) +from which to compute the resulting timestamps when parsing numerical values with a given ``unit``. 
(:issue:`11276`, :issue:`11745`) -Start with 1960-01-01 as the starting date +For example, with 1960-01-01 as the starting date: .. ipython:: python pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01')) -The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``. -Commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change. +The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``, +commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change. .. ipython:: python @@ -156,7 +147,7 @@ Commonly called 'unix epoch' or POSIX time. This was the previous default, so th Groupby Enhancements ^^^^^^^^^^^^^^^^^^^^ -Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names. +Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names. Previously, only column names could be referenced. This allows you to easily group by a column and index level at the same time. (:issue:`5677`) .. ipython:: python @@ -172,8 +163,6 @@ Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now refere df.groupby(['second', 'A']).sum() -Previously, only column names could be referenced. (:issue:`5677`) - .. _whatsnew_0200.enhancements.compressed_urls: @@ -203,7 +192,7 @@ support for bz2 compression in the python 2 C-engine improved (:issue:`14874`). Pickle file I/O now supports compression ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:func:`read_pickle`, :meth:`DataFame.to_pickle` and :meth:`Series.to_pickle` +:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle` can now read from and write to compressed pickle files. Compression methods can be an explicit parameter or be inferred from the file extension. See :ref:`the docs here. 
` @@ -221,33 +210,24 @@ Using an explicit compression type df.to_pickle("data.pkl.compress", compression="gzip") rt = pd.read_pickle("data.pkl.compress", compression="gzip") - rt - -Inferring compression type from the extension - -.. ipython:: python + rt.head() - df.to_pickle("data.pkl.xz", compression="infer") - rt = pd.read_pickle("data.pkl.xz", compression="infer") - rt - -The default is to ``infer``: +The default is to infer the compression type from the extension (``compression='infer'``): .. ipython:: python df.to_pickle("data.pkl.gz") rt = pd.read_pickle("data.pkl.gz") - rt + rt.head() df["A"].to_pickle("s1.pkl.bz2") rt = pd.read_pickle("s1.pkl.bz2") - rt + rt.head() .. ipython:: python :suppress: import os os.remove("data.pkl.compress") - os.remove("data.pkl.xz") os.remove("data.pkl.gz") os.remove("s1.pkl.bz2") @@ -293,7 +273,7 @@ In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueErr ordered=True)}) df -Previous Behavior: +**Previous Behavior**: .. code-block:: ipython @@ -301,7 +281,7 @@ Previous Behavior: --------------------------------------------------------------------------- ValueError: items in new_categories are not the same as in old categories -New Behavior: +**New Behavior**: .. ipython:: python @@ -327,7 +307,7 @@ the data. df.to_json(orient='table') -See :ref:`IO: Table Schema for more`. +See :ref:`IO: Table Schema for more information `. Additionally, the repr for ``DataFrame`` and ``Series`` can now publish this JSON Table schema representation of the Series or DataFrame if you are @@ -411,6 +391,11 @@ pandas has gained an ``IntervalIndex`` with its own dtype, ``interval`` as well notation, specifically as a return type for the categories in :func:`cut` and :func:`qcut`. The ``IntervalIndex`` allows some unique indexing, see the :ref:`docs `. (:issue:`7640`, :issue:`8625`) +.. warning:: + + These indexing behaviors of the IntervalIndex are provisional and may change in a future version of pandas. 
Feedback on usage is welcome. + + + Previous behavior: The returned categories were strings, representing Intervals @@ -473,9 +458,8 @@ Other Enhancements - ``Series.str.replace()`` now accepts a callable as replacement, which is passed to ``re.sub`` (:issue:`15055`) - ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`) - ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`) -- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`). +- ``DataFrame`` and ``DataFrame.groupby()`` have gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`, :issue:`15197`). - ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`). -- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`). - ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) - Multiple offset aliases with decimal points are now supported (e.g. ``0.5min`` is parsed as ``30s``) (:issue:`8419`) - ``.isnull()`` and ``.notnull()`` have been added to the ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`) @@ -506,9 +490,8 @@ Other Enhancements - ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`) - ``pd.read_html()`` will parse multiple header rows, creating a MultiIndex header. (:issue:`13434`). - HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. 
(:issue:`15403`) -- :class:`pandas.io.formats.style.Styler`` template now has blocks for easier extension, :ref:`see the example notebook ` (:issue:`15649`) -- :meth:`pandas.io.formats.style.Styler.render` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`) -- ``pd.io.api.Styler.render`` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`) +- :class:`pandas.io.formats.style.Styler` template now has blocks for easier extension, :ref:`see the example notebook ` (:issue:`15649`) +- :meth:`Styler.render() ` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`) - Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:`15379`) - ``TimedeltaIndex`` now has a custom date-tick formatter specifically designed for nanosecond level precision (:issue:`8711`) - ``pd.api.types.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs ` for more information. @@ -519,7 +502,7 @@ Other Enhancements - ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`) - ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`) - ``pandas.io.json.json_normalize()`` has gained a ``sep`` option that accepts ``str`` to separate joined fields; the default is ".", which is backward compatible. (:issue:`14883`) -- :meth:`~MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels `. (:issue:`15694`) +- :meth:`MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels `. 
(:issue:`15694`) - ``pd.read_csv()`` will now raise a ``ParserError`` whenever any parsing error occurs (:issue:`15913`, :issue:`15925`) - ``pd.read_csv()`` now supports the ``error_bad_lines`` and ``warn_bad_lines`` arguments for the Python parser (:issue:`15925`) - The ``display.show_dimensions`` option can now also be used to specify @@ -542,7 +525,7 @@ Backwards incompatible API changes Possible incompatibility for HDF5 formats created with pandas < 0.13.0 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -``pd.TimeSeries`` was deprecated officially in 0.17.0, though has only been an alias since 0.13.0. It has +``pd.TimeSeries`` was officially deprecated in 0.17.0, though it had already been an alias since 0.13.0. It has been dropped in favor of ``pd.Series``. (:issue:`15098`). This *may* cause HDF5 files that were created in prior versions to become unreadable if ``pd.TimeSeries`` @@ -680,7 +663,7 @@ ndarray, you can always convert explicitly using ``np.asarray(idx.hour)``. pd.unique will now be consistent with extension types ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In prior versions, using ``Series.unique()`` and :func:`unique` on ``Categorical`` and tz-aware +In prior versions, using :meth:`Series.unique` and :func:`pandas.unique` on ``Categorical`` and tz-aware data-types would yield different return types. These are now made consistent. (:issue:`15903`) - Datetime tz-aware @@ -729,12 +712,12 @@ data-types would yield different return types. These are now made consistent. (: .. code-block:: ipython - In [1]: pd.Series(pd.Categorical(list('baabc'))).unique() + In [1]: pd.Series(list('baabc'), dtype='category').unique() Out[1]: [b, a, c] Categories (3, object): [b, a, c] - In [2]: pd.unique(pd.Series(pd.Categorical(list('baabc')))) + In [2]: pd.unique(pd.Series(list('baabc'), dtype='category')) Out[2]: array(['b', 'a', 'c'], dtype=object) New Behavior: @@ -742,8 +725,8 @@ data-types would yield different return types. 
These are now made consistent. (: .. ipython:: python # returns a Categorical - pd.Series(pd.Categorical(list('baabc'))).unique() - pd.unique(pd.Series(pd.Categorical(list('baabc'))).unique()) + pd.Series(list('baabc'), dtype='category').unique() + pd.unique(pd.Series(list('baabc'), dtype='category')) .. _whatsnew_0200.api_breaking.s3: @@ -804,8 +787,6 @@ Now the smallest acceptable dtype will be used (:issue:`13247`) df1 = pd.DataFrame(np.array([1.0], dtype=np.float32, ndmin=2)) df1.dtypes -.. ipython:: python - df2 = pd.DataFrame(np.array([np.nan], dtype=np.float32, ndmin=2)) df2.dtypes @@ -813,7 +794,7 @@ Previous Behavior: .. code-block:: ipython - In [7]: pd.concat([df1,df2]).dtypes + In [7]: pd.concat([df1, df2]).dtypes Out[7]: 0 float64 dtype: object @@ -822,7 +803,7 @@ New Behavior: .. ipython:: python - pd.concat([df1,df2]).dtypes + pd.concat([df1, df2]).dtypes .. _whatsnew_0200.api_breaking.gbq: @@ -1012,7 +993,7 @@ See the section on :ref:`Windowed Binary Operations ` for periods=100, freq='D', name='foo')) df.tail() -Old Behavior: +Previous Behavior: .. 
code-block:: ipython @@ -1228,12 +1209,12 @@ If indicated, a deprecation warning will be issued if you reference these modules "pandas.algos", "pandas._libs.algos", "" "pandas.hashtable", "pandas._libs.hashtable", "" "pandas.indexes", "pandas.core.indexes", "" - "pandas.json", "pandas._libs.json", "X" + "pandas.json", "pandas._libs.json / pandas.io.json", "X" "pandas.parser", "pandas._libs.parsers", "X" "pandas.formats", "pandas.io.formats", "" "pandas.sparse", "pandas.core.sparse", "" - "pandas.tools", "pandas.core.reshape", "" - "pandas.types", "pandas.core.dtypes", "" + "pandas.tools", "pandas.core.reshape", "X" + "pandas.types", "pandas.core.dtypes", "X" "pandas.io.sas.saslib", "pandas.io.sas._sas", "" "pandas._join", "pandas._libs.join", "" "pandas._hash", "pandas._libs.hashing", "" @@ -1249,11 +1230,12 @@ exposed in the top-level namespace: ``pandas.errors``, ``pandas.plotting`` and certain functions in the ``pandas.io`` and ``pandas.tseries`` submodules, these are now the public subpackages. +Further changes: - The function :func:`~pandas.api.types.union_categoricals` is now importable from ``pandas.api.types``, formerly from ``pandas.types.concat`` (:issue:`15998`) - The type import ``pandas.tslib.NaTType`` is deprecated and can be replaced by using ``type(pandas.NaT)`` (:issue:`16146`) - The public functions in ``pandas.tools.hashing`` are deprecated from that location, but are now importable from ``pandas.util`` (:issue:`16223`) -- The modules in ``pandas.util``: ``decorators``, ``print_versions``, ``doctools``, `validators``, ``depr_module`` are now private (:issue:`16223`) +- The modules in ``pandas.util``: ``decorators``, ``print_versions``, ``doctools``, ``validators``, ``depr_module`` are now private. Only the functions exposed in ``pandas.util`` itself are public (:issue:`16223`) .. 
_whatsnew_0200.privacy.errors: @@ -1320,7 +1302,7 @@ Deprecations Deprecate ``.ix`` ^^^^^^^^^^^^^^^^^ -The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation are :ref:`here `. (:issue:`14218`) +The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here `. (:issue:`14218`) The recommended methods of indexing are: @@ -1368,7 +1350,7 @@ Deprecate Panel ``Panel`` is deprecated and will be removed in a future version. The recommended way to represent 3-D data is with a ``MultiIndex`` on a ``DataFrame`` via the :meth:`~Panel.to_frame` method or with the `xarray package `__. Pandas -provides a :meth:`~Panel.to_xarray` method to automate this conversion. See the documentation :ref:`Deprecate Panel `. (:issue:`13563`). +provides a :meth:`~Panel.to_xarray` method to automate this conversion. For more details, see the :ref:`Deprecate Panel ` documentation. (:issue:`13563`). .. ipython:: python :okwarning: @@ -1416,7 +1398,7 @@ This is an illustrative example: Here is a typical useful syntax for computing different aggregations for different columns. This is a natural and useful syntax. We aggregate from the dict-to-list by taking the specified -columns and applying the list of functions. 
This returns a ``MultiIndex`` for the columns (this is *not* deprecated). .. ipython:: python diff --git a/pandas/core/indexes/interval.py b/pandas/core/indexes/interval.py index 8363cead01e56..b1523cd6c0d0c 100644 --- a/pandas/core/indexes/interval.py +++ b/pandas/core/indexes/interval.py @@ -99,6 +99,9 @@ class IntervalIndex(IntervalMixin, Index): .. versionadded:: 0.20.0 + Warning: the indexing behaviors are provisional and may change in + a future version of pandas. + Attributes ---------- left, right : array-like (1-dimensional)
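The provisional ``IntervalIndex`` indexing flagged in the docstring warning above can be sketched as follows. This is a minimal illustration against the 0.20.0-era API (assuming the ``IntervalIndex.from_breaks`` constructor and interval-aware ``.loc`` lookups described in these notes); as the warning says, the behavior may change in a future version of pandas.

```python
import pandas as pd

# A Series indexed by intervals (0, 1], (1, 2], (2, 3], (3, 4]
s = pd.Series(range(4), index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]))

# A scalar label selects the interval that contains it
print(s.loc[1.5])                # the value stored for the interval (1, 2]

# An Interval object selects by exact match
print(s.loc[pd.Interval(1, 2)])  # same row, selected by its exact interval
```

This containment-based scalar lookup is the "unique indexing" behavior the notes refer to: ``1.5`` is not itself a label in the index, yet ``.loc`` resolves it to the interval that covers it.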