Commit 1b8c86d

jorisvandenbossche authored and pcluo committed
DOC: some reviewing of the 0.20 whatsnew file (pandas-dev#16254)
1 parent 79562b1 commit 1b8c86d

2 files changed: +51 -66 lines changed
doc/source/whatsnew/v0.20.0.txt

+48-66
@@ -14,14 +14,13 @@ Highlights include:
 - The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_ix>`
 - ``Panel`` has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_panel>`
 - Addition of an ``IntervalIndex`` and ``Interval`` scalar type, see :ref:`here <whatsnew_0200.enhancements.intervalindex>`
-- Improved user API when accessing levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
+- Improved user API when grouping by index levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
 - Improved support for ``UInt64`` dtypes, see :ref:`here <whatsnew_0200.enhancements.uint64_support>`
-- A new orient for JSON serialization, ``orient='table'``, that uses the :ref:`Table Schema spec <whatsnew_0200.enhancements.table_schema>`
-- Experimental support for exporting ``DataFrame.style`` formats to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
+- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see :ref:`here <whatsnew_0200.enhancements.table_schema>`
+- Experimental support for exporting styled DataFrames (``DataFrame.style``) to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
 - Window binary corr/cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
 - Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
 - Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here <whatsnew_0200.api_breaking.gbq>`
-- Switched the test framework to use `pytest <http://doc.pytest.org/en/latest>`__ (:issue:`13097`)

 .. warning::

@@ -46,12 +45,12 @@ New features

 .. _whatsnew_0200.enhancements.agg:

-``agg`` API
-^^^^^^^^^^^
+``agg`` API for DataFrame/Series
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API
-from groupby, window operations, and resampling. This allows aggregation operations in a concise
-by using :meth:`~DataFrame.agg`, and :meth:`~DataFrame.transform`. The full documentation
+from groupby, window operations, and resampling. This allows aggregation operations in a concise way
+by using :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`. The full documentation
 is :ref:`here <basics.aggregate>` (:issue:`1623`).

 Here is a sample
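The aggregation API this hunk renames can be sketched as follows; the frame and the function choices here are my own illustration, not the elided sample from the whatsnew file:

```python
import pandas as pd

# Illustrative frame (not the elided sample from the whatsnew file)
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# A single function, or a list of functions, mirroring the groupby/resample API
out_single = df.agg("sum")         # Series of per-column sums
out_list = df.agg(["sum", "min"])  # DataFrame indexed by function name
print(out_single)
print(out_list)
```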
@@ -112,22 +111,14 @@ aggregations. This is similiar to how groupby ``.agg()`` works. (:issue:`15015`)
 ``dtype`` keyword for data IO
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The ``'python'`` engine for :func:`read_csv` now accepts the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.
+The ``'python'`` engine for :func:`read_csv`, as well as the :func:`read_fwf` function for parsing
+fixed-width text files and :func:`read_excel` for parsing Excel files, now accept the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.

 .. ipython:: python
    :suppress:

    from pandas.compat import StringIO

-.. ipython:: python
-
-   data = "a,b\n1,2\n3,4"
-   pd.read_csv(StringIO(data), engine='python').dtypes
-   pd.read_csv(StringIO(data), engine='python', dtype={'a':'float64', 'b':'object'}).dtypes
-
-The ``dtype`` keyword argument is also now supported in the :func:`read_fwf` function for parsing
-fixed-width text files, and :func:`read_excel` for parsing Excel files.
-
 .. ipython:: python

    data = "a b\n1 2\n3 4"
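The removed ``read_csv`` snippet above still illustrates the feature; here it is as a runnable sketch, using the stdlib ``io.StringIO`` in place of the 0.20-era ``pandas.compat.StringIO``:

```python
from io import StringIO

import pandas as pd

data = "a,b\n1,2\n3,4"

# Without dtype, the python engine infers integer columns
default_dtypes = pd.read_csv(StringIO(data), engine='python').dtypes

# With dtype, specific columns can be forced to a given type
forced_dtypes = pd.read_csv(StringIO(data), engine='python',
                            dtype={'a': 'float64', 'b': 'object'}).dtypes
print(forced_dtypes)
```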
@@ -140,16 +131,16 @@ fixed-width text files, and :func:`read_excel` for parsing Excel files.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 :func:`to_datetime` has gained a new parameter, ``origin``, to define a reference date
-from where to compute the resulting ``DatetimeIndex`` when ``unit`` is specified. (:issue:`11276`, :issue:`11745`)
+from where to compute the resulting timestamps when parsing numerical values with a specific ``unit`` specified. (:issue:`11276`, :issue:`11745`)

-Start with 1960-01-01 as the starting date
+For example, with 1960-01-01 as the starting date:

 .. ipython:: python

    pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))

-The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``.
-Commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.
+The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``, which is
+commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.

 .. ipython:: python

@@ -161,7 +152,7 @@ Commonly called 'unix epoch' or POSIX time. This was the previous default, so th
 Groupby Enhancements
 ^^^^^^^^^^^^^^^^^^^^

-Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names.
+Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names. Previously, only column names could be referenced. This allows to easily group by a column and index level at the same time. (:issue:`5677`)

 .. ipython:: python

@@ -177,8 +168,6 @@ Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now refere

    df.groupby(['second', 'A']).sum()

-Previously, only column names could be referenced. (:issue:`5677`)
-

 .. _whatsnew_0200.enhancements.compressed_urls:

@@ -208,7 +197,7 @@ support for bz2 compression in the python 2 C-engine improved (:issue:`14874`).
 Pickle file I/O now supports compression
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-:func:`read_pickle`, :meth:`DataFame.to_pickle` and :meth:`Series.to_pickle`
+:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle`
 can now read from and write to compressed pickle files. Compression methods
 can be an explicit parameter or be inferred from the file extension.
 See :ref:`the docs here. <io.pickle.compression>`
@@ -226,33 +215,24 @@ Using an explicit compression type

    df.to_pickle("data.pkl.compress", compression="gzip")
    rt = pd.read_pickle("data.pkl.compress", compression="gzip")
-   rt
-
-Inferring compression type from the extension
-
-.. ipython:: python
+   rt.head()

-   df.to_pickle("data.pkl.xz", compression="infer")
-   rt = pd.read_pickle("data.pkl.xz", compression="infer")
-   rt
-
-The default is to ``infer``:
+The default is to infer the compression type from the extension (``compression='infer'``):

 .. ipython:: python

    df.to_pickle("data.pkl.gz")
    rt = pd.read_pickle("data.pkl.gz")
-   rt
+   rt.head()
    df["A"].to_pickle("s1.pkl.bz2")
    rt = pd.read_pickle("s1.pkl.bz2")
-   rt
+   rt.head()

 .. ipython:: python
    :suppress:

    import os
    os.remove("data.pkl.compress")
-   os.remove("data.pkl.xz")
    os.remove("data.pkl.gz")
    os.remove("s1.pkl.bz2")

@@ -298,15 +278,15 @@ In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueErr
                            ordered=True)})
    df

-Previous Behavior:
+**Previous Behavior**:

 .. code-block:: ipython

    In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
    ---------------------------------------------------------------------------
    ValueError: items in new_categories are not the same as in old categories

-New Behavior:
+**New Behavior**:

 .. ipython:: python

@@ -332,7 +312,7 @@ the data.
    df.to_json(orient='table')


-See :ref:`IO: Table Schema for more<io.table_schema>`.
+See :ref:`IO: Table Schema for more information <io.table_schema>`.

 Additionally, the repr for ``DataFrame`` and ``Series`` can now publish
 this JSON Table schema representation of the Series or DataFrame if you are
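A minimal sketch of the ``orient='table'`` output shape, with a frame constructed for illustration:

```python
import json
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

# orient='table' emits a Table Schema description alongside the records
payload = json.loads(df.to_json(orient='table'))
print(sorted(payload.keys()))
```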
@@ -415,6 +395,11 @@ pandas has gained an ``IntervalIndex`` with its own dtype, ``interval`` as well
 notation, specifically as a return type for the categories in :func:`cut` and :func:`qcut`. The ``IntervalIndex`` allows some unique indexing, see the
 :ref:`docs <indexing.intervallindex>`. (:issue:`7640`, :issue:`8625`)

+.. warning::
+
+   These indexing behaviors of the IntervalIndex are provisional and may change in a future version of pandas. Feedback on usage is welcome.
+
+
 Previous behavior:

 The returned categories were strings, representing Intervals
@@ -477,9 +462,8 @@ Other Enhancements
 - ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)
 - ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`)
 - ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)
-- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
+- ``DataFrame`` and ``DataFrame.groupby()`` have gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`, :issue:`15197`).
 - ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
-- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).
 - ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
 - Multiple offset aliases with decimal points are now supported (e.g. ``0.5min`` is parsed as ``30s``) (:issue:`8419`)
 - ``.isnull()`` and ``.notnull()`` have been added to ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`)
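The consolidated ``nunique()`` entry above can be sketched as follows, with data invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': ['x', 'y', 'y']})

per_column = df.nunique()                    # distinct values per column
per_group = df.groupby('A')['B'].nunique()   # distinct values within each group
print(per_column)
print(per_group)
```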
@@ -510,9 +494,8 @@ Other Enhancements
 - ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`)
 - ``pd.read_html()`` will parse multiple header rows, creating a MutliIndex header. (:issue:`13434`).
 - HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)
-- :class:`pandas.io.formats.style.Styler`` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
-- :meth:`pandas.io.formats.style.Styler.render` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
-- ``pd.io.api.Styler.render`` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
+- :class:`pandas.io.formats.style.Styler` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
+- :meth:`Styler.render() <pandas.io.formats.style.Styler.render>` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
 - Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:`15379`)
 - ``TimedeltaIndex`` now has a custom date-tick formatter specifically designed for nanosecond level precision (:issue:`8711`)
 - ``pd.api.types.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information.
@@ -523,7 +506,7 @@ Other Enhancements
 - ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
 - ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
 - ``pandas.io.json.json_normalize()`` has gained a ``sep`` option that accepts ``str`` to separate joined fields; the default is ".", which is backward compatible. (:issue:`14883`)
-- :meth:`~MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels <advanced.shown_levels>`. (:issue:`15694`)
+- :meth:`MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels <advanced.shown_levels>`. (:issue:`15694`)
 - ``pd.read_csv()`` will now raise a ``ParserError`` error whenever any parsing error occurs (:issue:`15913`, :issue:`15925`)
 - ``pd.read_csv()`` now supports the ``error_bad_lines`` and ``warn_bad_lines`` arguments for the Python parser (:issue:`15925`)
 - The ``display.show_dimensions`` option can now also be used to specify
@@ -546,7 +529,7 @@ Backwards incompatible API changes
 Possible incompatibility for HDF5 formats created with pandas < 0.13.0
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-``pd.TimeSeries`` was deprecated officially in 0.17.0, though has only been an alias since 0.13.0. It has
+``pd.TimeSeries`` was deprecated officially in 0.17.0, though has already been an alias since 0.13.0. It has
 been dropped in favor of ``pd.Series``. (:issue:`15098`).

 This *may* cause HDF5 files that were created in prior versions to become unreadable if ``pd.TimeSeries``
@@ -684,7 +667,7 @@ ndarray, you can always convert explicitly using ``np.asarray(idx.hour)``.
 pd.unique will now be consistent with extension types
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-In prior versions, using ``Series.unique()`` and :func:`unique` on ``Categorical`` and tz-aware
+In prior versions, using :meth:`Series.unique` and :func:`pandas.unique` on ``Categorical`` and tz-aware
 data-types would yield different return types. These are now made consistent. (:issue:`15903`)

 - Datetime tz-aware
@@ -733,21 +716,21 @@ data-types would yield different return types. These are now made consistent. (:

 .. code-block:: ipython

-   In [1]: pd.Series(pd.Categorical(list('baabc'))).unique()
+   In [1]: pd.Series(list('baabc'), dtype='category').unique()
    Out[1]:
    [b, a, c]
    Categories (3, object): [b, a, c]

-   In [2]: pd.unique(pd.Series(pd.Categorical(list('baabc'))))
+   In [2]: pd.unique(pd.Series(list('baabc'), dtype='category'))
    Out[2]: array(['b', 'a', 'c'], dtype=object)

 New Behavior:

 .. ipython:: python

    # returns a Categorical
-   pd.Series(pd.Categorical(list('baabc'))).unique()
-   pd.unique(pd.Series(pd.Categorical(list('baabc'))).unique())
+   pd.Series(list('baabc'), dtype='category').unique()
+   pd.unique(pd.Series(list('baabc'), dtype='category'))

 .. _whatsnew_0200.api_breaking.s3:

@@ -808,16 +791,14 @@ Now the smallest acceptable dtype will be used (:issue:`13247`)
    df1 = pd.DataFrame(np.array([1.0], dtype=np.float32, ndmin=2))
    df1.dtypes

-.. ipython:: python
-
    df2 = pd.DataFrame(np.array([np.nan], dtype=np.float32, ndmin=2))
    df2.dtypes

 Previous Behavior:

 .. code-block:: ipython

-   In [7]: pd.concat([df1,df2]).dtypes
+   In [7]: pd.concat([df1, df2]).dtypes
    Out[7]:
    0    float64
    dtype: object
@@ -826,7 +807,7 @@ New Behavior:

 .. ipython:: python

-   pd.concat([df1,df2]).dtypes
+   pd.concat([df1, df2]).dtypes

 .. _whatsnew_0200.api_breaking.gbq:

@@ -1016,7 +997,7 @@ See the section on :ref:`Windowed Binary Operations <stats.moments.binary>` for
                          periods=100, freq='D', name='foo'))
    df.tail()

-Old Behavior:
+Previous Behavior:

 .. code-block:: ipython

@@ -1232,12 +1213,12 @@ If indicated, a deprecation warning will be issued if you reference theses modul
    "pandas.algos", "pandas._libs.algos", ""
    "pandas.hashtable", "pandas._libs.hashtable", ""
    "pandas.indexes", "pandas.core.indexes", ""
-   "pandas.json", "pandas._libs.json", "X"
+   "pandas.json", "pandas._libs.json / pandas.io.json", "X"
    "pandas.parser", "pandas._libs.parsers", "X"
    "pandas.formats", "pandas.io.formats", ""
    "pandas.sparse", "pandas.core.sparse", ""
-   "pandas.tools", "pandas.core.reshape", ""
-   "pandas.types", "pandas.core.dtypes", ""
+   "pandas.tools", "pandas.core.reshape", "X"
+   "pandas.types", "pandas.core.dtypes", "X"
    "pandas.io.sas.saslib", "pandas.io.sas._sas", ""
    "pandas._join", "pandas._libs.join", ""
    "pandas._hash", "pandas._libs.hashing", ""
@@ -1253,11 +1234,12 @@ exposed in the top-level namespace: ``pandas.errors``, ``pandas.plotting`` and
 certain functions in the ``pandas.io`` and ``pandas.tseries`` submodules,
 these are now the public subpackages.

+Further changes:

 - The function :func:`~pandas.api.types.union_categoricals` is now importable from ``pandas.api.types``, formerly from ``pandas.types.concat`` (:issue:`15998`)
 - The type import ``pandas.tslib.NaTType`` is deprecated and can be replaced by using ``type(pandas.NaT)`` (:issue:`16146`)
 - The public functions in ``pandas.tools.hashing`` deprecated from that locations, but are now importable from ``pandas.util`` (:issue:`16223`)
-- The modules in ``pandas.util``: ``decorators``, ``print_versions``, ``doctools``, `validators``, ``depr_module`` are now private (:issue:`16223`)
+- The modules in ``pandas.util``: ``decorators``, ``print_versions``, ``doctools``, ``validators``, ``depr_module`` are now private. Only the functions exposed in ``pandas.util`` itself are public (:issue:`16223`)

 .. _whatsnew_0200.privacy.errors:

@@ -1324,7 +1306,7 @@ Deprecations
 Deprecate ``.ix``
 ^^^^^^^^^^^^^^^^^

-The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation are :ref:`here <indexing>`. (:issue:`14218`)
+The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here <indexing>`. (:issue:`14218`)

 The recommended methods of indexing are:

@@ -1372,7 +1354,7 @@ Deprecate Panel

 ``Panel`` is deprecated and will be removed in a future version. The recommended way to represent 3-D data are
 with a ``MultiIndex`` on a ``DataFrame`` via the :meth:`~Panel.to_frame` or with the `xarray package <http://xarray.pydata.org/en/stable/>`__. Pandas
-provides a :meth:`~Panel.to_xarray` method to automate this conversion. See the documentation :ref:`Deprecate Panel <dsintro.deprecate_panel>`. (:issue:`13563`).
+provides a :meth:`~Panel.to_xarray` method to automate this conversion. For more details see :ref:`Deprecate Panel <dsintro.deprecate_panel>` documentation. (:issue:`13563`).

 .. ipython:: python
    :okwarning:
@@ -1420,7 +1402,7 @@ This is an illustrative example:

 Here is a typical useful syntax for computing different aggregations for different columns. This
 is a natural, and useful syntax. We aggregate from the dict-to-list by taking the specified
-columns and applying the list of functions. This returns a ``MultiIndex`` for the columns.
+columns and applying the list of functions. This returns a ``MultiIndex`` for the columns (this is *not* deprecated).

 .. ipython:: python

pandas/core/indexes/interval.py

+3
@@ -99,6 +99,9 @@ class IntervalIndex(IntervalMixin, Index):

     .. versionadded:: 0.20.0

+    Warning: the indexing behaviors are provisional and may change in
+    a future version of pandas.
+
     Attributes
     ----------
     left, right : array-like (1-dimensional)
