doc/source/whatsnew/v0.20.0.txt (+38 -41)
@@ -1,7 +1,7 @@
 .. _whatsnew_0200:

-v0.20.0 (May 12, 2017)
--------------------------
+v0.20.0 (May 4, 2017)
+---------------------

 This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features,
 enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
@@ -17,8 +17,8 @@ Highlights include:
 - Improved user API when accessing levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
 - Improved support for ``UInt64`` dtypes, see :ref:`here <whatsnew_0200.enhancements.uint64_support>`
 - A new orient for JSON serialization, ``orient='table'``, that uses the :ref:`Table Schema spec <whatsnew_0200.enhancements.table_schema>`
-- Experimental support for exporting ``DataFrame.style`` formats to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
-- Window Binary Corr/Cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
+- Experimental support for exporting ``DataFrame.style`` formats to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
+- Window binary corr/cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
 - Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
 - Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here <whatsnew_0200.api_breaking.gbq>`
 - Switched the test framework to use `pytest <http://doc.pytest.org/en/latest>`__ (:issue:`13097`)
@@ -44,10 +44,10 @@ New features
 ``agg`` API
 ^^^^^^^^^^^

-Series & DataFrame have been enhanced to support the aggregation API. This is an already familiar API that
-is supported for groupby, window operations, and resampling. This allows one to express aggregation operations
-in a single concise way by using :meth:`~DataFrame.agg`,
-and :meth:`~DataFrame.transform`. The full documentation is :ref:`here <basics.aggregate>` (:issue:`1623`).
+Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API
+from groupby, window operations, and resampling. This allows aggregation operations in a concise
+way by using :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`. The full documentation
+is :ref:`here <basics.aggregate>` (:issue:`1623`).

 Here is a sample
@@ -66,28 +66,28 @@ Using a single function is equivalent to ``.apply``.

     df.agg('sum')

-Multiple functions in lists.
+Multiple aggregations with a list of functions.

 .. ipython:: python

     df.agg(['sum', 'min'])

-Using a dict provides the ability to have selective aggregation per column.
-You will get a matrix-like output of all of the aggregators. The output will consist
-of all unique functions. Those that are not noted for a particular column will be ``NaN``:
+Using a dict provides the ability to apply specific aggregations per column.
+You will get a matrix-like output of all of the aggregators. The output has one row
+per unique function. Those functions that are not applied to a particular column will be ``NaN``:

-The API also supports a ``.transform()`` function to provide for broadcasting results.
+The API also supports a ``.transform()`` function for broadcasting results.

 .. ipython:: python
     :okwarning:

     df.transform(['abs', lambda x: x - x.min()])

-When presented with mixed dtypes that cannot aggregate, ``.agg()`` will only take the valid
+When presented with mixed dtypes that cannot be aggregated, ``.agg()`` will only take the valid
 aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`)

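As an illustrative sketch of the list and dict forms described above (the column names and function choices here are invented, not taken from the release notes):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# A list of functions yields one row per aggregator.
by_list = df.agg(["sum", "min"])

# A dict applies specific aggregations per column; function/column
# combinations that were not requested show up as NaN.
by_dict = df.agg({"A": ["sum"], "B": ["min"]})
```

The dict form produces the matrix-like output described in the text: the index is the union of all requested functions, and cells for unrequested combinations are `NaN`.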
 .. ipython:: python
@@ -107,7 +107,7 @@ aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`)
 ``dtype`` keyword for data IO
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns is now supported with the ``'python'`` engine (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.
+The ``'python'`` engine for :func:`read_csv` now accepts the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.

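A minimal sketch of the behavior described above (the data is invented and inlined via ``io.StringIO`` for illustration):

```python
import io

import pandas as pd

data = "a,b\n1,2\n3,4"

# The python engine now honors dtype; previously only the C engine did.
df = pd.read_csv(io.StringIO(data), engine="python",
                 dtype={"a": "float64", "b": object})
```

Column ``a`` is parsed as float, while ``b`` stays as unconverted strings because of ``dtype=object``.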
 .. ipython:: python
    :suppress:
@@ -156,7 +156,7 @@ Commonly called 'unix epoch' or POSIX time. This was the previous default, so th
 Groupby Enhancements
 ^^^^^^^^^^^^^^^^^^^^

-Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)
+Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names.

 .. ipython:: python

@@ -172,6 +172,9 @@ Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now refere

     df.groupby(['second', 'A']).sum()

+Previously, only column names could be referenced. (:issue:`5677`)
+
+
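A self-contained sketch of the mixed grouping described above (the level names and data are invented for illustration):

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "foo", "foo"],
          ["one", "two", "one", "two"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 2, 2], "B": np.arange(4)}, index=index)

# 'second' is an index level name, 'A' is a column name; both may now
# be passed together in `by`.
result = df.groupby(["second", "A"]).sum()
```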
 .. _whatsnew_0200.enhancements.compressed_urls:

 Better support for compressed URLs in ``read_csv``
@@ -181,8 +184,8 @@ The compression code was refactored (:issue:`12688`). As a result, reading
 dataframes from URLs in :func:`read_csv` or :func:`read_table` now supports
 additional compression methods: ``xz``, ``bz2``, and ``zip`` (:issue:`14570`).
 Previously, only ``gzip`` compression was supported. By default, compression of
-URLs and paths are now both inferred using their file extensions. Additionally,
-support for bz2 compression in the python 2 c-engine improved (:issue:`14874`).
+URLs and paths are now inferred using their file extensions. Additionally,
+support for bz2 compression in the python 2 C-engine improved (:issue:`14874`).
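Since the text says extension-based inference applies to paths as well as URLs, a local-file sketch (with invented data, so it runs without network access) illustrates the same mechanism:

```python
import gzip
import os
import tempfile

import pandas as pd

# Compression is inferred from the file extension for both paths and URLs.
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
with gzip.open(path, "wt") as f:
    f.write("a,b\n1,2\n3,4\n")

df = pd.read_csv(path)  # gzip inferred from the ".gz" suffix
```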

 .. ipython:: python
@@ -203,7 +206,7 @@ Pickle file I/O now supports compression
 :func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle`
 can now read from and write to compressed pickle files. Compression methods
 can be an explicit parameter or be inferred from the file extension.
-See :ref:`the docs here <io.pickle.compression>`
+See :ref:`the docs here <io.pickle.compression>`.
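A small round-trip sketch of the inferred-compression behavior (file name invented; ``compression="xz"`` could equivalently be passed explicitly):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# The ".xz" extension triggers LZMA compression by inference.
path = os.path.join(tempfile.mkdtemp(), "frame.pkl.xz")
df.to_pickle(path)

roundtrip = pd.read_pickle(path)
```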

 .. ipython:: python
@@ -432,7 +435,7 @@ New behavior:
     c
     c.categories

-Furthermore, this allows one to bin *other* data with these same bins, with ``NaN`` represents a missing
+Furthermore, this allows one to bin *other* data with these same bins, with ``NaN`` representing a missing
 value similar to other dtypes.
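A sketch of binning *other* data with the same bins, in the spirit of the text (the input values are invented; out-of-range values come back as ``NaN``):

```python
import pandas as pd

c = pd.cut([1, 2, 3, 4], bins=2)

# Re-use the interval bins computed above on new data; values that fall
# outside all intervals are marked NaN, like other missing values.
other = pd.cut([0.5, 2.5, 10], bins=c.categories)
```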

 .. ipython:: python
@@ -465,19 +468,17 @@ Selecting via a scalar value that is contained *in* the intervals.
 Other Enhancements
 ^^^^^^^^^^^^^^^^^^

-- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling windowendpoint closedness. See the :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
+- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window-endpoint closedness. See the :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
 - Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
 - ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)
 - ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`)
 - ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)
 - ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
 - ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
 - ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).
-
 - ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
 - Multiple offset aliases with decimal points are now supported (e.g. ``0.5min`` is parsed as ``30s``) (:issue:`8419`)
 - ``.isnull()`` and ``.notnull()`` have been added to ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`)
-
 - New ``UnsortedIndexError`` (subclass of ``KeyError``) raised when indexing/slicing into an
   unsorted MultiIndex (:issue:`11897`). This allows differentiation between errors due to lack
   of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`
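The two ``Series.str.replace()`` bullets above can be sketched together (data invented; note that on current pandas ``regex=True`` must be passed explicitly, since the default changed in a later release):

```python
import re

import pandas as pd

s = pd.Series(["foo 123", "bar 45"])

# A compiled regular expression may now be used as the pattern.
pat = re.compile(r"\d+")
masked = s.str.replace(pat, "N", regex=True)

# The replacement may be a callable, which is forwarded to re.sub.
swapped = s.str.replace(r"(\w+) (\d+)",
                        lambda m: m.group(2) + " " + m.group(1),
                        regex=True)
```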
@@ -497,20 +498,19 @@ Other Enhancements
 - ``Timedelta.isoformat`` method added for formatting Timedeltas as an `ISO 8601 duration`_. See the :ref:`Timedelta docs <timedeltas.isoformat>` (:issue:`15136`)
 - ``.select_dtypes()`` now allows the string ``datetimetz`` to generically select datetimes with tz (:issue:`14910`)
 - The ``.to_latex()`` method will now accept ``multicolumn`` and ``multirow`` arguments to use the accompanying LaTeX enhancements
-
 - ``pd.merge_asof()`` gained the option ``direction='backward'|'forward'|'nearest'`` (:issue:`14887`)
 - ``Series/DataFrame.asfreq()`` have gained a ``fill_value`` parameter, to fill missing values (:issue:`3715`).
 - ``Series/DataFrame.resample.asfreq`` have gained a ``fill_value`` parameter, to fill missing values during resampling (:issue:`3715`).
-- ``pandas.util.hashing`` has gained a ``hash_tuples`` routine, and ``hash_pandas_object`` has gained the ability to hash a ``MultiIndex`` (:issue:`15224`)
+- ``pandas.util`` has gained a ``hash_tuples`` routine, and ``pandas.util.hash_pandas_object`` has gained the ability to hash a ``MultiIndex`` (:issue:`15224`)
 - ``Series/DataFrame.squeeze()`` have gained the ``axis`` parameter. (:issue:`15339`)
 - ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`)
-- ``pd.read_html()`` will parse multiple header rows, creating a multiindex header. (:issue:`13434`).
+- ``pd.read_html()`` will parse multiple header rows, creating a MultiIndex header. (:issue:`13434`).
 - HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)
-- ``pd.io.api.Styler`` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
-- ``pd.io.api.Styler.render`` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
-- Compatability with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:`15379`)
-- ``TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`)
+- :class:`pandas.io.formats.style.Styler` template now has blocks for easier extension, :ref:`see the example notebook <style.ipynb#Subclassing>` (:issue:`15649`)
+- :meth:`pandas.io.formats.style.Styler.render` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
+- Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:`15379`)
+- ``TimedeltaIndex`` now has a custom date-tick formatter specifically designed for nanosecond level precision (:issue:`8711`)
 - ``pd.api.types.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information.
 - ``DataFrame.to_latex()`` and ``DataFrame.to_string()`` now allow optional header aliases. (:issue:`15536`)
 - Re-enable the ``parse_dates`` keyword of ``pd.read_excel()`` to parse string columns as dates (:issue:`14326`)
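The ``pd.merge_asof()`` ``direction`` option listed above can be sketched with invented data (``direction='backward'`` remains the default):

```python
import pandas as pd

left = pd.DataFrame({"t": [1, 5, 10], "v": ["a", "b", "c"]})
right = pd.DataFrame({"t": [2, 3, 7], "x": [1.0, 2.0, 3.0]})

# direction="forward" matches each left row with the first right row
# whose key is greater than or equal to the left key.
fwd = pd.merge_asof(left, right, on="t", direction="forward")
```

Left keys with no forward match (here ``t=10``) get ``NaN``.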
@@ -524,9 +524,8 @@ Other Enhancements
 - ``pd.read_csv()`` now supports the ``error_bad_lines`` and ``warn_bad_lines`` arguments for the Python parser (:issue:`15925`)
 - The ``display.show_dimensions`` option can now also be used to specify
   whether the length of a ``Series`` should be shown in its repr (:issue:`7117`).
-- ``parallel_coordinates()`` has gained a ``sort_labels`` keyword arg that sorts class labels and the colours assigned to them (:issue:`15908`)
+- ``parallel_coordinates()`` has gained a ``sort_labels`` keyword argument that sorts class labels and the colors assigned to them (:issue:`15908`)
 - Options added to allow one to turn on/off using ``bottleneck`` and ``numexpr``, see :ref:`here <basics.accelerate>` (:issue:`16157`)
-
 - ``DataFrame.style.bar()`` now accepts two more options to further customize the bar chart. Bar alignment is set with ``align='left'|'mid'|'zero'``, the default is "left", which is backward compatible; you can now pass a list of ``color=[color_negative, color_positive]``. (:issue:`14757`)
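The ``display.show_dimensions`` bullet above can be sketched as follows (a small illustrative Series; setting the option to ``True`` forces the dimensions line even for short output):

```python
import pandas as pd

pd.set_option("display.show_dimensions", True)
s = pd.Series([1, 2, 3])
text = repr(s)  # the repr now includes a "Length: 3" line
pd.reset_option("display.show_dimensions")
```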
@@ -653,7 +652,7 @@ Accessing datetime fields of Index now return Index
 The datetime-related attributes (see :ref:`here <timeseries.components>`
 for an overview) of ``DatetimeIndex``, ``PeriodIndex`` and ``TimedeltaIndex`` previously
 returned numpy arrays. They will now return a new ``Index`` object, except
-in the case of a boolean field, where the result will stil be a boolean ndarray. (:issue:`15022`)
+in the case of a boolean field, where the result will still be a boolean ndarray. (:issue:`15022`)
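A quick sketch of the distinction described above, using an invented date range:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-01-01", periods=3, freq="D")

month = idx.month          # now an Index, not a raw ndarray
leap = idx.is_leap_year    # boolean fields still return an ndarray
```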

 Previous behaviour:
@@ -682,7 +681,7 @@ pd.unique will now be consistent with extension types

 :meth:`Index.intersection` now preserves the order of the calling ``Index`` (left)
-instead of the other ``Index`` (right) (:issue:`15582`). This affects the inner
-joins, :meth:`DataFrame.join` and :func:`merge`, and the ``.align`` methods.
+instead of the other ``Index`` (right) (:issue:`15582`). This affects inner
+joins, :meth:`DataFrame.join` and :func:`merge`, and the ``.align`` method.

 - ``Index.intersection``
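The order-preservation rule above can be sketched with invented indexes (the result follows the calling, left-hand ``Index``, not sorted order):

```python
import pandas as pd

left = pd.Index([3, 1, 2])
right = pd.Index([2, 3])

# The result keeps the order of the calling (left) Index: [3, 2].
both = left.intersection(right)
```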
@@ -1141,7 +1140,7 @@ Pivot Table always returns a DataFrame
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The documentation for :meth:`pivot_table` states that a ``DataFrame`` is *always* returned. Here a bug
-is fixed that allowed this to return a ``Series`` under a narrow circumstance. (:issue:`4386`)
+is fixed that allowed this to return a ``Series`` under certain circumstances. (:issue:`4386`)
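A minimal sketch of the guarantee described above, with invented data (the default ``aggfunc`` is the mean):

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "v": [1.0, 2.0, 3.0]})

# Even in the narrow single-values-column case, a DataFrame is returned,
# never a Series.
table = pd.pivot_table(df, values="v", index="a")
```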

 .. ipython:: python
@@ -1199,7 +1198,6 @@ Other API Changes
 - ``NaT`` will now return ``NaT`` for ``tz_localize`` and ``tz_convert``
   methods (:issue:`15830`)
 - ``DataFrame`` and ``Panel`` constructors with invalid input will now raise ``ValueError`` rather than ``PandasError``, if called with scalar inputs and not axes (:issue:`15541`)
-
 - ``DataFrame`` and ``Panel`` constructors with invalid input will now raise ``ValueError`` rather than ``pandas.core.common.PandasError``, if called with scalar inputs and not axes; the exception ``PandasError`` is removed as well. (:issue:`15541`)
 - The exception ``pandas.core.common.AmbiguousIndexError`` is removed as it is not referenced (:issue:`15541`)
@@ -1324,7 +1322,6 @@ Deprecate ``.ix``

 The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here <indexing>`. (:issue:`14218`)

-
 The recommended methods of indexing are:

 - ``.loc`` if you want to *label* index
@@ -1720,7 +1717,7 @@ Reshaping
 - Bug in ``DataFrame.pivot_table()`` where ``dropna=True`` would not drop all-NaN columns when the columns was a ``category`` dtype (:issue:`15193`)
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`)
-- Bug in ``pd.concat()`` in which concatting with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)
+- Bug in ``pd.concat()`` in which concatenating with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)
 - Bug with ``sort=True`` in ``DataFrame.join`` and ``pd.merge`` when joining on indexes (:issue:`15582`)
 - Bug in ``DataFrame.nsmallest`` and ``DataFrame.nlargest`` where identical values resulted in duplicated rows (:issue:`15297`)