forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathv0.24.0.txt
560 lines (382 loc) · 19.4 KB
/
v0.24.0.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
.. _whatsnew_0240:
v0.24.0 (Month XX, 2018)
------------------------
.. _whatsnew_0240.enhancements:
New features
~~~~~~~~~~~~
- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)
.. _whatsnew_0240.enhancements.extension_array_operators:
``ExtensionArray`` operator support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison
operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``:
1. Define each of the operators on your ``ExtensionArray`` subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ``ExtensionArray``.
See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.
.. _whatsnew_0240.enhancements.intna:
Optional Integer NA Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pandas has gained the ability to hold integer dtypes with missing values. This long requested feature is enabled through the use of :ref:`extension types <extending.extension-types>`.
Here is an example of the usage.
We can construct a ``Series`` with the specified dtype. The dtype string ``Int64`` is a pandas ``ExtensionDtype``. Specifying a list or array using the traditional missing value
marker of ``np.nan`` will infer to integer dtype. The display of the ``Series`` will also use the ``NaN`` to indicate missing values in string outputs. (:issue:`20700`, :issue:`20747`)
.. ipython:: python
s = pd.Series([1, 2, np.nan], dtype='Int64')
s
Operations on these dtypes will propagate ``NaN`` as other pandas operations.
.. ipython:: python
# arithmetic
s + 1
# comparison
s == 1
# indexing
s.iloc[1:3]
# operate with other dtypes
s + s.iloc[1:3].astype('Int8')
# coerce when needed
s + 0.01
These dtypes can operate as part of of ``DataFrame``.
.. ipython:: python
df = pd.DataFrame({'A': s, 'B': [1, 1, 3], 'C': list('aab')})
df
df.dtypes
These dtypes can be merged & reshaped & casted.
.. ipython:: python
pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
df['A'].astype(float)
.. warning::
The Integer NA support currently uses the captilized dtype version, e.g. ``Int8`` as compared to the traditional ``int8``. This may be changed at a future date.
.. _whatsnew_0240.enhancements.read_html:
``read_html`` Enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^
:func:`read_html` previously ignored ``colspan`` and ``rowspan`` attributes.
Now it understands them, treating them as sequences of cells with the same
value. (:issue:`17054`)
.. ipython:: python
result = pd.read_html("""
<table>
<thead>
<tr>
<th>A</th><th>B</th><th>C</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="2">1</td><td>2</td>
</tr>
</tbody>
</table>""")
Previous Behavior:
.. code-block:: ipython
In [13]: result
Out [13]:
[ A B C
0 1 2 NaN]
Current Behavior:
.. ipython:: python
result
.. _whatsnew_0240.enhancements.interval:
Storing Interval Data in Series and DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Interval data may now be stored in a ``Series`` or ``DataFrame``, in addition to an
:class:`IntervalIndex` like previously (:issue:`19453`).
.. ipython:: python
ser = pd.Series(pd.interval_range(0, 5))
ser
ser.dtype
Previously, these would be cast to a NumPy array of ``Interval`` objects. In general,
this should result in better performance when storing an array of intervals in
a :class:`Series`.
Note that the ``.values`` of a ``Series`` containing intervals is no longer a NumPy
array, but rather an ``ExtensionArray``:
.. ipython:: python
ser.values
This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.
.. _whatsnew_0240.enhancements.other:
Other Enhancements
^^^^^^^^^^^^^^^^^^
- :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`)
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`)
- :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`)
- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with :class:`MultiIndex` (:issue:`21115`)
- Added support for reading from Google Cloud Storage via the ``gcsfs`` library (:issue:`19454`)
- :func:`to_gbq` and :func:`read_gbq` signature and documentation updated to
reflect changes from the `Pandas-GBQ library version 0.5.0
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-5-0>`__.
(:issue:`21627`)
- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of an HDF5 file (:issue:`10932`)
- :func:`read_html` copies cell data across ``colspan`` and ``rowspan``, and it treats all-``th`` table rows as headers if ``header`` kwarg is not given and there is no ``thead`` (:issue:`17054`)
- :meth:`Series.nlargest`, :meth:`Series.nsmallest`, :meth:`DataFrame.nlargest`, and :meth:`DataFrame.nsmallest` now accept the value ``"all"`` for the ``keep`` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`)
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
- :func:`~DataFrame.to_csv` and :func:`~DataFrame.to_json` now support ``compression='infer'`` to infer compression based on filename (:issue:`15008`)
-
.. _whatsnew_0240.api_breaking:
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _whatsnew_0240.api_breaking.interval_values:
``IntervalIndex.values`` is now an ``IntervalArray``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :attr:`~Interval.values` attribute of an :class:`IntervalIndex` now returns an
``IntervalArray``, rather than a NumPy array of :class:`Interval` objects (:issue:`19453`).
Previous Behavior:
.. code-block:: ipython
In [1]: idx = pd.interval_range(0, 4)
In [2]: idx.values
Out[2]:
array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
Interval(2, 3, closed='right'), Interval(3, 4, closed='right')],
dtype=object)
New Behavior:
.. ipython:: python
idx = pd.interval_range(0, 4)
idx.values
This mirrors ``CateogricalIndex.values``, which returns a ``Categorical``.
For situations where you need an ``ndarray`` of ``Interval`` objects, use
:meth:`numpy.asarray` or ``idx.astype(object)``.
.. ipython:: python
np.asarray(idx)
idx.values.astype(object)
.. _whatsnew_0240.api.datetimelike.normalize:
Tick DateOffset Normalize Restrictions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Creating a ``Tick`` object (:class:`Day`, :class:`Hour`, :class:`Minute`,
:class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano`) with
`normalize=True` is no longer supported. This prevents unexpected behavior
where addition could fail to be monotone or associative. (:issue:`21427`)
*Previous Behavior*:
.. code-block:: ipython
In [2]: ts = pd.Timestamp('2018-06-11 18:01:14')
In [3]: ts
Out[3]: Timestamp('2018-06-11 18:01:14')
In [4]: tic = pd.offsets.Hour(n=2, normalize=True)
...:
In [5]: tic
Out[5]: <2 * Hours>
In [6]: ts + tic
Out[6]: Timestamp('2018-06-11 00:00:00')
In [7]: ts + tic + tic + tic == ts + (tic + tic + tic)
Out[7]: False
*Current Behavior*:
.. ipython:: python
ts = pd.Timestamp('2018-06-11 18:01:14')
tic = pd.offsets.Hour(n=2)
ts + tic + tic + tic == ts + (tic + tic + tic)
.. _whatsnew_0240.api.datetimelike:
.. _whatsnew_0240.api.period_subtraction:
Period Subtraction
^^^^^^^^^^^^^^^^^^
Subtraction of a ``Period`` from another ``Period`` will give a ``DateOffset``.
instead of an integer (:issue:`21314`)
.. ipython:: python
june = pd.Period('June 2018')
april = pd.Period('April 2018')
june - april
Previous Behavior:
.. code-block:: ipython
In [2]: june = pd.Period('June 2018')
In [3]: april = pd.Period('April 2018')
In [4]: june - april
Out [4]: 2
Similarly, subtraction of a ``Period`` from a ``PeriodIndex`` will now return
an ``Index`` of ``DateOffset`` objects instead of an ``Int64Index``
.. ipython:: python
pi = pd.period_range('June 2018', freq='M', periods=3)
pi - pi[0]
Previous Behavior:
.. code-block:: ipython
In [2]: pi = pd.period_range('June 2018', freq='M', periods=3)
In [3]: pi - pi[0]
Out[3]: Int64Index([0, 1, 2], dtype='int64')
.. _whatsnew_0240.api.extension:
ExtensionType Changes
^^^^^^^^^^^^^^^^^^^^^
- ``ExtensionArray`` has gained the abstract methods ``.dropna()`` (:issue:`21185`)
- ``ExtensionDtype`` has gained the ability to instantiate from string dtypes, e.g. ``decimal`` would instantiate a registered ``DecimalDtype``; furthermore
the ``ExtensionDtype`` has gained the method ``construct_array_type`` (:issue:`21185`)
- The ``ExtensionArray`` constructor, ``_from_sequence`` now take the keyword arg ``copy=False`` (:issue:`21185`)
- Bug in :meth:`Series.get` for ``Series`` using ``ExtensionArray`` and integer index (:issue:`21257`)
- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)
-
.. _whatsnew_0240.api.incompatibilities:
Series and Index Data-Dtype Incompatibilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``Series`` and ``Index`` constructors now raise when the
data is incompatible with a passed ``dtype=`` (:issue:`15832`)
Previous Behavior:
.. code-block:: ipython
In [4]: pd.Series([-1], dtype="uint64")
Out [4]:
0 18446744073709551615
dtype: uint64
Current Behavior:
.. code-block:: ipython
In [4]: pd.Series([-1], dtype="uint64")
Out [4]:
...
OverflowError: Trying to coerce negative values to unsigned integers
Datetimelike API Changes
^^^^^^^^^^^^^^^^^^^^^^^^
- For :class:`DatetimeIndex` and :class:`TimedeltaIndex` with non-``None`` ``freq`` attribute, addition or subtraction of integer-dtyped array or ``Index`` will return an object of the same class (:issue:`19959`)
- :class:`DateOffset` objects are now immutable. Attempting to alter one of these will now raise ``AttributeError`` (:issue:`21341`)
- :class:`PeriodIndex` subtraction of another ``PeriodIndex`` will now return an object-dtype :class:`Index` of :class:`DateOffset` objects instead of raising a ``TypeError`` (:issue:`20049`)
- :func:`cut` and :func:`qcut` now returns a :class:`DatetimeIndex` or :class:`TimedeltaIndex` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`)
.. _whatsnew_0240.api.other:
Other API Changes
^^^^^^^^^^^^^^^^^
- :class:`DatetimeIndex` now accepts :class:`Int64Index` arguments as epoch timestamps (:issue:`20997`)
- Accessing a level of a ``MultiIndex`` with a duplicate name (e.g. in
:meth:~MultiIndex.get_level_values) now raises a ``ValueError`` instead of
a ``KeyError`` (:issue:`21678`).
- Invalid construction of ``IntervalDtype`` will now always raise a ``TypeError`` rather than a ``ValueError`` if the subdtype is invalid (:issue:`21185`)
- Trying to reindex a ``DataFrame`` with a non unique ``MultiIndex`` now raises a ``ValueError`` instead of an ``Exception`` (:issue:`21770`)
-
.. _whatsnew_0240.deprecations:
Deprecations
~~~~~~~~~~~~
- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated the ``encoding`` argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`).
- :meth:`MultiIndex.to_hierarchical` is deprecated and will be removed in a future version (:issue:`21613`)
- :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`)
-
.. _whatsnew_0240.prior_deprecations:
Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The ``LongPanel`` and ``WidePanel`` classes have been removed (:issue:`10892`)
-
-
-
.. _whatsnew_0240.performance:
Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Very large improvement in performance of slicing when the index is a :class:`CategoricalIndex`,
both when indexing by label (using .loc) and position(.iloc).
Likewise, slicing a ``CategoricalIndex`` itself (i.e. ``ci[100:200]``) shows similar speed improvements (:issue:`21659`)
- Improved performance of :func:`Series.describe` in case of numeric dtpyes (:issue:`21274`)
- Improved performance of :func:`pandas.core.groupby.GroupBy.rank` when dealing with tied rankings (:issue:`21237`)
- Improved performance of :func:`DataFrame.set_index` with columns consisting of :class:`Period` objects (:issue:`21582`,:issue:`21606`)
- Improved performance of membership checks in :class:`Categorical` and :class:`CategoricalIndex`
(i.e. ``x in cat``-style checks are much faster). :meth:`CategoricalIndex.contains`
is likewise much faster (:issue:`21369`, :issue:`21508`)
- Improved performance of :meth:`HDFStore.groups` (and dependent functions like
:meth:`~HDFStore.keys`. (i.e. ``x in store`` checks are much faster)
(:issue:`21372`)
-
.. _whatsnew_0240.docs:
Documentation Changes
~~~~~~~~~~~~~~~~~~~~~
- Added sphinx spelling extension, updated documentation on how to use the spell check (:issue:`21079`)
-
-
.. _whatsnew_0240.bug_fixes:
Bug Fixes
~~~~~~~~~
Categorical
^^^^^^^^^^^
-
-
-
Datetimelike
^^^^^^^^^^^^
- Fixed bug where two :class:`DateOffset` objects with different ``normalize`` attributes could evaluate as equal (:issue:`21404`)
- Fixed bug where :meth:`Timestamp.resolution` incorrectly returned 1-microsecond ``timedelta`` instead of 1-nanosecond :class:`Timedelta` (:issue:`21336`,:issue:`21365`)
Timedelta
^^^^^^^^^
-
-
-
Timezones
^^^^^^^^^
- Bug in :meth:`DatetimeIndex.shift` where an ``AssertionError`` would raise when shifting across DST (:issue:`8616`)
- Bug in :class:`Timestamp` constructor where passing an invalid timezone offset designator (``Z``) would not raise a ``ValueError`` (:issue:`8910`)
- Bug in :meth:`Timestamp.replace` where replacing at a DST boundary would retain an incorrect offset (:issue:`7825`)
- Bug in :meth:`Series.replace` with ``datetime64[ns, tz]`` data when replacing ``NaT`` (:issue:`11792`)
- Bug in :class:`Timestamp` when passing different string date formats with a timezone offset would produce different timezone offsets (:issue:`12064`)
- Bug when comparing a tz-naive :class:`Timestamp` to a tz-aware :class:`DatetimeIndex` which would coerce the :class:`DatetimeIndex` to tz-naive (:issue:`12601`)
- Bug in :meth:`Series.truncate` with a tz-aware :class:`DatetimeIndex` which would cause a core dump (:issue:`9243`)
- Bug in :class:`Series` constructor which would coerce tz-aware and tz-naive :class:`Timestamp` to tz-aware (:issue:`13051`)
- Bug in :class:`Index` with ``datetime64[ns, tz]`` dtype that did not localize integer data correctly (:issue:`20964`)
- Bug in :class:`DatetimeIndex` where constructing with an integer and tz would not localize correctly (:issue:`12619`)
- Fixed bug where :meth:`DataFrame.describe` and :meth:`Series.describe` on tz-aware datetimes did not show `first` and `last` result (:issue:`21328`)
Offsets
^^^^^^^
-
-
Numeric
^^^^^^^
- Bug in :class:`Series` ``__rmatmul__`` doesn't support matrix vector multiplication (:issue:`21530`)
- Bug in :func:`factorize` fails with read-only array (:issue:`12813`)
-
-
Strings
^^^^^^^
-
-
-
Interval
^^^^^^^^
- Bug in the :class:`IntervalIndex` constructor where the ``closed`` parameter did not always override the inferred ``closed`` (:issue:`19370`)
- Bug in the ``IntervalIndex`` repr where a trailing comma was missing after the list of intervals (:issue:`20611`)
-
-
Indexing
^^^^^^^^
- The traceback from a ``KeyError`` when asking ``.loc`` for a single missing label is now shorter and more clear (:issue:`21557`)
- When ``.ix`` is asked for a missing integer label in a :class:`MultiIndex` with a first level of integer type, it now raises a ``KeyError``, consistently with the case of a flat :class:`Int64Index, rather than falling back to positional indexing (:issue:`21593`)
- Bug in :meth:`DatetimeIndex.reindex` when reindexing a tz-naive and tz-aware :class:`DatetimeIndex` (:issue:`8306`)
- Bug in :class:`DataFrame` when setting values with ``.loc`` and a timezone aware :class:`DatetimeIndex` (:issue:`11365`)
- ``DataFrame.__getitem__`` now accepts dictionaries and dictionary keys as list-likes of labels, consistently with ``Series.__getitem__`` (:issue:`21294`)
- Fixed ``DataFrame[np.nan]`` when columns are non-unique (:issue:`21428`)
- Bug when indexing :class:`DatetimeIndex` with nanosecond resolution dates and timezones (:issue:`11679`)
- Bug where indexing with a Numpy array containing negative values would mutate the indexer (:issue:`21867`)
Missing
^^^^^^^
- Bug in :func:`DataFrame.fillna` where a ``ValueError`` would raise when one column contained a ``datetime64[ns, tz]`` dtype (:issue:`15522`)
- Bug in :func:`Series.hasnans` that could be incorrectly cached and return incorrect answers if null elements are introduced after an initial call (:issue:`19700`)
MultiIndex
^^^^^^^^^^
-
-
-
I/O
^^^
- :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
- :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
-
Plotting
^^^^^^^^
- Bug in :func:'DataFrame.plot.scatter' and :func:'DataFrame.plot.hexbin' caused x-axis label and ticklabels to disappear when colorbar was on in IPython inline backend (:issue:`10611`, :issue:`10678`, and :issue:`20455`)
-
Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in :func:`pandas.core.groupby.GroupBy.first` and :func:`pandas.core.groupby.GroupBy.last` with ``as_index=False`` leading to the loss of timezone information (:issue:`15884`)
- Bug in :meth:`DatetimeIndex.resample` when downsampling across a DST boundary (:issue:`8531`)
-
-
Sparse
^^^^^^
-
-
-
Reshaping
^^^^^^^^^
- Bug in :func:`pandas.concat` when joining resampled DataFrames with timezone aware index (:issue:`13783`)
- Bug in :meth:`Series.combine_first` with ``datetime64[ns, tz]`` dtype which would return tz-naive result (:issue:`21469`)
- Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``datetime64[ns, tz]`` dtype (:issue:`21546`)
-
-
Build Changes
^^^^^^^^^^^^^
- Building pandas for development now requires ``cython >= 0.28.2`` (:issue:`21688`)
-
Other
^^^^^
- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
- Require at least 0.28.2 version of ``cython`` to support read-only memoryviews (:issue:`21688`)
- :meth: `~pandas.io.formats.style.Styler.background_gradient` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` (:issue:`15204`)
-
-
-