forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathv0.18.1.txt
694 lines (464 loc) · 29.1 KB
/
v0.18.1.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
.. _whatsnew_0181:
v0.18.1 (May 3, 2016)
---------------------
This is a minor bug-fix release from 0.18.0 and includes a large number of
bug fixes along with several new features, enhancements, and performance improvements.
We recommend that all users upgrade to this version.
Highlights include:
- ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here <whatsnew_0181.deferred_ops>`
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
- Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
- Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
- Expanded the :ref:`Tutorials section <tutorial-modern>` with a feature on modern pandas, courtesy of `@TomAugsburger <https://twitter.com/TomAugspurger>`__. (:issue:`13045`).
.. contents:: What's new in v0.18.1
:local:
:backlinks: none
.. _whatsnew_0181.new_features:
New features
~~~~~~~~~~~~
.. _whatsnew_0181.enhancements.custombusinesshour:
Custom Business Hour
^^^^^^^^^^^^^^^^^^^^
The ``CustomBusinessHour`` is a mixture of ``BusinessHour`` and ``CustomBusinessDay`` which
allows you to specify arbitrary holidays. For details,
see :ref:`Custom Business Hour <timeseries.custombusinesshour>` (:issue:`11514`)
.. ipython:: python
from pandas.tseries.offsets import CustomBusinessHour
from pandas.tseries.holiday import USFederalHolidayCalendar
bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())
Friday before MLK Day
.. ipython:: python
dt = datetime(2014, 1, 17, 15)
dt + bhour_us
Tuesday after MLK Day (Monday is skipped because it's a holiday)
.. ipython:: python
dt + bhour_us * 2
.. _whatsnew_0181.deferred_ops:
``.groupby(..)`` syntax with window and resample operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see (:issue:`12486`, :issue:`12738`).
You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another deferred object (similar to what ``.rolling()`` and ``.expanding()`` do on ungrouped pandas objects). You can then operate on these ``RollingGroupby`` objects in a similar manner.
Previously you would have to do this to get a rolling window mean per-group:
.. ipython:: python
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
'B': np.arange(40)})
df
.. ipython:: python
df.groupby('A').apply(lambda x: x.rolling(4).B.mean())
Now you can do:
.. ipython:: python
df.groupby('A').rolling(4).B.mean()
For ``.resample(..)`` type of operations, previously you would have to:
.. ipython:: python
df = pd.DataFrame({'date': pd.date_range(start='2016-01-01',
periods=4,
freq='W'),
'group': [1, 1, 2, 2],
'val': [5, 6, 7, 8]}).set_index('date')
df
.. ipython:: python
df.groupby('group').apply(lambda x: x.resample('1D').ffill())
Now you can do:
.. ipython:: python
df.groupby('group').resample('1D').ffill()
.. _whatsnew_0181.enhancements.method_chain:
Method chaining improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following methods / indexers now accept a ``callable``. It is intended to make
these more useful in method chains, see the :ref:`documentation <indexing.callable>`.
(:issue:`11485`, :issue:`12533`)
- ``.where()`` and ``.mask()``
- ``.loc[]``, ``iloc[]`` and ``.ix[]``
- ``[]`` indexing
``.where()`` and ``.mask()``
""""""""""""""""""""""""""""
These can accept a callable for the condition and ``other``
arguments.
.. ipython:: python
df = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]})
df.where(lambda x: x > 4, lambda x: x + 10)
``.loc[]``, ``.iloc[]``, ``.ix[]``
""""""""""""""""""""""""""""""""""
These can accept a callable, and a tuple of callable as a slicer. The callable
can return a valid boolean indexer or anything which is valid for these indexer's input.
.. ipython:: python
# callable returns bool indexer
df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]
# callable returns list of labels
df.loc[lambda x: [1, 2], lambda x: ['A', 'B']]
``[]`` indexing
"""""""""""""""
Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
The callable must return a valid input for ``[]`` indexing depending on its
class and index type.
.. ipython:: python
df[lambda x: 'A']
Using these methods / indexers, you can chain data selection operations
without using temporary variable.
.. ipython:: python
bb = pd.read_csv('data/baseball.csv', index_col='id')
(bb.groupby(['year', 'team'])
.sum()
.loc[lambda df: df.r > 100]
)
.. _whatsnew_0181.partial_string_indexing:
Partial string indexing on ``DateTimeIndex`` when part of a ``MultiIndex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Partial string indexing now matches on ``DateTimeIndex`` when part of a ``MultiIndex`` (:issue:`10331`)
.. ipython:: python
dft2 = pd.DataFrame(np.random.randn(20, 1),
columns=['A'],
index=pd.MultiIndex.from_product([pd.date_range('20130101',
periods=10,
freq='12H'),
['a', 'b']]))
dft2
dft2.loc['2013-01-05']
On other levels
.. ipython:: python
idx = pd.IndexSlice
dft2 = dft2.swaplevel(0, 1).sort_index()
dft2
dft2.loc[idx[:, '2013-01-05'], :]
.. _whatsnew_0181.enhancements.assembling:
Assembling Datetimes
^^^^^^^^^^^^^^^^^^^^
``pd.to_datetime()`` has gained the ability to assemble datetimes from a passed in ``DataFrame`` or a dict. (:issue:`8158`).
.. ipython:: python
df = pd.DataFrame({'year': [2015, 2016],
'month': [2, 3],
'day': [4, 5],
'hour': [2, 3]})
df
Assembling using the passed frame.
.. ipython:: python
pd.to_datetime(df)
You can pass only the columns that you need to assemble.
.. ipython:: python
pd.to_datetime(df[['year', 'month', 'day']])
.. _whatsnew_0181.other:
Other Enhancements
^^^^^^^^^^^^^^^^^^
- ``pd.read_csv()`` now supports ``delim_whitespace=True`` for the Python engine (:issue:`12958`)
- ``pd.read_csv()`` now supports opening ZIP files that contains a single CSV, via extension inference or explicit ``compression='zip'`` (:issue:`12175`)
- ``pd.read_csv()`` now supports opening files using xz compression, via extension inference or explicit ``compression='xz'`` is specified; ``xz`` compressions is also supported by ``DataFrame.to_csv`` in the same way (:issue:`11852`)
- ``pd.read_msgpack()`` now always gives writeable ndarrays even when compression is used (:issue:`12359`).
- ``pd.read_msgpack()`` now supports serializing and de-serializing categoricals with msgpack (:issue:`12573`)
- ``.to_json()`` now supports ``NDFrames`` that contain categorical and sparse data (:issue:`10778`)
- ``interpolate()`` now supports ``method='akima'`` (:issue:`7588`).
- ``pd.read_excel()`` now accepts path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path, in line with other ``read_*`` functions (:issue:`12655`)
- Added ``.weekday_name`` property as a component to ``DatetimeIndex`` and the ``.dt`` accessor. (:issue:`11128`)
- ``Index.take`` now handles ``allow_fill`` and ``fill_value`` consistently (:issue:`12631`)
.. ipython:: python
idx = pd.Index([1., 2., 3., 4.], dtype='float')
# default, allow_fill=True, fill_value=None
idx.take([2, -1])
idx.take([2, -1], fill_value=True)
- ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables <text.indicator>` (:issue:`10008`, :issue:`10103`)
.. ipython:: python
idx = pd.Index(['a|b', 'a|c', 'b|c'])
idx.str.get_dummies('|')
- ``pd.crosstab()`` has gained a ``normalize`` argument for normalizing frequency tables (:issue:`12569`). Examples in the updated docs :ref:`here <reshaping.crosstabulations>`.
- ``.resample(..).interpolate()`` is now supported (:issue:`12925`)
- ``.isin()`` now accepts passed ``sets`` (:issue:`12988`)
.. _whatsnew_0181.sparse:
Sparse changes
~~~~~~~~~~~~~~
These changes conform sparse handling to return the correct types and work to make a smoother experience with indexing.
``SparseArray.take`` now returns a scalar for scalar input, ``SparseArray`` for others. Furthermore, it handles a negative indexer with the same rule as ``Index`` (:issue:`10560`, :issue:`12796`)
.. code-block:: python
s = pd.SparseArray([np.nan, np.nan, 1, 2, 3, np.nan, 4, 5, np.nan, 6])
s.take(0)
s.take([1, 2, 3])
- Bug in ``SparseSeries[]`` indexing with ``Ellipsis`` raises ``KeyError`` (:issue:`9467`)
- Bug in ``SparseArray[]`` indexing with tuples are not handled properly (:issue:`12966`)
- Bug in ``SparseSeries.loc[]`` with list-like input raises ``TypeError`` (:issue:`10560`)
- Bug in ``SparseSeries.iloc[]`` with scalar input may raise ``IndexError`` (:issue:`10560`)
- Bug in ``SparseSeries.loc[]``, ``.iloc[]`` with ``slice`` returns ``SparseArray``, rather than ``SparseSeries`` (:issue:`10560`)
- Bug in ``SparseDataFrame.loc[]``, ``.iloc[]`` may results in dense ``Series``, rather than ``SparseSeries`` (:issue:`12787`)
- Bug in ``SparseArray`` addition ignores ``fill_value`` of right hand side (:issue:`12910`)
- Bug in ``SparseArray`` mod raises ``AttributeError`` (:issue:`12910`)
- Bug in ``SparseArray`` pow calculates ``1 ** np.nan`` as ``np.nan`` which must be 1 (:issue:`12910`)
- Bug in ``SparseArray`` comparison output may incorrect result or raise ``ValueError`` (:issue:`12971`)
- Bug in ``SparseSeries.__repr__`` raises ``TypeError`` when it is longer than ``max_rows`` (:issue:`10560`)
- Bug in ``SparseSeries.shape`` ignores ``fill_value`` (:issue:`10452`)
- Bug in ``SparseSeries`` and ``SparseArray`` may have different ``dtype`` from its dense values (:issue:`12908`)
- Bug in ``SparseSeries.reindex`` incorrectly handle ``fill_value`` (:issue:`12797`)
- Bug in ``SparseArray.to_frame()`` results in ``DataFrame``, rather than ``SparseDataFrame`` (:issue:`9850`)
- Bug in ``SparseSeries.value_counts()`` does not count ``fill_value`` (:issue:`6749`)
- Bug in ``SparseArray.to_dense()`` does not preserve ``dtype`` (:issue:`10648`)
- Bug in ``SparseArray.to_dense()`` incorrectly handle ``fill_value`` (:issue:`12797`)
- Bug in ``pd.concat()`` of ``SparseSeries`` results in dense (:issue:`10536`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` incorrectly handle ``fill_value`` (:issue:`9765`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` may raise ``AttributeError`` (:issue:`12174`)
- Bug in ``SparseArray.shift()`` may raise ``NameError`` or ``TypeError`` (:issue:`12908`)
.. _whatsnew_0181.api:
API changes
~~~~~~~~~~~
.. _whatsnew_0181.api.groubynth:
``.groupby(..).nth()`` changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The index in ``.groupby(..).nth()`` output is now more consistent when the ``as_index`` argument is passed (:issue:`11039`):
.. ipython:: python
df = DataFrame({'A' : ['a', 'b', 'a'],
'B' : [1, 2, 3]})
df
Previous Behavior:
.. code-block:: ipython
In [3]: df.groupby('A', as_index=True)['B'].nth(0)
Out[3]:
0 1
1 2
Name: B, dtype: int64
In [4]: df.groupby('A', as_index=False)['B'].nth(0)
Out[4]:
0 1
1 2
Name: B, dtype: int64
New Behavior:
.. ipython:: python
df.groupby('A', as_index=True)['B'].nth(0)
df.groupby('A', as_index=False)['B'].nth(0)
Furthermore, previously, a ``.groupby`` would always sort, regardless if ``sort=False`` was passed with ``.nth()``.
.. ipython:: python
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(100, 2), columns=['a', 'b'])
df['c'] = np.random.randint(0, 4, 100)
Previous Behavior:
.. code-block:: ipython
In [4]: df.groupby('c', sort=True).nth(1)
Out[4]:
a b
c
0 -0.334077 0.002118
1 0.036142 -2.074978
2 -0.720589 0.887163
3 0.859588 -0.636524
In [5]: df.groupby('c', sort=False).nth(1)
Out[5]:
a b
c
0 -0.334077 0.002118
1 0.036142 -2.074978
2 -0.720589 0.887163
3 0.859588 -0.636524
New Behavior:
.. ipython:: python
df.groupby('c', sort=True).nth(1)
df.groupby('c', sort=False).nth(1)
.. _whatsnew_0181.numpy_compatibility:
numpy function compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
as to accept arguments that can be passed in from ``numpy``, even if they are not necessarily
used in the ``pandas`` implementation (:issue:`12644`, :issue:`12638`, :issue:`12687`)
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- Bug in numpy compatibility of ``np.round()`` on a ``Series`` (:issue:`12600`)
An example of this signature augmentation is illustrated below:
.. ipython:: python
sp = pd.SparseDataFrame([1, 2, 3])
sp
Previous behaviour:
.. code-block:: ipython
In [2]: np.cumsum(sp, axis=0)
...
TypeError: cumsum() takes at most 2 arguments (4 given)
New behaviour:
.. ipython:: python
np.cumsum(sp, axis=0)
.. _whatsnew_0181.apply_resample:
Using ``.apply`` on groupby resampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using ``apply`` on resampling groupby operations (using a ``pd.TimeGrouper``) now has the same output types as similar ``apply`` calls on other groupby operations. (:issue:`11742`).
.. ipython:: python
df = pd.DataFrame({'date': pd.to_datetime(['10/10/2000', '11/10/2000']),
'value': [10, 13]})
df
Previous behavior:
.. code-block:: ipython
In [1]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x.value.sum())
Out[1]:
...
TypeError: cannot concatenate a non-NDFrame object
# Output is a Series
In [2]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum())
Out[2]:
date
2000-10-31 value 10
2000-11-30 value 13
dtype: int64
New Behavior:
.. code-block:: python
# Output is a Series
In [55]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x.value.sum())
Out[55]:
date
2000-10-31 10
2000-11-30 13
Freq: M, dtype: int64
# Output is a DataFrame
In [56]: df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum())
Out[56]:
value
date
2000-10-31 10
2000-11-30 13
.. _whatsnew_0181.read_csv_exceptions:
Changes in ``read_csv`` exceptions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to standardize the ``read_csv`` API for both the ``c`` and ``python`` engines, both will now raise an
``EmptyDataError``, a subclass of ``ValueError``, in response to empty columns or header (:issue:`12493`, :issue:`12506`)
Previous behaviour:
.. code-block:: ipython
In [1]: df = pd.read_csv(StringIO(''), engine='c')
...
ValueError: No columns to parse from file
In [2]: df = pd.read_csv(StringIO(''), engine='python')
...
StopIteration
New behaviour:
.. code-block:: ipython
In [1]: df = pd.read_csv(StringIO(''), engine='c')
...
pandas.io.common.EmptyDataError: No columns to parse from file
In [2]: df = pd.read_csv(StringIO(''), engine='python')
...
pandas.io.common.EmptyDataError: No columns to parse from file
In addition to this error change, several others have been made as well:
- ``CParserError`` now sub-classes ``ValueError`` instead of just a ``Exception`` (:issue:`12551`)
- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine cannot parse a column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the ``c`` engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
- ``pandas.parser.OverflowError`` exception has been removed and has been replaced with Python's built-in ``OverflowError`` exception (:issue:`12506`)
- ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)
.. _whatsnew_0181.api.to_datetime:
``to_datetime`` error changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'`` or non-convertible with ``errors='ignore'``. Furthermore, an ``OutOfBoundsDateime`` exception will be raised when an out-of-range value is encountered for that unit when ``errors='raise'``. (:issue:`11758`, :issue:`13052`, :issue:`13059`)
Previous behaviour:
.. code-block:: ipython
In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
Out[27]: NaT
In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
OverflowError: Python int too large to convert to C long
In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
OverflowError: Python int too large to convert to C long
New behaviour:
.. code-block:: ipython
In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
Out[2]: Timestamp('2014-12-31 16:31:00')
In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
Out[3]: 11111111
In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
OutOfBoundsDatetime: cannot convert input with unit 'D'
.. _whatsnew_0181.api.other:
Other API changes
^^^^^^^^^^^^^^^^^
- ``.swaplevel()`` for ``Series``, ``DataFrame``, ``Panel``, and ``MultiIndex`` now features defaults for its first two parameters ``i`` and ``j`` that swap the two innermost levels of the index. (:issue:`12934`)
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- ``Period`` and ``PeriodIndex`` now raises ``IncompatibleFrequency`` error which inherits ``ValueError`` rather than raw ``ValueError`` (:issue:`12615`)
- ``Series.apply`` for category dtype now applies the passed function to each of the ``.categories`` (and not the ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is neither a boolean, list, or dictionary (matches the doc-string) (:issue:`5636`)
- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fallback to the ``python`` engine. This mimics the pre-0.18.1 behavior if ``numexpr`` is installed (and which, previously, if numexpr was not installed, ``.query()/.eval()`` would raise). (:issue:`12749`)
- ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)
- Provide a proper ``__name__`` and ``__qualname__`` attributes for generic functions (:issue:`12021`)
- ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
- ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single- with a multi-leveled dataframe (:issue:`9455`, :issue:`12219`)
- Compat with ``scipy`` > 0.17 for deprecated ``piecewise_polynomial`` interpolation method; support for the replacement ``from_derivatives`` method (:issue:`12887`)
.. _whatsnew_0181.deprecations:
Deprecations
^^^^^^^^^^^^
- The method name ``Index.sym_diff()`` is deprecated and can be replaced by ``Index.symmetric_difference()`` (:issue:`12591`)
- The method name ``Categorical.sort()`` is deprecated in favor of ``Categorical.sort_values()`` (:issue:`12882`)
.. _whatsnew_0181.performance:
Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Improved speed of SAS reader (:issue:`12656`, :issue:`12961`)
- Performance improvements in ``.groupby(..).cumcount()`` (:issue:`11039`)
- Improved memory usage in ``pd.read_csv()`` when using ``skiprows=an_integer`` (:issue:`13005`)
- Improved performance of ``DataFrame.to_sql`` when checking case sensitivity for tables. Now only checks if table has been created correctly when table name is not lower case. (:issue:`12876`)
- Improved performance of ``Period`` construction and time series plotting (:issue:`12903`, :issue:`11831`).
- Improved performance of ``.str.encode()`` and ``.str.decode()`` methods (:issue:`13008`)
- Improved performance of ``to_numeric`` if input is numeric dtype (:issue:`12777`)
- Improved performance of sparse arithmetic with ``IntIndex`` (:issue:`13036`)
.. _whatsnew_0181.bug_fixes:
Bug Fixes
~~~~~~~~~
- ``usecols`` parameter in ``pd.read_csv`` is now respected even when the lines of a CSV file are not even (:issue:`12203`)
- Bug in ``groupby.transform(..)`` when ``axis=1`` is specified with a non-monotonic ordered index (:issue:`12713`)
- Bug in ``Period`` and ``PeriodIndex`` creation raises ``KeyError`` if ``freq="Minute"`` is specified. Note that "Minute" freq is deprecated in v0.17.0, and recommended to use ``freq="T"`` instead (:issue:`11854`)
- Bug in ``.resample(...).count()`` with a ``PeriodIndex`` always raising a ``TypeError`` (:issue:`12774`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` casting to a ``DatetimeIndex`` when empty (:issue:`12868`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` when resampling to an existing frequency (:issue:`12770`)
- Bug in printing data which contains ``Period`` with different ``freq`` raises ``ValueError`` (:issue:`12615`)
- Bug in ``Series`` construction with ``Categorical`` and ``dtype='category'`` is specified (:issue:`12574`)
- Bugs in concatenation with a coercible dtype was too aggressive, resulting in different dtypes in output formatting when an object was longer than ``display.max_rows`` (:issue:`12411`, :issue:`12045`, :issue:`11594`, :issue:`10571`, :issue:`12211`)
- Bug in ``float_format`` option with option not being validated as a callable. (:issue:`12706`)
- Bug in ``GroupBy.filter`` when ``dropna=False`` and no groups fulfilled the criteria (:issue:`12768`)
- Bug in ``__name__`` of ``.cum*`` functions (:issue:`12021`)
- Bug in ``.astype()`` of a ``Float64Inde/Int64Index`` to an ``Int64Index`` (:issue:`12881`)
- Bug in round tripping an integer based index in ``.to_json()/.read_json()`` when ``orient='index'`` (the default) (:issue:`12866`)
- Bug in plotting ``Categorical`` dtypes cause error when attempting stacked bar plot (:issue:`13019`)
- Compat with >= ``numpy`` 1.11 for ``NaT`` comparisons (:issue:`12969`)
- Bug in ``.drop()`` with a non-unique ``MultiIndex``. (:issue:`12701`)
- Bug in ``.concat`` of datetime tz-aware and naive DataFrames (:issue:`12467`)
- Bug in correctly raising a ``ValueError`` in ``.resample(..).fillna(..)`` when passing a non-string (:issue:`12952`)
- Bug fixes in various encoding and header processing issues in ``pd.read_sas()`` (:issue:`12659`, :issue:`12654`, :issue:`12647`, :issue:`12809`)
- Bug in ``pd.crosstab()`` where would silently ignore ``aggfunc`` if ``values=None`` (:issue:`12569`).
- Potential segfault in ``DataFrame.to_json`` when serialising ``datetime.time`` (:issue:`11473`).
- Potential segfault in ``DataFrame.to_json`` when attempting to serialise 0d array (:issue:`11299`).
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values; now supports serialization of ``category``, ``sparse``, and ``datetime64[ns, tz]`` dtypes (:issue:`10778`).
- Bug in ``DataFrame.to_json`` with unsupported dtype not passed to default handler (:issue:`12554`).
- Bug in ``.align`` not returning the sub-class (:issue:`12983`)
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
- Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)
- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)
- Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by pandas. See the :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
- Bug in ``.quantile()`` with interpolation may coerce to ``float`` unexpectedly (:issue:`12772`)
- Bug in ``.quantile()`` with empty ``Series`` may return scalar rather than empty ``Series`` (:issue:`12772`)
- Bug in ``.loc`` with out-of-bounds in a large indexer would raise ``IndexError`` rather than ``KeyError`` (:issue:`12527`)
- Bug in resampling when using a ``TimedeltaIndex`` and ``.asfreq()``, would previously not include the final fencepost (:issue:`12926`)
- Bug in equality testing with a ``Categorical`` in a ``DataFrame`` (:issue:`12564`)
- Bug in ``GroupBy.first()``, ``.last()`` returns incorrect row when ``TimeGrouper`` is used (:issue:`7453`)
- Bug in ``pd.read_csv()`` with the ``c`` engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
- Bug in ``DataFrame`` timezone lost when assigning tz-aware datetime ``Series`` with alignment (:issue:`12981`)
- Bug in ``.value_counts()`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
- Bug in ``Series.value_counts()`` loses name if its dtype is ``category`` (:issue:`12835`)
- Bug in ``Series.value_counts()`` loses timezone info (:issue:`12835`)
- Bug in ``Series.value_counts(normalize=True)`` with ``Categorical`` raises ``UnboundLocalError`` (:issue:`12835`)
- Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
- Bug in ``pd.read_csv()`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the ``c`` engine (:issue:`9755`)
- Bug in ``pd.read_csv()`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the ``c`` engine (:issue:`12912`)
- Bug in ``Series.rename``, ``DataFrame.rename`` and ``DataFrame.rename_axis`` not treating ``Series`` as mappings to relabel (:issue:`12623`).
- Clean in ``.rolling.min`` and ``.rolling.max`` to enhance dtype handling (:issue:`12373`)
- Bug in ``groupby`` where complex types are coerced to float (:issue:`12902`)
- Bug in ``Series.map`` raises ``TypeError`` if its dtype is ``category`` or tz-aware ``datetime`` (:issue:`12473`)
- Bugs on 32bit platforms for some test comparisons (:issue:`12972`)
- Bug in index coercion when falling back from ``RangeIndex`` construction (:issue:`12893`)
- Better error message in window functions when invalid argument (e.g. a float window) is passed (:issue:`12669`)
- Bug in slicing subclassed ``DataFrame`` defined to return subclassed ``Series`` may return normal ``Series`` (:issue:`11559`)
- Bug in ``.str`` accessor methods may raise ``ValueError`` if input has ``name`` and the result is ``DataFrame`` or ``MultiIndex`` (:issue:`12617`)
- Bug in ``DataFrame.last_valid_index()`` and ``DataFrame.first_valid_index()`` on empty frames (:issue:`12800`)
- Bug in ``CategoricalIndex.get_loc`` returns different result from regular ``Index`` (:issue:`12531`)
- Bug in ``PeriodIndex.resample`` where name not propagated (:issue:`12769`)
- Bug in ``date_range`` ``closed`` keyword and timezones (:issue:`12684`).
- Bug in ``pd.concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
- Bug in ``pd.concat`` did not handle empty ``Series`` properly (:issue:`11082`)
- Bug in ``.plot.bar`` alignment when ``width`` is specified with ``int`` (:issue:`12979`)
- Bug in ``fill_value`` is ignored if the argument to a binary operator is a constant (:issue:`12723`)
- Bug in ``pd.read_html()`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)
- Bug in ``.pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
- Bug in ``.pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
- Bug in ``pd.crosstab()`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)
- Bug in ``Series.name`` when ``name`` attribute can be a hashable type (:issue:`12610`)
- Bug in ``.describe()`` resets categorical columns information (:issue:`11558`)
- Bug where ``loffset`` argument was not applied when calling ``resample().count()`` on a timeseries (:issue:`12725`)
- ``pd.read_excel()`` now accepts column names associated with keyword argument ``names`` (:issue:`12870`)
- Bug in ``pd.to_numeric()`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with scalar raises ``ValueError`` (:issue:`12777`)