This also illustrates using the ``by`` parameter to group data before merging.

.. ipython:: python

   trades = pd.DataFrame({
       'time': pd.to_datetime(['20160525 13:30:00.023',
                               '20160525 13:30:00.038',
                               '20160525 13:30:00.048',
                               '20160525 13:30:00.048',
                               '20160525 13:30:00.048']),
       'ticker': ['MSFT', 'MSFT',
                  'GOOG', 'GOOG', 'AAPL'],
       'price': [51.95, 51.95,
                 720.77, 720.92, 98.00],
       'quantity': [75, 155,
                    100, 100, 100]},
       columns=['time', 'ticker', 'price', 'quantity'])

   quotes = pd.DataFrame({
       'time': pd.to_datetime(['20160525 13:30:00.023',
                               '20160525 13:30:00.023',
                               '20160525 13:30:00.030',
                               '20160525 13:30:00.041',
                               '20160525 13:30:00.048',
                               '20160525 13:30:00.049',
                               '20160525 13:30:00.072',
                               '20160525 13:30:00.075']),
       'ticker': ['GOOG', 'MSFT', 'MSFT', 'MSFT',
                  'GOOG', 'AAPL', 'GOOG', 'MSFT'],
       'bid': [720.50, 51.95, 51.97, 51.99,
               720.50, 97.99, 720.50, 52.01],
       'ask': [720.93, 51.96, 51.98, 52.00,
               720.93, 98.01, 720.88, 52.03]},
       columns=['time', 'ticker', 'bid', 'ask'])

.. ipython:: python

   trades
   quotes

An asof merge joins on the ``on`` field, typically a datetimelike field, which must be ordered;
in this case we are also using a grouper in the ``by`` field. This is like a left-outer join, except
that forward filling happens automatically, taking the most recent non-NaN value.

.. ipython:: python

   pd.merge_asof(trades, quotes,
                 on='time',
                 by='ticker')

This returns a merged DataFrame with the entries in the same order as the original left
passed DataFrame (``trades`` in this case), with the fields of the ``quotes`` merged.
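``pd.merge_asof`` also accepts a ``tolerance`` parameter that bounds how stale a matched quote may be; a minimal sketch with small two-row frames (not the ``trades``/``quotes`` built above):

```python
import pandas as pd

trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.048"]),
    "ticker": ["MSFT", "GOOG"],
    "price": [51.95, 720.77]})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.072"]),
    "ticker": ["MSFT", "GOOG"],
    "bid": [51.95, 720.50]})

# Match each trade to the most recent same-ticker quote, but only if that
# quote is at most 2ms old; the GOOG quote arrives after the GOOG trade,
# so that trade gets a NaN bid.
res = pd.merge_asof(trades, quotes, on="time", by="ticker",
                    tolerance=pd.Timedelta("2ms"))
```

The result keeps one row per trade, in the original ``trades`` order, with unmatched quote fields left as NaN.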
See the full documentation :ref:`here <stats.moments.ts>`.

.. ipython:: python

   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                      index=pd.date_range('20130101 09:00:00',
                                          periods=5, freq='s'))
   dft

This is a regular frequency index. Using an integer window parameter rolls along the index frequency.

.. ipython:: python

   dft.rolling(2).sum()
   dft.rolling(2, min_periods=1).sum()

Specifying an offset allows a more intuitive specification of the rolling frequency.
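For instance, the integer window above can be replaced by a time offset string; a small sketch of the offset form (same ``dft`` as above):

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]},
                   index=pd.date_range("20130101 09:00:00",
                                       periods=5, freq="s"))
# A '2s' window gathers all rows within the trailing 2 seconds; for
# offset-based windows min_periods defaults to 1, so no leading NaN appears.
out = dft.rolling("2s").sum()
```

Because the window is defined in time rather than rows, this form also behaves sensibly on irregularly spaced indexes.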
Categorical Concatenation
^^^^^^^^^^^^^^^^^^^^^^^^^

.. ipython:: python

   from pandas.api.types import union_categoricals
   a = pd.Categorical(["b", "c"])
   b = pd.Categorical(["a", "b"])
   union_categoricals([a, b])

- ``concat`` and ``append`` now can concat ``category`` dtypes with different ``categories`` as ``object`` dtype (:issue:`13524`)
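The ``object`` fallback described in the bullet above can be seen directly; a minimal sketch (with hypothetical ``s1``/``s2``):

```python
import pandas as pd

s1 = pd.Series(["a", "b"], dtype="category")
s2 = pd.Series(["b", "c"], dtype="category")
# The categories differ ({'a','b'} vs {'b','c'}), so concat falls back
# to object dtype instead of raising.
out = pd.concat([s1, s2])
```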
.. code-block:: ipython

   In [1]: pd.concat([s1, s2])
   ValueError: incompatible categories in categorical concat

**New behavior**:

.. ipython:: python

   pd.concat([s1, s2])
.. _whatsnew_0190.enhancements.semi_month_offsets:

These provide date offsets anchored (by default) to the 15th and end of month, and 15th and 1st of month, respectively.

.. ipython:: python

   from pandas.tseries.offsets import SemiMonthEnd, SemiMonthBegin

**SemiMonthEnd**:

.. ipython:: python

   pd.Timestamp('2016-01-01') + SemiMonthEnd()

   pd.date_range('2015-01-01', freq='SM', periods=4)

**SemiMonthBegin**:

.. ipython:: python

   pd.Timestamp('2016-01-01') + SemiMonthBegin()

   pd.date_range('2015-01-01', freq='SMS', periods=4)

Using the anchoring suffix, you can also specify the day of month to use instead of the 15th.

.. ipython:: python

   pd.date_range('2015-01-01', freq='SMS-16', periods=4)

   pd.date_range('2015-01-01', freq='SM-14', periods=4)
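The anchor suffixes correspond to the offsets' ``day_of_month`` argument; a hedged sketch of the keyword form:

```python
import pandas as pd
from pandas.tseries.offsets import SemiMonthBegin, SemiMonthEnd

# 'SM-20' corresponds to SemiMonthEnd(day_of_month=20): anchors on the
# 20th and month end. 'SMS-16' corresponds to SemiMonthBegin(day_of_month=16):
# anchors on the 1st and the 16th.
end = pd.Timestamp("2016-01-01") + SemiMonthEnd(day_of_month=20)
begin = pd.Timestamp("2016-01-01") + SemiMonthBegin(day_of_month=16)
```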
.. _whatsnew_0190.enhancements.index:
For ``MultiIndex``, values are dropped if any level is missing by default. Specifying ``how='all'`` only drops values where all levels are missing.

.. ipython:: python

   midx = pd.MultiIndex.from_arrays([[1, 2, np.nan, 4],
                                     [1, 2, np.nan, np.nan]])
   midx
   midx.dropna()
   midx.dropna(how='all')

``Index`` now supports ``.str.extractall()`` which returns a ``DataFrame``, see the :ref:`docs here <text.extractall>` (:issue:`10008`, :issue:`13156`)
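The new ``Index.str.extractall`` mirrors the ``Series`` version; a minimal sketch (with a hypothetical pattern):

```python
import pandas as pd

idx = pd.Index(["b1", "b2", "c3"])
# extractall returns a DataFrame with one row per regex match; named
# groups become columns, and the result index pairs the original item
# position with the match number.
matches = idx.str.extractall(r"[ab](?P<digit>\d)")
```

Here "c3" produces no match, so only two rows come back.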
Other enhancements
~~~~~~~~~~~~~~~~~~

.. ipython:: python

   pd.Timestamp(2012, 1, 1)

   pd.Timestamp(year=2012, month=1, day=1, hour=8, minute=30)

- The ``.resample()`` function now accepts an ``on=`` or ``level=`` parameter for resampling on a datetimelike column or ``MultiIndex`` level (:issue:`13500`)

  .. ipython:: python

     df = pd.DataFrame({'date': pd.date_range('2015-01-01', freq='W', periods=5),
                        'a': np.arange(5)},
                       index=pd.MultiIndex.from_arrays([[1, 2, 3, 4, 5],
                                                        pd.date_range('2015-01-01',
                                                                      freq='W',
                                                                      periods=5)],
                                                       names=['v', 'd']))
     df
     df.resample('M', on='date').sum()
     df.resample('M', level='d').sum()

- The ``.get_credentials()`` method of ``GbqConnector`` can now first try to fetch `the application default credentials <https://developers.google.com/identity/protocols/application-default-credentials>`__. See the docs for more details (:issue:`13577`).
- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behavior remains to raise a ``NonExistentTimeError`` (:issue:`13057`)
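The ``errors='coerce'`` spelling is specific to this release line; current pandas expresses the same coercion with the ``nonexistent='NaT'`` keyword. A sketch using the modern spelling:

```python
import pandas as pd

# 02:30 local time does not exist on 2015-03-29 in Europe/Warsaw
# (clocks jump from 02:00 straight to 03:00), so coercion yields NaT
# instead of raising.
ts = pd.Timestamp("2015-03-29 02:30:00").tz_localize(
    "Europe/Warsaw", nonexistent="NaT")
```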
Previous behavior:

.. code-block:: ipython

   In [1]: pd.Index(['a', 'b']) + pd.Index(['a', 'c'])
   FutureWarning: using '+' to provide set union with Indexes is deprecated, use '|' or .union()
   Out[1]: Index(['a', 'b', 'c'], dtype='object')

**New behavior**: the same operation will now perform element-wise addition:

.. ipython:: python

   pd.Index(['a', 'b']) + pd.Index(['a', 'c'])

Note that numeric Index objects already performed element-wise operations.
For example, the behavior of adding two integer Indexes is unchanged.
The base ``Index`` is now made consistent with this behavior.

.. ipython:: python

   pd.Index([1, 2, 3]) + pd.Index([2, 3, 4])

Further, because of this change, it is now possible to subtract two
DatetimeIndex objects resulting in a TimedeltaIndex:
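The subtraction described above can be sketched directly:

```python
import pandas as pd

left = pd.DatetimeIndex(["2016-01-02", "2016-01-03"])
right = pd.DatetimeIndex(["2016-01-01", "2016-01-01"])
# Element-wise subtraction of two DatetimeIndex objects yields the
# differences as a TimedeltaIndex.
delta = left - right
```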
Previously, most ``Index`` classes returned ``np.ndarray``, and ``DatetimeIndex``, ``TimedeltaIndex`` and ``PeriodIndex`` returned ``Index`` to keep metadata like timezone.

**Previous behavior**:

.. code-block:: ipython

   In [1]: pd.Index([1, 2, 3]).unique()
   Out[1]: array([1, 2, 3])

   In [2]: pd.DatetimeIndex(['2011-01-01', '2011-01-02',
      ...: '2011-01-03'], tz='Asia/Tokyo').unique()
   Out[2]:
   DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00',
                  '2011-01-03 00:00:00+09:00'],
                 dtype='datetime64[ns, Asia/Tokyo]', freq=None)

**New behavior**:

.. ipython:: python

   pd.Index([1, 2, 3]).unique()
   pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'],
                    tz='Asia/Tokyo').unique()
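The type change can be checked directly; a minimal sketch:

```python
import pandas as pd

# unique() now hands back an Index of the same dtype rather than a bare
# ndarray, so metadata such as the integer dtype is preserved.
u = pd.Index([1, 2, 2, 3]).unique()
```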
.. _whatsnew_0190.api.multiindex:
Operators now preserve dtypes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. ipython:: python

   s = pd.SparseSeries([0, 2, 0, 1], fill_value=0, dtype=np.int64)
   s.dtype

   s + 1

- Sparse data structures now support ``astype`` to convert the internal ``dtype`` (:issue:`13900`)

  .. ipython:: python

     s = pd.SparseSeries([1., 0., 2., 0.], fill_value=0)
     s
     s.astype(np.int64)

  ``astype`` fails if the data contains values which cannot be converted to the specified ``dtype``.
  Note that this limitation also applies to ``fill_value``, which defaults to ``np.nan``.

  .. code-block:: ipython

     In [7]: pd.SparseSeries([1., np.nan, 2., np.nan], fill_value=np.nan).astype(np.int64)
     Out[7]:
     ValueError: unable to coerce current fill_value nan to int64 dtype
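``SparseSeries`` itself was removed in later pandas versions; on current pandas the same dtype preservation can be observed through the ``Sparse`` extension dtype. A hedged sketch of the modern equivalent:

```python
import pandas as pd

s = pd.Series([0, 2, 0, 1], dtype="Sparse[int64]")
# The arithmetic keeps the sparse int64 dtype; the fill_value shifts
# from 0 to 1 along with the data.
out = s + 1
```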
Other sparse fixes
""""""""""""""""""