forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathv0.16.0.txt
238 lines (132 loc) · 8.82 KB
/
v0.16.0.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
.. _whatsnew_0160:
v0.16.0 (February ??, 2015)
---------------------------
This is a major release from 0.15.2 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.
- Highlights include:
- Check the :ref:`API Changes <whatsnew_0160.api>` and :ref:`deprecations <whatsnew_0160.deprecations>` before updating
- :ref:`Other Enhancements <whatsnew_0160.enhancements>`
- :ref:`Performance Improvements <whatsnew_0160.performance>`
- :ref:`Bug Fixes <whatsnew_0160.bug_fixes>`
New features
~~~~~~~~~~~~
.. _whatsnew_0160.api:
.. _whatsnew_0160.api_breaking:
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _whatsnew_0160.api_breaking.timedelta:
- In v0.15.0 a new scalar type ``Timedelta`` was introduced, that is a
sub-class of ``datetime.timedelta``.
Mentioned :ref:`here <whatsnew_0150.timedeltaindex>` was a notice of an API
change w.r.t. the ``.seconds`` accessor. The intent was to provide a
user-friendly set of accessors that give the 'natural' value for that unit,
e.g. if you had a ``Timedelta('1 day, 10:11:12')``, then ``.seconds`` would
return 12. However, this is at odds with the definition of
``datetime.timedelta``, which defines ``.seconds`` as
``10 * 3600 + 11 * 60 + 12 == 36672``.
So in v0.16.0, we are restoring the API to match that of
``datetime.timedelta``. Further, the component values are still available
through the ``.components`` accessor. This affects the ``.seconds`` and
``.microseconds`` accessors, and removes the ``.hours``, ``.minutes``,
``.milliseconds`` accessors. These changes affect ``TimedeltaIndex``
and the Series ``.dt`` accessor as well. (:issue:`9185`, :issue:`9139`)
Previous Behavior
.. code-block:: python
In [2]: t = pd.Timedelta('1 day, 10:11:12.100123')
In [3]: t.days
Out[3]: 1
In [4]: t.seconds
Out[4]: 12
In [5]: t.microseconds
Out[5]: 123
New Behavior
.. ipython:: python
t = pd.Timedelta('1 day, 10:11:12.100123')
t.days
t.seconds
t.microseconds
Using ``.components`` allows the full component access
.. ipython:: python
t.components
t.components.seconds
- ``Index.duplicated`` now returns `np.array(dtype=bool)` rather than `Index(dtype=object)` containing `bool` values. (:issue:`8875`)
- ``DataFrame.to_json`` now returns accurate type serialisation for each column for frames of mixed dtype (:issue:`9037`)
Previously data was coerced to a common dtype before serialisation, which for
example resulted in integers being serialised to floats:
.. code-block:: python
In [2]: pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1.0,"1":2.0}}'
Now each column is serialised using its correct dtype:
.. code-block:: python
In [2]: pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1,"1":2}}'
- ``DatetimeIndex``, ``PeriodIndex`` and ``TimedeltaIndex.summary`` now output the same format. (:issue:`9116`)
- ``TimedeltaIndex.freqstr`` now output the same string format as ``DatetimeIndex``. (:issue:`9116`)
- Bar and horizontal bar plots no longer add a dashed line along the info axis.
The prior style can be achieved with matplotlib's `axhline` or `axvline`
methods (:issue:`9088`).
Deprecations
~~~~~~~~~~~~
.. _whatsnew_0160.deprecations:
Enhancements
~~~~~~~~~~~~
.. _whatsnew_0160.enhancements:
- Paths beginning with ~ will now be expanded to begin with the user's home directory (:issue:`9066`)
- Added time interval selection in get_data_yahoo (:issue:`9071`)
- Added ``Series.str.slice_replace()``, which previously raised NotImplementedError (:issue:`8888`)
- Added ``Timestamp.to_datetime64()`` to complement ``Timedelta.to_timedelta64()`` (:issue:`9255`)
- ``tseries.frequencies.to_offset()`` now accepts ``Timedelta`` as input (:issue:`9064`)
- Lag parameter was added to the autocorrelation method of Series, defaults to lag-1 autocorrelation (:issue:`9192`)
- ``Timedelta`` will now accept nanoseconds keyword in constructor (:issue:`9273`)
- SQL code now safely escapes table and column names (:issue:`8986`)
- Added auto-complete for ``Series.str.<tab>``, ``Series.dt.<tab>`` and ``Series.cat.<tab>`` (:issue:`9322`)
- Added ``StringMethods.isalnum()``, ``isalpha()``, ``isdigit()``, ``isspace()``, ``islower()``,
``isupper()``, ``istitle()`` which behave as the same as standard ``str`` (:issue:`9282`)
- Added ``StringMethods.ljust()`` and ``rjust()`` which behave as the same as standard ``str`` (:issue:`9352`)
- ``StringMethods.pad()`` and ``center()`` now accept `fillchar` option to specify filling character (:issue:`9352`)
Performance
~~~~~~~~~~~
.. _whatsnew_0160.performance:
- Fixed a performance regression for ``.loc`` indexing with an array or list-like (:issue:`9126`:).
- ``DataFrame.to_json`` 30x performance improvement for mixed dtype frames. (:issue:`9037`)
- Performance improvements in ``MultiIndex.duplicated`` by working with labels instead of values (:issue:`9125`)
- Improved the speed of `nunique` by calling `unique` instead of `value_counts` (:issue:`9129`, :issue:`7771`)
- Performance improvement of up to 10x in ``DataFrame.count`` and ``DataFrame.dropna`` by taking advantage of homogeneous/heterogeneous dtypes appropriately (:issue:`9136`)
- Performance improvement of up to 20x in ``DataFrame.count`` when using a ``MultiIndex`` and the ``level`` keyword argument (:issue:`9163`)
- Performance and memory usage improvements in ``merge`` when key space exceeds ``int64`` bounds (:issue:`9151`)
Bug Fixes
~~~~~~~~~
.. _whatsnew_0160.bug_fixes:
- Fixed issue using `read_csv` on s3 with Python 3.
- Fixed compatibility issue in ``DatetimeIndex`` affecting architectures where ``numpy.int_`` defaults to ``numpy.int32`` (:issue:`8943`)
- Bug in Panel indexing with an object-like (:issue:`9140`)
- Bug in the returned ``Series.dt.components`` index was reset to the default index (:issue:`9247`)
- Fixed bug in ``to_sql`` when mapping a Timestamp object column (datetime
column with timezone info) to the according sqlalchemy type (:issue:`9085`).
- Fixed bug in ``to_sql`` ``dtype`` argument not accepting an instantiated
SQLAlchemy type (:issue:`9083`).
- Fixed bug on bug endian platforms which produced incorrect results in ``StataReader`` (:issue:`8688`).
- Bug in ``MultiIndex.has_duplicates`` when having many levels causes an indexer overflow (:issue:`9075`, :issue:`5873`)
- Bug in ``pivot`` and `unstack`` where ``nan`` values would break index alignment (:issue:`4862`, :issue:`7401`, :issue:`7403`, :issue:`7405`, :issue:`7466`)
- Bug in left ``join`` on multi-index with ``sort=True`` or null values (:issue:`9210`).
- Bug in ``MultiIndex`` where inserting new keys would fail (:issue:`9250`).
- Bug in ``groupby`` when key space exceeds ``int64`` bounds (:issue:`9096`).
- Fixed character encoding bug in ``read_stata`` and ``StataReader`` when loading data from a URL (:issue:`9231`).
- Bug in adding ``offsets.Nano`` to other offets raises ``TypeError`` (:issue:`9284`)
- Bug in DatetimeIndex iteration, related to (:issue:`8890`), fixed in (:issue:`9100`)
- Bug in boxplot, scatter and hexbin plot may show an unnecessary warning (:issue:`8877`)
- Bug in using grouper functions that need passed thru arguments (e.g. axis), when using wrapped function (e.g. ``fillna``), (:issue:`9221`)
- DataFrame now properly supports simultaneous ``copy`` and ``dtype`` arguments in constructor (:issue:`9099`)
- Bug in read_csv when using skiprows on a file with CR line endings with the c engine. (:issue:`9079`)
- isnull now detects ``NaT`` in PeriodIndex (:issue:`9129`)
- Bug in groupby ``.nth()`` with a multiple column groupby (:issue:`8979`)
- Bug in ``DataFrame.where`` and ``Series.where`` coerce numerics to string incorrectly (:issue:`9280`)
- Bug in ``DataFrame.where`` and ``Series.where`` raise ``ValueError`` when string list-like is passed. (:issue:`9280`)
- Accessing ``Series.str`` methods on with non-string values now raises ``TypeError`` instead of producing incorrect results (:issue:`9184`)
- Fixed division by zero error for ``Series.kurt()`` when all values are equal (:issue:`9197`)
- Fixed issue in the ``xlsxwriter`` engine where it added a default 'General' format to cells if no other format wass applied. This prevented other row or column formatting being applied. (:issue:`9167`)
- Fixes issue with ``index_col=False`` when ``usecols`` is also specified in ``read_csv``. (:issue:`9082`)
- Bug where ``wide_to_long`` would modify the input stubnames list (:issue:`9204`)
- Bug in to_sql not storing float64 values using double precision. (:issue:`9009`)
- ``SparseSeries`` and ``SparsePanel`` now accept zero argument constructors (same as their non-sparse counterparts) (:issue:`9272`).