forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathv0.14.1.txt
271 lines (227 loc) · 14.8 KB
/
v0.14.1.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
.. _whatsnew_0141:
v0.14.1 (July 11, 2014)
-----------------------
This is a minor release from 0.14.0 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.
- Highlights include:
- New methods :meth:`~pandas.DataFrame.select_dtypes` to select columns
based on the dtype and :meth:`~pandas.Series.sem` to calculate the
standard error of the mean.
- Support for dateutil timezones (see :ref:`docs <timeseries.timezone>`).
- Support for ignoring full line comments in the :func:`~pandas.read_csv`
text parser.
- New documentation section on :ref:`Options and Settings <options>`.
- Lots of bug fixes.
- :ref:`Enhancements <whatsnew_0141.enhancements>`
- :ref:`API Changes <whatsnew_0141.api>`
- :ref:`Performance Improvements <whatsnew_0141.performance>`
- :ref:`Experimental Changes <whatsnew_0141.experimental>`
- :ref:`Bug Fixes <whatsnew_0141.bug_fixes>`
.. _whatsnew_0141.api:
API changes
~~~~~~~~~~~
- Openpyxl now raises a ValueError on construction of the openpyxl writer
instead of warning on pandas import (:issue:`7284`).
- For ``StringMethods.extract``, when no match is found, the result - only
containing ``NaN`` values - now also has ``dtype=object`` instead of
``float`` (:issue:`7242`)
- ``Period`` objects no longer raise a ``TypeError`` when compared using ``==``
with another object that *isn't* a ``Period``. Instead
when comparing a ``Period`` with another object using ``==`` if the other
object isn't a ``Period`` ``False`` is returned. (:issue:`7376`)
- Previously, the behaviour on resetting the time or not in
``offsets.apply``, ``rollforward`` and ``rollback`` operations differed
between offsets. With the support of the ``normalize`` keyword for all offsets(see
below) with a default value of False (preserve time), the behaviour changed for certain
offsets (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd,
BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, LastWeekOfMonth, Easter):
.. code-block:: ipython
In [6]: from pandas.tseries import offsets
In [7]: d = pd.Timestamp('2014-01-01 09:00')
# old behaviour < 0.14.1
In [8]: d + offsets.MonthEnd()
Out[8]: Timestamp('2014-01-31 00:00:00')
Starting from 0.14.1 all offsets preserve time by default. The old
behaviour can be obtained with ``normalize=True``
.. ipython:: python
:suppress:
import pandas.tseries.offsets as offsets
d = pd.Timestamp('2014-01-01 09:00')
.. ipython:: python
# new behaviour
d + offsets.MonthEnd()
d + offsets.MonthEnd(normalize=True)
Note that for the other offsets the default behaviour did not change.
- Add back ``#N/A N/A`` as a default NA value in text parsing, (regresion from 0.12) (:issue:`5521`)
- Raise a ``TypeError`` on inplace-setting with a ``.where`` and a non ``np.nan`` value as this is inconsistent
with a set-item expression like ``df[mask] = None`` (:issue:`7656`)
.. _whatsnew_0141.enhancements:
Enhancements
~~~~~~~~~~~~
- Add ``dropna`` argument to ``value_counts`` and ``nunique`` (:issue:`5569`).
- Add :meth:`~pandas.DataFrame.select_dtypes` method to allow selection of
columns based on dtype (:issue:`7316`). See :ref:`the docs <basics.selectdtypes>`.
- All ``offsets`` suppports the ``normalize`` keyword to specify whether
``offsets.apply``, ``rollforward`` and ``rollback`` resets the time (hour,
minute, etc) or not (default ``False``, preserves time) (:issue:`7156`):
.. ipython:: python
import pandas.tseries.offsets as offsets
day = offsets.Day()
day.apply(Timestamp('2014-01-01 09:00'))
day = offsets.Day(normalize=True)
day.apply(Timestamp('2014-01-01 09:00'))
- ``PeriodIndex`` is represented as the same format as ``DatetimeIndex`` (:issue:`7601`)
- ``StringMethods`` now work on empty Series (:issue:`7242`)
- The file parsers ``read_csv`` and ``read_table`` now ignore line comments provided by
the parameter `comment`, which accepts only a single character for the C reader.
In particular, they allow for comments before file data begins (:issue:`2685`)
- Add ``NotImplementedError`` for simultaneous use of ``chunksize`` and ``nrows``
for read_csv() (:issue:`6774`).
- Tests for basic reading of public S3 buckets now exist (:issue:`7281`).
- ``read_html`` now sports an ``encoding`` argument that is passed to the
underlying parser library. You can use this to read non-ascii encoded web
pages (:issue:`7323`).
- ``read_excel`` now supports reading from URLs in the same way
that ``read_csv`` does. (:issue:`6809`)
- Support for dateutil timezones, which can now be used in the same way as
pytz timezones across pandas. (:issue:`4688`)
.. ipython:: python
rng = date_range('3/6/2012 00:00', periods=10, freq='D',
tz='dateutil/Europe/London')
rng.tz
See :ref:`the docs <timeseries.timezone>`.
- Implemented ``sem`` (standard error of the mean) operation for ``Series``,
``DataFrame``, ``Panel``, and ``Groupby`` (:issue:`6897`)
- Add ``nlargest`` and ``nsmallest`` to the ``Series`` ``groupby`` whitelist,
which means you can now use these methods on a ``SeriesGroupBy`` object
(:issue:`7053`).
- All offsets ``apply``, ``rollforward`` and ``rollback`` can now handle ``np.datetime64``, previously results in ``ApplyTypeError`` (:issue:`7452`)
- ``Period`` and ``PeriodIndex`` can contain ``NaT`` in its values (:issue:`7485`)
- Support pickling ``Series``, ``DataFrame`` and ``Panel`` objects with
non-unique labels along *item* axis (``index``, ``columns`` and ``items``
respectively) (:issue:`7370`).
- Improved inference of datetime/timedelta with mixed null objects. Regression from 0.13.1 in interpretation of an object Index
with all null elements (:issue:`7431`)
.. _whatsnew_0141.performance:
Performance
~~~~~~~~~~~
- Improvements in dtype inference for numeric operations involving yielding performance gains for dtypes: ``int64``, ``timedelta64``, ``datetime64`` (:issue:`7223`)
- Improvements in Series.transform for significant performance gains (:issue:`6496`)
- Improvements in DataFrame.transform with ufuncs and built-in grouper functions for signifcant performance gains (:issue:`7383`)
- Regression in groupby aggregation of datetime64 dtypes (:issue:`7555`)
- Improvements in `MultiIndex.from_product` for large iterables (:issue:`7627`)
.. _whatsnew_0141.experimental:
Experimental
~~~~~~~~~~~~
- ``pandas.io.data.Options`` has a new method, ``get_all_data`` method, and now consistently returns a
multi-indexed ``DataFrame`` (:issue:`5602`)
- ``io.gbq.read_gbq`` and ``io.gbq.to_gbq`` were refactored to remove the
dependency on the Google ``bq.py`` command line client. This submodule
now uses ``httplib2`` and the Google ``apiclient`` and ``oauth2client`` API client
libraries which should be more stable and, therefore, reliable than
``bq.py``. See :ref:`the docs <io.bigquery>`. (:issue:`6937`).
.. _whatsnew_0141.bug_fixes:
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.where`` with a symmetric shaped frame and a passed other of a DataFrame (:issue:`7506`)
- Bug in Panel indexing with a multi-index axis (:issue:`7516`)
- Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (:issue:`7523`)
- Bug in setitem with list-of-lists and single vs mixed types (:issue:`7551`:)
- Bug in timeops with non-aligned Series (:issue:`7500`)
- Bug in timedelta inference when assigning an incomplete Series (:issue:`7592`)
- Bug in groupby ``.nth`` with a Series and integer-like column name (:issue:`7559`)
- Bug in ``Series.get`` with a boolean accessor (:issue:`7407`)
- Bug in ``value_counts`` where ``NaT`` did not qualify as missing (``NaN``) (:issue:`7423`)
- Bug in ``to_timedelta`` that accepted invalid units and misinterpreted 'm/h' (:issue:`7611`, :issue:`6423`)
- Bug in line plot doesn't set correct ``xlim`` if ``secondary_y=True`` (:issue:`7459`)
- Bug in grouped ``hist`` and ``scatter`` plots use old ``figsize`` default (:issue:`7394`)
- Bug in plotting subplots with ``DataFrame.plot``, ``hist`` clears passed ``ax`` even if the number of subplots is one (:issue:`7391`).
- Bug in plotting subplots with ``DataFrame.boxplot`` with ``by`` kw raises ``ValueError`` if the number of subplots exceeds 1 (:issue:`7391`).
- Bug in subplots displays ``ticklabels`` and ``labels`` in different rule (:issue:`5897`)
- Bug in ``Panel.apply`` with a multi-index as an axis (:issue:`7469`)
- Bug in ``DatetimeIndex.insert`` doesn't preserve ``name`` and ``tz`` (:issue:`7299`)
- Bug in ``DatetimeIndex.asobject`` doesn't preserve ``name`` (:issue:`7299`)
- Bug in multi-index slicing with datetimelike ranges (strings and Timestamps), (:issue:`7429`)
- Bug in ``Index.min`` and ``max`` doesn't handle ``nan`` and ``NaT`` properly (:issue:`7261`)
- Bug in ``PeriodIndex.min/max`` results in ``int`` (:issue:`7609`)
- Bug in ``resample`` where ``fill_method`` was ignored if you passed ``how`` (:issue:`2073`)
- Bug in ``TimeGrouper`` doesn't exclude column specified by ``key`` (:issue:`7227`)
- Bug in ``DataFrame`` and ``Series`` bar and barh plot raises ``TypeError`` when ``bottom``
and ``left`` keyword is specified (:issue:`7226`)
- Bug in ``DataFrame.hist`` raises ``TypeError`` when it contains non numeric column (:issue:`7277`)
- Bug in ``Index.delete`` does not preserve ``name`` and ``freq`` attributes (:issue:`7302`)
- Bug in ``DataFrame.query()``/``eval`` where local string variables with the @
sign were being treated as temporaries attempting to be deleted
(:issue:`7300`).
- Bug in ``Float64Index`` which didn't allow duplicates (:issue:`7149`).
- Bug in ``DataFrame.replace()`` where truthy values were being replaced
(:issue:`7140`).
- Bug in ``StringMethods.extract()`` where a single match group Series
would use the matcher's name instead of the group name (:issue:`7313`).
- Bug in ``isnull()`` when ``mode.use_inf_as_null == True`` where isnull
wouldn't test ``True`` when it encountered an ``inf``/``-inf``
(:issue:`7315`).
- Bug in inferred_freq results in None for eastern hemisphere timezones (:issue:`7310`)
- Bug in ``Easter`` returns incorrect date when offset is negative (:issue:`7195`)
- Bug in broadcasting with ``.div``, integer dtypes and divide-by-zero (:issue:`7325`)
- Bug in ``CustomBusinessDay.apply`` raiases ``NameError`` when ``np.datetime64`` object is passed (:issue:`7196`)
- Bug in ``MultiIndex.append``, ``concat`` and ``pivot_table`` don't preserve timezone (:issue:`6606`)
- Bug in ``.loc`` with a list of indexers on a single-multi index level (that is not nested) (:issue:`7349`)
- Bug in ``Series.map`` when mapping a dict with tuple keys of different lengths (:issue:`7333`)
- Bug all ``StringMethods`` now work on empty Series (:issue:`7242`)
- Fix delegation of `read_sql` to `read_sql_query` when query does not contain 'select' (:issue:`7324`).
- Bug where a string column name assignment to a ``DataFrame`` with a
``Float64Index`` raised a ``TypeError`` during a call to ``np.isnan``
(:issue:`7366`).
- Bug where ``NDFrame.replace()`` didn't correctly replace objects with
``Period`` values (:issue:`7379`).
- Bug in ``.ix`` getitem should always return a Series (:issue:`7150`)
- Bug in multi-index slicing with incomplete indexers (:issue:`7399`)
- Bug in multi-index slicing with a step in a sliced level (:issue:`7400`)
- Bug where negative indexers in ``DatetimeIndex`` were not correctly sliced
(:issue:`7408`)
- Bug where ``NaT`` wasn't repr'd correctly in a ``MultiIndex`` (:issue:`7406`,
:issue:`7409`).
- Bug where bool objects were converted to ``nan`` in ``convert_objects``
(:issue:`7416`).
- Bug in ``quantile`` ignoring the axis keyword argument (:issue`7306`)
- Bug where ``nanops._maybe_null_out`` doesn't work with complex numbers
(:issue:`7353`)
- Bug in several ``nanops`` functions when ``axis==0`` for
1-dimensional ``nan`` arrays (:issue:`7354`)
- Bug where ``nanops.nanmedian`` doesn't work when ``axis==None``
(:issue:`7352`)
- Bug where ``nanops._has_infs`` doesn't work with many dtypes
(:issue:`7357`)
- Bug in ``StataReader.data`` where reading a 0-observation dta failed (:issue:`7369`)
- Bug in ``StataReader`` when reading Stata 13 (117) files containing fixed width strings (:issue:`7360`)
- Bug in ``StataWriter`` where encoding was ignored (:issue:`7286`)
- Bug in ``DatetimeIndex`` comparison doesn't handle ``NaT`` properly (:issue:`7529`)
- Bug in passing input with ``tzinfo`` to some offsets ``apply``, ``rollforward`` or ``rollback`` resets ``tzinfo`` or raises ``ValueError`` (:issue:`7465`)
- Bug in ``DatetimeIndex.to_period``, ``PeriodIndex.asobject``, ``PeriodIndex.to_timestamp`` doesn't preserve ``name`` (:issue:`7485`)
- Bug in ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestanp`` handle ``NaT`` incorrectly (:issue:`7228`)
- Bug in ``offsets.apply``, ``rollforward`` and ``rollback`` may return normal ``datetime`` (:issue:`7502`)
- Bug in ``resample`` raises ``ValueError`` when target contains ``NaT`` (:issue:`7227`)
- Bug in ``Timestamp.tz_localize`` resets ``nanosecond`` info (:issue:`7534`)
- Bug in ``DatetimeIndex.asobject`` raises ``ValueError`` when it contains ``NaT`` (:issue:`7539`)
- Bug in ``Timestamp.__new__`` doesn't preserve nanosecond properly (:issue:`7610`)
- Bug in ``Index.astype(float)`` where it would return an ``object`` dtype
``Index`` (:issue:`7464`).
- Bug in ``DataFrame.reset_index`` loses ``tz`` (:issue:`3950`)
- Bug in ``DatetimeIndex.freqstr`` raises ``AttributeError`` when ``freq`` is ``None`` (:issue:`7606`)
- Bug in ``GroupBy.size`` created by ``TimeGrouper`` raises ``AttributeError`` (:issue:`7453`)
- Bug in single column bar plot is misaligned (:issue:`7498`).
- Bug in area plot with tz-aware time series raises ``ValueError`` (:issue:`7471`)
- Bug in non-monotonic ``Index.union`` may preserve ``name`` incorrectly (:issue:`7458`)
- Bug in ``DatetimeIndex.intersection`` doesn't preserve timezone (:issue:`4690`)
- Bug in ``rolling_var`` where a window larger than the array would raise an error(:issue:`7297`)
- Bug with last plotted timeseries dictating ``xlim`` (:issue:`2960`)
- Bug with ``secondary_y`` axis not being considered for timeseries ``xlim`` (:issue:`3490`)
- Bug in ``Float64Index`` assignment with a non scalar indexer (:issue:`7586`)
- Bug in ``pandas.core.strings.str_contains`` does not properly match in a case insensitive fashion when ``regex=False`` and ``case=False`` (:issue:`7505`)
- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, and ``rolling_corr`` for two arguments with mismatched index (:issue:`7512`)
- Bug in ``to_sql`` taking the boolean column as text column (:issue:`7678`)
- Bug in grouped `hist` doesn't handle `rot` kw and `sharex` kw properly (:issue:`7234`)
- Bug in ``.loc`` performing fallback integer indexing with ``object`` dtype indices (:issue:`7496`)
- Bug (regression) in ``PeriodIndex`` constructor when passed ``Series`` objects (:issue:`7701`).