.. _whatsnew_0130:

v0.13.0 (August ??, 2013)
-------------------------

This is a major release from 0.12.0 and includes several new features and
enhancements along with a large number of bug fixes.

.. warning::

   In 0.13.0 ``Series`` has internally been refactored to no longer subclass ``ndarray``
   but instead subclass ``NDFrame``, similarly to the rest of the pandas containers. This should be
   a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`.

API changes
~~~~~~~~~~~

  - ``read_excel`` now supports an integer in its ``sheetname`` argument giving
    the index of the sheet to read in (:issue:`4301`).
  - Text parser now treats anything that reads like inf ("inf", "Inf", "-Inf",
    "iNf", etc.) as infinity (:issue:`4220`, :issue:`4219`), affecting
    ``read_table``, ``read_csv``, etc.
  - ``pandas`` is now Python 2/3 compatible without the need for 2to3 thanks to
    @jtratner. As a result, pandas now uses iterators more extensively. This
    also led to the introduction of substantive parts of Benjamin
    Peterson's ``six`` library into compat. (:issue:`4384`, :issue:`4375`,
    :issue:`4372`)
  - ``pandas.util.compat`` and ``pandas.util.py3compat`` have been merged into
    ``pandas.compat``. ``pandas.compat`` now includes many functions allowing
    2/3 compatibility. It contains both list and iterator versions of range,
    filter, map and zip, plus other necessary elements for Python 3
    compatibility. ``lmap``, ``lzip``, ``lrange`` and ``lfilter`` all produce
    lists instead of iterators, for compatibility with ``numpy``, subscripting
    and ``pandas`` constructors. (:issue:`4384`, :issue:`4375`, :issue:`4372`)
  - Deprecated ``iterkv``, which will be removed in a future release (it was just
    an alias of ``iteritems`` used to get around ``2to3``'s changes).
    (:issue:`4384`, :issue:`4375`, :issue:`4372`)
  - ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)
  - ``HDFStore``

    - Significant table writing performance improvements
    - added an ``is_open`` property to indicate if the underlying file handle is open;
      a closed store will now report 'CLOSED' when viewing the store (rather than raising an error)
      (:issue:`4409`)
    - closing an ``HDFStore`` now closes that instance of the ``HDFStore``,
      but only closes the actual file if the ref count (by ``PyTables``) w.r.t. all of the open handles
      is 0. Essentially you have a local instance of ``HDFStore`` referenced by a variable. Once you
      close it, it will report closed. Other references (to the same file) will continue to operate
      until they themselves are closed. Performing an action on a closed file will raise
      ``ClosedFileError``

      .. ipython:: python

         path = 'test.h5'
         df = DataFrame(randn(10,2))
         store1 = HDFStore(path)
         store2 = HDFStore(path)
         store1.append('df',df)
         store2.append('df2',df)

         store1
         store2
         store1.close()
         store2
         store2.close()
         store2

    - removed the ``_quiet`` attribute, replaced by a ``DuplicateWarning`` if retrieving
      duplicate rows from a table (:issue:`4367`)
    - removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
      be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
    - allow a passed locations array or mask as a ``where`` condition (:issue:`4467`).
      See :ref:`here<io.hdf5-where_mask>` for an example.

    .. ipython:: python
       :suppress:

       import os
       os.remove(path)

  - Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
    ``labels``, and ``names``) (:issue:`4039`):

    .. code-block:: python

        # previously, you would have set levels or labels directly
        index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]

        # now, you use the set_levels or set_labels methods
        index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])

        # similarly, for names, you can rename the object
        # but setting names is not deprecated.
        index = index.set_names(["bob", "cranberry"])

        # and all methods take an inplace kwarg
        index.set_names(["bob", "cranberry"], inplace=True)

  - Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
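
The infinity handling described for the text parser above can be checked with a
small in-memory CSV. This is only a minimal sketch, assuming a current pandas
install imported as ``pd``; the column name ``x`` and the sample values are made
up for illustration.

.. code-block:: python

    from io import StringIO

    import numpy as np
    import pandas as pd

    # Hypothetical sample data: one numeric column mixing a finite value
    # with several spellings of infinity.
    raw = "x\n1.5\ninf\n-Inf\nInf\n"

    parsed = pd.read_csv(StringIO(raw))

    # The "inf" spellings are parsed as floating-point infinity, so the
    # column is numeric rather than object (string) dtype.
    print(parsed["x"].dtype)
    print(np.isinf(parsed["x"]).tolist())
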

Enhancements
~~~~~~~~~~~~

  - ``read_html`` now raises a ``URLError`` instead of catching and raising a
    ``ValueError`` (:issue:`4303`, :issue:`4305`)
  - Added a test for ``read_clipboard()`` and ``to_clipboard()`` (:issue:`4282`)
  - Clipboard functionality now works with PySide (:issue:`4282`)
  - Added a more informative error message when plot arguments contain
    overlapping color and style arguments (:issue:`4402`)
  - ``timedelta64[ns]`` operations

    - A Series of dtype ``timedelta64[ns]`` can now be divided by another
      ``timedelta64[ns]`` object to yield a ``float64`` dtyped Series. This
      is frequency conversion. See :ref:`here<timeseries.timedeltas_convert>` for the docs.

      .. ipython:: python

         from datetime import timedelta
         td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))
         td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))
         td[3] = np.nan
         td

         # to days
         td / np.timedelta64(1,'D')

         # to seconds
         td / np.timedelta64(1,'s')

    - Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series

      .. ipython:: python

         td * -1
         td * Series([1,2,3,4])

    - Absolute ``DateOffset`` objects can act equivalently to ``timedeltas``

      .. ipython:: python

         from pandas import offsets
         td + offsets.Minute(5) + offsets.Milli(5)
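
The ``timedelta64[ns]`` operations listed above can also be tried outside the
documentation build. The following is a minimal, self-contained sketch with
explicit imports, assuming a current pandas imported as ``pd``; the variable
names are illustrative only.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Four identical 31-day gaps (1 Dec 2012 -> 1 Jan 2013, and so on).
    td = (pd.Series(pd.date_range("20130101", periods=4))
          - pd.Series(pd.date_range("20121201", periods=4)))

    # Dividing by a one-unit timedelta converts to a float64 count of that unit.
    in_days = td / np.timedelta64(1, "D")
    in_seconds = td / np.timedelta64(1, "s")

    # Multiplying by an integer scales each timedelta.
    doubled = td * 2

    # A fixed-size offset such as Minute behaves like a timedelta when added.
    shifted = td + pd.offsets.Minute(5)

    print(in_days.tolist())
    print(doubled.iloc[0], shifted.iloc[0])
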

.. _whatsnew_0130.refactoring:

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~

In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
which is currently the base class for ``DataFrame`` and ``Panel``, to unify methods
and behaviors. Series formerly subclassed directly from ``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)

.. warning::

   The following are potential incompatibilities with versions < 0.13.0:

   - Using certain numpy functions would previously return a ``Series`` if passed a ``Series``
     as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``, and
     ``np.diff``. These now return ``ndarrays``.

     .. ipython:: python

        s = Series([1,2,3,4])

        # numpy usage
        np.ones_like(s)
        np.diff(s)

        # pandonic usage
        Series(1,index=s.index)
        s.diff()

   - Passing a ``Series`` directly to a cython function expecting an ``ndarray`` type will no
     longer work directly; you must pass ``Series.values``. See :ref:`Enhancing Performance<enhancingperf.ndarray>`

   - ``Series(0.5)`` would previously return the scalar ``0.5``; it now returns a 1-element ``Series``
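
The last two incompatibilities can be illustrated with a short sketch. It
assumes a current pandas imported as ``pd``; ``total_of`` is a hypothetical
stand-in for a compiled routine that only accepts a plain ``ndarray``, not a
real pandas or Cython function.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def total_of(arr):
        """Hypothetical stand-in for a routine that requires a plain ndarray."""
        if type(arr) is not np.ndarray:
            raise TypeError("expected a numpy ndarray")
        return float(arr.sum())


    s = pd.Series([1, 2, 3, 4])

    # A Series is no longer an ndarray subclass, so hand over the raw values.
    result = total_of(s.values)

    # Passing the Series itself is rejected by the strict type check.
    try:
        total_of(s)
        rejected = False
    except TypeError:
        rejected = True

    # A scalar passed to the constructor yields a 1-element Series, not a scalar.
    single = pd.Series(0.5)

    print(result, rejected, len(single))
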

  - Refactor of series.py/frame.py/panel.py to move common code to generic.py

    - added ``_setup_axes`` to create generic NDFrame structures
    - moved methods

      - ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
      - ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
      - ``convert_objects,as_blocks,as_matrix,values``
      - ``__getstate__,__setstate__`` (compat remains in frame/panel)
      - ``__getattr__,__setattr__``
      - ``_indexed_same,reindex_like,align,where,mask``
      - ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
      - ``filter`` (also added axis argument to selectively filter on a different axis)
      - ``reindex,reindex_axis`` (which was the biggest change to make generic)
      - ``truncate`` (moved to become part of ``NDFrame``)

  - These are API changes which make ``Panel`` more consistent with ``DataFrame``

    - ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
    - support attribute access for setting
    - ``filter`` supports the same API as the original ``DataFrame`` filter

  - Reindex called with no arguments will now return a copy of the input object

  - Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
    There are several minor changes that affect the API.

    - numpy functions that do not support the array interface will now
      return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.ones_like``
    - ``Series(0.5)`` would previously return the scalar ``0.5``; this is no
      longer supported
    - ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
      can be used to distinguish (if desired)

  - Refactor of Sparse objects to use BlockManager

    - Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
      and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
      more methods from their hierarchy (Series/DataFrame), and no longer inherit
      from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
    - Sparse suite now supports integration with non-sparse data. Non-float sparse
      data is supportable (partially implemented)
    - Operations on sparse structures within DataFrames should preserve sparseness;
      merging type operations will convert to dense (and back to sparse), so might
      be somewhat inefficient
    - enable setitem on ``SparseSeries`` for boolean/integer/slices
    - ``SparsePanels`` implementation is unchanged (i.e. not using BlockManager, needs work)
    - added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
      if the underlying data is sparse/dense (as well as the dtype)

  - All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate various
    values to propagate to a new object from an existing one (e.g. name in ``Series`` will follow
    more automatically now)
  - Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
    without having to directly import the klass, courtesy of @jtratner
  - Bug in Series update where the parent frame is not updating its cache based on
    changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)
  - Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
  - Refactor Series.reindex to core/generic.py (:issue:`4604`, :issue:`4618`), allow ``method=`` in reindexing
    on a Series to work
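
The ``method=`` support when reindexing a Series, mentioned in the last item,
can be sketched as follows. This assumes a current pandas imported as ``pd``;
the index labels and values are illustrative only.

.. code-block:: python

    import pandas as pd

    # Values observed at labels 0, 2 and 4 only.
    s = pd.Series([1.0, 2.0, 3.0], index=[0, 2, 4])

    # Without a fill method, new labels get NaN.
    plain = s.reindex(range(6))

    # With method="ffill", each new label takes the last preceding observation.
    filled = s.reindex(range(6), method="ffill")

    print(plain.isna().tolist())
    print(filled.tolist())
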

Bug Fixes
~~~~~~~~~

  - ``HDFStore`` raising an invalid ``TypeError`` rather than ``ValueError`` when appending
    with a different block ordering (:issue:`4096`)
  - The ``by`` argument now works correctly with the ``layout`` argument
    (:issue:`4102`, :issue:`4014`) in ``*.hist`` plotting methods
  - Fixed bug in ``PeriodIndex.map`` where using ``str`` would return the str
    representation of the index (:issue:`4136`)
  - Fixed bug in ``pivot_table`` where margins were not computed if values is the index (:issue:`3334`)
  - Fixed test failure ``test_time_series_plot_color_with_empty_kwargs`` when
    using custom matplotlib default colors (:issue:`4345`)
  - Fix running of stata IO tests. Now uses temporary files to write
    (:issue:`4353`)
  - Fixed an issue where ``DataFrame.sum`` was slower than ``DataFrame.mean``
    for integer valued frames (:issue:`4365`)
  - ``read_html`` tests now work with Python 2.6 (:issue:`4351`)
  - Fixed bug where ``network`` testing was throwing ``NameError`` because a
    local variable was undefined (:issue:`4381`)
  - Suppressed ``DeprecationWarning`` associated with internal calls issued by ``repr()`` (:issue:`4391`)

See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.