-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
/
Copy pathv0.7.0.txt
166 lines (135 loc) · 6.09 KB
/
v0.7.0.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
.. _whatsnew_0700:
v.0.7.0 (Not Yet Released)
--------------------------
New features
~~~~~~~~~~~~
- You can now :ref:`set multiple columns <indexing.columns.multiple>` in a
DataFrame via ``__getitem__``, useful for transformation (GH342_)
- New ``merge`` :ref:`function <merging.join>` for efficiently performing full
gamut of database / relational-algebra operations. Refactored existing join
methods to use the new infrastructure, resulting in substantial performance
gains (GH220_, GH249_, GH267_)
- Handle differently-indexed output values in ``DataFrame.apply`` (GH498_)
.. ipython:: python
df = DataFrame(randn(10, 4))
df.apply(lambda x: x.describe())
- :ref:`Can<basics.dataframe.from_list_of_dicts>` pass list of dicts (e.g., a
list of shallow JSON objects) to DataFrame constructor (GH526_)
- :ref:`Add<indexing.reorderlevels>` ``reorder_levels`` method to Series and
DataFrame (PR534_)
- :ref:`Add<indexing.dictionarylike>` dict-like ``get`` function to DataFrame and Panel (PR521_)
- :ref:`Add<basics.iterrows>` ``DataFrame.iterrows`` method for efficiently
iterating through the rows of a DataFrame
- :ref:`Add<dsintro.to_panel>` ``DataFrame.to_panel`` with code adapted from
``LongPanel.to_long``
- :ref:`Add <basics.reindexing>` ``reindex_axis`` method added to DataFrame
- :ref:`Add <basics.stats>` ``level`` option to binary arithmetic functions on
``DataFrame`` and ``Series``
- :ref:`Add <indexing.advanced_reindex>` ``level`` option to the ``reindex``
and ``align`` methods on Series and DataFrame for broadcasting values across
a level (GH542_, PR552_, others)
- :ref:`Add <dsintro.panel_item_selection>` attribute-based item access to
``Panel`` and add IPython completion (PR563_)
- :ref:`Add <visualization.basic>` ``logy`` option to ``Series.plot`` for
log-scaling on the Y axis
- :ref:`Add <io.formatting>` ``index`` and ``header`` options to
``DataFrame.to_string``
Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
:ref:`Cythonized GroupBy aggregations <groupby.aggregate.cython>` no longer
presort the data, thus achieving a significant speedup (GH93_). Here's a
graph of the performance of this operation over time on a dataset with 100,000
rows and 10,000 unique groups:
.. .. code-block:: ipython
.. In [5]: df
.. Out[5]:
.. <class 'pandas.core.frame.DataFrame'>
.. Int64Index: 100000 entries, 0 to 99999
.. Data columns:
.. data 100000 non-null values
.. key1 100000 non-null values
.. key2 100000 non-null values
.. dtypes: float64(1), object(2)
.. In [6]: df[:10]
.. Out[6]:
.. data key1 key2
.. 0 2.708 4 1
.. 1 -1.945 2 4
.. 2 -1.123 2 0
.. 3 0.09592 0 0
.. 4 2.328 0 3
.. 5 -0.6481 0 0
.. 6 0.2957 3 2
.. 7 0.7336 2 1
.. 8 0.371 2 4
.. 9 1.009 2 4
.. In [6]: df.groupby(['key1', 'key2']).sum()
.. Out[6]:
.. data
.. key1 key2
.. 0 0 114
.. 1 -14.69
.. 2 89.06
.. 3 -61.31
.. 4 -32.24
.. 1 0 57.91
.. 1 -16.08
.. 2 -46.51
.. 3 15.46
.. 4 18.96
.. ...
.. image:: vbench/figures/groupby_multi_cython.png
:width: 6in
On this similar vein,
GroupBy aggregations with Python functions significantly sped up by clever
manipulation of the ndarray data type in Cython (GH496_). Benchmark of a
similar operation to the above but using a Python function:
.. image:: vbench/figures/groupby_multi_python.png
:width: 6in
- Better error message in DataFrame constructor when passed column labels
don't match data (GH497_)
- Substantially improve performance of multi-GroupBy aggregation when a
Python function is passed, reuse ndarray object in Cython (GH496_)
- Can store objects indexed by tuples and floats in HDFStore (GH492_)
- Don't print length by default in Series.to_string, add `length` option (GH489_)
- Improve Cython code for multi-groupby to aggregate without having to sort
the data (GH93_)
- Improve MultiIndex reindexing speed by storing tuples in the MultiIndex,
test for backwards unpickling compatibility
- Improve column reindexing performance by using specialized Cython take
function
- Further performance tweaking of Series.__getitem__ for standard use cases
- Avoid Index dict creation in some cases (i.e. when getting slices, etc.),
regression from prior versions
- Friendlier error message in setup.py if NumPy not installed
- Use common set of NA-handling operations (sum, mean, etc.) in Panel class
also (GH536_)
- Default name assignment when calling ``reset_index`` on DataFrame with a
regular (non-hierarchical) index (GH476_)
- Use Cythonized groupers when possible in Series/DataFrame stat ops with
``level`` parameter passed (GH545_)
- Ported skiplist data structure to C to speed up ``rolling_median`` by about
5-10x in most typical use cases (GH374_)
.. _GH220: https://github.com/wesm/pandas/issues/220
.. _GH249: https://github.com/wesm/pandas/issues/249
.. _GH267: https://github.com/wesm/pandas/issues/267
.. _GH342: https://github.com/wesm/pandas/issues/342
.. _GH374: https://github.com/wesm/pandas/issues/374
.. _GH439: https://github.com/wesm/pandas/issues/439
.. _GH476: https://github.com/wesm/pandas/issues/476
.. _GH489: https://github.com/wesm/pandas/issues/489
.. _GH492: https://github.com/wesm/pandas/issues/492
.. _GH496: https://github.com/wesm/pandas/issues/496
.. _GH497: https://github.com/wesm/pandas/issues/497
.. _GH498: https://github.com/wesm/pandas/issues/498
.. _GH526: https://github.com/wesm/pandas/issues/526
.. _GH536: https://github.com/wesm/pandas/issues/536
.. _GH542: https://github.com/wesm/pandas/issues/542
.. _GH545: https://github.com/wesm/pandas/issues/545
.. _GH93: https://github.com/wesm/pandas/issues/93
.. _GH93: https://github.com/wesm/pandas/issues/93
.. _PR521: https://github.com/wesm/pandas/pull/521
.. _PR534: https://github.com/wesm/pandas/pull/534
.. _PR552: https://github.com/wesm/pandas/pull/552
.. _PR554: https://github.com/wesm/pandas/pull/554
.. _PR563: https://github.com/wesm/pandas/pull/563