{{ header }}
- New unified :ref:`merge function <merging.join>` for efficiently performing the full gamut of database / relational-algebra operations. Refactored existing join methods to use the new infrastructure, resulting in substantial performance gains (:issue:`220`, :issue:`249`, :issue:`267`)
- New :ref:`unified concatenation function <merging.concat>` for concatenating
  Series, DataFrame or Panel objects along an axis. Can form union or
  intersection of the other axes. Improves performance of ``Series.append`` and
  ``DataFrame.append`` (:issue:`468`, :issue:`479`, :issue:`273`)
- :ref:`Can <merging.concatenation>` pass multiple DataFrames to
  ``DataFrame.append`` to concatenate (stack) and multiple Series to
  ``Series.append`` too
- :ref:`Can <basics.dataframe.from_list_of_dicts>` pass list of dicts (e.g., a
  list of JSON objects) to DataFrame constructor (:issue:`526`)
- You can now :ref:`set multiple columns <indexing.columns.multiple>` in a
  DataFrame via ``__getitem__``, useful for transformation (:issue:`342`)
- Handle differently-indexed output values in ``DataFrame.apply`` (:issue:`498`)

.. ipython:: python

   df = pd.DataFrame(np.random.randn(10, 4))
   df.apply(lambda x: x.describe())

- :ref:`Add <advanced.reorderlevels>` ``reorder_levels`` method to Series and
  DataFrame (:issue:`534`)
- :ref:`Add <indexing.dictionarylike>` dict-like ``get`` function to DataFrame
  and Panel (:issue:`521`)
- :ref:`Add <basics.iterrows>` ``DataFrame.iterrows`` method for efficiently
  iterating through the rows of a DataFrame
- :ref:`Add <dsintro.to_panel>` ``DataFrame.to_panel`` with code adapted from
  ``LongPanel.to_long``
- :ref:`Add <basics.reindexing>` ``reindex_axis`` method to DataFrame
- :ref:`Add <basics.stats>` ``level`` option to binary arithmetic functions on
  ``DataFrame`` and ``Series``
- :ref:`Add <advanced.advanced_reindex>` ``level`` option to the ``reindex``
  and ``align`` methods on Series and DataFrame for broadcasting values across
  a level (:issue:`542`, :issue:`552`, others)
- :ref:`Add <dsintro.panel_item_selection>` attribute-based item access to
  ``Panel`` and add IPython completion (:issue:`563`)
- :ref:`Add <visualization.basic>` ``logy`` option to ``Series.plot`` for
  log-scaling on the Y axis
- :ref:`Add <io.formatting>` ``index`` and ``header`` options to
  ``DataFrame.to_string``
- :ref:`Can <merging.multiple_join>` pass multiple DataFrames to
  ``DataFrame.join`` to join on index (:issue:`115`)
- :ref:`Can <merging.multiple_join>` pass multiple Panels to ``Panel.join``
  (:issue:`115`)
- :ref:`Added <io.formatting>` ``justify`` argument to ``DataFrame.to_string``
  to allow different alignment of column headers
- :ref:`Add <groupby.attributes>` ``sort`` option to GroupBy to allow disabling
  sorting of the group keys for potential speedups (:issue:`595`)
- :ref:`Can <basics.dataframe.from_series>` pass MaskedArray to Series
  constructor (:issue:`563`)
- :ref:`Add <dsintro.panel_item_selection>` Panel item access via attributes
  and IPython completion (:issue:`554`)
- Implement ``DataFrame.lookup``, fancy-indexing analogue for retrieving values
  given a sequence of row and column labels (:issue:`338`)
- Can pass a :ref:`list of functions <groupby.aggregate.multifunc>` to
  aggregate with groupby on a DataFrame, yielding an aggregated result with
  hierarchical columns (:issue:`166`)
- Can call ``cummin`` and ``cummax`` on Series and DataFrame to get cumulative
  minimum and maximum, respectively (:issue:`647`)
- ``value_range`` added as utility function to get min and max of a dataframe
  (:issue:`288`)
- Added ``encoding`` argument to ``read_csv``, ``read_table``, ``to_csv`` and
  ``from_csv`` for non-ascii text (:issue:`717`)
- :ref:`Added <basics.stats>` ``abs`` method to pandas objects
- :ref:`Added <reshaping.pivot>` ``crosstab`` function for easily computing
  frequency tables
- :ref:`Added <indexing.set_ops>` ``isin`` method to index objects
- :ref:`Added <advanced.xs>` ``level`` argument to ``xs`` method of DataFrame.

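As a rough illustration of a few of the additions listed above (``merge``, ``concat``, aggregation with a list of functions, ``cummin``/``cummax`` and ``crosstab``), here is a minimal sketch. The frames, column names and values are invented for the example. It is written against the top-level ``pd.merge``, ``pd.concat`` and ``pd.crosstab`` functions as they exist in current pandas releases, which may differ in detail from 0.7.0.

```python
import pandas as pd

# Hypothetical example data; the names are not from the release notes.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# Relational join on a shared column: "inner" keeps keys present in both
# frames, "outer" keeps the union of keys and fills gaps with NaN.
inner = pd.merge(left, right, on="key", how="inner")
outer = pd.merge(left, right, on="key", how="outer")

# Stack objects along the row axis; join="inner" keeps only the columns
# that every input shares (the intersection of the other axis).
stacked = pd.concat([left, right], axis=0, join="inner")

# Aggregating with a list of functions yields hierarchical columns.
df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1.0, 3.0, 5.0]})
agg = df.groupby("g").agg(["min", "max"])

# Cumulative minimum and maximum of a Series.
s = pd.Series([3, 1, 4, 1, 5])
running_min = s.cummin()
running_max = s.cummax()

# Frequency table of two label sequences.
table = pd.crosstab(
    pd.Series(["x", "x", "y"], name="row"),
    pd.Series(["p", "q", "p"], name="col"),
)
```

The ``how`` and ``join`` arguments select between union and intersection behaviour, which is the "full gamut" of join types the notes refer to.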
One of the potentially riskiest API changes in 0.7.0, but also one of the most important, was a complete review of how integer indexes are handled with regard to label-based indexing. Here is an example:
.. ipython:: python

   s = pd.Series(np.random.randn(10), index=range(0, 20, 2))
   s
   s[0]
   s[2]
   s[4]

This is all exactly identical to the behavior before. However, if you ask for a
key **not** contained in the Series, in versions 0.6.1 and prior, Series would
*fall back* on a location-based lookup. This now raises a ``KeyError``:

.. code-block:: ipython

   In [2]: s[1]
   KeyError: 1

This change also has the same impact on DataFrame:

.. code-block:: ipython

   In [3]: df = pd.DataFrame(np.random.randn(8, 4), index=range(0, 16, 2))

   In [4]: df
             0       1       2        3
   0   0.88427  0.3363 -0.1787  0.03162
   2   0.14451 -0.1415  0.2504  0.58374
   4  -1.44779 -0.9186 -1.4996  0.27163
   6  -0.26598 -2.4184 -0.2658  0.11503
   8  -0.58776  0.3144 -0.8566  0.61941
   10  0.10940 -0.7175 -1.0108  0.47990
   12 -1.16919 -0.3087 -0.6049 -0.43544
   14 -0.07337  0.3410  0.0424 -0.16037

   In [5]: df.ix[3]
   KeyError: 3

In order to support purely integer-based indexing, the following methods have
been added:

.. csv-table::
    :header: "Method","Description"
    :widths: 40,60

    ``Series.iget_value(i)``, Retrieve value stored at location ``i``
    ``Series.iget(i)``, Alias for ``iget_value``
    ``DataFrame.irow(i)``, Retrieve the ``i``-th row
    ``DataFrame.icol(j)``, Retrieve the ``j``-th column
    "``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``

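The difference between label-based and location-based lookup on an integer index can be sketched as follows. This is an illustration with made-up data, not code from the release. It assumes a current pandas installation, where the ``iget_value``/``irow``/``icol`` methods are no longer available, so the positional accessors ``.iloc`` and ``.iat`` stand in for them.

```python
import pandas as pd

# Integer labels that do not coincide with positions.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 2, 4])

# Label-based lookup: 2 is a label in the index.
by_label = s[2]

# A missing integer label raises KeyError; there is no fallback to position.
try:
    s[1]
    fell_back = True
except KeyError:
    fell_back = False

# Location-based lookup (assumption: .iloc in place of iget_value).
by_position = s.iloc[1]

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=[0, 2, 4])
second_row = df.iloc[1]     # positional row, the role irow(1) plays above
second_col = df.iloc[:, 1]  # positional column, the role icol(1) plays above
cell = df.iat[1, 1]         # positional scalar, the role iget_value(1, 1) plays
```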
Label-based slicing using ``ix`` now requires that the index be sorted
(monotonic) unless both the start and endpoint are contained in the index:

.. code-block:: ipython

   In [1]: s = pd.Series(np.random.randn(6), index=list('gmkaec'))

   In [2]: s
   Out[2]:
   g   -1.182230
   m   -0.276183
   k   -0.243550
   a    1.628992
   e    0.073308
   c   -0.539890
   dtype: float64

Then this is OK:

.. code-block:: ipython

   In [3]: s.ix['k':'e']
   Out[3]:
   k   -0.243550
   a    1.628992
   e    0.073308
   dtype: float64

But this is not:

.. code-block:: ipython

   In [12]: s.ix['b':'h']
   KeyError 'b'

If the index had been sorted, the "range selection" would have been possible:

.. code-block:: ipython

   In [4]: s2 = s.sort_index()

   In [5]: s2
   Out[5]:
   a    1.628992
   c   -0.539890
   e    0.073308
   g   -1.182230
   k   -0.243550
   m   -0.276183
   dtype: float64

   In [6]: s2.ix['b':'h']
   Out[6]:
   c   -0.539890
   e    0.073308
   g   -1.182230
   dtype: float64

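The sorted-index requirement can be checked with a small sketch along the same lines. It uses ``.loc`` rather than ``ix``, on the assumption that the code runs on a current pandas release where ``ix`` has been removed. The integer values are placeholders chosen so the result is easy to read.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=list("gmkaec"))

# Both endpoints exist in the index, so slicing works even though the
# index is not sorted; the result follows the stored order.
present = s.loc["k":"e"]

# Neither 'b' nor 'h' is a label, and the index is unsorted, so there is
# no well-defined range to select.
try:
    s.loc["b":"h"]
    unsorted_ok = True
except KeyError:
    unsorted_ok = False

# After sorting, missing endpoints are located by their sort position.
s2 = s.sort_index()
ranged = s2.loc["b":"h"]
```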
As a notational convenience, you can pass a sequence of labels or a label
slice to a Series when getting and setting values via ``[]`` (i.e. the
``__getitem__`` and ``__setitem__`` methods). The behavior will be the same as
passing similar input to ``ix`` except in the case of integer indexing:

.. ipython:: python

   s = pd.Series(np.random.randn(6), index=list('acegkm'))
   s
   s[['m', 'a', 'c', 'e']]
   s['b':'l']
   s['c':'k']

In the case of integer indexes, the behavior will be exactly as before
(shadowing ``ndarray``):

.. ipython:: python

   s = pd.Series(np.random.randn(6), index=range(0, 12, 2))
   s[[4, 0, 2]]
   s[1:5]

If you wish to do indexing with sequences and slicing on an integer index with
label semantics, use ``ix``.

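To make the contrast on an integer index concrete, the sketch below slices the same Series by label and by position. It assumes a current pandas release and therefore spells the two modes explicitly as ``.loc`` (label semantics, the role ``ix`` plays in the text) and ``.iloc`` (positional semantics). The values are placeholders equal to each element's position.

```python
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], index=range(0, 12, 2))

# A list of labels selects by label, in the order requested.
by_labels = s.loc[[4, 0, 2]]

# Label slice: keeps the labels that fall between 1 and 5 (here 2 and 4).
label_slice = s.loc[1:5]

# Positional slice: keeps positions 1 through 4 (labels 2, 4, 6, 8).
pos_slice = s.iloc[1:5]
```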
- The deprecated ``LongPanel`` class has been completely removed
- If ``Series.sort`` is called on a column of a DataFrame, an exception will
  now be raised. Before it was possible to accidentally mutate a DataFrame's
  column by doing ``df[col].sort()`` instead of the side-effect free method
  ``df[col].order()`` (:issue:`316`)
- Miscellaneous renames and deprecations which will (harmlessly) raise
  ``FutureWarning``
- ``drop`` added as an optional parameter to ``DataFrame.reset_index``
  (:issue:`699`)

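The effect of the ``drop`` parameter on ``DataFrame.reset_index`` can be sketched as follows. The frame is invented for the example, and the behavior shown is that of current pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])

# Default: the old index is moved into a new column.
kept = df.reset_index()

# drop=True: the old index is discarded instead of becoming a column.
dropped = df.reset_index(drop=True)
```

In both cases the result carries a fresh default integer index.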
- :ref:`Cythonized GroupBy aggregations <groupby.aggregate.cython>` no longer presort the data, thus achieving a significant speedup (:issue:`93`). GroupBy aggregations with Python functions significantly sped up by clever manipulation of the ndarray data type in Cython (:issue:`496`).
- Better error message in DataFrame constructor when passed column labels don't match data (:issue:`497`)
- Substantially improve performance of multi-GroupBy aggregation when a Python function is passed, reuse ndarray object in Cython (:issue:`496`)
- Can store objects indexed by tuples and floats in HDFStore (:issue:`492`)
- Don't print length by default in Series.to_string, add length option (:issue:`489`)
- Improve Cython code for multi-groupby to aggregate without having to sort the data (:issue:`93`)
- Improve MultiIndex reindexing speed by storing tuples in the MultiIndex, test for backwards unpickling compatibility
- Improve column reindexing performance by using specialized Cython take function
- Further performance tweaking of Series.__getitem__ for standard use cases
- Avoid Index dict creation in some cases (i.e. when getting slices, etc.), regression from prior versions
- Friendlier error message in setup.py if NumPy not installed
- Use common set of NA-handling operations (sum, mean, etc.) in Panel class also (:issue:`536`)
- Default name assignment when calling ``reset_index`` on DataFrame with a
  regular (non-hierarchical) index (:issue:`476`)
- Use Cythonized groupers when possible in Series/DataFrame stat ops with
  ``level`` parameter passed (:issue:`545`)
- Ported skiplist data structure to C to speed up ``rolling_median`` by about
  5-10x in most typical use cases (:issue:`374`)

.. contributors:: v0.6.1..v0.7.0