
Commit 0ba8a4f

CLN: rebase to 0.12

BUG: groupby filter that returns a series/ndarray truth testing
BUG: refixed GH3880, propagate name index
BUG: not handling sparse block deletes in internals/_delete_from_block
BUG: refix generic/truncate
TST: refixed generic/replace (bug in core/internals/putmask) revealed as well
TST: fix sparse_array to raise correct type exceptions rather than Exception
CLN: cleanups
BUG: fix stata dtype inference (error in core/internals/astype)
BUG: fix ujson handling of new series object
BUG: fixed scalar coercion (e.g. calling float(series)) to work
BUG: fixed astyping with and w/o copy
ENH: added _propogate_attributes method to generic.py to allow subclasses to automatically propagate things like name
DOC: added v0.13.0.txt feature descriptions
CLN: pep8ish cleanups
BUG: fix 32-bit, numpy 1.6.1 issue with datetimes in astype_nansafe
PERF: speedup for groupby by passing a SNDArray (Series-like ndarray) object to evaluation functions if allowed; can avoid Series creation overhead
BUG: issue with older numpy (1.6.1) in SeriesGrouper, fallback to passing a Series rather than SNDArray
DOC: release notes & doc updates
DOC: fix up doc build failures
DOC: change passing of direct ndarrays to cython doc functions (enhancingperf.rst)
1 parent cfe5f00 commit 0ba8a4f

32 files changed (+649, -525 lines)

doc/source/basics.rst (+1, -1)

@@ -477,7 +477,7 @@ maximum value for each column occurred:
 
    tsdf = DataFrame(randn(1000, 3), columns=['A', 'B', 'C'],
                     index=date_range('1/1/2000', periods=1000))
-   tsdf.apply(lambda x: x.index[x.dropna().argmax()])
+   tsdf.apply(lambda x: x[x.idxmax()])
 
You may also pass additional arguments and keyword arguments to the ``apply``
method. For instance, consider the following function you would like to apply:
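The diff above swaps a positional ``argmax`` on the values (translated back through the index) for label-based ``idxmax`` indexing. A minimal stdlib sketch of the same idea, using a plain dict as a hypothetical stand-in for a labeled series (not pandas itself):

```python
# Hypothetical dict-backed stand-in for a labeled series: labels -> values.
data = {"2000-01-01": 0.5, "2000-01-02": 2.3, "2000-01-03": 1.1}

# Old style from the removed line: find the positional argmax, then
# translate the position back into a label before indexing.
labels = list(data)
values = list(data.values())
pos = max(range(len(values)), key=values.__getitem__)
value_old = values[pos]

# New style mirroring ``x[x.idxmax()]``: find the label of the maximum
# directly, then look up the value at that label.
idxmax = max(data, key=data.get)
value_new = data[idxmax]

assert idxmax == labels[pos]
assert value_old == value_new
```

Both routes land on the same entry; the label-based form simply skips the position-to-label translation step.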

doc/source/dsintro.rst (+17, -13)

@@ -44,10 +44,15 @@ When using pandas, we recommend the following import convention:
 Series
 ------
 
-:class:`Series` is a one-dimensional labeled array (technically a subclass of
-ndarray) capable of holding any data type (integers, strings, floating point
-numbers, Python objects, etc.). The axis labels are collectively referred to as
-the **index**. The basic method to create a Series is to call:
+.. warning::
+
+   In 0.13.0 ``Series`` has internally been refactored to no longer subclass ``ndarray``
+   but instead subclass ``NDFrame``, similarly to the rest of the pandas containers. This should be
+   a transparent change with only very limited API implications (see the :ref:`release notes <release.refactoring_0_13_0>`).
+
+:class:`Series` is a one-dimensional labeled array capable of holding any data
+type (integers, strings, floating point numbers, Python objects, etc.). The axis
+labels are collectively referred to as the **index**. The basic method to create a Series is to call:
 
 ::
 

@@ -109,9 +114,8 @@ provided. The value will be repeated to match the length of **index**
 Series is ndarray-like
 ~~~~~~~~~~~~~~~~~~~~~~
 
-As a subclass of ndarray, Series is a valid argument to most NumPy functions
-and behaves similarly to a NumPy array. However, things like slicing also slice
-the index.
+``Series`` acts very similarly to an ``ndarray``, and is a valid argument to most NumPy functions.
+However, things like slicing also slice the index.
 
 .. ipython:: python
 

@@ -177,7 +181,7 @@ labels.
 
 The result of an operation between unaligned Series will have the **union** of
 the indexes involved. If a label is not found in one Series or the other, the
-result will be marked as missing (NaN). Being able to write code without doing
+result will be marked as missing (``NaN``). Being able to write code without doing
 any explicit data alignment grants immense freedom and flexibility in
 interactive data analysis and research. The integrated data alignment features
 of the pandas data structures set pandas apart from the majority of related

@@ -924,11 +928,11 @@ Here we slice to a Panel4D.
    from pandas.core import panelnd
    Panel5D = panelnd.create_nd_panel_factory(
        klass_name = 'Panel5D',
-       axis_orders = [ 'cool', 'labels','items','major_axis','minor_axis'],
-       axis_slices = { 'labels' : 'labels', 'items' : 'items',
-                       'major_axis' : 'major_axis', 'minor_axis' : 'minor_axis' },
-       slicer = Panel4D,
-       axis_aliases = { 'major' : 'major_axis', 'minor' : 'minor_axis' },
+       orders = [ 'cool', 'labels','items','major_axis','minor_axis'],
+       slices = { 'labels' : 'labels', 'items' : 'items',
+                  'major_axis' : 'major_axis', 'minor_axis' : 'minor_axis' },
+       slicer = Panel4D,
+       aliases = { 'major' : 'major_axis', 'minor' : 'minor_axis' },
        stat_axis = 2)
 
    p5d = Panel5D(dict(C1 = p4d))
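The alignment behavior documented in the ``@@ -177`` hunk can be sketched in plain Python. This is an illustration only, using dicts as hypothetical stand-ins for labeled Series, not the pandas implementation:

```python
import math

# Sketch of the alignment rule described above: an operation between
# unaligned series yields the union of the two indexes, with NaN wherever
# a label is missing from one operand.
def aligned_add(a, b):
    result = {}
    for label in sorted(set(a) | set(b)):
        if label in a and label in b:
            result[label] = a[label] + b[label]
        else:
            result[label] = math.nan  # label absent from one operand
    return result

s1 = {"a": 1.0, "b": 2.0}
s2 = {"b": 10.0, "c": 20.0}
out = aligned_add(s1, s2)
```

Only the shared label ``"b"`` gets a real sum; ``"a"`` and ``"c"`` survive in the result index but are marked missing.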

doc/source/enhancingperf.rst (+28, -10)

Several hunks (@@ -26,7 @@, @@ -35,7 @@, @@ -68,7 @@, @@ -83,7 @@, @@ -125,7 @@, @@ -175,7 @@ and @@ -270,4 @@) strip trailing whitespace only. The substantive changes:

@@ -205,20 +205,38 @@ The implementation is simple, it creates an array of zeros and loops over
 the rows, applying our ``integrate_f_typed``, and putting this in the zeros array.
 
 
+.. warning::
+
+   In 0.13.0, since ``Series`` has internally been refactored to no longer subclass ``ndarray``
+   but instead subclass ``NDFrame``, you can **not pass** a ``Series`` directly as an ``ndarray``-typed parameter
+   to a cython function. Instead pass the actual ``ndarray`` using the ``.values`` attribute of the Series.
+
+   Prior to 0.13.0:
+
+   .. code-block:: python
+
+      apply_integrate_f(df['a'], df['b'], df['N'])
+
+   Use ``.values`` to get the underlying ``ndarray``:
+
+   .. code-block:: python
+
+      apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+
 .. note::
 
    A loop like this would be *extremely* slow in python, but in cython looping over
    numpy arrays is *fast*.
 
 .. ipython:: python
 
-   %timeit apply_integrate_f(df['a'], df['b'], df['N'])
+   %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
 
 We've gone another three times faster! Let's check again where the time is spent:
 
 .. ipython:: python
 
-   %prun -l 4 apply_integrate_f(df['a'], df['b'], df['N'])
+   %prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
 
 As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
 so if we wanted to make any more efficiencies we must continue to concentrate our

@@ -261,7 +279,7 @@ advanced cython techniques:
 
 .. ipython:: python
 
-   %timeit apply_integrate_f_wrap(df['a'], df['b'], df['N'])
+   %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
 
 This shaves another third off!
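The warning above boils down to one rule: a function compiled against an ``ndarray``-typed parameter needs the raw array, not the Series wrapper. A pure-Python sketch of that principle, where ``typed_sum`` and ``SeriesLike`` are hypothetical stand-ins (not the cython function or pandas class from the docs):

```python
import numpy as np

# Stand-in for a cython function declared with an ndarray-typed parameter:
# it accepts only a plain ndarray, as such functions do after the 0.13.0
# refactor removed the Series-is-an-ndarray relationship.
def typed_sum(arr):
    if type(arr) is not np.ndarray:
        raise TypeError("expected a plain ndarray, got %s" % type(arr).__name__)
    return float(arr.sum())

class SeriesLike:
    """Hypothetical minimal Series stand-in: wraps an ndarray, exposes .values."""
    def __init__(self, data):
        self.values = np.asarray(data, dtype=float)

s = SeriesLike([1.0, 2.0, 3.0])

# Passing the wrapper itself fails the type check; passing .values succeeds.
try:
    typed_sum(s)
    accepted_wrapper = True
except TypeError:
    accepted_wrapper = False

total = typed_sum(s.values)
```

The same ``.values`` hand-off is what the updated ``%timeit`` lines in the diff perform.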

doc/source/release.rst (+62)

@@ -42,6 +42,68 @@ pandas 0.13
 
 **API Changes**
 
+**Internal Refactoring**
+
+.. _release.refactoring_0_13_0:
+
+In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
+which is the base class currently for ``DataFrame`` and ``Panel``, to unify methods
+and behaviors. Series formerly subclassed directly from ``ndarray``.
+
+- Refactor of series.py/frame.py/panel.py to move common code to generic.py
+
+  - added ``_setup_axes`` to create generic ``NDFrame`` structures
+  - moved methods
+
+    - ``from_axes``, ``_wrap_array``, ``axes``, ``ix``, ``shape``, ``empty``, ``swapaxes``, ``transpose``, ``pop``
+    - ``__iter__``, ``keys``, ``__contains__``, ``__len__``, ``__neg__``, ``__invert__``
+    - ``convert_objects``, ``as_blocks``, ``as_matrix``, ``values``
+    - ``__getstate__``, ``__setstate__`` (though compat remains in frame/panel)
+    - ``__getattr__``, ``__setattr__``
+    - ``_indexed_same``, ``reindex_like``, ``reindex``, ``align``, ``where``, ``mask``
+    - ``filter`` (also added an axis argument to selectively filter on a different axis)
+    - ``reindex``, ``reindex_axis`` (which was the biggest change to make generic)
+    - ``truncate`` (moved to become part of ``NDFrame``)
+
+- These are API changes which make ``Panel`` more consistent with ``DataFrame``:
+
+  - ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
+  - support attribute access for setting
+  - ``filter`` supports the same API as the original ``DataFrame`` filter
+
+- ``reindex`` called with no arguments will now return a copy of the input object
+
+- ``Series`` now inherits from ``NDFrame`` rather than directly from ``ndarray``.
+  There are several minor changes that affect the API:
+
+  - numpy functions that do not support the array interface will now
+    return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.where``
+  - ``Series(0.5)`` would previously return the scalar ``0.5``; this is no
+    longer supported
+  - several methods from frame/series have moved to ``NDFrame``
+    (``convert_objects``, ``where``, ``mask``)
+  - ``TimeSeries`` is now an alias for ``Series``; the property ``is_time_series``
+    can be used to distinguish (if desired)
+
+- Refactor of sparse objects to use ``BlockManager``:
+
+  - created a new block type in internals, ``SparseBlock``, which can hold multiple dtypes
+    and is non-consolidatable; ``SparseSeries`` and ``SparseDataFrame`` now inherit
+    more methods from their hierarchy (Series/DataFrame), and no longer inherit
+    from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
+  - the sparse suite now supports integration with non-sparse data; non-float sparse
+    data is supportable (partially implemented)
+  - operations on sparse structures within DataFrames should preserve sparseness;
+    merging-type operations will convert to dense (and back to sparse), so might
+    be somewhat inefficient
+  - enable setitem on ``SparseSeries`` for boolean/integer/slices
+  - the ``SparsePanel`` implementation is unchanged (e.g. not using ``BlockManager``; needs work)
+
+- added an ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicating
+  if the underlying is sparse/dense (as well as the dtype)
+
+- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate various
+  values to propagate to a new object from an existing one (e.g. ``name`` in ``Series`` will follow
+  more automatically now)
+
 **Experimental Features**
 
 **Bug Fixes**
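The ``_prop_attributes`` mechanism in the last bullet can be sketched in a few lines. All class and method names below are illustrative stand-ins, not the actual pandas internals:

```python
# Illustrative-only sketch of attribute propagation: a whitelist of
# attribute names that a new object copies from an existing one.
class NDFrameLike:
    _prop_attributes = []  # attributes to carry over to derived objects

    def _propagate_attributes(self, other):
        for attr in self._prop_attributes:
            setattr(self, attr, getattr(other, attr, None))
        return self

class SeriesLike(NDFrameLike):
    _prop_attributes = ["name"]  # e.g. name follows automatically

    def __init__(self, data, name=None):
        self.data = list(data)
        self.name = name

s = SeriesLike([1, 2, 3], name="prices")
# A derived object picks up ``name`` via propagation rather than by hand.
doubled = SeriesLike(x * 2 for x in s.data)._propagate_attributes(s)
```

Centralizing the whitelist on the base class is what lets each subclass declare, in one place, which metadata should survive operations that build new objects.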

doc/source/v0.13.0.txt (+64, -3)

@@ -9,15 +9,76 @@ enhancements along with a large number of bug fixes.
 API changes
 ~~~~~~~~~~~
 
-- ``read_excel`` now supports an integer in its ``sheetname`` argument giving
-  the index of the sheet to read in (:issue:`4301`).
-
 Enhancements
 ~~~~~~~~~~~~
 
 - ``read_html`` now raises a ``URLError`` instead of catching and raising a
   ``ValueError`` (:issue:`4303`, :issue:`4305`)
 
+- ``read_excel`` now supports an integer in its ``sheetname`` argument giving
+  the index of the sheet to read in (:issue:`4301`).
+
+Internal Refactoring
+~~~~~~~~~~~~~~~~~~~~
+
+In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
+which is the base class currently for ``DataFrame`` and ``Panel``, to unify methods
+and behaviors. Series formerly subclassed directly from ``ndarray``.
+(:issue:`4080`, :issue:`3862`, :issue:`816`)
+
(The remaining added lines duplicate, word for word, the Internal Refactoring bullet list added to doc/source/release.rst above.)
 
 Bug Fixes
 ~~~~~~~~~

pandas/core/array.py (+16)

@@ -34,3 +34,19 @@
     globals()[_f] = getattr(np.random, _f)
 
 NA = np.nan
+
+#### a series-like ndarray ####
+
+class SNDArray(Array):
+
+    def __new__(cls, data, index=None, name=None):
+        data = data.view(SNDArray)
+        data.index = index
+        data.name = name
+
+        return data
+
+    @property
+    def values(self):
+        return self.view(Array)

pandas/core/base.py (-14)

@@ -8,20 +8,6 @@ class StringMixin(object):
     """implements string methods so long as object defines a `__unicode__` method.
     Handles Python2/3 compatibility transparently."""
     # side note - this could be made into a metaclass if more than one object needs it
-    def __str__(self):
-
-class PandasObject(object):
-    """ The base class for pandas objects """
-
-    #----------------------------------------------------------------------
-    # Reconstruction
-
-    def save(self, path):
-        com.save(self, path)
-
-    @classmethod
-    def load(cls, path):
-        return com.load(path)
 
     #----------------------------------------------------------------------
     # Formatting