
Commit 4493bf3
1 parent 8ee0a89

CLN: rebase to 0.12
BUG: groupby filter that returns a series/ndarray truth testing
BUG: refixed GH3880, prop name index
BUG: not handling sparse block deletes in internals/_delete_from_block
BUG: refix generic/truncate
TST: refixed generic/replace (bug in core/internals/putmask) revealed as well
TST: fix sparse_array to raise correct type exceptions rather than Exception
CLN: cleanups
BUG: fix stata dtype inference (error in core/internals/astype)
BUG: fix ujson handling of new series object
BUG: fixed scalar coercion (e.g. calling float(series)) to work
BUG: fixed astyping with and w/o copy
ENH: added _propogate_attributes method to generic.py to allow subclasses to automatically propagate things like name
DOC: added v0.13.0.txt feature descriptions
CLN: pep8ish cleanups
BUG: fix 32-bit, numpy 1.6.1 issue with datetimes in astype_nansafe
PERF: speedup for groupby by passing a SNDArray (Series-like ndarray) object to evaluation functions if allowed; can avoid Series creation overhead
BUG: issue with older numpy (1.6.1) in SeriesGrouper, fallback to passing a Series rather than SNDArray
DOC: release notes & doc updates
DOC: fixup doc build failures
DOC: change passing of direct ndarrays to cython doc functions (enhancingperf.rst)

30 files changed: +657 −367 lines

doc/source/basics.rst (+1 −1)

@@ -478,7 +478,7 @@ maximum value for each column occurred:

    tsdf = DataFrame(randn(1000, 3), columns=['A', 'B', 'C'],
                     index=date_range('1/1/2000', periods=1000))
-   tsdf.apply(lambda x: x.index[x.dropna().argmax()])
+   tsdf.apply(lambda x: x[x.idxmax()])

 You may also pass additional arguments and keyword arguments to the ``apply``
 method. For instance, consider the following function you would like to apply:
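The replacement line in this hunk leans on ``Series.idxmax``, which returns the index *label* of the maximum (skipping NaN), so indexing back into the Series recovers the maximum value directly. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 3.0, np.nan, 2.0], index=list("abcd"))

# idxmax returns the label of the maximum, skipping NaN by default
label = s.idxmax()   # 'b'
value = s[label]     # 3.0
```

This avoids the older ``x.index[x.dropna().argmax()]`` dance, where ``argmax`` on the dropna'd Series gives a position that must be translated back into a label.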

doc/source/dsintro.rst (+17 −13)

@@ -44,10 +44,15 @@ When using pandas, we recommend the following import convention:

 Series
 ------

-:class:`Series` is a one-dimensional labeled array (technically a subclass of
-ndarray) capable of holding any data type (integers, strings, floating point
-numbers, Python objects, etc.). The axis labels are collectively referred to as
-the **index**. The basic method to create a Series is to call:
+.. warning::
+
+   In 0.13.0 ``Series`` has internally been refactored to no longer subclass ``ndarray``
+   but instead subclass ``NDFrame``, like the rest of the pandas containers. This should be
+   a transparent change with only very limited API implications (see the
+   :ref:`release notes <release.refactoring_0_13_0>`).
+
+:class:`Series` is a one-dimensional labeled array capable of holding any data
+type (integers, strings, floating point numbers, Python objects, etc.). The axis
+labels are collectively referred to as the **index**. The basic method to create a Series is to call:

 ::

@@ -109,9 +114,8 @@ provided. The value will be repeated to match the length of **index**

 Series is ndarray-like
 ~~~~~~~~~~~~~~~~~~~~~~

-As a subclass of ndarray, Series is a valid argument to most NumPy functions
-and behaves similarly to a NumPy array. However, things like slicing also slice
-the index.
+``Series`` acts very similarly to an ``ndarray``, and is a valid argument to most NumPy functions.
+However, things like slicing also slice the index.

 .. ipython:: python

@@ -177,7 +181,7 @@ labels.

 The result of an operation between unaligned Series will have the **union** of
 the indexes involved. If a label is not found in one Series or the other, the
-result will be marked as missing (NaN). Being able to write code without doing
+result will be marked as missing (``NaN``). Being able to write code without doing
 any explicit data alignment grants immense freedom and flexibility in
 interactive data analysis and research. The integrated data alignment features
 of the pandas data structures set pandas apart from the majority of related

@@ -924,11 +928,11 @@ Here we slice to a Panel4D.

 from pandas.core import panelnd
 Panel5D = panelnd.create_nd_panel_factory(
     klass_name = 'Panel5D',
-    axis_orders = ['cool', 'labels', 'items', 'major_axis', 'minor_axis'],
-    axis_slices = {'labels': 'labels', 'items': 'items',
-                   'major_axis': 'major_axis', 'minor_axis': 'minor_axis'},
-    slicer = Panel4D,
-    axis_aliases = {'major': 'major_axis', 'minor': 'minor_axis'},
+    orders = ['cool', 'labels', 'items', 'major_axis', 'minor_axis'],
+    slices = {'labels': 'labels', 'items': 'items',
+              'major_axis': 'major_axis', 'minor_axis': 'minor_axis'},
+    slicer = Panel4D,
+    aliases = {'major': 'major_axis', 'minor': 'minor_axis'},
     stat_axis = 2)

 p5d = Panel5D(dict(C1 = p4d))
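The doc change above reflects the core of the refactor: a Series is no longer an ndarray subclass, though it still behaves like one and still exposes the underlying array. A small sketch of what that means in practice (using the modern pandas API, where this behavior persists):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name="x")

# since the 0.13 refactor, Series subclasses NDFrame, not ndarray
is_ndarray_subclass = isinstance(s, np.ndarray)   # False

# the underlying ndarray is still reachable via .values
underlying = s.values

# slicing slices the index along with the data
tail_index = list(s[1:].index)                    # [1, 2]
```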

doc/source/enhancingperf.rst (+26 −8)

Whitespace-only hunks (trailing spaces stripped, no text change): @@ -26,7 +26,7 @@, @@ -68,7 +68,7 @@, @@ -83,7 +83,7 @@, @@ -125,7 +125,7 @@, @@ -175,7 +175,7 @@.

@@ -205,20 +205,38 @@ The implementation is simple, it creates an array of zeros and loops over
 the rows, applying our ``integrate_f_typed``, and putting this in the zeros array.

+.. warning::
+
+   In 0.13.0, since ``Series`` has internally been refactored to no longer subclass ``ndarray``
+   but instead subclass ``NDFrame``, you can **not pass** a ``Series`` directly as an ``ndarray``-typed
+   parameter to a cython function. Instead pass the actual ``ndarray`` using the ``.values``
+   attribute of the Series.
+
+   Prior to 0.13.0
+
+   .. code-block:: python
+
+      apply_integrate_f(df['a'], df['b'], df['N'])
+
+   Use ``.values`` to get the underlying ``ndarray``
+
+   .. code-block:: python
+
+      apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+
 .. note::

    Loops like this would be *extremely* slow in python, but in cython looping over
    numpy arrays is *fast*.

 .. ipython:: python

-   %timeit apply_integrate_f(df['a'], df['b'], df['N'])
+   %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

 We've gone another three times faster! Let's check again where the time is spent:

 .. ipython:: python

-   %prun -l 4 apply_integrate_f(df['a'], df['b'], df['N'])
+   %prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

 As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
 so if we wanted to make any more efficiencies we must continue to concentrate our

@@ -261,7 +279,7 @@ advanced cython techniques:

 .. ipython:: python

-   %timeit apply_integrate_f_wrap(df['a'], df['b'], df['N'])
+   %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)

 This shaves another third off!
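The pattern the doc change mandates — hand compiled code a plain ndarray via ``.values``, not the Series wrapper — can be sketched without Cython. Here ``mean_ratio`` is a hypothetical stand-in for a Cython function typed on ndarray parameters; it is not the doc's ``apply_integrate_f``:

```python
import numpy as np
import pandas as pd

def mean_ratio(a, b):
    # stands in for a Cython function whose parameters are typed
    # ndarray: it only accepts plain ndarrays, not Series
    if type(a) is not np.ndarray or type(b) is not np.ndarray:
        raise TypeError("expected plain ndarray inputs")
    return float((a / b).mean())

df = pd.DataFrame({"a": [1.0, 2.0], "b": [2.0, 4.0]})

# pass the underlying ndarrays, per the 0.13 recommendation
result = mean_ratio(df["a"].values, df["b"].values)
```

Passing ``df["a"]`` directly would raise here, mirroring what a typed Cython signature does once Series stops being an ndarray subclass.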

doc/source/release.rst (+62)

@@ -115,6 +115,68 @@ pandas 0.13

 - ``MultiIndex.astype()`` now only allows ``np.object_``-like dtypes and
   now returns a ``MultiIndex`` rather than an ``Index``. (:issue:`4039`)

+**Internal Refactoring**
+
+.. _release.refactoring_0_13_0:
+
+In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
+which is the base class currently for ``DataFrame`` and ``Panel``, to unify methods
+and behaviors. Series formerly subclassed directly from ``ndarray``.
+
+- Refactor of series.py/frame.py/panel.py to move common code to generic.py
+
+  - added ``_setup_axes`` to create generic ``NDFrame`` structures
+  - moved methods
+
+    - from_axes, _wrap_array, axes, ix, shape, empty, swapaxes, transpose, pop
+    - __iter__, keys, __contains__, __len__, __neg__, __invert__
+    - convert_objects, as_blocks, as_matrix, values
+    - __getstate__, __setstate__ (though compat remains in frame/panel)
+    - __getattr__, __setattr__
+    - _indexed_same, reindex_like, reindex, align, where, mask
+    - filter (also added an axis argument to selectively filter on a different axis)
+    - reindex, reindex_axis (which was the biggest change to make generic)
+    - truncate (moved to become part of ``NDFrame``)
+
+- API changes which make ``Panel`` more consistent with ``DataFrame``
+
+  - swapaxes on a Panel with the same axes specified now returns a copy
+  - support attribute access for setting
+  - filter supports the same API as the original DataFrame filter
+
+- Reindex called with no arguments will now return a copy of the input object
+
+- Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
+  There are several minor changes that affect the API.
+
+  - numpy functions that do not support the array interface will now
+    return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.where``
+  - ``Series(0.5)`` would previously return the scalar ``0.5``; this is no
+    longer supported
+  - several methods from frame/series have moved to ``NDFrame``
+    (convert_objects, where, mask)
+  - ``TimeSeries`` is now an alias for ``Series``; the property ``is_time_series``
+    can be used to distinguish (if desired)
+
+- Refactor of Sparse objects to use BlockManager
+
+  - Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
+    and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
+    more methods from their hierarchy (Series/DataFrame), and no longer inherit
+    from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
+  - Sparse suite now supports integration with non-sparse data. Non-float sparse
+    data is supportable (partially implemented)
+  - Operations on sparse structures within DataFrames should preserve sparseness;
+    merging-type operations will convert to dense (and back to sparse), so might
+    be somewhat inefficient
+  - enable setitem on ``SparseSeries`` for boolean/integer/slices
+  - ``SparsePanel`` implementation is unchanged (e.g. not using BlockManager, needs work)
+
+- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
+  if the underlying is sparse/dense (as well as the dtype)
+
+- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate
+  values to propagate to a new object from an existing one (e.g. name in ``Series`` will
+  follow more automatically now)

 **Experimental Features**

 **Bug Fixes**
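The "minor changes that affect the API" listed in the release notes are easy to demonstrate; both behaviors persist in modern pandas:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 4])

# numpy functions without a pandas-aware path now return plain ndarrays
diffed = np.diff(s)

# Series(0.5) now builds a length-1 Series instead of unboxing to the scalar 0.5
scalar_series = pd.Series(0.5)
```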

doc/source/v0.13.0.txt (+61)

@@ -134,6 +134,67 @@ Enhancements

 from pandas import offsets
 td + offsets.Minute(5) + offsets.Milli(5)

+Internal Refactoring
+~~~~~~~~~~~~~~~~~~~~
+
+In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
+which is the base class currently for ``DataFrame`` and ``Panel``, to unify methods
+and behaviors. Series formerly subclassed directly from ``ndarray``.
+(:issue:`4080`, :issue:`3862`, :issue:`816`)

[the remainder of the added section is identical to the "Internal Refactoring" entry added to doc/source/release.rst above: the generic.py refactor and moved methods, Panel consistency changes, the Series/NDFrame API notes, the Sparse/BlockManager refactor, ``ftypes``, and ``_prop_attributes``]

 Bug Fixes
 ~~~~~~~~~
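The attribute-propagation point in these notes (``name`` in ``Series`` "will follow more automatically now") is observable directly; a sketch against the modern API, where the mechanism survives:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="price")

# operations that produce a new Series carry the name along
sliced_name = s[:2].name
converted_name = s.astype("float32").name
```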

pandas/core/array.py (+16)

@@ -34,3 +34,19 @@

 globals()[_f] = getattr(np.random, _f)

 NA = np.nan
+
+#### a series-like ndarray ####
+
+class SNDArray(Array):
+
+    def __new__(cls, data, index=None, name=None):
+        data = data.view(SNDArray)
+        data.index = index
+        data.name = name
+
+        return data
+
+    @property
+    def values(self):
+        return self.view(Array)
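This is the PERF item from the commit message: ``SNDArray`` is a bare ndarray view carrying ``index`` and ``name``, cheap enough to hand to groupby evaluation functions without building a full Series each time. A self-contained sketch of the same pattern (``Array`` in the pandas source is an alias for ``np.ndarray``):

```python
import numpy as np

class SNDArray(np.ndarray):
    """A Series-like ndarray: a view with index/name attached,
    avoiding full Series construction per groupby evaluation."""

    def __new__(cls, data, index=None, name=None):
        obj = np.asarray(data).view(cls)  # reinterpret, no copy
        obj.index = index
        obj.name = name
        return obj

    @property
    def values(self):
        # drop back down to a plain ndarray view
        return self.view(np.ndarray)

a = SNDArray(np.arange(3.0), index=["x", "y", "z"], name="col")
```

Note that, unlike a production subclass, this sketch defines no ``__array_finalize__``, so the attached attributes do not survive slicing or ufuncs; for the one-shot evaluation use case that is acceptable.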

pandas/core/base.py (−14)

@@ -9,20 +9,6 @@ class StringMixin(object):

 """implements string methods so long as object defines a `__unicode__` method.
 Handles Python2/3 compatibility transparently."""
 # side note - this could be made into a metaclass if more than one object nees
-    def __str__(self):
-
-class PandasObject(object):
-    """ The base class for pandas objects """
-
-    #----------------------------------------------------------------------
-    # Reconstruction
-
-    def save(self, path):
-        com.save(self, path)
-
-    @classmethod
-    def load(cls, path):
-        return com.load(path)

 #----------------------------------------------------------------------
 # Formatting

pandas/core/common.py (+17 −8)

@@ -45,17 +45,22 @@ class AmbiguousIndexError(PandasError, KeyError):

 _DATELIKE_DTYPES = set([ np.dtype(t) for t in ['M8[ns]','m8[ns]'] ])

 def is_series(obj):
-    return getattr(obj,'_typ',None) == 'series'
+    return getattr(obj, '_typ', None) == 'series'
+
 def is_sparse_series(obj):
-    return getattr(obj,'_subtyp',None) in ('sparse_series','sparse_time_series')
+    return getattr(obj, '_subtyp', None) in ('sparse_series','sparse_time_series')
+
 def is_sparse_array_like(obj):
-    return getattr(obj,'_subtyp',None) in ['sparse_array','sparse_series','sparse_array']
+    return getattr(obj, '_subtyp', None) in ['sparse_array','sparse_series','sparse_array']
+
 def is_dataframe(obj):
-    return getattr(obj,'_typ',None) == 'dataframe'
+    return getattr(obj, '_typ', None) == 'dataframe'
+
 def is_panel(obj):
-    return getattr(obj,'_typ',None) == 'panel'
+    return getattr(obj, '_typ', None) == 'panel'
+
 def is_generic(obj):
-    return getattr(obj,'_data',None) is not None
+    return getattr(obj, '_data', None) is not None

 def isnull(obj):
     """Detect missing values (NaN in numeric arrays, None/NaN in object arrays)

@@ -1155,7 +1160,10 @@ def _maybe_box(indexer, values, obj, key):

 def _values_from_object(o):
     """ return my values or the object if we are say an ndarray """
-    return o.get_values() if hasattr(o,'get_values') else o
+    f = getattr(o, 'get_values', None)
+    if f is not None:
+        o = f()
+    return o

 def _possibly_convert_objects(values, convert_dates=True, convert_numeric=True):
     """ if we have an object dtype, try to coerce dates and/or numers """

@@ -1733,7 +1741,8 @@ def _is_sequence(x):

 def _astype_nansafe(arr, dtype, copy=True):
-    """ return a view if copy is False """
+    """ return a view if copy is False, but
+    need to be very careful as the result shape could change! """
     if not isinstance(dtype, np.dtype):
         dtype = np.dtype(dtype)
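The ``_values_from_object`` rewrite above replaces ``hasattr`` plus a second attribute access with a single ``getattr`` lookup. The duck-typing pattern generalizes; a standalone sketch (``Wrapped`` is an illustrative class, not pandas code):

```python
import numpy as np

def values_from_object(o):
    # one getattr lookup instead of hasattr + a second lookup
    f = getattr(o, "get_values", None)
    if f is not None:
        o = f()
    return o

class Wrapped:
    """Illustrative container exposing get_values()."""
    def __init__(self, arr):
        self._arr = arr
    def get_values(self):
        return self._arr

arr = np.array([1, 2, 3])
unwrapped = values_from_object(Wrapped(arr))   # unwrapped via get_values
passthrough = values_from_object(arr)          # plain ndarray passes through
```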
