
Commit 0327add

Merge pull request #3482 from jreback/series2
CLN: series to now inherit from NDFrame
2 parents e52ff84 + 9129dc1 commit 0327add


66 files changed (+6332 -4590 lines)

doc/source/basics.rst

+1-1
@@ -478,7 +478,7 @@ maximum value for each column occurred:

 tsdf = DataFrame(randn(1000, 3), columns=['A', 'B', 'C'],
 index=date_range('1/1/2000', periods=1000))
-tsdf.apply(lambda x: x.index[x.dropna().argmax()])
+tsdf.apply(lambda x: x.idxmax())

 You may also pass additional arguments and keyword arguments to the ``apply``
 method. For instance, consider the following function you would like to apply:
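To check what the updated line computes, the sketch below applies ``idxmax`` column-wise and compares the result with ``DataFrame.idxmax``, which returns the same dates directly. It assumes current pandas. ``np.random.default_rng`` replaces ``randn`` and the frame has 10 rows instead of 1000; both changes are only for reproducibility.

```python
import numpy as np
import pandas as pd

# Small stand-in for ``tsdf`` from the docs; the seed and size are arbitrary.
rng = np.random.default_rng(0)
tsdf = pd.DataFrame(rng.standard_normal((10, 3)), columns=['A', 'B', 'C'],
                    index=pd.date_range('1/1/2000', periods=10))

# For each column, return the index label (a date) where the maximum occurs.
dates_of_max = tsdf.apply(lambda x: x.idxmax())

# DataFrame.idxmax computes the same labels without apply.
assert dates_of_max.tolist() == tsdf.idxmax().tolist()

# The value stored at each of those dates is that column's maximum.
for col, date in dates_of_max.items():
    assert tsdf.loc[date, col] == tsdf[col].max()

print(dates_of_max)
```

Note that indexing the column with that label (``x[x.idxmax()]``) would return the maximum value itself, not the date.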

doc/source/dsintro.rst

+17-13
@@ -44,10 +44,15 @@ When using pandas, we recommend the following import convention:
 Series
 ------

-:class:`Series` is a one-dimensional labeled array (technically a subclass of
-ndarray) capable of holding any data type (integers, strings, floating point
-numbers, Python objects, etc.). The axis labels are collectively referred to as
-the **index**. The basic method to create a Series is to call:
+.. warning::
+
+   In 0.13.0 ``Series`` has been internally refactored to no longer subclass ``ndarray``
+   but instead subclass ``NDFrame``, like the rest of the pandas containers. This should be
+   a transparent change with only very limited API implications (see :ref:`Internal Refactoring<whatsnew_0130.refactoring>`).
+
+:class:`Series` is a one-dimensional labeled array capable of holding any data
+type (integers, strings, floating point numbers, Python objects, etc.). The axis
+labels are collectively referred to as the **index**. The basic method to create a Series is to call:

 ::

@@ -109,9 +114,8 @@ provided. The value will be repeated to match the length of **index**
 Series is ndarray-like
 ~~~~~~~~~~~~~~~~~~~~~~

-As a subclass of ndarray, Series is a valid argument to most NumPy functions
-and behaves similarly to a NumPy array. However, things like slicing also slice
-the index.
+``Series`` acts very similarly to an ``ndarray`` and is a valid argument to most NumPy functions.
+However, things like slicing also slice the index.

 .. ipython :: python

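The sketch below shows the ndarray-like behavior described above. It assumes current pandas and NumPy, and the Series ``s`` and its labels are illustrative only. A ufunc such as ``np.exp`` accepts a ``Series`` and keeps its index. Slicing by position or by a boolean mask carries the matching labels along.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list('abcde'))

# NumPy ufuncs accept a Series and return a Series with the same index.
e = np.exp(s)
assert isinstance(e, pd.Series)
assert e.index.equals(s.index)

# Positional slicing works like an ndarray, but the index is sliced too.
head = s.iloc[:3]
print(head.index.tolist())  # ['a', 'b', 'c']

# Boolean masks also keep the labels of the selected elements.
big = s[s > s.median()]
print(big.index.tolist())  # ['d', 'e']
```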
@@ -177,7 +181,7 @@ labels.

 The result of an operation between unaligned Series will have the **union** of
 the indexes involved. If a label is not found in one Series or the other, the
-result will be marked as missing (NaN). Being able to write code without doing
+result will be marked as missing (``NaN``). Being able to write code without doing
 any explicit data alignment grants immense freedom and flexibility in
 interactive data analysis and research. The integrated data alignment features
 of the pandas data structures set pandas apart from the majority of related
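This sketch illustrates the alignment rule. It assumes current pandas, and the two Series and their labels are made up for illustration. Adding them gives a result indexed by the union of both indexes, with ``NaN`` wherever a label appears in only one operand.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Arithmetic aligns on labels; the result index is the union of both indexes.
total = s1 + s2
assert total.index.tolist() == ['a', 'b', 'c', 'd']

# 'a' exists only in s1 and 'd' only in s2, so those results are missing.
print(total.isna().tolist())  # [True, False, False, True]

# Labels present in both operands are added normally.
assert total['b'] == 12.0 and total['c'] == 23.0
```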
@@ -924,11 +928,11 @@ Here we slice to a Panel4D.
 from pandas.core import panelnd
 Panel5D = panelnd.create_nd_panel_factory(
 klass_name = 'Panel5D',
-axis_orders = [ 'cool', 'labels','items','major_axis','minor_axis'],
-axis_slices = { 'labels' : 'labels', 'items' : 'items',
-'major_axis' : 'major_axis', 'minor_axis' : 'minor_axis' },
-slicer = Panel4D,
-axis_aliases = { 'major' : 'major_axis', 'minor' : 'minor_axis' },
+orders = [ 'cool', 'labels','items','major_axis','minor_axis'],
+slices = { 'labels' : 'labels', 'items' : 'items',
+'major_axis' : 'major_axis', 'minor_axis' : 'minor_axis' },
+slicer = Panel4D,
+aliases = { 'major' : 'major_axis', 'minor' : 'minor_axis' },
 stat_axis = 2)

 p5d = Panel5D(dict(C1 = p4d))

doc/source/enhancingperf.rst

+26-8
@@ -26,7 +26,7 @@ Enhancing Performance
 Cython (Writing C extensions for pandas)
 ----------------------------------------

-For many use cases writing pandas in pure python and numpy is sufficient. In some
+For many use cases writing pandas in pure python and numpy is sufficient. In some
 computationally heavy applications however, it can be possible to achieve sizeable
 speed-ups by offloading work to `cython <http://cython.org/>`__.

@@ -68,7 +68,7 @@ Here's the function in pure python:
 We achieve our result by using ``apply`` (row-wise):

 .. ipython:: python
-
+
    %timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

 But clearly this isn't fast enough for us. Let's take a look and see where the
@@ -83,7 +83,7 @@ By far the majority of time is spend inside either ``integrate_f`` or ``f``,
 hence we'll concentrate our efforts on cythonizing these two functions.

 .. note::
-
+
    In Python 2, replacing ``range`` with its lazy counterpart (``xrange``)
    would mean the ``range`` line would vanish. In Python 3, ``range`` is already lazy.

@@ -125,7 +125,7 @@ is here to distinguish between function versions):

 %timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1)

-Already this has shaved a third off, not too bad for a simple copy and paste.
+Already this has shaved a third off, not too bad for a simple copy and paste.

 .. _enhancingperf.type:

@@ -175,7 +175,7 @@ in python, so maybe we could minimise these by cythonizing the apply part.
 We are now passing ndarrays into the cython function; fortunately, cython plays
 very nicely with numpy.

-.. ipython::
+.. ipython::

    In [4]: %%cython
       ...: cimport numpy as np
@@ -205,20 +205,38 @@ The implementation is simple, it creates an array of zeros and loops over
 the rows, applying our ``integrate_f_typed``, and putting this in the zeros array.


+.. warning::
+
+   In 0.13.0, since ``Series`` has been internally refactored to no longer subclass ``ndarray``
+   but instead subclass ``NDFrame``, you can **not pass** a ``Series`` directly as an ``ndarray``-typed parameter
+   to a Cython function. Instead, pass the actual ``ndarray`` using the ``.values`` attribute of the Series.
+
+   Prior to 0.13.0:
+
+   .. code-block:: python
+
+      apply_integrate_f(df['a'], df['b'], df['N'])
+
+   Use ``.values`` to get the underlying ``ndarray``:
+
+   .. code-block:: python
+
+      apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+
 .. note::

    A loop like this would be *extremely* slow in Python, but in Cython looping over
    NumPy arrays is *fast*.

 .. ipython:: python

-   %timeit apply_integrate_f(df['a'], df['b'], df['N'])
+   %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

 We've gone another three times faster! Let's check again where the time is spent:

 .. ipython:: python

-   %prun -l 4 apply_integrate_f(df['a'], df['b'], df['N'])
+   %prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

 As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
 so if we wanted to make any more efficiencies we must continue to concentrate our
@@ -261,7 +279,7 @@ advanced cython techniques:

 .. ipython:: python

-   %timeit apply_integrate_f_wrap(df['a'], df['b'], df['N'])
+   %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)

 This shaves another third off!

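Cython cannot be run in this example, so the sketch below uses a hypothetical pure-Python stand-in, ``apply_integrate_f_checked``. It is not part of the documentation's code. It rejects non-``ndarray`` arguments at call time, as a Cython function with ``np.ndarray``-typed parameters would. Its body computes a placeholder product rather than the real integration, and the two-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def apply_integrate_f_checked(col_a, col_b, col_N):
    """Hypothetical stand-in for a Cython function whose parameters are
    typed as ``np.ndarray``. Cython checks argument types at call time;
    an explicit isinstance check mimics that here. The loop body is a
    placeholder, not the real integration routine."""
    for name, arr in (('col_a', col_a), ('col_b', col_b), ('col_N', col_N)):
        if not isinstance(arr, np.ndarray):
            raise TypeError(f"{name} must be an ndarray, got {type(arr).__name__}")
    n = len(col_N)
    res = np.zeros(n)
    for i in range(n):
        res[i] = col_a[i] * col_b[i] * col_N[i]  # placeholder computation
    return res


df = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0], 'N': [10, 20]})

# Passing Series objects fails, because a Series is no longer an ndarray.
try:
    apply_integrate_f_checked(df['a'], df['b'], df['N'])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # col_a must be an ndarray, got Series

# Passing the underlying arrays via .values works.
out = apply_integrate_f_checked(df['a'].values, df['b'].values, df['N'].values)
print(out.tolist())  # [30.0, 160.0]
```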
doc/source/release.rst

+70
@@ -115,6 +115,76 @@ pandas 0.13
 - ``MultiIndex.astype()`` now only allows ``np.object_``-like dtypes and
 now returns a ``MultiIndex`` rather than an ``Index``. (:issue:`4039`)

+**Internal Refactoring**
+
+In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
+which is currently the base class for ``DataFrame`` and ``Panel``, to unify methods
+and behaviors. Series formerly subclassed directly from ``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)
+See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`.
+
+- Refactor of series.py/frame.py/panel.py to move common code to generic.py
+
+  - added ``_setup_axes`` to create generic NDFrame structures
+  - moved methods
+
+    - ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
+    - ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
+    - ``convert_objects,as_blocks,as_matrix,values``
+    - ``__getstate__,__setstate__`` (compat remains in frame/panel)
+    - ``__getattr__,__setattr__``
+    - ``_indexed_same,reindex_like,align,where,mask``
+    - ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
+    - ``filter`` (also added axis argument to selectively filter on a different axis)
+    - ``reindex,reindex_axis`` (which was the biggest change to make generic)
+    - ``truncate`` (moved to become part of ``NDFrame``)
+
+- These are API changes which make ``Panel`` more consistent with ``DataFrame``
+
+  - ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
+  - support attribute access for setting
+  - ``filter`` supports the same API as the original ``DataFrame`` filter
+
+- Reindex called with no arguments will now return a copy of the input object
+
+- Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
+  There are several minor changes that affect the API.
+
+  - numpy functions that do not support the array interface will now
+    return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.ones_like``
+  - ``Series(0.5)`` would previously return the scalar ``0.5``; it now
+    returns a 1-element ``Series``
+  - ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
+    can be used to distinguish (if desired)
+
+- Refactor of Sparse objects to use BlockManager
+
+  - Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
+    and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
+    more methods from their hierarchy (Series/DataFrame), and no longer inherit
+    from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
+  - Sparse suite now supports integration with non-sparse data. Non-float sparse
+    data is supportable (partially implemented)
+  - Operations on sparse structures within DataFrames should preserve sparseness;
+    merging type operations will convert to dense (and back to sparse), so might
+    be somewhat inefficient
+  - enable setitem on ``SparseSeries`` for boolean/integer/slices
+  - ``SparsePanels`` implementation is unchanged (i.e. not using BlockManager, needs work)
+
+- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
+  if the underlying is sparse/dense (as well as the dtype)
+
+- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate various
+  values to propagate to a new object from an existing one (e.g. name in ``Series`` will follow
+  more automatically now)
+
+- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
+  without having to directly import the klass, courtesy of @jtratner
+
+- Bug in Series update where the parent frame was not updating its cache based on
+  changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)
+
+- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
+
 **Experimental Features**

 **Bug Fixes**
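Several of the generic behaviors listed above can be tested directly. The sketch below assumes current pandas, whose behavior descends from 0.13.0 but is not identical to it, and the frame and Series contents are illustrative. ``ABCSeries`` and ``ABCDataFrame`` come from the internal, non-public module ``pandas.core.dtypes.generic``. Treating them as the present-day form of the generated type-checking classes mentioned above is an assumption.

```python
import pandas as pd
from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries  # internal module

df = pd.DataFrame({'apple': [1, 2], 'banana': [3, 4], 'cherry': [5, 6]},
                  index=['row1', 'row2'])

# filter with an axis argument: columns (axis=1) or rows (axis=0).
cols = df.filter(like='an', axis=1)
assert cols.columns.tolist() == ['banana']
rows = df.filter(items=['row2'], axis=0)
assert rows.index.tolist() == ['row2']

# reindex() with no arguments returns a new object, not the input itself.
s = pd.Series([1, 2, 3], name='counts')
r = s.reindex()
assert r is not s
r.iloc[0] = 99
assert s.iloc[0] == 1  # the original is untouched

# Metadata such as ``name`` propagates to the results of operations.
assert (s * 2).name == 'counts'

# Generic type checks without importing the concrete classes' modules.
assert isinstance(s, ABCSeries) and isinstance(df, ABCDataFrame)
```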

doc/source/v0.13.0.txt

+103
@@ -6,6 +6,12 @@ v0.13.0 (August ??, 2013)
 This is a major release from 0.12.0 and includes several new features and
 enhancements along with a large number of bug fixes.

+.. warning::
+
+   In 0.13.0 ``Series`` has been internally refactored to no longer subclass ``ndarray``
+   but instead subclass ``NDFrame``, like the rest of the pandas containers. This should be
+   a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`.
+
 API changes
 ~~~~~~~~~~~

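The sketch below shows the refactoring from the user's side. It assumes current pandas, where this design still holds. It checks that a ``Series`` is no longer an ``ndarray`` instance but is an ``NDFrame``, and that ``.values`` still returns a plain ``ndarray``. ``NDFrame`` is imported from the internal module ``pandas.core.generic``, which is not public API and may move.

```python
import numpy as np
import pandas as pd
from pandas.core.generic import NDFrame  # internal module, not public API

s = pd.Series([1.5, 2.5, 3.5], index=['a', 'b', 'c'])

# A Series is an NDFrame (like DataFrame), not an ndarray subclass.
assert isinstance(s, NDFrame)
assert isinstance(pd.DataFrame({'x': [1]}), NDFrame)
assert not isinstance(s, np.ndarray)

# The underlying data is still reachable as a plain ndarray...
arr = s.values
assert type(arr) is np.ndarray

# ...while the labels are held in a separate Index object.
print(s.index.tolist())  # ['a', 'b', 'c']
```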
@@ -134,6 +140,103 @@ Enhancements
 from pandas import offsets
 td + offsets.Minute(5) + offsets.Milli(5)

+.. _whatsnew_0130.refactoring:
+
+Internal Refactoring
+~~~~~~~~~~~~~~~~~~~~
+
+In 0.13.0 there is a major refactor primarily to subclass ``Series`` from ``NDFrame``,
+which is currently the base class for ``DataFrame`` and ``Panel``, to unify methods
+and behaviors. Series formerly subclassed directly from ``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)
+
+.. warning::
+
+   There are three potential incompatibilities with versions prior to 0.13.0:
+
+   - Using certain numpy functions would previously return a ``Series`` if passed a ``Series``
+     as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``, and
+     ``np.diff``. These now return ``ndarrays``.
+
+     .. ipython:: python
+
+        s = Series([1,2,3,4])
+
+        # numpy usage
+        np.ones_like(s)
+        np.diff(s)
+
+        # idiomatic pandas usage
+        Series(1, index=s.index)
+        s.diff()
+
+   - Passing a ``Series`` directly to a cython function expecting an ``ndarray`` type will no
+     longer work directly; you must pass ``Series.values``. See :ref:`Enhancing Performance<enhancingperf.ndarray>`
+
+   - ``Series(0.5)`` would previously return the scalar ``0.5``; it now returns a 1-element ``Series``
+
+- Refactor of series.py/frame.py/panel.py to move common code to generic.py
+
+  - added ``_setup_axes`` to create generic NDFrame structures
+  - moved methods
+
+    - ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
+    - ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
+    - ``convert_objects,as_blocks,as_matrix,values``
+    - ``__getstate__,__setstate__`` (compat remains in frame/panel)
+    - ``__getattr__,__setattr__``
+    - ``_indexed_same,reindex_like,align,where,mask``
+    - ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
+    - ``filter`` (also added axis argument to selectively filter on a different axis)
+    - ``reindex,reindex_axis`` (which was the biggest change to make generic)
+    - ``truncate`` (moved to become part of ``NDFrame``)
+
+- These are API changes which make ``Panel`` more consistent with ``DataFrame``
+
+  - ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
+  - support attribute access for setting
+  - ``filter`` supports the same API as the original ``DataFrame`` filter
+
+- Reindex called with no arguments will now return a copy of the input object
+
+- Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
+  There are several minor changes that affect the API.
+
+  - numpy functions that do not support the array interface will now
+    return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.ones_like``
+  - ``Series(0.5)`` would previously return the scalar ``0.5``; it now
+    returns a 1-element ``Series``
+  - ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
+    can be used to distinguish (if desired)
+
+- Refactor of Sparse objects to use BlockManager
+
+  - Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
+    and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
+    more methods from their hierarchy (Series/DataFrame), and no longer inherit
+    from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
+  - Sparse suite now supports integration with non-sparse data. Non-float sparse
+    data is supportable (partially implemented)
+  - Operations on sparse structures within DataFrames should preserve sparseness;
+    merging type operations will convert to dense (and back to sparse), so might
+    be somewhat inefficient
+  - enable setitem on ``SparseSeries`` for boolean/integer/slices
+  - ``SparsePanels`` implementation is unchanged (i.e. not using BlockManager, needs work)
+
+- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
+  if the underlying is sparse/dense (as well as the dtype)
+
+- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate various
+  values to propagate to a new object from an existing one (e.g. name in ``Series`` will follow
+  more automatically now)
+
+- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
+  without having to directly import the klass, courtesy of @jtratner
+
+- Bug in Series update where the parent frame was not updating its cache based on
+  changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)
+
+- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
+
 Bug Fixes
 ~~~~~~~~~

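The incompatibilities in the warning above can be checked against current pandas and NumPy with the sketch below, which mirrors the ``ipython`` example. The NumPy calls return plain ``ndarray`` objects, while the pandas method keeps the index. A scalar passed to the ``Series`` constructor produces a 1-element ``Series``. The sketch uses the ``pd.`` prefix instead of a bare ``Series`` name; that import style is an assumption.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# These NumPy functions convert the Series to an ndarray first.
d = np.diff(s)
assert isinstance(d, np.ndarray) and not isinstance(d, pd.Series)
o = np.ones_like(s)
assert isinstance(o, np.ndarray) and not isinstance(o, pd.Series)

# The pandas equivalents keep the index and return a Series.
ones = pd.Series(1, index=s.index)
sd = s.diff()
assert isinstance(sd, pd.Series) and sd.index.equals(s.index)
print(sd.tolist())  # [nan, 1.0, 1.0, 1.0]

# A scalar passed to the constructor gives a 1-element Series, not a scalar.
one = pd.Series(0.5)
assert isinstance(one, pd.Series) and len(one) == 1
```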