
CLN: series to now inherit from NDFrame #3482

Merged
merged 8 commits on Aug 16, 2013
2 changes: 1 addition & 1 deletion doc/source/basics.rst
@@ -478,7 +478,7 @@ maximum value for each column occurred:

tsdf = DataFrame(randn(1000, 3), columns=['A', 'B', 'C'],
index=date_range('1/1/2000', periods=1000))
tsdf.apply(lambda x: x.index[x.dropna().argmax()])
tsdf.apply(lambda x: x.idxmax())

You may also pass additional arguments and keyword arguments to the ``apply``
method. For instance, consider the following function you would like to apply:
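
A minimal sketch of that pattern follows; the ``subtract_and_divide`` helper here is a
hypothetical stand-in for the elided example function, but the ``args``/keyword
forwarding shown is how ``apply`` accepts extra parameters.

.. code-block:: python

   def subtract_and_divide(x, sub, divide=1):
       # hypothetical helper: subtract sub, then divide by divide
       return (x - sub) / divide

   # positional arguments go in args; keyword arguments are forwarded as-is
   tsdf.apply(subtract_and_divide, args=(5,), divide=3)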
30 changes: 17 additions & 13 deletions doc/source/dsintro.rst
@@ -44,10 +44,15 @@ When using pandas, we recommend the following import convention:
Series
------

:class:`Series` is a one-dimensional labeled array (technically a subclass of
ndarray) capable of holding any data type (integers, strings, floating point
numbers, Python objects, etc.). The axis labels are collectively referred to as
the **index**. The basic method to create a Series is to call:
.. warning::

In 0.13.0 ``Series`` has internally been refactored to no longer subclass ``ndarray``
but instead to subclass ``NDFrame``, like the rest of the pandas containers. This should be
a transparent change with only very limited API implications (see the :ref:`Internal Refactoring<whatsnew_0130.refactoring>`).

:class:`Series` is a one-dimensional labeled array capable of holding any data
type (integers, strings, floating point numbers, Python objects, etc.). The axis
labels are collectively referred to as the **index**. The basic method to create a Series is to call:

::
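
   >>> s = Series(data, index=index)  # a sketch of the elided constructor call; data and index are placeholders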

@@ -109,9 +114,8 @@ provided. The value will be repeated to match the length of **index**
Series is ndarray-like
~~~~~~~~~~~~~~~~~~~~~~

As a subclass of ndarray, Series is a valid argument to most NumPy functions
and behaves similarly to a NumPy array. However, things like slicing also slice
the index.
``Series`` acts very similarly to an ``ndarray``, and is a valid argument to most NumPy functions.
However, operations such as slicing will also slice the index.

.. ipython :: python
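
   # an illustrative sketch (values assumed, not from the original hunk):
   # slicing a Series also slices its index, and NumPy ufuncs accept a Series
   s = Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
   s[:3]
   np.exp(s)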

@@ -177,7 +181,7 @@ labels.

The result of an operation between unaligned Series will have the **union** of
the indexes involved. If a label is not found in one Series or the other, the
result will be marked as missing (NaN). Being able to write code without doing
result will be marked as missing (``NaN``). Being able to write code without doing
any explicit data alignment grants immense freedom and flexibility in
interactive data analysis and research. The integrated data alignment features
of the pandas data structures set pandas apart from the majority of related
@@ -924,11 +928,11 @@ Here we slice to a Panel4D.
from pandas.core import panelnd
Panel5D = panelnd.create_nd_panel_factory(
klass_name = 'Panel5D',
axis_orders = [ 'cool', 'labels','items','major_axis','minor_axis'],
axis_slices = { 'labels' : 'labels', 'items' : 'items',
'major_axis' : 'major_axis', 'minor_axis' : 'minor_axis' },
slicer = Panel4D,
axis_aliases = { 'major' : 'major_axis', 'minor' : 'minor_axis' },
orders = [ 'cool', 'labels','items','major_axis','minor_axis'],
slices = { 'labels' : 'labels', 'items' : 'items',
'major_axis' : 'major_axis', 'minor_axis' : 'minor_axis' },
slicer = Panel4D,
aliases = { 'major' : 'major_axis', 'minor' : 'minor_axis' },
stat_axis = 2)

p5d = Panel5D(dict(C1 = p4d))
34 changes: 26 additions & 8 deletions doc/source/enhancingperf.rst
@@ -26,7 +26,7 @@ Enhancing Performance
Cython (Writing C extensions for pandas)
----------------------------------------

For many use cases writing pandas in pure python and numpy is sufficient. In some
computationally heavy applications however, it can be possible to achieve sizeable
speed-ups by offloading work to `cython <http://cython.org/>`__.

@@ -68,7 +68,7 @@ Here's the function in pure python:
We achieve our result by using ``apply`` (row-wise):

.. ipython:: python

%timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

But clearly this isn't fast enough for us. Let's take a look and see where the
@@ -83,7 +83,7 @@ By far the majority of time is spent inside either ``integrate_f`` or ``f``,
hence we'll concentrate our efforts on cythonizing these two functions.

.. note::

In Python 2, replacing ``range`` with its generator counterpart (``xrange``)
would mean the ``range`` line would vanish. In Python 3, ``range`` is already a generator.

@@ -125,7 +125,7 @@ is here to distinguish between function versions):

%timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1)

Already this has shaved a third off, not too bad for a simple copy and paste.

.. _enhancingperf.type:

@@ -175,7 +175,7 @@ in python, so maybe we could minimise these by cythonizing the apply part.
We are now passing ndarrays into the cython function; fortunately cython plays
very nicely with numpy.

.. ipython::

In [4]: %%cython
...: cimport numpy as np
@@ -205,20 +205,38 @@ The implementation is simple: it creates an array of zeros and loops over
the rows, applying our ``integrate_f_typed``, and putting this in the zeros array.
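
As a rough, non-cython sketch (assuming the ``integrate_f_typed`` function defined in the
cython block above), the loop is equivalent to:

.. code-block:: python

   import numpy as np

   def apply_integrate_f_sketch(col_a, col_b, col_N):
       # allocate the result array and fill it row by row
       res = np.zeros(len(col_N))
       for i in range(len(col_N)):
           res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
       return res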


.. warning::

In 0.13.0, since ``Series`` has internally been refactored to no longer subclass ``ndarray``
but instead to subclass ``NDFrame``, you can **not pass** a ``Series`` directly as an ``ndarray``-typed parameter
to a cython function. Instead pass the actual ``ndarray`` using the ``.values`` attribute of the ``Series``.

Prior to 0.13.0

.. code-block:: python

apply_integrate_f(df['a'], df['b'], df['N'])

Use ``.values`` to get the underlying ``ndarray``

.. code-block:: python

apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

.. note::

Loops like this would be *extremely* slow in python, but in cython looping over
numpy arrays is *fast*.

.. ipython:: python

%timeit apply_integrate_f(df['a'], df['b'], df['N'])
%timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

We've gone another three times faster! Let's check again where the time is spent:

.. ipython:: python

%prun -l 4 apply_integrate_f(df['a'], df['b'], df['N'])
%prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)

As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
so if we wanted to make any further gains we must continue to concentrate our
@@ -261,7 +279,7 @@ advanced cython techniques:

.. ipython:: python

%timeit apply_integrate_f_wrap(df['a'], df['b'], df['N'])
%timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)

This shaves another third off!

70 changes: 70 additions & 0 deletions doc/source/release.rst
@@ -115,6 +115,76 @@ pandas 0.13
- ``MultiIndex.astype()`` now only allows ``np.object_``-like dtypes and
now returns a ``MultiIndex`` rather than an ``Index``. (:issue:`4039`)

**Internal Refactoring**

In 0.13.0 there is a major refactor, primarily to subclass ``Series`` from ``NDFrame``,
which is currently the base class for ``DataFrame`` and ``Panel``, in order to unify methods
and behaviors. ``Series`` formerly subclassed ``ndarray`` directly. (:issue:`4080`, :issue:`3862`, :issue:`816`)
See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`

- Refactor of series.py/frame.py/panel.py to move common code to generic.py

- added ``_setup_axes`` to create generic NDFrame structures
- moved methods

- ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
- ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
- ``convert_objects,as_blocks,as_matrix,values``
- ``__getstate__,__setstate__`` (compat remains in frame/panel)
- ``__getattr__,__setattr__``
- ``_indexed_same,reindex_like,align,where,mask``
- ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
- ``filter`` (also added axis argument to selectively filter on a different axis)
- ``reindex,reindex_axis`` (which was the biggest change to make generic)
- ``truncate`` (moved to become part of ``NDFrame``)

- These are API changes which make ``Panel`` more consistent with ``DataFrame``

- ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
- support attribute access for setting
- ``filter`` supports the same API as the original ``DataFrame`` ``filter``

- Reindex called with no arguments will now return a copy of the input object

- Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
There are several minor changes that affect the API.

- numpy functions that do not support the array interface will now
return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.ones_like``
- ``Series(0.5)`` would previously return the scalar ``0.5``; this is no
longer supported
- ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
can be used to distinguish (if desired)

- Refactor of Sparse objects to use BlockManager

- Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
more methods from their hierarchy (Series/DataFrame), and no longer inherit
from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
- Sparse suite now supports integration with non-sparse data. Non-float sparse
data is supportable (partially implemented)
- Operations on sparse structures within DataFrames should preserve sparseness,
merging type operations will convert to dense (and back to sparse), so might
be somewhat inefficient
- enable setitem on ``SparseSeries`` for boolean/integer/slices
- ``SparsePanels`` implementation is unchanged (e.g. not using BlockManager, needs work)

- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
if the underlying is sparse/dense (as well as the dtype)

- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate various
values to propagate to a new object from an existing one (e.g. name in ``Series`` will follow
more automatically now)

- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
without having to directly import the klass, courtesy of @jtratner

- Bug in Series update where the parent frame is not updating its cache based on
changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)

- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)

**Experimental Features**

**Bug Fixes**
103 changes: 103 additions & 0 deletions doc/source/v0.13.0.txt
@@ -6,6 +6,12 @@ v0.13.0 (August ??, 2013)
This is a major release from 0.12.0 and includes several new features and
enhancements along with a large number of bug fixes.

.. warning::

In 0.13.0 ``Series`` has internally been refactored to no longer subclass ``ndarray``
but instead to subclass ``NDFrame``, like the rest of the pandas containers. This should be
a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`

API changes
~~~~~~~~~~~

@@ -134,6 +140,103 @@ Enhancements
from pandas import offsets
td + offsets.Minute(5) + offsets.Milli(5)

.. _whatsnew_0130.refactoring:

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~

In 0.13.0 there is a major refactor, primarily to subclass ``Series`` from ``NDFrame``,
which is currently the base class for ``DataFrame`` and ``Panel``, in order to unify methods
and behaviors. ``Series`` formerly subclassed ``ndarray`` directly. (:issue:`4080`, :issue:`3862`, :issue:`816`)

.. warning::

There are two potential incompatibilities from < 0.13.0

- Using certain numpy functions would previously return a ``Series`` if passed a ``Series``
as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``, and
``np.diff``. These now return ``ndarrays``.

.. ipython:: python

s = Series([1,2,3,4])

# numpy usage
np.ones_like(s)
np.diff(s)

# pandonic usage
Series(1,index=s.index)
s.diff()

- Passing a ``Series`` directly to a cython function expecting an ``ndarray`` type will no
longer work directly; you must pass ``Series.values`` instead. See :ref:`Enhancing Performance<enhancingperf.ndarray>`

- ``Series(0.5)`` would previously return the scalar ``0.5``; this will now return a 1-element ``Series``

- Refactor of series.py/frame.py/panel.py to move common code to generic.py

- added ``_setup_axes`` to create generic NDFrame structures
- moved methods

- ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
- ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
- ``convert_objects,as_blocks,as_matrix,values``
- ``__getstate__,__setstate__`` (compat remains in frame/panel)
- ``__getattr__,__setattr__``
- ``_indexed_same,reindex_like,align,where,mask``
- ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
- ``filter`` (also added axis argument to selectively filter on a different axis)
- ``reindex,reindex_axis`` (which was the biggest change to make generic)
- ``truncate`` (moved to become part of ``NDFrame``)

- These are API changes which make ``Panel`` more consistent with ``DataFrame``

- ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
- support attribute access for setting
- ``filter`` supports the same API as the original ``DataFrame`` ``filter``

- Reindex called with no arguments will now return a copy of the input object

- Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
There are several minor changes that affect the API.

- numpy functions that do not support the array interface will now
return ``ndarrays`` rather than series, e.g. ``np.diff`` and ``np.ones_like``
- ``Series(0.5)`` would previously return the scalar ``0.5``; this is no
longer supported
- ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
can be used to distinguish (if desired)

- Refactor of Sparse objects to use BlockManager

- Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
more methods from their hierarchy (Series/DataFrame), and no longer inherit
from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
- Sparse suite now supports integration with non-sparse data. Non-float sparse
data is supportable (partially implemented)
- Operations on sparse structures within DataFrames should preserve sparseness,
merging type operations will convert to dense (and back to sparse), so might
be somewhat inefficient
- enable setitem on ``SparseSeries`` for boolean/integer/slices
- ``SparsePanels`` implementation is unchanged (e.g. not using BlockManager, needs work)

- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
  if the underlying is sparse/dense (as well as the dtype); see the sketch below

- All ``NDFrame`` objects now have a ``_prop_attributes``, which can be used to indicate various
values to propagate to a new object from an existing one (e.g. name in ``Series`` will follow
more automatically now)

- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
without having to directly import the klass, courtesy of @jtratner

- Bug in Series update where the parent frame is not updating its cache based on
changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)

- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
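
The following small sketch shows the new ``ftypes`` next to ``dtypes`` (the frame and its
column names here are illustrative assumptions, not taken from the release notes):

.. ipython:: python

   df = DataFrame({'A': [1., 2., 3.], 'B': [1, 2, 3]})
   df.dtypes
   df.ftypes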

Bug Fixes
~~~~~~~~~
