
API: define _constructor_expanddim for subclassing Series and DataFrame #9802

Merged: 1 commit, Apr 18, 2015

1 change: 1 addition & 0 deletions doc/source/faq.rst
@@ -369,3 +369,4 @@ just a thin layer around the ``QTableView``.
mw = MainWidget()
mw.show()
app.exec_()
152 changes: 152 additions & 0 deletions doc/source/internals.rst
@@ -95,3 +95,155 @@ constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
if you compute the levels and labels yourself, please be careful.


.. _ref-subclassing-pandas:

Subclassing pandas Data Structures
----------------------------------

.. warning:: Before subclassing ``pandas`` data structures, consider these easier alternatives.

  1. Monkey-patching: See :ref:`Adding Features to your pandas Installation <ref-monkey-patching>`.

  2. Use *composition*. See `here <http://en.wikipedia.org/wiki/Composition_over_inheritance>`_.
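
As a rough illustration of the composition alternative, the sketch below wraps a table object instead of inheriting from it. The ``LabeledTable`` class, its ``units`` attribute, and the use of a plain ``dict`` as a stand-in for a ``DataFrame`` are all assumptions made for this example, so that it runs without ``pandas`` installed.

```python
class LabeledTable:
    """Hypothetical wrapper that holds a table instead of inheriting from it."""

    def __init__(self, table, units):
        self._table = table   # the wrapped object (a DataFrame in practice)
        self.units = units    # extra state lives on the wrapper, not the table

    def __getattr__(self, name):
        # Called only when normal lookup fails, so forward to the wrapped object.
        if name == '_table':
            # Avoid infinite recursion if _table has not been set yet.
            raise AttributeError(name)
        return getattr(self._table, name)

    def __getitem__(self, key):
        return self._table[key]


# A plain dict stands in for a DataFrame so the sketch runs without pandas.
wrapped = LabeledTable({'A': [1, 2, 3], 'B': [4, 5, 6]}, units='cm')
columns = sorted(wrapped.keys())   # forwarded to dict.keys()
first = wrapped['A']               # forwarded to dict.__getitem__
```

With this design the wrapper never has to track which class ``pandas`` uses to build results, at the cost of returning unwrapped objects from forwarded calls.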

This section describes how to subclass ``pandas`` data structures to meet more specific needs. Two points need attention:

1. Override constructor properties.
2. Define original properties.

.. note:: You can find a nice example in the `geopandas <https://github.com/geopandas/geopandas>`_ project.

Override Constructor Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each data structure has constructor properties that specify which class is used to build results. By overriding these properties, you can retain your subclasses through ``pandas`` data manipulations.

There are 3 constructor properties to be defined:

- ``_constructor``: Used when a manipulation result has the same dimensions as the original.
- ``_constructor_sliced``: Used when a manipulation result has one lower dimension than the original, such as slicing a single column from a ``DataFrame``.
- ``_constructor_expanddim``: Used when a manipulation result has one higher dimension than the original, such as ``Series.to_frame()`` and ``DataFrame.to_panel()``.

The following table shows how ``pandas`` data structures define constructor properties by default.

=========================== ======================= =================== =======================
Property Attributes         ``Series``              ``DataFrame``       ``Panel``
=========================== ======================= =================== =======================
``_constructor``            ``Series``              ``DataFrame``       ``Panel``
``_constructor_sliced``     ``NotImplementedError`` ``Series``          ``DataFrame``
``_constructor_expanddim``  ``DataFrame``           ``Panel``           ``NotImplementedError``
=========================== ======================= =================== =======================
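
To make the dispatch idea concrete, here is a toy model of the three constructor properties. It is not ``pandas`` code: ``Toy1D``, ``Toy2D``, their methods, and the ``MyToy*`` subclasses are hypothetical names invented for this sketch. Each method builds its result through a constructor property rather than a hard-coded class, which is what lets a subclass survive the operation.

```python
class Toy1D:
    """Toy stand-in for a Series-like container (not pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def _constructor(self):
        return Toy1D

    @property
    def _constructor_expanddim(self):
        return Toy2D

    def head(self, n):
        # Same dimension: build the result with _constructor.
        return self._constructor(self.values[:n])

    def to_2d(self, name):
        # One dimension higher: build the result with _constructor_expanddim.
        return self._constructor_expanddim({name: self.values})


class Toy2D:
    """Toy stand-in for a DataFrame-like container (not pandas code)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    @property
    def _constructor(self):
        return Toy2D

    @property
    def _constructor_sliced(self):
        return Toy1D

    def select(self, names):
        # Same dimension: build the result with _constructor.
        return self._constructor({k: self.columns[k] for k in names})

    def column(self, name):
        # One dimension lower: build the result with _constructor_sliced.
        return self._constructor_sliced(self.columns[name])


class MyToy1D(Toy1D):
    @property
    def _constructor(self):
        return MyToy1D

    @property
    def _constructor_expanddim(self):
        return MyToy2D


class MyToy2D(Toy2D):
    @property
    def _constructor(self):
        return MyToy2D

    @property
    def _constructor_sliced(self):
        return MyToy1D


frame = MyToy1D([1, 2, 3]).to_2d('A')   # expands to the subclassed 2-D type
```

Because the properties are looked up on the instance at call time, the subclasses only override the properties and inherit every method unchanged.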

The example below shows how to define ``SubclassedSeries`` and ``SubclassedDataFrame`` by overriding constructor properties.

.. code-block:: python

   from pandas import Series, DataFrame

   class SubclassedSeries(Series):

       @property
       def _constructor(self):
           return SubclassedSeries

       @property
       def _constructor_expanddim(self):
           return SubclassedDataFrame

   class SubclassedDataFrame(DataFrame):

       @property
       def _constructor(self):
           return SubclassedDataFrame

       @property
       def _constructor_sliced(self):
           return SubclassedSeries

.. code-block:: python

   >>> s = SubclassedSeries([1, 2, 3])
   >>> type(s)
   <class '__main__.SubclassedSeries'>

   >>> to_framed = s.to_frame()
   >>> type(to_framed)
   <class '__main__.SubclassedDataFrame'>

   >>> df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
   >>> df
      A  B  C
   0  1  4  7
   1  2  5  8
   2  3  6  9

   >>> type(df)
   <class '__main__.SubclassedDataFrame'>

   >>> sliced1 = df[['A', 'B']]
   >>> sliced1
      A  B
   0  1  4
   1  2  5
   2  3  6
   >>> type(sliced1)
   <class '__main__.SubclassedDataFrame'>

   >>> sliced2 = df['A']
   >>> sliced2
   0    1
   1    2
   2    3
   Name: A, dtype: int64
   >>> type(sliced2)
   <class '__main__.SubclassedSeries'>

Define Original Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~

To give your data structures additional properties, you must tell ``pandas`` which properties are added, because ``pandas`` maps unknown properties to data names by overriding ``__getattribute__``. Defining original properties can be done in one of 2 ways:

Review comment (Contributor): in one of 2 ways:

1. Define ``_internal_names`` and ``_internal_names_set`` for temporary properties which WILL NOT be passed to manipulation results.
2. Define ``_metadata`` for normal properties which will be passed to manipulation results.
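
The difference between the two kinds of properties can be sketched with a toy model. This is not ``pandas`` code: ``ToyFrame``, ``ToyFrame2``, ``select`` and the ``_finalize`` helper are hypothetical names, and the copy step is only an assumption about the general shape of the mechanism. In this toy, only names listed in ``_metadata`` are copied to a result, so anything else set on the instance, including names listed in ``_internal_names``, is absent from the result.

```python
class ToyFrame:
    """Toy model of the two kinds of extra attributes (not pandas code)."""

    _internal_names = ['_data']   # temporary names, never copied to results
    _metadata = []                # normal names, copied to results

    def __init__(self, data):
        self._data = dict(data)

    @property
    def _constructor(self):
        return ToyFrame

    def _finalize(self, other):
        # Copy only the names listed in _metadata from the source object.
        for name in self._metadata:
            if hasattr(other, name):
                setattr(self, name, getattr(other, name))
        return self

    def select(self, names):
        # Build a new object, then carry the metadata over from self.
        result = self._constructor({k: self._data[k] for k in names})
        return result._finalize(self)


class ToyFrame2(ToyFrame):
    # temporary properties
    _internal_names = ToyFrame._internal_names + ['internal_cache']
    # normal properties
    _metadata = ['added_property']

    @property
    def _constructor(self):
        return ToyFrame2


df = ToyFrame2({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.internal_cache = 'cached'
df.added_property = 'property'
subset = df.select(['A'])   # keeps added_property, drops internal_cache
```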

Below is an example that defines 2 original properties: "internal_cache" as a temporary property and "added_property" as a normal property.

.. code-block:: python

   from pandas import DataFrame

   class SubclassedDataFrame2(DataFrame):

       # temporary properties
       _internal_names = DataFrame._internal_names + ['internal_cache']
       _internal_names_set = set(_internal_names)

       # normal properties
       _metadata = ['added_property']

       @property
       def _constructor(self):
           return SubclassedDataFrame2

.. code-block:: python

   >>> df = SubclassedDataFrame2({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
   >>> df
      A  B  C
   0  1  4  7
   1  2  5  8
   2  3  6  9

   >>> df.internal_cache = 'cached'
   >>> df.added_property = 'property'

   >>> df.internal_cache
   'cached'
   >>> df.added_property
   'property'

   # properties defined in _internal_names are reset after manipulation
   >>> df[['A', 'B']].internal_cache
   AttributeError: 'SubclassedDataFrame2' object has no attribute 'internal_cache'

   # properties defined in _metadata are retained
   >>> df[['A', 'B']].added_property
   'property'

2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.16.1.txt
@@ -56,6 +56,8 @@ Enhancements

- Trying to write an excel file now raises ``NotImplementedError`` if the ``DataFrame`` has a ``MultiIndex`` instead of writing a broken Excel file. (:issue:`9794`)

- ``DataFrame`` and ``Series`` now have a ``_constructor_expanddim`` property, an overridable constructor for data with one higher dimension. This should be used only when it is really needed, see :ref:`here <ref-subclassing-pandas>`.

.. _whatsnew_0161.api:

API changes
9 changes: 6 additions & 3 deletions pandas/core/frame.py
@@ -191,6 +191,11 @@ def _constructor(self):

     _constructor_sliced = Series

+    @property
+    def _constructor_expanddim(self):
+        from pandas.core.panel import Panel
+        return Panel
+
     def __init__(self, data=None, index=None, columns=None, dtype=None,
                  copy=False):
         if data is None:
@@ -1061,8 +1066,6 @@ def to_panel(self):
         -------
         panel : Panel
         """
-        from pandas.core.panel import Panel
-
         # only support this kind for now
         if (not isinstance(self.index, MultiIndex) or  # pragma: no cover
                 len(self.index.levels) != 2):
@@ -1100,7 +1103,7 @@ def to_panel(self):
                                               shape=shape,
                                               ref_items=selfsorted.columns)

-        return Panel(new_mgr)
+        return self._constructor_expanddim(new_mgr)

     to_wide = deprecate('to_wide', to_panel)

4 changes: 4 additions & 0 deletions pandas/core/generic.py
@@ -155,6 +155,10 @@ def _local_dir(self):
     def _constructor_sliced(self):
         raise AbstractMethodError(self)

+    @property
+    def _constructor_expanddim(self):
+        raise NotImplementedError
+
     #----------------------------------------------------------------------
     # Axis

10 changes: 7 additions & 3 deletions pandas/core/series.py
@@ -236,6 +236,11 @@ def from_array(cls, arr, index=None, name=None, dtype=None, copy=False,
     def _constructor(self):
         return Series

+    @property
+    def _constructor_expanddim(self):
+        from pandas.core.frame import DataFrame
+        return DataFrame
+
     # types
     @property
     def _can_hold_na(self):
@@ -1047,11 +1052,10 @@ def to_frame(self, name=None):
         -------
         data_frame : DataFrame
         """
-        from pandas.core.frame import DataFrame
         if name is None:
-            df = DataFrame(self)
+            df = self._constructor_expanddim(self)
         else:
-            df = DataFrame({name: self})
+            df = self._constructor_expanddim({name: self})

         return df

22 changes: 21 additions & 1 deletion pandas/tests/test_frame.py
@@ -31,7 +31,7 @@
 import pandas.core.common as com
 import pandas.core.format as fmt
 import pandas.core.datetools as datetools
-from pandas import (DataFrame, Index, Series, notnull, isnull,
+from pandas import (DataFrame, Index, Series, Panel, notnull, isnull,
                     MultiIndex, DatetimeIndex, Timestamp, date_range,
                     read_csv, timedelta_range, Timedelta,
                     option_context)
@@ -14214,6 +14214,26 @@ def _constructor(self):
         # GH9776
         self.assertEqual(df.iloc[0:1, :].testattr, 'XXX')

+    def test_to_panel_expanddim(self):
+        # GH 9762
+
Review comment (Contributor): can you add issue number in a comment here
+        class SubclassedFrame(DataFrame):
+            @property
+            def _constructor_expanddim(self):
+                return SubclassedPanel
+
+        class SubclassedPanel(Panel):
+            pass
+
+        index = MultiIndex.from_tuples([(0, 0), (0, 1), (0, 2)])
+        df = SubclassedFrame({'X':[1, 2, 3], 'Y': [4, 5, 6]}, index=index)
+        result = df.to_panel()
+        self.assertTrue(isinstance(result, SubclassedPanel))
+        expected = SubclassedPanel([[[1, 2, 3]], [[4, 5, 6]]],
+                                   items=['X', 'Y'], major_axis=[0],
+                                   minor_axis=[0, 1, 2])
+        tm.assert_panel_equal(result, expected)


 def skip_if_no_ne(engine='numexpr'):
     if engine == 'numexpr':
16 changes: 16 additions & 0 deletions pandas/tests/test_series.py
@@ -6851,6 +6851,22 @@ def test_searchsorted_sorter(self):
         e = np.array([0, 2])
         tm.assert_array_equal(r, e)

+    def test_to_frame_expanddim(self):
+        # GH 9762
+
+        class SubclassedSeries(Series):
Review comment (Contributor): same here
+            @property
+            def _constructor_expanddim(self):
+                return SubclassedFrame
+
+        class SubclassedFrame(DataFrame):
+            pass
+
+        s = SubclassedSeries([1, 2, 3], name='X')
+        result = s.to_frame()
+        self.assertTrue(isinstance(result, SubclassedFrame))
+        expected = SubclassedFrame({'X': [1, 2, 3]})
+        assert_frame_equal(result, expected)


 class TestSeriesNonUnique(tm.TestCase):