Commit 997426d

Merge pull request #9802 from sinhrks/expanddim
API: define _constructor_expanddim for subclassing Series and DataFrame
2 parents a8c1d5d + 134f177 commit 997426d

8 files changed

+209
-7
lines changed

doc/source/faq.rst

+1
Original file line number / Diff line number / Diff line change
@@ -369,3 +369,4 @@ just a thin layer around the ``QTableView``.
369369
mw = MainWidget()
370370
mw.show()
371371
app.exec_()
372+

doc/source/internals.rst

+152
@@ -95,3 +95,155 @@ constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
9595
if you compute the levels and labels yourself, please be careful.
9696

9797

98+
.. _ref-subclassing-pandas:
99+
100+
Subclassing pandas Data Structures
101+
----------------------------------
102+
103+
.. warning:: Before subclassing ``pandas`` data structures, consider these easier alternatives.
104+
105+
1. Monkey-patching: See :ref:`Adding Features to your pandas Installation <ref-monkey-patching>`.
106+
107+
2. Use *composition*. See `here <http://en.wikipedia.org/wiki/Composition_over_inheritance>`_.
108+
109+
This section describes how to subclass ``pandas`` data structures to meet more specific needs. Two points need attention:
110+
111+
1. Override constructor properties.
112+
2. Define original properties.
113+
114+
.. note:: You can find a nice example in the `geopandas <https://github.com/geopandas/geopandas>`_ project.
115+
116+
Override Constructor Properties
117+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
118+
119+
Each data structure has constructor properties that specify which class is used to build the results of operations. By overriding these properties, you can retain your subclasses through ``pandas`` data manipulations.
120+
121+
There are 3 constructors to be defined:
122+
123+
- ``_constructor``: Used when a manipulation result has the same dimensions as the original.
124+
- ``_constructor_sliced``: Used when a manipulation result has one dimension fewer than the original, such as slicing a single column from a ``DataFrame``.
125+
- ``_constructor_expanddim``: Used when a manipulation result has one dimension more than the original, such as ``Series.to_frame()`` and ``DataFrame.to_panel()``.
126+
127+
The following table shows how ``pandas`` data structures define constructor properties by default.
128+
129+
=========================== ======================= =================== =======================
130+
Property Attributes         ``Series``              ``DataFrame``       ``Panel``
131+
=========================== ======================= =================== =======================
132+
``_constructor``            ``Series``              ``DataFrame``       ``Panel``
133+
``_constructor_sliced``     ``NotImplementedError`` ``Series``          ``DataFrame``
134+
``_constructor_expanddim``  ``DataFrame``           ``Panel``           ``NotImplementedError``
135+
=========================== ======================= =================== =======================
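
The dispatch idea behind these three properties can be sketched without ``pandas`` at all. The snippet below is only an illustration under stated assumptions: ``Base1D``, ``Base2D``, ``My1D``, ``My2D``, ``head`` and the simplified ``to_frame`` are hypothetical stand-ins written for this sketch, not the library's implementation. It shows that when every operation builds its result through a constructor property rather than a hard-coded class, a subclass only has to override the property to keep its type.

```python
class Base1D:
    """Hypothetical stand-in for a 1-D container such as Series."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def _constructor(self):
        # Class used for results with the same dimensionality.
        return Base1D

    @property
    def _constructor_expanddim(self):
        # Class used for results with one more dimension.
        return Base2D

    def head(self, n=2):
        # Same dimensionality: build the result via _constructor.
        return self._constructor(self.values[:n])

    def to_frame(self, name="x"):
        # One more dimension: build the result via _constructor_expanddim.
        return self._constructor_expanddim({name: self.values})


class Base2D:
    """Hypothetical stand-in for a 2-D container such as DataFrame."""

    def __init__(self, columns):
        self.columns = {k: list(v) for k, v in columns.items()}

    @property
    def _constructor_sliced(self):
        # Class used for results with one dimension fewer.
        return Base1D

    def __getitem__(self, key):
        # Single-column slicing: build the result via _constructor_sliced.
        return self._constructor_sliced(self.columns[key])


class My1D(Base1D):
    @property
    def _constructor(self):
        return My1D

    @property
    def _constructor_expanddim(self):
        return My2D


class My2D(Base2D):
    @property
    def _constructor_sliced(self):
        return My1D


s = My1D([1, 2, 3])
print(type(s.head()).__name__)           # My1D
print(type(s.to_frame()).__name__)       # My2D
print(type(s.to_frame()["x"]).__name__)  # My1D
```

None of the inherited methods mention ``My1D`` or ``My2D``; the subclass type survives only because each method asks the instance which class to construct.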
136+
137+
The example below shows how to define ``SubclassedSeries`` and ``SubclassedDataFrame`` by overriding constructor properties.
138+
139+
.. code-block:: python
140+
141+
   class SubclassedSeries(Series):
142+
143+
       @property
144+
       def _constructor(self):
145+
           return SubclassedSeries
146+
147+
       @property
148+
       def _constructor_expanddim(self):
149+
           return SubclassedDataFrame
150+
151+
   class SubclassedDataFrame(DataFrame):
152+
153+
       @property
154+
       def _constructor(self):
155+
           return SubclassedDataFrame
156+
157+
       @property
158+
       def _constructor_sliced(self):
159+
           return SubclassedSeries
160+
161+
.. code-block:: python
162+
163+
>>> s = SubclassedSeries([1, 2, 3])
164+
>>> type(s)
165+
<class '__main__.SubclassedSeries'>
166+
167+
>>> to_framed = s.to_frame()
168+
>>> type(to_framed)
169+
<class '__main__.SubclassedDataFrame'>
170+
171+
>>> df = SubclassedDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
172+
>>> df
173+
A B C
174+
0 1 4 7
175+
1 2 5 8
176+
2 3 6 9
177+
178+
>>> type(df)
179+
<class '__main__.SubclassedDataFrame'>
180+
181+
>>> sliced1 = df[['A', 'B']]
182+
>>> sliced1
183+
A B
184+
0 1 4
185+
1 2 5
186+
2 3 6
187+
>>> type(sliced1)
188+
<class '__main__.SubclassedDataFrame'>
189+
190+
>>> sliced2 = df['A']
191+
>>> sliced2
192+
0 1
193+
1 2
194+
2 3
195+
Name: A, dtype: int64
196+
>>> type(sliced2)
197+
<class '__main__.SubclassedSeries'>
198+
199+
Define Original Properties
200+
~~~~~~~~~~~~~~~~~~~~~~~~~~
201+
202+
To let original data structures have additional properties, you should let ``pandas`` know what properties are added. ``pandas`` maps unknown properties to data names by overriding ``__getattribute__``. Defining original properties can be done in one of 2 ways:
203+
204+
1. Define ``_internal_names`` and ``_internal_names_set`` for temporary properties which WILL NOT be passed to manipulation results.
205+
2. Define ``_metadata`` for normal properties which will be passed to manipulation results.
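
The difference between the two lists can be illustrated with a small pure-Python sketch. Everything in it is an assumption made for illustration: ``Frame``, ``MyFrame``, ``select`` and the ``_finalize`` helper are hypothetical names, and the copying logic is a simplified model of the behaviour described above rather than the code ``pandas`` actually runs. Attributes listed in ``_metadata`` are copied onto each new result, while attributes listed only in ``_internal_names`` stay on the original object.

```python
class Frame:
    """Hypothetical container; a simplified model, not the pandas code."""

    _internal_names = ["_data"]
    _metadata = []

    def __init__(self, data):
        self._data = dict(data)

    @property
    def _constructor(self):
        return Frame

    def _finalize(self, other):
        # Copy only the attributes named in _metadata from the source object.
        for name in self._metadata:
            if hasattr(other, name):
                setattr(self, name, getattr(other, name))
        return self

    def select(self, keys):
        # Build the result through _constructor, then propagate metadata.
        result = self._constructor({k: self._data[k] for k in keys})
        return result._finalize(self)


class MyFrame(Frame):
    # Temporary property: never copied to results.
    _internal_names = Frame._internal_names + ["internal_cache"]
    # Normal property: copied to results.
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return MyFrame


df = MyFrame({"A": [1, 2], "B": [3, 4]})
df.internal_cache = "cached"
df.added_property = "property"

sub = df.select(["A"])
print(sub.added_property)              # property
print(hasattr(sub, "internal_cache"))  # False
```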
206+
207+
Below is an example that defines 2 original properties: "internal_cache" as a temporary property and "added_property" as a normal property.
208+
209+
.. code-block:: python
210+
211+
   class SubclassedDataFrame2(DataFrame):
212+
213+
       # temporary properties
214+
       _internal_names = DataFrame._internal_names + ['internal_cache']
215+
       _internal_names_set = set(_internal_names)
216+
217+
       # normal properties
218+
       _metadata = ['added_property']
219+
220+
       @property
221+
       def _constructor(self):
222+
           return SubclassedDataFrame2
223+
224+
.. code-block:: python
225+
226+
>>> df = SubclassedDataFrame2({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
227+
>>> df
228+
A B C
229+
0 1 4 7
230+
1 2 5 8
231+
2 3 6 9
232+
233+
>>> df.internal_cache = 'cached'
234+
>>> df.added_property = 'property'
235+
236+
>>> df.internal_cache
237+
'cached'
238+
>>> df.added_property
239+
'property'
240+
241+
# properties defined in _internal_names are reset after manipulation
242+
>>> df[['A', 'B']].internal_cache
243+
AttributeError: 'SubclassedDataFrame2' object has no attribute 'internal_cache'
244+
245+
# properties defined in _metadata are retained
246+
>>> df[['A', 'B']].added_property
247+
'property'
248+
249+

doc/source/whatsnew/v0.16.1.txt

+2
@@ -56,6 +56,8 @@ Enhancements
5656

5757
- Trying to write an excel file now raises ``NotImplementedError`` if the ``DataFrame`` has a ``MultiIndex`` instead of writing a broken Excel file. (:issue:`9794`)
5858

59+
- ``DataFrame`` and ``Series`` now have a ``_constructor_expanddim`` property as an overridable constructor for data of one higher dimensionality. This should be used only when it is really needed; see :ref:`here <ref-subclassing-pandas>`.
60+
5961
.. _whatsnew_0161.api:
6062

6163
API changes

pandas/core/frame.py

+6-3
@@ -191,6 +191,11 @@ def _constructor(self):
191191

192192
    _constructor_sliced = Series
193193

194+
    @property
195+
    def _constructor_expanddim(self):
196+
        from pandas.core.panel import Panel
197+
        return Panel
198+
194199
    def __init__(self, data=None, index=None, columns=None, dtype=None,
195200
                 copy=False):
196201
        if data is None:
@@ -1061,8 +1066,6 @@ def to_panel(self):
10611066
        -------
10621067
        panel : Panel
10631068
        """
1064-
        from pandas.core.panel import Panel
1065-
10661069
        # only support this kind for now
10671070
        if (not isinstance(self.index, MultiIndex) or # pragma: no cover
10681071
                len(self.index.levels) != 2):
@@ -1100,7 +1103,7 @@ def to_panel(self):
11001103
shape=shape,
11011104
ref_items=selfsorted.columns)
11021105

1103-
        return Panel(new_mgr)
1106+
        return self._constructor_expanddim(new_mgr)
11041107

11051108
    to_wide = deprecate('to_wide', to_panel)
11061109

pandas/core/generic.py

+4
@@ -155,6 +155,10 @@ def _local_dir(self):
155155
    def _constructor_sliced(self):
156156
        raise AbstractMethodError(self)
157157

158+
    @property
159+
    def _constructor_expanddim(self):
160+
        raise NotImplementedError
161+
158162
    #----------------------------------------------------------------------
159163
    # Axis
160164

pandas/core/series.py

+7-3
@@ -236,6 +236,11 @@ def from_array(cls, arr, index=None, name=None, dtype=None, copy=False,
236236
    def _constructor(self):
237237
        return Series
238238

239+
    @property
240+
    def _constructor_expanddim(self):
241+
        from pandas.core.frame import DataFrame
242+
        return DataFrame
243+
239244
    # types
240245
    @property
241246
    def _can_hold_na(self):
@@ -1047,11 +1052,10 @@ def to_frame(self, name=None):
10471052
        -------
10481053
        data_frame : DataFrame
10491054
        """
1050-
        from pandas.core.frame import DataFrame
10511055
        if name is None:
1052-
            df = DataFrame(self)
1056+
            df = self._constructor_expanddim(self)
10531057
        else:
1054-
            df = DataFrame({name: self})
1058+
            df = self._constructor_expanddim({name: self})
10551059

10561060
        return df
10571061

pandas/tests/test_frame.py

+21-1
@@ -31,7 +31,7 @@
3131
import pandas.core.common as com
3232
import pandas.core.format as fmt
3333
import pandas.core.datetools as datetools
34-
from pandas import (DataFrame, Index, Series, notnull, isnull,
34+
from pandas import (DataFrame, Index, Series, Panel, notnull, isnull,
3535
                    MultiIndex, DatetimeIndex, Timestamp, date_range,
3636
                    read_csv, timedelta_range, Timedelta,
3737
                    option_context)
@@ -14214,6 +14214,26 @@ def _constructor(self):
1421414214
        # GH9776
1421514215
        self.assertEqual(df.iloc[0:1, :].testattr, 'XXX')
1421614216

14217+
    def test_to_panel_expanddim(self):
14218+
        # GH 9762
14219+
14220+
        class SubclassedFrame(DataFrame):
14221+
            @property
14222+
            def _constructor_expanddim(self):
14223+
                return SubclassedPanel
14224+
14225+
        class SubclassedPanel(Panel):
14226+
            pass
14227+
14228+
        index = MultiIndex.from_tuples([(0, 0), (0, 1), (0, 2)])
14229+
        df = SubclassedFrame({'X':[1, 2, 3], 'Y': [4, 5, 6]}, index=index)
14230+
        result = df.to_panel()
14231+
        self.assertTrue(isinstance(result, SubclassedPanel))
14232+
        expected = SubclassedPanel([[[1, 2, 3]], [[4, 5, 6]]],
14233+
                                   items=['X', 'Y'], major_axis=[0],
14234+
                                   minor_axis=[0, 1, 2])
14235+
        tm.assert_panel_equal(result, expected)
14236+
1421714237

1421814238
def skip_if_no_ne(engine='numexpr'):
1421914239
    if engine == 'numexpr':

pandas/tests/test_series.py

+16
@@ -6851,6 +6851,22 @@ def test_searchsorted_sorter(self):
68516851
        e = np.array([0, 2])
68526852
        tm.assert_array_equal(r, e)
68536853

6854+
    def test_to_frame_expanddim(self):
6855+
        # GH 9762
6856+
6857+
        class SubclassedSeries(Series):
6858+
            @property
6859+
            def _constructor_expanddim(self):
6860+
                return SubclassedFrame
6861+
6862+
        class SubclassedFrame(DataFrame):
6863+
            pass
6864+
6865+
        s = SubclassedSeries([1, 2, 3], name='X')
6866+
        result = s.to_frame()
6867+
        self.assertTrue(isinstance(result, SubclassedFrame))
6868+
        expected = SubclassedFrame({'X': [1, 2, 3]})
6869+
        assert_frame_equal(result, expected)
68546870

68556871

68566872
class TestSeriesNonUnique(tm.TestCase):
