Subclassed reshape clean #15655

Closed
wants to merge 25 commits into from
25 commits
642bfce
add subclassed stack/unstack/pivot tests
delgadom Mar 8, 2017
0295604
add melt test
delgadom Mar 8, 2017
60a2cfd
use _constructor* properties to create Series and DataFrame objects t…
delgadom Mar 8, 2017
d65cff5
document _Unstacker
delgadom Mar 8, 2017
9d1cf63
fix bug in wide_to_long_test, add GH issue numbers
delgadom Mar 11, 2017
3efb82f
add whatsnew entry
delgadom Mar 11, 2017
200c752
flake8 cleanup
delgadom Mar 11, 2017
5e480c6
fix bug in existing docs ``internals.rst:220`` ``{A, [`` --> ``{A: [``
delgadom Mar 11, 2017
4f3319c
clarify language and add subclassed reshape and math examples to doc/…
delgadom Mar 11, 2017
8af21c1
additional clarification in doc/source/internals.rst
delgadom Mar 11, 2017
027f36a
remove references to Panel from doc/source/internals.rst subclassing …
delgadom Mar 11, 2017
8a61374
change code-block to ipython directives in doc/source/internals.rst
delgadom Mar 12, 2017
6715a25
change from python to ipython code blocks in docs
delgadom May 13, 2017
f751a85
reformat docstrings
delgadom May 13, 2017
ca85796
add subclassed stack/unstack/pivot tests
delgadom Mar 8, 2017
b0bc8f4
add melt test
delgadom Mar 8, 2017
246a464
use _constructor* properties to create Series and DataFrame objects t…
delgadom Mar 8, 2017
1c672a9
document _Unstacker
delgadom Mar 8, 2017
eff151e
fix bug in wide_to_long_test, add GH issue numbers
delgadom Mar 11, 2017
16dae8e
flake8 cleanup
delgadom Mar 11, 2017
66b2e42
fix bug in existing docs ``internals.rst:220`` ``{A, [`` --> ``{A: [``
delgadom Mar 11, 2017
7641812
clarify language and add subclassed reshape and math examples to doc/…
delgadom Mar 11, 2017
be66ce0
additional clarification in doc/source/internals.rst
delgadom Mar 11, 2017
d27034d
remove references to Panel from doc/source/internals.rst subclassing …
delgadom Mar 11, 2017
dc3b07e
merge conflicts
delgadom May 13, 2017
228 changes: 160 additions & 68 deletions doc/source/internals.rst
Expand Up @@ -9,6 +9,7 @@
np.random.seed(123456)
np.set_printoptions(precision=4, suppress=True)
import pandas as pd
from pandas import Series, DataFrame
pd.options.display.max_rows = 15

*********
Expand Down Expand Up @@ -110,15 +111,15 @@ This section describes how to subclass ``pandas`` data structures to meet more s
Override Constructor Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each data structure has constructor properties to specifying data constructors. By overriding these properties, you can retain defined-classes through ``pandas`` data manipulations.
Each data structure has constructor properties for specifying data constructors. By overriding these properties, you can retain defined subclass families through ``pandas`` data manipulations.

There are 3 constructors to be defined:

- ``_constructor``: Used when a manipulation result has the same dimensions as the original.
- ``_constructor_sliced``: Used when a manipulation result has one dimension fewer than the original, such as slicing a single column from a ``DataFrame``.
- ``_constructor_expanddim``: Used when a manipulation result has one dimension more than the original, such as ``Series.to_frame()`` and ``DataFrame.to_panel()``.
- ``_constructor_expanddim``: Used when a manipulation result has one dimension more than the original, such as ``Series.to_frame()``.

Following table shows how ``pandas`` data structures define constructor properties by default.
The following table shows how ``pandas`` data structures define constructor properties by default.

=========================== ======================= =================== =======================
Property Attributes ``Series`` ``DataFrame`` ``Panel``
Expand All @@ -128,68 +129,153 @@ Property Attributes ``Series`` ``DataFrame`` ``Panel
``_constructor_expanddim`` ``DataFrame`` ``Panel`` ``NotImplementedError``
=========================== ======================= =================== =======================

Below example shows how to define ``SubclassedSeries`` and ``SubclassedDataFrame`` overriding constructor properties.
The example below shows how to define ``SubclassedSeries`` and ``SubclassedDataFrame`` classes that override the default constructor properties.

.. code-block:: python
.. ipython:: python

In [1]: class SubclassedSeries(Series):
...:
...: @property
...: def _constructor(self):
...: return SubclassedSeries
...:
...: @property
...: def _constructor_expanddim(self):
...: return SubclassedDataFrame
...:

In [1]: class SubclassedDataFrame(DataFrame):
...:
...: @property
...: def _constructor(self):
...: return SubclassedDataFrame
...:
...: @property
...: def _constructor_sliced(self):
...: return SubclassedSeries
...:

Overriding constructor properties allows subclass families to be preserved across slice and reshape operations:

.. ipython:: python

class SubclassedSeries(Series):
In [1]: ser = SubclassedSeries([1, 2, 3])
In [1]: ser
Out[1]:
0 1
1 2
2 3
dtype: int64
In [1]: type(ser)
Out[1]:
<class '__main__.SubclassedSeries'>

@property
def _constructor(self):
return SubclassedSeries
In [1]: to_framed = ser.to_frame()
In [1]: type(to_framed)
Out[1]:
<class '__main__.SubclassedDataFrame'>

@property
def _constructor_expanddim(self):
return SubclassedDataFrame
In [1]: df = SubclassedDataFrame({
...: 'A': ['a', 'a', 'b', 'b'],
...: 'B': ['x', 'y', 'x', 'y'],
...: 'C': [1, 2, 3, 4]})
In [1]: df
Out[1]:
A B C
0 a x 1
1 a y 2
2 b x 3
3 b y 4

class SubclassedDataFrame(DataFrame):
In [1]: type(df)
Out[1]:
<class '__main__.SubclassedDataFrame'>

@property
def _constructor(self):
return SubclassedDataFrame
In [1]: sliced1 = df[['A', 'B']]
In [1]: sliced1
Out[1]:
A B
0 a x
1 a y
2 b x
3 b y
In [1]: type(sliced1)
Out[1]:
<class '__main__.SubclassedDataFrame'>

@property
def _constructor_sliced(self):
return SubclassedSeries
In [1]: sliced2 = df['C']
In [1]: sliced2
Out[1]:
0 1
1 2
2 3
3 4
Name: C, dtype: int64

.. code-block:: python
In [1]: type(sliced2)
Out[1]:
<class '__main__.SubclassedSeries'>

>>> s = SubclassedSeries([1, 2, 3])
>>> type(s)
In [1]: stacked = df.stack()
In [1]: stacked
Out[1]:
0 A a
B x
C 1
1 A a
B y
C 2
2 A b
B x
C 3
3 A b
B y
C 4
dtype: object
In [1]: type(stacked)
Out[1]:
<class '__main__.SubclassedSeries'>

>>> to_framed = s.to_frame()
>>> type(to_framed)
In [1]: pivoted = df.pivot(index='A', columns='B', values='C')
In [1]: pivoted
Out[1]:
B x y
A
a 1 2
b 3 4
In [1]: type(pivoted)
Out[1]:
<class '__main__.SubclassedDataFrame'>

>>> df = SubclassedDataFrame({'A', [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> df
A B C
0 1 4 7
1 2 5 8
2 3 6 9
Most data operations also preserve the class:

>>> type(df)
<class '__main__.SubclassedDataFrame'>
.. ipython:: python

>>> sliced1 = df[['A', 'B']]
>>> sliced1
A B
0 1 4
1 2 5
2 3 6
>>> type(sliced1)
In [1]: squared = pivoted**2
In [1]: squared
Out[1]:
B x y
A
a 1 4
b 9 16
In [1]: type(squared)
Out[1]:
<class '__main__.SubclassedDataFrame'>

>>> sliced2 = df['A']
>>> sliced2
0 1
1 2
2 3
Name: A, dtype: int64
>>> type(sliced2)
In [1]: interped = ser.reindex([0, 0.5, 1, 1.5, 2]).interpolate()
In [1]: interped
Out[1]:
0.0 1.0
0.5 1.5
1.0 2.0
1.5 2.5
2.0 3.0
dtype: float64
In [1]: type(interped)
Out[1]:
<class '__main__.SubclassedSeries'>
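
As a self-contained check of the behaviour described above, the following minimal sketch redefines the two subclasses and inspects the type returned by a same-dimension slice, a single-column slice, ``to_frame()``, an arithmetic operation, and a pivot. It assumes a pandas version that is importable as ``pd``; the variable names (``same_dim``, ``lower_dim``, ``higher_dim``) are illustrative only. The pivot result type is printed rather than asserted, since it depends on whether the installed pandas routes reshape results through the constructor properties.

```python
import pandas as pd


class SubclassedSeries(pd.Series):

    @property
    def _constructor(self):
        # Same-dimension results stay SubclassedSeries.
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        # One-dimension-higher results become SubclassedDataFrame.
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        # Same-dimension results stay SubclassedDataFrame.
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        # One-dimension-lower results become SubclassedSeries.
        return SubclassedSeries


df = SubclassedDataFrame({
    "A": ["a", "a", "b", "b"],
    "B": ["x", "y", "x", "y"],
    "C": [1, 2, 3, 4],
})

same_dim = df[["A", "B"]]          # uses _constructor
lower_dim = df["C"]                # uses _constructor_sliced
higher_dim = lower_dim.to_frame()  # uses _constructor_expanddim
squared = df[["C"]] ** 2           # arithmetic on a numeric slice
pivoted = df.pivot(index="A", columns="B", values="C")  # reshape

for name, obj in [
    ("same_dim", same_dim),
    ("lower_dim", lower_dim),
    ("higher_dim", higher_dim),
    ("squared", squared),
    ("pivoted", pivoted),
]:
    print(name, type(obj).__name__)
```

Checking ``type(obj)`` rather than ``isinstance`` is deliberate here: every subclass instance is also a plain ``Series`` or ``DataFrame``, so only an exact type comparison shows whether the subclass survived the operation.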


Define Original Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand All @@ -200,42 +286,48 @@ To let original data structures have additional properties, you should let ``pan

The example below defines two original properties: ``internal_cache`` as a temporary property and ``added_property`` as a normal property.

.. code-block:: python

class SubclassedDataFrame2(DataFrame):

# temporary properties
_internal_names = pd.DataFrame._internal_names + ['internal_cache']
_internal_names_set = set(_internal_names)

# normal properties
_metadata = ['added_property']
.. ipython:: python

@property
def _constructor(self):
return SubclassedDataFrame2
In [1]: class SubclassedDataFrame2(DataFrame):
...:
...: # temporary properties
...: _internal_names = DataFrame._internal_names + ['internal_cache']
...: _internal_names_set = set(_internal_names)
...:
...: # normal properties
...: _metadata = ['added_property']
...:
...: @property
...: def _constructor(self):
...: return SubclassedDataFrame2

.. code-block:: python
.. ipython:: python

>>> df = SubclassedDataFrame2({'A', [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> df
In [1]: df = SubclassedDataFrame2({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
In [1]: df
Out[1]:
A B C
0 1 4 7
1 2 5 8
2 3 6 9

>>> df.internal_cache = 'cached'
>>> df.added_property = 'property'
In [1]: df.internal_cache = 'cached'
In [1]: df.added_property = 'property'

>>> df.internal_cache
In [1]: df.internal_cache
Out[1]:
cached
>>> df.added_property
In [1]: df.added_property
Out[1]:
property

# properties defined in _internal_names are reset after manipulation
>>> df[['A', 'B']].internal_cache
In [1]: df[['A', 'B']].internal_cache
Out[1]:
AttributeError: 'SubclassedDataFrame2' object has no attribute 'internal_cache'

# properties defined in _metadata are retained
>>> df[['A', 'B']].added_property
In [1]: df[['A', 'B']].added_property
Out[1]:
property
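
The difference between the two kinds of property can also be checked programmatically. The sketch below is a minimal, self-contained version of the session above, assuming pandas is importable as ``pd``. It uses ``hasattr`` instead of triggering the ``AttributeError``, and the names ``subset`` and ``cache_survived`` are illustrative only.

```python
import pandas as pd


class SubclassedDataFrame2(pd.DataFrame):

    # temporary properties: not carried over to manipulation results
    _internal_names = pd.DataFrame._internal_names + ["internal_cache"]
    _internal_names_set = set(_internal_names)

    # normal properties: carried over to manipulation results
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return SubclassedDataFrame2


df = SubclassedDataFrame2({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df.internal_cache = "cached"
df.added_property = "property"

# Slicing builds a new object through _constructor.
subset = df[["A", "B"]]

# The temporary property exists on the original but not on the result.
cache_survived = hasattr(subset, "internal_cache")
print("internal_cache on original:", df.internal_cache)
print("internal_cache on subset:", cache_survived)

# The property listed in _metadata is passed on to the result.
print("added_property on subset:", subset.added_property)
```

Registering a name in ``_internal_names`` or ``_metadata`` also tells ``pandas`` that attribute assignment to that name is intentional and not an attempt to create a column.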