Skip to content

__finalize__ called with other as _Concatenator? #18999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Dec 30, 2017 · 9 comments
Closed

__finalize__ called with other as _Concatenator? #18999

jbrockmendel opened this issue Dec 30, 2017 · 9 comments
Labels
API Design Subclassing Subclassing pandas objects

Comments

@jbrockmendel
Copy link
Member

Revisiting some 6 month old class that subclasses Series and overrides __finalize__ as:

    def __finalize__(self, other, method=None, **kwargs):
        pd.Series.__finalize__(self, other, method=method, **kwargs)
        self.metadata = other.metadata.copy()
        self.metadata.parent = self
        return self

With 0.21.1 these objects can no longer be rendered:

>>> ser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/base.py", line 80, in __repr__
    return str(self)
  File "pandas/core/base.py", line 60, in __str__
    return self.__bytes__()
  File "pandas/core/base.py", line 72, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "pandas/core/series.py", line 1066, in __unicode__
    max_rows=max_rows, length=show_dimensions)
  File "pandas/core/series.py", line 1109, in to_string
    max_rows=max_rows)
  File "pandas/io/formats/format.py", line 176, in __init__
    self._chk_truncate()
  File "pandas/io/formats/format.py", line 190, in _chk_truncate
    series.iloc[-row_num:]))
  File "pandas/core/reshape/concat.py", line 213, in concat
    return op.get_result()
  File "pandas/core/reshape/concat.py", line 377, in get_result
    return cons(mgr, name=name).__finalize__(self, method='concat')
  File "pdsm/core/series.py", line 125, in __finalize__
    self.metadata = other.metadata.copy()
AttributeError: '_Concatenator' object has no attribute 'metadata'

I had assumed that other would always be of the same class as self. Is _Concatenator a recent addition?

@jreback
Copy link
Contributor

jreback commented Dec 30, 2017

can you demonstrate an actual example?

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Jan 2, 2018

class Metadata(object):
    def __init__(self, parent):
        self.parent = parent

    def copy(self):
        return Metadata(parent=self.parent)


class MetaSeries(pd.Series):
    _metadata = pd.Series._metadata + ['metadata']

    @property
    def _constructor(self):
        return MetaSeries

    def __init__(self, *args, **kwargs):
        super(MetaSeries, self).__init__(*args, **kwargs)
        md = Metadata(parent=self)
        object.__setattr__(self, 'metadata', md)

    def __finalize__(self, other, method=None, **kwargs):
        pd.Series.__finalize__(self, other, method=method, **kwargs)
        md = other.metadata.copy()
        object.__setattr__(self, 'metadata', md)
        md.parent = self
        return self
>>> ser = pd.Series(range(100))
>>> mser = MetaSeries(ser)
>>> mser[:5]
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>> mser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/base.py", line 80, in __repr__
    return str(self)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/base.py", line 60, in __str__
    return self.__bytes__()
  File "/usr/local/lib/python2.7/site-packages/pandas/core/base.py", line 72, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1066, in __unicode__
    max_rows=max_rows, length=show_dimensions)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1109, in to_string
    max_rows=max_rows)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/formats/format.py", line 176, in __init__
    self._chk_truncate()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/formats/format.py", line 190, in _chk_truncate
    series.iloc[-row_num:]))
  File "/usr/local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 213, in concat
    return op.get_result()
  File "/usr/local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 377, in get_result
    return cons(mgr, name=name).__finalize__(self, method='concat')
  File "subclass.py", line 27, in __finalize__
    self.metadata = other.metadata.copy()
AttributeError: '_Concatenator' object has no attribute 'metadata'

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 11, 2018

@jbrockmendel That is code that I recently rewrote (#17728), so it is quite possible I introduces a bug.

However, I cannot reproduce the exact error you get. I get:

In [3]: mser
..
ValueError: All objects passed were None

and I already get to for 0.20.3 as well.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 11, 2018

Ah, there was a return self missing in the __finalize__ implementation in the code example above. Now I get the error you were showing (and now it also works in older pandas version).

And I suppose that I meant to pass the first object from the list to concat instead of self. Does that make sense?

However, it seems that I just did how it was already done before (using this __finalize__(self) construct where self is the _Concatenator object). So not sure why it started failing.

@jorisvandenbossche
Copy link
Member

OK, I understand what changed the behaviour. Before my PR (#17728), concat of serieses always returned a Series, while my PR changed it to return an instance of the class of the first object.
So in practice, the __finalize__ of your MetaSeries is now called instead of Series.__finalize__, and the latter has a check that self is NDFrame like:

pandas/pandas/core/generic.py

Lines 4000 to 4003 in 8acdf80

if isinstance(other, NDFrame):
for name in self._metadata:
object.__setattr__(self, name, getattr(other, name, None))
return self

Given that, I would argue the current code might be correct, as passing through the metadata of the first object is maybe not the best default? (Eg that would ignore different metadata in the different objects you are concatting).
The subclass can always have specialized code in __finalize__ for checking such metadata

@jbrockmendel
Copy link
Member Author

Ah, there was a return self missing in the finalize implementation in the code example above. Now I get the error you were showing (and now it also works in older pandas version).

Good catch. I just edited the previous comment to fix that mistake.

Given that, I would argue the current code might be correct, as passing through the metadata of the first object is maybe not the best default? (Eg that would ignore different metadata in the different objects you are concatting).

I think you're right that the problem above would be fixed by having a isinstance(other, NDFrame) check in the subclass's __finalize__ method. (In the non-toy version MetaSeries, the metadata object is responsible for figuring out how to propagate itself under various operations.)

concat is well outside the areas that I'm familiar with (and I was surprised that it was relevant for __repr__). For my own understanding, why is _Concatenator a reasonable object to pass to __finalize__? I don't see how it could usefully propagate any metadata.

@jorisvandenbossche
Copy link
Member

I was surprised that it was relevant for __repr__

Because it is a truncated repr, concatting first and last part.

For my own understanding, why is _Concatenator a reasonable object to pass to finalize? I don't see how it could usefully propagate any metadata.

That object holds more information, eg the list of all concatenated objects. So for example you could check for a certain metadata field if all objects have the same value, and only in that case set it on the resulting object.

@jbrockmendel
Copy link
Member Author

This is probably closable, possibly worth keeping around on some tracker for "subclassing pitfalls"

@jreback jreback closed this as completed Jan 17, 2018
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Jan 20, 2018
@jorisvandenbossche
Copy link
Member

@jbrockmendel want to do a PR to mention this in the subclassing docs?

@jorisvandenbossche jorisvandenbossche added the Subclassing Subclassing pandas objects label Dec 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Subclassing Subclassing pandas objects
Projects
None yet
Development

No branches or pull requests

4 participants