Skip to content

Series __finalized__ not correctly called in binary operators #13208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jdfekete opened this issue May 17, 2016 · 12 comments · Fixed by #30769
Closed

Series __finalized__ not correctly called in binary operators #13208

jdfekete opened this issue May 17, 2016 · 12 comments · Fixed by #30769
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Subclassing Subclassing pandas objects
Milestone

Comments

@jdfekete
Copy link

jdfekete commented May 17, 2016

#!/bin/env python
"""
Example bug in derived Pandas Series.

__finalized__ is not called in arithmetic binary operators, but it is in in some booleans cases.

>>> m = MySeries([1, 2, 3], name='test')
>>> m.x = 42
>>> n=m[:2]
>>> n
0    1
1    2
dtype: int64
>>> n.x
42
>>> o=n+1
>>> o
0    2
1    3
dtype: int64
>>> o.x
Traceback (most recent call last):
        ...
AttributeError: 'MySeries' object has no attribute 'x'
>>> m = MySeries([True, False, True], name='test2')
>>> m.x = 42
>>> n=m[:2]
>>> n
0     True
1    False
dtype: bool
>>> n.x
42
>>> o=n ^ True
>>> o
0    False
1     True
dtype: bool
>>> o.x
42
>>> p = n ^ o
>>> p
0    True
1    True
dtype: bool
>>> p.x
42

"""

import pandas as pd

class MySeries(pd.Series):
    _metadata = ['x']

    @property
    def _constructor(self):
        return MySeries

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Expected Output

In all cases, the metadata 'x' should be transferred from the passed values when applying binary operators.
When the right-hand value is a constant, the left-hand value metadata should be used in finalize for arithmetic operators, just like it is for Boolean binary operators.
When two series are used in binary operators, some resolution should be possible in finalize.
I would pass the second (right-hand) value by calling finalize(self, other=other), leaving the resolution to the derived class implementer, but there might be a smarter approach.

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: None
setuptools: 20.2.2
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented May 17, 2016

yeah it appears some paths don't call finalize; though you can't just call finalize for example on binary operations between 2 Series. This would have to be handled in the sub-class itself. We don't have the machinery to ignore this though.

So I am suggesting that we DO call __finalize__ but maybe pass a parameter that will be default propogate (but on binary functions this parameter can be set to False and prevent propogation / unless overriden)

@jreback jreback added this to the Next Major Release milestone May 17, 2016
@jreback
Copy link
Contributor

jreback commented May 17, 2016

pull-requests welcome!

@pfrcks
Copy link
Contributor

pfrcks commented May 18, 2016

@jreback I would like to attempt it. Based on what you have written, I have located the function def of finalize in generic.py.
I'm thinking of adding a paramter with default value as True just as you mentioned above( which can be set to False on binary functions.)
However I'm not able to understand what do you mean by "DO call finalize". Does that mean that something has to be changed in the process during operations between two Series?

@jdfekete
Copy link
Author

@pfrcks I think it means you need to change pandas/core/ops.py so that binary operations now call your brand new finalize
Each call to self._constructor should be followed by a call to finalize to make sure the metadata is propagated.
Also, adding a named argument to finalize will break upward compatibility, but that might be the only way to do it right.

@pfrcks
Copy link
Contributor

pfrcks commented May 18, 2016

@jdfekete That makes more sense. Thanks. Will look into it.

@jreback
Copy link
Contributor

jreback commented May 18, 2016

so to clarify x and y are NDFrames (e.g. Series/DataFrame)

x + 1 -> __finalize__(other)
x + y -> __finalize__(other, method = '__add__')

would prob work. So the result in both cases is a NDFrame, they can be distinguished (and possibly operated on but a sub-class) when method is not None

@jdfekete
Copy link
Author

I am not sure. Looking in ops.py on the wrapper methods, maybe:
x + 1 -> __finalize__(self)
x + y -> __finalize__(self, method='__add__', other=other)

So far, the operators are not passed to the __finalize__ function. It could be handy in general, but that's a separate issue I think. Adding method=operator is useful but does not brake the API (I think). Adding the other=other parameter do change it a bit, although the current __finalize__ method will take it in its **kwargs .
Also, I have seen references to possible cycles/loops in some exchanges but I don't quite understand when they happen and whether my solution solves them.

@jreback
Copy link
Contributor

jreback commented May 18, 2016

So far, the operators are not passed to the finalize function. It could be handy in general, but that's a separate issue I think. Adding method=operator is useful but does not brake the API (I think). Adding the other=other parameter do change it a bit, although the current finalize method will take it in its **kwargs .

that's exactly what needs to be one, __finalize__ NEEDS to be called. There is no API change.

@matthiasha
Copy link

Same issue, different context

Code Sample

import pandas

# subclass series and define property 'meta' to be passed on
class S(pandas.Series):
    _metadata = ['meta']
    
    @property
    def _constructor(self):
        return S
    
# create new instance of Series subclass and set custom property
x = S(range(3))
x.meta = 'test'

# 'meta' gets passed on to slice, as expected
print(x[1:].meta)

# calculation results 
print((x * 2).meta)

Problem description

The documentation states:
Define _metadata for normal properties which will be passed to manipulation results

See http://pandas.pydata.org/pandas-docs/stable/internals.html#define-original-properties.

I think multiplication / adding etc. are also manipulation results.

This should be discussed with others who are already subclassing Pandas classes and are making use of _metadata.

Expected Output

The property is expected to be passed on to the calculation result.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@matthiasha
Copy link

My workaround: Always call __finalize__ when constructing a new Series:

@property
def _constructor(self):
    def f(*args, **kwargs):
        # workaround for https://github.com/pandas-dev/pandas/issues/13208
        return MySubclass(*args, **kwargs).__finalize__(self)
    return f

@akashgurava
Copy link

Which module contains the operations part? Like add I would love to be able to contribute.

@jbrockmendel
Copy link
Member

This is fixed in master, may need a dedicated test

@mroeschke mroeschke added the Needs Tests Unit test(s) needed to prevent regressions label Oct 21, 2019
@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.0 Jan 7, 2020
@jorisvandenbossche jorisvandenbossche added the Subclassing Subclassing pandas objects label Dec 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Subclassing Subclassing pandas objects
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants