Skip to content

QST: Subclassing and concat #35415

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
EmilianoJordan opened this issue Jul 26, 2020 · 4 comments
Open

QST: Subclassing and concat #35415

EmilianoJordan opened this issue Jul 26, 2020 · 4 comments
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode Subclassing Subclassing pandas objects

Comments

@EmilianoJordan
Copy link

I was looking into df.astype not respecting subclassed objects and noticed the issue had been resolved by modifying pd.concat to respect subclassed objects using the following code:

cons = self.objs[0]._constructor

Line 464 PR #29627

cons = self.objs[0]._constructor_expanddim

Line 477 PR #33884 (Set to be released on 1.1.0)

This assumes homogeneous object types, or at the very least that the first object is representative. My assumption was that functions on the pandas namespace did not respect subclassing and only methods on the objects did. Leaving developers to implement their own special use cases for subclassing. This seemed like a good delimitation to me, both logically and programmatically. But, now diving deeper into functions I don't regularly use I see I was mistaken as merge and merge_ordered both respect the left arg's subclassing.

With this in mind, I have some questions.

  1. Should there be more documentation around subclassing? This would include how concat, merge, merge_ordered etc handle subclassing.
  2. If functions on the pandas names space are going to implement varying degrees of support for subclassing then, should there be a constructor kwarg added to functions like read_sql and read_csv?
  3. Does to_hdf not working with subclassing need to be fixed?
@EmilianoJordan EmilianoJordan added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Jul 26, 2020
@jbrockmendel
Copy link
Member

cc @TomAugspurger

@jbrockmendel jbrockmendel added Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 2, 2020
@TomAugspurger
Copy link
Contributor

Generally speaking, we're happy to accept PRs for subclassing things as long as they don't affect the typical user's experience. Switching things like DataFrame to use ._constructor is a good example.

Should there be more documentation around subclassing?

Sure. We have a page dedicated to subclassing.

should there be a constructor kwarg added to functions like read_sql and read_csv?

IMO, no. Most users wouldn't ever need it so I wouldn't want it in the signature.

Does to_hdf not working with subclassing need to be fixed?

I'm not familiar with the issue.

@jorisvandenbossche jorisvandenbossche added the Subclassing Subclassing pandas objects label Dec 7, 2020
@mitar
Copy link
Contributor

mitar commented Apr 6, 2021

I was looking into df.astype not respecting subclassed objects

I looks like it still does not work when astype is provided as a dict: #40810

@EmilianoJordan
Copy link
Author

EmilianoJordan commented Oct 22, 2021

I'd like to clarify the behavior in my original post for this issue with a code example:

This is intended to show that either the 'left' value or the first value (in the case of concat) is respected in the case of subclassing. In my opinion the better design decision would be to make these operations commutative. This could be accomplished by only returning a subclass when the objects are homogeneous in type and a pandas object should be returned in all other cases.

No matter what decision is made here I believe this should be documented in the subclassing documentation.

import pandas as pd

class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries

subclassed_series = SubclassedSeries(range(10,20), name='test')
s = pd.Series(range(10,20), name='test')

# pandas.merge respects lef class type
t = pd.merge(subclassed_series, s)
assert isinstance(t, SubclassedDataFrame)

t = pd.merge(s, subclassed_series)
assert not isinstance(t, SubclassedDataFrame)

# pandas concat behavior
t = pd.concat([subclassed_series, s])
assert isinstance(t, SubclassedSeries)

t = pd.concat([s,subclassed_series])
assert not isinstance(t, SubclassedSeries)

t = pd.concat([subclassed_series, s], axis=1)
assert isinstance(t, SubclassedDataFrame)

t = pd.concat([s,subclassed_series], axis=1)
assert not isinstance(t, SubclassedDataFrame)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode Subclassing Subclassing pandas objects
Projects
None yet
Development

No branches or pull requests

6 participants