-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: simplify _constructor/_from_mgr/_constructor_from_mgr/... #56681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Just chiming in as a maintainer of a package that subclasses Pandas DataFrames and Series'. Apart from having many constructor functions and properties, at the moment it is also not very clear what a subclass can override, should override and must not override. The documentation for subclassing (https://pandas.pydata.org/docs/development/extending.html#subclassing-pandas-data-structures) only mentions Understanding the concepts here is fairly difficult if one does not actively participate in core development. And this has become more difficult over the last releases with the addition of new constructor related methods. Therefore, I am in favour of simplifying this system. But additionally, it would then be great to have some more documentation on the purpose of each function or property. Even if it's just in the form of comments in the code. |
take |
take |
take |
take |
take |
ATM we have
I suspect that we can make do with fewer of these, or at least simplify them (e.g. make some
@final
,@classmethod
, #51772, make downstream libraries never touch Managers, very slightly improve perf in the non-subclass cases). Some of these ideas are mutually exclusive:ATM _from_mgr is a classmethod, all the other _from_mgr methods are regular methods. The constructor_from_mgr methods rely on
self._constructor*
, so could not be made classmethods unless those constructors also became classmethods. Does anyone out there rely on _constructor not being a classmethod?_expanddim_from_mgr
and_sliced_from_mgr
are each only used once, in_constructor_expanddim_from_mgr
and_constructor_sliced_from_mgr
, respectively. Can we just inline these and remove the methods (maybe keep the less-verbose names)? If not, can we at least@final
and@classmethod
them?Can
_from_mgr
be made@final
? If not can_constructor_from_mgr
be inlined into it?The 3 listed expanddim methods are there just for
Series.to_frame
. We could get rid of all three and tell subclasses to just overrideto_frame
. (Doing this would also make it easier to tell subclass authors to handle metadata propagation in to_frame xref _metadata items of subclassed pd.Series are not propagated into corresponding SubclassedDataFrame #32860)We can slightly improve perf for non-subclasses by changing
if self._constructor is Series
toif type(self) is Series
at the cost of slightly hurting perf for subclasses.Most extreme: we can make it so that a) (most) subclasses don't need to override any from_mgr methods, b) subclasses
__init__
don't need to know how to handle Managers, c) we don't need the constructorfrom_mgr methods and d) (most) subclass authors never touch Managers by writing e.g.i.e. assume that the subclass
__init__
knows what to do when given a pandas object.@jorisvandenbossche writes that this would inconvenience geopandas since it does inference when given a DataFrame but not when given a Manager. In cases like that geopandas would need to override _from_mgr (why "most" is needed in a) and d) above), but ATM they need to override _constructor_from_mgr, so that wouldn't be a net increase in Stuff They Need To Do.
This idea is roughly #53871, which was reverted by #54922.
The text was updated successfully, but these errors were encountered: