BUG: DataFrame.convert_dtypes doesn't preserve subclasses #43668

Closed
jorisvandenbossche opened this issue Sep 20, 2021 · 1 comment · Fixed by #44249
Labels: Bug, Subclassing

@jorisvandenbossche
Member

The DataFrame.convert_dtypes method returns the same dataframe (with potentially updated dtypes in certain columns), but because of the internal use of concat, it doesn't necessarily preserve the class type of the dataframe for subclasses.

Noticed this in GeoPandas: geopandas/geopandas#1870

This stems from the fact that the dataframe is decomposed into its columns (Series), which are then combined back into a DataFrame with concat. At that point you are concatenating Series objects, and concat no longer knows about the original dataframe class. Personally, I would say that the use of concat here is an implementation detail, and that convert_dtypes could easily preserve the original class of the calling dataframe.
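To illustrate the mechanism described above, here is a minimal sketch that performs the decompose-and-concat step by hand. It is not the actual convert_dtypes implementation, and the final re-wrapping through `_constructor` is only one possible way to restore the class, not necessarily what the fix in #44249 does. The column names and data are made up for illustration.

```python
import pandas as pd


class SubclassedDataFrame(pd.DataFrame):
    # Only the frame is subclassed; columns remain plain pandas Series.
    @property
    def _constructor(self):
        return SubclassedDataFrame


df = SubclassedDataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Decompose into columns, as described in the issue. Each item is a plain Series.
columns = [df[name] for name in df.columns]

# Recombine with concat. The inputs carry no trace of the frame subclass.
rebuilt = pd.concat(columns, axis=1)
rebuilt_type = type(rebuilt)  # plain pandas DataFrame

# Assumed workaround: wrap the result with the calling frame's constructor.
restored = df._constructor(rebuilt)
restored_type = type(restored)  # SubclassedDataFrame
```

Re-wrapping with `df._constructor` keeps the choice of class with the calling object, so the approach would also work for subclasses such as a GeoDataFrame without concat needing to know about them.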

Small reproducer without geopandas:

from pandas import DataFrame


class SubclassedDataFrame(DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame


In [51]: df = SubclassedDataFrame({'a': [1, 2, 3]})

In [52]: type(df)
Out[52]: __main__.SubclassedDataFrame

In [53]: type(df.convert_dtypes())
Out[53]: pandas.core.frame.DataFrame

Note that I am not using pd._testing.SubclassedDataFrame here: for that subclass, each column is also a SubclassedSeries, in which case concat does preserve the subclass.
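For contrast, a sketch of the case mentioned in the note above, where the columns are themselves a Series subclass. The classes below are hypothetical stand-ins that mirror the shape of the pandas testing helpers rather than importing the private pd._testing module, and the behaviour of concat shown here may depend on the pandas version.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        # Tells pandas which frame class to build when Series are expanded to 2D.
        return FullySubclassedDataFrame


class FullySubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return FullySubclassedDataFrame

    @property
    def _constructor_sliced(self):
        # Selecting a column yields a SubclassedSeries instead of a plain Series.
        return SubclassedSeries


df = FullySubclassedDataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
column_type = type(df["a"])

# concat can now recover the frame class from the Series it is given.
rebuilt = pd.concat([df["a"], df["b"]], axis=1)
rebuilt_type = type(rebuilt)
```

Because this kind of subclass hides the problem, a reproducer for the bug needs a frame subclass whose columns are plain Series.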

@m-richards
Contributor

take

@jreback added this to the 1.4 milestone on Nov 5, 2021