Skip to content

DataFrame.convert_dtypes doesn't preserve subclasses #44249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Nov 13, 2021

Conversation

m-richards
Copy link
Contributor

I've added a fixture for the specific construction of a subclassed dataframe from the issue - which mirrors the setup in geopandas, I expect this probably needs to be relocated (I could have constructed this inline with the test and maybe that's the right solution, but I can see analogous changes for astype perhaps being useful in the geopandas context as well).

I've also added the call to __finalize__ although it's not strictly needed, since that's also needed by #28283 (Note that the 1 dimensional case already has __finalize__ called indirectly via astype.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, one question

@jreback jreback added this to the 1.4 milestone Nov 5, 2021
@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Constructors Series/DataFrame/Index/pd.array Constructors Subclassing Subclassing pandas objects labels Nov 5, 2021
@m-richards
Copy link
Contributor Author

m-richards commented Nov 5, 2021

CI failures now don't seem related, but I could be wrong

Edit: now it definitely does, trying to sort out the type hints

@@ -6219,8 +6220,12 @@ def convert_dtypes(
for col_name, col in self.items()
]
if len(results) > 0:
result = concat(results, axis=1, copy=False)
cons = cast(Type[DataFrame], self._constructor)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super happy with this. Was getting the error

pandas/core/generic.py:6222: error: Argument 1 to "NDFrame" has incompatible type "DataFrame"; expected "Union[ArrayManager, SingleArrayManager, BlockManager, SingleBlockManager]"  [arg-type]

at https://github.com/pandas-dev/pandas/runs/4114882794?check_suite_focus=true, which refers to the line
result = self._constructor(concat(results, axis=1, copy=False)).

Bit confused because NDFrame._constructor raises an abstractmethod error, I'm guessing mypy is then reading the parent class PandasObject which returns type(self) i.e. in this case NDFrame. Then we're calling NDFrame with a DataFrame (the result of concat with more that 1 column). Finally the mypy type error seems to come from DataFrame in the __init__ of NDFrame which specifies the first argument is a Manager and not a DataFrame.

I guess given the return signature of concat is Series | DataFrame this is technically correct, but it seems awkward to use the DataFrame directly in the definition of the more general NDFrame

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, need to remove the print. ping on green.

@m-richards
Copy link
Contributor Author

looks ok, need to remove the print. ping on green.

@jreback looks like everything is passing now.

@jreback jreback merged commit 14bfd6b into pandas-dev:master Nov 13, 2021
@jreback
Copy link
Contributor

jreback commented Nov 13, 2021

thanks @m-richards

nickleus27 pushed a commit to nickleus27/pandas that referenced this pull request Nov 28, 2021
@m-richards m-richards deleted the convert_dtypes_subclasses branch September 16, 2023 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Constructors Series/DataFrame/Index/pd.array Constructors Subclassing Subclassing pandas objects
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.convert_dtypes doesn't preserve subclasses
3 participants