-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DataFrame.convert_dtypes doesn't preserve subclasses #44249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataFrame.convert_dtypes doesn't preserve subclasses #44249
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok, one question
CI failures now don't seem related, but I could be wrong Edit: now it definitely does, trying to sort out the type hints |
pandas/core/generic.py
Outdated
@@ -6219,8 +6220,12 @@ def convert_dtypes( | |||
for col_name, col in self.items() | |||
] | |||
if len(results) > 0: | |||
result = concat(results, axis=1, copy=False) | |||
cons = cast(Type[DataFrame], self._constructor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super happy with this. Was getting the error
pandas/core/generic.py:6222: error: Argument 1 to "NDFrame" has incompatible type "DataFrame"; expected "Union[ArrayManager, SingleArrayManager, BlockManager, SingleBlockManager]" [arg-type]
at https://github.com/pandas-dev/pandas/runs/4114882794?check_suite_focus=true, which refers to the line
result = self._constructor(concat(results, axis=1, copy=False))
.
Bit confused because NDFrame._constructor
raises an abstractmethod error, I'm guessing mypy is then reading the parent class PandasObject
which returns type(self)
i.e. in this case NDFrame
. Then we're calling NDFrame
with a DataFrame
(the result of concat
with more that 1 column). Finally the mypy type error seems to come from DataFrame
in the __init__
of NDFrame
which specifies the first argument is a Manager
and not a DataFrame.
I guess given the return signature of concat
is Series | DataFrame
this is technically correct, but it seems awkward to use the DataFrame
directly in the definition of the more general NDFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok, need to remove the print. ping on green.
@jreback looks like everything is passing now. |
thanks @m-richards |
I've added a fixture for the specific construction of a subclassed dataframe from the issue - which mirrors the setup in geopandas, I expect this probably needs to be relocated (I could have constructed this inline with the test and maybe that's the right solution, but I can see analogous changes for
astype
perhaps being useful in the geopandas context as well).I've also added the call to
__finalize__
although it's not strictly needed, since that's also needed by #28283 (Note that the 1 dimensional case already has__finalize__
called indirectly viaastype
.