Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Support an ExtensionDtype and ExtensionArray #554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support an ExtensionDtype and ExtensionArray #554
Changes from 3 commits
4bae5e7
011af56
2edb805
fb1c15b
fb9a7ae
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twoertwein @Dr-Irv While I understand the intention, this one causes a problem with the following types from Pandas:
pandas.core.frame.DataFrame.reorder_levels(self, order: Sequence[Axis], axis: Axis = 0)
The order parameter expects a sequence of ints or strings - with this change this is forbidden.
From my code:
PS: Sorry for not creating an issue (yet). I am short on time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you do have the time, please create an issue with a full code sample.
This has picked up that
reorder_levels()
has an incorrect type fororder
. It should beSequence[Hashable]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we go: #560
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
na_value
seems to be a property (but the return value can be Any for some classes)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it turns out while pandas implements them as properties, from a typing perspective, they are really class variables, so I annotated them that way in the next commit.
Throughout pandas, we just set
type = xxx
andname = yyy
etc. in the subclasses, so changing them toClassVar
makes that work for typing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be surpsrised if they are used inconsistently within pandas :( in some places as properties, in some as normal attributes and in some as class variables. I just grepped through pandas/core/dtypes, I didn't find a single line that assigns a value to a class variable
na_value
ortype
. It seems consistently to be a property.If they are class variables, can you add a quick
isinstance(ClassName.na_value, ...)
test.If the pandas tests set it for simplicity/laziness to a class varaible (which is fine with me), we can just add an ignore to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at
pandas/core/arrays/integer.py
. You'll see assignments oftype =
for the variousInteger
dtypes.I believe it's declared as a property in
pandas/core/dtypes/base.py
so that if someone doesn't define it, it raisesAbstractMethodError
. But it is used internally as aClassVar
.For
na_value
, it is inconsistent, but based on the example forDecimalDtype
, it was treated as a class variable there as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left
name
andtype
asClassVar
, but changedna_value
to be a property, because it has a default value.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wounder why mypy/pyright didn't catch that. Maybe because a parent class re-declares
type
asAny
or because of the decorators that mypy doesn't understand. Seems like something that should be fixed in pandas.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering the same thing.
In reality,
type
andname
are abstract class properties ofExtensionDtype
, but the type checkers aren't supporting that concept. See long discussions here: microsoft/pyright#2601 and python/mypy#8993