-
-
Notifications
You must be signed in to change notification settings - Fork 141
Support an ExtensionDtype and ExtensionArray #554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas-stubs/core/dtypes/base.pyi
Outdated
@property | ||
def type(self) -> type_t: ... | ||
type: type_t | ||
na_value: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
na_value
seems to be a property (but the return value can be Any for some classes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it turns out while pandas implements them as properties, from a typing perspective, they are really class variables, so I annotated them that way in the next commit.
Throughout pandas, we just set type = xxx
and name = yyy
etc. in the subclasses, so changing them to ClassVar
makes that work for typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be surpsrised if they are used inconsistently within pandas :( in some places as properties, in some as normal attributes and in some as class variables. I just grepped through pandas/core/dtypes, I didn't find a single line that assigns a value to a class variable na_value
or type
. It seems consistently to be a property.
If they are class variables, can you add a quick isinstance(ClassName.na_value, ...)
test.
If the pandas tests set it for simplicity/laziness to a class varaible (which is fine with me), we can just add an ignore to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be surpsrised if they are used inconsistently within pandas :( in some places as properties, in some as normal attributes and in some as class variables. I just grepped through pandas/core/dtypes, I didn't find a single line that assigns a value to a class variable
na_value
ortype
. It seems consistently to be a property.
Take a look at pandas/core/arrays/integer.py
. You'll see assignments of type =
for the various Integer
dtypes.
I believe it's declared as a property in pandas/core/dtypes/base.py
so that if someone doesn't define it, it raises AbstractMethodError
. But it is used internally as a ClassVar
.
For na_value
, it is inconsistent, but based on the example for DecimalDtype
, it was treated as a class variable there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left name
and type
as ClassVar
, but changed na_value
to be a property, because it has a default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at
pandas/core/arrays/integer.py
. You'll see assignments oftype =
for the variousInteger
dtypes.
I wounder why mypy/pyright didn't catch that. Maybe because a parent class re-declares type
as Any
or because of the decorators that mypy doesn't understand. Seems like something that should be fixed in pandas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wounder why mypy/pyright didn't catch that. Maybe because a parent class re-declares
type
asAny
or because of the decorators that mypy doesn't understand. Seems like something that should be fixed in pandas.
I was wondering the same thing.
In reality, type
and name
are abstract class properties of ExtensionDtype
, but the type checkers aren't supporting that concept. See long discussions here: microsoft/pyright#2601 and python/mypy#8993
Thanks @Dr-Irv ! |
Axis: TypeAlias = str | int | ||
AxisInt: TypeAlias = Literal[0, 1] | ||
Axis: TypeAlias = AxisInt | Literal["index", "columns", "rows"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twoertwein @Dr-Irv While I understand the intention, this one causes a problem with the following types from Pandas:
pandas.core.frame.DataFrame.reorder_levels(self, order: Sequence[Axis], axis: Axis = 0)
The order parameter expects a sequence of ints or strings - with this change this is forbidden.
From my code:
reportquery.py:612: error: Argument 1 to
"reorder_levels" of "DataFrame" has incompatible type "Sequence[str]"; expected
"Sequence[Union[Literal[0, 1], Literal['index', 'columns', 'rows']]]"
[arg-type]
df = df.reorder_levels(column_names, axis=1)
PS: Sorry for not creating an issue (yet). I am short on time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: Sorry for not creating an issue (yet). I am short on time.
When you do have the time, please create an issue with a full code sample.
This has picked up that reorder_levels()
has an incorrect type for order
. It should be Sequence[Hashable]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we go: #560
* Support an ExtensionDtype and ExtensionArray * update TypeGuard usage. Test astype for Decimal * use ClassVar. Change is_float to accept numpy. Remove self: Self * remove declarations of private funcs. make na_value a property
While working on #519 , I wanted to test that we could pass a user-created
ExtensionDtype
andExtensionArray
. So I copied the testing code frompandas
that creates aDecimalArray
, and it revealed a number of issues with our type declarations.So this PR addresses that. Not everything is really tested in terms of all the
ExtensionArray
declarations, but the goal here is to see that a user-definedExtensionArray
will pass typing checks.