-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: core.arrays.numpy_ #31348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: core.arrays.numpy_ #31348
Conversation
@@ -35,52 +35,68 @@ class PandasDtype(ExtensionDtype): | |||
|
|||
Parameters | |||
---------- | |||
dtype : numpy.dtype | |||
dtype : object | |||
Object to be converted to a NumPy data type object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see how this is more accurate than just "numpy.dtype", but can we be more specific here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the types accepted by np.dtype which is more permissive than just another np.dtype. Object is straight out of the numpy docs.
do you want to restrict this as before, or establish the actual types accepted by numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if its straight out of the numpy docs, then sounds fine
def __init__(self, dtype): | ||
dtype = np.dtype(dtype) | ||
self._dtype = dtype | ||
self._type = dtype.type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche does turning this into properties have perf implications we need to worry about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for consistency with name that was changed in #31037
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK thanks. That was the perf affect I had in mind, but pre-coffee i had it backwards.
pandas/core/arrays/numpy_.py
Outdated
""" | ||
A bit-width name for this data-type. | ||
|
||
Un-sized flexible data-type objects do not have this attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leave this out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem. will remove. can we wrap all numpy dtypes?
pandas/core/arrays/numpy_.py
Outdated
return self._dtype.itemsize | ||
|
||
|
||
PandasArrayT = TypeVar("PandasArrayT", bound="PandasArray") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L166 is type self, so needed for constructor signature to ensure values is type self.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we use the name as a string in other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I didn't want to imply you necessarily needed to remove it, rather I want to understand it (eg what is the difference with using the class name as a string as we do elsewhere)
To me, this typing is adding a lot of black magic to the codebase, and I want to ensure it stays as understandable as possible for people (including me) that are not so familiar with typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an example for the case where the return type is from a base class method
class Base:
def foo(self) -> "Base":
return type(self)()
class Derived(Base):
x = 0
def bar(self) -> int:
return self.foo().x
print(Derived().bar())
This is valid python...
$ python pandas/test.py
0
however, mypy will report errors...
$ mypy pandas/test.py
pandas\test.py:10: error: "Base" has no attribute "x"
Found 1 error in 1 file (checked 1 source file)
whereas using a typevar...
from typing import TypeVar
_T = TypeVar("_T")
class Base:
def foo(self: _T) -> _T:
return type(self)()
class Derived(Base):
x = 0
def bar(self) -> int:
return self.foo().x
print(Derived().bar())
gives
$ mypy pandas/test.py
Success: no issues found in 1 source file
so typing the base class needs to use typevars to keep mypy green when we add types to the subclasses. I have reverted, but will probably need to add back when the subclasses get merged.
I've not included an example for the init case, but similar principal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that explanation helps (but if we start adding it, it needs to be explained in our contributing guidelines)
But: PandasArray
is not a base class, it is not further subclassed I think. So why is it needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, actually, StringArray now subclasses PandasArray, so forget the last sentence of what I just said ;)
Thanks! |
@simonjayhawkins a follow-up documenting this practice somewhere would be welcome! |
No description provided.