Skip to content

TYP: core.arrays.numpy_ #31348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 31, 2020
Merged

Conversation

simonjayhawkins
Copy link
Member

No description provided.

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jan 27, 2020
@@ -35,52 +35,68 @@ class PandasDtype(ExtensionDtype):

Parameters
----------
dtype : numpy.dtype
dtype : object
Object to be converted to a NumPy data type object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see how this is more accurate than just "numpy.dtype", but can we be more specific here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the types accepted by np.dtype which is more permissive than just another np.dtype. Object is straight out of the numpy docs.

do you want to restrict this as before, or establish the actual types accepted by numpy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its straight out of the numpy docs, then sounds fine

def __init__(self, dtype):
dtype = np.dtype(dtype)
self._dtype = dtype
self._type = dtype.type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche does turning this into properties have perf implications we need to worry about

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for consistency with name that was changed in #31037

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks. That was the perf affect I had in mind, but pre-coffee i had it backwards.

"""
A bit-width name for this data-type.

Un-sized flexible data-type objects do not have this attribute.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can leave this out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem. will remove. can we wrap all numpy dtypes?

return self._dtype.itemsize


PandasArrayT = TypeVar("PandasArrayT", bound="PandasArray")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L166 is type self, so needed for constructor signature to ensure values is type self.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we use the name as a string in other places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I didn't want to imply you necessarily needed to remove it, rather I want to understand it (eg what is the difference with using the class name as a string as we do elsewhere)

To me, this typing is adding a lot of black magic to the codebase, and I want to ensure it stays as understandable as possible for people (including me) that are not so familiar with typing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an example for the case where the return type is from a base class method

class Base:
    def foo(self) -> "Base":
        return type(self)()


class Derived(Base):
    x = 0

    def bar(self) -> int:
        return self.foo().x


print(Derived().bar())

This is valid python...

$ python pandas/test.py
0

however, mypy will report errors...

$ mypy pandas/test.py
pandas\test.py:10: error: "Base" has no attribute "x"
Found 1 error in 1 file (checked 1 source file)

whereas using a typevar...

from typing import TypeVar

_T = TypeVar("_T")


class Base:
    def foo(self: _T) -> _T:
        return type(self)()


class Derived(Base):
    x = 0

    def bar(self) -> int:
        return self.foo().x


print(Derived().bar())

gives

$ mypy pandas/test.py
Success: no issues found in 1 source file

so typing the base class needs to use typevars to keep mypy green when we add types to the subclasses. I have reverted, but will probably need to add back when the subclasses get merged.

I've not included an example for the init case, but similar principal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that explanation helps (but if we start adding it, it needs to be explained in our contributing guidelines)

But: PandasArray is not a base class, it is not further subclassed I think. So why is it needed here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually, StringArray now subclasses PandasArray, so forget the last sentence of what I just said ;)

@WillAyd WillAyd added this to the 1.1 milestone Jan 31, 2020
@WillAyd WillAyd merged commit 12af3cd into pandas-dev:master Jan 31, 2020
@WillAyd
Copy link
Member

WillAyd commented Jan 31, 2020

Thanks!

@jorisvandenbossche
Copy link
Member

@simonjayhawkins a follow-up documenting this practice somewhere would be welcome!

@simonjayhawkins simonjayhawkins deleted the typing-pandas-array branch January 31, 2020 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants