-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYPING: Partial typing of Categorical #27318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/_typing.py
Outdated
@@ -26,8 +26,11 @@ | |||
) | |||
ArrayLike = TypeVar("ArrayLike", ABCExtensionArray, np.ndarray) | |||
DatetimeLikeScalar = TypeVar("DatetimeLikeScalar", Period, Timestamp, Timedelta) | |||
Dtype = Union[str, np.dtype, ExtensionDtype] | |||
Dtype = Union[str, np.dtype, "dtypes.ExtensionDtype"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I don't do it like this, I get a circular import error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually resolve back to "ExtensionDtype" or does it replace it with Any
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it works in PyCharm + I tried setting the string to a random string, which mypy won't accept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. Wondering if we can extend that to solve #27146 (separate PR obviously)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that should be possible, though it wold be even better if the ABCs could be used...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't you use 'ExtendionDtype' here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I guess maybe revert to what you had before is more clear (e.g. do't use the string)
@@ -519,6 +521,8 @@ def astype(self, dtype, copy=True): | |||
|
|||
""" | |||
if is_categorical_dtype(dtype): | |||
dtype = cast(Union[str, CategoricalDtype], dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy is not able to infer the dtype type here, so I have to do a cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a typing dummy, can you explain why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add something to the contributing guide about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for keeping these going
pandas/_typing.py
Outdated
@@ -26,8 +26,11 @@ | |||
) | |||
ArrayLike = TypeVar("ArrayLike", ABCExtensionArray, np.ndarray) | |||
DatetimeLikeScalar = TypeVar("DatetimeLikeScalar", Period, Timestamp, Timedelta) | |||
Dtype = Union[str, np.dtype, ExtensionDtype] | |||
Dtype = Union[str, np.dtype, "dtypes.ExtensionDtype"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually resolve back to "ExtensionDtype" or does it replace it with Any
?
pandas/_typing.py
Outdated
FilePathOrBuffer = Union[str, Path, IO[AnyStr]] | ||
|
||
FrameOrSeries = TypeVar("FrameOrSeries", ABCSeries, ABCDataFrame) | ||
Scalar = Union[str, int, float] | ||
|
||
# TODO(GH26403): Replace with Optional[bool] or bool | ||
OrderedType = Union[None, bool, object] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always go for Optional
instead of Union[None...]
.
Also object
is very rarely what you want in Typing so would want to use Any
instead, though the Union
of Any
and another object also doesn't make sense since Any
encompasses everything...
Can't this just be Optional[bool]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also reading through the entire diff I see you just moved this from elsewhere so no worries but makes sense to correct here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional
only works with a single argument, so we have to use Union here.
The object type will also be removed in 1.0, so this type can just be bool
. I think @TomAugspurger will deal with that. If you look at the comment in core/dtypes/dtypes.py, line 21 to see the reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha but the sentinel never actually gets returned from def ordered
below right? If that's correct then I think should be Ordered = Optional[bool]
and in the init for CategoricalDtype should be Union[Ordered, object]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I can't get that to work. Categorical.ordered
's type is same as CategoricalDtype._ordered
which typewise may be a object (but actually can't be).
I don't have a good solution, except waiting for ordered_sentinel
to be removed in 1.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share the error it gives you? If the problem is that the ordered_sentinel
is in the constructor of CategoricalDtype
I think could just explicitly annotate that parameter to be Optional[bool]
as well, which would be cleanest in the long run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas\core\arrays\categorical.py:475: error: Incompatible return value type (got "object", expected "Optional[bool]")
It all goes back to the ordered
parameter being of type Union[None, bool, object]
, while the ordered
attribute should be of type Optional[bool]
(but isn't, as its type comes from the parameter's type).
Don't know if there's a pattern for having different parameter and attribute types...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just add # type: ignore
to that one line? I think that is the lesser of all evils and easiest to revert whenever that change gets made
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nitpicky but can you also change the name from OrderedType
? That would seem to imply that this is something like a TypeVar or NewType.
Either OrderedAlias
or just plain Ordered
would be a better fit
pandas/_typing.py
Outdated
@@ -26,8 +26,11 @@ | |||
) | |||
ArrayLike = TypeVar("ArrayLike", ABCExtensionArray, np.ndarray) | |||
DatetimeLikeScalar = TypeVar("DatetimeLikeScalar", Period, Timestamp, Timedelta) | |||
Dtype = Union[str, np.dtype, ExtensionDtype] | |||
Dtype = Union[str, np.dtype, "dtypes.ExtensionDtype"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. Wondering if we can extend that to solve #27146 (separate PR obviously)
pandas/_typing.py
Outdated
FilePathOrBuffer = Union[str, Path, IO[AnyStr]] | ||
|
||
FrameOrSeries = TypeVar("FrameOrSeries", ABCSeries, ABCDataFrame) | ||
Scalar = Union[str, int, float] | ||
|
||
# TODO(GH26403): Replace with Optional[bool] or bool | ||
OrderedType = Union[None, bool, object] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha but the sentinel never actually gets returned from def ordered
below right? If that's correct then I think should be Ordered = Optional[bool]
and in the init for CategoricalDtype should be Union[Ordered, object]
pandas/_typing.py
Outdated
FilePathOrBuffer = Union[str, Path, IO[AnyStr]] | ||
|
||
FrameOrSeries = TypeVar("FrameOrSeries", ABCSeries, ABCDataFrame) | ||
Scalar = Union[str, int, float] | ||
|
||
# TODO(GH26403): Replace with Optional[bool] or bool | ||
OrderedType = Union[None, bool, object] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just add # type: ignore
to that one line? I think that is the lesser of all evils and easiest to revert whenever that change gets made
pandas/_typing.py
Outdated
FilePathOrBuffer = Union[str, Path, IO[AnyStr]] | ||
|
||
FrameOrSeries = TypeVar("FrameOrSeries", ABCSeries, ABCDataFrame) | ||
Scalar = Union[str, int, float] | ||
|
||
# TODO(GH26403): Replace with Optional[bool] or bool | ||
OrderedType = Union[None, bool, object] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nitpicky but can you also change the name from OrderedType
? That would seem to imply that this is something like a TypeVar or NewType.
Either OrderedAlias
or just plain Ordered
would be a better fit
pandas/core/dtypes/dtypes.py
Outdated
@@ -547,6 +548,8 @@ def update_dtype(self, dtype: "CategoricalDtype") -> "CategoricalDtype": | |||
).format(dtype=dtype) | |||
raise ValueError(msg) | |||
|
|||
dtype = cast(CategoricalDtype, dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this below subsequent comment? Seems to read better and more logically fit there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside I was kind of curious why this couldn't be inferred more easily since it's really just a str and a CategoricalDtype that are allowed here. So ideally I thought isinstance(str)
and isinstance(self)
would be two possible branches.
I think the reason this wasn't laid out that way is to allow Categorical
objects to be passed as well, but doing so actually causes an error:
t = pd.CategoricalDtype(categories=['b', 'a'], ordered=True)
c = pd.Categorical(['a','b','c','a','b','c'], ordered=True, categories=['c', 'b', 'a'])
>>> t.update_dtype(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/williamayd/clones/pandas/pandas/core/dtypes/dtypes.py", line 561, in update_dtype
new_ordered = dtype._ordered
AttributeError: 'Categorical' object has no attribute '_ordered'
cc @TomAugspurger for intent. If a bug maybe we drop this annotation for now and clean up with bug resolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bug in update_dtype
due to the changes we recently made with ordered cc @jschendel
The code in there is assuming we have a CategoricalDtype (which has a _ordered
now), but it can apparently also be a Categorical (or should that raise?), because is_dtype
allows that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks - for now I think OK to remove annotation from this PR then and can be addressed with whatever bug fix is there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #27338 to address this
looks good, small comment, merge master and ping on green. |
Can we address my comment on |
can you rebase |
c3919fe
to
5a6de11
Compare
5a6de11
to
b76f27e
Compare
Rebased. |
lgtm. over to you @WillAyd |
Great thanks a lot @topper-123 |
Adds some typing to
Categorical
.This PR does the simpler typing of
Categorical
. In addition there are some more thorny typing that I've stayed away from for now to see if others can implement the basic types (list-likes etc.).