Skip to content

TYP: enable strict_equality to prohibit comparisons of non-overlappin… #30619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 4, 2020

Conversation

simonjayhawkins
Copy link
Member

…g types

By default, mypy allows always-false comparisons like 42 == 'no'. Use this flag to prohibit such comparisons of non-overlapping types, and similar identity and container checks

pandas\core\dtypes\base.py:239: error: Non-overlapping equality check (left operand type: "str", right operand type: "Callable[[ExtensionDtype], str]")      
pandas\core\arrays\period.py:657: error: Non-overlapping container check (element type: "Callable[[Any], Any]", container item type: "Callable[[Any, Any], Any]")
pandas\core\arrays\period.py:658: error: Non-overlapping identity check (left operand type: "Callable[[Any], Any]", right operand type: "Callable[[Any, Any], Any]")
pandas\io\formats\format.py:307: error: Non-overlapping equality check (left operand type: "bool", right operand type: "Literal['truncate']")
pandas\io\formats\format.py:461: error: Non-overlapping equality check (left operand type: "bool", right operand type: "Literal['truncate']")
pandas\io\excel\_odfreader.py:159: error: Non-overlapping equality check (left operand type: "str", right operand type: "float")

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jan 2, 2020
@@ -59,6 +59,7 @@ class BooleanDtype(ExtensionDtype):
>>> pd.BooleanDtype()
BooleanDtype
"""
name = "boolean"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed for consistency with IntervalDType and CatergoricalDType. ExtensionDtype subclasses with class attribute for name do not need to override construct_from_string

on master:

>>> import pandas as pd; pd.__version__
'0.26.0.dev0+1554.g0913ed04d'
>>>
>>> pd.core.dtypes.dtypes.CategoricalDtype.name
'category'
>>>
>>> pd.core.dtypes.dtypes.IntervalDtype.name
'interval'
>>>
>>> pd.core.arrays.boolean.BooleanDtype.name
<property object at 0x0000026B6A278E58>
>>>
>>> pd.core.arrays.string_.StringDtype.name
<property object at 0x0000026B6A550908>
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do this the other way? I think property is a little nicer than class attributes

@TomAugspurger

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct_from_string is a class method, so I think it makes more sense to make name a class attribute.

(should probably consider making PeriodDType.name and DatetimeTZDtype.name class attributes for consistency)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only downside is non-dynamic names? I don't think we can make PeriodDtype.name a class attribute, since it depends on the freq.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the static ones this seems fine

@@ -638,7 +638,7 @@ def _sub_period(self, other):
return new_data

def _addsub_int_array(
self, other: np.ndarray, op: Callable[[Any], Any],
self, other: np.ndarray, op: Callable[[Any, Any], Any],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binop needs two arguments

@@ -156,7 +156,7 @@ def _get_cell_value(self, cell, convert_float: bool) -> Scalar:
# GH5394
cell_value = float(cell.attributes.get((OFFICENS, "value")))

if cell_value == 0.0 and str(cell) != cell_value: # NA handling
if cell_value == 0.0: # NA handling
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove code always True

@@ -231,7 +231,7 @@ def __init__(
self,
series: "Series",
buf: Optional[IO[str]] = None,
length: bool = True,
length: Union[bool, str] = True,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length takes 'truncate'. used by options but not part of public api

@@ -450,7 +450,7 @@ def _get_adjustment() -> TextAdjustment:

class TableFormatter:

show_dimensions: bool
show_dimensions: Union[bool, str]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show_dimensions can also accept "truncate"

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool looks good. Some comments

@@ -59,6 +59,7 @@ class BooleanDtype(ExtensionDtype):
>>> pd.BooleanDtype()
BooleanDtype
"""
name = "boolean"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do this the other way? I think property is a little nicer than class attributes

@TomAugspurger

@@ -236,6 +236,10 @@ def construct_from_string(cls, string: str):
"""
if not isinstance(string, str):
raise TypeError(f"Expects a string, got {type(string).__name__}")

# error: Non-overlapping equality check (left operand type: "str", right
# operand type: "Callable[[ExtensionDtype], str]") [comparison-overlap]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this use the self._ensure_type ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think that's applicable here.

the assert is to avoid an ignore on the following line (the comparison).

the comparison would still fail strict_equality even if all subclasses override this method/override name attribute. This is because the type checker is checking this method using the name attribute in the base class. i.e. checking the base class isolation.

I think this is related to "Python language's lack of notation for abstract class attributes" see python/mypy#7760

so without the assert an ignore would be required here.

However, the assert prevents creating subclasses that call this method when not using a class attribute for name.

This could instead be enforced by mypy instead of the assert and i believe the implementation of name in the base class should be...

diff --git a/pandas/core/dtypes/base.py b/pandas/core/dtypes/base.py
index 1b4e7062b..5235aac91 100644
--- a/pandas/core/dtypes/base.py
+++ b/pandas/core/dtypes/base.py
@@ -1,6 +1,6 @@
 """Extend pandas with custom array types"""
 from typing import Any, List, Optional, Tuple, Type
-
+import abc
 import numpy as np
 
 from pandas.errors import AbstractMethodError
@@ -8,7 +8,7 @@ from pandas.errors import AbstractMethodError
 from pandas.core.dtypes.generic import ABCDataFrame, ABCIndexClass, ABCSeries
 
 
-class ExtensionDtype:
+class ExtensionDtype(metaclass=abc.ABCMeta):
     """
     A custom data type, to be paired with an ExtensionArray.
 
@@ -162,13 +162,15 @@ class ExtensionDtype:
         return "O"
 
     @property
-    def name(self) -> str:
+    @classmethod
+    @abc.abstractmethod
+    def name(cls) -> str:
         """
         A string identifying the data type.
 
         Will be used for display in, e.g. ``Series.dtype``
         """
-        raise AbstractMethodError(self)
+        pass
 
     @property
     def names(self) -> Optional[List[str]]:

However, mypy would give the following errors...

pandas\core\dtypes\base.py:241: error: Non-overlapping equality check (left operand type: "str", right operand type: "Callable[[], str]")
pandas\core\dtypes\dtypes.py:754: error: Signature of "name" incompatible with supertype "ExtensionDtype"
pandas\core\dtypes\dtypes.py:895: error: Signature of "name" incompatible with supertype "ExtensionDtype"
pandas\tests\extension\test_common.py:44: error: Cannot instantiate abstract class 'DummyDtype' with abstract attribute 'name'

the first is due to python/mypy#7760, the next two are because PeriodDType.name and DatetimeTZDtype.name are not implemented using class attributes

@jreback jreback added this to the 1.0 milestone Jan 3, 2020
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ex comments of others. generally would be open to stricter settings as you come across them @simonjayhawkins

@jreback jreback merged commit 06c5d24 into pandas-dev:master Jan 4, 2020
@jreback
Copy link
Contributor

jreback commented Jan 4, 2020

thanks @simonjayhawkins

@WillAyd
Copy link
Member

WillAyd commented Jan 4, 2020

Master seems to be failing now with pandas/core/generic.py:464: error: Non-overlapping identity check (left operand type: "str", right operand type: "Type[int]") which I think might be a result of this. Giving it a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants