Skip to content

TYP: enable strict_equality to prohibit comparisons of non-overlappin… #30619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 4, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 1 addition & 13 deletions pandas/core/arrays/boolean.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ class BooleanDtype(ExtensionDtype):
>>> pd.BooleanDtype()
BooleanDtype
"""
name = "boolean"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed for consistency with IntervalDType and CatergoricalDType. ExtensionDtype subclasses with class attribute for name do not need to override construct_from_string

on master:

>>> import pandas as pd; pd.__version__
'0.26.0.dev0+1554.g0913ed04d'
>>>
>>> pd.core.dtypes.dtypes.CategoricalDtype.name
'category'
>>>
>>> pd.core.dtypes.dtypes.IntervalDtype.name
'interval'
>>>
>>> pd.core.arrays.boolean.BooleanDtype.name
<property object at 0x0000026B6A278E58>
>>>
>>> pd.core.arrays.string_.StringDtype.name
<property object at 0x0000026B6A550908>
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do this the other way? I think property is a little nicer than class attributes

@TomAugspurger

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct_from_string is a class method, so I think it makes more sense to make name a class attribute.

(should probably consider making PeriodDType.name and DatetimeTZDtype.name class attributes for consistency)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only downside is non-dynamic names? I don't think we can make PeriodDtype.name a class attribute, since it depends on the freq.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the static ones this seems fine


@property
def na_value(self) -> "Scalar":
Expand All @@ -79,19 +80,6 @@ def type(self) -> Type:
def kind(self) -> str:
return "b"

@property
def name(self) -> str:
"""
The alias for BooleanDtype is ``'boolean'``.
"""
return "boolean"

@classmethod
def construct_from_string(cls, string: str) -> ExtensionDtype:
if string == "boolean":
return cls()
return super().construct_from_string(string)

@classmethod
def construct_array_type(cls) -> "Type[BooleanArray]":
return BooleanArray
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/arrays/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -320,7 +320,7 @@ def _check_compatible_with(self, other):
def dtype(self):
return self._dtype

# read-only property overwriting read/write
# error: Read-only property cannot override read-write property [misc]
@property # type: ignore
def freq(self):
"""
Expand Down Expand Up @@ -638,7 +638,7 @@ def _sub_period(self, other):
return new_data

def _addsub_int_array(
self, other: np.ndarray, op: Callable[[Any], Any],
self, other: np.ndarray, op: Callable[[Any, Any], Any],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binop needs two arguments

) -> "PeriodArray":
"""
Add or subtract array of integers; equivalent to applying
Expand Down
14 changes: 1 addition & 13 deletions pandas/core/arrays/string_.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ class StringDtype(ExtensionDtype):
>>> pd.StringDtype()
StringDtype
"""
name = "string"

#: StringDtype.na_value uses pandas.NA
na_value = libmissing.NA
Expand All @@ -54,19 +55,6 @@ class StringDtype(ExtensionDtype):
def type(self) -> Type:
return str

@property
def name(self) -> str:
"""
The alias for StringDtype is ``'string'``.
"""
return "string"

@classmethod
def construct_from_string(cls, string: str) -> ExtensionDtype:
if string == "string":
return cls()
return super().construct_from_string(string)

@classmethod
def construct_array_type(cls) -> "Type[StringArray]":
return StringArray
Expand Down
4 changes: 4 additions & 0 deletions pandas/core/dtypes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,10 @@ def construct_from_string(cls, string: str):
"""
if not isinstance(string, str):
raise TypeError(f"Expects a string, got {type(string).__name__}")

# error: Non-overlapping equality check (left operand type: "str", right
# operand type: "Callable[[ExtensionDtype], str]") [comparison-overlap]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this use the self._ensure_type ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think that's applicable here.

the assert is to avoid an ignore on the following line (the comparison).

the comparison would still fail strict_equality even if all subclasses override this method/override name attribute. This is because the type checker is checking this method using the name attribute in the base class. i.e. checking the base class isolation.

I think this is related to "Python language's lack of notation for abstract class attributes" see python/mypy#7760

so without the assert an ignore would be required here.

However, the assert prevents creating subclasses that call this method when not using a class attribute for name.

This could instead be enforced by mypy instead of the assert and i believe the implementation of name in the base class should be...

diff --git a/pandas/core/dtypes/base.py b/pandas/core/dtypes/base.py
index 1b4e7062b..5235aac91 100644
--- a/pandas/core/dtypes/base.py
+++ b/pandas/core/dtypes/base.py
@@ -1,6 +1,6 @@
 """Extend pandas with custom array types"""
 from typing import Any, List, Optional, Tuple, Type
-
+import abc
 import numpy as np
 
 from pandas.errors import AbstractMethodError
@@ -8,7 +8,7 @@ from pandas.errors import AbstractMethodError
 from pandas.core.dtypes.generic import ABCDataFrame, ABCIndexClass, ABCSeries
 
 
-class ExtensionDtype:
+class ExtensionDtype(metaclass=abc.ABCMeta):
     """
     A custom data type, to be paired with an ExtensionArray.
 
@@ -162,13 +162,15 @@ class ExtensionDtype:
         return "O"
 
     @property
-    def name(self) -> str:
+    @classmethod
+    @abc.abstractmethod
+    def name(cls) -> str:
         """
         A string identifying the data type.
 
         Will be used for display in, e.g. ``Series.dtype``
         """
-        raise AbstractMethodError(self)
+        pass
 
     @property
     def names(self) -> Optional[List[str]]:

However, mypy would give the following errors...

pandas\core\dtypes\base.py:241: error: Non-overlapping equality check (left operand type: "str", right operand type: "Callable[[], str]")
pandas\core\dtypes\dtypes.py:754: error: Signature of "name" incompatible with supertype "ExtensionDtype"
pandas\core\dtypes\dtypes.py:895: error: Signature of "name" incompatible with supertype "ExtensionDtype"
pandas\tests\extension\test_common.py:44: error: Cannot instantiate abstract class 'DummyDtype' with abstract attribute 'name'

the first is due to python/mypy#7760, the next two are because PeriodDType.name and DatetimeTZDtype.name are not implemented using class attributes

assert isinstance(cls.name, str), (cls, type(cls.name))
if string != cls.name:
raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")
return cls()
Expand Down
2 changes: 1 addition & 1 deletion pandas/io/excel/_odfreader.py
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,7 @@ def _get_cell_value(self, cell, convert_float: bool) -> Scalar:
# GH5394
cell_value = float(cell.attributes.get((OFFICENS, "value")))

if cell_value == 0.0 and str(cell) != cell_value: # NA handling
if cell_value == 0.0: # NA handling
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove code always True

return str(cell)

if convert_float:
Expand Down
8 changes: 4 additions & 4 deletions pandas/io/formats/format.py
Original file line number Diff line number Diff line change
Expand Up @@ -231,7 +231,7 @@ def __init__(
self,
series: "Series",
buf: Optional[IO[str]] = None,
length: bool = True,
length: Union[bool, str] = True,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length takes 'truncate'. used by options but not part of public api

header: bool = True,
index: bool = True,
na_rep: str = "NaN",
Expand Down Expand Up @@ -450,7 +450,7 @@ def _get_adjustment() -> TextAdjustment:

class TableFormatter:

show_dimensions: bool
show_dimensions: Union[bool, str]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show_dimensions can also accept "truncate"

is_truncated: bool
formatters: formatters_type
columns: Index
Expand Down Expand Up @@ -554,7 +554,7 @@ def __init__(
max_rows: Optional[int] = None,
min_rows: Optional[int] = None,
max_cols: Optional[int] = None,
show_dimensions: bool = False,
show_dimensions: Union[bool, str] = False,
decimal: str = ".",
table_id: Optional[str] = None,
render_links: bool = False,
Expand Down Expand Up @@ -1276,7 +1276,7 @@ class FloatArrayFormatter(GenericArrayFormatter):
"""

def __init__(self, *args, **kwargs):
GenericArrayFormatter.__init__(self, *args, **kwargs)
super().__init__(*args, **kwargs)

# float_format is expected to be a string
# formatter should be used to pass a function
Expand Down
1 change: 1 addition & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,7 @@ skip = pandas/__init__.py,pandas/core/api.py
ignore_missing_imports=True
no_implicit_optional=True
check_untyped_defs=True
strict_equality=True

[mypy-pandas.tests.*]
check_untyped_defs=False
Expand Down