Skip to content

26302 add typing to assert star equal funcs #29364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

samuelsinayoko
Copy link
Contributor

@samuelsinayoko samuelsinayoko commented Nov 2, 2019

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Nov 2, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks for taking a stab at these

check_dtype: bool = True,
err_msg: Optional[str] = None,
check_same: Optional[str] = None,
obj: str = "numpy array",
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could annotate return of None here as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

elif is_interval_dtype(left) or is_interval_dtype(right):
assert_interval_array_equal(left.array, right.array)
elif is_interval_dtype(left) or is_interval_dtype(left):
# must cast to interval dtype to keep mypy happy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the complaint on this? This changes the actual assertions being done I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't cast to IntervalArray I get these errors

pandas/util/testing.py:1211: error: Argument 1 to "assert_interval_array_equal" has incompatible type "ExtensionArray"; expected "IntervalArray"
pandas/util/testing.py:1211: error: Argument 2 to "assert_interval_array_equal" has incompatible type "ExtensionArray"; expected "IntervalArray"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the asserts though, they weren't needed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @jreback should use cast from the typing module here - don't want to actually construct a new object via a call to IntervalArray

@@ -1415,27 +1440,36 @@ def assert_equal(left, right, **kwargs):
"""
__tracebackhide__ = True

if isinstance(left, pd.Index):
if isinstance(left, Index):
assert isinstance(right, Index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the message here? These changes also impact the tests so don't think we want these

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we don't want to actually change the semantics here, remove these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I comment out assert isinstance(right, Index) then mypy complains about assert_index_equal(left, right, **kwargs):

pandas/util/testing.py:1443: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Index]"; expected "Index"
pandas/util/testing.py:1443: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Any]"; expected "Index"

But assuming left is an index, assert_index_equal(left, right) should raise if right isn't an index, so I don't think adding an assert isinstance(right, index) changes the semantics...? We will just catch the exception earlier based on the type.
I got the idea from @simonjayhawkins but I might have misunderstood.
Happy to do something different if there's a better way of handling this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I've just pushed another attempt (0c2f692) that uses the _check_isinstance(left, right, Index) command, which is always called by assert_index_equal. There's no change in semantics as far as I can tell. If that line doesn't raise then it's safe to case right to Index which keeps mypy happy.
If this is OK I can try and do something similar for the other types in assert_equal. Please let me know what you think.

@@ -1415,27 +1440,36 @@ def assert_equal(left, right, **kwargs):
"""
__tracebackhide__ = True

if isinstance(left, pd.Index):
if isinstance(left, Index):
assert isinstance(right, Index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we don't want to actually change the semantics here, remove these

# must cast to interval dtype to keep mypy happy
assert is_interval_dtype(right)
assert is_interval_dtype(left)
left_array = IntervalArray(left.array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we don't want to do this (cast is ok), but don't actually coerce with a constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha got it, wasn't aware of typing.cast. Have updated the code.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - a few more comments

elif is_interval_dtype(left) or is_interval_dtype(right):
assert_interval_array_equal(left.array, right.array)
elif is_interval_dtype(left) or is_interval_dtype(left):
# must cast to interval dtype to keep mypy happy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @jreback should use cast from the typing module here - don't want to actually construct a new object via a call to IntervalArray

@samuelsinayoko
Copy link
Contributor Author

OK this is ready for another review. I believe I've addressed all your comments.

@WillAyd
Copy link
Member

WillAyd commented Nov 8, 2019

Looks like CI is failing - can you take a look?

@samuelsinayoko
Copy link
Contributor Author

CI is passing now.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few minor things around unnecessary casts but otherwise good to go

assert_index_equal(
left.left, right.left, exact=exact, obj="{obj}.left".format(obj=obj)
)
assert_index_equal(
left.right, right.right, exact=exact, obj="{obj}.left".format(obj=obj)
)
left = cast(IntervalArray, left)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these casts required? Seems like they only subsequently get sent to assert_attr_equal which is untyped, so not sure what the failure would be (may also be overlooking)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these casts are not required here.

$ mypy pandas --warn-redundant-casts
pandas\util\testing.py:886: error: Redundant cast to "IntervalArray"
pandas\util\testing.py:887: error: Redundant cast to "IntervalArray"

elif is_interval_dtype(left) or is_interval_dtype(right):
assert_interval_array_equal(left.array, right.array)
elif is_interval_dtype(left) or is_interval_dtype(left):
# must cast to interval dtype to keep mypy happy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can delete this comment

@@ -1415,27 +1438,36 @@ def assert_equal(left, right, **kwargs):
"""
__tracebackhide__ = True

if isinstance(left, pd.Index):
if isinstance(left, Index):
right = cast(Index, right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need any of the casts here; isinstance will narrow the types appropriately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I remove the cast I get the following error.

pandas/util/testing.py:1440: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Index]"; expected "Index"
pandas/util/testing.py:1440: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Any]"; expected "Index"

@jreback jreback added this to the 1.0 milestone Nov 13, 2019
@jreback
Copy link
Contributor

jreback commented Nov 13, 2019

lgtm. @simonjayhawkins @WillAyd

any way we can get around having to cast in lots of places?

@simonjayhawkins
Copy link
Member

any way we can get around having to cast in lots of places?

we need a cast nearly every time a is_* method is called.

mypy can't perform type narrowing in these cases. would need to use an isinstance check instead to avoid casts (for nominal subtypes).

maybe Protocol could be used for structural subtypes see https://mypy.readthedocs.io/en/latest/protocols.html#simple-user-defined-protocols or wait til py3.8 is our minimum version?

@@ -53,6 +53,7 @@
Series,
bdate_range,
)
from pandas._typing import AnyArrayLike
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnyArrayLike resolves to Any.

If it adds value in the form of code documentation then OK, but mypy is effectively not checking these annotations.

assert_index_equal(
left.left, right.left, exact=exact, obj="{obj}.left".format(obj=obj)
)
assert_index_equal(
left.right, right.right, exact=exact, obj="{obj}.left".format(obj=obj)
)
left = cast(IntervalArray, left)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these casts are not required here.

$ mypy pandas --warn-redundant-casts
pandas\util\testing.py:886: error: Redundant cast to "IntervalArray"
pandas\util\testing.py:887: error: Redundant cast to "IntervalArray"

@@ -1185,8 +1201,11 @@ def assert_series_equal(
right._internal_get_values(),
check_dtype=check_dtype,
)
elif is_interval_dtype(left) or is_interval_dtype(right):
assert_interval_array_equal(left.array, right.array)
elif is_interval_dtype(left) or is_interval_dtype(left):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now just elif is_interval_dtype(left) or should the second condition not have been changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I think change was my mistake. Have removed it.

assert_index_equal(left, right, **kwargs)
elif isinstance(left, pd.Series):
elif isinstance(left, Series):
right = cast(Series, right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these casts shouldn't be added here. casts are for when we know better than mypy. in these cases the second argument to assert_series_equal etc may be anything. since the is_instance checks are performed in assert_series_equal etc.

it may be better to type all the the assert functions as

def assert_index_equal(
    left: Any,
    right: Any,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions do expect certain types (as described in the docstring), so I would think we'd want to be more explicit here, even if we need to add a few casts. These casts aren't happening at runtime anyway, right? https://docs.python.org/3/library/typing.html#typing.cast

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thing the docstrings are wrong. The correct behaviour is that the assertion helpers take any two objects and raise an AssertionError if the objects are of the wrong type and not equal.

If the objects passed were expected to be of a certain type I would expect to get a TypeError.

>>> import pandas as pd
>>> import pandas.util.testing as tm
>>> pd.__version__
'0.26.0.dev0+984.g5ba71f883.dirty'
>>>
>>> tm.assert_frame_equal(pd.DataFrame(), pd.Series())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\util\testing.py", line 1330, in assert_frame_equal
    _check_isinstance(left, right, DataFrame)
  File "C:\Users\simon\pandas\pandas\util\testing.py", line 392, in _check_isinstance
    err_msg.format(name=cls_name, exp_type=cls, act_type=type(right))
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'pandas.core.series.Series'> instead
>>>
>>> tm.assert_frame_equal(pd.DataFrame(), 'foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\util\testing.py", line 1330, in assert_frame_equal
    _check_isinstance(left, right, DataFrame)
  File "C:\Users\simon\pandas\pandas\util\testing.py", line 392, in _check_isinstance
    err_msg.format(name=cls_name, exp_type=cls, act_type=type(right))
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'str'> instead
>>>
>>> tm.assert_frame_equal(pd.Series(), pd.Series())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\util\testing.py", line 1330, in assert_frame_equal
    _check_isinstance(left, right, DataFrame)
  File "C:\Users\simon\pandas\pandas\util\testing.py", line 388, in _check_isinstance
    err_msg.format(name=cls_name, exp_type=cls, act_type=type(left))
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'pandas.core.series.Series'> instead
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if they are not typed as Any then this could trip up users who depend on this instance checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting a change to the implementation here? When I call an assert_* function I would not expect it to raise something other than an assertion error. So I wouldn't expect to get a TypeError. For example if I compare 3 with a bool with unittest.TestCase.assertIsInstance I just get an AssertionError, as I expect.

In [6]: unittest.TestCase.assertIsInstance(unittest.TestCase(), 3, bool)                       
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-4b36bb9f21c2> in <module>
----> 1 unittest.TestCase.assertIsInstance(unittest.TestCase(), 3, bool)

~/apps/miniconda3/envs/pandas-dev/lib/python3.7/unittest/case.py in assertIsInstance(self, obj, cls, msg)
   1274         if not isinstance(obj, cls):
   1275             standardMsg = '%s is not an instance of %r' % (safe_repr(obj), cls)
-> 1276             self.fail(self._formatMessage(msg, standardMsg))
   1277 
   1278     def assertNotIsInstance(self, obj, cls, msg=None):

~/apps/miniconda3/envs/pandas-dev/lib/python3.7/unittest/case.py in fail(self, msg)
    691     def fail(self, msg=None):
    692         """Fail immediately, with the given message."""
--> 693         raise self.failureException(msg)
    694 
    695     def assertFalse(self, expr, msg=None):

AssertionError: 3 is not an instance of <class 'bool'>

Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternate approach here may be to only type the left argument as required and leave right typed to Any. I guess that is exactly what we are doing, since we just do an isinstance(left, ...) check to determine the function to call

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise we could narrow the types of both left and right and fall through to a failure message at the end that they aren’t of the same type, but maybe better inspected as a follow up; not sure we want that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR I think it makes sense to follow the existing implementation as described in the docstrings. If you call assert_index_equal I expect to be comparing two indexes, and if I feed something else to the function I’d expect a failure.
What mypy does is catch that failure at compile time when that is possible. Surely this is a good thing?
Again my understanding is that the casts aren’t changing the runtime behaviour so the current PR is faithful to the existing implementation.
In terms of semantics it also matches that of similar assertEquals methods in unittest.TestCase.
Shall we postpone deeper changes and refactorings to a follow up PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the following code

def test_raises():
    a = pd.Series([1])
    b = np.array([1])
    with pytest.raises(AssertionError):
        tm.assert_series_equal(a, b)

Is that valid code? IIUC mypy will complain about it, since b isn't a Series.

Copy link
Contributor Author

@samuelsinayoko samuelsinayoko Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well for me that's not valid code, I would expect this to be caught by mypy since we're comparing a series to an array. But my understanding is that @simonjayhawkins thinkgs we should leave mypy out of this and allow any object in these functions.
@TomAugspurger see also my response to @simonjayhawkins: #29364 (comment)

@jreback
Copy link
Contributor

jreback commented Nov 20, 2019

what is the status here? reading comments seems like we should type inputs as Any and then cast?

@simonjayhawkins
Copy link
Member

what is the status here? reading comments seems like we should type inputs as Any and then cast?

if we type input as Any, we wouldn't need to cast (nor would anyone else using stubgen to typecheck their code, since the assert_* functions is part of the public API)

@jreback
Copy link
Contributor

jreback commented Nov 20, 2019

what is the status here? reading comments seems like we should type inputs as Any and then cast?

if we type input as Any, we wouldn't need to cast (nor would anyone else using stubgen to typecheck their code, since the assert_* functions is part of the public API)

ok then is PR what you are expecting? so some errors now would be caught at type checking time rather than at test time, e.g. you can't pass a scalar to a Series typed assert function for example (assume that you can infer the type to a scalar for example)

@simonjayhawkins
Copy link
Member

ok then is PR what you are expecting?

not at present

so some errors now would be caught at type checking time rather than at test time, e.g. you can't pass a scalar to a Series typed assert function for example (assume that you can infer the type to a scalar for example)

if the intent is to pass a Series to assert_series_equal somewher in the codebase, then no this won't be picked up if we change the types to Any.

with regard to potential coding errors in the assert functions themselves;

we tend to use a _check_isinstance helper in the assert functions, so could probably add a cast in those functions after the _check_isinstance call to perform type narrowing and allow stricter static type checking of the bodies of the assert functions. This could be a follow-on though.

@samuelsinayoko
Copy link
Contributor Author

Just to be clear, @simonjayhawkins, you want all the assert_*_equal functions to have type left: Any, right: Any?

@simonjayhawkins
Copy link
Member

Just to be clear, @simonjayhawkins, you want all the assert_*_equal functions to have type left: Any, right: Any?

yes. (or object or left untyped)

@WillAyd
Copy link
Member

WillAyd commented Dec 17, 2019

@samuelsinayoko can you fix merge conflict and incorporate feedback from @simonjayhawkins ?

@jreback
Copy link
Contributor

jreback commented Dec 27, 2019

can you merge master and will look again

@samuelsinayoko
Copy link
Contributor Author

samuelsinayoko commented Dec 27, 2019 via email

@jreback jreback removed this from the 1.0 milestone Jan 1, 2020
@datapythonista
Copy link
Member

@samuelsinayoko do you have time to merge master, and check last comments in the discussion?

@samuelsinayoko
Copy link
Contributor Author

samuelsinayoko commented Jan 6, 2020 via email

@pep8speaks
Copy link

Hello @samuelsinayoko! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 266:89: E501 line too long (90 > 88 characters)
Line 276:89: E501 line too long (90 > 88 characters)
Line 862:89: E501 line too long (90 > 88 characters)

@samuelsinayoko
Copy link
Contributor Author

Just to be clear, @simonjayhawkins, you want all the assert_*_equal functions to have type left: Any, right: Any?

yes. (or object or left untyped)

I think we should just close this PR then. As you say if you want the types to be Any, or object, or just left untyped then we don't need this PR.

Personally I think we should be more strict rather than less, especially in tests, so obvious bugs are caught at compile time by mypy rather than propagate further down the code and cause an exception at runtime. If a function is of the form assert_series_equal(a, b), I wouldn't expect it to pass for anything other than two series. These two objects are never going to be equal if they're not both series.

@simonjayhawkins
Copy link
Member

As a side note: typing the left and right parameters of the assert* methods will cause many false positives as the return types of many methods are unions (or optional) and cannot be typed more exact without utilizing Literal, which is in py3.8. we are still supporting 3.6.

import numpy as np

import pandas as pd
import pandas.testing as tm

# optional return type

df = pd.DataFrame([[np.nan, 2, np.nan, 0]])
result = df.fillna("foo")
expected = pd.DataFrame([["foo", 2, "foo", 0]])
tm.assert_frame_equal(result, expected)

# union return type

ser = pd.Series([1, 2, 3])
result2 = pd.concat([ser, ser], axis=1)
expected2 = pd.DataFrame([[1, 1], [2, 2], [3, 3]])
tm.assert_frame_equal(result2, expected2)

test.py:11: error: Argument 1 to "assert_frame_equal" has incompatible type "Optional[DataFrame]"; expected "DataFrame"
test.py:18: error: Argument 1 to "assert_frame_equal" has incompatible type "Union[DataFrame, Series]"; expected "DataFrame"

@WillAyd
Copy link
Member

WillAyd commented Jan 14, 2020

Can we not just make the left argument something like Optional[NDFrame] (replace where applicable) and the right argument object?

This one has been open for a while. This isn't user facing code so I don't think need to split hairs

@WillAyd
Copy link
Member

WillAyd commented Feb 12, 2020

@samuelsinayoko is this still active? What do you think about previous suggestion?

@WillAyd
Copy link
Member

WillAyd commented Mar 11, 2020

Closing as stale but ping if you'd like to pick back up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Annotations to assert_*_equal functions
7 participants