-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
26302 add typing to assert star equal funcs #29364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
26302 add typing to assert star equal funcs #29364
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks for taking a stab at these
pandas/util/testing.py
Outdated
check_dtype: bool = True, | ||
err_msg: Optional[str] = None, | ||
check_same: Optional[str] = None, | ||
obj: str = "numpy array", | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could annotate return of None here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/util/testing.py
Outdated
elif is_interval_dtype(left) or is_interval_dtype(right): | ||
assert_interval_array_equal(left.array, right.array) | ||
elif is_interval_dtype(left) or is_interval_dtype(left): | ||
# must cast to interval dtype to keep mypy happy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the complaint on this? This changes the actual assertions being done I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I don't cast to IntervalArray I get these errors
pandas/util/testing.py:1211: error: Argument 1 to "assert_interval_array_equal" has incompatible type "ExtensionArray"; expected "IntervalArray"
pandas/util/testing.py:1211: error: Argument 2 to "assert_interval_array_equal" has incompatible type "ExtensionArray"; expected "IntervalArray"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the asserts though, they weren't needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned by @jreback should use cast
from the typing module here - don't want to actually construct a new object via a call to IntervalArray
pandas/util/testing.py
Outdated
@@ -1415,27 +1440,36 @@ def assert_equal(left, right, **kwargs): | |||
""" | |||
__tracebackhide__ = True | |||
|
|||
if isinstance(left, pd.Index): | |||
if isinstance(left, Index): | |||
assert isinstance(right, Index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the message here? These changes also impact the tests so don't think we want these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we don't want to actually change the semantics here, remove these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I comment out assert isinstance(right, Index)
then mypy complains about assert_index_equal(left, right, **kwargs)
:
pandas/util/testing.py:1443: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Index]"; expected "Index"
pandas/util/testing.py:1443: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Any]"; expected "Index"
But assuming left is an index, assert_index_equal(left, right) should raise if right isn't an index, so I don't think adding an assert isinstance(right, index)
changes the semantics...? We will just catch the exception earlier based on the type.
I got the idea from @simonjayhawkins but I might have misunderstood.
Happy to do something different if there's a better way of handling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I've just pushed another attempt (0c2f692) that uses the _check_isinstance(left, right, Index)
command, which is always called by assert_index_equal. There's no change in semantics as far as I can tell. If that line doesn't raise then it's safe to case right to Index which keeps mypy happy.
If this is OK I can try and do something similar for the other types in assert_equal. Please let me know what you think.
pandas/util/testing.py
Outdated
@@ -1415,27 +1440,36 @@ def assert_equal(left, right, **kwargs): | |||
""" | |||
__tracebackhide__ = True | |||
|
|||
if isinstance(left, pd.Index): | |||
if isinstance(left, Index): | |||
assert isinstance(right, Index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we don't want to actually change the semantics here, remove these
pandas/util/testing.py
Outdated
# must cast to interval dtype to keep mypy happy | ||
assert is_interval_dtype(right) | ||
assert is_interval_dtype(left) | ||
left_array = IntervalArray(left.array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we don't want to do this (cast is ok), but don't actually coerce with a constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha got it, wasn't aware of typing.cast. Have updated the code.
…com:samuelsinayoko/pandas into 26302-add-typing-to-assert-star-equal-funcs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good - a few more comments
pandas/util/testing.py
Outdated
elif is_interval_dtype(left) or is_interval_dtype(right): | ||
assert_interval_array_equal(left.array, right.array) | ||
elif is_interval_dtype(left) or is_interval_dtype(left): | ||
# must cast to interval dtype to keep mypy happy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned by @jreback should use cast
from the typing module here - don't want to actually construct a new object via a call to IntervalArray
OK this is ready for another review. I believe I've addressed all your comments. |
Looks like CI is failing - can you take a look? |
CI is passing now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A few minor things around unnecessary casts but otherwise good to go
pandas/util/testing.py
Outdated
assert_index_equal( | ||
left.left, right.left, exact=exact, obj="{obj}.left".format(obj=obj) | ||
) | ||
assert_index_equal( | ||
left.right, right.right, exact=exact, obj="{obj}.left".format(obj=obj) | ||
) | ||
left = cast(IntervalArray, left) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these casts required? Seems like they only subsequently get sent to assert_attr_equal
which is untyped, so not sure what the failure would be (may also be overlooking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these casts are not required here.
$ mypy pandas --warn-redundant-casts
pandas\util\testing.py:886: error: Redundant cast to "IntervalArray"
pandas\util\testing.py:887: error: Redundant cast to "IntervalArray"
pandas/util/testing.py
Outdated
elif is_interval_dtype(left) or is_interval_dtype(right): | ||
assert_interval_array_equal(left.array, right.array) | ||
elif is_interval_dtype(left) or is_interval_dtype(left): | ||
# must cast to interval dtype to keep mypy happy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete this comment
pandas/util/testing.py
Outdated
@@ -1415,27 +1438,36 @@ def assert_equal(left, right, **kwargs): | |||
""" | |||
__tracebackhide__ = True | |||
|
|||
if isinstance(left, pd.Index): | |||
if isinstance(left, Index): | |||
right = cast(Index, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need any of the casts here; isinstance
will narrow the types appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I remove the cast I get the following error.
pandas/util/testing.py:1440: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Index]"; expected "Index"
pandas/util/testing.py:1440: error: Argument 2 to "assert_index_equal" has incompatible type "Union[DataFrame, Any]"; expected "Index"
lgtm. @simonjayhawkins @WillAyd any way we can get around having to cast in lots of places? |
we need a cast nearly every time a is_* method is called. mypy can't perform type narrowing in these cases. would need to use an isinstance check instead to avoid casts (for nominal subtypes). maybe Protocol could be used for structural subtypes see https://mypy.readthedocs.io/en/latest/protocols.html#simple-user-defined-protocols or wait til py3.8 is our minimum version? |
pandas/util/testing.py
Outdated
@@ -53,6 +53,7 @@ | |||
Series, | |||
bdate_range, | |||
) | |||
from pandas._typing import AnyArrayLike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnyArrayLike resolves to Any.
If it adds value in the form of code documentation then OK, but mypy is effectively not checking these annotations.
pandas/util/testing.py
Outdated
assert_index_equal( | ||
left.left, right.left, exact=exact, obj="{obj}.left".format(obj=obj) | ||
) | ||
assert_index_equal( | ||
left.right, right.right, exact=exact, obj="{obj}.left".format(obj=obj) | ||
) | ||
left = cast(IntervalArray, left) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these casts are not required here.
$ mypy pandas --warn-redundant-casts
pandas\util\testing.py:886: error: Redundant cast to "IntervalArray"
pandas\util\testing.py:887: error: Redundant cast to "IntervalArray"
pandas/util/testing.py
Outdated
@@ -1185,8 +1201,11 @@ def assert_series_equal( | |||
right._internal_get_values(), | |||
check_dtype=check_dtype, | |||
) | |||
elif is_interval_dtype(left) or is_interval_dtype(right): | |||
assert_interval_array_equal(left.array, right.array) | |||
elif is_interval_dtype(left) or is_interval_dtype(left): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is now just elif is_interval_dtype(left)
or should the second condition not have been changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I think change was my mistake. Have removed it.
pandas/util/testing.py
Outdated
assert_index_equal(left, right, **kwargs) | ||
elif isinstance(left, pd.Series): | ||
elif isinstance(left, Series): | ||
right = cast(Series, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these casts shouldn't be added here. casts are for when we know better than mypy. in these cases the second argument to assert_series_equal etc may be anything. since the is_instance checks are performed in assert_series_equal etc.
it may be better to type all the the assert functions as
def assert_index_equal(
left: Any,
right: Any,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions do expect certain types (as described in the docstring), so I would think we'd want to be more explicit here, even if we need to add a few casts. These casts aren't happening at runtime anyway, right? https://docs.python.org/3/library/typing.html#typing.cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thing the docstrings are wrong. The correct behaviour is that the assertion helpers take any two objects and raise an AssertionError if the objects are of the wrong type and not equal.
If the objects passed were expected to be of a certain type I would expect to get a TypeError.
>>> import pandas as pd
>>> import pandas.util.testing as tm
>>> pd.__version__
'0.26.0.dev0+984.g5ba71f883.dirty'
>>>
>>> tm.assert_frame_equal(pd.DataFrame(), pd.Series())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\simon\pandas\pandas\util\testing.py", line 1330, in assert_frame_equal
_check_isinstance(left, right, DataFrame)
File "C:\Users\simon\pandas\pandas\util\testing.py", line 392, in _check_isinstance
err_msg.format(name=cls_name, exp_type=cls, act_type=type(right))
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'pandas.core.series.Series'> instead
>>>
>>> tm.assert_frame_equal(pd.DataFrame(), 'foo')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\simon\pandas\pandas\util\testing.py", line 1330, in assert_frame_equal
_check_isinstance(left, right, DataFrame)
File "C:\Users\simon\pandas\pandas\util\testing.py", line 392, in _check_isinstance
err_msg.format(name=cls_name, exp_type=cls, act_type=type(right))
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'str'> instead
>>>
>>> tm.assert_frame_equal(pd.Series(), pd.Series())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\simon\pandas\pandas\util\testing.py", line 1330, in assert_frame_equal
_check_isinstance(left, right, DataFrame)
File "C:\Users\simon\pandas\pandas\util\testing.py", line 388, in _check_isinstance
err_msg.format(name=cls_name, exp_type=cls, act_type=type(left))
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'pandas.core.series.Series'> instead
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if they are not typed as Any then this could trip up users who depend on this instance checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting a change to the implementation here? When I call an assert_* function I would not expect it to raise something other than an assertion error. So I wouldn't expect to get a TypeError. For example if I compare 3
with a bool
with unittest.TestCase.assertIsInstance I just get an AssertionError, as I expect.
In [6]: unittest.TestCase.assertIsInstance(unittest.TestCase(), 3, bool)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-4b36bb9f21c2> in <module>
----> 1 unittest.TestCase.assertIsInstance(unittest.TestCase(), 3, bool)
~/apps/miniconda3/envs/pandas-dev/lib/python3.7/unittest/case.py in assertIsInstance(self, obj, cls, msg)
1274 if not isinstance(obj, cls):
1275 standardMsg = '%s is not an instance of %r' % (safe_repr(obj), cls)
-> 1276 self.fail(self._formatMessage(msg, standardMsg))
1277
1278 def assertNotIsInstance(self, obj, cls, msg=None):
~/apps/miniconda3/envs/pandas-dev/lib/python3.7/unittest/case.py in fail(self, msg)
691 def fail(self, msg=None):
692 """Fail immediately, with the given message."""
--> 693 raise self.failureException(msg)
694
695 def assertFalse(self, expr, msg=None):
AssertionError: 3 is not an instance of <class 'bool'>
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternate approach here may be to only type the left
argument as required and leave right
typed to Any
. I guess that is exactly what we are doing, since we just do an isinstance(left, ...)
check to determine the function to call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise we could narrow the types of both left and right and fall through to a failure message at the end that they aren’t of the same type, but maybe better inspected as a follow up; not sure we want that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR I think it makes sense to follow the existing implementation as described in the docstrings. If you call assert_index_equal
I expect to be comparing two indexes, and if I feed something else to the function I’d expect a failure.
What mypy does is catch that failure at compile time when that is possible. Surely this is a good thing?
Again my understanding is that the casts aren’t changing the runtime behaviour so the current PR is faithful to the existing implementation.
In terms of semantics it also matches that of similar assertEquals methods in unittest.TestCase.
Shall we postpone deeper changes and refactorings to a follow up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the following code
def test_raises():
a = pd.Series([1])
b = np.array([1])
with pytest.raises(AssertionError):
tm.assert_series_equal(a, b)
Is that valid code? IIUC mypy will complain about it, since b
isn't a Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well for me that's not valid code, I would expect this to be caught by mypy since we're comparing a series to an array. But my understanding is that @simonjayhawkins thinkgs we should leave mypy out of this and allow any object in these functions.
@TomAugspurger see also my response to @simonjayhawkins: #29364 (comment)
This reverts commit 6d38c1b.
…to-assert-star-equal-funcs
what is the status here? reading comments seems like we should type inputs as Any and then cast? |
if we type input as Any, we wouldn't need to cast (nor would anyone else using stubgen to typecheck their code, since the assert_* functions is part of the public API) |
ok then is PR what you are expecting? so some errors now would be caught at type checking time rather than at test time, e.g. you can't pass a scalar to a Series typed assert function for example (assume that you can infer the type to a scalar for example) |
not at present
if the intent is to pass a Series to assert_series_equal somewher in the codebase, then no this won't be picked up if we change the types to Any. with regard to potential coding errors in the assert functions themselves; we tend to use a |
Just to be clear, @simonjayhawkins, you want all the assert_*_equal functions to have type left: Any, right: Any? |
yes. (or |
@samuelsinayoko can you fix merge conflict and incorporate feedback from @simonjayhawkins ? |
can you merge master and will look again |
Will do in the next couple days.
… On 27 Dec 2019, at 19:52, Jeff Reback ***@***.***> wrote:
can you merge master and will look again
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
@samuelsinayoko do you have time to merge master, and check last comments in the discussion? |
Sorry I haven’t had a chance to review this yet. Will take a look tonight.
… On 6 Jan 2020, at 12:16, Marc Garcia ***@***.***> wrote:
@samuelsinayoko do you have time to merge master, and check last comments in the discussion?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Hello @samuelsinayoko! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
|
I think we should just close this PR then. As you say if you want the types to be Any, or object, or just left untyped then we don't need this PR. Personally I think we should be more strict rather than less, especially in tests, so obvious bugs are caught at compile time by mypy rather than propagate further down the code and cause an exception at runtime. If a function is of the form |
As a side note: typing the left and right parameters of the assert* methods will cause many false positives as the return types of many methods are unions (or optional) and cannot be typed more exact without utilizing Literal, which is in py3.8. we are still supporting 3.6. import numpy as np
import pandas as pd
import pandas.testing as tm
# optional return type
df = pd.DataFrame([[np.nan, 2, np.nan, 0]])
result = df.fillna("foo")
expected = pd.DataFrame([["foo", 2, "foo", 0]])
tm.assert_frame_equal(result, expected)
# union return type
ser = pd.Series([1, 2, 3])
result2 = pd.concat([ser, ser], axis=1)
expected2 = pd.DataFrame([[1, 1], [2, 2], [3, 3]])
tm.assert_frame_equal(result2, expected2) test.py:11: error: Argument 1 to "assert_frame_equal" has incompatible type "Optional[DataFrame]"; expected "DataFrame" |
Can we not just make the left argument something like This one has been open for a while. This isn't user facing code so I don't think need to split hairs |
@samuelsinayoko is this still active? What do you think about previous suggestion? |
Closing as stale but ping if you'd like to pick back up |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff