Skip to content

Add typing annotation to assert_index_equal #26535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 8, 2019
39 changes: 23 additions & 16 deletions pandas/util/testing.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
import string
import tempfile
import traceback
from typing import Union
import warnings
import zipfile

Expand Down Expand Up @@ -515,9 +516,14 @@ def equalContents(arr1, arr2):
return frozenset(arr1) == frozenset(arr2)


def assert_index_equal(left, right, exact='equiv', check_names=True,
check_less_precise=False, check_exact=True,
check_categorical=True, obj='Index'):
def assert_index_equal(left: Index,
right: Index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a raw Index, is it possible to add a typing.TypeVar to _typing.py:

AnyIndex = TypeVar('AnyIndex ', pd.Index, pd.RangeIndex, pd.Int64Index, pd.MultiIndex, + other index types...)

and then use that here (I'm not super aqianted with typing, so could be wrong)?

That would avoid the need to adapt the code below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok as all of these are Index subclasses

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point was that AnyIndex would capture .levels and other attributes that are not set on the base Index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK too. MyPy would still complain about those attributes not existing on Index

exact: Union[bool, str] = 'equiv',
check_names: bool = True,
check_less_precise: Union[bool, int] = False,
check_exact: bool = True,
check_categorical: bool = True,
obj: str = 'Index') -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general comment but NoReturn may be applicable here at some point when we drop 3.5, though not sure if Optional[NoReturn] makes sense

"""Check that left and right Index are equal.

Parameters
Expand Down Expand Up @@ -587,19 +593,20 @@ def _get_ilevel_values(index, level):
raise_assert_detail(obj, msg1, msg2, msg3)

# MultiIndex special comparison for little-friendly error messages
if left.nlevels > 1:
for level in range(left.nlevels):
# cannot use get_level_values here because it can change dtype
llevel = _get_ilevel_values(left, level)
rlevel = _get_ilevel_values(right, level)

lobj = 'MultiIndex level [{level}]'.format(level=level)
assert_index_equal(llevel, rlevel,
exact=exact, check_names=check_names,
check_less_precise=check_less_precise,
check_exact=check_exact, obj=lobj)
# get_level_values may change dtype
_check_types(left.levels[level], right.levels[level], obj=obj)
if isinstance(left, MultiIndex) and isinstance(right, MultiIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is needed to avoid type issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Otherwise, mypy showed
pandas/util/testing.py:608: error: "Index" has no attribute "levels"; maybe "nlevels"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just replace the existing condition then? Seems duplicative to have both checks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that if isinstance(left, MultiIndex) and if left.nlevels > 1 are equivalent so no need to have both (just the former).

Also does this have any impact on assertions given the check previously only looked at the left index and not both?

if left.nlevels > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this line should be removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternately we could keep original code in tact and simply cast inside of this condition; that would prevent any actual code changes but give mypy the inference it needs that we are only working with MI in this branch.

I'd actually prefer that but @jreback may have differing thoughts

Copy link
Contributor

@jreback jreback Jun 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah maybe the cast inside the condition is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon me. I am not familiar with casting in Python. By casting, do you mean left = MuliIndex(left)? This is a construction of a new MuliIndex which result in type error in our case TypeError: Must pass both levels and labels.

Or one more type annotation?

left  # type: MultiIndex
right  # type: MultiIndex
_check_types(left.levels[level], right.levels[level], obj=obj)

mypy still regards left and right as Index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for level in range(left.nlevels):
# cannot use get_level_values here because it can change dtype
llevel = _get_ilevel_values(left, level)
rlevel = _get_ilevel_values(right, level)

lobj = 'MultiIndex level [{level}]'.format(level=level)
assert_index_equal(llevel, rlevel,
exact=exact, check_names=check_names,
check_less_precise=check_less_precise,
check_exact=check_exact, obj=lobj)
# get_level_values may change dtype
_check_types(left.levels[level], right.levels[level], obj=obj)

# skip exact index checking when `check_categorical` is False
if check_exact and check_categorical:
Expand Down