
Using pandas.DataFrame.loc with a tuple[str, str] multi-index seems to elicit mypy errors #466


Closed
goxberry opened this issue Dec 10, 2022 · 1 comment · Fixed by #494

@goxberry

Describe the bug

Using pandas.DataFrame.loc with a multi-index -- expressed as a tuple[str, str], per the syntax described in https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-indexing-with-hierarchical-index -- seems to elicit mypy errors.

To Reproduce

import itertools

import pandas


def main() -> None:
    rows = pandas.Index(["project A", "project B", "project C"])
    years: tuple[str, ...] = ("Year 1", "Year 2", "Year 3")
    quarters: tuple[str, ...] = ("Q1", "Q2", "Q3", "Q4")
    index_tuples: list[tuple[str, ...]] = list(itertools.product(years, quarters))
    cols = pandas.MultiIndex.from_tuples(index_tuples)
    budget = pandas.DataFrame(index=rows, columns=cols)
    multi_index: tuple[str, str] = ("Year 1", "Q1")
    budget.loc["project A", multi_index] = 4700


if __name__ == "__main__":
    main()

Running mypy version 0.991 on that source code yields the following error:

Invalid index type "Tuple[str, Tuple[str, str]]" for "_LocIndexerFrame"; expected type "Union[Series[bool], ndarray[Any, dtype[bool_]], List[bool], str, Tuple[Union[Index, Union[Series[bool], ndarray[Any, dtype[bool_]], List[bool]], Union[str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta], List[Any], slice], ...], List[<nothing>]]"  [index]

Please complete the following information:

  • OS: Linux
  • OS version: Ubuntu 22.04.1
  • Python version: 3.9.9
  • mypy 0.991
  • pandas-stubs 1.5.2.221124

Additional context

I tried changing

list[HashableT] | slice | Series[bool] | Callable,
by appending an ellipsis, but it had no effect. I suspect part of the reason I saw no effect is that I neglected to modify the overloads of _LocIndexerFrame.__getitem__ as well, so I may try that, and if it works, I'll submit a PR.

@Dr-Irv
Collaborator

Dr-Irv commented Dec 10, 2022

Thanks for the report. There are two changes to make here: one to support __getitem__(), the other to support __setitem__():
In

idx: tuple[int | StrLike | tuple[ScalarT, ...], int | StrLike],

add tuple[ScalarT, ...] to the second part of the main tuple in the union.

In

Union[Index, MaskType, Scalar, list[ScalarT], slice], ...

add tuple[ScalarT, ...] after slice in the Union

Then create your test, testing both x = budget.loc["project A", multi_index] and the line you showed.
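A minimal sketch of what such a test might look like, reusing the reproduction from the report. The names build_budget and test_loc_tuple_multiindex are hypothetical, and the project's own test helpers are not used here; this only exercises both access paths at runtime so that a type checker sees a tuple[str, str] column key passed to both __setitem__ and __getitem__.

```python
import itertools

import pandas as pd


def build_budget() -> pd.DataFrame:
    # Hypothetical helper: same frame as in the report, with
    # (year, quarter) tuples as MultiIndex columns.
    rows = pd.Index(["project A", "project B", "project C"])
    years = ("Year 1", "Year 2", "Year 3")
    quarters = ("Q1", "Q2", "Q3", "Q4")
    cols = pd.MultiIndex.from_tuples(list(itertools.product(years, quarters)))
    return pd.DataFrame(index=rows, columns=cols)


def test_loc_tuple_multiindex() -> None:
    # Hypothetical test name; exercises both code paths mentioned above.
    budget = build_budget()
    multi_index: tuple[str, str] = ("Year 1", "Q1")

    # __setitem__ path: the line from the original report.
    budget.loc["project A", multi_index] = 4700

    # __getitem__ path: read the same cell back.
    x = budget.loc["project A", multi_index]
    assert x == 4700
```

Running mypy over a file like this, in addition to executing it, is what would show whether the stub change removes the [index] error.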
