
Problem accessing .loc with tuple for MultiIndex Series #899

Closed
lionel42 opened this issue Apr 2, 2024 · 3 comments · Fixed by #901
Labels
Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@lionel42
Contributor

lionel42 commented Apr 2, 2024

Describe the bug
When indexing a MultiIndex Series with a list of tuples via .loc, I get an "Invalid index type" error from the type checker.

The example code is from the official pandas user guide:

https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-indexing-with-hierarchical-index

The type seems to be missing from the accepted index types, but it should be included, as this is an officially documented way of indexing a MultiIndex Series.

To Reproduce
Minimal runnable pandas example that is not properly checked by the stubs:

    import pandas as pd

    s = pd.Series(
        [1, 2, 3, 4, 5, 6],
        index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]),
    )

    s.loc[[("A", "c"), ("B", "d")]]

Running mypy on it gives:

 error: Invalid index type "list[tuple[str, str]]" for "_LocIndexerSeries[int]"; expected type "Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | Index[Any] | Sequence[float] | list[str] | slice | tuple[Index[Any] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex | list[Any] | slice | tuple[str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex, ...], ...] | Callable[..., Any]"  [index]

  • OS: Windows
  • OS version: 10
  • Python version: 3.12
  • Type checker version: mypy 1.9.0
  • Installed pandas-stubs version: 2.2.1.240316
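For reference, the lookup itself is valid at runtime; only the stubs reject it. A quick check, assuming pandas 2.x is installed (the exact printed layout can differ between versions, so only the selected values and keys matter here):

```python
import pandas as pd

# Same Series as in the report: a two-level MultiIndex built with from_product
s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]),
)

# A list of full (level 0, level 1) key tuples selects one row per tuple
result = s.loc[[("A", "c"), ("B", "d")]]
print(result)  # rows ("A", "c") -> 1 and ("B", "d") -> 5
```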
@Dr-Irv
Collaborator

Dr-Irv commented Apr 2, 2024

Thanks for the report.

I think that if we add list[_IndexSliceTuple] to:

    idx: (
        MaskType
        | Index
        | Sequence[float]
        | list[str]
        | slice
        | _IndexSliceTuple
        | Callable
    )

that will fix this. A PR with tests is welcome.

Dr-Irv added the Indexing label (Related to indexing on series/frames, not to indexes themselves) Apr 2, 2024
lionel42 added a commit to lionel42/pandas-stubs that referenced this issue Apr 3, 2024
lionel42 mentioned this issue Apr 3, 2024
@lionel42
Contributor Author

lionel42 commented Apr 3, 2024

I added a test case for this issue in the PR.

I tried the solution suggested above, but it still fails with mypy.

It now gives:

tests\test_series.py:186: error: Invalid index type "list[tuple[str, str]]" for "_LocIndexerSeries[int]"; expected type "Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | Index[Any] | Sequence[float] | list[str] | slice | tuple[Index[Any] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex | list[Any] | slice | tuple[str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex, ...], ...] | list[tuple[Index[Any] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex | list[Any] | slice | tuple[str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex, ...], ...]] | Callable[..., Any]"  [index]

instead of the original:

tests\test_series.py:186: error: Invalid index type "list[tuple[str, str]]" for "_LocIndexerSeries[int]"; expected type "Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | Index[Any] | Sequence[float] | list[str] | slice | tuple[Index[Any] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex | list[Any] | slice | tuple[str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex, ...], ...] | Callable[..., Any]"  [index]

I also tried adding an additional overload option:

    # Multiindex case
    @overload
    def __getitem__(
        self,
        idx: list[_IndexSliceTuple],
    ) -> Series[S1]: ...

This version passed mypy (no error raised), but it failed with pyright:

Poe => test

===========================================
Beginning: 'Run mypy on 'tests' (using the local stubs) and on the local stubs'
===========================================

Success: no issues found in 226 source files

===========================================
End: 'Run mypy on 'tests' (using the local stubs) and on the local stubs', runtime: 39.946 seconds.
===========================================


===========================================
Beginning: 'Run pyright on 'tests' (using the local stubs) and on the local stubs'
===========================================


test_series.py:186:23 - error: No overloads for "__getitem__" match the provided arguments (reportCallIssue)
test_series.py:186:23 - error: Argument of type "list[tuple[Literal['A'], Literal['c']] | tuple[Literal['B'], Literal['d']]]" cannot be assigned to parameter "idx" of type "list[_IndexSliceTuple[Unknown]]" in function "__getitem__"
    "list[tuple[Literal['A'], Literal['c']] | tuple[Literal['B'], Literal['d']]]" is incompatible with "list[_IndexSliceTuple[Unknown]]"
      Type parameter "_T@list" is invariant, but "tuple[Literal['A'], Literal['c']] | tuple[Literal['B'], Literal['d']]" is not the same as "_IndexSliceTuple[Unknown]"
      Consider switching from "list" to "Sequence" which is covariant (reportArgumentType)
test_series.py:186:23 - error: "assert_type" mismatch: expected "Series[int]" but received "Unknown" (reportAssertTypeFailure)
3 errors, 0 warnings, 0 informations

===========================================
Step: 'Run pyright on 'tests' (using the local stubs) and on the local stubs' failed!
===========================================

@Dr-Irv
Collaborator

Dr-Irv commented Apr 3, 2024

Instead of the list[_IndexSliceTuple] I suggested here: #899 (comment), use Sequence[_IndexSliceTuple]. I tried it on the example above and it worked.
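Why Sequence works where list did not: list is invariant in its element type, so a list whose inferred element type is tuple[str, str] (or, as pyright infers it, a union of literal tuples) is not a list[_IndexSliceTuple], even though each tuple on its own fits. The read-only Sequence is covariant, so any sequence of compatible tuples is accepted. A minimal sketch, where KeyTuple is a hypothetical alias standing in for the private _IndexSliceTuple and the two helper functions are hypothetical as well:

```python
from collections.abc import Sequence
from typing import Union

# Hypothetical stand-in for pandas-stubs' private _IndexSliceTuple alias
KeyTuple = tuple[Union[str, int], ...]


def count_keys_list(keys: list[KeyTuple]) -> int:
    # list[T] is invariant: a list[tuple[str, str]] argument is rejected here
    return len(keys)


def count_keys_seq(keys: Sequence[KeyTuple]) -> int:
    # Sequence[T] is covariant: a sequence of any KeyTuple subtype is accepted
    return len(keys)


pairs: list[tuple[str, str]] = [("A", "c"), ("B", "d")]

print(count_keys_seq(pairs))  # type-checks with both mypy and pyright
# count_keys_list(pairs)      # type checkers flag this call (invariance)
```

Both calls would run at runtime; the difference only shows up under a static type checker.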

lionel42 added a commit to lionel42/pandas-stubs that referenced this issue Apr 4, 2024
Dr-Irv pushed a commit that referenced this issue Apr 4, 2024
* adding test case for #899

* formatted black

* fix for #899