
Add npt.NDArray[np.uint64] to IndexType? #508


Closed
ischurov opened this issue Jan 12, 2023 · 6 comments · Fixed by #510

Comments

@ischurov

ischurov commented Jan 12, 2023

Currently, IndexType contains np_ndarray_int64 (a.k.a. npt.NDArray[np.int64]). Is it possible to add npt.NDArray[np.uint64] to IndexType as well?

To Reproduce

import numpy as np
import numpy.typing as npt
import pandas as pd

df = pd.DataFrame(dict(x=[1, 2, 3]), index=np.array([10, 20, 30], dtype="uint64"))
assert df.index.dtype == np.uint64

def get_NDArray(df: pd.DataFrame, key: npt.NDArray[np.uint64]):
    df2 = df.loc[key]
    reveal_type(df2)
    return df2

  1. Indicate which type checker you are using (mypy or pyright).

Both.

  2. Show the error message received from that type checker while checking your example.

In pyright, the inferred type of df2 is Series[Unknown], which is incorrect (it should be DataFrame).

In mypy, it says

error: Invalid index type "ndarray[Any, dtype[unsignedinteger[_64Bit]]]" 
for "_LocIndexerFrame"; expected type 
"Union[slice, ndarray[Any, dtype[signedinteger[_64Bit]]], Index, List[int], 
Series[int], Series[bool], ndarray[Any, dtype[bool_]], List[bool], 
List[<nothing>], Tuple[Union[slice, ndarray[Any, dtype[signedinteger[_64Bit]]], 
Index, List[int], Series[int], Series[bool], ndarray[Any, dtype[bool_]], 
List[bool], List[<nothing>], Hashable], Union[List[<nothing>], slice, 
Series[bool], Callable[..., Any]]]]"

Please complete the following information:

  • OS: MacOS
  • OS Version: 13.0.1 (22A400)
  • python version: 3.10.8
  • version of type checker: mypy: 0.991, Pylance language server 2023.1.10 (pyright bbf0ae78)
  • version of installed pandas-stubs: 1.5.2.230105 (build pyhd8ed1ab_2)
@Dr-Irv
Collaborator

Dr-Irv commented Jan 12, 2023

The problem here is that df2 = df.loc[key] is ambiguous. The key could be referring to the columns or the rows. The solution in this case is to use df2 = df.loc[key, :] and then the revealed type will be a DataFrame.

So I don't think the issue is that npt.NDArray[np.uint64] should be added to IndexType. The issue is how you use DataFrame.loc[]. As implemented, that is dynamically typed. But we are doing static typing with the stubs, so you have to use pandas in a way that removes the ambiguity.

But maybe there is something else you are trying to do here that we need to address.
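
To illustrate the suggestion above, here is a minimal runtime sketch. It reuses the DataFrame from the original report; the variable names `rows_implicit` and `rows_explicit` are made up for illustration. At runtime both forms select rows by label, and only the second tells a static type checker which axis the key refers to.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: a uint64-labelled index.
df = pd.DataFrame({"x": [1, 2, 3]}, index=np.array([10, 20, 30], dtype="uint64"))
key = np.array([10, 30], dtype="uint64")

# Ambiguous for a type checker: the key could name rows or columns.
rows_implicit = df.loc[key]

# Explicit: the key selects rows, the slice selects all columns.
rows_explicit = df.loc[key, :]

# At runtime the two selections are the same DataFrame.
same_result = rows_implicit.equals(rows_explicit)
```

This only shows runtime behavior; what a type checker infers for each form depends on the installed pandas-stubs and numpy versions.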

@ischurov
Author

@Dr-Irv yeah, I tried df.loc[key, :] as well, and got a similar error with mypy:

error: Invalid index type "Tuple[ndarray[Any, dtype[unsignedinteger[_64Bit]]], slice]" for "_LocIndexerFrame"; expected type "Union[slice, ndarray[Any, dtype[signedinteger[_64Bit]]], Index, List[int], Series[int], Series[bool], ndarray[Any, dtype[bool_]], List[bool], List[<nothing>], Tuple[Union[slice, ndarray[Any, dtype[signedinteger[_64Bit]]], Index, List[int], Series[int], Series[bool], ndarray[Any, dtype[bool_]], List[bool], List[<nothing>], Hashable], Union[List[<nothing>], slice, Series[bool], Callable[..., Any]]]]"  [index]

and the following error with pyright:

(parameter) key: NDArray[uint64]
Argument of type "tuple[NDArray[uint64], slice]" cannot be assigned to parameter "idx" of type "ScalarT@__getitem__ | tuple[IndexType | MaskType | _IndexSliceTuple[Unknown], ScalarT@__getitem__ | None] | None" in function "__getitem__"
  Type "tuple[NDArray[uint64], slice]" cannot be assigned to type "ScalarT@__getitem__ | tuple[IndexType | MaskType | _IndexSliceTuple[Unknown], ScalarT@__getitem__ | None] | None"
    Type "tuple[NDArray[uint64], slice]" cannot be assigned to type "Scalar"
      Type "tuple[NDArray[uint64], slice]" cannot be assigned to type "Scalar"
        "tuple[NDArray[uint64], slice]" is incompatible with "str"
        "tuple[NDArray[uint64], slice]" is incompatible with "bytes"
        "tuple[NDArray[uint64], slice]" is incompatible with "date"
        "tuple[NDArray[uint64], slice]" is incompatible with "datetime"
        "tuple[NDArray[uint64], slice]" is incompatible with "timedelta"

@Dr-Irv
Collaborator

Dr-Irv commented Jan 12, 2023

@ischurov can you see which version of numpy you have installed? I'm unable to reproduce with numpy 1.23.5

@ischurov
Author

I have numpy version 1.24.1. Indeed, it is not reproducible on 1.23.5.

@Dr-Irv
Collaborator

Dr-Irv commented Jan 12, 2023

OK. Right now, we have a pandas issue with numpy 1.24 that causes one of our tests to fail when we run the code. So we have pinned our tests to 1.23.5.

I'm going to change the title of the issue to reflect this is compatibility with 1.24.1

@Dr-Irv
Collaborator

Dr-Irv commented Jan 12, 2023

So I'm doing a PR to address this and to allow our tests to deal with numpy 1.24.1.

It turns out that np.uint64 is not a subclass of np.int64 (which is a bit of a surprise). So the solution is to just use np.integer instead of np.int64 when referring to what can do the indexing.
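
The class relationship described above can be checked directly. The sketch below assumes numpy 1.22 or later on Python 3.9 or later (needed for subscripting `np.integer` at runtime). The alias `IntegerArray` and the function `as_labels` are hypothetical names for illustration, not the names used in pandas-stubs.

```python
from typing import Any

import numpy as np
import numpy.typing as npt

# np.int64 and np.uint64 are siblings under np.integer, not parent and child.
uint_is_int64 = issubclass(np.uint64, np.int64)      # False
uint_is_integer = issubclass(np.uint64, np.integer)  # True
int_is_integer = issubclass(np.int64, np.integer)    # True

# Hypothetical alias: an array of any integer dtype, signed or unsigned.
IntegerArray = npt.NDArray[np.integer[Any]]


def as_labels(key: IntegerArray) -> list[int]:
    """Convert an integer array of index labels to plain Python ints."""
    return [int(k) for k in key]
```

Annotating with `np.integer` rather than `np.int64` lets one signature accept both signed and unsigned integer arrays, which is the idea behind the fix described in the comment.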
