TYP: Numpy compatibility of definition of "array like" #41807
Comments
I haven't seen this come up elsewhere. If anything, numpy's definition you quoted seems like it doesn't rule out much of anything. |
We would probably want to see if that is consistent with our Series and DataFrame constructors and use it there. In other parts of the code, we probably pass around EAs or numpy arrays, so our alias is fine in those cases. Some public methods may allow wider types than our ArrayLike alias; for those cases we probably use AnyArrayLike, which includes Series and Index, or add other allowable types. Some of the public methods where we do this may benefit from a wider alias than AnyArrayLike, and we could consider enhancing the API in places for compatibility with the numpy array-like definition. For return types, the annotation should be as precise as possible, so a wide array-like alias has no relevance there: the return type of searchsorted should be restricted to the actual types returned, and should not be array-like just to match the numpy docs. |
That's my point. I was confused because when you see So we have methods like |
For My question is whether we continue in this direction, i.e.,
Are we confusing users because of numpy's definition being different than ours? I'm not a fan of their definition, but I don't think we can change that. |
I think we use array-like in the public docstrings in many places. Those should be places where the accepted types are the same as numpy array-like (with the possible exception of Scalar).
I don't think that's true. array-like in docstrings probably means that other iterable types are allowed. Do not confuse the doc-string types with the annotation aliases.
EAs should probably support the __array__ protocol, so array-like in a docstring also implicitly includes EAs (we do support the protocol for internal EAs). The naming we use for the typing alias may not be an issue: I think when/if the types are published they are expanded, and when mypy reports errors it also expands the alias, so there shouldn't be confusion for users. But sure, we could probably use an alias representing numpy array-like in public methods where we state in the docstring that the input is array-like (and check that we do actually support the types; mypy should do that for us, and we fix where needed). Where our docstring is more restrictive in the allowable types but passes them on to a numpy function that accepts array-like, or is a function/method with the same name/purpose as a numpy function, we could also consider updating those (the second being more of a priority/nice-to-do than the first). For EAs, we can't widen the accepted types without changing the EA API; third-party authors would need notice to test/change their code. |
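To make the __array__ point concrete, here is a minimal sketch of an object that NumPy accepts as array-like purely because it implements that protocol. `TinyArray` is a hypothetical stand-in invented for illustration; it is not a pandas class and does not implement the ExtensionArray interface.

```python
import numpy as np


class TinyArray:
    """Hypothetical container used only to illustrate the __array__ protocol."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when it needs an ndarray view of the object.
        # np.array always builds a fresh array here, so `copy` can be ignored.
        return np.array(self._values, dtype=dtype)


# Any function documented as taking array_like input can consume the object.
converted = np.asarray(TinyArray([1, 2, 3]))
position = np.searchsorted(TinyArray([1, 2, 3]), 2)
```

Whether a static type checker also accepts such an object for a given annotation depends on how the alias is declared, which is a separate question from this runtime behaviour.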
I made that statement by inference. The docstring for
And I'm questioning whether we are consistent through the docs, and when we say "array-like" in the docs, do we mean "numpy array-like" or "something like an array, but not a scalar" ?
I think it might be an issue for pandas contributors to understand the difference. My confusion arose because when I saw numpy say "array-like", I thought it was similar to our type
So are you saying that when the pandas docs say "array-like", we should take that to mean "numpy array-like" ?
So this is where the confusion is. I don't know how we are defining "array-like" in the EA API
There may be other examples as well. These are the ones I've dealt with. |
Based on dev discussion on 6/9/2021, pandas will define "array-like" to mean "something like an array", which does not include scalars. We need to check that the docs are consistent on this between the examples and the documented API. |
Maybe "sequence with dtype attribute"? I think that excludes 0-dim ndarrays (which I view as desirable). |
No, because in many cases, we don't require the dtype. E.g. |
Another option would be to explicitly include the dimensionality of the array-like object, e.g. >=1D array-like, 2D array-like, etc. |
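A rough sketch of what a dimensionality-qualified notion of array-like could mean at runtime. `has_min_ndim` is a hypothetical helper written for this illustration, not an existing pandas or numpy function; it relies only on `np.ndim`, which reports 0 for scalars and for 0-dim ndarrays.

```python
import numpy as np


def has_min_ndim(obj, min_ndim=1):
    """Hypothetical check: does `obj` convert to an array with >= min_ndim dims?"""
    return np.ndim(obj) >= min_ndim


is_1d_list = has_min_ndim([1, 2, 3])                   # a list is 1D
is_scalar_ok = has_min_ndim(5)                         # a scalar is 0D
is_zero_dim_ok = has_min_ndim(np.array(5))             # a 0-dim ndarray is 0D
is_2d = has_min_ndim([[1, 2], [3, 4]], min_ndim=2)     # nested lists are 2D
```

Under this reading, ">=1D array-like" rules out both scalars and 0-dim ndarrays without requiring a dtype attribute.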
see scikit-learn/scikit-learn#16705 (comment) we could maybe just have the alias for numpy.typing in our pandas._typing, maybe this would also simplify #41185. we could gradually replace |
opened #41945 |
Is this issue basically addressing the problem that the following does not type check? (Or should we track that in a separate issue?)

import numpy.typing as npt
import pandas as pd

def takes_arraylike(a: npt.ArrayLike) -> None:
    ...

takes_arraylike(pd.Series([1, 2, 3]).values)
takes_arraylike(pd.Series([1, 2, 3]).array)

I would have expected these two expressions to type check properly, i.e., a compatibility between the returned
Interestingly, |
Your example doesn't type check because |
... which comes as a surprise considering that So this means it is actually possible to produce a runtime error by passing a |
The issue here is that
Yes, a function/method that accepts
The differences here are due to subtleties in how the typing is done in numpy and pandas. In addition, this goes back to the original issue I created - the words |
In pandas/_typing.py, we define ArrayLike as:

In the numpy glossary https://numpy.org/doc/stable/glossary.html?highlight=array_like, numpy defines array_like as:

Are we creating confusion by using the term ArrayLike to only mean arrays, whereas numpy defines it to include scalars?

This came up in terms of reconciling the arguments of np.searchsorted() and ExtensionArray.searchsorted().

Comments from @jbrockmendel and @simonjayhawkins welcome.
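For readers unfamiliar with the numpy side of the question, this small sketch shows how numpy treats a scalar as valid array_like input, using np.searchsorted() as the example. The variable names and sample values are made up for illustration.

```python
import numpy as np

sorted_values = [10, 20, 30, 40]

# A Python scalar is accepted wherever NumPy documents array_like:
# it is converted to a 0-dimensional array.
scalar_as_array = np.asarray(25)

# With a scalar needle, searchsorted returns a single integer index.
scalar_result = np.searchsorted(sorted_values, 25)

# With a list of needles, it returns an array of indices.
list_result = np.searchsorted(sorted_values, [25, 40])
```

So the shape of the result follows the shape of the input, which is why an input alias that includes scalars says little about what the return annotation should be.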