Skip to content

DEPR: is_array_like #52834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jbrockmendel opened this issue Apr 21, 2023 · 13 comments
Open

DEPR: is_array_like #52834

jbrockmendel opened this issue Apr 21, 2023 · 13 comments
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Needs Discussion Requires discussion from core team before further action

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Apr 21, 2023

is_array_like is a a one-liner return is_list_like(obj) and hasattr(obj, "dtype"). It does not recognize pyarrow objects. Often we use it when we really want something like isinstance(obj, (np.ndarray, Series, Index, ExtensionArray)), which is much more typing-friendly. We should replace its usage with these more specific checks and deprecate it.

Discussed briefly in a dev meeting a couple months ago with generally positive reception.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 21, 2023
@rhshadrach
Copy link
Member

which is much more typing-friendly

Would using a TypeGuard with updated logic be better? If we ever want to change what is being identified as array-like, inlining would require trying to grep through the codebase.

@jbrockmendel
Copy link
Member Author

I'd be fine with that

@phofl
Copy link
Member

phofl commented Apr 22, 2023

sounds good to me as well

@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions and removed Deprecate Functionality to remove in pandas Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 25, 2023
@MarcoGorelli MarcoGorelli added the Deprecate Functionality to remove in pandas label Apr 25, 2023
@lithomas1
Copy link
Member

Going to tag this as good first issue.

See https://pandas.pydata.org/docs/development/contributing_codebase.html#backwards-compatibility for how to do a deprecation.

@phofl
Copy link
Member

phofl commented Apr 25, 2023

I might be missing something but I think we agreed on changing the implementation, not deprecating the function?

@phofl phofl added Needs Discussion Requires discussion from core team before further action and removed good first issue labels Apr 25, 2023
@rhshadrach
Copy link
Member

I think we agreed on changing the implementation, not deprecating the function?

We could still deprecate the function and make it internal. This would disadvantage users utilizing it but would make any future changes not have to go through deprecation. Not sure how much this might be used today.

@phofl
Copy link
Member

phofl commented Apr 25, 2023

I fixed a bunch of things in Dask when we merged the other deprecations. This has value for downstream libraries, I am -1 on deprecating as soon as the functions become more than a simple isinstance check

@jbrockmendel
Copy link
Member Author

I guess I imagine implementing a new non-public version and changing out internal usages to it. Then can either deprecate the old one saying it's moving to the new one's behavior or deprecate it entirely or just let it sit there unused forever.

@phofl
Copy link
Member

phofl commented Apr 27, 2023

Just switching the implementation is probably not straightforward?

@jbrockmendel
Copy link
Member Author

Probably not, no. I expect there are a bunch of subtle bugs where we used is_arraylike but really meant something more akin to isinstance(obj, (np.ndarray, ExtensionArray, Series, Index))

@jbrockmendel
Copy link
Member Author

Another thought that doesn't fit perfectly here but close enough: we have places where we check for hasattr(obj, "dtype") that would be nice to subsume into one pattern, and also we have checks for __array__ both in Index.__new__ and in sanitize_array (unfortunately with slightly different behavior) that would be good to standardize.

@jbrockmendel
Copy link
Member Author

I count 26 non-test uses of is_array_like internally. One of them is removed by #52973. One more in core.algorithms I have an upcoming PR to deprecate, and one other in _ensure_arraylike I'd like to deprecate but that will take more effort. Both in core.common are pretty clearly wrong (upcoming PR will fix). 5 in reshape.merge are on the chopping block xref #48454. One in sparse.array seems unnecessary. Still looking through the rest.

Point being: it seems like getting rid of internal usages may be feasible and worthwhile.

@Aloqeely
Copy link
Member

Aloqeely commented Apr 8, 2024

Any final thought on this? I can start working on it if all agree on just changing the implementation without any deprecating

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

6 participants