Skip to content

Adding support for len/__len__ #481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jakirkham opened this issue Sep 23, 2022 · 10 comments
Closed

Adding support for len/__len__ #481

jakirkham opened this issue Sep 23, 2022 · 10 comments
Labels
API extension Adds new functions or objects to the API. status: Rejected Proposed change was not accepted.

Comments

@jakirkham
Copy link
Member

Several array implementation currently implement len/__len__ basically as shorthand for .shape[0]. It would be handy to have this to simplify adoption in existing codes as well as fit more with how current array implementations work

@rgommers
Copy link
Member

It's hard to search for because "len" isn't a very unique string, but we discussed this multiple times for both arrays and dataframes and there was wide agreement that len(x) is a bad idea. I honestly still need to look up myself or test quickly in IPython along what axis it iterates.

@jakirkham I propose we close this.

@jakirkham
Copy link
Member Author

Interesting maybe I'm missing context here

Thought the main issue was dimensions of unknown length. This was noted when it was originally included and then removed ( #289 )

Anyways am wondering if we should revisit given the number of lines of code using len needed to change in scikit-learn to accommodate the Array API. So more generally am concerned about the churn we may be causing by leaving things like this out

@seberg
Copy link
Contributor

seberg commented Sep 23, 2022

len() is something that I am slightly hedgy about specifying, but if no projects involved ever played with the idea of behaving differently, I suppose it can be done.

The reason is that there is an alternative way to look at arrays; which is also the one that SymPy uses (for its matrix):

  • len(arr) should be the same as arr.size
  • For loops should iterate all elements (rather than the first dimension/axis).

By not specifying len() you remain compatible with that choice.

Starting from scratch, I would personally consider that choice. Because I like to think of arrays more as a "collection of values/elements with a homogenous structure". The "collection of values/elements" comes first and the "structure" only later in my philosophy :).

That does break the list-of-lists analogy which may be confusing to users. But to some degree that is a feature not a bug.

@rgommers
Copy link
Member

Starting from scratch, I would personally consider that choice.

Yes, I think that is more reasonable indeed. len is only unambiguous for 1-D arrays. "list of lists" is a concept that I don't think many users have in mind, and it's pretty much 2-D specific as well (you can think about a 3-D array as "list of lists of lists" in principle, but who would actually do that?).

Anyways am wondering if we should revisit given the number of lines of code using len needed to change in scikit-learn to accommodate the Array API.

Honestly, I think this is (a) not a hard thing to change all at once for a code base, and (b) a good improvement for readability even in the absence of array API usage.

@seberg
Copy link
Contributor

seberg commented Sep 23, 2022

One semi-related thing I would actually like to add to NumPy - but never worked on - is arr.iteraxis(axis=None, *, order="K"), with the semantics of axis=None iterating all elements (similar/identical to arr.flat, returning scalars in NumPy).

And otherwise axis selecting all axes to be iterate.

@shoyer
Copy link
Contributor

shoyer commented Sep 23, 2022

I agree that arrays don't need to be directly iterable & with a length, but we do need a way to unpack arrays along an axis into a list.

If we don't have any way to do that currently in the standard, then we need an unstack function.

@asmeurer
Copy link
Member

asmeurer commented Sep 23, 2022

I agree that arrays don't need to be directly iterable & with a length, but we do need a way to unpack arrays along an axis into a list.

Can you clarify why this is needed? Typically if you want to work along an axis of an array you just work along that axis.

This can be done with a list comprehension: [a[:, :, i, ...] for i in range(a.shape[k])] (with k colons).

@shoyer
Copy link
Contributor

shoyer commented Sep 23, 2022

I agree that arrays don't need to be directly iterable & with a length, but we do need a way to unpack arrays along an axis into a list.

Can you clarify why this is needed? Typically if you want to work along an axis of an array you just work along that axis.

This can be done with a list comprehension: [a[:, :, i, ...] for i in range(a.shape[k])] (with k :s).

OK, fair enough, indexing in a loop would definitely do the trick here. Still maybe worth considering adding unstack() as a convenience function. (I'll open a separate issue to discuss this.)

@shoyer shoyer mentioned this issue Sep 23, 2022
@rgommers
Copy link
Member

rgommers commented Oct 7, 2022

Can we close this issue with outcome that the proposal is rejected? No requirement to add __len__ (but of course implementers are free to add it anyway, outside of their standard support).

@rgommers rgommers added the API extension Adds new functions or objects to the API. label Nov 16, 2022
@rgommers
Copy link
Member

No more comments, I'll close this. Thanks all!

@rgommers rgommers added wont-fix This will not be worked on. status: Rejected Proposed change was not accepted. and removed wont-fix This will not be worked on. labels Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API extension Adds new functions or objects to the API. status: Rejected Proposed change was not accepted.
Projects
None yet
Development

No branches or pull requests

5 participants