Skip to content

Add design topic page on use of Python builtin types #153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 27, 2023

Conversation

rgommers
Copy link
Member

This addresses a concern that has come up a number of times about whether it's okay to implement a library-specific object instead of a builtin type. E.g.,
#140 (comment)

This addresses a concern that has come up a number of times about
whether it's okay to implement a library-specific object instead
of a builtin type. E.g.,
data-apis#140 (comment)
@rgommers rgommers added documentation Improvements or additions to documentation API design labels Apr 26, 2023
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rgommers this looks like a good start.

I suspect we'll want to more rigidly define guaranteed functions available on scalar objects in the future, but we can tackle things as they come up.

@rgommers
Copy link
Member Author

I suspect we'll want to more rigidly define guaranteed functions available on scalar objects in the future, but we can tackle things as they come up.

Agreed. I thought about exhaustively documenting the methods each duck type should have, but that's a large/tedious job with limited value for the moment.

@MarcoGorelli MarcoGorelli merged commit 7952c2e into data-apis:main Apr 27, 2023
@rgommers rgommers deleted the python-builtin-types branch April 27, 2023 11:17
Comment on lines +29 to +31
builtin types to CPU. In the above example, the `.mean()` call returns a
`float`. It is likely beneficial though to implement this as a library-specific
scalar object which duck types with `float`. This means that it should (a) have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, cudf doesn't actually do this, right now, is that correct? (for example it does return numpy scalars for numeric types, i.e. what pandas does)

But cudf would like to do this? (I seem to remember discussions in the past about Scalar objects)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it did. If not, maybe @kkraus14 or @shwina can suggest a better example for where cuDF uses scalars.

That said, numpy scalars are also an example of special objects that duck type, they're not Python builtin float, int, etc.

@jorisvandenbossche
Copy link
Member

I suspect we'll want to more rigidly define guaranteed functions available on scalar objects in the future, but we can tackle things as they come up.

One example that recently came up in the pyarrow issue tracker (about pyarrow.Scalar objects): implementing __bool__. For boolean/numeric scalars with an actual value, that's kind of obvious, but how should that behave for a null scalar of those types?

@rgommers
Copy link
Member Author

For boolean/numeric scalars with an actual value, that's kind of obvious, but how should that behave for a null scalar of those types?

False I imagine? It's missing ~= empty. So I'd think it would work like an empty sequence:

>>> bool([])
False

@jorisvandenbossche
Copy link
Member

Empty and missing is not necessarily the same (we currently don't cover nested types, but for example in pyarrow/cudf's list type, a list scalar can be empty or null, which are two separate states)

Pandas (bool(pd.NA)), but that also got some pushback.

@rgommers
Copy link
Member Author

So was there an outcome or was it unresolved? I think you only have two options right, False or raising an exception. True doesn't seem very reasonable.

@jorisvandenbossche
Copy link
Member

Not really a clear resolution in general (but that's mostly because of my preference to just stay out of this for pyarrow Scalars altogether, and not start with trying to make them behave like python scalars, to avoid all those questions), although we will probably raise for boolean null scalar, just to avoid that people rely on it returning False (that's never the way you should check that it is null)

@rgommers
Copy link
Member Author

That sounds good to me. Should we add that as a separate thing? Maybe best to include with a null object what bool(null) does (either raise, or is undefined).

@rgommers
Copy link
Member Author

Maybe best to include with a null object what bool(null) does (either raise, or is undefined).

See gh-157 for adding a null object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants