Add design topic page on use of Python builtin types #153

rgommers · 2023-04-26T23:31:02Z

This addresses a concern that has come up a number of times about whether it's okay to implement a library-specific object instead of a builtin type. E.g., #140 (comment)

kkraus14

Thanks @rgommers, this looks like a good start.

I suspect we'll want to more rigidly define guaranteed functions available on scalar objects in the future, but we can tackle things as they come up.

rgommers · 2023-04-27T09:22:40Z

> I suspect we'll want to more rigidly define guaranteed functions available on scalar objects in the future, but we can tackle things as they come up.

Agreed. I thought about exhaustively documenting the methods each duck type should have, but that's a large/tedious job with limited value for the moment.

spec/design_topics/python_builtin_types.md

jorisvandenbossche · 2023-04-27T12:24:22Z

+builtin types to CPU. In the above example, the `.mean()` call returns a
+`float`. It is likely beneficial though to implement this as a library-specific
+scalar object which duck types with `float`. This means that it should (a) have


Out of curiosity, cudf doesn't actually do this right now, is that correct? (For example, it returns numpy scalars for numeric types, i.e. what pandas does.)

But cudf would like to do this? (I seem to remember discussions in the past about Scalar objects)

rgommers · 2023-04-27

I thought it did. If not, maybe @kkraus14 or @shwina can suggest a better example for where cuDF uses scalars.

That said, numpy scalars are also an example of special objects that duck type, they're not Python builtin float, int, etc.
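As a rough illustration of the point in the diff hunk above, here is a minimal sketch of a scalar that duck types with `float` without being one. `DeviceScalar` and its storage are hypothetical and not taken from cuDF, numpy, or the spec; a real library would decide which methods to support.

```python
class DeviceScalar:
    """Hypothetical library-specific scalar that duck types with `float`."""

    def __init__(self, value):
        # Assumption: a real implementation might keep the value on a device.
        # Here it is stored as a plain Python float for illustration only.
        self._value = float(value)

    def __float__(self):
        # Explicit conversion; a device-to-host copy would happen here.
        return self._value

    def __add__(self, other):
        # Arithmetic returns another DeviceScalar rather than a builtin float.
        return DeviceScalar(self._value + float(other))

    __radd__ = __add__

    def __eq__(self, other):
        return self._value == float(other)

    def __lt__(self, other):
        return self._value < float(other)

    def __repr__(self):
        return f"DeviceScalar({self._value})"


mean = DeviceScalar(2.5)
total = mean + 1          # stays a DeviceScalar
as_float = float(total)   # explicit conversion to a builtin float
```

Code that only relies on arithmetic, comparisons, and `float(...)` works with either object; code that checks `isinstance(x, float)` does not.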

jorisvandenbossche · 2023-04-27T12:27:11Z

> I suspect we'll want to more rigidly define guaranteed functions available on scalar objects in the future, but we can tackle things as they come up.

One example that recently came up in the pyarrow issue tracker (about pyarrow.Scalar objects): implementing __bool__. For boolean/numeric scalars with an actual value, that's kind of obvious, but how should that behave for a null scalar of those types?

rgommers · 2023-04-27T12:30:58Z

> For boolean/numeric scalars with an actual value, that's kind of obvious, but how should that behave for a null scalar of those types?

False I imagine? It's missing ~= empty. So I'd think it would work like an empty sequence:

>>> bool([])
False

jorisvandenbossche · 2023-04-27T12:40:48Z

Empty and missing are not necessarily the same (we currently don't cover nested types, but for example in pyarrow/cudf's list type, a list scalar can be empty or null, which are two separate states).

pandas raises an exception here (`bool(pd.NA)` raises a `TypeError`), but that also got some pushback.
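To make the empty-versus-null distinction concrete, here is a minimal sketch with a hypothetical `ListScalar` class. The class name, `is_null`, and `to_py` are assumptions for illustration and are not the pyarrow or cudf API.

```python
class ListScalar:
    """Hypothetical list scalar; `None` marks a null (missing) value."""

    def __init__(self, values=None):
        # An empty list and a missing list are stored differently on purpose.
        self._values = None if values is None else list(values)

    @property
    def is_null(self):
        return self._values is None

    def to_py(self):
        # Returns None for a null scalar, otherwise a copy of the list.
        return None if self._values is None else list(self._values)


empty = ListScalar([])
null = ListScalar(None)

# Both values are falsy once converted to Python objects, so truthiness
# alone cannot tell the two states apart; an explicit null check can.
both_falsy = (not empty.to_py()) and (not null.to_py())
```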

rgommers · 2023-04-27T12:43:39Z

So was there an outcome, or was it unresolved? I think you only have two options, right: False or raising an exception. True doesn't seem very reasonable.

jorisvandenbossche · 2023-04-27T13:13:11Z

There was no clear resolution in general, mostly because my preference is to stay out of this for pyarrow Scalars altogether and not try to make them behave like Python scalars, which avoids all those questions. That said, we will probably raise for a boolean null scalar, so that people don't rely on it returning False (that's never the way you should check that it is null).

rgommers · 2023-04-27T13:52:45Z

That sounds good to me. Should we add that as a separate thing? Maybe best to include with a null object what bool(null) does (either raise, or is undefined).
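A minimal sketch of the "raise" option discussed above, assuming a hypothetical `NullScalar` class. The name, the `is_null` method, and the error message are illustrative and not taken from pyarrow, pandas, or the spec.

```python
class NullScalar:
    """Hypothetical null scalar whose truth value is deliberately undefined."""

    def __bool__(self):
        # Raising prevents callers from relying on `if scalar:` as a null check.
        raise TypeError("truth value of a null scalar is ambiguous")

    def is_null(self):
        # The explicit, supported way to test for null in this sketch.
        return True


null = NullScalar()

try:
    bool(null)
    raised = False
except TypeError:
    raised = True
```

With this design, `if null:` fails loudly instead of silently taking the `False` branch, and callers are pushed towards the explicit `is_null()` check.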

rgommers · 2023-04-27T21:59:19Z

> Maybe best to include with a null object what bool(null) does (either raise, or is undefined).

See gh-157 for adding a null object.

Add design topic page on use of Python builtin types

79fd621

rgommers added the documentation and API design labels Apr 26, 2023

rgommers requested review from kkraus14 and MarcoGorelli April 26, 2023 23:31

rgommers mentioned this pull request Apr 26, 2023

add __len__ and __getitem__ to Column #140

Merged

kkraus14 approved these changes Apr 27, 2023

rgommers mentioned this pull request Apr 27, 2023

add Column.from_sequence #148

Merged

MarcoGorelli reviewed Apr 27, 2023

spec/design_topics/python_builtin_types.md (outdated, resolved)

skipna -> skip_nulls

4c231de

MarcoGorelli approved these changes Apr 27, 2023

MarcoGorelli merged commit 7952c2e into data-apis:main Apr 27, 2023

rgommers deleted the python-builtin-types branch April 27, 2023 11:17

jorisvandenbossche reviewed Apr 27, 2023
