Add design topic page on use of Python builtin types #153
This addresses a concern that has come up a number of times about whether it's okay to implement a library-specific object instead of a builtin type. E.g., data-apis#140 (comment)
Thanks @rgommers this looks like a good start.
I suspect we'll want to more rigidly define guaranteed functions available on scalar objects in the future, but we can tackle things as they come up.
Agreed. I thought about exhaustively documenting the methods each duck type should have, but that's a large/tedious job with limited value for the moment.
builtin types to CPU. In the above example, the `.mean()` call returns a
`float`. It is likely beneficial though to implement this as a library-specific
scalar object which duck types with `float`. This means that it should (a) have
Out of curiosity, cudf doesn't actually do this right now, is that correct? (For example, it returns numpy scalars for numeric types, i.e. what pandas does.)
But cudf would like to do this? (I seem to remember discussions in the past about Scalar objects.)
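To make the question concrete, here is a minimal sketch of a library-specific scalar that duck types with `float`. The class name `DeviceScalar` and everything inside it are hypothetical and are not taken from cudf, pandas, or any other library; a plain Python float stands in for a value that would live on a device.

```python
import math


class DeviceScalar:
    """Hypothetical library-specific scalar that duck types with float."""

    def __init__(self, value):
        # Assumption: a real implementation would keep the value on the
        # device; a host-side float is used here only as a stand-in.
        self._value = float(value)

    def __float__(self):
        # Explicit conversion; this is where a device-to-host copy would occur.
        return self._value

    def __add__(self, other):
        return DeviceScalar(self._value + float(other))

    __radd__ = __add__  # addition is commutative, so reuse __add__

    def __mul__(self, other):
        return DeviceScalar(self._value * float(other))

    __rmul__ = __mul__  # multiplication is commutative, so reuse __mul__

    def __eq__(self, other):
        return self._value == float(other)

    def __lt__(self, other):
        return self._value < float(other)

    def __repr__(self):
        return f"DeviceScalar({self._value!r})"


mean = DeviceScalar(2.5)
result = (mean + 1.5) * 2      # stays a DeviceScalar
as_float = float(result)       # explicit conversion to a builtin float
root = math.sqrt(DeviceScalar(4.0))  # math functions accept it via __float__
```

Arithmetic keeps returning the library's own object, and a builtin `float` only appears when the caller asks for one explicitly.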
One example that recently came up in the pyarrow issue tracker (about pyarrow.Scalar objects): implementing
>>> bool([])
False
Empty and missing are not necessarily the same. (We currently don't cover nested types, but for example in pyarrow/cudf's list type, a list scalar can be empty or null, which are two separate states.) Pandas (
So was there an outcome or was it unresolved? I think you only have two options right,
Not really a clear resolution in general, mostly because I'd prefer to stay out of this for pyarrow Scalars altogether and not start trying to make them behave like Python scalars, which avoids all those questions. We will probably raise for a boolean null scalar, though, so that people don't rely on it returning False (that's never the way you should check that it is null).
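As a rough illustration of the "raise for a boolean null scalar" behaviour, here is a minimal sketch. `NullableBoolScalar` and its `is_null` method are hypothetical names chosen for this example and are not pyarrow's actual API; `None` stands in for the null state.

```python
class NullableBoolScalar:
    """Hypothetical nullable boolean scalar (not pyarrow's real class)."""

    def __init__(self, value=None):
        # Assumption: None represents a null (missing) value.
        self._value = value

    def is_null(self):
        # The explicit, unambiguous way to check for a missing value.
        return self._value is None

    def __bool__(self):
        # Raise rather than silently returning False for a null scalar,
        # so nobody relies on truthiness as a null check.
        if self.is_null():
            raise TypeError(
                "truth value of a null scalar is ambiguous; use is_null()"
            )
        return bool(self._value)


valid = NullableBoolScalar(True)
null = NullableBoolScalar()

if valid:  # fine: a non-null scalar behaves like a bool
    status = "valid scalar is truthy"

try:
    bool(null)
except TypeError as exc:
    error_message = str(exc)
```

Raising keeps "null" and "False" distinct, much like the empty versus null list scalar case mentioned earlier in the thread.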
That sounds good to me. Should we add that as a separate thing? Maybe best to include with a
See gh-157 for adding a