Commit 7952c2e

Add design topic page on use of Python builtin types (#153)
* Add design topic page on use of Python builtin types

  This addresses a concern that has come up a number of times about whether it's okay to implement a library-specific object instead of a builtin type. E.g., #140 (comment)

* skipna -> `skip_nulls`
1 parent b2d977c commit 7952c2e

2 files changed: +42 -0 lines changed

spec/design_topics/index.rst

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ Design topics & constraints

  backwards_compatibility
  data_interchange
+ python_builtin_types

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Python builtin types and duck typing

Use of Python's builtin types - `bool`, `int`, `float`, `str`, `dict`, `list`,
`tuple`, `datetime.datetime`, etc. - is often natural and convenient. However,
it is also potentially problematic when trying to write performant dataframe
library code or to support devices other than the CPU.

This standard specifies the use of Python types in quite a few places, and uses
them as type annotations. As a concrete example, consider the `mean` method and
the `float` it is documented to return, in combination with the `__gt__` method
(i.e., the `>` operator) on the dataframe:

```python
class DataFrame:
    def __gt__(self, other: DataFrame | Scalar) -> DataFrame:
        ...
    def get_column_by_name(self, name: str, /) -> Column:
        ...

class Column:
    def mean(self, skip_nulls: bool = True) -> float:
        ...

larger = df2 > df1.get_column_by_name('foo').mean()
```

For a GPU dataframe library, it is desirable for all data to reside on the GPU,
and not incur a performance penalty from synchronizing instances of Python
builtin types to CPU. In the above example, the `.mean()` call returns a
`float`. It is likely beneficial, though, to implement this as a library-specific
scalar object which duck types with `float`. This means that it should (a) have
the same semantics as a builtin `float` when used within a library, and (b)
support usage as a `float` outside of the library (i.e., implement
`__float__`). Duck typing is usually not perfect; for example, `isinstance`
usage on the float-like duck type will behave differently. Such explicit "type
of object" checks don't have to be supported.
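
As a rough illustration of points (a) and (b), a float-like scalar might look like the sketch below. The class name `LazyScalar`, the helper `_unwrap`, and the `_value` attribute are hypothetical and not part of the standard. A plain Python `float` stands in for what would be a device-side value in a real GPU library.

```python
import math


class LazyScalar:
    """Hypothetical float-like scalar; not part of the standard."""

    def __init__(self, value: float) -> None:
        # Assumption: a real library would hold a device-side handle here.
        self._value = value

    # (a) float-like arithmetic inside the library: results stay wrapped,
    # so no conversion to a builtin float is forced.
    def __add__(self, other: "LazyScalar | float") -> "LazyScalar":
        return LazyScalar(self._value + _unwrap(other))

    __radd__ = __add__  # addition is commutative, so reuse __add__

    # (b) usable as a float outside the library: explicit conversion point.
    def __float__(self) -> float:
        return float(self._value)


def _unwrap(x: "LazyScalar | float") -> float:
    """Return the raw value of either a LazyScalar or a builtin number."""
    return x._value if isinstance(x, LazyScalar) else float(x)


mean = LazyScalar(2.5)
shifted = mean + 1.5                     # still a LazyScalar
as_float = float(shifted)                # conversion via __float__
root = math.sqrt(LazyScalar(4.0))        # math functions accept __float__
is_builtin = isinstance(shifted, float)  # False: the imperfect part
```

The `isinstance` result on the last line is the kind of explicit type check that, per the paragraph above, does not have to be supported.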

The following design rule applies everywhere builtin Python types are used
within this API standard: _where a Python builtin type is specified, an
implementation may always replace it by an equivalent library-specific type
that duck types with the Python builtin type._
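
One consequence for code that consumes such return values is that it probably should not test for the builtin type directly. The sketch below shows one way to stay compatible with the rule, assuming a hypothetical `DeviceFloat` class as the library-specific type and a hypothetical consumer function `format_mean`. `typing.SupportsFloat` is the standard-library protocol for objects that implement `__float__`.

```python
from typing import SupportsFloat


class DeviceFloat:
    """Hypothetical library-specific scalar; not part of the standard."""

    def __init__(self, value: float) -> None:
        self._value = value

    def __float__(self) -> float:
        return self._value


def format_mean(value: SupportsFloat) -> str:
    # Convert with float() rather than checking isinstance(value, float),
    # so builtin floats and float-like duck types are both accepted.
    return f"mean={float(value):.2f}"


from_builtin = format_mean(1.5)
from_duck = format_mean(DeviceFloat(1.5))
```

Both calls produce the same string, because the consumer relies only on `__float__` and never on the concrete type.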
