
Commit 0f4188b

reword
1 parent 7a8dcf0 commit 0f4188b

File tree

2 files changed: +18 -31 lines


spec/API_specification/dataframe_api/dataframe_object.py

Lines changed: 1 addition & 1 deletion
@@ -981,7 +981,7 @@ def join(
         """
         ...
 
-    def may_execute(self) -> Self:
+    def maybe_execute(self) -> Self:
         """
         Hint that execution may be triggered, depending on the implementation.
 
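For orientation, here is a minimal sketch (not part of the commit) of how two backends might implement this hint: an eager backend can treat it as a no-op, while a lazy backend may choose to materialize its deferred work. The class names and the `plan.collect()` call are illustrative assumptions, not Standard API.

```python
from __future__ import annotations


class EagerDataFrame:
    """Hypothetical eager backend: all values are already computed."""

    def maybe_execute(self) -> EagerDataFrame:
        # Nothing is deferred, so the hint is a no-op.
        return self


class LazyDataFrame:
    """Hypothetical lazy backend; `_plan` stands in for a deferred query plan."""

    def __init__(self, plan) -> None:
        self._plan = plan

    def maybe_execute(self) -> LazyDataFrame:
        # The call is a hint, not a directive: this backend chooses to
        # materialize its plan now, but returning `self` unchanged and
        # deferring execution would be equally valid.
        return LazyDataFrame(self._plan.collect())
```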

spec/design_topics/execution_model.md

Lines changed: 17 additions & 30 deletions
@@ -38,40 +38,13 @@ for such an operation to be executed:
     ...: print('scalar is positive')
     ...:
 ---------------------------------------------------------------------------
-TypeError                                 Traceback (most recent call last)
-Cell In[5], line 1
-----> 1 if scalar:
-      2     print('scalar is positive')
-
-File ~/tmp/.venv/lib/python3.10/site-packages/dask/dataframe/core.py:312, in Scalar.__bool__(self)
-    311 def __bool__(self):
---> 312     raise TypeError(
-    313         f"Trying to convert {self} to a boolean value. Because Dask objects are "
-    314         "lazily evaluated, they cannot be converted to a boolean value or used "
-    315         "in boolean conditions like if statements. Try calling .compute() to "
-    316         "force computation prior to converting to a boolean value or using in "
-    317         "a conditional statement."
-    318     )
+[...]
 
 TypeError: Trying to convert dd.Scalar<gt-bbc3..., dtype=bool> to a boolean value. Because Dask objects are lazily evaluated, they cannot be converted to a boolean value or used in boolean conditions like if statements. Try calling .compute() to force computation prior to converting to a boolean value or using in a conditional statement.
 ```
 
-Exactly which methods require computation may vary across implementations. Some may
-implicitly do it for users under-the-hood for certain methods, whereas others require
-the user to explicitly trigger it.
-
-Therefore, the Dataframe API has a `Dataframe.maybe_evaluate` method. This is to be
-interpreted as a hint, rather than as a directive - the implementation itself may decide
-whether to force execution at this step, or whether to defer it to later.
-
-Operations which require `DataFrame.may_execute` to have been called at some prior
-point are:
-- `DataFrame.to_array`
-- `DataFrame.shape`
-- `Column.to_array`
-- calling `bool`, `int`, or `float` on a scalar
-
-Therefore, the Standard-compliant way to write the code above is:
+The Dataframe API has a `DataFrame.maybe_evaluate` method for addressing the above. We can use it to rewrite the code above
+as follows:
 ```python
 df: DataFrame
 df = df.may_execute()
@@ -82,6 +55,20 @@ for column_name in df.column_names:
     return features
 ```
 
+Note that `maybe_evaluate` is to be interpreted as a hint, rather than as a directive -
+the implementation itself may decide
+whether to force execution at this step, or whether to defer it to later.
+For example, a dataframe which can convert to a lazy array could decide to ignore
+`maybe_evaluate` when evaluating `DataFrame.to_array` but to respect it when evaluating
+`float(Column.std())`.
+
+Operations which require `DataFrame.may_execute` to have been called at some prior
+point are:
+- `DataFrame.to_array`
+- `DataFrame.shape`
+- `Column.to_array`
+- calling `bool`, `int`, or `float` on a scalar
+
 Note how `DataFrame.may_execute` is called only once, and as late as possible.
 Conversely, the "wrong" way to execute the above would be:
 
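To illustrate the pattern the added text describes (a sketch, not part of the diff): the hint is placed once and as late as possible, just before the operations that need concrete values. The function name and the `get_column` accessor are assumptions here; `column_names`, `std`, the `float(...)` call, and the `features` list follow the snippet in the diff, and the method is spelled `maybe_execute` to match the renamed method in `dataframe_object.py`.

```python
# Illustrative sketch; `DataFrame` is the Standard's dataframe object and
# `get_column` is an assumed accessor name, not confirmed by this commit.
def nonconstant_feature_names(df: "DataFrame") -> list[str]:
    # Issue the execution hint once, as late as possible, right before the
    # operations that require concrete values (here, `float` on a scalar).
    df = df.maybe_execute()
    features = []
    for column_name in df.column_names:
        column = df.get_column(column_name)  # assumed accessor
        if float(column.std()) > 0:
            features.append(column_name)
    return features
```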
