
Commit b481a28

nmdefries authored and dsweber2 committed
get_test_data help
1 parent 5469dda commit b481a28


1 file changed (+7 -6 lines changed)


vignettes/custom_epiworkflows.Rmd

Lines changed: 7 additions & 6 deletions
@@ -293,7 +293,8 @@ However, it does not generate any predictions; predictions need to be created in
 
 ## Predicting
 
-To make a prediction, it helps to narrow the data set down to the relevant observations using `get_test_data()`. Not doing this will still fit, but it will predict on every day in the data-set, and not just on the `reference_date`.
+To make a prediction, it helps to narrow the data set down to the relevant observations using `get_test_data()`.
+We can still generate predictions without doing this first, but it will predict on _every_ day in the data-set, and not just on the `reference_date`.
 
 ```{r grab_data}
 relevant_data <- get_test_data(
@@ -302,23 +303,22 @@ relevant_data <- get_test_data(
 )
 ```
 
-In this example, we're creating `relevant_data` from `training_data`, but the data set we want predictions for could be an entirely new data set.
+In this example, we're creating `relevant_data` from `training_data`, but the data set we want predictions for could be entirely new data, unrelated to the one we used when building the workflow.
 
 With a trained workflow and data in hand, we can actually make our predictions:
 
 ```{r workflow_pred}
 fit_workflow |> predict(relevant_data)
 ```
 
-Note that if we simply plug `training_data` into `predict()` we will still get
+Note that if we simply plug the full `training_data` into `predict()` we will still get
 predictions:
 
 ```{r workflow_pred_training}
 fit_workflow |> predict(training_data)
 ```
 
 The resulting tibble is 800 rows long, however.
-Not running `get_test_data()` means that we're providing irrelevant data along with relevant, valid data.
 Passing the non-subsetted data set produces forecasts for not just the requested `reference_date`, but for every
 day in the data set that has sufficient data to produce a prediction.
 To narrow this down, we could filter to rows where the `time_value` matches the `forecast_date`:
@@ -329,8 +329,9 @@ fit_workflow |>
 filter(time_value == forecast_date)
 ```
 
-This can be useful for cases where `get_test_data()` doesn't pull sufficient
-data.
+This can be useful as a workaround when `get_test_data()` fails to pull enough
+data to produce a forecast.
+This is generally a problem when the recipe (preprocessor) is sufficiently complicated, and `get_test_data()` can't determine precisely what data is required.
 
 # Extending `four_week_ahead`
 