vignettes/custom_epiworkflows.Rmd
Lines changed: 7 additions & 6 deletions
@@ -293,7 +293,8 @@ However, it does not generate any predictions; predictions need to be created in
 
 ## Predicting
 
-To make a prediction, it helps to narrow the data set down to the relevant observations using `get_test_data()`. Not doing this will still fit, but it will predict on every day in the data-set, and not just on the `reference_date`.
+To make a prediction, it helps to narrow the data set down to the relevant observations using `get_test_data()`.
+We can still generate predictions without doing this first, but it will predict on _every_ day in the data-set, and not just on the `reference_date`.
-In this example, we're creating `relevant_data` from `training_data`, but the data set we want predictions for could be an entirely new data set.
+In this example, we're creating `relevant_data` from `training_data`, but the data set we want predictions for could be entirely new data, unrelated to the one we used when building the workflow.
 
 With a trained workflow and data in hand, we can actually make our predictions:
 
 ```{r workflow_pred}
 fit_workflow |> predict(relevant_data)
 ```
 
-Note that if we simply plug `training_data` into `predict()` we will still get
+Note that if we simply plug the full `training_data` into `predict()` we will still get
 predictions:
 
 ```{r workflow_pred_training}
 fit_workflow |> predict(training_data)
 ```
 
 The resulting tibble is 800 rows long, however.
-Not running `get_test_data()` means that we're providing irrelevant data along with relevant, valid data.
 Passing the non-subsetted data set produces forecasts for not just the requested `reference_date`, but for every
 day in the data set that has sufficient data to produce a prediction.
 To narrow this down, we could filter to rows where the `time_value` matches the `forecast_date`:
@@ -329,8 +329,9 @@ fit_workflow |>
   filter(time_value == forecast_date)
 ```
 
-This can be useful for cases where `get_test_data()` doesn't pull sufficient
-data.
+This can be useful as a workaround when `get_test_data()` fails to pull enough
+data to produce a forecast.
+This is generally a problem when the recipe (preprocessor) is sufficiently complicated, and `get_test_data()` can't determine precisely what data is required.
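
As a rough illustration of the filtering workaround discussed in this hunk, the base R sketch below builds a toy stand-in for the output of `predict(training_data)` and keeps only the rows whose `time_value` equals `forecast_date`. Everything about the toy data is hypothetical and not taken from the vignette: the data frame `all_preds`, the four `geo_value` codes, the 200-day window, the date assigned to `forecast_date`, and the `.pred` values. Here `forecast_date` is assumed to be a plain variable; in the real prediction tibble it may instead be a column.

```r
# Hypothetical forecast date and a 200-day window ending on it.
forecast_date <- as.Date("2021-08-01")
days <- seq(forecast_date - 199, forecast_date, by = "day")
geos <- c("ca", "ma", "ny", "tx")  # hypothetical locations

# Toy stand-in for predictions made on the full, non-subsetted data:
# one row per location per day.
all_preds <- data.frame(
  geo_value = rep(geos, each = length(days)),
  time_value = rep(days, times = length(geos))
)
all_preds$.pred <- seq_len(nrow(all_preds)) / 100  # placeholder values

# Keep only the predictions made as of the forecast date.
relevant_preds <- all_preds[all_preds$time_value == forecast_date, ]

nrow(all_preds)       # every location-day combination
nrow(relevant_preds)  # one row per location
```

Base R indexing is used only to keep the sketch free of package dependencies; the vignette expresses the same step with `filter(time_value == forecast_date)` applied to the result of `predict()`.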