
Commit 5cae3b9

sliding updates
1 parent bea7dd1 commit 5cae3b9

File tree

1 file changed (+26, -23 lines)


vignettes/articles/sliding.Rmd

Lines changed: 26 additions & 23 deletions
@@ -93,10 +93,9 @@ fc_time_values <- seq(as.Date("2020-08-01"), as.Date("2021-12-01"),
 k_week_ahead <- function(epi_df, outcome, predictors, ahead = 7, engine) {
   epi_df %>%
     epi_slide(
-      ~arx_forecaster(
+      ~ arx_forecaster(
         .x, outcome, predictors, engine,
-        args_list = arx_args_list(ahead = ahead)
-      ) %>%
+        args_list = arx_args_list(ahead = ahead)) %>%
         extract2("predictions") %>%
         select(-c(geo_value, time_value)),
       n = 120,
@@ -107,40 +106,41 @@ k_week_ahead <- function(epi_df, outcome, predictors, ahead = 7, engine) {
     mutate(engine_type = engine$engine)
 }
 
-# Generate the forecasts, and bind them together
+# Generate the forecasts and bind them together
 fc <- bind_rows(
   purrr::map_dfr(
     c(7,14,21,28),
-    ~ k_week_ahead(x_latest, "case_rate", c("case_rate", "percent_cli"), .x,
-                   engine = linear_reg())
-  ),
+    ~ k_week_ahead(
+      x_latest, "case_rate", c("case_rate", "percent_cli"), .x,
+      engine = linear_reg())
+  ),
   purrr::map_dfr(
     c(7,14,21,28),
-    ~ k_week_ahead(x_latest, "case_rate", c("case_rate", "percent_cli"), .x,
-                   engine = rand_forest(mode = "regression"))
-  )) %>%
-  mutate(.pred_distn = nested_quantiles(fc_.pred_distn)) %>% # "nested" list-col
+    ~ k_week_ahead(
+      x_latest, "case_rate", c("case_rate", "percent_cli"), .x,
+      engine = rand_forest(mode = "regression"))
+  )) %>%
+  mutate(.pred_distn = nested_quantiles(fc_.pred_distn)) %>%
   unnest(.pred_distn) %>%
   pivot_wider(names_from = tau, values_from = q)
 ```
 
 Here, `arx_forecaster()` does all the heavy lifting. It creates leads of the
 target (respecting time stamps and locations) along with lags of the features
 (here, the response and doctors visits), estimates a forecasting model using the
-specified engine, creates predictions, and non-parametric confidence bands. All
-of these are tunable parameters.
+specified engine, creates predictions, and non-parametric confidence bands.
 
 To see how the predictions compare, we plot them on top of the latest case
-rates. Note that even though we've fitted on all states, we'll just display the
+rates. Note that even though we've fitted the model on all states,
+we'll just display the
 results for two states, California (CA) and Florida (FL), to get a sense of the
-model performance while keeping it simple. So feel free to modify the code to
-look over the results for other states.
+model performance while keeping the graphic simple.
 
 ```{r plot-arx, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 6}
 fc_cafl <- fc %>% filter(geo_value %in% c("ca", "fl"))
 x_latest_cafl <- x_latest %>% filter(geo_value %in% c("ca", "fl"))
 
-ggplot(fc_cafl, aes(x = fc_target_date, group = time_value, fill = engine_type)) +
+ggplot(fc_cafl, aes(fc_target_date, group = time_value, fill = engine_type)) +
   geom_line(data = x_latest_cafl, aes(x = time_value, y = case_rate),
             inherit.aes = FALSE, color = "gray50") +
   geom_ribbon(aes(ymin = `0.05`, ymax = `0.95`), alpha = 0.4) +
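The revised chunk keeps `k_week_ahead()` as a thin wrapper around `epi_slide()` and `arx_forecaster()`. For orientation, here is a rough, illustrative sketch (not part of this commit) of a single-horizon call, assuming the vignette's `x_latest` snapshot and the tidymodels `linear_reg()` engine are in scope; `fc_7` is just a hypothetical name:

```r
# Illustrative only (not in the commit): one horizon, one engine, using the
# revised k_week_ahead() from the hunk above. Assumes x_latest and the packages
# loaded earlier in the vignette (epiprocess, epipredict, tidymodels) are available.
fc_7 <- k_week_ahead(
  x_latest, "case_rate", c("case_rate", "percent_cli"), ahead = 7,
  engine = linear_reg()
)

# In the full chunk, the nested_quantiles() %>% unnest() %>% pivot_wider() steps
# then spread each forecast distribution into one column per quantile level
# (e.g. `0.05` and `0.95`), which is what the geom_ribbon() layer reads.
head(fc_7)
```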
@@ -155,14 +155,17 @@ ggplot(fc_cafl, aes(x = fc_target_date, group = time_value, fill = engine_type))
 ```
 
 For the two states of interest, simple linear regression clearly performs better
-than random forest in terms of accuracy of the predictions and not does not
-result in such in overconfident predictions (too narrow confidence bands).
-Though, in general, both approaches do not perform great. This could be because
+than random forest in terms of accuracy of the predictions and does not
+result in such overconfident predictions (overly narrow confidence bands).
+Though, in general, neither approach produces amazingly accurate forecasts.
+This could be because
 the behaviour is rather different across states and the effects of other notable
 factors such as age and public health measures may be important to account for
 in such forecasting. Including such factors as well as making enhancements such
-as correcting for outliers are some improvements one could make to our simple
-model in the future.
+as correcting for outliers are some improvements one could make to this simple
+model.[^1]
+
+[^1]: Note that, despite the above caveats, simple models like this tend to out-perform many far more complicated models in online Covid forecasting, due to those models' high-variance predictions.
 
 ### Example using case data from Canada
 
@@ -175,7 +178,7 @@ daily time series data on COVID-19 cases, deaths, recoveries, testing and
 vaccinations at the health region and province levels. Data are collected from
 publicly available sources such as government datasets and news releases.
 Unfortunately, there is no simple versioned source, so we have created our own
-from the Commit history.
+from the Github commit history.
 
 First, we load versioned case rates at the provincial level. After converting
 these to 7-day averages (due to highly variable provincial reporting
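The context lines at the end of this last hunk mention converting provincial case rates to 7-day averages. A rough sketch (not part of the commit) of that preprocessing step using plain dplyr plus zoo, with hypothetical object and column names (`can_rates`, `case_rate`):

```r
library(dplyr)

# Hypothetical input: daily provincial case rates with geo_value/time_value columns.
can_rates_7dav <- can_rates %>%
  arrange(geo_value, time_value) %>%
  group_by(geo_value) %>%
  mutate(
    # trailing 7-day mean; the first 6 days in each province are left as NA
    case_rate_7dav = zoo::rollmean(case_rate, k = 7, fill = NA, align = "right")
  ) %>%
  ungroup()
```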

0 commit comments
