Future redesign of epi_slide and epix_slide #458
On formats for `epix_slide()` output:

```r
library(tidyverse)
library(epiprocess)
#>
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#>
#>     filter
library(epipredict)
#> Loading required package: parsnip
#>
#> Attaching package: 'epipredict'
#> The following object is masked from 'package:ggplot2':
#>
#>     layer

# NOTE from ?arx_forecaster
jhu <- case_death_rate_subset %>%
  dplyr::filter(time_value >= as.Date("2021-12-01"))
out <- arx_forecaster(
  jhu, "death_rate",
  c("case_rate", "death_rate")
)

# Here are potential formats we could get out of `epix_slide()`:
deep <- tibble(
  time_value = 1:3, # XXX actually, version...
  slide_value = rep(list(out), 3)
)
long <- deep %>%
  rowwise() %>%
  reframe(time_value, enframe(unclass(slide_value)))
wide <- long %>%
  pivot_wider(id_cols = time_value)
wide_packed <- wide %>%
  pack(slide_value = c(predictions, epi_workflow, metadata))

deep
#> # A tibble: 3 × 2
#>   time_value slide_value
#>        <int> <list>
#> 1          1 <arx_fcst>
#> 2          2 <arx_fcst>
#> 3          3 <arx_fcst>
long
#> # A tibble: 9 × 3
#>   time_value name         value
#>        <int> <chr>        <list>
#> 1          1 predictions  <tibble [56 × 5]>
#> 2          1 epi_workflow <ep_wrkfl>
#> 3          1 metadata     <named list [2]>
#> 4          2 predictions  <tibble [56 × 5]>
#> 5          2 epi_workflow <ep_wrkfl>
#> 6          2 metadata     <named list [2]>
#> 7          3 predictions  <tibble [56 × 5]>
#> 8          3 epi_workflow <ep_wrkfl>
#> 9          3 metadata     <named list [2]>
# NOTE there could also be an alternative long format using a single named
# list column, but I have no idea how to work with that.
wide
#> # A tibble: 3 × 4
#>   time_value predictions       epi_workflow metadata
#>        <int> <list>            <list>       <list>
#> 1          1 <tibble [56 × 5]> <ep_wrkfl>   <named list [2]>
#> 2          2 <tibble [56 × 5]> <ep_wrkfl>   <named list [2]>
#> 3          3 <tibble [56 × 5]> <ep_wrkfl>   <named list [2]>
wide_packed
#> # A tibble: 3 × 2
#>   time_value slide_value$predictions $epi_workflow $metadata
#>        <int> <list>                  <list>        <list>
#> 1          1 <tibble [56 × 5]>       <ep_wrkfl>    <named list [2]>
#> 2          2 <tibble [56 × 5]>       <ep_wrkfl>    <named list [2]>
#> 3          3 <tibble [56 × 5]>       <ep_wrkfl>    <named list [2]>

# Here's one way for each that we could use to get to submission-ish format:
preds <-
  deep %>%
  mutate(predictions = map(slide_value, "predictions"),
         slide_value = NULL) %>%
  unnest(predictions)
# (This one (`deep`) took a bit of time to think of. I think I knew the solution
# above was possible but ugly, and was hoping to think of a cleaner solution but
# failed and finally fell back on the above.)
# or
# deep %>%
#   mutate(slide_value = map(slide_value, unclass)) %>%
#   hoist(slide_value, "predictions") %>%
#   select(-slide_value) %>%
#   unnest(predictions)
# or
# deep %>%
#   mutate(predictions = map(slide_value, "predictions"),
#          .keep = "unused") %>%
#   unnest(predictions)
preds <-
  long %>%
  filter(name == "predictions") %>%
  select(-name) %>%
  unnest(value)
preds <-
  wide %>%
  select(time_value, predictions) %>%
  unnest(predictions)
preds <-
  wide_packed %>%
  mutate(predictions = slide_value$predictions,
         slide_value = NULL) %>%
  unnest(predictions)
# or
# preds <-
#   wide_packed %>%
#   rowwise() %>%
#   reframe(time_value, slide_value$predictions[[1L]])
# or ...

# Here's how we might extract coefficients in a long format:
get_coefs <- function(fit_ewf) {
  fit_ewf %>% workflows::extract_fit_engine() %>% coef()
}
coefs <-
  deep %>%
  rowwise() %>%
  reframe(time_value, slide_value$epi_workflow %>% get_coefs() %>% enframe())
coefs <-
  long %>%
  filter(name == "epi_workflow") %>%
  mutate(coefs = map(value, ~ get_coefs(.x) %>% enframe())) %>%
  unnest(coefs, names_sep = "_")
# or, in a better output format:
# coefs <-
#   long %>%
#   filter(name == "epi_workflow") %>%
#   mutate(name = NULL,
#          coefs = map(value, ~ get_coefs(.x) %>% enframe()),
#          value = NULL) %>%
#   unnest(coefs)
coefs <-
  wide %>%
  rowwise() %>%
  reframe(time_value, epi_workflow %>% get_coefs() %>% enframe())
coefs <-
  wide_packed %>%
  rowwise() %>%
  reframe(time_value, slide_value$epi_workflow[[1]] %>% get_coefs() %>% enframe())
# (This one (`wide_packed`) took a bit of time to think of.)
# or
# coefs <-
#   wide_packed %>%
#   transmute(time_value, coefs = map(slide_value$epi_workflow, ~ get_coefs(.x) %>% enframe())) %>%
#   unnest(coefs)
# or ....
```

Created on 2024-06-06 with reprex v2.0.2

Comment N+1: Non-…
Comment N+2: A lot of these felt awkward, trading off between …

Comment N+3: We could also change the output of …

Unfortunately, accepting named lists on their own will likely not fully resolve the issue Ryan and Richard were running into: we want slide computations to be able to just output an …

Long format will also upset …

Status: slide improvements pass has specialized …

Remaining:

Comment N+4:
Related issue: #234
Logan and I did some brainstorming about what use cases `epi_slide` is trying to cover. We found that it doesn't behave gracefully when the slide computation function `f` returns data.frame/tibble outputs (which may occur if you're trying to, say, make a forecast per `ref_time_value`). This led us to think that perhaps we shouldn't even recommend `epi_slide` for forecasting use cases; perhaps `epix_slide` should be the go-to function for that (an `epi_df` can be quickly converted to a fake archive with `version = time_value`). This in turn led us to suspect that `epi_slide` computation functions `f` should perhaps be limited to functions that return atomic vectors, like those covered by `epi_slide_opt`.

TODO: `epi_slide_reframe` or `epi_slide_mutate`, to imply and constrain the outputs of `f` to the user.

Side-note: formats for `epix_slide` to use when the slide function `f` returns data.frame/tibble outputs

Three possible output formats:
The deep format is simple from the `epi_slide` writer's POV, but may be difficult for the user to index into. The wide format will have issues with name collisions (e.g., if the output has `geo_value` and `time_value` as well).
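A minimal sketch tying the fake-archive idea and the wide-format name collision together, assuming only dplyr, tibble, and tidyr and using toy integer-indexed data. It builds a fake archive with `version = time_value`, runs a tibble-returning slide computation once per version into the deep format, and then shows the collision on `time_value` when flattening. The data, `snapshot_as_of()`, and `f()` are hypothetical stand-ins for illustration, not the epiprocess API.

```r
library(dplyr)   # mutate(), filter(), group_by(), slice_max(), summarise()
library(tibble)  # tibble()
library(tidyr)   # unnest()

# Toy stand-in for an epi_df (hypothetical data): two geos, integer days 1..5.
toy <- tibble(
  geo_value = rep(c("ca", "ny"), each = 5),
  time_value = rep(1:5, times = 2),
  cases = c(10, 12, 9, 15, 20, 30, 28, 33, 35, 31)
)

# "Fake archive": treat each row as reported once, on its own time_value,
# and never revised, i.e. version = time_value.
fake_archive <- toy %>% mutate(version = time_value)

# Hypothetical helper (not the epiprocess API): the snapshot as of version v
# keeps the latest row with version <= v for each (geo_value, time_value).
# With version = time_value, that is simply every row with time_value <= v.
snapshot_as_of <- function(archive, v) {
  archive %>%
    filter(version <= v) %>%
    group_by(geo_value, time_value) %>%
    slice_max(version, n = 1) %>%
    ungroup() %>%
    select(-version)
}

# Hypothetical slide computation returning a tibble, like a per-version
# forecast: it carries its own geo_value and a target time_value.
f <- function(snapshot, v) {
  snapshot %>%
    group_by(geo_value) %>%
    summarise(value = mean(cases), .groups = "drop") %>%
    mutate(time_value = v + 7L, .after = geo_value)
}

versions <- 3:5

# Deep format: one row per version (called time_value here, as in the reprex),
# with each tibble output kept whole in a list column.
deep <- tibble(
  time_value = versions,
  slide_value = lapply(versions, function(v) f(snapshot_as_of(fake_archive, v), v))
)

# Wide format: flattening the output columns next to the outer time_value
# clashes on the name time_value. unnest() defaults to
# names_repair = "check_unique", so this errors instead of silently
# dropping or renaming a column.
wide_attempt <- tryCatch(unnest(deep, slide_value), error = function(e) e)

# One workaround: prefix the inner names so nothing collides.
wide <- unnest(deep, slide_value, names_sep = "_")
```

Prefixing with `names_sep` avoids the error at the cost of long column names; keeping the deep format avoids the collision entirely but leaves the user to index into the list column, which is the trade-off described above.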