epix_slide() converts to a tibble rather than an epi_df - incompatible with arx_forecaster() #208

Closed
rachlobay opened this issue Aug 16, 2022 · 0 comments · Fixed by #212
Labels
P0 high priority

rachlobay commented Aug 16, 2022

In the slide function for an epi_archive object, there is a call to convert to a tibble when performing the computation by group:

if (!missing(f)) {
  if (rlang::is_formula(f)) f = rlang::as_function(f)
  
  x = purrr::map_dfr(ref_time_values, function(t) {
    self$as_of(t, min_time_value = t - before_num) %>%
      tibble::as_tibble() %>%  # problem line
      dplyr::group_by(!!!group_by) %>%
      dplyr::group_modify(comp_one_grp,
                          f = f, ..., 
                          time_value = t,
                          key_vars = key_vars,
                          new_col = new_col,
                          .keep = TRUE) %>%
      dplyr::ungroup()
  })
}

So, when a function is supplied, each window of data passed to it is a tibble rather than an epi_df, but arx_forecaster(), the main forecasting function from epipredict, requires an epi_df as input and throws an error otherwise. Below is an example using the latest version of epipredict on GitHub:

library(epipredict)
library(epiprocess)
library(covidcast)
library(data.table)
library(dplyr)
library(tidyr)
library(ggplot2)

y <- covidcast_signals(
  c("doctor-visits", "jhu-csse"),
  c("smoothed_adj_cli", "confirmed_7dav_incidence_prop"),
  start_day = "2020-06-01",
  end_day = "2021-12-01",
  issues = c("2020-06-01", "2021-12-01"),
  geo_type = "state",
  geo_values = c("ca", "fl"))

z <- y[[1]] %>%
  select(geo_value, time_value, version = issue, percent_cli = value) %>%
  as_epi_archive()

z <- epix_merge(
  z, y[[2]] %>%
    select(geo_value, time_value, version = issue, case_rate = value) %>%
    as_epi_archive(), sync = "locf")

fc_time_values <- seq(as.Date("2020-08-01"), as.Date("2021-12-01"),
                      by = "1 month")
ahead = 7

# Old arx_forecaster works fine on the archive
z %>% epix_slide(fc = arx_forecaster(
  percent_cli, case_rate, geo_value, time_value,
  args = arx_args_list(ahead = ahead)),
  n = 120, ref_time_values = fc_time_values)

# New arx does not work on the archive: Error: epi_data must be an `epi_df`.
z %>%
  epix_slide(function(x, ...)
    arx_forecaster(x, outcome = "case_rate",
                       predictors = c("case_rate", "percent_cli"),
                       args = arx_args_list(ahead = ahead))$predictions %>%
      select(-c(geo_value, time_value)),
    n = 120, ref_time_values = fc_time_values, new_col_name = "fc")

So, it looks like we're currently blocked from sliding a forecaster built with arx_forecaster() over an epi_archive. Based on a quick inspection, removing the tibble::as_tibble() %>% line seems to do the trick, so that arx_forecaster() works when used as the function in epix_slide(). The impact of this change should be investigated further before implementing it, though. Some questions: Would the change cause any major issues elsewhere? Why was the conversion to a tibble done in the first place? Also, I am not sure that some of the resulting metadata (e.g., as_of) makes sense in the output epi_df if we do make the above change, but perhaps that could be a separate issue.
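
For anyone reproducing this without downloading data, here is a minimal sketch of the class loss on a toy data set, together with one possible stopgap. It is not the fix applied in the package. The toy values and the helper name restore_epi_df() are hypothetical, and the sketch assumes that as_epi_df() accepts an as_of argument and that as_tibble() drops the epi_df class by default in the installed versions of epiprocess and tibble.

```r
library(epiprocess)
library(tibble)

# Toy data (made up for illustration); geo_value and time_value are the
# key columns that as_epi_df() looks for.
toy <- tibble(
  geo_value  = rep(c("ca", "fl"), each = 3),
  time_value = rep(as.Date("2021-01-01") + 0:2, times = 2),
  case_rate  = c(1, 2, 3, 4, 5, 6)
)

edf <- as_epi_df(toy, as_of = as.Date("2021-01-10"))
is_edf_before <- inherits(edf, "epi_df")

# The same conversion as the "problem line": the epi_df class is dropped.
tbl <- tibble::as_tibble(edf)
is_edf_after <- inherits(tbl, "epi_df")

# Hypothetical helper: rebuild an epi_df from the tibble a slide computation
# receives. Using the reference time as as_of is an assumption, and it is
# exactly the metadata question raised above.
restore_epi_df <- function(x, ref_time_value) {
  as_epi_df(x, as_of = ref_time_value)
}

restored <- restore_epi_df(tbl, as.Date("2021-01-10"))
is_edf_restored <- inherits(restored, "epi_df")
```

A function passed to epix_slide() could call such a helper on its input before handing it to arx_forecaster(), but that only works around the conversion and does not answer whether the conversion is needed in the first place.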
