Profile the time issue for arx_epi_forecaster() on an epi_archive #127

Closed
rachlobay opened this issue Aug 16, 2022 · 0 comments
Labels
P1 high priority

rachlobay commented Aug 16, 2022

arx_epi_forecaster() is noticeably slower than the old arx_forecaster() on an epi_archive object. (EDIT: I see both forecasters are now called arx_forecaster(), but to make it clear which one I'm referring to, I'll use the old names here.) So the slowdown in arx_epi_forecaster() should be profiled: what is causing the forecaster to be slower, and what can we do about it?

The example below compares arx_epi_forecaster() to arx_forecaster() on an epi_archive object. Note that it only runs once the tibble::as_tibble() %>% line is removed from the slide function for an archive in epiprocess (see Issue #208); otherwise you'll get the error: Error: epi_data must be an epi_df. We probably don't have to wait until that issue is fixed to start profiling, but whoever picks this up should use a branch of epiprocess where the problem line in slide is removed.

library(epipredict)
library(epiprocess)
library(covidcast)
library(data.table)
library(dplyr)
library(tidyr)
library(ggplot2)

y <- covidcast_signals(
  c("doctor-visits", "jhu-csse"),
  c("smoothed_adj_cli", "confirmed_7dav_incidence_prop"),
  start_day = "2020-06-01",
  end_day = "2021-12-01",
  issues = c("2020-06-01", "2021-12-01"),
  geo_type = "state",
  geo_values = c("ca", "fl"))

z <- y[[1]] %>%
  select(geo_value, time_value, version = issue, percent_cli = value) %>%
  as_epi_archive()

z <- epix_merge(
  z, y[[2]] %>%
    select(geo_value, time_value, version = issue, case_rate = value) %>%
    as_epi_archive(), sync = "locf")

fc_time_values <- seq(as.Date("2020-08-01"), as.Date("2021-12-01"),
                      by = "1 month")
ahead <- 7

# Old arx_forecaster is pretty quick
z %>% epix_slide(fc = arx_forecaster(y = case_rate, 
                                     key_vars = geo_value, time_value = time_value,
                                     args = arx_args_list(ahead = ahead)),
                 n = 120, ref_time_values = fc_time_values)

# New arx_epi_forecaster is noticeably slower?
z %>%
  epix_slide(function(x, ...)
    arx_epi_forecaster(x, outcome = "case_rate",
                       predictors = c("case_rate"),
                       args = arx_args_list(ahead = ahead))$predictions %>%
      select(-c(geo_value, time_value)),
    n = 120, ref_time_values = fc_time_values, new_col_name = "fc")
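
As a possible starting point for the profiling itself, here is a minimal sketch using only base R (system.time(), Rprof() and summaryRprof()). The helper profile_call() and the two toy workloads are hypothetical names made up for illustration; they are not part of epipredict or epiprocess, and the toy functions only stand in for the two epix_slide() calls above so that the snippet runs on its own. It also assumes an R build with profiling support, which is the usual default.

```r
# Hypothetical helper (not from epipredict/epiprocess): times a zero-argument
# function and collects a base-R sampling profile of the same run.
profile_call <- function(f, interval = 0.01) {
  prof_file <- tempfile(fileext = ".out")
  # Always stop the profiler and clean up, even if f() throws an error.
  on.exit({
    Rprof(NULL)
    unlink(prof_file)
  }, add = TRUE)

  Rprof(prof_file, interval = interval)
  elapsed <- system.time(result <- f())[["elapsed"]]
  Rprof(NULL)

  # Very fast calls may record no samples; fall back to NULL in that case.
  by_self <- tryCatch(summaryRprof(prof_file)$by.self,
                      error = function(e) NULL)

  list(result = result, elapsed = elapsed, by_self = by_self)
}

# Toy stand-ins for the slow and fast forecaster calls (assumption: any
# zero-argument wrapper around an epix_slide() call could be used instead).
slow_toy <- function() {
  x <- numeric(0)
  for (i in 1:20000) x <- c(x, i)  # deliberately inefficient vector growth
  sum(x)
}
fast_toy <- function() sum(as.numeric(1:20000))

slow_run <- profile_call(slow_toy)
fast_run <- profile_call(fast_toy)

# Elapsed seconds for each version, then the functions with most self time.
print(c(slow = slow_run$elapsed, fast = fast_run$elapsed))
print(head(slow_run$by_self))
```

To apply this to the example above, each epix_slide() call could be wrapped as function() z %>% epix_slide(...) and passed to profile_call(); the by_self table should then hint at which internal calls dominate the run time. The profvis package offers an interactive view of the same kind of sampling profile if that is easier to read.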
@rachlobay rachlobay added the P1 high priority label Aug 16, 2022
@rachlobay rachlobay changed the title Profile the time issue for arx_epi_forecaster() on an epi_archive Profile the time issue for arx_forecaster() on an epi_archive Aug 16, 2022
@rachlobay rachlobay changed the title Profile the time issue for arx_forecaster() on an epi_archive Profile the time issue for arx_epi_forecaster() on an epi_archive Aug 16, 2022