Provide better interface or documentation for per-geo modeling #336

Open
brookslogan opened this issue Jun 4, 2024 · 1 comment

Comments

@brookslogan
Contributor

brookslogan commented Jun 4, 2024

Context

@dsweber2 was just noting that

epi_df %>%
  epi_slide(
    ~ .x %>%
      group_by(geo_value) %>%
      arx_forecaster(.....)
  )

doesn't fit per-geo models; it actually just ignores the grouping altogether.

We suggested "transposing" the operations, but @rnayebi21 found that

epi_df %>%
  group_by(geo_value) %>%
  epi_slide(~ arx_forecaster(.x, .....), .....)

doesn't work either: .x doesn't have the geo_value column and thus lacks epi_df-ness. I believe these problems also apply when you are trying to do version-faithful backtesting with epix_slide().

The workarounds seem like a bit of a pain. Either:

  • fixing up the first approach by doing something like
    • split + map + bind_rows, or
    • group_by + group_split + map + bind_rows, or
    • group_by + group_modify(.keep = TRUE), or
    • group_by + reframe (using the deprecated-but-not-replaced cur_data_all()...)
  • fixing up the second approach by reconstructing an epi_df inside the slide computation using .x, .group_key, and .ref_time_value, or
  • [mutate(geo_value2 = geo_value) and group by that instead of geo_value. Or just group_by(geo_value2 = geo_value).]
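The second fix-up (reconstructing the missing key inside the slide computation) boils down to putting the grouping columns back onto each group's data. Below is a minimal base-R sketch, assuming the computation receives the per-group data without geo_value plus a one-row data frame of group values (which is how `.group_key` is described, analogous to the `.y` argument of dplyr::group_modify()). `reattach_group_key()` is a hypothetical helper, not part of epiprocess, and a real version would still need to turn the result back into an epi_df (presumably with as_epi_df(), using `.ref_time_value` as the version), which is not shown here.

```r
# Hypothetical helper (not part of epiprocess): prepend the grouping columns
# from a one-row `group_key` data frame to every row of the per-group data `x`.
reattach_group_key <- function(x, group_key) {
  stopifnot(nrow(group_key) == 1L)
  # Repeat the single key row once per row of `x`.
  key_rows <- group_key[rep(1L, nrow(x)), , drop = FALSE]
  rownames(key_rows) <- NULL
  cbind(key_rows, x)
}

# Example: per-group data as a grouped slide might pass it, i.e. without geo_value.
x <- data.frame(time_value = as.Date("2020-01-01") + 0:2, cases = c(1, 2, 3))
group_key <- data.frame(geo_value = "ca")
reattach_group_key(x, group_key)
```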

The first workaround seems more modular (you can have a list of forecasters that can all rely on ungrouped slides, rather than having to do a different type of slide call for each one).
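The first workaround (split + map + bind_rows) can be sketched in base R, with split(), lapply(), and do.call(rbind, ...) standing in for the dplyr/purrr verbs. `per_geo()`, `last_value_forecaster()`, and `mean_forecaster()` are hypothetical names for illustration only; the two toy forecasters stand in for arx_forecaster() and friends and only need to handle ungrouped, single-geo data. The point is the modularity argument: one wrapper turns every forecaster in a list into a per-geo version.

```r
# Toy stand-ins for real forecasters such as arx_forecaster(). Each one knows
# nothing about grouping: it just forecasts from the rows it is given.
last_value_forecaster <- function(df) {
  df <- df[order(df$time_value), , drop = FALSE]
  data.frame(forecast = df$cases[nrow(df)])
}

mean_forecaster <- function(df) {
  data.frame(forecast = mean(df$cases))
}

# Hypothetical wrapper: turn any ungrouped forecaster into a per-geo one by
# splitting on geo_value, applying the forecaster to each piece, and binding
# the results back together with a geo_value column.
per_geo <- function(forecaster) {
  function(df) {
    pieces <- split(df, df$geo_value)
    results <- lapply(names(pieces), function(geo) {
      out <- forecaster(pieces[[geo]])
      data.frame(geo_value = rep(geo, nrow(out)), out)
    })
    do.call(rbind, results)
  }
}

# Example data: two geos, three days each.
df <- data.frame(
  geo_value = rep(c("ca", "ny"), each = 3),
  time_value = rep(as.Date("2020-01-01") + 0:2, times = 2),
  cases = c(1, 2, 3, 10, 20, 30)
)

# The same wrapper works for a whole list of forecasters.
forecasters <- list(last_value = last_value_forecaster, mean = mean_forecaster)
per_geo_forecasts <- lapply(forecasters, function(f) per_geo(f)(df))
```

This works on a plain data frame only; a real version would need each piece to remain an epi_df, which this sketch does not attempt.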

Proposal

  1. Make arx_forecaster() check specifically for a missing geo_value column and hint that, if the user was doing a grouped epix_slide() or epi_slide() with geo_value among the group variables, that won't work, and that they should use <some workaround / feature> instead.
  2. Check whether the input to arx_forecaster() etc. is grouped; if so, either:
    • warn
    • abort
    • fit & forecast one model per group
  3. [Also check for groupedness and warn/abort in the epi workflow internals.]
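Proposal items 1 and 2 (the warn and abort variants) could look roughly like the input check below. This is a base-R sketch: `check_forecaster_input()` and its `on_grouped` argument are hypothetical, not epipredict's API, and groupedness is detected through the `grouped_df` class that dplyr::group_by() adds. The third variant, fitting one model per group, would instead route grouped input through something like split + map + bind_rows from the workarounds above.

```r
# Hypothetical input check (not epipredict's API) covering proposal items 1-2.
check_forecaster_input <- function(x, on_grouped = c("warn", "abort")) {
  on_grouped <- match.arg(on_grouped)
  # Item 1: a missing geo_value often means a grouped slide dropped it.
  if (!"geo_value" %in% names(x)) {
    stop(
      "`x` has no `geo_value` column. If this is inside a grouped ",
      "epi_slide() or epix_slide() with geo_value as a group variable, ",
      "the grouping columns are not passed to the computation.",
      call. = FALSE
    )
  }
  # Item 2: grouped input would otherwise be silently treated as ungrouped.
  if (inherits(x, "grouped_df")) {
    msg <- "`x` is grouped; the grouping is ignored and one model is fit to all rows."
    if (on_grouped == "abort") stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
  }
  invisible(x)
}
```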

Musings

We could probably also make things easier on the epiprocess side by adding a .keep parameter, if we can't already forward one to group_modify() via dots. But I'm not sure we actually want to: this makes it easier to use epi_slide() for forecasting when it shouldn't be used for that (epix_slide() should be favored, and maybe renamed to make this clear).

[@dshemetov points out that we should document this geo-grouped epi_slide gotcha in epiprocess. Actually fixing what's going wrong is part of a much larger project, epiprocess#223.]

@dajmcdon
Contributor

  • I think this is possibly outside the scope of arx_forecaster(). Just make a new one.

Maybe this was described above, but I'm not quite clear on it. It sounds like the major issue is that calling group_by() on an epi_df has a strange effect on geo_value (and whatever else you are grouping on) that makes it behave poorly with the epi_workflow processing. Is that right?
