Skip to content

fix: more bugs in #290 #304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 3, 2024
Merged

fix: more bugs in #290 #304

merged 5 commits into from
Apr 3, 2024

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Apr 2, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Makes sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).

Change explanations for reviewer

Addresses some additional bugs in #290, merge into #300

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@dshemetov
Copy link
Contributor Author

dshemetov commented Apr 2, 2024

Given today's meeting discussion, I'm going to abandon this argument validation track. I'm going to revert those changes and instead add some messaging to the canned forecasters that make its arguments clearer.

UPDATE: Done, force-pushed.

@dshemetov
Copy link
Contributor Author

So my addition to layer_residual_quantiles successfully errors when n_training < ahead in flatline_forecaster, but I got stuck when trying to make a non-canned unit test, not sure why residuals keep coming out non-empty in the case below. Can't tell if I'm just misunderstanding epipredict here or my fix in this PR isn't a tight-enough check for n_training < ahead. Thoughts @dajmcdon ?

test_that("error when residuals NA", {
  r <- epi_recipe(jhu) %>%
    step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
    step_epi_ahead(death_rate, ahead = 7) %>%
    step_epi_naomit() %>%
    step_training_window(n_recent = 6)
  f <- frosting() %>%
    layer_residual_quantiles(quantile_levels = c(0.5, 0.95))

  latest <- get_test_data(recipe = r, x = jhu)
  wf <- epi_workflow(r, parsnip::linear_reg(), f)
  wf <- fit(wf, jhu)
  # Does not throw error
  expect_error(predict(wf, latest))
  # Because wf$fit$fit$fit$residuals is not NA
  wf$fit$fit$fit$residuals
})

@dajmcdon
Copy link
Contributor

dajmcdon commented Apr 3, 2024

It's due to the ordering of the steps. In this case, the NA's are omitted first, rather than after limiting the training window. I suspect this is actually "what you want", (but not what you thought you were getting).

jhu <- case_death_rate_subset
ds_r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_naomit() %>%
  step_training_window(n_recent = 6)

other_r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_training_window(n_recent = 6) %>%
  step_epi_naomit()

# Show the training data
bake(prep(ds_r, jhu), new_data = NULL)
#> An `epi_df` object, 336 x 8 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2022-05-31 12:08:25.791826
#> 
#> # A tibble: 336 × 8
#>    time_value geo_value case_rate death_rate lag_0_death_rate lag_7_death_rate
#>  * <date>     <chr>         <dbl>      <dbl>            <dbl>            <dbl>
#>  1 2021-12-19 ak             23.1      1.19             1.19            0.0593
#>  2 2021-12-20 ak             23.2      1.17             1.17            0.0791
#>  3 2021-12-21 ak             23.2      1.17             1.17            0.0791
#>  4 2021-12-22 ak             20.3      1.72             1.72            0.0593
#>  5 2021-12-23 ak             20.3      1.72             1.72            0.0593
#>  6 2021-12-24 ak             12.6      0.593            0.593           1.19  
#>  7 2021-12-19 al             16.2      0.255            0.255           0.232 
#>  8 2021-12-20 al             16.9      0.255            0.255           0.232 
#>  9 2021-12-21 al             15.7      0.241            0.241           0.250 
#> 10 2021-12-22 al             18.1      0.221            0.221           0.293 
#> # ℹ 326 more rows
#> # ℹ 2 more variables: lag_14_death_rate <dbl>, ahead_7_death_rate <dbl>
bake(prep(other_r, jhu), new_data = NULL) # an empty tibble
#> Warning in max.default(structure(numeric(0), class = "Date"), na.rm = FALSE):
#> no non-missing arguments to max; returning -Inf
#> An `epi_df` object, 0 x 8 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2022-05-31 12:08:25.791826
#> 
#> # A tibble: 0 × 8
#> # ℹ 8 variables: time_value <date>, geo_value <chr>, case_rate <dbl>,
#> #   death_rate <dbl>, lag_0_death_rate <dbl>, lag_7_death_rate <dbl>,
#> #   lag_14_death_rate <dbl>, ahead_7_death_rate <dbl>

There's another option too, and this is what I was describing: where we limit the training data FIRST:

another_r <- epi_recipe(jhu) %>%
  step_training_window(n_recent = 6) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7)

bake(prep(another_r, jhu), new_data = NULL) # an empty tibble
#> An `epi_df` object, 1,344 x 8 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2022-05-31 12:08:25.791826
#> 
#> # A tibble: 1,344 × 8
#>    time_value geo_value case_rate death_rate lag_0_death_rate lag_7_death_rate
#>  * <date>     <chr>         <dbl>      <dbl>            <dbl>            <dbl>
#>  1 2021-12-19 ak               NA         NA               NA               NA
#>  2 2021-12-19 al               NA         NA               NA               NA
#>  3 2021-12-19 ar               NA         NA               NA               NA
#>  4 2021-12-19 as               NA         NA               NA               NA
#>  5 2021-12-19 az               NA         NA               NA               NA
#>  6 2021-12-19 ca               NA         NA               NA               NA
#>  7 2021-12-19 co               NA         NA               NA               NA
#>  8 2021-12-19 ct               NA         NA               NA               NA
#>  9 2021-12-19 dc               NA         NA               NA               NA
#> 10 2021-12-19 de               NA         NA               NA               NA
#> # ℹ 1,334 more rows
#> # ℹ 2 more variables: lag_14_death_rate <dbl>, ahead_7_death_rate <dbl>

@dshemetov
Copy link
Contributor Author

Got it, thank you. So in your case, the train data is actually either empty or full of NA rows, so I'm getting an lm error before we even get to layer_quantile_residuals, so there's no unit test to do there. Any other tests to add here or can I just merge this into your PR?

@dshemetov dshemetov merged commit d933087 into djm/fix-290 Apr 3, 2024
@dshemetov dshemetov deleted the ds/fix-290 branch April 3, 2024 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants