-
Notifications
You must be signed in to change notification settings - Fork 10
fix: more bugs in #290 #304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Given today's meeting discussion, I'm going to abandon this argument validation track. I'm going to revert those changes and instead add some messaging to the canned forecasters that make its arguments clearer. UPDATE: Done, force-pushed. |
So my addition to test_that("error when residuals NA", {
r <- epi_recipe(jhu) %>%
step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
step_epi_ahead(death_rate, ahead = 7) %>%
step_epi_naomit() %>%
step_training_window(n_recent = 6)
f <- frosting() %>%
layer_residual_quantiles(quantile_levels = c(0.5, 0.95))
latest <- get_test_data(recipe = r, x = jhu)
wf <- epi_workflow(r, parsnip::linear_reg(), f)
wf <- fit(wf, jhu)
# Does not throw error
expect_error(predict(wf, latest))
# Because wf$fit$fit$fit$residuals is not NA
wf$fit$fit$fit$residuals
}) |
It's due to the ordering of the steps. In this case, the NA's are omitted first, rather than after limiting the training window. I suspect this is actually "what you want", (but not what you thought you were getting). jhu <- case_death_rate_subset
ds_r <- epi_recipe(jhu) %>%
step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
step_epi_ahead(death_rate, ahead = 7) %>%
step_epi_naomit() %>%
step_training_window(n_recent = 6)
other_r <- epi_recipe(jhu) %>%
step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
step_epi_ahead(death_rate, ahead = 7) %>%
step_training_window(n_recent = 6) %>%
step_epi_naomit()
# Show the training data
bake(prep(ds_r, jhu), new_data = NULL)
#> An `epi_df` object, 336 x 8 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2022-05-31 12:08:25.791826
#>
#> # A tibble: 336 × 8
#> time_value geo_value case_rate death_rate lag_0_death_rate lag_7_death_rate
#> * <date> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 2021-12-19 ak 23.1 1.19 1.19 0.0593
#> 2 2021-12-20 ak 23.2 1.17 1.17 0.0791
#> 3 2021-12-21 ak 23.2 1.17 1.17 0.0791
#> 4 2021-12-22 ak 20.3 1.72 1.72 0.0593
#> 5 2021-12-23 ak 20.3 1.72 1.72 0.0593
#> 6 2021-12-24 ak 12.6 0.593 0.593 1.19
#> 7 2021-12-19 al 16.2 0.255 0.255 0.232
#> 8 2021-12-20 al 16.9 0.255 0.255 0.232
#> 9 2021-12-21 al 15.7 0.241 0.241 0.250
#> 10 2021-12-22 al 18.1 0.221 0.221 0.293
#> # ℹ 326 more rows
#> # ℹ 2 more variables: lag_14_death_rate <dbl>, ahead_7_death_rate <dbl>
bake(prep(other_r, jhu), new_data = NULL) # an empty tibble
#> Warning in max.default(structure(numeric(0), class = "Date"), na.rm = FALSE):
#> no non-missing arguments to max; returning -Inf
#> An `epi_df` object, 0 x 8 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2022-05-31 12:08:25.791826
#>
#> # A tibble: 0 × 8
#> # ℹ 8 variables: time_value <date>, geo_value <chr>, case_rate <dbl>,
#> # death_rate <dbl>, lag_0_death_rate <dbl>, lag_7_death_rate <dbl>,
#> # lag_14_death_rate <dbl>, ahead_7_death_rate <dbl> There's another option too, and this is what I was describing: where we limit the training data FIRST: another_r <- epi_recipe(jhu) %>%
step_training_window(n_recent = 6) %>%
step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
step_epi_ahead(death_rate, ahead = 7)
bake(prep(another_r, jhu), new_data = NULL) # an empty tibble
#> An `epi_df` object, 1,344 x 8 with metadata:
#> * geo_type = state
#> * time_type = day
#> * as_of = 2022-05-31 12:08:25.791826
#>
#> # A tibble: 1,344 × 8
#> time_value geo_value case_rate death_rate lag_0_death_rate lag_7_death_rate
#> * <date> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 2021-12-19 ak NA NA NA NA
#> 2 2021-12-19 al NA NA NA NA
#> 3 2021-12-19 ar NA NA NA NA
#> 4 2021-12-19 as NA NA NA NA
#> 5 2021-12-19 az NA NA NA NA
#> 6 2021-12-19 ca NA NA NA NA
#> 7 2021-12-19 co NA NA NA NA
#> 8 2021-12-19 ct NA NA NA NA
#> 9 2021-12-19 dc NA NA NA NA
#> 10 2021-12-19 de NA NA NA NA
#> # ℹ 1,334 more rows
#> # ℹ 2 more variables: lag_14_death_rate <dbl>, ahead_7_death_rate <dbl> |
Got it, thank you. So in your case, the train data is actually either empty or full of NA rows, so I'm getting an lm error before we even get to layer_quantile_residuals, so there's no unit test to do there. Any other tests to add here or can I just merge this into your PR? |
Co-authored-by: Daniel McDonald <[email protected]>
Checklist
Please:
Make sure this PR is against "dev", not "main".dajmcdon.
Makes sure to bump the version number inDESCRIPTION
andNEWS.md
.Always increment the patch version number (the third number), unless you are
making a release PR from dev to main, in which case increment the minor
version number (the second number).
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
0.7.2, then write your changes under the 0.8 heading).
Change explanations for reviewer
Addresses some additional bugs in #290, merge into #300
Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch