fix: more bugs in #290 #304

dshemetov · 2024-04-02T17:59:43Z

Checklist

Please:

~~Make sure this PR is against "dev", not "main".~~
Request a review from one of the current epipredict main reviewers:
dajmcdon.
Makes sure to bump the version number in DESCRIPTION and NEWS.md.
Always increment the patch version number (the third number), unless you are
making a release PR from dev to main, in which case increment the minor
version number (the second number).
Describe changes made in NEWS.md, making sure breaking changes
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
0.7.2, then write your changes under the 0.8 heading).

Change explanations for reviewer

Addresses some additional bugs in #290, merge into #300

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

dshemetov · 2024-04-02T19:49:40Z

Given today's meeting discussion, I'm going to abandon this argument validation track. I'm going to revert those changes and instead add some messaging to the canned forecasters that make its arguments clearer.

UPDATE: Done, force-pushed.

dshemetov · 2024-04-03T17:10:38Z

So my addition to layer_residual_quantiles successfully errors when n_training < ahead in flatline_forecaster, but I got stuck when trying to make a non-canned unit test, not sure why residuals keep coming out non-empty in the case below. Can't tell if I'm just misunderstanding epipredict here or my fix in this PR isn't a tight-enough check for n_training < ahead. Thoughts @dajmcdon ?

test_that("error when residuals NA", {
  r <- epi_recipe(jhu) %>%
    step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
    step_epi_ahead(death_rate, ahead = 7) %>%
    step_epi_naomit() %>%
    step_training_window(n_recent = 6)
  f <- frosting() %>%
    layer_residual_quantiles(quantile_levels = c(0.5, 0.95))

  latest <- get_test_data(recipe = r, x = jhu)
  wf <- epi_workflow(r, parsnip::linear_reg(), f)
  wf <- fit(wf, jhu)
  # Does not throw error
  expect_error(predict(wf, latest))
  # Because wf$fit$fit$fit$residuals is not NA
  wf$fit$fit$fit$residuals
})

dajmcdon · 2024-04-03T19:49:38Z

It's due to the ordering of the steps. In this case, the NA's are omitted first, rather than after limiting the training window. I suspect this is actually "what you want", (but not what you thought you were getting).

jhu <- case_death_rate_subset
ds_r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_naomit() %>%
  step_training_window(n_recent = 6)

other_r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_training_window(n_recent = 6) %>%
  step_epi_naomit()

# Show the training data
bake(prep(ds_r, jhu), new_data = NULL)
#> An `epi_df` object, 336 x 8 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2022-05-31 12:08:25.791826
#> 
#> # A tibble: 336 × 8
#>    time_value geo_value case_rate death_rate lag_0_death_rate lag_7_death_rate
#>  * <date>     <chr>         <dbl>      <dbl>            <dbl>            <dbl>
#>  1 2021-12-19 ak             23.1      1.19             1.19            0.0593
#>  2 2021-12-20 ak             23.2      1.17             1.17            0.0791
#>  3 2021-12-21 ak             23.2      1.17             1.17            0.0791
#>  4 2021-12-22 ak             20.3      1.72             1.72            0.0593
#>  5 2021-12-23 ak             20.3      1.72             1.72            0.0593
#>  6 2021-12-24 ak             12.6      0.593            0.593           1.19  
#>  7 2021-12-19 al             16.2      0.255            0.255           0.232 
#>  8 2021-12-20 al             16.9      0.255            0.255           0.232 
#>  9 2021-12-21 al             15.7      0.241            0.241           0.250 
#> 10 2021-12-22 al             18.1      0.221            0.221           0.293 
#> # ℹ 326 more rows
#> # ℹ 2 more variables: lag_14_death_rate <dbl>, ahead_7_death_rate <dbl>
bake(prep(other_r, jhu), new_data = NULL) # an empty tibble
#> Warning in max.default(structure(numeric(0), class = "Date"), na.rm = FALSE):
#> no non-missing arguments to max; returning -Inf
#> An `epi_df` object, 0 x 8 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2022-05-31 12:08:25.791826
#> 
#> # A tibble: 0 × 8
#> # ℹ 8 variables: time_value <date>, geo_value <chr>, case_rate <dbl>,
#> #   death_rate <dbl>, lag_0_death_rate <dbl>, lag_7_death_rate <dbl>,
#> #   lag_14_death_rate <dbl>, ahead_7_death_rate <dbl>

There's another option too, and this is what I was describing: where we limit the training data FIRST:

another_r <- epi_recipe(jhu) %>%
  step_training_window(n_recent = 6) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7)

bake(prep(another_r, jhu), new_data = NULL) # an empty tibble
#> An `epi_df` object, 1,344 x 8 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2022-05-31 12:08:25.791826
#> 
#> # A tibble: 1,344 × 8
#>    time_value geo_value case_rate death_rate lag_0_death_rate lag_7_death_rate
#>  * <date>     <chr>         <dbl>      <dbl>            <dbl>            <dbl>
#>  1 2021-12-19 ak               NA         NA               NA               NA
#>  2 2021-12-19 al               NA         NA               NA               NA
#>  3 2021-12-19 ar               NA         NA               NA               NA
#>  4 2021-12-19 as               NA         NA               NA               NA
#>  5 2021-12-19 az               NA         NA               NA               NA
#>  6 2021-12-19 ca               NA         NA               NA               NA
#>  7 2021-12-19 co               NA         NA               NA               NA
#>  8 2021-12-19 ct               NA         NA               NA               NA
#>  9 2021-12-19 dc               NA         NA               NA               NA
#> 10 2021-12-19 de               NA         NA               NA               NA
#> # ℹ 1,334 more rows
#> # ℹ 2 more variables: lag_14_death_rate <dbl>, ahead_7_death_rate <dbl>

dshemetov · 2024-04-03T22:36:55Z

Got it, thank you. So in your case, the train data is actually either empty or full of NA rows, so I'm getting an lm error before we even get to layer_quantile_residuals, so there's no unit test to do there. Any other tests to add here or can I just merge this into your PR?

R/layer_residual_quantiles.R

Co-authored-by: Daniel McDonald <[email protected]>

dshemetov requested a review from dajmcdon as a code owner April 2, 2024 17:59

dshemetov force-pushed the ds/fix-290 branch from c8917d6 to 3db3677 Compare April 2, 2024 18:06

dshemetov force-pushed the ds/fix-290 branch from 3db3677 to ebccec5 Compare April 3, 2024 01:16

dshemetov added 4 commits April 3, 2024 10:05

fix: address n_train and ahead mismatch #290

dee4de4

doc: update NEWS

6f3035b

lint: add .lintr config from epiprocess

33be27f

fix: partial match warnings

9e57965

dshemetov force-pushed the ds/fix-290 branch from ebccec5 to 9e57965 Compare April 3, 2024 17:05

dajmcdon approved these changes Apr 3, 2024

View reviewed changes

R/layer_residual_quantiles.R Outdated Show resolved Hide resolved

Update R/layer_residual_quantiles.R

18860ce

Co-authored-by: Daniel McDonald <[email protected]>

dshemetov merged commit d933087 into djm/fix-290 Apr 3, 2024

dshemetov deleted the ds/fix-290 branch April 3, 2024 23:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: more bugs in #290 #304

fix: more bugs in #290 #304

dshemetov commented Apr 2, 2024 •

edited

Loading

dshemetov commented Apr 2, 2024 •

edited

Loading

dshemetov commented Apr 3, 2024

dajmcdon commented Apr 3, 2024

dshemetov commented Apr 3, 2024

fix: more bugs in #290 #304

fix: more bugs in #290 #304

Conversation

dshemetov commented Apr 2, 2024 • edited Loading

Checklist

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

dshemetov commented Apr 2, 2024 • edited Loading

dshemetov commented Apr 3, 2024

dajmcdon commented Apr 3, 2024

dshemetov commented Apr 3, 2024

dshemetov commented Apr 2, 2024 •

edited

Loading

dshemetov commented Apr 2, 2024 •

edited

Loading