bug in `layer_add_forecast_date()` #109

dajmcdon · 2022-07-20T01:26:56Z

The default forecast_date (and target_date) seems wrong.

Using case_death_rate_subset, the forecast date should default to 2021-12-31 (the most recent date available) but instead is 2022-01-28. I think this is because components$keys$time_value gets adjusted in the training data to add the ahead. We actually need the latest time_value in the test data (though if there isn't a lag 0, this wouldn't work either).
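
To make the expected default concrete, here is a minimal base-R sketch. The toy `test_data` frame and the variable names are illustrative assumptions, not objects from the package; only the two dates come from the report above.

```r
# Toy stand-in for the test data passed to predict() (hypothetical values).
test_data <- data.frame(
  geo_value = c("ak", "ca", "ny"),
  time_value = as.Date(c("2021-12-29", "2021-12-30", "2021-12-31"))
)

# The default we would expect: the most recent time_value in the test data.
expected_forecast_date <- max(test_data$time_value)

# The default reported in this issue.
observed_default <- as.Date("2022-01-28")

# Size of the discrepancy, in days.
shift_days <- as.integer(observed_default - expected_forecast_date)
print(expected_forecast_date)
print(shift_days)
```

The reported default sits 28 days past the latest observation, which is consistent with a date that has been shifted forward rather than read from the raw test data.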

It may help to have access to the entire workflow at predict time as suggested in #108 .


rachlobay · 2022-07-20T16:33:25Z

Whoops... Posted in the related #108 by accident, so moving those comments over here.

Based on the ~July 6 discussion about this in the Components conundrum doc, the time_value from components$keys is currently being used for both the forecast_date and target_date defaults. If you want the non-forged test data to be used instead, it still does not look to be available in components, as I think you noted. For reference, I will post the example from that doc below to show what is available in components for the layer_add_forecast_date test (it looks like the molded training data and the forged test data are there). If that is the case, components does not have what we need.

So what is the best way to get the original test data? Has something new been introduced that would make this a piece of cake? Right now, it isn't clear that object, components, the_recipe, or the_fit contains it. And even if we included the_frosting in slather(), would that contain the test data that is only passed to predict()? I am not sure about that, so let me know if it does. Likewise for the_workflow.

We could always revert these to what I had a long time ago and have the user pass the (non-forged) test data into the layer directly as an argument, but I'd have thought there'd be an easier way to do something like this.

rachlobay · 2022-07-20T16:34:42Z

Components conundrum example setup (see test-layer_add_forecast_date.R for this):

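# Assumes these packages are installed; the test file may load them differently.
library(epipredict)
library(recipes)
library(dplyr)

# Training data: three states, observations after 2021-11-01.
jhu <- case_death_rate_subset %>%
  dplyr::filter(time_value > "2021-11-01", geo_value %in% c("ak", "ca", "ny"))
# Predictors are death_rate at lags 0, 7, and 14; the outcome is 7 days ahead.
r <- epi_recipe(jhu) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_naomit(all_predictors()) %>%
  step_naomit(all_outcomes(), skip = TRUE)
wf <- epi_workflow(r, parsnip::linear_reg()) %>% fit(jhu)
# Test data: the most recent 15 days (enough to build the 14-day lag).
latest <- jhu %>%
  dplyr::filter(time_value >= max(time_value) - 14)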

What (I think) we want to extract max(time_value) from:

> latest
An `epi_df` object, with metadata:
* geo_type  = state
* time_type = day
* as_of     = 2022-05-31 12:08:25

# A tibble: 45 × 4
   geo_value time_value case_rate death_rate
 * <chr>     <date>         <dbl>      <dbl>
 1 ak        2021-12-17      23.1      1.19 
 2 ca        2021-12-17      16.9      0.158
 3 ny        2021-12-17      67.1      0.310
 4 ak        2021-12-18      23.1      1.19 
 5 ca        2021-12-18      17.6      0.164
 6 ny        2021-12-18      71.0      0.309
 7 ak        2021-12-19      23.1      1.19 
 8 ca        2021-12-19      19.1      0.165
 9 ny        2021-12-19      84.1      0.319
10 ak        2021-12-20      23.2      1.17 
# … with 35 more rows

What components contains (this pertains to the test of the default forecast_date):

> components
$mold
$mold$predictors
# A tibble: 117 × 3
   lag_0_death_rate lag_7_death_rate lag_14_death_rate
              <dbl>            <dbl>             <dbl>
 1            0.336            1.74              0.395
 2            0.225            0.180             0.201
 3            0.168            0.192             0.171
 4            0.198            1.86              0.415
 5            0.198            0.210             0.186
 6            0.166            0.184             0.176
 7            0.198            1.80              0.376
 8            0.213            0.217             0.189
 9            0.169            0.191             0.177
10            0.593            1.78              0.316
# … with 107 more rows

$mold$outcomes
# A tibble: 117 × 1
   ahead_7_death_rate
                <dbl>
 1              0.474
 2              0.234
 3              0.170
 4              0.553
 5              0.251
 6              0.181
 7              0.553
 8              0.202
 9              0.166
10              0.296
# … with 107 more rows

$mold$blueprint
Recipe blueprint: 
 
# Predictors: 0 
  # Outcomes: 0 
   Intercept: FALSE 
Novel Levels: FALSE 
 Composition: tibble 

$mold$extras
$mold$extras$roles
$mold$extras$roles$time_value
# A tibble: 117 × 1
   time_value
   <date>    
 1 2021-11-16
 2 2021-11-16
 3 2021-11-16
 4 2021-11-17
 5 2021-11-17
 6 2021-11-17
 7 2021-11-18
 8 2021-11-18
 9 2021-11-18
10 2021-11-19
# … with 107 more rows

$mold$extras$roles$geo_value
# A tibble: 117 × 1
   geo_value
   <chr>    
 1 ak       
 2 ca       
 3 ny       
 4 ak       
 5 ca       
 6 ny       
 7 ak       
 8 ca       
 9 ny       
10 ak       
# … with 107 more rows

$mold$extras$roles$raw
# A tibble: 117 × 2
   case_rate death_rate
       <dbl>      <dbl>
 1      53.5      0.336
 2      13.2      0.225
 3      29.1      0.168
 4      53.3      0.198
 5      13.3      0.198
 6      29.8      0.166
 7      63.5      0.198
 8      14.5      0.213
 9      30.9      0.169
10      56.7      0.593
# … with 107 more rows




$forged
$forged$predictors
# A tibble: 108 × 3
   lag_0_death_rate lag_7_death_rate lag_14_death_rate
              <dbl>            <dbl>             <dbl>
 1               NA               NA                NA
 2               NA               NA                NA
 3               NA               NA                NA
 4               NA               NA                NA
 5               NA               NA                NA
 6               NA               NA                NA
 7               NA               NA                NA
 8               NA               NA                NA
 9               NA               NA                NA
10               NA               NA                NA
# … with 98 more rows

$forged$outcomes
NULL

$forged$extras
$forged$extras$roles
$forged$extras$roles$time_value
# A tibble: 108 × 1
   time_value
   <date>    
 1 2021-12-10
 2 2021-12-10
 3 2021-12-10
 4 2021-12-11
 5 2021-12-11
 6 2021-12-11
 7 2021-12-12
 8 2021-12-12
 9 2021-12-12
10 2021-12-13
# … with 98 more rows

$forged$extras$roles$geo_value
# A tibble: 108 × 1
   geo_value
   <chr>    
 1 ak       
 2 ca       
 3 ny       
 4 ak       
 5 ca       
 6 ny       
 7 ak       
 8 ca       
 9 ny       
10 ak       
# … with 98 more rows

$forged$extras$roles$raw
# A tibble: 108 × 2
   case_rate death_rate
       <dbl>      <dbl>
 1        NA         NA
 2        NA         NA
 3        NA         NA
 4        NA         NA
 5        NA         NA
 6        NA         NA
 7        NA         NA
 8        NA         NA
 9        NA         NA
10        NA         NA
# … with 98 more rows




$keys
An `epi_df` object, with metadata:
* geo_type  = state
* time_type = day
* as_of     = 2022-05-31 12:08:25

# A tibble: 108 × 2
   geo_value time_value
 * <chr>     <date>    
 1 ak        2021-12-10
 2 ca        2021-12-10
 3 ny        2021-12-10
 4 ak        2021-12-11
 5 ca        2021-12-11
 6 ny        2021-12-11
 7 ak        2021-12-12
 8 ca        2021-12-12
 9 ny        2021-12-12
10 ak        2021-12-13
# … with 98 more rows

$predictions
An `epi_df` object, with metadata:
* geo_type  = state
* time_type = day
* as_of     = 2022-05-31 12:08:25

# A tibble: 108 × 3
   geo_value time_value .pred
   <chr>     <date>     <dbl>
 1 ak        2021-12-10    NA
 2 ca        2021-12-10    NA
 3 ny        2021-12-10    NA
 4 ak        2021-12-11    NA
 5 ca        2021-12-11    NA
 6 ny        2021-12-11    NA
 7 ak        2021-12-12    NA
 8 ca        2021-12-12    NA
 9 ny        2021-12-12    NA
10 ak        2021-12-13    NA
# … with 98 more rows

dajmcdon · 2022-08-03T16:30:23Z

It seems like the easiest thing at the moment is to adjust the slather() signature to also receive the test data (for all slather.layer_name() methods). Do you agree?

dajmcdon · 2022-08-03T16:33:08Z

I think we do that plus the entire workflow (as in #108).

One alternative (or addition) would be to add a step that detects the forecast date at training time. And then, given the workflow, we could look for that.
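
As a rough illustration of that alternative, the sketch below records the latest time_value when the training data is processed and looks it up later from a workflow-like object. Everything here is hypothetical: `record_forecast_date()`, `lookup_forecast_date()`, and the plain-list `workflow_sketch` are stand-ins written in base R, not epipredict steps or workflow accessors.

```r
# Hypothetical "step": remember the latest observed date at training time.
record_forecast_date <- function(training_data) {
  list(forecast_date = max(training_data$time_value))
}

# Hypothetical lookup: a layer given the workflow could read the stored date.
lookup_forecast_date <- function(workflow_sketch) {
  workflow_sketch$pre$forecast_date
}

# Toy training data: 60 consecutive days ending on 2021-12-31.
training <- data.frame(time_value = as.Date("2021-11-02") + 0:59)

# A plain list standing in for a fitted workflow (an assumption).
workflow_sketch <- list(pre = record_forecast_date(training))

print(lookup_forecast_date(workflow_sketch))
```

Note that this only matches the desired default when the training and test data end on the same date, which is one reason passing the test data itself may still be needed.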

rachlobay · 2022-08-03T22:49:16Z

OK. I agree with the first suggestion to adjust the signature of slather() to also get the test data. If you think that having a step that detects the forecast date at training time will be very useful beyond this function, then we could add that, but otherwise let's go for the simplest solution.
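
A minimal sketch of what that signature change could look like, using a toy S3 generic. The generic is deliberately named `slather_sketch()` because this is not the package implementation; the argument names `workflow` and `new_data`, and the structure of `components`, are assumptions made for illustration.

```r
# Hypothetical generic that also receives the workflow and the raw test data.
slather_sketch <- function(object, components, workflow, new_data, ...) {
  UseMethod("slather_sketch")
}

# Hypothetical method: default the forecast date to the latest test-data date.
slather_sketch.layer_add_forecast_date <- function(object, components,
                                                   workflow, new_data, ...) {
  forecast_date <- object$forecast_date
  if (is.null(forecast_date)) {
    # Read the date from the non-forged test data, not from components$keys.
    forecast_date <- max(new_data$time_value)
  }
  components$predictions$forecast_date <- as.Date(forecast_date)
  components
}

# Toy inputs (all values are made up).
layer <- structure(list(forecast_date = NULL), class = "layer_add_forecast_date")
new_data <- data.frame(
  geo_value = c("ak", "ca"),
  time_value = as.Date(c("2021-12-30", "2021-12-31"))
)
components <- list(
  predictions = data.frame(geo_value = c("ak", "ca"), .pred = c(0.1, 0.2))
)

out <- slather_sketch(layer, components, workflow = NULL, new_data = new_data)
print(out$predictions)
```

With the test data in hand, the layer no longer depends on whatever dates survive molding or forging; a user-supplied forecast_date still takes precedence over the default.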

dajmcdon added the bug (Something isn't working) and P0 (do this immediately) labels Jul 20, 2022

rachlobay mentioned this issue Aug 16, 2022

mapply() error when trying to use percent_cli as a covariate in arx_forecaster() #128

Closed

rachlobay mentioned this issue Jul 17, 2023

Slather access workflow and test data & update layer_add_target_date() and layer_add_forecast_date() accordingly #220

Merged

rachlobay linked a pull request Jul 17, 2023 that will close this issue

Slather access workflow and test data & update layer_add_target_date() and layer_add_forecast_date() accordingly #220

Merged

rachlobay closed this as completed in #220 Aug 12, 2023
