Created a preprocessing step that limits the size of the training window #53


Merged
merged 37 commits into from
Aug 29, 2022
Changes from 4 commits
d558289
Added code to make training_window, roxygen comments and some tests
Jun 17, 2022
6bf446c
Removed id from user facing fun
Jun 22, 2022
975afdb
Added ID back to how it was before (encountered fatal error)
Jun 22, 2022
ae759f7
,
Jun 22, 2022
1dd28a3
Made some changes as requested
Jun 24, 2022
18d8dd5
Added ... to print as per warning
Jun 24, 2022
0b3ee3b
Updated doc
Jun 24, 2022
dad1580
dplyr::all_of()
Jun 24, 2022
20e5961
Added another test
Jun 25, 2022
6114f3b
Updated ex. that includes multiple keys
Jun 25, 2022
126b81b
Merge branch 'frosting' into 36-step_training_window
rachlobay Jun 25, 2022
431d9e6
Added space
Jun 25, 2022
cca063d
rlang::enquos
Jun 25, 2022
8fc9017
testing
Jun 25, 2022
d6273cb
removed tibble::as_tibble()
Jun 26, 2022
c03370c
Pulled all changes from frosting and tried to resolve conflict with n…
rachlobay Aug 12, 2022
ed12a04
Trying epi_juice soln to decay to tibble problem
rachlobay Aug 12, 2022
e4d07f1
Round 2 to try to get epi_juice to work
rachlobay Aug 12, 2022
0055837
<<- Make assign outside of fun
rachlobay Aug 12, 2022
1a74021
utils::
rachlobay Aug 12, 2022
d4d132e
Delete .gitignore 2
rachlobay Aug 12, 2022
38e8df9
Add sliding vignette from comp
rachlobay Aug 15, 2022
21dbb7f
Delete here.
rachlobay Aug 15, 2022
0fa65ef
Added bake.epi_recipe and removed related code in zzz
rachlobay Aug 16, 2022
f42e04e
Added formats as in original bake
rachlobay Aug 16, 2022
93fec95
Remove hopefully unnecessary call
rachlobay Aug 16, 2022
8e81e7c
Added roxygen doc and devtools::document()
rachlobay Aug 16, 2022
4581115
Add is_empty to namespace
rachlobay Aug 16, 2022
207733c
Added some necessary funs from recipes
rachlobay Aug 16, 2022
57d10ac
Changed documentation
rachlobay Aug 16, 2022
2044b62
Merge branch 'frosting' into 36-step_training_window
rachlobay Aug 16, 2022
e1ddfee
Added @param
rachlobay Aug 16, 2022
5ffadca
Merge branch '36-step_training_window' of https://github.com/cmu-delp…
rachlobay Aug 16, 2022
d4cd12a
Removed one abort message
rachlobay Aug 16, 2022
243e45e
recipes:::strings2factors
rachlobay Aug 16, 2022
3f81cb0
Remove some unnecessary comments
rachlobay Aug 16, 2022
57fcae7
See if now works after epiprocess genlasso switch
rachlobay Aug 17, 2022
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ S3method(as_tibble,epi_df)
S3method(augment,epi_workflow)
S3method(bake,step_epi_ahead)
S3method(bake,step_epi_lag)
S3method(bake,step_training_window)
S3method(epi_keys,default)
S3method(epi_keys,epi_df)
S3method(epi_keys,recipe)
@@ -16,8 +17,10 @@ S3method(predict,epi_workflow)
S3method(prep,epi_recipe)
S3method(prep,step_epi_ahead)
S3method(prep,step_epi_lag)
S3method(prep,step_training_window)
S3method(print,step_epi_ahead)
S3method(print,step_epi_lag)
S3method(print,step_training_window)
S3method(slather,layer_naomit)
S3method(slather,layer_predict)
export("%>%")
@@ -47,6 +50,7 @@ export(smooth_arx_args_list)
export(smooth_arx_forecaster)
export(step_epi_ahead)
export(step_epi_lag)
export(step_training_window)
import(recipes)
importFrom(generics,augment)
importFrom(generics,fit)
109 changes: 109 additions & 0 deletions R/training_window.R
@@ -0,0 +1,109 @@
#' Limits the size of the training window to the most recent observations
#'
#' `step_training_window` creates a *specification* of a recipe step that
#' limits the size of the training window to the `nrec` most recent
#' observations in `time_value` per location from `geo_value`.
#'
#' @param recipe A recipe object. The step will be added to the
#' sequence of operations for this recipe.
#' @param ... One or more selector functions to choose variables
#' for this step. See [selections()] for more details.
#' @param role For model terms created by this step, what analysis role should
#' they be assigned?
#' @param trained A logical to indicate if the quantities for
#' preprocessing have been estimated.
#' @param nrec An integer value that represents the number of most recent
#' observations that are to be kept in the training window per location.
#' The default value is 50.
#' @param skip A logical. Should the step be skipped when the
#' recipe is baked by [bake()]? While all operations are baked
#' when [prep()] is run, some operations may not be able to be
#' conducted on new data (e.g. processing the outcome variable(s)).
#' Care should be taken when using `skip = TRUE` as it may affect
#' the computations for subsequent operations.
#' @param id A character string that is unique to this step to identify it.
#' @template step-return
#'
#'
#' @export
#'
#' @examples
#' tib <- tibble::tibble(
#'   x = 1:10, y = 1:10,
#'   time_value = rep(seq(as.Date("2020-01-01"), by = 1,
#'                        length.out = 5), times = 2),
#'   geo_value = rep(c("ca", "hi"), each = 5)
#' ) %>% epiprocess::as_epi_df()
#'
#' library(recipes)
#' epi_recipe(y ~ x, data = tib) %>%
#'   step_training_window(nrec = 3) %>%
#'   prep(tib) %>%
#'   bake(new_data = NULL)
step_training_window <-
  function(recipe,
           ...,
           role = NA,
           trained = FALSE,
           nrec = 50,
           skip = TRUE,
           id = rand_id("training_window")) {

    add_step(
      recipe,
      step_training_window_new(
        role = role,
        trained = trained,
        nrec = nrec,
        skip = skip,
        id = id
      )
    )
  }

# Internal constructor. Callers always pass every argument, so the unused
# `terms` argument and the self-referential default `id = id` (which would
# error if it were ever evaluated) are dropped.
step_training_window_new <-
  function(role, trained, nrec, skip, id) {
    step(
      subclass = "training_window",
      role = role,
      trained = trained,
      nrec = nrec,
      skip = skip,
      id = id
    )
  }

#' @export
prep.step_training_window <- function(x, training, info = NULL, ...) {

  step_training_window_new(
    role = x$role,
    trained = TRUE,
    nrec = x$nrec,
    skip = x$skip,
    id = x$id
  )
}

#' @export
bake.step_training_window <- function(object, new_data, ...) {
  if (!all(object$nrec == as.integer(object$nrec))) {
    rlang::abort("step_training_window requires 'nrec' to be integer valued.")
  }

  new_data %>% dplyr::group_by(geo_value) %>%
    dplyr::arrange(time_value) %>%
    dplyr::slice_tail(n = object$nrec) %>%
    dplyr::ungroup()
}

#' @export
print.step_training_window <-
  function(x, width = max(20, options()$width - 30), ...) {
    title <- "Number of most recent observations per location used in training window "
    nrec <- x$nrec
    tr_obj <- format_selectors(enquos(nrec), width)
    recipes::print_step(tr_obj, enquos(nrec),
                        x$trained, title, width)
    invisible(x)
  }
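
For anyone skimming the diff, the windowing done in `bake.step_training_window` can be tried on a plain tibble without the package. This is only a minimal sketch, not the package code: `toy` and `windowed` are hypothetical names, the rows are deliberately shuffled to show that the sort matters, and `.by_group = TRUE` is an addition made here so that the ordering within each location is explicit.

```r
library(dplyr)

# Hypothetical toy data: two locations, five consecutive days each.
toy <- tibble(
  geo_value = rep(c("ca", "hi"), each = 5),
  time_value = rep(seq(as.Date("2020-01-01"), by = 1, length.out = 5),
                   times = 2),
  x = 1:10
)

# Reverse the row order so the most recent rows are not already last.
toy <- toy[rev(seq_len(nrow(toy))), ]

nrec <- 3

windowed <- toy %>%
  group_by(geo_value) %>%
  arrange(time_value, .by_group = TRUE) %>%  # oldest to newest per location
  slice_tail(n = nrec) %>%                   # keep the last nrec rows per group
  ungroup()

print(windowed)
```

Each location should keep only its `nrec` latest dates, with rows returned location by location.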
2 changes: 1 addition & 1 deletion man/epi_workflow.Rd


65 changes: 65 additions & 0 deletions man/step_training_window.Rd


49 changes: 49 additions & 0 deletions tests/testthat/test-step_training_window.R
@@ -0,0 +1,49 @@
tib <- tibble::tibble(
  x = 1:200, y = 1:200,
  time_value = rep(seq(as.Date("2020-01-01"), by = 1,
                       length.out = 100), times = 2),
  geo_value = rep(c("ca", "hi"), each = 100)
) %>% epiprocess::as_epi_df()

test_that("step_training_window works with default nrec", {
  p <- epi_recipe(y ~ x, data = tib) %>%
    step_training_window() %>%
    recipes::prep(tib) %>%
    recipes::bake(new_data = NULL)

  expect_equal(nrow(p), 100L)
  expect_equal(ncol(p), 4L)
  expect_s3_class(p, "epi_df")
  expect_named(p, c("x", "y", "time_value", "geo_value"))
  expect_equal(p$time_value, rep(seq(as.Date("2020-02-20"), as.Date("2020-04-09"), by = 1), times = 2))
  expect_equal(p$geo_value, rep(c("ca", "hi"), each = 50))
})

test_that("step_training_window works with specified nrec", {
  p2 <- epi_recipe(y ~ x, data = tib) %>%
    step_training_window(nrec = 5) %>%
    recipes::prep(tib) %>%
    recipes::bake(new_data = NULL)

  expect_equal(nrow(p2), 10L)
  expect_equal(ncol(p2), 4L)
  expect_s3_class(p2, "epi_df")
  expect_named(p2, c("x", "y", "time_value", "geo_value"))
  expect_equal(p2$time_value, rep(seq(as.Date("2020-04-05"), as.Date("2020-04-09"), by = 1), times = 2))
  expect_equal(p2$geo_value, rep(c("ca", "hi"), each = 5))
})

test_that("step_training_window does not proceed with specified new_data", {
  # The step defaults to skip = TRUE, so baking explicitly supplied new_data
  # should return that data unchanged (all 10 rows, not the last 3).
  p3 <- epi_recipe(y ~ x, data = tib) %>%
    step_training_window(nrec = 3) %>%
    recipes::prep(tib) %>%
    recipes::bake(new_data = tib[1:10,])

  expect_equal(nrow(p3), 10L)
  expect_equal(ncol(p3), 4L)
  expect_s3_class(p3, "epi_df")
  expect_named(p3, c("x", "y", "time_value", "geo_value"))
  expect_equal(p3$time_value, rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), by = 1), times = 1))
  expect_equal(p3$geo_value, rep("ca", times = 10))
})
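
The last test relies on how `skip = TRUE` behaves in recipes: the step runs when the recipe is prepped, but is bypassed when `bake()` receives explicit `new_data`. A rough illustration using only the recipes package, with `step_filter()` (which also defaults to `skip = TRUE`) standing in for `step_training_window()`; the names `dat`, `rec`, `train_out` and `new_out` are hypothetical:

```r
library(recipes)

dat <- data.frame(x = 1:10, y = 1:10)

rec <- recipe(y ~ x, data = dat) %>%
  step_filter(x > 5) %>%   # row-removing step; skip = TRUE by default
  prep(training = dat)

# new_data = NULL returns the retained, already-processed training set,
# so the filter has been applied.
train_out <- bake(rec, new_data = NULL)

# Explicit new_data bypasses skipped steps, so no rows are dropped.
new_out <- bake(rec, new_data = dat)

c(training = nrow(train_out), new_data = nrow(new_out))
```

This is the same asymmetry the tests above exercise: the window is applied to the training data, while data passed in at prediction time passes through untouched.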