Add the 7dav we talked about along with the std #76
Merged
20 commits
e4b3d45 feat: rolling_mean/sd for a new forecaster (dsweber2)
4a78810 consistent name, only smooth non-smoothed, init forecaster (dsweber2)
1edc6f6 smoothed_scaled passes all forecaster tests (dsweber2)
e63be89 smoothed_scaled data tests (dsweber2)
ea335c9 docfix, points no longer oversized (dsweber2)
8df5f05 docs only tell one thing at a time (dsweber2)
36608f6 Update R/data_transforms.R (dsweber2)
0eb61e3 fix: warnings are one at a time, apparently (dsweber2)
c50249b switch to epi_slide, add logan's suggestions, NA tests (dsweber2)
0b16c81 test: fix updated assumption (dsweber2)
514219e test: make sure keep is off by default (dsweber2)
b976103 docs: slightly better (dsweber2)
c4c7430 continuing to clarify update_predictors (dsweber2)
d6d4cdd fix before behavior: mean tests simplified (dsweber2)
d542be2 various suggestions from logan, before=n_points-1 (dsweber2)
16dd527 fix tests (sd lag should only be 0) (dsweber2)
4b34428 include smoothed_scaling in the targets (dsweber2)
af3ff95 perform_sanity_checks -> sanitize_args_predictors_trainer (dsweber2)
ebb9db3 zeallot (%<-%) needs all args (dsweber2)
15d35d7 fix %<-% usage (dsweber2)
@@ -0,0 +1,114 @@
# various reusable transforms to apply before handing to epipredict

#' extract the non-key, non-smoothed columns from epi_data
#' @keywords internal
#' @param epi_data the `epi_df`
#' @param cols vector of column names to use. If `NULL`, fill with all non-key columns
get_trainable_names <- function(epi_data, cols) {
  if (is.null(cols)) {
    cols <- get_nonkey_names(epi_data)
    # exclude anything with the same naming schema as the rolling average/sd created below
    cols <- cols[!grepl("_\\w{1,2}\\d+", cols)]
  }
  return(cols)
}

#' get the names of an epi_df's columns that aren't keys
#' @description
#' like `names()`, but excludes the key columns (`geo_value`, `time_value`, and any `other_keys`)
#' @param epi_data the epi_df
get_nonkey_names <- function(epi_data) {
  cols <- names(epi_data)
  cols <- cols[!(cols %in% c("geo_value", "time_value", attr(epi_data, "metadata")$other_keys))]
  return(cols)
}


#' update the predictors to only contain the smoothed/sd versions of cols
#' @description
#' modifies the list of predictors so that any which have been modified have the
#' modified versions included, and not the original. Should only be applied
#' after both rolling_mean and rolling_sd.
#' @param epi_data the epi_df, only included to get the non-key column names
#' @param cols_modified the list of columns which have been modified. If this is `NULL`, that means we were modifying every column.
#' @param predictors the initial set of predictors; any unmodified are kept, any modified are replaced with the modified versions (e.g. "a" becoming "a_m17").
#' @importFrom purrr map map_chr reduce
#' @return returns an updated list of predictors, with modified columns replaced and non-modified columns left intact.
#' @export
update_predictors <- function(epi_data, cols_modified, predictors) {
  if (!is.null(cols_modified)) {
    # if cols_modified isn't null, make sure we include predictors that weren't modified
    unchanged_predictors <- map(cols_modified, ~ !grepl(.x, predictors, fixed = TRUE)) %>% reduce(`&`)
    unchanged_predictors <- predictors[unchanged_predictors]
  } else {
    # if it's null, we've modified every predictor
    unchanged_predictors <- character(0L)
  }
  # all the non-key names
  col_names <- get_nonkey_names(epi_data)
  is_present <- function(original_predictor) {
    grepl(original_predictor, col_names) & !(col_names %in% predictors)
  }
  is_modified <- map(predictors, is_present) %>% reduce(`|`)
  new_predictors <- col_names[is_modified]
  return(c(unchanged_predictors, new_predictors))
}

#' get a rolling average for the named columns
#' @description
#' add column(s) that are the rolling means of the specified columns, as
#' implemented by slider. Defaults to the previous 7 days.
#' Currently only group_by's on the geo_value. Should probably extend to more
#' keys if you have them
#' @param epi_data the dataset
#' @param width the number of days (or examples, the sliding isn't time-aware) to use
#' @param cols_to_mean the non-key columns to take the mean over. `NULL` means all
#' @importFrom slider slide_dbl
#' @importFrom epiprocess epi_slide
#' @export
rolling_mean <- function(epi_data, width = 7L, cols_to_mean = NULL) {
  cols_to_mean <- get_trainable_names(epi_data, cols_to_mean)
  epi_data %<>% group_by(geo_value)
  for (col in cols_to_mean) {
    mean_name <- paste0(col, "_m", width)
    epi_data %<>% epi_slide(~ mean(.x[[col]], na.rm = TRUE), before = width - 1L, new_col_name = mean_name)
  }
  epi_data %<>% ungroup()
  return(epi_data)
}

#' get a rolling standard deviation for the named columns
#' @description
#' A rolling standard deviation, based off of a rolling mean. First it
#' calculates a rolling mean with width `mean_width`, then takes the square root
#' of the squared difference between that and the actual value, averaged over `sd_width`.
#' @param epi_data the dataset
#' @param sd_width the number of days (or examples, the sliding isn't
#' time-aware) to use for the standard deviation calculation
#' @param mean_width like `sd_width`, but it governs the mean. Should be less
#' than the `sd_width`, and if `NULL` (the default) it is half of `sd_width`
#' (so 14 in the complete default case)
#' @param cols_to_sd the non-key columns to take the sd over. `NULL` means all
#' @param keep_mean bool, if `TRUE`, it retains the mean column
#' @importFrom epiprocess epi_slide
#' @export
rolling_sd <- function(epi_data, sd_width = 28L, mean_width = NULL, cols_to_sd = NULL, keep_mean = FALSE) {
  if (is.null(mean_width)) {
    mean_width <- as.integer(ceiling(sd_width / 2))
  }
  cols_to_sd <- get_trainable_names(epi_data, cols_to_sd)
  result <- epi_data
  for (col in cols_to_sd) {
    result %<>% group_by(geo_value)
    mean_name <- paste0(col, "_m", mean_width)
    sd_name <- paste0(col, "_sd", sd_width)
    result %<>% epi_slide(~ mean(.x[[col]], na.rm = TRUE), before = mean_width - 1L, new_col_name = mean_name)
    result %<>% epi_slide(~ sqrt(mean((.x[[mean_name]] - .x[[col]])^2, na.rm = TRUE)), before = sd_width - 1L, new_col_name = sd_name)
    if (!keep_mean) {
      # TODO make sure the extra info sticks around
      result %<>% select(-{{ mean_name }})
    }
    result %<>% dplyr_reconstruct(epi_data)
  }
  return(ungroup(result))
}
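For review purposes, the arithmetic behind `rolling_mean` and `rolling_sd` can be sketched on a plain numeric vector in base R, without `epiprocess`. This is only an illustration under stated assumptions: the helpers `trailing_mean` and `trailing_rms_dev` are hypothetical names that do not appear in the PR, the window is counted in rows rather than dates, and early positions simply use whatever shorter window is available. The last lines show which column names the regular expression in `get_trainable_names` would filter out.

```r
# Hypothetical helper (not in the PR): mean of the current value and the
# `width - 1` values before it, mirroring `before = width - 1L`.
# Assumption: early positions use the shorter window that is available.
trailing_mean <- function(x, width) {
  vapply(seq_along(x), function(i) {
    window <- x[max(1L, i - width + 1L):i]
    mean(window, na.rm = TRUE)
  }, numeric(1))
}

# Hypothetical helper (not in the PR): square root of the squared deviation
# of `x` from its trailing mean, averaged over a trailing window of
# `sd_width` values.
trailing_rms_dev <- function(x, sd_width, mean_width = ceiling(sd_width / 2)) {
  sq_dev <- (trailing_mean(x, mean_width) - x)^2
  sqrt(trailing_mean(sq_dev, sd_width))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
m3 <- trailing_mean(x, 3L)               # 1, 1.5, 2, 3, 4, 5, 6, 7
s4 <- trailing_rms_dev(x, sd_width = 4L) # mean_width defaults to 2

# Names following the `<col>_m<width>` / `<col>_sd<width>` scheme are excluded
# by the same pattern that get_trainable_names uses.
cols <- c("cases", "cases_m7", "cases_sd28", "deaths")
trainable <- cols[!grepl("_\\w{1,2}\\d+", cols)] # "cases", "deaths"
```

On a linear series the deviation from a two-point trailing mean is a constant 0.5 once the window is full, so `s4` settles at 0.5 from the fifth element onward, which makes this a convenient sanity check for the window arithmetic.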