Updated epi_slide to use before and after and added checks #188

Merged
merged 43 commits into from
Nov 13, 2022
Changes from 35 commits
Commits
bb39209
Some cleanup of slide; still incomplete.
kenmawer Jul 26, 2022
5984d8d
Still needs changes as before and after numbers are wrong.
kenmawer Jul 26, 2022
b68af0b
Changed bad formatting.
kenmawer Jul 26, 2022
b3229f2
Still needs refactoring.
kenmawer Jul 27, 2022
d18d98c
Redocumented with changes; still needs changes.
kenmawer Jul 27, 2022
121f9d2
Bad changes that break things.
kenmawer Jul 29, 2022
35811f1
Seems like merge is broken.
kenmawer Jul 29, 2022
37b3815
Merge branch 'main' of https://github.com/cmu-delphi/epiprocess into …
kenmawer Jul 29, 2022
f6e8795
Merge branch 'km-slide-n-replace' of https://github.com/dajmcdon/epip…
kenmawer Jul 29, 2022
ee10963
Seems broken beyond repair.
kenmawer Jul 29, 2022
05d84ca
Fixed tests.
kenmawer Jul 29, 2022
846b6ca
Fixed improper use of n.
kenmawer Jul 29, 2022
d55e6b8
This finally runs without errors.
kenmawer Jul 29, 2022
1158c8a
Note that epix_slide still hasn't been updated, and some epi_slide do…
kenmawer Jul 29, 2022
bbf5d6b
Need to ensure tests pass.
kenmawer Aug 5, 2022
b55d411
This shouldn't be here.
kenmawer Aug 5, 2022
b22ace3
Removed repetitive code and added more tests.
kenmawer Aug 6, 2022
feea2f4
Merge branch 'main' into km-slide-n-replace2.1
kenmawer Aug 8, 2022
1038e15
Ran document after updating to epidatr.
kenmawer Aug 9, 2022
6e2b207
Addressed first two comments.
kenmawer Aug 9, 2022
77b5bb9
Replaced `n` in details.
kenmawer Aug 9, 2022
db99a67
Updated some poorly typed documentation and an improperly refactored …
kenmawer Aug 9, 2022
0456aff
Cleared unclear documentation and removed redundancy with slide's code.
kenmawer Aug 10, 2022
950ee8c
Added a test for blank `after`.
kenmawer Aug 10, 2022
d43cede
Refactored edf with grouped.
kenmawer Aug 10, 2022
039f33f
More fixes.
kenmawer Aug 10, 2022
93738aa
Updated `align`.
kenmawer Aug 10, 2022
8c601f8
Fixed inconsistency with test formatting.
kenmawer Aug 10, 2022
cfe2b55
Updated compactify on a vignette, added two tests for NA and put a te…
kenmawer Aug 10, 2022
26836c4
This should not be here.
kenmawer Aug 15, 2022
8ec50dd
Added example of centre alignment.
kenmawer Aug 15, 2022
ca5c4ee
I forgot to document.
kenmawer Aug 15, 2022
ff6b0c1
Made `n` more descriptive.
kenmawer Aug 16, 2022
94aa234
Updated documentation.
kenmawer Aug 16, 2022
88eae27
Fixed up mixup with alignments.
kenmawer Aug 17, 2022
2f88b85
Replaced "rolling" with "running".
kenmawer Aug 17, 2022
7995dfe
Pulled changes to take out conflicts on .Rd.
kenmawer Aug 18, 2022
b0b2450
Implemented first point.
kenmawer Aug 19, 2022
5cd8ea9
IDK what's going on with the warning message printing...
kenmawer Aug 19, 2022
9f5ee8c
Require >=1 of `before`,`after`; ensure `time_step` receives integer
lcbrooks Aug 23, 2022
0fec3ae
Format `epi_slide` roxygen examples
lcbrooks Aug 23, 2022
d9682da
Fix some outdated docs, refine wording on others
lcbrooks Aug 23, 2022
0d3ea1b
Fix broken reference in roxygen docs
lcbrooks Aug 23, 2022
2 changes: 1 addition & 1 deletion R/growth_rate.R
@@ -73,7 +73,7 @@
#' implicitly defined by the `x` variable; for example, if `x` is a vector of
#' `Date` objects, `h = 7`, and the reference point is January 7, then the
#' sliding window contains all data in between January 1 and 14 (matching the
-#' behavior of `epi_slide()` with `n = 2 * h` and `align = "center"`).
+#' behavior of `epi_slide()` with `before = h - 1` and `after = h`).
#'
#' @section Additional Arguments:
#' For the global methods, "smooth_spline" and "trend_filter", additional
Expand Down
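A quick check of the window arithmetic in the updated doc line, as a minimal base-R sketch. The variable names (`ref`, `h`, `window_dates`) are illustrative and do not come from the package.

```r
# Reference point and bandwidth from the doc example.
ref <- as.Date("2022-01-07")
h <- 7

# Window endpoints implied by before = h - 1 and after = h.
before <- h - 1
after <- h
window_dates <- seq(ref - before, ref + after, by = "day")

range(window_dates)   # 2022-01-01 to 2022-01-14
length(window_dates)  # 14, i.e. 2 * h time steps
```

The window covers `before + after + 1` time steps because the reference date itself is included.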
9 changes: 7 additions & 2 deletions R/outliers.R
@@ -128,6 +128,11 @@ detect_outlr = function(x = seq_along(y), y,
#' `y`).
#' @param y Signal values.
#' @param n Number of time steps to use in the rolling window. Default is 21.
+#' The window is centrally aligned. When `n` is odd, the rolling
+#' window spans `(n-1)/2` `time_value`s before the reference point
+#' through `(n-1)/2` `time_value`s after it. When `n` is even, the
+#' window spans `n/2-1` `time_value`s before through `n/2`
+#' `time_value`s after.
#' @param log_transform Should a log transform be applied before running outlier
#' detection? Default is `FALSE`. If `TRUE`, and zeros are present, then the
#' log transform will be padded by 1.
@@ -179,7 +184,7 @@ detect_outlr_rm = function(x = seq_along(y), y, n = 21,

# Calculate lower and upper thresholds and replacement value
z = z %>%
-epi_slide(fitted = median(y), n = n, align = "center") %>%
+epi_slide(fitted = median(y), before = floor((n-1)/2), after = ceiling((n-1)/2)) %>%
dplyr::mutate(resid = y - fitted) %>%
roll_iqr(n = n,
detection_multiplier = detection_multiplier,
@@ -332,7 +337,7 @@ roll_iqr = function(z, n, detection_multiplier, min_radius,
if (typeof(z$y) == "integer") as_type = as.integer
else as_type = as.numeric

-epi_slide(z, roll_iqr = stats::IQR(resid), n = n, align = "center") %>%
+epi_slide(z, roll_iqr = stats::IQR(resid), before = floor((n-1)/2), after = ceiling((n-1)/2)) %>%
dplyr::mutate(
lower = pmax(min_lower,
fitted - pmax(min_radius, detection_multiplier * roll_iqr)),
Expand Down
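The `floor`/`ceiling` split used in both call sites above can be sanity-checked in isolation. This is a minimal sketch; `centered_window()` is a hypothetical helper written for illustration and is not part of epiprocess.

```r
# Hypothetical helper (not part of epiprocess): split a centered window of
# n time steps into the counts before and after the reference point.
centered_window <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1, n == floor(n))
  c(before = floor((n - 1) / 2), after = ceiling((n - 1) / 2))
}

centered_window(21)  # before = 10, after = 10
centered_window(14)  # before = 6,  after = 7
```

In both cases `before + after + 1` equals `n`, and for even `n` the extra time step falls after the reference point, matching the behavior the old `align = "center"` documentation described.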
144 changes: 87 additions & 57 deletions R/slide.R
@@ -6,7 +6,8 @@
#'
#' @param x The `epi_df` object under consideration.
#' @param f Function or formula to slide over variables in `x`. To "slide" means
-#' to apply a function or formula over a running window of `n` time steps
+#' to apply a function or formula over a running window of `before`
+#' and `after` time steps
#' (where one time step is typically one day or one week; see details for more
#' explanation). If a function, `f` should take `x`, an `epi_df` with the same
#' names as the non-grouping columns, followed by `g` to refer to the one row
@@ -19,24 +20,32 @@
#' @param ... Additional arguments to pass to the function or formula specified
#' via `f`. Alternatively, if `f` is missing, then the current argument is
#' interpreted as an expression for tidy evaluation. See details.
-#' @param n Number of time steps to use in the running window. For example, if
-#' `n = 7`, one time step is one day, and the alignment is "right", then to
-#' produce a value on January 7 we apply the given function or formula to data
-#' in between January 1 and 7.
+#' @param before A nonnegative integer specifying the number of time steps
+#' before the `ref_time_value` to use in the running window.
+#' This must be a vector of length 1.
+#' Set to 0 for a left-aligned/leading sliding window, meaning that no
+#' `time_value` before the `ref_time_value` will be used for the sliding calculation.
+#' A `before` value must be specified unless `after` is given a nonzero
+#' value. In that case, `before` is assumed to be 0, on the assumption that
+#' the user wants a left-aligned/leading sliding window.
+#' However, this usage is discouraged and will thus produce a warning.
+#' For example, if `before = 3`, and one time step is one day, then to produce
+#' a value on January 7, we apply the given function or formula to data on
+#' January 4 and later (with the latest date dependent on `after`).
+#' @param after A nonnegative integer specifying the number of time steps
+#' after the `ref_time_value` to use in the running window. This must be a
+#' vector of length 1. The default value for this is 0. Set to 0 for a
+#' right-aligned/trailing sliding window, meaning that no
+#' `time_value` after the `ref_time_value` will be used for the sliding calculation.
+#' For a centrally aligned window, set `before` and `after` to the same
+#' value.
+#' For example, if `after = 3`, and one time step is one day, then to produce
+#' a value on January 7, we apply the given function or formula to data on
+#' January 10 and earlier (with the earliest date dependent on `before`).
#' @param ref_time_values Time values for sliding computations, meaning, each
#' element of this vector serves as the reference time point for one sliding
#' window. If missing, then this will be set to all unique time values in the
#' underlying data table, by default.
-#' @param align One of "right", "center", or "left", indicating the alignment of
-#' the sliding window relative to the reference time point. If the alignment
-#' is "center" and `n` is even, then one more time point will be used after
-#' the reference time point than before. Default is "right".
-#' @param before Positive integer less than `n`, specifying the number of time
-#' points to use in the sliding window strictly before the reference time
-#' point. For example, setting `before = n-1` would be the same as setting
-#' `align = "right"`. The `before` argument allows for more flexible
-#' specification of alignment than the `align` parameter, and if specified,
-#' overrides `align`.
#' @param time_step Optional function used to define the meaning of one time
#' step, which if specified, overrides the default choice based on the
#' `time_value` column. This function must take a positive integer and return
@@ -60,27 +69,34 @@
#' according to the `new_col_name` argument.
#'
#' @details To "slide" means to apply a function or formula over a running
-#' window of `n` time steps, where the unit (the meaning of one time step) is
+#' window of time steps, where the window is anchored at a reference time and
+#' its left and right endpoints are given by the `before` and `after` arguments.
+#' The unit (the meaning of one time step) is
#' implicitly defined by the way the `time_value` column treats addition and
#' subtraction; for example, if the time values are coded as `Date` objects,
#' then one time step is one day, since `as.Date("2022-01-01") + 1` equals
#' `as.Date("2022-01-02")`. Alternatively, the time step can be set explicitly
#' using the `time_step` argument (which if specified would override the
-#' default choice based on `time_value` column). If less than `n` time steps
-#' are available at any given reference time value, then `epi_slide()` still
+#' default choice based on `time_value` column). If there are not enough time
+#' steps available to complete the window at any given reference time, then
+#' `epi_slide()` still
#' attempts to perform the computation anyway (it does not require a complete
#' window). The issue of what to do with partial computations (those run on
#' incomplete windows) is therefore left up to the user, either through the
-#' specified function or formula `f`, or through post-processing.
+#' specified function or formula `f`, or through post-processing. For a
+#' centrally-aligned window of `n` `time_value`s, set
+#' `before = (n-1)/2` and `after = (n-1)/2` when `n` is odd,
+#' and `before = n/2-1` and `after = n/2` when
+#' `n` is even.
#'
#' If `f` is missing, then an expression for tidy evaluation can be specified,
#' for example, as in:
#' ```
-#' epi_slide(x, cases_7dav = mean(cases), n = 7)
+#' epi_slide(x, cases_7dav = mean(cases), before = 6)
#' ```
#' which would be equivalent to:
#' ```
-#' epi_slide(x, function(x, ...) mean(x$cases), n = 7,
+#' epi_slide(x, function(x, ...) mean(x$cases), before = 6,
#' new_col_name = "cases_7dav")
#' ```
#' Thus, to be clear, when the computation is specified via an expression for
@@ -95,29 +111,42 @@
#' # slide a 7-day trailing average formula on cases
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
-#' epi_slide(cases_7dav = mean(cases), n = 7,
-#' align = "right") %>%
+#' epi_slide(cases_7dav = mean(cases), before = 6) %>%
#' # rmv a nonessential var. to ensure new col is printed
#' dplyr::select(-death_rate_7d_av)
#'
-#' # slide a left-aligned 7-day average
+#' # slide a 7-day leading average
#' jhu_csse_daily_subset %>%
#' group_by(geo_value) %>%
-#' epi_slide(cases_7dav = mean(cases), n = 7,
-#' align = "left") %>%
+#' epi_slide(cases_7dav = mean(cases), before = 0, after = 6) %>%
#' # rmv a nonessential var. to ensure new col is printed
#' dplyr::select(-death_rate_7d_av)
#'
+#' # slide a 7-day centre-aligned average
+#' jhu_csse_daily_subset %>%
+#' group_by(geo_value) %>%
+#' epi_slide(cases_7dav = mean(cases), before = 3, after = 3) %>%
+#' # rmv a nonessential var. to ensure new col is printed
+#' dplyr::select(-death_rate_7d_av)
+#'
+#' # slide a 14-day centre-aligned average
+#' jhu_csse_daily_subset %>%
+#' group_by(geo_value) %>%
+#' epi_slide(cases_14dav = mean(cases), before = 6, after = 7) %>%
+#' # rmv a nonessential var. to ensure new col is printed
+#' dplyr::select(-death_rate_7d_av)
+#'
#' # nested new columns
-#' jhu_csse_daily_subset %>%
-#' group_by(geo_value) %>%
-#' epi_slide(a = data.frame(cases_2dav = mean(cases),
+#' jhu_csse_daily_subset %>%
+#' group_by(geo_value) %>%
+#' epi_slide(a = data.frame(cases_2dav = mean(cases),
#' cases_2dma = mad(cases)),
-#' n = 2, as_list_col = TRUE)
-epi_slide = function(x, f, ..., n, ref_time_values,
-align = c("right", "center", "left"), before, time_step,
+#' before = 1, as_list_col = TRUE)
+epi_slide = function(x, f, ..., before, after = 0, ref_time_values,
+time_step,
new_col_name = "slide_value", as_list_col = FALSE,
names_sep = "_", all_rows = FALSE) {

# Check we have an `epi_df` object
if (!inherits(x, "epi_df")) Abort("`x` must be of class `epi_df`.")
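
The trailing, leading, and centred examples in the roxygen block above differ only in how `before` and `after` place the window around each reference time. A minimal base-R sketch of that semantics follows. `slide_window()` is a hypothetical stand-in written for illustration, not the epiprocess implementation, and it ignores grouping, `ref_time_values`, and `time_step`.

```r
# Hypothetical base-R sketch (not the epiprocess implementation): for each
# time t, apply `f` to the values whose times fall in [t - before, t + after].
slide_window <- function(times, values, f, before, after = 0) {
  vapply(seq_along(times), function(i) {
    in_window <- times >= times[i] - before & times <= times[i] + after
    f(values[in_window])
  }, numeric(1))
}

days <- as.Date("2022-01-01") + 0:9
cases <- 1:10

trailing <- slide_window(days, cases, mean, before = 6)
leading  <- slide_window(days, cases, mean, before = 0, after = 6)
centered <- slide_window(days, cases, mean, before = 3, after = 3)

trailing[7]  # 4: mean of days 1 to 7, a complete trailing window
trailing[1]  # 1: only one day is available, so the window is partial
centered[1]  # 2.5: mean of days 1 to 4, nothing exists before day 1
```

As in the documented behavior, incomplete windows at the edges of the data are still evaluated rather than dropped, so deciding what to do with those partial results is left to the caller.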

@@ -133,44 +162,45 @@ epi_slide = function(x, f, ..., n, ref_time_values,
ref_time_values = ref_time_values[ref_time_values %in%
unique(x$time_value)]
}

-# If before is missing, then use align to set up alignment

+# We must ensure that both before and after are of length 1
+if (length(after) != 1L || (!missing(before) && length(before) != 1L)) {
+Abort("`before` and `after` must be vectors of length 1.")
+}

+# `before` cannot be missing if `after` is 0. If `before` is missing and
+# `after` is nonzero, warn and assume `before` = 0
if (missing(before)) {
-align = match.arg(align)
-if (align == "right") {
-before_num = n-1
-after_num = 0
-}
-else if (align == "center") {
-before_num = floor((n-1)/2)
-after_num = ceiling((n-1)/2)
-}
-else {
-before_num = 0
-after_num = n-1
+if (after == 0) {
+Abort("`before` cannot be missing when `after` is set to 0.")
+} else {
+Warn("`before` missing, `after` nonzero; assuming that left-aligned/leading window is desired and setting `before` = 0.")
+before = 0
+}
}

+if (!(is.numeric(before) && is.numeric(after)) ||
+floor(before) < ceiling(before) ||
+floor(after) < ceiling(after)) {
+Abort("`before` and `after` must be integers.")
+}


-# Otherwise set up alignment based on passed before value
-else {
-if (before < 0 || before > n-1) {
-Abort("`before` must be in between 0 and n-1`.")
-}
-
-before_num = before
-after_num = n-1-before
+if (before < 0 || after < 0) {
+Abort("`before` and `after` must be at least 0.")
}

# If a custom time step is specified, then redefine units
if (!missing(time_step)) {
-before_num = time_step(before_num)
-after_num = time_step(after_num)
+before = time_step(before)
+after = time_step(after)
}

# Now set up starts and stops for sliding/hopping
time_range = range(unique(x$time_value))
-starts = in_range(ref_time_values - before_num, time_range)
-stops = in_range(ref_time_values + after_num, time_range)
+starts = in_range(ref_time_values - before, time_range)
+stops = in_range(ref_time_values + after, time_range)

if( length(starts) == 0 || length(stops) == 0 ) {
Abort("The starting and/or stopping times for sliding are out of bounds with respect to the range of times in your data. Check your settings for ref_time_values, before, and after.")
Expand Down
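The argument checks added in this hunk can be exercised outside the package. The sketch below is a stand-alone approximation: `resolve_window()` is a hypothetical name, the warning text is shortened, and base R's `stop()` and `warning()` stand in for the package's `Abort()` and `Warn()` helpers, whose definitions are not shown in this diff.

```r
# Hypothetical stand-alone sketch of the checks in this hunk; stop() and
# warning() stand in for the package's Abort() and Warn() helpers.
resolve_window <- function(before, after = 0) {
  if (length(after) != 1L || (!missing(before) && length(before) != 1L)) {
    stop("`before` and `after` must be vectors of length 1.")
  }
  if (missing(before)) {
    if (after == 0) {
      stop("`before` cannot be missing when `after` is set to 0.")
    }
    warning("`before` missing, `after` nonzero; setting `before` = 0.")
    before <- 0
  }
  if (!(is.numeric(before) && is.numeric(after)) ||
      before != floor(before) || after != floor(after)) {
    stop("`before` and `after` must be integers.")
  }
  if (before < 0 || after < 0) {
    stop("`before` and `after` must be at least 0.")
  }
  list(before = before, after = after)
}

resolve_window(before = 6)                   # trailing window: before 6, after 0
suppressWarnings(resolve_window(after = 6))  # leading window: before 0, after 6
```

Calling `resolve_window()` with no arguments, with `before = 1.5`, or with `before = -1` raises an error in this sketch, mirroring the three `Abort()` branches.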
12 changes: 8 additions & 4 deletions man/as_epi_archive.Rd


10 changes: 5 additions & 5 deletions man/as_epi_df.Rd


7 changes: 6 additions & 1 deletion man/detect_outlr_rm.Rd
