Skip to content

Possibly fixed 1-5 and 7 of issue #146 #157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 25 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
1cf5aec
Updated code for separate PR to deal with coupling.
kenmawer Jul 14, 2022
d730876
Removed file that can cause conflicts due to differing branches.
kenmawer Jul 14, 2022
2d96c5a
Updated tests.
kenmawer Jul 14, 2022
4c45aa5
Fixed a typo.
kenmawer Jul 14, 2022
f06b7fe
Still needs work.
kenmawer Jul 14, 2022
8683249
Made updates as to be able to check after misunderstandings.
kenmawer Jul 15, 2022
3f42e60
Added spacing.
kenmawer Jul 15, 2022
08b22e5
Fixed a typo.
kenmawer Jul 15, 2022
d5a402c
Added a test that should be granted.
kenmawer Jul 15, 2022
dfc3566
Finally fixed epix_slide to work with the right value.
kenmawer Jul 15, 2022
62ce720
Updated data for testing
kenmawer Jul 15, 2022
5ee25e6
Still needs ref_time_values refactoring.
kenmawer Jul 15, 2022
ee9e9f6
Refactored `n` to `max_version_gap` and addressed a typo.
kenmawer Jul 18, 2022
ce44a28
Refactored code.
kenmawer Jul 18, 2022
7a32fbd
Merge branch 'main' of https://github.com/cmu-delphi/epiprocess into …
kenmawer Jul 18, 2022
2a0c97e
Changed outdated name.
kenmawer Jul 18, 2022
35f53d4
Removed defaults for `slide`.
kenmawer Jul 18, 2022
0258a03
Finally fixed an annoying bug on `advanced.Rmd`.
kenmawer Jul 18, 2022
c1369c2
Fixed errors; archive vignette still has errors.
kenmawer Jul 19, 2022
e2172a3
Pulled again to ensure matching.
kenmawer Jul 19, 2022
04302b6
Updated datasets as to run properly.
kenmawer Jul 19, 2022
feb0f47
0 errors or warnings!
kenmawer Jul 20, 2022
08a580d
Refactor of `ref_time_values` still needs fixing!
kenmawer Jul 20, 2022
666c4f7
Finally refactored incorrect time_values on epix_slide.
kenmawer Jul 20, 2022
3134a55
Changed ref_versions to refer to versions by default.
kenmawer Jul 21, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 14 additions & 12 deletions R/archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -286,23 +286,24 @@ epi_archive =
#' details.
#' @importFrom data.table key
#' @importFrom rlang !! !!! enquo enquos is_quosure sym syms
slide = function(f, ..., n = 7, group_by, ref_time_values,
slide = function(f, ..., max_version_gap, group_by,
ref_versions,
time_step, new_col_name = "slide_value",
as_list_col = FALSE, names_sep = "_",
all_rows = FALSE) {
# If missing, then set ref time values to be everything; else make
# sure we intersect with observed time values
if (missing(ref_time_values)) {
ref_time_values = unique(self$DT$time_value)
if (missing(ref_versions)) {
ref_versions = unique(self$DT$version)
}
else {
ref_time_values = ref_time_values[ref_time_values %in%
ref_versions = ref_versions[ref_versions %in%
unique(self$DT$time_value)]
}

# If a custom time step is specified, then redefine units
before_num = n-1
if (!missing(time_step)) before_num = time_step(n-1)
before_num = max_version_gap-1
if (!missing(time_step)) before_num = time_step(max_version_gap-1)

# What to group by? If missing, set according to internal keys
if (missing(group_by)) {
Expand All @@ -324,7 +325,7 @@ epi_archive =
# Computation for one group, one time value
comp_one_grp = function(.data_group,
f, ...,
time_value,
version,
key_vars,
new_col) {
# Carry out the specified computation
Expand Down Expand Up @@ -370,21 +371,22 @@ epi_archive =

# Note that we've already recycled comp value to make size stable,
# so tibble() will just recycle time value appropriately
return(tibble::tibble(time_value = time_value,
return(tibble::tibble(version = version,
!!new_col := comp_value))
}

# If f is not missing, then just go ahead, slide by group
if (!missing(f)) {

if (rlang::is_formula(f)) f = rlang::as_function(f)

x = purrr::map_dfr(ref_time_values, function(t) {
x = purrr::map_dfr(ref_versions, function(t) {
self$as_of(t, min_time_value = t - before_num) %>%
tibble::as_tibble() %>%
dplyr::group_by(!!!group_by) %>%
dplyr::group_modify(comp_one_grp,
f = f, ...,
time_value = t,
version = t,
key_vars = key_vars,
new_col = new_col,
.keep = TRUE) %>%
Expand All @@ -406,13 +408,13 @@ epi_archive =
f = function(x, quo, ...) rlang::eval_tidy(quo, x)
new_col = sym(names(rlang::quos_auto_name(quos)))

x = purrr::map_dfr(ref_time_values, function(t) {
x = purrr::map_dfr(ref_versions, function(t) {
self$as_of(t, min_time_value = t - before_num) %>%
tibble::as_tibble() %>%
dplyr::group_by(!!!group_by) %>%
dplyr::group_modify(comp_one_grp,
f = f, quo = quo,
time_value = t,
version = t,
key_vars = key_vars,
new_col = new_col,
.keep = TRUE) %>%
Expand Down
25 changes: 14 additions & 11 deletions R/methods-epi_archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -112,15 +112,17 @@ epix_merge = function(x, y, ..., locf = TRUE, nan = NA) {
#' @param ... Additional arguments to pass to the function or formula specified
#' via `f`. Alternatively, if `f` is missing, then the current argument is
#' interpreted as an expression for tidy evaluation.
#' @param n Number of time steps to use in the running window. For example, if
#' `n = 7`, and one time step is one day, then to produce a value on January 7
#' @param max_version_gap Number of time steps to use in the running window.
#' For example, if
#' `max_version_gap = 7`, and one time step is one day, then to produce a
#' value on January 7
#' we apply the given function or formula to data in between January 1 and
#' 7. Default is 7.
#' @param group_by The variable(s) to group by before slide computation. If
#' missing, then the keys in the underlying data table, excluding `time_value`
#' and `version`, will be used for grouping. To omit a grouping entirely, use
#' `group_by = NULL`.
#' @param ref_time_values Time values for sliding computations, meaning, each
#' @param ref_versions Time values for sliding computations, meaning, each
#' element of this vector serves as the reference time point for one sliding
#' window. If missing, then this will be set to all unique time values in the
#' underlying data table, by default.
Expand Down Expand Up @@ -176,11 +178,11 @@ epix_merge = function(x, y, ..., locf = TRUE, nan = NA) {
#' Finally, this is simply a wrapper around the `slide()` method of the
#' `epi_archive` class, so if `x` is an `epi_archive` object, then:
#' ```
#' epix_slide(x, new_var = comp(old_var), n = 120)
#' epix_slide(x, new_var = comp(old_var), max_version_gap = 120)
#' ```
#' is equivalent to:
#' ```
#' x$slide(x, new_var = comp(old_var), n = 120)
#' x$slide(new_var = comp(old_var), max_version_gap = 120)
#' ```
#'
#' @importFrom rlang enquo
Expand All @@ -191,24 +193,25 @@ epix_merge = function(x, y, ..., locf = TRUE, nan = NA) {
#' # 0 day which has no results, for 2020-06-01
#' # 1 day, for 2020-06-02
#' # 2 days, for the rest of the results
#' # never 3 days dur to data latency
#' # never 3 days due to data latency
#'
#' time_values <- seq(as.Date("2020-06-01"),
#' versions <- seq(as.Date("2020-06-01"),
#' as.Date("2020-06-15"),
#' by = "1 day")
#' epix_slide(x = archive_cases_dv_subset,
#' f = ~ mean(.x$case_rate_7d_av),
#' n = 3,
#' group_by = geo_value,
#' ref_time_values = time_values,
#' ref_versions = versions,
#' new_col_name = 'case_rate_3d_av')
epix_slide = function(x, f, ..., n = 7, group_by, ref_time_values,
epix_slide = function(x, f, ..., max_version_gap, group_by, ref_versions,
time_step, new_col_name = "slide_value",
as_list_col = FALSE, names_sep = "_", all_rows = FALSE) {
if (!inherits(x, "epi_archive")) Abort("`x` must be of class `epi_archive`.")
return(x$slide(f, ..., n = n,
return(x$slide(f, ...,
max_version_gap = max_version_gap,
group_by = enquo(group_by),
ref_time_values = ref_time_values,
ref_versions = ref_versions,
time_step = time_step,
new_col_name = new_col_name,
as_list_col = as_list_col,
Expand Down
4 changes: 2 additions & 2 deletions man/epi_archive.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

22 changes: 12 additions & 10 deletions man/epix_slide.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

38 changes: 38 additions & 0 deletions tests/testthat/test-epix_slide.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
library(dplyr)

ea <- archive_cases_dv_subset$clone()

test_that("epix_slide only works on an epi_archive",{
expect_error(epix_slide(data.frame(x=1)))
})

test_that("epix_slide works as intended",{
x2 <- ea$clone()$DT %>%
filter(geo_value == "ca", version <= as.Date("2020-06-09")) %>%
select(-percent_cli,-case_rate_7d_av) %>%
mutate(binary = 2^(row_number())) %>%
as_epi_archive()

versions <- seq(as.Date("2020-06-01"),
as.Date("2020-06-09"),
by = "1 day")

xx1 <- epix_slide(x = x2,
f = ~ sum(.x$binary),
max_version_gap = 5,
group_by = geo_value,
ref_versions = versions,
new_col_name = 'sum_binary')

xx2 <- tibble(geo_value = rep("ca",7),
version = as.Date("2020-06-01") + 1:7,
sum_binary = c(2^1,
2^6+2^1,
2^11+2^6+2^1,
2^16+2^11+2^6+2^1,
2^19+2^16+2^12+2^7,
2^21+2^19+2^16+2^13,
2^22+2^21+2^19+2^17))

expect_identical(xx1,xx2)
})
18 changes: 9 additions & 9 deletions vignettes/advanced.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ ensure the result of a slide operation is *size stable*, meaning, it will return
something whose length is the same as the number of appearances of reference
time values for the slide computation in the given data frame/table (this
defaults to all time values, but can be some given subset when `ref_time_values`
is specified).
or `ref_versions` is specified, respectively).

The output of a slide computation should either be an atomic value/vector, or a
data frame. This data frame can have multiple columns, multiple rows, or both.
Expand Down Expand Up @@ -61,12 +61,12 @@ df %>%
df %>%
mutate(version = time_value) %>%
as_epi_archive() %>%
epix_slide(x_2dav = mean(x), n = 2, ref_time_values = as.Date("2020-06-02"))
epix_slide(x_2dav = mean(x), max_version_gap = 2, ref_versions = as.Date("2020-06-02"))

df %>%
mutate(version = time_value) %>%
as_epi_archive() %>%
epix_slide(~ mean(.x$x), n = 2, ref_time_values = as.Date("2020-06-02"))
epix_slide(~ mean(.x$x), max_version_gap = 2, ref_versions = as.Date("2020-06-02"))
```

When the slide computation returns an atomic vector (rather than a single value)
Expand Down Expand Up @@ -152,8 +152,8 @@ df %>%
mutate(version = time_value) %>%
as_epi_archive() %>%
epix_slide(a = data.frame(x_2dav = mean(x), x_2dma = mad(x)),
ref_time_values = as.Date("2020-06-02"),
n = 2, as_list_col = FALSE, names_sep = NULL)
ref_versions = as.Date("2020-06-02"),
max_version_gap = 2, as_list_col = FALSE, names_sep = NULL)
```

## Multi-row outputs
Expand Down Expand Up @@ -344,7 +344,7 @@ data.
```{r, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 6}
# Latest snapshot of data, and forecast dates
x_latest <- epix_as_of(x, max_version = max(x$DT$version))
fc_time_values <- seq(as.Date("2020-08-01"),
fc_versions <- seq(as.Date("2020-08-01"),
as.Date("2021-12-01"),
by = "1 month")

Expand All @@ -354,15 +354,15 @@ k_week_ahead <- function(x, ahead = 7, as_of = TRUE) {
x %>%
epix_slide(fc = prob_arx(percent_cli, case_rate_7d_av, geo_value, time_value,
args = prob_arx_args(ahead = ahead)),
n = 120, ref_time_values = fc_time_values) %>%
mutate(target_date = time_value + ahead, as_of = TRUE,
max_version_gap = 120, ref_versions = fc_versions) %>%
mutate(target_date = version + ahead, as_of = TRUE,
geo_value = fc_geo_value)
}
else {
x_latest %>%
epi_slide(fc = prob_arx(percent_cli, case_rate_7d_av, geo_value, time_value,
args = prob_arx_args(ahead = ahead)),
n = 120, ref_time_values = fc_time_values) %>%
n = 120, ref_time_values = fc_versions) %>%
mutate(target_date = time_value + ahead, as_of = FALSE)
}
}
Expand Down
Loading