Allow epix_slide to access version history if desired #259


Merged: 24 commits merged on Mar 9, 2023
24 commits
2f74727 add all_verions arg and document (nmdefries, Jan 18, 2023)
671f649 implement all_versions for as_of + helper (nmdefries, Jan 19, 2023)
6feaa95 clone and copy archive group DTs for all_versions sliding (nmdefries, Jan 23, 2023)
006ca77 reduce duplicate comp_one_group calls (nmdefries, Jan 23, 2023)
3783fa6 comments; pass comp args to all_versions wrapper (nmdefries, Jan 23, 2023)
c6a7141 document (nmdefries, Jan 23, 2023)
44504d8 comments (nmdefries, Jan 23, 2023)
c700208 support grouped archives and return same grouping type as input (nmdefries, Jan 31, 2023)
6d6d22c test epix_truncate_version_after (nmdefries, Jan 31, 2023)
f2af340 turn epix_truncate_versions_after into a generic (nmdefries, Feb 1, 2023)
914de42 missing .group_key arg (nmdefries, Feb 3, 2023)
f264e0c test slide all_versions (nmdefries, Feb 3, 2023)
d3607cd grouped archive truncate_versions to call ungrouped vers for brevity (nmdefries, Feb 7, 2023)
acfd6d9 attribution (nmdefries, Feb 7, 2023)
23bde85 test wording (nmdefries, Feb 7, 2023)
c9bf57e Add R6 analogues for epix_truncate_versions_after (lcbrooks, Feb 22, 2023)
4bd77f0 Fix epix_slide all_versions=T providing tibble $DTs (lcbrooks, Feb 24, 2023)
c641722 Fix some inconsistent scoping/importing, redocument (lcbrooks, Feb 24, 2023)
fb5e037 Test `epix_slide` with `all_versions=TRUE`, grouping by geo (lcbrooks, Mar 8, 2023)
01531df Update + add more `epix_slide` @examples (lcbrooks, Mar 9, 2023)
e7ed141 Add missing parameter validation for `epix_{as_of,slide}` (lcbrooks, Mar 9, 2023)
5278255 Fix partial var rename, bad links, redocument (lcbrooks, Mar 9, 2023)
23e939b Merge branch 'lcb/grouped_epi_archive' into ndefries/epix-slide-versions (lcbrooks, Mar 9, 2023)
8b356d6 Add NEWS.md entry for `all_versions` (lcbrooks, Mar 9, 2023)
DESCRIPTION (1 change: 1 addition, 0 deletions)
@@ -6,6 +6,7 @@ Authors@R: c(
 person("Jacob", "Bien", role = "ctb"),
 person("Logan", "Brooks", role = "aut"),
 person("Rafael", "Catoia", role = "ctb"),
+person("Nat", "DeFries", role = "ctb"),
 person("Daniel", "McDonald", role = "aut"),
 person("Rachel", "Lobay", role = "ctb"),
 person("Ken", "Mawer", role = "ctb"),
NAMESPACE (3 changes: 3 additions, 0 deletions)
@@ -11,6 +11,8 @@ S3method(dplyr_col_modify,col_modify_recorder_df)
 S3method(dplyr_col_modify,epi_df)
 S3method(dplyr_reconstruct,epi_df)
 S3method(dplyr_row_slice,epi_df)
+S3method(epix_truncate_versions_after,epi_archive)
+S3method(epix_truncate_versions_after,grouped_epi_archive)
 S3method(group_by,epi_archive)
 S3method(group_by,epi_df)
 S3method(group_by,grouped_epi_archive)
@@ -38,6 +40,7 @@ export(epi_slide)
 export(epix_as_of)
 export(epix_merge)
 export(epix_slide)
+export(epix_truncate_versions_after)
 export(filter)
 export(group_by)
 export(group_modify)
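The NAMESPACE entries register `epix_truncate_versions_after` as an S3 generic with one method for `epi_archive` and one for `grouped_epi_archive`. The base-R sketch below illustrates that dispatch pattern, including a grouped method that delegates to the ungrouped one and keeps its grouping wrapper (in the spirit of commit d3607cd). The classes `toy_archive` and `grouped_toy_archive`, their constructors, and the generic `toy_truncate_versions_after` are hypothetical stand-ins, not epiprocess code; R's copy-on-modify list semantics stand in for the R6 `clone` that the real methods perform.

```r
# Hypothetical toy classes illustrating S3 dispatch; not epiprocess's code.
new_toy_archive <- function(DT, versions_end) {
  structure(list(DT = DT, versions_end = versions_end), class = "toy_archive")
}

new_grouped_toy_archive <- function(ungrouped, group_vars) {
  structure(list(ungrouped = ungrouped, group_vars = group_vars),
            class = "grouped_toy_archive")
}

# The generic: dispatches on the class of `x`.
toy_truncate_versions_after <- function(x, max_version) {
  UseMethod("toy_truncate_versions_after")
}

# Ungrouped method: keep only rows reported at or before `max_version`.
# Lists are copy-on-modify, so the caller's archive is left untouched.
toy_truncate_versions_after.toy_archive <- function(x, max_version) {
  x$DT <- x$DT[x$DT$version <= max_version, , drop = FALSE]
  x$versions_end <- max_version
  x
}

# Grouped method: delegate to the ungrouped method, keep the grouping wrapper.
toy_truncate_versions_after.grouped_toy_archive <- function(x, max_version) {
  x$ungrouped <- toy_truncate_versions_after(x$ungrouped, max_version)
  x
}

dt <- data.frame(geo_value = "ca", time_value = c(1, 1, 2),
                 version = c(1, 2, 2), value = c(10, 11, 20))
garch <- new_grouped_toy_archive(new_toy_archive(dt, versions_end = 2),
                                 group_vars = "geo_value")
out <- toy_truncate_versions_after(garch, max_version = 1)
class(out)              # "grouped_toy_archive"
nrow(out$ungrouped$DT)  # 1
```

Delegating from the grouped method keeps the filtering logic in one place, so the two methods cannot drift apart.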
NEWS.md (8 changes: 8 additions, 0 deletions)
@@ -58,6 +58,14 @@ development versions. A ".9999" suffix indicates a development version.
 * `epi_slide` and `epix_slide` now raise an error rather than silently filtering
 out `ref_time_values` that don't meet their expectations.
 
+## New features:
+
+* `epix_slide`, `<epi_archive>$slide` have a new parameter `all_versions`. With
+`all_versions=TRUE`, `epix_slide` will pass a filtered `epi_archive` to each
+computation rather than an `epi_df` snapshot. This enables, e.g., performing
+pseudoprospective forecasts with a revision-aware forecaster using nested
+`epix_slide` operations.
+
 ## Improvements:
 
 * Added `dplyr::group_by` and `dplyr::ungroup` S3 methods for `epi_archive`
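The NEWS entry turns on what each computation receives: a single snapshot, or the version history itself. The base-R sketch below contrasts the two on a toy `data.frame` with assumed columns `geo_value`, `time_value`, `version`, and `value`. The helpers `snapshot_as_of()` and `history_as_of()` are hypothetical and only approximate `epix_as_of()` with `all_versions = FALSE` and `TRUE`; the real methods operate on `data.table`s and return `epi_df` or `epi_archive` objects.

```r
# Toy version history (hypothetical data): each row is the value of
# (geo_value, time_value) as reported in a given version.
dt <- data.frame(
  geo_value  = "ca",
  time_value = c(1, 1, 2, 2),
  version    = c(1, 3, 2, 3),
  value      = c(10, 12, 20, 21)
)

# Analogue of all_versions = FALSE: for each (geo_value, time_value), keep
# only the latest value reported at or before `max_version`.
snapshot_as_of <- function(dt, max_version, min_time_value = -Inf) {
  known <- dt[dt$version <= max_version & dt$time_value >= min_time_value, ,
              drop = FALSE]
  known <- known[order(known$geo_value, known$time_value, known$version), ]
  latest <- !duplicated(known[, c("geo_value", "time_value")], fromLast = TRUE)
  known[latest, c("geo_value", "time_value", "value")]
}

# Analogue of all_versions = TRUE: the history itself, with every report made
# after `max_version` dropped but earlier revisions kept.
history_as_of <- function(dt, max_version, min_time_value = -Inf) {
  dt[dt$version <= max_version & dt$time_value >= min_time_value, ,
     drop = FALSE]
}

snap <- snapshot_as_of(dt, max_version = 3)
history <- history_as_of(dt, max_version = 3)
snap$value     # 12 21
nrow(history)  # 4
```

With `max_version = 2`, the snapshot would instead hold 10 and 20, because the version-3 revisions are not yet visible; the history view keeps those superseded rows, which is what a revision-aware computation needs.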
R/archive.R (49 changes: 46 additions, 3 deletions)
@@ -452,7 +452,7 @@ epi_archive =
 #' @description Generates a snapshot in `epi_df` format as of a given version.
 #' See the documentation for the wrapper function [`epix_as_of()`] for details.
 #' @importFrom data.table between key
-as_of = function(max_version, min_time_value = -Inf) {
+as_of = function(max_version, min_time_value = -Inf, all_versions = FALSE) {
 # Self max version and other keys
 other_keys = setdiff(key(self$DT),
 c("geo_value", "time_value", "version"))
@@ -472,12 +472,23 @@ epi_archive =
 if (max_version > self$versions_end) {
 Abort("`max_version` must be at most `self$versions_end`.")
 }
+if (!rlang::is_bool(all_versions)) {
+Abort("`all_versions` must be TRUE or FALSE.")
+}
 if (!is.na(self$clobberable_versions_start) && max_version >= self$clobberable_versions_start) {
 Warn('Getting data as of some recent version which could still be overwritten (under routine circumstances) without assigning a new version number (a.k.a. "clobbered"). Thus, the snapshot that we produce here should not be expected to be reproducible later. See `?epi_archive` for more info and `?epix_as_of` on how to muffle.',
 class="epiprocess__snapshot_as_of_clobberable_version")
 }
 
 # Filter by version and return
+if (all_versions) {
+result = epix_truncate_versions_after(self, max_version)
+# `self` has already been `clone`d in `epix_truncate_versions_after`
+# so we can modify the new archive's DT directly.
+result$DT = result$DT[time_value >= min_time_value, ]
+return(result)
+}
+
 return(
 # Make sure to use data.table ways of filtering and selecting
 self$DT[time_value >= min_time_value &
@@ -559,6 +570,38 @@ epi_archive =
 return (invisible(self))
 },
 #####
+#' @description Filter to keep only older versions, mutating the archive by
+#' potentially reseating but not mutating some fields. `DT` is likely, but not
+#' guaranteed, to be copied. Returns the mutated archive
+#' [invisibly][base::invisible].
+#' @param x as in [`epix_truncate_versions_after`]
+#' @param max_version as in [`epix_truncate_versions_after`]
+truncate_versions_after = function(max_version) {
+if (length(max_version) != 1) {
+Abort("`max_version` cannot be a vector.")
+}
+if (is.na(max_version)) {
+Abort("`max_version` must not be NA.")
+}
+if (!identical(class(max_version), class(self$DT$version)) ||
+!identical(typeof(max_version), typeof(self$DT$version))) {
+Abort("`max_version` and `DT$version` must have same `class` and `typeof`.")
+}
+if (max_version > self$versions_end) {
+Abort("`max_version` must be at most `self$versions_end`.")
+}
+self$DT <- self$DT[self$DT$version <= max_version, colnames(self$DT), with=FALSE]
+# (^ this filter operation seems to always copy the DT, even if it
+# keeps every entry; we don't guarantee this behavior in
+# documentation, though, so we could change to alias in this case)
+if (!is.na(self$clobberable_versions_start) &&
+self$clobberable_versions_start > max_version) {
+self$clobberable_versions_start <- NA
+}
+self$versions_end <- max_version
+return (invisible(self))
+},
+#####
 #' @description Merges another `epi_archive` with the current one, mutating the
 #' current one by reseating its `DT` and several other fields, but avoiding
 #' mutation of the old `DT`; returns the current archive
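The new `truncate_versions_after` method validates `max_version`, drops every row with a later `version`, clears `clobberable_versions_start` once all clobberable versions have been cut away, and lowers `versions_end`. The plain-list sketch below walks through the same steps. `toy_truncate()` and the list layout are hypothetical, `stop()` stands in for epiprocess's internal `Abort()`, and unlike the R6 method, which reseats fields on `self`, this version returns a modified copy.

```r
# Hypothetical list-based stand-in for an archive's fields; not epiprocess code.
toy_truncate <- function(archive, max_version) {
  # Same checks as the R6 method: scalar, non-NA, matching class/type, in range.
  if (length(max_version) != 1) stop("`max_version` cannot be a vector.")
  if (is.na(max_version)) stop("`max_version` must not be NA.")
  if (!identical(class(max_version), class(archive$DT$version)) ||
      !identical(typeof(max_version), typeof(archive$DT$version))) {
    stop("`max_version` and `DT$version` must have same `class` and `typeof`.")
  }
  if (max_version > archive$versions_end) {
    stop("`max_version` must be at most `versions_end`.")
  }
  # Drop every report made after `max_version`.
  archive$DT <- archive$DT[archive$DT$version <= max_version, , drop = FALSE]
  # If every clobberable version was cut off, nothing left is clobberable.
  if (!is.na(archive$clobberable_versions_start) &&
      archive$clobberable_versions_start > max_version) {
    archive$clobberable_versions_start <- NA
  }
  archive$versions_end <- max_version
  archive
}

archive <- list(
  DT = data.frame(
    time_value = as.Date(c("2023-01-01", "2023-01-01", "2023-01-02")),
    version    = as.Date(c("2023-01-02", "2023-01-04", "2023-01-04")),
    value      = c(5, 6, 7)
  ),
  versions_end = as.Date("2023-01-04"),
  clobberable_versions_start = as.Date("2023-01-04")
)

truncated <- toy_truncate(archive, as.Date("2023-01-03"))
nrow(truncated$DT)  # 1

# A plain number is rejected because it does not match the Date class.
err <- tryCatch(toy_truncate(archive, 3),
                error = function(e) conditionMessage(e))
```

The strict `class`/`typeof` check means a numeric `max_version` cannot be silently compared against `Date` versions.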
@@ -597,7 +640,7 @@ epi_archive =
 slide = function(f, ..., before, ref_time_values,
 time_step, new_col_name = "slide_value",
 as_list_col = FALSE, names_sep = "_",
-all_rows = FALSE) {
+all_rows = FALSE, all_versions = FALSE) {
 # For an "ungrouped" slide, treat all rows as belonging to one big
 # group (group by 0 vars), like `dplyr::summarize`, and let the
 # resulting `grouped_epi_archive` handle the slide:
@@ -606,7 +649,7 @@ epi_archive =
 before = before, ref_time_values = ref_time_values,
 time_step = time_step, new_col_name = new_col_name,
 as_list_col = as_list_col, names_sep = names_sep,
-all_rows = all_rows
+all_rows = all_rows, all_versions = all_versions
 ) %>%
 # We want a slide on ungrouped archives to output something
 # ungrouped, rather than retaining the trivial (0-variable)
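The ungrouped `slide` method forwards `all_versions` to the grouped slide, so with `all_versions = TRUE` each computation sees the version history as of a reference version rather than a snapshot. The base-R sketch below illustrates that idea with a simple revision-aware computation. `slide_all_versions()` and `count_revisions()` are hypothetical, and the window rule (keep versions up to the reference version and time values no earlier than `before` steps before it) is an assumption about how `before` applies, not a description of epiprocess internals.

```r
# Hypothetical analogue of sliding with all_versions = TRUE: at each reference
# version, `f` receives the truncated version history, not a snapshot.
slide_all_versions <- function(dt, f, ref_versions, before) {
  results <- lapply(ref_versions, function(ref) {
    window <- dt[dt$version <= ref & dt$time_value >= ref - before, ,
                 drop = FALSE]
    f(window, ref)
  })
  data.frame(version = ref_versions, slide_value = unlist(results))
}

# Toy history (hypothetical data): time_value 1 is revised twice.
dt <- data.frame(
  time_value = c(1, 1, 1, 2, 2, 3),
  version    = c(1, 2, 3, 2, 3, 3),
  value      = c(10, 11, 12, 20, 21, 30)
)

# Revision-aware computation: how many rows in the window are revisions,
# i.e. not the first report seen for their time_value.
count_revisions <- function(window, ref) {
  sum(duplicated(window$time_value))
}

out <- slide_all_versions(dt, count_revisions, ref_versions = 1:3, before = 2)
out$slide_value  # 0 1 3
```

A snapshot-based slide could not compute this, since a snapshot keeps only the latest report for each time value.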