Commit 8838c71

WIP Add docs for de-dupe approach, part of the required validation
Forbidding `new_col_name` from being among the labeling columns addresses some dedupe cases where deduping would always lead to failure except for completely redundant computations (those that only output computation labels rather than an actual computation).

- This might not be complete in an edge case where `"slide_value"` is a grouping variable (e.g., from using a slide to assign a categorical trend, then doing a grouped slide based on the trend).

This is definitely only part of the dedupe handling. Unpacked-column outputs need to actually be de-duped.

Also, fix incorrect documentation for the `time_value` filter for `.all_versions = TRUE` while rebasing on other slide updates.
1 parent 4b112e5 commit 8838c71

6 files changed, +84 -18 lines changed

R/methods-epi_archive.R

+11-8
@@ -650,15 +650,18 @@ epix_detailed_restricted_mutate <- function(.data, ...) {
 #' set to a regularly-spaced sequence of values set to cover the range of
 #' `version`s in the `DT` plus the `versions_end`; the spacing of values will
 #' be guessed (using the GCD of the skips between values).
-#' @param .new_col_name String indicating the name of the new column that will
-#' contain the derivative values. The default is "slide_value" unless your
-#' slide computations output data frames, in which case they will be unpacked
-#' into the constituent columns and those names used. Note that setting
-#' `.new_col_name` equal to an existing column name will overwrite this column.
+#' @param .new_col_name Either `NULL` or a string indicating the name of the new
+#' column that will contain the derived values. The default, `NULL`, will use
+#' the name "slide_value" unless your slide computations output data frames,
+#' in which case they will be unpacked into the constituent columns and those
+#' names used. If the resulting column name(s) overlap with the column names
+#' used for labeling the computations, which are `group_vars(x)` and
+#' `"version"`, then the values for these columns must be identical to the
+#' labels we assign.
 #' @param .all_versions (Not the same as `.all_rows` parameter of `epi_slide`.) If
-#' TRUE, then `.f` will be passed the version history (all
-#' `version <= .ref_time_value`) for rows having `time_value` between
-#' `.ref_time_value - before` and `.ref_time_value`. Otherwise, `.f` will be
+#' `.all_versions = TRUE`, then `.f` will be passed the version history (all
+#' `version <= .ref_time_value`) for rows having `time_value` of at least
+#' `.version - before`. Otherwise, `.f` will be
 #' passed only the most recent `version` for every unique `time_value`.
 #' Default is `FALSE`.
 #' @return A tibble whose columns are: the grouping variables, `time_value`,

R/slide.R

+18-1
@@ -27,7 +27,9 @@
 #' and can also refer to `.x`, `.group_key`, and `.ref_time_value`. See
 #' details.
 #' @param .new_col_name String indicating the name of the new column that will
-#' contain the derivative values. Default is "slide_value"; note that setting
+#' contain the derivative values. The default is "slide_value" unless your
+#' slide computations output data frames, in which case they will be unpacked
+#' into the constituent columns and those names used. Note that setting
 #' `new_col_name` equal to an existing column name will overwrite this column.
 #'
 #' @template basic-slide-details
@@ -169,6 +171,21 @@ epi_slide <- function(
     }
   }
 
+  checkmate::assert_string(new_col_name, null.ok = TRUE)
+  if (!is.null(new_col_name)) {
+    if (new_col_name %in% group_vars(x)) {
+      cli_abort(c("`new_col_name` must not be one of the grouping column name(s);
+                   `epi_slide()` uses these column name(s) to label what group
+                   each slide computation came from.",
+        "i" = "{cli::qty(length(group_vars(x)))} grouping column name{?s}
+               {?was/were} {format_chr_with_quotes(group_vars(x))}",
+        "x" = "`new_col_name` was {format_chr_with_quotes(new_col_name)}"))
+    }
+    if (identical(new_col_name, "time_value")) {
+      cli_abort('`new_col_name` must not be `"time_value"`; `epi_slide()` uses that column name to attach the `ref_time_value` associated with each slide computation') # nolint: line_length_linter
+    }
+  }
+
   # Arrange by increasing time_value
   x <- arrange(.x, .data$time_value)

R/utils.R

+22
@@ -97,6 +97,28 @@ format_class_vec <- function(class_vec) {
   paste(collapse = "", deparse(class_vec))
 }
 
+#' Format a character vector as a string via deparsing/quoting each
+#'
+#' @param x `chr`; e.g., `colnames` of some data frame
+#' @param empty string; what should be output if `x` is of length 0?
+#' @return string
+format_chr_with_quotes <- function(x, empty = "*none*") {
+  if (length(x) == 0L) {
+    empty
+  } else {
+    # Deparse to get quoted + escape-sequenced versions of varnames; collapse to
+    # single line (assuming no newlines in `x`). Though if we hand this to cli
+    # it may insert them (even in middle of quotes) while wrapping lines.
+    deparsed_collapsed <- paste(collapse = "", deparse(x))
+    if (length(x) == 1L) {
+      deparsed_collapsed
+    } else {
+      # remove surrounding `c()`:
+      substr(deparsed_collapsed, 3L, nchar(deparsed_collapsed) - 1L)
+    }
+  }
+}
+
 #' Assert that a sliding computation function takes enough args
 #'
 #' @param f Function; specifies a computation to slide over an `epi_df` or
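
To see what `format_chr_with_quotes()` returns for the three cases it distinguishes (empty, length one, longer), here is a small sketch. The function body is copied from the hunk above so the snippet is self-contained; the input column names are made up for illustration.

```r
# Copied from the hunk above so the example runs on its own.
format_chr_with_quotes <- function(x, empty = "*none*") {
  if (length(x) == 0L) {
    empty
  } else {
    deparsed_collapsed <- paste(collapse = "", deparse(x))
    if (length(x) == 1L) {
      deparsed_collapsed
    } else {
      # Strip the leading `c(` and the trailing `)`.
      substr(deparsed_collapsed, 3L, nchar(deparsed_collapsed) - 1L)
    }
  }
}

one <- format_chr_with_quotes("geo_value")
two <- format_chr_with_quotes(c("geo_value", "age_group"))
none <- format_chr_with_quotes(character(0L))

# Prints, one per line:
#   "geo_value"
#   "geo_value", "age_group"
#   *none*
cat(one, two, none, sep = "\n")
```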

man/epi_slide.Rd

+3-1
Generated file; diff not rendered by default.

man/epix_slide.Rd

+11-8
Generated file; diff not rendered by default.

man/format_chr_with_quotes.Rd

+19
Generated file; diff not rendered by default.
