-
Notifications
You must be signed in to change notification settings - Fork 8
Pass ref time value and group key to epix_slide for tidy computations #317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass ref time value and group key to epix_slide for tidy computations #317
Conversation
- Try to keep expected function visible even when using `epix_slide()` within a helper function or other packages. - Make `.{x,group_key,ref_time_value}` inaccessible via `.data` and `.env`, only directly accessible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[See edge case + commit to address below.] Please:
- Sanity check my changes [to interface (see changed test) & implementation].
- If they look okay, please add a test based on the
helper
function example. - Now or in a separate Issue: we should probably provide some
.x
,.group_key
, and.ref_time_value
placeholders to avoid downstream packages from getting warnings about using undefined things in checks. I think that's what{dplyr,rlang}::.data
is; I don't think this package member is actually directly used in tidy eval, just more "convenient fiction" for CHECKs. [?.data
confirms this, in the "Where does '.data' live?:" section.]
f = function(.x, .group_key, .ref_time_value, quo, ...) { | ||
quo = quo_set_env(quo, env()) | ||
rlang::eval_tidy(quo, .x) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: ([maybe] shared by the old code [or maybe not... keep getting tricked by eval_tidy
default env
for exprs when using quosures]) env()
uses the function execution environment [by default], which handles making .x
, .group_key
, and .ref_time_value
available, but also:
- probably makes
quo
available - messes with some helper functions:
helper = function(archive_haystack, time_value_needle) {
archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)
}
helper(archive_cases_dv_subset, as.Date("2021-01-01"))
- probably doesn't work if there were another package, e.g.,
epipredict
, that used slides but relied on packages not in epiprocess imports nor attached in the user's GlobalEnv. (The GlobalEnv thing saves us in some other cases, but seems like an artifact that suggests that R didn't have proper package import namespacing at one point and relied on everything being attached via Depends:.)
I'll push an update for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helper functions
I can see that the helper function example works with your updates and exits with an error with the old code, but I don't understand the difference in scope of the two approaches.
This line
epiprocess/R/grouped_epi_archive.R
Line 380 in e9580d3
data_mask = rlang::new_data_mask(data_env, top = data_parent_env) |
top
and bottom
will be accessible to the quosure -- right? So why is time_value_needle
findable?
And how is the helper function example
helper = function(archive_haystack, time_value_needle) {
archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)
}
helper(archive_cases_dv_subset, as.Date("2021-01-01"))
different from
archive_haystack <- archive_cases_dv_subset
time_value_needle <- as.Date("2021-01-01")
archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)
? The issue is more than the fact that we're passing a variable rather than a "raw", unassigned value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: In various places in this package we use ::
to use some (but not all) external functions. Why is that? Like here, we're using rlang::new_data_mask
but there aren't any conflicting function names in epiprocess
or other imported packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. We haven't really standardized this yet, and it is probably more of a mix because at some point, in new code, I started to try to favor ::
in our internal code (but favor no ::
in examples) based on previous discussion, but never spent the time and effort to refactor to any standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, didn't see your first question. eval_tidy
makes both the data mask and the quosure environment available to the computation. It looks like rlang
probably combines these into a single environment. [In the old approach, the data mask didn't have any of the stuff, and the environment was something like an env()
with no bindings -> our f
's execution environment -> slide
execution environment -> archive internal environment -> epiprocess pkg namespace -> epiprocess imports -> stuff leading to the global namespace for backward compatibility reasons with old old R stuff, which saved us in a lot of cases. In the newer approaches, we're putting stuff in the data mask part and leaving the env alone, which lets it get stuff from the helper
execution environment. Maybe we could have gotten similar results by modifying the first approach to not use the default parent env in the env()
call, but instead setting its parent to the quosure's original quo_get_env()
and manually adding bindings for .x
etc. like you did in even older versions.]
library(rlang)
# put in a bunch of duplicate names and tricks to see what eval_tidy actually grabs:
data_env_parent = env(a = 15)
data_env = env(data_env_parent, a = 10, .data = list(a = 8), .env = list(b = 10000))
## data_env = env(data_env_parent, a = 10, .env = list(b = 10000))
data_mask = new_data_mask(bottom = data_env, top = data_env_parent)
data_mask$.custom_pronoun <- as_data_pronoun(fizz = "buzz")
#> Error in as_data_pronoun(fizz = "buzz"): unused argument (fizz = "buzz")
class(data_mask)
#> [1] "environment"
data_mask %>% names()
#> Error in data_mask %>% names(): could not find function "%>%"
class(data_mask$.env) # not sure if this is a real or a placeholder or something else
#> [1] "rlang_ctxt_pronoun"
# ?eval_tidy tells us that it will add the .data and .env pronouns, so maybe
# it's something else?
data_mask %>% env_parent(n = 1L) %>% names()
#> Error in data_mask %>% env_parent(n = 1L) %>% names(): could not find function "%>%"
data_mask %>% env_parent(n = 2L) %>% names()
#> Error in data_mask %>% env_parent(n = 2L) %>% names(): could not find function "%>%"
data_mask %>% env_parent(n = 3L) %>% env_label()
#> Error in data_mask %>% env_parent(n = 3L) %>% env_label(): could not find function "%>%"
#
get_quo = function(x) {
.data = list(a = 80000)
quo(paste(a, .data$a, .data$.data$a, x, .env$x))
}
my_quo = get_quo(42)
# ?new_data_mask tells us that the tidy eval engine will change the parent of
# `top`. So I'd guess that `eval_tidy` is constructing the following environment
# chain: data_mask (holds pronouns etc.) -> data_env -> data_env_parent -> quo_get_env(my_quo) -> our current environment -> our current environment's ancestors
eval_tidy(my_quo, data = data_mask)
#> [1] "10 8 42 42"
# Oops, I forgot to install `.data` pronoun. ?as_data_mask tells us that
# `as_data_mask` automatically installs `.data`, but `new_data_mask` does not.
# Need to check on my suggested edits to make sure we install it properly.
Created on 2023-06-15 by the reprex package (v2.0.1)
Also, I don't think we can use as_data_pronoun
for .ref_time_value
, it's going to mess up the type, and I don't see what the benefit is anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, looking at this again, I think I overcomplicated it with the top & bottom. I think maybe we should be "installing" our pronoun-like objects into the data mask in the same way as regular pronouns. And regardless, we may need to document differences between the actual-pronoun .data vs. the non-pronoun .x. I'll test this out.
Question: what do you think about disallowing .
in the formula interface? We're faced with .
meaning something different in formulas vs. tidyeval, or meaning something different between slides and mutate/etc. We can't make these all consistent, but we can decide on whether the inconsistency is between two different non-error behaviors, one thing working & the other forbidden, or both forbidden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we wanted .
to be available in the formula interface so that it matches standard formula behavior, i.e. with access to the first arg via .
, .x
, and ..1
. (In reality, I'm not sure .
will be used much.)
.
by itself doesn't mean anything in tidy expressions, right? It's just a prefix to some pronoun/var names. Unless there's a subtlety I'm missing, it doesn't seem inconsistent or confusing to me.
…al implmentation
so developers can use (untracked) renv's at the repo level without as many git and CHECK complaints.
…imevalue-without-env Pass `ref_time_value` to `epix_slide` for functions and formulas
Set DELPHI_EPIDATA_KEY, from a GitHub Secret
and add some unrelated comments regarding time value counting details. Since we're not potentially dealing with R6 objects, we can just us `as_data_mask(.x)`. We can also flatten the data mask by installing the `.x`, `.group_key` and `.ref_time_value` in the same way as that pronouns are installed (except keeping their original classes, not converting to pronouns), rather than using another level in the environment chain for the data mask.
Allow `epi_slide` to access `ref_time_value`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to go ahead and merge this with a tweaked approach to the data masking approach. I also threw in .x
as a not-quite-pronoun in the mask; I'm not sure if this is a design change or a bugfix to my first data mask attempt. Going to go ahead and merge and we can discuss any design issues / further bugfixes in another Issue.
I've failed the GitHub Actions [trigger, because I didn't update the PR base & GitHub didn't do it automatically], so we're not getting checks here. Unfortunate. |
a2c9aa1
into
ndefries/epix-slide-pass-reftimevalue-without-env
Addresses #170 for
epix_slide
for tidy computation via dots.Change the quosure's environment from where it was defined to a child of
f
's environment, which lets the tidy computation use any off
's args by name. Set the arg names to mirror the descriptive names allowed in formula computations. If we want to exactly mirror formula vars (.x
,.
,.y
,.z
) those can be set in the call toenv()
.