Make tidy sliding play nicely with data frame inputs #55

ryantibs · 2022-03-10T14:35:33Z

Tidy sliding is pretty convenient. But in some applications, I might want to actually be sliding a function that takes an entire data frame rather than pass in individual columns as arguments.

For example, let's suppose my_fun() is a complicated function that I've written that acts on a whole data frame, whose signature is:

my_fun(df, args)

Then I could do:

x %>% 
  epi_slide(f = my_fun, args = my_args, n = 120, new_col_name = "fc")

But I could NOT easily do some tidy sliding, because there's currently no way to get the tidy eval thing I've written to understand "pass the sliding data frame". It only understands "pass these sliding variables".

Ideally, we'd code in some special variable name that epi_slide() (and analogously epix_slide()) understands and treats in a special way, say .cur_df (prefixing a dot so that it doesn't conflict with variable names in the environment) and then we could do:

x %>% 
  epi_slide(fc = my_fun(.cur_df, args = my_args), n = 120)

which is more intuitive/convenient for folks who like tidy code.

@lcbrooks @dajmcdon @jacobbien @elray1 what do you think? Anybody want to take a stab at implementing it?

The text was updated successfully, but these errors were encountered:

brookslogan · 2022-03-10T15:27:00Z

We just ran into this as part of cmu-delphi/epipredict#8 and cmu-delphi/epipredict#3. Definitely want this in one way or another.

Thinking about the implementation makes me wonder if the interface of epi_slide is a bit complicated and whether we'd want to try to simplify now. Currently we have at least the function, formula, and missing(= quosure stuff on dots) cases for f, and we are looking at ways to directly reference both the .cur_df and its columns by name. And we might think about another case where we want to take .cur_df, transmute it to get multiple columns, and have those multiple columns appear in the result of the slide (although maybe this is just a list-column and a dplyr::unnest away for the user). Maybe it'd be simpler to only accept f as a function or formula from .cur_df to a list/df of results?

Implementation: I could work on this after forecaster calibration exploration, but that will take a bit.

.cur_df approach:

formula f case: not sure how [Ryan pointed out it's already handled; user get the df as.x]
function f case: not sure how [Ryan pointed out it's already handled; user gets the df]
missing f case (tidy case): this may just involve constructing a env arg for eval_tidy. Currently we use the default (might be a bug --- it seems like this would use some of the epi_slide internals in the env --- this might need to be eval_tidy(quo, x, caller_env())). That would change to eval_tidy(quo, x, eval_env), where eval_env is a child of caller_env() with the extra .cur_df binding set. [But maybe we need to look at why rlang uses .data and .env pronouns in their own namespaces rather than doing child environment construction.] [from the future: eval_tidy(quo, x, eval_env) isn't right; either first arg should be an expr and third arg should be passed, or first arg should be a quosure and the third arg should not be there.]

ryantibs · 2022-03-10T15:48:46Z

Thanks for the response. As for the epipredict issues: that's partly why I thought of it. Though I'm in the middle of writing a long comment on cmu-delphi/epipredict#8 that's going to argue against letting the user pass a straight data frame. So I think we should think of the use case in the current issue as just one we'd want to support in general.

Responses are helpful but I think may be a bit of confusion here? I don't see a way to simplify epi_slide() without removing functionality. I would not want to remove tidy sliding from epi_slide(). That's the most convenient way of doing things for many simple applications of sliding.

And the multicolumn case you're describing I believe would be handed by returning a list column and using the default argument as_list_col = FALSE, which unnests it in the output.

So to be clear, the .cur_df approach is ONLY intended to be for tidy sliding. When using a function or a formula you won't need it. When using a function, it already receives the entire data frame. When using a formula, you can access the entire data frame with .x (this is provided by slider package).

When using tidy sliding, I'm proposing you can access the entire data frame with .cur_df. But maybe we should change it to be .x to be consistent with the formula case.

ryantibs · 2022-03-10T16:12:02Z

Just following up: the analogous implementation (whatever we do) should be something we carry over to epix_slide().

And in thinking that through, I realized I never implemented a formula option in epix_slide()! (Though the documentation claims it's possible, I just forgot to do it. You get it for free in epi_slide() because of slider.)

Should we implement that? Would give the greatest degree of symmetry in between the two sliding functions which I think we should try to maintain.

brookslogan · 2022-03-10T16:12:40Z

Thanks for clearing up my confusion on the current functionality.

So maybe "simplifying" would just be removing useful functionality. For [(epipred, other generalized)] coding on top of epi_slide we also want the formula/function interface[:] e.g., epi_slide(df, ~ .x %>% summarize(a = ......, b = ......., n = .....)) doesn't try to match n to the epi_slide argument rather than making a new column.

Implementation detail-wise we might need to check why rlang uses "pronouns" rather than child-env-creation for .data and .env. Maybe it's not as simple as I hoped.

ryantibs · 2022-03-10T16:25:53Z

Sorry, I'm not sure I understand your current perspective.

Are you suggesting we get rid of tidy evaluation for sliding?
Are you suggesting we get rid of function/formulation evaluation for sliding?

I see a use case for both, and now given your implementation notes (thank you, helpful), I'm actually leaning towards doing nothing further right now for tidy sliding---meaning, tidy sliding can only access variables by name.

If you ever want to "get this whole data frame", then you should be able to do that with a formula or function, and that's one of their main use cases. If we wanted to do something as advanced as what you're suggesting, in your last comment (with piping) then it's best to use a function. I don't think pipes play nicely with formulas (my impression based on basic experimentation just now).

Thoughts?

ryantibs · 2022-03-10T17:18:04Z

p.s. I fixed the formula problem (that epix_slide() doesn't support it) in #58. Will merge a bit later.

If you think it's more fluid we can talk about this and other issues on the call later today. Thanks!

brookslogan · 2022-03-10T17:19:38Z

Neither now, sorry for the noise. As you said, both have use cases.

The tidy eval + .cur_df does still seem like a convenient feature, but low priority compared, e.g., to #58.

ryantibs · 2022-03-10T17:21:31Z

No worries, your thoughts & discussions points are helpful (here and always). So I think we can leave this issue open but plan not to attack it until we feel that it's risen to a much higher priority.

brookslogan · 2022-10-21T16:27:49Z

Note that some dplyr operations support ~~cur_data(), cur_data_all()~~ [deprecated; now use pick()], and some other functions to solve this problem. However:

I don't think group_modify is one of them
While we could use summarize instead of group_modify, its cur_data* functions give plain tibbles rather than epi_dfs, which differs from what we feed f functions/formulas
Using cur_data requires extra thought about nesting of dplyr pipelines; naive approaches can sometimes refer to data from the wrong pipeline.

brookslogan · 2024-05-03T22:13:43Z

--- CURRENT STATE: ---
epi_slide() and epix_slide() support .data pronoun in tidy computations. However, pronouns do not support full dplyr operations, and our slide functions do not support pick() to get actual data frames. [But we have a pronoun-like object .x with the full operations available now.]

ryantibs added the enhancement New feature or request label Mar 10, 2022

ryantibs mentioned this issue Mar 10, 2022

Formulas in epix_slide #58

Merged

dshemetov added the P2 low priority label Mar 10, 2022

brookslogan mentioned this issue Jul 27, 2022

Add way to refer to ref_time_value within f arg to epix_slide #170

Closed

brookslogan mentioned this issue Oct 21, 2022

Allow multiple column computations in data masking slides #235

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make tidy sliding play nicely with data frame inputs #55

Make tidy sliding play nicely with data frame inputs #55

ryantibs commented Mar 10, 2022 •

edited

Loading

brookslogan commented Mar 10, 2022 •

edited

Loading

ryantibs commented Mar 10, 2022 •

edited

Loading

ryantibs commented Mar 10, 2022

brookslogan commented Mar 10, 2022 •

edited

Loading

ryantibs commented Mar 10, 2022

ryantibs commented Mar 10, 2022 •

edited

Loading

brookslogan commented Mar 10, 2022

ryantibs commented Mar 10, 2022

brookslogan commented Oct 21, 2022 •

edited

Loading

brookslogan commented May 3, 2024 •

edited

Loading

Make tidy sliding play nicely with data frame inputs #55

Make tidy sliding play nicely with data frame inputs #55

Comments

ryantibs commented Mar 10, 2022 • edited Loading

brookslogan commented Mar 10, 2022 • edited Loading

ryantibs commented Mar 10, 2022 • edited Loading

ryantibs commented Mar 10, 2022

brookslogan commented Mar 10, 2022 • edited Loading

ryantibs commented Mar 10, 2022

ryantibs commented Mar 10, 2022 • edited Loading

brookslogan commented Mar 10, 2022

ryantibs commented Mar 10, 2022

brookslogan commented Oct 21, 2022 • edited Loading

brookslogan commented May 3, 2024 • edited Loading

ryantibs commented Mar 10, 2022 •

edited

Loading

brookslogan commented Mar 10, 2022 •

edited

Loading

ryantibs commented Mar 10, 2022 •

edited

Loading

brookslogan commented Mar 10, 2022 •

edited

Loading

ryantibs commented Mar 10, 2022 •

edited

Loading

brookslogan commented Oct 21, 2022 •

edited

Loading

brookslogan commented May 3, 2024 •

edited

Loading