Skip to content

epi_slide* no column behavior #475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dsweber2 opened this issue Jun 21, 2024 · 6 comments
Open

epi_slide* no column behavior #475

dsweber2 opened this issue Jun 21, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@dsweber2
Copy link
Contributor

dsweber2 commented Jun 21, 2024

When I don't specify a column, epi_slide and epi_slide_mean have pretty different behavior. Example command epi_slide_mean(before = 6L) or epi_slide(mean, before = 6L).

epi_slide_mean outputs a Can't recycle input of size 0 to size 1 error, while epi_slide just returns a ton of NA values.

Ideally they would either

  1. assume the correct target (in the case of 1 non-key column there's an obvious choice)
  2. throw an error saying we need to specify a target (probably the right behavior if there's 2+ columns)

edit: second epi_slide_mean should've been just epi_slide

@dsweber2 dsweber2 added the enhancement New feature or request label Jun 21, 2024
@brookslogan
Copy link
Contributor

brookslogan commented Jun 26, 2024

Couple notes:

  • The cause of the epi_slide_mean() behavior is result_col_names <- paste0("slide_value_", col_names_chr) which doesn't properly vectorize to a 0-length result in the case of a 0-length col_names_chr. [without requiring R >= 4.0.1 and specifying recycle0 = TRUE]
  • In the epi_slide() example, (i) we issue a warning that mean doesn't look like a proper function to be sliding, (ii) mean generates a bunch of NAs + warnings from calculating the mean across an epi_df chunk + group key + ref time value, including non-numeric columns of the epi_df chunk or group key, and (iii) default R behavior collapses all these warnings into a less-noticeable (especially with my editor's coloring settings) "There were 50 or more warnings (use warnings() to see the first 50)".

We can add additional validation/defaults for the epi_slide_mean() case. To stop the epi_slide() case, we'd probably need to turn that warning into an error.

@dsweber2
Copy link
Contributor Author

Ah, I guess I was a bit confused about epi_slide working after including the column name. The lack of parallel is unfortunate; it would be nice if epi_slide accepted this sort of arg that switched it to operating on the specific column.

@brookslogan
Copy link
Contributor

brookslogan commented Jul 9, 2024

The tidyverse way would be to use across(). But that may be in the group of things that really delve into dplyr internals and aren't compatible with (simple usage of) the rlang analogues (that might not even back dplyr ops despite sounding that way??), so allowing across() in epi_slide() might not be simple.

@brookslogan
Copy link
Contributor

In #521 I'm just going for

throw an error saying we need to specify a target

in all cases, plus hinting them to use epi_slide_mean etc. We could reconsider whether we want to do some sort of special behavior when there's only 1 sensible value column, or maybe doing something for every value column if they are all sensible. But I'm just trying to make sure we have a hard & helpful error as a starting point.

brookslogan added a commit that referenced this issue Sep 13, 2024
See #475. We have the ability to catch this consistently now since we are using
`.keep = TRUE`, so the .x into each slide computation is an `epi_df` even if
we're grouping by `geo_value` (previously, .x would have decayed into a tibble
and we shouldn't override tibble's behavior).
brookslogan added a commit that referenced this issue Sep 13, 2024
See #475. We have the ability to catch this consistently now since we are using
`.keep = TRUE`, so the .x into each slide computation is an `epi_df` even if
we're grouping by `geo_value` (previously, .x would have decayed into a tibble
and we shouldn't override tibble's behavior).

Also add missing `group_map` import.
@brookslogan
Copy link
Contributor

brookslogan commented Sep 16, 2024

#521 lacks the epi_slide_mean fix [0 cols being fed into epi_slide_mean. The epi_slide part is "addressed" with an error message.]

@brookslogan
Copy link
Contributor

brookslogan commented Mar 17, 2025

Remaining TODO is the epi_slide_{opt,mean,sum} adjustment. We have some similar logic in revision_summary() we could mirror / borrow code from.

#469 (comment) & maybe elsewhere we have a alternative proposal for the default operating column(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants