-
Notifications
You must be signed in to change notification settings - Fork 8
validation for correct grouping structure #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks @elray1. Great points. I think the first solution is more inline with the philosophy we've been sticking to so far in keeping the data structure pretty minimal and putting the onus on the user (or on functions) downstream to make sure that things satisfy the proper requirements for certain functionality. |
@elray1 Now that the package has developed a bit more, what's your current thinking on this? Is there something to do with the package in terms of new functionality, or new examples (in a vignette), or ?? |
Thanks for the reminder of this. Some thoughts:
|
Good points. So in Notes (to myself, as much as anybody else ...) If
@qpmnguyen @rafaelcatoia Can you take a look at this (my bullet points), both for the sake of improving this for robustness/efficiency, and also for the sake of writing tests around this? For the former, you can open up a separate issue to improve the robustness/efficiency of |
I just submitted a PR that addresses the part of this issue that's related to |
Thanks! Will take a look shortly. @qpmnguyen @rafaelcatoia Having slept on it, I'm not sure it's worth investigating how to get |
As discussed in comments on PR #14, we are asking users to group their data according to any relevant
key
/identifier variables before calling functions likeslide
ordetect_outliers
, so that those functions know to operate on one group of observations at a time. It would be good to check at some point that users have done their grouping correctly, so that e.g. we're sliding over a series representing a signal for a singlegeo_value
or a single combination ofgeo_value
andage_group
. The main way I can think of to check this is to verify that there are no duplicated values oftime_value
within the same group. I could imagine two ways this could go:slide
anddetect_outliers
). For example, maybe something like this:Then, within
slide
,detect_outliers
,correct_outliers
, and any other relevant functions, we'd callvalidate_epi_tibble_grouping(x)
before doing anything else.Example output/start of test case:
epi_tibble
is created, and also validate that the rules are not broken any time the groupings are updated inepitools::group_by.epi_tibble
... and I guess maybe also anything else that may incidentally introduce conflicts, e.g.bind_rows
?I feel hazy enough about how approach 2 would work that I'm thinking the first approach might be safer. But I'm open to ideas about any part of this.
The text was updated successfully, but these errors were encountered: