-
Notifications
You must be signed in to change notification settings - Fork 67
covidcast
endpoint - no mutual exclusivity check for issue, lag, and as_of fields
#1159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
While its difficult to imagine a truly useful reason for specifying more than one of those arguments (that couldnt be achieved in a simpler way), theyre not technically incompatible. In many cases, a combination of them could make up a perfectly valid query (for some definition of "valid"). Someone might request an This opens up a can of worms: Should we alert if the One exception to this is that |
Hm, you make a great point, especially the part about semantic constraints turning into a complexity slippery slope. From the perspective of minimizing API server complexity, it seems ideal to have the API be as close to an SQL query wrapper as possible (while satisfying other constraints like security/API keys and some minimal argument validation). From the perspective of maximizing API usability, we should probably ask ourselves: is it easy to use our API wrong? In one sense yes, since queries aren't guarded against mistakes. But practicality, I don't think so, since mixing the arguments I listed above is very much an opt-in: a user would have to write out those multiple arguments themselves, but by default these arguments don't collide. So I think I'm convinced that we can leave these alone. This is making me rethink some guards present in We can probably close this issue, but keep it in mind as a record of what our API does when mixing these arguments. Before writing this, I thought that some of these arguments would take precedence over others. |
We [probably shouldn't] leave these completely alone, because they no longer do the right thing in covidcast. I expect an library(epidatr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
cce = covidcast_epidata()
as_of_issue_data1 = cce$signals$`doctor-visits.doctor-visits:smoothed_cli`$call(
"state", "az", 20230101, issues="*", as_of=20230105
) %>% fetch_tbl()
as_of_issue_data2 = cce$signals$`doctor-visits.doctor-visits:smoothed_cli`$call(
"state", "az", 20230101, issues="*", as_of=20230106
) %>% fetch_tbl()
as_of_issue_data1 %>% distinct(issue)
#> # A tibble: 1 × 1
#> issue
#> <date>
#> 1 2023-01-05
as_of_issue_data2 %>% distinct(issue)
#> # A tibble: 1 × 1
#> issue
#> <date>
#> 1 2023-01-06 Created on 2023-05-04 with reprex v2.0.2 [Actually, I would like to use this type of query. It is a convenient way to do reproducible version-aware analyses. (Plus if we ever separate out different types of "versions" (e.g., what the data source said / would/should have said vs. what the API had available), I'd imagine we'd consider pairing Aside on the validation: it would be nice to get some warnings when requesting as_of beyond today / beyond our latest available issue / maybe some similar things. [The beyond-latest-issue check w]ould help detect when we've run something too early in the morning, or on a stale data source.] |
We do specifically say in the documentation to use only one of the three versioning parameters: so I'm in favor of a validation error. @brookslogan I'm not sure what is expected from issue-star+asof; asof is by definition ragged-edge ("the greatest issue less than X for each row") whereas issue queries are more columnar ("all rows posted on this date"). Also want to note, support for issue inequalities ("all issues less than X") are currently in review and should be available soon pending a minor ~enums refactor |
To me, So for
I'm fine with a validation error as well though. |
After thinking about this a bit, I feel fine with enforcing that only one of issue, as_of, lag can be specified at a time. The main reason is that it's just too confusing to reason about what their combinations will do. I don't think it's an overreach into semantic validation to be clear about which arguments are mutually exclusive - it's just a marker of "there be dragons ahead" and a posting of some guardrails, to prevent a user from falling into a pit of said dragons. Case in point, the as_of + issue combo in @brookslogan's example. It's not clear whether it should:
And if I understood @brookslogan's point correctly, the point about "what the
To me, this seems like the kind of validation that belongs in a data analyst's / modeler's repo downstream, not as part of server code. I'm happy for things like this to exist in a modeling package, but to have this be in the server code feels like importing our own assumptions about data usage too far upstream. |
Seems like there is some agreement on adding server-side exclusivity checks. (Asides:
|
Some of these should probably give a validation error.
The text was updated successfully, but these errors were encountered: