Skip to content

Update check functions for clarity and add tests #89

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
May 3, 2023
Merged

Update check functions for clarity and add tests #89

merged 36 commits into from
May 3, 2023

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Apr 24, 2023

This PR rewrites all the check_* functions with checkmate and adds tests.

Changes:

  • consolidate check_single_int_param, check_int_param-> check_int_like_param
  • consolidate check_single_string_param, check_string_param -> check_character_param
  • add check_date_param
  • consolidate check_single_epirange_param, check_epirange_param -> check_timeset_param
  • update check_timeset_param to correctly detect a list of epiranges and to correctly error when given a vector of epiranges

Fixes:

@dshemetov dshemetov changed the base branch from dev to ds/docs April 24, 2023 21:49
Base automatically changed from ds/docs to dev April 24, 2023 21:49
@dajmcdon
Copy link
Contributor

I'm not going to block, but it's probably worth examining the related implementation of similar checks in {epipredict}.

@lcbrooks it may be worth putting these kind of potentially shared functions somewhere common.

@brookslogan
Copy link
Contributor

@dajmcdon Or should we use some prepackaged solution like checkmate? I know that doesn't work for everything, like the epiprange checks, though. Would it work for the epipredict stuff?

@dshemetov
Copy link
Contributor Author

dshemetov commented Apr 25, 2023

I took a glance at checkmate and decided not to use it here mostly because I didn't want to learn a new package while learning Sam's check functions. I might try refactoring with checkmate now that I understand these functions.

Also, thanks Dan, I'll check out similar functions in epipredict.

@brookslogan
Copy link
Contributor

Still need to give this a look over. Brief partial skim: is #36 actually addressed? Or is the code sample showing the old behavior?

@dshemetov
Copy link
Contributor Author

dshemetov commented Apr 27, 2023

Ah good catch, I misunderstood that issue. It's not fixed yet.

EDIT: Fixed now.

Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot of naming questions and polish comments which could be debated/deferred, but already this is a naming improvement and takes care of a few bugs. I think the current logic and naming doesn't match quite well with R's always-vectors approach, naming conventions (helped a lot by your addition of these "single" checker functions), and check function conventions (I think missingness would be handled separately).

thought: I believe [some] related rlang functions combine handle the single/"scalar" case simply by adding on an n argument to the vector case. Though checkmate takes a different approach and tries to use different names and functions for [additional] single/scalar checks. If you feel it'd be more maintainable to reduce the number of functions, you might try the rlang approach.

question: I haven't really given a good check over any actual functionality issues or the tests, but the tests seem pretty extensive so they might already have caught everything. Are you confident this behaves well or should I give it a second pass trying to find bugs?

@dajmcdon
Copy link
Contributor

On naming: perhaps "scalar" rather than "single"?

Copy link
Contributor Author

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Logan and Dan! Going to

  • make some conversions to checkmate
  • rename single -> scalar
  • add Date support

@dshemetov dshemetov requested a review from brookslogan April 28, 2023 22:40
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome to see these simplifications! A couple remaining important things:

  • Make sure we check for single strings / scalar characters appropriately. (Note: I think I've been unclear here... rlang's "string" is a length-1 non-NA character vector, not just a length-1 character vector. We probably want to be forbidding NAs in just about everything, but not a mandatory thing to address right now.)
  • Make sure Date support actually works in requests.

* consolidate check functions, update tests
* allow date values in time args, handle output formatting
* update format_item, format_list, tests
* add basic endpoint tests and request_url tests
@brookslogan
Copy link
Contributor

brookslogan commented May 1, 2023

Awesome, think it's 99% there.

issue: hyphenated date strings don't work. (Or are they supposed to?)

modified_example = covidcast(
   data_source = "jhu-csse",
   signals = "confirmed_7dav_incidence_prop",
   time_type = "day",
   geo_type = "state",
   time_values = epirange("2020-06-01", "2020-08-01"),
   geo_values = "ca,fl"
 ) %>% fetch_tbl()
#> Error: '' does not exist in current working directory ('/path/to/working/dir').

A few things that could be refiled into Issues / other PRs:

todo (non-blocking): add yourself as an author/contributor in DESCRIPTION & redoc. And Kean Ming as well if he's not already there.
issue (non-blocking): the no-API-key warning

To avoid this, you can get your key [here] and then:
ℹ set the environment variable DELPHI_EPIDATA_KEY
ℹ set the option 'delphi.epidata.key'

should have something like

DELPHI_EPIDATA_KEY, or

or

DELPHI_EPIDATA_KEY, and/or

to indicate you don't have to do both.

@dajmcdon
Copy link
Contributor

dajmcdon commented May 1, 2023

To follow on Logan's previous comments:

  1. I would hope that hyphenated date strings would work with epirange() (but fine to add an issue for later).
  2. I would update the message to say exactly how to set the API key. Something like Call Sys.setenv(DELPHI_EPIDATA_KEY = blah) (or whatever is the right way to do this).

@dshemetov
Copy link
Contributor Author

I can probably make hyphenated date strings work with epirange, but there's currently a server bug that makes them not work.

@dajmcdon
Copy link
Contributor

dajmcdon commented May 1, 2023

Can they just be processed to numeric on the R side?

@dshemetov
Copy link
Contributor Author

So far I have avoided processing character / integer inputs as dates, hoping that we can rely on the API to return validation errors. But I could give date parsing a shot.

dshemetov added 2 commits May 2, 2023 17:19
* add timeset help topic
* replace links to epirange fields to timeset
* add parse_timeset_input function that will parse/validate date-like inputs
* thread parse_timeset_input for all epirange-like endpoint arguments
* update tests
@dshemetov
Copy link
Contributor Author

dshemetov commented May 3, 2023

Added a date validation function. This query works now

r$> modified_example = covidcast( 
           data_source = "jhu-csse", 
           signals = "confirmed_7dav_incidence_prop", 
           time_type = "day", 
           geo_type = "state", 
           time_values = list(epirange("2020-06-01", "2020-06-03"), epirange("2020-07-01", "20200703")),
           geo_values = "ca,fl" 
         ) %>% fetch_tbl() 
                                                                                                                                                                                                                                                  
r$> modified_example
# A tibble: 12 × 15
   geo_value signal                        source   geo_type time_type time_value direction issue        lag missing_value missing_stderr missing_sample_size value stderr sample_size
   <chr>     <chr>                         <chr>    <ord>    <ord>     <date>         <dbl> <date>     <int>         <int>          <int>               <int> <dbl>  <dbl>       <dbl>
 1 ca        confirmed_7dav_incidence_prop jhu-csse state    day       2020-06-01        NA 2023-03-10  1012             0              5                   5  6.84     NA          NA
 2 fl        confirmed_7dav_incidence_prop jhu-csse state    day       2020-06-01        NA 2023-03-03  1005             0              5                   5  3.34     NA          NA
 3 ca        confirmed_7dav_incidence_prop jhu-csse state    day       2020-06-02        NA 2023-03-10  1011             0              5                   5  6.83     NA          NA
 4 fl        confirmed_7dav_incidence_prop jhu-csse state    day       2020-06-02        NA 2023-03-03  1004             0              5                   5  3.41     NA          NA
 5 ca        confirmed_7dav_incidence_prop jhu-csse state    day       2020-06-03        NA 2023-03-10  1010             0              5                   5  6.66     NA          NA
 6 fl        confirmed_7dav_incidence_prop jhu-csse state    day       2020-06-03        NA 2023-03-03  1003             0              5                   5  4.03     NA          NA
 7 ca        confirmed_7dav_incidence_prop jhu-csse state    day       2020-07-01        NA 2023-03-10   982             0              5                   5 17.1      NA          NA
 8 fl        confirmed_7dav_incidence_prop jhu-csse state    day       2020-07-01        NA 2023-03-03   975             0              5                   5 32.9      NA          NA
 9 ca        confirmed_7dav_incidence_prop jhu-csse state    day       2020-07-02        NA 2023-03-10   981             0              5                   5 18.0      NA          NA
10 fl        confirmed_7dav_incidence_prop jhu-csse state    day       2020-07-02        NA 2023-03-03   974             0              5                   5 36.2      NA          NA
11 ca        confirmed_7dav_incidence_prop jhu-csse state    day       2020-07-03        NA 2023-03-10   980             0              5                   5 17.6      NA          NA
12 fl        confirmed_7dav_incidence_prop jhu-csse state    day       2020-07-03        NA 2023-03-03   973             0              5                   5 36.6      NA          NA

@brookslogan brookslogan self-requested a review May 3, 2023 17:53
lcbrooks added 3 commits May 3, 2023 12:00
since these invalid inputs might be attempts at epiweek inputs, not just dates
`c(epirange1, epirange2)` currently outputs a named, non-EpiRange vector or 4
that we wouldn't treat appropriately. Make this output an error about having
names (could be clearer, but this also might catch some other mistakes / false
expectations).

Since "scalars" are vectors in R, don't use `is_vector` to try to detect what we
might think of as vectors.  `format_item` is already mostly set up to handle
vectors, so just make it appropriately handle Date vectors and send it all
non-list vectors.
@brookslogan
Copy link
Contributor

Pushed some changes that I believe resolve the remaining comments. Please check over & merge.

@brookslogan brookslogan self-requested a review May 3, 2023 19:14
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to request @dshemetov to review my tweaks since he's the PR author. Just marking approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants