Skip to content

Consolidate fetch interfaces #99

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
May 23, 2023
Merged

Consolidate fetch interfaces #99

merged 42 commits into from
May 23, 2023

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented May 6, 2023

A list of changes:

  • add the fetch function, which most of the time calls fetch_tbl and calls fetch_classic if the endpoint is only_supports_classic (this is delphi, pvt_norostat_meta, and meta)
  • fetch_tbl now uses fetch_classic to make requests (instead of CSV)
  • unexport fetch_csv, fetch_classic, fetch_tbl
  • remove fetch_json
  • add unexported fetch_debug for response debugging
  • request_impl now always raises on HTTP errors, forwards the HTML body to the error
  • fetch always raises for epidata errors, including no results
  • test and check for differences in the tibbles output by fetch_tbl and fetch_csv (identical on a few small queries)
  • many updates to documentation and vignettes
  • add mockery and mockr to Suggests
  • add new import xml2 to Depends
  • export as_tibble

TODO:

  • add more a few more unit tests to the fetch interfaces

Fixes:

dshemetov added 2 commits May 5, 2023 17:18
* add a universal fetch interface that calls fetch_classic or fetch_json
* test and compare fetch_tbl and fetch_csv
* deprecate and unexport fetch_csv, fetch_json
* redo documentation and vignettes
* export as_tibble
* add mockery and mockr to Suggests
dshemetov and others added 15 commits May 5, 2023 17:57
* Favor `expect_identical` over `expect_equal`, as `expect_identical` is stricter,
  though in testthat edition 3.0, not as strict as `identical`.
* Avoid `dplyr::all_equal`, which was even more (and excessively) lenient and is
  now deprecated.
* Remove description of nonexistent parameter.
* Apply some markdown formatting.
* Keep developers' .Renviron's from interfering with the auth tests.
* Add test characterizing how we have the env var take precedence over the
  option.
so that the "Value:" heading in the documentation will appear as a bulleted list
naming each function documented in this topic and what its return value looks
like.
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • praise: interface is looking more streamlined and slimmer
  • praise: looks like people can't miss epidata errors now!
  • issue: need to warn on epidata warnings in message.
  • todo: figure out whether epidata statuses or http statuses should be checked first. (see API keys doc for key situation. another one is when results are truncated, especially in older endpoints where the limits may be very small; maybe this should actually be an error because there's not a clear bug-free way to take advantage of any of the results, unless someone just wanted a preview, but then they could/should just ask for less data) [if there is http error, there will not be an epidata status; the content will be html with an error message]
  • question: what is return code 2? is that for truncated results?
  • todo: fix "1 for success, non-zero for failure"
  • [transferred below] question: (answer depends on some of the above questions/changes) what's the idea for fetch_classic() regarding errors and warnings? Right now it seems like a kind of mixed thing that stops in some cases, but still returns status&message info when it succeeds. Can we make it more consistent? (E.g., try to stop on all errors, warn on all warnings, and return $epidata?)
  • suggestion: rename "method = c("data.frame", "csv") to something like "via = c("classic", "csv") or encoding = c("classic", "csv")
  • question: is "fixtures" a standard term for these test prototype objects? testthat docs seem to use "fixture" to refer to local_*() functions
  • issue: (potentially; might be server thing) delphi(system = "ec", epiweek = 202006) %>% fetch() and delphi(system = "ec", epiweek = 202006) %>% fetch_classic() return 500s
  • question: Sorry, I'm so confused right now. I see in test comments and now code that "fetch_classic uses jsonlite::fromJSON, which converts the underlying data to a data.frame". But I thought we wanted this to give a nested list format on the only-classic things; we'd just want the old client's way of getting content for this function, and only use fromJSON when fetch_tbl-ing via the "classic" transfer-format/encoding, right? Or am I misunderstanding what fetch_classic output-format is supposed to be now?
  • todo: clarify/fix "fetch_csv uses readr::read_csv"; I assume it's fetch_tbl(...., format = "csv") that does this

Stopping here because I'm confused and may just be writing unhelpful things at this point. I haven't given the test based on the fixture objects a good look through yet, think I'll just muddle things further.

  • note: okay, now I see the disable_data_frame_parsing parameter. But I assume some statements I mentioned above may need clarified by mentioning this parameter or something along those lines.
  • note: I also committed a few minor changes to do with a dplyr deprecation and what I think current testthat practices are, roxygen tweaks, various wording tweaks

@dshemetov
Copy link
Contributor Author

dshemetov commented May 12, 2023

  • you can't check epidata status without checking HTTP status first, since an HTTP400+ won't return any epidata codes, so HTTP should go first
  • are HTTP statuses 200 < x < 400 possible from our server? do they carry useful information?
  • in fetch_classic, the error handling is: (1) httr::stop_for_status should raise an R error if we get an HTTP error, (2) if no HTTP error, look at epidata status code and raise an error with the epidata message if the the status code is not one of 1 success or -2 no results, (3) otherwise return the request (along with the status code and message). this seems pretty consistent to me?
  • wrt epidata warnings: should we catch other epidata status codes? is there even documentation for epidata status codes? i have a feeling we'll have to dig through server code to find these
  • i'm not planning on keeping the method = c("data.frame", "csv") around, it's there for testing atm; i'd prefer to do it in just one way
  • fixtures seems to be a standard unit testing term for functions that handle repetitive tasks in a clean and isolated way; it's probably a stretch to call test data files fixtures, I can rename the dir to data or something
  • the comments in the tests are confusing; what I wrote as fetch_classic was supposed to be fetch_tbl, i'll fix that now

@brookslogan
Copy link
Contributor

A couple error/warning cases I can think of:

  • Truncated results due to row limit. This should at least have a non-"success" message, but I don't know if there's a standard result code.
  • I see a line if (r$result != 1 && r$result != 2) { so I assume something's returning 2 as a non-error thing. Maybe this is the result code for the truncated results?
  • Probably any non-"success" message should be turned into an R message or warning.
  • I'm not sure what will happen with API keys. We say we will return something like a 401 or 429 but also say there will be an epidata message. I believe both are simultaneously possible (returning an http error code + also content). But if httr stop_for_status stops us early, we might not show the user the explanatory error message.
  • Don't know of any codes between 200 and 400 exclusive, but that's just because I have no memory.

@brookslogan
Copy link
Contributor

brookslogan commented May 12, 2023

Regarding fixtures: Thanks for the link. I had read some pytest fixture description before but it just sounded like testthat's definition, but now looking at examples, they do look more like fixed objects if you don't use extra features. Don't feel too strongly about the naming here, just hoped there was something more specific/clearer because just googling gave me a lot of context/intro and I couldn't quickly find just a simple definition. (Note that in testthat you can define objects in a test file outside calls to test_that and they can be used within the rest of the test_thats in the file, but won't be available in other test files. This doesn't necessarily help in the current situation as this feature alone wouldn't completely eliminate the API queries, but thought I'd mention it because I didn't see this pattern yet in epidatr tests.)

@dshemetov
Copy link
Contributor Author

dshemetov commented May 13, 2023

Sorry, the r$results != 2 is a typo from me. Meant to be -2.

Will need to dig into server stuff to see if we send messages for truncated results. I have a hunch that this was a TODO item and currently we just truncate without warning.

@brookslogan brookslogan self-requested a review May 22, 2023 15:15
lcbrooks added 6 commits May 22, 2023 09:13
* `sort` and `arrange` will still deal with these in the same way
* `print` will no longer have distracting (and untrue in some cases for
  `geo_type`, since we don't have a pure hierarchy) `<`'s
* `contrasts` will output indicators for the 2nd--last levels, rather than
  something more complex; see
  https://colinfay.me/intro-to-r/statistical-models-in-r.html#Contrasts.

We might also consider just plain character vectors for these two, to prevent
certain completion utilities from doing some annoying things by default.
Add a test for this warning. Additionally, fix `mockery::stub` usage to do the
intended thing for `fetch_tbl` in related tests; fix looks a little hacky, but
stubbing with `depth = 2` within `fetch_tbl` didn't seem to work.
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Made a few edits; please give them a sanity check / fixup.

@@ -108,25 +108,25 @@ print.covidcast_data_source <- function(source, ...) {
#' @export
covidcast_epidata <- function(base_url = global_base_url) {
url <- join_url(base_url, "covidcast/meta")
result <- do_request(url, list())
response <- do_request(url, list())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: These clarifications are great! When talking about these things, we should try to match this terminology, though we might need to also say "http response" or "epidata response" sometimes. Though I'm still going to mix up result and response for a while though.

# fetch_debug(format_type = "classic") %>%
# readr::write_rds(testthat::test_path("data/test-classic.rds"))
mockery::stub(fetch_classic, "httr::content", readRDS(testthat::test_path("data/test-classic.rds")))
mockery::stub(fetch_tbl, "fetch_classic", fetch_classic)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: just the single stub wasn't working, at least on my system. See a little discussion at 4b6f964 and #110.

#'
#' @export
fetch_json <- function(epidata_call, fields = NULL, disable_date_parsing = FALSE) {
fetch_csv <- function(epidata_call, fields = NULL, disable_date_parsing = FALSE, disable_tibble_output = FALSE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: can we eliminate fetch_csv altogether? Or is it used internally somewhere / otherwise desirable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can. I kept it around just so we had a test reference for the new fetch_tbl.

@brookslogan
Copy link
Contributor

Going to go ahead and merge this, so we have warnings and the new interface for API keys rollout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment