Consolidate `fetch` interfaces #99

dshemetov · 2023-05-06T00:27:40Z

A list of changes:

add the fetch function, which most of the time calls fetch_tbl and calls fetch_classic if the endpoint is only_supports_classic (this is delphi, pvt_norostat_meta, and meta)
fetch_tbl now uses fetch_classic to make requests (instead of CSV)
unexport fetch_csv, fetch_classic, fetch_tbl
remove fetch_json
add unexported fetch_debug for response debugging
request_impl now always raises on HTTP errors, forwards the HTML body to the error
fetch always raises for epidata errors, including no results
test and check for differences in the tibbles output by fetch_tbl and fetch_csv (identical on a few small queries)
many updates to documentation and vignettes
add mockery and mockr to Suggests
add new import xml2 to Depends
export as_tibble
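
The dispatch described in the first item can be sketched as follows. This is a minimal illustration with plain lists and stub functions, not the epidatr internals: `fetch_sketch`, `fetch_tbl_sketch`, `fetch_classic_sketch`, and the shape of the call object are hypothetical; only the `only_supports_classic` flag and the endpoint names come from the list above.

```r
# Hypothetical stand-ins for the two lower-level fetchers.
# They only record which path was taken; no request is made.
fetch_classic_sketch <- function(call) {
  list(kind = "classic", endpoint = call$endpoint)
}
fetch_tbl_sketch <- function(call) {
  list(kind = "tbl", endpoint = call$endpoint)
}

# Hypothetical universal entry point: use the classic path only for
# endpoints flagged as classic-only, otherwise return a tibble-style result.
fetch_sketch <- function(call) {
  if (isTRUE(call$only_supports_classic)) {
    fetch_classic_sketch(call)
  } else {
    fetch_tbl_sketch(call)
  }
}

# Assumed call objects: plain lists carrying the flag.
meta_call <- list(endpoint = "meta", only_supports_classic = TRUE)
covidcast_call <- list(endpoint = "covidcast", only_supports_classic = FALSE)

fetch_sketch(meta_call)$kind
fetch_sketch(covidcast_call)$kind
```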

TODO:

add a few more unit tests to the fetch interfaces

Fixes:

fixes Consolidate fetch_* interfaces #72
fixes classic only endpoints should return plain JSON, but fromJSON auto converts to data.frame #88
fixes Some fetching functions don't appropriately stop/warn on epidata msgs/status #83
fixes API call returns data frame instead of string #58
fixes Ambiguous error when using fetch_tbl on a null response #17
fixes fetch_tbl() warning from vroom sometimes (when NA's are present?) #106 (not sure how, might be related to switching away from readr::read_csv to jsonlite::fromJSON)

* add a universal fetch interface that calls fetch_classic or fetch_json
* test and compare fetch_tbl and fetch_csv
* deprecate and unexport fetch_csv, fetch_json
* redo documentation and vignettes
* export as_tibble
* add mockery and mockr to Suggests

* Favor `expect_identical` over `expect_equal`, as `expect_identical` is stricter, though in testthat edition 3.0, not as strict as `identical`.
* Avoid `dplyr::all_equal`, which was even more (and excessively) lenient and is now deprecated.
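
The difference in strictness can be seen with base R alone. This is only a loose analogy, assuming `all.equal()` stands in for the more lenient style of comparison and `identical()` for the strictest one; testthat's own comparisons differ in detail.

```r
# An integer and a double holding the same value.
x <- 1L
y <- 1

# Lenient: all.equal() compares numeric values within a tolerance
# and does not distinguish integer from double storage.
lenient <- isTRUE(all.equal(x, y))

# Strict: identical() also requires the same storage type.
strict <- identical(x, y)

lenient
strict
```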

R/model.R

brookslogan

praise: interface is looking more streamlined and slimmer
praise: looks like people can't miss epidata errors now!
issue: need to warn on epidata warnings in message.
todo: figure out whether epidata statuses or http statuses should be checked first. (see API keys doc for key situation. another one is when results are truncated, especially in older endpoints where the limits may be very small; maybe this should actually be an error because there's not a clear bug-free way to take advantage of any of the results, unless someone just wanted a preview, but then they could/should just ask for less data) [if there is http error, there will not be an epidata status; the content will be html with an error message]
question: what is return code 2? is that for truncated results?
todo: fix "1 for success, non-zero for failure"
[transferred below] question: (answer depends on some of the above questions/changes) what's the idea for fetch_classic() regarding errors and warnings? Right now it seems like a kind of mixed thing that stops in some cases, but still returns status&message info when it succeeds. Can we make it more consistent? (E.g., try to stop on all errors, warn on all warnings, and return $epidata?)
suggestion: rename `method = c("data.frame", "csv")` to something like `via = c("classic", "csv")` or `encoding = c("classic", "csv")`
question: is "fixtures" a standard term for these test prototype objects? testthat docs seem to use "fixture" to refer to local_*() functions
issue: (potentially; might be server thing) delphi(system = "ec", epiweek = 202006) %>% fetch() and delphi(system = "ec", epiweek = 202006) %>% fetch_classic() return 500s
question: Sorry, I'm so confused right now. I see in test comments and now code that "fetch_classic uses jsonlite::fromJSON, which converts the underlying data to a data.frame". But I thought we wanted this to give a nested list format on the only-classic things; we'd just want the old client's way of getting content for this function, and only use fromJSON when fetch_tbl-ing via the "classic" transfer-format/encoding, right? Or am I misunderstanding what fetch_classic output-format is supposed to be now?
todo: clarify/fix "fetch_csv uses readr::read_csv"; I assume it's fetch_tbl(...., format = "csv") that does this

Stopping here because I'm confused and may just be writing unhelpful things at this point. I haven't given the test based on the fixture objects a good look through yet, think I'll just muddle things further.

note: okay, now I see the disable_data_frame_parsing parameter. But I assume some statements I mentioned above may need clarified by mentioning this parameter or something along those lines.
note: I also committed a few minor changes to do with a dplyr deprecation and what I think current testthat practices are, roxygen tweaks, various wording tweaks
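
The auto-conversion discussed above can be reproduced with jsonlite directly. The JSON payload below is made up for illustration, and whether `disable_data_frame_parsing` maps onto `simplifyVector = FALSE` inside epidatr is an assumption, not something confirmed here.

```r
library(jsonlite)

# Made-up payload shaped like a classic epidata response.
txt <- '{"result": 1, "message": "success",
         "epidata": [{"geo_value": "ca", "value": 1.5},
                     {"geo_value": "ny", "value": 2.5}]}'

# Default parsing simplifies the array of records into a data.frame.
as_df <- fromJSON(txt)

# With simplification turned off, the records stay a nested list.
as_list <- fromJSON(txt, simplifyVector = FALSE)

is.data.frame(as_df$epidata)
is.data.frame(as_list$epidata)
as_list$epidata[[1]]$geo_value
```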

dshemetov · 2023-05-12T19:03:35Z

you can't check epidata status without checking HTTP status first, since an HTTP400+ won't return any epidata codes, so HTTP should go first
are HTTP statuses 200 < x < 400 possible from our server? do they carry useful information?
in fetch_classic, the error handling is: (1) httr::stop_for_status should raise an R error if we get an HTTP error, (2) if no HTTP error, look at epidata status code and raise an error with the epidata message if the status code is not one of 1 success or -2 no results, (3) otherwise return the request (along with the status code and message). this seems pretty consistent to me?
wrt epidata warnings: should we catch other epidata status codes? is there even documentation for epidata status codes? i have a feeling we'll have to dig through server code to find these
i'm not planning on keeping the method = c("data.frame", "csv") around, it's there for testing atm; i'd prefer to do it in just one way
fixtures seems to be a standard unit testing term for functions that handle repetitive tasks in a clean and isolated way; it's probably a stretch to call test data files fixtures, I can rename the dir to data or something
the comments in the tests are confusing; what I wrote as fetch_classic was supposed to be fetch_tbl, i'll fix that now
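
The three-step order described above for fetch_classic can be sketched in base R. This is not the package code: `check_response_sketch` is a hypothetical helper that takes an already-extracted HTTP status and a parsed body, where the real function works on an httr response.

```r
# Hypothetical helper mirroring the order of checks described above.
check_response_sketch <- function(http_status, body) {
  # (1) HTTP errors come first: a 400+ response carries no epidata status.
  if (http_status >= 400) {
    stop("HTTP error ", http_status, call. = FALSE)
  }
  # (2) Epidata status: 1 means success, -2 means no results;
  #     anything else is raised with the epidata message.
  if (!(body$result %in% c(1, -2))) {
    stop("epidata error: ", body$message, call. = FALSE)
  }
  # (3) Otherwise return the body, including status code and message.
  body
}

ok <- check_response_sketch(
  200L,
  list(result = 1, message = "success", epidata = list())
)
ok$message
```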

brookslogan · 2023-05-12T21:24:30Z

A couple error/warning cases I can think of:

Truncated results due to row limit. This should at least have a non-"success" message, but I don't know if there's a standard result code.
I see a line if (r$result != 1 && r$result != 2) { so I assume something's returning 2 as a non-error thing. Maybe this is the result code for the truncated results?
Probably any non-"success" message should be turned into an R message or warning.
I'm not sure what will happen with API keys. We say we will return something like a 401 or 429 but also say there will be an epidata message. I believe both are simultaneously possible (returning an http error code + also content). But if httr stop_for_status stops us early, we might not show the user the explanatory error message.
Don't know of any codes between 200 and 400 exclusive, but that's just because I have no memory.
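
Two of the cases above can be sketched in base R: keeping the server's explanatory text when an HTTP error is raised, and turning a non-"success" message into an R warning. Both helpers and the example messages ("rate limit exceeded", "results truncated") are hypothetical; the actual server messages are not known here.

```r
# Hypothetical helper: raise on HTTP errors, but keep any explanatory
# text the server sent along with the error status (e.g. a 401 or 429).
stop_for_http_sketch <- function(http_status, body_message = NULL) {
  if (http_status >= 400) {
    detail <- if (is.null(body_message)) "" else paste0(": ", body_message)
    stop("HTTP ", http_status, detail, call. = FALSE)
  }
  invisible(http_status)
}

# Hypothetical helper: surface any non-"success" message as a warning,
# then return the data so the caller can still use it.
warn_on_message_sketch <- function(body) {
  if (!is.null(body$message) && !identical(body$message, "success")) {
    warning("epidata warning: ", body$message, call. = FALSE)
  }
  body$epidata
}

# A successful body passes through silently.
warn_on_message_sketch(list(message = "success", epidata = 1:3))
```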

brookslogan · 2023-05-12T21:35:55Z

Regarding fixtures: Thanks for the link. I had read some pytest fixture description before but it just sounded like testthat's definition, but now looking at examples, they do look more like fixed objects if you don't use extra features. Don't feel too strongly about the naming here, just hoped there was something more specific/clearer because just googling gave me a lot of context/intro and I couldn't quickly find just a simple definition. (Note that in testthat you can define objects in a test file outside calls to test_that and they can be used within the rest of the test_thats in the file, but won't be available in other test files. This doesn't necessarily help in the current situation as this feature alone wouldn't completely eliminate the API queries, but thought I'd mention it because I didn't see this pattern yet in epidatr tests.)
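
The file-level pattern mentioned in the parenthetical looks roughly like this. The object and test names are invented for illustration, and the snippet assumes testthat is installed.

```r
library(testthat)

# Defined once at the top of a (hypothetical) test file, outside test_that().
# Every test_that() block later in the same file can read it;
# other test files cannot.
expected_tbl <- data.frame(
  geo_value = c("ca", "ny"),
  value = c(1.5, 2.5)
)

test_that("shared object is visible in the first test", {
  expect_identical(nrow(expected_tbl), 2L)
})

test_that("shared object is visible in the second test", {
  expect_identical(names(expected_tbl), c("geo_value", "value"))
})
```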

dshemetov · 2023-05-13T04:35:39Z

Sorry, the r$results != 2 is a typo from me. Meant to be -2.

Will need to dig into server stuff to see if we send messages for truncated results. I have a hunch that this was a TODO item and currently we just truncate without warning.

* `sort` and `arrange` will still deal with these in the same way
* `print` will no longer have distracting (and untrue in some cases for `geo_type`, since we don't have a pure hierarchy) `<`'s
* `contrasts` will output indicators for the 2nd--last levels, rather than something more complex; see https://colinfay.me/intro-to-r/statistical-models-in-r.html#Contrasts.

We might also consider just plain character vectors for these two, to prevent certain completion utilities from doing some annoying things by default.
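
The three points can be checked in base R by comparing an ordered factor with a plain one. The level names here are illustrative only and are not the full set used by the package.

```r
geo_levels <- c("county", "state", "nation") # illustrative levels only
values <- c("state", "county", "nation")

ordered_geo <- factor(values, levels = geo_levels, ordered = TRUE)
plain_geo <- factor(values, levels = geo_levels)

# Sorting follows the level order in both cases.
sorted_ordered <- as.character(sort(ordered_geo))
sorted_plain <- as.character(sort(plain_geo))

# Printing: only the ordered factor shows `<` between its levels.
print(ordered_geo)
print(plain_geo)

# Default contrasts: indicator (treatment) columns for the plain factor,
# polynomial columns for the ordered one.
contrasts(plain_geo)
contrasts(ordered_geo)
```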

brookslogan

Looks good! Made a few edits; please give them a sanity check / fixup.

brookslogan · 2023-05-22T15:35:25Z

R/covidcast.R

@@ -108,25 +108,25 @@ print.covidcast_data_source <- function(source, ...) {
 #' @export
 covidcast_epidata <- function(base_url = global_base_url) {
  url <- join_url(base_url, "covidcast/meta")
-  result <- do_request(url, list())
+  response <- do_request(url, list())


praise: These clarifications are great! When talking about these things, we should try to match this terminology, though we might need to also say "http response" or "epidata response" sometimes. I'm still going to mix up result and response for a while, though.

brookslogan · 2023-05-22T22:31:36Z

tests/testthat/test-epidatacall.R

+  # fetch_debug(format_type = "classic") %>%
+  # readr::write_rds(testthat::test_path("data/test-classic.rds"))
+  mockery::stub(fetch_classic, "httr::content", readRDS(testthat::test_path("data/test-classic.rds")))
+  mockery::stub(fetch_tbl, "fetch_classic", fetch_classic)


note: just the single stub wasn't working, at least on my system. See a little discussion at 4b6f964 and #110.

brookslogan · 2023-05-22T22:36:15Z

R/epidatacall.R

-#'
-#' @export
-fetch_json <- function(epidata_call, fields = NULL, disable_date_parsing = FALSE) {
+fetch_csv <- function(epidata_call, fields = NULL, disable_date_parsing = FALSE, disable_tibble_output = FALSE) {


question: can we eliminate fetch_csv altogether? Or is it used internally somewhere / otherwise desirable?

dshemetov (May 25, 2023): I think we can. I kept it around just so we had a test reference for the new fetch_tbl.

brookslogan · 2023-05-23T17:33:06Z

Going to go ahead and merge this, so we have warnings and the new interface for API keys rollout.

dshemetov added 2 commits May 5, 2023 17:18

docs: document

9f3bb2e

dshemetov requested review from dajmcdon and brookslogan as code owners May 6, 2023 00:27

dshemetov and others added 15 commits May 5, 2023 17:57

bug: fix epidata error handling and epidatr vignette

c221815

docs: document

4547219

bug: fix fetch tests

e94b3c9

doc: document

56aac93

bug: add magrittr to suggests

2e32cf8

bug: fix mutate call

93fc16b

feat: change global_base_url to non-proxy host

1c03257

docs: Polish get_auth_key() roxygen

eb4e602

* Remove description of nonexistent parameter.
* Apply some markdown formatting.

tests: Use withr in auth tests for better consistency, specificity

3939005

* Keep developers' .Renviron's from interfering with the auth tests.
* Add test characterizing how we have the env var take precedence over the option.

tests: usethis::use_testthat(edition = 3) (enforce 3rd edition)

24eba74

docs: match style of multiple @return entries in ?epidata_call

8daac67

so that the "Value:" heading in the documentation will appear as a bulleted list naming each function documented in this topic and what its return value looks like.

refactor: eliminate unused/redundant binding

036f131

feat: improve print.epidata_call instructions

b5d76b7

docs: describe the two categories of classic $epidata formats

d28a133

brookslogan reviewed May 12, 2023

R/model.R (outdated)

brookslogan requested changes May 12, 2023

dshemetov added 2 commits May 12, 2023 12:02

docs: fix delphi doc example

ad329f6

bug: fix no results error code

7b89f30

dshemetov added 3 commits May 12, 2023 14:10

docs: fix epidata results doc

0195f11

test: fix test comments

e320812

refactor: use factor for parsing categoricals

da7199b

dshemetov added 3 commits May 18, 2023 15:49

Merge branch 'dev' into ds/fetch

627eaa6

style: styler

f9c9b53

pkg: set package version to 0.5.0 in constants.R

d6f5f89

dsweber2 mentioned this pull request May 19, 2023

first pass endpoints vignette (for closing #96) #102

Closed

4 tasks

dshemetov added 2 commits May 19, 2023 17:41

refactor: get consistent naming for API request variables

9a4971c

doc: docs

f06a114

brookslogan self-requested a review May 22, 2023 15:15

lcbrooks added 6 commits May 22, 2023 09:13

docs: Fix paragraph-filling that ignored @param

c151e2c

docs(as_of): fix typo

030a60b

docs(fetch_csv): document new parameters

51814e5

fix: raise warnings from API when able

4b6f964

Add a test for this warning. Additionally, fix `mockery::stub` usage to do the intended thing for `fetch_tbl` in related tests; fix looks a little hacky, but stubbing with `depth = 2` within `fetch_tbl` didn't seem to work.

docs: remove docs for replaced method parameter

9e97e72

brookslogan mentioned this pull request May 22, 2023

Investigate testthat::{with,local}_mocked_bindings #110

Closed

brookslogan approved these changes May 22, 2023

brookslogan reviewed May 22, 2023

brookslogan merged commit dd0970b into dev May 23, 2023

brookslogan deleted the ds/fetch branch May 23, 2023 17:33

brookslogan mentioned this pull request May 23, 2023

Update to latest epidatr (fetch_tbl -> fetch) cmu-delphi/epipredict#182

Merged

brookslogan pushed a commit to dajmcdon/cdph-vignette that referenced this pull request May 23, 2023

Update to latest epidatr, recompile

905f8bd

Change `fetch_tbl` -> `fetch`, following cmu-delphi/epidatr#99.

This was referenced May 23, 2023

Update to latest epidatr, recompile dajmcdon/cdph-vignette#4

Open

docs: note fetch_* -> fetch breaking change in README.md, bump version #111

Merged

dshemetov mentioned this pull request May 25, 2023

Idea for issue #17 about ambiguous error #37

Closed

brookslogan restored the ds/fetch branch May 25, 2023 21:32

brookslogan deleted the ds/fetch branch May 25, 2023 21:36
