Skip to content

Some fetching functions don't appropriately stop/warn on epidata msgs/status #83

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brookslogan opened this issue Apr 11, 2023 · 6 comments · Fixed by #99
Closed

Some fetching functions don't appropriately stop/warn on epidata msgs/status #83

brookslogan opened this issue Apr 11, 2023 · 6 comments · Fixed by #99
Assignees
Labels
P0 Top priority

Comments

@brookslogan
Copy link
Contributor

brookslogan commented Apr 11, 2023

We stop for httr status codes, but not Epidata errors / status codes. For example, we don't stop for the "no results" error. Testing out a few fetching functions:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(epidatr)
fluview("bogus", epirange(201501,201503)) %>%
  fetch_classic()
#> $epidata
#> data frame with 0 columns and 0 rows
#> 
#> $result
#> [1] -2
#> 
#> $message
#> [1] "no results"
fluview("bogus", epirange(201501,201503)) %>%
  fetch_json()
#> data frame with 0 columns and 0 rows
fluview("bogus", epirange(201501,201503)) %>%
  fetch_tbl()
#> Error: '' does not exist in current working directory ('/tmp/RtmpWbkEDL/reprex-1ba5e781fc6ae-ripe-human').
fluview("bogus", epirange(201501,201503)) %>%
  fetch_csv()
#> [1] ""
#> attr(,"class")
#> [1] "epidata_csv" "character"

Created on 2023-04-10 by the reprex package (v2.0.1)

So fetch_json has this problem, and probably fetch_tbl and fetch_csv too, though it may not be clear because of #17. (fetch_classic is fine, as it's representing the error directly and not throwing it away, though I'm not sure we want that 0-row 0-column data frame or something else.)

@brookslogan brookslogan added the P1 medium priority label Apr 11, 2023
@brookslogan
Copy link
Contributor Author

brookslogan commented Apr 11, 2023

For the specific case of "no results":

  • Most of the time I think this is from a faulty request (e.g., unsupported/misspelled geo values):
    • Most helpful: raise a more specific & helpful error saying what the exact problem was
    • Okay: raise a more generic error: "no results" / "no results (check that the geo values and time range requested are available for this endpoint)"
    • Less helpful: tibble with 0 rows, 0 columns
  • Some of the time we want to check on the boundary of available time/issue values to see whether or not something new is available, or to break up requests into chunks:
    • Most helpful: tibble with 0 rows, the correct column names and types
    • Okay: error
    • Probably okay: tibble with 0 rows, 0 columns

Here, I think we probably favor raising the error message. And for any other type of error message, we'd probably want to raise it even more than the "no results" one. I forget if there are status codes / messages to indicate warnings (maybe status codes > 1?); returning a warning + results might also be a desired outcome in some cases.

@brookslogan
Copy link
Contributor Author

The second case above needs a little more thought. If there is data available, the user will want to format it, probably to a tibble. That might mean we'd want to expose a conversion function from the classic/raw format to tibble, maybe something like function(raw, ignore_errors=FALSE) so that people that randomly use it instead of fetch_tbl won't also accidentally throw away epidata errors like we do now.

@dshemetov
Copy link
Contributor

Throwing an error on no results doesn't seem right to me, mostly because I found it a very common workflow to try a cross product of a set of dates or geos, make requests for each, and then compile together. Since I generally don't know ahead of time which param combinations will yield values, this tends to encourage a "poke and see" kind of approach. Having to wrap this workflow in a try catch block would be a bit annoying, though I guess doable.

@brookslogan
Copy link
Contributor Author

brookslogan commented Apr 13, 2023

Is this a common workflow you'd want to do, or that you see people using? Would just directly putting in the geos & dates in one API request work?

I'd imagine error vs. not for no data is actually a server-side todo, and [the clients] should just raise all epidata errors as errors? Unless we're aiming for backwards compatibility.

@dshemetov
Copy link
Contributor

It's a workflow that I seemed to converge on whenever I did exploratory analysis. Also the old clients separated geos and time_values into many separate requests, since we didn't support multiple values there. But yea, many of these issues go away with multi geo/time_value queries.

Hm, I suppose it wouldn't be too bad to error on no results. It's probably a mistake if a query returns nothing and can always be worked around with a catch block.

@dshemetov dshemetov self-assigned this May 3, 2023
@brookslogan brookslogan added P0 Top priority and removed P1 medium priority labels May 17, 2023
@brookslogan
Copy link
Contributor Author

API keys rollout is going to make appropriately warning here more important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 Top priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants