Some fetching functions don't appropriately stop/warn on epidata msgs/status #83

brookslogan · 2023-04-11T04:23:29Z

We stop for httr status codes, but not Epidata errors / status codes. For example, we don't stop for the "no results" error. Testing out a few fetching functions:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(epidatr)
fluview("bogus", epirange(201501,201503)) %>%
  fetch_classic()
#> $epidata
#> data frame with 0 columns and 0 rows
#> 
#> $result
#> [1] -2
#> 
#> $message
#> [1] "no results"
fluview("bogus", epirange(201501,201503)) %>%
  fetch_json()
#> data frame with 0 columns and 0 rows
fluview("bogus", epirange(201501,201503)) %>%
  fetch_tbl()
#> Error: '' does not exist in current working directory ('/tmp/RtmpWbkEDL/reprex-1ba5e781fc6ae-ripe-human').
fluview("bogus", epirange(201501,201503)) %>%
  fetch_csv()
#> [1] ""
#> attr(,"class")
#> [1] "epidata_csv" "character"

^{Created on 2023-04-10 by the reprex package (v2.0.1)}

So fetch_json has this problem, and probably fetch_tbl and fetch_csv too, though it may not be clear because of #17. (fetch_classic is fine, as it's representing the error directly and not throwing it away, though I'm not sure we want that 0-row 0-column data frame or something else.)

The text was updated successfully, but these errors were encountered:

brookslogan · 2023-04-11T04:34:45Z

For the specific case of "no results":

Most of the time I think this is from a faulty request (e.g., unsupported/misspelled geo values):
- Most helpful: raise a more specific & helpful error saying what the exact problem was
- Okay: raise a more generic error: "no results" / "no results (check that the geo values and time range requested are available for this endpoint)"
- Less helpful: tibble with 0 rows, 0 columns
Some of the time we want to check on the boundary of available time/issue values to see whether or not something new is available, or to break up requests into chunks:
- Most helpful: tibble with 0 rows, the correct column names and types
- Okay: error
- Probably okay: tibble with 0 rows, 0 columns

Here, I think we probably favor raising the error message. And for any other type of error message, we'd probably want to raise it even more than the "no results" one. I forget if there are status codes / messages to indicate warnings (maybe status codes > 1?); returning a warning + results might also be a desired outcome in some cases.

brookslogan · 2023-04-11T18:07:49Z

The second case above needs a little more thought. If there is data available, the user will want to format it, probably to a tibble. That might mean we'd want to expose a conversion function from the classic/raw format to tibble, maybe something like function(raw, ignore_errors=FALSE) so that people that randomly use it instead of fetch_tbl won't also accidentally throw away epidata errors like we do now.

dshemetov · 2023-04-11T18:12:37Z

Throwing an error on no results doesn't seem right to me, mostly because I found it a very common workflow to try a cross product of a set of dates or geos, make requests for each, and then compile together. Since I generally don't know ahead of time which param combinations will yield values, this tends to encourage a "poke and see" kind of approach. Having to wrap this workflow in a try catch block would be a bit annoying, though I guess doable.

brookslogan · 2023-04-13T00:00:38Z

Is this a common workflow you'd want to do, or that you see people using? Would just directly putting in the geos & dates in one API request work?

I'd imagine error vs. not for no data is actually a server-side todo, and [the clients] should just raise all epidata errors as errors? Unless we're aiming for backwards compatibility.

dshemetov · 2023-04-13T18:44:25Z

It's a workflow that I seemed to converge on whenever I did exploratory analysis. Also the old clients separated geos and time_values into many separate requests, since we didn't support multiple values there. But yea, many of these issues go away with multi geo/time_value queries.

Hm, I suppose it wouldn't be too bad to error on no results. It's probably a mistake if a query returns nothing and can always be worked around with a catch block.

brookslogan · 2023-05-17T20:28:11Z

API keys rollout is going to make appropriately warning here more important.

brookslogan added the P1 medium priority label Apr 11, 2023

dshemetov self-assigned this May 3, 2023

dshemetov mentioned this issue May 6, 2023

Consolidate fetch interfaces #99

Merged

1 task

brookslogan added P0 Top priority and removed P1 medium priority labels May 17, 2023

brookslogan closed this as completed in #99 May 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some fetching functions don't appropriately stop/warn on epidata msgs/status #83

Some fetching functions don't appropriately stop/warn on epidata msgs/status #83

brookslogan commented Apr 11, 2023 •

edited

Loading

brookslogan commented Apr 11, 2023 •

edited

Loading

brookslogan commented Apr 11, 2023

dshemetov commented Apr 11, 2023

brookslogan commented Apr 13, 2023 •

edited

Loading

dshemetov commented Apr 13, 2023

brookslogan commented May 17, 2023

Some fetching functions don't appropriately stop/warn on epidata msgs/status #83

Some fetching functions don't appropriately stop/warn on epidata msgs/status #83

Comments

brookslogan commented Apr 11, 2023 • edited Loading

brookslogan commented Apr 11, 2023 • edited Loading

brookslogan commented Apr 11, 2023

dshemetov commented Apr 11, 2023

brookslogan commented Apr 13, 2023 • edited Loading

dshemetov commented Apr 13, 2023

brookslogan commented May 17, 2023

brookslogan commented Apr 11, 2023 •

edited

Loading

brookslogan commented Apr 11, 2023 •

edited

Loading

brookslogan commented Apr 13, 2023 •

edited

Loading