add healthdata.gov signals on hospitalizations #288

nickreich · 2020-11-17T01:35:11Z

We are hoping to migrate the teams doing the hospitalization forecasting challenge at the COVID-19 Forecast Hub over to using the datasets recently (in the last few days) made publicly available at the link below.

Note API links on left-hand panel down the page a bit:
https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-state-timeseries

capnrefsmmat · 2020-11-17T02:54:25Z

Thanks. I'll move this over to our indicators repository and flag it to review when we do our next prioritization.

dfarrow0 · 2020-11-17T14:21:34Z

Thanks for the heads-up @nickreich!

I recommend scraping this as a new data source in epidata, in a new database table. If there are specific indicators to derive from this, I recommend computing those separately and storing them in the existing covidcast table.

krivard · 2020-11-17T14:29:33Z

Just to note, the API links on the healthdata.gov page are for the metadata only, not the dataset itself. We'll probably need to write a scraper to harvest the CSV link from the page.

nickreich · 2020-11-17T14:31:56Z

In case it's useful, this snippet of code was shared with me as a way to access the data:

# Fetch the package metadata, then read the CSV that its first resource points to.
data.table::fread(
  jsonlite::fromJSON(
    "https://healthdata.gov/api/3/action/package_show?id=83b4a668-9321-4d8c-bc4f-2bef66c49050&page=0"
  )$result$resources[[1]]$url
)

krivard · 2020-11-17T14:34:03Z

Right -- that's the metadata query.

nickreich · 2020-11-17T14:40:50Z

when I run this I see the actual data in my R session.

krivard · 2020-11-17T14:48:03Z

I stand semi-corrected: the query URL https://healthdata.gov/api/3/action/package_show?id=83b4a668-9321-4d8c-bc4f-2bef66c49050&page=0 returns the metadata, and the metadata includes the URL for the CSV file at https://healthdata.gov/sites/default/files/reported_hospital_utilization_timeseries_20201115_2134.csv, which is then read by data.table. Fair enough!
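
For anyone following along in Python, the two-step lookup described above (metadata first, then the CSV link inside it) might look roughly like the sketch below. The helper name `first_resource_url` and the sample reply are hypothetical; the shape of the reply is only inferred from the R accessors `$result$resources[[1]]$url`, and whether `result` is an object or a one-element list on the live service is an assumption, so the sketch accepts both. It does not touch the network.

```python
import json
from typing import Any, Dict


def first_resource_url(reply: Dict[str, Any]) -> str:
    """Return the URL of the first resource listed in a package_show reply.

    Assumption: the reply carries a "result" entry that is either an object
    with a "resources" list, or a one-element list wrapping such an object.
    """
    result = reply["result"]
    if isinstance(result, list):
        # Some deployments may wrap the package in a list; take the first entry.
        result = result[0]
    resources = result["resources"]
    if not resources:
        raise ValueError("metadata lists no resources")
    return resources[0]["url"]


# Hand-written, trimmed stand-in for the metadata reply (not a real capture).
sample_reply = json.loads("""
{"result": {"resources": [
  {"url": "https://healthdata.gov/sites/default/files/reported_hospital_utilization_timeseries_20201115_2134.csv"}
]}}
""")

csv_url = first_resource_url(sample_reply)
print(csv_url)
```

The returned URL could then be handed to whatever CSV reader is preferred, which is the role `data.table::fread` plays in the R snippet.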

nickreich · 2020-11-17T14:55:20Z

it ain't pretty, that's for sure.

krivard · 2020-11-17T18:01:53Z

Moving this to epidata for @dfarrow0 to add the full dataset as a separate endpoint. Once that's done, we'll bounce it back here for covidcast support.

RoniRos · 2020-11-18T00:00:50Z

FWIW, Nick just pointed me to the data dictionary for this table. Worth storing with the metadata.

nickreich · 2020-11-19T13:55:02Z

Note that the most time-sensitive columns of this dataset to surface are:

previous_day_admission_pediatric_covid_confirmed
previous_day_admission_adult_covid_confirmed

For Forecast Hub purposes, it may make sense to return the sum of these two values, as this is likely to emerge as the central hospitalization forecast target.
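
A minimal sketch of that sum in Python, assuming each row is a dict keyed by the two column names above. The function name `confirmed_admissions`, the example values, and the choice to return `None` when either column is missing (rather than treating a missing value as zero) are all illustrative assumptions, not a description of how any Delphi signal is computed.

```python
from typing import Any, Mapping, Optional

PEDIATRIC = "previous_day_admission_pediatric_covid_confirmed"
ADULT = "previous_day_admission_adult_covid_confirmed"


def confirmed_admissions(row: Mapping[str, Any]) -> Optional[int]:
    """Sum the two confirmed-admission columns for one row.

    Returns None when either column is absent or null, so that a partial
    report is not mistaken for a complete (and smaller) count.
    """
    pediatric = row.get(PEDIATRIC)
    adult = row.get(ADULT)
    if pediatric is None or adult is None:
        return None
    return pediatric + adult


# Made-up rows for illustration only; these are not real reported values.
rows = [
    {"state": "PA", "date": 20201101, PEDIATRIC: 3, ADULT: 120},
    {"state": "PA", "date": 20201102, PEDIATRIC: None, ADULT: 131},
]

totals = [confirmed_admissions(r) for r in rows]
print(totals)  # [123, None]
```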

dfarrow0 · 2020-11-19T14:49:46Z

Thanks, noted.

It looks like the dataset was updated yesterday (11/18). I have a local version of the previous issue (11/16). When the code gets checked in, I'll manually load the previous issue so that we have a complete revision history in the API.

@nickreich are there versions of this dataset prior to 11/16? if so, do you have them?

nickreich · 2020-11-19T16:59:58Z

I don't know of prior versions. I think there were some available privately within the HHS Protect system, but I don't think they are public.

nickreich · 2020-11-19T22:38:12Z

Thanks @krivard and @dfarrow0. Is there another issue where I can track the progress of adding this to the covidcast system?

dfarrow0 · 2020-11-19T22:59:38Z

Quick update: acquisition code has been merged and is scheduled to run twice daily. The data is now available via the Delphi Epidata API directly (sample URL), although the PyPI python package hasn't been updated yet — planning to do that tomorrow.

EpiVis, our relatively obscure web visualization for the API, has been updated to support this new dataset. As an example, here's an interactive plot of inpatient bed utilization over time for ND. Spoiler: it's at 75% and increasing.

dfarrow0 · 2020-11-19T23:03:49Z

@nickreich I'll have to defer to @krivard for an ETA on COVIDcast.

But just to clarify, could you elaborate a little bit? For example, are you asking when it'll be available on the map at https://covidcast.cmu.edu/, or when it'll be available through the covidcast R package, or something else?

RoniRos · 2020-11-19T23:23:49Z

> EpiVis, our relatively obscure web visualization for the API, has been updated to support this new dataset.

Hurray, @dfarrow0 ! You must remember how much I love EpiVis!

RoniRos · 2020-11-22T22:47:33Z

@nickreich may revise or elaborate, but for the purpose of the CDC forecasting activity, I think it's more important that this summed-up signal be available in the API, so we can point to it and tell all participants "This signal is to be used as the Ground Truth for hospitalization forecasting". While we can tell them that the ground truth is the sum of these two already-existing signals, that's kind of clumsy.
For the purpose of the CDC forecasting activity, I don't think there is a need to add the summed-up signal to the map. But it might be something we could consider separately.

nickreich · 2020-11-24T18:00:43Z

I agree that for modeling purposes, adding the sum is important to have a single signal for "ground truth".

I don't quite understand the distinction between the Epidata API and being available via the covidcast R package, say. @dfarrow0 that is what I was curious about. I don't care too much about the website availability, but do want to be able to access via an API somehow, and the more smooth that is, the better.

Final question: this weekend they seemed to post an incorrect file and then later corrected it. how does your system handle this?

dfarrow0 · 2020-12-03T15:15:43Z

> I don't quite understand the distinction between the Epidata API and being available via the covidcast R package

Yeah, this is unclear, and we haven't been very consistent in general about how we refer to these things. The short version is this: the Epidata API contains various "endpoints". For example, fluview, nowcast, covid_hosp, covidcast, etc. We have libraries in Python, R, and JavaScript called delphi-epidata, and those libraries can fetch data from all Epidata API endpoints. Separately, Delphi has much more specialized/powerful libraries in Python and R called covidcast, and these libraries fetch data solely from the covidcast endpoint of the Epidata API. For some time we've loosely called this the "COVIDcast Epidata API" (although IMO that's something of a misnomer since it's not a separate API).

In any case, the covid_hosp endpoint of the Epidata API hosts this HHS dataset, and so it's not available via the covidcast library, but it is available via the delphi-epidata library.

> I agree that for modeling purposes, adding the sum is important to have a single signal for "ground truth".

As I understand it, the plan is to compute this sum and surface it via the covidcast endpoint, which would make it available in the covidcast library. Until then, someone would need to use the generic delphi-epidata library and compute the sum manually.

> want to be able to access via an API somehow, and the more smooth that is, the better.

Definitely agree that smoother is better, and currently there's room for improvement. Currently you can access the data with the delphi-epidata library, or you can just curl it directly, for example: https://delphi.cmu.edu/epidata/api.php?source=covid_hosp&states=PA&dates=20201101
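
As a rough illustration, the example URL above can be assembled from its parts with nothing but the Python standard library. The helper name `covid_hosp_url` is hypothetical; the base URL and the `source`, `states`, and `dates` parameters are taken directly from that example, and nothing here is meant to describe the delphi-epidata client itself.

```python
from urllib.parse import urlencode

BASE_URL = "https://delphi.cmu.edu/epidata/api.php"


def covid_hosp_url(states: str, dates: str) -> str:
    """Build a query URL for the covid_hosp endpoint of the Epidata API."""
    params = {"source": "covid_hosp", "states": states, "dates": dates}
    return f"{BASE_URL}?{urlencode(params)}"


url = covid_hosp_url("PA", "20201101")
print(url)
```

The resulting string is the same URL as in the example, so it can be passed to curl or to any HTTP client.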

> Final question: this weekend they seemed to post an incorrect file and then later corrected it. how does your system handle this?

The API stores each dataset along with a version number, which we refer to as "issue" (as in "The December issue of Nature magazine"). Anyway, when you query the Epidata API, it returns data from the most recent issue by default. If you request data from a specific issue, it'll return that instead. So when a correction is posted, that becomes the most recent version and will be returned in queries by default.
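
To make the "issue" behavior concrete, here is a toy in-memory model in Python. It is not the server's implementation: the function name `select_issue`, the row layout, and the values are all made up, and treating "most recent" as the latest issue per state and date is an assumption about granularity.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def select_issue(rows: List[Row], issue: Optional[int] = None) -> List[Row]:
    """Return rows from a specific issue, or the latest issue per (state, date)."""
    if issue is not None:
        # An explicit issue was requested: return exactly that version.
        return [r for r in rows if r["issue"] == issue]
    latest: Dict[Tuple[str, int], Row] = {}
    for r in rows:
        key = (r["state"], r["date"])
        # Keep whichever row for this key has the highest issue number.
        if key not in latest or r["issue"] > latest[key]["issue"]:
            latest[key] = r
    return [latest[k] for k in sorted(latest)]


# Made-up example: an erroneous file (issue 20201121) later corrected (issue 20201122).
stored = [
    {"state": "ND", "date": 20201120, "issue": 20201121, "value": 999},
    {"state": "ND", "date": 20201120, "issue": 20201122, "value": 42},
]

default_view = select_issue(stored)
original_view = select_issue(stored, issue=20201121)
print(default_view[0]["value"], original_view[0]["value"])  # 42 999
```

In this model the corrected value is what a default query sees, while the erroneous file remains retrievable by asking for its issue explicitly.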

capnrefsmmat transferred this issue from cmu-delphi/covidcast Nov 17, 2020

krivard transferred this issue from cmu-delphi/covidcast-indicators Nov 17, 2020

dfarrow0 mentioned this issue Nov 18, 2020

new data source: covid hospitalization #292

Merged

krivard closed this as completed in #292 Nov 19, 2020

nickreich mentioned this issue Jan 28, 2021

add merged healthdata.gov signal that combines weekly and daily data #398

Closed
