add healthdata.gov signals on hospitalizations #288
Comments
Thanks. I'll move this over to our indicators repository and flag it to review when we do our next prioritization.
Thanks for the heads-up @nickreich! I recommend scraping this as a new data source in epidata, in a new database table. If there are specific indicators to derive from this, I recommend computing those separately and storing them in the existing covidcast table.
Just to note, the API links on the healthdata.gov page are for the metadata only, not the dataset itself. We'll probably need to write a scraper to harvest the CSV link from the page.
In case it's useful, this snippet of code was shared with me as a way to access the data:
Right -- that's the metadata query.
When I run this I see the actual data in my R session.
I stand semi-corrected: the query url https://healthdata.gov/api/3/action/package_show?id=83b4a668-9321-4d8c-bc4f-2bef66c49050&page=0 returns the metadata, and the metadata includes the url for the csv file at https://healthdata.gov/sites/default/files/reported_hospital_utilization_timeseries_20201115_2134.csv, which is then read by
It ain't pretty, that's for sure.
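The two-step lookup described above (query the metadata, then pull the CSV link out of it) could be sketched in Python roughly as follows. This is only an illustration, not the snippet that was shared: the layout of the metadata JSON is not shown in this thread, so the sketch avoids assuming one and simply walks the whole response looking for links that end in `.csv`. The helper names `find_csv_urls` and `fetch_metadata` are made up for this example.

```python
import json
import urllib.request

# URL quoted in the thread; it returns dataset metadata, not the data itself.
METADATA_URL = (
    "https://healthdata.gov/api/3/action/package_show"
    "?id=83b4a668-9321-4d8c-bc4f-2bef66c49050&page=0"
)


def find_csv_urls(node):
    """Recursively collect every string value that looks like a CSV link.

    Hypothetical helper: it makes no assumption about where in the
    metadata the link lives, only that it is an http(s) URL ending in .csv.
    """
    if isinstance(node, str):
        is_link = node.startswith(("http://", "https://"))
        return [node] if is_link and node.lower().endswith(".csv") else []
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return []
    urls = []
    for child in children:
        urls.extend(find_csv_urls(child))
    return urls


def fetch_metadata(url=METADATA_URL):
    """Download and decode the metadata JSON (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

With network access, `find_csv_urls(fetch_metadata())` would list candidate CSV links, which a scraper could then download. Searching the whole document is cruder than indexing a known field, but it should keep working if the metadata layout shifts.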
Moving this to epidata for @dfarrow0 to add the full dataset as a separate endpoint. Once that's done, we'll bounce it back here for covidcast support.
FWIW, Nick just pointed me to the data dictionary for this table. Worth storing with the metadata,
Note that the most time-sensitive columns of this dataset to surface are
For Forecast Hub purposes, it may make sense to return the sum of these two values, as this is likely to emerge as the central hospitalization forecast target.
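As a rough sketch of what "return the sum of these two values" could mean row by row, assuming the data arrives as a list of dicts: the column names `admissions_a` and `admissions_b` below are placeholders, not the real column names, and treating a missing input as a missing sum (rather than as zero) is a design assumption, not something decided in this thread.

```python
def sum_signals(rows, first, second):
    """Return one combined value per row.

    The result is None whenever either input is absent or None, so a
    partially reported row is not mistaken for a complete, smaller total.
    """
    combined = []
    for row in rows:
        a, b = row.get(first), row.get(second)
        combined.append(None if a is None or b is None else a + b)
    return combined


# Made-up rows with placeholder column names, for illustration only.
rows = [
    {"admissions_a": 10, "admissions_b": 2},
    {"admissions_a": 5, "admissions_b": None},
    {"admissions_a": 1},
]
print(sum_signals(rows, "admissions_a", "admissions_b"))
```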
Thanks, noted. It looks like the dataset was updated yesterday (11/18). I have a local version of the previous issue (11/16). When the code gets checked in, I'll manually load the previous issue so that we have a complete revision history in the API. @nickreich are there versions of this dataset prior to 11/16? If so, do you have them?
I don't know of prior versions. I think there were some available privately within the HHS Protect system, but I don't think they are public.
Quick update: acquisition code has been merged and is scheduled to run twice daily. The data is now available via the Delphi Epidata API directly (sample URL), although the PyPI Python package hasn't been updated yet; I'm planning to do that tomorrow. EpiVis, our relatively obscure web visualization for the API, has been updated to support this new dataset. As an example, here's an interactive plot of inpatient bed utilization over time for ND. Spoiler: it's at 75% and increasing.
@nickreich I'll have to defer to @krivard for an ETA on COVIDcast. But just to clarify, could you elaborate a little bit? For example, are you asking when it'll be available on the map at https://covidcast.cmu.edu/, or when it'll be available through the
Hurray, @dfarrow0! You must remember how much I love EpiVis!
@nickreich may revise or elaborate, but for the purpose of the CDC forecasting activity, I think it's more important that this summed-up signal be available in the API, so we can point to it and tell all participants "This signal is to be used as the Ground Truth for hospitalization forecasting". While we can tell them that the ground truth is the sum of these two already-existing signals, that's kind of clumsy.
I agree that for modeling purposes, adding the sum is important to have a single signal for "ground truth". I don't quite understand the distinction between the Epidata API and being available via the covidcast R package, say. @dfarrow0 that is what I was curious about. I don't care too much about the website availability, but do want to be able to access via an API somehow, and the smoother that is, the better. Final question: this weekend they seemed to post an incorrect file and then later corrected it. How does your system handle this?
Yeah this is unclear, and we haven't been very consistent in general about how we refer to these things. The short version is this: the Epidata API contains various "endpoints". For example, In any case, the
As I understand it, the plan is to compute this sum and surface it via the
Definitely agree that smoother is better, and currently there's room for improvement. Currently you can access the data with the
The API stores each dataset along with a version number, which we refer to as "issue" (as in "The December issue of Nature magazine"). Anyway, when you query the Epidata API, it returns data from the most recent issue by default. If you request data from a specific issue, it'll return that instead. So when a correction is posted, that becomes the most recent version and will be returned in queries by default.
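To make the "issue" behavior concrete, here is a toy in-memory model of it in Python. It is not the Epidata implementation; the class name `IssueStore`, its methods, and the integer `YYYYMMDD` issue format are all assumptions made for this illustration.

```python
class IssueStore:
    """Toy model of issue-versioned storage (hypothetical, for illustration)."""

    def __init__(self):
        # Maps an issue number (assumed YYYYMMDD integer) to that issue's data.
        self._issues = {}

    def publish(self, issue, data):
        """Store a dataset under its issue; older issues are kept as-is."""
        self._issues[issue] = data

    def query(self, issue=None):
        """Return the requested issue, or the most recent one by default."""
        if not self._issues:
            raise LookupError("no issues published yet")
        if issue is None:
            issue = max(self._issues)
        return self._issues[issue]


store = IssueStore()
store.publish(20201116, "first file")      # placeholder payloads, not real data
store.publish(20201118, "corrected file")
print(store.query())            # latest issue by default
print(store.query(20201116))    # an earlier issue on request
```

Under this model a corrected file simply becomes a new, later issue, so default queries pick it up while the earlier version stays retrievable by issue number.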
We are hoping to migrate the teams doing the hospitalization forecasting challenge at the COVID-19 Forecast Hub over to using the datasets recently (in the last few days) made publicly available at the link below.
Note API links on left-hand panel down the page a bit:
https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-state-timeseries