new data source: covid hospitalization #292

Merged 17 commits on Nov 19, 2020
Changes from 11 commits
9 changes: 9 additions & 0 deletions deploy.json
@@ -207,6 +207,15 @@
"dst": "/common/covidcast/README.md"
},

"// acquisition - covid_hosp",
{
"type": "move",
"src": "src/acquisition/covid_hosp/",
"dst": "[[package]]/acquisition/covid_hosp/",
"match": "^.*\\.(py)$",
"add-header-comment": true
},

"// run unit and coverage tests",
{"type": "py3test"}

1 change: 1 addition & 0 deletions docs/api/README.md
@@ -98,6 +98,7 @@ The parameters available for each source are documented in each linked source-sp
| --- | --- | --- | --- |
| [`covidcast`](covidcast.md) | COVIDCast | Delphi's COVID-19 surveillance streams. | no |
| [`covidcast_meta`](covidcast_meta.md) | COVIDCast Metadata | Metadata for Delphi's COVID-19 surveillance streams. | no |
| [`covid_hosp`](covid_hosp.md) | COVID-19 Hospitalization | COVID-19 Reported Patient Impact and Hospital Capacity. | no |

### Influenza Data

158 changes: 158 additions & 0 deletions docs/api/covid_hosp.md
@@ -0,0 +1,158 @@
---
title: COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries
parent: Epidata API (Other Epidemics)
---

# COVID-19 Hospitalization

This data source is a mirror of the "COVID-19 Reported Patient Impact and
Hospital Capacity by State Timeseries" dataset provided by the US Department of
Health & Human Services via healthdata.gov.

See the
[official description at healthdata.gov](https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-state-timeseries)
for more information, including a
[data dictionary](https://healthdata.gov/covid-19-reported-patient-impact-and-hospital-capacity-state-data-dictionary).

General topics not specific to any particular data source are discussed in the
[API overview](README.md). Such topics include
[contributing](README.md#contributing) and [citing](README.md#citing).

## Metadata

This data source provides various measures of COVID-19 burden on patients and healthcare in the US.
- Data source: [US Department of Health & Human Services](https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-state-timeseries) (HHS)
- Temporal Resolution: Daily, starting 2020-01-01
- Spatial Resolution: US States plus DC, PR, and VI
- Open access via [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/1.0/)
- Versioned by Delphi according to "issue" date. New issues are expected to be released roughly weekly.

# The API

The base URL is: https://delphi.cmu.edu/epidata/api.php

See [this documentation](README.md) for details on specifying locations and dates.

## Parameters

### Required

| Parameter | Description | Type |
| --- | --- | --- |
| `states` | two-letter state abbreviations | `list` of states |
| `dates` | dates to which the data pertain (e.g. `20200510`) | `list` of dates or date ranges |

### Optional

| Parameter | Description | Type |
| --- | --- | --- |
| `issues` | dates on which the dataset was published | `list` of "issue" dates or date ranges |

If `issues` is not specified, then the most recent issue is used by default.
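
As a rough illustration of how these parameters combine into a request, the sketch below builds a query URL using only the Python standard library. The helper name `build_covid_hosp_url` is hypothetical and not part of any Delphi client. Joining multiple values with commas and writing ranges as `from-to` strings are assumptions about the general Epidata conventions; check the API overview before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = 'https://delphi.cmu.edu/epidata/api.php'


def build_covid_hosp_url(states, dates, issues=None):
  """Build a covid_hosp query URL (hypothetical helper, for illustration)."""

  def as_list(values):
    # Assumption: multiple values are joined with commas.
    if isinstance(values, (list, tuple)):
      return ','.join(str(v) for v in values)
    return str(values)

  params = {
    'source': 'covid_hosp',
    'states': as_list(states),
    'dates': as_list(dates),
  }
  # `issues` is optional; when omitted, the most recent issue is used.
  if issues is not None:
    params['issues'] = as_list(issues)
  # Keep commas readable instead of percent-encoding them.
  return BASE_URL + '?' + urlencode(params, safe=',')


url = build_covid_hosp_url('MA', 20200510)
print(url)
```

Passing `issues=20201116` to the same helper would append an `issues` parameter to the query string.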

## Response

| Field | Description | Type |
| --- | --- | --- |
| `result` | result code: 1 = success, 2 = too many results, -2 = no results | integer |
| `epidata` | list of results | array of objects |
| `epidata[].state` | state pertaining to this row | string |
| `epidata[].date` | date pertaining to this row | integer |
| `epidata[].issue` | the date on which the dataset containing this row was published | integer |
| `epidata[].*` | see the [data dictionary](https://healthdata.gov/covid-19-reported-patient-impact-and-hospital-capacity-state-data-dictionary) | |
| `message` | `success` or error message | string |
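
One possible way to act on the `result` code is sketched below. The function name `extract_rows` is hypothetical, and treating "no results" as an empty list while raising on "too many results" is only one design choice among several; it is not behavior prescribed by the API.

```python
def extract_rows(response):
  """Return the rows of a response, or raise on failure (hypothetical helper)."""
  result = response.get('result')
  if result == 1:
    # success
    return response['epidata']
  if result == -2:
    # "no results" is treated here as an empty list rather than an error
    return []
  if result == 2:
    raise ValueError('too many results; try a narrower query')
  raise RuntimeError(f"request failed: {response.get('message')}")


# A hand-written response in the documented shape, not real API output.
sample = {
  'result': 1,
  'epidata': [{'state': 'MA', 'date': 20200510, 'issue': 20201116}],
  'message': 'success',
}
rows = extract_rows(sample)
print(rows[0]['state'], rows[0]['date'], rows[0]['issue'])
```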

# Example URLs

### MA on 2020-05-10 (per most recent issue)
https://delphi.cmu.edu/epidata/api.php?source=covid_hosp&states=MA&dates=20200510

```json
{
"result": 1,
"epidata": [
{
"state": "MA",
"issue": 20201116,
"date": 20200510,
"hospital_onset_covid": 53,
"hospital_onset_covid_coverage": 84,
"inpatient_beds": 15691,
"inpatient_beds_coverage": 73,
"inpatient_beds_used": 12427,
"inpatient_beds_used_coverage": 83,
"inpatient_beds_used_covid": 3625,
"inpatient_beds_used_covid_coverage": 84,
"previous_day_admission_adult_covid_confirmed": null,
"previous_day_admission_adult_covid_confirmed_coverage": 0,
"previous_day_admission_adult_covid_suspected": null,
"previous_day_admission_adult_covid_suspected_coverage": 0,
"previous_day_admission_pediatric_covid_confirmed": null,
"previous_day_admission_pediatric_covid_confirmed_coverage": 0,
"previous_day_admission_pediatric_covid_suspected": null,
"previous_day_admission_pediatric_covid_suspected_coverage": 0,
"staffed_adult_icu_bed_occupancy": null,
"staffed_adult_icu_bed_occupancy_coverage": 0,
"staffed_icu_adult_patients_confirmed_suspected_covid": null,
"staffed_icu_adult_patients_confirmed_suspected_covid_coverage": 0,
"staffed_icu_adult_patients_confirmed_covid": null,
"staffed_icu_adult_patients_confirmed_covid_coverage": 0,
"total_adult_patients_hosp_confirmed_suspected_covid": null,
"total_adult_patients_hosp_confirmed_suspected_covid_coverage": 0,
"total_adult_patients_hosp_confirmed_covid": null,
"total_adult_patients_hosp_confirmed_covid_coverage": 0,
"total_pediatric_patients_hosp_confirmed_suspected_covid": null,
"total_pediatric_patients_hosp_confirmed_suspected_covid_coverage": 0,
"total_pediatric_patients_hosp_confirmed_covid": null,
"total_pediatric_patients_hosp_confirmed_covid_coverage": 0,
"total_staffed_adult_icu_beds": null,
"total_staffed_adult_icu_beds_coverage": 0,
"inpatient_beds_utilization_coverage": 72,
"inpatient_beds_utilization_numerator": 10876,
"inpatient_beds_utilization_denominator": 15585,
"percent_of_inpatients_with_covid_coverage": 83,
"percent_of_inpatients_with_covid_numerator": 3607,
"percent_of_inpatients_with_covid_denominator": 12427,
"inpatient_bed_covid_utilization_coverage": 73,
"inpatient_bed_covid_utilization_numerator": 3304,
"inpatient_bed_covid_utilization_denominator": 15691,
"adult_icu_bed_covid_utilization_coverage": null,
"adult_icu_bed_covid_utilization_numerator": null,
"adult_icu_bed_covid_utilization_denominator": null,
"adult_icu_bed_utilization_coverage": null,
"adult_icu_bed_utilization_numerator": null,
"adult_icu_bed_utilization_denominator": null,
"inpatient_beds_utilization": 0.6978504972730191,
"percent_of_inpatients_with_covid": 0.2902550897239881,
"inpatient_bed_covid_utilization": 0.21056656682174496,
"adult_icu_bed_covid_utilization": null,
"adult_icu_bed_utilization": null
}
],
"message": "success"
}
```
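
In this sample row, each ratio field appears to equal its `_numerator` field divided by its `_denominator` field. That is an observation about this one row, not a documented guarantee; the data dictionary remains the authority. The sketch below recomputes the ratios from values copied out of the response above, using a hypothetical helper `recompute_ratio`.

```python
# Values copied from the sample response above.
row = {
  'inpatient_beds_utilization': 0.6978504972730191,
  'inpatient_beds_utilization_numerator': 10876,
  'inpatient_beds_utilization_denominator': 15585,
  'percent_of_inpatients_with_covid': 0.2902550897239881,
  'percent_of_inpatients_with_covid_numerator': 3607,
  'percent_of_inpatients_with_covid_denominator': 12427,
  'inpatient_bed_covid_utilization': 0.21056656682174496,
  'inpatient_bed_covid_utilization_numerator': 3304,
  'inpatient_bed_covid_utilization_denominator': 15691,
  'adult_icu_bed_utilization': None,
  'adult_icu_bed_utilization_numerator': None,
  'adult_icu_bed_utilization_denominator': None,
}


def recompute_ratio(row, name):
  """Recompute a ratio field from its parts (hypothetical helper)."""
  numerator = row[f'{name}_numerator']
  denominator = row[f'{name}_denominator']
  # Missing or zero parts yield no ratio, mirroring the nulls in the sample.
  if numerator is None or not denominator:
    return None
  return numerator / denominator


for name in (
  'inpatient_beds_utilization',
  'percent_of_inpatients_with_covid',
  'inpatient_bed_covid_utilization',
):
  difference = abs(recompute_ratio(row, name) - row[name])
  print(name, difference < 1e-5)
```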


# Code Samples

Libraries are available for [CoffeeScript](../../src/client/delphi_epidata.coffee), [JavaScript](../../src/client/delphi_epidata.js), [Python](../../src/client/delphi_epidata.py), and [R](../../src/client/delphi_epidata.R).
The following sample shows how to import the library and fetch data for MA on
2020-05-10 (per most recent issue).

### Python

Optionally install the package using pip(env):
````bash
pip install delphi-epidata
````

Otherwise, place `delphi_epidata.py` from this repo next to your Python script.

````python
# Import
from delphi_epidata import Epidata
# Fetch data
res = Epidata.covid_hosp('MA', 20200510)
print(res['result'], res['message'], len(res['epidata']))
````
87 changes: 87 additions & 0 deletions integrations/acquisition/covid_hosp/test_scenarios.py
@@ -0,0 +1,87 @@
"""Integration tests for acquisition of COVID hospitalization."""

# standard library
from pathlib import Path
import unittest
from unittest.mock import MagicMock

# first party
from delphi.epidata.acquisition.covid_hosp.database import Database
from delphi.epidata.acquisition.covid_hosp.test_utils import TestUtils
from delphi.epidata.client.delphi_epidata import Epidata
import delphi.operations.secrets as secrets

# py3tester coverage target (equivalent to `import *`)
__test_target__ = 'delphi.epidata.acquisition.covid_hosp.update'


class AcquisitionTests(unittest.TestCase):

  def setUp(self):
    """Perform per-test setup."""

    # configure test data
    path_to_repo_root = Path(__file__).parent.parent.parent.parent
    self.test_utils = TestUtils(path_to_repo_root)

    # use the local instance of the Epidata API
    Epidata.BASE_URL = 'http://delphi_web_epidata/epidata/api.php'

    # use the local instance of the epidata database
    secrets.db.host = 'delphi_database_epidata'
    secrets.db.epi = ('user', 'pass')

    # clear relevant tables
    with Database.connect() as db:
      with db.new_cursor() as cur:
        cur.execute('truncate table covid_hosp')
        cur.execute('truncate table covid_hosp_meta')

  def test_acquire_dataset(self):
    """Acquire a new dataset."""

    # only mock out network calls to external hosts
    mock_network = MagicMock()
    mock_network.fetch_metadata.return_value = \
        self.test_utils.load_sample_metadata()
    mock_network.fetch_dataset.return_value = \
        self.test_utils.load_sample_dataset()

    # make sure the data does not yet exist
    with self.subTest(name='no data yet'):
      response = Epidata.covid_hosp('MA', Epidata.range(20200101, 20210101))
      self.assertEqual(response['result'], -2)

    # acquire sample data into local database
    with self.subTest(name='first acquisition'):
      acquired = Update.run(network_impl=mock_network)
      self.assertTrue(acquired)

    # make sure the data now exists
    with self.subTest(name='initial data checks'):
      response = Epidata.covid_hosp('MA', Epidata.range(20200101, 20210101))
      self.assertEqual(response['result'], 1)
      self.assertEqual(len(response['epidata']), 1)
      row = response['epidata'][0]
      self.assertEqual(row['state'], 'MA')
      self.assertEqual(row['date'], 20200510)
      self.assertEqual(row['issue'], 20201116)
      self.assertEqual(row['hospital_onset_covid'], 53)
      actual = row['inpatient_bed_covid_utilization']
      expected = 0.21056656682174496
      self.assertTrue(abs(actual - expected) < 1e-5)
      self.assertIsNone(row['adult_icu_bed_utilization'])

      # expect 55 fields per row (56 database columns, except `id`)
      self.assertEqual(len(row), 55)

    # re-acquisition of the same dataset should be a no-op
    with self.subTest(name='second acquisition'):
      acquired = Update.run(network_impl=mock_network)
      self.assertFalse(acquired)

    # make sure the data still exists
    with self.subTest(name='final data checks'):
      response = Epidata.covid_hosp('MA', Epidata.range(20200101, 20210101))
      self.assertEqual(response['result'], 1)
      self.assertEqual(len(response['epidata']), 1)
self.assertEqual(len(response['epidata']), 1)
78 changes: 78 additions & 0 deletions integrations/server/test_covid_hosp.py
@@ -0,0 +1,78 @@
"""Integration tests for the `covid_hosp` endpoint."""

# standard library
import unittest

# first party
from delphi.epidata.acquisition.covid_hosp.database import Database
from delphi.epidata.client.delphi_epidata import Epidata
import delphi.operations.secrets as secrets


class ServerTests(unittest.TestCase):
  """Tests the `covid_hosp` endpoint."""

  def setUp(self):
    """Perform per-test setup."""

    # use the local instance of the Epidata API
    Epidata.BASE_URL = 'http://delphi_web_epidata/epidata/api.php'

    # use the local instance of the epidata database
    secrets.db.host = 'delphi_database_epidata'
    secrets.db.epi = ('user', 'pass')

    # clear relevant tables
    with Database.connect() as db:
      with db.new_cursor() as cur:
        cur.execute('truncate table covid_hosp')
        cur.execute('truncate table covid_hosp_meta')

  def test_query_by_issue(self):
    """Query with and without specifying an issue."""

    # insert dummy data
    def insert_issue(cur, issue, value):
      so_many_nulls = ', '.join(['null'] * 51)
      cur.execute(f'''insert into covid_hosp values (
        0, {issue}, 'PA', 20201118, {value}, {so_many_nulls}
      )''')
    with Database.connect() as db:
      with db.new_cursor() as cur:
        # inserting out of order to test server-side order by
        insert_issue(cur, 20201201, 123)
        insert_issue(cur, 20201203, 789)
        insert_issue(cur, 20201202, 456)

    # request without issue (defaulting to latest issue)
    with self.subTest(name='no issue (latest)'):
      response = Epidata.covid_hosp('PA', 20201118)

      self.assertEqual(response['result'], 1)
      self.assertEqual(len(response['epidata']), 1)
      self.assertEqual(response['epidata'][0]['issue'], 20201203)
      self.assertEqual(response['epidata'][0]['hospital_onset_covid'], 789)

    # request for specific issue
    with self.subTest(name='specific single issue'):
      response = Epidata.covid_hosp('PA', 20201118, issues=20201201)

      self.assertEqual(response['result'], 1)
      self.assertEqual(len(response['epidata']), 1)
      self.assertEqual(response['epidata'][0]['issue'], 20201201)
      self.assertEqual(response['epidata'][0]['hospital_onset_covid'], 123)

    # request for multiple issues
    with self.subTest(name='specific multiple issues'):
      issues = Epidata.range(20201201, 20201231)
      response = Epidata.covid_hosp('PA', 20201118, issues=issues)

      self.assertEqual(response['result'], 1)
      self.assertEqual(len(response['epidata']), 3)
      rows = response['epidata']
      self.assertEqual(rows[0]['issue'], 20201201)
      self.assertEqual(rows[0]['hospital_onset_covid'], 123)
      self.assertEqual(rows[1]['issue'], 20201202)
      self.assertEqual(rows[1]['hospital_onset_covid'], 456)
      self.assertEqual(rows[2]['issue'], 20201203)
      self.assertEqual(rows[2]['hospital_onset_covid'], 789)
19 changes: 19 additions & 0 deletions src/acquisition/covid_hosp/README.md
@@ -0,0 +1,19 @@
# COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries

- Data source:
https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-state-timeseries
- Data dictionary:
https://healthdata.gov/covid-19-reported-patient-impact-and-hospital-capacity-state-data-dictionary
- Geographic resolution: US States plus DC, VI, and PR
- Temporal resolution: daily
- First date: 2020-01-01
- First issue: 2020-11-16

# Acquisition overview

1. Fetch the dataset's metadata in JSON format.
1. If the metadata's `revision_timestamp` already appears in the database, then
stop here; otherwise continue.
1. Download the dataset in CSV format as determined by the metadata's `url`
field.
1. In a single transaction, insert the metadata and the dataset into the database.
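
The four steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: it uses an in-memory SQLite database as a stand-in for the real database, and the function `acquire`, the table names `meta` and `hosp`, the flat metadata dictionary, and the fake fetch functions are all hypothetical.

```python
import sqlite3


def acquire(db, fetch_metadata, fetch_dataset):
  """Run one acquisition pass; return True if new data was stored."""
  # 1. fetch the metadata (assumed here to be a flat dict)
  metadata = fetch_metadata()
  revision = metadata['revision_timestamp']

  # 2. stop if this revision already appears in the database
  seen = db.execute(
    'select count(1) from meta where revision_timestamp = ?', (revision,)
  ).fetchone()[0]
  if seen:
    return False

  # 3. download the dataset from the URL named in the metadata
  rows = fetch_dataset(metadata['url'])

  # 4. insert metadata and dataset in a single transaction; the connection
  #    context manager commits on success and rolls back on error
  with db:
    db.execute('insert into meta values (?)', (revision,))
    db.executemany('insert into hosp values (?, ?, ?)', rows)
  return True


db = sqlite3.connect(':memory:')
db.execute('create table meta (revision_timestamp text)')
db.execute('create table hosp (state text, date integer, value integer)')


def fake_metadata():
  # stand-in for the network call that fetches metadata
  return {'revision_timestamp': 'rev-1', 'url': 'https://example.invalid/data.csv'}


def fake_dataset(url):
  # stand-in for downloading and parsing the CSV
  return [('MA', 20200510, 53)]


first = acquire(db, fake_metadata, fake_dataset)
second = acquire(db, fake_metadata, fake_dataset)
print(first, second)
```

Checking the revision before downloading keeps a repeated run cheap, and the single transaction means a failed insert leaves neither the metadata nor a partial dataset behind.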