
Commit 5ff844a

Authored by aysim319 and minhkhul

Nhsn indicator (#2080)

* first implementation
* figuring out metric/sensor to use
* first take
* working on test
* added preliminary data source
* adding indicator for gitaction
* lint
* replace with setup.py
* more lint
* fixed date range for test
* lint
* Update DETAILS.md
* fix output data
* analysis in progress
* lint and suggestions
* more analysis
* add hhs geo aggregate
* more analysis
* update DETAILS.md
* Update nhsn/params.json.template
* Update nhsn/params.json.template
* cleaning up analysis
* rename geo_id column name
* suggested / needed to deploy
* adding default locations for deployment
* fix geo aggregation for hhs
* Update nhsn/params.json.template
* lint
* needed to add hhs in to geo for tests
* fixed and added more plots
* cleaning up notebook and adding details
* new signal name
* needed to update the type dict also

Co-authored-by: minhkhul <[email protected]>

Parent: 9f1d4a0

File tree

25 files changed: +12899 −1 lines

.github/workflows/python-ci.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -39,6 +39,8 @@ jobs:
           dir: "delphi_quidel_covidtest"
         - package: "sir_complainsalot"
           dir: "delphi_sir_complainsalot"
+        - package: "nhsn"
+          dir: "delphi_nhsn"
     defaults:
       run:
         working-directory: ${{ matrix.package }}
```

Jenkinsfile

Lines changed: 1 addition & 1 deletion

```diff
@@ -10,7 +10,7 @@
 - TODO: #527 Get this list automatically from python-ci.yml at runtime.
 */

-def indicator_list = ['backfill_corrections', 'changehc', 'claims_hosp', 'google_symptoms', 'hhs_hosp', 'nchs_mortality', 'quidel_covidtest', 'sir_complainsalot', 'doctor_visits', 'nwss_wastewater', 'nssp']
+def indicator_list = ['backfill_corrections', 'changehc', 'claims_hosp', 'google_symptoms', 'hhs_hosp', 'nchs_mortality', 'quidel_covidtest', 'sir_complainsalot', 'doctor_visits', 'nwss_wastewater', 'nssp', 'nhsn']

 def build_package_main = [:]
 def build_package_prod = [:]
 def deploy_staging = [:]
```
Lines changed: 30 additions & 0 deletions

```json
{
  "common": {
    "export_dir": "/common/covidcast/receiving/nhsn",
    "backup_dir": "./raw_data_backups",
    "log_filename": "/var/log/indicators/nhsn.log",
    "log_exceptions": false
  },
  "indicator": {
    "wip_signal": true,
    "static_file_dir": "./static",
    "socrata_token": "{{ nhsn_token }}"
  },
  "validation": {
    "common": {
      "data_source": "nhsn",
      "api_credentials": "{{ validation_api_key }}",
      "span_length": 15,
      "min_expected_lag": {"all": "7"},
      "max_expected_lag": {"all": "13"},
      "dry_run": true,
      "suppressed_errors": []
    },
    "static": {
      "minimum_sample_size": 0,
      "missing_se_allowed": true,
      "missing_sample_size_allowed": true
    },
    "dynamic": {}
  }
}
```
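Once Ansible has substituted the `{{ … }}` tokens, the indicator reads the rendered file as ordinary JSON. A minimal sketch of that consumption step (the rendered values here are illustrative, not the production settings; the real indicator uses `delphi_utils.read_params` rather than `json.loads` directly):

```python
import json

# Illustrative rendered params; keys mirror the template above, values are made up.
rendered = """
{
  "common": {"export_dir": "./receiving/nhsn", "backup_dir": "./raw_data_backups"},
  "indicator": {"wip_signal": true, "socrata_token": "example-token"}
}
"""

params = json.loads(rendered)

# Pull out the fields the pipeline needs.
export_dir = params["common"]["export_dir"]
socrata_token = params["indicator"]["socrata_token"]
```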

ansible/templates/sir_complainsalot-params-prod.json.j2

Lines changed: 4 additions & 0 deletions

```diff
@@ -45,6 +45,10 @@
     "nssp": {
       "max_age":19,
       "maintainers": []
+    },
+    "nhsn": {
+      "max_age":19,
+      "maintainers": []
     }
   }
 }
```

ansible/vars.yaml

Lines changed: 3 additions & 0 deletions

```diff
@@ -59,6 +59,9 @@ nwss_wastewater_token: "{{ vault_cdc_socrata_token }}"
 # nssp
 nssp_token: "{{ vault_cdc_socrata_token }}"

+# nhsn
+nhsn_token: "{{ vault_cdc_socrata_token }}"
+
 # SirCAL
 sir_complainsalot_api_key: "{{ vault_sir_complainsalot_api_key }}"
 sir_complainsalot_slack_token: "{{ vault_sir_complainsalot_slack_token }}"
```

nhsn/DETAILS.md

Lines changed: 30 additions & 0 deletions

# NHSN data

We import the NHSN Weekly Hospital Respiratory Data.

We pull nhsn data from two sources. Both draw on the same underlying dataset, but at different cadences: the secondary source reports preliminary data for the previous reporting week.

Primary source: https://data.cdc.gov/Public-Health-Surveillance/Weekly-Hospital-Respiratory-Data-HRD-Metrics-by-Ju/ua7e-t2fy/about_data
Secondary (preliminary) source: https://data.cdc.gov/Public-Health-Surveillance/Weekly-Hospital-Respiratory-Data-HRD-Metrics-by-Ju/mpgq-jmmr/about_data

## Geographical Levels
* `state`: reported using two-letter postal code
* `national`: just `us` for now
* `hhs`: aggregated from the state level using GeoMapper

## Metrics
* `confirmed_admissions_covid`: total number of confirmed admissions for covid
* `confirmed_admissions_flu`: total number of confirmed admissions for flu
* `prelim_confirmed_admissions_covid`: total number of confirmed admissions for covid from the preliminary source
* `prelim_confirmed_admissions_flu`: total number of confirmed admissions for flu from the preliminary source

## Additional Notes
The HHS dataset and the NHSN dataset cover equivalent data on hospital admissions for covid and flu.
As a general trend, HHS and NHSN data match well.
However, there are differences for some states: for GA (until 2023), LA, NV, PR (late 2020 to early 2021), and TN, HHS is substantially lower than NHSN.

Some states have a spike in NHSN or HHS where the other source doesn't, and spikes don't happen at the same time_values across states.

More details regarding the analysis are available in [analysis.ipynb](notebook/analysis.ipynb)
(may require installing additional packages to work).
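The `hhs` level described above is derived by summing state rows into their HHS regions. A minimal sketch of that aggregation in plain pandas, using a hypothetical two-region fragment of the state-to-HHS crosswalk (the indicator itself uses `delphi_utils.GeoMapper` rather than a hand-built mapping):

```python
import pandas as pd

# Hypothetical fragment of a state -> HHS region crosswalk, for illustration only.
STATE_TO_HHS = {"ct": "1", "me": "1", "ny": "2", "nj": "2"}

df = pd.DataFrame({
    "geo_id": ["ct", "me", "ny", "nj"],
    "timestamp": pd.to_datetime(["2024-11-02"] * 4),
    "confirmed_admissions_covid_ew": [10.0, 5.0, 40.0, 20.0],
})

# Map each state to its region, then sum admissions within (region, week).
hhs = (
    df.assign(geo_id=df["geo_id"].map(STATE_TO_HHS))
      .groupby(["geo_id", "timestamp"], as_index=False)
      .sum()
)
```

With this toy input, region 1 totals 15.0 and region 2 totals 60.0 for the week.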

nhsn/Makefile

Lines changed: 32 additions & 0 deletions

```make
.PHONY = venv, lint, test, clean

dir = $(shell find ./delphi_* -name __init__.py | grep -o 'delphi_[_[:alnum:]]*' | head -1)

venv:
	python3.8 -m venv env

install: venv
	. env/bin/activate; \
	pip install wheel ; \
	pip install -e ../_delphi_utils_python ;\
	pip install -e .

install-ci: venv
	. env/bin/activate; \
	pip install wheel ; \
	pip install ../_delphi_utils_python ;\
	pip install .

lint:
	. env/bin/activate; pylint $(dir) --rcfile=../pyproject.toml
	. env/bin/activate; pydocstyle $(dir)

format:
	. env/bin/activate; darker $(dir)

test:
	. env/bin/activate ;\
	(cd tests && ../env/bin/pytest --cov=$(dir) --cov-report=term-missing)

clean:
	rm -rf env
	rm -f params.json
```

nhsn/delphi_nhsn/__init__.py

Lines changed: 13 additions & 0 deletions

```python
# -*- coding: utf-8 -*-
"""Module to pull and clean indicators from the XXXXX source.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import run

__version__ = "0.1.0"
```

nhsn/delphi_nhsn/__main__.py

Lines changed: 13 additions & 0 deletions

```python
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m MODULE_NAME`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from delphi_utils import read_params

from .run import run_module  # pragma: no cover

run_module(read_params())  # pragma: no cover
```

nhsn/delphi_nhsn/constants.py

Lines changed: 31 additions & 0 deletions

```python
"""Registry for signal names."""

GEOS = ["state", "nation", "hhs"]

# column name from socrata
TOTAL_ADMISSION_COVID_API = "totalconfc19newadm"
TOTAL_ADMISSION_FLU_API = "totalconfflunewadm"

SIGNALS_MAP = {
    "confirmed_admissions_covid_ew": TOTAL_ADMISSION_COVID_API,
    "confirmed_admissions_flu_ew": TOTAL_ADMISSION_FLU_API,
}

TYPE_DICT = {
    "timestamp": "datetime64[ns]",
    "geo_id": str,
    "confirmed_admissions_covid_ew": float,
    "confirmed_admissions_flu_ew": float,
}

# signal mapping for secondary, preliminary source
PRELIM_SIGNALS_MAP = {
    "confirmed_admissions_covid_ew_prelim": TOTAL_ADMISSION_COVID_API,
    "confirmed_admissions_flu_ew_prelim": TOTAL_ADMISSION_FLU_API,
}
PRELIM_TYPE_DICT = {
    "timestamp": "datetime64[ns]",
    "geo_id": str,
    "confirmed_admissions_covid_ew_prelim": float,
    "confirmed_admissions_flu_ew_prelim": float,
}
```
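These maps drive the conversion in pull.py: each signal column is copied from its raw Socrata column, then the frame is trimmed to the mapped columns and cast per the type dict. A minimal self-contained sketch of that transformation (single signal, single row, made-up values):

```python
import pandas as pd

# Reduced copies of the maps above, for a one-signal illustration.
SIGNALS_MAP = {"confirmed_admissions_covid_ew": "totalconfc19newadm"}
TYPE_DICT = {
    "timestamp": "datetime64[ns]",
    "geo_id": str,
    "confirmed_admissions_covid_ew": float,
}

# One fake Socrata record.
raw = pd.DataFrame({
    "weekendingdate": ["2024-11-02"],
    "jurisdiction": ["USA"],
    "totalconfc19newadm": ["123"],
})

# Rename API columns, copy each signal column, keep only mapped columns, cast.
df = raw.rename(columns={"weekendingdate": "timestamp", "jurisdiction": "geo_id"})
for signal, col in SIGNALS_MAP.items():
    df[signal] = df[col]
df = df[list(TYPE_DICT)]
df["geo_id"] = df["geo_id"].str.lower().replace({"usa": "us"})
df = df.astype(TYPE_DICT)
```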

nhsn/delphi_nhsn/pull.py

Lines changed: 123 additions & 0 deletions

```python
# -*- coding: utf-8 -*-
"""Functions for pulling NHSN data."""
import logging
from typing import Optional

import pandas as pd
from delphi_utils import create_backup_csv
from sodapy import Socrata

from .constants import PRELIM_SIGNALS_MAP, PRELIM_TYPE_DICT, SIGNALS_MAP, TYPE_DICT


def pull_data(socrata_token: str, dataset_id: str):
    """Pull data from the Socrata API."""
    client = Socrata("data.cdc.gov", socrata_token)
    results = []
    offset = 0
    limit = 50000  # maximum limit allowed by SODA 2.0
    while True:
        page = client.get(dataset_id, limit=limit, offset=offset)
        if not page:
            break  # exit the loop if no more results
        results.extend(page)
        offset += limit

    df = pd.DataFrame.from_records(results)
    return df


def pull_nhsn_data(socrata_token: str, backup_dir: str, custom_run: bool, logger: Optional[logging.Logger] = None):
    """Pull the latest NHSN data and conform it into a dataset.

    The output dataset has:

    - Each row corresponds to a single observation
    - Each row additionally has columns for the signals in SIGNALS

    Parameters
    ----------
    socrata_token: str
        My App Token for pulling the NHSN data
    backup_dir: str
        Directory to which to save raw backup data
    custom_run: bool
        Flag indicating if the current run is a patch. If so, don't save any data to disk
    logger: Optional[logging.Logger]
        logger object

    Returns
    -------
    pd.DataFrame
        Dataframe as described above.
    """
    # Pull data from Socrata API
    df = pull_data(socrata_token, dataset_id="ua7e-t2fy")

    keep_columns = list(TYPE_DICT.keys())

    if not df.empty:
        create_backup_csv(df, backup_dir, custom_run, logger=logger)

        df = df.rename(columns={"weekendingdate": "timestamp", "jurisdiction": "geo_id"})

        for signal, col_name in SIGNALS_MAP.items():
            df[signal] = df[col_name]

        df = df[keep_columns]
        df["geo_id"] = df["geo_id"].str.lower()
        df.loc[df["geo_id"] == "usa", "geo_id"] = "us"
        df = df.astype(TYPE_DICT)
    else:
        df = pd.DataFrame(columns=keep_columns)

    return df


def pull_preliminary_nhsn_data(
    socrata_token: str, backup_dir: str, custom_run: bool, logger: Optional[logging.Logger] = None
):
    """Pull the latest preliminary NHSN data and conform it into a dataset.

    The output dataset has:

    - Each row corresponds to a single observation
    - Each row additionally has columns for the signals in SIGNALS

    Parameters
    ----------
    socrata_token: str
        My App Token for pulling the NHSN data
    backup_dir: str
        Directory to which to save raw backup data
    custom_run: bool
        Flag indicating if the current run is a patch. If so, don't save any data to disk
    logger: Optional[logging.Logger]
        logger object

    Returns
    -------
    pd.DataFrame
        Dataframe as described above.
    """
    # Pull data from Socrata API
    df = pull_data(socrata_token, dataset_id="mpgq-jmmr")

    keep_columns = list(PRELIM_TYPE_DICT.keys())

    if not df.empty:
        create_backup_csv(df, backup_dir, custom_run, sensor="prelim", logger=logger)

        df = df.rename(columns={"weekendingdate": "timestamp", "jurisdiction": "geo_id"})

        for signal, col_name in PRELIM_SIGNALS_MAP.items():
            df[signal] = df[col_name]

        df = df[keep_columns]
        df = df.astype(PRELIM_TYPE_DICT)
        df["geo_id"] = df["geo_id"].str.lower()
        df.loc[df["geo_id"] == "usa", "geo_id"] = "us"
    else:
        df = pd.DataFrame(columns=keep_columns)

    return df
```
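The offset/limit loop in `pull_data` is a generic pagination pattern: keep requesting pages until an empty page comes back. A client-agnostic sketch, with a fake page getter standing in for sodapy's `client.get(dataset_id, limit=..., offset=...)` so the loop can be exercised without network access:

```python
from typing import Callable, List

def fetch_all(get_page: Callable[[int, int], List[dict]], limit: int = 50000) -> List[dict]:
    """Accumulate pages until an empty page signals the end of the dataset."""
    results, offset = [], 0
    while True:
        page = get_page(limit, offset)
        if not page:
            break  # no more results
        results.extend(page)
        offset += limit
    return results

# Fake client: 7 records served in slices, mimicking Socrata's limit/offset paging.
records = [{"i": i} for i in range(7)]

def fake_get(limit: int, offset: int) -> List[dict]:
    return records[offset:offset + limit]
```

With `limit=3` the helper issues three non-empty requests plus one empty one and returns all 7 records in order.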
