Commit a9d3617

merging changes from katie
2 parents 68b0630 + 58418aa commit a9d3617

46 files changed

+9097
-5506
lines changed

.bumpversion.cfg

+1-1
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
[bumpversion]
2-
current_version = 0.3.0
2+
current_version = 0.3.13
33
commit = False
44
tag = False
55

dev/local/epidata-refresh.sh

+7-1
@@ -50,7 +50,11 @@ docker images -f "dangling=true" -q | xargs docker rmi >/dev/null 2>&1
5050
docker network ls | grep delphi-net || docker network create --driver bridge delphi-net
5151

5252
LOGS=../driver-logs
53+
<<<<<<< HEAD
5354
NOW=`date "+%Y-%m-%d`
55+
=======
56+
NOW=`date "+%Y-%m-%d"`
57+
>>>>>>> 58418aa5350c6773de5f0b71e50ef754fcc6a65a
5458
5559
if [ "$1" == "database" ]; then
5660
shift
@@ -60,7 +64,9 @@ if [ "$1" == "database" ]; then
6064
docker images delphi_database | grep delphi || \
6165
docker build -t delphi_database -f repos/delphi/operations/dev/docker/database/Dockerfile . || exit 1
6266
docker build -t delphi_database_epidata -f repos/delphi/delphi-epidata/dev/docker/database/epidata/Dockerfile . || exit 1
63-
docker run --rm -p 127.0.0.1:13306:3306 --network delphi-net --name delphi_database_epidata delphi_database_epidata \
67+
docker run --rm -p 127.0.0.1:13306:3306 --network delphi-net --name delphi_database_epidata \
68+
--mount type=bind,source="$(pwd)"/repos/delphi/delphi-epidata,target=/usr/src/app/repos/delphi/delphi-epidata,readonly \
69+
delphi_database_epidata \
6470
>${LOGFILE} 2>&1 &
6571
while true; do
6672
sed -n '/Temporary server stopped/,/mysqld: ready for connections/p' ${LOGFILE} | grep "ready for connections" && break

dev/local/install.sh

+2-2
@@ -1,9 +1,9 @@
11
#!/bin/bash
22

3-
mkdir -p driver/repos/delphi driver-logs
3+
mkdir -p driver/repos/delphi driver-logs/delphi_database_epidata driver-logs/delphi_web_epidata
44
cd driver/repos/delphi
55
git clone https://github.com/cmu-delphi/operations
6-
git clone https://github.com/cmu-delphi/delphi-epidata
6+
git clone https://github.com/cmu-delphi/delphi-epidat
77
git clone https://github.com/cmu-delphi/utils
88
cd ../../
99
ln -s repos/delphi/delphi-epidata/dev/local/epidata-refresh.sh

docs/Gemfile.lock

+4-4
@@ -204,23 +204,23 @@ GEM
204204
rb-fsevent (~> 0.10, >= 0.10.3)
205205
rb-inotify (~> 0.9, >= 0.9.10)
206206
mercenary (0.3.6)
207-
mini_portile2 (2.6.1)
207+
mini_portile2 (2.8.0)
208208
minima (2.5.1)
209209
jekyll (>= 3.5, < 5.0)
210210
jekyll-feed (~> 0.9)
211211
jekyll-seo-tag (~> 2.1)
212212
minitest (5.14.4)
213213
multipart-post (2.1.1)
214-
nokogiri (1.12.5)
215-
mini_portile2 (~> 2.6.1)
214+
nokogiri (1.13.3)
215+
mini_portile2 (~> 2.8.0)
216216
racc (~> 1.4)
217217
octokit (4.20.0)
218218
faraday (>= 0.9)
219219
sawyer (~> 0.8.0, >= 0.5.3)
220220
pathutil (0.16.2)
221221
forwardable-extended (~> 2.6)
222222
public_suffix (4.0.6)
223-
racc (1.5.2)
223+
racc (1.6.0)
224224
rb-fsevent (0.10.4)
225225
rb-inotify (0.10.1)
226226
ffi (~> 1.0)

docs/api/covid_hosp.md

+4-5
@@ -25,14 +25,13 @@ General topics not specific to any particular data source are discussed in the
2525

2626
This data source provides various measures of COVID-19 burden on patients and healthcare in the US.
2727
- Data source: US Department of Health & Human Services (HHS) [COVID-19 Reported Patient Impact and
28-
Hospital Capacity by State Timeseries](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh)
29-
and [COVID-19 Reported Patient Impact and Hospital Capacity by State](https://healthdata.gov/dataset/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/6xf2-c3ie)
28+
Hospital Capacity by State Timeseries](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh) (published weekly)
29+
and [COVID-19 Reported Patient Impact and Hospital Capacity by State](https://healthdata.gov/dataset/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/6xf2-c3ie) (published on an irregular schedule, every 1-6 days)
3030
- Temporal Resolution: Daily, starting 2020-01-01
3131
- Spatial Resolution: US States plus DC, PR, and VI
32-
- Open access via [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/1.0/)
32+
- Open Access: [Public Domain US Government](https://www.usa.gov/government-works)
3333
- Versioned by Delphi according to "issue" date, which is the date that the
34-
dataset was published by HHS. New issues are expected to be released roughly
35-
weekly.
34+
dataset was published by HHS.
3635

3736
# The API
3837

docs/api/covid_hosp_facility.md

+1-1
@@ -24,7 +24,7 @@ This data source provides various measures of COVID-19 burden on patients and he
2424
- Geographic resolution: healthcare facility (address, city, zip, fips)
2525
- Temporal resolution: weekly (Friday -- Thursday)
2626
- First week: 2020-07-31
27-
- Open access via [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/1.0/)
27+
- Open Access: [Public Domain US Government](https://www.usa.gov/government-works)
2828
- Versioned by Delphi according to the date that the dataset was published by
2929
HHS. New versions are expected to be published roughly weekly.
3030

docs/api/covid_hosp_facility_lookup.md

+1-1
@@ -26,7 +26,7 @@ General topics not specific to any particular data source are discussed in the
2626
This data source provides metadata about healthcare facilities in the US.
2727
- Data source: [US Department of Health & Human Services](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u) (HHS)
2828
- Total number of facilities: 4922
29-
- Open access via [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/1.0/)
29+
- Open Access: [Public Domain US Government](https://www.usa.gov/government-works)
3030

3131
# The API
3232

docs/api/covidcast-signals/chng.md

+25-6
@@ -17,7 +17,7 @@ grand_parent: COVIDcast Epidata API
1717

1818
## Overview
1919

20-
**Notice: This data source was paused on 2021-10-04 so that we can resolve some problems with the data pipeline. [Additional details on this pause are available below](#pipeline-pause).**
20+
**Notice: This data source was inactive between 2021-10-04 and 2021-12-02 to allow us to resolve some problems with the data pipeline. We have resumed daily updates and are working on a data patch to fill the gap. [Additional details on this inactive period are available below](#pipeline-pause).**
2121

2222
This data source is based on Change Healthcare claims data that has been
2323
de-identified in accordance with HIPAA privacy regulations. Change Healthcare is
@@ -33,6 +33,8 @@ commercial purposes.
3333
| `smoothed_adj_outpatient_covid` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest date available:** 2020-02-01 |
3434
| `smoothed_outpatient_cli` | Estimated percentage of outpatient doctor visits primarily about COVID-related symptoms, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest date available:** 2020-02-01 |
3535
| `smoothed_adj_outpatient_cli` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest date available:** 2020-02-01 |
36+
| `smoothed_outpatient_flu` | Estimated percentage of outpatient doctor visits with confirmed influenza, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest issue available:** 2021-12-06 <br/> **Earliest date available:** 2020-02-01 |
37+
| `smoothed_adj_outpatient_flu` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest issue available:** 2021-12-06 <br/> **Earliest date available:** 2020-02-01 |
3638

3739
## Table of Contents
3840
{: .no_toc .text-delta}
@@ -71,6 +73,19 @@ $$
7173
Y_{it}^{\text{Flu}}\right)}{N_{it}}
7274
$$
7375

76+
### Influenza Illness
77+
78+
The following estimation method is used for the `*_outpatient_flu` signals.
79+
80+
For a fixed location $$i$$ and time $$t$$, let $$Y_{it}$$
81+
denote the Flu counts and let $$N_{it}$$ be the
82+
total count of visits (the *Denominator*). Our estimate of the influenza
83+
percentage is given by
84+
85+
$$
86+
\hat p_{it} = 100 \cdot \frac{Y_{it}}{N_{it}}
87+
$$
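
As an illustration of the formula above, here is a minimal Python sketch. The function name `flu_percentage` and the choice to reject a non-positive denominator are assumptions made for this example; they are not taken from the `chng` pipeline code.

```python
def flu_percentage(flu_count: float, total_visits: float) -> float:
    """Return 100 * Y_it / N_it for one location and one day.

    flu_count    -- Y_it, the flu visit count
    total_visits -- N_it, the total visit count (the Denominator)
    """
    # Assumption for this sketch: a non-positive denominator is an error,
    # since the percentage is undefined when there are no visits.
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return 100.0 * flu_count / total_visits


if __name__ == "__main__":
    # 5 flu visits out of 200 total visits
    print(flu_percentage(5, 200))
```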
88+
7489
### Day-of-Week Adjustment
7590

7691
The fraction of visits due to COVID-19 is dependent on the day of the week. On
@@ -201,11 +216,15 @@ spurious deletions affected all regions and `chng` signals from July 31 to
201216
August 3, 2021, and the affected date range would continue to grow by one day
202217
each day if we allowed the pipeline to continue running.
203218

204-
On October 8, 2021, we paused the `chng` pipeline, and it will remain inactive
205-
until we can identify and implement a fix. In the meantime, the versions with
206-
the deletion markings have been removed, so that default (latest) queries and
207-
queries with as-of set to 2021-10-04 or later will return the
208-
next-most-recently-updated value for these dates.
219+
On October 8, 2021, we paused the `chng` pipeline, and it remained inactive
220+
while we completed a fix. In the meantime, the versions with
221+
the deletion markings were removed, so that default (latest) queries and
222+
queries with as-of set to 2021-10-04 or later submitted during the inactive
223+
period returned the next-most-recently-updated value for these dates.
224+
225+
On December 2, 2021, we resumed the `chng` pipeline. We will soon be reconstructing
226+
the missed issues from October 7-December 1, and will update here once that
227+
process is complete.
209228

210229
## Qualifying Conditions
211230

+83
@@ -0,0 +1,83 @@
1+
---
2+
title: Data Strategy and Execution Workgroup Community Profile Report
3+
parent: Data Sources and Signals
4+
grand_parent: COVIDcast Epidata API
5+
---
6+
7+
# Data Strategy and Execution Workgroup Community Profile Report (CPR)
8+
{: .no_toc}
9+
10+
* **Source name:** `dsew-cpr`
11+
* **Earliest issue available:** 2022-01-28
12+
* **Number of data revisions since 19 May 2020:** 0
13+
* **Date of last change:** Never
14+
* **Available for:** county, msa, state, hhs, nation (see [geography coding docs](../covidcast_geography.md))
15+
* **Time type:** day (see [date format docs](../covidcast_times.md))
16+
* **License:** [Public Domain US Government](https://www.usa.gov/government-works)
17+
18+
The Community Profile Report (CPR) is published by the Data Strategy and Execution Workgroup (DSEW) of the White House COVID-19 Team. For more information, see the [official description and data dictionary at healthdata.gov](https://healthdata.gov/Health/COVID-19-Community-Profile-Report/gqxm-d9w9) for "COVID-19 Community Profile Report".
19+
20+
This data source provides various COVID-19 related metrics, of which we report hospital admissions and vaccinations.
21+
22+
For hospital admissions, other sources of data in COVIDcast include [HHS](hhs.md) and [medical insurance claims](hospital-admissions.md). The CPR differs from these sources in that it is part of the public health surveillance stream (like HHS, unlike claims) but is available at a daily-county level (like claims, unlike HHS). CPR hospital admissions figures at the state level and above are meant to match those from HHS, but are known to differ. See the Limitations section for details.
23+
24+
County, MSA, state, and HHS-level values are pulled directly from CPR when available; nation-level values are aggregated up from the state level.
25+
26+
| Signal | Description |
27+
| --- | --- |
28+
| `confirmed_admissions_covid_1d_7dav` | Number of adult and pediatric confirmed COVID-19 hospital admissions occurring each day. Smoothed using a 7-day average. <br/> **Earliest date available:** 2019-12-16 for state, HHS, and nation; 2021-01-06 for MSA and county |
29+
| `confirmed_admissions_covid_1d_prop_7dav` | Number of adult and pediatric confirmed COVID-19 hospital admissions occurring each day, per 100,000 population. Smoothed using a 7-day average. <br/> **Earliest date available:** 2019-12-16 for state, HHS, and nation; 2021-01-06 for MSA and county |
30+
| `people_full_vaccinated` | "People fully vaccinated includes those who have received two doses of the Pfizer-BioNTech or Moderna vaccine and those who have received one dose of the J&J/Janssen vaccine" - from the CPR data dictionary. <br/> **Earliest date available:** 2021-01-15 at any geo level except MSA and 2021-04-01 at the MSA level. |
31+
| `people_booster_doses` | "The count of people who received a booster dose includes anyone who is fully vaccinated and has received another dose of COVID-19 vaccine since 2021-08-13. This includes people who received booster doses and people who received additional doses." - from the CPR data dictionary. <br/> **Earliest date available:** 2021-11-01 for state, HHS, and nation. Not available below state level. |
32+
| `doses_admin_7dav` | "Doses administered shown by date of report, not date of administration. ... [S]ubmitting entities will have the ability to update or delete previously submitted records using new functionality available in CDC’s Data Clearinghouse. Use of this new functionality may result in fluctuations across metrics as historical data are updated or deleted" - from the CPR data dictionary. Smoothed using a 7-day average. <br/> **Earliest date available:** 2021-04-29 for state, HHS, and nation. Not available below state level. |
33+
| `booster_doses_admin_7dav` | "Doses administered shown by date of report, not date of administration. ... [S]ubmitting entities will have the ability to update or delete previously submitted records using new functionality available in CDC’s Data Clearinghouse. Use of this new functionality may result in fluctuations across metrics as historical data are updated or deleted" - from the CPR data dictionary. "[A] booster dose includes anyone who is fully vaccinated and has received another dose of COVID-19 vaccine since August 13, 2021. This includes people who received booster doses and people who received additional doses." - from the CPR data dictionary. Smoothed using a 7-day average. <br/> **Earliest date available:** 2021-11-01 for state, HHS, and nation. Not available below state level. |
34+
35+
## Table of contents
36+
{: .no_toc .text-delta}
37+
38+
1. TOC
39+
{:toc}
40+
41+
## Estimation
42+
43+
For counts-based fields like hospital admissions, CPR reports rolling sums for the preceding 7 days. The 7-day average signals are computed by Delphi by dividing each sum by 7 and assigning it to the last date in the included range, so e.g. the signal for June 7 is the average of the underlying data for June 1 through 7, inclusive.
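
The conversion described above can be sketched in Python as follows. This is an illustrative sketch only: the function names and the dictionary-based input are assumptions for the example, not the indicator's actual code.

```python
from datetime import date, timedelta


def seven_day_average(rolling_sums: dict[date, float]) -> dict[date, float]:
    """Convert 7-day rolling sums into 7-day averages.

    Each key is assumed to be the last date of the 7-day window that the
    sum covers, so the average is assigned to that same date.
    """
    return {end: total / 7 for end, total in rolling_sums.items()}


def window_dates(end: date) -> list[date]:
    """List the 7 dates (inclusive) that contribute to the value for `end`."""
    return [end - timedelta(days=k) for k in range(6, -1, -1)]


if __name__ == "__main__":
    sums = {date(2021, 6, 7): 70.0}  # hypothetical rolling sum for June 1-7
    print(seven_day_average(sums))
    print(window_dates(date(2021, 6, 7)))
```

With these hypothetical inputs, the value for June 7 is 70 / 7 = 10, and the contributing window runs from June 1 through June 7.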
44+
45+
The `confirmed_admissions_covid_1d_7dav` signal mirrors the `Confirmed COVID-19 admissions - last 7 days` CPR field for all geographic resolutions except nation. Nation-level admissions is calculated by summing state-level values.
46+
47+
The `doses_admin_7dav` and `booster_doses_admin_7dav` signals mirror the `Doses administered - last 7 days` and `Booster doses administered - last 7 days` CPR fields for all geographic resolutions except nation. Nation-level doses are calculated by summing state-level values.
48+
49+
## Limitations
50+
51+
Nation-level estimates may be inaccurate since aggregations are done using state-level smoothed values instead of raw values. Ideally we would aggregate raw values before smoothing, but the raw values are not accessible in this case.
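
A rough Python sketch of this kind of state-to-nation aggregation is shown below. The function name, the nested-dictionary layout, and the handling of dates where some states are missing (they are simply summed over whichever states are present) are all assumptions for illustration; the actual indicator may treat missing states differently.

```python
def nation_from_states(
    state_values: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Sum already-smoothed state-level values into one national value per date.

    state_values maps a state code to {date string: smoothed value}.
    Assumption for this sketch: a date missing for some state is summed
    over the remaining states without any adjustment.
    """
    totals: dict[str, float] = {}
    for by_date in state_values.values():
        for day, value in by_date.items():
            totals[day] = totals.get(day, 0.0) + value
    return totals


if __name__ == "__main__":
    # Hypothetical smoothed values for two states on one date
    example = {
        "pa": {"2022-01-03": 10.0},
        "ny": {"2022-01-03": 5.5},
    }
    print(nation_from_states(example))
```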
52+
53+
Because DSEW does not provide updates on weekends, estimates are not available for all dates.
54+
55+
Currently, of all the vaccination signals, county-level data is only available for `people_full_vaccinated`. Until 2021-11-15, several states reported vaccinated people not allocated to any individual county. These unallocated counts were reported using a FIPS code ending with `000` for that state, which is never a FIPS code for a real county.
56+
57+
This data source is susceptible to large corrections that can create strange data effects such as negative counts and sudden changes of 1M+ counts from one day to the next. Many of these corrections are documented in the "High-Visibility Data Notes" section in the first tab of the CPR spreadsheet for that day. To locate the correct spreadsheet for some `time_value` R, consult the following table:
58+
59+
| Signal type | CPR date |
60+
| - | - |
61+
| Hospital Admissions | usually R+2, sometimes R+1 |
62+
| Vaccinations | usually R+1, sometimes R+2 |
63+
64+
Not all CPRs have the same lag between the CPR date (listed in the filename) and the date for a particular signal.
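
The lookup table above can be turned into a small helper that lists which CPR dates to check for a given `time_value`. This is a sketch under stated assumptions: the signal-type keys and function names are made up for the example, and the filename pattern is inferred from the single linked file `Community Profile Report 20220105.xlsx`, so other reports may be named differently.

```python
from datetime import date, timedelta

# Offsets in days from time_value R to the CPR date, most likely first,
# taken from the table above. The dictionary keys are hypothetical labels.
CPR_OFFSETS = {
    "hospital_admissions": (2, 1),
    "vaccinations": (1, 2),
}


def candidate_cpr_dates(time_value: date, signal_type: str) -> list[date]:
    """Return the CPR dates to check for `time_value`, most likely first."""
    return [time_value + timedelta(days=k) for k in CPR_OFFSETS[signal_type]]


def cpr_filename(cpr_date: date) -> str:
    """Build a filename from the CPR date (pattern assumed from one example)."""
    return f"Community Profile Report {cpr_date:%Y%m%d}.xlsx"


if __name__ == "__main__":
    for d in candidate_cpr_dates(date(2022, 1, 3), "hospital_admissions"):
        print(cpr_filename(d))
```

Because the lag varies between reports, both candidates may need to be opened to find the one whose data notes cover the date of interest.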
65+
66+
### Differences with HHS reports
67+
68+
An analysis comparing the
69+
[CPR labeled January 5, 2022](https://healthdata.gov/api/views/gqxm-d9w9/files/14ee1150-edf1-4b54-b225-500c8954e6a8?download=true&filename=Community%20Profile%20Report%2020220105.xlsx)
70+
(newest file as of January 6, 2022) with the HHS
71+
[COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh)
72+
(downloaded January 6, 2022) suggests that the CPR undercounts the hospital admissions published by HHS by 10-15% or more. We are waiting for clarification from the data provider, but until then, exercise caution when comparing work based on the CPR with work based on HHS reports.
73+
74+
## Lag and Backfill
75+
76+
The report is currently updated daily, excluding weekends. However, this is subject to change; DSEW previously issued updates on a twice-weekly schedule. We check for updates daily.
77+
78+
The CPR is prepared with an internal lag of 1-2 days for most signals. The file is usually posted to healthdata.gov the day after the date listed in the filename, excluding weekends and federal holidays. This results in an effective lag in COVIDcast of 2-4 days, or 5 days when Monday is a holiday.
79+
80+
## Source and Licensing
81+
82+
This indicator mirrors and lightly aggregates data originally published by the Data Strategy and Execution Workgroup via [HealthData.gov](https://healthdata.gov/). As a work of the US government, the original data is in the [public domain](https://www.usa.gov/government-works).
83+
