diff --git a/docs/api/covidcast-signals/quidel.md b/docs/api/covidcast-signals/quidel.md
index b2552fde5..ba6f0c5f3 100644
--- a/docs/api/covidcast-signals/quidel.md
+++ b/docs/api/covidcast-signals/quidel.md
@@ -1,12 +1,141 @@
 ---
 title: Quidel
-parent: Inactive Signals
+parent: Data Sources and Signals
 grand_parent: COVIDcast API
 ---
 
 # Quidel
+{: .no_toc}
 
 * **Source name:** `quidel`
+
+## Table of contents
+{: .no_toc .text-delta}
+
+1. TOC
+{:toc}
+
+## COVID-19 Tests
+
+* **First issued:** 27 July 2020
+* **Number of data revisions since 19 May 2020:** 0
+* **Date of last change:** Never
+* **Available for:** hrr, msa, state (see [geography coding docs](../covidcast_geography.md))
+
+Data source based on COVID-19 Antigen tests, provided to us by Quidel, Inc. When
+a patient (whether at a doctor’s office, clinic, or hospital) has COVID-like
+symptoms, doctors may order an antigen test. An antigen test can detect parts of
+the virus that are present during an active infection. This is in contrast with
+antibody tests, which detect parts of the immune system that react to the virus,
+but which persist long after the infection has passed. Quidel began providing us
+with test data starting May 9, 2020, and data volume increased to statistically
+meaningful levels starting May 26, 2020.
+
+| Signal | Description |
+| --- | --- |
+| `covid_ag_raw_pct_positive` | Percentage of antigen tests that were positive for COVID-19, with no smoothing applied. |
+| `covid_ag_smoothed_pct_positive` | Percentage of antigen tests that were positive for COVID-19, smoothed by pooling together the last 7 days of tests. |
+
+### Estimation
+
+The source data from which we derive our estimates contains a number of features
+for every test, including localization at 5-digit Zip Code level, a TestDate and
+StorageDate, patient age, and unique identifiers for the device on which the
+test was performed, the individual test, and the result. Multiple tests are
+stored on each device.
+
+Let $$n$$ be the total number of COVID tests taken over a given time period in a
+given location (the test result can be negative, positive, or invalid). Let $$x$$ be the
+number of tests taken with positive results in this location over the given time
+period. We are interested in estimating the percentage of positive tests, which
+is defined as:
+
+$$
+p = \frac{100 x}{n}
+$$
+
+We estimate $$p$$ across 3 temporal-spatial aggregation schemes:
+- daily, at the MSA (metropolitan statistical area) level;
+- daily, at the HRR (hospital referral region) level;
+- daily, at the state level.
+
+**MSA and HRR levels**: In a given MSA or HRR, suppose $$N$$ COVID tests are taken
+in a certain time period and $$X$$ is the number of those tests with positive
+results. If $$N \geq 50$$, we simply use:
+
+$$
+p = \frac{100 X}{N}
+$$
+
+If $$N < 50$$, we borrow $$50 - N$$ pseudo-samples from its home state to shrink the
+estimate toward the state's mean, which means:
+
+$$
+p = 100 \left( \frac{N}{50} \frac{X}{N} + \frac{50 - N}{50} \frac{X_s}{N_s} \right)
+$$
+
+where $$N_s$$ and $$X_s$$ are the number of COVID tests and the number of COVID
+tests with positive results in its home state in the same time period.
+
+**State level**: states with fewer than 50 tests are discarded. For the
+remaining states, which have sufficient samples,
+
+$$
+p = \frac{100 X}{N}
+$$
+
+#### Standard Error
+
+We assume the estimates for each time point follow a binomial distribution. Since
+$$p$$ is a percentage, the estimated standard error is:
+
+$$
+\text{se} = 100 \sqrt{ \frac{\frac{p}{100}\left(1-\frac{p}{100}\right)}{N} }
+$$
+
+#### Smoothing
+
+Smoothed estimates are formed by pooling data over time. That is, daily, for
+each location, we first pool all data available in that location over the last 7
+days, and we then recompute everything described in the last two
+subsections. Pooling in this way makes estimates available in more geographic
+areas, as many areas report very few tests per day, but have enough data to
+report when 7 days are considered.
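The estimation rules added in this hunk can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code. All function names are hypothetical. The sketch assumes the binomial standard error is computed on the proportion `p / 100` and then rescaled to percent. It also assumes that "not enough samples can be filled in from the parent state" means the state has fewer tests than the number to be borrowed.

```python
import math
from typing import Optional, Sequence, Tuple

MIN_OBS = 50  # minimum sample size named in the documentation text


def pct_positive_state(x: int, n: int, min_obs: int = MIN_OBS) -> Optional[float]:
    """State level: discard the state when it has fewer than min_obs tests."""
    if n < min_obs:
        return None
    return 100.0 * x / n


def pct_positive_local(x: int, n: int, x_state: int, n_state: int,
                       min_obs: int = MIN_OBS) -> Optional[float]:
    """MSA/HRR level: shrink toward the state rate when n < min_obs."""
    if n >= min_obs:
        return 100.0 * x / n
    borrowed = min_obs - n  # number of pseudo-samples taken from the state
    # Assumption (not stated in the source): the state must have at least
    # `borrowed` tests, otherwise the area is reported as missing.
    if n_state < borrowed:
        return None
    # Algebraically equal to 100 * (N/50 * X/N + (50-N)/50 * X_s/N_s),
    # written so that it is also defined when n == 0.
    return 100.0 * (x + borrowed * x_state / n_state) / min_obs


def standard_error(p_pct: float, n: int) -> float:
    """Binomial standard error in percentage points (assumed scaling)."""
    prop = p_pct / 100.0
    return 100.0 * math.sqrt(prop * (1.0 - prop) / n)


def pool_last_days(daily: Sequence[Tuple[int, int]], day: int,
                   window: int = 7) -> Tuple[int, int]:
    """Sum (positives, tests) over the `window` days ending at index `day`."""
    start = max(0, day - window + 1)
    x = sum(d[0] for d in daily[start:day + 1])
    n = sum(d[1] for d in daily[start:day + 1])
    return x, n
```

For a smoothed estimate, the pooled counts from `pool_last_days` would be passed to `pct_positive_local` or `pct_positive_state` in place of the single-day counts.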
+
+### Limitations
+
+This data source is based on data provided to us by a lab testing company. They can report on a portion of United States COVID-19 Antigen tests, but not all of them, and so this source only represents those tests known to them. Their coverage may vary across the United States.
+
+### Missingness
+
+When fewer than 50 tests are reported in a state on a specific day, no data is
+reported for that area on that day; an API query for all reported states on that
+day will not include it.
+
+When fewer than 50 tests are reported in an HRR or MSA on a specific day, and
+not enough samples can be filled in from the parent state, no data is reported
+for that area on that day; an API query for all reported geographic areas on
+that day will not include it.
+
+### Lag and Backfill
+
+Because testing centers may report tests to Quidel several days after the tests
+occur, these signals are typically available with 5-6 days of lag. This
+means that estimates for a specific day first become available 5-6 days
+later.
+
+The amount of lag in reporting can vary, and not all tests are reported with the
+same lag. After we first report estimates for a specific date, further data may
+arrive about tests that occurred on that date, sometimes six weeks later or
+more. When this happens, we issue new estimates for those dates. This means that
+a reported estimate for, say, June 10th may first be available in the API on
+June 14th and subsequently revised on June 16th.
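The backfill behaviour described in this hunk means one date can have several issued estimates. The sketch below shows one way a client might pick the estimate that was current on a given day. It is only an illustration: the record layout and the function name `latest_as_of` are hypothetical, and the numeric values are made-up placeholders rather than real data.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (reference date, issue date, estimate).
Record = Tuple[date, date, float]

# Placeholder values mirroring the June 10th example in the text.
EXAMPLE_RECORDS = [
    (date(2020, 6, 10), date(2020, 6, 14), 4.0),  # first issue (made-up value)
    (date(2020, 6, 10), date(2020, 6, 16), 4.5),  # revision (made-up value)
]


def latest_as_of(records: Iterable[Record],
                 as_of: Optional[date] = None) -> Dict[date, Tuple[date, float]]:
    """Return, per reference date, the most recent issue not after `as_of`.

    With as_of=None, the most recent issue overall is returned.
    """
    latest: Dict[date, Tuple[date, float]] = {}
    for ref, issue, value in records:
        if as_of is not None and issue > as_of:
            continue  # this issue did not exist yet on the as_of date
        if ref not in latest or issue > latest[ref][0]:
            latest[ref] = (issue, value)
    return latest
```

Called with `as_of=date(2020, 6, 15)`, the sketch returns the June 14th issue for June 10th. Called with no `as_of`, it returns the June 16th revision.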
+
+
+## Flu Tests
+
+* **First issued:** 20 April 2020
+* **Last issued:** 19 May 2020
 * **Number of data revisions since 19 May 2020:** 0
 * **Date of last change:** Never
 * **Available for:** msa, state (see [geography coding docs](../covidcast_geography.md))
diff --git a/docs/api/covidcast_signals.md b/docs/api/covidcast_signals.md
index e6b251f8a..5e89d7ae4 100644
--- a/docs/api/covidcast_signals.md
+++ b/docs/api/covidcast_signals.md
@@ -21,20 +21,21 @@ data in this API are listed in the [API changelog](covidcast_changelog.md).
 The following signals are currently displayed on [the public COVIDcast
 map](https://covidcast.cmu.edu/):
 
-| Name | Source | Signal |
-| --- | --- | --- |
-| Doctor's Visits | [`doctor-visits`](covidcast-signals/doctor-visits.md) | `smoothed_adj_cli` |
-| Hospital Admissions | [`hospital-admissions`](covidcast-signals/hospital-admissions.md) | `smoothed_adj_covid19` |
-| Symptoms (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_cli` |
-| Symptoms in Community (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_hh_cmnty_cli` |
-| Away from Home 6hr+ (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `full_time_work_prop` |
-| Away from Home 3-6hr (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `part_time_work_prop` |
-| Search Trends (Google) | [`ght`](covidcast-signals/ght.md) | `smoothed_search` |
-| Combined | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `nmf_day_doc_fbc_fbs_ght` |
-| Cases | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_num` |
-| Cases per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_prop` |
-| Deaths | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_num` |
-| Deaths per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_prop` |
+| Kind | Name | Source | Signal |
+| ---- | ---- | ------ | ------ |
+| Public Behavior | Away from Home 6hr+ (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `full_time_work_prop` |
+| Public Behavior | Away from Home 3-6hr (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `part_time_work_prop` |
+| Public Behavior | Search Trends (Google) | [`ght`](covidcast-signals/ght.md) | `smoothed_search` |
+| Early Indicators | Symptoms (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_cli` |
+| Early Indicators | Symptoms in Community (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_hh_cmnty_cli` |
+| Early Indicators | Doctor's Visits | [`doctor-visits`](covidcast-signals/doctor-visits.md) | `smoothed_adj_cli` |
+| Early Indicators | Combined | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `nmf_day_doc_fbc_fbs_ght` |
+| Late Indicators | Test Positivity Rate | [`quidel`](covidcast-signals/quidel.md) | `covid_ag_smoothed_pct_positive` |
+| Late Indicators | Cases | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_num` |
+| Late Indicators | Cases per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_prop` |
+| Late Indicators | Deaths | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_num` |
+| Late Indicators | Deaths per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_prop` |
+| Late Indicators | Hospital Admissions | [`hospital-admissions`](covidcast-signals/hospital-admissions.md) | `smoothed_adj_covid19` |
 
 ## All Available Sources and Signals
 
diff --git a/docs/symptom-survey/survey-files.md b/docs/symptom-survey/survey-files.md
index ae156e1a2..4a3e86237 100644
--- a/docs/symptom-survey/survey-files.md
+++ b/docs/symptom-survey/survey-files.md
@@ -27,19 +27,18 @@ where the data is hosted.
 
 ## Naming Conventions
 
-All dates in filenames are of the form `YYYY_mm_dd`.
-
 Cumulative files:
 
-    cvid_responses_{from}_-_{to}.csv.gz
+    {YYYY_mm}.tar
 
 Incremental files:
 
-    cvid_responses_{for}_recordedby_{recorded}.csv
+    cvid_responses_{for}_recordedby_{recorded}.csv.gz
 
-`from`, `to`, and `for` refer to the day the survey response was started, in the
-Pacific time zone (UTC - 7). `recorded` refers to the day survey data was
-retrieved; see the [lag policy](#lag-policy) for more details.
+Dates in incremental filenames are of the form `YYYY_mm_dd`. `for` refers to the
+day the survey response was started, in the Pacific time zone (UTC -
+7). `recorded` refers to the day survey data was retrieved; see the [lag
+policy](#lag-policy) for more details.
 
 Every day, we write response files for *all* days of data, with today's
 `recorded` date. You need only load the most recent set of `recorded` files to