Docs/v1.7 #169

Merged
merged 7 commits into from
Aug 4, 2020
131 changes: 130 additions & 1 deletion docs/api/covidcast-signals/quidel.md
@@ -1,12 +1,141 @@
---
title: Quidel
parent: Inactive Signals
parent: Data Sources and Signals
grand_parent: COVIDcast API
---

# Quidel
{: .no_toc}

* **Source name:** `quidel`

## Table of contents
{: .no_toc .text-delta}

1. TOC
{:toc}

## COVID-19 Tests

* **First issued:** 27 July 2020
* **Number of data revisions since 19 May 2020:** 0
* **Date of last change:** Never
* **Available for:** hrr, msa, state (see [geography coding docs](../covidcast_geography.md))

This data source is based on COVID-19 antigen tests, with data provided to us
by Quidel, Inc. When a patient (whether at a doctor’s office, clinic, or
hospital) has COVID-like symptoms, doctors may order an antigen test. An antigen
test detects parts of the virus that are present during an active infection.
This is in contrast with antibody tests, which detect parts of the immune system
that react to the virus, but which persist long after the infection has passed.
Quidel began providing us with test data on May 9, 2020, and data volume
increased to statistically meaningful levels starting May 26, 2020.

| Signal | Description |
| --- | --- |
| `covid_ag_raw_pct_positive` | Percentage of antigen tests that were positive for COVID-19, with no smoothing applied. |
| `covid_ag_smoothed_pct_positive` | Percentage of antigen tests that were positive for COVID-19, smoothed by pooling together the last 7 days of tests. |

### Estimation

The source data from which we derive our estimates contains a number of features
for every test, including its location at the 5-digit ZIP Code level, a TestDate
and StorageDate, patient age, and unique identifiers for the device on which the
test was performed, the individual test, and the result. Multiple tests are
stored on each device.

Let $$n$$ be the total number of COVID tests taken over a given time period in a
given location (a test result can be negative, positive, or invalid). Let $$x$$
be the number of those tests with positive results. We are interested in
estimating the percentage of positive tests, which is defined as:

$$
p = \frac{100 x}{n}
$$

We estimate $$p$$ across 3 temporal-spatial aggregation schemes:
- daily, at the MSA (metropolitan statistical area) level;
- daily, at the HRR (hospital referral region) level;
- daily, at the state level.

**MSA and HRR levels**: In a given MSA or HRR, suppose $$N$$ COVID tests are taken
in a certain time period, and $$X$$ of them have positive results. If
$$N \geq 50$$, we simply use:

$$
p = \frac{100 X}{N}
$$

If $$N < 50$$, we borrow $$50 - N$$ fake samples from the home state to shrink
the estimate toward the state's mean, which means:

$$
p = 100 \left( \frac{N}{50} \frac{X}{N} + \frac{50 - N}{50} \frac{X_s}{N_s} \right)
$$

where $$N_s$$ and $$X_s$$ are the number of COVID tests and the number of those
tests with positive results in the home state over the same time period.
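
The MSA/HRR rule above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual code: the function name and arguments are hypothetical, and returning `None` when the state has fewer than $$50 - N$$ tests to borrow is an assumption about what "not enough samples" means.

```python
from typing import Optional

MIN_OBS = 50  # sample-size threshold quoted in the text


def pct_positive_with_shrinkage(
    x: int, n: int, x_state: int, n_state: int, min_obs: int = MIN_OBS
) -> Optional[float]:
    """Percent positive for one MSA/HRR, shrunk toward the state rate if n < min_obs.

    x, n             -- positive tests and total tests in the MSA/HRR
    x_state, n_state -- positive tests and total tests in the home state
    """
    if n >= min_obs:
        return 100.0 * x / n
    # Assumption: the state must be able to supply the missing samples.
    if n_state < min_obs - n:
        return None
    state_rate = x_state / n_state
    # (N/50)(X/N) + ((50-N)/50)(X_s/N_s) simplifies to (X + (50-N) X_s/N_s) / 50,
    # which also avoids dividing by zero when n == 0.
    return 100.0 * (x + (min_obs - n) * state_rate) / min_obs
```

For example, an area with 10 positives out of 20 tests in a state with a 25% positive rate gets 30 borrowed samples, giving an estimate of 35%, between the local 50% and the state 25%.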

**State level**: States with fewer than 50 tests are discarded. For the
remaining states, which have sufficient samples,

$$
p = \frac{100 X}{N}
$$
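
As a minimal sketch of the state-level rule (the function name is hypothetical, and `None` is used here simply to stand for "discarded"):

```python
from typing import Optional


def state_pct_positive(x: int, n: int, min_obs: int = 50) -> Optional[float]:
    """Percent positive for a state, or None when there are too few tests."""
    if n < min_obs:
        return None  # state is discarded: no estimate is reported
    return 100.0 * x / n
```

Unlike the MSA and HRR case, no samples are borrowed here; a state either has enough tests of its own or is dropped.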

#### Standard Error

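We assume the estimates for each time point follow a binomial distribution.
Because $$p$$ is expressed as a percentage, we convert it to a proportion inside
the binomial formula, so the estimated standard error, on the same percentage
scale as $$p$$, is:

$$
\text{se} = 100 \sqrt{ \frac{\frac{p}{100} \left(1 - \frac{p}{100}\right)}{N} }
$$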
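
A small sketch of the binomial standard error, assuming $$p$$ is on the 0 to 100 scale defined earlier. Converting to a proportion before applying the binomial formula and reporting the result on the percentage scale is an assumption made here for consistency with that definition; the function name is hypothetical.

```python
import math


def binomial_se_pct(p_pct: float, n: int) -> float:
    """Binomial standard error of a percentage estimate, in percentage points."""
    q = p_pct / 100.0  # convert the percentage to a proportion
    return 100.0 * math.sqrt(q * (1.0 - q) / n)
```

For instance, a 50% estimate from 100 tests has a standard error of about 5 percentage points.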

#### Smoothing

Smoothed estimates are formed by pooling data over time. That is, daily, for
each location, we first pool all data available in that location over the last 7
days, and we then recompute everything described in the last two
subsections. Pooling in this way makes estimates available in more geographic
areas, as many areas report very few tests per day, but have enough data to
report when 7 days are considered.
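
The pooling step can be illustrated as follows. This is a sketch under stated assumptions: the `daily` mapping and function name are hypothetical, and treating "the last 7 days" as the window ending on (and including) the target day is an assumption.

```python
from datetime import date, timedelta
from typing import Dict, Tuple


def pooled_counts(
    daily: Dict[date, Tuple[int, int]], day: date, window: int = 7
) -> Tuple[int, int]:
    """Sum (positives, tests) over the `window` days ending on `day`, inclusive.

    `daily` maps each date to a (positives, tests) pair for one location;
    days with no entry contribute nothing.
    """
    x_total = 0
    n_total = 0
    for offset in range(window):
        x, n = daily.get(day - timedelta(days=offset), (0, 0))
        x_total += x
        n_total += n
    return x_total, n_total
```

The pooled pair then takes the place of the single-day counts in the estimate and standard error calculations, which is why a location with, say, 10 tests per day can clear the 50-test threshold once 7 days are combined.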

### Limitations

This data source is based on data provided to us by a lab testing company. The company can report on only a portion of COVID-19 antigen tests in the United States, so this source represents only the tests known to it. Its coverage may vary across the United States.

### Missingness

When fewer than 50 tests are reported in a state on a specific day, no data is
reported for that area on that day; an API query for all reported states on that
day will not include it.

When fewer than 50 tests are reported in an HRR or MSA on a specific day, and
not enough samples can be filled in from the parent state, no data is reported
for that area on that day; an API query for all reported geographic areas on
that day will not include it.

### Lag and Backfill

Because testing centers may report their data to Quidel several days after the
tests occur, these signals are typically available with 5-6 days of lag. This
means that estimates for a specific day first become available 5-6 days
later.

The amount of lag in reporting can vary, and not all tests are reported with the
same lag. After we first report estimates for a specific date, further data may
arrive about tests that occurred on that date, sometimes six weeks later or
more. When this happens, we issue new estimates for those dates. This means that
a reported estimate for, say, June 10th may first be available in the API on
June 15th and subsequently revised on June 17th.


## Flu Tests

* **First issued:** 20 April 2020
* **Last issued:** 19 May 2020
* **Number of data revisions since 19 May 2020:** 0
* **Date of last change:** Never
* **Available for:** msa, state (see [geography coding docs](../covidcast_geography.md))
29 changes: 15 additions & 14 deletions docs/api/covidcast_signals.md
@@ -21,20 +21,21 @@ data in this API are listed in the [API changelog](covidcast_changelog.md).
The following signals are currently displayed on [the public COVIDcast
map](https://covidcast.cmu.edu/):

| Name | Source | Signal |
| --- | --- | --- |
| Doctor's Visits | [`doctor-visits`](covidcast-signals/doctor-visits.md) | `smoothed_adj_cli` |
| Hospital Admissions | [`hospital-admissions`](covidcast-signals/hospital-admissions.md) | `smoothed_adj_covid19` |
| Symptoms (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_cli` |
| Symptoms in Community (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_hh_cmnty_cli` |
| Away from Home 6hr+ (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `full_time_work_prop` |
| Away from Home 3-6hr (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `part_time_work_prop` |
| Search Trends (Google) | [`ght`](covidcast-signals/ght.md) | `smoothed_search` |
| Combined | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `nmf_day_doc_fbc_fbs_ght` |
| Cases | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_num` |
| Cases per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_prop` |
| Deaths | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_num` |
| Deaths per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_prop` |
| Kind | Name | Source | Signal |
| ---- | ---- | ------ | ------ |
| Public Behavior | Away from Home 6hr+ (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `full_time_work_prop` |
| Public Behavior | Away from Home 3-6hr (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `part_time_work_prop` |
| Public Behavior | Search Trends (Google) | [`ght`](covidcast-signals/ght.md) | `smoothed_search` |
| Early Indicators | Symptoms (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_cli` |
| Early Indicators | Symptoms in Community (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_hh_cmnty_cli` |
| Early Indicators | Doctor's Visits | [`doctor-visits`](covidcast-signals/doctor-visits.md) | `smoothed_adj_cli` |
| Early Indicators | Combined | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `nmf_day_doc_fbc_fbs_ght` |
| Late Indicators | Test Positivity Rate | [`quidel`](covidcast-signals/quidel.md) | `covid_ag_smoothed_pct_positive` |
| Late Indicators | Cases | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_num` |
| Late Indicators | Cases per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_prop` |
| Late Indicators | Deaths | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_num` |
| Late Indicators | Deaths per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_prop` |
| Late Indicators | Hospital Admissions | [`hospital-admissions`](covidcast-signals/hospital-admissions.md) | `smoothed_adj_covid19` |
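
As an illustration of how a source and signal pair from this table maps onto an API request, here is a sketch that only builds a query URL. The base URL and the parameter names are assumptions that should be checked against the API documentation, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the API documentation before relying on it.
BASE_URL = "https://api.covidcast.cmu.edu/epidata/api.php"


def quidel_query_url(
    geo_type: str,
    geo_value: str,
    day: str,
    signal: str = "covid_ag_smoothed_pct_positive",
) -> str:
    """Build a query URL for one Quidel signal on one day (YYYYMMDD)."""
    params = {
        "source": "covidcast",
        "data_source": "quidel",  # "Source" column of the table
        "signal": signal,         # "Signal" column of the table
        "time_type": "day",
        "geo_type": geo_type,
        "time_values": day,
        "geo_value": geo_value,
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling `quidel_query_url("state", "pa", "20200801")` yields a URL requesting the smoothed test positivity signal for Pennsylvania on 1 August 2020.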

## All Available Sources and Signals

13 changes: 6 additions & 7 deletions docs/symptom-survey/survey-files.md
@@ -27,19 +27,18 @@ where the data is hosted.

## Naming Conventions

All dates in filenames are of the form `YYYY_mm_dd`.

Cumulative files:

cvid_responses_{from}_-_{to}.csv.gz
{YYYY_mm}.tar

Incremental files:

cvid_responses_{for}_recordedby_{recorded}.csv
cvid_responses_{for}_recordedby_{recorded}.csv.gz

`from`, `to`, and `for` refer to the day the survey response was started, in the
Pacific time zone (UTC - 7). `recorded` refers to the day survey data was
retrieved; see the [lag policy](#lag-policy) for more details.
Dates in incremental filenames are of the form `YYYY_mm_dd`. `for` refers to the
day the survey response was started, in the Pacific time zone (UTC -
7). `recorded` refers to the day survey data was retrieved; see the [lag
policy](#lag-policy) for more details.
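
A short sketch of the incremental naming convention, using the `.csv.gz` form of the pattern. The helper name is hypothetical; only the filename pattern and the `YYYY_mm_dd` date format come from the text.

```python
from datetime import date

DATE_FORMAT = "%Y_%m_%d"  # the YYYY_mm_dd form used in incremental filenames


def incremental_filename(for_day: date, recorded_day: date) -> str:
    """Name of the incremental file for responses started on `for_day`
    and retrieved on `recorded_day`."""
    return (
        f"cvid_responses_{for_day.strftime(DATE_FORMAT)}"
        f"_recordedby_{recorded_day.strftime(DATE_FORMAT)}.csv.gz"
    )
```

For responses started on 1 August 2020 and retrieved on 4 August 2020, this gives `cvid_responses_2020_08_01_recordedby_2020_08_04.csv.gz`.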

Every day, we write response files for *all* days of data, with today's
`recorded` date. You need only load the most recent set of `recorded` files to