Merge pull request #169 from krivard/docs/v1.7

capnrefsmmat · web-flow · commit e7cd1012d204 · 2020-08-04T13:45:46.000-04:00
Docs/v1.7
diff --git a/docs/api/covidcast-signals/quidel.md b/docs/api/covidcast-signals/quidel.md
@@ -1,12 +1,141 @@
 ---
 title: Quidel
-parent: Inactive Signals
+parent: Data Sources and Signals
 grand_parent: COVIDcast API
 ---
 
 # Quidel
+{: .no_toc}
 
 * **Source name:** `quidel`
+
+## Table of contents
+{: .no_toc .text-delta}
+
+1. TOC
+{:toc}
+
+## COVID-19 Tests
+
+* **First issued:** 27 July 2020 
+* **Number of data revisions since 19 May 2020:** 0
+* **Date of last change:** Never
+* **Available for:** hrr, msa, state (see [geography coding docs](../covidcast_geography.md))
+
+This data source is based on COVID-19 antigen tests provided to us by Quidel, Inc. When
+a patient (whether at a doctor’s office, clinic, or hospital) has COVID-like
+symptoms, doctors may order an antigen test. An antigen test can detect parts of
+the virus that are present during an active infection. This is in contrast with
+antibody tests, which detect parts of the immune system that react to the virus,
+but which persist long after the infection has passed. Quidel began providing us
+with test data starting May 9, 2020, and data volume increased to statistically
+meaningful levels starting May 26, 2020.
+
+| Signal | Description |
+| --- | --- |
+| `covid_ag_raw_pct_positive` | Percentage of antigen tests that were positive for COVID-19, with no smoothing applied. |
+| `covid_ag_smoothed_pct_positive` | Percentage of antigen tests that were positive for COVID-19, smoothed by pooling together the last 7 days of tests. |
+
+### Estimation
+
+The source data from which we derive our estimates contains a number of features
+for every test, including localization at 5-digit Zip Code level, a TestDate and
+StorageDate, patient age, and unique identifiers for the device on which the
+test was performed, the individual test, and the result. Multiple tests are
+stored on each device.
+
+Let $$n$$ be the total number of COVID tests taken over a given time period in a
+given location (the test result can be negative, positive, or invalid). Let $$x$$ be the
+number of tests with positive results in this location over the given time
+period. We are interested in estimating the percentage of positive tests, which
+is defined as:
+
+$$
+p = \frac{100 x}{n}
+$$
+
+We estimate $$p$$ across three temporal-spatial aggregation schemes:
+- daily, at the MSA (metropolitan statistical area) level;
+- daily, at the HRR (hospital referral region) level;
+- daily, at the state level.
+
+**MSA and HRR levels**: In a given MSA or HRR, suppose $$N$$ COVID tests are taken
+in a certain time period, and let $$X$$ be the number of those tests with positive
+results. If $$N \geq 50$$, we simply use:
+
+$$
+p = \frac{100 X}{N}
+$$
+
+If $$N < 50$$, we borrow $$50 - N$$ pseudo-samples from the home state to shrink the
+estimate toward the state's mean, which gives:
+
+$$
+p = 100 \left( \frac{N}{50} \frac{X}{N} + \frac{50 - N}{50}  \frac{X_s}{N_s} \right) 
+$$
+
+where $$N_s$$ and $$X_s$$ are the number of COVID tests and the number of those tests
+with positive results in the home state over the same time period.
+
+**State level**: States with fewer than 50 tests are discarded. For the
+remaining states, which have sufficient samples,
+
+$$
+p = \frac{100 X}{N}
+$$
+
+#### Standard Error
+
+We assume the number of positive tests at each time point follows a binomial
+distribution. On the same percentage scale as $$p$$, the estimated standard error is:
+
+$$
+\text{se} = 100 \sqrt{ \frac{\frac{p}{100}\left(1-\frac{p}{100}\right)}{N} }
+$$
+
+#### Smoothing
+
+Smoothed estimates are formed by pooling data over time. That is, daily, for
+each location, we first pool all data available in that location over the last 7
+days, and we then recompute everything described in the last two
+subsections. Pooling in this way makes estimates available in more geographic
+areas, as many areas report very few tests per day, but have enough data to
+report when 7 days are considered.
+
+### Limitations
+
+This data source is based on data provided to us by a lab testing company. They can report on a portion of United States COVID-19 Antigen tests, but not all of them, and so this source only represents those tests known to them. Their coverage may vary across the United States.
+
+### Missingness
+
+When fewer than 50 tests are reported in a state on a specific day, no data is
+reported for that area on that day; an API query for all reported states on that
+day will not include it.
+
+When fewer than 50 tests are reported in an HRR or MSA on a specific day, and
+not enough samples can be filled in from the parent state, no data is reported
+for that area on that day; an API query for all reported geographic areas on
+that day will not include it.
+
+### Lag and Backfill
+
+Because testing centers may report their data to Quidel several days after the tests
+occur, these signals are typically available with 5-6 days of lag. This
+means that estimates for a specific day first become available 5-6 days
+later.
+
+The amount of lag in reporting can vary, and not all tests are reported with the
+same lag. After we first report estimates for a specific date, further data may
+arrive about tests that occurred on that date, sometimes six weeks later or
+more. When this happens, we issue new estimates for those dates. This means that
+a reported estimate for, say, June 10th may first be available in the API on
+June 15th and subsequently revised on June 17th.
+
+
+## Flu Tests
+
+* **First issued:** 20 April 2020
+* **Last issued:** 19 May 2020
 * **Number of data revisions since 19 May 2020:** 0
 * **Date of last change:** Never
 * **Available for:** msa, state (see [geography coding docs](../covidcast_geography.md))
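
Note on the estimation text added to `quidel.md` above: the following is a minimal Python sketch of the MSA/HRR shrinkage, the state-level cutoff, the standard error, and the 7-day pooling. It is an illustration, not the pipeline's actual code. The function names and `MIN_OBS` are hypothetical. Dropping an area when its state has fewer than `50 - N` tests is one reading of the Missingness section. The standard error is computed on the percentage scale (the binomial formula applied to `p / 100`, then multiplied by 100), which is an assumption about units.

```python
import math
from typing import Optional, Sequence, Tuple

MIN_OBS = 50  # minimum sample size used in the text (hypothetical constant name)


def pct_positive(x: int, n: int) -> float:
    """Raw percentage of positive tests: p = 100 * x / n."""
    return 100.0 * x / n


def local_estimate(x: int, n: int, x_state: int, n_state: int) -> Optional[float]:
    """MSA/HRR estimate, shrunk toward the state mean when n < MIN_OBS."""
    if n >= MIN_OBS:
        return pct_positive(x, n)
    borrowed = MIN_OBS - n
    # Assumption: the state must have at least `borrowed` tests to lend;
    # otherwise nothing is reported for this area.
    if n_state < borrowed:
        return None
    # Algebraically equal to 100 * (n/50 * x/n + (50-n)/50 * x_s/n_s),
    # written so that n == 0 does not divide by zero.
    return 100.0 * (x + borrowed * x_state / n_state) / MIN_OBS


def state_estimate(x: int, n: int) -> Optional[float]:
    """State estimate; states with fewer than MIN_OBS tests are discarded."""
    if n < MIN_OBS:
        return None
    return pct_positive(x, n)


def standard_error(p_percent: float, n: int) -> float:
    """Binomial standard error on the percentage scale (assumed units)."""
    if n <= 0:
        raise ValueError("n must be positive")
    prop = p_percent / 100.0
    return 100.0 * math.sqrt(prop * (1.0 - prop) / n)


def pooled_counts(daily: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum (positives, tests) over the last 7 entries of a daily series."""
    window = daily[-7:]
    return sum(x for x, _ in window), sum(n for _, n in window)


if __name__ == "__main__":
    # An area with 20 tests per day is below the cutoff on any single day,
    # but pooling 7 days gives 140 tests, so no shrinkage is needed.
    x7, n7 = pooled_counts([(2, 20)] * 10)
    p = local_estimate(x7, n7, x_state=100, n_state=1000)
    print(p, standard_error(p, n7))
```

The smoothed signal corresponds to calling `pooled_counts` first and then applying the same estimator, which is why more areas clear the 50-test threshold after pooling.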
diff --git a/docs/api/covidcast_signals.md b/docs/api/covidcast_signals.md
@@ -21,20 +21,21 @@ data in this API are listed in the [API changelog](covidcast_changelog.md).
 The following signals are currently displayed on [the public COVIDcast
 map](https://covidcast.cmu.edu/):
 
-| Name | Source | Signal |
-| --- | --- | --- |
-| Doctor's Visits | [`doctor-visits`](covidcast-signals/doctor-visits.md) | `smoothed_adj_cli` |
-| Hospital Admissions | [`hospital-admissions`](covidcast-signals/hospital-admissions.md) | `smoothed_adj_covid19` |
-| Symptoms (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_cli` |
-| Symptoms in Community (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md) | `smoothed_hh_cmnty_cli` |
-| Away from Home 6hr+ (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `full_time_work_prop` |
-| Away from Home 3-6hr (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md) | `part_time_work_prop` |
-| Search Trends (Google) | [`ght`](covidcast-signals/ght.md) | `smoothed_search` |
-| Combined | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `nmf_day_doc_fbc_fbs_ght` |
-| Cases | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_num` |
-| Cases per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_prop` |
-| Deaths | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_num` |
-| Deaths per 100,000 People | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_prop` |
+| Kind             | Name                             | Source                                                                | Signal                           |
+| ----             | ----                             | ------                                                                | ------                           |
+| Public Behavior  | Away from Home 6hr+ (SafeGraph)  | [`safegraph`](covidcast-signals/safegraph.md)                         | `full_time_work_prop`            |
+| Public Behavior  | Away from Home 3-6hr (SafeGraph) | [`safegraph`](covidcast-signals/safegraph.md)                         | `part_time_work_prop`            |
+| Public Behavior  | Search Trends (Google)           | [`ght`](covidcast-signals/ght.md)                                     | `smoothed_search`                |
+| Early Indicators | Symptoms (Facebook)              | [`fb-survey`](covidcast-signals/fb-survey.md)                         | `smoothed_cli`                   |
+| Early Indicators | Symptoms in Community (Facebook) | [`fb-survey`](covidcast-signals/fb-survey.md)                         | `smoothed_hh_cmnty_cli`          |
+| Early Indicators | Doctor's Visits                  | [`doctor-visits`](covidcast-signals/doctor-visits.md)                 | `smoothed_adj_cli`               |
+| Early Indicators | Combined                         | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `nmf_day_doc_fbc_fbs_ght`        |
+| Late Indicators  | Test Positivity Rate             | [`quidel`](covidcast-signals/quidel.md)                               | `covid_ag_smoothed_pct_positive` |
+| Late Indicators  | Cases                            | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_num`   |
+| Late Indicators  | Cases per 100,000 People         | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `confirmed_7dav_incidence_prop`  |
+| Late Indicators  | Deaths                           | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_num`      |
+| Late Indicators  | Deaths per 100,000 People        | [`indicator-combination`](covidcast-signals/indicator-combination.md) | `deaths_7dav_incidence_prop`     |
+| Late Indicators  | Hospital Admissions              | [`hospital-admissions`](covidcast-signals/hospital-admissions.md)     | `smoothed_adj_covid19`           |
 
 ## All Available Sources and Signals
 
diff --git a/docs/symptom-survey/survey-files.md b/docs/symptom-survey/survey-files.md
@@ -27,19 +27,18 @@ where the data is hosted.
 
 ## Naming Conventions
 
-All dates in filenames are of the form `YYYY_mm_dd`.
-
 Cumulative files:
 
-	cvid_responses_{from}_-_{to}.csv.gz
+	{YYYY_mm}.tar
 
 Incremental files:
 
-	cvid_responses_{for}_recordedby_{recorded}.csv
+	cvid_responses_{for}_recordedby_{recorded}.csv.gz
 
-`from`, `to`, and `for` refer to the day the survey response was started, in the
-Pacific time zone (UTC - 7). `recorded` refers to the day survey data was
-retrieved; see the [lag policy](#lag-policy) for more details.
+Dates in incremental filenames are of the form `YYYY_mm_dd`. `for` refers to the
+day the survey response was started, in the Pacific time zone (UTC-7).
+`recorded` refers to the day survey data was retrieved; see the
+[lag policy](#lag-policy) for more details.
 
 Every day, we write response files for *all* days of data, with today's
 `recorded` date. You need only load the most recent set of `recorded` files to