@@ -20,7 +20,7 @@ grand_parent: COVIDcast Epidata API
* **Earliest issue available:** July 29, 2020
* **Number of data revisions since May 19, 2020:** 1
* **Date of last change:** October 22, 2020
- * **Available for:** hrr, msa, state (see [geography coding docs](../covidcast_geography.md))
+ * **Available for:** county, hrr, msa, state, HHS, nation (see [geography coding docs](../covidcast_geography.md))
* **Time type:** day (see [date format docs](../covidcast_times.md))
* **License:** [CC BY](../covidcast_licensing.md#creative-commons-attribution)

$$
p = \frac{100 x}{n}
$$

- We estimate p across 3 temporal-spatial aggregation schemes:
+ We estimate p across 6 temporal-spatial aggregation schemes:
+ - daily, at the county level;
- daily, at the MSA (metropolitan statistical area) level;
- daily, at the HRR (hospital referral region) level;
- - daily, at the state level.
+ - daily, at the state level;
+ - daily, at the HHS level;
+ - daily, at the US national level.
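
To make the estimate concrete, here is a minimal sketch of computing $$p$$ at one aggregation level. This is not the production pipeline; the column names (`geo_id`, `n_tests`, `n_positive`) and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical per-device test totals reported in two locations on one day.
records = pd.DataFrame({
    "geo_id":     ["42003", "42003", "06037"],
    "n_tests":    [120, 80, 40],
    "n_positive": [10, 6, 3],
})

# Sum the counts within each location, then compute p = 100 * x / n.
totals = records.groupby("geo_id")[["n_tests", "n_positive"]].sum()
totals["percent_positive"] = 100 * totals["n_positive"] / totals["n_tests"]
print(totals)
```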

- **MSA and HRR levels**: In a given MSA or HRR, suppose $$N$$ COVID tests are taken
- in a certain time period, $$X$$ is the number of tests taken with positive
- results.
+ #### Standard Error

- For raw signals:
- - if $$N \geq 50$$, we simply use:
+ We assume the estimates for each time point follow a binomial distribution. The
+ estimated standard error then is:

$$
- p = \frac{100 X}{N}
+ \text{se} = 100 \sqrt{\frac{\frac{p}{100}(1 - \frac{p}{100})}{N}}
$$
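
As a quick numeric check of this formula, a small sketch (the helper name is ours, not part of any API):

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a percent-positive estimate p (0-100 scale)
    computed from n tests, under the binomial assumption above."""
    q = p / 100.0
    return 100.0 * math.sqrt(q * (1.0 - q) / n)

# Example: 5% positivity observed from 200 tests.
print(round(binomial_se(5.0, 200), 3))  # ~1.541
```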

- For smoothed signals, before taking the temporal pooling average,
- - if $$N \geq 50$$, we also use:
+ #### Smoothing
+
+ We apply two kinds of smoothing to the smoothed signals:
+
+ ##### Temporal Smoothing
+ Smoothed estimates are formed by pooling data over time. That is, daily, for
+ each location, we first pool all data available in that location over the last 7
+ days, and we then recompute everything described in the two subsections above.
+
+ Pooling in this way makes estimates available in more geographic areas, as many areas
+ report very few tests per day, but have enough data to report when 7 days are considered.
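
A minimal sketch of the 7-day pooling idea, with made-up daily counts for a single location (not the production code): the raw counts are summed over a trailing 7-day window and the estimate is recomputed from the pooled counts.

```python
import pandas as pd

# Hypothetical daily counts for one location.
daily = pd.DataFrame({
    "date":       pd.date_range("2020-10-01", periods=10, freq="D"),
    "n_tests":    [8, 12, 5, 20, 15, 9, 11, 14, 7, 10],
    "n_positive": [1, 2, 0, 3, 1, 1, 2, 2, 0, 1],
}).set_index("date")

# Pool raw counts over the trailing 7 days, then recompute the estimate.
pooled = daily.rolling("7D").sum()
pooled["percent_positive"] = 100 * pooled["n_positive"] / pooled["n_tests"]
print(pooled.tail(3))
```
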
+
+ ##### Geographical Smoothing
+
+ **County, MSA and HRR levels**: In a given county, MSA or HRR, suppose $$N$$ COVID tests
+ are taken in a certain time period and $$X$$ is the number of tests with positive
+ results.
+
+
+ For smoothed signals, after the temporal pooling,
+ - if $$N \geq 50$$, we still use:
$$
p = \frac{100 X}{N}
$$
- - if $$25 \leq N < 50$$, we lend $$50 - N$$ fake samples from its home state to shrink the
+ - if $$25 \leq N < 50$$, we borrow $$50 - N$$ fake samples from its parent state to shrink the
estimate to the state's mean, which means:
$$
p = 100 \left( \frac{N}{50} \frac{X}{N} + \frac{50 - N}{50} \frac{X_s}{N_s} \right)
$$
where $$N_s, X_s$$ are the number of COVID tests and the number of COVID tests
- taken with positive results taken in its home state in the same time period.
+ with positive results taken in its parent state in the same time period.
+ A parent state is defined as the state with the largest proportion of the population
+ in this county/MSA/HRR.
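
To make the borrowing rule concrete, a rough sketch using the smoothed-signal thresholds described above (the function and its inputs are ours, chosen for illustration):

```python
from typing import Optional

def shrink_to_parent_state(x: int, n: int, x_s: int, n_s: int) -> Optional[float]:
    """Percent positive for a county/MSA/HRR, shrunk toward its parent state
    when the local sample size is between 25 and 49."""
    if n >= 50:
        return 100.0 * x / n
    if n >= 25:  # borrow 50 - n "fake" samples from the parent state
        return 100.0 * ((n / 50) * (x / n) + ((50 - n) / 50) * (x_s / n_s))
    return None  # too few tests: not reported at this level

# Example: 30 local tests with 4 positives; the parent state saw 5000 tests, 300 positive.
print(shrink_to_parent_state(4, 30, 300, 5000))  # ~10.4
```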

- **State level**: the states with fewer than 50 tests are discarded. For the
- rest of the states with sufficient samples,
+ Counties with sample sizes smaller than 50 are merged into megacounties for
+ the raw signals; counties with sample sizes smaller than 25 are merged into megacounties for
+ the smoothed signals.

+ **State level, HHS level, National level**: locations with fewer than 50 tests are discarded. For the remaining locations,
$$
p = \frac{100 X}{N}
$$
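
No borrowing is done at these coarser levels; a short sketch of the reporting rule with made-up one-day totals:

```python
# Hypothetical one-day totals per location: (n_tests, n_positive).
totals = {"pa": (4800, 310), "ri": (35, 2), "ca": (9100, 540)}

reported = {
    geo: round(100 * x / n, 2)
    for geo, (n, x) in totals.items()
    if n >= 50  # locations with fewer than 50 tests are dropped
}
print(reported)  # "ri" is absent: only 35 tests were reported that day
```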

- #### Standard Error
-
- We assume the estimates for each time point follow a binomial distribution. The
- estimated standard error then is:
-
- $$
- \text{se} = 100 \sqrt{ \frac{\frac{p}{100}(1- \frac{p}{100})}{N} }
- $$
-
- #### Smoothing
-
- Smoothed estimates are formed by pooling data over time. That is, daily, for
- each location, we first pool all data available in that location over the last 7
- days, and we then recompute everything described in the last two
- subsections. Pooling in this way makes estimates available in more geographic
- areas, as many areas report very few tests per day, but have enough data to
- report when 7 days are considered.
-
### Lag and Backfill

Because testing centers may report their data to Quidel several days after they
@@ -142,13 +148,13 @@ This data source is based on data provided to us by a lab testing company. They

### Missingness

- When fewer than 50 tests are reported in a state on a specific day, no data is
+ When fewer than 50 tests are reported in a state, an HHS region, or the US on a specific day, no data is
reported for that area on that day; an API query for all reported states on that
day will not include it.

- When fewer than 50 tests are reported in an HRR or MSA on a specific day, and
- not enough samples can be filled in from the parent state, no data is reported
- for that area on that day; an API query for all reported geographic areas on
+ When fewer than 50 tests are reported in a county, HRR, or MSA on a specific day, and
+ not enough samples can be filled in from the parent state (for the smoothed signals specifically),
+ no data is reported for that area on that day; an API query for all reported geographic areas on
that day will not include it.

## Flu Tests
0 commit comments