docs/api/covidcast-signals/doctor-visits.md
+21 -16
@@ -13,12 +13,12 @@ grand_parent: COVIDcast API
13
13
* **Available for:** county, hrr, msa, state (see [geography coding docs](../covidcast_geography.md))
14
14
15
15
This data source is based on information about outpatient visits, provided to us
16
-
by a national health system. Using this outpatient data, we estimate the
16
+
by health system partners. Using this outpatient data, we estimate the
17
17
percentage of COVID-related doctor's visits in a given location, on a given day.
18
18
19
19
| Signal | Description |
20
20
| --- | --- |
21
-
|`smoothed_cli`| Estimated percentage of outpatient doctor visits primarily about COVID-related symptoms, based on data from a national health system, smoothed in time using a Gaussian linear smoother |
21
+
|`smoothed_cli`| Estimated percentage of outpatient doctor visits primarily about COVID-related symptoms, based on data from health system partners, smoothed in time using a Gaussian linear smoother |
22
22
|`smoothed_adj_cli`| Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment)|
23
23
24
24
## Table of contents
@@ -29,10 +29,10 @@ percentage of COVID-related doctor's visits in a given location, on a given day.
29
29
30
30
## Lag and Backfill
31
31
32
-
Note that because doctor's visits may be reported to the health system several
33
-
days after they occur, these signals are typically available with several days
34
-
of lag. This means that estimates for a specific day are only available several
35
-
days later.
32
+
Note that because doctor's visits may be reported to the health system partners
33
+
several days after they occur, these signals are typically available with
34
+
several days of lag. This means that estimates for a specific day are only
35
+
available several days later.
36
36
37
37
The amount of lag in reporting can vary, and not all visits are reported with
38
38
the same lag. After we first report estimates for a specific date, further data
@@ -43,10 +43,11 @@ June 16th.
43
43
44
44
## Limitations
45
45
46
-
This data source is based on outpatient visit data provided to us by a national
47
-
health system. The system can report on a portion of United States outpatient
48
-
doctor's visits, but not all of them, and so this source only represents those
49
-
visits known to them. Their coverage may vary across the United States.
46
+
This data source is based on outpatient visit data provided to us by health
47
+
system partners. The partners can report on a portion of United States
48
+
outpatient doctor's visits, but not all of them, and so this source only
49
+
represents those visits known to them. Their coverage may vary across the United
50
+
States.
50
51
51
52
Standard errors are not available for this data source.
52
53
@@ -115,16 +116,20 @@ the ratio between the doctor visit signals on Sunday and Monday would be a
115
116
constant. Formally, we assume that
116
117
117
118
$$
118
-
\log \mu_t = \alpha_{wd(t)} + \phi_t
119
+
\begin{aligned}
120
+
\mathbb{E}[Y_{it}] &= \mu_t\\
121
+
\log \mu_t &= \alpha_{\text{wd}(t)} + \phi_t,
122
+
\end{aligned}
119
123
$$
120
124
121
-
where $$\mu_t$$ is the expected doctor visits percentage of CLI at time $$t$$,
122
-
$$\alpha_{wd(t)}$$ is the weekday correction for the weekday of day $$t$$, and
125
+
where $$Y_{it}$$ is the observed doctor visits percentage of CLI at time $$t$$,
126
+
$$\text{wd}(t) \in \{0, \dots, 6\}$$ is the day-of-week of time $$t$$,
127
+
$$\alpha_{\text{wd}(t)}$$ is the corresponding weekday correction, and
123
128
$$\phi_t$$ is the corrected doctor visits percentage of CLI at time $$t$$.
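
The model above is additive on the log scale, so each weekday effect acts as a multiplicative factor on the percentage scale. The following is a minimal sketch of removing an already-fitted weekday effect, not the pipeline's actual implementation. The `alpha` values are made up for illustration, the function name is hypothetical, and mapping $$\text{wd}(t)$$ to Python's `date.weekday()` (Monday = 0) is an assumption.

```python
import math
from datetime import date, timedelta


def remove_weekday_effect(mu, dates, alpha):
    """Return exp(phi_t) = mu_t / exp(alpha_wd(t)) for each day.

    mu    -- expected CLI percentages, one per day
    dates -- matching datetime.date objects
    alpha -- seven weekday effects on the log scale (hypothetical values)
    """
    return [m / math.exp(alpha[d.weekday()]) for m, d in zip(mu, dates)]


# Hypothetical weekday effects: weekends report lower percentages.
alpha = [0.0, 0.0, 0.0, 0.0, 0.0, -0.2, -0.3]

# One week of synthetic data with a constant underlying level of 2.0.
start = date(2020, 6, 1)
dates = [start + timedelta(days=k) for k in range(7)]
mu = [2.0 * math.exp(alpha[d.weekday()]) for d in dates]

# Dividing out the weekday factor recovers the constant level.
corrected = remove_weekday_effect(mu, dates, alpha)
```

Because $$\log \mu_t = \alpha_{\text{wd}(t)} + \phi_t$$, the quantity returned here is $$\exp(\phi_t)$$, which is on the same scale as the reported percentage.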
124
129
125
-
For simplicity, we fit assume that the weekday parameters do not change over
126
-
time or location. To fit the $$\alpha$$ parameters, we minimize the following
127
-
convex objective function:
130
+
For simplicity, we assume that the weekday parameters do not change over time or
131
+
location. To fit the $$\alpha$$ parameters, we minimize the following convex
docs/api/covidcast-signals/fb-survey.md
+10 -2
@@ -27,8 +27,8 @@ day.
27
27
28
28
| Signal | Description |
29
29
| --- | --- |
30
-
|`raw_cli`| Estimated percentage of people with COVID-like illness based on the [criteria below](#defining-household-ili-and-cli), with no smoothing or survey weighting |
31
-
|`raw_ili`| Estimated percentage of people with influenza-like illness based on the [criteria below](#defining-household-ili-and-cli), with no smoothing or survey weighting |
30
+
|`raw_cli`| Estimated percentage of people with COVID-like illness based on the [criteria below](#ili-and-cli-indicators), with no smoothing or survey weighting |
31
+
|`raw_ili`| Estimated percentage of people with influenza-like illness based on the [criteria below](#ili-and-cli-indicators), with no smoothing or survey weighting |
32
32
|`raw_wcli`| Estimated percentage of people with COVID-like illness; adjusted using survey weights [as described below](#survey-weighting)|
33
33
|`raw_wili`| Estimated percentage of people with influenza-like illness; adjusted using survey weights [as described below](#survey-weighting)|
34
34
|`raw_hh_cmnty_cli`| Estimated percentage of people reporting illness in their local community, as [described below](#estimating-community-cli), including their household, with no smoothing or survey weighting |
@@ -92,6 +92,14 @@ COVID-like illness or CLI is not a standard indicator. Through our discussions
92
92
with the CDC, we chose to define it as: fever along with cough or shortness of
93
93
breath or difficulty breathing.
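
As an illustration only, the CLI definition above can be read as a boolean rule over reported symptoms. The function and argument names below are hypothetical and are not taken from the survey codebase.

```python
def is_cli(fever: bool, cough: bool, shortness_of_breath: bool,
           difficulty_breathing: bool) -> bool:
    """COVID-like illness as defined above: fever, plus at least one of
    cough, shortness of breath, or difficulty breathing."""
    return fever and (cough or shortness_of_breath or difficulty_breathing)
```

Under this reading, fever alone does not qualify, and neither do respiratory symptoms without fever.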
94
94
95
+
Symptoms alone are not sufficient to diagnose influenza or coronavirus
96
+
infections, and so these ILI and CLI indicators are *not* expected to be
97
+
unbiased estimates of the true rate of influenza or coronavirus infections.
98
+
These symptoms can be caused by many other conditions, and many true infections
99
+
can be asymptomatic. Instead, we expect these indicators to be useful for
100
+
comparison across the United States and across time, to determine where symptoms
101
+
appear to be increasing.
102
+
95
103
### Defining Household ILI and CLI
96
104
97
105
For a single survey, we are interested in the quantities:
docs/api/covidcast-signals/google-survey.md
+233 -1
@@ -5,6 +5,7 @@ grand_parent: COVIDcast API
5
5
---
6
6
7
7
# Google Symptom Surveys
8
+
{: .no_toc}
8
9
9
10
* **Source name:** `google-survey`
10
11
* **Number of data revisions since 19 May 2020:** 0
@@ -35,4 +36,235 @@ specific geographical areas as needed to support forecasting efforts.
35
36
| Signal | Description |
36
37
| --- | --- |
37
38
|`raw_cli`| Estimated percentage of people who know someone in their community with COVID-like illness |
38
-
|`smoothed_cli`| Estimated percentage of people who know someone in their community with COVID-like illness, smoothed in time |
39
+
|`smoothed_cli`| Estimated percentage of people who know someone in their community with COVID-like illness, smoothed in time [as described below](#smoothing)|
40
+
41
+
## Table of contents
42
+
{: .no_toc .text-delta}
43
+
44
+
1. TOC
45
+
{:toc}
46
+
47
+
## Estimation
48
+
49
+
Let $$Y$$ be the number of people who know someone in their community with
50
+
COVID-like illness or CLI, over a given time period and in a given location, and
51
+
let $$N$$ be the number of people in this location who do *not* know someone in
52
+
their community with CLI. We are interested in the proportion
53
+
54
+
$$
55
+
p = \frac{Y}{Y+N}.
56
+
$$
57
+
58
+
Since the Google Surveys system provides estimated counties for each respondent,
59
+
we are able to report $$p$$ for counties, MSAs, HRRs, and states. Our current
60
+
rule-of-thumb is to discard any estimate (whether at a county, MSA, HRR, or
61
+
state level) that is composed of fewer than 100 survey responses.
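
A minimal sketch of the proportion and the 100-response rule of thumb described above. The function name and the choice to return `None` for a discarded estimate are assumptions made for illustration, not the pipeline's actual code.

```python
from typing import Optional

# Rule-of-thumb threshold stated above.
MIN_RESPONSES = 100


def estimate_proportion(yes: int, no: int,
                        min_responses: int = MIN_RESPONSES) -> Optional[float]:
    """Return p = Y / (Y + N), or None when fewer than `min_responses`
    survey responses are available for the location."""
    n = yes + no
    if n < min_responses:
        return None  # estimate is discarded
    return yes / n
```

For example, 30 "yes" and 170 "no" responses give a proportion of 0.15, while a location with only 60 responses yields no estimate.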
62
+
63
+
At the county, MSA, and HRR levels, our estimation procedure is fairly
64
+
simple, and is outlined below. Estimation for mega-counties and states is more
65
+
complex, and deferred to the next subsection.
66
+
67
+
### County Level
68
+
69
+
Recall that we run surveys separately (in a stratified manner) in each county.
70
+
In a given county, if $$Y$$ denotes the number of respondents who know someone
71
+
in their community with CLI, $$N$$ denotes the total number who do not, and $$n
72
+
= Y + N$$ the number of "yes" and "no" responses combined, then to estimate