@@ -17,42 +17,24 @@ grand_parent: COVIDcast Main Endpoint
17
17
18
18
## Overview
19
19
20
-
The Youtube-survey is a voluntary COVID-like illness 4-question survey that was part of a research study led by the Delphi group at Carnegie Mellon University. The survey consisted of the following introduction and questions:
20
+
This data source is based on a short survey about COVID-19-like illness
21
+
run by the Delphi group at Carnegie Mellon.
22
+
YouTube directed a random sample of its users to these surveys, which were
23
+
voluntary. Users age 18 or older were eligible to complete the surveys, and
24
+
their survey responses are held by CMU. No individual survey responses are
25
+
shared back to YouTube.
21
26
22
-
This voluntary survey is part of a research study led by the Delphi group at Carnegie Mellon University. Even if you are healthy, your responses may contribute to a better public health understanding of where the coronavirus pandemic is moving, to improve our local and national responses. The data captured does not include any personally identifiable information about you and your answers to all questions will remain confidential. Published results will be in aggregate and will not identify individual participants or their responses. This study is not conducted by YouTube and no individual responses will be shared back to YouTube. There are no foreseeable risks in participating and no compensation is offered. If you have any questions, contact: [email protected].
27
+
This survey was an early version of the [COVID-19 Trends and Impact Survey (CTIS)](../../symptom-survey/), collecting data only about COVID-19 symptoms. CTIS is much longer-running and more detailed, also collecting belief and behavior data, and is recommended in most use cases. See our [surveys
28
+
page](https://delphi.cmu.edu/covid19/ctis/) for more detail about how CTIS works.
23
29
24
-
Qualifying Questions
25
-
You must be 18 years or older to take this survey. Are you 18 years or older?
26
-
What is the ZIP Code of the city or town where you slept last night? [We mean the place where you are currently staying. This may be different from your usual residence.]
27
-
What is your current ZIP code?
30
+
[TODO note that indicators differ between the two surveys for unknown reasons]
28
31
29
-
List of Symptoms
30
-
Fever (100°F or higher)
31
-
Sore throat
32
-
Cough
33
-
Shortness of breath
34
-
Difficulty breathing
32
+
As of late April 2020, the number of YouTube survey responses we
33
+
received each day was 4-7 thousand. This was too sparse for finer geographic levels, so this indicator is reported only at the state level. The survey ran from April 21, 2020 to June
34
+
17, 2020, collecting about 159 thousand responses in the United States in that
35
+
time.
35
36
36
-
Survey Question 1
37
-
"How many additional people in your local community that you know personally are sick (fever, along with at least one other symptom from the above list)?
38
-
39
-
Survey Question 2
40
-
"How many people in your household (including yourself) are sick (fever, along with at least one other symptom from the above list)?"
41
-
42
-
Survey Question 3
43
-
"How many people in your household (including yourself) are experiencing at least one symptom from above?"
44
-
45
-
Survey Question 4
46
-
"In the past 24 hours, have you or anyone in your household experienced any of the following:"
47
-
48
-
| Signal | Description |
49
-
| --- | --- |
50
-
|`smoothed_outpatient_covid`| Estimated percentage of outpatient doctor visits with confirmed COVID-19, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest date available:** 2020-02-01 |
51
-
|`smoothed_adj_outpatient_covid`| Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest date available:** 2020-02-01 |
52
-
|`smoothed_outpatient_cli`| Estimated percentage of outpatient doctor visits primarily about COVID-related symptoms, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest date available:** 2020-02-01 |
53
-
|`smoothed_adj_outpatient_cli`| Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest date available:** 2020-02-01 |
54
-
|`smoothed_outpatient_flu`| Estimated percentage of outpatient doctor visits with confirmed influenza, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest issue available:** 2021-12-06 <br/> **Earliest date available:** 2020-02-01 |
55
-
|`smoothed_adj_outpatient_flu`| Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest issue available:** 2021-12-06 <br/> **Earliest date available:** 2020-02-01 |
37
+
We produce [influenza-like and COVID-like illness indicators](#ili-and-cli-indicators) based on the survey data.
56
38
57
39
## Table of Contents
58
40
{: .no_toc .text-delta}
@@ -62,44 +44,135 @@ Survey Question 4
62
44
63
45
## Survey Text and Questions
64
46
65
-
The survey starts with the following 5 questions:
47
+
The survey contains the following 5 questions:
66
48
67
-
1. In the past 24 hours, have you or anyone in your household had any of the
68
-
following (yes/no for each):
49
+
1. In the past 24 hours, have you or anyone in your household experienced any of the following:
69
50
- (a) Fever (100 °F or higher)
70
51
- (b) Sore throat
71
52
- (c) Cough
72
53
- (d) Shortness of breath
73
54
- (e) Difficulty breathing
74
-
2. How many people in your household (including yourself) are sick (fever, along
75
-
with at least one other symptom from the above list)?
76
-
3. How many people are there in your household in total (including yourself)?
77
-
*[Beginning in wave 4, this question asks respondents to break the number
78
-
down into three age categories.]*
55
+
2. How many people in your household (including yourself) are sick (fever, along with at least one other symptom from the above list)?
56
+
3. How many people are there in your household (including yourself)?
79
57
4. What is your current ZIP code?
80
-
5. How many additional people in your local community that you know personally
81
-
are sick (fever, along with at least one other symptom from the above list)?
58
+
5. How many additional people in your local community that you know personally are sick (fever, along with at least one other symptom from the above list)?
59
+
60
+
61
+
## ILI and CLI Indicators
62
+
63
+
We define COVID-like illness (fever, along with cough, shortness of breath,
64
+
or difficulty breathing) and influenza-like illness (fever, along with cough or
65
+
sore throat) for use in forecasting and modeling. Using this survey data, we
66
+
estimate the percentage of people (age 18 or older) who have a COVID-like
67
+
illness, or influenza-like illness, in a given location, on a given day.
68
+
69
+
| Signals | Description |
70
+
| --- | --- |
71
+
|`raw_cli` and `smoothed_cli`| Estimated percentage of people with COVID-like illness <br/> **Earliest date available:** 2020-04-21 |
72
+
|`raw_ili` and `smoothed_ili`| Estimated percentage of people with influenza-like illness <br/> **Earliest date available:** 2020-04-21 |
73
+
74
+
Influenza-like illness or ILI is a standard indicator, and is defined by the CDC
75
+
as: fever along with sore throat or cough. From the list of symptoms from Q1 on
76
+
our survey, this means a and (b or c).
77
+
78
+
COVID-like illness or CLI is not a standard indicator. Through our discussions
79
+
with the CDC, we chose to define it as: fever along with cough or shortness of
80
+
breath or difficulty breathing. From the list of symptoms from Q1 on
81
+
our survey, this means a and (c or d or e).
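
The two definitions above can be written as boolean checks on the Q1 answers. The following is a minimal sketch, not the pipeline's actual code; the `Q1Symptoms` class and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Q1Symptoms:
    """Yes/no answers to Q1, items (a) through (e). Hypothetical container."""
    fever: bool                 # (a)
    sore_throat: bool           # (b)
    cough: bool                 # (c)
    shortness_of_breath: bool   # (d)
    difficulty_breathing: bool  # (e)


def meets_ili(s: Q1Symptoms) -> bool:
    # ILI: a and (b or c)
    return s.fever and (s.sore_throat or s.cough)


def meets_cli(s: Q1Symptoms) -> bool:
    # CLI: a and (c or d or e)
    return s.fever and (s.cough or s.shortness_of_breath or s.difficulty_breathing)
```

Note that fever with cough satisfies both definitions, while fever with only a sore throat is ILI but not CLI, and fever with only shortness of breath or difficulty breathing is CLI but not ILI.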
82
+
83
+
Symptoms alone are not sufficient to diagnose influenza or coronavirus
84
+
infections, and so these ILI and CLI indicators are *not* expected to be
85
+
unbiased estimates of the true rate of influenza or coronavirus infections.
86
+
These symptoms can be caused by many other conditions, and many true infections
87
+
can be asymptomatic. Instead, we expect these indicators to be useful for
88
+
comparison across the United States and across time, to determine where symptoms
89
+
appear to be increasing.
82
90
83
-
Beyond these 5 questions, there are also many other questions that follow in the
84
-
survey, which go into more detail on symptoms, contacts, risk factors, and
85
-
demographics. These are used for many of our behavior and testing indicators
86
-
below. The full text of the survey (including all deployed versions) can be
87
-
found on our [questions and coding page](../../symptom-survey/coding.md).
91
+
**Smoothing.** The signals beginning with `smoothed` estimate the same quantities as their
92
+
`raw` partners, but are smoothed in time to reduce day-to-day sampling noise;
93
+
see [details below](#smoothing). Crucially, because the smoothed signals combine
94
+
information across multiple days, they have larger sample sizes and hence are
95
+
available for more locations than the raw signals.
88
96
89
-
### Day-of-Week Adjustment
90
97
98
+
### Defining Household ILI and CLI
91
99
100
+
[TODO check]
92
101
93
-
### Backwards Padding
102
+
For a single survey, we are interested in the quantities:
94
103
104
+
- $$X =$$ the number of people in the household with ILI;
105
+
- $$Y =$$ the number of people in the household with CLI;
106
+
- $$N =$$ the number of people in the household.
107
+
108
+
Note that $$N$$ comes directly from the answer to Q3, but neither $$X$$ nor
109
+
$$Y$$ can be computed directly (because Q2 does not give an answer to the
110
+
precise symptomatic profile of all individuals in the household; it only asks
111
+
how many individuals have fever and at least one other symptom from the list).
112
+
113
+
We hence estimate $$X$$ and $$Y$$ with the following simple strategy. Consider
114
+
ILI, without loss of generality (we apply the same strategy to CLI). Let $$Z$$
115
+
be the answer to Q2.
116
+
117
+
- If the answer to Q1 does not meet the ILI definition, then we report $$X=0$$.
118
+
- If the answer to Q1 does meet the ILI definition, then we report $$X = Z$$.
119
+
120
+
This can only "over count" (result in too large estimates of) the true $$X$$ and
121
+
$$Y$$. For example, this happens when some members of the household experience
122
+
ILI that does not also qualify as CLI, while others experience CLI that does not
123
+
also qualify as ILI. In this case, for both $$X$$ and $$Y$$, our simple strategy
124
+
would return the sum of both types of cases. However, given the extreme degree
125
+
of overlap between the definitions of ILI and CLI, it is reasonable to believe
126
+
that, if symptoms across all household members qualified as both ILI and CLI,
127
+
each individual would have both, or neither---with neither being more common.
128
+
Therefore we do not consider this "over counting" phenomenon practically
129
+
problematic.
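
The simple strategy above can be sketched for a single survey response as follows. This is an illustration under stated assumptions rather than the production implementation: the function name `household_counts` and its argument names are hypothetical, and `z` stands for the answer to Q2.

```python
def household_counts(fever, sore_throat, cough,
                     shortness_of_breath, difficulty_breathing, z):
    """Return (X, Y): estimated household ILI and CLI counts for one survey.

    The first five arguments are the yes/no answers to Q1 (a)-(e);
    z is the answer to Q2. All names here are hypothetical.
    """
    ili = fever and (sore_throat or cough)
    cli = fever and (cough or shortness_of_breath or difficulty_breathing)
    x = z if ili else 0  # X = Z when Q1 meets the ILI definition, else 0
    y = z if cli else 0  # Y = Z when Q1 meets the CLI definition, else 0
    return x, y


# A household reporting fever, sore throat, and shortness of breath, with
# two sick members: both definitions are met at the household level, so
# both counts are set to Z even if each person meets only one definition.
print(household_counts(True, True, False, True, False, z=2))  # (2, 2)
```

The example shows the "over counting" case described above: the household-level answer to Q1 cannot distinguish which member had which symptoms, so both $$X$$ and $$Y$$ receive the full value of $$Z$$.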
130
+
131
+
132
+
### Estimating Percent ILI and CLI
133
+
134
+
[TODO check]
135
+
136
+
Let $$x$$ and $$y$$ be the number of people with ILI and CLI, respectively, over
137
+
a given time period, and in a given location (for example, the time period being
138
+
a particular day, and a location being a particular state). Let $$n$$ be the
139
+
total number of people in this location. We are interested in estimating the
140
+
true ILI and CLI percentages, which we denote by $$p$$ and $$q$$, respectively:
141
+
142
+
$$
143
+
p = 100 \cdot \frac{x}{n}
144
+
\quad\text{and}\quad
145
+
q = 100 \cdot \frac{y}{n}.
146
+
$$
147
+
148
+
In a given aggregation unit (for example, daily-state), let $$X_i$$ and $$Y_i$$
149
+
denote the number of ILI and CLI cases in the household, respectively (computed
150
+
according to the simple strategy [described
151
+
above](#defining-household-ili-and-cli)), and let $$N_i$$ denote the total
152
+
number of people in the household, in survey $$i$$, out of $$m$$ surveys we
153
+
collected. Then our unweighted estimates of $$p$$ and $$q$$ are: