
Commit 86cdee5

Creates Youtube-survey doc page
This is a draft of the YouTube survey doc page. Much of the information is missing or assumed from the fb-survey page (https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/fb-survey.html) and may well be incorrect. A comment from Katie from a May 4/7 2020 tooling team sprint states: "YouTube 4-question survey - this is essentially a copy of fb-survey without the weighted signals".
1 parent 30c6206 commit 86cdee5

1 file changed

+166
-0
lines changed

@@ -0,0 +1,166 @@
1+
---
2+
title: YouTube Survey
3+
parent: Inactive Signals
4+
grand_parent: COVIDcast Main Endpoint
5+
---
6+
7+
# YouTube Survey
8+
{: .no_toc}
9+
10+
* **Source name:** `youtube-survey`
11+
* **Earliest issue available:** April 4, 2020
12+
* **Number of data revisions since May 19, 2020:** 0
13+
* **Date of last change:** Never
14+
* **Available for:** state (see [geography coding docs](../covidcast_geography.md))
15+
* **Time type:** day (see [date format docs](../covidcast_times.md))
16+
* **License:** [CC BY-NC](../covidcast_licensing.md#creative-commons-attribution-noncommercial)
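
As a rough illustration of how the metadata above maps onto a request, the sketch below builds a query URL for this source. The base URL and parameter names are assumptions taken from the public COVIDcast endpoint rather than from this page, and `example_signal` is a placeholder, since this page does not yet confirm the source's actual signal names.

```python
from urllib.parse import urlencode

# Assumed base URL of the COVIDcast endpoint (not stated on this page).
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_query_url(signal, time_values, geo_value="*"):
    """Build a query URL for the youtube-survey source.

    The source name, geo type, and time type come from the list above;
    the signal name is supplied by the caller.
    """
    params = {
        "data_source": "youtube-survey",
        "signal": signal,
        "time_type": "day",
        "geo_type": "state",
        "time_values": time_values,  # YYYYMMDD or a YYYYMMDD-YYYYMMDD range
        "geo_value": geo_value,      # "*" requests all states
    }
    # Keep "*" unescaped so the wildcard stays readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe="*")


# "example_signal" is a hypothetical placeholder, not a confirmed signal name.
url = build_query_url("example_signal", "20200501-20200507")
print(url)
```

Only the URL is constructed here; no request is sent, so the sketch makes no claim about what the API returns for this source.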
17+
18+
## Overview
19+
20+
The YouTube survey was a voluntary four-question survey on COVID-like illness, run as part of a research study led by the Delphi group at Carnegie Mellon University. The survey consisted of the following introduction and questions:
21+
22+
This voluntary survey is part of a research study led by the Delphi group at Carnegie Mellon University. Even if you are healthy, your responses may contribute to a better public health understanding of where the coronavirus pandemic is moving, to improve our local and national responses. The data captured does not include any personally identifiable information about you and your answers to all questions will remain confidential. Published results will be in aggregate and will not identify individual participants or their responses. This study is not conducted by YouTube and no individual responses will be shared back to YouTube. There are no foreseeable risks in participating and no compensation is offered. If you have any questions, contact: [email protected].
23+
24+
Qualifying Questions
25+
You must be 18 years or older to take this survey. Are you 18 years or older?
26+
What is the ZIP Code of the city or town where you slept last night? [We mean the place where you are currently staying. This may be different from your usual residence.]
27+
What is your current ZIP code?
28+
29+
List of Symptoms
30+
Fever (100°F or higher)
31+
Sore throat
32+
Cough
33+
Shortness of breath
34+
Difficulty breathing
35+
36+
Survey Question 1
37+
"How many additional people in your local community that you know personally are sick (fever, along with at least one other symptom from the above list)?"
38+
39+
Survey Question 2
40+
"How many people in your household (including yourself) are sick (fever, along with at least one other symptom from the above list)?"
41+
42+
Survey Question 3
43+
"How many people in your household (including yourself) are experiencing at least one symptom from above?"
44+
45+
Survey Question 4
46+
"In the past 24 hours, have you or anyone in your household experienced any of the following:"
47+
48+
| Signal | Description |
49+
| --- | --- |
50+
| `smoothed_outpatient_covid` | Estimated percentage of outpatient doctor visits with confirmed COVID-19, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest date available:** 2020-02-01 |
51+
| `smoothed_adj_outpatient_covid` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest date available:** 2020-02-01 |
52+
| `smoothed_outpatient_cli` | Estimated percentage of outpatient doctor visits primarily about COVID-related symptoms, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest date available:** 2020-02-01 |
53+
| `smoothed_adj_outpatient_cli` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest date available:** 2020-02-01 |
54+
| `smoothed_outpatient_flu` | Estimated percentage of outpatient doctor visits with confirmed influenza, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother <br/> **Earliest issue available:** 2021-12-06 <br/> **Earliest date available:** 2020-02-01 |
55+
| `smoothed_adj_outpatient_flu` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) <br/> **Earliest issue available:** 2021-12-06 <br/> **Earliest date available:** 2020-02-01 |
56+
57+
## Table of Contents
58+
{: .no_toc .text-delta}
59+
60+
1. TOC
61+
{:toc}
62+
63+
## Survey Text and Questions
64+
65+
The survey starts with the following 5 questions:
66+
67+
1. In the past 24 hours, have you or anyone in your household had any of the
68+
following (yes/no for each):
69+
- (a) Fever (100 °F or higher)
70+
- (b) Sore throat
71+
- (c) Cough
72+
- (d) Shortness of breath
73+
- (e) Difficulty breathing
74+
2. How many people in your household (including yourself) are sick (fever, along
75+
with at least one other symptom from the above list)?
76+
3. How many people are there in your household in total (including yourself)?
77+
*[Beginning in wave 4, this question asks respondents to break the number
78+
down into three age categories.]*
79+
4. What is your current ZIP code?
80+
5. How many additional people in your local community that you know personally
81+
are sick (fever, along with at least one other symptom from the above list)?
82+
83+
Beyond these 5 questions, there are also many other questions that follow in the
84+
survey, which go into more detail on symptoms, contacts, risk factors, and
85+
demographics. These are used for many of our behavior and testing indicators
86+
below. The full text of the survey (including all deployed versions) can be
87+
found on our [questions and coding page](../../symptom-survey/coding.md).
88+
89+
### Day-of-Week Adjustment
90+
91+
92+
93+
### Backwards Padding
94+
95+
96+
97+
### Smoothing
98+
99+
100+
101+
## Lag and Backfill
102+
103+
104+
105+
## Limitations
106+
107+
When interpreting the signals above, it is important to keep in mind several
108+
limitations of this survey data.
109+
110+
* **Survey population.** People are eligible to participate in the survey if
111+
they are age 18 or older, they are currently located in the USA, and they are
112+
  an active user of YouTube. The survey data does not report on children under
113+
  age 18, and the YouTube adult user population may differ from the United
114+
States population generally in important ways. We use our [survey
115+
weighting](#survey-weighting-and-estimation) to adjust the estimates to match
116+
age and gender demographics by state, but this process doesn't adjust for
117+
other demographic biases we may not be aware of.
118+
* **Non-response bias.** The survey is voluntary, and people who accept the
119+
  invitation when it is presented to them on YouTube may be different from
120+
  those who do not. The [survey weights provided by
121+
  YouTube](#survey-weighting-and-estimation) attempt to model the probability
122+
of response for each user and hence adjust for this, but it is difficult to
123+
tell if these weights account for all possible non-response bias.
124+
* **Social desirability.** Previous survey research has shown that people's
125+
responses to surveys are often biased by what responses they believe are
126+
  socially desirable or acceptable. For example, if there is widespread
127+
pressure to wear masks, respondents who do *not* wear masks may feel pressured
128+
to answer that they *do*. This survey is anonymous and online, meaning we
129+
expect the social desirability effect to be smaller, but it may still be
130+
present.
131+
* **False responses.** As with anything on the Internet, a small percentage of
132+
users give deliberately incorrect responses. We discard a small number of
133+
responses that are obviously false, but do **not** perform extensive
134+
filtering. However, the large size of the study, and our procedure for
135+
ensuring that each respondent can only be counted once when they are invited
136+
to take the survey, prevents individual respondents from having a large effect
137+
on results.
138+
* **Repeat invitations.** Individual respondents can be invited by YouTube to
139+
  take the survey several times. Usually YouTube only re-invites a respondent
140+
  after one month. Hence estimates of values on a single day are calculated
141+
  using independent survey responses from unique respondents (or, at least,
142+
  unique YouTube accounts), whereas estimates from different months may involve
143+
the same respondents.
144+
145+
Whenever possible, you should compare this data to other independent sources. We
146+
believe that while these biases may affect point estimates -- that is, they may
147+
bias estimates on a specific day up or down -- the biases should not change
148+
strongly over time. This means that *changes* in signals, such as increases or
149+
decreases, are likely to represent true changes in the underlying population,
150+
even if point estimates are biased.
151+
152+
### Privacy Restrictions
153+
154+
To protect respondent privacy, we discard any estimate (whether at a county,
155+
MSA, HRR, or state level) that is based on fewer than 100 survey responses. For
156+
signals reported using a 7-day average (those beginning with `smoothed_`), this
157+
means a geographic area must have at least 100 responses in 7 days to be
158+
reported.
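
The threshold rule above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name is hypothetical, and it assumes the 7-day window is a trailing one that includes the day being reported, which the text above does not specify.

```python
from datetime import date, timedelta

MIN_RESPONSES = 100  # minimum responses required, as stated above
WINDOW_DAYS = 7      # window length for `smoothed_` signals


def reportable_days(daily_counts, min_responses=MIN_RESPONSES,
                    window_days=WINDOW_DAYS):
    """Return the days whose trailing window has enough responses.

    daily_counts maps a date to the number of responses received in one
    geographic area on that date. Days missing from the mapping count as 0.
    The trailing-window alignment is an assumption made for illustration.
    """
    kept = []
    for day in sorted(daily_counts):
        window_total = sum(
            daily_counts.get(day - timedelta(days=offset), 0)
            for offset in range(window_days)
        )
        if window_total >= min_responses:
            kept.append(day)
    return kept


# Example: an area receiving 20 responses per day for one week.
counts = {date(2020, 5, 1) + timedelta(days=i): 20 for i in range(7)}
print(reportable_days(counts))
```

With 20 responses per day, the first four days fall short of 100 responses in their windows and would be discarded, so a sparsely sampled area can be missing on some days and present on others.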
159+
160+
This affects some items more than others. For instance, items about vaccine
161+
hesitancy reasons are only asked of respondents who are unvaccinated and
162+
hesitant, not to all survey respondents. It also affects some geographic areas
163+
more than others, particularly rural areas with low population densities. When
164+
doing analysis of county-level data, one should be aware that missing counties
165+
are typically more rural and less populous than those present in the data, which
166+
may introduce bias into the analysis.
