Commit d7add86

Merge pull request #293 from cmu-delphi/docs/survey-signup
Document next survey wave and how to get involved
2 parents 7cd9cb9 + e70e96c commit d7add86

9 files changed: 227 additions, 10 deletions
docs/symptom-survey/coding.md

Lines changed: 66 additions & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: Questions and Coding
 parent: COVID Symptom Survey
-nav_order: 3
+nav_order: 5
 ---

 # Questions and Coding
@@ -268,3 +268,68 @@ your protocol.
 your household’s finances?") has been removed and replaced with item C15, for
 consistency with the international version of the survey.
 * Item D1b ("Are you currently pregnant?") has been removed.
+
+## Wave 5
+
+Wave 5 will be deployed in late November, 2020. It is available in English, as
+well as
+
+* Simplified Chinese
+* English (UK)
+* Spanish (Latin America)
+* Spanish
+* French
+* Brazilian Portuguese
+* Vietnamese
+
+**Draft** files:
+
+* [Survey text and coding](waves/Survey_of_COVID-Like_Illness_-_Wave_5.pdf)
+  (PDF)
+
+### Summary of Changes
+
+Wave 5 contains minor changes to the survey instrument and a few new items.
+Please review the changes carefully when you use responses from multiple waves
+of this survey.
+
+Note that this changelog is a draft; as the survey instrument is finalized, this
+changelog will be updated to include all final changes.
+
+#### Consent Text
+
+The survey consent text has been altered to encourage respondents to answer the
+survey, even if they have already taken it before:
+
+> We encourage you to complete the survey each time you are invited, even if you
+> have participated before. Completing the survey again will help us understand
+> how the situation is changing.
+
+#### New Items
+
+* Item C16 asks respondents to estimate how many people are wearing masks in
+  their community.
+* Item C7 was previously asked in Waves 1-3; it asks respondents the extent to
+  which they are avoiding other people.
+* Item C17 asks whether respondents have received a flu vaccine. The time frame
+  and responses are adapted to specify the current seasonal flu vaccine. A more
+  general question about the flu vaccine appeared in Waves 1-3 as item C2.
+* Items E1-E3 ask about household children and their education during the
+  pandemic. These items appear for respondents who indicate there is a child in
+  their household under the age of 18. E1 asks respondents to indicate the
+  current grade level(s) of the child(ren) in their household. Item E2 asks the
+  respondents if the child(ren) are attending in-person classes part time or
+  full time. Item E3 asks respondents what measures are applied to prevent the
+  spread of COVID-19 when the child(ren) attend in-person classes (e.g.
+  mandatory mask wearing, closed communal areas).
+
+#### Changed Items
+
+* Item B2 now includes headaches and changes in sleep as symptoms.
+* Item D8 now includes the option of Master’s degree (unfortunately omitted in
+  Wave 4) and has examples of professional degrees for clarification.
+
+#### Removed Items
+
+* There are no items from Wave 4 that were removed in the Wave 5 version of
+  this survey.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+---
+title: Collaboration and Survey Revision
+parent: COVID Symptom Survey
+nav_order: 1
+---
+
+# Collaboration and Survey Revision
+
+Delphi continues to revise the COVID-19 Symptom Survey instruments in order to
+prioritize items that have the greatest utility for the response to the COVID-19
+pandemic. We conduct revisions in collaboration with data users, fellow
+researchers, and public health officials, to ensure the survey data best serves
+public health and research goals.
+
+## Survey Revisions
+
+If there is a revision or question you would like us to consider, please fill
+out [this form requesting details about your
+proposal](https://forms.gle/q6NS8fPJJofKQ9mM8). This request can be submitted by
+researchers regardless of whether they have a signed Data Use Agreement for the
+individual responses to the COVID Symptom Survey.
+
+## Collaboration Meetings
+
+Collaboration in this ongoing effort is our priority. Delphi hosts a
+collaboration meeting the first Friday of each month at 2–3pm ET. The meeting is
+a chance to announce upcoming changes to the survey, have a discussion and get
+input about the instrument, share preliminary findings and network with other
+researchers.
+
+If you're interested in joining, contact us at
+

docs/symptom-survey/data-access.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+---
+title: Getting Data Access
+parent: COVID Symptom Survey
+nav_order: 0
+---
+
+# Getting Data Access
+
+The Delphi Research Group at Carnegie Mellon University (CMU), in partnership
+with Facebook, has conducted the COVID Symptom Survey to better understand the
+spread of COVID-19 and its effects on public health and well-being. This may
+help improve our local and national responses to the pandemic and our
+understanding of how it has affected society.
+
+De-identified data can be made available to researchers associated with
+universities or non-profit organizations. To request access to the data please
+submit the information requested in [Facebook's page on obtaining data
+access](https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/),
+which sets out the basic conditions and provides a form to request access. An
+[international version of the COVID Symptom Survey](https://covidmap.umd.edu/)
+is conducted by the University of Maryland (UMD) and access can be requested
+through the same form.

docs/symptom-survey/index.md

Lines changed: 5 additions & 4 deletions
@@ -19,7 +19,8 @@ as the [`fb-survey` data source](../api/covidcast-signals/fb-survey.md).
 This documentation is for users who have a signed Data Use Agreement to receive
 individual response data from the survey. It describes the survey items, data
 coding, data distribution, and the survey weights computed by Facebook. If you
-are a researcher and would like to get access to the data, check [Facebook's
-page on obtaining data
-access](https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/),
-which sets out the basic conditions and provides a form to request access.
+are a researcher and would like to get access to the data, see our page on
+getting [data access](data-access.md).
+
+If you have questions about the survey or getting access to data, contact us at
+

docs/symptom-survey/server-access.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: SFTP Server Access
 parent: COVID Symptom Survey
-nav_order: 4
+nav_order: 2
 ---

 # SFTP Server Access

docs/symptom-survey/survey-files.md

Lines changed: 14 additions & 3 deletions
@@ -1,13 +1,13 @@
 ---
 title: Response Files
 parent: COVID Symptom Survey
-nav_order: 1
+nav_order: 3
 ---

 # Response Files
 {: .no_toc}

-Users with access to the [COVID symptom survey](./index.md) individual response
+Users with access to the [COVID Symptom Survey](./index.md) individual response
 data should have received SFTP credentials for a private server where the data
 are stored. To connect to the server, see the [server access documentation](server-access.md).
 This documentation describes the survey data available on that server.
@@ -41,12 +41,23 @@ day the survey response was started, in the Pacific time zone (UTC -
 7). `recorded` refers to the day survey data was retrieved; see the [lag
 policy](#lag-policy) for more details.

-Every day, we write response files for *all* days of data, with today's
+Every day, we write response files for all recent days of data, with today's
 `recorded` date. You need only load the most recent set of `recorded` files to
 obtain all survey responses; the older versions are available to track any
 changes in file formats or slight changes from late-arriving responses, as
 described in the [lag policy below](#lag-policy).

+## Loading Data Files
+
+As described above, one day of data may be reissued several times, if responses
+arrive late, file formats are changed, or errors in data processing are
+corrected. You need only load the latest version of each file.
+
+For data users who use R to load and process data, we provide a [`get_survey_df`
+function](survey-utils.R) to read a directory of CSV files (such as those
+provided on the SFTP server), select the correct files, and read them into a
+single data frame for use.
+
 ## Conditions Responses are Recorded

 The survey was configured to record responses under two sets of circumstances:
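
A note for readers who do not use R: the selection rule in the new "Loading Data Files" section (keep only the most recently recorded file for each survey date) can be sketched in Python. This is an illustrative sketch, not part of the commit. The helper names `file_properties` and `latest_files` are hypothetical. The file name layout (survey date in underscore-separated fields 3 to 5, recorded date in fields 7 to 9) is assumed from the parsing in `get_file_properties` in `survey-utils.R`, and the file names in the usage note are made up.

```python
from datetime import date


def file_properties(filename):
    """Parse (survey_date, recorded_date) from a response file name.

    Assumes the layout implied by get_file_properties in survey-utils.R:
    after dropping everything from the first ".", the underscore-separated
    fields 3-5 hold the survey date and fields 7-9 the recorded date
    (1-indexed, as in R).
    """
    stem = filename.split(".", 1)[0]
    parts = stem.split("_")
    survey_date = date(*map(int, parts[2:5]))
    recorded = date(*map(int, parts[6:9]))
    return survey_date, recorded


def latest_files(filenames):
    """Return, for each survey date, the file with the latest recorded date."""
    best = {}
    for name in filenames:
        survey_date, recorded = file_properties(name)
        # Replace the stored file only when this one was recorded later.
        if survey_date not in best or recorded > best[survey_date][0]:
            best[survey_date] = (recorded, name)
    # Sort by survey date so the result is deterministic.
    return [best[d][1] for d in sorted(best)]
```

Given two hypothetical issues of the same day, for example `cvid_responses_2020_11_01_recordedby_2020_11_02.csv.gz` and `cvid_responses_2020_11_01_recordedby_2020_11_05.csv.gz`, `latest_files` keeps only the second.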

docs/symptom-survey/survey-utils.R

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+library(readr)
+library(purrr)
+library(dplyr)
+
+#' Fetch all survey data in a chosen directory.
+#'
+#' There can be multiple data files for a single day of survey responses, for
+#' example if the data is reissued when late-arriving surveys are recorded.
+#' Each file contains *all* data recorded for that date, so only the later files
+#' are needed.
+#'
+#' This function extracts the date from each file, determines which files are
+#' reissued data, and produces a single data frame representing the most recent
+#' data available for each day. It can read gzip-compressed CSV files, such as
+#' those on the SFTP site, using `readr::read_csv`.
+#'
+#' This function handles column types correctly for surveys up to Wave 4.
+#'
+#' @param directory Directory in which to look for survey CSV files, relative to
+#'   the current working directory.
+#' @param pattern Regular expression indicating which files in that directory to
+#'   open. By default, selects all `.csv.gz` files, such as those downloaded
+#'   from the SFTP site.
+#' @return A single data frame containing all survey responses. Note that this
+#'   data frame may have millions of rows and use gigabytes of memory, if this
+#'   function is run on *all* survey responses.
+get_survey_df <- function(directory, pattern = "\\.csv\\.gz$") {
+  files <- list.files(directory, pattern = pattern)
+
+  files <- map_dfr(files, get_file_properties)
+
+  latest_files <- files %>%
+    group_by(date) %>%
+    filter(recorded == max(recorded)) %>%
+    ungroup() %>%
+    pull(filename)
+
+  big_df <- map_dfr(
+    latest_files,
+    function(f) {
+      # stop readr from thinking commas = thousand separators,
+      # and from inferring column types incorrectly
+      read_csv(file.path(directory, f), locale = locale(grouping_mark = ""),
+               col_types = cols(
+                 A2b = col_number(),
+                 A3 = col_character(),
+                 A4 = col_number(),
+                 B2 = col_character(),
+                 B2_14_TEXT = col_character(),
+                 B2c = col_character(),
+                 B2c_14_TEXT = col_character(),
+                 B4 = col_number(),
+                 B5 = col_number(),
+                 B7 = col_character(),
+                 B10b = col_character(),
+                 B12a = col_character(),
+                 C1 = col_character(),
+                 C3 = col_number(),
+                 C4 = col_number(),
+                 C5 = col_number(),
+                 C7 = col_number(),
+                 C13 = col_character(),
+                 C13a = col_character(),
+                 D1_4_TEXT = col_character(),
+                 fips = col_character(),
+                 UserLanguage = col_character(),
+                 StartDatetime = col_character(),
+                 EndDatetime = col_character(),
+                 .default = col_number()))
+    }
+  )
+  return(big_df)
+}
+
+## Helper function to extract dates from each file's filename.
+get_file_properties <- function(filename) {
+  short <- strsplit(filename, ".", fixed = TRUE)[[1]][1]
+  parts <- strsplit(short, "_", fixed = TRUE)[[1]]
+
+  filedate <- as.Date(paste(parts[3:5], collapse = "-"))
+  recordeddate <- as.Date(paste(parts[7:9], collapse = "-"))
+
+  return(data.frame(filename = filename,
+                    date = filedate,
+                    recorded = recordeddate))
+}
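
The explicit `col_types` in `get_survey_df` above guard against type inference going wrong; for instance, a `fips` code read as a number would lose its leading zero. A minimal Python sketch of the same idea follows, using only the standard library. It is not part of the commit: `TEXT_COLUMNS` is a hypothetical subset of the character columns listed in the R code, `read_response_file` and `convert_row` are hypothetical helpers, and UTF-8 encoding is an assumption.

```python
import csv
import gzip

# Hypothetical subset of the columns that survey-utils.R reads as character.
TEXT_COLUMNS = {
    "A3", "B2", "C1", "fips",
    "UserLanguage", "StartDatetime", "EndDatetime",
}


def convert_row(row):
    """Keep text columns as strings; parse all other non-empty values as floats."""
    converted = {}
    for key, value in row.items():
        if key in TEXT_COLUMNS:
            converted[key] = value      # e.g. "01001" keeps its leading zero
        elif value == "":
            converted[key] = None       # treat an empty cell as missing
        else:
            converted[key] = float(value)
    return converted


def read_response_file(path):
    """Read one gzip-compressed CSV file into a list of dictionaries."""
    # UTF-8 is an assumption; newline="" is what the csv module expects.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return [convert_row(row) for row in csv.DictReader(handle)]
```

Unlike `col_number()`, `float` rejects values that are not plain numbers, so a real loader would need to decide how to handle such cells.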
Binary file not shown.

docs/symptom-survey/weights.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: Survey Weights
 parent: COVID Symptom Survey
-nav_order: 2
+nav_order: 4
 ---

 # Survey Weights
