Commit d5b7844

1.11 docs: whitespace changes only
1 parent 17e4b64 commit d5b7844

2 files changed

+78
-35
lines changed

docs/api/covidcast-signals/google-symptoms.md

Lines changed: 43 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -14,9 +14,16 @@ grand_parent: COVIDcast API
1414
* **Available for:** county, MSA, HRR, state (see [geography coding docs](../covidcast_geography.md))
1515
* **License:** [CC BY](../covidcast_licensing.md#creative-commons-attribution)
1616

17-
This data source is based on the [COVID-19 Search Trends symptoms dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset). Using this search data, we estimate the volume of searches mapped to symptoms related to COVID-19 such as _anosmia_ (lack of smell) and _ageusia_(lack of taste). The resulting daily dataset for each region shows the relative frequency of searches for each symptom. The signals are measured in arbitrary units that are normalized for population and scaled by the maximum value of the normalized popularity within a
18-
geographic region across a specific time range. **Thus, values are NOT
19-
comparable across geographic regions**. Larger numbers represent higher numbers of symptom-related searches.
17+
This data source is based on the [COVID-19 Search Trends symptoms
18+
dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset). Using
19+
this search data, we estimate the volume of searches mapped to symptoms related
20+
to COVID-19 such as _anosmia_ (lack of smell) and _ageusia_(lack of taste). The
21+
resulting daily dataset for each region shows the relative frequency of searches
22+
for each symptom. The signals are measured in arbitrary units that are
23+
normalized for population and scaled by the maximum value of the normalized
24+
popularity within a geographic region across a specific time range. **Thus,
25+
values are NOT comparable across geographic regions**. Larger numbers represent
26+
higher numbers of symptom-related searches.
2027

2128
| Signal | Description |
2229
| --- | --- |
@@ -34,23 +41,45 @@ comparable across geographic regions**. Larger numbers represent higher numbers
3441
1. TOC
3542
{:toc}
3643
## Estimation
37-
The `sum_anosmia_ageusia_raw_search` signals are simply the raw sum of the values of `anosmia_raw_search`
38-
and `ageusia_raw_search`, but not the union of anosmia and ageusia related searches. This is because the data volume is calculated based on search queries. A single search query can be mapped to more than one symptom. Currently, Google does not provide _intersection/union_ data. Users should be careful when considering such signals.
44+
The `sum_anosmia_ageusia_raw_search` signals are simply the raw sum of the
45+
values of `anosmia_raw_search` and `ageusia_raw_search`, but not the union of
46+
anosmia and ageusia related searches. This is because the data volume is
47+
calculated based on search queries. A single search query can be mapped to more
48+
than one symptom. Currently, Google does not provide _intersection/union_
49+
data. Users should be careful when considering such signals.
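
Note on the paragraph above: a toy illustration of why the summed signal is not the union. The query list is hypothetical, and it uses plain counts rather than Google's normalized popularity, so treat it as a sketch of the double-counting effect only.

```python
# Hypothetical search queries, each tagged with the symptoms it maps to.
queries = [
    {"anosmia"},
    {"ageusia"},
    {"anosmia", "ageusia"},  # one query mapped to both symptoms
]

anosmia_count = sum("anosmia" in q for q in queries)
ageusia_count = sum("ageusia" in q for q in queries)

# What a "sum" signal adds up: the shared query is counted twice.
summed = anosmia_count + ageusia_count

# What a union would count: each query once if it maps to either symptom.
union = sum(1 for q in queries if q & {"anosmia", "ageusia"})

print(summed, union)
```

Here the sum is 4 while only 3 distinct queries exist, which is the overcount the documentation warns about.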
3950

4051
## Limitation
41-
When daily volume in a region does not meet quality or privacy thresholds, set by Google, no value
42-
will be reported. Since Google uses differential privacy, there is artificial noise added to the raw
43-
datasets to avoid identifying any individual persons without affecting the quality of results.
52+
When daily volume in a region does not meet quality or privacy thresholds, set
53+
by Google, no value will be reported. Since Google uses differential privacy,
54+
there is artificial noise added to the raw datasets to avoid identifying any
55+
individual persons without affecting the quality of results.
4456

45-
The data is normalized by the total number of Search users in certain regions for a certain time period and is scaled considering the maximum value of the normalized
46-
popularity across the entire published time range for that region over all symptoms. The values
47-
of symptom popularity are **NOT** comparable across geographic regions. Due to the scaling step,
48-
most of the values should be in the range 0-1. However, since the scaling factor is calculated and stored at a certain time point, the symptom popularity released after that time point is likely to exceed the previously-observed maximum value which results in values larger than 1.
57+
The data is normalized by the total number of Search users in certain regions
58+
for a certain time period and is scaled considering the maximum value of the
59+
normalized popularity across the entire published time range for that region
60+
over all symptoms. The values of symptom popularity are **NOT** comparable
61+
across geographic regions. Due to the scaling step, most of the values should be
62+
in the range 0-1. However, since the scaling factor is calculated and stored at
63+
a certain time point, the symptom popularity released after that time point is
64+
likely to exceed the previously-observed maximum value which results in values
65+
larger than 1.
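
Note on the paragraph above: a minimal sketch of how a scaling factor fixed at one point in time can yield values above 1 later. The function name, the numbers, and the idea of "the first `n_reference` values" standing in for the period used to compute the maximum are all assumptions for illustration; Google's actual procedure is not specified here.

```python
def scale_by_frozen_max(normalized, n_reference):
    """Scale a series by the maximum of its first n_reference values.

    The maximum is computed once (the "frozen" scaling factor) and then
    applied to every value, including ones published afterwards.
    """
    frozen_max = max(normalized[:n_reference])
    return [value / frozen_max for value in normalized]


# Hypothetical normalized popularity for one region. The last two values
# are released after the scaling factor was computed.
normalized = [0.2, 0.5, 0.4, 0.6, 0.75]
scaled = scale_by_frozen_max(normalized, n_reference=3)
print(scaled)
```

Values inside the reference period stay within 0-1, while the later, larger values exceed 1.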
4966

5067

5168
## Geographical Aggregation
52-
The state-level and county-level `raw_search` signals for specific symptoms such as _anosmia_ and _ageusia_ are taken directly from the [COVID-19 Search Trends symptoms dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset) without changes. We aggregate the county-level data to the MSA and HRR levels using the population-weighted average. For MSAs/HRRs that include counties that have no data provided due to quality or privacy issues for a certain day, we simply assume the values to be 0 during aggregation. The values for MSAs/HRRs with no counties having non-NaN values will not be reported. Thus, the resulting MSA/HRR level data does not fully match the _actual_ MSA/HRR level data (which we are not provided).
69+
The state-level and county-level `raw_search` signals for specific symptoms such
70+
as _anosmia_ and _ageusia_ are taken directly from the [COVID-19 Search Trends
71+
symptoms
72+
dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset)
73+
without changes. We aggregate the county-level data to the MSA and HRR levels
74+
using the population-weighted average. For MSAs/HRRs that include counties that
75+
have no data provided due to quality or privacy issues for a certain day, we
76+
simply assume the values to be 0 during aggregation. The values for MSAs/HRRs
77+
with no counties having non-NaN values will not be reported. Thus, the resulting
78+
MSA/HRR level data does not fully match the _actual_ MSA/HRR level data (which
79+
we are not provided).
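
Note on the paragraph above: a sketch of the described aggregation, assuming the weights are county populations and that counties with missing data still contribute their population to the denominator (the text only says their values are taken as 0). The function name and the sample counties are hypothetical.

```python
import math


def aggregate_to_region(county_values, county_populations):
    """Population-weighted average over the counties of one MSA/HRR.

    Counties whose value is missing (absent, None, or NaN) are treated
    as 0. Returns None when no county has a reported value, mirroring
    "will not be reported".
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    values = {fips: county_values.get(fips) for fips in county_populations}
    if all(is_missing(v) for v in values.values()):
        return None

    total_population = sum(county_populations.values())
    weighted_sum = sum(
        (0.0 if is_missing(values[fips]) else values[fips]) * population
        for fips, population in county_populations.items()
    )
    return weighted_sum / total_population


# Hypothetical region with two counties; county "B" has no data that day.
populations = {"A": 100, "B": 300}
print(aggregate_to_region({"A": 0.8, "B": float("nan")}, populations))
print(aggregate_to_region({"A": None}, populations))
```

Under these assumptions the first call is pulled well below county A's value of 0.8, which illustrates why the aggregated figure need not match the true MSA/HRR-level value.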
5380

5481

5582
## Lag and Backfill
56-
Google does not update the search data daily, but has an uncertain update frequency. The delay can range from 1 day to 10 days or even more. We check for updates every day and provide the most up-to-date data.
83+
Google does not update the search data daily, but has an uncertain update
84+
frequency. The delay can range from 1 day to 10 days or even more. We check for
85+
updates every day and provide the most up-to-date data.

docs/api/covidcast-signals/safegraph.md

Lines changed: 35 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -27,10 +27,10 @@ datasets.
2727
* **Number of data revisions since 23 June 2020:** 1
2828
* **Date of last change:** 3 November 2020
2929

30-
Data source based on [social
31-
distancing metrics](https://docs.safegraph.com/docs/social-distancing-metrics).
32-
SafeGraph provides this data for
33-
individual census block groups, using differential privacy to protect individual people's data privacy.
30+
Data source based on [social distancing
31+
metrics](https://docs.safegraph.com/docs/social-distancing-metrics). SafeGraph
32+
provides this data for individual census block groups, using differential
33+
privacy to protect individual people's data privacy.
3434

3535
Delphi creates features of the SafeGraph data at the census block group level,
3636
then aggregates these features to the county and state levels. The aggregated
@@ -59,10 +59,11 @@ doing so, we make the simplifying assumption that each CBG contributes an iid
5959
observation to the county-level distribution. `n` also serves as the sample
6060
size. The same method is used for aggregation to states.
6161

62-
SafeGraph's signals measure mobility each day, which causes strong day-of-week effects:
63-
weekends have substantially different values than weekdays. Users interested in long-term
64-
trends, rather than mobility on one specific day, may prefer the `7dav` signals since
65-
averaging over the preceding 7 days removes these day-of-week effects.
62+
SafeGraph's signals measure mobility each day, which causes strong day-of-week
63+
effects: weekends have substantially different values than weekdays. Users
64+
interested in long-term trends, rather than mobility on one specific day, may
65+
prefer the `7dav` signals since averaging over the preceding 7 days removes
66+
these day-of-week effects.
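
Note on the paragraph above: a small sketch of why a 7-day average removes a weekly cycle. The window here covers the current day and the six days before it, which is an assumption; the exact window used for the `7dav` signals is not stated in this diff. The mobility numbers are made up.

```python
def trailing_mean(values, window=7):
    """Mean of each day and the (window - 1) days before it."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical mobility values, Monday through Sunday, repeated for 3 weeks.
weekly_pattern = [10, 10, 10, 10, 10, 4, 4]
daily = weekly_pattern * 3
smoothed = trailing_mean(daily)

print(min(daily), max(daily))        # raw values swing between weekday and weekend
print(min(smoothed), max(smoothed))  # every 7-day window holds one full week
```

Because every window contains exactly one of each weekday, a purely weekly pattern averages to a constant.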
6667

6768
### Lag
6869

@@ -77,12 +78,17 @@ additional day for SafeGraph's data to be ingested into the COVIDcast API.
7778
* **Number of data revisions since 23 June 2020:** 0
7879
* **Date of last change:** never
7980

80-
Data source based on
81-
[Weekly Patterns](https://docs.safegraph.com/docs/weekly-patterns) dataset. SafeGraph provides this data for
82-
different points of interest ([POIs](https://docs.safegraph.com/v4.0/docs#section-core-places)) considering individual census block groups, using differential privacy to protect individual people's data privacy.
81+
Data source based on [Weekly
82+
Patterns](https://docs.safegraph.com/docs/weekly-patterns) dataset. SafeGraph
83+
provides this data for different points of interest
84+
([POIs](https://docs.safegraph.com/v4.0/docs#section-core-places)) considering
85+
individual census block groups, using differential privacy to protect individual
86+
people's data privacy.
8387

84-
Delphi gathers the number of daily visits to POIs of certain types(bars, restaurants, etc.)
85-
from SafeGraph's Weekly Patterns data at the 5-digit ZipCode level, then aggregates and reports these features to the county, MSA, HRR, and state levels. The aggregated data is freely available through the COVIDcast API.
88+
Delphi gathers the number of daily visits to POIs of certain types(bars,
89+
restaurants, etc.) from SafeGraph's Weekly Patterns data at the 5-digit ZipCode
90+
level, then aggregates and reports these features to the county, MSA, HRR, and
91+
state levels. The aggregated data is freely available through the COVIDcast API.
8692

8793
For precise definitions of the quantities below, consult the [SafeGraph Weekly
8894
Patterns documentation](https://docs.safegraph.com/docs/weekly-patterns).
@@ -94,14 +100,22 @@ Patterns documentation](https://docs.safegraph.com/docs/weekly-patterns).
94100
| `restaurants_visit_num` | The number of daily visits to restaurant-related POIs in a certain region |
95101
| `restaurants_visit_prop` | The number of daily visits to restaurant-related POIs in a certain region, per 100,000 population |
96102

97-
SafeGraph delivers the number of daily visits to U.S. POIs, the details of which are described in
98-
the [Places Manual](https://readme.safegraph.com/docs/places-manual#section-placekey) dataset.
99-
Delphi aggregates the number of visits to certain types of places, such as
100-
bars (places with [NAICS code = 722410](https://www.census.gov/cgi-bin/sssd/naics/naicsrch?input=722410&search=2017+NAICS+Search&search=2017)) and restaurants (places with [NAICS code = 722511](https://www.census.gov/cgi-bin/sssd/naics/naicsrch)). For example, Adagio Teas is coded as a bar because it serves alcohol, while Napkin Burger is considered to be a full-service restaurant.
101-
More information on NAICS codes is available from the [US Census Bureau: North American Industry Classification System](https://www.census.gov/eos/www/naics/index.html).
103+
SafeGraph delivers the number of daily visits to U.S. POIs, the details of which
104+
are described in the [Places
105+
Manual](https://readme.safegraph.com/docs/places-manual#section-placekey)
106+
dataset. Delphi aggregates the number of visits to certain types of places,
107+
such as bars (places with [NAICS code =
108+
722410](https://www.census.gov/cgi-bin/sssd/naics/naicsrch?input=722410&search=2017+NAICS+Search&search=2017))
109+
and restaurants (places with [NAICS code =
110+
722511](https://www.census.gov/cgi-bin/sssd/naics/naicsrch)). For example,
111+
Adagio Teas is coded as a bar because it serves alcohol, while Napkin Burger is
112+
considered to be a full-service restaurant. More information on NAICS codes is
113+
available from the [US Census Bureau: North American Industry Classification
114+
System](https://www.census.gov/eos/www/naics/index.html).
102115

103116
### Lag
104117

105-
SafeGraph provides newly updated data for the previous week every Wednesday,
106-
meaning estimates for a specific day are only available 3-9 days later. It may take up to an
107-
additional day for SafeGraph's data to be ingested into the COVIDcast API.
118+
SafeGraph provides newly updated data for the previous week every Wednesday,
119+
meaning estimates for a specific day are only available 3-9 days later. It may
120+
take up to an additional day for SafeGraph's data to be ingested into the
121+
COVIDcast API.
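
Note on the paragraph above: a sketch of where the 3-9 day range comes from, assuming "the previous week" means a Monday-to-Sunday week and that the release lands on the Wednesday right after that week ends. Both assumptions and the function names are illustrative; the extra ingestion day mentioned above is not included.

```python
from datetime import date, timedelta


def release_date(day):
    """Wednesday following the Monday-Sunday week that contains `day`."""
    week_end = day + timedelta(days=6 - day.weekday())  # that week's Sunday
    return week_end + timedelta(days=3)


def lag_days(day):
    """Days between an observation date and its assumed release date."""
    return (release_date(day) - day).days


monday = date(2020, 11, 2)
for offset in range(7):
    day = monday + timedelta(days=offset)
    print(day.isoformat(), lag_days(day))
```

Under these assumptions a Monday observation waits 9 days and a Sunday observation waits 3, matching the stated range.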
