docs/api/covidcast-signals/indicator-combination.md
Lines changed: 12 additions & 3 deletions
@@ -132,13 +132,22 @@ sensor. This can be avoided with global column scaling.
 #### Lags and Sporadic Missingness

 The matrix $$X$$ is not necessarily complete and we may have entries missing.
-Several forms of missingness arise in our data. On certain days, all observations of a given sensor are missing due to release lag. For example, Doctor Visits is released several days late. Also, for any given region and sensor, a sensor may be available on some days but not others due to sample size cutoffs. Additionally, on any given day, different sensors are observed in different regions.
+Several forms of missingness arise in our data. On certain days, all
+observations of a given sensor are missing due to release lag. For example,
+Doctor Visits is released several days late. Also, for any given region and
+sensor, a sensor may be available on some days but not others due to sample size
+cutoffs. Additionally, on any given day, different sensors are observed in
+different regions.

 To ensure that our combined indicator value has comparable scaling over time and
 is free from erratic jumps that are just due to missingness, we use the
 following imputation strategies:
-*lag imputation*, where if a sensor is missing for all regions on a given day, we copy all observations from the last day on which any observation was available for that sensor;
-*recent imputation*, where if a sensor value on a given day is missing but at least one of the past $T$ values is observed, we impute it with the most recent value. We limit $T$ to be 7 days.
+* *lag imputation*, where if a sensor is missing for all regions on a given day,
+we copy all observations from the last day on which any observation was
+available for that sensor;
+* *recent imputation*, where if a sensor value on a given day is
+missing but at least one of the past $$T$$ values is observed, we impute it with
+the most recent value. We limit $$T$$ to be 7 days.
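
The two imputation strategies described in this hunk can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names `lag_impute` and `recent_impute`, the parameter `max_lookback` (standing in for $$T$$), and the days-by-regions list layout with `None` marking a missing entry are all assumptions made for illustration. The sketch also assumes that recent imputation only copies values that were actually observed, never previously imputed ones.

```python
from typing import List, Optional

# One sensor's data: rows are days (oldest first), columns are regions.
# None marks a missing observation (a hypothetical layout for this sketch).
Row = List[Optional[float]]


def lag_impute(values: List[Row]) -> List[Row]:
    """If a sensor is missing for all regions on a day, copy every
    observation from the last day on which any observation was available."""
    out: List[Row] = []
    last_observed: Optional[Row] = None
    for row in values:
        if any(v is not None for v in row):
            last_observed = row
            out.append(list(row))
        elif last_observed is not None:
            out.append(list(last_observed))
        else:
            # No earlier day has data, so there is nothing to copy.
            out.append(list(row))
    return out


def recent_impute(values: List[Row], max_lookback: int = 7) -> List[Row]:
    """If a single value is missing but at least one of the past
    `max_lookback` days was observed, use the most recent observed value."""
    out = [list(row) for row in values]
    for t, row in enumerate(values):
        for r, v in enumerate(row):
            if v is not None:
                continue
            # Walk backwards over days t-1, ..., t-max_lookback (stopping at day 0).
            for s in range(t - 1, max(t - max_lookback, 0) - 1, -1):
                if values[s][r] is not None:
                    out[t][r] = values[s][r]
                    break
    return out


sensor = [
    [1.0, None],   # day 0: region 0 observed
    [None, None],  # day 1: sensor missing everywhere (e.g. release lag)
    [None, 2.0],   # day 2: region 1 observed
]
print(lag_impute(sensor))     # [[1.0, None], [1.0, None], [None, 2.0]]
print(recent_impute(sensor))  # [[1.0, None], [1.0, None], [1.0, 2.0]]
```

Lag imputation acts on whole days, while recent imputation acts on individual region entries, so the two fill different gaps in the example: day 2 of region 0 is only filled by recent imputation.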