
GHT has misleading declining trend --- in areas with unexpectedly low volume? #67


Closed
brookslogan opened this issue Jun 5, 2020 · 10 comments
Labels: data quality (Missing data, weird data, broken data); Engineering (Used to filter issues when synching with Asana); modeling (Must coordinate with Modeling team)

Comments

@brookslogan

In MSAs like:

  • Beaumont-Port Arthur, TX
  • College Station-Bryan, TX
  • Sioux City, IA-NE-SD
  • Cheyenne, WY

It appears that the GHT signal is encountering isolated days with query volume above the reporting threshold, and the smoothing/averaging then makes it look like a declining trend over the following 7 days. Especially with the graph defaulting to showing only the last two weeks, this often reads as a (misleading) rapidly decreasing trend, or as strange spikes upward.

(I guess this goes back to real 0s vs. under-threshold values; I'm not sure what the status of handling these is.) It seems we need to adjust the smoothing to either skip over missed reporting (or perhaps both missed reporting and real 0s), e.g. if there are only 3 nonmissing days in the window, average over only those 3 with a 3 in the denominator rather than 7, or else report NAs for the smoothed estimates instead.
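A minimal sketch of the "average over only the nonmissing days" option, assuming missing/under-threshold days are encoded as NaN; the helper name and its `min_days` argument are hypothetical, not something from the pipeline:

```python
import numpy as np
import pandas as pd

# Hypothetical sketch (not the pipeline's actual smoother): a 7-day trailing
# mean that averages only over the days that actually reported, with NaN
# standing in for missing / under-threshold days.
def trailing_mean_ignore_missing(values, window=7, min_days=1):
    s = pd.Series(values, dtype=float)
    # min_periods controls how many non-missing days are required; with
    # fewer than `min_days` observed days the result is NaN, rather than a
    # value diluted by a fixed denominator of 7.
    return s.rolling(window, min_periods=min_days).mean()

# A single reported value surrounded by missing days:
raw = [np.nan, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, np.nan]
print(trailing_mean_ignore_missing(raw).tolist())
# -> [nan, nan, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
# The isolated report is not diluted to 3/7 over the following week, and
# windows with no reports at all stay NaN.
```

Raising `min_days` (e.g. to 3) switches to the "report NAs instead" behavior whenever too few days in the window actually reported.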

@brookslogan
Author

The second potential issue here is why College Station isn't meeting the reporting threshold most of the time; I would think it's pretty populous but I am not certain.

@krivard
Contributor

krivard commented Jun 5, 2020

Yes; this is related to #36 -- the current smoothing is designed to handle regions with occasional missingness; it does poorly in areas that are usually missing and only occasionally have data.

@capnrefsmmat
Contributor

College Station is populous during the school year; otherwise it's a pretty small college town.

@brookslogan
Author

brookslogan commented Jun 6, 2020

Since we are doing "filtering" --- best estimate for time t using data up to time t (or t + lag) --- I don't think we can avoid spikes upward without sacrificing prediction accuracy or availability. For viz purposes, I would think a spike upward, plateau, then spike downward might mislead viewers less (maybe the viz team's user studies will give actual data on this), but this might also come at the cost of prediction accuracy. I guess there is not a clear course of action here that meets all purposes simultaneously. Maybe expanding the default viz time scale to include the last four weeks could help, but the issue would still remain if a signal is missing for the last 3 weeks in a row.

@huisaddison
Contributor

I agree with @brookslogan's last comment. (I will refer to "filtering := left smoothing" and "smoothing := symmetric smoothing" to avoid overloading the term "smoothing".)

Basically, I devised the smoothing method for GHT, which was required to be a left smoother, to be "as smooth as possible without sacrificing the ability to tell that there was a jump today". This leads to a jump when a point mass appears, followed by a taper to zero. I thought this would be preferable to turning a point mass, say on Monday, into a symmetric mass centered on Wednesday; a left smoother is, of course, not capable of keeping it as a point mass on Monday.
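To illustrate the shape (this is not the production GHT smoother, just a stand-in left smoother with decaying weights, namely an exponentially weighted trailing mean):

```python
import pandas as pd

# Illustration only, not the actual GHT smoother: a left smoother with
# decaying weights applied to a single point mass.
x = pd.Series([0, 0, 0, 7, 0, 0, 0, 0, 0, 0], dtype=float)  # spike on "Monday"
left_smoothed = x.ewm(halflife=2.0).mean()
print(left_smoothed.round(2).tolist())
# The output jumps on the day the mass appears and then decays back toward
# zero, which reads as a "rapidly declining trend" on a two-week map view.
```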

My personal view is that the appropriate thing to do for the map is to perform left smoothing for the present and symmetric smoothing for the past (which is the same as recomputing a smoother over all available data every day), and then to maintain well-documented, separate data sources that are only left-smoothed for end users who are constructing their own models. (More generally, this fits into the discussion of backfill, etc.)
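A rough sketch of that proposal, assuming a plain 7-day window (the real pipeline's smoother and backfill handling would differ):

```python
import pandas as pd

# Hedged sketch of the proposal above, not an implementation from the repo:
# each day, re-smooth the entire history with a symmetric window. Past days
# are revised toward centered smoothing; the most recent days, which lack
# future data, effectively fall back to a one-sided (left) average.
def resmooth_all(history, window=7):
    s = pd.Series(history, dtype=float)
    # center=True gives symmetric smoothing wherever both sides exist;
    # min_periods=1 lets the right edge use whatever data is available.
    return s.rolling(window, center=True, min_periods=1).mean()

history = [0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0]  # point mass on day 6
print(resmooth_all(history).round(2).tolist())
# -> [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
# The point mass becomes a symmetric plateau centered on the day it occurred
# instead of a jump followed by a week-long apparent decline.
```

The left-smoothed-only variants would then live alongside this as separately documented signals for modelers.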

@krivard krivard added the Triage Nominate for inclusion in the next release label Jun 11, 2020
@krivard krivard added this to the Ongoing milestone Jun 12, 2020
@krivard krivard removed the Triage Nominate for inclusion in the next release label Jun 12, 2020
@brookslogan
Author

I agree with @huisaddison that symmetric smoothers are more natural to plot. But this may be blocked on the addition of the issue and lag columns; I don't know how far along that work is.

Looking again at the map, though, I am not sure if I have correctly described the nature of the GHT patterns we are seeing. It looks like there can be spikes that are not left-smoothed. (But maybe this is from GHT returning different results for the same day?)

@krivard
Contributor

krivard commented Jul 6, 2020

Next:

  • Put a prototype centered smoother signal in as a wip signal

@jingjtang
Contributor

The new signal has been added here. With it, the declining trend becomes a "spike upward, plateau, then spike downward" pattern.

@krivard krivard added the modeling Must coordinate with Modeling team label Jul 8, 2020
@dshemetov
Contributor

Hi everyone! For the past couple of weeks I've been working on a smoothing-utility refactor along with implementing a new smoother (see #176). I am still getting up to speed on exactly what the challenges are, but I have applied some new methods to tackling the spiking behavior in this notebook. I would love to get your feedback!

The first two sections ("GHT MSA" and "Imputing") contain some plots directly relevant to this discussion. The rest of the notebook contains applications of a variety of other methods on other datasets.

@krivard
Contributor

krivard commented Aug 12, 2020

Just to clarify -- doctor-visits and hospital-admissions are smoothed by us, not by the provider, but they're in the same boat as fb-survey in that the DUA prohibits access to the source data outside CMU Delphi.

You can, however, get a conservative approximation of what smoothing the fb-survey signal would do by grabbing the raw signal variants that come out of the API. Once Alex and I finish resolving discrepancies between the old and new fb-survey codebases, the survey signals should be less sensitive to this problem: the new codebase is able to compute the smoothed signals before applying the minimum-sample-size thresholds, so the underlying signal is less choppy. The raw signals in the API have already had the minimum-sample-size thresholds applied, so they represent a worst-case scenario for choppiness.
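For anyone who wants to try that, here is a hedged sketch using the covidcast Python client; the particular signal (raw_cli), geography, and dates are just examples, and a self-computed 7-day average will not match Delphi's own smoothing since the thresholds have already been applied to the published raw values:

```python
from datetime import date

import covidcast  # Delphi COVIDcast API client

# Example only: pull a raw fb-survey variant and apply a simple 7-day
# trailing mean per location. Signal/geo/date choices are illustrative.
raw = covidcast.signal("fb-survey", "raw_cli",
                       date(2020, 7, 1), date(2020, 7, 31),
                       geo_type="msa")
raw = raw.sort_values(["geo_value", "time_value"])
raw["smoothed_diy"] = (
    raw.groupby("geo_value")["value"]
       .transform(lambda v: v.rolling(7, min_periods=1).mean())
)
# Because the minimum-sample-size thresholds were applied before these raw
# values were published, this is the conservative / worst-case approximation
# described above.
```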

@nmdefries nmdefries added the data quality Missing data, weird data, broken data label Nov 10, 2020
@SumitDELPHI SumitDELPHI added the Engineering Used to filter issues when synching with Asana label Dec 6, 2020
@krivard krivard closed this as completed Aug 24, 2021