GHT has misleading declining trend --- in areas with unexpectedly low volume? #67
The second potential issue here is why College Station isn't meeting the reporting threshold most of the time; I would think it's pretty populous, but I am not certain.
Yes; this is related to #36 -- the current smoothing is designed to handle regions with occasional missingness; it does poorly with areas that are typically missing and have only occasional data.
College Station is populous during the school year; otherwise it's a pretty small college town.
Since we are doing "filtering" --- best estimate for time t using data up to time t (or t + lag) --- I don't think we can avoid spikes upward without sacrificing prediction accuracy or availability. For viz purposes I would think a spike upward, plateau, then spike downward might mislead viewers less (maybe the viz team's user studies will give actual data on this), but this might also come at the cost of prediction accuracy. I guess there is no clear course of action here that meets all purposes simultaneously. Maybe expanding the default viz time scale to include the last four weeks could help, but the issue would still remain if a signal is missing for the last 3 weeks in a row.
I agree with @brookslogan's last comment. (I will refer to "filtering := left smoothing" and "smoothing := symmetric smoothing" to avoid overloading the term "smoothing".) Basically, I devised the smoothing method for GHT, which was required to be a left smoother, to be "as smooth as possible without sacrificing the ability to tell that there was a jump today". This leads to a jump when a point mass appears, followed by a taper to zero. I thought that this would be preferable to turning a point mass, say on Monday, into a symmetric mass centered on Wednesday. A left smoother is not capable of turning it into a point mass centered on Monday. My personal view is that the appropriate thing to do for the map is to perform left-smoothing for the present and symmetric-smoothing for the past (which is the same as recomputing a smoother over all available data, every day), and then to maintain well-documented, separate sources of data that are only left-smoothed for end users who are using it to construct their own models. (More generally, this fits into the discussion of backfill, etc.)
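For concreteness, here is a toy pandas sketch (an illustration only, not the actual GHT smoother) of how these behaviors differ on a point mass:

```python
import pandas as pd

# A point mass of 10 on day 3 ("Monday"), zero everywhere else.
x = pd.Series([0, 0, 0, 10.0, 0, 0, 0, 0, 0, 0])

# Left smoother with decaying weights: a jump on Monday, then a taper to zero.
taper = x.ewm(alpha=0.5, adjust=False).mean()

# Uniform left (trailing) window: Monday's mass is spread evenly over the
# following days, so its apparent center drifts to midweek.
trailing = x.rolling(window=5, min_periods=1).mean()

# Symmetric smoother (only computable in hindsight): the mass stays
# centered on Monday.
centered = x.rolling(window=5, center=True, min_periods=1).mean()

print(pd.DataFrame({"raw": x, "taper": taper, "trailing": trailing,
                    "centered": centered}))
```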
I agree with @huisaddison that symmetric smoothers are more natural to plot. But this may be blocked by the addition of the issue and lag columns; I don't know how far progress is on adding these. Looking again at the map, though, I am not sure if I have correctly described the nature of the GHT patterns we are seeing. It looks like there can be spikes that are not left-smoothed. (But maybe this is from GHT returning different results for the same day?)
The new signal is added here. The declining trend would be changed to a "spike upward, plateau, then spike downward" pattern.
Hi everyone! I've been working on a smoothing utility refactor, along with implementing a new smoother, for the past couple of weeks (see #176). I am still getting up to speed on what exactly the challenge points are, but I have applied some new methods to tackling the spiking behavior in this notebook. I would love to get your feedback! The first two sections ("GHT MSA" and "Imputing") contain some plots directly relevant to this discussion. The rest of the notebook contains applications of a variety of other methods to other datasets.
Just to clarify -- doctor-visits and hospital-admissions are smoothed by us, not by the provider, but they're in the same boat as fb-survey in that the DUA prohibits access to the source data outside CMU Delphi. You can, however, get a conservative approximation of what smoothing the fb-survey signal would do if you grab the raw signal variants that come out of the API. Once Alex and I finish resolving discrepancies between the old and new fb-survey codebases, the survey signals should be less sensitive to this problem, because the new codebase is able to compute the smoothed signals before applying the minimum sample size thresholds, so the underlying signal is less choppy. The raw signals in the API have had the minimum sample size thresholds applied, so they are a worst-case scenario for choppiness.
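To illustrate the ordering difference, here is a hypothetical sketch; the threshold value, the pooled-sample-size check, and all names are assumptions for illustration, not the fb-survey code:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
est = pd.Series(rng.normal(10, 1, size=30))    # made-up daily estimates
n = pd.Series(rng.integers(60, 160, size=30))  # made-up daily sample sizes
MIN_N = 100                                    # hypothetical minimum sample size

# Threshold-then-smooth (what the raw API variants reflect): censor
# small-sample days first, then smooth the now-gappy series.
choppy = est.where(n >= MIN_N).rolling(7, min_periods=1).mean()

# Smooth-then-threshold (the new-codebase ordering described above):
# smooth everything, then censor only windows whose pooled sample
# size is still too small.
smooth = est.rolling(7, min_periods=1).mean().where(
    n.rolling(7, min_periods=1).sum() >= MIN_N
)

print(pd.DataFrame({"choppy": choppy, "smooth": smooth}))
```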
In MSAs like:
It appears that the GHT signal is encountering single queries, or single days with queries above the reporting threshold, on scattered days, but the smoothing/averaging over the following days then makes it look like a declining trend over the next 7 days. Especially with the default of showing only the last two weeks in the graph, this often appears as a (misleading) rapidly decreasing trend, or as strange spikes upward.
(I guess this goes back to real 0s vs. under threshold; not sure what the status on handling these is.) It seems we need to adjust the smoothing either to skip over missed reporting, or perhaps both missed reporting and real 0s --- e.g., if there are only 3 nonmissing days, average over only those 3 with a 3 in the denominator rather than 7 --- or to report NAs for the smoothed estimates instead.
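For concreteness, a minimal pandas sketch of both options, with made-up values:

```python
import numpy as np
import pandas as pd

# NaN marks days with missed reporting (or real 0s, if treated the same).
raw = pd.Series([np.nan, np.nan, 4.0, np.nan, 6.0, 5.0, np.nan])

# Option 1: average only over the nonmissing days in the window, so three
# observed values are divided by 3, not 7 (rolling means skip NaNs once
# min_periods nonmissing values are present).
skip_missing = raw.rolling(window=7, min_periods=1).mean()

# Option 2: report NA unless enough days in the window are nonmissing.
strict = raw.rolling(window=7, min_periods=5).mean()

print(pd.DataFrame({"raw": raw, "skip_missing": skip_missing,
                    "strict": strict}))
```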