flusurv data is stale #1247
Comments
Some other notes: upstream source:
|
Looking at the oldest and newest logs available for flusurv acquisition (can't find any before April 2023):
we always see "current issue: 202111" and
So issues newer than March 2021 are not available/being fetched, and thus no new data is added to the DB. Although we don't have older logs, I'd assume that this has been going on since May 2021, which is the latest

This matches my local testing. There are no errors; the pipeline appears to have been running successfully this whole time. |
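A rough sketch of how the "always the same current issue" symptom could be spotted automatically. Only the `current issue: 202111` log text is taken from the logs quoted above; the function names and the idea of passing log lines in as a list are assumptions for illustration, not part of the existing acquisition code.

```python
import re

# Matches the "current issue: YYYYWW" fragment quoted from the acquisition logs.
ISSUE_RE = re.compile(r"current issue: (\d{6})")


def issues_seen(log_lines):
    """Return the distinct epiweek issues found in the log lines, in first-seen order."""
    seen = []
    for line in log_lines:
        match = ISSUE_RE.search(line)
        if match:
            issue = int(match.group(1))
            if issue not in seen:
                seen.append(issue)
    return seen


def looks_stuck(log_lines):
    """True when logs spanning several runs only ever report a single issue."""
    return len(issues_seen(log_lines)) == 1
```

Run over logs covering a long period, `looks_stuck` returning `True` would be a hint that the upstream "current issue" is frozen even though each run exits cleanly.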
The

The CDC GIS GRASP API/AMF server appears to be meant solely for internal use -- I can't find any documentation

The returned data has also changed format, so we'll need to update
Not sure if

To avoid fetching the same (large) JSON multiple times, my recommendation is to fetch once upfront in

We should also consider how to avoid this type of issue in the future. Based on our mirror of the historical data, the source is updated infrequently, so it's possible that we'd see fairly long periods without any data updates. This means that we can't just error out if no new data is returned. |
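One possible shape for a staleness check that tolerates quiet periods, sketched under assumptions: since a single run with no new data is expected, the check grades how long it has been since new data last arrived instead of failing outright. The function name, the three status strings, and both thresholds are hypothetical placeholders and would need tuning against the historical update pattern.

```python
from datetime import date, timedelta


def staleness_status(
    last_new_data: date,
    today: date,
    warn_after: timedelta = timedelta(weeks=8),    # assumed threshold
    alert_after: timedelta = timedelta(weeks=30),  # assumed threshold
) -> str:
    """Grade data staleness as "ok", "warn", or "alert" instead of raising."""
    age = today - last_new_data
    if age >= alert_after:
        return "alert"
    if age >= warn_after:
        return "warn"
    return "ok"
```

With the dates mentioned in this thread, `staleness_status(date(2021, 5, 28), date(2023, 7, 26))` gives `"alert"`, while a gap of a few weeks stays at `"ok"`.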
On recovering versioned data, again, because of the lack of documentation it's unclear to me if it's possible to request data from a particular |
Since fluview* signals pull from another CDC dashboard, we should double-check that those signals are updating correctly. Their CDC API endpoints may also have changed. |
The fluview outage also involved reporting the wrong "current" epiweek, but apparently was due to writing/querying from the wrong server/table. Doesn't seem like that is the case here, but I'm not 100% certain. The fluview ones use Phase02 rather than Phase03, but still sounds like a good idea to check! |
In the motivating example,
the result doesn't have all the value columns we'd expect. Acquisition attempts to update these columns, plus |
The version of FluSurv-NET data available appears to be from 2021-05-28, containing data through the epiweek labeled with date 2020-04-19, while the upstream source has data for the 2022/2023 flu season.
Created on 2023-07-26 with reprex v2.0.2
FluSurv-NET acquisition broke circa 2020-10-09 but was patched to ignore age groups that were introduced then. From the above sample, it looks like these/other age groups are still being ignored; upstream has 2 top-level age groups, 5 subgroups, and 8 subsubgroups; the API returns only 5 age groups. I believe age group changes may have broken flusurv acquisition at some other point in time as well, so that might be a top suspect for the current breakage.
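If age group changes are a suspect, one option is to surface unrecognized groups instead of silently ignoring them. A minimal sketch, assuming age groups can be compared by label; the function name and the example labels are made up for illustration and do not reflect the actual upstream payload.

```python
def unexpected_age_groups(upstream_labels, known_labels):
    """Return upstream age group labels that acquisition does not know about, sorted."""
    return sorted(set(upstream_labels) - set(known_labels))


# Hypothetical labels, for illustration only.
known = ["0-4 yr", "5-17 yr", "18-49 yr", "50-64 yr", "65+ yr"]
upstream = known + ["18-29 yr", "30-39 yr"]
new_groups = unexpected_age_groups(upstream, known)
if new_groups:
    print(f"warning: unrecognized age groups upstream: {new_groups}")
```

Logging a warning like this on every run would make a future age group change visible even while the pipeline keeps running.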
The fluview* outage might be too late to be related. FluSurv-NET reporting is not year-round; it typically starts at some point during the flu season when activity levels / influenza hospitalization numbers are deemed high enough (I don't remember the precise rule) and ends at/after the end of the flu season, with some break in issues and/or gap in measurements before the next season.