`flusurv` data is stale #1247

brookslogan · 2023-07-26T23:46:29Z

The version of FluSurv-NET data available appears to be from 2021-05-28, containing data through the epiweek labeled with date 2020-04-19, while the upstream source has data for the 2022/2023 flu season.

library(epidatr)
dat = flusurv(locations = "network_all", epiweeks = epirange(201701, 202301)) %>% fetch()
max(dat$release_date)
#> [1] "2021-05-28"
max(dat$epiweek)
#> [1] "2020-04-19"
names(dat)
#>  [1] "release_date" "location"     "issue"        "epiweek"      "lag"         
#>  [6] "rate_age_0"   "rate_age_1"   "rate_age_2"   "rate_age_3"   "rate_age_4"  
#> [11] "rate_overall"

Created on 2023-07-26 with reprex v2.0.2

FluSurv-NET acquisition broke circa 2020-10-09 but was patched to ignore the age groups that were introduced then. From the sample above, it looks like these (and possibly other) age groups are still being ignored: upstream has 2 top-level age groups, 5 subgroups, and 8 subsubgroups, but the API returns only 5 age groups. I believe age group changes may have broken flusurv acquisition at some other point in time as well, so they are a top suspect for the current breakage.

The fluview* outage probably started too late to be related. FluSurv-NET reporting is not year-round: it typically starts at some point during the flu season, when activity levels or influenza hospitalization numbers are deemed high enough (I don't remember the precise rule), and ends at or after the end of the flu season, with a break in issues and/or a gap in measurements before the next season.


brookslogan · 2023-07-26T23:51:05Z

Some other notes on the upstream source:

- It has a few more breakdowns now that would be of interest: Sex, Race/Ethnicity, Season.
- It shows a blank plot for Cumulative Rate for [the 2020-21 flu season, which would have started ~ Oct 2020] and "No Data Available" for the corresponding Weekly Rates. [So maybe that and the fluview* outage could be connected.]

nmdefries · 2023-08-18T16:56:39Z

Looking at the oldest and newest logs available for flusurv acquisition (can't find any before April 2023):

we always see "current issue: 202111" and

rows before: 212669
rows after: 212669 (+0)

So issues newer than epiweek 202111 (mid-March 2021) are either not available or not being fetched, and thus no new data is added to the DB. Although we don't have older logs, I'd assume this has been going on since May 2021, the latest release_date available in our flusurv data.
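To see which dates "current issue: 202111" refers to, here is a minimal sketch that converts a YYYYWW epiweek to its Sunday-to-Saturday date range, assuming the standard MMWR convention (weeks start on Sunday; week 1 is the first week with at least four days in the new year, i.e. the week containing January 4). The function names are hypothetical and not part of the acquisition code.

```python
from datetime import date, timedelta
from typing import Tuple


def mmwr_week1_start(year: int) -> date:
    """Return the Sunday that starts MMWR week 1 of `year` (the week containing Jan 4)."""
    jan4 = date(year, 1, 4)
    # date.weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7 is days since Sunday.
    return jan4 - timedelta(days=(jan4.weekday() + 1) % 7)


def epiweek_dates(epiweek: int) -> Tuple[date, date]:
    """Return the (Sunday, Saturday) date range covered by a YYYYWW epiweek."""
    year, week = divmod(epiweek, 100)
    if not 1 <= week <= 53:
        raise ValueError(f"invalid week in epiweek {epiweek}")
    start = mmwr_week1_start(year) + timedelta(weeks=week - 1)
    return start, start + timedelta(days=6)


if __name__ == "__main__":
    start, end = epiweek_dates(202111)
    print(f"202111 covers {start} to {end}")
```

Under these assumptions, 202111 covers 2021-03-14 through 2021-03-20, which is consistent with issues stopping in mid-March 2021.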

This matches my local testing. get_current_issue() returns 202111 using a magic URL. I wonder if we were supposed to switch to a different magic URL.

There are no errors; the pipeline appears to have been running successfully this whole time.
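Because every run exits cleanly, the only symptom of this failure is in the logs. Below is a minimal sketch of a monitoring check, assuming each run's log contains the `current issue: ...` and `rows after: N (+k)` lines exactly as quoted above; the helper names are hypothetical, and nothing about the log format beyond those two lines is known.

```python
import re
from typing import List, Optional, Tuple

# Patterns for the two log lines quoted above; the rest of the log format is assumed.
ISSUE_RE = re.compile(r"current issue: (\d{6})")
ROWS_AFTER_RE = re.compile(r"rows after: \d+ \(\+(\d+)\)")


def summarize_run(log_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (current issue, rows added) from one run's log; None for anything not found."""
    issue = ISSUE_RE.search(log_text)
    added = ROWS_AFTER_RE.search(log_text)
    return (
        int(issue.group(1)) if issue else None,
        int(added.group(1)) if added else None,
    )


def trailing_noop_runs(run_logs: List[str]) -> int:
    """Count the most recent consecutive runs (run_logs is oldest first) that added zero rows."""
    count = 0
    for log_text in reversed(run_logs):
        _, added = summarize_run(log_text)
        if added != 0:  # stop at a run that added rows or whose log couldn't be parsed
            break
        count += 1
    return count
```

A long and growing run of no-op runs, together with an unchanged current issue, is the pattern that would have flagged this problem; a short run on its own is expected during the off-season.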

nmdefries · 2023-08-18T20:09:25Z

The cdcfluview package successfully gets up-to-date data (loaddatetime is Aug 12, 2023; data is available through 2023w17), so it looks like the Flu3 endpoints changed. The old ones still work but aren't returning new data.
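As a sanity check on an issue derived from `loaddatetime`, one option is to map the load date to the epiweek that contains it. This is a minimal sketch, assuming the issue should be the MMWR epiweek containing the load date and that `loaddatetime` parses with the `"%b %d, %Y"` format shown; both are assumptions, since the actual field format is undocumented. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def mmwr_week1_start(year: int) -> date:
    """Return the Sunday that starts MMWR week 1 of `year` (the week containing Jan 4)."""
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=(jan4.weekday() + 1) % 7)


def date_to_epiweek(d: date) -> int:
    """Return the YYYYWW MMWR epiweek containing date `d`."""
    # Late-December dates can fall in next year's week 1 and early-January dates in the
    # previous year's last week, so try the following year, then this year, then the prior one.
    for year in (d.year + 1, d.year, d.year - 1):
        start = mmwr_week1_start(year)
        if d >= start:
            return year * 100 + (d - start).days // 7 + 1
    raise AssertionError("unreachable: week 1 of d.year - 1 always starts before d")


def issue_from_loaddatetime(text: str, fmt: str = "%b %d, %Y") -> int:
    """Hypothetical: parse a load date string (format assumed) and map it to an epiweek."""
    return date_to_epiweek(datetime.strptime(text, fmt).date())
```

With the load date of Aug 12, 2023 mentioned above, `issue_from_loaddatetime("Aug 12, 2023")` gives 202332.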

The CDC GIS GRASP API/AMF server appears to be meant solely for internal use; I can't find any documentation ~~and can't find the new https://gis.cdc.gov/GRASP/Flu3/PostPhase03DataTool endpoint myself. I assume it is used somewhere in the dashboard source.~~ The new https://gis.cdc.gov/GRASP/Flu3/PostPhase03DataTool endpoint can be found by opening the source dashboard in Chrome, opening the developer tools, going to the Network tab, and reloading or otherwise interacting with the page. API queries show up there as network requests.

The returned data has also changed format, so we'll need to update flusurv.extract_from_object() to account for that. ~~It doesn't appear possible to request specific locations from the new endpoint.~~ Edit: You can request locations with a payload like

{
  "appversion": "Public",
  "key": "getdata",
  "injson": [
    {
      "seasonid": 62,
      "networkid": 2,
      "catchmentid": 22
    }
  ]
}

Not sure if seasonid is required.
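For experimenting with the new endpoint, here is a minimal sketch that builds a POST request mirroring the example payload above using only the standard library. The JSON content type header is an assumption (check what the dashboard actually sends in the Network tab), `seasonid` is left optional because it is unclear whether it is required, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Endpoint found via the dashboard's network requests, as described above.
PHASE03_URL = "https://gis.cdc.gov/GRASP/Flu3/PostPhase03DataTool"


def build_request(
    networkid: int, catchmentid: int, seasonid: Optional[int] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the example payload."""
    query: Dict[str, Any] = {"networkid": networkid, "catchmentid": catchmentid}
    if seasonid is not None:
        # Unclear whether seasonid is required, so it is only sent when given.
        query["seasonid"] = seasonid
    payload = {"appversion": "Public", "key": "getdata", "injson": [query]}
    return urllib.request.Request(
        PHASE03_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 60) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(build_request(2, 22, seasonid=62))` would send the payload shown above. Separating request construction from sending makes it possible to compare the with-`seasonid` and without-`seasonid` payloads before hitting the server.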

To avoid fetching the same (large) JSON multiple times, I recommend fetching it once upfront in main() and passing the resulting JSON to the get_current_issue and get_data functions to extract the data of interest. This also avoids different endpoints potentially returning data from different loaddatetime values. (Currently we use GetPhase03InitApp for the loaddatetime and PostPhase03GetData for the location data. With no documentation, it's hard to guarantee their behavior.)
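The fetch-once structure could look roughly like this. It is a sketch only: the JSON layout (a top-level `loaddatetime` field and a `rows` list keyed by `catchmentid`) is hypothetical, as are these signatures for `get_current_issue` and `get_data`; the real response format has changed and is undocumented.

```python
from typing import Any, Callable, Dict, List, Tuple

Json = Dict[str, Any]


def get_current_issue(data: Json) -> Any:
    # Hypothetical field name; convert the load date to an epiweek as needed.
    return data["loaddatetime"]


def get_data(data: Json, catchmentid: int) -> List[Json]:
    # Hypothetical layout: a top-level "rows" list whose entries carry a "catchmentid".
    return [row for row in data["rows"] if row["catchmentid"] == catchmentid]


def main(
    fetch: Callable[[], Json], catchmentids: List[int]
) -> Tuple[Any, Dict[int, List[Json]]]:
    # Fetch the large JSON once so the issue and every location's rows come from one load.
    data = fetch()
    issue = get_current_issue(data)
    rows_by_location = {cid: get_data(data, cid) for cid in catchmentids}
    return issue, rows_by_location
```

Passing `fetch` in as a parameter also makes it easy to test `main()` against a saved response without hitting the CDC server.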

We should also consider how to avoid this type of issue in the future. Based on our mirror of the historical data, the source is updated infrequently, so we could see fairly long periods without any data updates. This means that we can't just error out if no new data is returned.
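One middle ground between silently succeeding and erroring out is to warn when the source's load date gets too old. This is a minimal sketch; the 365-day default is an assumed threshold chosen to exceed a normal off-season gap, and given the seasonal reporting described above it would need tuning. The function and logger names are hypothetical.

```python
import logging
from datetime import date

logger = logging.getLogger("flusurv.staleness")  # hypothetical logger name


def warn_if_stale(load_date: date, today: date, max_age_days: int = 365) -> bool:
    """Log a warning, without raising, if the source's load date is older than max_age_days.

    Returns True when the data looks stale so callers can surface it in monitoring.
    """
    age_days = (today - load_date).days
    if age_days > max_age_days:
        logger.warning(
            "FluSurv-NET load date %s is %d days old (threshold %d)",
            load_date.isoformat(), age_days, max_age_days,
        )
        return True
    return False
```

With the latest release date in our data (2021-05-28) and a run on 2023-08-18, this check would have flagged the problem long ago while still letting the pipeline finish.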

nmdefries · 2023-08-18T20:10:42Z

On recovering versioned data, again, because of the lack of documentation it's unclear to me if it's possible to request data from a particular loaddatetime. Edit: According to our CDC contacts, this is not possible.

nmdefries · 2023-08-21T18:22:31Z

Since fluview* signals pull from another CDC dashboard, we should double-check that those signals are updating correctly. Their CDC API endpoints may also have changed.

brookslogan · 2023-08-21T18:35:33Z

The fluview outage also involved reporting the wrong "current" epiweek, but apparently was due to writing/querying from the wrong server/table. Doesn't seem like that is the case here, but I'm not 100% certain.

The fluview ones use Phase02 rather than Phase03, but still sounds like a good idea to check!

nmdefries · 2023-09-15T20:56:25Z

In the motivating example,

library(epidatr)
dat = flusurv(locations = "network_all", epiweeks = epirange(201701, 202301)) %>% fetch()
...
names(dat)
#>  [1] "release_date" "location"     "issue"        "epiweek"      "lag"         
#>  [6] "rate_age_0"   "rate_age_1"   "rate_age_2"   "rate_age_3"   "rate_age_4"  
#> [11] "rate_overall"

the result doesn't have all the value columns we'd expect. Acquisition attempts to update these columns plus rate_age_5, rate_age_6, and rate_age_7, so the API's float fields need to be updated to include the new columns.
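A quick regression check, for example in an integration test, could compare the returned column names against the columns acquisition writes. This is a minimal sketch with hypothetical names; the expected list covers only the columns named in this comment, not the new sex and race/ethnicity breakdowns.

```python
from typing import List

# Value columns returned above plus rate_age_5 through rate_age_7, which acquisition also writes.
EXPECTED_VALUE_COLUMNS = [f"rate_age_{i}" for i in range(8)] + ["rate_overall"]


def missing_value_columns(returned_columns: List[str]) -> List[str]:
    """Return expected value columns absent from an API response, in expected order."""
    present = set(returned_columns)
    return [col for col in EXPECTED_VALUE_COLUMNS if col not in present]


if __name__ == "__main__":
    # Column names from the epidatr output above.
    names = ["release_date", "location", "issue", "epiweek", "lag",
             "rate_age_0", "rate_age_1", "rate_age_2", "rate_age_3", "rate_age_4",
             "rate_overall"]
    print(missing_value_columns(names))
```

Run against the column names above, this reports rate_age_5, rate_age_6, and rate_age_7 as missing.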

brookslogan added the data quality label Jul 26, 2023

nmdefries self-assigned this Aug 8, 2023

nmdefries linked a pull request Aug 23, 2023 that will close this issue: Switch to new flusurv API endpoint #1278 (open)

nmdefries mentioned this issue Sep 15, 2023: Update flusurv schema and docs with new age, sex, and race groups #1287 (merged)
