A number of factors have made our HHS pipeline fragile in recent months. I wrote some WIP code to compare what our API serves against what is published on the healthdata.gov site. Posting it here, since it might be useful for debugging and patching. It shows some recent discrepancies, though those could just be bugs in my comparison. I haven't tried it on older data.
If anyone feels motivated to dive into this, I'd welcome the help. Tagging myself so I remember to come back to this.
library(epidatr)
library(magrittr)
library(dplyr)
library(readr) # provides read_csv()
#' Download versioned data from healthdata.gov. Since the data is for the
#' previous day, we subtract 1 from the date.
get_healthdata_from_url <- function(url) {
  url %>%
    read_csv() %>%
    mutate(
      state = tolower(state),
      date = as.Date(date) - 1
    ) %>%
    select(geo_value = state, time_value = date, value = previous_day_admission_influenza_confirmed) %>%
    arrange(geo_value, time_value)
}
#' Download versioned data from epidatr. The data is already adjusted for the
#' previous day.
get_epidata_from_date <- function(date) {
  pub_covidcast(
    source = "hhs",
    signals = "confirmed_admissions_influenza_1d",
    geo_type = "state",
    time_type = "day",
    geo_values = "*",
    time_values = epirange(20200101, date),
    as_of = strftime(date, format = "%Y%m%d")
  ) %>%
    select(geo_value, time_value, value) %>%
    arrange(geo_value, time_value)
}
#' Find differences between healthdata.gov and epidatr.
compare_datasets <- function(healthdata, epidata) {
  inner_join(
    healthdata,
    epidata,
    by = c("geo_value", "time_value")
  ) %>%
    filter(value.x != value.y) %>%
    select(geo_value, time_value, value.x, value.y) %>%
    arrange(geo_value, time_value)
}
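#' Download the list of archived healthdata.gov uploads (update time and
#' archive link), newest first.
get_metadata <- function() {
  read_csv("https://healthdata.gov/api/views/qqte-vkut/rows.csv") %>%
    select(datetime = `Update Date`, link = `Archive Link`) %>%
    mutate(
      # %p (AM/PM) only takes effect together with %I (12-hour clock); when
      # paired with %H the AM/PM marker is ignored.
      datetime = as.POSIXct(datetime, format = "%m/%d/%Y %I:%M:%S %p")
    ) %>%
    arrange(desc(datetime))
}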
metadata <- get_metadata()
data_comparisons <- metadata %>%
  # Look at data uploads from before the previous day (ones we have a chance of
  # having).
  filter(datetime <= Sys.time() - 60 * 60 * 24) %>%
  head(n = 6) %>%
  rowwise() %>%
  mutate(
    healthdata = list(get_healthdata_from_url(link)),
    # Get the data one day later, since our pipeline is set up to get data for
    # the previous day.
    epidata = list(get_epidata_from_date(as.Date(datetime) + 1)),
    comparison = list(compare_datasets(healthdata, epidata))
  )
# Data drop from 2023-11-15 compared to epidata as of 2023-11-16
data_comparisons[[1L, "comparison"]]
# A tibble: 135 × 4
   geo_value time_value value.x value.y
   <chr>     <date>       <dbl>   <dbl>
 1 ak        2022-12-14       7       8
 2 ak        2022-12-15       3       4
 3 ak        2022-12-16       6       9
 4 ak        2022-12-21      10      12
 5 ak        2022-12-23       4       5
 6 ak        2022-12-24       3       4
 7 ak        2022-12-25       3       4
 8 ak        2022-12-26       8      12
 9 ak        2022-12-27       6       7
10 ak        2022-12-28       7       9
# ℹ 125 more rows
# ℹ Use `print(n = ...)` to see more rows

# Data drop from 2023-11-10 compared to epidata as of 2023-11-11
data_comparisons[[3L, "comparison"]]
# A tibble: 339 × 4
   geo_value time_value value.x value.y
   <chr>     <date>       <dbl>   <dbl>
 1 ak        2022-12-14       7       8
 2 ak        2022-12-15       3       4
 3 ak        2022-12-16       6       9
 4 ak        2022-12-21      10      12
 5 ak        2022-12-23       4       5
 6 ak        2022-12-24       3       4
 7 ak        2022-12-25       3       4
 8 ak        2022-12-26       8      12
 9 ak        2022-12-27       6       7
10 ak        2022-12-28       7       9
# ℹ 329 more rows
# ℹ Use `print(n = ...)` to see more rows
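One caveat with the comparison in the script: `inner_join()` drops rows that exist in only one source, and `filter(value.x != value.y)` drops rows where either value is `NA`, so missing data would never show up as a discrepancy. Below is a minimal sketch of a variant that keeps those cases. The toy data is made up, and `compare_datasets_full` and the `status` labels are hypothetical names, not part of our pipeline.

```r
library(dplyr)

# Toy stand-ins for the two sources (made-up values, for illustration only).
healthdata <- tibble(
  geo_value = c("ak", "ak", "al", "ar"),
  time_value = as.Date(c("2022-12-14", "2022-12-15", "2022-12-14", "2022-12-14")),
  value = c(7, 3, NA, 4)
)
epidata <- tibble(
  geo_value = c("ak", "ak", "al", "az"),
  time_value = as.Date(c("2022-12-14", "2022-12-15", "2022-12-14", "2022-12-14")),
  value = c(8, 3, 2, 5)
)

# Hypothetical variant of compare_datasets(): a full join keeps rows that are
# present in only one source, and NAs are classified instead of filtered out.
compare_datasets_full <- function(healthdata, epidata) {
  full_join(
    healthdata %>% mutate(in_healthdata = TRUE),
    epidata %>% mutate(in_epidata = TRUE),
    by = c("geo_value", "time_value")
  ) %>%
    mutate(
      status = case_when(
        is.na(in_epidata) ~ "missing_from_epidata",
        is.na(in_healthdata) ~ "missing_from_healthdata",
        is.na(value.x) & is.na(value.y) ~ "match",
        is.na(value.x) | is.na(value.y) ~ "na_mismatch",
        value.x != value.y ~ "value_mismatch",
        TRUE ~ "match"
      )
    ) %>%
    filter(status != "match") %>%
    select(geo_value, time_value, value.x, value.y, status) %>%
    arrange(geo_value, time_value)
}

result <- compare_datasets_full(healthdata, epidata)
print(result)
```

On the toy data, the inner-join version would report only the `ak` row for 2022-12-14, while this variant also flags `al`, `ar`, and `az`.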
I think so. Our historical HHS data doesn't accurately reflect what's in the healthdata.gov archives and we should probably still fix that at some point.
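If someone does pick up the historical fix, a per-state summary of the mismatches might help decide where to patch first. This is only a rough sketch: it assumes a comparison tibble with the same columns `compare_datasets()` returns, the toy values are made up, and `summarize_discrepancies` is a hypothetical helper name.

```r
library(dplyr)

# Toy comparison output with the columns compare_datasets() returns
# (made-up values, for illustration only).
comparison <- tibble(
  geo_value = c("ak", "ak", "al"),
  time_value = as.Date(c("2022-12-14", "2022-12-15", "2022-12-14")),
  value.x = c(7, 3, 10),
  value.y = c(8, 4, 15)
)

# Hypothetical helper: per-state mismatch count, affected date span, and the
# largest absolute gap between healthdata.gov (value.x) and the API (value.y).
summarize_discrepancies <- function(comparison) {
  comparison %>%
    group_by(geo_value) %>%
    summarize(
      n_mismatched = n(),
      first_date = min(time_value),
      last_date = max(time_value),
      max_abs_diff = max(abs(value.x - value.y)),
      .groups = "drop"
    ) %>%
    arrange(desc(n_mismatched))
}

discrepancy_summary <- summarize_discrepancies(comparison)
print(discrepancy_summary)
```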