Add intermediate daily updates to covid_hosp signal #308

Closed
elray1 opened this issue Dec 4, 2020 · 3 comments
Labels
Engineering Used to filter issues when synching with Asana

Comments

elray1 commented Dec 4, 2020

The covid_hosp signal currently pulls data only from the time series files that are posted on healthdata.gov roughly weekly, on a somewhat erratic schedule. It would be helpful to also have access to the intermediate daily updates, ideally as a combined signal. I'd like to be able to query the data source as of some date between weekly updates and retrieve a data set built from the most recent weekly release available before that date, plus any additional data published in daily updates since then. I imagine this may be most easily implemented as a covidcast signal to make use of as_of (if I understand how the pieces fit together correctly).

For example: a weekly update was published on Nov 29, 2020, and daily updates were subsequently published on Nov 30 and Dec 2. A query for data as of Dec 2 would return all values from the weekly update released on Nov 29 as well as the data for Nov 30 and Dec 2. Ideally, some indication of the missing data for Dec 1 would also be available.
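
The request above can be sketched roughly as follows. This is only an illustration, not the covid_hosp or covidcast implementation: the function name `snapshot_as_of` and the dict-based data layout (publication date mapped to `{reference date: value}`) are assumptions made for the example, and the numeric values are made up. It also treats a weekly release published on the as-of date itself as available.

```python
from datetime import date, timedelta


def snapshot_as_of(weekly, daily, as_of):
    """Combine the latest weekly release with later daily updates.

    weekly, daily: dicts mapping a publication date to a dict of
    {reference date: value}. This layout is hypothetical.
    Returns (combined values, list of dates with no data).
    """
    # Pick the most recent weekly release published on or before as_of.
    eligible = [pub for pub in weekly if pub <= as_of]
    if not eligible:
        raise ValueError("no weekly release on or before as_of")
    base = max(eligible)
    combined = dict(weekly[base])

    # Layer on daily updates published after that release, oldest first,
    # so that later updates overwrite earlier ones.
    for pub in sorted(daily):
        if base < pub <= as_of:
            combined.update(daily[pub])

    # Report days after the weekly release that no update covered.
    missing = []
    day = base + timedelta(days=1)
    while day <= as_of:
        if day not in combined:
            missing.append(day)
        day += timedelta(days=1)
    return combined, missing


# The scenario from the example above, with made-up values.
weekly = {date(2020, 11, 29): {date(2020, 11, 28): 10, date(2020, 11, 29): 12}}
daily = {
    date(2020, 11, 30): {date(2020, 11, 30): 13},
    date(2020, 12, 2): {date(2020, 12, 2): 15},
}
combined, missing = snapshot_as_of(weekly, daily, date(2020, 12, 2))
print(sorted(combined))
print(missing)
```

With this input, `missing` contains only Dec 1, which is the "indication of the missing data" described above.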

For reference, I have implemented something like this in R. The functionality is in the following two files:

SumitDELPHI added the Engineering label Dec 6, 2020

krivard commented Dec 11, 2020

Hi! I'm not sure the Revisions page actually does what is needed for a daily update cadence. From the screenshot below, it looks like the ostensibly-daily updates for 12/01, 12/02, 12/03, 12/04, and 12/05 were all published in a lump on the 5th.

[image: screenshot of the Revisions page]

There's a discussion to be had about what as-of actually means: clearly someone had this data on the 2nd, but if it wasn't published until later, should we record a 12/02 issue or not? If the most recent datafile online is from the 6th, today is the 11th, and we train a forecaster with everything we know right now, we'll get some result. If a batch of daily updates from 12/7-12/12 is posted tomorrow, that still counts as future-privileged information for today's run, even if some of those updates are timestamped earlier than today.
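
To make the distinction concrete, here is a minimal sketch of the stricter reading, where an update only counts as available from the date it was actually published rather than the date it is stamped with. The `Update` type and `visible_as_of` function are hypothetical names for this illustration and are not part of the covid_hosp code.

```python
from datetime import date
from typing import NamedTuple


class Update(NamedTuple):
    nominal: date    # date the update is stamped with
    published: date  # date it actually appeared online


def visible_as_of(updates, as_of):
    """Return the updates that had been published by as_of."""
    return [u for u in updates if u.published <= as_of]


# Five ostensibly daily updates, all published in a lump on Dec 5.
updates = [Update(date(2020, 12, d), date(2020, 12, 5)) for d in range(1, 6)]

on_dec_3 = visible_as_of(updates, date(2020, 12, 3))
on_dec_5 = visible_as_of(updates, date(2020, 12, 5))
print(len(on_dec_3), len(on_dec_5))
```

Under this reading, a query as of Dec 3 sees none of the lumped updates, even though three of them carry nominal dates on or before Dec 3. Filtering on `nominal` instead would expose data that was not yet public on that day.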

Still, the resolution on the revisions list does seem to be finer than what we're getting from the official JSON metadata, so I will look into what it will take to add it as an alternate source stream. I just don't think we're going to wind up with daily issues.


krivard commented Jan 28, 2021

Hello! This functionality has been added (#374) and activated in production. The daily updates will be ingested by the system from now on. I'll be able to assemble the back issues from 16 November to present and upload them sometime next week. Thanks for the suggestion!


krivard commented Feb 3, 2023

Fixed in #374

krivard closed this as completed Feb 3, 2023
3 participants