The covid_hosp signal currently pulls data only from the time series files posted on healthdata.gov, which appear roughly weekly on a somewhat erratic schedule. It would be helpful to also have access to the intermediate daily updates, ideally as a combined signal. Specifically, I'd like to be able to query the data source as of some date between the weekly updates and retrieve a data set constructed from the most recent weekly release available before the specified date, plus any additional data published in daily updates since then. I imagine this may be most easily implemented as a covidcast signal to make use of as_of (if I understand how the pieces fit together correctly).
For example: a weekly update was published on Nov. 29, 2020, and daily updates were subsequently published on Nov. 30 and Dec. 2. A query for data as of Dec. 2 would return all values from the weekly update released on Nov. 29 as well as the data for Nov. 30 and Dec. 2. Ideally, some indication of the missing data for Dec. 1 would also be available.
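To make the request concrete, here is a minimal Python sketch of the combination logic. It is not the covid_hosp or covidcast implementation. The function name `combined_as_of`, the dictionary layout, and the sample values are all hypothetical, and it assumes that a daily update issued on a given date carries data for that same date, and that "before the specified date" includes the date itself.

```python
from datetime import date, timedelta

def combined_as_of(weekly, daily, as_of):
    """Combine the latest weekly release with later daily updates.

    weekly, daily: dict mapping issue date -> {reference date: value}
    (hypothetical layout). Returns (combined values, missing daily issue dates).
    """
    weekly_issues = [d for d in weekly if d <= as_of]
    if not weekly_issues:
        return {}, []
    base_issue = max(weekly_issues)
    combined = dict(weekly[base_issue])  # start from the full weekly time series

    # Layer on daily updates issued after the weekly release, oldest first.
    for issue in sorted(d for d in daily if base_issue < d <= as_of):
        combined.update(daily[issue])

    # Report days since the weekly release for which no daily update exists.
    missing = []
    day = base_issue + timedelta(days=1)
    while day <= as_of:
        if day not in daily:
            missing.append(day)
        day += timedelta(days=1)
    return combined, missing

# Made-up values mirroring the Nov. 29 / Nov. 30 / Dec. 2 example.
weekly = {date(2020, 11, 29): {date(2020, 11, 27): 10, date(2020, 11, 28): 11}}
daily = {
    date(2020, 11, 30): {date(2020, 11, 30): 12},
    date(2020, 12, 2): {date(2020, 12, 2): 14},
    date(2020, 12, 3): {date(2020, 12, 3): 15},
}
combined, missing = combined_as_of(weekly, daily, date(2020, 12, 2))
```

With this sample data, `combined` holds the weekly values plus the Nov. 30 and Dec. 2 daily values, and `missing` flags Dec. 1. The Dec. 3 update is excluded because it was issued after the as-of date.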
For reference, I have implemented something like this in R. The functionality is in the following two files:
Hi! I'm not sure the Revisions page actually does what is needed for a daily update cadence. From the screenshot below, it looks like the ostensibly-daily updates for 12/01, 12/02, 12/03, 12/04, and 12/05 were all published in a lump on the 5th.
There's a discussion to be had about what as-of actually means: clearly someone had this data on the 2nd, but if it wasn't published until later, does that mean we should record a 12/02 issue or not? If the most recent datafile online is from the 6th, today is the 11th, and we train a forecaster with everything we know right now, we'll get some result. If a batch of daily updates for 12/7-12/12 is posted tomorrow, that still counts as future-privileged information for today's run, even if some of the files are timestamped earlier than that.
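The distinction can be sketched in a few lines of Python. This is only an illustration of the argument above, not how the system records issues. The `Update` type, its field names, and `visible_as_of` are hypothetical, and the dates mirror the scenario just described.

```python
from datetime import date
from typing import NamedTuple

class Update(NamedTuple):
    timestamp: date  # date the file says it describes
    published: date  # date the file actually appeared online

def visible_as_of(updates, as_of):
    """Keep only updates that were actually online on the as-of date."""
    return [u for u in updates if u.published <= as_of]

# One file posted on the 6th, then files for 12/7-12/12 all posted on the 12th.
updates = [Update(date(2020, 12, 6), date(2020, 12, 6))]
updates += [Update(date(2020, 12, d), date(2020, 12, 12)) for d in range(7, 13)]

seen_on_11th = visible_as_of(updates, date(2020, 12, 11))
seen_on_12th = visible_as_of(updates, date(2020, 12, 12))
```

Filtering on `published` leaves a forecaster run on the 11th with only the file from the 6th. Filtering on `timestamp` instead would leak the 12/7-12/11 files into that run, even though nobody outside the publisher could see them yet.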
Still, the resolution on the revisions list does seem to be finer than what we're getting from the official JSON metadata, so I will look into what it will take to add it as an alternate source stream. I just don't think we're going to wind up with daily issues.
Hello! This functionality has been added (#374) and activated in production. The daily updates will be ingested by the system from now on. I'll be able to assemble the back issues from 16 November to present and upload them sometime next week. Thanks for the suggestion!