Ingest Quidel COVID test data #40

krivard · 2020-05-22T15:09:11Z

Antigen test results will start coming to us Monday. Not as high-quality as PCR, but much faster, and still good enough for clinical use.

capnrefsmmat · 2020-05-27T18:24:21Z

Have we received any of this data yet, so we know what format to expect?

jingjtang · 2020-05-27T18:51:00Z

I talked to Roni this morning; it seems we have not received it yet. Best, Jingjing


krivard · 2020-07-09T15:37:36Z

@krivard to find Jeremy Weiss and get an update on what the current volume is

krivard · 2020-07-10T18:17:16Z

Volume is now 18-19k tests/day

Substantial backfill (~1 month)

Jeremy's analyses are based on test date rather than storage date, falling back to storage date if the test date is obviously wrong. Test date is the day of the assay, not the day the patient's sample was taken.

Data is already being sent to the drop!

- Jeremy to send over the script he's been running to Jingjing and cc Katie
- Port the Quidel flu testing code to this codebase and use the COVID results instead, or write a new pipeline
- Submit WIP signals to the API
- Do correlation analyses
- Review, approve, document, deploy

@jingjtang and maybe @eujing to take on implementation, and ping @nloliveira for help if needed

krivard · 2020-07-10T18:30:48Z

The estimator we want is the test positivity rate; tests per device is fine but lower priority
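As a rough illustration of the estimator, percent positivity for one location and day is the number of positive tests divided by the total number of tests. The sketch below assumes test-level records of the form (geo, date, is_positive); that record layout and the function name `pct_positive_by_geo_day` are hypothetical, not the Quidel file format or the pipeline's actual code.

```python
from collections import defaultdict

def pct_positive_by_geo_day(records):
    """Return {(geo, date): percent positive} from (geo, date, is_positive) records.

    Hypothetical record layout; the real Quidel files may differ.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for geo, date, is_positive in records:
        key = (geo, date)
        totals[key] += 1
        if is_positive:
            positives[key] += 1
    # Every key in totals has at least one test, so there is no division by zero.
    return {key: 100.0 * positives[key] / totals[key] for key in totals}

records = [
    ("pa", "2020-07-08", True),
    ("pa", "2020-07-08", False),
    ("pa", "2020-07-08", False),
    ("pa", "2020-07-08", False),
    ("tx", "2020-07-08", True),
]
result = pct_positive_by_geo_day(records)
# {('pa', '2020-07-08'): 25.0, ('tx', '2020-07-08'): 100.0}
```

Note that a single test can produce 100% positivity (as for "tx" here), which is why the small daily counts discussed below matter.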

krivard · 2020-07-14T18:14:07Z

The 18-19k tests per day includes backfill. If you look only at yesterday, we get <100 tests per state per day.

- Add pooling?
- Add dv-like backfill behavior? I.e., even if we have data for yesterday, there won't be enough to report, so make the most recent reported day 4 or more days ago (or whatever lag seems to give enough numbers).

We also don't receive data every day; e.g., today (the 14th) the most recent data is for the 12th. We may be able to absorb this into our lag policy; revisit if not.
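One way to express the "report from a few days back" idea is to pick the most recent date that is at least a minimum lag behind today and has at least a minimum number of tests. In this sketch, `min_lag_days=4` mirrors the "4 days ago" suggestion, while the `min_tests` threshold, the example counts, and the function name are hypothetical placeholders rather than agreed values.

```python
from datetime import date, timedelta

def latest_reportable_date(counts, today, min_lag_days=4, min_tests=50):
    """Return the most recent date with enough tests, or None.

    counts: dict mapping datetime.date -> number of tests received for that date.
    Only dates at least `min_lag_days` before `today` are considered.
    """
    cutoff = today - timedelta(days=min_lag_days)
    eligible = [d for d, n in counts.items() if d <= cutoff and n >= min_tests]
    return max(eligible) if eligible else None

today = date(2020, 7, 14)
counts = {
    date(2020, 7, 12): 30,   # too recent (lag < 4 days)
    date(2020, 7, 10): 40,   # old enough, but too few tests
    date(2020, 7, 9): 120,
    date(2020, 7, 8): 150,
}
print(latest_reportable_date(counts, today))  # 2020-07-09
```

Because the choice is driven by the counts themselves, days for which no data has arrived yet (the 13th in this example) are simply skipped.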

jingjtang · 2020-07-21T16:01:59Z

Location Coverage: [figure not available]

krivard · 2020-07-27T14:41:54Z

@RoniRos would like to show any correlation plots we generate for this when he presents at the CDC meeting on ~~Thursday~~ (correction below)

RoniRos · 2020-07-27T15:04:15Z

> @RoniRos would like to show any correlations plots we generate for this when he presents at the CDC meeting on Thursday

Actually, my bad: they want it for a community-wide presentation on Tuesday (the weekly meeting that James and I usually attend). It may be better to schedule it for next Tuesday and have someone from the Quidel team present it and answer questions.

krivard · 2020-07-27T18:11:44Z

We had some problems loading CSV files into the correlations app (probably due to this weekend's client update), so we uploaded WIP versions to the API instead. The quidel folders in receiving have been split into flu and covid. We should put the flu folder back to its original name so the source stays consistent with the signal that's already published; for covid we can either use a new source or include covid in the signal name, depending on what Roni et al. want for signal naming.

RoniRos · 2020-07-27T21:43:49Z

Is there a document that explains our current naming convention, and lists all current sources and signals (in a more convenient format than this)?

jingjtang · 2020-07-27T21:51:40Z

> Is there a document that explains our current naming convention, and lists all current sources and signals (in a more convenient format than this)?

Here they are. Quidel has not been included yet. https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html

krivard · 2020-07-28T11:10:56Z

> On Mon, Jul 27, 2020 at 9:07 PM RoniRos wrote: In the API documentation in the link above, shouldn't the source for the JHU Cases & Deaths be "JHU-CSSE"?

No, we switched to a composite signal a while ago. We get Puerto Rico from JHU and everything else comes from USA Facts.

krivard · 2020-07-28T18:06:00Z

Correlations: [figure not available]

krivard · 2020-07-28T18:10:54Z

- Build API documentation (look at the DETAILS.md file)
- Put historical data back to the end of May into the API. This does mean we'll never have direction/trend until we revise the historical variance piece of the direction calculation.
- Missingness (usually "not enough data")

RoniRos · 2020-07-28T18:48:22Z

In Quidel, there will be both Covid and flu raw data streams to ingest, and they might even at some point be integrated (once a joint test panel is approved and distributed). Then there will likely be multiple signals made for each, some of which will make it to the map. So I think it might make sense to declare QUIDEL to be the source, and keep the flu/covid designation as part of the signal.

However, note that none of our signals (or sources!) to date explicitly indicate Covid! We need to think about how to handle the transition to multi-disease tracking. One solution is to create new signal names that explicitly include a disease name (Covid and flu for now), and map requests with old signal names to the Covid ones: initially silently, then with a gentle suggestion, and maybe at some point deprecating them.
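A minimal sketch of the aliasing idea, assuming a lookup table from legacy signal names to disease-specific ones. The alias table, the `resolve_signal` function, and the signal names in it are hypothetical examples, not existing API behavior; the `warn` flag stands in for moving from silent mapping to a gentle suggestion.

```python
from typing import Optional, Tuple

# Hypothetical mapping from legacy (disease-agnostic) names to explicit Covid names.
LEGACY_ALIASES = {
    "raw_pct_positive": "covid_raw_pct_positive",
    "smoothed_pct_positive": "covid_smoothed_pct_positive",
}

def resolve_signal(name: str, warn: bool = False) -> Tuple[str, Optional[str]]:
    """Map a requested signal name to its canonical name.

    Returns (canonical_name, notice). The notice is None when the name is
    already canonical, or when aliasing is still silent (warn=False).
    """
    canonical = LEGACY_ALIASES.get(name)
    if canonical is None:
        # Not a legacy alias: pass the name through unchanged.
        return name, None
    notice = f"'{name}' is a legacy name; please use '{canonical}'." if warn else None
    return canonical, notice

silent = resolve_signal("smoothed_pct_positive")
suggested = resolve_signal("smoothed_pct_positive", warn=True)
```

Deprecation would then be a third stage in which legacy names are rejected instead of mapped.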

krivard · 2020-07-30T17:51:45Z

Source: quidel
Signal: covid_ag_{raw|smoothed}_pct_positive

krivard · 2020-07-31T18:10:25Z

@jingjtang will be giving a talk on this signal at Tuesday's meeting with the CDC community.

jingjtang · 2020-08-03T17:03:28Z

Backfill problem in Quidel COVID (recording it here):
- ~15% of data is lost if each day we upload only the report for 5 days ago
- ~12% of data is lost if each day we upload only the report for 7 days ago
(The data received each day usually reaches back more than a month.)

This problem was fixed by switching to uploading files covering 45 days ago through 5 days ago, every day.
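The daily upload window (from 45 days back to 5 days back) can be sketched as the list of report dates to regenerate each day. The function name and the choice to include both endpoints are assumptions for illustration.

```python
from datetime import date, timedelta

def upload_window(today, start_back=45, end_back=5):
    """Dates whose reports are regenerated and re-uploaded on `today`.

    Covers today - start_back through today - end_back, inclusive (an assumption),
    ordered from oldest to newest.
    """
    return [today - timedelta(days=k) for k in range(start_back, end_back - 1, -1)]

window = upload_window(date(2020, 8, 3))
print(window[0], window[-1], len(window))  # 2020-06-19 2020-07-29 41
```

Re-uploading the whole window every day means late-arriving tests for any date in it are picked up, at the cost of rewriting about 41 daily reports instead of one.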

(Minor issue, but recording it here.)
Special zip codes:

| Zip | State | Number of tests |
|---|---|---|
| 78086 | TX | 98 |
| 20174 | VA | 17 |
| 48824 | MI | 14 |
| 32313 | FL | 37 |
| 29486 | SC | 69 |
| 75033 | TX | 1990 |
| 79430 | TX | 36 |
| 75072 | TX | 63 |

Through 07-30-2020, these zip codes account for only 2,324 of 833,010 tests.
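As a quick arithmetic check on the table, the per-zip counts sum to 2,324, which is roughly 0.28% of the 833,010 tests.

```python
# Test counts for the special zip codes, copied from the table.
special_zip_tests = {
    "78086": 98, "20174": 17, "48824": 14, "32313": 37,
    "29486": 69, "75033": 1990, "79430": 36, "75072": 63,
}
total_tests = 833_010

special_total = sum(special_zip_tests.values())
share = special_total / total_tests
print(special_total, f"{share:.2%}")  # 2324 0.28%
```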

RoniRos · 2020-08-04T00:55:19Z

When you say e.g. "15% data lost", do you mean that 5 days after date D, the number of tests reported for day D is about 85% of the number that will be reported for day D eventually, say after more than a month?
Assuming that's the right interpretation, it would be good to add these statistics to your presentation tomorrow. In fact, if there is time, you could calculate what fraction of the 'final' data is available, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, and 30 days after the date, and chart it.

jingjtang · 2020-08-04T01:17:02Z

> When you say e.g. "15% data lost", do you mean that 5 days after date D, the number of tests reported for day D is about 85% of the number that will be reported for day D eventually, say after more than a month?
>
> Assuming that's the right interpretation, it would be good to add these statistics to your presentation tomorrow. In fact, if there is time, you could calculate what fraction of the 'final' data is available e.g. 1,2,3,4,5,6,7,8,9,10,15,20,25,30 days after the date, and chart it.

In the data received on a given day (day D+5), the tests dated day D through day D+5 make up about 85% of all the tests in that file.

So for the data received on Aug 3rd, we only upload the report for July 29th. The tests for July 29th to Aug 3rd (~85%) will be ingested today or over the next several days, but the remaining ~15% of tests will never be reported, since the reports for those earlier dates have already been uploaded. (This will be fixed if we re-upload reports going back at least one month every day.)
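The ~15% loss can be framed as: in the file received on a given day, what share of tests are dated earlier than the only report date uploaded that day? A minimal sketch, assuming the file can be summarized as test counts per test date; the function name and the example counts are made up for illustration and are not Quidel data.

```python
from datetime import date, timedelta

def fraction_lost(counts_in_file, received, upload_lag_days=5):
    """Share of tests in one received file whose test date is older than
    received - upload_lag_days, i.e. dates whose reports were already
    uploaded and (under the old policy) never revisited.

    counts_in_file: dict mapping test date -> number of tests in this file.
    """
    oldest_uploaded = received - timedelta(days=upload_lag_days)
    total = sum(counts_in_file.values())
    lost = sum(n for d, n in counts_in_file.items() if d < oldest_uploaded)
    return lost / total if total else 0.0

# Made-up counts for a file received on Aug 3.
counts_in_file = {
    date(2020, 8, 2): 50,
    date(2020, 7, 31): 300,
    date(2020, 7, 29): 500,   # the date uploaded on Aug 3 (received - 5 days)
    date(2020, 7, 20): 120,   # already uploaded earlier: lost
    date(2020, 7, 1): 30,     # already uploaded earlier: lost
}
print(fraction_lost(counts_in_file, date(2020, 8, 3)))  # 0.15
```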

For comparison, you can take a look at the same problem for the Flu test data here; it is much more severe.

jingjtang · 2020-08-04T02:04:15Z

@RoniRos This is the figure generated following your logic.

For a given x, we calculate, for each date D, the proportion of tests reported within x days of D among all the tests that will eventually be reported for date D. This gives a time series vector, ingested_prop. The corresponding y = 1 - median(ingested_prop).
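A minimal sketch of that calculation, assuming that for each test date D we have the cumulative number of tests reported by each day after D, and treating the count at the longest available lag as the "eventual" total. The data structure, the function name, and the example numbers are hypothetical.

```python
from statistics import median

def unreported_fraction(cumulative_by_date, x):
    """y = 1 - median(ingested_prop) for a lag of x days.

    cumulative_by_date: dict mapping test date -> list where element k is the
    cumulative number of tests reported for that date within k days; the last
    element is treated as the eventual total.
    """
    ingested_prop = [
        counts[min(x, len(counts) - 1)] / counts[-1]
        for counts in cumulative_by_date.values()
        if counts[-1] > 0  # skip dates with no tests at all
    ]
    return 1 - median(ingested_prop)

# Hypothetical cumulative counts for three test dates.
cumulative_by_date = {
    "2020-07-01": [10, 50, 80, 100],
    "2020-07-02": [20, 60, 90, 100],
    "2020-07-03": [0, 40, 70, 100],
}
print(unreported_fraction(cumulative_by_date, 1))  # 0.5
```

Plotting `1 - unreported_fraction(...)` against x gives the "fraction of tests reported" view instead.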

RoniRos · 2020-08-04T02:26:43Z

Thanks @jingjtang. I think it may be easier for people to understand if you invert the Y axis, namely, report the average fraction of the eventual number of tests (after, say, at least a month) that is reported D days after the date in question. You can simply title it "Fraction of tests reported", and call the X axis "Number of days after the test date". The Y axis could be "Fraction of tests reported". This is the only slide you need on this topic.

krivard · 2020-08-10T13:50:48Z

Released in 1.7a on 4 August.

krivard added the "API addition", "New signals", and "Triage" (Nominate for inclusion in the next release) labels and removed the "Triage" label May 22, 2020

krivard assigned jingjtang May 26, 2020

capnrefsmmat changed the title from "Quidel COVID tests" to "Ingest Quidel COVID test data" May 27, 2020

krivard added and removed the "Triage" (Nominate for inclusion in the next release) label Jun 12, 2020

krivard added the "Triage" (Nominate for inclusion in the next release) and "modeling" (Must coordinate with Modeling team) labels Jul 8, 2020

krivard added this to the v1.6 milestone Jul 10, 2020

krivard removed the "Triage" (Nominate for inclusion in the next release) label Jul 10, 2020

krivard closed this as completed Aug 10, 2020
