Ingest Quidel COVID test data #40

Closed
krivard opened this issue May 22, 2020 · 24 comments
Assignees
Labels
API addition New signals modeling Must coordinate with Modeling team
Milestone

Comments

@krivard
Contributor

krivard commented May 22, 2020

Antigen test results will start coming to us Monday. Not as high-quality as PCR, but much faster, and still good enough for clinical use.

@krivard krivard added API addition New signals Triage Nominate for inclusion in the next release and removed Triage Nominate for inclusion in the next release labels May 22, 2020
@capnrefsmmat capnrefsmmat changed the title Quidel COVID tests Ingest Quidel COVID test data May 27, 2020
@capnrefsmmat
Contributor

Have we received any of this data yet, so we know what format to expect?

@jingjtang
Contributor

jingjtang commented May 27, 2020 via email

@krivard krivard added Triage Nominate for inclusion in the next release and removed Triage Nominate for inclusion in the next release labels Jun 12, 2020
@krivard krivard added Triage Nominate for inclusion in the next release modeling Must coordinate with Modeling team labels Jul 8, 2020
@krivard
Contributor Author

krivard commented Jul 9, 2020

@krivard to find Jeremy Weiss and get an update on what the current volume is

@krivard krivard added this to the v1.6 milestone Jul 10, 2020
@krivard krivard removed the Triage Nominate for inclusion in the next release label Jul 10, 2020
@krivard
Contributor Author

krivard commented Jul 10, 2020

Volume is now 18-19k tests/day

Substantial backfill, like ~1 month

Jeremy's analyses are based on test date, not storage date, falling back to storage date if the test date is obviously wrong. Test date is the day of the assay, not the day the patient's sample was taken.

Data is already being sent to the drop!

  • Jeremy to send over the script he's been running to Jingjing and cc Katie
  • Port the Quidel flu testing code to this codebase and use the COVID results instead, or write a new pipeline
  • Submit wip signals to the API
  • Do correlation analyses
  • Review, approve, document, deploy

@jingjtang and maybe @eujing to take on implementation, and ping @nloliveira for help if needed

@krivard
Contributor Author

krivard commented Jul 10, 2020

The estimator we want is the test positivity rate; tests per device is fine but lower priority

@krivard
Contributor Author

krivard commented Jul 14, 2020

18-19k tests per day includes backfill. We get <100 tests per state per day if you look only at yesterday.

  • Add pooling?
  • Add dv-like backfill behavior? i.e., even if we have data for yesterday, there won't be enough to report, so make the most recent day we report on be 4+ days ago (or whatever seems to have enough numbers)

We also don't receive data every day; e.g. today the 14th, the most recent data is for the 12th. May be able to absorb this into our lag policy, revisit if not.
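The two ideas above can be sketched as follows. This reads "pooling" as summing counts over a trailing window of days, which is only one possible interpretation; the 4-day lag and 7-day window are placeholder values, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

# Maps a test date to (positive tests, total tests) for one location.
DailyCounts = Dict[date, Tuple[int, int]]


def latest_reportable_day(today: date, lag_days: int = 4) -> date:
    """Most recent day to report on, leaving lag_days for backfill to arrive."""
    return today - timedelta(days=lag_days)


def pooled_counts(daily: DailyCounts, end_day: date, window: int = 7) -> Tuple[int, int]:
    """Sum positives and tests over the `window` days ending at end_day.

    Days with no data contribute zero, so gaps in delivery are tolerated.
    """
    positives = tests = 0
    for offset in range(window):
        p, t = daily.get(end_day - timedelta(days=offset), (0, 0))
        positives += p
        tests += t
    return positives, tests
```

With `today = date(2020, 7, 14)` and the default lag, the most recent reported day would be July 10, which also absorbs the two-day delivery gap mentioned above.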

@jingjtang
Contributor

Location Coverage:
[Images: state_counts, msa_counts, hrr_counts, county_counts]

@krivard
Contributor Author

krivard commented Jul 27, 2020

@RoniRos would like to show any correlation plots we generate for this when he presents at the CDC meeting on Thursday (correction below)

@RoniRos
Member

RoniRos commented Jul 27, 2020

> @RoniRos would like to show any correlation plots we generate for this when he presents at the CDC meeting on Thursday

Actually, my bad, they want it for a community-wide presentation on Tuesday (the weekly meetings that James and I usually attend). Maybe it's better if we schedule it for next Tuesday, and have someone from the Quidel team present it and answer questions.

@krivard
Contributor Author

krivard commented Jul 27, 2020

Some problems loading csv files into the correlations app (probably due to the client update from this weekend), so I uploaded wip versions to the API instead. The quidel folders in receiving have been split by flu and covid. We should put the flu folder back to its original name so the source stays consistent with the signal that's already published; for covid we can use a new source or include covid in the signal name, depending on what Roni et al. want for signal naming.

@RoniRos
Member

RoniRos commented Jul 27, 2020

Is there a document that explains our current naming convention, and lists all current sources and signals (in a more convenient format than this)?

@jingjtang
Contributor

> Is there a document that explains our current naming convention, and lists all current sources and signals (in a more convenient format than this)?

Here they are. Quidel has not been included yet. https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html

@krivard
Contributor Author

krivard commented Jul 28, 2020 via email

@krivard
Contributor Author

krivard commented Jul 28, 2020

Correlations

@krivard
Contributor Author

krivard commented Jul 28, 2020

  • build API documentation (look at the DETAILS.md file)
  • put historical data back to end of May into the API -- this does mean we'll never have direction/trend until we revise historical variance calculation piece of the direction calc
  • missingness (usually "not enough data")

@RoniRos
Member

RoniRos commented Jul 28, 2020

In Quidel, there will be both Covid and flu raw data streams to ingest, and they might even at some point be integrated (once a joint test panel is approved and distributed). Then there will likely be multiple signals made for each, some of which will make it to the map. So I think it might make sense to declare QUIDEL to be the source, and keep the flu/covid designation to be part of the signal.

However, note that none of our signals (or sources!) to date explicitly indicate Covid! We need to think how to handle the transition to multi-disease tracking. One solution is to create new signal names that explicitly include a disease name (Covid and flu for now), and map requests with old signal names to the Covid ones: initially silently, then with a gentle suggestion, and maybe at some point deprecate them.
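One way the staged transition described above could look in code, as a rough sketch. The alias table, the stage names, and the signal names `example_signal` and `covid_example_signal` are all hypothetical placeholders; no real signal names or API behavior are implied.

```python
import warnings

# Hypothetical mapping from a legacy (disease-agnostic) signal name
# to a new name that states the disease explicitly.
LEGACY_ALIASES = {"example_signal": "covid_example_signal"}


def resolve_signal(name: str, stage: str = "silent") -> str:
    """Map a requested signal name to its disease-specific name.

    stage="silent":     legacy names are remapped without comment
    stage="suggest":    legacy names are remapped with a gentle warning
    stage="deprecated": legacy names are rejected
    """
    new_name = LEGACY_ALIASES.get(name)
    if new_name is None:
        return name  # already a current name, nothing to do
    if stage == "suggest":
        warnings.warn(
            f"signal '{name}' is now '{new_name}'; please update your request",
            DeprecationWarning,
            stacklevel=2,
        )
    elif stage == "deprecated":
        raise ValueError(f"signal '{name}' was removed; use '{new_name}'")
    return new_name
```

Keeping the stage as a single switch means the server could move from silent remapping to suggestions to removal without touching the alias table.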

@krivard
Contributor Author

krivard commented Jul 30, 2020

Source: quidel
Signal: covid_ag_{raw|smoothed}_pct_positive

@krivard
Contributor Author

krivard commented Jul 31, 2020

@jingjtang will be giving a talk on this signal at Tuesday's meeting with the CDC community.

@jingjtang
Contributor

jingjtang commented Aug 3, 2020

Backfill problem in Quidel COVID (recording it here):
~15% of the data is lost if we upload only up to 5 days ago
~12% of the data is lost if we upload only up to 7 days ago
(Usually, the data received each day dates back more than a month.)

This problem was fixed by switching to uploading files for -45 days through -5 days every day.
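As an illustration of the -45 to -5 day window, here is a small sketch that lists the dates whose reports would be regenerated on a given day. The function name and its parameters are made up for this example; only the 45 and 5 day offsets come from the comment above.

```python
from datetime import date, timedelta
from typing import List


def export_dates(today: date, start_offset: int = 45, end_offset: int = 5) -> List[date]:
    """Dates to (re)upload today: from start_offset days ago through end_offset days ago."""
    if start_offset < end_offset:
        raise ValueError("start_offset must be at least end_offset")
    first = today - timedelta(days=start_offset)
    n_days = start_offset - end_offset + 1  # both endpoints included
    return [first + timedelta(days=i) for i in range(n_days)]
```

Running this for `date(2020, 8, 3)` yields 41 dates, from June 19 through July 29, so late-arriving tests for any of those dates are picked up on the next upload.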

(Minor issue, but recording it here)
Special zip codes:

zip State Number of Tests
78086 TX 98
20174 VA 17
48824 MI 14
32313 FL 37
29486 SC 69
75033 TX 1990
79430 TX 36
75072 TX 63

As of 07-30-2020, those zip codes account for only 2,324 tests out of 833,010 tests.

@RoniRos
Member

RoniRos commented Aug 4, 2020

  • When you say e.g. "15% data lost", do you mean that 5 days after date D, the number of tests reported for day D is about 85% of the number that will be reported for day D eventually, say after more than a month?
  • Assuming that's the right interpretation, it would be good to add these statistics to your presentation tomorrow. In fact, if there is time, you could calculate what fraction of the 'final' data is available e.g. 1,2,3,4,5,6,7,8,9,10,15,20,25,30 days after the date, and chart it.

@jingjtang
Contributor

jingjtang commented Aug 4, 2020

> • When you say e.g. "15% data lost", do you mean that 5 days after date D, the number of tests reported for day D is about 85% of the number that will be reported for day D eventually, say after more than a month?
> • Assuming that's the right interpretation, it would be good to add these statistics to your presentation tomorrow. In fact, if there is time, you could calculate what fraction of the 'final' data is available e.g. 1,2,3,4,5,6,7,8,9,10,15,20,25,30 days after the date, and chart it.

The number of tests reported for (day D to day D+5) is about 85% of the total number of tests in the data received on a certain day.

For example, for the data received on Aug 3rd, we only upload the report for July 29th. The data for July 29th to Aug 3rd (~85%) will be ingested today or over the next several days. But the remaining ~15% of tests will never be reported, since the reports for those dates have already been uploaded. (This would be fixed by uploading, every day, reports for data going back at least one month.)

For your convenience, you can take a look at the same problem for Flu Test data here, where it is much more severe.

@jingjtang
Contributor

jingjtang commented Aug 4, 2020

@RoniRos This is the figure generated following your logic.

image

For a given x, we calculate the proportion of tests reported for date D within x days, among all of the tests that will eventually be reported for date D. Doing this for every date D gives a time series vector ingested_prop. The corresponding y = 1 - median(ingested_prop).
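A rough sketch of that calculation, for readers who want to reproduce it. The input layout (a dict from test date to per-lag arrival counts) and the function names are assumptions for this example, not the format of the actual Quidel data.

```python
from statistics import median
from typing import Dict, Hashable, List

# arrivals[test_date][lag_days] = number of tests for test_date that
# first showed up lag_days after test_date (hypothetical layout).
Arrivals = Dict[Hashable, Dict[int, int]]


def ingested_prop(arrivals: Arrivals, x: int) -> List[float]:
    """For each test date, fraction of its eventual tests received within x days."""
    props = []
    for by_lag in arrivals.values():
        total = sum(by_lag.values())
        if total == 0:
            continue  # no tests ever reported for this date
        within_x = sum(n for lag, n in by_lag.items() if lag <= x)
        props.append(within_x / total)
    return props


def fraction_missing(arrivals: Arrivals, x: int) -> float:
    """y = 1 - median(ingested_prop), the quantity plotted against x."""
    return 1 - median(ingested_prop(arrivals, x))
```

Evaluating `fraction_missing` over a range of x values (for example 1 through 30) gives the curve in the figure; `median(ingested_prop(arrivals, x))` alone gives the complementary "fraction reported" view.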

@RoniRos
Member

RoniRos commented Aug 4, 2020

Thanks @jingjtang. I think it may be easier for people to understand if you invert the Y axis, namely, report the average fraction of the eventual number of tests (after, say, at least a month) that is reported D days after the date in question. You can simply title it "Fraction of tests reported", and call the X axis "Number of days after the test date". Y axis could be "Fraction of tests reported". This is the only slide you need on this topic.

@krivard
Contributor Author

krivard commented Aug 10, 2020

Released in 1.7a on 4 August.

@krivard krivard closed this as completed Aug 10, 2020