Update quidel covidtest (Add Age Groups Signals, Add rest-of-state reports) #1467
Conversation
We need new signal names for the age-group-specific signals.
We need approval from @RoniRos, @ryantibs, and @krivard on the signal names.
Let me find what we're using in covid_hosp; we should be consistent where possible |
Recommended set:
Katie's last naming scheme: 👍 from me.
As suggested by @RoniRos, we can add two more super age groups, 0-17 and 18-64, so that we get information for the small age groups like 0-4 without being hampered too much by the availability threshold.
Katie’s last naming scheme LGTM, too.
This suggestion was for our consideration. By adding the super-groups 0-17 and 18-64, we may have many more counties for which an age-specific Quidel signal is available. Jingjing will calculate and share the number of counties above the threshold, so we can consider whether it's worth it.
I agree with Jingjing that these two supergroups could be useful, especially at the county level.
I’d like to know what Ryan and Kate think in terms of value vs. cost.
Btw, Jingjing, the state-level panels also include a coverage curve for "total" (all ages), but the county-level panel does not. Can you please add it at your convenience? It's a good indication of the upper limit, and of what is lost when breaking down by age group.
(Quoting Jingjing Tang's earlier comment:)
[location_nums_over_time]<https://user-images.githubusercontent.com/31444565/149571274-8ec093eb-f89b-485f-ba6b-730b0f77f71e.png>
The new availability is shown here. I think those two super age groups are good to be added.
It's quite hard for me to tell from the current plots whether the tradeoff is actually worth it. I suspect it may not be. The easiest and most direct way to tell would be to add up the availability of the subgroups within each combined group. E.g., just compare the availability of 0_4 + 5_17 to 0_17 (so just comparing two lines), and likewise for the other case.
Ryan: I'm not sure why you would want to add up the coverages. I would think one would want to subtract the coverage of each sub-group from the coverage of its super-group, e.g.:
* How many counties that do not support a 0-4 signal at least support a 0-17 signal.
And separately:
* How many counties that do not support a 5-17 signal at least support a 0-17 signal.
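For reference, a minimal sketch of the counts described above, assuming a hypothetical availability table with one row per (county, signal) that met the reporting threshold; the column and signal names here are placeholders, not the pipeline's actual ones.

```python
import pandas as pd

def rescued_counties(avail: pd.DataFrame, subgroup: str, supergroup: str) -> int:
    """Counties that lack the subgroup signal but do have the supergroup signal."""
    has_sub = set(avail.loc[avail["signal"] == subgroup, "county"])
    has_super = set(avail.loc[avail["signal"] == supergroup, "county"])
    return len(has_super - has_sub)

# e.g. rescued_counties(avail, "pct_positive_age_0_4", "pct_positive_age_0_17")
```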
Yes sorry, my "+" was supposed to be an "OR". So I think we are saying equivalent things.
Sorry to be a pest, but I still don't understand. Why would you want to do an OR, rather than add up the two differences I listed as an example? If there is a county where neither 0-4 nor 5-17 is available, arguably it should count twice as much as a county where only one of them is unavailable. No? Anyway, what are the use cases you envision for these signals?
One could argue that when, say, 0-17 and 5-17 are available but 0-4 is not, one could derive an estimate of 0-4 from the other two, based on their proportion in the population (since we don't give their proportion among the tests). So the case when neither 0-4 nor 5-17 is available is by far the worst.
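A back-of-the-envelope sketch of the derivation alluded to above, using population shares as a stand-in for the (unpublished) test shares; the numbers and variable names are made up purely for illustration.

```python
pop_0_4, pop_5_17 = 19_000_000, 54_000_000   # hypothetical populations
pop_0_17 = pop_0_4 + pop_5_17
p_0_17, p_5_17 = 8.0, 7.0                    # observed % positivity for 0-17 and 5-17

# If test volume were proportional to population, the 0-17 rate would be a
# population-weighted average of the two subgroups, so solve for the 0-4 rate:
p_0_4_est = (pop_0_17 * p_0_17 - pop_5_17 * p_5_17) / pop_0_4
print(round(p_0_4_est, 2))  # ~10.84
```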
Not a pest, I just wasn't reading/thinking carefully! Yes, you're right, your message is alluding to something different. But we wouldn't be able to tell from your counts whether in a particular county we have what you call the "worst case" (neither 0-4 nor 5-17 available), since they're just marginal counts, right? And yes, I buy your argument. So I guess the following four numbers would be useful: how many counties that do not support a 0-4 signal support a 0-17 signal (and likewise for 5-17). @jingjtang Can you do this, and do the same for the other category, 18-64, so that we can get a sense of the value add?
@RoniRos We initially added the availability check for the all-ages group here, but the county-level curve was later removed only to give a clearer view of the bottom area for the age-specific signals. In case you are not able to jump to the Slack thread, the figure is attached below (the first legend entry should be 0-4, not 0-5).
Ok. I just browsed some of CDC's data and found many places where they present age breakdowns as 0-17, 18-49, 50-64, 65+. E.g. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/burden.html So my suggestion is that we at least include 0-17.
@ryantibs Similar ones for 18-64:
@RoniRos @jingjtang Thanks! I'm convinced we should add 0-17. We can just make it clear in our documentation that we also wanted to separate this out into two subcategories because 0-4 is an unvaccinated population. That is a clean explanation of why there is overlap/nesting in the lowest 3 categories (0-4, 5-17, 0-17).
I'm less convinced we should add 18-64. Rationale: if we rescue any counties with this big bucket, then the signal we publish is probably going to be very similar to the all-ages signal. (Because I'm guessing this big bucket includes the most common ages from a marginal perspective, and if the signals at 18-49 and 50-64 are each missing, then it must mean we're just barely below the availability threshold we set, and 18-64 puts us right above it.)
So I suggest we just publish 0-17, along with the existing breakdowns, and finish this off ASAP. However, I'm not strongly opposed to also including 18-64, if Roni or somebody else feels differently. Thanks!
And for the time series plot (the leftmost one), I checked how many counties have < 25 counts for each date (if we switch to option 2, those counties will not be available at all).
Yes, it is just a plotting artifact. In fact, we only have values reported at the two endpoints; all the dates in between are unavailable for that county. And remember that we also have counties with only < 10 days available due to geographical pooling (they have < 10 days with counts in [25, 50], and the rest of the days have counts < 25). I didn't show those examples since the figures are just dominated by the option 1 line, but you can imagine how they look. And we do have those counties; the numbers are shown in the middle panel of the time series plot.
Thank you. I think option 2 is the winner in my mind, but I will have a decision by 5pm (just to be sure, I will check whether Roni agrees with me at the team-leads meeting).
Hi @jingjtang, confirming that we should go with option 2. Thank you for working hard to compare the options.
Thanks @ryantibs. @dshemetov The pipeline is ready for review with the new strategy, described in detail below for your reference:
I think I found a small bug, see the comments. Otherwise looks good!
```python
# For raw signals, the threshold is MIN_OBS
# For smoothed signals, the threshold is MIN_OBS/2
if smooth:
    threshold_visits = MIN_OBS/2
```
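A minimal sketch (not the indicator's actual code) of how the threshold in the snippet above might gate whether a location reports on a given day; `MIN_OBS` here is a placeholder value and `meets_threshold` is a hypothetical helper.

```python
MIN_OBS = 50  # placeholder

def meets_threshold(total_count: int, smooth: bool) -> bool:
    # Raw signals require MIN_OBS observations; smoothed signals pool several
    # days of data, so half the per-day threshold is treated as sufficient.
    threshold_visits = MIN_OBS / 2 if smooth else MIN_OBS
    return total_count >= threshold_visits
```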
What's the reason for this change in thresholding here?
LGTM!
```python
[(14 + parent_pos*22/parent_test + 0.5)/(50+1)*100,
 (26 + parent_pos*1/parent_test + 0.5)/(50+1)*100,
 (24 + 0.5)/(60+1)*100,
 (32 + 0.5)/(64+1)*100], equal_nan=True)
```
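One consistent reading of the expected values above (a sketch, not the test's own helper): each entry is a positivity percentage with a +0.5/+1 adjustment, where a location below the reporting threshold borrows tests from its parent geography at the parent's positivity rate. The function and argument names are illustrative only.

```python
def adjusted_pct_positive(pos, tests, borrowed_tests=0, parent_pos=0.0, parent_tests=1):
    # Positives borrowed from the parent geography, scaled by its positivity rate.
    borrowed_pos = parent_pos * borrowed_tests / parent_tests
    return (pos + borrowed_pos + 0.5) / (tests + borrowed_tests + 1) * 100

# e.g. the first expected value above matches
# adjusted_pct_positive(pos=14, tests=28, borrowed_tests=22,
#                       parent_pos=parent_pos, parent_tests=parent_test)
```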
This is excellent, thank you!
Three questions:
By the way, since it's time to re-run the past data, I want to ask whether we are able to extend the export range for Quidel from 45 days to 175 days (if storage allows). This means that for the daily run, we would generate reports for the dates from -180 days to -5 days instead of -50 days to -5 days. The backfill-related pipeline will benefit from this.
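The date arithmetic behind the proposed window, as an illustration only; the actual parameter names in the indicator's params.json may differ.

```python
from datetime import date, timedelta

def export_window(run_date: date, lookback_days: int, lag_days: int = 5):
    # Returns the (start, end) dates of the export range for a given run date.
    return run_date - timedelta(days=lookback_days), run_date - timedelta(days=lag_days)

# current behaviour:  export_window(date.today(), lookback_days=50)   # -50d .. -5d
# proposed behaviour: export_window(date.today(), lookback_days=180)  # -180d .. -5d
```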
Sorry if this is contributing noise, but I just want to double check --- apart from the new stricter way of pooling and the new signals --- 1. that we'll get megacounties, and 2. that the national signal (and all signals) will take values for reference dates all the way back to the start of the time period. These are things I've asked for on other issues or messages.
@ryantibs Yes, we will have both of those for this PR.
Cool. To do this we will need the full list of (issue, reference date, geo type, geo value) to delete.
Acknowledged
Acknowledged
That's just shy of 4x, on top of the 7x we're adding with this PR -- I don't recommend it. Quidel currently adds ~110k rows to the database per day, out of a total ~2.8M rows added daily across all indicators. Here are some rough figures imagining the impact on the database if we did that (an upper bound; I imagine the age-group signals have lower coverage than all-ages and so are somewhat less than 7x).
I do notice that quidel isn't currently using the archive-differ, so some of those 110k rows might be duplicates. Do you happen to know the actual rate of new/updated figures vs. duplicates in each submission?
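Rough upper-bound arithmetic behind the "4x on top of 7x" figure above; the factors are illustrative and actual row counts depend on the coverage of each age-group signal.

```python
rows_per_day_now = 110_000      # quidel rows added to the database daily today
age_group_factor = 7            # all-ages plus the new age-group breakdowns
window_factor = 175 / 45        # extending the export range, just shy of 4x
print(rows_per_day_now * age_group_factor * window_factor)
# ~3.0M rows/day, vs ~2.8M currently added daily across all indicators
```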
Releasing this change has some subtle details -- here's a draft of our options. Pick:
A: If we think the initial run will be fast (<40 minutes)
B: If we think the initial run will be slow
C: If we think the deletions will be fast (<4 hours)
Delete any time after A2 or B3.
D: If we think the deletions will be slow
Do the deletions need to be in place relatively simultaneously with release? Is it bad for the database to spend multiple days with deletions only partially applied?
Di: If we need to be rigorous
Schedule a 3-day patch period the same way we did with the JHU patch last fall. Start A1 or B1 on day one of the patch. Complete A/B before concluding the patch period.
Dii: If we can be more flexible
Delete in batches daily until done. Could take place before, during, or after A/B, depending on whether we want deletions finished before the new code runs, or the new code to run before we finalize the set of deletions needed.
E: If we want the archive-differ to handle deletions
If we don't care about deleting from past issues, the archive-differ will automatically mark as deleted anything that switches from showing up in an output file to not showing up in an output file. "Mark as deleted" means the data is still in the database, and will appear in as-of queries for as-ofs before the deletion date, but will not appear in most-recent queries or in as-of queries for as-ofs after the deletion date.
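A minimal sketch of what an archive differ does conceptually (not the delphi_utils implementation): compare today's export against a cached copy of the previous export, keep only new or changed rows, and flag rows that disappeared as candidates to "mark as deleted". The key and value column names are assumptions.

```python
import pandas as pd

KEY = ["geo_id"]  # illustrative key columns

def diff_export(cached: pd.DataFrame, current: pd.DataFrame):
    merged = cached.merge(current, on=KEY, how="outer",
                          suffixes=("_old", "_new"), indicator=True)
    changed = (merged["_merge"] == "both") & (merged["val_old"] != merged["val_new"])
    new_rows = merged["_merge"] == "right_only"
    deleted = merged[merged["_merge"] == "left_only"]  # rows that vanished from the output
    return merged[changed | new_rows], deleted
```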
I do not have that kind of data in hand. Do you want me to check that?
Choice between A and B: the runtime will be longer than 40 mins, but around 60 mins or so; no more than 120 mins in my experience.
Choice for deletion: I think Dii is better. We can have the new signals produced as soon as possible and apply the corrections to the historical releases. We do need corrections (deletions) for the past issues, so I won't choose E.
Yes please; just a week or two's worth of data should be fine.
@krivard Here is a table showing the percentage of zero diffs:
We can see that for the smallest geographical level (county), we have zero_diff_pct >= 80 for most of the signals. It would be good to apply the archive-differ to quidel_covidtest too.
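One way the zero-diff percentage in the table above could be estimated (a sketch; the file layout and column names are assumptions): join two consecutive exports on their keys and count unchanged values.

```python
import pandas as pd

def zero_diff_pct(previous: pd.DataFrame, current: pd.DataFrame) -> float:
    merged = previous.merge(current, on=["geo_id", "time_value"],
                            suffixes=("_old", "_new"))
    unchanged = (merged["val_old"] == merged["val_new"]).sum()
    return 100 * unchanged / len(current)
```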
Excellent news! So if the update to the smoothing means we expect to drop 15-50% of the new rows we currently report each day for quidel, let's install the archive-differ, release this, see where we are, and then revisit expanding the export range once we have a better idea of where we wind up. That'll look like E without the initialization step, or:
@krivard File created to provide the deletion info: https://cmu.box.com/s/y41vjkj8z8cjvaqgtz8g5phagms1i3t0 I did the sanity check by:
But it would still be better to have someone else do another sanity check on this.
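A possible second sanity check on the deletion list (a sketch; the file and column names are assumptions): every row slated for deletion should appear in the previously issued data and be absent from the regenerated output.

```python
import pandas as pd

key = ["time_value", "geo_type", "geo_value", "signal"]
deletions = pd.read_csv("deletion_list.csv")
old_rows = pd.read_csv("previously_issued_rows.csv")
new_rows = pd.read_csv("regenerated_rows.csv")

in_old = deletions.merge(old_rows[key].drop_duplicates(), on=key,
                         how="left", indicator=True)["_merge"] == "both"
in_new = deletions.merge(new_rows[key].drop_duplicates(), on=key,
                         how="left", indicator=True)["_merge"] == "both"
assert in_old.all(), "some rows marked for deletion were never issued"
assert not in_new.any(), "some rows marked for deletion still appear in the new output"
```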
Description
Changelog
Itemize code/test/documentation changes and files added/removed.
* constants.py, pull.py, run.py, geo_map.py, generate_sensor.py
* tests/test_constants.py, tests/test_pull.py, tests/test_run.py, tests/test_geo_map.py, tests/test_generate_sensor.py, tests/test_data
Fixes