Quidel test devices showing up under more than one FIPS code #173

Closed
krivard opened this issue Jul 30, 2020 · 12 comments
Labels: blocked (This task is waiting for completion of another task), Engineering (Used to filter issues when synching with Asana)

Comments

@krivard
Contributor

krivard commented Jul 30, 2020

No description provided.

@jingjtang
Contributor

jingjtang commented Aug 3, 2020

There is a huge backfill problem in the Quidel flu tests.
Typically, the data received on a given day includes tests dating back 6-7 months.

image

Available Tests means: among the tests in the raw data received on a given day, the number that would eventually be reported if, each day, we uploaded only the report for X days earlier.

image
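One possible reading of the Available Tests definition can be sketched as follows. This is an assumption, not the code behind the plots: it treats a test as "available" when the gap between the day it was received and its TestDate is at most X days. The `received` field name and the sample rows are hypothetical.

```python
from datetime import date

def available_tests(rows, received, max_lag_days):
    """Count tests received on `received` whose TestDate is at most
    `max_lag_days` days earlier (assumed reading of "Available Tests")."""
    return sum(
        1
        for r in rows
        if r["received"] == received
        and 0 <= (received - r["TestDate"]).days <= max_lag_days
    )

# Hypothetical rows: one day's delivery containing recent and old tests.
rows = [
    {"received": date(2020, 8, 3), "TestDate": date(2020, 8, 1)},
    {"received": date(2020, 8, 3), "TestDate": date(2020, 7, 20)},
    {"received": date(2020, 8, 3), "TestDate": date(2020, 2, 1)},
]
within_5_days = available_tests(rows, date(2020, 8, 3), 5)
within_14_days = available_tests(rows, date(2020, 8, 3), 14)
within_200_days = available_tests(rows, date(2020, 8, 3), 200)
```

Under this reading, a larger X captures more of each delivery at the cost of a longer reporting delay.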

@krivard
Contributor Author

krivard commented Aug 4, 2020

Most of the code is ported; some unit tests to finish. Still resolving historical data -- K to follow up.

@jingjtang
Contributor

jingjtang commented Aug 4, 2020

Problem in the Quidel flu tests when considering test_per_device:
Out of 16528 unique devices (identified by SofiaSerNum):
63 devices appear in tests for more than one ZipCode with the same StorageDate;
189 devices appear in tests for more than one ZipCode with the same TestDate.

36 devices appear in tests for more than one FIPS code with the same StorageDate;
109 devices appear in tests for more than one FIPS code with the same TestDate.
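A check like the one behind these counts could look roughly like the sketch below. It is illustrative only: the record layout (dicts keyed by `SofiaSerNum`, `TestDate`, `ZipCode`, `fips`) is an assumption, and device 10000001 is a made-up control row.

```python
from collections import defaultdict

def devices_in_multiple_regions(records, region_key, date_key):
    """Return {(device, date): regions} for device-days seen in more than one region."""
    regions = defaultdict(set)
    for rec in records:
        regions[(rec["SofiaSerNum"], rec[date_key])].add(rec[region_key])
    return {k: v for k, v in regions.items() if len(v) > 1}

records = [
    {"SofiaSerNum": "29027721", "TestDate": "2020-04-05", "ZipCode": "48054", "fips": "26147"},
    {"SofiaSerNum": "29027721", "TestDate": "2020-04-05", "ZipCode": "48071", "fips": "26125"},
    # Hypothetical device that stays in a single region.
    {"SofiaSerNum": "10000001", "TestDate": "2020-04-05", "ZipCode": "48054", "fips": "26147"},
]
conflicts = devices_in_multiple_regions(records, "fips", "TestDate")
# Number of distinct devices affected, regardless of how many days.
n_devices = len({device for device, _ in conflicts})
```

Passing `"ZipCode"` or `"StorageDate"` as the keys would give the other variants of the count.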

Currently, the code does the following:

  1. Pull the historical data.
  2. Fix the issue with TestDate vs. StorageDate.
  3. Store the historical data, aggregated at the ZipCode level, in ./cache.
    So we only need to pull the most recent data, aggregate it at the ZipCode level, and combine it with the historical data. (There is no need to pull all of the data every day, which is very costly in runtime and memory since we keep receiving more and more data.)
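The cache-and-merge idea in the steps above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the `cutoff` parameter, and the in-memory `Counter` standing in for ./cache are all assumptions.

```python
from collections import Counter

def aggregate_by_zip(rows):
    """Count tests per (TestDate, ZipCode) from individual-level rows."""
    return Counter((r["TestDate"], r["ZipCode"]) for r in rows)

def merge_with_cache(cached, recent_rows, cutoff):
    """Keep cached counts dated before `cutoff`; recompute everything from `cutoff` on.

    Dates are ISO strings, so string comparison matches date order.
    """
    merged = Counter({k: v for k, v in cached.items() if k[0] < cutoff})
    merged.update(aggregate_by_zip(r for r in recent_rows if r["TestDate"] >= cutoff))
    return merged

# Hypothetical cached aggregates and a fresh pull of recent rows.
cached = Counter({("2020-04-01", "48054"): 5, ("2020-04-05", "48054"): 2})
recent = [{"TestDate": "2020-04-05", "ZipCode": "48054"}] * 3
recent += [{"TestDate": "2020-04-06", "ZipCode": "48071"}]
merged = merge_with_cache(cached, recent, "2020-04-05")
```

This works for quantities that add up, such as test counts. Counts of unique devices do not add across ZipCodes in the same way.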

However, these special cases cause problems when aggregating unique devices. For example, device 29027721, used on 2020-04-05, appears in the records for both place A (zip = 48054, fips = 26147) and place B (zip = 48071, fips = 26125). It will be counted twice at larger geographical levels, e.g. the state level.
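The double counting can be reproduced with the example device. In this sketch the row layout and the `unique_devices` helper are hypothetical; the state is taken as MI because FIPS codes starting with 26 are in Michigan.

```python
def unique_devices(rows, key):
    """Number of distinct devices per value of `key`."""
    seen = {}
    for r in rows:
        seen.setdefault(r[key], set()).add(r["device"])
    return {k: len(v) for k, v in seen.items()}

rows = [
    {"device": "29027721", "zip": "48054", "fips": "26147", "state": "MI"},
    {"device": "29027721", "zip": "48071", "fips": "26125", "state": "MI"},
]
per_zip = unique_devices(rows, "zip")
# Summing cached zip-level device counts counts the device once per zip.
state_from_zip = sum(per_zip.values())
# Counting directly from individual-level rows counts it once.
state_direct = unique_devices(rows, "state")["MI"]
```

Here the zip-level route reports 2 devices for the state, while the direct count reports 1.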

Do we want to ignore them, which means accepting the double counting described above since these cases are not that common?
Or do we want to count them as accurately as we can, which means pulling all of the historical data every day (since we always need individual-level data unless we become more confident about the backfill problem)?

@krivard
Contributor Author

krivard commented Aug 5, 2020

Back-catalog of data is stored on midas for the moment because a couple of large files (100MB+) cannot be stored on the email dropbox indefinitely. We don't expect large files like this to arrive as part of the weekly drop; the large files were part of a historical roll-up. If a large file arrives in the future, we'll have to re-think how they are stored.

@krivard
Contributor Author

krivard commented Aug 5, 2020

Options for the double-counted devices problem:

  • Ignore them
  • Drop them
  • Split them based on fraction of tests in each region
  • Use the first / most recent region assigned
  • Create a composite identifier for each device which is device+zip or device+fips
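Two of these options, "use the first region assigned" and "split by fraction of tests", could be sketched as below. None of this is a decided approach: the helper names, the row layout, and the 3-to-1 split of tests are hypothetical.

```python
from collections import Counter, defaultdict

def first_region(tests):
    """Option: assign each device to the region of its earliest test.

    Ties on TestDate are broken by input order (sorted() is stable).
    """
    assigned = {}
    for t in sorted(tests, key=lambda t: t["TestDate"]):
        assigned.setdefault(t["device"], t["fips"])
    return assigned

def fractional_weights(tests):
    """Option: split each device across regions by its share of tests there."""
    counts = defaultdict(Counter)
    for t in tests:
        counts[t["device"]][t["fips"]] += 1
    return {
        dev: {fips: n / sum(c.values()) for fips, n in c.items()}
        for dev, c in counts.items()
    }

# Hypothetical test counts for the example device: 3 tests in one county, 1 in another.
tests = [
    {"device": "29027721", "TestDate": "2020-04-05", "fips": "26147"},
    {"device": "29027721", "TestDate": "2020-04-05", "fips": "26147"},
    {"device": "29027721", "TestDate": "2020-04-05", "fips": "26147"},
    {"device": "29027721", "TestDate": "2020-04-05", "fips": "26125"},
]
assigned = first_region(tests)
weights = fractional_weights(tests)
```

With fractional weights, the device contributes 0.75 and 0.25 to the two counties and exactly 1 at the state level. Both options still require individual-level rows.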

@jingjtang
Contributor

jingjtang commented Aug 5, 2020

Options for the double-counted devices problem:

  • Ignore them
  • Drop them
  • Split them based on fraction of tests in each region
  • Use the first / most recent region assigned
  • Create a composite identifier for each device which is device+zip or device+fips

Isn't the last solution the same as the first one (ignoring the problem)? In the example I mentioned above, if we create two composite identifiers for this device, won't we still count it twice at larger geographic resolutions?

@krivard
Contributor Author

krivard commented Aug 6, 2020

Yep. Let's do the following analysis:

For each device used in more than one region, how many devices are there in each of those regions? Take the region with the smallest number of devices; that will tell us the largest impact that ignoring the problem will have.
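A rough sketch of this analysis, assuming individual-level rows with hypothetical `device` and `fips` fields (devices d1 to d4 are invented for the example):

```python
from collections import defaultdict

def largest_impact(rows):
    """For each multi-region device, find its region with the fewest devices.

    Returns {device: (region, number_of_devices_in_region)}.
    """
    devices_in = defaultdict(set)  # region -> devices seen there
    regions_of = defaultdict(set)  # device -> regions it appears in
    for r in rows:
        devices_in[r["fips"]].add(r["device"])
        regions_of[r["device"]].add(r["fips"])
    impact = {}
    for dev, regions in regions_of.items():
        if len(regions) > 1:
            # Tie-break on the region code so the result is deterministic.
            smallest = min(regions, key=lambda f: (len(devices_in[f]), f))
            impact[dev] = (smallest, len(devices_in[smallest]))
    return impact

rows = [
    {"device": "29027721", "fips": "26147"},
    {"device": "29027721", "fips": "26125"},
    {"device": "d1", "fips": "26125"},
    {"device": "d2", "fips": "26125"},
    {"device": "d3", "fips": "26125"},
    {"device": "d4", "fips": "26147"},
]
impact = largest_impact(rows)
```

In this made-up data the shared device is one of only two devices in county 26147, so that is where a miscount would matter most.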

@jingjtang
Contributor

In Difference_at_county_level.xlsx:
_zip: aggregate at zip code level -> aggregate at county level
_county: aggregate at county level directly
diffprop: (_zip - _county) / _county

In Difference_at_state_level.xlsx:
_zip: aggregate at zip code level -> aggregate at state level
_state: aggregate at state level directly
diffprop: (_zip - _state) / _state

In general, the impact on state-level aggregation is not serious, but the influence on some counties on certain dates is large.
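For reference, the diffprop column can be computed as in this small sketch. The device counts are invented for illustration and are not taken from the spreadsheets.

```python
def diffprop(via_zip, direct):
    """Relative difference (via_zip - direct) / direct for each region key."""
    return {
        k: (via_zip[k] - direct[k]) / direct[k]
        for k in direct
        if direct[k]  # skip regions with a zero direct count
    }

# Hypothetical county-level device counts for one date.
county_via_zip = {"26147": 11, "26125": 40}
county_direct = {"26147": 10, "26125": 40}
county_diffprop = diffprop(county_via_zip, county_direct)
```

A single double-counted device produces a 10% difference in the small county and none in the large one, which matches the pattern of larger effects at the county level than at the state level.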

@krivard
Contributor Author

krivard commented Aug 7, 2020

Wait for more information from Quidel before deciding.

@krivard
Contributor Author

krivard commented Aug 10, 2020

Data delivery late today -- consider making pipeline robust to delays

@krivard krivard added the blocked This task is waiting for completion of another task label Aug 11, 2020
@jingjtang
Contributor

jingjtang commented Aug 19, 2020

Dry-run mode added for unit tests

@krivard krivard changed the title Port Quidel codebase here Port Quidel flu codebase here Aug 21, 2020
@krivard
Contributor Author

krivard commented Sep 15, 2020

Update from Jhobe:

...they would like to create a unique hash of the facility name and zip code (rather than my sequentially assigned facility numbers). This new method would allow them to not have to keep a “look up table” for assigning Site#’s to each result record. When they switch over to this new method, we would provide you with a listing to allow you to convert your current Site#’s to the corresponding new hashes.
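The update does not say how the hash would be built, so the following is only a guess at the shape of such a scheme: SHA-256 over a normalized facility name plus zip code, with a helper that applies a conversion listing. The normalization, separator, and function names are assumptions.

```python
import hashlib

def site_hash(facility_name, zip_code):
    """Hypothetical site identifier: SHA-256 of normalized name and zip code."""
    key = f"{facility_name.strip().lower()}|{zip_code}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def convert_site_ids(old_to_facility):
    """Map old sequential Site#s to new hashes, given {site_no: (name, zip)}."""
    return {old: site_hash(name, z) for old, (name, z) in old_to_facility.items()}

# Made-up listing of current Site#s.
listing = {1: ("Clinic A", "48054"), 2: ("Clinic A", "48071")}
converted = convert_site_ids(listing)
```

Because the zip code is part of the key, the same facility name in two zip codes yields two different identifiers.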

@krivard krivard changed the title Port Quidel flu codebase here Quidel test devices showing up under more than one FIPS code Sep 15, 2020
@SumitDELPHI SumitDELPHI added the Engineering Used to filter issues when synching with Asana label Dec 6, 2020
@krivard krivard closed this as completed Aug 24, 2021

3 participants