Refactor smoothers as a utility, create new filters. #171

Closed
jsharpna opened this issue Jul 30, 2020 · 7 comments

Comments

@jsharpna
Contributor

No description provided.

@krivard
Contributor

krivard commented Jul 30, 2020

  • Standardize input and output: numpy array with one cell per day
  • Start with JHU or another Jingjing & Addison production; that's the most common style among the Python indicators.
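
A minimal sketch of what the "one cell per day in, one cell per day out" contract could look like. The function name `moving_average` and the `window` parameter are hypothetical, chosen for illustration; this is not the smoother used by any indicator.

```python
import numpy as np


def moving_average(signal, window=7):
    """Trailing moving average (hypothetical helper).

    Input: 1-D array with one cell per day.
    Output: array of the same length, one smoothed cell per day.
    Days without a full window of history are reported as NaN.
    """
    signal = np.asarray(signal, dtype=float)
    out = np.full(signal.shape, np.nan)
    for i in range(window - 1, len(signal)):
        # Average the current day and the (window - 1) days before it.
        out[i] = signal[i - window + 1 : i + 1].mean()
    return out
```

Keeping the output the same length as the input means every indicator can call a smoother without re-aligning dates afterwards.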

@dshemetov
Contributor

I think it may be good to collect all the possible sources of data missingness and come up with a standard approach:

  • missing from source (temporarily or permanently)
  • privacy censoring

One approach to missing data can be seen in geo_reindex here. The TL;DR is: we place the signal on a regular grid of daily values, and every missing value is then filled with 0.

I wonder if it would be better to mark them with NaNs instead. This would make it clear on the smoothing end that the value is missing and not 0. There we can decide whether to impute the data based on the most recent historical data or on geographic factors.
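
To illustrate the difference, here is a rough sketch using pandas. It does not reproduce geo_reindex; the helper `to_daily_grid` and the sample dates and values are made up for the example.

```python
import numpy as np
import pandas as pd


def to_daily_grid(series, start, end, fill_value=np.nan):
    """Reindex a sparse date-indexed series onto a regular daily grid.

    Hypothetical helper: days absent from the source get `fill_value`.
    """
    days = pd.date_range(start, end, freq="D")
    return series.reindex(days, fill_value=fill_value)


# Made-up data: the source reported on July 1 and July 4 only.
reported = pd.Series(
    [3.0, 5.0], index=pd.to_datetime(["2020-07-01", "2020-07-04"])
)

zero_filled = to_daily_grid(reported, "2020-07-01", "2020-07-04", fill_value=0.0)
nan_filled = to_daily_grid(reported, "2020-07-01", "2020-07-04")

# With zeros, the two missing days are indistinguishable from true zeros.
# With NaNs, the consumer can choose an imputation, e.g. carry the most
# recent value forward.
imputed = nan_filled.ffill()
```

With the zero fill, a plain mean over the four days is pulled down by the gap; with NaN, the smoother can see which days are missing and decide how to handle them.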

@dshemetov
Contributor

Also, we should probably think through smoother boundary effects. Currently some smoothers report NaNs for roughly the first few weeks of the available data (the NaN window is based on the averaging window of ~2 weeks). This data will be so far in the past that it likely won't matter for practical real-time users, but it is worth considering for historical use cases. A natural way to fix this would be to dynamically change the smoothing window at the boundaries.
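
One possible reading of "dynamically change the smoothing window" is to shrink the window at the start of the series so that it only covers the days that exist. A small sketch under that assumption (the name `moving_average_shrinking` is hypothetical, and a plain trailing average stands in for whichever smoother is actually used):

```python
import numpy as np


def moving_average_shrinking(signal, window=14):
    """Trailing moving average whose window shrinks at the left boundary.

    Hypothetical helper: day i averages the last `window` days when they
    exist, and all available days (0..i) otherwise, so no leading NaNs
    are produced by the boundary itself.
    """
    signal = np.asarray(signal, dtype=float)
    out = np.empty(len(signal))
    for i in range(len(signal)):
        start = max(0, i - window + 1)  # clip the window at the first day
        out[i] = signal[start : i + 1].mean()
    return out
```

The trade-off is that the earliest values are averaged over fewer days and are therefore noisier than the rest of the series.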

@krivard
Contributor

krivard commented Aug 4, 2020

Will incorporate into JHU first.

Comparing results to the reference implementation can be done in two copies of the repository or between a feature branch and the main branch.
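
Once both versions have produced output for the same dates, the comparison itself could be as simple as the sketch below. The helper `outputs_match` and its tolerance are assumptions for illustration; loading the two outputs from the two checkouts is left out.

```python
import numpy as np


def outputs_match(reference, candidate, rtol=1e-7):
    """Return True if two daily arrays agree within tolerance.

    Hypothetical helper: NaNs are treated as equal only when they occur
    in the same positions, so a value that changes from missing to 0
    (or the reverse) is reported as a mismatch.
    """
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        return False
    return bool(np.allclose(reference, candidate, rtol=rtol, equal_nan=True))
```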

Data censoring happens within each indicator and covers both what the DUA permits (i.e. no stderr/sample size) and the minimum sample size. Missingness due to censorship should be NA. Some sources report 0 when it is not necessarily a true 0 (GHT, cases/deaths).
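
As a rough illustration of censoring to NA rather than 0, assuming a per-day sample size is available (the function name and the threshold of 100 are hypothetical, not any indicator's actual rule):

```python
import numpy as np


def censor_small_samples(values, sample_sizes, min_sample_size=100):
    """Replace values backed by too few samples with NaN.

    Hypothetical helper: days whose sample size is below
    `min_sample_size` are marked missing instead of being set to 0.
    """
    values = np.asarray(values, dtype=float)
    sample_sizes = np.asarray(sample_sizes, dtype=float)
    return np.where(sample_sizes >= min_sample_size, values, np.nan)
```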

There are a lot of parameters to the smoothers -- at some point we'll want to evaluate different configurations for typical and edge case performance.

@dshemetov
Contributor

#177 JHU refactoring is almost done. I want to write a couple of better smoother tests first.

@krivard
Contributor

krivard commented Aug 11, 2020

Add to other indicators, order TBD

@krivard
Contributor

krivard commented Aug 12, 2020

  • python template
  • ght
  • emr-hosp
  • quidel covid tests
  • doctor-visits [blocked?]
  • quidel flu tests [blocked; wait for merge to main]
  • jhu [blocked; wait for geo revision]
  • usafacts [blocked; wait for geo revision]
  • combo cases/deaths [blocked; wait for diff revision]
  • combo nmf_day_doc_fbc_fbs_ght [blocked; not in this codebase]
  • fb-survey [blocked; wait for merge to main]

@SumitDELPHI SumitDELPHI added the Engineering Used to filter issues when synching with Asana label Dec 6, 2020
@krivard krivard closed this as completed Aug 24, 2021