* User can manually disable specific checks for specific datasets using a field in the params.json file
* User can enable test mode (checks only a small number of data files) using a field in the params.json file (see the sketch below)
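A minimal sketch of how the validator might read those two settings. The `validation` block and the field names (`test_mode`, `suppressed_checks`) are illustrative assumptions, not necessarily the actual params.json schema:

```python
import json

def load_validation_settings(path="params.json"):
    """Read validator settings, with permissive defaults when fields are absent.

    The field names below are hypothetical; the real params.json schema
    may use different keys.
    """
    with open(path) as f:
        params = json.load(f).get("validation", {})
    return {
        # Run against only a handful of data files when True.
        "test_mode": params.get("test_mode", False),
        # e.g. {"covid_ag_raw_pct_positive": ["check_missing_date_files"]}
        "suppressed_checks": params.get("suppressed_checks", {}),
    }

def is_check_suppressed(settings, dataset, check_name):
    """True if the user disabled this check for this dataset in params.json."""
    return check_name in settings["suppressed_checks"].get(dataset, [])
```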
## Checks + features wishlist, and problems to think about

### Starter/small issues

* Check for duplicate rows (see the first sketch after this list)
* Backfill problems, especially with JHU and USA Facts, where a change to old data results in a datapoint that doesn’t agree with surrounding data ([JHU examples](https://delphi-org.slack.com/archives/CF9G83ZJ9/p1600729151013900)) or is very different from the value it replaced. If a date is already in the API, check whether any values have changed significantly within the "backfill" window (use the span_length setting); see the second sketch after this list and [this discussion](https://github.com/cmu-delphi/covidcast-indicators/pull/155#discussion_r504195207) for context.
* Run check_missing_date_files (or similar) on every geo type-signal type combination separately in the comparative checks loop.
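A minimal pandas sketch for the duplicate-row item above. It assumes each exported CSV has already been read into a dataframe with the usual `geo_id`, `val`, `se`, `sample_size` columns, and it separates exact duplicates from rows that repeat a `geo_id` with conflicting values:

```python
import pandas as pd

def check_duplicate_rows(df: pd.DataFrame, filename: str) -> list:
    """Return one warning string per kind of duplication found in a CSV."""
    warnings = []

    # Rows that are identical in every column are merely redundant.
    exact_dupes = df[df.duplicated(keep="first")]
    if not exact_dupes.empty:
        warnings.append(f"{filename}: {len(exact_dupes)} exactly duplicated rows")

    # Rows that repeat a geo_id but differ elsewhere disagree on the value.
    conflicting = df[df.duplicated(subset=["geo_id"], keep=False)
                     & ~df.duplicated(keep=False)]
    if not conflicting.empty:
        warnings.append(f"{filename}: repeated geo_id with conflicting values, "
                        f"e.g. {sorted(conflicting['geo_id'].unique())[:5]}")
    return warnings
```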
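For the backfill item, one possible starting point is below: compare the data being exported against what the API currently serves for the same signal, restricted to the backfill window. The column names, treating `span_length` as a `pd.Timedelta`, and the 50% relative-change cutoff are all assumptions to be tuned:

```python
import pandas as pd

def check_backfill_changes(new_df, api_df, span_length, rel_change_threshold=0.5):
    """Flag dates already in the API whose values changed sharply on re-issue.

    new_df, api_df: dataframes with geo_id, time_value (datetime), and val.
    span_length: pd.Timedelta covering the expected backfill window.
    """
    window_start = new_df["time_value"].max() - span_length
    merged = new_df.merge(api_df, on=["geo_id", "time_value"],
                          suffixes=("_new", "_api"))
    merged = merged[merged["time_value"] >= window_start]

    denom = merged["val_api"].abs().clip(lower=1e-9)  # avoid division by zero
    rel_change = (merged["val_new"] - merged["val_api"]).abs() / denom
    return merged.loc[rel_change > rel_change_threshold,
                      ["geo_id", "time_value", "val_api", "val_new"]]
```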
### Larger issues

* Check whether [errors raised from validating all signals](https://docs.google.com/spreadsheets/d/1_aRBDrNeaI-3ZwuvkRNSZuZ2wfHJk6Bxj35Ol_XZ9yQ/edit#gid=1226266834) are correct: not false positives, and not overly verbose or repetitive
* Check for erratic data sources that wrongly report all zeroes (see the sketch after this list)
    * E.g. the error with the Wisconsin data for the 10/26 forecasts
    * Wary of a purely static check for this
    * Are there any geo regions where this might cause false positives? E.g. small counties or MSAs, or certain signals (deaths, since it's << cases)
    * This test is partially captured by checking averages in source vs. reference data, unless the erroneous zeroes continue for more than a week
    * Also partially captured by outlier checking. If the zeroes aren't outliers, then it's hard to say they're erroneous at all.
* Easier suppression of many errors at once
* Nicer formatting for the error “report”
    * E.g. if a single type of error is raised for many different datasets, summarize all the error messages into a single message? But it still has to be clear how to suppress each one individually
* Use known erroneous/anomalous days of source data to tune static thresholds and test behavior
* If we can't get data from the API, do we want to use substitute data for the comparative checks instead?
    * E.g. the most recent successful API pull -- it might end up being a couple of weeks old
    * Currently, any API fetch problem means the comparative checks are skipped entirely.
* Improve performance and reduce runtime (no particular target; just avoid being painfully slow!)
    * Profiling (iterate)
    * Check whether saving intermediate files improves efficiency (currently a bottleneck in the "individual file checks" section. Parallelize?)
    * Make `all_frames` MultiIndex-ed by geo type and signal name? Or make a dict of dataframes keyed by geo type and signal name? Either may improve performance, or may just make access more readable. (A sketch of the dict variant follows this list.)
* Ensure the validator runs on signals that require AWS credentials (iterate)
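One possible shape for the all-zeroes check above, not a settled design: compare the recent data being validated against reference data for the same signal, and flag geo regions that suddenly report nothing but zeroes. The column names and the `min_run_days` cutoff are placeholders:

```python
import pandas as pd

def find_suspicious_zero_runs(source_df, reference_df, min_run_days=3):
    """Return geo_ids whose recent values are all zero despite nonzero history.

    source_df: recent data being validated (geo_id, time_value, val).
    reference_df: older data for the same signal, e.g. pulled from the API.
    """
    recent = source_df.groupby("geo_id")["val"]
    history_mean = reference_df.groupby("geo_id")["val"].mean()

    all_zero = recent.apply(lambda v: len(v) >= min_run_days and (v == 0).all())
    return [geo for geo, flagged in all_zero.items()
            if flagged and history_mean.get(geo, 0) > 0]
```

As the sub-items note, this overlaps with the source-vs-reference average check and with outlier detection, so it may only earn its keep for zero runs longer than the comparison window.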
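And for the `all_frames` idea in the performance item, a sketch of the dict variant, assuming the combined frame carries `geo_type` and `signal` columns (the MultiIndex variant would be `all_frames.set_index(["geo_type", "signal"])` instead):

```python
from typing import Dict, Tuple
import pandas as pd

def frames_by_key(all_frames: pd.DataFrame) -> Dict[Tuple[str, str], pd.DataFrame]:
    """Split the combined frame into a dict keyed by (geo_type, signal)."""
    return {key: group.drop(columns=["geo_type", "signal"])
            for key, group in all_frames.groupby(["geo_type", "signal"])}

# Lookups become a plain dict access instead of repeated boolean filtering:
#     frames = frames_by_key(all_frames)
#     county_frame = frames[("county", "some_signal_name")]
```

Whether this actually helps runtime, rather than just readability, is exactly what the profiling sub-item should answer.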
### Longer-term issues

* Data correctness and consistency over longer time periods (weeks to months). Compare data against long-ago (3 months?) API data for changes in trends.
* Long-term trends and correlations between time series. Currently, checks only look at a data window of a few days
* Do any relevant anomaly detection packages already exist?
* Raise errors when one p-value (per geo region, e.g.) is significant OR when a bunch of p-values for the same type of test (across different geo regions, e.g.) are "close" to significant
* Correct p-values for multiple testing (see the sketch below)
    * Bonferroni would be easy but is sensitive to the choice of "family" of tests; Benjamini-Hochberg is a bit more involved but is less sensitive to the choice of "family"; [comparison of the two](https://delphi-org.slack.com/archives/D01A9KNTPKL/p1603294915000500)
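A sketch of the Benjamini-Hochberg step for a single family of tests (e.g. the same check applied across all geo regions); statsmodels' `multipletests` with `method="fdr_bh"` provides this off the shelf, but the procedure is short enough to show directly:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)

    # Find the largest k with p_(k) <= (k / m) * alpha, then reject
    # the k smallest p-values (BH step-up procedure).
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject
```

This also gives a home to the "several p-values close to significant" idea above: BH can reject a block of moderately small p-values together even when none of them would clear a Bonferroni cutoff of alpha / m.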
|
60 |
| -* Nicer formatting for error “report”. |
61 |
| - * E.g. if a single type of error is raised for many different datasets, summarize all error messages into a single message? But it still has to be clear how to suppress each individually |
62 |
| -* Easier suppression of many errors at once |
63 |
| -* Use known erroneous/anomalous days of source data to tune static thresholds and test behavior |
64 |
| -* Ensure validator runs on signals that require AWS credentials (iterate) |
65 |
| -* Check if [errors raised from validating all signals](https://docs.google.com/spreadsheets/d/1_aRBDrNeaI-3ZwuvkRNSZuZ2wfHJk6Bxj35Ol_XZ9yQ/edit#gid=1226266834) are correct, not false positives, not overly verbose or repetitive |
66 |
| -* If can't get data from API, do we want to use substitute data for the comparative checks instead? |
67 |
| - * E.g. most recent successful API pull -- might end up being a couple weeks older |
68 |
| - * Currently, any API fetch problems just doesn't do comparative checks at all. |
69 |
| -* Check for erratic data sources that wrongly report all zeroes |
70 |
| - * E.g. the error with the Wisconsin data for the 10/26 forecasts |
71 |
| - * Wary of a purely static check for this |
72 |
| - * Are there any geo regions where this might cause false positives? E.g. small counties or MSAs, certain signals (deaths, since it's << cases) |
73 |
| - * This test is partially captured by checking avgs in source vs reference data, unless erroneous zeroes continue for more than a week |
74 |
| - * Also partially captured by outlier checking. If zeroes aren't outliers, then it's hard to say that they're erroneous at all. |
|
0 commit comments