|
44 | 44 |
|
45 | 45 | ### Larger issues
|
46 | 46 |
|
47 |
| -* Check if [errors raised from validating all signals](https://docs.google.com/spreadsheets/d/1_aRBDrNeaI-3ZwuvkRNSZuZ2wfHJk6Bxj35Ol_XZ9yQ/edit#gid=1226266834) are correct, not false positives, not overly verbose or repetitive |
| 47 | +* Improve errors and error reports |
| 48 | + * Check if [errors raised from validating all signals](https://docs.google.com/spreadsheets/d/1_aRBDrNeaI-3ZwuvkRNSZuZ2wfHJk6Bxj35Ol_XZ9yQ/edit#gid=1226266834) are correct, not false positives, not overly verbose or repetitive |
| 49 | + * Easier suppression of many errors at once |
| 50 | + * Maybe store errors as dict of dicts. Keys could be check strings (e.g. "check_bad_se"), then next layer geo type, etc |
| 51 | + * Nicer formatting for error “report”. |
| 52 | + * E.g. if a single type of error is raised for many different datasets, summarize all error messages into a single message? But it still has to be clear how to suppress each individually |
48 | 53 | * Check for erratic data sources that wrongly report all zeroes
|
49 | 54 | * E.g. the error with the Wisconsin data for the 10/26 forecasts
|
50 | 55 | * Wary of a purely static check for this
|
51 | 56 | * Are there any geo regions where this might cause false positives? E.g. small counties or MSAs, certain signals (deaths, since it's << cases)
|
52 | 57 | * This test is partially captured by checking avgs in source vs reference data, unless erroneous zeroes continue for more than a week
|
53 | 58 | * Also partially captured by outlier checking. If zeroes aren't outliers, then it's hard to say that they're erroneous at all.
|
54 |
| -* Easier suppression of many errors at once |
55 |
| - * Maybe store errors as dict of dicts. Keys could be check strings (e.g. "check_bad_se"), then next layer geo type, etc |
56 |
| -* Nicer formatting for error “report”. |
57 |
| - * E.g. if a single type of error is raised for many different datasets, summarize all error messages into a single message? But it still has to be clear how to suppress each individually |
58 | 59 | * Use known erroneous/anomalous days of source data to tune static thresholds and test behavior
|
59 | 60 | * If can't get data from API, do we want to use substitute data for the comparative checks instead?
|
60 | 61 | * E.g. most recent successful API pull -- might end up being a couple weeks older
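The "dict of dicts" idea for storing errors (check string first, then geo type) could be sketched roughly as follows. This is only an illustration: `ErrorStore`, its methods, and the signal names are hypothetical, and only the `"check_bad_se"` key comes from the notes above. It shows how a nested layout would allow both suppressing many errors at once and summarizing a repeated error into one report line.

```python
from collections import defaultdict


class ErrorStore:
    """Hypothetical container: check name -> geo type -> list of (signal, message)."""

    def __init__(self):
        self.errors = defaultdict(lambda: defaultdict(list))

    def add(self, check, geo_type, signal, message):
        self.errors[check][geo_type].append((signal, message))

    def suppress(self, check, geo_type=None):
        """Drop all errors for a check, or only those for one geo type.

        Returns the number of errors removed.
        """
        if check not in self.errors:
            return 0
        if geo_type is None:
            removed = sum(len(v) for v in self.errors[check].values())
            del self.errors[check]
            return removed
        removed = len(self.errors[check].pop(geo_type, []))
        if not self.errors[check]:
            del self.errors[check]
        return removed

    def summary(self):
        """One line per (check, geo type) instead of one line per error."""
        lines = []
        for check in sorted(self.errors):
            for geo_type in sorted(self.errors[check]):
                entries = self.errors[check][geo_type]
                signals = sorted({signal for signal, _ in entries})
                lines.append(
                    f"{check} / {geo_type}: {len(entries)} error(s) in {', '.join(signals)}"
                )
        return lines


store = ErrorStore()
store.add("check_bad_se", "county", "sig_a", "se is negative")
store.add("check_bad_se", "county", "sig_b", "se is negative")
store.add("check_bad_se", "state", "sig_a", "se is negative")
report = store.summary()
```

Because each summary line names the check and geo type, it doubles as the key a user would pass to `suppress`, which is one way to keep individual suppression clear even when messages are merged.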
|
|
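For the all-zeroes item, one non-static variant would compare against reference data rather than flag zeroes outright. The sketch below is an assumption-laden illustration, not the package's method: the function name, the dict-of-lists input shape, and the `min_reference_mean` threshold are all hypothetical. A geo region is flagged only when its recent source values are all zero while its reference mean is high enough that zeroes are implausible, which is meant to avoid false positives for small counties or low-count signals such as deaths.

```python
def flag_suspicious_zeros(source, reference, min_reference_mean=5.0):
    """Return sorted geo ids whose source values are all zero but whose
    reference values average at least `min_reference_mean`.

    `source` and `reference` map geo id -> list of numeric values
    (hypothetical input shape; the threshold is an untuned placeholder).
    """
    flagged = []
    for geo_id, values in source.items():
        ref = reference.get(geo_id, [])
        if not values or not ref:
            # Nothing to compare against; leave this geo to other checks.
            continue
        all_zero = all(v == 0 for v in values)
        ref_mean = sum(ref) / len(ref)
        if all_zero and ref_mean >= min_reference_mean:
            flagged.append(geo_id)
    return sorted(flagged)


source = {"big_state": [0, 0, 0], "small_county": [0, 0, 0], "normal": [3, 4, 0]}
reference = {"big_state": [100, 120, 90], "small_county": [0, 1, 0], "normal": [3, 3, 3]}
flagged = flag_suspicious_zeros(source, reference)
```

The threshold would need tuning against known erroneous days of source data, as the notes suggest.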