
Commit e2c86a3

update plans, small clean-up items
1 parent af05d99 commit e2c86a3

3 files changed, 5 additions and 2 deletions

validator/PLANS.md

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,9 @@
 * Are there any geo regions where this might cause false positives? E.g. small counties or MSAs, certain signals (deaths, since it's << cases)
 * This test is partially captured by checking avgs in source vs reference data, unless erroneous zeroes continue for more than a week
 * Also partially captured by outlier checking. If zeroes aren't outliers, then it's hard to say that they're erroneous at all.
+* Outlier detection (in progress)
+* Current approach is tuned to daily cases and daily deaths; use just on those signals?
+* prophet (package) detection is flexible, but needs 2-3 months historical data to fit on. May make sense to use if other statistical checks also need that much data.
 * Use known erroneous/anomalous days of source data to tune static thresholds and test behavior
 * If can't get data from API, do we want to use substitute data for the comparative checks instead?
 * E.g. most recent successful API pull -- might end up being a couple weeks older
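
The plan notes above do not say how the in-progress outlier check works. As a rough illustration only, here is a minimal sketch of one common robust approach: compare each day to the median of the preceding week and scale by the median absolute deviation. The function name `flag_outliers`, the 7-day window, the threshold of 3, and the spread floor of 1.0 are all assumptions made for this example and are not taken from the validator.

```python
import pandas as pd


def flag_outliers(values, window=7, threshold=3.0):
    """Flag points far from the median of the preceding `window` days.

    Hypothetical helper for illustration; not the validator's method.
    """
    series = pd.Series(values, dtype="float64")
    # Shift by one day so the point under test is excluded from its own baseline.
    prior = series.shift(1)
    baseline = prior.rolling(window, min_periods=window).median()
    # Median absolute deviation of the prior window: a robust spread estimate.
    mad = prior.rolling(window, min_periods=window).apply(
        lambda w: (w - w.median()).abs().median(), raw=False
    )
    # Floor the spread (assumed value 1.0) so flat series do not divide by zero.
    score = (series - baseline).abs() / mad.clip(lower=1.0)
    # Days without a full prior window have a NaN score and are not flagged.
    return score > threshold


daily_counts = [100, 102, 98, 101, 99, 100, 103, 0]
print(flag_outliers(daily_counts).tolist())
```

In this sketch a sudden zero after a stable week is flagged, which matches the note that erroneous zeroes are only partially caught by outlier checks: a zero in a series that is usually near zero (for example deaths in a small county) would not stand out.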

validator/delphi_validator/datafetcher.py

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@
 import re
 from os import listdir
 from os.path import isfile, join
-from datetime import datetime
 from itertools import product
 import pandas as pd
 import numpy as np

validator/delphi_validator/validate.py

Lines changed: 2 additions & 1 deletion
@@ -248,7 +248,8 @@ def check_df_format(self, df_to_test, nameformat):
 
     def check_bad_geo_id_value(self, df_to_test, filename, geo_type):
         """
-        Check for bad geo_id values, by comparing to a list of known values (drawn from historical data)
+        Check for bad geo_id values, by comparing to a list of known values (drawn from
+        historical data)
 
         Arguments:
         - df_to_test: pandas dataframe of CSV source data containing the geo_id column to check
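
The hunk shows only the docstring of `check_bad_geo_id_value`, not its body. The comparison it describes could look roughly like the sketch below, which uses a set difference between the observed `geo_id` values and a known list. The helper name `find_unexpected_geo_ids` and the `known_geo_ids` argument are hypothetical and stand in for however the validator actually loads its historical values.

```python
import pandas as pd


def find_unexpected_geo_ids(df_to_test, known_geo_ids):
    """Return the sorted geo_id values in df_to_test that are not in known_geo_ids.

    Hypothetical helper for illustration; not the validator's implementation.
    """
    # Compare as strings so IDs with leading zeros are not mangled by numeric parsing.
    observed = set(df_to_test["geo_id"].astype(str))
    return sorted(observed - set(known_geo_ids))


df = pd.DataFrame({"geo_id": ["42003", "42003", "99999"]})
print(find_unexpected_geo_ids(df, {"42003", "06037"}))
```

Returning the unexpected values, rather than a bare pass/fail, lets the caller report exactly which IDs were not seen in the historical data.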
