Port the Facebook validation pipeline to be generic and automatable #59
Comments
I've begun a validator branch that will automate validating new data for one data source. Only a very minimal skeleton is present; I have not ported the validation code, nor have I even run the skeleton. My plan is to have a script that takes two arguments: the name of the data source (e.g. "fb-survey") and the date we want to validate. The script will log any errors in a concise output format and then exit with nonzero status if there were any failures, to make it easy to plug into an automation pipeline. |
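For illustration only, here is a minimal sketch of the command-line shape described above: two positional arguments, one concise log line per error, and a nonzero return code on failure. This is not the code on the validator branch. The names `validate` and `main` and the single future-date check are hypothetical placeholders, and the YYYY-MM-DD date format is an assumption.

```python
import argparse
import logging
from datetime import date


def validate(source: str, day: date) -> list[str]:
    """Hypothetical placeholder: return one message per failed check."""
    errors = []
    # Stand-in check only; the real checks would be ported from the existing pipeline.
    if day > date.today():
        errors.append(f"{source} {day.isoformat()}: date is in the future")
    return errors


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        description="Validate one day of data for one data source."
    )
    parser.add_argument("source", help='data source name, e.g. "fb-survey"')
    parser.add_argument(
        "date",
        type=date.fromisoformat,  # assumed format: YYYY-MM-DD
        help="date to validate",
    )
    args = parser.parse_args(argv)

    # One short line per problem keeps the output easy to scan in job logs.
    logging.basicConfig(format="%(levelname)s %(message)s", level=logging.INFO)
    errors = validate(args.source, args.date)
    for message in errors:
        logging.error(message)

    # Return value is meant to be used as the process exit status.
    return 1 if errors else 0
```

As a script entry point this would be wired up with `sys.exit(main())`, so that an automation job can branch on the exit status alone.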
Complications with the architecture: validation needs to run before the CSV files are written so that it has access to the day and other metadata. There are also complications with checking against the real data. |
Candidate is up in #155 |
Whack-a-mole to get the thing to run, using usafacts output as a test case.
|
Runs! Now checking each criterion against the reference codebase, and building unit tests. Developing a better procedure for reporting data quality issues: label "data quality"; issue template PR incoming. |
Finished handling for known anomalies. Cleaning up remaining TODOs, expect to flag for review end of week. |
Currently, covid-19/facebook/prepare-extracts/covidalert-io-funs.R contains a validation pipeline for Facebook. As I understand it, it does the following checks when we prepare a new day of data to upload.
Many of these checks can be made generic to multiple data sources and applied to our new pipeline. This would require
all data source as a part of its automation job