-
Notifications
You must be signed in to change notification settings - Fork 16
Fix linting on CHC #463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix linting on CHC #463
Conversation
logging.debug(f"wrote files to {outpath}") | ||
''' | ||
params = read_params() | ||
|
||
arch_diff = S3ArchiveDiffer( | ||
params["cache_dir"], | ||
params["export_dir"], | ||
params["bucket_name"], "chc", | ||
params["aws_credentials"]) | ||
arch_diff.update_cache() | ||
|
||
_, common_diffs, new_files = arch_diff.diff_exports() | ||
|
||
# Archive changed and new files only | ||
to_archive = [f for f, diff in common_diffs.items() if diff is not None] | ||
to_archive += new_files | ||
_, fails = arch_diff.archive_exports(to_archive) | ||
print(fails) | ||
|
||
# Filter existing exports to exclude those that failed to archive | ||
succ_common_diffs = {f: diff for f, diff in common_diffs.items() if f not in fails} | ||
arch_diff.filter_exports(succ_common_diffs) | ||
|
||
# Report failures: someone should probably look at them | ||
for exported_file in fails: | ||
print(f"Failed to archive '{exported_file}'") | ||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if these were still needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how exactly we use S3 and archiving. @krivard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each day when the indicator runs, it produces some output. Much of that output may match what's already in the API. To conserve disk space and memory, we want to exclude those matching rows from the new issue to be uploaded to the API. The easiest way to do this is to compare the daily output with a set of CSV files matching the current (most recent issue for each day of data) contents of the API. These files are kept in a cache in a versioned S3 bucket. After the day's output is produced, the archive differ updates the cached files, and edits the files in receiving to only include lines with updated or new data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: Probably best not to remove those lines until we have a fix to automation ready to run the archive differ as a subsequent process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear these are currently commented out with the triple single quotes, is it worth leaving them in as reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right! Change is new, so I suppose it makes sense that it's not already running the archive differ.
In that case let's begin as we mean to go on, and drop them.
Summary of changes:
Note that there are still linting issues after this PR that will be closed by #431 and #460