Skip to content

Fix linting on CHC #463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 10, 2020
Merged

Fix linting on CHC #463

merged 1 commit into from
Nov 10, 2020

Conversation

chinandrew
Copy link
Contributor

Summary of changes:

  • Fix linting issues in CHC. Mainly long lines and formatting log strings

Note that there are still linting issues after this PR that will be closed by #431 and #460

Comment on lines -238 to -264
logging.debug(f"wrote files to {outpath}")
'''
params = read_params()

arch_diff = S3ArchiveDiffer(
params["cache_dir"],
params["export_dir"],
params["bucket_name"], "chc",
params["aws_credentials"])
arch_diff.update_cache()

_, common_diffs, new_files = arch_diff.diff_exports()

# Archive changed and new files only
to_archive = [f for f, diff in common_diffs.items() if diff is not None]
to_archive += new_files
_, fails = arch_diff.archive_exports(to_archive)
print(fails)

# Filter existing exports to exclude those that failed to archive
succ_common_diffs = {f: diff for f, diff in common_diffs.items() if f not in fails}
arch_diff.filter_exports(succ_common_diffs)

# Report failures: someone should probably look at them
for exported_file in fails:
print(f"Failed to archive '{exported_file}'")
'''
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if these were still needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how exactly we use S3 and archiving. @krivard?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each day when the indicator runs, it produces some output. Much of that output may match what's already in the API. To conserve disk space and memory, we want to exclude those matching rows from the new issue to be uploaded to the API. The easiest way to do this is to compare the daily output with a set of CSV files matching the current (most recent issue for each day of data) contents of the API. These files are kept in a cache in a versioned S3 bucket. After the day's output is produced, the archive differ updates the cached files, and edits the files in receiving to only include lines with updated or new data.

Copy link
Contributor

@krivard krivard Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: Probably best not to remove those lines until we have a fix to automation ready to run the archive differ as a subsequent process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear these are currently commented out with the triple single quotes, is it worth leaving them in as reference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right! Change is new, so I suppose it makes sense that it's not already running the archive differ.

In that case let's begin as we mean to go on, and drop them.

@chinandrew chinandrew mentioned this pull request Nov 9, 2020
7 tasks
@krivard krivard merged commit bcf412d into main Nov 10, 2020
@krivard krivard deleted the change-lint branch November 10, 2020 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants