CHNG should compute source filename instead of using delivery time #600

Closed
@krivard

Description

From #595 (review), regarding the brittleness of using delivery time to identify which files to download:

If we run this script at 4:01 and they had posted the files already at 3, then the script will execute properly.

If they consistently post the files late, then I'm pretty sure the code will throw some sort of "File not found" error. It uses dropdate_dt to determine the filenames, so it will expect there to be files from yesterday, but the code can only download files from two days ago.

If CHNG posts the files at 5pm one day and at 4pm the next, and we run the code between 4pm and 5pm, the code will throw an assertion error because it found too many files dropped within the last 24 hours.

This code followed the outline of the HSP download scheme, but maybe a simpler solution is to compute the expected filename beforehand and check if it's been uploaded to their server yet. This way the errors would be more intelligible. For now, if we run the script after 8, everything should work properly.

This issue tracks work to convert the data fetcher to use the filename, not the delivery time, to determine which files to download.
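A minimal sketch of the filename-based approach, to make the proposal concrete. The filename template, the file types, and the function names below are all hypothetical, since the actual CHNG naming scheme is not stated in this issue; `remote_listing` stands in for whatever list of names the server returns.

```python
from datetime import date

# Hypothetical naming scheme; the real CHNG filenames are not given in this issue.
FILENAME_TEMPLATE = "{dropdate:%Y%m%d}_{filetype}.dat.gz"
FILE_TYPES = ("counts", "denominators")  # assumed file types, for illustration only


def expected_filenames(dropdate, file_types=FILE_TYPES):
    """Build the filenames we expect for a given drop date."""
    return [
        FILENAME_TEMPLATE.format(dropdate=dropdate, filetype=ft)
        for ft in file_types
    ]


def find_missing(dropdate, remote_listing):
    """Return the expected filenames that are not yet in the server listing."""
    available = set(remote_listing)
    return [name for name in expected_filenames(dropdate) if name not in available]


def files_to_download(dropdate, remote_listing):
    """Return the files to fetch, or fail with a message naming what is missing."""
    missing = find_missing(dropdate, remote_listing)
    if missing:
        raise FileNotFoundError(
            f"Files for drop date {dropdate:%Y-%m-%d} not yet on server: {missing}"
        )
    return expected_filenames(dropdate)
```

Because the expected names depend only on the drop date, the result no longer changes with the time of day at which the script runs, and a late delivery surfaces as an error that names the missing files rather than as an assertion about file counts.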

We should be careful to explicitly consider the case of a late delivery (i.e., where the file has not yet been delivered at the time the indicator is run). Do we want the indicator to skip that issue, and never process the file even if it is posted before the next day's run? Or do we want to automatically detect that yesterday's missed file is now available, and notify someone to produce and load the missed issue manually?
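One way the second option could look, as a rough sketch rather than a decision on the policy question. It assumes we keep a record of which drop dates have already been processed; the filename format, the `lookback_days` window, and all names here are hypothetical.

```python
from datetime import date, timedelta


def expected_filename(dropdate):
    # Hypothetical naming scheme; the real CHNG filenames are not given in this issue.
    return f"{dropdate:%Y%m%d}_drop.dat.gz"


def classify_drops(today, remote_listing, processed, lookback_days=3):
    """Sort recent unprocessed drop dates by whether their file is on the server.

    today          -- date of the current run
    remote_listing -- iterable of filenames currently on the server
    processed      -- set of drop dates already handled (assumed to be tracked somewhere)
    """
    available = set(remote_listing)
    result = {"run": [], "backfill": [], "missing": []}
    for offset in range(lookback_days + 1):
        dropdate = today - timedelta(days=offset)
        if dropdate in processed:
            continue
        if expected_filename(dropdate) not in available:
            result["missing"].append(dropdate)   # not delivered (yet)
        elif dropdate == today:
            result["run"].append(dropdate)       # normal case for this run
        else:
            result["backfill"].append(dropdate)  # late arrival: notify someone
    return result
```

Dates in `backfill` are the ones where a missed file has since appeared, so that list is what a notification would be built from; dates in `missing` have still not been delivered.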
