
2085 add proportions nhsn #2111

Merged: 25 commits merged into main on Feb 11, 2025

Conversation

aysim319
Contributor

Description

address #2085

Changelog

added new function that checks the metadata for the last update
-> the function also checks for 503 errors
-> added signals for total reporting hospitals

Associated Issue(s)

Contributor

@nmdefries nmdefries left a comment


Are you planning on adding the new RSV signals in this PR or in a separate one?

Comment on lines 29 to 31
updated_timestamp = datetime.utcfromtimestamp(int(response["rowsUpdatedAt"]))
now = datetime.utcnow()
recently_updated = (now - updated_timestamp) < timedelta(days=1)
Contributor


issue: I think this "recently-updated" logic is sufficient but not robust. For example, if we fail to pull data for multiple days, then on the next run we would not pull data we had never seen before if it wasn't posted within the last day.

The more robust solution would be to save last pull's updated_timestamp to a local file. We would then load that and compare updated_timestamp to that -- if exactly equal, skip update; if unequal, pull data.
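(A minimal sketch of what that could look like; the state-file location and helper names here are hypothetical, not part of this PR.)

from datetime import datetime
from pathlib import Path

# Hypothetical single-line state file holding the timestamp of the last successful pull.
STATE_FILE = Path("nhsn_last_updated.txt")

def should_pull(updated_timestamp: datetime, state_file: Path = STATE_FILE) -> bool:
    """Return True if the source's rowsUpdatedAt differs from the one saved on the last pull."""
    previous = state_file.read_text().strip() if state_file.exists() else None
    return updated_timestamp.isoformat() != previous

def record_pull(updated_timestamp: datetime, state_file: Path = STATE_FILE) -> None:
    """Overwrite the single-line state file with the timestamp of the update we just pulled."""
    state_file.write_text(updated_timestamp.isoformat())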

Contributor Author

@aysim319 aysim319 Feb 4, 2025


Definitely makes sense and something I didn't think about! The only thing I did differently was use the API instead of scanning the files, since I imagine the file list is going to grow and it doesn't make much sense to scan the file list every day.

Contributor


Yeah, checking the API could make sense, too. The one thing I'd caution is timezones -- your previous approach explicitly used UTC on both "old" and "now" timestamps, but I don't know what the API uses.

Second, the API only has dates, not times. Would that ever cause problems? E.g., if we want to check for updates multiple times a day.

Contributor Author

@aysim319 aysim319 Feb 5, 2025


Yeah, checking the API could make sense, too. The one thing I'd caution is timezones -- your previous approach explicitly used UTC on both "old" and "now" timestamps, but I don't know what the API uses.

Since the data and the dates are just dates and not datetimes, I didn't take timezones into account... hmm, I also don't know for sure which timezone; I believe it's EST, but I have to double check.

the API only has dates, not times. Would that ever cause problems? E.g. we want to check for updates multiple times a day.

Since this is data that generally updates weekly, I was planning on just running once a day, so I thought timezones wouldn't be as much of an issue.

Contributor

@nmdefries nmdefries Feb 5, 2025


Okay, given these complications, I'm thinking reading/writing to a file is easier. We wouldn't need to keep a complete list of all update datetimes ever, just the single most recent datetime. So the file wouldn't keep getting bigger and bigger, we could just read a single line.

This lets us store a UTC date (no timezones to worry about), there's no API date-processing to worry about, and we can store a full datetime to be extra precise.

Contributor Author

@aysim319 aysim319 Feb 6, 2025


I wasn't a fan of having metadata files; it seemed like overkill and introduced more complexity than I would like. So after talking things through with Nolan just now, I decided to simplify the logic and create backups daily, but still do a simple check for whether the data was recently updated before continuing processing and creating the CSV files. That way, if there are outages after the initial pulls, we can go back and do patches for them.

Nolan also mentioned that in the future we could look into creating a generic tool/script specifically to dedup things, and I like that direction since it would separate the complexity out of this codebase.

@aysim319
Contributor Author

aysim319 commented Feb 4, 2025

Are you planning on adding the new RSV signals in this PR or in a separate one?

I was originally planning for a separate one. I was considering adding the new signal in this PR, but when I tried to add just the new columns and ran the tests locally, I ran across issues and it looks like it might be more involved, so I think it'd be better to create a separate one. I also kinda shoved other issues (retry and daily checking) into this PR and didn't want to add more things on top.

@aysim319 aysim319 requested a review from nmdefries February 5, 2025 15:57
@aysim319 aysim319 requested a review from nolangormley February 5, 2025 20:19
Contributor

@nolangormley nolangormley left a comment


Some small questions

df = df.astype(TYPE_DICT)
try:
    df = df.astype(TYPE_DICT)
except KeyError:
Contributor


Why does this just pass?

Contributor Author

@aysim319 aysim319 Feb 7, 2025


The idea was that some of the older data in the source backups doesn't have the newer signals (rsv, reporting hospitals); in that case it would log, but resume the patching process.

Previously I tried modifying TYPE_DICT, but it being mutable caused some issues in patching runs. So this was the next solution... is it a good one? Ehhh... I log that the signal is unavailable earlier (line 150) and I thought I shouldn't log basically the same message twice.

Since you brought it up... I should also check whether the rest of the columns actually changed data types and maybe look into a less janky way.
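(For reference, one possible less janky variant; this is only a sketch that assumes TYPE_DICT maps column names to dtypes and logger is the indicator's logger, and the helper name is made up.)

import pandas as pd

def cast_available_columns(df: pd.DataFrame, type_dict: dict, logger) -> pd.DataFrame:
    """Cast only the columns present in this backup and log the ones that are missing."""
    missing = [col for col in type_dict if col not in df.columns]
    if missing:
        logger.info(f"signals unavailable in this backup: {', '.join(missing)}")
    available = {col: dtype for col, dtype in type_dict.items() if col in df.columns}
    return df.astype(available)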

Contributor

@nolangormley nolangormley left a comment


LGTM!

Contributor

@nmdefries nmdefries left a comment


A couple cleanup requests -- deduping pull_nhsn... and pull_preliminary... is the main one.

    if not custom_run
    else pull_data_from_file(backup_dir, issue_date, logger, prelim_flag=False)
)

keep_columns = list(TYPE_DICT.keys())
recently_updated = True if custom_run else check_last_updated(socrata_token, MAIN_DATASET_ID, logger)
Contributor


suggestion: if we put this before the pull_data logic, we could avoid fetching from the source API in most cases (since this is, or will be, running every day but the data only updates once a week).
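(A rough sketch of the suggested ordering; check_last_updated is the function added in this PR, while fetch_dataset below is a hypothetical stand-in for the source-API pull, not the module's actual function.)

def pull_if_fresh(socrata_token, dataset_id, logger, custom_run=False):
    # Check the Socrata metadata first; only fetch the full dataset when something
    # new has been posted. Custom/patch runs bypass the check and read from backups.
    recently_updated = True if custom_run else check_last_updated(socrata_token, dataset_id, logger)
    if not recently_updated:
        logger.info("No recent update; skipping source API fetch")
        return None
    return fetch_dataset(socrata_token, dataset_id)  # hypothetical fetch helper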

Contributor Author


That was the original thought, but previously you brought up what happens if there are multiple failures in a row, and the solution for now was to squirrel the data away regardless, but at least avoid duplicating both the raw and the processed data.

@@ -144,24 +210,31 @@ def pull_preliminary_nhsn_data(
    pd.DataFrame
        Dataframe as described above.
    """
    # Pull data from Socrata API
Contributor


issue: pull_preliminary_nhsn_data and pull_nhsn_data are really similar. I think it will become a maintenance issue to keep both. We should probably keep these two functions as wrappers of a shared fn that takes an is_prelim flag (or similar).

Diff of the two fns:

[screenshot: Screen Shot 2025-02-11 at 15 20 55]
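(A hedged sketch of the wrapper idea; _pull_nhsn_data_impl and the placeholder constants below are hypothetical, not the module's actual code.)

MAIN_DATASET_ID = "main-dataset-id"        # placeholder
PRELIM_DATASET_ID = "prelim-dataset-id"    # placeholder

def _pull_nhsn_data_impl(socrata_token, backup_dir, custom_run, issue_date, logger, prelim_flag=False):
    """Shared pull logic; the dataset id, type dict, etc. are chosen based on the flag."""
    dataset_id = PRELIM_DATASET_ID if prelim_flag else MAIN_DATASET_ID
    # ... shared fetch / backup / type-casting logic would go here ...
    return dataset_id  # stand-in return value for the sketch

def pull_nhsn_data(socrata_token, backup_dir, custom_run, issue_date, logger):
    return _pull_nhsn_data_impl(socrata_token, backup_dir, custom_run, issue_date, logger, prelim_flag=False)

def pull_preliminary_nhsn_data(socrata_token, backup_dir, custom_run, issue_date, logger):
    return _pull_nhsn_data_impl(socrata_token, backup_dir, custom_run, issue_date, logger, prelim_flag=True)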

Contributor Author

@aysim319 aysim319 Feb 11, 2025


I know they're similar; I thought about it and went back and forth, but I figured maybe in the future something different would be going on in each, so I kept them separate. I'm not too concerned about this, since we'll be slowly deprecating this codebase.

@nmdefries nmdefries self-requested a review February 11, 2025 21:35
@aysim319 aysim319 merged commit f543fe9 into main Feb 11, 2025
17 checks passed
@aysim319 aysim319 deleted the 2085-add-proportions-nhsn branch February 14, 2025 15:50
Development

Successfully merging this pull request may close these issues.

add retry for 50x errors for the Socrata API and also throw an error when that happens