-
Notifications
You must be signed in to change notification settings - Fork 16
Fix nchs missing value #535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, except 1 possible issue to clarify.
But otherwise tests pass for me too!
df = df.groupby( | ||
["state", "timestamp"]).sum().reindex(index_df).reset_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this reindex step was used to fill in missing values by state and time and removing it makes sense given the discussion.
I am not super familiar with the actual NCHS data, but were there cases of duplicated (state, timestamp) where the sum aggregation should still be retained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right! Let me fix this. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After further discussion, looks good to me!
Description
bug fix. Do not assign 0 to missing values. The nchs mortality weekly table reports incidence. The existence of a missing value means the data does not meet the privacy threshold.
Changelog
Fixes