Fix nchs timestamp #529

Merged 3 commits on Nov 16, 2020

21 changes: 9 additions & 12 deletions nchs_mortality/DETAILS.md
@@ -10,20 +10,20 @@ consistency how NCHS reports the data, please refer to [Exceptions](#Exceptions)
 * `state`: reported using two-letter postal code

 ## Metrics, Level 1 (`m1`)
-* `covid_deaths`: All Deaths with confirmed or presumed COVID-19,
+* `deaths_covid_incidence`: All Deaths with confirmed or presumed COVID-19,
 coded to ICD–10 code U07.1
-* `total_deaths`: Deaths from all causes.
-* `percent_of_expected_deaths`: the number of deaths for all causes for this
+* `deaths_allcause_incidence`: Deaths from all causes.
+* `deaths_percent_of_expected`: the number of deaths for all causes for this
 week in 2020 compared to the average number
 across the same week in 2017–2019.
-* `pneumonia_deaths`: Counts of deaths involving Pneumonia, with or without
+* `deaths_pneumonia_notflu_incidence`: Counts of deaths involving Pneumonia, with or without
 COVID-19, excluding Influenza deaths(J12.0-J18.9).
-* `pneumonia_and_covid_deaths`: Counts of deaths involving COVID-19 and Pneumonia,
+* `deaths_covid_and_pneumonia_notflu_incidence`: Counts of deaths involving COVID-19 and Pneumonia,
 excluding Influenza (U07.1 and J12.0-J18.9).
-* `influenza_deaths`: Counts of deaths involving Influenza, with or without
+* `deaths_flu_incidence`: Counts of deaths involving Influenza, with or without
 COVID-19 or Pneumonia (J09-J11), includes COVID-19 or
 Pneumonia.
-* `pneumonia_influenza_or_covid_19_deaths`: Counts of deaths involving Pneumonia,
+* `deaths_pneumonia_or_flu_or_covid_incidence`: Counts of deaths involving Pneumonia,
 Influenza, or COVID-19, coded to ICD–10
 codes U07.1 or J09–J18.9

@@ -32,7 +32,7 @@ Detailed descriptions are provided in the notes under Table 1 [here](https://www
 ## Metrics, Level 2 (`m2`)
 * `num`: number of new deaths on a given week
 * `prop`: `num` / population * 100,000
-* _**No** `m2` for signal `percent_of_expected_deaths`._
+* _**No** `m2` for signal `deaths_percent_of_expected`_.

## Exceptions

@@ -49,10 +49,7 @@ but we don't consider NYC separately. The death counts for NYC would be included
 ### Report Using Epiweeks

 We report the NCHS Mortality data in a weekly format (`weekly_YYYYWW`, where `YYYYWW`
-refers to an epiweek). However, NCHS reports their weekly data from Saturday to
-Saturday. We assume there is a one day shift. For example, they report a death counts
-for Alaska in a week starting from date D, we will report the timestamp of this report
-as the corresponding epiweek of date(D + 1).
+refers to an epiweek). As defined by CDC, [epiweeks](https://wwwn.cdc.gov/nndss/document/MMWR_Week_overview.pdf) are seven days from Sunday to Saturday. We use Python package [epiweeks](https://pypi.org/project/epiweeks/) to convert the week-ending dates in the raw dataset into epiweek format.

### Data Versioning
Data versions are tracked on both a daily and weekly level.
2 changes: 1 addition & 1 deletion nchs_mortality/delphi_nchs_mortality/export.py
@@ -26,7 +26,7 @@ def export_csv(df, geo_name, sensor, export_dir, start_date):

     for date in df["timestamp"].unique():
         t = Week.fromdate(pd.to_datetime(str(date)))
-        date_short = "weekly_" + str(t.year) + str(t.week + 1).zfill(2)
+        date_short = "weekly_" + str(t.year) + str(t.week).zfill(2)
         export_fn = f"{date_short}_{geo_name}_{sensor}.csv"
         result_df = df[df["timestamp"] == date][["geo_id", "val", "se", "sample_size"]]
         result_df.to_csv(f"{export_dir}/{export_fn}",
4 changes: 2 additions & 2 deletions nchs_mortality/tests/test_export.py
@@ -29,7 +29,7 @@ def test_export(self):
         )

         # check data for 2020-06-02
-        expected_name = f"weekly_202024_state_region_thing.csv"
+        expected_name = f"weekly_202023_state_region_thing.csv"
         assert exists(join("./receiving", expected_name))

         output_data = pd.read_csv(join("./receiving", expected_name))
@@ -40,7 +40,7 @@ def test_export(self):
         assert (output_data.sample_size.values == [100, 500, 80]).all()

         # check data for 2020-06-03
-        expected_name = f"weekly_202025_state_region_thing.csv"
+        expected_name = f"weekly_202024_state_region_thing.csv"
         assert exists(join("./receiving", expected_name))

         output_data = pd.read_csv(join("./receiving", expected_name))
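
The effect of dropping the `+ 1` in `export.py` can be checked by hand. The PR itself relies on `Week.fromdate` from the `epiweeks` package; the sketch below is not that package's implementation. It is a dependency-free approximation of the CDC MMWR rule (weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the calendar year). The helper names `mmwr_year_start`, `epiweek`, and `weekly_label` are hypothetical and exist only for this illustration.

```python
from datetime import date, timedelta
from typing import Tuple


def mmwr_year_start(year: int) -> date:
    """Return the Sunday that begins MMWR week 1 of `year` (hypothetical helper)."""
    jan1 = date(year, 1, 1)
    # date.weekday() uses Monday=0 ... Sunday=6; remap so Sunday=0 ... Saturday=6.
    days_since_sunday = (jan1.weekday() + 1) % 7
    start = jan1 - timedelta(days=days_since_sunday)
    # Week 1 must contain at least four days of the new calendar year;
    # otherwise the week containing Jan 1 belongs to the previous MMWR year.
    if days_since_sunday > 3:
        start += timedelta(weeks=1)
    return start


def epiweek(day: date) -> Tuple[int, int]:
    """Return (MMWR year, MMWR week) for a calendar date."""
    year = day.year
    start = mmwr_year_start(year)
    if day < start:
        # Early-January dates can fall in the last week of the previous year.
        year -= 1
        start = mmwr_year_start(year)
    else:
        next_start = mmwr_year_start(year + 1)
        if day >= next_start:
            # Late-December dates can fall in week 1 of the next year.
            year += 1
            start = next_start
    week = (day - start).days // 7 + 1
    return year, week


def weekly_label(day: date) -> str:
    """Build the `weekly_YYYYWW` prefix without any extra week offset."""
    year, week = epiweek(day)
    return "weekly_" + str(year) + str(week).zfill(2)


print(weekly_label(date(2020, 6, 2)))  # weekly_202023
```

Under this rule, 2020-06-02 falls in epiweek 23 of 2020, which agrees with the updated expectation `weekly_202023` in `test_export.py`. Adding one to the week number, as the removed line did, would have produced `weekly_202024` for the same date.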