First pass of the CDC Vaccination Indicator #1238

Closed
136 commits
89410dc
First pass of the CDC Indicator
Ananya-Joshi Sep 8, 2021
4799139
added explicit dictionary creation
Ananya-Joshi Sep 9, 2021
5d92ed4
added os import
Ananya-Joshi Sep 9, 2021
7836d23
Minor changes for the linter - tests pass locally
Ananya-Joshi Sep 9, 2021
52e04c2
Update cdc_vaccines/delphi_cdc_vaccines/__main__.py
Ananya-Joshi Sep 10, 2021
16c1050
Update cdc_vaccines/delphi_cdc_vaccines/constants.py
Ananya-Joshi Sep 10, 2021
350f91c
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
9b102db
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
f08c9b1
Update cdc_vaccines/params.json.template
Ananya-Joshi Sep 10, 2021
2a0dcae
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
e1187f3
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 10, 2021
04fbc1d
Update cdc_vaccines/delphi_cdc_vaccines/pull.py
Ananya-Joshi Sep 10, 2021
ba11d3c
minor changes
Ananya-Joshi Sep 10, 2021
8f7b814
Merge branch 'indicator_cdc_vaccines' of https://github.com/cmu-delph…
Ananya-Joshi Sep 10, 2021
ff808a4
changes to the json file
Ananya-Joshi Sep 10, 2021
675106c
changed the signal name generation
Ananya-Joshi Sep 11, 2021
0804d12
committed constants
Ananya-Joshi Sep 11, 2021
8963748
Update cdc_vaccines/README.md
Ananya-Joshi Sep 13, 2021
b2769e6
Modified run.py to have the right NaN codes
Ananya-Joshi Sep 17, 2021
b5f82b7
Merge branch 'indicator_cdc_vaccines' of https://github.com/cmu-delph…
Ananya-Joshi Sep 19, 2021
d0349a6
Added appropriate NaN codes
Ananya-Joshi Sep 19, 2021
e9b4a6a
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 21, 2021
652664a
added back appropriate nan codes
Ananya-Joshi Sep 21, 2021
3401d71
changes to run.py
Ananya-Joshi Sep 21, 2021
58ee0e2
Update utilities for NAN codes:
dshemetov Feb 10, 2021
e035a21
Nans: update archiver deletion handling
dshemetov May 12, 2021
292084b
Nans: update archiver deletion
dshemetov May 13, 2021
8c9f41f
Nancodes archiver: rename variable for clarity
dshemetov Aug 31, 2021
738b201
Nancodes archiver: small formatting change
dshemetov Aug 31, 2021
f67925b
Nancodes: make linter happy
dshemetov Sep 15, 2021
661fab9
create dockerfile and dockerignore to host survey pipeline
nmdefries Jul 9, 2021
3043cb4
force Rcpp update and install utils via pip
nmdefries Jul 9, 2021
46e2d47
install wheel dependency
nmdefries Jul 9, 2021
1365da1
move docker files to facebook dir
nmdefries Jul 9, 2021
55ed232
add more package requirements
nmdefries Jul 12, 2021
583c5e1
remove dockerignore
nmdefries Jul 15, 2021
11186b9
specify readr version
nmdefries Sep 2, 2021
21b908d
- Adds BH dependency
korlaxxalrok Sep 8, 2021
6e783c4
Add ssmtp config
korlaxxalrok Sep 9, 2021
2f33968
Change geo_sig call
qx-teo Aug 4, 2021
0627683
Changing to using requests module
qx-teo Aug 4, 2021
a8ef938
Fix imports
qx-teo Aug 9, 2021
7e8dff6
Fix geo_combos
qx-teo Aug 9, 2021
a0366c4
Include indicator name
qx-teo Aug 10, 2021
8fbf23b
Improve clarity
qx-teo Aug 31, 2021
80c67d8
Fix syntax
qx-teo Aug 31, 2021
db508d2
Update README
qx-teo Aug 20, 2021
d8c0ec5
Update PLANS.md
qx-teo Aug 20, 2021
4d9d214
remove more addressed issues
nmdefries Aug 31, 2021
a1b443b
Update _delphi_utils_python/delphi_utils/validator/README.md
qx-teo Sep 1, 2021
04bf3f9
Update README.md
qx-teo Sep 1, 2021
b3c9325
Update PLANS.md
qx-teo Sep 1, 2021
ec51e1a
remove us territories from valid zips list
nmdefries Aug 26, 2021
c10adcc
fix tests
nmdefries Aug 27, 2021
c450e3b
stop supporting non-default aggs; revert #977
nmdefries Aug 18, 2021
85dd31f
simplify aggregate range selection; drop support for 'both'
nmdefries Aug 18, 2021
643bef9
remove mc_ agg processing
nmdefries Aug 18, 2021
1b1bc5e
add default contingency Rprof command
nmdefries Aug 18, 2021
8e0ebde
remove missing weights up front
nmdefries Aug 19, 2021
3b69c28
make clear that add_geo_vars acts on cols
nmdefries Aug 21, 2021
a41b6b4
create Cpp is_selected; R vers only on uniques
nmdefries Aug 18, 2021
ce4634b
deduplicate preparing group agg output
nmdefries Aug 21, 2021
2ad7b4c
switch na filter to use data.table with
nmdefries Aug 23, 2021
868aa20
update documentation
nmdefries Aug 23, 2021
ed93b3c
apply contingency changes to API agg func
nmdefries Aug 24, 2021
309a013
switch weights handling to use data.table
nmdefries Aug 24, 2021
37f97bc
misc cleanup
nmdefries Aug 26, 2021
a663f19
set up C++ package structure so can use in parallel
nmdefries Aug 30, 2021
0a4dc61
store output in list of lists
nmdefries Aug 31, 2021
f548c6a
comments
nmdefries Sep 1, 2021
5567ba0
cpp style
nmdefries Sep 1, 2021
d882622
remove test code
nmdefries Sep 1, 2021
0080d6e
rm cpp compile files
nmdefries Sep 1, 2021
693a25e
prevent raceeth missing error
nmdefries Sep 3, 2021
9dca6cb
[hhs_hosp] Permit generating backissues
krivard Sep 3, 2021
7338a96
ignore msa vaccine barriers "tried"
nmdefries Sep 7, 2021
c2bcb47
set weight field names in fread
nmdefries Sep 3, 2021
d420a2e
chore: bump delphi_utils to 0.1.11
Sep 7, 2021
2657fdb
chore: bump covidcast-indicators to 0.1.14
Sep 7, 2021
fc09f49
clean up organization and docstring for get_geos_within method
alexcoda Sep 3, 2021
a358dcd
Make docstring arguments up-to-date.
alexcoda Sep 3, 2021
198dd29
fix lint
alexcoda Sep 3, 2021
180ed9a
Update _delphi_utils_python/delphi_utils/geomap.py
alexcoda Sep 6, 2021
9ce0670
Update _delphi_utils_python/delphi_utils/geomap.py
alexcoda Sep 6, 2021
d4f8bf1
Include given values in get_geos_within error message
alexcoda Sep 6, 2021
6573b08
Remove caching in geomap.py
alexcoda Sep 7, 2021
bf28950
Simplify _load_crosswalk_from_file method
alexcoda Sep 8, 2021
2e4c4c4
Add missing imports to run.py template
alexcoda Sep 8, 2021
cb34ad0
chore: bump delphi_utils to 0.1.12
Sep 8, 2021
225e16d
chore: bump covidcast-indicators to 0.1.15
Sep 8, 2021
6cfe2bc
use recoded values, where available, for response choice codes
nmdefries Sep 7, 2021
2e3ef74
recode values in display logic
nmdefries Sep 8, 2021
bfb81f5
Makefile changes
korlaxxalrok Sep 8, 2021
a762ca7
Adds new params to template
korlaxxalrok Sep 9, 2021
e132c7d
vars changes
korlaxxalrok Sep 9, 2021
fe3d9c3
Add new secrets to vault
korlaxxalrok Sep 9, 2021
8f9d38f
chore: bump covidcast-indicators to 0.1.16
Sep 14, 2021
bc083b7
Build facebook container image
korlaxxalrok Sep 15, 2021
01d142b
un-retire schooling indicators from sirCAL
nmdefries Sep 14, 2021
372ecab
break line
nmdefries Sep 15, 2021
0204d88
add closing bracket
nmdefries Sep 15, 2021
f361676
Nancodes archiver: remove deleted file nan replacements
dshemetov Sep 22, 2021
ea68224
Update archiver docstrings
dshemetov Sep 27, 2021
b873a95
Update archiver docstrings
dshemetov Sep 27, 2021
49a5766
Nancodes archiver/export: explicit tests
dshemetov Sep 27, 2021
874623e
Update setup.py files to "Python :: 3.8" annotation
dshemetov Sep 15, 2021
f350dd6
Correctly ignore all receiving/*.csv files
dshemetov Sep 15, 2021
d827480
check if readr is installed
nmdefries Sep 16, 2021
3397277
only install remotes if not avail; upgrade as needed
nmdefries Sep 20, 2021
4b8ee7a
test run with no cache
nmdefries Sep 20, 2021
b043d54
Revert "test run with no cache"
nmdefries Sep 20, 2021
e40cd55
Fix value check in quidel data_tools
alexcoda Sep 19, 2021
6849004
Replace print statements with logging
alexcoda Sep 19, 2021
ff84e3a
lint
alexcoda Sep 19, 2021
74a84e4
lint
alexcoda Sep 19, 2021
6bfd724
Fix missing logger in tests
alexcoda Sep 19, 2021
50fd522
Fix missing logger in tests
alexcoda Sep 19, 2021
d6d0534
Instantiate logger correctly in tests
alexcoda Sep 19, 2021
52f8cb2
Fix error check
alexcoda Sep 19, 2021
511bf2e
Update quidel/delphi_quidel/pull.py
alexcoda Sep 22, 2021
d572e26
Update quidel/delphi_quidel/pull.py
alexcoda Sep 22, 2021
33e9325
Update quidel_covidtest/delphi_quidel_covidtest/pull.py
alexcoda Sep 22, 2021
775d125
Update quidel_covidtest/delphi_quidel_covidtest/pull.py
alexcoda Sep 22, 2021
e6ade5a
Add new host to inventory
korlaxxalrok Sep 21, 2021
2f1927a
Re-add primary back to inventory
korlaxxalrok Sep 21, 2021
9a3f4f1
Remove bare except in DV
chinandrew Sep 22, 2021
000dc8b
set E2 to integer on read
nmdefries Sep 20, 2021
9b00342
Switch CDC Covidnet to use structed logger
chinandrew Sep 22, 2021
c7d7ce0
Switch to structed logger for ChangeHC
chinandrew Sep 22, 2021
02f7080
switch doctor visits to structured logger
chinandrew Sep 22, 2021
c21b544
Refactor NCHS mortality to use delphi export util
chinandrew Sep 25, 2021
ca09586
Remove test for old export func
chinandrew Sep 25, 2021
90ea653
resolved misssing name issue with another PR, retrying this one.
Ananya-Joshi Sep 28, 2021
b9c6e8a
Merge branch 'main' into indicator_cdc_vaccines
Ananya-Joshi Sep 28, 2021
d3544d0
Cdc vaccines: add basic nancodes
dshemetov Sep 29, 2021
ea6587d
Cdc vaccines: add docstring for linter
dshemetov Sep 29, 2021
2 changes: 1 addition & 1 deletion .github/workflows/python-ci.yml
@@ -16,7 +16,7 @@ jobs:
     if: github.event.pull_request.draft == false
     strategy:
       matrix:
-        packages: [_delphi_utils_python, changehc, claims_hosp, combo_cases_and_deaths, covid_act_now, doctor_visits, google_symptoms, hhs_hosp, hhs_facilities, jhu, nchs_mortality, nowcast, quidel, quidel_covidtest, safegraph_patterns, sir_complainsalot, usafacts]
+        packages: [_delphi_utils_python, changehc, claims_hosp, combo_cases_and_deaths, covid_act_now, doctor_visits, google_symptoms, hhs_hosp, hhs_facilities, jhu, nchs_mortality, nowcast, quidel, quidel_covidtest, safegraph_patterns, sir_complainsalot, usafacts, cdc_vaccines]
     defaults:
       run:
         working-directory: ${{ matrix.packages }}
22 changes: 22 additions & 0 deletions cdc_vaccines/.pylintrc
@@ -0,0 +1,22 @@

[MESSAGES CONTROL]

disable=logging-format-interpolation,
        too-many-locals,
        too-many-arguments,
        # Allow pytest functions to be part of a class.
        no-self-use,
        # Allow pytest classes to have one test.
        too-few-public-methods

[BASIC]

# Allow arbitrarily short-named variables.
variable-rgx=[a-z_][a-z0-9_]*
argument-rgx=[a-z_][a-z0-9_]*
attr-rgx=[a-z_][a-z0-9_]*

[DESIGN]

# Don't complain about pytest "unused" arguments.
ignored-argument-names=(_.*|run_as_module)
29 changes: 29 additions & 0 deletions cdc_vaccines/Makefile
@@ -0,0 +1,29 @@
.PHONY: venv lint test clean

dir = $(shell find ./delphi_* -name __init__.py | grep -o 'delphi_[_[:alnum:]]*')

venv:
	python3.8 -m venv env

install: venv
	. env/bin/activate; \
	pip install wheel ; \
	pip install -e ../_delphi_utils_python ;\
	pip install -e .

lint:
	. env/bin/activate; pylint $(dir)
	. env/bin/activate; pydocstyle $(dir)

test:
	. env/bin/activate ;\
	(cd tests && ../env/bin/pytest --cov=$(dir) --cov-report=term-missing)

clean:
	rm -rf env
	rm -f params.json

run:
	env/bin/python -m $(dir)
	env/bin/python -m delphi_utils.validator --dry_run
	env/bin/python -m delphi_utils.archive
69 changes: 69 additions & 0 deletions cdc_vaccines/README.md
@@ -0,0 +1,69 @@
# CDC Vaccinations

This indicator provides the official vaccination counts in the US. We export the county-level
daily vaccination rates data as-is, and publish the result as a COVIDcast signal.
We also aggregate the data to the MSA, HRR, State, HHS Region, and Nation levels.
For detailed information, see the file DETAILS.md in this directory.

Note that individuals could be vaccinated outside of the US. Additionally,
there is no county level data for counties in Texas and Hawaii. Each state has some vaccination counts assigned to "unknown county". Some vaccination counts are assigned to "unknown state, unknown county".


## Running the Indicator

The indicator is run by directly executing the Python module contained in this
directory. The safest way to do this is to create a virtual environment,
install the common DELPHI tools, and then install the module and its
dependencies. To do this, run the following command from this directory:

```
make install
```

This command will install the package in editable mode, so you can make changes that
will automatically propagate to the installed package.

All of the user-changeable parameters are stored in `params.json`. To execute
the module and produce the output datasets (by default, in `receiving`), run
the following:

```
env/bin/python -m delphi_cdc_vaccines
```

If you want to enter the virtual environment in your shell,
you can run `source env/bin/activate`. Run `deactivate` to leave the virtual environment.

Once you are finished, you can remove the virtual environment and
params file with the following:

```
make clean
```

## Testing the code

To run static tests of the code style, run the following command:

```
make lint
```

Unit tests are also included in the module. To execute these, run the following
command from this directory:

```
make test
```

To run individual tests, run the following:

```
(cd tests && ../env/bin/pytest test_run.py --cov=delphi_cdc_vaccines --cov-report=term-missing)
```

The output will show the number of unit tests that passed and failed, along
with the percentage of code covered by the tests.

None of the linting or unit tests should fail. The number of code lines not covered by unit tests should be small,
and none of them should be in critical subroutines.
38 changes: 38 additions & 0 deletions cdc_vaccines/REVIEW.md
@@ -0,0 +1,38 @@
## Code Review (Python)

A code review of this module should include a careful look at the code and the
output. To assist in the process, but certainly not in place of it, please
check the following items.

**Documentation**

- [ ] the README.md file template is filled out and currently accurate; it is
possible to load and test the code using only the instructions given
- [ ] minimal docstrings (one line describing what the function does) are
included for all functions; full docstrings describing the inputs and expected
outputs should be given for non-trivial functions

**Structure**

- [ ] code should pass lint checks (`make lint`)
- [ ] any required metadata files are checked into the repository and placed
within the directory `static`
- [ ] any intermediate files that are created and stored by the module should
be placed in the directory `cache`
- [ ] final expected output files to be uploaded to the API are placed in the
`receiving` directory; output files should not be committed to the repository
- [ ] all options and API keys are passed through the file `params.json`
- [ ] template parameter file (`params.json.template`) is checked into the
code; no personal (i.e., usernames) or private (i.e., API keys) information is
included in this template file

**Testing**

- [ ] module can be installed in a new virtual environment (`make install`)
- [ ] reasonably high level of unit test coverage covering all of the main logic
of the code (e.g., missing coverage for raised errors that do not currently seem
possible to reach is okay; missing coverage for options that will be needed is
not)
- [ ] all unit tests run without errors (`make test`)
- [ ] indicator directory has been added to GitHub CI
(`covidcast-indicators/.github/workflows/python-ci.yml`)
Empty file added cdc_vaccines/cache/.gitignore
Empty file.
13 changes: 13 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/__init__.py
@@ -0,0 +1,13 @@
# -*- coding: utf-8 -*-
"""Module to pull and clean indicators from the CDC source.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import
from . import pull
from . import run

__version__ = "0.1.0"
12 changes: 12 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/__main__.py
@@ -0,0 +1,12 @@
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m delphi_cdc_vaccines`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from delphi_utils import read_params
from .run import run_module # pragma: no cover

run_module(read_params()) # pragma: no cover
33 changes: 33 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/constants.py
@@ -0,0 +1,33 @@
"""Registry for variations."""

from itertools import product
from delphi_utils import Smoother


CUMULATIVE = 'cumulative'
INCIDENCE = 'incidence'
FREQUENCY = [CUMULATIVE, INCIDENCE]
STATUS = ["tot", "part"]
AGE = ["", "_12P", "_18P", "_65P"]

SIGNALS = [f"{frequency}_counts_{status}_vaccine{age}" for
           frequency, status, age in product(FREQUENCY, STATUS, AGE)]
DIFFERENCE_MAPPING = {
    f"{INCIDENCE}_counts_{status}_vaccine{age}": f"{CUMULATIVE}_counts_{status}_vaccine{age}"
    for status, age in product(STATUS, AGE)
}
SIGNALS = list(DIFFERENCE_MAPPING.keys()) + list(DIFFERENCE_MAPPING.values())


GEOS = [
    "nation",
    "state",
    "hrr",
    "hhs",
    "msa"
]

SMOOTHERS = [
    (Smoother("identity", impute_method=None), ""),
    (Smoother("moving_average", window_length=7), "_7dav"),
]
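To make the naming scheme in `constants.py` concrete, here is a minimal standalone sketch of the signal-name generation. It copies the constants from the diff but leaves out the `delphi_utils` import, and the lowercase names `difference_mapping` and `signals` are used only for this illustration.

```python
from itertools import product

CUMULATIVE = "cumulative"
INCIDENCE = "incidence"
STATUS = ["tot", "part"]
AGE = ["", "_12P", "_18P", "_65P"]

# Map each incidence signal to the cumulative signal it is differenced from.
difference_mapping = {
    f"{INCIDENCE}_counts_{status}_vaccine{age}": f"{CUMULATIVE}_counts_{status}_vaccine{age}"
    for status, age in product(STATUS, AGE)
}

# Incidence names first, then the matching cumulative names.
signals = list(difference_mapping.keys()) + list(difference_mapping.values())

for name in signals:
    print(name)
```

Two statuses times four age groups give eight incidence names and eight cumulative names, so `signals` holds sixteen entries in total.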
136 changes: 136 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/pull.py
@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-
"""Functions for pulling data from the CDC data website for vaccines."""
import hashlib
from logging import Logger
from delphi_utils.geomap import GeoMapper
import numpy as np
import pandas as pd
from .constants import SIGNALS, DIFFERENCE_MAPPING



def pull_cdcvacc_data(base_url: str, logger: Logger) -> pd.DataFrame:
    """Pull the latest data from the CDC on vaccines and conform it into a dataset.

    The output dataset has:
    - Each row corresponds to (County, Date), denoted (FIPS, timestamp)
    - Each row additionally has columns that correspond to the counts or
      cumulative counts of vaccination status (fully vaccinated,
      partially vaccinated) of various age groups (all, 12+, 18+, 65+)
      from December 13th 2020 until the latest date

    Note that the raw dataset gives the `cumulative` metrics, from which
    we compute `counts` by taking first differences. Hence, `counts`
    may be negative. This is wholly dependent on the quality of the raw
    dataset.

    We filter the data such that we only keep rows with valid FIPS, or "FIPS"
    codes defined under the exceptions of the README. The current exceptions
    include:
    - 0: statewise unallocated

    Parameters
    ----------
    base_url: str
        Base URL for pulling the CDC Vaccination Data
    logger: Logger
        Logger used to record metadata about the retrieved data

    Returns
    -------
    pd.DataFrame
        Dataframe as described above.
    """
    # Columns to drop from the data frame.
    drop_columns = [
        "date",
        "recip_state",
        "series_complete_pop_pct",
        "mmwr_week",
        "recip_county",
        "state_id"
    ]


    # Read data
    df = pd.read_csv(base_url)
    logger.info("data retrieved from source",
                num_rows=df.shape[0],
                num_cols=df.shape[1],
                min_date=min(df['Date']),
                max_date=max(df['Date']),
                checksum=hashlib.sha256(pd.util.hash_pandas_object(df).values).hexdigest())
    df.columns = [i.lower() for i in df.columns]

    df['recip_state'] = df['recip_state'].str.lower()
    drop_columns.extend([x for x in df.columns if ("pct" in x) or ("svi" in x)])
    drop_columns = list(set(drop_columns))
    df = GeoMapper().add_geocode(df, "state_id", "state_code",
                                 from_col="recip_state", new_col="state_id", dropna=False)
    df['state_id'] = df['state_id'].fillna('0').astype(int)
    # Change FIPS from "UNK" to XX000 for statewise unallocated vaccinations
    unassigned_index = (df["fips"] == "UNK")
    df.loc[unassigned_index, "fips"] = df["state_id"].loc[unassigned_index].values * 1000

    # Conform FIPS
    df["fips"] = df["fips"].apply(lambda x: f"{int(x):05d}")
    # timestamp: str -> datetime
    df["timestamp"] = pd.to_datetime(df["date"])
    # Drop unnecessary columns (state is pre-encoded in fips)
    try:
        df.drop(drop_columns, axis=1, inplace=True)
    except KeyError as e:
        raise ValueError(
            "Tried to drop non-existent columns. The dataset "
            "schema may have changed. Please investigate and "
            "amend drop_columns."
        ) from e
    # Rename the remaining columns to the signal names
    df.columns = ["fips",
                  "cumulative_counts_tot_vaccine",
                  "cumulative_counts_tot_vaccine_12P",
                  "cumulative_counts_tot_vaccine_18P",
                  "cumulative_counts_tot_vaccine_65P",
                  "cumulative_counts_part_vaccine",
                  "cumulative_counts_part_vaccine_12P",
                  "cumulative_counts_part_vaccine_18P",
                  "cumulative_counts_part_vaccine_65P",
                  "timestamp"]
    # Add a zero-valued row one day before the first date of each FIPS group
    df_dummy = df.loc[(df["fips"] != '00000') & (df["timestamp"] == min(df["timestamp"]))].copy()
    # Handle fips 00000 separately
    df_oth = df.loc[((df["fips"] == '00000') &
                     (df["timestamp"] == min(df[df['fips'] == '00000']['timestamp'])))].copy()
    df_dummy = pd.concat([df_dummy, df_oth])
    df_dummy.loc[:, "timestamp"] = df_dummy.loc[:, "timestamp"] - pd.Timedelta(days=1)
    df_dummy.loc[:, ["cumulative_counts_tot_vaccine",
                     "cumulative_counts_tot_vaccine_12P",
                     "cumulative_counts_tot_vaccine_18P",
                     "cumulative_counts_tot_vaccine_65P",
                     "cumulative_counts_part_vaccine",
                     "cumulative_counts_part_vaccine_12P",
                     "cumulative_counts_part_vaccine_18P",
                     "cumulative_counts_part_vaccine_65P",
                     ]] = 0

    df = pd.concat([df_dummy, df])
    # Obtain new_counts
    df.sort_values(["fips", "timestamp"], inplace=True)
    for to, from_d in DIFFERENCE_MAPPING.items():
        df[to] = df[from_d].diff()
Contributor

Btw, you might like this version of taking diffs, grouped by geos.

Contributor

I think the .isna() in that code should probably be replaced with .loc[min_time_value, :] or something, but the gist is there.

Contributor Author

I wonder if we can keep this method for now, but then later I'll look to fix the method and use it.

Contributor

Sure, that's alright. A possible refactor for later.
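
For reference, the grouped-diff idea discussed in this thread could look roughly like the sketch below. The version the reviewer linked is not shown on this page, so this is an illustration under assumptions rather than that code: the toy values are made up, and only the column names mirror `pull.py`.

```python
import pandas as pd

# Toy frame; column names mirror pull.py, values are invented for illustration.
df = pd.DataFrame({
    "fips": ["01001", "01001", "01001", "01003", "01003"],
    "timestamp": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"]
    ),
    "cumulative_counts_tot_vaccine": [5, 8, 8, 2, 7],
})

df = df.sort_values(["fips", "timestamp"]).reset_index(drop=True)

# Diff within each county, so a difference never spans two FIPS codes.
# The first row of every group has no predecessor and comes out as NaN.
df["incidence_counts_tot_vaccine"] = (
    df.groupby("fips")["cumulative_counts_tot_vaccine"].diff()
)
print(df)
```

With a grouped diff, the first observation of each county is NaN by construction, which is the case the current code handles with an explicit mask on FIPS boundaries.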


    rem_list = [x for x in list(df.columns) if x not in ['timestamp', 'fips']]
    # Handle edge cases where we diffed across fips
    mask = df["fips"] != df["fips"].shift(1)
    df.loc[mask, rem_list] = np.nan
    df.reset_index(inplace=True, drop=True)
    # Final sanity checks
    unique_days = df["timestamp"].unique()
    min_timestamp = min(unique_days)
    max_timestamp = max(unique_days)
    n_days = (max_timestamp - min_timestamp) / np.timedelta64(1, "D") + 1
    if n_days != len(unique_days):
        raise ValueError(
            f"Not every day between {min_timestamp} and "
            f"{max_timestamp} is represented."
        )
    return df.loc[
        df["timestamp"] >= min(df["timestamp"]),
        # Reorder
        ["fips", "timestamp"] + SIGNALS,
    ].reset_index(drop=True)
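
The final sanity check in `pull.py` only reports that some day is missing. As a possible follow-up, a variant that lists the missing days might be sketched as below. The helper name `missing_days` is hypothetical and not part of this PR, and the sketch assumes timestamps are already normalized to midnight.

```python
import pandas as pd


def missing_days(timestamps: pd.Series) -> pd.DatetimeIndex:
    """Return the calendar days between the min and max timestamp that are absent.

    Hypothetical helper for illustration; assumes midnight-normalized timestamps.
    """
    days = pd.DatetimeIndex(timestamps.unique())
    expected = pd.date_range(days.min(), days.max(), freq="D")
    return expected.difference(days)


# Toy input with a gap on 2021-01-03 and a duplicated day.
ts = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-04"]))
gaps = missing_days(ts)
if len(gaps) > 0:
    print(f"Missing days: {[d.strftime('%Y-%m-%d') for d in gaps]}")
```

An empty result corresponds to the case where the existing check passes.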