Commit 5343cfb

Merge pull request #1516 from cmu-delphi/release/indicators_v0.3.2_utils_v0.3.1
Release covidcast-indicators 0.3.2
2 parents: 09bdbc0 + 6019bc4

25 files changed: 1050 additions and 576 deletions

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.1
+current_version = 0.3.2
 commit = True
 message = chore: bump covidcast-indicators to {new_version}
 tag = False

_delphi_utils_python/.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.0
+current_version = 0.3.1
 commit = True
 message = chore: bump delphi_utils to {new_version}
 tag = False

_delphi_utils_python/data_proc/geomap/README.md

Lines changed: 4 additions & 4 deletions
@@ -17,11 +17,11 @@ You can see consistency checks and diffs with old sources in ./consistency_check
 
 We support the following geocodes.
 
-- The ZIP code and the FIPS code are the most granular geocodes we support.
+- The ZIP code and the FIPS code are the most granular geocodes we support.
 - The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is five digit code (with leading zeros).
 - The FIPS code is a five digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information).
 - The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html).
-- We are reserving 10001-10099 for states codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used.
+- We are reserving 10001-10099 for states codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used.
 - State codes are a series of equivalent identifiers for US state. They include the state name, the state number (state_id), and the state two-letter abbreviation (state_code). The state number is the state FIPS code. See [here](https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations) for more.
 - The Hospital Referral Region (HRR) and the Hospital Service Area (HSA). More information [here](https://www.dartmouthatlas.org/covid-19/hrr-mapping/).
 - The JHU signal contains its own geographic identifier, labeled the UID. Documentation is provided at [their repo](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data#uid-lookup-table-logic). Its FIPS codes depart in some special cases, so we produce manual changes listed below.
@@ -30,7 +30,7 @@ We support the following geocodes.
 
 The source files are requested from a government URL when `geo_data_proc.py` is run (see the top of said script for the URLs). Below we describe the locations to find updated versions of the source files, if they are ever needed.
 
-- ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two.
+- ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two. As of 4 February 2022, this source did not include population information for 24 ZIPs that appear in our indicators. We have added those values manually using information available from the [zipdatamaps website](www.zipdatamaps.com).
 - ZIP -> HRR -> HSA crosswalk file comes from the 2018 version at the [Dartmouth Atlas Project](https://atlasdata.dartmouth.edu/static/supp_research_data).
 - FIPS -> MSA crosswalk file comes from the September 2018 version of the delineation files at the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html).
 - State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3). The first two digits of a FIPS codes should match the state code here.
@@ -60,6 +60,6 @@ The rest of the crosswalk tables are derived from the mappings above. We provide
 - MSA tables from March 2020 [here](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszip file which Jingjing constructed. There are at least 10 additional fips in 03_20_msa that are not in the uszip file, and one of the msa codes seems to be incorrect: 49020 (a google search confirms that it is incorrect in uszip and correct in the census data).
 - MSA tables from 2019 [here](https://apps.bea.gov/regional/docs/msalist.cfm)
 
-## Notes
+## Notes
 
 - The NAs in the coding currently zero-fills.
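The README hunk restates the geocode conventions: ZIP and FIPS codes are five-digit strings with leading zeros, the first two FIPS digits are the state code and the last three the county code, and each state gets a reserved MSA-style code 100XX (below the smallest real CBSA, 10100). A minimal sketch of those rules follows; the helper names are hypothetical and not part of delphi_utils.

```python
from typing import Tuple


def normalize_five_digit(code) -> str:
    """Zero-pad a ZIP or FIPS code to five digits, e.g. 2013 -> "02013"."""
    s = str(code).strip()
    if not s.isdigit() or len(s) > 5:
        raise ValueError(f"not a ZIP/FIPS code: {code!r}")
    return s.zfill(5)


def split_fips(fips) -> Tuple[str, str]:
    """Split a FIPS code into (two-digit state code, three-digit county code)."""
    f = normalize_five_digit(fips)
    return f[:2], f[2:]


def state_msa_code(state_fips) -> str:
    """Reserved MSA-style code 100XX for a state, XX being its FIPS state code."""
    s = str(state_fips).strip().zfill(2)
    if not s.isdigit() or len(s) != 2 or s == "00":
        raise ValueError(f"not a two-digit state FIPS code: {state_fips!r}")
    return "100" + s


print(normalize_five_digit(2013))  # 02013
print(split_fips("42003"))         # ('42', '003')
print(state_msa_code(6))           # 10006
```

Keeping codes as zero-padded strings, rather than integers, is what preserves the leading zeros the README calls out.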

_delphi_utils_python/data_proc/geomap/geo_data_proc.py

Lines changed: 15 additions & 0 deletions
@@ -32,6 +32,7 @@
 FIPS_POPULATION_URL = f"https://www2.census.gov/programs-surveys/popest/datasets/2010-{YEAR}/counties/totals/co-est{YEAR}-alldata.csv"
 FIPS_PUERTO_RICO_POPULATION_URL = "https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt?"
 STATE_HHS_FILE = "hhs.txt"
+ZIP_POP_MISSING_FILE = "zip_pop_filling.csv"
 
 # Out files
 FIPS_STATE_OUT_FILENAME = "fips_state_table.csv"
@@ -181,6 +182,7 @@ def create_jhu_uid_fips_crosswalk():
         ]
     )
 
+
     jhu_df = pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str}).query("Country_Region == 'US'")
     jhu_df = jhu_df.rename(columns={"UID": "jhu_uid", "FIPS": "fips"}).dropna(subset=["fips"])
 
@@ -336,6 +338,7 @@ def create_hhs_population_table():
     state_pop = pd.read_csv(join(OUTPUT_DIR, STATE_POPULATION_OUT_FILENAME), dtype={"state_code": str, "hhs": int}, usecols=["state_code", "pop"])
     state_hhs = pd.read_csv(join(OUTPUT_DIR, STATE_HHS_OUT_FILENAME), dtype=str)
     hhs_pop = state_pop.merge(state_hhs, on="state_code").groupby("hhs", as_index=False).sum()
+
     hhs_pop.sort_values("hhs").to_csv(join(OUTPUT_DIR, HHS_POPULATION_OUT_FILENAME), index=False)
 
 
@@ -363,6 +366,18 @@ def derive_zip_population_table():
     df = census_pop.merge(fz_df, on="fips", how="left")
     df["pop"] = df["pop"].multiply(df["weight"], axis=0)
     df = df.drop(columns=["fips", "weight"]).groupby("zip").sum().dropna().reset_index()
+
+    ## loading populatoin of some zips- #Issue 0648
+    zip_pop_missing = pd.read_csv(
+        ZIP_POP_MISSING_FILE,sep=",",
+        dtype={"zip":str,"pop":np.int32}
+    )
+    ## cheking if each zip still missing, and concatenating if True
+    for x_zip in zip_pop_missing['zip']:
+        if x_zip not in df['zip']:
+            df = pd.concat([df, zip_pop_missing[zip_pop_missing['zip'] == x_zip]],
+                           ignore_index=True)
+
     df["pop"] = df["pop"].astype(int)
     df.sort_values("zip").to_csv(join(OUTPUT_DIR, ZIP_POPULATION_OUT_FILENAME), index=False)
 

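In the new loop in `derive_zip_population_table`, `x_zip not in df['zip']` uses Python's `in` operator, which for a pandas Series tests the index labels, not the column values. After `reset_index()` the index holds integers, so a string ZIP is never found there and every row of the fill file appears to be appended, whether or not that ZIP already has a population. The sketch below, an illustration rather than the repository's code, mirrors the weighting lines shown as context (`merge`, `multiply` by `weight`, `groupby("zip").sum()`) and then fills only ZIPs whose values are absent, using `Series.isin`. The function names and the FIPS/ZIP crosswalk numbers are hypothetical; the populations 183, 9090, 192 and 677 are taken from the ZIP tables in this commit.

```python
import pandas as pd


def fips_to_zip_population(census_pop: pd.DataFrame, fz_df: pd.DataFrame) -> pd.DataFrame:
    """Spread county (FIPS) populations onto ZIPs using crosswalk weights."""
    df = census_pop.merge(fz_df, on="fips", how="left")
    df["pop"] = df["pop"].multiply(df["weight"], axis=0)
    return df.drop(columns=["fips", "weight"]).groupby("zip").sum().dropna().reset_index()


def fill_missing_zips(df: pd.DataFrame, fill: pd.DataFrame) -> pd.DataFrame:
    """Append fill rows only for ZIPs whose *values* are absent from df["zip"]."""
    missing = fill[~fill["zip"].isin(df["zip"])]
    return pd.concat([df, missing], ignore_index=True)


# Hypothetical crosswalk: county 01001 splits 60/40 between two ZIPs.
census_pop = pd.DataFrame({"fips": ["01001", "01003"], "pop": [1000, 500]})
fz_df = pd.DataFrame({
    "fips": ["01001", "01001", "01003"],
    "zip": ["36003", "36006", "36006"],
    "weight": [0.6, 0.4, 1.0],
})
zip_pop = fips_to_zip_population(census_pop, fz_df)
print(zip_pop)

# `in` on a Series looks at the index (0, 1 here), not the ZIP strings.
existing = pd.DataFrame({"zip": ["99566", "99567"], "pop": [183, 9090]})
filling = pd.DataFrame({"zip": ["99566", "99554"], "pop": [192, 677]})
print("99566" in existing["zip"])             # False
print(existing["zip"].isin(["99566"]).any())  # True
filled = fill_missing_zips(existing, filling)
print(filled)
```

A plain `x_zip not in set(df["zip"])` would also test values; `isin` does the whole fill file in one vectorized step instead of one `concat` per ZIP.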
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+zip,pop
+57756,1126
+57764,1923
+57770,5271
+57772,2048
+57794,644
+99554,677
+99563,938
+99566,192
+99573,1115
+99574,2348
+99581,762
+99585,417
+99586,605
+99604,1093
+99620,577
+99632,813
+99650,568
+99657,329
+99658,616
+99662,480
+99666,189
+99677,88
+99686,4005
+99693,248

_delphi_utils_python/delphi_utils/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 from .nancodes import Nans
 from .weekday import Weekday
 
-__version__ = "0.3.0"
+__version__ = "0.3.1"

_delphi_utils_python/delphi_utils/data/2019/zip_pop.csv

Lines changed: 24 additions & 0 deletions
@@ -19549,15 +19549,19 @@ zip,pop
 57752,317
 57754,4067
 57755,119
+57756,1126
 57758,219
 57759,585
 57760,1395
 57761,1360
 57762,547
 57763,266
+57764,1923
 57766,224
 57767,167
 57769,3915
+57770,5271
+57772,2048
 57773,145
 57775,251
 57776,15
@@ -19572,6 +19576,7 @@ zip,pop
 57791,213
 57792,143
 57793,2061
+57794,644
 57799,707
 58001,54
 58002,27
@@ -32756,29 +32761,37 @@ zip,pop
 99551,677
 99552,373
 99553,1092
+99554,677
 99555,224
 99556,2659
 99557,784
 99558,79
 99559,8248
 99561,451
+99563,938
 99564,88
 99565,76
 99566,183
+99566,192
 99567,9090
 99568,295
 99569,64
 99571,170
 99572,313
+99573,1115
 99573,1064
+99574,2348
 99574,2242
 99575,113
 99576,2640
 99577,25433
 99578,319
 99579,106
 99580,116
+99581,762
 99583,37
+99585,417
+99586,605
 99586,577
 99587,2220
 99588,958
@@ -32787,6 +32800,7 @@ zip,pop
 99591,107
 99602,166
 99603,10427
+99604,1093
 99605,223
 99606,457
 99607,233
@@ -32797,6 +32811,7 @@ zip,pop
 99613,372
 99614,690
 99615,12347
+99620,577
 99621,781
 99622,346
 99624,86
@@ -32806,6 +32821,7 @@ zip,pop
 99628,449
 99630,206
 99631,241
+99632,813
 99633,456
 99634,382
 99636,517
@@ -32820,18 +32836,23 @@ zip,pop
 99647,42
 99648,110
 99649,78
+99650,568
 99651,65
 99652,4506
 99653,155
 99654,63494
 99655,723
 99656,27
+99657,329
+99658,616
 99659,422
 99660,503
 99661,1040
+99662,480
 99663,438
 99664,5226
 99665,77
+99666,189
 99667,79
 99668,92
 99669,15038
@@ -32840,6 +32861,7 @@ zip,pop
 99672,3925
 99674,1757
 99676,1881
+99677,88
 99677,84
 99678,903
 99679,403
@@ -32850,11 +32872,13 @@ zip,pop
 99684,725
 99685,4438
 99686,3824
+99686,4005
 99688,3358
 99689,579
 99690,302
 99691,88
 99692,159
+99693,248
 99693,236
 99694,1840
 99695,6
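Seven of the 24 ZIPs in the fill file (99566, 99573, 99574, 99586, 99677, 99686, 99693) now appear twice in the updated zip_pop.csv: each hunk keeps the existing row (for example 99566,183) and adds the fill value (99566,192). A small check like the sketch below could flag such rows before the table is written. The helper name is hypothetical, and which of two conflicting values to keep is a separate decision this sketch does not make.

```python
import io

import pandas as pd


def duplicated_zips(source) -> pd.DataFrame:
    """Return every row whose ZIP occurs more than once in a zip,pop table."""
    # Read ZIPs as strings so leading zeros survive.
    df = pd.read_csv(source, dtype={"zip": str})
    return df[df["zip"].duplicated(keep=False)].sort_values(["zip", "pop"])


# Excerpt of the updated zip_pop.csv around ZIP 99566.
excerpt = "zip,pop\n99565,76\n99566,183\n99566,192\n99567,9090\n"
dups = duplicated_zips(io.StringIO(excerpt))
print(dups.to_string(index=False))
```

Passing a file path instead of the `StringIO` excerpt runs the same check on the full table.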

_delphi_utils_python/setup.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 
 setup(
     name="delphi_utils",
-    version="0.3.0",
+    version="0.3.1",
     description="Shared Utility Functions for Indicators",
     long_description=long_description,
     long_description_content_type="text/markdown",
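This release bumps delphi_utils to 0.3.1 in three places: `_delphi_utils_python/.bumpversion.cfg`, `delphi_utils/__init__.py` and `setup.py`. A hedged sketch of a consistency check over those three texts, using only the standard library, is below. The helper names are hypothetical, and the regular expression assumes the simple `name = "x.y.z"` assignments seen in this diff.

```python
import configparser
import re


def bumpversion_current(cfg_text: str) -> str:
    """Read current_version from the [bumpversion] section of a .bumpversion.cfg."""
    # interpolation=None so values such as "{new_version}" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser["bumpversion"]["current_version"]


def assigned_version(py_text: str, name: str) -> str:
    """Find a string assignment such as __version__ = "0.3.1" or version="0.3.1"."""
    match = re.search(rf'\b{re.escape(name)}\s*=\s*"([^"]+)"', py_text)
    if match is None:
        raise ValueError(f"no {name} assignment found")
    return match.group(1)


cfg = (
    "[bumpversion]\n"
    "current_version = 0.3.1\n"
    "commit = True\n"
    "message = chore: bump delphi_utils to {new_version}\n"
    "tag = False\n"
)
init_py = 'from .weekday import Weekday\n\n__version__ = "0.3.1"\n'
setup_py = 'setup(\n    name="delphi_utils",\n    version="0.3.1",\n)\n'

versions = {
    bumpversion_current(cfg),
    assigned_version(init_py, "__version__"),
    assigned_version(setup_py, "version"),
}
print(versions)  # {'0.3.1'}
```

A set with more than one element means one of the files was missed during the bump.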

ansible/templates/quidel_covidtest-params-prod.json.j2

Lines changed: 9 additions & 0 deletions
@@ -48,6 +48,15 @@
       ]
     }
   },
+  "archive": {
+    "aws_credentials": {
+      "aws_access_key_id": "{{ delphi_aws_access_key_id }}",
+      "aws_secret_access_key": "{{ delphi_aws_secret_access_key }}"
+    },
+    "bucket_name": "delphi-covidcast-indicator-output",
+    "cache_dir": "./archivediffer_cache",
+    "indicator_prefix": "quidel"
+  },
   "delivery": {
     "delivery_dir": "/common/covidcast/receiving/quidel"
   }
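The Quidel params template gains an `archive` section holding AWS credentials, a bucket name, a cache directory and an indicator prefix. How the indicator consumes these keys is not shown in this diff; the sketch below only checks that a rendered params file contains them. The function name, and the choice to treat a missing section as "archiving disabled" by returning `None`, are assumptions; the credential values are placeholders.

```python
import json
from typing import Optional

ARCHIVE_KEYS = ("aws_credentials", "bucket_name", "cache_dir", "indicator_prefix")
CREDENTIAL_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def archive_settings(params: dict) -> Optional[dict]:
    """Return the "archive" section of a rendered params dict after checking its keys.

    Assumption: a params file without an "archive" section means archiving is off.
    """
    archive = params.get("archive")
    if archive is None:
        return None
    missing = [key for key in ARCHIVE_KEYS if key not in archive]
    credentials = archive.get("aws_credentials", {})
    missing += [f"aws_credentials.{key}" for key in CREDENTIAL_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"archive section is missing: {', '.join(missing)}")
    return archive


rendered = json.loads("""
{
  "archive": {
    "aws_credentials": {
      "aws_access_key_id": "PLACEHOLDER_KEY_ID",
      "aws_secret_access_key": "PLACEHOLDER_SECRET"
    },
    "bucket_name": "delphi-covidcast-indicator-output",
    "cache_dir": "./archivediffer_cache",
    "indicator_prefix": "quidel"
  },
  "delivery": {"delivery_dir": "/common/covidcast/receiving/quidel"}
}
""")
settings = archive_settings(rendered)
print(settings["bucket_name"], settings["indicator_prefix"])
```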

quidel_covidtest/.pylintrc

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 disable=logging-format-interpolation,
         too-many-locals,
         too-many-arguments,
+        too-many-branches,
         # Allow pytest functions to be part of a class.
         no-self-use,
         # Allow pytest classes to have one test.

quidel_covidtest/delphi_quidel_covidtest/constants.py

Lines changed: 11 additions & 1 deletion
@@ -3,7 +3,7 @@
 MIN_OBS = 50 # minimum number of observations in order to compute a proportion.
 POOL_DAYS = 7 # number of days in the past (including today) to pool over
 END_FROM_TODAY_MINUS = 5 # report data until - X days
-# Signal names
+# Signal Types
 SMOOTHED_POSITIVE = "covid_ag_smoothed_pct_positive"
 RAW_POSITIVE = "covid_ag_raw_pct_positive"
 SMOOTHED_TEST_PER_DEVICE = "covid_ag_smoothed_test_per_device"
@@ -22,6 +22,7 @@
     HRR,
 ]
 
+# state should be last one
 NONPARENT_GEO_RESOLUTIONS = [
     HHS,
     NATION,
@@ -39,3 +40,12 @@
     # SMOOTHED_TEST_PER_DEVICE: (True, True),
     # RAW_TEST_PER_DEVICE: (True, False)
 }
+AGE_GROUPS = [
+    "total",
+    "age_0_4",
+    "age_5_17",
+    "age_18_49",
+    "age_50_64",
+    "age_65plus",
+    "age_0_17",
+]

quidel_covidtest/delphi_quidel_covidtest/data_tools.py

Lines changed: 11 additions & 9 deletions
@@ -67,15 +67,14 @@ def _slide_window_sum(arr, k):
     sarr = np.convolve(temp, np.ones(k, dtype=int), 'valid')
     return sarr
 
-
 def _geographical_pooling(tpooled_tests, tpooled_ptests, min_obs):
     """
     Determine how many samples from the parent geography must be borrowed.
 
-    If there are no samples available in the parent, the borrow_prop is 0. If
-    the parent does not have enough samples, we return a borrow_prop of 1, and
-    the fact that the pooled samples are insufficient are handled in the
-    statistic fitting step.
+    If there are no samples available in the parent, the borrow_prop is 0.
+    If the parent does not have enough samples, we return a borrow_prop of 1.
+    No more samples borrowed from the parent compared to the number of samples
+    we currently have.
 
     Args:
         tpooled_tests: np.ndarray[float]
@@ -93,10 +92,12 @@ def _geographical_pooling(tpooled_tests, tpooled_ptests, min_obs):
     """
     if (np.any(np.isnan(tpooled_tests)) or np.any(np.isnan(tpooled_ptests))):
         raise ValueError('[parent] tests should be non-negative '
-                         'with no np.nan')
+                         'with no np.nan')
     # STEP 1: "TOP UP" USING PARENT LOCATION
     # Number of observations we need to borrow to "top up"
+    # Can't borrow more than total no. observations.
     borrow_tests = np.maximum(min_obs - tpooled_tests, 0)
+    borrow_tests = np.minimum(borrow_tests, tpooled_tests)
     # There are many cases (a, b > 0):
     # Case 1: a / b => no problem
     # Case 2: a / 0 => np.inf => borrow_prop becomes 1
@@ -108,13 +109,14 @@ def _geographical_pooling(tpooled_tests, tpooled_ptests, min_obs):
     with np.errstate(divide='ignore', invalid='ignore'):
         borrow_prop = borrow_tests / tpooled_ptests
     # If there's nothing to borrow, then ya can't borrow
-    borrow_prop[np.isnan(borrow_prop)] = 0
-    # Can't borrow more than total no. observations.
+    borrow_prop[(np.isnan(borrow_prop))
+                | (tpooled_tests == 0)
+                | (tpooled_ptests == 0)] = 0
+    # Can't borrow more than total no. observations in the parent state
     # Relies on the fact that np.inf > 1
     borrow_prop[borrow_prop > 1] = 1
     return borrow_prop
 
-
 def raw_positive_prop(positives, tests, min_obs):
     """
     Calculate the proportion of positive tests without any temporal smoothing.
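The updated `_geographical_pooling` caps the number of tests borrowed from the parent at the location's own pooled test count (`np.minimum(borrow_tests, tpooled_tests)`) and sets `borrow_prop` to 0 whenever the location or the parent has zero tests. For example, with `min_obs = 50` a location with 10 tests and a parent with 100 now borrows 10 parent tests (0.1) instead of 40 (0.4). A parent with zero tests now yields 0, not the 1 described by the unchanged "Case 2" comment. The standalone sketch below restates that logic for illustration; it is renamed `geographical_pooling` so it is not mistaken for the module's private function.

```python
import numpy as np


def geographical_pooling(tpooled_tests, tpooled_ptests, min_obs):
    """Fraction of the parent's tests to borrow so each location nears min_obs.

    Restates the logic of the updated _geographical_pooling for illustration.
    """
    tpooled_tests = np.asarray(tpooled_tests, dtype=float)
    tpooled_ptests = np.asarray(tpooled_ptests, dtype=float)
    if np.any(np.isnan(tpooled_tests)) or np.any(np.isnan(tpooled_ptests)):
        raise ValueError("[parent] tests should be non-negative with no np.nan")
    # Tests needed to "top up" to min_obs, capped at the location's own test count.
    borrow_tests = np.maximum(min_obs - tpooled_tests, 0)
    borrow_tests = np.minimum(borrow_tests, tpooled_tests)
    with np.errstate(divide="ignore", invalid="ignore"):
        borrow_prop = borrow_tests / tpooled_ptests
    # Nothing to borrow: 0/0, an empty location, or an empty parent.
    borrow_prop[np.isnan(borrow_prop) | (tpooled_tests == 0) | (tpooled_ptests == 0)] = 0
    # Never borrow more than the parent has.
    borrow_prop[borrow_prop > 1] = 1
    return borrow_prop


# Cases: capped top-up, empty location, empty parent, already enough, small parent.
tests = [10, 0, 40, 60, 30]
parent_tests = [100, 200, 0, 500, 10]
print(geographical_pooling(tests, parent_tests, min_obs=50))
```

Working through the cases by hand: the first location needs 40 but may borrow at most its own 10 tests, giving 10/100; the last needs 20, which exceeds the parent's 10 tests, so the proportion is clipped to 1.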
