From ff2d3413ab202782ec1ec5dfff4a449af8f78a43 Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Mon, 6 May 2024 15:32:59 -0700 Subject: [PATCH 1/9] lint+doc: update geomap docs and minor lint --- .../data_proc/geomap/README.md | 31 ++-- .../data_proc/geomap/geo_data_proc.py | 8 +- _delphi_utils_python/delphi_utils/geomap.py | 135 ++++++++++-------- 3 files changed, 90 insertions(+), 84 deletions(-) diff --git a/_delphi_utils_python/data_proc/geomap/README.md b/_delphi_utils_python/data_proc/geomap/README.md index 08075fff9..38297b691 100644 --- a/_delphi_utils_python/data_proc/geomap/README.md +++ b/_delphi_utils_python/data_proc/geomap/README.md @@ -1,4 +1,4 @@ -# Geocoding data processing pipeline +# Geocoding Data Processing Authors: Jingjing Tang, James Sharpnack, Dmitry Shemetov @@ -7,42 +7,37 @@ Authors: Jingjing Tang, James Sharpnack, Dmitry Shemetov Requires the following source files below. Run the following to build the crosswalk tables in `covidcast-indicators/_delph_utils_python/delph_utils/data` -``` + +```sh $ python geo_data_proc.py ``` -You can see consistency checks and diffs with old sources in ./consistency_checks.ipynb +Find data consistency checks in `./source-file-sanity-check.ipynb`. ## Geo Codes We support the following geocodes. -- The ZIP code and the FIPS code are the most granular geocodes we support. - - The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is five digit code (with leading zeros). - - The FIPS code is a five digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information). 
-- The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html).
-  - We are reserving 10001-10099 for states codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used.
+- The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is a five digit code (with leading zeros).
+- The FIPS code is a five digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information).
+- The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html). We reserve 10001-10099 for state codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used.
 - State codes are a series of equivalent identifiers for US state. They include the state name, the state number (state_id), and the state two-letter abbreviation (state_code). The state number is the state FIPS code. See [here](https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations) for more.
 - The Hospital Referral Region (HRR) and the Hospital Service Area (HSA). 
More information [here](https://www.dartmouthatlas.org/covid-19/hrr-mapping/). -FIPS codes depart in some special cases, so we produce manual changes listed below. -## Source files +## Source Files The source files are requested from a government URL when `geo_data_proc.py` is run (see the top of said script for the URLs). Below we describe the locations to find updated versions of the source files, if they are ever needed. - ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two. As of 4 February 2022, this source did not include population information for 24 ZIPs that appear in our indicators. We have added those values manually using information available from the [zipdatamaps website](www.zipdatamaps.com). - ZIP -> HRR -> HSA crosswalk file comes from the 2018 version at the [Dartmouth Atlas Project](https://atlasdata.dartmouth.edu/static/supp_research_data). - FIPS -> MSA crosswalk file comes from the September 2018 version of the delineation files at the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). -- State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3). The first two digits of a FIPS codes should match the state code here. +- State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3). - -## Derived files +## Derived Files The rest of the crosswalk tables are derived from the mappings above. We provide crosswalk functions from granular to coarser codes, but not the other way around. 
This is because there is no information gained when crosswalking from coarse to granular. - - -## Deprecated source files +## Deprecated Source Files - ZIP to FIPS to HRR to states: `02_20_uszips.csv` comes from a version of the table [here](https://simplemaps.com/data/us-zips) modified by Jingjing to include population weights. - The `02_20_uszips.csv` file is based on the newest consensus data including 5-digit zipcode, fips code, county name, state, population, HRR, HSA (I downloaded the original file from [here](https://simplemaps.com/data/us-zips). This file matches best to the most recent (2020) situation in terms of the population. But there still exist some matching problems. I manually checked and corrected those lines (~20) with [zip-codes](https://www.zip-codes.com/zip-code/58439/zip-code-58439.asp). The mapping from 5-digit zipcode to HRR is based on the file in 2017 version downloaded from [here](https://atlasdata.dartmouth.edu/static/supp_research_data). @@ -51,7 +46,3 @@ The rest of the crosswalk tables are derived from the mappings above. We provide - CBSA -> FIPS crosswalk from [here](https://data.nber.org/data/cbsa-fips-county-crosswalk.html) (the file is `cbsatocountycrosswalk.csv`). - MSA tables from March 2020 [here](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszip file which Jingjing constructed. There are at least 10 additional fips in 03_20_msa that are not in the uszip file, and one of the msa codes seems to be incorrect: 49020 (a google search confirms that it is incorrect in uszip and correct in the census data). - MSA tables from 2019 [here](https://apps.bea.gov/regional/docs/msalist.cfm) - -## Notes - -- The NAs in the coding currently zero-fills. 
diff --git a/_delphi_utils_python/data_proc/geomap/geo_data_proc.py b/_delphi_utils_python/data_proc/geomap/geo_data_proc.py index c2a07a78f..5634d6f83 100755 --- a/_delphi_utils_python/data_proc/geomap/geo_data_proc.py +++ b/_delphi_utils_python/data_proc/geomap/geo_data_proc.py @@ -1,10 +1,7 @@ """ -Authors: Dmitry Shemetov @dshemetov, James Sharpnack @jsharpna - -Intended execution: +Authors: Dmitry Shemetov, James Sharpnack cd _delphi_utils/data_proc/geomap -chmod u+x geo_data_proc.py python geo_data_proc.py """ @@ -19,7 +16,7 @@ # Source files -YEAR = 2019 +YEAR = 2020 INPUT_DIR = "./old_source_files" OUTPUT_DIR = f"../../delphi_utils/data/{YEAR}" FIPS_BY_ZIP_POP_URL = "https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt?#" @@ -41,7 +38,6 @@ FIPS_HHS_FILENAME = "fips_hhs_table.csv" FIPS_CHNGFIPS_OUT_FILENAME = "fips_chng-fips_table.csv" FIPS_POPULATION_OUT_FILENAME = "fips_pop.csv" - CHNGFIPS_STATE_OUT_FILENAME = "chng-fips_state_table.csv" ZIP_HSA_OUT_FILENAME = "zip_hsa_table.csv" ZIP_HRR_OUT_FILENAME = "zip_hrr_table.csv" diff --git a/_delphi_utils_python/delphi_utils/geomap.py b/_delphi_utils_python/delphi_utils/geomap.py index 29ae3667e..c928a2bf5 100644 --- a/_delphi_utils_python/delphi_utils/geomap.py +++ b/_delphi_utils_python/delphi_utils/geomap.py @@ -18,54 +18,89 @@ class GeoMapper: # pylint: disable=too-many-public-methods The GeoMapper class provides utility functions for translating between different geocodes. 
Supported geocodes: - - zip: zip5, a length 5 str of 0-9 with leading 0's - - fips: state code and county code, a length 5 str of 0-9 with leading 0's - - msa: metropolitan statistical area, a length 5 str of 0-9 with leading 0's - - state_code: state code, a str of 0-9 - - state_id: state id, a str of A-Z - - hrr: hospital referral region, an int 1-500 - - Mappings: - - [x] zip -> fips : population weighted - - [x] zip -> hrr : unweighted - - [x] zip -> msa : unweighted - - [x] zip -> state - - [x] zip -> hhs - - [x] zip -> population - - [x] state code -> hhs - - [x] fips -> state : unweighted - - [x] fips -> msa : unweighted - - [x] fips -> megacounty - - [x] fips -> hrr - - [x] fips -> hhs - - [x] fips -> chng-fips - - [x] chng-fips -> state : unweighted - - [x] nation - - [ ] zip -> dma (postponed) - - The GeoMapper instance loads crosswalk tables from the package data_dir. The - crosswalk tables are assumed to have been built using the geo_data_proc.py script - in data_proc/geomap. If a mapping between codes is NOT one to many, then the table has - just two colums. If the mapping IS one to many, then a third column, the weight column, - exists (e.g. zip, fips, weight; satisfying (sum(weights) where zip==ZIP) == 1). + + - zip: five characters [0-9] with leading 0's, e.g. "33626" + also known as zip5 or zip code + - fips: five characters [0-9] with leading 0's, e.g. "12057" + the first two digits are the state FIPS code and the last + three are the county FIPS code + - msa: five characters [0-9] with leading 0's, e.g. 
"90001" + also known as metropolitan statistical area + - state_code: two characters [0-9], e.g "06" + - state_id: two characters [A-Z], e.g "CA" + - state_name: human-readable name, e.g "California" + - hrr: an integer from 1-500, also known as hospital + referral region + - hhs: an integer from 1-10, also known as health and human services region + https://www.hhs.gov/about/agencies/iea/regional-offices/index.html + + Valid mappings: + + From To Population Weighted + zip fips Yes + zip hrr No + zip msa Yes + zip state_* Yes + zip hhs Yes + zip population -- + zip nation No + state_* state_* No + state_* hhs No + state_* population -- + state_* nation No + fips state_* No + fips msa No + fips megacounty No + fips hrr Yes + fips hhs No + fips chng-fips No + fips nation No + chng-fips state_* No + + Crosswalk Tables + ================ + + The GeoMapper instance loads pre-generated crosswalk tables (built by the + script in `data_proc/geomap/geo_data_proc.py`). If a mapping between codes + is one to one or many to one, then the table has just two columns. If the + mapping is one to many, then a weight column is provided, which gives the + fractional population contribution of a source_geo to the target_geo. The + weights satisfy the condition that df.groupby(from_code).sum(weight) == 1.0 + for all values of from_code. + + Aggregation + =========== + + The GeoMapper class provides functions to aggregate data from one geocode + to another. The aggregation can be a simple one-to-one mapping or a + weighted aggregation. The weighted aggregation is useful when the data + being aggregated is a population-weighted quantity, such as visits or + cases. The aggregation is done by multiplying the data columns by the + weights and summing over the data columns. Note that the aggregation does + not adjust the aggregation for missing or NA values in the data columns, + which is equivalent to a zero-fill. 
Example Usage - ========== + ============= The main GeoMapper object loads and stores crosswalk dataframes on-demand. - When replacing geocodes with a new one an aggregation step is performed on the data columns - to merge entries (i.e. in the case of a many to one mapping or a weighted mapping). This - requires a specification of the data columns, which are assumed to be all the columns that - are not the geocodes or the date column specified in date_col. + When replacing geocodes with a new one an aggregation step is performed on + the data columns to merge entries (i.e. in the case of a many to one + mapping or a weighted mapping). This requires a specification of the data + columns, which are assumed to be all the columns that are not the geocodes + or the date column specified in date_col. Example 1: to add a new column with a new geocode, possibly with weights: > gmpr = GeoMapper() - > df = gmpr.add_geocode(df, "fips", "zip", from_col="fips", new_col="geo_id", + > df = gmpr.add_geocode(df, "fips", "zip", + from_col="fips", new_col="geo_id", date_col="timestamp", dropna=False) - Example 2: to replace a geocode column with a new one, aggregating the data with weights: + Example 2: to replace a geocode column with a new one, aggregating the data + with weights: > gmpr = GeoMapper() - > df = gmpr.replace_geocode(df, "fips", "zip", from_col="fips", new_col="geo_id", + > df = gmpr.replace_geocode(df, "fips", "zip", + from_col="fips", new_col="geo_id", date_col="timestamp", dropna=False) """ @@ -113,7 +148,7 @@ def __init__(self, census_year: int = 2020): subkey for mainkey in self.CROSSWALK_FILENAMES for subkey in self.CROSSWALK_FILENAMES[mainkey] - }.union(set(self.CROSSWALK_FILENAMES.keys())) - set(["state", "pop"]) + }.union(set(self.CROSSWALK_FILENAMES.keys())) - {"state", "pop"} for from_code, to_codes in self.CROSSWALK_FILENAMES.items(): for to_code, file_path in to_codes.items(): @@ -135,7 +170,6 @@ def _load_crosswalk_from_file( "weight": float, **{geo: str 
for geo in self._geos - set("nation")}, } - usecols = [from_code, "pop"] if to_code == "pop" else None return pd.read_csv(stream, dtype=dtype, usecols=usecols) @@ -229,13 +263,6 @@ def add_geocode( ): """Add a new geocode column to a dataframe. - Currently supported conversions: - - fips -> state_code, state_id, state_name, zip, msa, hrr, nation, hhs, chng-fips - - chng-fips -> state_code, state_id, state_name - - zip -> state_code, state_id, state_name, fips, msa, hrr, nation, hhs - - state_x -> state_y (where x and y are in {code, id, name}), nation - - state_code -> hhs, nation - Parameters --------- df: pd.DataFrame @@ -303,7 +330,7 @@ def add_geocode( df = df.merge(crosswalk, left_on=from_col, right_on=from_col, how="left") # Drop extra state columns - if new_code in state_codes and not from_code in state_codes: + if new_code in state_codes and from_code not in state_codes: state_codes.remove(new_code) df.drop(columns=state_codes, inplace=True) elif new_code in state_codes and from_code in state_codes: @@ -345,13 +372,6 @@ def replace_geocode( ) -> pd.DataFrame: """Replace a geocode column in a dataframe. - Currently supported conversions: - - fips -> chng-fips, state_code, state_id, state_name, zip, msa, hrr, nation - - chng-fips -> state_code, state_id, state_name - - zip -> state_code, state_id, state_name, fips, msa, hrr, nation - - state_x -> state_y (where x and y are in {code, id, name}), nation - - state_code -> hhs, nation - Parameters --------- df: pd.DataFrame @@ -397,7 +417,7 @@ def replace_geocode( df[data_cols] = df[data_cols].multiply(df["weight"], axis=0) df.drop("weight", axis=1, inplace=True) - if not date_col is None: + if date_col is not None: df = df.groupby([date_col, new_col]).sum(numeric_only=True).reset_index() else: df = df.groupby([new_col]).sum(numeric_only=True).reset_index() @@ -575,8 +595,7 @@ def get_geos_within( Return all contained regions of the given type within the given container geocode. 
Given container_geocode (e.g "ca" for California) of type container_geocode_type - (e.g "state"), return: - - all (contained_geocode_type)s within container_geocode + (e.g "state"), return all (contained_geocode_type)s within container_geocode. Supports these 4 combinations: - all states within a nation From a7fbb3e744a5f5850d993b5d69bb97e895603c4e Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Mon, 6 May 2024 15:33:15 -0700 Subject: [PATCH 2/9] feat(geomap): add aggregate_by_weighted_sum --- _delphi_utils_python/delphi_utils/geomap.py | 43 +++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/_delphi_utils_python/delphi_utils/geomap.py b/_delphi_utils_python/delphi_utils/geomap.py index c928a2bf5..1007d1ff6 100644 --- a/_delphi_utils_python/delphi_utils/geomap.py +++ b/_delphi_utils_python/delphi_utils/geomap.py @@ -646,3 +646,46 @@ def get_geos_within( "must be one of (state, nation), (state, hhs), (county, state)" ", (fips, state), (chng-fips, state)" ) + + def aggregate_by_weighted_sum( + self, df: pd.DataFrame, to_geo: str, sensor: str, population_column: str + ) -> pd.DataFrame: + """Aggregate sensor, weighted by time-dependent population. + + Note: This function generates its own population weights and adjusts the + weights based on which data is NA. This is in contrast to the + `replace_geocode` function, which assumes that the weights are already + present in the data and does not adjust for missing data (see the + docstring for the GeoMapper class). + + Parameters + --------- + df: pd.DataFrame + Input dataframe, assumed to have a sensor column (e.g. "visits"), a + to_geo column (e.g. "state"), and a population column (corresponding + to a from_geo, e.g. "wastewater collection site"). + to_geo: str + The column name of the geocode to aggregate to. + sensor: str + The column name of the sensor to aggregate. + population_column: str + The column name of the population to weight the sensor by. 
+ + Returns + --------- + agg_df: pd.DataFrame + A dataframe with the aggregated sensor values, weighted by population. + """ + # Zero-out populations where the sensor is NA + df[f"relevant_pop_{sensor}"] = df[population_column] * df[sensor].abs().notna() + # Weight the sensor by the population + df[f"weighted_{sensor}"] = df[sensor] * df[f"relevant_pop_{sensor}"] + agg_df = df.groupby(["timestamp", to_geo]).agg( + { + f"relevant_pop_{sensor}": "sum", + f"weighted_{sensor}": lambda x: x.sum(min_count=1), + } + ) + agg_df["val"] = agg_df[f"weighted_{sensor}"] / agg_df[f"relevant_pop_{sensor}"] + agg_df = agg_df.reset_index() + return agg_df From 7359bf9281ba1479894ef89a42545c9a5b8dc2af Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Tue, 7 May 2024 10:17:39 -0700 Subject: [PATCH 3/9] Update _delphi_utils_python/delphi_utils/geomap.py --- _delphi_utils_python/delphi_utils/geomap.py | 1 + 1 file changed, 1 insertion(+) diff --git a/_delphi_utils_python/delphi_utils/geomap.py b/_delphi_utils_python/delphi_utils/geomap.py index 1007d1ff6..bb5f7da83 100644 --- a/_delphi_utils_python/delphi_utils/geomap.py +++ b/_delphi_utils_python/delphi_utils/geomap.py @@ -29,6 +29,7 @@ class GeoMapper: # pylint: disable=too-many-public-methods - state_code: two characters [0-9], e.g "06" - state_id: two characters [A-Z], e.g "CA" - state_name: human-readable name, e.g "California" + - state_*: we use this below to refer to the three above geocodes in aggregate - hrr: an integer from 1-500, also known as hospital referral region - hhs: an integer from 1-10, also known as health and human services region From c249a3ab8df436a0652225a9d11e3288207305a1 Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Tue, 7 May 2024 10:19:02 -0700 Subject: [PATCH 4/9] Update _delphi_utils_python/delphi_utils/geomap.py --- _delphi_utils_python/delphi_utils/geomap.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_delphi_utils_python/delphi_utils/geomap.py 
b/_delphi_utils_python/delphi_utils/geomap.py index bb5f7da83..da8cbe772 100644 --- a/_delphi_utils_python/delphi_utils/geomap.py +++ b/_delphi_utils_python/delphi_utils/geomap.py @@ -263,6 +263,8 @@ def add_geocode( dropna: bool = True, ): """Add a new geocode column to a dataframe. + + See class docstring for supported geocode transformations. Parameters --------- From 577a41ef89997554db99f38e96bca168e625eec2 Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Tue, 7 May 2024 10:19:39 -0700 Subject: [PATCH 5/9] Update _delphi_utils_python/delphi_utils/geomap.py --- _delphi_utils_python/delphi_utils/geomap.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_delphi_utils_python/delphi_utils/geomap.py b/_delphi_utils_python/delphi_utils/geomap.py index da8cbe772..8d746ce54 100644 --- a/_delphi_utils_python/delphi_utils/geomap.py +++ b/_delphi_utils_python/delphi_utils/geomap.py @@ -374,6 +374,8 @@ def replace_geocode( dropna: bool = True, ) -> pd.DataFrame: """Replace a geocode column in a dataframe. + + See class docstring for supported geocode transformations. Parameters --------- From 912f58dbf964686b62bfed65f0c28d3ab6650174 Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Tue, 7 May 2024 10:23:22 -0700 Subject: [PATCH 6/9] Update _delphi_utils_python/delphi_utils/geomap.py --- _delphi_utils_python/delphi_utils/geomap.py | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/_delphi_utils_python/delphi_utils/geomap.py b/_delphi_utils_python/delphi_utils/geomap.py index 8d746ce54..d3a150f37 100644 --- a/_delphi_utils_python/delphi_utils/geomap.py +++ b/_delphi_utils_python/delphi_utils/geomap.py @@ -657,11 +657,11 @@ def aggregate_by_weighted_sum( ) -> pd.DataFrame: """Aggregate sensor, weighted by time-dependent population. - Note: This function generates its own population weights and adjusts the - weights based on which data is NA. 
This is in contrast to the - `replace_geocode` function, which assumes that the weights are already - present in the data and does not adjust for missing data (see the - docstring for the GeoMapper class). + Note: This function generates its own population weights and excludes + locations where the data is NA, which is effectively an extrapolation assumption + to the rest of the geos. This is in contrast to the `replace_geocode` function, + which assumes that the weights are already present in the data and does + not adjust for missing data (see the docstring for the GeoMapper class). Parameters --------- From 511322b0358f4170733ba8bd6f9d1b5dfe069c1a Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Tue, 7 May 2024 15:47:15 -0700 Subject: [PATCH 7/9] feat(geomap): fix and test aggregate_by_weighted_sum --- _delphi_utils_python/delphi_utils/geomap.py | 37 +++++---- _delphi_utils_python/tests/test_geomap.py | 86 +++++++++++++++++++++ 2 files changed, 109 insertions(+), 14 deletions(-) diff --git a/_delphi_utils_python/delphi_utils/geomap.py b/_delphi_utils_python/delphi_utils/geomap.py index d3a150f37..a313c754c 100644 --- a/_delphi_utils_python/delphi_utils/geomap.py +++ b/_delphi_utils_python/delphi_utils/geomap.py @@ -263,7 +263,7 @@ def add_geocode( dropna: bool = True, ): """Add a new geocode column to a dataframe. - + See class docstring for supported geocode transformations. Parameters @@ -374,7 +374,7 @@ def replace_geocode( dropna: bool = True, ) -> pd.DataFrame: """Replace a geocode column in a dataframe. - + See class docstring for supported geocode transformations. Parameters @@ -653,15 +653,16 @@ def get_geos_within( ) def aggregate_by_weighted_sum( - self, df: pd.DataFrame, to_geo: str, sensor: str, population_column: str + self, df: pd.DataFrame, to_geo: str, sensor_col: str, time_col: str, population_col: str ) -> pd.DataFrame: """Aggregate sensor, weighted by time-dependent population. 
Note: This function generates its own population weights and excludes - locations where the data is NA, which is effectively an extrapolation assumption - to the rest of the geos. This is in contrast to the `replace_geocode` function, - which assumes that the weights are already present in the data and does - not adjust for missing data (see the docstring for the GeoMapper class). + locations where the data is NA, which is effectively an extrapolation + assumption to the rest of the geos. This is in contrast to the + `replace_geocode` function, which assumes that the weights are already + present in the data and does not adjust for missing data (see the + docstring for the GeoMapper class). Parameters --------- @@ -681,16 +682,24 @@ def aggregate_by_weighted_sum( agg_df: pd.DataFrame A dataframe with the aggregated sensor values, weighted by population. """ + # Don't modify the input dataframe + df = df.copy() # Zero-out populations where the sensor is NA - df[f"relevant_pop_{sensor}"] = df[population_column] * df[sensor].abs().notna() + df["_zeroed_pop"] = df[population_col] * df[sensor_col].abs().notna() # Weight the sensor by the population - df[f"weighted_{sensor}"] = df[sensor] * df[f"relevant_pop_{sensor}"] - agg_df = df.groupby(["timestamp", to_geo]).agg( + df["_weighted_sensor"] = df[sensor_col] * df["_zeroed_pop"] + agg_df = ( + df.groupby([time_col, to_geo]) + .agg( { - f"relevant_pop_{sensor}": "sum", - f"weighted_{sensor}": lambda x: x.sum(min_count=1), + "_zeroed_pop": "sum", + "_weighted_sensor": lambda x: x.sum(min_count=1), } + ).assign( + _new_sensor = lambda x: x["_weighted_sensor"] / x["_zeroed_pop"] + ).reset_index() + .rename(columns={"_new_sensor": f"weighted_{sensor_col}"}) + .drop(columns=["_zeroed_pop", "_weighted_sensor"]) ) - agg_df["val"] = agg_df[f"weighted_{sensor}"] / agg_df[f"relevant_pop_{sensor}"] - agg_df = agg_df.reset_index() + return agg_df diff --git a/_delphi_utils_python/tests/test_geomap.py 
b/_delphi_utils_python/tests/test_geomap.py index ab86c143d..f29cb9b65 100644 --- a/_delphi_utils_python/tests/test_geomap.py +++ b/_delphi_utils_python/tests/test_geomap.py @@ -395,3 +395,89 @@ def test_census_year_pop(self, geomapper, geomapper_2019): df = pd.DataFrame({"fips": ["01001"]}) assert geomapper.add_population_column(df, "fips").population[0] == 56145 assert geomapper_2019.add_population_column(df, "fips").population[0] == 55869 + + def test_aggregate_by_weighted_sum(self, geomapper: GeoMapper): + df = pd.DataFrame( + { + "timestamp": [0] * 7, + "state": ["al", "al", "ca", "ca", "nd", "me", "me"], + "a": [1, 2, 3, 4, 12, -2, 2], + "b": [5, 6, 7, np.nan, np.nan, -1, -2], + "population_served": [10, 5, 8, 1, 3, 1, 2], + } + ) + agg_df = geomapper.aggregate_by_weighted_sum( + df, + to_geo="state", + sensor_col="a", + time_col = "timestamp", + population_col="population_served" + ) + agg_df_by_hand = pd.DataFrame( + { + "timestamp": [0] * 4, + "state": ["al", "ca", "me", "nd"], + "weighted_a": [ + (1 * 10 + 2 * 5) / 15, + (3 * 8 + 4 * 1) / 9, + (-2 * 1 + 2 * 2) / 3, + (12 * 3) / 3, + ] + } + ) + pd.testing.assert_frame_equal(agg_df, agg_df_by_hand) + agg_df = geomapper.aggregate_by_weighted_sum( + df, + to_geo="state", + sensor_col="b", + time_col = "timestamp", + population_col="population_served" + ) + agg_df_by_hand = pd.DataFrame( + { + "timestamp": [0] * 4, + "state": ["al", "ca", "me", "nd"], + "weighted_b": [ + (5 * 10 + 6 * 5) / 15, + (7 * 8 + 4 * 0) / 8, + (-1 * 1 + -2 * 2) / 3, + (np.nan) / 3, + ] + } + ) + pd.testing.assert_frame_equal(agg_df, agg_df_by_hand) + + df = pd.DataFrame( + { + "state": [ + "al", + "al", + "ca", + "ca", + "nd", + ], + "nation": ["us"] * 5, + "timestamp": [0] * 3 + [1] * 2, + "a": [1, 2, 3, 4, 12], + "b": [5, 6, 7, np.nan, np.nan], + "population_served": [10, 5, 8, 1, 3], + } + ) + agg_df = geomapper.aggregate_by_weighted_sum( + df, + to_geo="nation", + sensor_col="a", + time_col = "timestamp", + 
population_col="population_served" + ) + agg_df_by_hand = pd.DataFrame( + { + "timestamp": [0, 1], + "nation": ["us"] * 2, + "weighted_a": [ + (1 * 10 + 2 * 5 + 3 * 8) / 23, + (1 * 4 + 3 * 12) / 4 + ] + } + ) + pd.testing.assert_frame_equal(agg_df, agg_df_by_hand) From 79072dcdec3faca9aaeeea65de83f7fa5c00d53f Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Tue, 7 May 2024 15:55:07 -0700 Subject: [PATCH 8/9] lint: format test_geomap --- _delphi_utils_python/tests/test_geomap.py | 206 +++++++++++++++------- 1 file changed, 139 insertions(+), 67 deletions(-) diff --git a/_delphi_utils_python/tests/test_geomap.py b/_delphi_utils_python/tests/test_geomap.py index f29cb9b65..c968fd359 100644 --- a/_delphi_utils_python/tests/test_geomap.py +++ b/_delphi_utils_python/tests/test_geomap.py @@ -10,10 +10,12 @@ def geomapper(): return GeoMapper(census_year=2020) + @pytest.fixture(scope="class") def geomapper_2019(): return GeoMapper(census_year=2019) + class TestGeoMapper: fips_data = pd.DataFrame( { @@ -34,7 +36,8 @@ class TestGeoMapper: fips_data_3 = pd.DataFrame( { "fips": ["48059", "48253", "48441", "72003", "72005", "10999"], - "timestamp": [pd.Timestamp("2018-01-01")] * 3 + [pd.Timestamp("2018-01-03")] * 3, + "timestamp": [pd.Timestamp("2018-01-01")] * 3 + + [pd.Timestamp("2018-01-03")] * 3, "count": [1, 2, 3, 4, 8, 5], "total": [2, 4, 7, 11, 100, 10], } @@ -58,7 +61,8 @@ class TestGeoMapper: zip_data = pd.DataFrame( { "zip": ["45140", "95616", "95618"] * 2, - "timestamp": [pd.Timestamp("2018-01-01")] * 3 + [pd.Timestamp("2018-01-03")] * 3, + "timestamp": [pd.Timestamp("2018-01-01")] * 3 + + [pd.Timestamp("2018-01-03")] * 3, "count": [99, 345, 456, 100, 344, 442], } ) @@ -132,7 +136,7 @@ class TestGeoMapper: ) # Loading tests updated 8/26 - def test_crosswalks(self, geomapper): + def test_crosswalks(self, geomapper: GeoMapper): # These tests ensure that the one-to-many crosswalks have properly normalized weights # FIPS -> HRR is allowed to be an incomplete mapping, 
since only a fraction of a FIPS # code can not belong to an HRR @@ -152,33 +156,32 @@ def test_crosswalks(self, geomapper): cw = geomapper.get_crosswalk(from_code="zip", to_code="hhs") assert cw.groupby("zip")["weight"].sum().round(5).eq(1.0).all() - - def test_load_zip_fips_table(self, geomapper): + def test_load_zip_fips_table(self, geomapper: GeoMapper): fips_data = geomapper.get_crosswalk(from_code="zip", to_code="fips") assert set(fips_data.columns) == set(["zip", "fips", "weight"]) assert pd.api.types.is_string_dtype(fips_data.zip) assert pd.api.types.is_string_dtype(fips_data.fips) assert pd.api.types.is_float_dtype(fips_data.weight) - def test_load_state_table(self, geomapper): + def test_load_state_table(self, geomapper: GeoMapper): state_data = geomapper.get_crosswalk(from_code="state", to_code="state") assert tuple(state_data.columns) == ("state_code", "state_id", "state_name") assert state_data.shape[0] == 60 - def test_load_fips_msa_table(self, geomapper): + def test_load_fips_msa_table(self, geomapper: GeoMapper): msa_data = geomapper.get_crosswalk(from_code="fips", to_code="msa") assert tuple(msa_data.columns) == ("fips", "msa") - def test_load_fips_chngfips_table(self, geomapper): + def test_load_fips_chngfips_table(self, geomapper: GeoMapper): chngfips_data = geomapper.get_crosswalk(from_code="fips", to_code="chng-fips") assert tuple(chngfips_data.columns) == ("fips", "chng-fips") - def test_load_zip_hrr_table(self, geomapper): + def test_load_zip_hrr_table(self, geomapper: GeoMapper): zip_data = geomapper.get_crosswalk(from_code="zip", to_code="hrr") assert pd.api.types.is_string_dtype(zip_data["zip"]) assert pd.api.types.is_string_dtype(zip_data["hrr"]) - def test_megacounty(self, geomapper): + def test_megacounty(self, geomapper: GeoMapper): new_data = geomapper.fips_to_megacounty(self.mega_data, 6, 50) assert ( new_data[["count", "visits"]].sum() @@ -204,12 +207,18 @@ def test_megacounty(self, geomapper): "count": [8, 7, 3, 10021], } ) - 
pd.testing.assert_frame_equal(new_data.set_index("megafips").sort_index(axis=1), expected_df.set_index("megafips").sort_index(axis=1)) + pd.testing.assert_frame_equal( + new_data.set_index("megafips").sort_index(axis=1), + expected_df.set_index("megafips").sort_index(axis=1), + ) # chng-fips should have the same behavior when converting to megacounties. mega_county_groups = self.mega_data_3.copy() - mega_county_groups.fips.replace({1125:"01g01"}, inplace = True) + mega_county_groups.fips.replace({1125: "01g01"}, inplace=True) new_data = geomapper.fips_to_megacounty(self.mega_data_3, 4, 1) - pd.testing.assert_frame_equal(new_data.set_index("megafips").sort_index(axis=1), expected_df.set_index("megafips").sort_index(axis=1)) + pd.testing.assert_frame_equal( + new_data.set_index("megafips").sort_index(axis=1), + expected_df.set_index("megafips").sort_index(axis=1), + ) new_data = geomapper.fips_to_megacounty(self.mega_data_3, 4, 1, thr_col="count") expected_df = pd.DataFrame( @@ -220,14 +229,20 @@ def test_megacounty(self, geomapper): "count": [6, 5, 7, 10021], } ) - pd.testing.assert_frame_equal(new_data.set_index("megafips").sort_index(axis=1), expected_df.set_index("megafips").sort_index(axis=1)) + pd.testing.assert_frame_equal( + new_data.set_index("megafips").sort_index(axis=1), + expected_df.set_index("megafips").sort_index(axis=1), + ) # chng-fips should have the same behavior when converting to megacounties. 
mega_county_groups = self.mega_data_3.copy() - mega_county_groups.fips.replace({1123:"01g01"}, inplace = True) + mega_county_groups.fips.replace({1123: "01g01"}, inplace=True) new_data = geomapper.fips_to_megacounty(self.mega_data_3, 4, 1, thr_col="count") - pd.testing.assert_frame_equal(new_data.set_index("megafips").sort_index(axis=1), expected_df.set_index("megafips").sort_index(axis=1)) + pd.testing.assert_frame_equal( + new_data.set_index("megafips").sort_index(axis=1), + expected_df.set_index("megafips").sort_index(axis=1), + ) - def test_add_population_column(self, geomapper): + def test_add_population_column(self, geomapper: GeoMapper): new_data = geomapper.add_population_column(self.fips_data_3, "fips") assert new_data.shape == (5, 5) new_data = geomapper.add_population_column(self.zip_data, "zip") @@ -245,14 +260,18 @@ def test_add_population_column(self, geomapper): new_data = geomapper.add_population_column(self.nation_data, "nation") assert new_data.shape == (1, 3) - def test_add_geocode(self, geomapper): + def test_add_geocode(self, geomapper: GeoMapper): # state_code -> nation new_data = geomapper.add_geocode(self.zip_data, "zip", "state_code") new_data2 = geomapper.add_geocode(new_data, "state_code", "nation") assert new_data2["nation"].unique()[0] == "us" new_data = geomapper.replace_geocode(self.zip_data, "zip", "state_code") - new_data2 = geomapper.add_geocode(new_data, "state_code", "state_id", new_col="state") - new_data3 = geomapper.replace_geocode(new_data2, "state_code", "nation", new_col="geo_id") + new_data2 = geomapper.add_geocode( + new_data, "state_code", "state_id", new_col="state" + ) + new_data3 = geomapper.replace_geocode( + new_data2, "state_code", "nation", new_col="geo_id" + ) assert "state" not in new_data3.columns # state_code -> hhs @@ -264,11 +283,15 @@ def test_add_geocode(self, geomapper): new_data = geomapper.replace_geocode(self.zip_data, "zip", "state_name") new_data2 = geomapper.add_geocode(new_data, "state_name", 
"state_id") assert new_data2.shape == (4, 5) - new_data2 = geomapper.replace_geocode(new_data, "state_name", "state_id", new_col="abbr") + new_data2 = geomapper.replace_geocode( + new_data, "state_name", "state_id", new_col="abbr" + ) assert "abbr" in new_data2.columns # fips -> nation - new_data = geomapper.replace_geocode(self.fips_data_5, "fips", "nation", new_col="NATION") + new_data = geomapper.replace_geocode( + self.fips_data_5, "fips", "nation", new_col="NATION" + ) pd.testing.assert_frame_equal( new_data, pd.DataFrame().from_dict( @@ -278,15 +301,25 @@ def test_add_geocode(self, geomapper): "count": {0: 10024.0}, "total": {0: 100006.0}, } - ) + ), ) # fips -> chng-fips new_data = geomapper.add_geocode(self.fips_data_5, "fips", "chng-fips") - assert sorted(list(new_data["chng-fips"])) == ['01123', '18181', '48g19', '72003'] + assert sorted(list(new_data["chng-fips"])) == [ + "01123", + "18181", + "48g19", + "72003", + ] assert new_data["chng-fips"].size == self.fips_data_5.fips.size new_data = geomapper.replace_geocode(self.fips_data_5, "fips", "chng-fips") - assert sorted(list(new_data["chng-fips"])) == ['01123', '18181', '48g19', '72003'] + assert sorted(list(new_data["chng-fips"])) == [ + "01123", + "18181", + "48g19", + "72003", + ] assert new_data["chng-fips"].size == self.fips_data_5.fips.size # chng-fips -> state_id @@ -294,12 +327,12 @@ def test_add_geocode(self, geomapper): new_data2 = geomapper.add_geocode(new_data, "chng-fips", "state_id") assert new_data2["state_id"].unique().size == 4 assert new_data2["state_id"].size == self.fips_data_5.fips.size - assert sorted(list(new_data2["state_id"])) == ['al', 'in', 'pr', 'tx'] + assert sorted(list(new_data2["state_id"])) == ["al", "in", "pr", "tx"] new_data2 = geomapper.replace_geocode(new_data, "chng-fips", "state_id") assert new_data2["state_id"].unique().size == 4 assert new_data2["state_id"].size == 4 - assert sorted(list(new_data2["state_id"])) == ['al', 'in', 'pr', 'tx'] + assert 
sorted(list(new_data2["state_id"])) == ["al", "in", "pr", "tx"] # zip -> nation new_data = geomapper.replace_geocode(self.zip_data, "zip", "nation") @@ -315,7 +348,7 @@ def test_add_geocode(self, geomapper): "count": {0: 900, 1: 886}, "total": {0: 1800, 1: 1772}, } - ) + ), ) # hrr -> nation @@ -324,53 +357,84 @@ def test_add_geocode(self, geomapper): new_data2 = geomapper.replace_geocode(new_data, "hrr", "nation") # fips -> hrr (dropna=True/False check) - assert not geomapper.add_geocode(self.fips_data_3, "fips", "hrr").isna().any().any() - assert geomapper.add_geocode(self.fips_data_3, "fips", "hrr", dropna=False).isna().any().any() + assert ( + not geomapper.add_geocode(self.fips_data_3, "fips", "hrr") + .isna() + .any() + .any() + ) + assert ( + geomapper.add_geocode(self.fips_data_3, "fips", "hrr", dropna=False) + .isna() + .any() + .any() + ) # fips -> zip (date_col=None chech) - new_data = geomapper.replace_geocode(self.fips_data_5.drop(columns=["timestamp"]), "fips", "hrr", date_col=None) + new_data = geomapper.replace_geocode( + self.fips_data_5.drop(columns=["timestamp"]), "fips", "hrr", date_col=None + ) pd.testing.assert_frame_equal( new_data, pd.DataFrame().from_dict( { - 'hrr': {0: '1', 1: '183', 2: '184', 3: '382', 4: '7'}, - 'count': {0: 1.772347174163783, 1: 7157.392403522299, 2: 2863.607596477701, 3: 1.0, 4: 0.22765282583621685}, - 'total': {0: 3.544694348327566, 1: 71424.64801363471, 2: 28576.35198636529, 3: 1.0, 4: 0.4553056516724337} + "hrr": {0: "1", 1: "183", 2: "184", 3: "382", 4: "7"}, + "count": { + 0: 1.772347174163783, + 1: 7157.392403522299, + 2: 2863.607596477701, + 3: 1.0, + 4: 0.22765282583621685, + }, + "total": { + 0: 3.544694348327566, + 1: 71424.64801363471, + 2: 28576.35198636529, + 3: 1.0, + 4: 0.4553056516724337, + }, } - ) + ), ) # fips -> hhs - new_data = geomapper.replace_geocode(self.fips_data_3.drop(columns=["timestamp"]), - "fips", "hhs", date_col=None) + new_data = geomapper.replace_geocode( + 
self.fips_data_3.drop(columns=["timestamp"]), "fips", "hhs", date_col=None + ) pd.testing.assert_frame_equal( new_data, pd.DataFrame().from_dict( { "hhs": {0: "2", 1: "6"}, "count": {0: 12, 1: 6}, - "total": {0: 111, 1: 13} + "total": {0: 111, 1: 13}, } - ) + ), ) # zip -> hhs new_data = geomapper.replace_geocode(self.zip_data, "zip", "hhs") - new_data = new_data.round(10) # get rid of a floating point error with 99.00000000000001 + new_data = new_data.round( + 10 + ) # get rid of a floating point error with 99.00000000000001 pd.testing.assert_frame_equal( new_data, pd.DataFrame().from_dict( { - "timestamp": {0: pd.Timestamp("2018-01-01"), 1: pd.Timestamp("2018-01-01"), - 2: pd.Timestamp("2018-01-03"), 3: pd.Timestamp("2018-01-03")}, + "timestamp": { + 0: pd.Timestamp("2018-01-01"), + 1: pd.Timestamp("2018-01-01"), + 2: pd.Timestamp("2018-01-03"), + 3: pd.Timestamp("2018-01-03"), + }, "hhs": {0: "5", 1: "9", 2: "5", 3: "9"}, "count": {0: 99.0, 1: 801.0, 2: 100.0, 3: 786.0}, - "total": {0: 198.0, 1: 1602.0, 2: 200.0, 3: 1572.0} + "total": {0: 198.0, 1: 1602.0, 2: 200.0, 3: 1572.0}, } - ) + ), ) - def test_get_geos(self, geomapper): + def test_get_geos(self, geomapper: GeoMapper): assert geomapper.get_geo_values("nation") == {"us"} assert geomapper.get_geo_values("hhs") == set(str(i) for i in range(1, 11)) assert len(geomapper.get_geo_values("fips")) == 3293 @@ -378,20 +442,31 @@ def test_get_geos(self, geomapper): assert len(geomapper.get_geo_values("state_id")) == 60 assert len(geomapper.get_geo_values("zip")) == 32976 - def test_get_geos_2019(self, geomapper_2019): + def test_get_geos_2019(self, geomapper_2019: GeoMapper): assert len(geomapper_2019.get_geo_values("fips")) == 3292 assert len(geomapper_2019.get_geo_values("chng-fips")) == 2710 - def test_get_geos_within(self, geomapper): - assert len(geomapper.get_geos_within("us","state","nation")) == 60 - assert len(geomapper.get_geos_within("al","county","state")) == 68 - assert 
len(geomapper.get_geos_within("al","fips","state")) == 68 - assert geomapper.get_geos_within("al","fips","state") == geomapper.get_geos_within("al","county","state") - assert len(geomapper.get_geos_within("al","chng-fips","state")) == 66 - assert len(geomapper.get_geos_within("4","state","hhs")) == 8 - assert geomapper.get_geos_within("4","state","hhs") == {'al', 'fl', 'ga', 'ky', 'ms', 'nc', "tn", "sc"} + def test_get_geos_within(self, geomapper: GeoMapper): + assert len(geomapper.get_geos_within("us", "state", "nation")) == 60 + assert len(geomapper.get_geos_within("al", "county", "state")) == 68 + assert len(geomapper.get_geos_within("al", "fips", "state")) == 68 + assert geomapper.get_geos_within( + "al", "fips", "state" + ) == geomapper.get_geos_within("al", "county", "state") + assert len(geomapper.get_geos_within("al", "chng-fips", "state")) == 66 + assert len(geomapper.get_geos_within("4", "state", "hhs")) == 8 + assert geomapper.get_geos_within("4", "state", "hhs") == { + "al", + "fl", + "ga", + "ky", + "ms", + "nc", + "tn", + "sc", + } - def test_census_year_pop(self, geomapper, geomapper_2019): + def test_census_year_pop(self, geomapper: GeoMapper, geomapper_2019: GeoMapper): df = pd.DataFrame({"fips": ["01001"]}) assert geomapper.add_population_column(df, "fips").population[0] == 56145 assert geomapper_2019.add_population_column(df, "fips").population[0] == 55869 @@ -410,8 +485,8 @@ def test_aggregate_by_weighted_sum(self, geomapper: GeoMapper): df, to_geo="state", sensor_col="a", - time_col = "timestamp", - population_col="population_served" + time_col="timestamp", + population_col="population_served", ) agg_df_by_hand = pd.DataFrame( { @@ -422,7 +497,7 @@ def test_aggregate_by_weighted_sum(self, geomapper: GeoMapper): (3 * 8 + 4 * 1) / 9, (-2 * 1 + 2 * 2) / 3, (12 * 3) / 3, - ] + ], } ) pd.testing.assert_frame_equal(agg_df, agg_df_by_hand) @@ -430,8 +505,8 @@ def test_aggregate_by_weighted_sum(self, geomapper: GeoMapper): df, to_geo="state", 
sensor_col="b", - time_col = "timestamp", - population_col="population_served" + time_col="timestamp", + population_col="population_served", ) agg_df_by_hand = pd.DataFrame( { @@ -442,7 +517,7 @@ def test_aggregate_by_weighted_sum(self, geomapper: GeoMapper): (7 * 8 + 4 * 0) / 8, (-1 * 1 + -2 * 2) / 3, (np.nan) / 3, - ] + ], } ) pd.testing.assert_frame_equal(agg_df, agg_df_by_hand) @@ -467,17 +542,14 @@ def test_aggregate_by_weighted_sum(self, geomapper: GeoMapper): df, to_geo="nation", sensor_col="a", - time_col = "timestamp", - population_col="population_served" + time_col="timestamp", + population_col="population_served", ) agg_df_by_hand = pd.DataFrame( { "timestamp": [0, 1], "nation": ["us"] * 2, - "weighted_a": [ - (1 * 10 + 2 * 5 + 3 * 8) / 23, - (1 * 4 + 3 * 12) / 4 - ] + "weighted_a": [(1 * 10 + 2 * 5 + 3 * 8) / 23, (1 * 4 + 3 * 12) / 4], } ) pd.testing.assert_frame_equal(agg_df, agg_df_by_hand) From 48e247fbc8e80f9196b262dfae5743ea9ee1cbe7 Mon Sep 17 00:00:00 2001 From: Dmitry Shemetov Date: Tue, 7 May 2024 15:56:10 -0700 Subject: [PATCH 9/9] repo: update blame-ignore --- .git-blame-ignore-revs | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.git-blame-ignore-revs b/.git-blame-ignore-revs index f91c04645..904a3bf69 100644 --- a/.git-blame-ignore-revs +++ b/.git-blame-ignore-revs @@ -1,2 +1,4 @@ -# Format geomap.py with black +# Format geomap.py d4b056e7a4c11982324e9224c9f9f6fd5d5ec65c +# Format test_geomap.py +79072dcdec3faca9aaeeea65de83f7fa5c00d53f \ No newline at end of file