geo_data_proc refactor: readability and a small bug fix #1331

dshemetov · 2021-10-21T21:41:23Z

Description

geo_data_proc.py (which produces the crosswalk files for geomapper) is a little bloated. This PR makes a few refactors so it is easier to read (on a widescreen monitor), fixes a bug, and makes one minor change.

Changelog

Allow long lines. Personal code style preference to not line wrap long pandas chains.
Fix a minor bug where derived functions could use old crosswalks from a previous build, causing inconsistency.
Force deterministic sort order for crosswalk tables to make diffs easier to read.
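
For illustration, the deterministic sort in the last item could look roughly like the sketch below. The helper name `crosswalk_to_csv`, the `zip`/`fips`/`weight` columns, and the sample rows are assumptions made up for this example; they are not taken from the PR's code.

```python
import pandas as pd


def crosswalk_to_csv(df: pd.DataFrame, key_cols: list) -> str:
    """Serialize a crosswalk table with rows ordered by its key columns."""
    # Sorting on the key columns makes the output independent of whatever
    # row order the upstream merges happened to produce.
    ordered = df.sort_values(key_cols, kind="mergesort")
    # With no path argument, to_csv returns the CSV text as a string.
    return ordered.to_csv(index=False)


# Hypothetical crosswalk rows; the key (zip, fips) is unique per row.
rows = pd.DataFrame(
    {
        "zip": ["15213", "15206", "15213"],
        "fips": ["42003", "42003", "42007"],
        "weight": [0.9, 1.0, 0.1],
    }
)
# The same rows in a different order, as a later rebuild might produce them.
shuffled = rows.iloc[[2, 0, 1]]

csv_a = crosswalk_to_csv(rows, ["zip", "fips"])
csv_b = crosswalk_to_csv(shuffled, ["zip", "fips"])
```

Because both inputs serialize to the same text, a rebuild that only reorders rows would show up as an empty diff.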

chinandrew · 2021-10-25T15:18:30Z

_delphi_utils_python/data_proc/geomap/geo_data_proc.py

-    jhu_df = (
-        pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str})
-        .query("Country_Region == 'US'")[["UID", "FIPS"]]
-        .rename(columns={"UID": "jhu_uid", "FIPS": "fips"})
-        .dropna(subset=["fips"])
-    )
+    jhu_df = pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str}).query("Country_Region == 'US'")[["UID", "FIPS"]].rename(columns={"UID": "jhu_uid", "FIPS": "fips"}).dropna(subset=["fips"])


Is this actually easier to read? I find the multiline version a bit easier, though I could be convinced otherwise. Also, the multiline form would keep linting standards consistent across the codebase, even though this file isn't linted.

dshemetov · 2021-10-25

Yeah, it's definitely a personal preference I've developed while working on the delphi-epidata rewrite. Roughly put, I think it's easier to get an overview of a function when each statement is a single line rather than a multi-line chunk. I spend less mental overhead parsing indentation and code chunks, and I get better use of vertical space.

The pandas functions are syntax-highlighted even on GH, so if you need the details of all the data munging you can scan to the right; otherwise the read_csv is all you need to get a first approximation of what jhu_df contains. And then there's no need for the extra parentheses on top and bottom.

But yea, it's mixing code styles across the codebase, so that's not great. At least it's consistent in a file? 🙃

dshemetov · 2021-10-25

Split longer lines like this one into two steps.
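
A minimal sketch of what such a two-step split might look like for the chain in the diff above, with a type hint and docstring added. The function name `load_jhu_fips`, the intermediate name `us_df`, and the inline sample CSV are hypothetical stand-ins (the real script reads from `JHU_FIPS_URL`); only the column names and the pandas calls come from the diff.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the file behind JHU_FIPS_URL.
SAMPLE_CSV = """UID,FIPS,Country_Region
84001001,01001,US
84001003,01003,US
840,,US
124,,Canada
"""


def load_jhu_fips(source) -> pd.DataFrame:
    """Return US rows that have a FIPS code, with columns jhu_uid and fips."""
    # Step 1: read the table as strings and keep only US rows.
    us_df = pd.read_csv(source, dtype={"UID": str, "FIPS": str}).query("Country_Region == 'US'")
    # Step 2: select the two columns, rename them, and drop rows without a FIPS.
    return us_df[["UID", "FIPS"]].rename(columns={"UID": "jhu_uid", "FIPS": "fips"}).dropna(subset=["fips"])


jhu_df = load_jhu_fips(io.StringIO(SAMPLE_CSV))
```

Each statement still fits on one line, but a reader can stop after step 1 and already know which rows are in play.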

* type hinting for language server
* function docstrings
* split really long pandas chains in two

dshemetov added 4 commits October 21, 2021 14:36

Utils geo_data_proc: improve readability

fcde742

Utils geo_data_proc: sort crosswalks for clearer diffs

dcecd56

Utils geo_data_proc: add sorted crosswalks

a409da6

Utils geo_data_proc: fix possible bug, minor population fix

b5668ee

dshemetov requested a review from chinandrew October 21, 2021 21:41

chinandrew reviewed Oct 25, 2021

dshemetov added 2 commits October 25, 2021 15:10

Utils geo_data_proc: more readability improvements

1ff8b17

* type hinting for language server
* function docstrings
* split really long pandas chains in two

Utils geo_data_proc: more minor changes

2cb9807

chinandrew approved these changes Oct 26, 2021

krivard merged commit a1fce7e into main Oct 26, 2021

krivard deleted the geocoder-readability branch October 26, 2021 15:05

krivard mentioned this pull request Oct 26, 2021

Release covidcast-indicators 0.2.2 #1336

Merged
