Skip to content

geo_data_proc refactor: readability and a small bug fix #1331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 26, 2021

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Oct 21, 2021

Description

geo_data_proc.py (producing crosswalk files for geomapper) is a little bloated, so a few refactors to make it easier to read (on a widescreen monitor), a bug fix, and a minor change.

Changelog

  • Allow long lines. Personal code style preference to not line wrap long pandas chains.
  • Fix a minor bug where derived functions could use old crosswalks from a previous build, causing inconsistency.
  • Force deterministic sort order for crosswalk tables to make diffs easier to read.

@dshemetov dshemetov requested a review from chinandrew October 21, 2021 21:41
Comment on lines 233 to 210
jhu_df = (
pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str})
.query("Country_Region == 'US'")[["UID", "FIPS"]]
.rename(columns={"UID": "jhu_uid", "FIPS": "fips"})
.dropna(subset=["fips"])
)
jhu_df = pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str}).query("Country_Region == 'US'")[["UID", "FIPS"]].rename(columns={"UID": "jhu_uid", "FIPS": "fips"}).dropna(subset=["fips"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually easier to read? I find the multiline a bit easier, though could be convinced otherwise. Also it would keep linting standards consistent across the codebase even though this isnt linted.

Copy link
Contributor Author

@dshemetov dshemetov Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it's def a personal preference I've developed while working with the delphi-epidata rewrite. Roughly put, I think it's easier to get an overview of a function when its code lines are lines and not code chunks. I spend less mental overhead on parsing indentation and code chunks and get more vertical space economy.

The pandas functions are code highlighted even in GH, so if you need the details of all the data munging you scan to the right, otherwise the read_csv is all you need to get a first approximation of what jhu_df contains. And then there's no need for the extra parentheses bloat on top and bottom.

But yea, it's mixing code styles across the codebase, so that's not great. At least it's consistent in a file? 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split longer lines like this one in two steps.

* type hinting for language server
* function docstrings
* split really long pandas chains in two
Copy link
Contributor

@chinandrew chinandrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit::shipit::shipit:

@krivard krivard merged commit a1fce7e into main Oct 26, 2021
@krivard krivard deleted the geocoder-readability branch October 26, 2021 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants