-
Notifications
You must be signed in to change notification settings - Fork 16
geo_data_proc refactor: readability and a small bug fix #1331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jhu_df = ( | ||
pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str}) | ||
.query("Country_Region == 'US'")[["UID", "FIPS"]] | ||
.rename(columns={"UID": "jhu_uid", "FIPS": "fips"}) | ||
.dropna(subset=["fips"]) | ||
) | ||
jhu_df = pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str}).query("Country_Region == 'US'")[["UID", "FIPS"]].rename(columns={"UID": "jhu_uid", "FIPS": "fips"}).dropna(subset=["fips"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually easier to read? I find the multiline a bit easier, though could be convinced otherwise. Also it would keep linting standards consistent across the codebase even though this isnt linted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it's def a personal preference I've developed while working with the delphi-epidata rewrite. Roughly put, I think it's easier to get an overview of a function when its code lines are lines and not code chunks. I spend less mental overhead on parsing indentation and code chunks and get more vertical space economy.
The pandas functions are code highlighted even in GH, so if you need the details of all the data munging you scan to the right, otherwise the read_csv
is all you need to get a first approximation of what jhu_df
contains. And then there's no need for the extra parentheses bloat on top and bottom.
But yea, it's mixing code styles across the codebase, so that's not great. At least it's consistent in a file? 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split longer lines like this one in two steps.
* type hinting for language server * function docstrings * split really long pandas chains in two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
geo_data_proc.py
(producing crosswalk files for geomapper) is a little bloated, so a few refactors to make it easier to read (on a widescreen monitor), a bug fix, and a minor change.Changelog