-
Notifications
You must be signed in to change notification settings - Fork 16
Revised: Basic FlaSH Implementation #1751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think it's fine to have duplicate or semi-duplicate code for now. It could make sense to share functions, but that would probably require a refactor of the validator, given its class structure.
👍
We use a caching mechanism in the CPR indicator that you can reference. It's pretty simple, We save the file to disk if it doesn't already exist, using a unique filename. Each time the indicator runs, we construct a list of new files to download. |
Fixed the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial look
Questions for in-person meeting:
- I need a brief refresher on the general FlaSH approach.
- Do you have an example of the alerts visualization tool? I saw a link in the code but it said the app was "asleep".
- What broad changes have been made in this compared to the previous version of FlaSH?
- I'm finding it hard to understand what the functions in
eval_day.py
are doing broadly. We can improve docstrings and maybe function names together in the meeting.
|
||
## Running FlaSH-eval | ||
|
||
First, run the indicator so that there are files for FlaSH to check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the production version of this downloads files from somewhere, but the local/testing version runs off of pre-existing files, sounds like this tool will need two different modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the
W0108: Lambda may not be necessary
error in bab4e43, but linting is now showing. env/bin/activate; pylint delphi_utils ------------------------------------------------------------------- Your code has been rated at 10.00/10 (previous run: 9.99/10, +0.01) . env/bin/activate; pydocstyle delphi_utils delphi_utils/flash_eval/eval_day.py:1 at module level: D300: Use """triple double quotes""" (found "-quotes) delphi_utils/flash_eval/eval_day.py:17 in public function `fix_iqr`: D300: Use """triple double quotes""" (found "-quotes) delphi_utils/flash_eval/eval_day.py:17 in public function `fix_iqr`: D401: First line should be in imperative mood (perhaps 'Change', not 'Changes') delphi_utils/flash_eval/eval_day.py:30 in public function `outlier`: D205: 1 blank line required between summary line and description (found 0) delphi_utils/flash_eval/eval_day.py:91 in public function `spike_outliers`: D205: 1 blank line required between summary line and description (found 0) delphi_utils/flash_eval/eval_day.py:91 in public function `spike_outliers`: D209: Multi-line docstring closing quotes should be on a separate line delphi_utils/flash_eval/eval_day.py:91 in public function `spike_outliers`: D401: First line should be in imperative mood; try rephrasing (found 'An') delphi_utils/flash_eval/eval_day.py:143 in public function `bin_dist`: D300: Use """triple double quotes""" (found "-quotes) delphi_utils/flash_eval/eval_day.py:143 in public function `bin_dist`: D401: First line should be in imperative mood; try rephrasing (found 'A') delphi_utils/flash_eval/eval_day.py:145 in private nested function `ts_dist`: D300: Use """triple double quotes""" (found "-quotes) delphi_utils/flash_eval/eval_day.py:145 in private nested function `ts_dist`: D401: First line should be in imperative mood (perhaps 'Initialize', not 'Initial') delphi_utils/flash_eval/eval_day.py:155 in public function `eval_day`: D205: 1 blank line required between summary line and description (found 0) delphi_utils/flash_eval/eval_day.py:208 in public function `return_vals`: D400: First line should end with a period (not 't') delphi_utils/flash_eval/eval_day.py:208 in public function `return_vals`: D401: First line should be in imperative mood (perhaps 'Return', not 'Returns') delphi_utils/flash_eval/eval_day.py:222 in public function `process_anomalies`: D300: Use """triple double quotes""" (found "-quotes) delphi_utils/flash_eval/eval_day.py:244 in public function `flash_eval_lag`: D202: No blank lines allowed after function docstring (found 2) delphi_utils/flash_eval/eval_day.py:244 in public function `flash_eval_lag`: D205: 1 blank line required between summary line and description (found 0) delphi_utils/flash_eval/eval_day.py:244 in public function `flash_eval_lag`: D400: First line should end with a period (not 'g') delphi_utils/flash_eval/eval_day.py:336 in public function `flash_eval`: D202: No blank lines allowed after function docstring (found 1) delphi_utils/flash_eval/eval_day.py:336 in public function `flash_eval`: D205: 1 blank line required between summary line and description (found 0) delphi_utils/flash_eval/eval_day.py:336 in public function `flash_eval`: D210: No whitespaces allowed surrounding docstring text make: *** [Makefile:18: lint] Error 1
Hi Nat,
Thank you for this - I'll fix the linting bugs & we can talk about some of these high & low level comments in our meeting. FlaSH is still just a prototype, the major difference between this version and the prior version being removing some unnecessary features for the MVP. Chat soon!
df_fix_unstack = df.ffill() | ||
diff_df_small = df_fix_unstack.copy().diff(1).bfill() | ||
upper = 0.75 | ||
lower = 0.25 | ||
df['day'] = [x.weekday() for x in list(df.index)] | ||
diff_df2 = diff_df_small | ||
diff_df2['day'] = df['day'] | ||
diff_df2_stack = diff_df2.drop(columns=['day']).stack().reset_index() | ||
diff_df2_stack.columns = ['date', 'state', 'val'] | ||
diff_df2_stack['weekday'] = diff_df2_stack.date.dt.weekday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of objects floating around here, some of which don't appear to be reused downstream, e.g. diff_df_small
. This logic could be simplified. It's also hard for me to tell what the transformations are doing. More comments/descriptive variable names, please.
Also, when performing the first ffill
s and bfill
s, do you expect the incoming df to have missing values? Some of our indicators may produce output that don't have missing values and instead drop that index entirely. fill
procedures often follow a reindex
ing step.
Notes from discussion:
Broad plan is to get this running in production as soon as possible, even if it's not totally done and not running on all signals. This will give us a better idea of where this is misbehaving (speed? some checks are oversensitive? etc). |
Hi Nat, thanks for meeting. Updates:
Considerations:
Please let me know what you think, especially about the I/O! |
What is being saved to AWS right now? What are the "results"? I'm used to thinking of AWS storing only final results, and all intermediate files only being saved locally. But if the intermediate files take a long time to recreate, having AWS as a backup is convenient.
Tests should be deterministic. It sounds like you're saying that
What does the location of the alert determine? Where it shows up in Slack? |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Ananya-Joshi, made a few performance suggestions and a few utils suggestions. Hope this feedback helps!
header=None).astype(str).set_axis(['FIPS', 'STATE'], axis=1) | ||
fips_lookup['FIPS'] = fips_lookup['FIPS'].str.zfill(2) | ||
fips_to_STATE = fips_lookup.set_index("FIPS").to_dict()['STATE'] | ||
STATE_to_fips = fips_lookup.set_index("STATE").to_dict()['FIPS'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data should be available in our utils module
from delphi_utils.geomap import GeoMapper
gm = GeoMapper()
state_to_fips = gm.get_crosswalk("state_code", "state_id")
fips_to_state = gm.get_crosswalk("state_id", "state_code")
fips_pop_table = gm.get_crosswalk("state_code", "pop")
``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can map from state code to state id! I can add the extra handling code in the future though (might need to be reused for example from geos.py in delphi_jhu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point, the way to get that table is unintuitive in that util. This should work:
from delphi_utils.geomap import GeoMapper
gmpr = GeoMapper()
fips_to_state = gmpr.get_crosswalk("state", "state")[["state_code", "state_id"]].set_index("state_code").to_dict()["state_id"]
state_to_fips = gmpr.get_crosswalk("state", "state")[["state_id", "state_code"]].set_index("state_id").to_dict()["state_code"]
@@ -0,0 +1,58 @@ | |||
1,al |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these files if geo data is available to every indicator via delphi_utils.geomap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I wasn't able to get state_id to state_code to work, does it work for you ?
"- [x] zip -> fips : population weighted
- [x] zip -> hrr : unweighted
- [x] zip -> msa : unweighted
- [x] zip -> state
- [x] zip -> hhs
- [x] zip -> population
- [x] state code -> hhs
- [x] fips -> state : unweighted
- [x] fips -> msa : unweighted
- [x] fips -> megacounty
- [x] fips -> hrr
- [x] fips -> hhs
- [x] nation"
Thank you so much! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly questions and non-blocking suggestions. I need to come back and look at the rest of eval_day.py
, but wanted you to have some comments to work off of for now.
for signal in signals: | ||
export_files = read_filenames(params["common"]["export_dir"]) | ||
days = {} | ||
for (x, _) in export_files: | ||
#Concat the data from recent files at nation, state, and county resolution per day. | ||
if signal in x and pd.Series([y in x for y in ['state', 'county', 'nation']]).any(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking, future improvement): Since read_filenames
output includes the regex match, with named substrings, we already know which signal
and geo_type
each file belongs to. So we can do a bit less work/fewer loop iterations here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with day
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provisional approval. Have discussed with Brian and Ananya that since this is part of delphi-utils
we need to release before it can be used in indicator runs in staging.
However, we expect the internals of this package to change (e.g. tuning output criteria or outlier labeling approach). The package will also need many more tests for us to be confident of its behavior.
question: Do we want this to run for all indicators? IIRC we talked about running it for only JHU (?) in which case the runner.py
bit may need a step where we check the indicator being run.
Yes, it's only going to be used for JHU confirmed_cases_num at the moment. It should be good if the flash parameters are only in the jhu params.json . Would that work? I'll add the nits here ready for Monday morning! Thank you Nat! |
33063b5
to
447d07a
Compare
In the next iteration:
|
Update _delphi_utils_python/delphi_utils/flash_eval/eval_day.py Co-authored-by: Dmitry Shemetov <[email protected]>
5c1e096
to
f78bbfa
Compare
If flash params are entirely missing from
That first line will cause a key error. I see a couple ways to address that. If we make sure that all of the indicators' params files contain a |
We'd talked about caching before but I lost track of progress on it.
Does this need to be done before we release? How long does downloading take for just JHU |
Currently all files are local and nothing needs to be downloaded! In the next iteration, we might move to having some of the parameter files on AWS, but for now they're in flash_ref |
@krivard I think we are good to merge sans any additional comments on this MVP! |
There are some existing concerns:
The provided code is only on the evaluation side - not on generating the test statistic distribution. For more details, please see the FlaSH powerpoint on Slack (or ask me).
Of Note:
Downloading files takes a long time - I'm thinking about a way to deal with caching the relevant data frames
Any help with the above would be appreciated! Please let me know when you have a chance to discuss.
Thank you!