The column is described [here](https://cmu-delphi.github.io/delphi-epidata/api/missing_codes.html).

#### Local testing

As a general rule, it helps to decompose your functions into operations for which you can write unit tests. To run the tests, use `make test` in the top-level indicator directory.
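For instance, a decomposed operation is straightforward to unit-test with `pytest`. This is a hypothetical sketch; the function and test names are illustrative, not taken from a real indicator:

```python
# test_smoothing.py -- hypothetical unit test for a decomposed indicator
# operation; `seven_day_average` is an illustrative helper, not a real
# delphi function.

def seven_day_average(values):
    """Trailing 7-day moving average: one value per window of 7 days."""
    return [sum(values[i:i + 7]) / 7 for i in range(len(values) - 6)]


def test_seven_day_average_constant_series():
    # A constant series should average to the same constant,
    # with one output value per complete 7-day window.
    assert seven_day_average([3.0] * 10) == [3.0] * 4
```

`make test` invokes pytest over files like this in the indicator's `tests/` directory.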

Next, the `acquisition.covidcast` component of the `delphi-epidata` codebase does the following:

12. `value_updated_timestamp`: now
2. Update the `epimetric_latest` table with any new keys or new versions of existing keys.

### CI/CD

* Add the module name to the `build` job in `.github/workflows/python-ci.yml`.
  This allows GitHub Actions to run on this indicator code, which includes unit tests and linting.
* Add the top-level directory name to `indicator_list` in `Jenkinsfile`.
  This allows your code to be automatically deployed to staging after your branch is merged to main, and deployed to prod after `covidcast-indicators` is released.
* Create `ansible/templates/{top_level_directory_name}-params-prod.json.j2` based on your `params.json.template`, with some adjustments:
  * "export_dir": "/common/covidcast/receiving/{data-source-name}"
  * "log_filename": "/var/log/indicators/{top_level_directory_name}.log"

Pay attention to the receiving/export directory, as well as how you can store credentials in vault.
Refer to [this guide](https://docs.google.com/document/d/1Bbuvtoxowt7x2_8USx_JY-yTo-Av3oAFlhyG-vXGG-c/edit#heading=h.8kkoy8sx3t7f) for more vault info.
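As an illustrative sketch of the prod params template (the `common` grouping and surrounding structure are assumptions based on a typical `params.json.template`; only the `export_dir` and `log_filename` values come from the list above):

```
{
  "common": {
    "export_dir": "/common/covidcast/receiving/{data-source-name}",
    "log_filename": "/var/log/indicators/{top_level_directory_name}.log"
  }
}
```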

### Staging

After developing the pipeline code, but before release to prod, the pipeline should be run on staging for at least a week.

The indicator run code is automatically deployed on staging after your branch is merged into main. After merging, make sure you have proper access to Cronicle and the staging server `app-mono-dev-01.delphi.cmu.edu` _and_ can see your code on staging at `/home/indicators/runtime/`.

Then, on Cronicle, create two jobs: one to run the indicator, and one to load the output CSV files into the database.

#### Indicator run job

This job SSHes into our staging server and runs the indicator, producing CSV output files.

Example script:

```
#!/bin/sh

# vars
user='automation'
host='app-mono-dev-01.delphi.cmu.edu'
ind_name='nchs_mortality'
acq_ind_name='nchs-mortality'

# chain_data to be sent to acquisition job
chain_data=$(jo chain_data=$(jo acq_ind_name=${acq_ind_name} ind_name=${ind_name} user=${user} host=${host}));
echo "${chain_data}";

ssh -T -l ${user} ${host} "sudo -u indicators -s bash -c 'cd /home/indicators/runtime/${ind_name} && env/bin/python -m delphi_${ind_name}'";
```

Note the staging hostname in `host`.

Note that the `ind_name` variable here refers to the top-level directory name where the code is located, while `acq_ind_name` refers to the directory name where the output CSV files are located, which corresponds to the name of the `source` column in our database, as mentioned in step 3.

To automatically run the acquisition job right after the indicator job finishes successfully:

1. In the `Plugin` section, select `Interpret JSON in Output`.
2. In the `Chain Reaction` section, select your acquisition run job under `Run Event on Success`.

You can read more about how the `chain_data` JSON object in the script above can be used in our subsequent acquisition job [here](https://github.com/jhuckaby/Cronicle/blob/master/docs/Plugins.md#chain-reaction-control).
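As a sketch of the hand-off (values are the examples from the indicator script above), the nested `jo` calls emit a JSON object with a top-level `chain_data` key, which Cronicle forwards to the chained job's stdin:

```python
import json

# What the indicator job's nested `jo` calls produce and echo to stdout;
# Cronicle interprets this JSON and passes it to the chained acquisition job.
chain_data = {
    "chain_data": {
        "acq_ind_name": "nchs-mortality",
        "ind_name": "nchs_mortality",
        "user": "automation",
        "host": "app-mono-dev-01.delphi.cmu.edu",
    }
}
serialized = json.dumps(chain_data)

# The acquisition job then recovers the fields from its stdin like so:
parsed = json.loads(serialized)["chain_data"]
print(parsed["acq_ind_name"])  # nchs-mortality
```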

#### Acquisition job

The acquisition job uses the chained data from the indicator job to determine where the CSV output files are, then loads them into our database.

Example script:

```
#!/usr/bin/python3

import subprocess
import json

str_data = input()
print(str_data)

data = json.loads(str_data, strict=False)
chain_data = data["chain_data"]
user = chain_data["user"]
host = chain_data["host"]
acq_ind_name = chain_data["acq_ind_name"]

cmd = f'''ssh -T -l {user} {host} "cd ~/driver && python3 -m delphi.epidata.acquisition.covidcast.csv_to_database --data_dir=/common/covidcast --indicator_name={acq_ind_name} --log_file=/var/log/epidata/csv_upload_{acq_ind_name}.log"'''

# Note: Popen.communicate() returns (stdout, stderr), in that order.
std_out, std_err = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()

print(std_out.decode('UTF-8'))
print(std_err.decode('UTF-8'))
```

#### Staging database checks

Apart from checking the logs of the staging indicator run and acquisition jobs to identify potential issues with the pipeline, one can also check the contents of the staging database for abnormalities.

At this point, the acquisition job should have loaded data into the staging MySQL database, specifically the `covid` database.

```
mysql> use covid;
Database changed
```

Check the `signal_dim` table to see if the new source and signal names are all present and reasonable. For example:
```
mysql> select * from signal_dim where source='nssp';
+---------------+--------+----------------------------------+
| signal_key_id | source | signal                           |
+---------------+--------+----------------------------------+
|           817 | nssp   | pct_ed_visits_combined           |
|           818 | nssp   | pct_ed_visits_covid              |
|           819 | nssp   | pct_ed_visits_influenza          |
|           820 | nssp   | pct_ed_visits_rsv                |
|           821 | nssp   | smoothed_pct_ed_visits_combined  |
|           822 | nssp   | smoothed_pct_ed_visits_covid     |
|           823 | nssp   | smoothed_pct_ed_visits_influenza |
|           824 | nssp   | smoothed_pct_ed_visits_rsv       |
+---------------+--------+----------------------------------+
```

Then, check if the number of records ingested into the database matches the number of rows in the CSVs from a local run.
For example, the query below sets the `issue` date to the day the acquisition job was run, with `signal_key_id` values corresponding to signals from our new source.
Check whether this count matches the local run result.

```
mysql> SELECT count(*) FROM epimetric_full WHERE issue=202425 AND signal_key_id > 816 AND signal_key_id < 825;
+----------+
| count(*) |
+----------+
|  2620872 |
+----------+
1 row in set (0.80 sec)
```
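To get the local-run count to compare against, one option is to total the data rows across the exported CSVs, excluding header lines. This is a sketch; point it at whatever `export_dir` your local `params.json` used:

```python
import csv
import pathlib

def count_csv_rows(export_dir):
    """Total data rows (excluding header lines) across all CSVs in a directory."""
    total = 0
    for path in sorted(pathlib.Path(export_dir).glob("*.csv")):
        with open(path, newline="") as f:
            rows = sum(1 for _ in csv.reader(f))
        total += max(rows - 1, 0)  # subtract the header row
    return total

# e.g. count_csv_rows("./receiving/nssp") should match the database count above
```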

You can also check how the data looks at each geo level or among different signal names, depending on the quirks of the source.

See [@korlaxxalrok](https://www.github.com/korlaxxalrok) or [@minhkhul](https://www.github.com/minhkhul) for more information.

If everything goes well, make a prod version of the indicator run job and use it to run the indicator on a daily basis.

Another thing to do is to set up the `params.json` template file in accordance with how you want to run the indicator and acquisition. Pay attention to the receiving directory, as well as how you can store credentials in vault. Refer to [this guide](https://docs.google.com/document/d/1Bbuvtoxowt7x2_8USx_JY-yTo-Av3oAFlhyG-vXGG-c/edit#heading=h.8kkoy8sx3t7f) for more vault info.

### Signal Documentation