Docs on new signal deployment details #1986

Merged (8 commits) on Jul 10, 2024
131 changes: 121 additions & 10 deletions _template_python/INDICATOR_DEV_GUIDE.md
This example is taken from [`hhs_hosp`](https://github.com/cmu-delphi/covidcast-

The column is described [here](https://cmu-delphi.github.io/delphi-epidata/api/missing_codes.html).

#### Local testing

As a general rule, it helps to decompose your functions into operations for which you can write unit tests. To run the tests, use `make test` in the top-level indicator directory.
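For instance (a minimal sketch with a hypothetical helper name, not code from any particular indicator), pulling a transformation out into a pure function makes it straightforward to unit test:

```
# Hypothetical example: a pure transformation that is easy to unit test.
import pandas as pd


def smooth_7day(values: pd.Series) -> pd.Series:
    """Trailing 7-day moving average, allowing partial windows at the start."""
    return values.rolling(window=7, min_periods=1).mean()


def test_smooth_7day_constant_series():
    out = smooth_7day(pd.Series([2.0] * 10))
    # Averaging a constant series should return the same constant.
    assert (out == 2.0).all()
```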

Next, the `acquisition.covidcast` component of the `delphi-epidata` codebase does
12. `value_updated_timestamp`: now
2. Update the `epimetric_latest` table with any new keys or new versions of existing keys.

### CI/CD

* Add the module name to the `build` job in `.github/workflows/python-ci.yml`.
This allows GitHub Actions to run on this indicator code, which includes unit tests and linting.
* Add the top-level directory name to `indicator_list` in `Jenkinsfile`.
This allows your code to be automatically deployed to staging after your branch is merged to main, and deployed to prod after `covidcast-indicators` is released.
* Create `ansible/templates/{top_level_directory_name}-params-prod.json.j2` based on your `params.json.template`, with some adjustments:
  * "export_dir": "/common/covidcast/receiving/{data-source-name}"
  * "log_filename": "/var/log/indicators/{top_level_directory_name}.log"
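As a sketch, the resulting template might look like this (the `common` block mirrors the usual `params.json.template` layout; the placeholder names in braces are illustrative and should be replaced with your indicator's values):

```
{
  "common": {
    "export_dir": "/common/covidcast/receiving/{data-source-name}",
    "log_filename": "/var/log/indicators/{top_level_directory_name}.log"
  }
}
```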

Pay attention to the receiving/export directory, as well as how you can store credentials in vault.
Refer to [this guide](https://docs.google.com/document/d/1Bbuvtoxowt7x2_8USx_JY-yTo-Av3oAFlhyG-vXGG-c/edit#heading=h.8kkoy8sx3t7f) for more vault info.

### Staging

After developing the pipeline code, but before release to prod, the pipeline should be run on staging for at least a week.

The indicator run code is automatically deployed to staging after your branch is merged into main. After merging, make sure you have proper access to Cronicle and the staging server `app-mono-dev-01.delphi.cmu.edu`, _and_ that you can see your code on staging at `/home/indicators/runtime/`.

Then, on Cronicle, create two jobs: one to run the indicator, and one to load the output CSV files into the database.

#### Indicator run job

This job SSHes into our staging server and runs the indicator, producing CSV output files.

Example script:

https://cronicle-prod-01.delphi.cmu.edu/#Schedule?sub=edit_event&id=elr5clgy6rs
```
#!/bin/sh

# vars
user='automation'
host='app-mono-dev-01.delphi.cmu.edu'
ind_name='nchs_mortality'
acq_ind_name='nchs-mortality'

# chain_data to be sent to acquisition job
chain_data=$(jo chain_data=$(jo acq_ind_name=${acq_ind_name} ind_name=${ind_name} user=${user} host=${host}));
echo "${chain_data}";

ssh -T -l ${user} ${host} "sudo -u indicators -s bash -c 'cd /home/indicators/runtime/${ind_name} && env/bin/python -m delphi_${ind_name}'";
```

> **@nmdefries** (Jul 9, 2024): hostname publicly available here and here. User derivable here.
>
> **Contributor:** @korlaxxalrok which of these should be removed? Do we need to change any user or host names?
>
> **Contributor:** Yeah, those are pretty well exposed already, so no point in making changes for them.
>
> Generally, my instinct is that this kind of documentation should live privately "elsewhere" and we should link to it from our public assets. I realize the "elsewhere" part of this has never been solved, or when attempted to be solved has not been consistently adopted, but it is something we should still consider for the future and treat like a best practice when possible.
>
> **Contributor:** It doesn't hurt to scrub as much of the sensitive stuff as possible. Even if some bits of it are already "out there", reducing the number of current instances shrinks the attack surface (kinda) and makes it a little harder for a malicious actor to find useful information.
>
> I definitely prefer having documentation like this in version control: it's easier to find, use, and see the timeline of changes. It puts instructions and details "next to" the code assets that are being discussed, and in a format closer to what's practical for developers and operations (as opposed to the complicated WYSIWYG editor of gdocs).
>
> We could create a new private documentation-only repository or even just a top-level /documentation directory in the existing private https://github.com/cmu-delphi/delphi-admin repo. It wouldn't be as nice as having the docs in the same repo as the code, but it would be protected and cross-references will be handled well by the GitHub web UI (for privileged users).
>
> **@melange396** (Jul 10, 2024): ie:
>
> **Contributor:** Something like Docsify might be nice, for eventually building a docs site from Markdown files. I'd maybe push for a separate repo, to make it simpler to clone the repo without getting extra stuff (delphi-admin comes with a lot of extra bits), to add access control, and to eventually make it easier to add CI to build a site from it.
>
> We tried something like this before, but with a different tool (mkdocs), ultimately building it into a small site at https://delphi.cmu.edu/systems-handbook. There is a process to clone the repo, make changes, build it locally for review, and publish changes. It never caught on though, and the wheel kept going round.
>
> **Contributor:** I agree completely, great points. The "where" is definitely a problem. I wanted to put the "how to make an indicator" manual in this repo so it's close to all the indicator code.
>
> Alternative ideas:
>
> * Put documentation/manuals in the relevant repo, but omit host/user/login details and just say "talk to Brian"
> * Ditto, but link to login details stored in a separate private repo
> * Put all documentation/manuals in a dedicated private repo. I like that it'd be on GH, but harder to discover for a given topic
> * Put all documentation/manuals in a dedicated folder in GDrive. Also not discoverable
>
> **Contributor:** @melange396 opened an issue for this. delphi-admin is private and so a good place to store sensitive info.

Note the staging hostname in `host`.

Note that the `ind_name` variable here refers to the top-level directory name where the code is located, while `acq_ind_name` refers to the directory name where the output CSV files are located, which corresponds to the name of the `source` column in our database, as mentioned in step 3.
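For reference, the nested `jo` calls in the indicator script build a payload like the following (a Python sketch using the same example values; not part of the pipeline itself):

```
import json

# Equivalent of: jo chain_data=$(jo acq_ind_name=... ind_name=... user=... host=...)
chain_data = {
    "chain_data": {
        "acq_ind_name": "nchs-mortality",
        "ind_name": "nchs_mortality",
        "user": "automation",
        "host": "app-mono-dev-01.delphi.cmu.edu",
    }
}
print(json.dumps(chain_data))
```

The acquisition job reads this object from its input, which is how it learns which user, host, and directory to use for ingestion.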

To automatically run acquisition job right after indicator job finishes successfully:

1. In the `Plugin` section, select `Interpret JSON in Output`.
2. In the `Chain Reaction` section, select your acquisition run job under `Run Event on Success`.

You can read more about how the `chain_data` JSON object in the script above can be used in the subsequent acquisition job [here](https://github.com/jhuckaby/Cronicle/blob/master/docs/Plugins.md#chain-reaction-control).

#### Acquisition job

The acquisition job uses the chained data from the indicator job to determine where the CSV output files are, then loads them into our database.

Example script:

https://cronicle-prod-01.delphi.cmu.edu/#Schedule?sub=edit_event&id=elr5ctl7art
```
#!/usr/bin/python3

import subprocess
import json

str_data = input()
print(str_data)

data = json.loads(str_data, strict=False)
chain_data = data["chain_data"]
user = chain_data["user"]
host = chain_data["host"]
acq_ind_name = chain_data["acq_ind_name"]

cmd = f'''ssh -T -l {user} {host} "cd ~/driver && python3 -m delphi.epidata.acquisition.covidcast.csv_to_database --data_dir=/common/covidcast --indicator_name={acq_ind_name} --log_file=/var/log/epidata/csv_upload_{acq_ind_name}.log"'''

std_out, std_err = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()

print(std_out.decode('UTF-8'))
print(std_err.decode('UTF-8'))
```

Note the staging hostname and how the acquisition job is chained to run right after the indicator job. Do a few test runs.

#### Staging database checks

Apart from checking the logs of the staging indicator run and acquisition jobs to identify potential issues with the pipeline, one can also check the contents of the staging database for abnormalities.

At this point, the acquisition job should have loaded data into the staging MySQL DB, specifically the `covid` database.

```
mysql> use covid;
Database changed
```
Check the `signal_dim` table to see whether the new source and signal names are all present and reasonable. For example:
```
mysql> select * from signal_dim where source='nssp';
+---------------+--------+----------------------------------+
| signal_key_id | source | signal |
+---------------+--------+----------------------------------+
| 817 | nssp | pct_ed_visits_combined |
| 818 | nssp | pct_ed_visits_covid |
| 819 | nssp | pct_ed_visits_influenza |
| 820 | nssp | pct_ed_visits_rsv |
| 821 | nssp | smoothed_pct_ed_visits_combined |
| 822 | nssp | smoothed_pct_ed_visits_covid |
| 823 | nssp | smoothed_pct_ed_visits_influenza |
| 824 | nssp | smoothed_pct_ed_visits_rsv |
+---------------+--------+----------------------------------+
```

Then, check whether the number of records ingested into the db matches the number of rows in the CSVs from a local run.
For example, the query below sets the `issue` date to the day the acquisition job was run, with `signal_key_id` values corresponding to signals from our new source.
Check whether this count matches the local run result.

```
mysql> SELECT count(*) FROM epimetric_full WHERE issue=202425 AND signal_key_id > 816 AND signal_key_id < 825;
+----------+
| count(*) |
+----------+
| 2620872 |
+----------+
1 row in set (0.80 sec)
```
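One way to get the local-side number is to total the data rows across the exported CSVs (a sketch; the helper name and directory path are illustrative):

```
# Sketch: total data rows (excluding headers) across exported CSV files.
import csv
from pathlib import Path


def count_csv_rows(export_dir: str) -> int:
    total = 0
    for path in sorted(Path(export_dir).glob("*.csv")):
        with open(path, newline="") as f:
            n = sum(1 for _ in csv.reader(f))
        total += max(n - 1, 0)  # subtract the header row
    return total


# e.g. count_csv_rows("receiving") after a local run
```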

You can also look at the data in more detail at each geo level or across different signal names, depending on the quirks of the source.
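For example, a per-geo-type breakdown might look like this (a sketch assuming the standard v4 schema, where `geo_dim` holds `geo_type`/`geo_value` keyed by `geo_key_id`; adjust the `issue` and `signal_key_id` values to your source):

```
mysql> SELECT g.geo_type, count(*)
    -> FROM epimetric_full e
    -> JOIN geo_dim g USING (geo_key_id)
    -> WHERE e.issue=202425 AND e.signal_key_id BETWEEN 817 AND 824
    -> GROUP BY g.geo_type;
```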

See [@korlaxxalrok](https://www.github.com/korlaxxalrok) or [@minhkhul](https://www.github.com/minhkhul) for more information.

If everything goes well (check the staging db to confirm the data is ingested properly), make a prod version of the indicator run job and use it to run the indicator on a daily basis.


### Signal Documentation

Expand Down