Commit 71bad06

Merge pull request #45 from cmu-delphi/ds/readme
feat: add Makefile, update README
2 parents 98c1370 + 094770a commit 71bad06

2 files changed

+51
-49
lines changed

Makefile

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
install:
2+
	Rscript -e "install.packages(c('renv', 'pak'))"
3+
	Rscript -e "renv::restore()"
4+
5+
run:
6+
	Rscript run.R

README.md

Lines changed: 45 additions & 49 deletions
@@ -1,54 +1,50 @@
1-
# Existing tools
2-
3-
- [batchtools](https://mllg.github.io/batchtools/)
4-
- probably worth comparing the save output format with that generated by hubutils
5-
- definitely seems like it will save a lot of headache on exploration; probably not as useful for actual live forecasting
6-
- their "algorithm" should I think correspond to a forecaster
7-
- any `problem`s we add that modify the data should do so by returning the modified version in instance, rather than as access functions.
8-
- [hubutils](https://infectious-disease-modeling-hubs.github.io/hubUtils/index.html)
9-
- sort of a different direction focused more on aggregating results from several places. I think the output format is something I should target; file format of parquet
10-
- [epiforecasts](https://github.com/epiforecasts)
11-
- another group, they have a scoring utils package
12-
- [scoringutils](https://epiforecasts.io/scoringutils/)
13-
- it does not. For quantile models, they expect ‘true_value’, ‘prediction’, ‘quantile’
14-
- hubverse expects 'output_type' 'output_type_id' and 'value'
15-
- easy enough to map between them though
16-
17-
# Things I definitely need:
18-
19-
- a way to produce forecasts
20-
this should also be easily used in production
21-
- a way to score forecasts
22-
- I currently have one that only does WIS; I think switching to scoringutils wouldn't take much time at all
23-
- a way to compare scores
24-
25-
currently, I'm producing forecasts and evaluating at the same time. Actually, no I'm not. I'm first doing an `epix_slide` to produce forecasts, and then
26-
27-
- parallel over forecasterXahead definitions:
28-
- for each (forecaster,ahead):
29-
- generate forecast
30-
- evaluate forecast
31-
- save
32-
33-
# Kinds of forecasters
34-
## Basic
1+
# Exploration Tooling
2+
3+
This repo is meant to be a place to explore different forecasting methods and tools for doing so.
4+
The goal is to unify COVID forecasting and flu forecasting in one repo.
5+
The repo is structured as a [targets](https://docs.ropensci.org/targets/) project, which means that it is easy to run things in parallel and to cache results.
6+
The repo is also structured as an R package, which means that it is easy to share code between different targets.
7+
8+
## Usage
9+
10+
```sh
11+
# Install renv and R dependencies.
12+
make install
13+
14+
# Run the pipeline wrapper run.R.
15+
make run
16+
```
17+
18+
## Directory Layout
19+
20+
- `R/`: R package code to be reused
21+
- `extras/`: plotting and notebook code
22+
- `covid_hosp_explore/`: a `targets` project for exploring covid hospitalization forecasters
23+
- `flu_hosp_explore/`: a `targets` project for exploring flu hospitalization forecasters
24+
- `covid_hosp_prod/`: a `targets` project for predicting covid hospitalizations
25+
- `flu_hosp_prod/`: a `targets` project for predicting flu hospitalizations
26+
- `testing`: for debugging forecasters and doing sanity checks
27+
28+
## Tricky Gotchas
29+
30+
Currently, to run in parallel, you need to make sure to install the package via `renv::install(".")` and not just via `devtools::load_all()`.
31+
Therefore we recommend developing serially, but running exploration in parallel.
32+
33+
## Pipeline Design
34+
35+
See [this diagram](https://excalidraw.com/#room=85f8bfeb397ddf29f110,q8nOcBql7ACvhgCyjXu98g).
36+
Double diamond objects represent plates (to evoke [plate notation](https://en.wikipedia.org/wiki/Plate_notation), but don't take the comparison too literally), which are used to represent multiple objects of the same type (e.g. different forecasters).
37+
38+
## Notes on Forecaster Types
39+
40+
### Basic
41+
35 42
The basic forecaster takes in an epi_df, does some pre-processing, does an epipredict workflow, and then some post-processing
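
The three stages named above can be sketched in base R. This is only an illustration under stated assumptions: `preprocess`, `fit_and_predict`, `postprocess`, and `basic_forecaster` are hypothetical names, a plain data frame stands in for an `epi_df`, and `lm()` on a one-step lag stands in for the epipredict workflow, which is not shown here.

```r
# Pre-processing: sort by time and add a one-step lag of the signal.
preprocess <- function(data) {
  data <- data[order(data$time_value), ]
  data$lag1 <- c(NA, head(data$value, -1))
  data[!is.na(data$lag1), ]
}

# Stand-in for the modelling step: a linear model on the lagged value.
fit_and_predict <- function(train, latest_value) {
  model <- lm(value ~ lag1, data = train)
  unname(predict(model, newdata = data.frame(lag1 = latest_value)))
}

# Post-processing: hospitalization counts cannot be negative.
postprocess <- function(prediction) {
  max(prediction, 0)
}

# The forecaster is the composition of the three stages.
basic_forecaster <- function(data) {
  train <- preprocess(data)
  raw <- fit_and_predict(train, latest_value = tail(train$value, 1))
  postprocess(raw)
}

toy <- data.frame(
  time_value = as.Date("2023-01-01") + 0:4,
  value = c(1, 2, 3, 4, 5)
)
basic_forecast <- basic_forecaster(toy)
```

Keeping each stage as its own function means a different model can be swapped in without touching the pre- or post-processing.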
36-
## Ensemble
37-
This kind of forecaster has two components: a list of existing forecasters it depends on, and a function that aggregates those forecasters.
38-
## (to be named)
39-
Any forecaster which requires a pre-trained component. An example is a forecaster with a sophisticated imputation method. Evaluating these has some thorns around training/testing splitting. It may be foldable into the basic variety though.
40-
# later things
41-
- a way to check that a given function is or is not in the right format to be a forecaster
42 43

44+
### Ensemble
43 45

44-
# Random notes
45-
Currently, to run in parallel, you need to install the package via `renv::install(".")`.
46-
The parallel workers will continue to use the version as of the last time you ran `renv::install`, while the non-parallel ones won't. This separates development from exploration.
46+
This kind of forecaster has two components: a list of existing forecasters it depends on, and a function that aggregates those forecasters.
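
A minimal base R sketch of that two-component structure follows. The names `make_ensemble`, `last_value_forecaster`, and `mean_forecaster` are hypothetical and do not come from the repo; the component forecasters are toy stand-ins that return a single point prediction.

```r
# Hypothetical component forecasters: each takes a data frame with a numeric
# `value` column and returns a one-row forecast.
last_value_forecaster <- function(data) {
  data.frame(forecaster = "last_value", prediction = tail(data$value, 1))
}
mean_forecaster <- function(data) {
  data.frame(forecaster = "mean", prediction = mean(data$value))
}

# An ensemble pairs a list of forecasters with an aggregation function.
make_ensemble <- function(forecasters, aggregate_fn = median) {
  function(data) {
    component_preds <- vapply(
      forecasters,
      function(f) f(data)$prediction,
      numeric(1)
    )
    data.frame(
      forecaster = "ensemble",
      prediction = aggregate_fn(component_preds)
    )
  }
}

toy_data <- data.frame(value = c(10, 12, 14, 20))
ensemble <- make_ensemble(
  list(last_value_forecaster, mean_forecaster),
  aggregate_fn = mean
)
ensemble_forecast <- ensemble(toy_data)
```

Because `make_ensemble` returns a function with the same signature as its components, an ensemble built this way could itself be passed in as a component of another ensemble.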
47 47

48+
### (to be named)
48 49

49-
# Targets projects
50-
- testing: for debugging forecasters and doing sanity checks
51-
- flu_hosp_explore: for exploring flu hospitalization forecasters
52-
- covid_hosp_explore: for exploring covid hospitalization forecasters
53-
- flu_hosp_prod: for predicting flu hospitalizations
54-
- covid_hosp_prod: for predicting flu hospitalizations
50+
Any forecaster which requires a pre-trained component. An example is a forecaster with a sophisticated imputation method. Evaluating these has some thorns around training/testing splitting. It may be foldable into the basic variety though.
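
One of the training/testing thorns mentioned above is leakage: if the pre-trained component is fitted on all of the data, the evaluation window influences it. The sketch below assumes a deliberately simple mean imputer in place of a sophisticated one, and `fit_mean_imputer` and `apply_imputer` are hypothetical names. The component is fitted on the training window only and then applied unchanged to both windows.

```r
# Fit the imputation component using training values only.
fit_mean_imputer <- function(train_values) {
  list(fill_value = mean(train_values, na.rm = TRUE))
}

# Apply an already-fitted imputer without re-estimating anything.
apply_imputer <- function(imputer, values) {
  values[is.na(values)] <- imputer$fill_value
  values
}

values <- c(4, NA, 6, 8, NA, 100)
train_idx <- 1:4
test_idx <- 5:6

# The fill value comes from the training window (mean of 4, 6, 8), so the
# outlier of 100 in the test window does not leak into it.
imputer <- fit_mean_imputer(values[train_idx])
train_imputed <- apply_imputer(imputer, values[train_idx])
test_imputed <- apply_imputer(imputer, values[test_idx])
```

Fitting on the full series instead would pull the fill value toward the test-window outlier, which is the kind of contamination a split-aware evaluation has to avoid.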
