Merge pull request #45 from cmu-delphi/ds/readme

dshemetov · web-flow · commit 71bad0685964 · 2023-10-23T13:27:44.000-07:00
feat: add Makefile, update README
diff --git a/Makefile b/Makefile
@@ -0,0 +1,6 @@
+install:
+	Rscript -e "install.packages(c('renv', 'pak'))"
+	Rscript -e "renv::restore()"
+
+run:
+	Rscript run.R
diff --git a/README.md b/README.md
@@ -1,54 +1,50 @@
-# Existing tools
-
-- [batchtools](https://mllg.github.io/batchtools/)
-  - probably worth comparing the save output format with that generated by hubutils
-  - definitely seems like it will save a lot of headache on exploration; probably not as useful for actual live forecasting
-  - their "algorithm" should I think correspond to a forecaster
-  - any `problem`s we add that modify the data should do so by returning the modified version in instance, rather than as access functions.
-- [hubutils](https://infectious-disease-modeling-hubs.github.io/hubUtils/index.html)
-  - sort of a different direction focused more on aggregating results from several places. I think the output format is something I should target; file format of parquet
-- [epiforecasts](https://github.com/epiforecasts)
-  - another group, they have a scoring utils package
-  - [scoringutils](https://epiforecasts.io/scoringutils/)
-    - it does not. For quantile models, they expect ‘true_value’, ‘prediction’, ‘quantile’
-    - hubverse expects 'output_type' 'output_type_id' and 'value'
-    - easy enough to map between them though
-
-# Things I definitely need:
-
-- a way to produce forecasts
-  this should also be easily used in production
-- a way to score forecasts
-  - I currently have one that only does WIS; I think switching to scoringutils wouldn't take much time at all
-- a way to compare scores
-
-currently, I'm producing forecasts and evaluating at the same time. Actually, no I'm not. I'm first doing an `epix_slide` to produce forecasts, and then
-
-- parallel over forecasterXahead definitions:
-  - for each (forecaster,ahead):
-    - generate forecast
-    - evaluate forecast
-    - save
-
-# Kinds of forecasters
-## Basic
+# Exploration Tooling
+
+This repo is a place to explore different forecasting methods and the tools for working with them.
+The goal is to unify COVID forecasting and flu forecasting in one repo.
+The repo is structured as a [targets](https://docs.ropensci.org/targets/) project, which means that it is easy to run things in parallel and to cache results.
+The repo is also structured as an R package, which means that it is easy to share code between different targets.
+
+## Usage
+
+```sh
+# Install renv and pak, then restore the project's R dependencies.
+make install
+
+# Run the pipeline wrapper run.R.
+make run
+```
+
+## Directory Layout
+
+-   `R/`: R package code to be reused
+-   `extras/`: plotting and notebook code
+-   `covid_hosp_explore/`: a `targets` project for exploring COVID hospitalization forecasters
+-   `flu_hosp_explore/`: a `targets` project for exploring flu hospitalization forecasters
+-   `covid_hosp_prod/`: a `targets` project for predicting COVID hospitalizations
+-   `flu_hosp_prod/`: a `targets` project for predicting flu hospitalizations
+-   `testing/`: a `targets` project for debugging forecasters and doing sanity checks
+
+## Tricky Gotchas
+
+Currently, to run in parallel, you must install the package via `renv::install(".")` rather than only loading it via `devtools::load_all()`: parallel workers use the version from the last `renv::install`, so they do not see uninstalled changes.
+We therefore recommend developing serially and running exploration in parallel.
+
+## Pipeline Design
+
+See [this diagram](https://excalidraw.com/#room=85f8bfeb397ddf29f110,q8nOcBql7ACvhgCyjXu98g).
+Double-diamond objects represent plates (a loose nod to [plate notation](https://en.wikipedia.org/wiki/Plate_notation)), which stand for multiple objects of the same type (e.g. different forecasters).
+
+## Notes on Forecaster Types
+
+### Basic
+
 The basic forecaster takes in an epi_df, does some pre-processing, does an epipredict workflow, and then some post-processing
-## Ensemble
-This kind of forecaster has two components: a list of existing forecasters it depends on, and a function that aggregates those forecasters.
-## (to be named)
-Any forecaster which requires a pre-trained component. An example is a forecaster with a sophisticated imputation method. Evaluating these has some thorns around training/testing splitting. It may be foldable into the basic variety though.
-# later things
-- a way to check that a given function is or is not in the right format to be a forecaster
 
+### Ensemble
 
-# Random notes
-Currently, to run in parallel, you need to install the package via `renv::install(".")`.
-The parallel workers will continue to use the version as of the last time you ran `renv::install`, while the non-parallel ones won't. This separates development from exploration.
+This kind of forecaster has two components: a list of existing forecasters it depends on, and a function that aggregates those forecasters.
 
+### (to be named)
 
-# Targets projects
-- testing: for debugging forecasters and doing sanity checks
-- flu_hosp_explore: for exploring flu hospitalization forecasters
-- covid_hosp_explore: for exploring covid hospitalization forecasters
-- flu_hosp_prod: for predicting flu hospitalizations
-- covid_hosp_prod: for predicting flu hospitalizations
+Any forecaster that requires a pre-trained component, for example a forecaster with a sophisticated imputation method. Evaluating these is thorny because of how the data must be split into training and testing sets. It may be foldable into the basic variety, though.