Commit 6524ac8

Author: Thibault Jeandet
Commit message: backend pass
1 parent 4ca3563 · commit 6524ac8

7 files changed: +130 -286 lines changed


docs/backends/Backends.md (+21 -97)
@@ -1,109 +1,34 @@
-_For the Doc-A-Thon_
-**Questions to answer and things to consider:**
-
-1. Who is visiting the General Backends page?
-*Do they know what a backend is?*
-2. What do they need to know first?
-
-3. Is all the important information there? If not, add it!
-*Add information about SLURM? See this [Github issue](https://github.com/broadinstitute/cromwell/issues/1750) for more information.
-4. Are there things that don't need to be there? Remove them.
-
-5. Are the code and instructions accurate? Try it!
-
----
-**DELETE ABOVE ONCE COMPLETE**
-
----
-
-
-A backend represents a way to run the user's command specified in the `task` section. Cromwell allows for backends conforming to
-the Cromwell backend specification to be plugged into the Cromwell engine. Additionally, backends are included with the
+A backend represents a way to run the commands of your workflow. Cromwell allows for backends conforming to
+the Cromwell backend specification to be plugged into the Cromwell engine. Additionally, backends are included with the
 Cromwell distribution:

-* **Local / GridEngine / LSF / etc.** - Run jobs as subprocesses or via a dispatcher. Supports launching in Docker containers. Use `bash`, `qsub`, `bsub`, etc. to run scripts.
-* **Google Cloud** - Launch jobs on Google Compute Engine through the Google Genomics Pipelines API.
-* **GA4GH TES** - Launch jobs on servers that support the GA4GH Task Execution Schema (TES).
-* **HtCondor** - Allows to execute jobs using HTCondor.
-* **Spark** - Adds support for execution of spark jobs.
+* **[Local](Local)**
+* **[HPC](HPC): [SunGridEngine](SGE) / [LSF](LSF) / [HTCondor](HTcondor) / [SLURM](SLURM), etc.** - Run jobs as subprocesses or via a dispatcher. Supports launching in Docker containers. Use `bash`, `qsub`, `bsub`, etc. to run scripts.
+* **[Google Cloud](Google)** - Launch jobs on Google Compute Engine through the Google Genomics Pipelines API.
+* **[GA4GH TES](TES)** - Launch jobs on servers that support the GA4GH Task Execution Schema (TES).
+* **[Spark](Spark)** - Supports execution of Spark jobs.
+
+HPC backends are grouped under the same umbrella because they all use the same generic configuration, which can be specialized to fit the needs of a particular technology.

-Backends are specified in the `backend` configuration block under `providers`. Each backend has a configuration that looks like:
+Backends are specified in the `backend.providers` configuration. Each backend has a configuration that looks like:

 ```hocon
-backend {
-  default = "Local"
-  providers {
-    BackendName {
-      actor-factory = "FQN of BackendLifecycleActorFactory instance"
-      config {
-        key = "value"
-        key2 = "value2"
-        ...
-      }
-    }
+BackendName {
+  actor-factory = "FQN of BackendLifecycleActorFactory class"
+  config {
+    ...
   }
 }
 ```

 The structure within the `config` block will vary from one backend to another; it is the backend implementation's responsibility
 to be able to interpret its configuration.

-In the example below two backend types are named within the `providers` section here, so both
-are available. The default backend is specified by `backend.default` and must match the `name` of one of the
-configured backends:
-
-```hocon
-backend {
-  default = "Local"
-  providers {
-    Local {
-      actor-factory = "cromwell.backend.impl.local.LocalBackendLifecycleActorFactory"
-      config {
-        root: "cromwell-executions"
-        filesystems = {
-          local {
-            localization: [
-              "hard-link", "soft-link", "copy"
-            ]
-          }
-          gcs {
-            # References an auth scheme defined in the 'google' stanza.
-            auth = "application-default"
-          }
-        }
-      }
-    },
-    JES {
-      actor-factory = "cromwell.backend.impl.jes.JesBackendLifecycleActorFactory"
-      config {
-        project = "my-cromwell-workflows"
-        root = "gs://my-cromwell-workflows-bucket"
-        maximum-polling-interval = 600
-        dockerhub {
-          # account = ""
-          # token = ""
-        }
-        genomics {
-          # A reference to an auth defined in the 'google' stanza at the top. This auth is used to create
-          # Pipelines and manipulate auth JSONs.
-          auth = "application-default"
-          endpoint-url = "https://genomics.googleapis.com/"
-        }
-        filesystems = {
-          gcs {
-            # A reference to a potentially different auth for manipulating files via engine functions.
-            auth = "user-via-refresh"
-          }
-        }
-      }
-    }
-  ]
-}
-```
+The `providers` section can contain multiple backends, all of which will be available to Cromwell.

 **Backend Job Limits**

-You can limit the number of concurrent jobs for a backend by specifying the following option in the backend's config
+All backends support limiting the number of concurrent jobs by specifying the following option in the backend's configuration
 stanza:

 ```
@@ -118,12 +43,11 @@ backend {

 **Backend Filesystems**

-Each backend will utilize filesystems to store the directory structure of an executed workflow. Currently, the backends and the type of filesystems that the backend use are tightly coupled. In future versions of Cromwell, they may be more loosely coupled.
-
+Each backend will utilize a filesystem to store the directory structure and results of an executed workflow.
 The backend/filesystem pairings are as follows:

-* [Local Backend](Local) and associated backends primarily use the [Shared Local Filesystem](SharedFilesystem).
-* [Google Backend](Google) uses the [Google Cloud Storage Filesystem](Google/#google-cloud-storage-filesystem).
+* Local, HPC and Spark backends use the [Shared Local Filesystem](SharedFilesystem).
+* The Google backend uses the [Google Cloud Storage Filesystem](Google/#google-cloud-storage-filesystem).

-Note that while Local, SGE, LSF, etc. backends use the local or network filesystem for the directory structure of a workflow, they are able to localize inputs
-from GCS paths if configured to use a GCS filesystem. See [Google Storage Filesystem](Google/#google-cloud-storage-filesystem) for more details.
+Additional filesystem capabilities can be added depending on the backend.
+For instance, an HPC backend can be configured to work with files on Google Cloud Storage. See the HPC documentation for more details.
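
Taken together with the structure of the example removed above, a complete `backend` block nesting several providers could look like the following sketch. The provider contents are abbreviated; the SLURM `actor-factory` comes from the new SLURM page in this commit, while using the same config-based factory for Local is an assumption based on the note that the Local backend shares the generic HPC configuration:

```hocon
backend {
  # Must match the name of one of the providers configured below.
  default = "Local"

  providers {
    Local {
      # Assumption: the Local backend uses the same config-based factory as the HPC backends.
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        # Local settings, see docs/backends/Local.md and the linked example configuration.
      }
    }

    SLURM {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        # SLURM settings, see docs/backends/SLURM.md.
      }
    }
  }
}
```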

docs/backends/HPC.md (+54)
@@ -0,0 +1,54 @@
+Cromwell provides a generic way to configure a backend relying on most High Performance Computing (HPC) frameworks with access to a shared filesystem.
+
+The two main features needed for such a backend are a way to submit a job to the compute cluster and a way to get its status, both through the command line.
+You can find example configurations for a variety of these backends here:
+
+* [SGE](SGE)
+* [LSF](LSF)
+* [SLURM](SLURM)
+* [HTCondor](HTcondor)
+
+## FileSystems
+
+### Shared FileSystem
+HPC backends rely on being able to access and use a shared filesystem to store workflow results.
+
+Cromwell is configured with a root execution directory which is set in the configuration file under `backend.providers.<backend_name>.config.root`. This is called the `cromwell_root` and it is set to `./cromwell-executions` by default. Relative paths are interpreted as relative to the current working directory of the Cromwell process.
+
+When Cromwell runs a workflow, it first creates a directory `<cromwell_root>/<workflow_uuid>`. This is called the `workflow_root` and it is the root directory for all activity in this workflow.
+
+Each `call` has its own subdirectory located at `<workflow_root>/call-<call_name>`. This is the `<call_dir>`.
+Any input files to a call need to be localized into the `<call_dir>/inputs` directory. There are several localization strategies that Cromwell will try until one works. Below is the default order specified in `reference.conf`, but it can be overridden:
+
+* `hard-link` - Creates a hard link to the file.
+* `soft-link` - Creates a symbolic link to the file. This strategy is not applicable for tasks which specify a Docker image and will be ignored.
+* `copy` - Makes a copy of the file.
+
+Shared filesystem localization is defined in the `config` section of each backend. The default stanza for the Local and HPC backends looks like this:
+
+```
+filesystems {
+  local {
+    localization: [
+      "hard-link", "soft-link", "copy"
+    ]
+  }
+}
+```
+
+### Additional FileSystems
+
+HPC backends (as well as the Local backend) can be configured to interact with other types of filesystems, where input files can be located for example.
+Currently the only other filesystem supported is Google Cloud Storage (GCS). See the [Google section](Google) of the documentation for information on how to configure GCS in Cromwell.
+Once you have a Google authentication configured, you can simply add a `gcs` stanza to your configuration file to enable GCS:
+
+```
+backend.providers.MyHPCBackend {
+  filesystems {
+    gcs {
+      # A reference to a potentially different auth for manipulating files via engine functions.
+      auth = "application-default"
+    }
+  }
+}
+```
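
Since the default localization order above can be overridden, a backend whose execution directory sits on a filesystem where hard links are impractical could reorder or drop strategies. A minimal sketch, reusing the hypothetical `MyHPCBackend` name from the snippet above:

```hocon
backend.providers.MyHPCBackend.config {
  filesystems {
    local {
      # Overrides the default "hard-link", "soft-link", "copy" order from reference.conf.
      localization: [
        "copy", "soft-link"
      ]
    }
  }
}
```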

docs/backends/LSF.md (+15)
@@ -0,0 +1,15 @@
+The following configuration can be used as a base to allow Cromwell to interact with an [LSF](https://en.wikipedia.org/wiki/Platform_LSF) cluster and dispatch jobs to it:
+
+```hocon
+LSF {
+  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
+  config {
+    submit = "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} /bin/bash ${script}"
+    kill = "bkill ${job_id}"
+    check-alive = "bjobs ${job_id}"
+    job-id-regex = "Job <(\\d+)>.*"
+  }
+}
+```
+
+For information on how to further configure it, take a look at the [Getting Started on HPC Clusters](../tutorials/HPCIntro) tutorial.
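
The SLURM page added in the same commit shows how `runtime-attributes` declared in the config become placeholders in the `submit` command. Below is a sketch of the same idea applied to this LSF base; the attribute names, defaults, and `bsub` resource flags (`-q`, `-n`, `-M`) are illustrative, and memory units in particular vary between LSF installations:

```hocon
LSF {
  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
  config {
    # Declared attributes can be set per task and are substituted into `submit`.
    runtime-attributes = """
    Int cpus = 1
    Int memory_mb = 4000
    String queue = "normal"
    """

    # Check the -M flag and its units against your site's LSF configuration.
    submit = """
    bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} \
    -q ${queue} -n ${cpus} -M ${memory_mb} \
    /bin/bash ${script}
    """
    kill = "bkill ${job_id}"
    check-alive = "bjobs ${job_id}"
    job-id-regex = "Job <(\\d+)>.*"
  }
}
```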

docs/backends/Local.md (+8 -55)
@@ -1,61 +1,14 @@
-_For the Doc-A-Thon_
-**Questions to answer and things to consider:**
-
-1. Who is visiting the Local page?
-*This is the first in the list of Backends*
-2. What do they need to know first?
-
-3. Is all the important information there? If not, add it!
-*What is an rc file? Write out the full name with the abbreviation, Return Code (rc) file, then abbreviate after.*
-4. Are there things that don't need to be there? Remove them.
-
-5. Are the code and instructions accurate? Try it!
-
----
-**DELETE ABOVE ONCE COMPLETE**
-
----
-
-
 **Local Backend**

-The local backend will simply launch a subprocess for each task invocation and wait for it to produce its rc file.
-
-This backend creates three files in the `<call_dir>` (see previous section):
-
-* `script` - A shell script of the job to be run. This contains the user's command from the `command` section of the WDL code.
-* `stdout` - The standard output of the process
-* `stderr` - The standard error of the process
-
-The `script` file contains:
-
-```
-#!/bin/sh
-cd <container_call_root>
-<user_command>
-echo $? > rc
-```
-
-`<container_call_root>` would be equal to `<call_dir>` for non-Docker jobs, or it would be under `/cromwell-executions/<workflow_uuid>/call-<call_name>` if this is running in a Docker container.
-
-When running without docker, the subprocess command that the local backend will launch is:
-
-```
-/bin/bash <script>"
-```
-
-When running with docker, the subprocess command that the local backend will launch is:
+The local backend will simply launch a subprocess for each job invocation and wait for it to produce a return code (rc) file, which will contain the exit code of the job's command.
+It is enabled by default and no further configuration is needed to start using it.

-```
-docker run --rm -v <cwd>:<docker_cwd> -i <docker_image> /bin/bash < <script>
-```
+It uses the local filesystem on which Cromwell is running to store the workflow directory structure.

-**NOTE**: If you are using the local backend with Docker and Docker Machine on Mac OS X, by default Cromwell can only
-run from in any path under your home directory.
+You can find the complete set of configurable settings with explanations in the [example configuration file](https://github.com/broadinstitute/cromwell/blob/b47feaa207fcf9e73e105a7d09e74203fff6f73b/cromwell.examples.conf#L193).

-The `-v` flag will only work if `<cwd>` is within your home directory because VirtualBox with
-Docker Machine only exposes the home directory by default. Any local path used in `-v` that is not within the user's
-home directory will silently be interpreted as references to paths on the VirtualBox VM. This can manifest in
-Cromwell as tasks failing for odd reasons (like missing RC file)
+The Local backend makes use of the same generic configuration as HPC backends. The same [filesystem considerations](HPC#filesystems) apply.

-See https://docs.docker.com/engine/userguide/dockervolumes/ for more information on volume mounting in Docker.
+**Note to OSX users**: Docker on Mac restricts the directories that can be mounted. Only some directories are allowed by default.
+If you try to mount a volume from a disallowed directory, jobs can fail in an odd manner. Before mounting a directory, make sure it is in the list
+of allowed directories. See the [Docker documentation](https://docs.docker.com/docker-for-mac/osxfs/#namespaces) for how to configure those directories.
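
Combining the removed details above (the non-Docker subprocess is simply `/bin/bash <script>`) with the generic configuration described on the [HPC](HPC) page gives a rough picture of a Local provider. This is a sketch, not the shipped defaults; in particular `run-in-background` is an assumption about how Cromwell is told there is no scheduler to poll, so check the linked example configuration file for the authoritative settings:

```hocon
Local {
  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
  config {
    # Root of the workflow directory structure on the local filesystem.
    root = "cromwell-executions"

    # No dispatcher: the job is a plain subprocess, so there is no job-id-regex or check-alive.
    run-in-background = true  # assumption, verify against the example configuration
    submit = "/bin/bash ${script}"

    filesystems {
      local {
        localization: [
          "hard-link", "soft-link", "copy"
        ]
      }
    }
  }
}
```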

docs/backends/SLURM.md (+27)
@@ -0,0 +1,27 @@
+The following configuration can be used as a base to allow Cromwell to interact with a [SLURM](https://slurm.schedmd.com/) cluster and dispatch jobs to it:
+
+```hocon
+SLURM {
+  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
+  config {
+    runtime-attributes = """
+    Int runtime_minutes = 600
+    Int cpus = 2
+    Int requested_memory_mb_per_core = 8000
+    String queue = "short"
+    """
+
+    submit = """
+    sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} -t ${runtime_minutes} -p ${queue} \
+    ${"-n " + cpus} \
+    --mem-per-cpu=${requested_memory_mb_per_core} \
+    --wrap "/bin/bash ${script}"
+    """
+    kill = "scancel ${job_id}"
+    check-alive = "squeue -j ${job_id}"
+    job-id-regex = "Submitted batch job (\\d+).*"
+  }
+}
+```
+
+For information on how to further configure it, take a look at the [Getting Started on HPC Clusters](../tutorials/HPCIntro) tutorial.
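
Following the Additional FileSystems section of the [HPC](HPC) page, this SLURM provider can also be given access to input files stored on Google Cloud Storage by adding a `gcs` filesystem next to the local one. A sketch, assuming a Google auth named `application-default` has been configured as described on the [Google](Google) page:

```hocon
backend.providers.SLURM.config.filesystems {
  local {
    localization: [
      "hard-link", "soft-link", "copy"
    ]
  }
  gcs {
    # References an auth defined in the 'google' stanza (see the Google page).
    auth = "application-default"
  }
}
```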
