Commit 01ada52

Add information on how to contribute a new diagnostic
1 parent 0a2171a commit 01ada52

2 files changed, +194 -0 lines changed

doc/sphinx/source/developer_guide2/index.rst

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 Developer's Guide
 #################
 
+.. include:: new_diagnostic.inc
 .. include:: porting.inc
 .. include:: git_repository.inc
 .. include:: core_team.inc
Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
.. _new_diagnostic:

***************************************
Contributing a new diagnostic or recipe
***************************************

Getting started
===============

Please discuss your idea for a new diagnostic or recipe with the development team before getting started,
to avoid disappointment later. A good way to do this is to open an
`issue on GitHub <https://github.com/ESMValGroup/ESMValTool/issues>`_.
This is also a good way to get help.

Creating a recipe and diagnostic script(s)
==========================================

First, create a recipe in the esmvaltool/recipes directory that defines the input data your analysis script needs
and, optionally, the preprocessing steps and other settings. Also create a script in the esmvaltool/diag_scripts
directory and make sure it is referenced from your recipe. The easiest way to do this is probably to copy the example
recipe and diagnostic script and adjust them to your needs.
A good example recipe is esmvaltool/recipes/examples/recipe_python.yml,
and a good example diagnostic is esmvaltool/diag_scripts/examples/diagnostic.py.

If you have no preferred programming language yet, Python 3 is highly recommended, because it is the best supported.
However, NCL, R, and Julia scripts are also supported.

Unfortunately, not much documentation is available at this stage,
so have a look at the other recipes and diagnostics for further inspiration.

Re-using existing code
======================

Always make sure your code is, or can be, released under a license that is compatible with the Apache 2.0 license.

If you have existing code in a supported scripting language, you have two options for re-using it. If it is fairly
mature and consists of a large amount of code, the preferred way is to package and publish it on the
official package repository for that language and add it as a dependency of esmvaltool.
If it is just a few simple scripts, or packaging is not possible (e.g. for NCL), you can simply copy
and paste the source code into the esmvaltool/diag_scripts directory.

If you have existing code in a compiled language like
C, C++, or Fortran that you want to re-use, the recommended way to proceed is to add Python bindings and publish
the package on PyPI so it can be installed as a Python dependency. You can then call the functions it provides
from a Python diagnostic.

Interfaces and provenance
=========================

When ESMValTool runs a recipe, it first finds all data and runs the default preprocessor steps plus any
additional preprocessing steps defined in the recipe. Next, it runs the diagnostic script defined in the recipe
and finally it stores provenance information. Provenance information is stored in the
`W3C PROV XML format <https://www.w3.org/TR/prov-xml/>`_
and is also plotted in an SVG file for human inspection. In addition to the provenance information, a caption is
added to the plots.

To communicate with the diagnostic script, two interfaces have been defined; they are described below.
Note that for Python and NCL diagnostics, much more convenient methods are available than
directly reading and writing the interface files. For other languages, these are not implemented yet.

Using the interfaces from Python
--------------------------------

Always use :func:`esmvaltool.diag_scripts.shared.run_diagnostic` to start your script, and use a
:class:`esmvaltool.diag_scripts.shared.ProvenanceLogger` to log provenance. Have a look at the example
Python diagnostic in esmvaltool/diag_scripts/examples/diagnostic.py for a complete example.

Using the interfaces from NCL
-----------------------------

TODO: write this

Generic interface between backend and diagnostic
------------------------------------------------

To provide the diagnostic script with the information it needs to run (e.g. the location of the input data and
various settings), the backend creates a YAML file called settings.yml and passes the path to this file as the first
command line argument to the diagnostic script.
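
Because the path arrives as a plain command line argument, a diagnostic in any language can pick it up the same way.
The following is a minimal Python sketch that uses only the standard library. The function name
:code:`get_settings_path` is hypothetical. Parsing the YAML content is left to a YAML library, for example PyYAML,
which is assumed to be installed. Python diagnostics should use
:func:`esmvaltool.diag_scripts.shared.run_diagnostic` instead, as described above.

.. code:: python

    import sys
    from pathlib import Path


    def get_settings_path(argv=None):
        """Return the settings.yml path passed as the first command line argument.

        Exits with a short message if the argument is missing or the file
        does not exist.
        """
        argv = sys.argv if argv is None else argv
        if len(argv) < 2:
            prog = argv[0] if argv else "diagnostic"
            raise SystemExit(f"usage: {prog} /path/to/settings.yml")
        path = Path(argv[1])
        if not path.is_file():
            raise SystemExit(f"settings file not found: {path}")
        return path

A script would call :code:`get_settings_path()` at start-up and then load the returned file, e.g. with
:code:`yaml.safe_load`.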

The most interesting settings provided in this file are:

.. code:: yaml

    run_dir: /path/to/recipe_output/run/diagnostic_name/script_name
    work_dir: /path/to/recipe_output/work/diagnostic_name/script_name
    plot_dir: /path/to/recipe_output/plots/diagnostic_name/script_name
    input_files:
      - /path/to/recipe_output/preproc/diagnostic_name/ta/metadata.yml
      - /path/to/recipe_output/preproc/diagnostic_name/pr/metadata.yml

Custom settings in the script section of the recipe will also be made available in this file.

There are three directories defined:

- :code:`run_dir`: use this for storing temporary files
- :code:`work_dir`: use this for storing NetCDF files containing the data used to make a plot
- :code:`plot_dir`: use this for storing plots
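
The following minimal sketch shows how a diagnostic might derive its output file names from these settings. It
assumes :code:`cfg` is the already-parsed content of settings.yml. The helper name :code:`output_paths` and the
naming scheme are hypothetical. The directories are created defensively, in case they do not exist yet, which is
also an assumption.

.. code:: python

    from pathlib import Path


    def output_paths(cfg, basename, plot_ext="png"):
        """Return (netcdf_path, plot_path) for one result of a diagnostic.

        Only the ``work_dir`` and ``plot_dir`` keys of ``cfg`` are used:
        data files go to work_dir, plots go to plot_dir.
        """
        work_dir = Path(cfg["work_dir"])
        plot_dir = Path(cfg["plot_dir"])
        # Create both directories if needed; harmless if they already exist.
        work_dir.mkdir(parents=True, exist_ok=True)
        plot_dir.mkdir(parents=True, exist_ok=True)
        netcdf_path = work_dir / f"{basename}.nc"
        plot_path = plot_dir / f"{basename}.{plot_ext}"
        return netcdf_path, plot_path

Keeping the data file and the plot on the same base name makes it easy to pair them later in the provenance record.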

Finally, :code:`input_files` is a list of YAML files containing a description of the preprocessed data. Each entry in
these YAML files is the path to a preprocessed file in NetCDF format, together with a list of its attributes.
An example preprocessor metadata.yml file could look like this:

.. code:: yaml

    ? /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc
    : cmor_table: CMIP5
      dataset: GFDL-ESM2G
      diagnostic: diagnostic_name
      end_year: 2002
      ensemble: r1i1p1
      exp: historical
      filename: /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc
      frequency: mon
      institute: [NOAA-GFDL]
      long_name: Precipitation
      mip: Amon
      modeling_realm: [atmos]
      preprocessor: preprocessor_name
      project: CMIP5
      recipe_dataset_index: 1
      reference_dataset: MPI-ESM-LR
      short_name: pr
      standard_name: precipitation_flux
      start_year: 2000
      units: kg m-2 s-1
      variable_group: pr
    ? /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc
    : cmor_table: CMIP5
      dataset: MPI-ESM-LR
      diagnostic: diagnostic_name
      end_year: 2002
      ensemble: r1i1p1
      exp: historical
      filename: /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc
      frequency: mon
      institute: [MPI-M]
      long_name: Precipitation
      mip: Amon
      modeling_realm: [atmos]
      preprocessor: preprocessor_name
      project: CMIP5
      recipe_dataset_index: 2
      reference_dataset: MPI-ESM-LR
      short_name: pr
      standard_name: precipitation_flux
      start_year: 2000
      units: kg m-2 s-1
      variable_group: pr
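
Diagnostics typically loop over these entries. For example, they group them by :code:`variable_group` and treat
the dataset named by :code:`reference_dataset` separately. The following plain-Python sketch assumes that the
metadata.yml files have already been loaded, e.g. with a YAML library, and merged into one dict keyed by file path.
The function names are hypothetical, not part of ESMValTool.

.. code:: python

    from collections import defaultdict


    def group_by_variable(metadata):
        """Group preprocessed files by their ``variable_group`` attribute.

        ``metadata`` maps each preprocessed NetCDF path to its attributes.
        Returns {variable_group: [(path, attributes), ...]} with the entries
        of each group ordered by ``recipe_dataset_index``.
        """
        groups = defaultdict(list)
        for path, attributes in metadata.items():
            groups[attributes["variable_group"]].append((path, attributes))
        for entries in groups.values():
            entries.sort(key=lambda item: item[1]["recipe_dataset_index"])
        return dict(groups)


    def split_reference(entries):
        """Separate the reference dataset from the other datasets of one group.

        Returns (reference, others); reference is None if no entry's
        ``dataset`` matches its ``reference_dataset`` attribute.
        """
        reference, others = None, []
        for path, attributes in entries:
            if attributes["dataset"] == attributes.get("reference_dataset"):
                reference = (path, attributes)
            else:
                others.append((path, attributes))
        return reference, others

With the example metadata above, both files end up in the :code:`pr` group, and the MPI-ESM-LR file is returned
as the reference.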

Generic interface between diagnostic and backend
------------------------------------------------

After the diagnostic script has finished running, the backend will try to store provenance information. To
link the produced files to the input data, the diagnostic script needs to store a file called diagnostic_provenance.yml
in its :code:`run_dir`.

For each output file produced by the diagnostic script, there should be an entry in the diagnostic_provenance.yml file.
The name of each entry should be the path to the output file.
Each file entry should contain at least the following items:

- :code:`ancestors`: a list of the input files used to create the output file
- :code:`caption`: a caption text for the plot
- :code:`plot_file`: the path to the plot file, if the diagnostic also created one, e.g. in .png format

Each file entry can also contain items from the categories defined in the file esmvaltool/config_references.yml.
The short entries will automatically be replaced by their longer equivalents in the final provenance records.
It is also possible to add custom provenance information by adding custom items to entries.

An example diagnostic_provenance.yml file could look like this:

.. code:: yaml

    ? /path/to/recipe_output/work/diagnostic_name/script_name/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.nc
    : ancestors:
        - /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc
      authors: [ande_bo, righ_ma]
      caption: Average Precipitation between 2000 and 2002 according to GFDL-ESM2G.
      domains: [global]
      plot_file: /path/to/recipe_output/plots/diagnostic_name/script_name/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.png
      plot_type: zonal
      references: [acknow_project]
      statistics: [mean]
    ? /path/to/recipe_output/work/diagnostic_name/script_name/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.nc
    : ancestors:
        - /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc
      authors: [ande_bo, righ_ma]
      caption: Average Precipitation between 2000 and 2002 according to MPI-ESM-LR.
      domains: [global]
      plot_file: /path/to/recipe_output/plots/diagnostic_name/script_name/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.png
      plot_type: zonal
      references: [acknow_project]
      statistics: [mean]
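
The following plain-Python sketch builds a mapping of the shape shown in this example. It is a minimal
illustration, not an ESMValTool API: the helper name :code:`add_provenance` is hypothetical. For Python
diagnostics, the :class:`esmvaltool.diag_scripts.shared.ProvenanceLogger` mentioned above remains the recommended
route.

.. code:: python

    def add_provenance(provenance, output_file, ancestors, caption,
                       plot_file=None, **extra):
        """Add one entry to the diagnostic_provenance.yml mapping.

        ``provenance`` maps output file paths to their records. Extra
        keyword arguments (e.g. authors, domains, statistics) are copied
        into the record unchanged, so custom items can be added as well.
        """
        record = {"ancestors": list(ancestors), "caption": caption}
        # plot_file is only recorded when the diagnostic made a plot.
        if plot_file is not None:
            record["plot_file"] = str(plot_file)
        record.update(extra)
        provenance[str(output_file)] = record
        return record

The resulting dict can then be written to diagnostic_provenance.yml in the :code:`run_dir` with a YAML library,
for example with :code:`yaml.safe_dump` from PyYAML, which is assumed to be installed.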

You can check whether your diagnostic script successfully provided the provenance information to the backend by
verifying that:

- for each output file in the :code:`work_dir`, a file with the same name, but ending with _provenance.xml, is created
- any NetCDF files created by your diagnostic script contain a 'provenance' global attribute
- any PNG plots created by your diagnostic script contain the provenance information in the 'Image History' attribute
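
The first check can be automated with the standard library. The following minimal sketch assumes that the
provenance file name is the output file name with its extension replaced by _provenance.xml (e.g. :code:`x.nc`
becomes :code:`x_provenance.xml`). The helper name :code:`missing_provenance` is hypothetical.

.. code:: python

    from pathlib import Path


    def missing_provenance(work_dir):
        """List output files in ``work_dir`` that lack a *_provenance.xml file."""
        missing = []
        for path in sorted(Path(work_dir).rglob("*")):
            # Skip directories and the provenance files themselves.
            if not path.is_file() or path.stem.endswith("_provenance"):
                continue
            expected = path.with_name(path.stem + "_provenance.xml")
            if not expected.exists():
                missing.append(path)
        return missing

The NetCDF and PNG checks from the list above need third-party libraries and are not covered by this sketch.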
