|
| 1 | +.. _new_diagnostic: |
| 2 | + |
| 3 | +*************************************** |
| 4 | +Contributing a new diagnostic or recipe |
| 5 | +*************************************** |
| 6 | + |
| 7 | +Getting started |
| 8 | +=============== |
| 9 | +
|
| 10 | +Please discuss your idea for a new diagnostic or recipe with the development team before getting started, |
| 11 | +to avoid disappointment later. A good way to do this is to open an |
| 12 | +`issue on GitHub <https://github.com/ESMValGroup/ESMValTool/issues>`_. |
| 13 | +This is also a good way to get help. |
| 14 | +
|
| 15 | +Creating a recipe and diagnostic script(s) |
| 16 | +========================================== |
| 17 | +First create a recipe in esmvaltool/recipes to define the input data your analysis script needs |
| 18 | +and optionally preprocessing and other settings. Also create a script in the esmvaltool/diag_scripts directory |
| 19 | +and make sure it is referenced from your recipe. The easiest way to do this is probably to copy the example recipe |
| 20 | +and diagnostic script and adjust those to your needs. |
| 21 | +A good example recipe is esmvaltool/recipes/examples/recipe_python.yml |
| 22 | +and a good example diagnostic is esmvaltool/diag_scripts/examples/diagnostic.py. |
| 23 | +
|
| 24 | +If you have no preferred programming language yet, Python 3 is highly recommended, because it is the best supported language. |
| 25 | +However, NCL, R, and Julia scripts are also supported. |
| 26 | +
|
| 27 | +Unfortunately not much documentation is available at this stage, |
| 28 | +so have a look at the other recipes and diagnostics for further inspiration. |
| 29 | +
|
| 30 | +Re-using existing code |
| 31 | +====================== |
| 32 | +Always make sure your code is or can be released under a license that is compatible with the Apache 2 license. |
| 33 | +
|
| 34 | +If you have existing code in a supported scripting language, you have two options for re-using it. If it is fairly |
| 35 | +mature and a large amount of code, the preferred way is to package and publish it on the |
| 36 | +official package repository for that language and add it as a dependency of esmvaltool. |
| 37 | +If it is just a few simple scripts or packaging is not possible (e.g. for NCL), you can simply copy |
| 38 | +and paste the source code into the esmvaltool/diag_scripts directory. |
| 39 | +
|
| 40 | +If you have existing code in a compiled language like |
| 41 | +C, C++, or Fortran that you want to re-use, the recommended way to proceed is to add Python bindings and publish |
| 42 | +the package on PyPI so it can be installed as a Python dependency. You can then call the functions it provides |
| 43 | +using a Python diagnostic. |
| 44 | +
|
| 45 | +Interfaces and provenance |
| 46 | +========================= |
| 47 | +When ESMValTool runs a recipe, it will first find all data and run the default preprocessor steps plus any |
| 48 | +additional preprocessing steps defined in the recipe. Next it will run the diagnostic script defined in the recipe |
| 49 | +and finally it will store provenance information. Provenance information is stored in the |
| 50 | +`W3C PROV XML format <https://www.w3.org/TR/prov-xml/>`_ |
| 51 | +and also plotted in an SVG file for human inspection. In addition to provenance information, a caption is also added |
| 52 | +to the plots. |
| 53 | +
|
| 54 | +In order to communicate with the diagnostic script, two interfaces have been defined, which are described below. |
| 55 | +Note that for Python and NCL diagnostics much more convenient methods are available than |
| 56 | +directly reading and writing the interface files. For other languages these are not implemented yet. |
| 57 | +
|
| 58 | +Using the interfaces from Python |
| 59 | +-------------------------------- |
| 60 | +Always use :meth:`esmvaltool.diag_scripts.shared.run_diagnostic` to start your script and make use of a |
| 61 | +:class:`esmvaltool.diag_scripts.shared.ProvenanceLogger` to log provenance. Have a look at the example |
| 62 | +Python diagnostic in esmvaltool/diag_scripts/examples/diagnostic.py for a complete example. |
| 63 | +
|
| 64 | +Using the interfaces from NCL |
| 65 | +----------------------------- |
| 66 | +TODO: write this |
| 67 | +
|
| 68 | +Generic interface between backend and diagnostic |
| 69 | +------------------------------------------------ |
| 70 | +To provide the diagnostic script with the information it needs to run (e.g. location of input data, various settings), |
| 71 | +the backend creates a YAML file called settings.yml and provides the path to this file as the first command line |
| 72 | +argument to the diagnostic script. |
| 73 | +
|
| 74 | +The most interesting settings provided in this file are |
| 75 | +
|
| 76 | +.. code:: yaml |
| 77 | +
|
| 78 | + run_dir: /path/to/recipe_output/run/diagnostic_name/script_name |
| 79 | + work_dir: /path/to/recipe_output/work/diagnostic_name/script_name |
| 80 | + plot_dir: /path/to/recipe_output/plots/diagnostic_name/script_name |
| 81 | + input_files: |
| 82 | + - /path/to/recipe_output/preproc/diagnostic_name/ta/metadata.yml |
| 83 | + - /path/to/recipe_output/preproc/diagnostic_name/pr/metadata.yml |
| 84 | +
|
| 85 | +Custom settings in the script section of the recipe will also be made available in this file. |
| 86 | +
|
| 87 | +There are three directories defined: |
| 88 | +
|
| 89 | +- :code:`run_dir`: use this for storing temporary files |
| 90 | +- :code:`work_dir`: use this for storing NetCDF files containing the data used to make a plot |
| 91 | +- :code:`plot_dir`: use this for storing plots |
| 92 | +
|
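As a rough illustration of the paragraphs above, a diagnostic written without the shared Python helpers could read the settings file along the following lines. This is only a sketch: the function names :code:`load_settings` and :code:`prepare_directories` are hypothetical, PyYAML is assumed to be installed, and whether the backend has already created the three directories is not stated here, so the sketch creates them if they are missing.

```python
import sys
from pathlib import Path


def load_settings(path):
    """Parse the settings.yml file written by the backend."""
    # PyYAML is a third-party package; it is assumed to be installed.
    import yaml

    with open(path, encoding="utf-8") as file:
        return yaml.safe_load(file)


def prepare_directories(settings):
    """Return run_dir, work_dir and plot_dir as paths, creating them if needed."""
    directories = {}
    for key in ("run_dir", "work_dir", "plot_dir"):
        directories[key] = Path(settings[key])
        # Assumption: creating a directory that already exists is harmless.
        directories[key].mkdir(parents=True, exist_ok=True)
    return directories


if __name__ == "__main__" and len(sys.argv) > 1:
    # The backend passes the path to settings.yml as the first argument.
    settings = load_settings(sys.argv[1])
    directories = prepare_directories(settings)
    print("Metadata files:", settings["input_files"])
```

For Python diagnostics the shared helpers mentioned earlier remain the more convenient route; the sketch mainly shows what the interface file contains.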
| 93 | +Finally :code:`input_files` is a list of YAML files, containing a description of the preprocessed data. Each entry in these |
| 94 | +YAML files is a path to a preprocessed file in NetCDF format, with a list of various attributes. |
| 95 | +An example preprocessor metadata.yml file could look like this |
| 96 | +
|
| 97 | +.. code:: yaml |
| 98 | +
|
| 99 | + ? /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc |
| 100 | + : cmor_table: CMIP5 |
| 101 | + dataset: GFDL-ESM2G |
| 102 | + diagnostic: diagnostic_name |
| 103 | + end_year: 2002 |
| 104 | + ensemble: r1i1p1 |
| 105 | + exp: historical |
| 106 | + filename: /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc |
| 107 | + frequency: mon |
| 108 | + institute: [NOAA-GFDL] |
| 109 | + long_name: Precipitation |
| 110 | + mip: Amon |
| 111 | + modeling_realm: [atmos] |
| 112 | + preprocessor: preprocessor_name |
| 113 | + project: CMIP5 |
| 114 | + recipe_dataset_index: 1 |
| 115 | + reference_dataset: MPI-ESM-LR |
| 116 | + short_name: pr |
| 117 | + standard_name: precipitation_flux |
| 118 | + start_year: 2000 |
| 119 | + units: kg m-2 s-1 |
| 120 | + variable_group: pr |
| 121 | + ? /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc |
| 122 | + : cmor_table: CMIP5 |
| 123 | + dataset: MPI-ESM-LR |
| 124 | + diagnostic: diagnostic_name |
| 125 | + end_year: 2002 |
| 126 | + ensemble: r1i1p1 |
| 127 | + exp: historical |
| 128 | + filename: /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc |
| 129 | + frequency: mon |
| 130 | + institute: [MPI-M] |
| 131 | + long_name: Precipitation |
| 132 | + mip: Amon |
| 133 | + modeling_realm: [atmos] |
| 134 | + preprocessor: preprocessor_name |
| 135 | + project: CMIP5 |
| 136 | + recipe_dataset_index: 2 |
| 137 | + reference_dataset: MPI-ESM-LR |
| 138 | + short_name: pr |
| 139 | + standard_name: precipitation_flux |
| 140 | + start_year: 2000 |
| 141 | + units: kg m-2 s-1 |
| 142 | + variable_group: pr |
| 143 | +
|
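Because every entry in a metadata.yml file maps a file path to its attributes, selecting or grouping the preprocessed files is a matter of plain dictionary handling. The sketch below assumes PyYAML is installed; :code:`load_metadata` and :code:`group_by` are hypothetical helper names, not part of ESMValTool.

```python
from collections import defaultdict


def load_metadata(input_files):
    """Merge all metadata.yml files into one {filename: attributes} dict."""
    # PyYAML is a third-party package; it is assumed to be installed.
    import yaml

    metadata = {}
    for path in input_files:
        with open(path, encoding="utf-8") as file:
            metadata.update(yaml.safe_load(file))
    return metadata


def group_by(metadata, attribute):
    """Group preprocessed file paths by the value of one attribute."""
    groups = defaultdict(list)
    for filename, attributes in metadata.items():
        groups[attributes[attribute]].append(filename)
    return dict(groups)
```

For the example file above, :code:`group_by(metadata, "dataset")` would give one list of files for GFDL-ESM2G and one for MPI-ESM-LR.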
| 144 | +Generic interface between diagnostic and backend |
| 145 | +------------------------------------------------ |
| 146 | +
|
| 147 | +After the diagnostic script has finished running, the backend will try to store provenance information. In order to |
| 148 | +link the produced files to input data, the diagnostic script needs to store a file called diagnostic_provenance.yml |
| 149 | +in its :code:`run_dir`. |
| 150 | +
|
| 151 | +For each output file produced by the diagnostic script, there should be an entry in the diagnostic_provenance.yml file. |
| 152 | +The name of each entry should be the path to the output file. |
| 153 | +Each file entry should at least contain the following items |
| 154 | +
|
| 155 | +- :code:`ancestors`: a list of input files used to create the output file |
| 156 | +- :code:`caption`: a caption text for the plot |
| 157 | +- :code:`plot_file`: the path to the plot file, if the diagnostic also created one, e.g. in .png format |
| 158 | +
|
| 159 | +Each file entry can also contain items from the categories defined in the file esmvaltool/config_references.yml. |
| 160 | +The short entries will automatically be replaced by their longer equivalent in the final provenance records. |
| 161 | +It is possible to add custom provenance information by adding custom items to entries. |
| 162 | +
|
| 163 | +An example diagnostic_provenance.yml file could look like this |
| 164 | +
|
| 165 | +.. code:: yaml |
| 166 | +
|
| 167 | + ? /path/to/recipe_output/work/diagnostic_name/script_name/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.nc |
| 168 | + : ancestors: |
| 169 | + - /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc |
| 170 | + authors: [ande_bo, righ_ma] |
| 171 | + caption: Average Precipitation between 2000 and 2002 according to GFDL-ESM2G. |
| 172 | + domains: [global] |
| 173 | + plot_file: /path/to/recipe_output/plots/diagnostic_name/script_name/CMIP5_GFDL-ESM2G_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.png |
| 174 | + plot_type: zonal |
| 175 | + references: [acknow_project] |
| 176 | + statistics: [mean] |
| 177 | + ? /path/to/recipe_output/work/diagnostic_name/script_name/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.nc |
| 178 | + : ancestors: |
| 179 | + - /path/to/recipe_output/preproc/diagnostic_name/pr/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002.nc |
| 180 | + authors: [ande_bo, righ_ma] |
| 181 | + caption: Average Precipitation between 2000 and 2002 according to MPI-ESM-LR. |
| 182 | + domains: [global] |
| 183 | + plot_file: /path/to/recipe_output/plots/diagnostic_name/script_name/CMIP5_MPI-ESM-LR_Amon_historical_r1i1p1_T2Ms_pr_2000-2002_mean.png |
| 184 | + plot_type: zonal |
| 185 | + references: [acknow_project] |
| 186 | + statistics: [mean] |
| 187 | +
|
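A diagnostic that does not use the shared Python helpers could produce such a file roughly as follows. This is a minimal sketch under stated assumptions: PyYAML is installed, and :code:`make_record` and :code:`write_provenance` are hypothetical names introduced for illustration.

```python
from pathlib import Path


def make_record(ancestors, caption, plot_file=None, **extra):
    """Build one provenance entry; extra items (e.g. authors) are passed through."""
    record = {"ancestors": list(ancestors), "caption": caption}
    if plot_file is not None:
        # Only present when the diagnostic also created a plot.
        record["plot_file"] = str(plot_file)
    record.update(extra)
    return record


def write_provenance(run_dir, records):
    """Write {output_file: record} to diagnostic_provenance.yml in run_dir."""
    # PyYAML is a third-party package; it is assumed to be installed.
    import yaml

    path = Path(run_dir) / "diagnostic_provenance.yml"
    with open(path, "w", encoding="utf-8") as file:
        yaml.safe_dump(records, file)
    return path
```

The :code:`records` argument is a dictionary whose keys are the output file paths, matching the layout of the example file above.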
| 188 | +You can check whether your diagnostic script successfully provided the provenance information to the backend by |
| 189 | +verifying that |
| 190 | +
|
| 191 | +- for each output file in the :code:`work_dir`, a file with the same name, but ending with _provenance.xml is created |
| 192 | +- any NetCDF files created by your diagnostic script contain a 'provenance' global attribute |
| 193 | +- any PNG plots created by your diagnostic script contain the provenance information in the 'Image History' attribute |
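The first of these checks can be automated with a few lines of Python. The sketch below assumes that the provenance file name is formed by replacing the extension of the output file with _provenance.xml and that the output files are NetCDF files directly inside :code:`work_dir`; :code:`missing_provenance` is a hypothetical helper name.

```python
from pathlib import Path


def missing_provenance(work_dir):
    """Return the NetCDF files in work_dir that lack a provenance file."""
    missing = []
    for output in sorted(Path(work_dir).glob("*.nc")):
        # Assumption: example.nc is accompanied by example_provenance.xml.
        expected = output.with_name(output.stem + "_provenance.xml")
        if not expected.exists():
            missing.append(output)
    return missing
```

An empty list means that every NetCDF file found has a matching provenance file; the attribute checks for NetCDF and PNG files are not covered by this sketch.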