rework 7 tutorials

cbielow · cbielow · commit b49bae7133a3 · 2025-02-27T14:42:23.000+01:00
diff --git a/docs/source/user_guide/adduct_detection.rst b/docs/source/user_guide/adduct_detection.rst
@@ -3,9 +3,10 @@ Adduct Detection
 
 In mass spectrometry it is crucial to ionize analytes prior to detection, because they are accelerated and manipulated in electric fields, allowing their separation based on mass-to-charge ratio.
 This happens by addition of protons in positive mode or loss of protons in negative mode. Other ions present in the buffer solution can ionize the analyte as well, e.g. sodium, potassium or formic acid.
-Depending on the size and chemical compsition, multiple adducts can bind leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is much higher.
+Depending on the size and chemical composition, multiple adducts can bind, leading to multiple charges on the analyte. In metabolomics, with smaller analytes, the number of charges is typically low (one or two), whereas in proteomics the number of charges is potentially higher.
+
 Furthermore, analytes can lose functional groups during ionization, e.g. a neutral water loss.
-Since the ionization happens after liquid chromatography, different adducts for an analyte have similar retention times.
+Since the ionization happens after liquid chromatography, different adducts for an analyte have almost identical retention times.
 
 .. image:: img/adduct_detection.png
 
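The adduct paragraph above can be made concrete with a small calculation. The following is a minimal sketch in plain Python (no pyOpenMS calls); the mass constants are rounded reference values, and the helper ``adduct_mz`` and the ``ADDUCTS`` table are hypothetical names introduced only for this illustration.

```python
# Monoisotopic masses in Da, rounded reference values
# (assumptions for this sketch, not taken from pyOpenMS).
PROTON = 1.007276
ELECTRON = 0.000549
SODIUM = 22.989770
WATER = 18.010565

# adduct label -> (mass shift in Da, signed charge)
ADDUCTS = {
    "[M+H]+": (PROTON, 1),
    "[M+2H]2+": (2 * PROTON, 2),
    "[M+Na]+": (SODIUM - ELECTRON, 1),   # Na+ is a sodium atom minus one electron
    "[M+H-H2O]+": (PROTON - WATER, 1),   # protonation plus neutral water loss
    "[M-H]-": (-PROTON, -1),             # negative mode: loss of a proton
}


def adduct_mz(neutral_mass, label):
    """Return the m/z at which a neutral molecule is observed for one adduct."""
    shift, charge = ADDUCTS[label]
    return (neutral_mass + shift) / abs(charge)


glucose = 180.063388  # monoisotopic mass of C6H12O6 in Da
for label in ADDUCTS:
    print(f"{label:12s} {adduct_mz(glucose, label):.4f}")
```

All of these signals belong to the same analyte and therefore elute at almost the same retention time, which is the property adduct detection relies on.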
diff --git a/docs/source/user_guide/charge_isotope_deconvolution.rst b/docs/source/user_guide/charge_isotope_deconvolution.rst
@@ -3,65 +3,99 @@ Charge and Isotope Deconvolution
 
 A single mass spectrum contains measurements of one or more analytes and the
 m/z values recorded for these analytes. Most analytes produce multiple signals
-in the mass spectrometer, due to the natural abundance of carbon :math:`13` (naturally
-occurring at ca. :math:`1\%` frequency) and the large amount of carbon atoms in most
-organic molecules, most analytes produce a so-called isotopic pattern with a
-monoisotopic peak (all carbon are :chem:`^{12}C`) and a first isotopic peak (exactly one
-carbon atom is a :chem:`^{13}C`), a second isotopic peak (exactly two atoms are :chem:`^{13}C`) etc.
-Note that also other elements can contribute to the isotope pattern, see the 
-`chemistry section <chemistry.html>`_ for further details.
+in the mass spectrometer, due to the natural abundance of heavy isotopes.
+The most dominant heavy isotope in proteins is carbon :math:`13` (naturally
+occurring at ca. :math:`1.1\%` frequency). Other elements such as hydrogen also have heavy isotopes, but
+they contribute to a much lesser extent, since their heavy isotopes have very low abundance,
+e.g. hydrogen :math:`2` (deuterium) occurs at a frequency of only :math:`0.0156\%`.
+
+All analytes produce a so-called isotopic pattern, consisting of a
+monoisotopic peak and a first isotopic peak (exactly one
+extra neutron in one of the atoms of the molecule), a second isotopic peak (exactly two extra neutrons) etc.
+With higher mass, the monoisotopic peak becomes diminishingly small, up to the point where it is no longer detectable
+(although this is rarely the case with peptides and is more problematic for whole proteins in top-down approaches).
+
+By definition, the monoisotopic peak is the peak which contains only isotopes of the most abundant type.
+For peptides and proteins, the constituent atoms are C, H, N, O, P and S, for which, incidentally, the
+most abundant isotope is also the lightest isotope. Hence, for peptides and proteins, the monoisotopic peak is always
+the lightest peak in an isotopic distribution.
+
+See the `chemistry section <chemistry.html>`_ for further details on isotope abundances and how to compute isotope patterns.
 
 In addition, each analyte may appear in more than one charge state and adduct
-state, a singly charge analyte :chem:`[M +H]+` may be accompanied by a doubly
+state, a singly charged analyte :chem:`[M +H]+` may be accompanied by a doubly
 charged analyte :chem:`[M +2H]++` or a sodium adduct :chem:`[M +Na]+`. In the case of a
-multiply charged peptide, the isotopic traces are spaced by ``PROTON_MASS /
+multiply charged peptide, the isotopic traces are spaced by approximately ``NEUTRON_MASS /
 charge_state`` which is often close to :math:`0.5\ m/z` for doubly charged analytes,
 :math:`0.33\ m/z` for triply charged analytes etc. Note: tryptic peptides often appear
-at least doubly charged, while small molecules often carry a single charge but
-can have adducts other than hydrogen.
+either singly charged (when ionized with :term:`MALDI`) or doubly charged (when ionized with :term:`ESI`).
+Higher charges are also possible, but are usually associated with incomplete tryptic digestion (missed cleavages).
+Small molecules in metabolomics often carry a single charge but can have adducts other than hydrogen.
 
 Single Peak Example
 *********************************
 
+Let's compute the isotope distribution of the peptide ``DFPIANGER`` using the classes :py:class:`~.AASequence` and 
+:py:class:`~.EmpiricalFormula`. Then we use the :py:class:`~.Deisotoper` to find the monoisotopic peak:
+
 .. code-block:: python
     :linenos:
 
     import pyopenms as oms
 
-    charge = 2
     seq = oms.AASequence.fromString("DFPIANGER")
+    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
+
+    ## get isotopic distribution for two additional hydrogens (which carry the charge)
+    charge = 2
     seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
     isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))
-    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
 
     # Append isotopic distribution to spectrum
     s = oms.MSSpectrum()
-    for iso in isotopes.getContainer():
-        iso.setMZ(iso.getMZ() / charge)
+    for iso in isotopes.getContainer():  # the container holds masses, not m/z ...
+        iso.setMZ(iso.getMZ() / charge)  # ... even though the accessor is called getMZ()
         s.push_back(iso)
         print("Isotope", iso.getMZ(), ":", iso.getIntensity())
 
+    # deisotope with 10 ppm mass tolerance
     oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True)
 
     for p in s:
-        print(p.getMZ(), p.getIntensity())
+        print("Mono peaks:", p.getMZ(), p.getIntensity())
+
+which will print:
+
+
+.. code-block:: output
+    :linenos:
+    
+    [M+H]+ weight: 1018.495240604071
+    Isotope 509.75180710055 : 0.5680345296859741
+    Isotope 510.25348451945 : 0.3053518533706665
+    Isotope 510.75516193835 : 0.09806874394416809
+    Isotope 511.25683935725004 : 0.023309258744120598
+    Isotope 511.75851677615003 : 0.0044969217851758
+    Isotope 512.2601941950501 : 0.000738693168386817
+    Mono peaks: 1018.496337734329 0.5680345296859741
 
 
 Note that the algorithm presented here has some heuristics built into it, such
 as assuming that the isotopic peaks will decrease after the first isotopic
-peak. This heuristic can be tuned by changing the parameter
-``use_decreasing_model`` and ``start_intensity_check``. In this case, the
-second isotopic peak  is the highest in intensity and the
-``start_intensity_check`` parameter needs to be set to 3. 
+peak. This heuristic can be disabled by setting the parameter
+``use_decreasing_model`` to ``False``.
+For more fine-grained control, keep ``use_decreasing_model = True`` and adjust ``start_intensity_check`` (see :py:class:`~.Deisotoper` --> C++ documentation).
+Let's look at a very heavy peptide whose isotopic distribution is dominated by the first and second isotopic peaks.
 
 .. code-block:: python
     :linenos:
 
-    charge = 4
     seq = oms.AASequence.fromString("DFPIANGERDFPIANGERDFPIANGERDFPIANGER")
+    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
+
+    charge = 4
     seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
     isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8))
-    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
 
     # Append isotopic distribution to spectrum
     s = oms.MSSpectrum()
@@ -73,9 +107,9 @@ second isotopic peak  is the highest in intensity and the
     min_charge = 1
     min_isotopes = 2
     max_isotopes = 10
-    use_decreasing_model = True
-    start_intensity_check = 3
-    oms.Deisotoper.deisotopeAndSingleCharge(
+    use_decreasing_model = True   # expect intensities to decrease along the isotope pattern ...
+    start_intensity_check = 3     # ... but only from the third peak onwards (the first two may be lower than the third)
+    oms.Deisotoper.deisotopeAndSingleCharge(  # the variant with all parameters exposed
         s,
         10,
         True,
@@ -92,8 +126,23 @@ second isotopic peak  is the highest in intensity and the
         False,
     )
     for p in s:
-        print(p.getMZ(), p.getIntensity())
+        print("Mono peaks:", p.getMZ(), p.getIntensity())
 
+.. code-block:: output
+    :linenos:
+        
+    [M+H]+ weight: 4016.927437824572
+    Isotope 1004.9878653713499 : 0.10543462634086609
+    Isotope 1005.2387040808 : 0.22646738588809967
+    Isotope 1005.48954279025 : 0.25444599986076355
+    Isotope 1005.7403814996999 : 0.19825772941112518
+    Isotope 1005.9912202091499 : 0.12000058591365814
+    Isotope 1006.2420589185999 : 0.05997777357697487
+    Isotope 1006.49289762805 : 0.025713207200169563
+    Isotope 1006.7437363375 : 0.009702674113214016
+    Mono peaks: 4016.9296320850867 0.10543462634086609
+
+This successfully recovers the monoisotopic peak, even though it is not the most abundant peak.
 
 Full Spectral De-Isotoping
 **************************
@@ -147,7 +196,15 @@ state:
         if p.getIntensity() > 0.25 * maxvalue:
             print(p.getMZ(), p.getIntensity())
 
-
+    unpicked_peak_data = e[214].get_peaks()
+    plt.bar(unpicked_peak_data[0], unpicked_peak_data[1], snap=False)
+    plt.show()
+    
+    picked_peak_data = s.get_peaks()
+    plt.bar(picked_peak_data[0], picked_peak_data[1], snap=False)
+    plt.show()
+    
+    
 which produces the following output
 
 .. code-block:: output
@@ -159,7 +216,7 @@ which produces the following output
   974.4589691256419 3215808.75
 
 As we can see, the algorithm has reduced :math:`140` peaks to :math:`41` deisotoped peaks. It
-also has identified a molecule at :math:`974.45\ m/z` as the most intense peak in the
+also has identified a molecule with a singly charged mass of :math:`974.45\ Da` as the most intense peak in the
 data (base peak).
 
 Visualization
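The relation between isotope spacing, charge state and the reported "Mono peaks" mass in the hunks above can be checked by hand. The sketch below is plain Python and deliberately does not call pyOpenMS; the two constants are standard reference values, the helper names ``infer_charge`` and ``singly_charged_mass`` are hypothetical, and the conversion to a singly protonated mass is an assumption that happens to reproduce the output shown in the diff.

```python
# Reference values in Da (assumptions for this sketch).
C13_C12_DIFF = 1.0033548378  # mass difference between 13C and 12C
PROTON = 1.007276466


def infer_charge(mz_mono, mz_next):
    """Estimate the charge state from the spacing of two adjacent isotopic peaks."""
    return round(C13_C12_DIFF / (mz_next - mz_mono))


def singly_charged_mass(mz, charge):
    """Convert an observed m/z at the given charge to the [M+H]+ equivalent."""
    return mz * charge - (charge - 1) * PROTON


# First two isotopic peaks of the two examples, copied from the printed output.
examples = {
    "DFPIANGER": (509.75180710055, 510.25348451945),
    "4x DFPIANGER": (1004.9878653713499, 1005.2387040808),
}

for name, (mono, first_iso) in examples.items():
    z = infer_charge(mono, first_iso)
    print(name, "charge:", z, "singly charged mass:", singly_charged_mass(mono, z))
```

The spacing of about 0.5 m/z gives charge 2 and the spacing of about 0.25 m/z gives charge 4; in both cases the converted mass agrees with the "Mono peaks" line to well below 1 ppm.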
diff --git a/docs/source/user_guide/chemistry.rst b/docs/source/user_guide/chemistry.rst
@@ -5,6 +5,8 @@ OpenMS has representations for various chemical concepts including molecular
 formulas, isotopes, ribonucleotide and amino acid sequences as well as common
 modifications of amino acids or ribonucleotides.
 
+For an introduction to isotope patterns, see `Charge and Isotope Deconvolution <charge_isotope_deconvolution.html>`_.
+
 Constants
 ---------
 
diff --git a/docs/source/user_guide/feature_detection.rst b/docs/source/user_guide/feature_detection.rst
@@ -3,37 +3,38 @@ Feature Detection
 
 One very common task in mass spectrometry is the detection of 2-dimensional
 patterns in m/z and time (RT) dimension from a series of :term:`MS1` scans. These
-patterns are called ``Features`` and they exhibit a chromatographic elution
+patterns are each called a :term:`Feature` and exhibit a chromatographic elution
 profile in the time dimension and an isotopic pattern in the m/z dimension (see
-`previous section <deisotoping.html>`_ for the 1-dimensional problem).
+`previous section <charge_isotope_deconvolution.html>`_ for the 1-dimensional problem).
+
 OpenMS has multiple tools that can identify these features in 2-dimensional
-data, these tools are called :py:class:`~.FeatureFinder`.  Currently the following
+data; these tools are called ``FeatureFinder``. Currently the following
 FeatureFinders are available in pyOpenMS:
 
-  - :py:class:`~.FeatureFinderMultiplexAlgorithm` (e.g., :term:`SILAC`, Dimethyl labeling, (and label-free), identification free feature detection of peptides)
   - :py:class:`~.FeatureFinderAlgorithmPicked` (Label-free, identification free feature detection of peptides)
   - :py:class:`~.FeatureFinderIdentificationAlgorithm` (Label-free identification-guided feature detection of peptides)
+  - :py:class:`~.FeatureFinderMultiplexAlgorithm` (labeled, e.g. :term:`SILAC` or dimethyl labeling, as well as label-free; identification-free feature detection of peptides)
   - :py:class:`~.FeatureFindingMetabo` (Label-free, identification free feature detection of metabolites)
   - :py:class:`~.FeatureFinderAlgorithmMetaboIdent` (Label-free, identification guided feature detection of metabolites)
 
-All of the algorithms above are for proteomics data with the exception of :py:class:`~.FeatureFindingMetabo` and :py:class:`~.FeatureFinderMetaboIdentCompound` for metabolomics data and small molecules in general.
+All of the algorithms above are for proteomics data with the exception of :py:class:`~.FeatureFindingMetabo` and :py:class:`~.FeatureFinderAlgorithmMetaboIdent` for metabolomics data and small molecules in general.
 
 Proteomics
 ******************************
 
-Two of the most commonly used feature finders for proteomics in OpenMS are the :py:class:`~.FeatureFinder` and :py:class:`~.FeatureFinderIdentificationAlgorithm` which both work on (high
-resolution) centroided data. We can use the following code to find features in MS data:
+Three of the most commonly used feature finders for proteomics in OpenMS are :py:class:`~.FeatureFinderAlgorithmPicked`, :py:class:`~.FeatureFinderMultiplexAlgorithm` and :py:class:`~.FeatureFinderIdentificationAlgorithm`, which all work on (high
+resolution) centroided data (``FeatureFinderMultiplexAlgorithm`` can also work on profile data). We can use the following code to find features in MS data:
 
 .. code-block:: python
 
   from urllib.request import urlretrieve
+  import pyopenms as oms
 
   gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
   urlretrieve(
       gh + "/src/data/FeatureFinderCentroided_1_input.mzML", "feature_test.mzML"
   )
 
-  import pyopenms as oms
 
   # Prepare data loading (save memory by only
   # loading MS1 spectra into memory)
@@ -47,32 +48,30 @@ resolution) centroided data. We can use the following code to find features in M
   fh.load("feature_test.mzML", input_map)
   input_map.updateRanges()
 
-  ff = oms.FeatureFinder()
-  ff.setLogType(oms.LogType.CMD)
+  ff = oms.FeatureFinderAlgorithmPicked()
 
   # Run the feature finder
-  name = "centroided"
-  features = oms.FeatureMap()
-  seeds = oms.FeatureMap()
-  params = oms.FeatureFinder().getParameters(name)
-  ff.run(name, input_map, features, params, seeds)
+  out_features = oms.FeatureMap()  # our result
+  seeds = oms.FeatureMap()  # optional: seeds where feature finding should take place (not used here)
+  params = ff.getParameters()  # we do not modify the parameters for now
+  ff.run(input_map, out_features, params, seeds)
 
-  features.setUniqueIds()
+  out_features.setUniqueIds()
   fh = oms.FeatureXMLFile()
-  fh.store("output.featureXML", features)
-  print("Found", features.size(), "features")
+  fh.store("output.featureXML", out_features)
+  print("Found", out_features.size(), "features")
 
 With a few lines of Python, we are able to run powerful algorithms available in
 OpenMS. The resulting data is held in memory (a :py:class:`~.FeatureMap` object) and can be
-inspected directly using the ``help(features)`` comment. It reveals that the
+inspected directly using the ``help(out_features)`` command. It reveals that the
 object supports iteration (through the ``__iter__`` function) as well as direct
 access (through the ``__getitem__`` function). This means we write code that uses direct access and iteration in
 Python as follows:
 
 .. code-block:: python
 
-  f0 = features[0]
-  for f in features:
+  f0 = out_features[0]
+  for f in out_features:
       print(f.getRT(), f.getMZ())
 
 
@@ -82,7 +81,7 @@ inspecting ``help(f)`` or by consulting the manual.
 
 Note: the output file that we have written (``output.featureXML``) is an
 OpenMS-internal XML format for storing features. You can learn more about file
-formats in the `Reading MS data formats <other_file_handling.html>`_ section.
+formats in the `Reading MS data formats <other_ms_data_formats.html>`_ section.
 
 Metabolomics - Untargeted
 *************************
@@ -239,16 +238,18 @@ Now we can use the following code to detect features with :py:class:`~.FeatureFi
   # save FeatureMap to file
   oms.FeatureXMLFile().store("detected_features.featureXML", fm)
 
-Note: the output file that we have written (``output.featureXML``) is an
+Note: the output file that we have written (``detected_features.featureXML``) is an
 OpenMS-internal XML format for storing features. You can learn more about file
-formats in the `Reading MS data formats <other_file_handling.html>`_ section.
+formats in the `Reading MS data formats <other_ms_data_formats.html>`_ section.
 
 We can get a quick overview on the detected features by plotting them using the following function:
 
 .. code-block:: python
     :linenos:
 
     import matplotlib.pyplot as plt
+    import matplotlib.colors as mcolors
+    import itertools
 
     def plotDetectedFeatures3D(path_to_featureXML):
       fm = oms.FeatureMap()
@@ -258,8 +259,9 @@ We can get a quick overview on the detected features by plotting them using the
       fig = plt.figure()
       ax = fig.add_subplot(111, projection="3d")
 
-      for feature in fm:
-          color = next(ax._get_lines.prop_cycler)["color"]
+      cycled_colors = itertools.cycle(['red', 'green', 'blue', 'orange', 'purple', 'yellow', 'cyan', 'magenta', 'black', 'gray'])
+      
+      for feature, color in zip(fm, cycled_colors):
           # chromatogram data is stored in the subordinates of the feature
           for i, sub in enumerate(feature.getSubordinates()):
               retention_times = [
@@ -268,7 +270,7 @@ We can get a quick overview on the detected features by plotting them using the
               intensities = [
                   int(y[1]) for y in sub.getConvexHulls()[0].getHullPoints()
               ]
-              mz = sub.getMetaValue("MZ")
+              mz = sub.getMZ()
               ax.plot(retention_times, intensities, zs=mz, zdir="x", color=color)
               if i == 0:
                   ax.text(
@@ -284,4 +286,6 @@ We can get a quick overview on the detected features by plotting them using the
       ax.set_zlabel("intensity (cps)")
       plt.show()
 
+    plotDetectedFeatures3D("detected_features.featureXML")
+
 .. image:: img/ffmid_graph.png
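The plotting hunk above replaces the private Matplotlib attribute ``ax._get_lines.prop_cycler`` with ``zip`` over ``itertools.cycle``. The following minimal sketch shows that pairing in isolation; the feature names are placeholder strings rather than real ``Feature`` objects, and the three-colour palette is shortened for illustration.

```python
import itertools

# Placeholder stand-ins for the features of a FeatureMap (hypothetical data).
features = ["feature_0", "feature_1", "feature_2", "feature_3", "feature_4"]

# A short palette; cycle() repeats it endlessly.
palette = ["red", "green", "blue"]

# zip() stops at the shorter input, i.e. after the last feature,
# so the infinite colour iterator never causes an endless loop.
pairs = list(zip(features, itertools.cycle(palette)))

for feature, color in pairs:
    print(feature, "->", color)
```

With more features than colours, the palette simply wraps around, so every feature receives a colour without relying on Matplotlib internals.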
diff --git a/docs/source/user_guide/map_alignment.rst b/docs/source/user_guide/map_alignment.rst
@@ -1,7 +1,7 @@
 Map Alignment
 ===============
 
-The pyOpenMS map alignment algorithms transform different maps (peak maps, :term:`feature maps`) to a common retention time axis.
+The pyOpenMS map alignment algorithms transform different maps (:term:`peak maps`, :term:`feature maps`) to a common retention time axis.
 
 .. image:: img/map_alignment_illustration.png
 
@@ -12,7 +12,6 @@ Different map alignment algorithms are available in pyOpenMS:
 
 - :py:class:`~.MapAlignmentAlgorithmPoseClustering`
 - :py:class:`~.MapAlignmentAlgorithmIdentification`
-- :py:class:`~.MapAlignmentAlgorithmSpectrumAlignment`
 - :py:class:`~.MapAlignmentAlgorithmKD`
 - :py:class:`~.MapAlignmentTransformer`
 
diff --git a/docs/source/user_guide/parameter_handling.rst b/docs/source/user_guide/parameter_handling.rst
diff --git a/docs/source/user_guide/spectrum_merging.rst b/docs/source/user_guide/spectrum_merging.rst