OpenMS · cbielow · Apr 29, 2025 · Apr 11, 2025 · Apr 11, 2025 · Apr 29, 2025
diff --git a/docs/source/user_guide/PSM_to_features.rst b/docs/source/user_guide/PSM_to_features.rst
@@ -0,0 +1,88 @@
+Feature Map Annotation with Peptide Identifications
+===================================================
+
+A :term:`feature map` is usually obtained from running a :term:`feature finder`, e.g. :py:class:`~.FeatureFinderAlgorithmPicked` (see `Feature Detection <feature_detection.html>`_).
+A logical next step is to compare features across runs using `Map Alignment <map_alignment.html>`_ and `Feature Linking <feature_linking.html>`_.
+However, most map aligners in pyOpenMS require features which are annotated with PSMs (see `Identication Data <identification_data.html>`_).
+To link features to their respective PSMs (as obtained from a search engine, such as Comet), we can use the :py:class:`~.IDMapper`.
+
+
+Step 0: Download Example data
+-----------------------------
+
+.. code-block:: python
+    :linenos:
+
+    import pyopenms as oms
+    from urllib.request import urlretrieve
+
+    base_url = (
+        "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/"
+    )
+
+    feature_file = "BSA1_F1.featureXML"
+    urlretrieve(base_url + feature_file, feature_file)
+
+    idxml_file = "BSA1_F1.idXML"
+    urlretrieve(base_url + idxml_file, idxml_file)
+
+Step 1: Load the Feature Map
+----------------------------
+
+First, load the FeatureMap from a `.featureXML` file:
+
+.. code-block:: python
+
+   import pyopenms as oms
+
+   feature_map = oms.FeatureMap()
+   oms.FeatureXMLFile().load(feature_file, feature_map)
+
+Step 2: Load Peptide Identifications
+------------------------------------
+
+Next, load the PeptideIdentifications from an `.idXML` file:
+
+.. code-block:: python
+
+   peptide_ids = []
+   protein_ids = []
+   oms.IdXMLFile().load(idxml_file, protein_ids, peptide_ids)
+
+Step 3: Initialize and Configure `IDMapper`
+-------------------------------------------
+
+Now, configure `IDMapper` to apply **retention time (RT) and m/z tolerance settings**:
+
+.. code-block:: python
+
+   id_mapper = oms.IDMapper()
+   params = id_mapper.getParameters()
+   params.setValue("rt_tolerance", 5.0)  # RT tolerance in seconds
+   params.setValue("mz_tolerance", 10.0)  # m/z tolerance in ppm
+   id_mapper.setParameters(params)
+
+Step 4: Annotate the FeatureMap
+-------------------------------
+
+Use the configured `IDMapper` to link peptide IDs to the FeatureMap:
+
+.. code-block:: python
+
+   id_mapper.annotate(feature_map, peptide_ids, protein_ids, use_centroid_rt=True, use_centroid_mz=True, spectra=None)
+
+Step 5: Save the Annotated FeatureMap
+--------------------------------------
+
+Finally, store the modified FeatureMap back to a file:
+
+.. code-block:: python
+
+   oms.FeatureXMLFile().store("BSA1_F1_annotated.featureXML", feature_map)
+
+.. tip::
+   You can visualize the annotated FeatureMap using OpenMS visualization tools like `TOPPView`.
+
+
+You have successfully **annotated a FeatureMap** with PeptideIdentifications using `IDMapper`. This allows further downstream analysis in (py)OpenMS workflows.
+
diff --git a/docs/source/user_guide/feature_linking.rst b/docs/source/user_guide/feature_linking.rst
@@ -14,17 +14,17 @@ are available in pyOpenMS:
 
 - :py:class:`~.FeatureGroupingAlgorithmQT`
 - :py:class:`~.FeatureGroupingAlgorithmKD`
-- :py:class:`~.FeatureGroupingAlgorithm`
 - :py:class:`~.FeatureGroupingAlgorithmLabeled`
 - :py:class:`~.FeatureGroupingAlgorithmUnlabeled`
 
-We now perform a features linking using the :py:class:`~.FeatureGroupingAlgorithmQT` algorithm.
+We now perform feature linking using the :py:class:`~.FeatureGroupingAlgorithmQT` algorithm.
 
 Download Example Data
 *********************
 
 .. code-block:: python
-
+    :linenos:
+
     import pyopenms as oms
     from urllib.request import urlretrieve
 
@@ -47,28 +47,115 @@ Download Example Data
         oms.FeatureXMLFile().load(feature_file, feature_map)
         feature_maps.append(feature_map)
 
-features Linking Algorithm
+Feature Linking Algorithm
 ******************************************
 
-All :py:class:`~.FeatureMap` objects will be combined in a :py:class:`~.ConsensusMap`.
+Feature linking is the process which connects (links) features with similar RT and m/z across different runs. 
+This enables comparisons of peptide signals across runs and is a prerequisite for downstream statistical analysis.
+Thus, all features across all :py:class:`~.FeatureMap` objects will be combined into a :py:class:`~.ConsensusMap`.
+Here we use :py:class:`~.FeatureGroupingAlgorithmQT` to find corresponding features with similar RT and m/z across feature maps.
+If everything goes well, we will have a lot of triplets in our test run (we have three input maps; one feature from each feature map) within each consensus feature.
+Realistically, some features will remain unmatched, forming singletons or doublets (consensus features of size 1 or 2). These undersized consensus features can arise due to several factors. In some cases, the signal may be genuinely absent in the dataset, preventing feature detection. Alternatively, the feature-finding algorithm might fail to identify the feature in all available maps, leading to incomplete linkage. Additionally, suboptimal parameters for feature linking, such as excessive retention time (RT) variations across samples, can contribute to mismatches, further reducing the likelihood of forming larger consensus features. The latter can be corrected for using a process called map alignment, which we addressed in a `previous chapter <map_alignment.html>`_.
+
 
 .. code-block:: python
+    :linenos:
+
+    consensus_map = oms.ConsensusMap()
 
+    # populate the actual cmap using FeatureGroupingAlgorithmQT
     feature_grouper = oms.FeatureGroupingAlgorithmQT()
+    # execute feature linking:
+    feature_grouper.group(feature_maps, consensus_map)
 
-    consensus_map = oms.ConsensusMap()
+That's it!
+Now, we can print some summary information:
+
+
+.. code-block:: python
+    :linenos:
+
+    # print some info on the consensus map
+    print(f"Total number of consensus features: {consensus_map.size()}\n\n")
+
+    from collections import defaultdict
+
+    def compute_feature_size_stats(consensus_map):
+        size_counts = defaultdict(int)  # Default value for missing keys is 0
+
+        for cfeature in consensus_map:
+            size = len(cfeature.getFeatureList())
+            size_counts[size] += 1  # No need to check if the key exists
+
+        return size_counts
+
+    stats = compute_feature_size_stats(consensus_map)
+    ## how many consensus features are there which contain features from 1, 2, or all 3 feature maps?
+    print("ConsensusFeature Size Distribution:", dict(stats), "\n\n")
+
+    for (i, cfeature) in enumerate(consensus_map):
+        if i > 3:
+            break
+        print(
+            f"ConsensusFeature: average int: {cfeature.getIntensity()}, RT: {cfeature.getRT()}, m/z: {cfeature.getMZ()}"
+        )
+        # The two features in map 1 and map 2 represent the same analyte at
+        # slightly different RT and m/z
+        for fh in cfeature.getFeatureList():
+            print(f" -- Feature: map#: {fh.getMapIndex()}, int: {fh.getIntensity()}, RT: {fh.getRT()}")
+
+
+This then prints:
+
+.. code-block:: output
+
+    Map 0: Filename = BSA1_F1.featureXML, Size = 256, UniqueID = 15391035140702131759
+    Map 1: Filename = BSA2_F1.featureXML, Size = 235, UniqueID = 4409371154349730995
+    Map 2: Filename = BSA3_F1.featureXML, Size = 204, UniqueID = 16851873442542870272
+    Total number of consensus features: 474
+
+
+    ConsensusFeature Size Distribution: {3: 55, 2: 111, 1: 308} 
+
+
+    ConsensusFeature: average int: 674365.6875, RT: 1925.75138611056, m/z: 387.7128198087733
+     -- Feature: map#: 0, int: 1432460.0, RT: 1961.42395727216
+     -- Feature: map#: 1, int: 285228.0, RT: 1914.26950841529
+     -- Feature: map#: 2, int: 305409.0, RT: 1901.56069264423
+    ConsensusFeature: average int: 1154167.0, RT: 1787.5601793634498, m/z: 662.2911690419279
+     -- Feature: map#: 0, int: 834075.0, RT: 1836.92005770327
+     -- Feature: map#: 1, int: 2119760.0, RT: 1748.02632036743
+     -- Feature: map#: 2, int: 508666.0, RT: 1777.73416001965
+    ConsensusFeature: average int: 562620.0, RT: 1650.6520418306134, m/z: 779.7355985805051
+     -- Feature: map#: 0, int: 661665.0, RT: 1662.1136104186
+     -- Feature: map#: 1, int: 751397.0, RT: 1616.78250841374
+     -- Feature: map#: 2, int: 274798.0, RT: 1673.0600066595
+    ConsensusFeature: average int: 23445466.0, RT: 1877.1572820450535, m/z: 738.310987605286
+     -- Feature: map#: 0, int: 34988800.0, RT: 1901.57762589719
+     -- Feature: map#: 1, int: 25276700.0, RT: 1849.00759248543
+     -- Feature: map#: 2, int: 10070900.0, RT: 1880.88662775254
+
+The results here are not ideal, since we only found 55 features which are present in all three feature maps. The remaining consensus features have a missing value for at least one feature map.
+We could tweak the parameters used for grouping, but you should make sure that the, for example, acceptable RT deltas are sensible. There is a tradeoff between sensitivity and specificity here.
+
+Finally, we add some meta-data to the consensus map, which allows us to track the input data later on if we were to store the consensus map to disk.
+
+.. code-block:: python
+    :linenos:
 
     file_descriptions = consensus_map.getColumnHeaders()
 
-    # collect information about input maps
+    # collect information about input maps ...
     for i, feature_map in enumerate(feature_maps):
         file_description = file_descriptions.get(i, oms.ColumnHeader())
-        file_description.filename = feature_map.getDataProcessing()[0].getMetaValue(
-            "parameter: in"
-        )[:-5]
+        file_description.filename = feature_files[i]  ## filename
         file_description.size = feature_map.size()
         file_description.unique_id = feature_map.getUniqueId()
         file_descriptions[i] = file_description
-
+    # ... and store them in the column headers of the cmap
     consensus_map.setColumnHeaders(file_descriptions)
-    feature_grouper.group(feature_maps, consensus_map)
+
+    # print the information on underlying feature maps which we just collected
+    file_descriptions = consensus_map.getColumnHeaders()
+    for index, header in file_descriptions.items():
+        print(f"Map {index}: Filename = {header.filename}, Size = {header.size}, UniqueID = {header.unique_id}")
diff --git a/docs/source/user_guide/glossary.rst b/docs/source/user_guide/glossary.rst
@@ -28,6 +28,7 @@ A glossary of common terms used throughout OpenMS documentation.
         One consensus map usually contains many consensus features. OpenMS represents a consensus map using 
         the class `ConsensusMap <https://abibuilder.cs.uni-tuebingen.de/archive/openms/Documentation/nightly/html/classOpenMS_1_1ConsensusMap.html>`_.
 
+    de novo
     de novo peptide sequencing
         A peptide's amino acid sequence is inferred directly from the precursor peptide mass and tandem
         mass spectrum (:term:`MS2` or :term:`MS3`) fragment ions, without comparison to a reference proteome.
@@ -45,6 +46,10 @@ A glossary of common terms used throughout OpenMS documentation.
     features
         A feature, in the OpenMS terminology, subsumes all m/z signals originating from a single compound at a 
         certain charge state. This includes the isotope pattern and usually spans multiple spectra in retention time (the elution profile).
+
+    feature finder
+    FeatureFinder
+        A FeatureFinder in pyOpenMS creates a :term:`feature map` using the MS1 spectra from a :term:`peak map` (e.g. from an mzML file) as input.
 
     feature maps
     feature map
@@ -154,9 +159,10 @@ A glossary of common terms used throughout OpenMS documentation.
 
     peptide-spectrum match
     PSM
-        A method used in proteomics to identify proteins from a complex mixture. Involves comparing the
-        mass spectra of peptide fragments generated from a protein sample with a database of predicted
-        spectra, in order to identify the protein that produced the observed peptides.
+    PSMs
+        A peptide-spectrum match associates a peptide sequence (possibly including modifications) to an MS/MS spectrum.
+        This usually involves comparing the mass spectra of peptide fragments generated from a digested protein sample with a database of predicted
+        spectra. Alternatively, this can be done using :term:`de novo` techniques (without a database, just using observed spectra).
 
     PepNovo
         PepNovo is a de :term:`de novo peptide sequencing` algorithm for :term:`MS2` spectra.

diff --git a/docs/source/user_guide/img/map_alignment.png b/docs/source/user_guide/img/map_alignment.png
diff --git a/docs/source/user_guide/img/map_alignment_fmaps.png b/docs/source/user_guide/img/map_alignment_fmaps.png
diff --git a/docs/source/user_guide/img/map_alignment_trafos.png b/docs/source/user_guide/img/map_alignment_trafos.png
diff --git a/docs/source/user_guide/index.rst b/docs/source/user_guide/index.rst
@@ -46,6 +46,7 @@ headings and structure.
    spectrum_merging
    charge_isotope_deconvolution
    feature_detection
+   PSM_to_features
    map_alignment
    adduct_detection
    feature_linking