Skip to content

Tutorial extended: Map Alignment, Feature Linking and IDMapper #472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 29, 2025
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
88 changes: 88 additions & 0 deletions docs/source/user_guide/PSM_to_features.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
Feature Map Annotation with Peptide Identifications
===================================================

A :term:`feature map` is usually obtained from running a :term:`feature finder`, e.g. :py:class:`~.FeatureFinderAlgorithmPicked` (see `Feature Detection <feature_detection.html>`_).
A logical next step is to compare features across runs using `Map Alignment <map_alignment.html>`_ and `Feature Linking <feature_linking.html>`_.
However, most map aligners in pyOpenMS require features which are annotated with PSMs (see `Identication Data <identification_data.html>`_).
To link features to their respective PSMs (as obtained from a search engine, such as Comet), we can use the :py:class:`~.IDMapper`.


Step 0: Download Example data
-----------------------------

.. code-block:: python
:linenos:

import pyopenms as oms
from urllib.request import urlretrieve

base_url = (
"https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/"
)

feature_file = "BSA1_F1.featureXML"
urlretrieve(base_url + feature_file, feature_file)

idxml_file = "BSA1_F1.idXML"
urlretrieve(base_url + idxml_file, idxml_file)

Step 1: Load the Feature Map
----------------------------

First, load the FeatureMap from a `.featureXML` file:

.. code-block:: python

import pyopenms as oms

feature_map = oms.FeatureMap()
oms.FeatureXMLFile().load(feature_file, feature_map)

Step 2: Load Peptide Identifications
------------------------------------

Next, load the PeptideIdentifications from an `.idXML` file:

.. code-block:: python

peptide_ids = []
protein_ids = []
oms.IdXMLFile().load(idxml_file, protein_ids, peptide_ids)

Step 3: Initialize and Configure `IDMapper`
-------------------------------------------

Now, configure `IDMapper` to apply **retention time (RT) and m/z tolerance settings**:

.. code-block:: python

id_mapper = oms.IDMapper()
params = id_mapper.getParameters()
params.setValue("rt_tolerance", 5.0) # RT tolerance in seconds
params.setValue("mz_tolerance", 10.0) # m/z tolerance in ppm
id_mapper.setParameters(params)

Step 4: Annotate the FeatureMap
-------------------------------

Use the configured `IDMapper` to link peptide IDs to the FeatureMap:

.. code-block:: python

id_mapper.annotate(feature_map, peptide_ids, protein_ids, use_centroid_rt=True, use_centroid_mz=True, spectra=None)

Step 5: Save the Annotated FeatureMap
--------------------------------------

Finally, store the modified FeatureMap back to a file:

.. code-block:: python

oms.FeatureXMLFile().store("BSA1_F1_annotated.featureXML", feature_map)

.. tip::
You can visualize the annotated FeatureMap using OpenMS visualization tools like `TOPPView`.


You have successfully **annotated a FeatureMap** with PeptideIdentifications using `IDMapper`. This allows further downstream analysis in (py)OpenMS workflows.

111 changes: 99 additions & 12 deletions docs/source/user_guide/feature_linking.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,17 +14,17 @@ are available in pyOpenMS:

- :py:class:`~.FeatureGroupingAlgorithmQT`
- :py:class:`~.FeatureGroupingAlgorithmKD`
- :py:class:`~.FeatureGroupingAlgorithm`
- :py:class:`~.FeatureGroupingAlgorithmLabeled`
- :py:class:`~.FeatureGroupingAlgorithmUnlabeled`

We now perform a features linking using the :py:class:`~.FeatureGroupingAlgorithmQT` algorithm.
We now perform feature linking using the :py:class:`~.FeatureGroupingAlgorithmQT` algorithm.

Download Example Data
*********************

.. code-block:: python

:linenos:

import pyopenms as oms
from urllib.request import urlretrieve

Expand All @@ -47,28 +47,115 @@ Download Example Data
oms.FeatureXMLFile().load(feature_file, feature_map)
feature_maps.append(feature_map)

features Linking Algorithm
Feature Linking Algorithm
******************************************

All :py:class:`~.FeatureMap` objects will be combined in a :py:class:`~.ConsensusMap`.
Feature linking is the process which connects (links) features with similar RT and m/z across different runs.
This enables comparisons of peptide signals across runs and is a prerequisite for downstream statistical analysis.
Thus, all features across all :py:class:`~.FeatureMap` objects will be combined into a :py:class:`~.ConsensusMap`.
Here we use :py:class:`~.FeatureGroupingAlgorithmQT` to find corresponding features with similar RT and m/z across feature maps.
If everything goes well, we will have a lot of triplets in our test run (we have three input maps; one feature from each feature map) within each consensus feature.
Realistically, some features will remain unmatched, forming singletons or doublets (consensus features of size 1 or 2). These undersized consensus features can arise due to several factors. In some cases, the signal may be genuinely absent in the dataset, preventing feature detection. Alternatively, the feature-finding algorithm might fail to identify the feature in all available maps, leading to incomplete linkage. Additionally, suboptimal parameters for feature linking, such as excessive retention time (RT) variations across samples, can contribute to mismatches, further reducing the likelihood of forming larger consensus features. The latter can be corrected for using a process called map alignment, which we addressed in a `previous chapter <map_alignment.html>`_.


.. code-block:: python
:linenos:

consensus_map = oms.ConsensusMap()

# populate the actual cmap using FeatureGroupingAlgorithmQT
feature_grouper = oms.FeatureGroupingAlgorithmQT()
# execute feature linking:
feature_grouper.group(feature_maps, consensus_map)

consensus_map = oms.ConsensusMap()
That's it!
Now, we can print some summary information:


.. code-block:: python
:linenos:

# print some info on the consensus map
print(f"Total number of consensus features: {consensus_map.size()}\n\n")

from collections import defaultdict

def compute_feature_size_stats(consensus_map):
size_counts = defaultdict(int) # Default value for missing keys is 0

for cfeature in consensus_map:
size = len(cfeature.getFeatureList())
size_counts[size] += 1 # No need to check if the key exists

return size_counts

stats = compute_feature_size_stats(consensus_map)
## how many consensus features are there which contain features from 1, 2, or all 3 feature maps?
print("ConsensusFeature Size Distribution:", dict(stats), "\n\n")

for (i, cfeature) in enumerate(consensus_map):
if i > 3:
break
print(
f"ConsensusFeature: average int: {cfeature.getIntensity()}, RT: {cfeature.getRT()}, m/z: {cfeature.getMZ()}"
)
# The two features in map 1 and map 2 represent the same analyte at
# slightly different RT and m/z
for fh in cfeature.getFeatureList():
print(f" -- Feature: map#: {fh.getMapIndex()}, int: {fh.getIntensity()}, RT: {fh.getRT()}")


This then prints:

.. code-block:: output

Map 0: Filename = BSA1_F1.featureXML, Size = 256, UniqueID = 15391035140702131759
Map 1: Filename = BSA2_F1.featureXML, Size = 235, UniqueID = 4409371154349730995
Map 2: Filename = BSA3_F1.featureXML, Size = 204, UniqueID = 16851873442542870272
Total number of consensus features: 474


ConsensusFeature Size Distribution: {3: 55, 2: 111, 1: 308}


ConsensusFeature: average int: 674365.6875, RT: 1925.75138611056, m/z: 387.7128198087733
-- Feature: map#: 0, int: 1432460.0, RT: 1961.42395727216
-- Feature: map#: 1, int: 285228.0, RT: 1914.26950841529
-- Feature: map#: 2, int: 305409.0, RT: 1901.56069264423
ConsensusFeature: average int: 1154167.0, RT: 1787.5601793634498, m/z: 662.2911690419279
-- Feature: map#: 0, int: 834075.0, RT: 1836.92005770327
-- Feature: map#: 1, int: 2119760.0, RT: 1748.02632036743
-- Feature: map#: 2, int: 508666.0, RT: 1777.73416001965
ConsensusFeature: average int: 562620.0, RT: 1650.6520418306134, m/z: 779.7355985805051
-- Feature: map#: 0, int: 661665.0, RT: 1662.1136104186
-- Feature: map#: 1, int: 751397.0, RT: 1616.78250841374
-- Feature: map#: 2, int: 274798.0, RT: 1673.0600066595
ConsensusFeature: average int: 23445466.0, RT: 1877.1572820450535, m/z: 738.310987605286
-- Feature: map#: 0, int: 34988800.0, RT: 1901.57762589719
-- Feature: map#: 1, int: 25276700.0, RT: 1849.00759248543
-- Feature: map#: 2, int: 10070900.0, RT: 1880.88662775254

The results here are not ideal, since we only found 55 features which are present in all three feature maps. The remaining consensus features have a missing value for at least one feature map.
We could tweak the parameters used for grouping, but you should make sure that the, for example, acceptable RT deltas are sensible. There is a tradeoff between sensitivity and specificity here.

Finally, we add some meta-data to the consensus map, which allows us to track the input data later on if we were to store the consensus map to disk.

.. code-block:: python
:linenos:

file_descriptions = consensus_map.getColumnHeaders()

# collect information about input maps
# collect information about input maps ...
for i, feature_map in enumerate(feature_maps):
file_description = file_descriptions.get(i, oms.ColumnHeader())
file_description.filename = feature_map.getDataProcessing()[0].getMetaValue(
"parameter: in"
)[:-5]
file_description.filename = feature_files[i] ## filename
file_description.size = feature_map.size()
file_description.unique_id = feature_map.getUniqueId()
file_descriptions[i] = file_description

# ... and store them in the column headers of the cmap
consensus_map.setColumnHeaders(file_descriptions)
feature_grouper.group(feature_maps, consensus_map)

# print the information on underlying feature maps which we just collected
file_descriptions = consensus_map.getColumnHeaders()
for index, header in file_descriptions.items():
print(f"Map {index}: Filename = {header.filename}, Size = {header.size}, UniqueID = {header.unique_id}")
12 changes: 9 additions & 3 deletions docs/source/user_guide/glossary.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ A glossary of common terms used throughout OpenMS documentation.
One consensus map usually contains many consensus features. OpenMS represents a consensus map using
the class `ConsensusMap <https://abibuilder.cs.uni-tuebingen.de/archive/openms/Documentation/nightly/html/classOpenMS_1_1ConsensusMap.html>`_.

de novo
de novo peptide sequencing
A peptide's amino acid sequence is inferred directly from the precursor peptide mass and tandem
mass spectrum (:term:`MS2` or :term:`MS3`) fragment ions, without comparison to a reference proteome.
Expand All @@ -45,6 +46,10 @@ A glossary of common terms used throughout OpenMS documentation.
features
A feature, in the OpenMS terminology, subsumes all m/z signals originating from a single compound at a
certain charge state. This includes the isotope pattern and usually spans multiple spectra in retention time (the elution profile).

feature finder
FeatureFinder
A FeatureFinder in pyOpenMS creates a :term:`feature map` using the MS1 spectra from a :term:`peak map` (e.g. from an mzML file) as input.

feature maps
feature map
Expand Down Expand Up @@ -154,9 +159,10 @@ A glossary of common terms used throughout OpenMS documentation.

peptide-spectrum match
PSM
A method used in proteomics to identify proteins from a complex mixture. Involves comparing the
mass spectra of peptide fragments generated from a protein sample with a database of predicted
spectra, in order to identify the protein that produced the observed peptides.
PSMs
A peptide-spectrum match associates a peptide sequence (possibly including modifications) to an MS/MS spectrum.
This usually involves comparing the mass spectra of peptide fragments generated from a digested protein sample with a database of predicted
spectra. Alternatively, this can be done using :term:`de novo` techniques (without a database, just using observed spectra).

PepNovo
PepNovo is a de :term:`de novo peptide sequencing` algorithm for :term:`MS2` spectra.
Expand Down
Binary file removed docs/source/user_guide/img/map_alignment.png
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions docs/source/user_guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ headings and structure.
spectrum_merging
charge_isotope_deconvolution
feature_detection
PSM_to_features
map_alignment
adduct_detection
feature_linking
Expand Down
Loading
Loading