Skip to content

Commit b9bcad8

Browse files
authored
Merge pull request #472 from cbielow/new_tut
Tutorial extended: Map Alignment, Feature Linking and IDMapper
2 parents 3a5fa7b + 5a56a84 commit b9bcad8

13 files changed

+24243
-8551
lines changed
Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
Feature Map Annotation with Peptide Identifications
2+
===================================================
3+
4+
A :term:`feature map` is usually obtained from running a :term:`feature finder`, e.g. :py:class:`~.FeatureFinderAlgorithmPicked` (see `Feature Detection <feature_detection.html>`_).
5+
A logical next step is to compare features across runs using `Map Alignment <map_alignment.html>`_ and `Feature Linking <feature_linking.html>`_.
6+
However, most map aligners in pyOpenMS require features which are annotated with PSMs (see `Identication Data <identification_data.html>`_).
7+
To link features to their respective PSMs (as obtained from a search engine, such as Comet), we can use the :py:class:`~.IDMapper`.
8+
9+
10+
Step 0: Download Example data
11+
-----------------------------
12+
13+
.. code-block:: python
14+
:linenos:
15+
16+
import pyopenms as oms
17+
from urllib.request import urlretrieve
18+
19+
base_url = (
20+
"https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/"
21+
)
22+
23+
feature_file = "BSA1_F1.featureXML"
24+
urlretrieve(base_url + feature_file, feature_file)
25+
26+
idxml_file = "BSA1_F1.idXML"
27+
urlretrieve(base_url + idxml_file, idxml_file)
28+
29+
Step 1: Load the Feature Map
30+
----------------------------
31+
32+
First, load the FeatureMap from a `.featureXML` file:
33+
34+
.. code-block:: python
35+
36+
import pyopenms as oms
37+
38+
feature_map = oms.FeatureMap()
39+
oms.FeatureXMLFile().load(feature_file, feature_map)
40+
41+
Step 2: Load Peptide Identifications
42+
------------------------------------
43+
44+
Next, load the PeptideIdentifications from an `.idXML` file:
45+
46+
.. code-block:: python
47+
48+
peptide_ids = []
49+
protein_ids = []
50+
oms.IdXMLFile().load(idxml_file, protein_ids, peptide_ids)
51+
52+
Step 3: Initialize and Configure `IDMapper`
53+
-------------------------------------------
54+
55+
Now, configure `IDMapper` to apply **retention time (RT) and m/z tolerance settings**:
56+
57+
.. code-block:: python
58+
59+
id_mapper = oms.IDMapper()
60+
params = id_mapper.getParameters()
61+
params.setValue("rt_tolerance", 5.0) # RT tolerance in seconds
62+
params.setValue("mz_tolerance", 10.0) # m/z tolerance in ppm
63+
id_mapper.setParameters(params)
64+
65+
Step 4: Annotate the FeatureMap
66+
-------------------------------
67+
68+
Use the configured `IDMapper` to link peptide IDs to the FeatureMap:
69+
70+
.. code-block:: python
71+
72+
id_mapper.annotate(feature_map, peptide_ids, protein_ids, use_centroid_rt=True, use_centroid_mz=True, spectra=None)
73+
74+
Step 5: Save the Annotated FeatureMap
75+
--------------------------------------
76+
77+
Finally, store the modified FeatureMap back to a file:
78+
79+
.. code-block:: python
80+
81+
oms.FeatureXMLFile().store("BSA1_F1_annotated.featureXML", feature_map)
82+
83+
.. tip::
84+
You can visualize the annotated FeatureMap using OpenMS visualization tools like `TOPPView`.
85+
86+
87+
You have successfully **annotated a FeatureMap** with PeptideIdentifications using `IDMapper`. This allows further downstream analysis in (py)OpenMS workflows.
88+

docs/source/user_guide/feature_linking.rst

Lines changed: 99 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -14,17 +14,17 @@ are available in pyOpenMS:
1414

1515
- :py:class:`~.FeatureGroupingAlgorithmQT`
1616
- :py:class:`~.FeatureGroupingAlgorithmKD`
17-
- :py:class:`~.FeatureGroupingAlgorithm`
1817
- :py:class:`~.FeatureGroupingAlgorithmLabeled`
1918
- :py:class:`~.FeatureGroupingAlgorithmUnlabeled`
2019

21-
We now perform a features linking using the :py:class:`~.FeatureGroupingAlgorithmQT` algorithm.
20+
We now perform feature linking using the :py:class:`~.FeatureGroupingAlgorithmQT` algorithm.
2221

2322
Download Example Data
2423
*********************
2524

2625
.. code-block:: python
27-
26+
:linenos:
27+
2828
import pyopenms as oms
2929
from urllib.request import urlretrieve
3030
@@ -47,28 +47,115 @@ Download Example Data
4747
oms.FeatureXMLFile().load(feature_file, feature_map)
4848
feature_maps.append(feature_map)
4949
50-
features Linking Algorithm
50+
Feature Linking Algorithm
5151
******************************************
5252

53-
All :py:class:`~.FeatureMap` objects will be combined in a :py:class:`~.ConsensusMap`.
53+
Feature linking is the process which connects (links) features with similar RT and m/z across different runs.
54+
This enables comparisons of peptide signals across runs and is a prerequisite for downstream statistical analysis.
55+
Thus, all features across all :py:class:`~.FeatureMap` objects will be combined into a :py:class:`~.ConsensusMap`.
56+
Here we use :py:class:`~.FeatureGroupingAlgorithmQT` to find corresponding features with similar RT and m/z across feature maps.
57+
If everything goes well, we will have a lot of triplets in our test run (we have three input maps; one feature from each feature map) within each consensus feature.
58+
Realistically, some features will remain unmatched, forming singletons or doublets (consensus features of size 1 or 2). These undersized consensus features can arise due to several factors. In some cases, the signal may be genuinely absent in the dataset, preventing feature detection. Alternatively, the feature-finding algorithm might fail to identify the feature in all available maps, leading to incomplete linkage. Additionally, suboptimal parameters for feature linking, such as excessive retention time (RT) variations across samples, can contribute to mismatches, further reducing the likelihood of forming larger consensus features. The latter can be corrected for using a process called map alignment, which we addressed in a `previous chapter <map_alignment.html>`_.
59+
5460

5561
.. code-block:: python
62+
:linenos:
63+
64+
consensus_map = oms.ConsensusMap()
5665
66+
# populate the actual cmap using FeatureGroupingAlgorithmQT
5767
feature_grouper = oms.FeatureGroupingAlgorithmQT()
68+
# execute feature linking:
69+
feature_grouper.group(feature_maps, consensus_map)
5870
59-
consensus_map = oms.ConsensusMap()
71+
That's it!
72+
Now, we can print some summary information:
73+
74+
75+
.. code-block:: python
76+
:linenos:
77+
78+
# print some info on the consensus map
79+
print(f"Total number of consensus features: {consensus_map.size()}\n\n")
80+
81+
from collections import defaultdict
82+
83+
def compute_feature_size_stats(consensus_map):
84+
size_counts = defaultdict(int) # Default value for missing keys is 0
85+
86+
for cfeature in consensus_map:
87+
size = len(cfeature.getFeatureList())
88+
size_counts[size] += 1 # No need to check if the key exists
89+
90+
return size_counts
91+
92+
stats = compute_feature_size_stats(consensus_map)
93+
## how many consensus features are there which contain features from 1, 2, or all 3 feature maps?
94+
print("ConsensusFeature Size Distribution:", dict(stats), "\n\n")
95+
96+
for (i, cfeature) in enumerate(consensus_map):
97+
if i > 3:
98+
break
99+
print(
100+
f"ConsensusFeature: average int: {cfeature.getIntensity()}, RT: {cfeature.getRT()}, m/z: {cfeature.getMZ()}"
101+
)
102+
# The two features in map 1 and map 2 represent the same analyte at
103+
# slightly different RT and m/z
104+
for fh in cfeature.getFeatureList():
105+
print(f" -- Feature: map#: {fh.getMapIndex()}, int: {fh.getIntensity()}, RT: {fh.getRT()}")
106+
107+
108+
This then prints:
109+
110+
.. code-block:: output
111+
112+
Map 0: Filename = BSA1_F1.featureXML, Size = 256, UniqueID = 15391035140702131759
113+
Map 1: Filename = BSA2_F1.featureXML, Size = 235, UniqueID = 4409371154349730995
114+
Map 2: Filename = BSA3_F1.featureXML, Size = 204, UniqueID = 16851873442542870272
115+
Total number of consensus features: 474
116+
117+
118+
ConsensusFeature Size Distribution: {3: 55, 2: 111, 1: 308}
119+
120+
121+
ConsensusFeature: average int: 674365.6875, RT: 1925.75138611056, m/z: 387.7128198087733
122+
-- Feature: map#: 0, int: 1432460.0, RT: 1961.42395727216
123+
-- Feature: map#: 1, int: 285228.0, RT: 1914.26950841529
124+
-- Feature: map#: 2, int: 305409.0, RT: 1901.56069264423
125+
ConsensusFeature: average int: 1154167.0, RT: 1787.5601793634498, m/z: 662.2911690419279
126+
-- Feature: map#: 0, int: 834075.0, RT: 1836.92005770327
127+
-- Feature: map#: 1, int: 2119760.0, RT: 1748.02632036743
128+
-- Feature: map#: 2, int: 508666.0, RT: 1777.73416001965
129+
ConsensusFeature: average int: 562620.0, RT: 1650.6520418306134, m/z: 779.7355985805051
130+
-- Feature: map#: 0, int: 661665.0, RT: 1662.1136104186
131+
-- Feature: map#: 1, int: 751397.0, RT: 1616.78250841374
132+
-- Feature: map#: 2, int: 274798.0, RT: 1673.0600066595
133+
ConsensusFeature: average int: 23445466.0, RT: 1877.1572820450535, m/z: 738.310987605286
134+
-- Feature: map#: 0, int: 34988800.0, RT: 1901.57762589719
135+
-- Feature: map#: 1, int: 25276700.0, RT: 1849.00759248543
136+
-- Feature: map#: 2, int: 10070900.0, RT: 1880.88662775254
137+
138+
The results here are not ideal, since we only found 55 features which are present in all three feature maps. The remaining consensus features have a missing value for at least one feature map.
139+
We could tweak the parameters used for grouping, but you should make sure that the, for example, acceptable RT deltas are sensible. There is a tradeoff between sensitivity and specificity here.
140+
141+
Finally, we add some meta-data to the consensus map, which allows us to track the input data later on if we were to store the consensus map to disk.
142+
143+
.. code-block:: python
144+
:linenos:
60145
61146
file_descriptions = consensus_map.getColumnHeaders()
62147
63-
# collect information about input maps
148+
# collect information about input maps ...
64149
for i, feature_map in enumerate(feature_maps):
65150
file_description = file_descriptions.get(i, oms.ColumnHeader())
66-
file_description.filename = feature_map.getDataProcessing()[0].getMetaValue(
67-
"parameter: in"
68-
)[:-5]
151+
file_description.filename = feature_files[i] ## filename
69152
file_description.size = feature_map.size()
70153
file_description.unique_id = feature_map.getUniqueId()
71154
file_descriptions[i] = file_description
72-
155+
# ... and store them in the column headers of the cmap
73156
consensus_map.setColumnHeaders(file_descriptions)
74-
feature_grouper.group(feature_maps, consensus_map)
157+
158+
# print the information on underlying feature maps which we just collected
159+
file_descriptions = consensus_map.getColumnHeaders()
160+
for index, header in file_descriptions.items():
161+
print(f"Map {index}: Filename = {header.filename}, Size = {header.size}, UniqueID = {header.unique_id}")

docs/source/user_guide/glossary.rst

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ A glossary of common terms used throughout OpenMS documentation.
2828
One consensus map usually contains many consensus features. OpenMS represents a consensus map using
2929
the class `ConsensusMap <https://abibuilder.cs.uni-tuebingen.de/archive/openms/Documentation/nightly/html/classOpenMS_1_1ConsensusMap.html>`_.
3030

31+
de novo
3132
de novo peptide sequencing
3233
A peptide's amino acid sequence is inferred directly from the precursor peptide mass and tandem
3334
mass spectrum (:term:`MS2` or :term:`MS3`) fragment ions, without comparison to a reference proteome.
@@ -45,6 +46,10 @@ A glossary of common terms used throughout OpenMS documentation.
4546
features
4647
A feature, in the OpenMS terminology, subsumes all m/z signals originating from a single compound at a
4748
certain charge state. This includes the isotope pattern and usually spans multiple spectra in retention time (the elution profile).
49+
50+
feature finder
51+
FeatureFinder
52+
A FeatureFinder in pyOpenMS creates a :term:`feature map` using the MS1 spectra from a :term:`peak map` (e.g. from an mzML file) as input.
4853

4954
feature maps
5055
feature map
@@ -154,9 +159,10 @@ A glossary of common terms used throughout OpenMS documentation.
154159

155160
peptide-spectrum match
156161
PSM
157-
A method used in proteomics to identify proteins from a complex mixture. Involves comparing the
158-
mass spectra of peptide fragments generated from a protein sample with a database of predicted
159-
spectra, in order to identify the protein that produced the observed peptides.
162+
PSMs
163+
A peptide-spectrum match associates a peptide sequence (possibly including modifications) to an MS/MS spectrum.
164+
This usually involves comparing the mass spectra of peptide fragments generated from a digested protein sample with a database of predicted
165+
spectra. Alternatively, this can be done using :term:`de novo` techniques (without a database, just using observed spectra).
160166

161167
PepNovo
162168
PepNovo is a de :term:`de novo peptide sequencing` algorithm for :term:`MS2` spectra.
-72.1 KB
Binary file not shown.
Loading
Loading

docs/source/user_guide/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,7 @@ headings and structure.
4646
spectrum_merging
4747
charge_isotope_deconvolution
4848
feature_detection
49+
PSM_to_features
4950
map_alignment
5051
adduct_detection
5152
feature_linking

0 commit comments

Comments
 (0)