
Commit b49bae7

committed
rework 7 tutorials
1 parent bba2d77 commit b49bae7

7 files changed

+167
-71
lines changed

docs/source/user_guide/adduct_detection.rst

+3-2
@@ -3,9 +3,10 @@ Adduct Detection
33

44
In mass spectrometry it is crucial to ionize analytes prior to detection, because they are accelerated and manipulated in electric fields, allowing their separation based on mass-to-charge ratio.
55
This happens by addition of protons in positive mode or loss of protons in negative mode. Other ions present in the buffer solution can ionize the analyte as well, e.g. sodium, potassium or formic acid.
6-
Depending on the size and chemical compsition, multiple adducts can bind leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is much higher.
6+
Depending on the size and chemical composition, multiple adducts can bind, leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low (one or two), whereas in proteomics the number of charges is potentially higher.
7+
78
Furthermore, analytes can lose functional groups during ionization, e.g. a neutral water loss.
8-
Since the ionization happens after liquid chromatography, different adducts for an analyte have similar retention times.
9+
Since the ionization happens after liquid chromatography, different adducts for an analyte have almost identical retention times.
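As a rough illustration of how the adducts described above shift the observed m/z, the following sketch computes m/z values for a few common positive-mode adducts of one analyte. The helper ``adduct_mz``, the ion masses and the glucose example are assumptions added here for illustration; they are not part of pyOpenMS.

```python
# Approximate monoisotopic masses of the charge-carrying ions in Da
# (assumed values for illustration; the electron mass is already subtracted)
PROTON = 1.007276
SODIUM_ION = 22.989218
POTASSIUM_ION = 38.963158


def adduct_mz(neutral_mass, adduct_masses):
    """Return the m/z of an analyte carrying one singly charged ion per list entry."""
    charge = len(adduct_masses)
    return (neutral_mass + sum(adduct_masses)) / charge


glucose = 180.06339  # monoisotopic mass of C6H12O6 in Da

mz_h = adduct_mz(glucose, [PROTON])  # [M+H]+
mz_na = adduct_mz(glucose, [SODIUM_ION])  # [M+Na]+
mz_k = adduct_mz(glucose, [POTASSIUM_ION])  # [M+K]+
mz_2h = adduct_mz(glucose, [PROTON, PROTON])  # [M+2H]2+

for label, mz in [("[M+H]+", mz_h), ("[M+Na]+", mz_na), ("[M+K]+", mz_k), ("[M+2H]2+", mz_2h)]:
    print(label, round(mz, 4))
```

All of these signals belong to the same analyte and therefore share (almost) the same retention time, which is what adduct detection exploits.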
910

1011
.. image:: img/adduct_detection.png
1112

docs/source/user_guide/charge_isotope_deconvolution.rst

+85-28
@@ -3,65 +3,99 @@ Charge and Isotope Deconvolution
33

44
A single mass spectrum contains measurements of one or more analytes and the
55
m/z values recorded for these analytes. Most analytes produce multiple signals
6-
in the mass spectrometer, due to the natural abundance of carbon :math:`13` (naturally
7-
occurring at ca. :math:`1\%` frequency) and the large amount of carbon atoms in most
8-
organic molecules, most analytes produce a so-called isotopic pattern with a
9-
monoisotopic peak (all carbon are :chem:`^{12}C`) and a first isotopic peak (exactly one
10-
carbon atom is a :chem:`^{13}C`), a second isotopic peak (exactly two atoms are :chem:`^{13}C`) etc.
11-
Note that also other elements can contribute to the isotope pattern, see the
12-
`chemistry section <chemistry.html>`_ for further details.
6+
in the mass spectrometer, due to the natural abundance of heavy isotopes.
7+
The most dominant heavy isotope in proteins is carbon :math:`13` (naturally
8+
occurring at ca. :math:`1.1\%` frequency). Other elements such as hydrogen also have heavy isotopes, but
9+
they contribute to a much lesser extent, since their heavy isotopes have very low abundance,
10+
e.g. hydrogen :math:`2` (deuterium) occurs at a frequency of only :math:`0.0156\%`.
11+
12+
All analytes produce a so-called isotopic pattern, consisting of a
13+
monoisotopic peak and a first isotopic peak (exactly one
14+
extra neutron in one of the atoms of the molecule), a second isotopic peak (exactly two extra neutrons) etc.
15+
With higher mass, the monoisotopic peak will become diminishingly small, up to the point where it is not detectable
16+
any more (although this is rarely the case with peptides and more problematic for whole proteins in Top-Down approaches).
17+
18+
By definition, the monoisotopic peak is the peak which contains only isotopes of the most abundant type.
19+
For peptides and proteins, the constituent atoms are C, H, N, O, P and S, where incidentally, the
20+
most abundant isotope is also the lightest isotope. Hence, for peptides and proteins, the monoisotopic peak is always
21+
the lightest peak in an isotopic distribution.
22+
23+
See the `chemistry section <chemistry.html>`_ for further details on isotope abundances and how to compute isotope patterns.
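To illustrate why the monoisotopic peak shrinks with increasing mass, here is a minimal sketch that models only carbon with a binomial distribution. This is a deliberate simplification (all other elements are ignored), and the function name ``carbon_isotope_pattern`` is hypothetical; the :math:`1.1\%` abundance is the approximate value quoted above.

```python
from math import comb

P_C13 = 0.011  # approximate natural abundance of carbon-13 (ca. 1.1 %)


def carbon_isotope_pattern(n_carbons, n_peaks=4):
    """Relative abundance of the first n_peaks isotopic peaks, carbon only.

    Peak k corresponds to exactly k of the n_carbons atoms being carbon-13.
    """
    return [
        comb(n_carbons, k) * P_C13**k * (1 - P_C13) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


for n in (10, 50, 200):
    pattern = carbon_isotope_pattern(n)
    print(n, "carbons:", [round(p, 3) for p in pattern])
```

In this simplified model the monoisotopic peak dominates for small molecules, while for molecules with a few hundred carbon atoms the first isotopic peak is already more intense than the monoisotopic one.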
1324

1425
In addition, each analyte may appear in more than one charge state and adduct
15-
state, a singly charge analyte :chem:`[M +H]+` may be accompanied by a doubly
26+
state, a singly charged analyte :chem:`[M +H]+` may be accompanied by a doubly
1627
charged analyte :chem:`[M +2H]++` or a sodium adduct :chem:`[M +Na]+`. In the case of a
17-
multiply charged peptide, the isotopic traces are spaced by ``PROTON_MASS /
28+
multiply charged peptide, the isotopic traces are spaced by ``NEUTRON_MASS /
1829
charge_state`` which is often close to :math:`0.5\ m/z` for doubly charged analytes,
1930
:math:`0.33\ m/z` for triply charged analytes etc. Note: tryptic peptides often appear
20-
at least doubly charged, while small molecules often carry a single charge but
21-
can have adducts other than hydrogen.
31+
either singly charged (when ionized with :term:`MALDI`), or doubly charged (when ionized with :term:`ESI`).
32+
Higher charges are also possible, but usually connected to incomplete tryptic digestions with missed cleavages.
33+
Small molecules in metabolomics often carry a single charge but can have adducts other than hydrogen.
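The spacing of isotopic traces for different charge states can be sketched in a few lines. The constant ``C13_C12_MASS_DIFF`` and the function ``isotope_spacing`` are assumptions for illustration only; the value used is the mass difference between carbon :math:`13` and carbon :math:`12`, which is close to the neutron mass referred to above.

```python
# Mass difference between carbon-13 and carbon-12 in Da
# (assumed constant for illustration, close to the neutron mass)
C13_C12_MASS_DIFF = 1.0033548


def isotope_spacing(charge):
    """Expected m/z distance between neighbouring isotopic peaks."""
    return C13_C12_MASS_DIFF / charge


spacings = {z: isotope_spacing(z) for z in (1, 2, 3, 4)}
for z, spacing in spacings.items():
    print("charge", z, "-> spacing", round(spacing, 4), "m/z")
```

Measuring the distance between neighbouring isotopic peaks is therefore a simple way to infer the charge state of an analyte.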
2234

2335
Single Peak Example
2436
*********************************
2537

38+
Let's compute the isotope distribution of the peptide ``DFPIANGER`` using the classes :py:class:`~.AASequence` and
39+
:py:class:`~.EmpiricalFormula`. Then we use the :py:class:`~.Deisotoper` to find the monoisotopic peak:
40+
2641
.. code-block:: python
2742
:linenos:
2843
2944
import pyopenms as oms
3045
31-
charge = 2
3246
seq = oms.AASequence.fromString("DFPIANGER")
47+
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
48+
49+
## get isotopic distribution for two additional hydrogens (which carry the charge)
50+
charge = 2
3351
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
3452
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))
35-
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
3653
3754
# Append isotopic distribution to spectrum
3855
s = oms.MSSpectrum()
39-
for iso in isotopes.getContainer():
40-
iso.setMZ(iso.getMZ() / charge)
56+
for iso in isotopes.getContainer(): # the container contains masses, not m/z!
57+
iso.setMZ(iso.getMZ() / charge) # ... even though it's called '.getMZ()'
4158
s.push_back(iso)
4259
print("Isotope", iso.getMZ(), ":", iso.getIntensity())
4360
61+
# deisotope with 10 ppm mass tolerance
4462
oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True)
4563
4664
for p in s:
47-
print(p.getMZ(), p.getIntensity())
65+
print("Mono peaks:", p.getMZ(), p.getIntensity())
66+
67+
which will print:
68+
69+
70+
.. code-block:: output
71+
:linenos:
72+
73+
[M+H]+ weight: 1018.495240604071
74+
Isotope 509.75180710055 : 0.5680345296859741
75+
Isotope 510.25348451945 : 0.3053518533706665
76+
Isotope 510.75516193835 : 0.09806874394416809
77+
Isotope 511.25683935725004 : 0.023309258744120598
78+
Isotope 511.75851677615003 : 0.0044969217851758
79+
Isotope 512.2601941950501 : 0.000738693168386817
80+
Mono peaks: 1018.496337734329 0.5680345296859741
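The reported monoisotopic peak is given as a singly charged value rather than at the observed m/z of the doubly charged ion. The relation can be sketched as follows; the helper ``to_single_charge`` and the proton mass constant are assumptions for illustration and not the actual implementation in :py:class:`~.Deisotoper`, although the result agrees with the value printed above.

```python
PROTON_MASS = 1.007276466879  # Da (assumed constant for illustration)


def to_single_charge(mz, charge):
    """Convert an m/z observed at the given charge to the m/z of the singly charged ion."""
    return mz * charge - (charge - 1) * PROTON_MASS


# first (monoisotopic) peak of the doubly charged isotope pattern shown above
mono = to_single_charge(509.75180710055, 2)
print("singly charged monoisotopic m/z:", mono)
```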
4881
4982
5083
Note that the algorithm presented here has some heuristics built into it, such
5184
as assuming that the isotopic peaks will decrease after the first isotopic
52-
peak. This heuristic can be tuned by changing the parameter
53-
``use_decreasing_model`` and ``start_intensity_check``. In this case, the
54-
second isotopic peak is the highest in intensity and the
55-
``start_intensity_check`` parameter needs to be set to 3.
85+
peak. This heuristic can be switched off by setting the parameter
86+
``use_decreasing_model`` to ``False``.
87+
For more fine-grained control use ``start_intensity_check`` and leave ``use_decreasing_model = True`` (see :py:class:`~.Deisotoper` --> C++ documentation).
88+
Let's look at a very heavy peptide, whose isotopic distribution is dominated by the first and second isotopic peak.
5689

5790
.. code-block:: python
5891
:linenos:
5992
60-
charge = 4
6193
seq = oms.AASequence.fromString("DFPIANGERDFPIANGERDFPIANGERDFPIANGER")
94+
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
95+
96+
charge = 4
6297
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
6398
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8))
64-
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
6599
66100
# Append isotopic distribution to spectrum
67101
s = oms.MSSpectrum()
@@ -73,9 +107,9 @@ second isotopic peak is the highest in intensity and the
73107
min_charge = 1
74108
min_isotopes = 2
75109
max_isotopes = 10
76-
use_decreasing_model = True
77-
start_intensity_check = 3
78-
oms.Deisotoper.deisotopeAndSingleCharge(
110+
use_decreasing_model = True # expect intensities to decrease towards heavier isotopic peaks
111+
start_intensity_check = 3 # start the intensity check at this isotopic peak, since the first peaks of this heavy peptide still increase
112+
oms.Deisotoper.deisotopeAndSingleCharge( ## a function with all parameters exposed
79113
s,
80114
10,
81115
True,
@@ -92,8 +126,23 @@ second isotopic peak is the highest in intensity and the
92126
False,
93127
)
94128
for p in s:
95-
print(p.getMZ(), p.getIntensity())
129+
print("Mono peaks:", p.getMZ(), p.getIntensity())
96130
131+
.. code-block:: output
132+
:linenos:
133+
134+
[M+H]+ weight: 4016.927437824572
135+
Isotope 1004.9878653713499 : 0.10543462634086609
136+
Isotope 1005.2387040808 : 0.22646738588809967
137+
Isotope 1005.48954279025 : 0.25444599986076355
138+
Isotope 1005.7403814996999 : 0.19825772941112518
139+
Isotope 1005.9912202091499 : 0.12000058591365814
140+
Isotope 1006.2420589185999 : 0.05997777357697487
141+
Isotope 1006.49289762805 : 0.025713207200169563
142+
Isotope 1006.7437363375 : 0.009702674113214016
143+
Mono peaks: 4016.9296320850867 0.10543462634086609
144+
145+
This successfully recovers the monoisotopic peak, even though it is not the most abundant peak.
97146

98147
Full Spectral De-Isotoping
99148
**************************
@@ -147,7 +196,15 @@ state:
147196
if p.getIntensity() > 0.25 * maxvalue:
148197
print(p.getMZ(), p.getIntensity())
149198
150-
199+
unpicked_peak_data = e[214].get_peaks()
200+
plt.bar(unpicked_peak_data[0], unpicked_peak_data[1], snap=False)
201+
plt.show()
202+
203+
picked_peak_data = s.get_peaks()
204+
plt.bar(picked_peak_data[0], picked_peak_data[1], snap=False)
205+
plt.show()
206+
207+
151208
which produces the following output
152209

153210
.. code-block:: output
@@ -159,7 +216,7 @@ which produces the following output
159216
974.4589691256419 3215808.75
160217
161218
As we can see, the algorithm has reduced :math:`140` peaks to :math:`41` deisotoped peaks. It
162-
also has identified a molecule at :math:`974.45\ m/z` as the most intense peak in the
219+
also has identified a molecule with a singly charged mass of :math:`974.45\ Da` as the most intense peak in the
163220
data (base peak).
164221

165222
Visualization

docs/source/user_guide/chemistry.rst

+2
@@ -5,6 +5,8 @@ OpenMS has representations for various chemical concepts including molecular
55
formulas, isotopes, ribonucleotide and amino acid sequences as well as common
66
modifications of amino acids or ribonucleotides.
77

8+
For an introduction to isotope patterns, see `Charge and Isotope Deconvolution <charge_isotope_deconvolution.html>`_.
9+
810
Constants
911
---------
1012

docs/source/user_guide/feature_detection.rst

+31-27
@@ -3,37 +3,38 @@ Feature Detection
33

44
One very common task in mass spectrometry is the detection of 2-dimensional
55
patterns in m/z and time (RT) dimension from a series of :term:`MS1` scans. These
6-
patterns are called ``Features`` and they exhibit a chromatographic elution
6+
patterns are called :term:`Features <Feature>` and they exhibit a chromatographic elution
77
profile in the time dimension and an isotopic pattern in the m/z dimension (see
8-
`previous section <deisotoping.html>`_ for the 1-dimensional problem).
8+
`previous section <charge_isotope_deconvolution.html>`_ for the 1-dimensional problem).
9+
910
OpenMS has multiple tools that can identify these features in 2-dimensional
10-
data, these tools are called :py:class:`~.FeatureFinder`. Currently the following
11+
data; these tools are called ``FeatureFinder``. Currently the following
1112
FeatureFinders are available in pyOpenMS:
1213

13-
- :py:class:`~.FeatureFinderMultiplexAlgorithm` (e.g., :term:`SILAC`, Dimethyl labeling, (and label-free), identification free feature detection of peptides)
1414
- :py:class:`~.FeatureFinderAlgorithmPicked` (Label-free, identification free feature detection of peptides)
1515
- :py:class:`~.FeatureFinderIdentificationAlgorithm` (Label-free identification-guided feature detection of peptides)
16+
- :py:class:`~.FeatureFinderMultiplexAlgorithm` (e.g., :term:`SILAC`, Dimethyl labeling, (and label-free), identification free feature detection of peptides)
1617
- :py:class:`~.FeatureFindingMetabo` (Label-free, identification free feature detection of metabolites)
1718
- :py:class:`~.FeatureFinderAlgorithmMetaboIdent` (Label-free, identification guided feature detection of metabolites)
1819

19-
All of the algorithms above are for proteomics data with the exception of :py:class:`~.FeatureFindingMetabo` and :py:class:`~.FeatureFinderMetaboIdentCompound` for metabolomics data and small molecules in general.
20+
All of the algorithms above are for proteomics data with the exception of :py:class:`~.FeatureFindingMetabo` and :py:class:`~.FeatureFinderAlgorithmMetaboIdent` for metabolomics data and small molecules in general.
2021

2122
Proteomics
2223
******************************
2324

24-
Two of the most commonly used feature finders for proteomics in OpenMS are the :py:class:`~.FeatureFinder` and :py:class:`~.FeatureFinderIdentificationAlgorithm` which both work on (high
25-
resolution) centroided data. We can use the following code to find features in MS data:
25+
Three of the most commonly used feature finders for proteomics in OpenMS are :py:class:`~.FeatureFinderAlgorithmPicked`, :py:class:`~.FeatureFinderMultiplexAlgorithm` and :py:class:`~.FeatureFinderIdentificationAlgorithm`, which all work on (high
26+
resolution) centroided data (FeatureFinderMultiplexAlgorithm can also work on profile data). We can use the following code to find features in MS data:
2627

2728
.. code-block:: python
2829
2930
from urllib.request import urlretrieve
31+
import pyopenms as oms
3032
3133
gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
3234
urlretrieve(
3335
gh + "/src/data/FeatureFinderCentroided_1_input.mzML", "feature_test.mzML"
3436
)
3537
36-
import pyopenms as oms
3738
3839
# Prepare data loading (save memory by only
3940
# loading MS1 spectra into memory)
@@ -47,32 +48,30 @@ resolution) centroided data. We can use the following code to find features in M
4748
fh.load("feature_test.mzML", input_map)
4849
input_map.updateRanges()
4950
50-
ff = oms.FeatureFinder()
51-
ff.setLogType(oms.LogType.CMD)
51+
ff = oms.FeatureFinderAlgorithmPicked()
5252
5353
# Run the feature finder
54-
name = "centroided"
55-
features = oms.FeatureMap()
56-
seeds = oms.FeatureMap()
57-
params = oms.FeatureFinder().getParameters(name)
58-
ff.run(name, input_map, features, params, seeds)
54+
out_features = oms.FeatureMap() ## our result
55+
seeds = oms.FeatureMap() ## optional: you can provide seeds where FF should take place -- not used here
56+
params = ff.getParameters() ## we do not modify params for now
57+
ff.run(input_map, out_features, params, seeds)
5958
60-
features.setUniqueIds()
59+
out_features.setUniqueIds()
6160
fh = oms.FeatureXMLFile()
62-
fh.store("output.featureXML", features)
63-
print("Found", features.size(), "features")
61+
fh.store("output.featureXML", out_features)
62+
print("Found", out_features.size(), "features")
6463
6564
With a few lines of Python, we are able to run powerful algorithms available in
6665
OpenMS. The resulting data is held in memory (a :py:class:`~.FeatureMap` object) and can be
67-
inspected directly using the ``help(features)`` comment. It reveals that the
66+
inspected directly using the ``help(out_features)`` command. It reveals that the
6867
object supports iteration (through the ``__iter__`` function) as well as direct
6968
access (through the ``__getitem__`` function). This means we can write code that uses direct access and iteration in
7069
Python as follows:
7170

7271
.. code-block:: python
7372
74-
f0 = features[0]
75-
for f in features:
73+
f0 = out_features[0]
74+
for f in out_features:
7675
print(f.getRT(), f.getMZ())
7776
7877
@@ -82,7 +81,7 @@ inspecting ``help(f)`` or by consulting the manual.
8281

8382
Note: the output file that we have written (``output.featureXML``) is an
8483
OpenMS-internal XML format for storing features. You can learn more about file
85-
formats in the `Reading MS data formats <other_file_handling.html>`_ section.
84+
formats in the `Reading MS data formats <other_ms_data_formats.html>`_ section.
8685

8786
Metabolomics - Untargeted
8887
*************************
@@ -239,16 +238,18 @@ Now we can use the following code to detect features with :py:class:`~.FeatureFi
239238
# save FeatureMap to file
240239
oms.FeatureXMLFile().store("detected_features.featureXML", fm)
241240
242-
Note: the output file that we have written (``output.featureXML``) is an
241+
Note: the output file that we have written (``detected_features.featureXML``) is an
243242
OpenMS-internal XML format for storing features. You can learn more about file
244-
formats in the `Reading MS data formats <other_file_handling.html>`_ section.
243+
formats in the `Reading MS data formats <other_ms_data_formats.html>`_ section.
245244

246245
We can get a quick overview on the detected features by plotting them using the following function:
247246

248247
.. code-block:: python
249248
:linenos:
250249
251250
import matplotlib.pyplot as plt
251+
import matplotlib.colors as mcolors
252+
import itertools
252253
253254
def plotDetectedFeatures3D(path_to_featureXML):
254255
fm = oms.FeatureMap()
@@ -258,8 +259,9 @@ We can get a quick overview on the detected features by plotting them using the
258259
fig = plt.figure()
259260
ax = fig.add_subplot(111, projection="3d")
260261
261-
for feature in fm:
262-
color = next(ax._get_lines.prop_cycler)["color"]
262+
cycled_colors = itertools.cycle(['red', 'green', 'blue', 'orange', 'purple', 'yellow', 'cyan', 'magenta', 'black', 'gray'])
263+
264+
for feature, color in zip(fm, cycled_colors):
263265
# chromatogram data is stored in the subordinates of the feature
264266
for i, sub in enumerate(feature.getSubordinates()):
265267
retention_times = [
@@ -268,7 +270,7 @@ We can get a quick overview on the detected features by plotting them using the
268270
intensities = [
269271
int(y[1]) for y in sub.getConvexHulls()[0].getHullPoints()
270272
]
271-
mz = sub.getMetaValue("MZ")
273+
mz = sub.getMZ()
272274
ax.plot(retention_times, intensities, zs=mz, zdir="x", color=color)
273275
if i == 0:
274276
ax.text(
@@ -284,4 +286,6 @@ We can get a quick overview on the detected features by plotting them using the
284286
ax.set_zlabel("intensity (cps)")
285287
plt.show()
286288
289+
plotDetectedFeatures3D("detected_features.featureXML")
290+
287291
.. image:: img/ffmid_graph.png

docs/source/user_guide/map_alignment.rst

+1-2
@@ -1,7 +1,7 @@
11
Map Alignment
22
===============
33

4-
The pyOpenMS map alignment algorithms transform different maps (peak maps, :term:`feature maps`) to a common retention time axis.
4+
The pyOpenMS map alignment algorithms transform different maps (:term:`peak maps`, :term:`feature maps`) to a common retention time axis.
55

66
.. image:: img/map_alignment_illustration.png
77

@@ -12,7 +12,6 @@ Different map alignment algorithms are available in pyOpenMS:
1212

1313
- :py:class:`~.MapAlignmentAlgorithmPoseClustering`
1414
- :py:class:`~.MapAlignmentAlgorithmIdentification`
15-
- :py:class:`~.MapAlignmentAlgorithmSpectrumAlignment`
1615
- :py:class:`~.MapAlignmentAlgorithmKD`
1716
- :py:class:`~.MapAlignmentTransformer`
1817
