
Commit 521db6c

Rework python docs
2 parents 336edfc + bd1fa62 commit 521db6c

15 files changed

+364
-107
lines changed

docs/source/conf.py

+4
Original file line number | Diff line number | Diff line change
@@ -46,6 +46,10 @@
4646
'chemrole',
4747
]
4848

49+
## warn about invalid references (e.g. invalid class names)
50+
nitpicky = True
51+
nitpick_ignore = []
52+
4953
autosummary_generate = True
5054
autosummary_imported_members = True
5155
remove_from_toctrees = [
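
With `nitpicky = True`, Sphinx warns about every cross-reference whose target it cannot resolve. The commit leaves `nitpick_ignore` empty. Should some targets turn out to be unresolvable on purpose, the list takes `(type, target)` tuples. The following is only a sketch of what such entries could look like. Both targets are hypothetical placeholders and are not part of this commit:

```python
# Sketch of a Sphinx conf.py fragment; the ignore entries are hypothetical.
nitpicky = True  # warn about every reference whose target cannot be found

# Each entry is a (type, target) tuple naming one reference to silence.
nitpick_ignore = [
    ("py:class", "numpy.ndarray"),             # hypothetical external type
    ("py:class", "ExampleUndocumentedClass"),  # hypothetical placeholder name
]

for ref_type, target in nitpick_ignore:
    print(f"ignoring unresolved {ref_type} reference to {target}")
```

Keeping the list short is usually preferable, since every ignored entry hides a potentially broken link.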

docs/source/user_guide/adduct_detection.rst

+3-2
@@ -3,9 +3,10 @@ Adduct Detection
33

44
In mass spectrometry it is crucial to ionize analytes prior to detection, because they are accelerated and manipulated in electric fields, allowing their separation based on mass-to-charge ratio.
55
This happens by addition of protons in positive mode or loss of protons in negative mode. Other ions present in the buffer solution can ionize the analyte as well, e.g. sodium, potassium or formic acid.
6-
Depending on the size and chemical compsition, multiple adducts can bind leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is much higher.
6+
Depending on the size and chemical composition, multiple adducts can bind, leading to multiple charges on the analyte. In metabolomics, where analytes are smaller, the number of charges is typically low (one or two), whereas in proteomics the number of charges is potentially higher.
7+
78
Furthermore, analytes can lose functional groups during ionization, e.g. a neutral water loss.
8-
Since the ionization happens after liquid chromatography, different adducts for an analyte have similar retention times.
9+
Since the ionization happens after liquid chromatography, different adducts for an analyte have almost identical retention times.
910

1011
.. image:: img/adduct_detection.png
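
To make the relation between adducts and observed m/z values concrete, here is a minimal sketch in plain Python (it does not use the pyOpenMS adduct detection classes). The mass constants are rounded standard monoisotopic values, and glucose is an arbitrary example analyte chosen for illustration; neither comes from the text above:

```python
# Rounded monoisotopic masses in Da (assumed standard values).
PROTON = 1.007276
ELECTRON = 0.000549
SODIUM = 22.989770     # neutral Na atom
POTASSIUM = 38.963707  # neutral K atom
WATER = 18.010565      # H2O


def ion_mz(neutral_mass, delta_mass, charge):
    """m/z of an ion: (M + mass change from adducts and losses) / |charge|."""
    return (neutral_mass + delta_mass) / abs(charge)


# Adduct name -> (mass change relative to neutral M, resulting charge).
ADDUCTS = {
    "[M+H]+": (PROTON, 1),
    "[M+Na]+": (SODIUM - ELECTRON, 1),
    "[M+K]+": (POTASSIUM - ELECTRON, 1),
    "[M+2H]2+": (2 * PROTON, 2),
    "[M+H-H2O]+": (PROTON - WATER, 1),  # protonation plus neutral water loss
    "[M-H]-": (-PROTON, -1),            # negative mode
}

glucose = 180.063388  # C6H12O6, monoisotopic mass in Da (example analyte)
mz = {name: ion_mz(glucose, delta, z) for name, (delta, z) in ADDUCTS.items()}
for name, value in mz.items():
    print(f"{name:12s} {value:.4f}")
```

All of these ions stem from the same analyte and therefore elute together, which is what allows adduct detection to group them by their near-identical retention times.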
1112

docs/source/user_guide/algorithms.rst

+10-5
@@ -7,13 +7,17 @@ Many signal processing algorithms follow a similar pattern in OpenMS.
77
88
algorithm = NameOfTheAlgorithmClass()
99
exp = MSExperiment()
10+
1011
# populate exp, for example load from file
12+
# ...
13+
14+
# run the algorithm on data
1115
algorithm.filterExperiment(exp)
1216
1317
In many cases, the processing algorithms have a set of parameters that can be
14-
adjusted. These are accessible through :py:meth:`~.Algorithm.getParameters()` and yield a
18+
adjusted. These are accessible through ``Algorithm.getParameters()`` and yield a
1519
:py:class:`~.Param` object (see `Parameter handling <parameter_handling.html>`_) which can
16-
be manipulated. After changing parameters, one can use :py:meth:`~.Algorithm.setParameters()` to
20+
be manipulated. After changing parameters, one can use ``Algorithm.setParameters()`` to
1721
propagate the new parameters to the algorithm:
1822

1923
.. code-block:: output
@@ -24,15 +28,16 @@ propagate the new parameters to the algorithm:
2428
algorithm.setParameters(param)
2529
2630
exp = MSExperiment()
31+
2732
# populate exp, for example load from file
33+
# ...
34+
2835
algorithm.filterExperiment(exp)
2936
3037
Since they work on a single :py:class:`~.MSExperiment` object, little input is needed to
3138
execute a filter directly on the data. Examples of filters that follow this
3239
pattern are :py:class:`~.GaussFilter`, :py:class:`~.SavitzkyGolayFilter` as well as the spectral filters
33-
:py:class:`~.BernNorm`, :py:class:`~.MarkerMower`, :py:class:`~.NLargest`, :py:class:`~.Normalizer`,
34-
:py:class:`~.ParentPeakMower`, :py:class:`~.Scaler`, :py:class:`~.SpectraMerger`, :py:class:`~.SqrtMower`,
35-
:py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`.
40+
:py:class:`~.NLargest`, :py:class:`~.Normalizer`, :py:class:`~.SpectraMerger`, :py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`.
3641

3742
Using the same example file as before, we can execute a :py:class:`~.GaussFilter` on our test data as follows:
3843

docs/source/user_guide/centroiding.rst

+13-5
@@ -33,11 +33,18 @@ Let's zoom in on an isotopic pattern in profile mode and plot it.
3333
plt.plot(
3434
profile_spectra[0].get_peaks()[0], profile_spectra[0].get_peaks()[1]
3535
) # plot the first spectrum
36-
36+
plt.show()
37+
3738
.. image:: img/profile_data.png
3839

39-
Because of the limited resolution of MS instruments m/z measurements are not of unlimited precision.
40-
Consequently, peak shapes spreads in the m/z dimension and resemble a gaussian distribution.
40+
Due to the limited resolution of mass spectrometry (MS) instruments, m/z measurements exhibit a certain spread
41+
when multiple copies of a molecule are measured. Even with identical mass and charge, the copies are recorded with
42+
slight deviations in the m/z dimension. Consequently, peak shapes in this dimension adopt a Gaussian-like distribution.
43+
The number of copies correlates with the peak height (or rather peak volume).
44+
45+
A single peptide species, e.g. "DPFINAGER" at charge 2, typically consists of various molecular
46+
entities that differ in the number of neutrons, leading to an isotopic distribution and resulting in multiple peaks.
47+
4148
Using the :py:class:`~.PeakPickerHiRes` algorithm, we can convert data from profile to centroided mode. Usually, not much information is lost
4249
by storing only centroided data. Thus, many algorithms and tools assume that centroided data is provided.
4350

@@ -55,8 +62,9 @@ by storing only centroided data. Thus, many algorithms and tools assume that cen
5562
plt.stem(
5663
centroided_spectra[0].get_peaks()[0], centroided_spectra[0].get_peaks()[1]
5764
) # plot as vertical lines
58-
65+
plt.show()
66+
5967
.. image:: img/centroided_data.png
6068

6169
After centroiding, a single m/z value for every isotopic peak is retained. By plotting the centroided data as a stem plot
62-
we discover that (in addition to the isotopic peaks) some low intensity peaks (intensity at approx. 4k) were present in the profile data.
70+
we discover that (in addition to the isotopic peaks) some low-intensity peaks (intensity of approx. 4k units on the y-axis) were present in the profile data.
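
The idea behind centroiding can be sketched without pyOpenMS. The snippet below samples a Gaussian-shaped profile peak and collapses it into a single m/z value using an intensity-weighted mean. This is a conceptual illustration only and not the algorithm implemented by `PeakPickerHiRes`; the function names, peak position and width are assumptions made for the example:

```python
import math


def gaussian_profile(center_mz, height, sigma, n_points=21, span=4.0):
    """Sample a Gaussian-shaped profile peak on a regular m/z grid."""
    step = 2 * span * sigma / (n_points - 1)
    mzs = [center_mz - span * sigma + i * step for i in range(n_points)]
    ints = [height * math.exp(-0.5 * ((mz - center_mz) / sigma) ** 2) for mz in mzs]
    return mzs, ints


def centroid(mzs, ints):
    """Collapse one profile peak into a single (m/z, intensity) pair.

    The m/z is the intensity-weighted mean; the intensity is the apex height.
    """
    total = sum(ints)
    mz = sum(m * i for m, i in zip(mzs, ints)) / total
    return mz, max(ints)


mzs, ints = gaussian_profile(center_mz=509.7518, height=1000.0, sigma=0.01)
c_mz, c_int = centroid(mzs, ints)
print(len(mzs), "profile points ->", round(c_mz, 4), round(c_int, 1))
```

The 21 sampled points are reduced to one pair, which is why centroided data is so much smaller than profile data while keeping the peak position.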

docs/source/user_guide/charge_isotope_deconvolution.rst

+88-28
@@ -3,65 +3,99 @@ Charge and Isotope Deconvolution
33

44
A single mass spectrum contains measurements of one or more analytes and the
55
m/z values recorded for these analytes. Most analytes produce multiple signals
6-
in the mass spectrometer, due to the natural abundance of carbon :math:`13` (naturally
7-
occurring at ca. :math:`1\%` frequency) and the large amount of carbon atoms in most
8-
organic molecules, most analytes produce a so-called isotopic pattern with a
9-
monoisotopic peak (all carbon are :chem:`^{12}C`) and a first isotopic peak (exactly one
10-
carbon atom is a :chem:`^{13}C`), a second isotopic peak (exactly two atoms are :chem:`^{13}C`) etc.
11-
Note that also other elements can contribute to the isotope pattern, see the
12-
`chemistry section <chemistry.html>`_ for further details.
6+
in the mass spectrometer, due to the natural abundance of heavy isotopes.
7+
The dominant heavy isotope in proteins is carbon :math:`13` (naturally
8+
occurring at ca. :math:`1.1\%` frequency). Other elements such as hydrogen also have heavy isotopes, but
9+
they contribute to a much lesser extent, since their heavy isotopes have a very low abundance,
10+
e.g. hydrogen :math:`2` (deuterium) occurs at a frequency of only :math:`0.0156\%`.
11+
12+
All analytes produce a so-called isotopic pattern, consisting of a
13+
monoisotopic peak and a first isotopic peak (exactly one
14+
extra neutron in one of the atoms of the molecule), a second isotopic peak (exactly two extra neutrons) etc.
15+
With higher mass, the monoisotopic peak becomes diminishingly small, up to the point where it is no longer detectable
16+
(this is rarely the case for peptides, but more problematic for whole proteins in Top-Down approaches).
17+
18+
By definition, the monoisotopic peak is the peak which contains only isotopes of the most abundant type.
19+
For peptides and proteins, the constituent atoms are C, H, N, O, P and S, where, incidentally, the
20+
most abundant isotope is also the lightest isotope. Hence, for peptides and proteins, the monoisotopic peak is always
21+
the lightest peak in an isotopic distribution.
22+
23+
See the `chemistry section <chemistry.html>`_ for further details on isotope abundances and how to compute isotope patterns.
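
How the monoisotopic peak shrinks with growing mass can be illustrated with a carbon-only binomial model. This is a rough sketch and not what `CoarseIsotopePatternGenerator` computes, because it ignores all elements other than carbon. The carbon counts (44 for a nine-residue peptide, 176 for a four-fold repeat of it) are assumptions derived from standard residue compositions:

```python
from math import comb

P_C13 = 0.011  # approximate natural abundance of carbon-13 (ca. 1.1%)


def carbon_isotope_pattern(n_carbons, n_peaks=6, p=P_C13):
    """Probability that exactly k of n_carbons atoms are 13C, for k < n_peaks.

    Carbon-only approximation: heavy isotopes of H, N, O and S are ignored.
    """
    return [
        comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


small = carbon_isotope_pattern(44)   # assumed carbon count of a small peptide
large = carbon_isotope_pattern(176)  # assumed carbon count of a 4x larger one

for label, pattern in (("44 C", small), ("176 C", large)):
    print(label, [round(x, 3) for x in pattern])
```

In this simplified model the monoisotopic peak is the tallest for the small molecule, whereas for the larger one the first isotopic peak already exceeds it. Real patterns shift even further because the other elements contribute as well.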
1324

1425
In addition, each analyte may appear in more than one charge state and adduct
15-
state, a singly charge analyte :chem:`[M +H]+` may be accompanied by a doubly
26+
state; for example, a singly charged analyte :chem:`[M +H]+` may be accompanied by a doubly
1627
charged analyte :chem:`[M +2H]++` or a sodium adduct :chem:`[M +Na]+`. In the case of a
17-
multiply charged peptide, the isotopic traces are spaced by ``PROTON_MASS /
28+
multiply charged peptide, the isotopic traces are spaced by ``NEUTRON_MASS /
1829
charge_state`` which is often close to :math:`0.5\ m/z` for doubly charged analytes,
1930
:math:`0.33\ m/z` for triply charged analytes etc. Note: tryptic peptides often appear
20-
at least doubly charged, while small molecules often carry a single charge but
21-
can have adducts other than hydrogen.
31+
either singly charged (when ionized with :term:`MALDI`), or doubly charged (when ionized with :term:`ESI`).
32+
Higher charges are also possible, but usually connected to incomplete tryptic digestions with missed cleavages.
33+
Small molecules in metabolomics often carry a single charge but can have adducts other than hydrogen.
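
The spacing rule can be turned around to estimate the charge state from two neighbouring isotopic peaks. The sketch below assumes an isotope spacing of about 1.00335 Da (the mass difference between :chem:`^{13}C` and :chem:`^{12}C`); the helper names are made up for this illustration, and the m/z values are the first two isotope peaks listed in the example outputs of this section:

```python
ISOTOPE_SPACING = 1.00335  # approx. mass difference between 13C and 12C in Da


def expected_spacing(charge):
    """Expected m/z distance between neighbouring isotopic peaks."""
    return ISOTOPE_SPACING / charge


def infer_charge(mz_a, mz_b, max_charge=6):
    """Return the charge whose expected spacing is closest to the observed one."""
    observed = abs(mz_b - mz_a)
    return min(
        range(1, max_charge + 1),
        key=lambda z: abs(expected_spacing(z) - observed),
    )


charge_small = infer_charge(509.75180710055, 510.25348451945)
charge_large = infer_charge(1004.9878653713499, 1005.2387040808)
print(charge_small, charge_large)  # prints "2 4"
```

Real deisotoping additionally has to tolerate measurement error (for example a ppm tolerance) and overlapping patterns, which this sketch does not attempt.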
2234

2335
Single Peak Example
2436
*********************************
2537

38+
Let's compute the isotope distribution of the peptide ``DFPIANGER`` using the classes :py:class:`~.AASequence` and
39+
:py:class:`~.EmpiricalFormula`. Then we use the :py:class:`~.Deisotoper` to find the monoisotopic peak:
40+
2641
.. code-block:: python
2742
:linenos:
2843
2944
import pyopenms as oms
3045
31-
charge = 2
3246
seq = oms.AASequence.fromString("DFPIANGER")
47+
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
48+
49+
## get isotopic distribution for two additional hydrogens (which carry the charge)
50+
charge = 2
3351
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
3452
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))
35-
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
3653
3754
# Append isotopic distribution to spectrum
3855
s = oms.MSSpectrum()
39-
for iso in isotopes.getContainer():
40-
iso.setMZ(iso.getMZ() / charge)
56+
for iso in isotopes.getContainer(): # the container contains masses, not m/z!
57+
iso.setMZ(iso.getMZ() / charge) # ... even though it's called '.getMZ()'
4158
s.push_back(iso)
4259
print("Isotope", iso.getMZ(), ":", iso.getIntensity())
4360
61+
# deisotope with 10 ppm mass tolerance
4462
oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True)
4563
4664
for p in s:
47-
print(p.getMZ(), p.getIntensity())
65+
print("Mono peaks:", p.getMZ(), p.getIntensity())
66+
67+
which will print:
68+
69+
70+
.. code-block:: output
71+
:linenos:
72+
73+
[M+H]+ weight: 1018.495240604071
74+
Isotope 509.75180710055 : 0.5680345296859741
75+
Isotope 510.25348451945 : 0.3053518533706665
76+
Isotope 510.75516193835 : 0.09806874394416809
77+
Isotope 511.25683935725004 : 0.023309258744120598
78+
Isotope 511.75851677615003 : 0.0044969217851758
79+
Isotope 512.2601941950501 : 0.000738693168386817
80+
Mono peaks: 1018.496337734329 0.5680345296859741
4881
4982
5083
Note that the algorithm presented here has some heuristics built into it, such
5184
as assuming that the isotopic peaks will decrease after the first isotopic
52-
peak. This heuristic can be tuned by changing the parameter
53-
``use_decreasing_model`` and ``start_intensity_check``. In this case, the
54-
second isotopic peak is the highest in intensity and the
55-
``start_intensity_check`` parameter needs to be set to 3.
85+
peak. This heuristic can be tuned by setting the parameter
86+
``use_decreasing_model`` to ``False``.
87+
For more fine-grained control use ``start_intensity_check`` and leave ``use_decreasing_model = True`` (see :py:class:`~.Deisotoper` --> C++ documentation).
88+
Let's look at a very heavy peptide, whose isotopic distribution is dominated by the first and second isotopic peak.
5689

5790
.. code-block:: python
5891
:linenos:
5992
60-
charge = 4
6193
seq = oms.AASequence.fromString("DFPIANGERDFPIANGERDFPIANGERDFPIANGER")
94+
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
95+
96+
charge = 4
6297
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
6398
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8))
64-
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
6599
66100
# Append isotopic distribution to spectrum
67101
s = oms.MSSpectrum()
@@ -73,9 +107,9 @@ second isotopic peak is the highest in intensity and the
73107
min_charge = 1
74108
min_isotopes = 2
75109
max_isotopes = 10
76-
use_decreasing_model = True
77-
start_intensity_check = 3
78-
oms.Deisotoper.deisotopeAndSingleCharge(
110+
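use_decreasing_model = True # expect intensities to decrease, but only from peak number `start_intensity_check` onwards
111+
start_intensity_check = 3 # the first two peaks may be less intense than the third; set use_decreasing_model = False to ignore intensities entirely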
112+
oms.Deisotoper.deisotopeAndSingleCharge( ## a function with all parameters exposed
79113
s,
80114
10,
81115
True,
@@ -90,10 +124,26 @@ second isotopic peak is the highest in intensity and the
90124
use_decreasing_model,
91125
start_intensity_check,
92126
False,
127+
True
93128
)
94129
for p in s:
95-
print(p.getMZ(), p.getIntensity())
130+
print("Mono peaks:", p.getMZ(), p.getIntensity())
96131
132+
.. code-block:: output
133+
:linenos:
134+
135+
[M+H]+ weight: 4016.927437824572
136+
Isotope 1004.9878653713499 : 0.10543462634086609
137+
Isotope 1005.2387040808 : 0.22646738588809967
138+
Isotope 1005.48954279025 : 0.25444599986076355
139+
Isotope 1005.7403814996999 : 0.19825772941112518
140+
Isotope 1005.9912202091499 : 0.12000058591365814
141+
Isotope 1006.2420589185999 : 0.05997777357697487
142+
Isotope 1006.49289762805 : 0.025713207200169563
143+
Isotope 1006.7437363375 : 0.009702674113214016
144+
Mono peaks: 4016.9296320850867 0.10543462634086609
145+
146+
This successfully recovers the monoisotopic peak, even though it is not the most abundant peak.
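
The reported ``Mono peaks`` values are singly charged masses, not the m/z values of the input peaks. The conversion below reproduces them from the first isotope peak of each example. It is a sketch of the arithmetic only, assuming a proton mass of about 1.007276467 Da; it is not taken from the :py:class:`~.Deisotoper` source, and the function name is made up:

```python
PROTON_MASS = 1.007276467  # Da (assumed value)


def to_singly_charged(mz, charge):
    """Convert the m/z of an [M+zH]z+ ion into the m/z of the [M+H]+ ion."""
    return mz * charge - (charge - 1) * PROTON_MASS


mono_small = to_singly_charged(509.75180710055, 2)
mono_large = to_singly_charged(1004.9878653713499, 4)
print(round(mono_small, 4), round(mono_large, 4))
```

The results agree with the printed ``Mono peaks`` values to well below 1 ppm. The small offset from the printed ``[M+H]+ weight`` (about 0.0011 Da at charge 2 and 0.0022 Da at charge 4) is consistent with the examples adding neutral hydrogen atoms, electrons included, rather than bare protons.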
97147

98148
Full Spectral De-Isotoping
99149
**************************
@@ -107,6 +157,7 @@ state:
107157
:linenos:
108158
109159
from urllib.request import urlretrieve
160+
import matplotlib.pyplot as plt
110161
111162
gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
112163
urlretrieve(gh + "/src/data/BSA1.mzML", "BSA1.mzML")
@@ -130,6 +181,7 @@ state:
130181
use_decreasing_model,
131182
start_intensity_check,
132183
False,
184+
True
133185
)
134186
135187
print(e[214].size())
@@ -147,7 +199,15 @@ state:
147199
if p.getIntensity() > 0.25 * maxvalue:
148200
print(p.getMZ(), p.getIntensity())
149201
150-
202+
unpicked_peak_data = e[214].get_peaks()
203+
plt.bar(unpicked_peak_data[0], unpicked_peak_data[1], snap=False)
204+
plt.show()
205+
206+
picked_peak_data = s.get_peaks()
207+
plt.bar(picked_peak_data[0], picked_peak_data[1], snap=False)
208+
plt.show()
209+
210+
151211
which produces the following output
152212

153213
.. code-block:: output
@@ -159,7 +219,7 @@ which produces the following output
159219
974.4589691256419 3215808.75
160220
161221
As we can see, the algorithm has reduced :math:`140` peaks to :math:`41` deisotoped peaks. It
162-
also has identified a molecule at :math:`974.45\ m/z` as the most intense peak in the
222+
has also identified a molecule with a singly charged mass of :math:`974.45\ Da` as the most intense peak in the
163223
data (base peak).
164224

165225
Visualization

docs/source/user_guide/chemistry.rst

+2
@@ -5,6 +5,8 @@ OpenMS has representations for various chemical concepts including molecular
55
formulas, isotopes, ribonucleotide and amino acid sequences as well as common
66
modifications of amino acids or ribonucleotides.
77

8+
For an introduction to isotope patterns, see `Charge and Isotope Deconvolution <charge_isotope_deconvolution.html>`_.
9+
810
Constants
911
---------
1012
