-
Notifications
You must be signed in to change notification settings - Fork 66
Tutorial extended: Map Alignment, Feature Linking and IDMapper #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
+ add more glossary terms + extend feature linking tutorial + add new PSM_to_features tutorial for IDMapper
Caution Review failedThe pull request is closed. WalkthroughThe pull request introduces extensive documentation updates for the pyOpenMS user guide and glossary. A new guide explains how to annotate feature maps with peptide identifications using the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DataFiles as Data Files
participant FeatureMap
participant IDMapper
User->>DataFiles: Download .featureXML and .idXML
DataFiles-->>User: Provide data files
User->>FeatureMap: Load feature map from featureXML
User->>IDMapper: Configure RT and m/z tolerances
IDMapper->>FeatureMap: Annotate feature map with peptide IDs
User->>FeatureMap: Save annotated feature map
sequenceDiagram
participant User
participant MapAligner as Map Alignment Engine
participant FeatureMaps
participant Plotter
User->>FeatureMaps: Load feature maps (excluding reference)
User->>MapAligner: Perform RT alignment and transformation
MapAligner->>FeatureMaps: Align and update feature maps
User->>Plotter: Plot original and transformed RTs
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (11)
docs/source/user_guide/index.rst (1)
49-49
: New entry for PSM_to_features added.
The new "PSM_to_features" entry in the "OpenMS Algorithms" section is correctly inserted. Please verify that the referenced documentation page exists and that its title and style are consistent with the rest of the guide.docs/source/user_guide/PSM_to_features.rst (1)
1-89
: Comprehensive new IDMapper tutorial for Feature Map annotation.
The step-by-step guide—from downloading example data, loading feature maps and peptide identifications, to configuring and using theIDMapper
—is clear and well-structured. To further assist beginners, consider adding notes on common pitfalls or basic error handling (e.g., file loading failures).docs/source/user_guide/map_alignment.rst (6)
5-8
: Clarify Map Alignment Explanation.
The expanded explanation about why chromatographic columns are unstable and the need to correct RTs is clear and informative. Consider splitting the long sentence into two shorter sentences for improved readability.
21-24
: Custom RT Mapping Section.
Introducing the option to apply a custom RT mapping viaMapAlignmentTransformer
is a valuable addition. Consider adding a brief explanation or an example scenario outlining when a user might prefer custom mapping over the default alignment.
26-29
: Linear Alignment Explanation.
The description of using an affine transformation (offset and slope) withMapAlignmentAlgorithmPoseClustering
is clear and emphasizes the benefits of aligning on the feature level. A short note explaining why feature-level alignment is preferred (e.g., better stability and performance) could further enhance clarity.
95-104
: Selecting a Reference Map.
The section on selecting a reference map by choosing the one with the largest number of features is practical and well explained. It might be beneficial to add a brief note on handling edge cases, such as when two maps have an equal number of features.
195-225
: Visualization Function: plot_consensus_maps.
The added functionplot_consensus_maps
is well documented with a clear docstring, and it effectively demonstrates how to visualize feature maps before and after alignment. As a minor improvement, consider caching the result ofmax([f.getIntensity() for f in fm])
to avoid duplicate computations when determining alpha values.
237-275
: Visualization Function: plot_transformed_rt_with_trafo.
Theplot_transformed_rt_with_trafo
function is detailed and describes each step of visualizing the transformation process. One suggestion is to verify that usingedgecolors="black"
with the chosen marker does not trigger any matplotlib warnings; if it does, include a note or adjust the marker style accordingly.docs/source/user_guide/feature_linking.rst (3)
53-59
: Detailed Explanation for Feature Linking.
The expanded explanation about how consensus maps are formed and why some features may remain unmatched is very informative. There is a minor typo ("have have" in line 57) that should be corrected to "have".
81-91
: Feature Size Statistics Function.
The helper functioncompute_feature_size_stats
effectively illustrates how to compute and return the distribution of consensus feature sizes. Consider adding type hints or a short comment describing the expected input type for better clarity.
92-98
: Display of Sample Consensus Features.
The loop that prints a few representative consensus features gives a clear snapshot of the linking results. While limiting the output to the first few features is practical, you may consider using the conditionif i >= 3:
(instead ofif i > 3:
) to make the intent clearer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
docs/source/user_guide/img/map_alignment.png
is excluded by!**/*.png
docs/source/user_guide/img/map_alignment_fmaps.png
is excluded by!**/*.png
docs/source/user_guide/img/map_alignment_trafos.png
is excluded by!**/*.png
📒 Files selected for processing (7)
docs/source/user_guide/PSM_to_features.rst
(1 hunks)docs/source/user_guide/feature_linking.rst
(2 hunks)docs/source/user_guide/glossary.rst
(3 hunks)docs/source/user_guide/index.rst
(1 hunks)docs/source/user_guide/map_alignment.rst
(3 hunks)docs/source/user_guide/quantitative_data.rst
(1 hunks)src/data/BSA1_F1.idXML
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-test
- GitHub Check: build-test
🔇 Additional comments (19)
docs/source/user_guide/quantitative_data.rst (1)
77-79
: Enhancement in the Consensus Features Section.
The added sentences clarify that linked features are represented by aConsensusFeature
and introduce the process of FeatureLinking. Ensure that the link to the "Feature Linking" chapter (line 78) is correct and that the explanation aligns with the updated tutorials.src/data/BSA1_F1.idXML (2)
1-173
: New IdXML file for peptide identifications added.
The XML file adheres to the IdXML 1.5 schema and is well-structured. This file will serve as the source for peptide identification data in the new IDMapper tutorial.
25-25
: Verify referenced spectra data path.
Thespectra_data
attribute on line 25 references[share/OpenMS/examples/FRACTIONS/BSA1_F1.mzML]
. Please confirm that this path is accurate and that the file is available in the repository as expected.docs/source/user_guide/glossary.rst (3)
31-35
: New glossary term added: de novo peptide sequencing.
The definition clearly describes the concept and aligns with the updated documentation discussing PSM annotations.
50-53
: New glossary term added: FeatureFinder.
The entry succinctly explains the role of FeatureFinder in pausing a feature map from MS1 spectra, which is useful for users new to pyOpenMS.
160-165
: Updated glossary definition for peptide-spectrum match (PSMs).
The revised definition appropriately covers both database-driven and de novo approaches. Ensure that all documentation referring to PSMs uses the updated terminology for consistency.docs/source/user_guide/map_alignment.rst (9)
13-14
: Consensus Map Note Clarity.
The note on creating a consensus map via a feature linking algorithm is concise and useful. Please verify that the link tofeature_linking.html
directs users to the expected comprehensive explanation.
30-33
: PSM Annotation and Reference Map Information.
The discussion on the need for PSM annotations and the use of reference maps adds important context. Ensure that users can easily cross-reference these details in thePSM_to_features
tutorial, and consider mentioning any pitfalls if PSM data is incomplete or missing.
34-40
: Summary Table for Alignment Algorithms.
The summary table effectively compares the characteristics of the available alignment algorithms. Double-check the rendered table layout in the final output to make sure the formatting remains clear and consistent.
60-61
: Footnote Reference Consistency.
The footnote listing valid models ("linear", "b_spline", "lowess", "interpolated") is helpful. Please ensure that these model names match the terminology used throughout the documentation and the pyOpenMS API reference.
78-79
: FeatureXML Files Annotation.
The note indicating that featureXML files already contain PSMs (obtained byoms.IDMapper()
) is clear. Verify that the details provided here align well with the guidance in thePSM_to_features
tutorial for consistency.
119-122
: Excluding the Reference Map for Efficient Alignment.
Excluding the reference map from the alignment loop is a smart choice to maximize efficiency. Consider adding a note to mention any potential edge cases (e.g., what if the feature map list is empty) to further inform advanced users.
124-140
: Map Alignment Algorithm Implementation.
The code snippet demonstrating the use ofMapAlignmentAlgorithmPoseClustering
, including parameter retrieval and setting, is clear and practical. Make sure that the parameters (likemax_rt_shift
) are also documented in the API reference so that users can cross-check their meanings.
141-149
: Iterative Alignment Process.
The for-loop that processes each non-reference feature map with the alignment algorithm is well illustrated. The inline comment about storing the original RT as meta-data adds clarity.
151-180
: Alternative Alignment with MapAlignmentAlgorithmIdentification.
The section detailing the use ofMapAlignmentAlgorithmIdentification
—including the adjustment of parameters and handling of theref_index
(set to -1)—is thorough. Please verify that the behavior of settingref_index = -1
is explained clearly in the broader context of pyOpenMS usage.docs/source/user_guide/feature_linking.rst (4)
20-20
: Feature Linking Introduction Update.
The update clearly states the use ofFeatureGroupingAlgorithmQT
for feature linking, which streamlines the explanation.
61-69
: Consensus Map Population Example.
The Python code demonstrating the instantiation of a consensus map and its population usingFeatureGroupingAlgorithmQT
is concise and clear.
75-80
: Consensus Map Summary Output.
The snippet that prints the total number of consensus features and provides summary information is useful and straightforward. Be sure that the sample output format is consistent with real-world data outcomes.
142-159
: Meta-data Augmentation and Display.
Augmenting the consensus map with file descriptions and printing the metadata is a valuable addition that helps users trace back the origin of the data. This step reinforces good practices in data provenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/source/user_guide/feature_linking.rst (3)
26-27
: Ensure consistent:linenos:
directive indentationThe
:linenos:
option should be indented by at least three spaces to align with Sphinx directive best practices and maintain consistency across all code blocks. Please verify indentation and blank-line placement after the option.Also applies to: 62-63, 75-76, 144-145
53-58
: Fix typos in explanatory textThere are minor typos that impact clarity:
- Duplicate word in "we have have three input maps"
- Misspelling "we adressed" should be "we addressed"
Please correct these typos for a smoother reading experience.
64-64
: Consider rephrasing casual commentThe inline comment
# this is where the work happens:
is informative but slightly informal. Consider using a more descriptive, professional comment such as# Execute feature grouping
to maintain tone consistency.Also applies to: 68-69
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/user_guide/feature_linking.rst
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-test
- GitHub Check: build-test
🔇 Additional comments (2)
docs/source/user_guide/feature_linking.rst (2)
20-20
: Approve new algorithm selection statementThe added sentence succinctly introduces the use of
FeatureGroupingAlgorithmQT
and improves the flow into the example.
148-161
: Approve metadata collection code snippetThe added code for populating and printing column header metadata is clear and enhances reproducibility by tracking input file details. Great addition!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
docs/source/user_guide/map_alignment.rst (6)
5-9
: Revise introductory phrasing for clarity and consistency
The new opening sentences effectively motivate RT alignment, but consider tightening the prose—e.g., replace “This is useful, since…” with “Because chromatographic columns are less stable, retention times of identical compounds can vary across runs.” Also, avoid “we want to… ” in user‐facing docs; use passive or imperative voice (“Align features by finding landmarks…”).
13-13
: Use Sphinx cross‐reference role instead of raw HTML link
Rather than hard‐coding<feature_linking.html>
use a native Sphinx role (e.g., :doc:feature_linking
or a named reference) to ensure the link remains valid if the target filename changes.
69-83
: Clarify example data comments and file names
- The inline comments (“# we use featureXML…”) could reference the new
PSM_to_features
tutorial for context.- Double‐check that the filenames (
BSA1_F1_idmapped.featureXML
, etc.) match the assets insrc/data/
.
95-104
: Align heading underline length with title text
The “Selecting a Reference Map” heading uses 25 asterisks but the text is shorter. For consistency, match the underline length exactly to the title (Sphinx style).
184-228
: Enhance visualization examples
- In
plot_consensus_maps
, the slicing[...][:-11]
to trim file names is brittle; consider using Python’sos.path.splitext()
orre.sub()
for clarity.- Suggest adding axis labels in both subplots for consistency.
279-280
: Minor grammar tweak
Replace “Using a more flexible model like LOWESS will likely result in a worse alignment (try and see for yourself).” with “More flexible models (e.g., LOWESS) may overfit and degrade performance—feel free to experiment.”
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/user_guide/map_alignment.rst
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-test
- GitHub Check: build-test
🔇 Additional comments (7)
docs/source/user_guide/map_alignment.rst (7)
21-23
: Approve: Custom RT‐mapping mention reads well
The note aboutMapAlignmentTransformer
is concise and correctly signals the advanced use case.
26-32
: Approve: Algorithm descriptions accurately updated
The expanded explanation ofPoseClustering
vs. other aligners and the PSM/reference requirements is clear and matches the new tutorial flow.
34-63
: Verify table formatting and footnote placement
The new summary table is a welcome addition. Please confirm in the built HTML that:
- The column headers align correctly.
- Footnote [#footnote1] appears just once and refers only to the “any” models column.
- Indentation of list‐table rows renders as expected.
117-123
: Approve: Reference map selection snippet
The Python snippet correctly demonstrates picking the feature map with the most entries and excluding it from alignment. Variable naming (ref_index
,feature_maps_to_align
) is clear.
128-150
: Approve: PoseClustering alignment example
The example walks through parameter retrieval, alignment, and RT transformation with original RT metadata. Code is accurate and well commented.
152-180
: Approve: Identification‐based alignment code
The tutorial correctly showcases choosing models, setting parameters, and applying transformations for multiple maps. Theref_index = -1
idiom is well explained.
232-278
: Approve: RT‐transformation landmarks plot
Theplot_transformed_rt_with_trafo
function is well structured, with distinct markers and colors. The commentary guides the user effectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (12)
docs/source/user_guide/feature_linking.rst (12)
20-20
: Maintain formal and consistent instructional tone.The phrases "We now perform feature linking..." (line 20) and "That's it!" (line 71) are conversational. For a more formal tutorial style, consider rephrasing to "Perform feature linking using the :py:class:
~.FeatureGroupingAlgorithmQT
algorithm." and "Now we can print summary information:" respectively.Also applies to: 71-72
26-27
: Verify Sphinx:linenos:
directive formatting.Ensure that each
:linenos:
option is consistently indented (commonly three spaces) under its corresponding.. code-block:: python
directive and that a blank line separates the options from the code. Align this with other tutorials to maintain a uniform look.Also applies to: 62-63, 76-77, 144-144
50-51
: Align section heading underline length.The underline under "Feature Linking Algorithm" (line 51) uses more asterisks than the title length. Adjust it so that the number of
*
matches the character count of the heading or follows the project's heading style conventions.
53-58
: Improve grammar and readability of descriptive text.
- Use "that" instead of "which" for restrictive clauses (e.g., "the process that connects…").
- Consider splitting the very long sentence on line 58 into two sentences to enhance clarity and reduce cognitive load.
68-69
: Capitalize inline comment for consistency.In the Python snippet, line 68 reads
# execute feature linking:
—for consistency with sentence-style comments, capitalize it as# Execute feature linking:
.
75-80
: Consistent multi-line print formatting.The multi-line
print(f"Total number of consensus features: {consensus_map.size()}\n\n")
is clear, but consider placing the extra newline directly in the string or handling it once in a separateprint()
call to avoid embedded escape sequences and improve readability.
83-90
: Add docstring and type hints to helper function.The
compute_feature_size_stats
helper would benefit from a brief docstring describing its input (ConsensusMap
) and return type (Dict[int, int]
), as well as optional Python type hints on the function signature. This helps learners understand the contract and return value structure.
96-106
: Clarify example loop for consensus features.To guide users through the enumeration:
- Add a short comment before the loop to explain that only the first few consensus features are shown.
- Ensure consistent indentation for the multi-line
print(...)
call to match other code snippets.
108-112
: Refine output introduction phrasing.Instead of "This then prints:", consider using "The following output is produced:" or "Example output:" to more clearly introduce the
.. code-block:: output
section.
112-119
: Verify indentation in output code block.Ensure each line (e.g.,
Map 0: ...
,Map 1: ...
) is indented uniformly (typically four spaces) within the.. code-block:: output
directive to maintain clear formatting.
151-157
: Standardize inline comments and variable usage.
- Change
## filename
(line 151) to a single# filename
for consistency with the rest of the snippets.- Avoid reusing
file_descriptions
for both the original and updated headers. Consider renaming the updated mapping (e.g.,new_headers
) before callingconsensus_map.setColumnHeaders(...)
to prevent shadowing.
159-161
: Include example output for metadata section.To complete this tutorial section, add a corresponding
.. code-block:: output
snippet showing the printed metadata (Filename, Size, UniqueID) so users can compare against expected results.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/user_guide/feature_linking.rst
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-test
- GitHub Check: build-test
🔇 Additional comments (1)
docs/source/user_guide/feature_linking.rst (1)
64-69
: Code snippet setup and execution are accurate.The initialization of
consensus_map = oms.ConsensusMap()
and the subsequent callfeature_grouper.group(feature_maps, consensus_map)
correctly demonstrate the API usage for feature linking. No issues found.
Summary by CodeRabbit