Skip to content

Latest commit

 

History

History
95 lines (74 loc) · 4.25 KB

0021-str-pattern-pt2.md

File metadata and controls

95 lines (74 loc) · 4.25 KB

Challenge 21: Verify the safety of substring-related functions in str::pattern

  • Status: Open
  • Tracking Issue: #278
  • Start date: 2025-03-07
  • End date: 2025-10-17
  • Reward: 10000 USD

Context

The following str library functions are generic over the Pattern trait (https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html):

  • contains
  • starts_with
  • ends_with
  • find
  • rfind
  • split
  • split_inclusive
  • rsplit
  • split_terminator
  • rsplit_terminator
  • splitn
  • rsplitn
  • split_once
  • rsplit_once
  • rmatches
  • match_indices
  • rmatch_indices
  • trim_matches
  • trim_start_matches
  • strip_prefix
  • strip_suffix
  • trim_end_matches These functions accept a pattern as input, then call into_searcher to create a Searcher for the pattern. They use this Searcher to perform their desired operations (split, find, etc.). Those functions are implemented in (library/core/src/str/mod.rs), but the core of them is the searching algorithms which are implemented in (library/core/src/str/pattern.rs).

Details

Important note: for this challenge, you can assume:

  1. The safety and functional correctness of all functions in slice module.
  2. That all functions in (library/core/src/str/validations.rs) are functionally correct (consistent with the UTF8 encoding description in https://en.wikipedia.org/wiki/UTF-8).
  3. That all the Searchers in (library/core/src/str/iter.rs) are created by the into_searcher(_, haystack) with haystack is a valid utf8 string (str). You can assume any utf8 string property of haystack.

Verify the safety of the functions in (library/core/src/str/pattern.rs) listed in the next section.

The safety properties we are targeting are:

  1. There is no UB happens when calling the functions after the Searcher is created.
  2. The impls of unsafe traits Searcher and ReverseSearcher satisfies the SAFETY condition stated in the file:
/// The trait is marked unsafe because the indices returned by the
/// [`next()`][Searcher::next] methods are required to lie on valid utf8
/// boundaries in the haystack. This enables consumers of this trait to
/// slice the haystack without additional runtime checks.

This property should hold for next_back() of ReverseSearcher too.

Although this challenge appears similar to Challenge 20 and only requires proving safety for the StrSearcher implementation, it warrants being a separate challenge due to its complexity. The StrSearcher implementation relies on sophisticated algorithms - TwoWay-Search and SIMD-search - making the safety proof significantly more complex.

Success Criteria

Verify the safety of the following StrSearcher functions in (library/core/src/str/pattern.rs):

  • next
  • next_match
  • next_back
  • next_match_back
  • next_reject
  • next_back_reject

The verification is considered successful if you can specify a condition (a "type invariant") C and prove that:

  1. If the StrSearcher is created from any valid UTF-8 haystack, it satisfies C.
  2. If the StrSearcher satisfies C, it ensures the two safety properties mentioned in the previous section.
  3. If the StrSearcher satisfies C, after it calls any function above and gets modified, it still satisfies C.

Furthermore, you must prove the absence of undefined behaviors listed in the next section.

The verification must be unbounded---it must hold for inputs of arbitrary size.

List of UBs

All proofs must automatically ensure the absence of the following undefined behaviors ref:

  • Accessing (loading from or storing to) a place that is dangling or based on a misaligned pointer.
  • Reading from uninitialized memory except for padding or unions.
  • Mutating immutable bytes.
  • Producing an invalid value

Note: All solutions to verification challenges need to satisfy the criteria established in the challenge book in addition to the ones listed above.