Skip to content

Clarify BTree range_search comments #81312

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 18, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion library/alloc/src/collections/btree/search.rs
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ impl<BorrowType: marker::BorrowType, K, V> NodeRef<BorrowType, K, V, marker::Lea
K: Borrow<Q>,
R: RangeBounds<Q>,
{
// WARNING: Inlining these variables would be unsound (#81138)
// It might be unsound to inline these variables if this logic changes (#81138).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to tone down the warning, but I don't know what you mean with "if this logic changes". What logic? It might be bad to inline these variables without any other change, because successfully passing the match statement might fool you to believe the start and end actually used were fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What logic?

I meant the code in the rest of the method. This might be better:

// Inlining these variables should be avoided (#81138).

The rest of the comment already explains why inlining can cause a safety issue.

// We assume the bounds reported by `range` remain the same, but
// an adversarial implementation could change between calls
let (start, end) = (range.start_bound(), range.end_bound());
Expand All @@ -114,6 +114,8 @@ impl<BorrowType: marker::BorrowType, K, V> NodeRef<BorrowType, K, V, marker::Lea
loop {
let (lower_edge_idx, lower_child_bound) = self.find_lower_bound_index(lower_bound);
let (upper_edge_idx, upper_child_bound) = self.find_upper_bound_index(upper_bound);
// SAFETY: This panic is used for safety, so external impls can't be called here. The
// comparison is done with integers for that reason.
Copy link
Contributor

@ssomers ssomers Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite confused here. If with "external impls" you mean the K: Ord implementation, sure we can call it, we have to call it, we just had the find functions call it profusely. But we can't blindly trust it to implement a total order , and even if it does, it may not correspond to the Q: Ord implementation that we used to check the range bounds.

The comparison is done with integers because those are what we end up with, what we need to identify positions in the data structure. For me, the question is: why do we compare and panic at all? Because if we didn't, and if the Ord implementations are invalid or inconsistent, or until #81138 if the range implementation was adversarial, we could easily and silently return a result that would lead callers to loop beyond the end of the tree, which they do unsafely with unwrap_unchecked, because they rely on a promise delivered here but documented nowhere.

I thought "SAFETY:" was for documenting why calling something marked unsafe is in fact safe, and here it's kind of the opposite.

So I would add API documentation about the promise here and elsewhere (not sure if I could or should squeeze that into this PR), and change this comment to something like:

We've called an external `Ord` implementation and can't trust it implements a total order

nor (if type K != Q) that it corresponds to the Ord implementation we used to check the range bounds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out I'm pretty wrong, after noticing there is no K: Ord declared here. An inconsistency between K: Ord and Q: Ord doesn't play up here, only Q: Ord is ever used. We assume that for various types of Q used over time, the same order among keys prevails. We only use K: Ord directly in some places like insert, because we actually move the key into the map so there is no room to play for a Q type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure we can call it, we have to call it, we just had the find functions call it profusely.

We can call it, but the result can't be used for a safety check. The user impl could return anything, which makes it useless.

The comparison is done with integers because those are what we end up with

Yes, but it's really because the Ord impl can't be assumed to be consistent or performant. Otherwise, the integers shouldn't be necessary IIUC.

I thought "SAFETY:" was for documenting why calling something marked unsafe is in fact safe, and here it's kind of the opposite.

I debated putting it here. However, it does document that the panic allows using unsafe code later with user impls involved.

We've called an external Ord implementation and can't trust it implements a total order

This would be okay with me, but it should say something about the panic providing safety.

Copy link
Contributor

@ssomers ssomers Mar 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call it, but the result can't be used for a safety check.

Again, I don't understand why you write that. We have called it, we needed to call it, it's the purpose of the function we're discussing. Yes, the user impl could return anything, but that doesn't make it useless. We need to know what it says in case it's not trying to pull our leg.

Otherwise, the integers shouldn't be necessary IIUC.

The purpose of this function is to translate a pair of bounds (let's say a pair of keys) to (a node and) a pair of integers, edge indices, indices into various arrays. To translate from the problem domain to the solution domain, in terms of the btree data structure.

I debated putting it here. However, it does document that the panic allows using unsafe code later with user impls involved.

In my view, the fact that later code is calling unsafe code under some assumption about the result is not an unsafety problem in this function and not the only reason to have the panic. Even if there was no unsafe code outside, without a panic countermeasures, the outside code could erroneously access key-value pairs outside the range. Well, that's up for debate, because if your Ord stinks, there is no inside and outside a range. In any case, I think this function should define (which I tried in #83147) and honour its contract, and then callers can exploit those guarantees with any unsafe shenanigans they like.

We've called an external Ord implementation and can't trust it implements a total order

This would be okay with me, but it should say something about the panic providing safety.

Actually, come to think of it, that's not even true. AFAIK the panic used to provide safety (to code outside the function) in the old code, but no longer after #81094. The line below the main panic (if lower_edge_idx < upper_edge_idx {) is what provides adherence to the contract / external safety. If we'd leave out all the panics from this function, the whole btree code would be just as safe, it would merely return results as weird as the Ord implementation, instead of barking at users.

All the panics in this function are only there to fulfill an additional service this function delivers, diagnosing user error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to know what it says in case it's not trying to pull our leg.

I agree. All I'm saying is that we can't make a safety decision based on it. For example, you can't decide if a range can be passed to slice::get_unchecked based on a user impl.

I think this function should define (which I tried in #83147) and honour its contract, and then callers can exploit those guarantees with any unsafe shenanigans they like.

You wrote the new code, so whatever you prefer is good with me. The purpose of this comment was to document what part of the implementation of this function makes it safe, since that wasn't obvious when we talked about the function before your PR.

If we'd leave out all the panics from this function, the whole btree code would be just as safe, it would merely return results as weird as the Ord implementation, instead of barking at users.

I didn't realize this. In this case, I see no reason to have this comment. I'll remove it and fix the other one as we discussed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also didn't realize this, so it was a fruitful discussion.

if lower_edge_idx > upper_edge_idx {
panic!("Ord is ill-defined in BTreeMap range")
}
Expand Down