From 5c3139576b965de17cdac665333d22c17f90f1e8 Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Thu, 4 Sep 2014 20:32:22 -0700 Subject: [PATCH 1/7] RFC: Collections reform --- active/0000-collections-conventions.md | 1607 ++++++++++++++++++++++++ 1 file changed, 1607 insertions(+) create mode 100644 active/0000-collections-conventions.md diff --git a/active/0000-collections-conventions.md b/active/0000-collections-conventions.md new file mode 100644 index 00000000000..43c2f9c6ef9 --- /dev/null +++ b/active/0000-collections-conventions.md @@ -0,0 +1,1607 @@ +- Start Date: (fill me in with today's date, 2014-08-29) +- RFC PR #: (leave this empty) +- Rust Issue #: (leave this empty) + +# Summary + +This is a combined *conventions* and *library stabilization* RFC. The goal is to +establish a set of naming and signature conventions for `std::collections`. + +The major components of the RFC include: + +* Removing most of the traits in `collections`. + +* A general proposal for solving the "equiv" problem, as well as improving + `MaybeOwned`. + +* Patterns for overloading on by-need values and predicates. + +* Initial, forwards-compatible steps toward `Iterable`. + +* A coherent set of API conventions across the full variety of collections. + +*A big thank-you to @Gankro, who helped collect API information and worked + through an initial pass of some of the proposals here.* + +# Motivation + +This RFC aims to improve the design of the `std::collections` module in +preparation for API stabilization. There are a number of problems that need to +be addressed, as spelled out in the subsections below. + +## Collection traits + +The `collections` module defines several traits: + +* Collection +* Mutable +* MutableSeq +* Deque +* Map, MutableMap +* Set, MutableSet + +There are several problems with the current trait design: + +* Most important: the traits do not provide iterator methods like `iter`. 
It is
+  not possible to do so in a clean way without higher-kinded types, as the RFC
+  explains in more detail below.
+
+* The split between mutable and immutable traits is not well-motivated by
+  any of the existing collections.
+
+* The methods defined in these traits are somewhat anemic compared to the suite
+  of methods provided on the concrete collections that implement them.
+
+## Divergent APIs
+
+Despite the current collection traits, the APIs of various concrete collections
+have diverged; there is not a globally coherent design, and there are many
+inconsistencies.
+
+One problem in particular is the lack of clear guiding principles for the API
+design. This RFC proposes a few along the way.
+
+## Providing slice APIs on `Vec` and `String`
+
+The `String` and `Vec` types each provide a limited subset of the methods
+provided on string and vector slices, but there is not a clear reason to limit
+the API in this way. Today, one has to write things like
+`some_str.as_slice().contains(...)`, which is not ergonomic or intuitive.
+
+## The `Equiv` problem
+
+There is a more subtle problem related to slices. It's common to use a `HashMap`
+with owned `String` keys, but then the natural API for things like lookup is not
+very usable:
+
+```rust
+fn find(&self, k: &K) -> Option<&V>
+```
+
+The problem is that, since `K` will be `String`, the `find` function requests a
+`&String` value -- whereas one typically wants to work with the more flexible
+`&str` slices. In particular, using `find` with a literal string requires
+something like:
+
+```rust
+map.find(&"some literal".to_string())
+```
+
+which is unergonomic and requires an extra allocation just to get a borrow that,
+in some sense, was already available.
+
+The current `HashMap` API works around this problem by providing an *additional*
+set of methods that uses a generic notion of "equivalence" of values that have
+different types:
+
+```rust
+pub trait Equiv<T> {
+    fn equiv(&self, other: &T) -> bool;
+}
+
+impl Equiv<str> for String {
+    fn equiv(&self, other: &str) -> bool {
+        self.as_slice() == other
+    }
+}
+
+fn find_equiv<Q: Hash<S> + Equiv<K>>(&self, k: &Q) -> Option<&V>
+```
+
+There are a few downsides to this approach:
+
+* It requires a duplicated `_equiv` variant of each method taking a reference to
+  the key.
+
+* Its correctness depends on equivalent values producing the same hash, which is
+  not checked.
+
+* `String`-keyed hash maps are very common, so newcomers are likely to run
+  headlong into the problem. First, `find` will fail to work in the expected
+  way. But the signature of `find_equiv` is more difficult to understand than
+  `find`, and it's not immediately obvious that it solves the problem.
+
+The `TreeMap` API currently deals with this problem in an entirely different
+way:
+
+```rust
+/// Returns the value for which f(key) returns Equal.
+/// f is invoked with current key and guides tree navigation.
+/// That means f should be aware of natural ordering of the tree.
+fn find_with(&self, f: |&K| -> Ordering) -> Option<&V>
+```
+
+Besides being less convenient -- you cannot write `map.find_with("some literal")` --
+this function navigates the tree according to an ordering that may have no
+relationship to the actual ordering of the tree.
+
+## `MaybeOwned`
+
+Sometimes a function does not know in advance whether it will need or produce an
+owned copy of some data, or whether a borrow suffices. A typical example is the
+`from_utf8_lossy` function:
+
+```rust
+fn from_utf8_lossy<'a>(v: &'a [u8]) -> MaybeOwned<'a>
+```
+
+This function will return a string slice if the input was correctly utf8 encoded
+-- without any allocation.
But if the input has invalid utf8 characters, the
+function allocates a new `String` and inserts utf8 "replacement characters"
+instead. Hence, the return type is an `enum`:
+
+```rust
+pub enum MaybeOwned<'a> {
+    Slice(&'a str),
+    Owned(String),
+}
+```
+
+This interface makes it possible to allocate only when necessary, but the
+`MaybeOwned` type (and connected machinery) are somewhat ad hoc -- and
+specialized to `String`/`str`. It would be somewhat more palatable if there were
+a single "maybe owned" abstraction usable across a wide range of types.
+
+## `Iterable`
+
+A frequently-requested feature for the `collections` module is an `Iterable`
+trait for "values that can be iterated over". There are two main motivations:
+
+* *Abstraction*. Today, you can write a function that takes a single `Iterator`,
+  but you cannot write a function that takes a container and then iterates over
+  it multiple times (perhaps with differing mutability levels). An `Iterable`
+  trait could allow that.
+
+* *Ergonomics*. You'd be able to write
+
+  ```rust
+  for v in some_vec { ... }
+  ```
+
+  rather than
+
+  ```rust
+  for v in some_vec.iter() { ... }
+  ```
+
+  and `consume_iter(some_vec)` rather than `consume_iter(some_vec.iter())`.
+
+# Detailed design
+
+## The collections today
+
+The concrete collections currently available in `std` fall into roughly three categories:
+
+* Sequences
+  * Vec
+  * String
+  * Slices
+  * Bitv
+  * DList
+  * RingBuf
+  * PriorityQueue
+
+* Sets
+  * HashSet
+  * TreeSet
+  * TrieSet
+  * EnumSet
+  * BitvSet
+
+* Maps
+  * HashMap
+  * TreeMap
+  * TrieMap
+  * LruCache
+  * SmallIntMap
+
+The primary goal of this RFC is to establish clean and consistent APIs that
+apply across each group of collections.
+
+Before diving into the details, there is one high-level change that should be
+made to these collections.
The `PriorityQueue` collection should be renamed to
+`BinaryHeap`, following the convention that concrete collections are named according
+to their implementation strategy, not the abstract semantics they implement. We
+may eventually want `PriorityQueue` to be a *trait* that's implemented by
+multiple concrete collections.
+
+The `LruCache` could be renamed for a similar reason (it uses a `HashMap` in its
+implementation). However, the implementation is actually generic with respect to
+this underlying map, and so in the long run (with HKT and other language
+changes) `LruCache` should probably add a type parameter for the underlying map,
+defaulted to `HashMap`.
+
+## Design principles
+
+* *Centering on `Iterator`s*. The `Iterator` trait is a strength of Rust's
+  collections library. Because so many APIs can produce iterators, adding an API
+  that consumes one is very powerful -- and conversely as well. Thus, whenever
+  possible, collection APIs should strive to work with iterators.
+
+  In particular, some existing convenience methods avoid iterators for either
+  performance or ergonomic reasons. We should instead improve the ergonomics and
+  performance of iterators, so that these extra convenience methods are not
+  necessary and so that *all* collections can benefit.
+
+* *Minimizing method variants*. One problem with some of the current collection
+  APIs is the proliferation of method variants. For example, `HashMap` includes
+  *seven* methods that begin with the name `find`! While each method has a
+  motivation, the API as a whole can be bewildering, especially to newcomers.
+
+  When possible, we should leverage the trait system, or find other
+  abstractions, to reduce the need for method variants while retaining their
+  ergonomics and power.
+
+* *Conservatism*. It is easier to add APIs than to take them away. This RFC
+  takes a fairly conservative stance on what should be included in the
+  collections APIs.
In general, APIs should be very clearly motivated by a wide
+  variety of use cases, either for expressiveness, performance, or ergonomics.
+
+## Deprecating the traits
+
+This RFC proposes a somewhat radical step for the collections traits: rather
+than reform them, we should eliminate them altogether -- *for now*.
+
+Unlike inherent methods, which can easily be added and deprecated over time, a
+trait is "forever": there are very few backwards-compatible modifications to
+traits. Thus, for something as fundamental as collections, it is prudent to take
+our time to get the traits right.
+
+### Lack of iterator methods
+
+In particular, there is one way in which the current traits are clearly *wrong*:
+they do not provide standard methods like `iter`, despite these being
+fundamental to working with collections in Rust. Sadly, this gap is due to
+inexpressiveness in the language, which makes directly defining iterator methods
+in a trait impossible:
+
+```rust
+trait Iter {
+    type A;
+    type I: Iterator<&'a A>;    // what is the lifetime here?
+    fn iter<'a>(&'a self) -> I; // and how to connect it to self?
+}
+```
+
+The problem is that, when implementing this trait, the return type `I` of `iter`
+should depend on the *lifetime* of self. For example, the corresponding
+method in `Vec` looks like the following:
+
+```rust
+impl<T> Vec<T> {
+    fn iter<'a>(&'a self) -> Items<'a, T> { ... }
+}
+```
+
+This means that, given a `Vec<T>`, there isn't a *single* type `Items` for
+iteration -- rather, there is a *family* of types, one for each input lifetime.
+In other words, the associated type `I` in the `Iter` trait needs to be
+"higher-kinded": not just a single type, but rather a family:
+
+```rust
+trait Iter {
+    type A;
+    type I<'a>: Iterator<&'a A>;
+    fn iter<'a>(&'a self) -> I<'a>;
+}
+```
+
+In this case, `I` is parameterized by a lifetime, but in other cases (like
+`map`) an associated type needs to be parameterized by a type.
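
For comparison, Rust releases long after this RFC gained generic associated types, which can express a lifetime-indexed family of iterator types directly. The following is a minimal sketch under that assumption, not part of this proposal: the trait shape differs slightly from the one above (it uses an `Item<'a>` family instead of `type A`), and `Numbers` and `total_twice` are hypothetical names used only for illustration.

```rust
// Hypothetical trait: `I<'a>` is a family of iterator types,
// one for each lifetime of the borrow of `self`.
trait Iter {
    type Item<'a>
    where
        Self: 'a;
    type I<'a>: Iterator<Item = Self::Item<'a>>
    where
        Self: 'a;

    fn iter<'a>(&'a self) -> Self::I<'a>;
}

// Hypothetical collection used only for this illustration.
struct Numbers(Vec<u32>);

impl Iter for Numbers {
    type Item<'a> = &'a u32 where Self: 'a;
    type I<'a> = std::slice::Iter<'a, u32> where Self: 'a;

    fn iter<'a>(&'a self) -> Self::I<'a> {
        self.0.iter()
    }
}

// Iterates over the same container twice, each time with a fresh borrow.
fn total_twice(numbers: &Numbers) -> u32 {
    let first: u32 = numbers.iter().sum();
    let second: u32 = numbers.iter().sum();
    first + second
}
```

Each call to `iter` picks its own lifetime, which is the "family of types" that a single, non-parameterized associated type cannot describe.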
+ +In general, such higher-kinded types (HKTs) are a much-requested feature for +Rust. But the design and implementation of higher-kinded types is, by itself, a +significant investment. + +HKT would also allow for parameterization over smart pointer types, which has +many potential use cases in the context of collections. + +Thus, the goal in this RFC is to do the best we can without HKT *for now*, +while allowing a graceful migration if or when HKT is added. + +### Persistent/immutable collections + +Another problem with the current collection traits is the split between +immutable and mutable versions. In the long run, we will probably want to +provide *persistent* collections (which allow non-destructive "updates" that +create new collections that share most of their data with the old ones). + +However, persistent collection APIs have not been thoroughly explored in Rust; +it would be hasty to standardize on a set of traits until we have more +experience. + +### Downsides of deprecation + +There are two main downsides to deprecating the traits without a replacement: + +1. It becomes impossible to write code using generics over a "kind" of + collection (like `Map`). + +2. It becomes more difficult to ensure that the collections share a common API. + +For point (1), first, if the APIs are sufficiently consistent it should be +possible to transition code from e.g. a `TreeMap` to a `HashMap` by changing +very few lines of code. Second, generic programming is currently quite limited, +given the inability to iterate. Finally, generic programming over collections is +a large design space (with much precedent in C++, for example), and we should +take our time and gain more experience with a variety of concrete collections +before settling on a design. + +For point (2), first, the current traits have failed to keep the APIs in line, +as we will see below. 
Second, this RFC is the antidote: we establish a clear set
+of conventions and APIs for concrete collections up front, and stabilize on
+those, which should make it easy to add traits later on.
+
+### Why not leave the traits as "experimental"?
+
+An alternative to deprecation would be to leave the traits intact, but marked as
+experimental, with the intent to radically change them later.
+
+Such a strategy doesn't buy much relative to deprecation (given the arguments
+above), but risks the traits becoming "de facto" stable if people begin using
+them en masse.
+
+## Solving the `_equiv` and `MaybeOwned` problems
+
+The basic problem that leads to `_equiv` methods is that:
+
+* `&String` and `&str` are not the same type.
+* The `&str` type is more flexible and hence more widely used.
+* Code written for a generic type `T` that takes a reference `&T` will therefore
+  not be suitable when `T` is instantiated with `String`.
+
+A similar story plays out for `&Vec<T>` and `&[T]`, and with DST and custom
+slice types the same problem will arise elsewhere.
+
+### The `Borrow` trait
+
+This RFC proposes to use a *trait*, `Borrow`, to connect borrowed and owned data
+in a generic fashion:
+
+```rust
+/// A trait for borrowing.
+/// If `T: Borrow` then `&T` represents data borrowed from `T::Owned`.
+trait Borrow for Sized? {
+    /// The type being borrowed from.
+    type Owned;
+
+    /// Immutably borrow from an owned value.
+    fn borrow(&Owned) -> &Self;
+
+    /// Mutably borrow from an owned value.
+    fn borrow_mut(&mut Owned) -> &mut Self;
+}
+
+trait ToOwned: Borrow {
+    /// Produce a new owned value, usually by cloning.
+    fn to_owned(&self) -> Owned;
+}
+
+// This has an implicit Sized bound, so the impls below would
+// be allowed with full trait reform
+impl<A> Borrow for A {
+    type Owned = A;
+    fn borrow(a: &A) -> &A {
+        a
+    }
+    fn borrow_mut(a: &mut A) -> &mut A {
+        a
+    }
+}
+
+impl<A: Clone> ToOwned for A {
+    fn to_owned(&self) -> A {
+        self.clone()
+    }
+}
+
+impl Borrow for str {
+    type Owned = String;
+    fn borrow(s: &String) -> &str {
+        s.as_slice()
+    }
+    fn borrow_mut(s: &mut String) -> &mut str {
+        s.as_mut_slice()
+    }
+}
+
+impl ToOwned for str {
+    fn to_owned(&self) -> String {
+        self.to_string()
+    }
+}
+
+impl<T> Borrow for [T] {
+    type Owned = Vec<T>;
+    fn borrow(s: &Vec<T>) -> &[T] {
+        s.as_slice()
+    }
+    fn borrow_mut(s: &mut Vec<T>) -> &mut [T] {
+        s.as_mut_slice()
+    }
+}
+
+impl<T: Clone> ToOwned for [T] {
+    fn to_owned(&self) -> Vec<T> {
+        self.to_vec()
+    }
+}
+```
+
+The design of the `Borrow` trait is a bit subtle. One of the main goals of the
+design was allowing a *blanket* `impl` for non-sliceable types (the first `impl`
+above). This blanket `impl` ensures that all new sized, cloneable types are
+automatically borrowable; new `impl`s are required only for new *unsized* types,
+which are rare. (Note that the first `impl` *implicitly* applies to only sized
+types, which is why the additional `impl`s for particular unsized types are
+allowed.)
+
+The desire for the blanket `impl` precludes several other possible designs:
+
+* An alternative design would swap the role of `Borrow` and `Owned`, making the
+  trait represent owned data with an associated `Borrowed` type. That's appealing,
+  because it would be a generic way to go from `T` to `&T` but from `Vec<T>` to `&[T]`.
+  Unfortunately, there's no way to provide a blanket `impl` for such a trait. Since
+  the trait would need to be implemented for virtually *every* type, this is a non-starter.
+
+* Sticking with the structure of the `Borrow` trait, one question is why
+  implement it on e.g. `str` rather than `&str`. There are two reasons.
First,
+  in order to use the sized/unsized distinction, we need the trait to talk
+  directly about `str`. Second, the `borrow` methods need to tie the lifetime of
+  the borrow to the input lifetime, whereas an implementation for `&str` would
+  have to specify a lifetime up front.
+
+Because of the blanket `impl`, the `Borrow` trait can largely be ignored except
+when it is actually used -- which we describe next.
+
+### Using `Borrow` to replace `_equiv` methods
+
+With the `Borrow` trait in place, we can eliminate the `_equiv` method variants
+by asking map keys to be `Borrow`:
+
+```rust
+impl<K, V> HashMap<K, V> where K: Borrow + Hash + Eq {
+    fn find(&self, k: &K) -> Option<&V> { ... }
+    fn insert(&mut self, k: K::Owned, v: V) -> Option<V> { ... }
+    ...
+}
+```
+
+For string keys, we would use `HashMap<str, V>`. Then, the `find` method would
+take an `&str` key argument, while `insert` would take an owned `String`. On the
+other hand, for some other type `Foo` a `HashMap<Foo, V>` would take
+`&Foo` for `find` and `Foo` for `insert`. (More discussion on the choice of
+ownership is given in the [alternatives section](#ownership-management-for-keys).)
+
+Aside from removing the `_equiv` variants, this approach keeps a quite natural
+signature for the map's methods, while retaining the flexibility that `_equiv`
+methods offered.
+
+The same approach works for `TreeMap`, and should work in general for generic
+data structures that need to work with both owned and borrowed values.
+
+Unlike the current `_equiv` or `find_with` methods, the above approach
+guarantees coherence about hashing or ordering. For example, `HashMap` above
+requires that `K` (the borrowed key type) is `Hash`, and will produce hashes
+from owned keys by first borrowing from them.
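
To illustrate the ergonomic outcome this section is after, here is a minimal sketch using the `HashMap` that ships in present-day Rust. It is an assumption-laden comparison rather than this RFC's API: today's `std::borrow::Borrow` has a different shape (`K: Borrow<Q>`, with the map declared over the owned key type `String`), and `lookup_with_literal` is a hypothetical name.

```rust
use std::collections::HashMap;

// Hypothetical helper: looks a string literal up in a `String`-keyed map
// without allocating a temporary `String` for the lookup key.
fn lookup_with_literal() -> Option<i32> {
    let mut map: HashMap<String, i32> = HashMap::new();
    // Insertion takes the owned key type.
    map.insert("some literal".to_string(), 1);
    // Lookup takes any `&Q` where `String: Borrow<Q>`; here `Q = str`.
    map.get("some literal").copied()
}
```

The lookup hashes the borrowed `str`, which is sound only because owned and borrowed forms are required to hash identically, the coherence property discussed above.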
+ +### Clone-on-write (`Cow`) pointers + +A side-benefit of the `Borrow` trait is that we can give a more general version +of the `MaybeOwned` as a "clone-on-write" smart pointer: + +```rust +pub enum Cow<'a, T> where T: ToOwned { + Shared(&'a T), + Owned(T::Owned) +} + +impl<'a, T> Cow<'a, T> where T: ToOwned { + pub fn new(shared: &'a T) -> Cow<'a, T> { + Shared(shared) + } + + pub fn new_owned(owned: T::Owned) -> Cow<'static, T> { + Owned(owned) + } + + pub fn is_owned(&self) -> bool { + match *self { + Owned(_) => true, + Shared(_) => false + } + } + + pub fn to_owned_mut(&mut self) -> &mut T::Owned { + match *self { + Shared(shared) => { + *self = Owned(shared.to_owned()); + self.to_owned_mut() + } + Owned(ref mut owned) => owned + } + } + + pub fn into_owned(self) -> T::Owned { + match self { + Shared(shared) => shared.to_owned(), + Owned(owned) => owned + } + } +} + +impl<'a, T> Deref for Cow<'a, T> where T: ToOwned { + fn deref(&self) -> &T { + match *self { + Shared(shared) => shared, + Owned(ref owned) => T::borrow(owned) + } + } +} + +impl<'a, T> DerefMut for Cow<'a, T> where T: ToOwned { + fn deref_mut(&mut self) -> &mut T { + T::borrow_mut(self.to_owned_mut()) + } +} +``` + +The type `Cow<'a, str>` is roughly equivalent to today's `MaybeOwned<'a>` +(and `Cow<'a, [T]>` to `MaybeOwnedVector<'a, T>`). + +By implementing `Deref` and `DerefMut`, the `Cow` type acts as a smart pointer +-- but in particular, the `mut` variant actually *clones* if the pointed-to +value is not currently owned. Hence "clone on write". + +One slight gotcha with the design is that `&mut str` is not very useful, while +`&mut String` is (since it allows extending the string, for example). On the +other hand, `Deref` and `DerefMut` must deref to the *same* underlying type, and +for `Deref` to not require cloning, it must yield a `&str` value. + +Thus, the `Cow` pointer offers a separate `to_owned_mut` method that yields a +mutable reference to the *owned* version of the type. 
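
As a rough illustration of the clone-on-write behavior described above, the sketch below uses the `Cow` type found in present-day `std::borrow`. Treat the correspondence as an assumption: that type names its variants `Borrowed`/`Owned` rather than `Shared`/`Owned`, and its `to_mut` method plays the role of `to_owned_mut`. The helper `underscore_spaces` is hypothetical.

```rust
use std::borrow::Cow;

// Hypothetical helper: replaces spaces with underscores, allocating a new
// `String` only when the input actually contains a space.
fn underscore_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    let mut text = Cow::Borrowed(input);
    if input.contains(' ') {
        // `to_mut` clones the borrowed data into an owned `String` on first
        // use and yields a mutable reference to the *owned* type.
        let owned: &mut String = text.to_mut();
        *owned = owned.replace(' ', "_");
    }
    text
}
```

Callers that only read the result can go through `Deref` to `&str` in both cases; `into_owned()` is available when an owned `String` is required.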
+
+Note that, by not using `into_owned`, the `Cow` pointer itself may be owned by
+some other data structure (perhaps as part of a collection) and will internally
+track whether an owned copy is available.
+
+Altogether, this RFC proposes to introduce `Borrow` and `Cow` as above, and to
+deprecate `MaybeOwned` and `MaybeOwnedVector`. The API changes for the
+collections are discussed [below](#the-apis).
+
+## `IntoIterator` (and `Iterable`)
+
+As discussed [earlier](#iterable), some form of an `Iterable` trait is
+desirable for both expressiveness and ergonomics. Unfortunately, a full
+treatment of `Iterable` requires HKT for similar reasons to
+[the collection traits](#lack-of-iterator-methods). However, it's possible to
+get some of the way there in a forwards-compatible fashion.
+
+In particular, the following two traits work fine (with
+[associated items](https://github.com/rust-lang/rfcs/pull/195)):
+
+```rust
+trait Iterator {
+    type A;
+    fn next(&mut self) -> Option<A>;
+    ...
+}
+
+trait IntoIterator {
+    type A;
+    type I: Iterator<A>;
+
+    fn into_iter(self) -> I;
+}
+```
+
+Because `IntoIterator` consumes `self`, lifetimes are not an issue.
+
+It's tempting to also define a trait like:
+
+```rust
+trait Iterable<'a> {
+    type A;
+    type I: Iterator<&'a A>;
+
+    fn iter(&'a self) -> I;
+}
+```
+
+(along the lines of those proposed by
+[an earlier RFC](https://github.com/rust-lang/rfcs/pull/17)).
+
+The problem with `Iterable` as defined above is that it's locked to a particular
+lifetime up front. But in many cases, the needed lifetime is not even nameable
+in advance:
+
+```rust
+fn iter_through_rc<I>(c: Rc<I>) where I: Iterable {
+    // the lifetime of the borrow is established here,
+    // so cannot even be named in the function signature
+    for x in c.iter() {
+        // ...
+    }
+}
+```
+
+To make this kind of example work, you'd need to be able to say something like:
+
+```rust
+where <'a> I: Iterable<'a>
+```
+
+that is, that `I` implements `Iterable` for *every* lifetime `'a`. While such a
+feature is feasible to add to `where` clauses, the HKT solution is undoubtedly
+cleaner.
+
+Fortunately, we can have our cake and eat it too. This RFC proposes the
+`IntoIterator` trait above, together with the following blanket `impl`:
+
+```rust
+impl<I: Iterator> IntoIterator for I {
+    type A = I::A;
+    type I = I;
+    fn into_iter(self) -> I {
+        self
+    }
+}
+```
+
+which means that taking `IntoIterator` is strictly more flexible than taking
+`Iterator`. Note that in other languages (like Java), iterators are *not*
+iterable because the latter implies an unlimited number of iterations. But
+because `IntoIterator` consumes `self`, it yields only a single iteration, so
+all is good.
+
+For individual collections, one can then implement `IntoIterator` on both the
+collection and borrows of it:
+
+```rust
+impl<T> IntoIterator for Vec<T> {
+    type A = T;
+    type I = MoveItems<T>;
+    fn into_iter(self) -> MoveItems<T> { ... }
+}
+
+impl<'a, T> IntoIterator for &'a Vec<T> {
+    type A = &'a T;
+    type I = Items<'a, T>;
+    fn into_iter(self) -> Items<'a, T> { ... }
+}
+
+impl<'a, T> IntoIterator for &'a mut Vec<T> {
+    type A = &'a mut T;
+    type I = ItemsMut<'a, T>;
+    fn into_iter(self) -> ItemsMut<'a, T> { ... }
+}
+```
+
+If/when HKT is added later on, we can add an `Iterable` trait and a blanket
+`impl` like the following:
+
+```rust
+// the HKT version
+trait Iterable {
+    type A;
+    type I<'a>: Iterator<&'a A>;
+    fn iter<'a>(&'a self) -> I<'a>;
+}
+
+impl<'a, C: Iterable> IntoIterator for &'a C {
+    type A = &'a C::A;
+    type I = C::I<'a>;
+    fn into_iter(self) -> I {
+        self.iter()
+    }
+}
+```
+
+This gives a clean migration path: once `Vec` implements `Iterable`, it can drop
+the `IntoIterator` `impl`s for borrowed vectors, since they will be covered by
+the blanket implementation.
No code should break. + +Likewise, if we add a feature like the "universal" `where` clause mentioned +above, it can be used to deal with embedded lifetimes as in the +`iter_through_rc` example; and if the HKT version of `Iterable` is later added, +thanks to the suggested blanket `impl` for `IntoIterator` that `where` clause +could be changed to use `Iterable` instead, again without breakage. + +### Benefits of `IntoIterator` + +What do we gain by incorporating `IntoIterator` today? + +This RFC proposes that `for` loops should use `IntoIterator` rather than +`Iterator`. With the blanket `impl` of `IntoIterator` for any `Iterator`, this +is not a breaking change. However, given the `IntoIterator` `impl`s for `Vec` +above, we would be able to write: + +```rust +let v: Vec = ... + +for x in &v { ... } // iterate over &Foo +for x in &mut v { ... } // iterate over &mut Foo +for x in v { ... } // iterate over Foo +``` + +Similarly, methods that currently take slices or iterators can be changed to +take `IntoIterator` instead, immediately becoming more general and more +ergonomic. + +In general, `IntoIterator` will allow us to move toward more `Iterator`-centric +APIs today, in a way that's compatible with HKT tomorrow. + +### Additional methods + +Another typical desire for an `Iterable` trait is to offer defaulted versions of +methods that basically re-export iterator methods on containers (see +[the earlier RFC](https://github.com/rust-lang/rfcs/pull/17)). Usually these +methods would go through a reference iterator (i.e. the `iter` method) rather +than a moving iterator. + +It is possible to add such methods using the design proposed above, but there +are some drawbacks. For example, should `Vec::map` produce an iterator, or a new +vector? It would be possible to do the latter generically, but only with +HKT. (See +[this discussion](https://github.com/rust-lang/rfcs/pull/17#issuecomment-43817453).) 
+
+This RFC only proposes to add the following method via `IntoIterator`, as a
+convenience for a common pattern:
+
+```rust
+trait IterCloned {
+    type A;
+    type I: Iterator<A>;
+    fn iter_cloned(self) -> I;
+}
+
+impl<'a, T, I: IntoIterator> IterCloned for I where I::A = &'a T {
+    type A = T;
+    type I = ClonedItems<I>;
+    fn iter_cloned(self) -> I { ... }
+}
+```
+
+(The `iter_cloned` method will help reduce the number of method variants in
+general for collections, as we will see below).
+
+We will leave to later RFCs the incorporation of additional methods. Notice, in
+particular, that such methods can wait until we introduce an `Iterable` trait
+via HKT without breaking backwards compatibility.
+
+## Minimizing variants: `ByNeed` and `Predicate` traits
+
+There are several kinds of methods that, in their most general form, take
+closures, but for which convenience variants taking simpler data are common:
+
+* *Taking values by need*. For example, consider the `unwrap_or` and
+  `unwrap_or_else` methods in `Option`:
+
+  ```rust
+  fn unwrap_or(self, def: T) -> T
+  fn unwrap_or_else(self, f: || -> T) -> T
+  ```
+
+  The `unwrap_or_else` method is the most general: it invokes the closure to
+  compute a default value *only when `self` is `None`*. When the default value
+  is expensive to compute, this by-need approach helps. But often the default
+  value is cheap, and closures are somewhat annoying to write, so `unwrap_or`
+  provides a convenience wrapper.
+
+* *Taking predicates*. For example, a method like `contains` often shows up
+  (inconsistently!) in two variants:
+
+  ```rust
+  fn contains(&self, elem: &T) -> bool; // where T: PartialEq
+  fn contains_fn(&self, pred: |&T| -> bool) -> bool;
+  ```
+
+  Again, the `contains_fn` version is the more general, but it's convenient to
+  provide a specialized variant when the element type can be compared for
+  equality, to avoid writing explicit closures.
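
The by-need distinction between the two `Option` variants can be observed directly. The sketch below uses present-day closure syntax, which differs from the `|| -> T` closure types shown in this RFC; `by_need_demo` and `eager_demo` are hypothetical names, and the call counter exists only to make the laziness visible.

```rust
use std::cell::Cell;

// Hypothetical helper: returns the resulting value together with the
// number of times the default-producing closure actually ran.
fn by_need_demo(input: Option<u32>) -> (u32, u32) {
    let calls = Cell::new(0);
    let value = input.unwrap_or_else(|| {
        // Only reached when `input` is `None`.
        calls.set(calls.get() + 1);
        42
    });
    (value, calls.get())
}

// The convenience variant: the default is an already-computed value.
fn eager_demo(input: Option<u32>) -> u32 {
    input.unwrap_or(42)
}
```

A `ByNeed`-style trait would let one method accept either form of argument, which is the overloading discussed next.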
+
+As it turns out, with
+[multidispatch](https://github.com/rust-lang/rfcs/pull/195) it is possible to
+use a *trait* to express these variants through overloading:
+
+```rust
+trait ByNeed<T> {
+    fn compute(self) -> T;
+}
+
+impl<T> ByNeed<T> for T {
+    fn compute(self) -> T {
+        self
+    }
+}
+
+// Due to multidispatch, this impl does NOT overlap with the above one
+impl<T> ByNeed<T> for || -> T {
+    fn compute(self) -> T {
+        self()
+    }
+}
+
+impl<T> Option<T> {
+    fn unwrap_or<U>(self, def: U) -> T where U: ByNeed<T> { ... }
+    ...
+}
+```
+
+```rust
+trait Predicate<T> {
+    fn check(&self, &T) -> bool;
+}
+
+impl<T: Eq> Predicate<T> for &T {
+    fn check(&self, t: &T) -> bool {
+        self == t
+    }
+}
+
+impl<T> Predicate<T> for |&T| -> bool {
+    fn check(&self, t: &T) -> bool {
+        (*self)(t)
+    }
+}
+
+impl<T> Vec<T> {
+    fn contains

(&mut self, f: P) where P: Predicate<T>` | `Vec`, `DList`, `RingBuf`
+`fn dedup(&mut self)` | `Vec`, `DList`, `RingBuf` where `T: PartialEq`
+
+As with the insertion methods, there are some differences from today's API:
+
+* The `DList` and `RingBuf` data structures no longer provide `pop`, but rather
+  `pop_front` and `pop_back` -- similarly to the `push` methods.
+
+* The `remove` method on maps returns the value previously associated with the
+  key, if any. Previously, this functionality was provided by a separate `pop`
+  method, which has been dropped (consolidating needless method variants.)
+
+* The `retain` method takes a `Predicate`.
+
+* The `truncate`, `retain` and `dedup` methods are offered more widely.
+
+Again, some of the more specialized methods are not discussed here; see
+"specialized operations" [below](#specialized-operations).
+
+### Inspection/mutation
+
+The next table gives methods for inspection and mutation of existing items in collections:
+
+Operation | Collections
+--------- | -----------
+`fn len(&self) -> uint` | *all*
+`fn is_empty(&self) -> bool` | *all*
+`fn get(&self, uint) -> Option<&T>` | `[T]`, `Vec`, `RingBuf`
+`fn get_mut(&mut self, uint) -> Option<&mut T>` | `[T]`, `Vec`, `RingBuf`
+`fn get(&self, &K) -> Option<&V>` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap`
+`fn get_mut(&mut self, &K) -> Option<&mut V>` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap`
+`fn contains<P>(&self, P) where P: Predicate<T>` | `[T]`, `str`, `Vec`, `String`, `DList`, `RingBuf`, `BinaryHeap`
+`fn contains(&self, &K) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `EnumSet`
+`fn contains_key(&self, &K) -> bool` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap`
+
+The biggest changes from the current APIs are:
+
+* The `find` and `find_mut` methods have been renamed to `get` and `get_mut`.
+  Further, all `get` methods return `Option` values and do not invoke `fail!`.
+  This is part of a general convention described in the next section (on the
+  `Index` traits).
+
+* The `contains` method is offered more widely.
+
+* There is no longer an equivalent of `find_copy` (which should be called
+  `find_clone`). Instead, we propose to add the following method to the `Option<&'a T>`
+  type where `T: Clone`:
+
+  ```rust
+  fn cloned(self) -> Option<T> {
+      self.map(|x| x.clone())
+  }
+  ```
+
+  so that `some_map.find_copy(key)` will instead be written
+  `some_map.get(key).cloned()`. This method chain is slightly longer, but is
+  clearer and allows us to drop the `_copy` variants. Moreover, *all* users
+  of `Option` benefit from the new convenience method.
+
+#### The `Index` trait
+
+The `Index` and `IndexMut` traits provide indexing notation like `v[0]`:
+
+```rust
+pub trait Index {
+    type Index;
+    type Result;
+    fn index<'a>(&'a self, index: &Index) -> &'a Result;
+}
+
+pub trait IndexMut {
+    type Index;
+    type Result;
+    fn index_mut<'a>(&'a mut self, index: &Index) -> &'a mut Result;
+}
+```
+
+These traits will be implemented for: `[T]`, `Vec`, `RingBuf`, `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap`.
+
+As a general convention, implementations of the `Index` traits will *fail the
+task* if the index is invalid (out of bounds or key not found); they will
+therefore return direct references to values. Any collection implementing `Index`
+(resp. `IndexMut`) should also provide a `get` method (resp. `get_mut`) as a
+non-failing variant that returns an `Option` value.
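
The convention of pairing failing indexing with a non-failing `get`, plus `cloned` on the resulting `Option`, can be sketched as follows. This assumes the collections in present-day Rust, where "failing the task" corresponds to a panic; `index_and_get` is a hypothetical name used only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper contrasting indexing notation with `get`.
fn index_and_get() -> (u32, Option<u32>, Option<u32>) {
    let v = vec![10u32, 20, 30];
    let mut m: HashMap<String, u32> = HashMap::new();
    m.insert("a".to_string(), 1);

    // Indexing yields the value directly; an invalid index would panic.
    let first = v[0];
    // `get` is the non-failing variant and returns an `Option<&T>`;
    // `cloned` turns that into an `Option<T>`.
    let missing = v.get(99).cloned();
    let found = m.get("a").cloned();
    (first, missing, found)
}
```

Because `get` never fails, it doubles as a validity check for an index or key before using the more concise indexing notation.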
+ +This allows us to keep indexing notation maximally concise, while still +providing convenient non-failing variants (which can be used to provide a check +for index validity). + +### Iteration + +Every collection should provide the standard trio of iteration methods: + +```rust +fn iter(&'a self) -> Items<'a>; +fn iter_mut(&'a mut self) -> ItemsMut<'a>; +fn into_iter(self) -> ItemsMove; +``` + +and in particular implement the `IntoIterator` trait on both the collection type +and on (mutable) references to it. + +### Capacity management + +Many of the collections have some notion of "capacity", which may be fixed, grow +explicitly, or grow implicitly: + +- No capacity/fixed capacity: `DList`, `TreeMap`, `TreeSet`, `TrieMap`, `TrieSet`, slices, `EnumSet` +- Explicit growth: `LruCache` +- Implicit growth: `Vec`, `RingBuf`, `HashMap`, `HashSet`, `BitvSet`, `BinaryHeap` + +Growable collections provide functions for capacity management, as follows. + +#### Explicit growth + +For explicitly-grown collections, the normal constructor (`new`) takes a +capacity argument. Capacity can later be inspected or updated as follows: + +```rust +fn capacity(&self) -> uint +fn set_capacity(&mut self, capacity: uint) +``` + +(Note, this renames `LruCache::change_capacity` to `set_capacity`, the +prevailing style for setter methods.) + +#### Implicit growth + +For implicitly-grown collections, the normal constructor (`new`) does not take a +capacity, but there is an explicit `with_capacity` constructor, along with other +functions to work with the capacity later on: + +```rust +fn with_capacity(uint) -> Self +fn capacity(&self) -> uint +fn reserve(&mut self, uint) +fn reserve_exact(&mut self, uint) +fn shrink_to_fit(&mut self) +``` + +There are some important changes from the current APIs: + +* The `reserve` and `reserve_exact` methods now take as an argument the *extra* + space to reserve, rather than the final desired capacity, as this usage is + vastly more common.
The `reserve` function will generally grow the capacity in + powers of two (as needed for amortization), while `reserve_exact` will reserve + exactly the requested additional capacity. The `reserve_additional` methods + are deprecated. + +* The `with_capacity` constructor does *not* take any additional arguments, for + uniformity with `new`. This change affects `Bitv` in particular. + +#### Bounded iterators + +Some of the maps (e.g. `TreeMap`) currently offer specialized iterators over +their entries starting at a given key (called `lower_bound`) and above a given +key (called `upper_bound`), along with `_mut` variants. While the functionality +is worthwhile, the names are not very clear, so this RFC proposes the following +renaming: + +```rust +// Returns an iterator starting with the first key-value pair whose key is not less than k. +fn iter_from(&self, k: &K) -> Entries<'a, K, V> +fn iter_from_mut(&mut self, k: &K) -> EntriesMut<'a, K, V> + +Returns an iterator starting with the first key-value pair whose key is greater than k. +fn iter_above(&self, k: &K) -> Entries<'a, K, V> +fn iter_above_mut(&mut self, k: &K) -> EntriesMut <'a, K, V> +``` + +These iterators should be provided for any maps over ordered keys (`TreeMap`, +`TrieMap` and `SmallIntMap`). + +In addition, analogous methods should be provided for sets over ordered keys +(`TreeSet`, `TrieSet`, `BitvSet`). + +### Set operations + +#### Comparisons + +All sets should offer the following methods, as they do today: + +```rust +fn is_disjoint(&self, other: &Self) -> bool; +fn is_subset(&self, other: &Self) -> bool; +fn is_superset(&self, other: &Self) -> bool; +``` + +#### Combinations + +Sets can also be combined using the standard operations -- union, intersection, +difference and symmetric difference (exclusive or). 
Today's APIs for doing so +look like this: + +```rust +fn union<'a>(&'a self, other: &'a Self) -> I; +fn intersection<'a>(&'a self, other: &'a Self) -> I; +fn difference<'a>(&'a self, other: &'a Self) -> I; +fn symmetric_difference<'a>(&'a self, other: &'a Self) -> I; +``` + +where the `I` type is an iterator over keys that varies by concrete set. Working +with these iterators avoids materializing intermediate sets when they're not +needed; the `collect` method can be used to create sets when they are. + +To clarify the API, this RFC proposes renaming the methods to `iter_or`, +`iter_and`, `iter_sub`, and `iter_xor` respectively. These names emphasize the +fact that the methods return iterators, which might otherwise be surprising. + +Sets should also implement the `BitOr`, `BitAnd`, `BitXor` and `Sub` traits from +`std::ops`, allowing overloaded notation `|`, `&`, `^` and `-` to be used with +sets. These are equivalent to invoking the corresponding `iter_` method and then +calling `collect`, but for some sets (notably `BitvSet`) a more efficient direct +implementation is possible. + +Unfortunately, we do not yet have a set of traits corresponding to operations +`|=`, `&=`, etc., but again in some cases doing the update in place may be more +efficient. Right now, `BitvSet` is the only concrete set offering such operations: + +```rust +fn union_with(&mut self, other: &BitvSet) +fn intersect_with(&mut self, other: &BitvSet) +fn difference_with(&mut self, other: &BitvSet) +fn symmetric_difference_with(&mut self, other: &BitvSet) +``` + +This RFC punts on the question of naming here: it does *not* propose a new set +of names. Ideally, we would add operations like `|=` in a separate RFC, and use +those conventionally for sets. If not, we will choose fallback names during the +stabilization of `BitvSet`.
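
The relationship between the iterator-returning methods and the operator overloads can be sketched with `BTreeSet` from current Rust. One assumption to flag: current `std` kept the names `union`, `intersection`, `difference` and `symmetric_difference` rather than the `iter_*` names proposed in this section, so the sketch uses those. The function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Union built from the lazy iterator, materialized with `collect`.
fn union_lazy(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> BTreeSet<u32> {
    a.union(b).cloned().collect()
}

/// The same union through the `BitOr` overload on set references.
fn union_op(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> BTreeSet<u32> {
    a | b
}

/// Symmetric difference through the `BitXor` overload (`^`).
fn xor_op(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> BTreeSet<u32> {
    a ^ b
}

/// Counts shared elements without materializing an intermediate set.
fn common_count(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> usize {
    a.intersection(b).count()
}
```

`common_count` shows why the iterator form is worth keeping: when only a summary of the combination is needed, no new set has to be allocated.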
+ +### Map operations + +#### Combined methods + +The `HashMap` type currently provides a somewhat bewildering set of `find`/`insert` variants: + +```rust +fn find_or_insert(&mut self, k: K, v: V) -> &mut V +fn find_or_insert_with<'a>(&'a mut self, k: K, f: |&K| -> V) -> &'a mut V +fn insert_or_update_with<'a>(&'a mut self, k: K, v: V, f: |&K, &mut V|) -> &'a mut V +fn find_with_or_insert_with<'a, A>(&'a mut self, k: K, a: A, found: |&K, &mut V, A|, not_found: |&K, A| -> V) -> &'a mut V +``` + +These methods are used to couple together lookup and insertion/update +operations, thereby avoiding an extra lookup step. However, the current set of +method variants seems overly complex. + +There is [another RFC](https://github.com/rust-lang/rfcs/pull/216) already in +the queue addressing this problem in a very nice way, and this RFC defers to +that one. + +#### Key and value iterators + +In addition to the standard iterators, maps should provide by-reference +convenience iterators over keys and values: + +```rust +fn keys(&'a self) -> Keys<'a, K> +fn values(&'a self) -> Values<'a, V> +``` + +While these iterators are easy to define in terms of the main `iter` method, +they are used often enough to warrant including convenience methods. + +### Specialized operations + +Many concrete collections offer specialized operations beyond the ones given +above. These will largely be addressed through the API stabilization process +(which focuses on local API issues, as opposed to general conventions), but a +few broad points are addressed below. + +#### Relating `Vec` and `String` to slices + +One goal of this RFC is to supply all of the methods on (mutable) slices on +`Vec` and `String`. There are a few ways to achieve this, so concretely the +proposal is for `Vec` to implement `Deref<[T]>` and `DerefMut<[T]>`, and +`String` to implement `Deref<str>`.
This will automatically allow all slice +methods to be invoked from vectors and strings, and will allow writing `&*v` +rather than `v.as_slice()`. + +In this scheme, `Vec` and `String` are really "smart pointers" around the +corresponding slice types. While counterintuitive at first, this perspective +actually makes a fair amount of sense, especially with DST. + +(Initially, it was unclear whether this strategy would play well with method +resolution, but the planned resolution rules should work fine.) + +#### `String` API + +One of the key difficulties with the `String` API is that strings use UTF-8 +encoding, and some operations are only efficient when working at the byte level +(and thus taking this encoding into account). + +As a general principle, we will move the API toward the following convention: +index-related operations always work in terms of bytes; other operations deal +with chars by default (but can have suffixed variants for working at other +granularities when appropriate.) + +#### `DList` + +The `DList` type offers a number of specialized methods: + +```rust +swap_remove, insert_when, insert_ordered, merge, rotate_forward and rotate_backward +``` + +Prior to stabilizing the `DList` API, we will attempt to simplify its API +surface, possibly by using ideas from the +[collection views RFC](https://github.com/rust-lang/rfcs/pull/216). + +### Minimizing method variants via iterators + +#### Partitioning via `FromIterator` + +One place we can move toward iterators is functions like `partition` and +`partitioned` on vectors and slices: + +```rust +// on Vec +fn partition(self, f: |&T| -> bool) -> (Vec, Vec); + +// on [T] where T: Clone +fn partitioned(&self, f: |&T| -> bool) -> (Vec, Vec); +``` + +These two functions transform a vector/slice into a pair of vectors, based on a +"partitioning" function that says which of the two vectors to place elements +into.
The `partition` variant works by moving elements of the vector, while +`partitioned` clones elements. + +There are a few unfortunate aspects of an API like this one: + +* It's specific to vectors/slices, although in principle both the source and + target containers could be more general. + +* The fact that two variants have to be exposed, for owned versus cloned elements, is + somewhat unfortunate. + +This RFC proposes the following alternative design: + +```rust +pub enum Either { + pub Left(T), + pub Right(U), +} + +impl FromIterator for (A, B) where A: Extend, B: Extend { + fn from_iter(mut iter: I) -> (A, B) where I: IntoIterator> { + let mut left: A = FromIterator::from_iter(None::); + let mut right: B = FromIterator::from_iter(None::); + + for item in iter { + match item { + Left(t) => left.extend(Some(t)), + Right(u) => right.extend(Some(u)), + } + } + + (left, right) + } +} + +trait Iterator { + ... + fn partition(self, |&A| -> bool) -> Partitioned { ... } +} + +// where Partitioned: Iterator> +``` + +This design drastically generalizes the partitioning functionality, allowing it +to be used with arbitrary collections and iterators, while removing the +by-reference and by-value distinction. + +Using this design, you have: + +```rust +// The following two lines are equivalent: +let (u, w) = v.partition(f); +let (u, w): (Vec, Vec) = v.into_iter().partition(f).collect(); + +// The following two lines are equivalent: +let (u, w) = v.as_slice().partitioned(f); +let (u, w): (Vec, Vec) = v.iter_cloned().partition(f).collect(); +``` + +There is some extra verbosity, mainly due to the type annotations for `collect`, +but the API is much more flexible, since the partitioned data can now be +collected into other collections (or even differing collections). In addition, +partitioning is supported for *any* iterator.
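
The idea of routing `Either` values into two independently chosen collections can be sketched in current Rust as follows. This is only an approximation of the design above: it uses a hypothetical free function `collect_either` instead of a `FromIterator` impl on tuples (which a crate outside `std` generally cannot provide), and it assumes `Default` as the way to create the empty collections. `split_even_odd` is likewise a made-up usage helper.

```rust
use std::collections::BTreeSet;

/// A value that belongs to one of two groups.
enum Either<T, U> {
    Left(T),
    Right(U),
}

/// Routes `Left` items into one collection and `Right` items into another.
/// Any pair of collection types works, as long as each can start empty
/// (`Default`) and accept items of the matching type (`Extend`).
fn collect_either<A, B, T, U, I>(iter: I) -> (A, B)
where
    A: Default + Extend<T>,
    B: Default + Extend<U>,
    I: IntoIterator<Item = Either<T, U>>,
{
    let mut left = A::default();
    let mut right = B::default();
    for item in iter {
        match item {
            Either::Left(t) => left.extend(Some(t)),
            Either::Right(u) => right.extend(Some(u)),
        }
    }
    (left, right)
}

/// Even numbers go to a `Vec`, odd numbers to a `BTreeSet`:
/// the two targets are "differing collections".
fn split_even_odd(v: Vec<u32>) -> (Vec<u32>, BTreeSet<u32>) {
    collect_either(v.into_iter().map(|x| {
        if x % 2 == 0 {
            Either::Left(x)
        } else {
            Either::Right(x)
        }
    }))
}
```

Because the target types are picked by the caller's type annotation, the same `collect_either` serves vectors, sets, or any other `Extend` implementor without a dedicated method per collection.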
+ +#### Removing methods like `from_elem`, `from_fn`, `grow`, and `grow_fn` + +Vectors and some other collections offer constructors and growth functions like +the following: + +```rust +fn from_elem(length: uint, value: T) -> Vec +fn from_fn(length: uint, op: |uint| -> T) -> Vec +fn grow(&mut self, n: uint, value: &T) +fn grow_fn(&mut self, n: uint, f: |uint| -> T) +``` + +These extra variants can easily be dropped in favor of iterators, and this RFC +proposes to do so. + +The `iter` module already contains a `Repeat` iterator; this RFC proposes to add +a free function `repeat` to `iter` as a convenience for `iter::Repeat::new`. + +With that in place, we have: + +```rust +// Equivalent: +let v = Vec::from_elem(n, a); +let v = Vec::from_iter(repeat(a).take(n)); + +// Equivalent: +let v = Vec::from_fn(n, f); +let v = Vec::from_iter(range(0, n).map(f)); + +// Equivalent: +v.grow(n, a); +v.extend(repeat(a).take(n)); + +// Equivalent: +v.grow_fn(n, f); +v.extend(range(0, n).map(f)); +``` + +While these replacements are slightly longer, an important aspect of ergonomics +is *memorability*: by placing greater emphasis on iterators, programmers will +quickly learn the iterator APIs and have those at their fingertips, while +remembering ad hoc method variants like `grow_fn` is more difficult. + +#### Long-term: removing `push_all` and `push_all_move` + +The `push_all` and `push_all_move` methods on vectors are yet more API variants +that could, in principle, go through iterators: + +```rust +// The following are *semantically* equivalent +v.push_all(some_slice); +v.extend(some_slice.iter_cloned()); + +// The following are *semantically* equivalent +v.push_all_move(some_vec); +v.extend(some_vec); +``` + +However, currently the `push_all` and `push_all_move` methods can rely on the +*exact* size of the container being pushed, in order to elide bounds checks. We +do not currently have a way to "trust" methods like `len` on iterators to elide +bounds checks. 
A separate RFC will introduce the notion of a "trusted" method +which should support such optimization and allow us to deprecate the `push_all` +and `push_all_move` variants. (This is unlikely to happen before 1.0, so the +methods will probably still be included with "experimental" status.) + +# Alternatives + +## For the `Equiv` problem + +### The `HashMapKey` trait and friends + +An earlier proposal for solving the `_equiv` problem was given in the +[associated items RFC](https://github.com/rust-lang/rfcs/pull/195): + +```rust +trait HashMapKey : Clone + Hash + Eq { + type Query: Hash = Self; + fn compare(&self, other: &Query) -> bool { self == other } + fn query_to_key(q: &Query) -> Self { q.clone() }; +} + +impl HashMapKey for String { + type Query = str; + fn compare(&self, other: &str) -> bool { + self.as_slice() == other + } + fn query_to_key(q: &str) -> String { + q.into_string() + } +} + +impl HashMap where K: HashMapKey { + fn find(&self, q: &K::Query) -> &V { ... } +} +``` + +This solution has several drawbacks, however: + +* It requires a separate trait for different kinds of maps -- one for `HashMap`, + one for `TreeMap`, etc. + +* It requires that a trait be implemented on a given key without providing a + blanket implementation. Since you also need different traits for different + maps, it's easy to imagine cases where an out-of-crate type you want to use as + a key doesn't implement the key trait, forcing you to newtype. + +* It doesn't help with the `MaybeOwned` problem. + +### Daniel Micay's hack + +@strcat has a [PR](https://github.com/rust-lang/rust/pull/16713) that makes it +possible to, for example, coerce a `&str` to an `&String` value. + +This provides some help for the `_equiv` problem, since the `_equiv` methods +could potentially be dropped.
However, there are a few downsides: + +* Using a map with string keys is still a bit more verbose: + + ```rust + map.find("some static string".as_string()) // with the hack + map.find("some static string") // with this RFC + ``` + +* The solution is specialized to strings and vectors, and does not necessarily + support user-defined unsized types or slices. + +* It doesn't help with the `MaybeOwned` problem. + +* It exposes some representation interplay between slices and references to + owned values, which we may not want to commit to or reveal. + +## For `IntoIterator` + +An important aspect of the `IntoIterator` design is that the element type is an +associated type, *not* an input type. + +This is a tradeoff: + +* Making it an associated type means that the `for` examples work, because the + type of `Self` uniquely determines the element type for iteration, aiding type + inference. + +* Making it an input type would forgo those benefits, but would allow some + additional flexibility. For example, you could implement `IntoIterator` for + an iterator on `&A` when `A` is cloned, therefore *implicitly* cloning as + needed to make the ownership work out (and obviating the need for + `iter_cloned`). However, we have generally kept away from this kind of + implicit magic, *especially* when it can involve hidden costs like cloning, so + the more explicit design given in this RFC seems best. + +# Unresolved questions + +## Unresolved conventions/APIs + +As mentioned [above](#combinations), this RFC does not resolve the question of +what to call set operations that update the set in place. + +It likewise does not settle the APIs that appear in only single concrete +collections. These will largely be handled through the API stabilization +process, unless radical changes are proposed. + +Finally, additional methods provided via the `IntoIterator` API are left for +future consideration. 
+ +## Coercions + +Using the `Borrow` trait, it might be possible to safely add a coercion for auto-slicing: + +``` + If T: Borrow: + coerce &'a T::Owned to &'a T + coerce &'a mut T::Owned to &'a mut T +``` + +For sized types, this coercion is *forced* to be trivial, so the only time it +would involve running user code is for unsized values. + +A general story about such coercions will be left to a follow-up RFC. From 4034f4a965c178156d8ddbdadafd26f39b8da98b Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Thu, 11 Sep 2014 11:34:53 -0700 Subject: [PATCH 2/7] Added drawbacks --- active/0000-collections-conventions.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/active/0000-collections-conventions.md b/active/0000-collections-conventions.md index 43c2f9c6ef9..98755059845 100644 --- a/active/0000-collections-conventions.md +++ b/active/0000-collections-conventions.md @@ -1577,6 +1577,10 @@ This is a tradeoff: implicit magic, *especially* when it can involve hidden costs like cloning, so the more explicit design given in this RFC seems best. +# Downsides + +Design tradeoffs were discussed inline. 
+ # Unresolved questions ## Unresolved conventions/APIs From 8ecd36038e3c10ae8c90b07f9019497827d1f595 Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Thu, 11 Sep 2014 12:46:15 -0700 Subject: [PATCH 3/7] A few minor fixes --- active/0000-collections-conventions.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/active/0000-collections-conventions.md b/active/0000-collections-conventions.md index 98755059845..47aeffce588 100644 --- a/active/0000-collections-conventions.md +++ b/active/0000-collections-conventions.md @@ -855,7 +855,7 @@ trait Predicate { impl Predicate for &T { fn check(&self, t: &T) -> bool { - self == t + *self == t } } @@ -1169,8 +1169,8 @@ functions to work with the capacity later on: ```rust fn with_capacity(uint) -> Self fn capacity(&self) -> uint -fn reserve(&mut self, uint) -fn reserve_exact(&mut self, uint) +fn reserve(&mut self, additional: uint) +fn reserve_exact(&mut self, additional: uint) fn shrink_to_fit(&mut self) ``` @@ -1178,10 +1178,10 @@ There are some important changes from the current APIs: * The `reserve` and `reserve_exact` methods now take as an argument the *extra* space to reserve, rather than the final desired capacity, as this usage is - vastly more common. The `reserve` function will generally grow the capacity in - powers of two (as needed for amortization), while `reserve_exact` will reserve - exactly the requested additional capacity. The `reserve_additional` methods - are deprecated. + vastly more common. The `reserve` function may grow the capacity by a larger + amount than requested, to ensure amortization, while `reserve_exact` will + reserve exactly the requested additional capacity. The `reserve_additional` + methods are deprecated. * The `with_capacity` constructor does *not* take any additional arguments, for uniformity with `new`. This change affects `Bitv` in particular. 
@@ -1199,7 +1199,7 @@ renaming: fn iter_from(&self, k: &K) -> Entries<'a, K, V> fn iter_from_mut(&mut self, k: &K) -> EntriesMut<'a, K, V> -Returns an iterator starting with the first key-value pair whose key is greater than k. +// Returns an iterator starting with the first key-value pair whose key is greater than k. fn iter_above(&self, k: &K) -> Entries<'a, K, V> fn iter_above_mut(&mut self, k: &K) -> EntriesMut <'a, K, V> ``` From 4490dc7797bd319d22508961583deaf31b5b5c66 Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Thu, 11 Sep 2014 12:58:05 -0700 Subject: [PATCH 4/7] Added alternative discussion for `for` --- active/0000-collections-conventions.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/active/0000-collections-conventions.md b/active/0000-collections-conventions.md index 47aeffce588..fcc9429ef3c 100644 --- a/active/0000-collections-conventions.md +++ b/active/0000-collections-conventions.md @@ -1560,6 +1560,25 @@ could potentially be dropped. However, there are a few downsides: ## For `IntoIterator` +### Handling of `for` loops + +The fact that `for x in v` moves elements from `v`, while `for x in v.iter()` +yields references, may be a bit surprising. On the other hand, moving is the +default almost everywhere in Rust, and with the proposed approach you get to use `&` and +`&mut` to easily select other forms of iteration. + +Unfortunately, it's a bit tricky to make for use by-ref iterators instead. The +problem is that an iterator is `IntoIterator`, but it is not `Iterable` (or +whatever we call the by-reference trait). Why? Because `IntoIterator` gives you +an iterator that can be used only *once*, while `Iterable` allows you to ask for +iterators repeatedly. + +If `for` demanded an `Iterable`, then `for x in v.iter()` and `for x in v.iter_mut()` +would cease to work -- we'd have to find some other approach. It might be +doable, but it's not obvious how to do it. 
+ +### Input versus output type parameters + An important aspect of the `IntoIterator` design is that the element type is an associated type, *not* an input type. From d4e9e71127e5ecd4018101370edec01f1640ab6a Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Thu, 11 Sep 2014 12:59:13 -0700 Subject: [PATCH 5/7] Fix bad references to self --- active/0000-collections-conventions.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/active/0000-collections-conventions.md b/active/0000-collections-conventions.md index fcc9429ef3c..74c80875ce5 100644 --- a/active/0000-collections-conventions.md +++ b/active/0000-collections-conventions.md @@ -418,10 +418,10 @@ impl ToOwned for A { impl Borrow for str { type Owned = String; fn borrow(s: &String) -> &str { - self.as_slice() + s.as_slice() } fn borrow_mut(s: &mut String) -> &mut str { - self.as_mut_slice() + s.as_mut_slice() } } @@ -434,10 +434,10 @@ impl ToOwned for str { impl Borrow for [T] { type Owned = Vec; fn borrow(s: &Vec) -> &[T] { - self.as_slice() + s.as_slice() } fn borrow_mut(s: &mut Vec) -> &mut [T] { - self.as_mut_slice() + s.as_mut_slice() } } From 7eadf3497610ecae6c6992432999977598fe5fb5 Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Thu, 18 Sep 2014 11:26:02 -0700 Subject: [PATCH 6/7] Substantially revised Borrow trait, add range proposal --- active/0000-collections-conventions.md | 370 +++++++++++++++++-------- 1 file changed, 255 insertions(+), 115 deletions(-) diff --git a/active/0000-collections-conventions.md b/active/0000-collections-conventions.md index 74c80875ce5..9cd525b040f 100644 --- a/active/0000-collections-conventions.md +++ b/active/0000-collections-conventions.md @@ -111,7 +111,8 @@ fn find_equiv + Equiv>(&self, k: &Q) -> Option<&V> There are a few downsides to this approach: * It requires a duplicated `_equiv` variant of each method taking a reference to - the key. + the key. 
(This downside could likely be mitigated using + [multidispatch](https://github.com/rust-lang/rfcs/pull/195).) * Its correctness depends on equivalent values producing the same hash, which is not checked. @@ -121,6 +122,9 @@ There are a few downsides to this approach: way. But the signature of `find_equiv` is more difficult to understand than `find`, and it it's not immediately obvious that it solves the problem. +* It is the right API for `HashMap`, but not helpful for e.g. `TreeMap`, which + would want an analog for `Ord`. + The `TreeMap` API currently deals with this problem in an entirely different way: @@ -235,8 +239,10 @@ defaulted to `HashMap`. * *Centering on `Iterator`s*. The `Iterator` trait is a strength of Rust's collections library. Because so many APIs can produce iterators, adding an API - that consumes one is very powerful -- and conversely as well. Thus, whenever - possible, collection APIs should strive to work with iterators. + that consumes one is very powerful -- and conversely as well. Moreover, + iterators are highly efficient, since you can chain several layers of + modification without having to materialize intermediate results. Thus, + whenever possible, collection APIs should strive to work with iterators. In particular, some existing convenience methods avoid iterators for either performance or ergonomic reasons. We should instead improve the ergonomics and @@ -257,7 +263,7 @@ defaulted to `HashMap`. collections APIs. In general, APIs should be very clearly motivated by a wide variety of use cases, either for expressiveness, performance, or ergonomics. -## Deprecating the traits +## Removing the traits This RFC proposes a somewhat radical step for the collections traits: rather than reform them, we should eliminate them altogether -- *for now*. @@ -330,9 +336,9 @@ However, persistent collection APIs have not been thoroughly explored in Rust; it would be hasty to standardize on a set of traits until we have more experience. 
-### Downsides of deprecation +### Downsides of removal -There are two main downsides to deprecating the traits without a replacement: +There are two main downsides to removing the traits without a replacement: 1. It becomes impossible to write code using generics over a "kind" of collection (like `Map`). @@ -354,10 +360,10 @@ those, which should make it easy to add traits later on. ### Why not leave the traits as "experimental"? -An alternative to deprecation would be to leave the traits intact, but marked as +An alternative to removal would be to leave the traits intact, but marked as experimental, with the intent to radically change them later. -Such a strategy doesn't buy much relative to deprecation (given the arguments +Such a strategy doesn't buy much relative to removal (given the arguments above), but risks the traits becoming "de facto" stable if people begin using them en masse. @@ -380,43 +386,25 @@ in a generic fashion: ```rust /// A trait for borrowing. -/// If `T: Borrow` then `&T` represents data borrowed from `T::Owned`. -trait Borrow for Sized? { - /// The type being borrowed from. - type Owned; - +trait Borrow { /// Immutably borrow from an owned value. - fn borrow(&Owned) -> &Self; + fn borrow(&self) -> &B; /// Mutably borrow from an owned value. - fn borrow_mut(&mut Owned) -> &mut Self; -} - -trait ToOwned: Borrow { - /// Produce a new owned value, usually by cloning. - fn to_owned(&self) -> Owned; + fn borrow_mut(&mut self) -> &mut B; } -// This has an implicit Sized bound, so the impls below would -// be allowed with full trait reform -impl Borrow for A { - type Owned = A; - fn borrow(a: &A) -> &A { +// The Sized bound means that this impl does not overlap with the impls below. 
+impl Borrow for T { + fn borrow(a: &T) -> &T { a } - fn borrow_mut(a: &mut A) -> &mut A { + fn borrow_mut(a: &mut T) -> &mut T { a } } -impl ToOwned for A { - fn to_owned(&self) -> A { - self.clone() - } -} - -impl Borrow for str { - type Owned = String; +impl Borrow for String { fn borrow(s: &String) -> &str { s.as_slice() } @@ -425,14 +413,7 @@ impl Borrow for str { } } -impl ToOwned for str { - fn to_owned(&self) -> String { - self.to_string() - } -} - -impl Borrow for [T] { - type Owned = Vec; +impl Borrow<[T]> for Vec { fn borrow(s: &Vec) -> &[T] { s.as_slice() } @@ -440,36 +421,17 @@ impl Borrow for [T] { s.as_mut_slice() } } - -impl ToOwned for [T] { - fn to_owned(&self) -> Vec { - self.to_vec() - } -} ``` -The design of the `Borrow` trait is a bit subtle. One of the main goals of the -design was allowing a *blanket* `impl` for non-sliceable types (the first `impl` -above). This blanket `impl` ensures that all new sized, cloneable types are -automatically borrowable; new `impl`s are required only for new *unsized* types, -which are rare. (Note that the first `impl` *implicitly* applies to only sized -types, which is why the additional `impl`s for particular unsized types are -allowed.) - -The desire for the blanket `impl` precludes several other possible designs: - -* An alternative design would swap the role of `Borrow` and `Owned`, making the - trait represent owned data with an associated `Borrowed` type. That's appealing, - because it would be a generic way to go from `T` to `&T` but from `Vec` to `&[T]`. - Unfortunately, there's no way to provide a blanket `impl` for such a trait. Since - the trait would need to be implemented for virtually *every* type, this is a non-starter. - -* Sticking with the structure of the `Borrow` trait, one question is why - implement it on e.g. `str` rather than `&str`. There are two reasons. First, - in order to use the sized/unsized distinction, we need the trait to talk - directly about `str`. 
Second, the `borrow` methods need to tie the lifetime of - the borrow to the input lifetime, whereas an implementation for `&str` would - have to specify a lifetime up front. +*(Note: thanks to @epdtry for [suggesting this variation](https://github.com/rust-lang/rfcs/pull/235#issuecomment-55337168)! The original proposal + is listed [in the Alternatives](#variants-of-borrow).)* + +A primary goal of the design is allowing a *blanket* `impl` for non-sliceable +types (the first `impl` above). This blanket `impl` ensures that all new sized, +cloneable types are automatically borrowable; new `impl`s are required only for +new *unsized* types, which are rare. The `Sized` bound on the initial impl means +that we can freely add impls for unsized types (like `str` and `[T]`) without +running afoul of coherence. Because of the blanket `impl`, the `Borrow` trait can largely be ignored except when it is actually used -- which we describe next. @@ -480,30 +442,54 @@ With the `Borrow` trait in place, we can eliminate the `_equiv` method variants by asking map keys to be `Borrow`: ```rust -impl HashMap where K: Borrow + Hash + Eq { - fn find(&self, k: &K) -> &V { ... } - fn insert(&mut self, k: K::Owned, v: V) -> Option { ... } +impl HashMap where K: Hash + Eq { + fn find(&self, k: &Q) -> &V where K: Borrow, Q: Hash + Eq { ... } + fn contains_key(&self, k: &Q) -> bool where K: Borrow, Q: Hash + Eq { ... } + fn insert(&mut self, k: K, v: V) -> Option { ... } + ... } ``` -For string keys, we would use `HashMap`. Then, the `find` method would -take an `&str` key argument, while `insert` would take an owned `String`. On the -other hand, for some other type `Foo` a `HashMap` would take -`&Foo` for `find` and `Foo` for `insert`. (More discussion on the choice of -ownership is given in the [alternatives section](#ownership-management-for-keys). 
+The benefits of this approach over `_equiv` are: + +* The `Borrow` trait captures the borrowing relationship between an owned data + structure and both references to it and slices from it -- once and for all. + This means that it can be used *anywhere* we need to program generically over + "borrowed" data. In particular, the single trait works for both `HashMap` and + `TreeMap`, and should work for other kinds of data structures as well. It also + helps generalize `MaybeOwned`, for similar reasons (see below.) + + A *very important* consequence is that the map methods using `Borrow` can + potentially be put into a common `Map` trait that's implemented by `HashMap`, + `TreeMap`, and others. While we do not propose to do so now, we definitely + want to do so later on. -Aside from removing the `_equiv` variants, this approach retains a quite natural -signature for the map's methods, while retaining the flexibility that `_equiv` -methods offered. +* When using a `HashMap`, all of the basic methods like `find`, + `contains_key` and `insert` "just work", without forcing you to think about + `&String` vs `&str`. -The same approach works for `TreeMap`, and should work in general for generic -data structures that need to work with both owned and borrowed values. +* We don't need separate `_equiv` variants of methods. (However, this could + probably be addressed with + [multidispatch](https://github.com/rust-lang/rfcs/pull/195) by providing a + blanket `Equiv` implementation.) -Unlike the current `_equiv` or `find_with` methods, the above approach -guarantees coherence about hashing or ordering. For example, `HashMap` above -requires that `K` (the borrowed key type) is `Hash`, and will produce hashes -from owned keys by first borrowing from them. +On the other hand, this approach retains some of the downsides of `_equiv`: + +* The signature for methods like `find` and `contains_key` is more complex than + their current signatures. There are two counterpoints. 
First, over time the + `Borrow` trait is likely to become a well-known concept, so the signature will + not appear completely alien. Second, what is perhaps more important than the + signature is that, when using `find` on `HashMap`, various method + arguments *just work* as expected. + +* The API does not guarantee "coherence": the `Hash` and `Eq` (or `Ord`, for + `TreeMap`) implementations for the owned and borrowed keys might differ, + breaking key invariants of the data structure. This is already the case with + `_equiv`. + +The [Alternatives section](#variants-of-borrow) includes a variant of `Borrow` +that doesn't suffer from these downsides, but has some downsides of its own. ### Clone-on-write (`Cow`) pointers @@ -511,17 +497,23 @@ A side-benefit of the `Borrow` trait is that we can give a more general version of the `MaybeOwned` as a "clone-on-write" smart pointer: ```rust -pub enum Cow<'a, T> where T: ToOwned { - Shared(&'a T), - Owned(T::Owned) +/// A generalization of Clone. 
+trait FromBorrow: Borrow { + fn from_borrow(b: &B) -> Self; +} + +/// A clone-on-write smart pointer +pub enum Cow<'a, T, B> where T: FromBorrow { + Shared(&'a B), + Owned(T) } -impl<'a, T> Cow<'a, T> where T: ToOwned { - pub fn new(shared: &'a T) -> Cow<'a, T> { +impl<'a, T, B> Cow<'a, T, B> where T: FromBorrow { + pub fn new(shared: &'a B) -> Cow<'a, T, B> { Shared(shared) } - pub fn new_owned(owned: T::Owned) -> Cow<'static, T> { + pub fn new_owned(owned: T) -> Cow<'static, T, B> { Owned(owned) } @@ -532,42 +524,42 @@ impl<'a, T> Cow<'a, T> where T: ToOwned { } } - pub fn to_owned_mut(&mut self) -> &mut T::Owned { + pub fn to_owned_mut(&mut self) -> &mut T { match *self { Shared(shared) => { - *self = Owned(shared.to_owned()); + *self = Owned(FromBorrow::from_borrow(shared)); self.to_owned_mut() } Owned(ref mut owned) => owned } } - pub fn into_owned(self) -> T::Owned { + pub fn into_owned(self) -> T { match self { - Shared(shared) => shared.to_owned(), + Shared(shared) => FromBorrow::from_borrow(shared), Owned(owned) => owned } } } -impl<'a, T> Deref for Cow<'a, T> where T: ToOwned { - fn deref(&self) -> &T { +impl<'a, T, B> Deref for Cow<'a, T, B> where T: FromBorrow { + fn deref(&self) -> &B { match *self { Shared(shared) => shared, - Owned(ref owned) => T::borrow(owned) + Owned(ref owned) => owned.borrow() } } } -impl<'a, T> DerefMut for Cow<'a, T> where T: ToOwned { - fn deref_mut(&mut self) -> &mut T { - T::borrow_mut(self.to_owned_mut()) +impl<'a, T, B> DerefMut for Cow<'a, T, B> where T: FromBorrow { + fn deref_mut(&mut self) -> &mut B { + self.to_owned_mut().borrow_mut() } } ``` -The type `Cow<'a, str>` is roughly equivalent to today's `MaybeOwned<'a>` -(and `Cow<'a, [T]>` to `MaybeOwnedVector<'a, T>`). +The type `Cow<'a, String, str>` is roughly equivalent to today's `MaybeOwned<'a>` +(and `Cow<'a, Vec, [T]>` to `MaybeOwnedVector<'a, T>`). 
By implementing `Deref` and `DerefMut`, the `Cow` type acts as a smart pointer -- but in particular, the `mut` variant actually *clones* if the pointed-to @@ -979,7 +971,7 @@ Operation | Collections `fn push(&mut self, T)` | `Vec`, `BinaryHeap`, `String` `fn push_front(&mut self, T)` | `DList`, `RingBuf` `fn push_back(&mut self, T)` | `DList`, `RingBuf` -`fn insert(&mut self, uint, T)` | `Vec`, `RingBuf` +`fn insert(&mut self, uint, T)` | `Vec`, `RingBuf`, `String` `fn insert(&mut self, K::Owned) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `BitvSet` `fn insert(&mut self, K::Owned, V) -> Option` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` `fn append(&mut self, Self)` | `DList` @@ -1030,7 +1022,7 @@ Operation | Collections `fn pop(&mut self) -> Option` | `Vec`, `BinaryHeap`, `String` `fn pop_front(&mut self) -> Option` | `DList`, `RingBuf` `fn pop_back(&mut self) -> Option` | `DList`, `RingBuf` -`fn remove(&mut self, uint) -> Option` | `Vec`, `RingBuf` +`fn remove(&mut self, uint) -> Option` | `Vec`, `RingBuf`, `String` `fn remove(&mut self, &K) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `BitvSet` `fn remove(&mut self, &K) -> Option` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` `fn truncate(&mut self, len: uint)` | `Vec`, `String`, `Bitv`, `DList`, `RingBuf` @@ -1192,16 +1184,27 @@ Some of the maps (e.g. `TreeMap`) currently offer specialized iterators over their entries starting at a given key (called `lower_bound`) and above a given key (called `upper_bound`), along with `_mut` variants. While the functionality is worthwhile, the names are not very clear, so this RFC proposes the following -renaming: +replacement API (thanks to [@Gankro for the suggestion](https://github.com/rust-lang/rfcs/pull/235#issuecomment-55512788)): ```rust -// Returns an iterator starting with the first key-value pair whose key is not less than k. 
-fn iter_from(&self, k: &K) -> Entries<'a, K, V>
-fn iter_from_mut(&mut self, k: &K) -> EntriesMut<'a, K, V>
+enum Bound<T> {
+    /// An inclusive bound
+    Included(T),
+
+    /// An exclusive bound
+    Excluded(T),
+
+    Unbounded,
+}
+
+/// Creates a double-ended iterator over a sub-range of the collection's items,
+/// starting at min, and ending at max. If min is `Unbounded`, then it will
+/// be treated as "negative infinity", and if max is `Unbounded`, then it will
+/// be treated as "positive infinity". Thus range(Unbounded, Unbounded) will yield
+/// the whole collection.
+fn range(&self, min: Bound<&T>, max: Bound<&T>) -> RangedItems<'a, T>;

-// Returns an iterator starting with the first key-value pair whose key is greater than k.
-fn iter_above(&self, k: &K) -> Entries<'a, K, V>
-fn iter_above_mut(&mut self, k: &K) -> EntriesMut <'a, K, V>
+fn range_mut(&mut self, min: Bound<&T>, max: Bound<&T>) -> RangedItemsMut<'a, T>;
 ```

 These iterators should be provided for any maps over ordered keys (`TreeMap`,
@@ -1494,7 +1497,140 @@ methods will probably still be included with "experimental" status.)

 # Alternatives

-## For the `Equiv` problem
+## `Borrow` and the `Equiv` problem
+
+### Variants of `Borrow`
+
+The original version of `Borrow` was somewhat more subtle:
+
+```rust
+/// A trait for borrowing.
+/// If `T: Borrow` then `&T` represents data borrowed from `T::Owned`.
+trait Borrow for Sized? {
+    /// The type being borrowed from.
+    type Owned;
+
+    /// Immutably borrow from an owned value.
+    fn borrow(&Owned) -> &Self;
+
+    /// Mutably borrow from an owned value.
+    fn borrow_mut(&mut Owned) -> &mut Self;
+}
+
+trait ToOwned: Borrow {
+    /// Produce a new owned value, usually by cloning.
+    fn to_owned(&self) -> Owned;
+}
+
+impl Borrow for A {
+    type Owned = A;
+    fn borrow(a: &A) -> &A {
+        a
+    }
+    fn borrow_mut(a: &mut A) -> &mut A {
+        a
+    }
+}
+
+impl ToOwned for A {
+    fn to_owned(&self) -> A {
+        self.clone()
+    }
+}
+
+impl Borrow for str {
+    type Owned = String;
+    fn borrow(s: &String) -> &str {
+        s.as_slice()
+    }
+    fn borrow_mut(s: &mut String) -> &mut str {
+        s.as_mut_slice()
+    }
+}
+
+impl ToOwned for str {
+    fn to_owned(&self) -> String {
+        self.to_string()
+    }
+}
+
+impl Borrow for [T] {
+    type Owned = Vec;
+    fn borrow(s: &Vec) -> &[T] {
+        s.as_slice()
+    }
+    fn borrow_mut(s: &mut Vec) -> &mut [T] {
+        s.as_mut_slice()
+    }
+}
+
+impl ToOwned for [T] {
+    fn to_owned(&self) -> Vec {
+        self.to_vec()
+    }
+}
+
+impl HashMap where K: Borrow + Hash + Eq {
+    fn find(&self, k: &K) -> &V { ... }
+    fn insert(&mut self, k: K::Owned, v: V) -> Option { ... }
+    ...
+}
+
+pub enum Cow<'a, T> where T: ToOwned {
+    Shared(&'a T),
+    Owned(T::Owned)
+}
+```
+
+This approach ties `Borrow` directly to the borrowed data, and uses an
+associated type to *uniquely determine* the corresponding owned data type.
+
+For string keys, we would use `HashMap`. Then, the `find` method would
+take an `&str` key argument, while `insert` would take an owned `String`. On the
+other hand, for some other type `Foo` a `HashMap` would take
+`&Foo` for `find` and `Foo` for `insert`. (More discussion on the choice of
+ownership is given in the [alternatives section](#ownership-management-for-keys).)
+
+**Benefits of this alternative**:
+
+* Unlike the current `_equiv` or `find_with` methods, or the proposal in the
+RFC, this approach guarantees coherence about hashing or ordering. For example,
+`HashMap` above requires that `K` (the borrowed key type) is `Hash`, and will
+produce hashes from owned keys by first borrowing from them.
+
+* Unlike the proposal in this RFC, the signature of the methods for maps is
+  *very simple* -- essentially the same as the current `find`, `insert`, etc.
+
+* Like the proposal in this RFC, there is only a single `Borrow`
+  trait, so it would be possible to standardize on a `Map` trait later
+  on and include these APIs. The trait could be made somewhat simpler
+  with this alternative form of `Borrow`, but can be provided in
+  either case; see
+  [these](https://github.com/rust-lang/rfcs/pull/235#issuecomment-55976755)
+  [comments](https://github.com/rust-lang/rfcs/pull/235#issuecomment-56070223)
+  for details.
+
+* The `Cow` data type is simpler than in the RFC's proposal, since it does not
+  need a type parameter for the owned data.
+
+**Drawbacks of this alternative**:
+
+* It's quite subtle that you want to use `HashMap` rather than
+  `HashMap`. That is, if you try to use a map in the "obvious way"
+  you will not be able to use string slices for lookup, which is part of what
+  this RFC is trying to achieve. The same applies to `Cow`.
+
+* The design is somewhat less flexible than the one in the RFC, because (1)
+  there is a fixed choice of owned type corresponding to each borrowed type and
+  (2) you cannot use multiple borrow types for lookups at different types
+  (e.g. using `&String` sometimes and `&str` other times). On the other hand,
+  these restrictions guarantee coherence of hashing/equality/comparison.
+
+* This version of `Borrow`, mapping from borrowed to owned data, is
+  somewhat less intuitive.
+
+On balance, the approach proposed in the RFC seems better, because using the
+map APIs in the obvious ways works by default.

 ### The `HashMapKey` trait and friends
@@ -1567,6 +1703,10 @@ yields references, may be a bit surprising. On the other hand, moving is the default almost everywhere in Rust, and with the proposed approach you get to use `&` and `&mut` to easily select other forms of iteration.

+(See
+[@huon's comment](https://github.com/rust-lang/rfcs/pull/235/files#r17697796)
+for additional drawbacks.)
+
 Unfortunately, it's a bit tricky to make for use by-ref iterators instead.
The problem is that an iterator is `IntoIterator`, but it is not `Iterable` (or whatever we call the by-reference trait). Why? Because `IntoIterator` gives you From 3825c868bf39741f21f05d5d8c2c65f27637f599 Mon Sep 17 00:00:00 2001 From: Aaron Turon Date: Mon, 27 Oct 2014 16:26:30 -0700 Subject: [PATCH 7/7] Update with minor clarifications and adjustments --- .../0000-collection-conventions.md | 34 +++++++++++-------- 1 file changed, 19 insertions(+), 15 deletions(-) rename active/0000-collections-conventions.md => text/0000-collection-conventions.md (98%) diff --git a/active/0000-collections-conventions.md b/text/0000-collection-conventions.md similarity index 98% rename from active/0000-collections-conventions.md rename to text/0000-collection-conventions.md index 9cd525b040f..95adbbedba2 100644 --- a/active/0000-collections-conventions.md +++ b/text/0000-collection-conventions.md @@ -975,6 +975,7 @@ Operation | Collections `fn insert(&mut self, K::Owned) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `BitvSet` `fn insert(&mut self, K::Owned, V) -> Option` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` `fn append(&mut self, Self)` | `DList` +`fn prepend(&mut self, Self)` | `DList` There are a few changes here from the current state of affairs: @@ -1238,13 +1239,13 @@ fn difference<'a>(&'a self, other: &'a Self) -> I; fn symmetric_difference<'a>(&'a self, other: &'a Self) -> I; ``` -where the `I` type is an iterator over keys that varies by concrete set. Working -with these iterators avoids materializing intermediate sets when they're not -needed; the `collect` method can be used to create sets when they are. - -To clarify the API, this RFC proposes renaming the methods to `iter_or`, -`iter_and`, `iter_sub`, and `iter_xor` respectively. These names emphasize the -fact that the methods return iterators, which may be surprising. +where the `I` type is an iterator over keys that varies by concrete +set. 
Working with these iterators avoids materializing intermediate +sets when they're not needed; the `collect` method can be used to +create sets when they are. This RFC proposes to keep these names +intact, following the +[RFC](https://github.com/rust-lang/rfcs/pull/344) on iterator +conventions. Sets should also implement the `BitOr`, `BitAnd`, `BitXor` and `Sub` traits from `std::ops`, allowing overloaded notation `|`, `&`, `|^` and `-` to be used with @@ -1487,13 +1488,15 @@ v.push_all_move(some_vec); v.extend(some_vec); ``` -However, currently the `push_all` and `push_all_move` methods can rely on the -*exact* size of the container being pushed, in order to elide bounds checks. We -do not currently have a way to "trust" methods like `len` on iterators to elide -bounds checks. A separate RFC will introduce the notion of a "trusted" method -which should support such optimization and allow us to deprecate the `push_all` -and `push_all_move` variants. (This is unlikely to happen before 1.0, so the -methods will probably still be included with "experimental" status.) +However, currently the `push_all` and `push_all_move` methods can rely +on the *exact* size of the container being pushed, in order to elide +bounds checks. We do not currently have a way to "trust" methods like +`len` on iterators to elide bounds checks. A separate RFC will +introduce the notion of a "trusted" method which should support such +optimization and allow us to deprecate the `push_all` and +`push_all_move` variants. (This is unlikely to happen before 1.0, so +the methods will probably still be included with "experimental" +status, and likely with different names.) # Alternatives @@ -1767,4 +1770,5 @@ Using the `Borrow` trait, it might be possible to safely add a coercion for auto For sized types, this coercion is *forced* to be trivial, so the only time it would involve running user code is for unsized values. -A general story about such coercions will be left to a follow-up RFC. 
+A general story about such coercions will be left to a
+[follow-up RFC](https://github.com/rust-lang/rfcs/pull/241).
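
As a supplementary sketch for readers of this patch series: the `Borrow`-bounded lookup signature, the clone-on-write pointer, and the `Bound`-based `range` method discussed in the patches can be tried against today's standard library. The sketch assumes the present-day names `HashMap::get`, `std::borrow::Cow` (with variants `Borrowed` and `Owned`) and `BTreeMap::range`, which differ from the RFC-era names `find`, `Shared` and `TreeMap`. The helpers `lookup`, `sample_map`, `underscore_spaces` and `keys_in_range` are hypothetical and exist only for illustration; this is not the RFC's own implementation.

```rust
use std::borrow::{Borrow, Cow};
use std::collections::{BTreeMap, HashMap};
use std::hash::Hash;
use std::ops::Bound;

/// Hypothetical helper: looks up `key` through any borrowed form `Q` of the
/// owned key type `K`. It only forwards to the standard `HashMap::get`.
fn lookup<'m, K, Q, V>(map: &'m HashMap<K, V>, key: &Q) -> Option<&'m V>
where
    K: Borrow<Q> + Hash + Eq,
    Q: Hash + Eq + ?Sized,
{
    map.get(key)
}

/// Hypothetical helper: a small map with owned `String` keys.
fn sample_map() -> HashMap<String, u32> {
    let mut map = HashMap::new();
    map.insert("one".to_string(), 1);
    map.insert("two".to_string(), 2);
    map
}

/// Hypothetical helper showing clone-on-write: the input is borrowed when it
/// contains no spaces, and a new `String` is allocated only otherwise.
fn underscore_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

/// Hypothetical helper: collects the keys of an ordered map that fall between
/// two bounds, using the standard `BTreeMap::range`.
fn keys_in_range(map: &BTreeMap<u32, &str>, min: Bound<&u32>, max: Bound<&u32>) -> Vec<u32> {
    map.range::<u32, _>((min, max)).map(|(k, _)| *k).collect()
}
```

With these helpers, `lookup(&sample_map(), "one")` takes a plain `&str` even though the keys are `String`, and passing a `&String` works through the same signature. That is the "just works" property the `Borrow`-based design aims for.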

(&self, pred: P) where P: Predicate { ... }
+    ...
+}
+```
+
+Since these two patterns are particularly common throughout `std`, this RFC
+proposes adding both of the above traits, and using them to cut down on the
+number of method variants.
+
+In particular, some methods on string slices currently work with `CharEq`, which
+is similar to `Predicate`:
+
+```rust
+pub trait CharEq {
+    fn matches(&mut self, char) -> bool;
+    fn only_ascii(&self) -> bool;
+}
+```
+
+The difference is the `only_ascii` method, which is used to optimize certain
+operations when the predicate only holds for characters in the ASCII range.
+
+To keep these optimizations intact while connecting to `Predicate`, this RFC
+proposes the following restructuring of `CharEq`:
+
+```rust
+pub trait CharPredicate: Predicate {
+    fn only_ascii(&self) -> bool {
+        false
+    }
+}
+```
+
+### Why not leverage unboxed closures?
+
+A natural question is: why not use the traits for unboxed closures to achieve a
+similar effect? For example, you could imagine writing a blanket `impl` for
+`Fn(&T) -> bool` for any `T: PartialEq`, which would allow `PartialEq` values to
+be used anywhere a predicate-like closure was requested.
+
+The problem is that these blanket `impl`s will often conflict. In particular,
+*any* type `T` could implement `Fn() -> T`, and that single blanket `impl` would
+preclude any others (at least, assuming that unboxed closure traits treat the
+argument and return types as associated (output) types).
+
+In addition, the explicit use of traits like `Predicate` makes the intended
+semantics more clear, and the overloading less surprising.
+
+## The APIs
+
+Now we'll delve into the detailed APIs for the various concrete
+collections. These APIs will often be given in tabular form, grouping together
+common APIs across multiple collections.
When writing these function signatures:
+
+* We will assume a type parameter `T` for `Vec`, `BinaryHeap`, `DList` and `RingBuf`;
+we will also use this parameter for APIs on `String`, where it should be
+understood as `char`.
+
+* We will assume type parameters `K: Borrow` and `V` for `HashMap` and
+`TreeMap`; for `TrieMap` and `SmallIntMap` the `K` is assumed to be `uint`.
+
+* We will assume a type parameter `K: Borrow` for `HashSet` and `TreeSet`; for
+  `BitvSet` it is assumed to be `uint`.
+
+We will begin by outlining the most widespread APIs in tables, making it easy to
+compare names and signatures across different kinds of collections. Then we will
+focus on some APIs specific to particular classes of collections -- e.g. sets
+and maps. Finally, we will briefly discuss APIs that are specific to a single
+concrete collection.
+
+### Construction
+
+All of the collections should support a static function:
+
+```rust
+fn new() -> Self
+```
+
+that creates an empty version of the collection; the constructor may take
+arguments needed to set up the collection, e.g. the capacity for `LruCache`.
+
+Several collections also support separate constructors for providing capacities in
+advance; these are discussed [below](#capacity-management).
+
+#### The `FromIterator` trait
+
+All of the collections should implement the `FromIterator` trait:
+
+```rust
+pub trait FromIterator {
+    type A;
+    fn from_iter(T) -> Self where T: IntoIterator;
+}
+```
+
+Note that this varies from today's `FromIterator` by consuming an `IntoIterator`
+rather than `Iterator`. As explained [above](#intoiterator-and-iterable), this
+choice is strictly more general and will not break any existing code.
+
+This constructor initializes the collection with the contents of the
+iterator. For maps, the iterator is over key/value pairs, and the semantics is
+equivalent to inserting those pairs in order; if keys are repeated, the last
+value is the one left in the map.
+
+### Insertion
+
+The table below gives methods for inserting items into various concrete collections:
+
+Operation | Collections
+--------- | -----------
+`fn push(&mut self, T)` | `Vec`, `BinaryHeap`, `String`
+`fn push_front(&mut self, T)` | `DList`, `RingBuf`
+`fn push_back(&mut self, T)` | `DList`, `RingBuf`
+`fn insert(&mut self, uint, T)` | `Vec`, `RingBuf`
+`fn insert(&mut self, K::Owned) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `BitvSet`
+`fn insert(&mut self, K::Owned, V) -> Option` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap`
+`fn append(&mut self, Self)` | `DList`
+
+There are a few changes here from the current state of affairs:
+
+* The `DList` and `RingBuf` data structures no longer provide `push`, but rather
+  `push_front` and `push_back`. This change is based on (1) viewing them as
+  deques and (2) not giving priority to the "front" or the "back".
+
+* The `insert` method on maps returns the value previously associated with the
+  key, if any. Previously, this functionality was provided by a `swap` method,
+  which has been dropped (consolidating needless method variants.)
+
+Aside from these changes, a number of insertion methods will be deprecated
+(e.g. the `append` and `append_one` methods on `Vec`). These are discussed
+further in the section on "specialized operations"
+[below](#specialized-operations).
+
+#### The `Extend` trait (was: `Extendable`)
+
+In addition to the standard insertion operations above, *all* collections will
+implement the `Extend` trait. This trait was previously called `Extendable`, but
+in general we
+[prefer to avoid](http://aturon.github.io/style/naming/README.html) `-able`
+suffixes and instead name the trait using a verb (or, especially, the key method
+offered by the trait.)
+
+The `Extend` trait allows data from an arbitrary iterator to be inserted into a
+collection, and will be defined as follows:
+
+```rust
+pub trait Extend: FromIterator {
+    fn extend(&mut self, T) where T: IntoIterator;
+}
+```
+
+As with `FromIterator`, this trait has been modified to take an `IntoIterator`
+value.
+
+### Deletion
+
+The table below gives methods for removing items from various concrete collections:
+
+Operation | Collections
+--------- | -----------
+`fn clear(&mut self)` | *all*
+`fn pop(&mut self) -> Option` | `Vec`, `BinaryHeap`, `String`
+`fn pop_front(&mut self) -> Option` | `DList`, `RingBuf`
+`fn pop_back(&mut self) -> Option` | `DList`, `RingBuf`
+`fn remove(&mut self, uint) -> Option` | `Vec`, `RingBuf`
+`fn remove(&mut self, &K) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `BitvSet`
+`fn remove(&mut self, &K) -> Option` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap`
+`fn truncate(&mut self, len: uint)` | `Vec`, `String`, `Bitv`, `DList`, `RingBuf`
+`fn retain