Skip to content

Make Utf16CodeUnits generic and give it a public constructor. #19042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 21, 2014

Conversation

SimonSapin
Copy link
Contributor

This allows encoding to UTF-16 something that is not in UTF-8, e.g. a [char] UTF-32 string.

This might help with servo/servo#4023

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @huonw (or someone else) soon.

@alexcrichton
Copy link
Member

This seems like something that would be more appropriate as an iterator adaptor rather than a constructor. It seems somewhat akin to map where you're consuming an Iterator<T> and returning an Iterator<U>, it's just less generic in this case.

I would also be a little worried that we're hard-wiring the return value of utf16_units to always operating over an Iterator<char> which may prevent us from discovering a more efficient algorithm in the future possibly.

@SimonSapin
Copy link
Contributor Author

This seems like something that would be more appropriate as an iterator adaptor rather than a constructor.

This seems niche enough that it shouldn’t be in the prelude, so an iterator adaptor would have to be a trait that you have bring into scope. How is this better than a function or a constructor?

It seems somewhat akin to map where you're consuming an Iterator and returning an Iterator, it's just less generic in this case.

It’s a special case of flat_map, but the specialization enables a much more useful implementation of size_hint. (flat_map has to assume an infinite upper bound until the outer iterator is exhausted.)

I would also be a little worried that we're hard-wiring the return value of utf16_units to always operating over an Iterator which may prevent us from discovering a more efficient algorithm in the future possibly.

Would you prefer a newtype that hides the details of Utf16CodeUnits<Chars> for the return value of str::utf16_unit? In any case, I believe that any change to the algorithm would be in the domain of micro-optimization (which is kinda the job of the compiler IMO) rather than anything radically different. There is only so much you can do to shuffle bits around.

@SimonSapin
Copy link
Contributor Author

I believe this additional commit addresses your latter comment on the return value of utf16_units.

I also tried making the iterator adaptor a method (through a trait) instead of a ::new() constructor. Method chaining is indeed nicer than nesting parentheses. But, trying to name the method encode_utf16 with a blanket for I where I: Iterator<char> implementation failed to build, because of the name collision with Char::encode_utf16.

@SimonSapin SimonSapin force-pushed the generic-utf16-encoder branch from 6e22fb3 to 75332a4 Compare November 18, 2014 00:47
@alexcrichton
Copy link
Member

Thanks @SimonSapin! This solves what I was concerned about with respect to the return type. I'm always somewhat wary about increasing API surface area because we can. Dealing with encodings is pretty complicated and we don't want to expose any sort of full-fledged interface in the standard library as one probably won't exist here just yet.

cc @aturon

@aturon
Copy link
Member

aturon commented Nov 20, 2014

I'm personally in favor of this change. (It is perhaps a bit confusing that this lives with str rather than in libunicode, but that's OK.) Basically, this is exposing encoding functionality we already have in a cleanly more general way. 👍

This allows encoding to UTF-16 something that is not in UTF-8, e.g. a
`[char]` UTF-32 string.

This might help with servo/servo#4023
@SimonSapin SimonSapin force-pushed the generic-utf16-encoder branch from 75332a4 to dff48a9 Compare November 20, 2014 14:06
@SimonSapin
Copy link
Contributor Author

Rebased.

Indeed, this only makes more generic existing functionality. I don’t mind moving it to libunicode, but libcore is where the existing str::utf16_units method was.

bors added a commit that referenced this pull request Nov 21, 2014
…richton

This allows encoding to UTF-16 something that is not in UTF-8, e.g. a `[char]` UTF-32 string.

This might help with servo/servo#4023
@bors bors closed this Nov 21, 2014
@bors bors merged commit dff48a9 into rust-lang:master Nov 21, 2014
@SimonSapin SimonSapin deleted the generic-utf16-encoder branch December 5, 2014 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants