Support larger sizes in rmeta tables and incr comp alloc offsets #666
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed. cc @rust-lang/compiler @rust-lang/compiler-contributors
@rustbot second
Adapt table sizes to the contents

This is an implementation of rust-lang/compiler-team#666. The objective of this PR is to permit the rmeta format to accommodate larger crates that need offsets larger than a `u32` can store, without compromising performance for crates that do not need such range. The second commit is a number of tiny optimization opportunities I noticed while looking at perf recordings of the first commit.

The rmeta tables need to have fixed-size elements to permit lazy random access. But the size only needs to be fixed _per table_, not per element type. This PR adds another `usize` to the table header which indicates the table element size. As each element of a table is set, we keep track of the widest encoded table value, then don't bother encoding all the unused trailing bytes on each value. When decoding table elements, we copy them to a full-width array if they are not already full-width.

`LazyArray` needs some special treatment. Most other values that are encoded in tables are indexes or offsets, and those tend to be small, so we get to drop a lot of zero bytes off the end. But `LazyArray` encodes _two_ small values in a fixed-width table element: a position and a length. The treatment described above can trim zero bytes off the length, but any nonzero length shields the position bytes from the optimization. To improve this, we interleave the bytes of the position and the length. This change is responsible for about half of the crate metadata win on many crates.

Fixes rust-lang/rust#112934 (probably)
Fixes rust-lang/rust#103607
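To make the two ideas above concrete, here is a minimal, hypothetical sketch (all names invented; this is not the actual rustc encoder): a table whose per-element width is derived from the widest value actually stored, decoding that widens elements back to full width, and a `LazyArray`-style (position, length) pair whose bytes are interleaved so trailing zero bytes can still be trimmed.

```rust
/// Encode `values` as a table whose element width is the number of bytes the
/// widest value needs; the width is recorded in a one-byte header.
fn encode_table(values: &[u64], out: &mut Vec<u8>) {
    let max = values.iter().copied().max().unwrap_or(0);
    // Bytes needed for the widest value (at least 1).
    let width = (((64 - max.leading_zeros()) as usize + 7) / 8).max(1);
    out.push(width as u8);
    for v in values {
        // Little-endian, so the dropped trailing bytes are the high zero bytes.
        out.extend_from_slice(&v.to_le_bytes()[..width]);
    }
}

/// Decode element `i` by copying it into a full-width buffer first.
fn decode_element(table: &[u8], i: usize) -> u64 {
    let width = table[0] as usize;
    let start = 1 + i * width;
    let mut full = [0u8; 8];
    full[..width].copy_from_slice(&table[start..start + width]);
    u64::from_le_bytes(full)
}

/// Interleave the bytes of a (position, length) pair so that a small nonzero
/// length does not shield the position's zero bytes from the trimming above.
fn interleave(position: u32, length: u32) -> [u8; 8] {
    let (p, l) = (position.to_le_bytes(), length.to_le_bytes());
    let mut out = [0u8; 8];
    for i in 0..4 {
        out[2 * i] = p[i];
        out[2 * i + 1] = l[i];
    }
    out
}

fn main() {
    let mut buf = Vec::new();
    encode_table(&[3, 17, 300], &mut buf);
    // The widest value (300) needs 2 bytes, so every element costs 2 bytes.
    assert_eq!(buf.len(), 1 + 3 * 2);
    assert_eq!(decode_element(&buf, 2), 300);

    // With position = 300 and length = 2, every nonzero byte ends up at the
    // front of the interleaved pair, so its trailing zeros can be trimmed too.
    assert_eq!(interleave(300, 2), [44, 2, 1, 0, 0, 0, 0, 0]);
}
```

Without interleaving, the same pair laid out as position-then-length would keep a nonzero length byte at the end of the element, preventing any trimming of the position's zero bytes.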
Closing as accepted (we should maybe automate "close after 10 days"...)
Proposal
Rustc stores a lot of lengths in `u32`, which reduces the size of data structures and on-disk formats that contain these lengths. But this strategy limits the space of programs that can be compiled. In general, this is a gamble that our user base will benefit more from the performance improvement of picking a small fixed-size type than the implementation limitation this imposes will cause trouble.

We have a few reports of running into the `u32` limit for rmeta table offsets and incr comp allocation offsets:

- `Result::unwrap()` on an `Err` value: TryFromIntError(()) rust#112934
- `[21u8; u32::MAX as usize]` rust#111613

This is a proposal to find a way to lift the implementation limitation imposed by using `u32` in these two places, and thus be able to compile the programs reported in these issues.

Current implementations of these are:
While the goal of this MCP is to support compiling more programs, it is important that the additional space of supported programs not impose a measurable cost on the average user according to our benchmark suite. In that sense, the perf implications of solutions to this MCP should be evaluated on their own, as opposed to as if they are the cost of adding a feature.
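As a purely illustrative sketch of the failure mode behind the first report above (the function name is hypothetical, not rustc's actual code), the panic comes from narrowing a `usize` position to `u32` with `try_from(...).unwrap()`, which only succeeds while the position fits in 32 bits:

```rust
/// Hypothetical stand-in for the places where a byte position is narrowed to
/// `u32` before being written into an rmeta table or incr comp structure.
fn store_offset_u32(position: usize) -> u32 {
    // Succeeds only while `position <= u32::MAX`; beyond that it panics with
    // "called `Result::unwrap()` on an `Err` value: TryFromIntError(())".
    u32::try_from(position).unwrap()
}

fn main() {
    assert_eq!(store_offset_u32(1024), 1024);
    // On a 64-bit target this call would reproduce the reported panic:
    // store_offset_u32(u32::MAX as usize + 1);
}
```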
Mentors or Reviewers
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing `@rustbot second`.
  - Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a `-C flag`, then full team check-off is required.
  - Compiler team members can initiate a check-off via `@rfcbot fcp merge` on either the MCP or the PR.
- Once an MCP is seconded, the Final Comment Period begins. If no concerns are raised, the MCP is considered approved.

You can read more about Major Change Proposals on forge.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.