Commit 5c7bbe3

lsaba authored and Jinsong Ji committed
[MachinePipeliner] Refine the RecMII calculation
When there is more than one SDep between two successor SUnits in the NodeSet, the current implementation sums the latencies of all the dependencies, which can produce a larger RecMII than necessary.

For example, suppose there is both a data dependency and an output dependency (with latency > 0) between successor nodes:

SU(1) inst1:
  successors:
    SU(2): out  latency = 1
    SU(2): data latency = 1
SU(2) inst2:
  successors:
    SU(3): out  latency = 1
    SU(3): data latency = 1
SU(3) inst3:
  successors:
    SU(1): out  latency = 1
    SU(1): data latency = 1

The NodeSet latency returned would be 6, whereas it could be 3 if we take the max for each successor SUnit.
In general this can be extended to finding the shortest path in the recurrence.
Thoughts?

Unfortunately I had a hard time creating a test for this in Hexagon/PowerPC, so help would be appreciated.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D75918
1 parent cc4d7dc commit 5c7bbe3

1 file changed: +16 -4 lines changed
llvm/include/llvm/CodeGen/MachinePipeliner.h

Lines changed: 16 additions & 4 deletions
@@ -330,10 +330,22 @@ class NodeSet {
   NodeSet() = default;
   NodeSet(iterator S, iterator E) : Nodes(S, E), HasRecurrence(true) {
     Latency = 0;
-    for (unsigned i = 0, e = Nodes.size(); i < e; ++i)
-      for (const SDep &Succ : Nodes[i]->Succs)
-        if (Nodes.count(Succ.getSUnit()))
-          Latency += Succ.getLatency();
+    for (unsigned i = 0, e = Nodes.size(); i < e; ++i) {
+      DenseMap<SUnit *, unsigned> SuccSUnitLatency;
+      for (const SDep &Succ : Nodes[i]->Succs) {
+        auto SuccSUnit = Succ.getSUnit();
+        if (!Nodes.count(SuccSUnit))
+          continue;
+        unsigned CurLatency = Succ.getLatency();
+        unsigned MaxLatency = 0;
+        if (SuccSUnitLatency.count(SuccSUnit))
+          MaxLatency = SuccSUnitLatency[SuccSUnit];
+        if (CurLatency > MaxLatency)
+          SuccSUnitLatency[SuccSUnit] = CurLatency;
+      }
+      for (auto SUnitLatency : SuccSUnitLatency)
+        Latency += SUnitLatency.second;
+    }
   }
 
   bool insert(SUnit *SU) { return Nodes.insert(SU); }
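
The 6-versus-3 figure in the commit message can be checked with a small standalone sketch. It does not use any LLVM types: `Edge`, `sumAllLatencies`, `sumMaxLatencyPerSuccessor` and `exampleEdges` are hypothetical names introduced only for illustration, `std::map` stands in for `DenseMap`, and every edge is assumed to stay inside the node set, so the `Nodes.count` filter is omitted.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SDep: source node, successor node, latency.
struct Edge {
  int From;
  int To;
  unsigned Latency;
};

// Behaviour before the patch: add up the latency of every edge.
unsigned sumAllLatencies(const std::vector<Edge> &Edges) {
  unsigned Total = 0;
  for (const Edge &E : Edges)
    Total += E.Latency;
  return Total;
}

// Behaviour after the patch: for each (node, successor) pair keep only the
// largest latency, then add up those maxima.
unsigned sumMaxLatencyPerSuccessor(const std::vector<Edge> &Edges) {
  std::map<std::pair<int, int>, unsigned> MaxLatency;
  for (const Edge &E : Edges) {
    // operator[] value-initialises a missing entry to 0.
    unsigned &Slot = MaxLatency[{E.From, E.To}];
    Slot = std::max(Slot, E.Latency);
  }
  unsigned Total = 0;
  for (const auto &Entry : MaxLatency)
    Total += Entry.second;
  return Total;
}

// The three-node recurrence from the commit message: each node has an output
// and a data dependency, both with latency 1, on the next node in the cycle.
std::vector<Edge> exampleEdges() {
  return {{1, 2, 1}, {1, 2, 1}, {2, 3, 1}, {2, 3, 1}, {3, 1, 1}, {3, 1, 1}};
}
```

On `exampleEdges()`, the first function counts all six edges, while the second collapses each pair of parallel edges to a single maximum before summing.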
