
Commit 20dd329

[LV] Allow scalable vectorization with vscale = 1
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling. (I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers, e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable vectors to fixed-length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions, which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542
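The rejected alternative can be sketched as follows. This is a hypothetical illustration, not code from the patch: the function name and the `MinVScale` parameter are assumptions (a real implementation would need the target to supply the bound), and `NumParts` stands in for what `TTI.getNumberOfParts` would return. With `MinVScale = 2`, as for ELEN types under the default RISC-V +v extension, a single-part `<vscale x 1 x iN>` passes; with `MinVScale = 1`, as for the embedded vector extensions, it is still rejected, which is why this approach was not taken.

```cpp
#include <cassert>

// Hypothetical sketch of the alternative described in the commit message:
// scale the known-minimum lane count by a target-provided lower bound on
// vscale, and keep the strict "<" comparison used for fixed-width vectors.
bool typeNotScalarizedWithMinVScale(unsigned NumParts,
                                    unsigned KnownMinLanes,
                                    unsigned MinVScale) {
  // NumParts == 0 would mean an invalid cost; MinVScale is at least 1.
  assert(NumParts > 0 && MinVScale > 0);
  // Lanes guaranteed at runtime = KnownMinLanes * MinVScale.
  return NumParts < KnownMinLanes * MinVScale;
}
```

For instance, `typeNotScalarizedWithMinVScale(1, 1, 2)` is true while `typeNotScalarizedWithMinVScale(1, 1, 1)` is false, mirroring the +v versus embedded-extension cases above.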
1 parent d2dad62 commit 20dd329


2 files changed (+256, -124 lines)


llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 11 additions & 4 deletions
```diff
@@ -6712,10 +6712,17 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
 
   bool TypeNotScalarized = false;
   if (VF.isVector() && VectorTy->isVectorTy()) {
-    unsigned NumParts = TTI.getNumberOfParts(VectorTy);
-    if (NumParts)
-      TypeNotScalarized = NumParts < VF.getKnownMinValue();
-    else
+    if (unsigned NumParts = TTI.getNumberOfParts(VectorTy)) {
+      if (VF.isScalable())
+        // <vscale x 1 x iN> is assumed to be profitable over iN because
+        // scalable registers are a distinct register class from scalar ones.
+        // If we ever find a target which wants to lower scalable vectors
+        // back to scalars, we'll need to update this code to explicitly
+        // ask TTI about the register class uses for each part.
+        TypeNotScalarized = NumParts <= VF.getKnownMinValue();
+      else
+        TypeNotScalarized = NumParts < VF.getKnownMinValue();
+    } else
       C = InstructionCost::getInvalid();
   }
   return VectorizationCostTy(C, TypeNotScalarized);
```
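To make the behavioral change concrete, here is a minimal standalone sketch of the `TypeNotScalarized` decision before and after the patch. `VectorFactor`, `typeNotScalarizedOld`, and `typeNotScalarizedNew` are hypothetical names, not LLVM APIs; `NumParts` stands in for the nonzero value `TTI.getNumberOfParts(VectorTy)` would return (the zero case, which makes the cost invalid, is excluded).

```cpp
#include <cassert>

// Hypothetical stand-in for the two properties of the vectorization
// factor that the patch consults; this is not llvm::ElementCount.
struct VectorFactor {
  unsigned KnownMinValue; // lane count (times vscale if Scalable)
  bool Scalable;
};

// Pre-patch rule: the type counts as "not scalarized" only if it is
// split into fewer register parts than it has known-minimum lanes.
bool typeNotScalarizedOld(unsigned NumParts, VectorFactor VF) {
  assert(NumParts > 0 && "NumParts == 0 means the cost is invalid");
  return NumParts < VF.KnownMinValue;
}

// Post-patch rule: a scalable type may use one part per known-minimum
// lane, on the assumption that each part lives in a scalable register
// class distinct from the scalar registers.
bool typeNotScalarizedNew(unsigned NumParts, VectorFactor VF) {
  assert(NumParts > 0 && "NumParts == 0 means the cost is invalid");
  if (VF.Scalable)
    return NumParts <= VF.KnownMinValue;
  return NumParts < VF.KnownMinValue;
}
```

For example, with one part and `VectorFactor{1, true}` (modeling `<vscale x 1 x i64>`, assuming it occupies a single register), the old rule returns false and the new one returns true, while the fixed-width `VectorFactor{1, false}` case (`<1 x i64>`) is still treated as scalarized under both rules.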
