Skip to content

Commit 5c166f1

Browse files
committed
[x86] make SLM extract vector element more expensive than default
I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
1 parent 5c5e860 commit 5c166f1

File tree

12 files changed

+2519
-1104
lines changed

12 files changed

+2519
-1104
lines changed

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2377,6 +2377,13 @@ int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
23772377
}
23782378

23792379
int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
2380+
static const CostTblEntry SLMCostTbl[] = {
2381+
{ ISD::EXTRACT_VECTOR_ELT, MVT::i8, 4 },
2382+
{ ISD::EXTRACT_VECTOR_ELT, MVT::i16, 4 },
2383+
{ ISD::EXTRACT_VECTOR_ELT, MVT::i32, 4 },
2384+
{ ISD::EXTRACT_VECTOR_ELT, MVT::i64, 7 }
2385+
};
2386+
23802387
assert(Val->isVectorTy() && "This must be a vector type");
23812388

23822389
Type *ScalarType = Val->getScalarType();
@@ -2396,6 +2403,13 @@ int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
23962403
// Floating point scalars are already located in index #0.
23972404
if (ScalarType->isFloatingPointTy() && Index == 0)
23982405
return 0;
2406+
2407+
int ISD = TLI->InstructionOpcodeToISD(Opcode);
2408+
assert(ISD && "Unexpected vector opcode");
2409+
MVT MScalarTy = LT.second.getScalarType();
2410+
if (ST->isSLM())
2411+
if (auto *Entry = CostTableLookup(SLMCostTbl, ISD, MScalarTy))
2412+
return LT.first * Entry->Cost;
23992413
}
24002414

24012415
// Add to the base cost if we know that the extracted element of a vector is

llvm/test/Analysis/CostModel/X86/fptosi.ll

Lines changed: 58 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F
77
; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f,+avx512dq | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512DQ
88
;
9-
; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
9+
; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=SLM
1010
; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=goldmont | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
1111
; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=btver2 | FileCheck %s --check-prefixes=BTVER2
1212

@@ -39,6 +39,13 @@ define i32 @fptosi_double_i64(i32 %arg) {
3939
; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptosi <8 x double> undef to <8 x i64>
4040
; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
4141
;
42+
; SLM-LABEL: 'fptosi_double_i64'
43+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64
44+
; SLM-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>
45+
; SLM-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>
46+
; SLM-NEXT: Cost Model: Found an estimated cost of 75 for instruction: %V8I64 = fptosi <8 x double> undef to <8 x i64>
47+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
48+
;
4249
; BTVER2-LABEL: 'fptosi_double_i64'
4350
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64
4451
; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>
@@ -75,6 +82,13 @@ define i32 @fptosi_double_i32(i32 %arg) {
7582
; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x double> undef to <8 x i32>
7683
; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
7784
;
85+
; SLM-LABEL: 'fptosi_double_i32'
86+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi double undef to i32
87+
; SLM-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptosi <2 x double> undef to <2 x i32>
88+
; SLM-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4I32 = fptosi <4 x double> undef to <4 x i32>
89+
; SLM-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V8I32 = fptosi <8 x double> undef to <8 x i32>
90+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
91+
;
7892
; BTVER2-LABEL: 'fptosi_double_i32'
7993
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi double undef to i32
8094
; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptosi <2 x double> undef to <2 x i32>
@@ -111,6 +125,13 @@ define i32 @fptosi_double_i16(i32 %arg) {
111125
; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
112126
; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
113127
;
128+
; SLM-LABEL: 'fptosi_double_i16'
129+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
130+
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
131+
; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
132+
; SLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
133+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
134+
;
114135
; BTVER2-LABEL: 'fptosi_double_i16'
115136
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
116137
; BTVER2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
@@ -147,6 +168,13 @@ define i32 @fptosi_double_i8(i32 %arg) {
147168
; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
148169
; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
149170
;
171+
; SLM-LABEL: 'fptosi_double_i8'
172+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
173+
; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
174+
; SLM-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
175+
; SLM-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
176+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
177+
;
150178
; BTVER2-LABEL: 'fptosi_double_i8'
151179
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
152180
; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
@@ -194,6 +222,14 @@ define i32 @fptosi_float_i64(i32 %arg) {
194222
; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I64 = fptosi <16 x float> undef to <16 x i64>
195223
; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
196224
;
225+
; SLM-LABEL: 'fptosi_float_i64'
226+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi float undef to i64
227+
; SLM-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V2I64 = fptosi <2 x float> undef to <2 x i64>
228+
; SLM-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %V4I64 = fptosi <4 x float> undef to <4 x i64>
229+
; SLM-NEXT: Cost Model: Found an estimated cost of 75 for instruction: %V8I64 = fptosi <8 x float> undef to <8 x i64>
230+
; SLM-NEXT: Cost Model: Found an estimated cost of 151 for instruction: %V16I64 = fptosi <16 x float> undef to <16 x i64>
231+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
232+
;
197233
; BTVER2-LABEL: 'fptosi_float_i64'
198234
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi float undef to i64
199235
; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x float> undef to <2 x i64>
@@ -218,6 +254,13 @@ define i32 @fptosi_float_i32(i32 %arg) {
218254
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptosi <16 x float> undef to <16 x i32>
219255
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
220256
;
257+
; SLM-LABEL: 'fptosi_float_i32'
258+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi float undef to i32
259+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x float> undef to <4 x i32>
260+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x float> undef to <8 x i32>
261+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptosi <16 x float> undef to <16 x i32>
262+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
263+
;
221264
; BTVER2-LABEL: 'fptosi_float_i32'
222265
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi float undef to i32
223266
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x float> undef to <4 x i32>
@@ -254,6 +297,13 @@ define i32 @fptosi_float_i16(i32 %arg) {
254297
; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I16 = fptosi <16 x float> undef to <16 x i16>
255298
; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
256299
;
300+
; SLM-LABEL: 'fptosi_float_i16'
301+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi float undef to i16
302+
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = fptosi <4 x float> undef to <4 x i16>
303+
; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V8I16 = fptosi <8 x float> undef to <8 x i16>
304+
; SLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V16I16 = fptosi <16 x float> undef to <16 x i16>
305+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
306+
;
257307
; BTVER2-LABEL: 'fptosi_float_i16'
258308
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi float undef to i16
259309
; BTVER2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = fptosi <4 x float> undef to <4 x i16>
@@ -290,6 +340,13 @@ define i32 @fptosi_float_i8(i32 %arg) {
290340
; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
291341
; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
292342
;
343+
; SLM-LABEL: 'fptosi_float_i8'
344+
; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8
345+
; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
346+
; SLM-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
347+
; SLM-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
348+
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
349+
;
293350
; BTVER2-LABEL: 'fptosi_float_i8'
294351
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8
295352
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>

0 commit comments

Comments
 (0)