Skip to content

Commit 3a61245

Browse files
committed
clang/AMDGPU: Assume denormals are enabled for the default target.
Since the default logic was based on having fast denormal/fma features, and the default target has no features, we assumed flushing by default. This fixes incorrectly assuming flushing in builds for "generic" IR libraries. The handling for no specified --cuda-gpu-arch in HIP is kind of broken. Somewhere else forces a default target of gfx803, which does not enable denormal handling by default. We don't see this default switching here, so you'll end up with a different denormal mode depending on whether you explicitly requested gfx803, or used it by default.
1 parent 9289f34 commit 3a61245

File tree

3 files changed

+17
-6
lines changed

3 files changed

+17
-6
lines changed

clang/lib/Driver/ToolChains/AMDGPU.cpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -262,6 +262,11 @@ AMDGPUToolChain::TranslateArgs(const DerivedArgList &Args, StringRef BoundArch,
262262

263263
bool AMDGPUToolChain::getDefaultDenormsAreZeroForTarget(
264264
llvm::AMDGPU::GPUKind Kind) {
265+
266+
// Assume nothing without a specific target.
267+
if (Kind == llvm::AMDGPU::GK_NONE)
268+
return false;
269+
265270
const unsigned ArchAttr = llvm::AMDGPU::getArchAttrAMDGCN(Kind);
266271

267272
// Default to enabling f32 denormals by default on subtargets where fma is
Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,24 @@
11
// Slow FMAF and slow f32 denormals
2-
// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
2+
// RUN: %clang -### -target amdgcn--amdhsa -nogpulib -c -mcpu=pitcairn %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
33
// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
44

55
// Fast FMAF, but slow f32 denormals
6-
// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
6+
// RUN: %clang -### -target amdgcn--amdhsa -nogpulib -c -mcpu=tahiti %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
77
// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
88

99
// Fast F32 denormals, but slow FMAF
10-
// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
10+
// RUN: %clang -### -target amdgcn--amdhsa -nogpulib -c -mcpu=fiji %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
1111
// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
1212

1313
// Fast F32 denormals and fast FMAF
14-
// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-DENORM %s
15-
// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
14+
// RUN: %clang -### -target amdgcn--amdhsa -nogpulib -c -mcpu=gfx900 %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-DENORM %s
15+
// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -nogpulib -c -mcpu=gfx900 %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
16+
17+
// Default target is artificial, but should assume a conservative default.
18+
// RUN: %clang -### -target amdgcn--amdhsa -nogpulib -c %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-DENORM %s
19+
// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -nogpulib -c %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
1620

1721
// AMDGCN-FLUSH: "-fdenormal-fp-math-f32=preserve-sign,preserve-sign"
1822

1923
// This should be omitted and default to ieee
20-
// AMDGCN-DENORM-NOT: "-fdenormal-fp-math-f32"
24+
// AMDGCN-DENORM-NOT: denormal-fp-math

clang/test/Driver/cuda-flush-denormals-to-zero.cu

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,8 @@
2222
// RUN: %clang -x hip -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=gfx803 -nocudainc -nogpulib %s 2>&1 | FileCheck -check-prefix=FTZ %s
2323
// RUN: %clang -x hip -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=gfx900 -nocudainc -nogpulib %s 2>&1 | FileCheck -check-prefix=NOFTZ %s
2424

25+
// Test no subtarget, which should get the denormal setting of the default gfx803
26+
// RUN: %clang -x hip -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell -nocudainc -nogpulib %s 2>&1 | FileCheck -check-prefix=FTZ %s
2527

2628
// Test multiple offload archs with different defaults.
2729
// RUN: %clang -x hip -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 -nocudainc -nogpulib %s 2>&1 | FileCheck -check-prefix=MIXED-DEFAULT-MODE %s

0 commit comments

Comments
 (0)