Commit 083c2de

fix: fix quant linear autotune
1 parent 773aabd commit 083c2de

1 file changed, +1 -1 lines changed

server/text_generation_server/utils/gptq/custom_autotune.py

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ def kernel_call():
             # In testings using only 40 reps seems to be close enough and it appears to be what PyTorch uses
             # PyTorch also sets fast_flush to True, but I didn't see any speedup so I'll leave the default
             return triton.testing.do_bench(
-                kernel_call, percentiles=(0.5, 0.2, 0.8), rep=40
+                kernel_call, quantiles=(0.5, 0.2, 0.8), rep=40
             )
         except triton.OutOfResources:
             return (float("inf"), float("inf"), float("inf"))
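
Note: the change above only renames the keyword passed to triton.testing.do_bench, so the call raises a TypeError whenever the installed Triton expects the other spelling. The following is a minimal sketch, not part of this commit, of how a caller might choose the keyword by inspecting the benchmark function's signature. The helpers quantile_kwarg and bench_compat and the two stand-in benchmark functions are hypothetical names introduced for illustration; the stand-ins only mimic the keyword difference so the example runs without Triton installed.

```python
import inspect


def quantile_kwarg(bench_fn):
    """Return the keyword name bench_fn accepts for its quantile argument."""
    params = inspect.signature(bench_fn).parameters
    for name in ("quantiles", "percentiles"):
        if name in params:
            return name
    raise TypeError("bench function accepts neither 'quantiles' nor 'percentiles'")


def bench_compat(bench_fn, kernel_call, qs=(0.5, 0.2, 0.8), rep=40):
    """Call bench_fn with whichever quantile keyword its signature declares."""
    return bench_fn(kernel_call, rep=rep, **{quantile_kwarg(bench_fn): qs})


# Hypothetical stand-ins that differ only in the keyword name (assumption:
# they echo their arguments instead of timing anything).
def old_style_bench(fn, percentiles=None, rep=100):
    return ("percentiles", percentiles, rep)


def new_style_bench(fn, quantiles=None, rep=100):
    return ("quantiles", quantiles, rep)
```

With Triton installed, the same helper could presumably be called as bench_compat(triton.testing.do_bench, kernel_call), which would keep the 40-rep setting and the (0.5, 0.2, 0.8) quantiles used in the diff.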
