Commit e6590f0

apinski-quic authored and zatrazz committed
aarch64: Remove non-temporal load/stores from oryon-1's memset
The hardware architects have a new recommendation not to use non-temporal load/stores for memset. This patch removes this path. I found there was no difference in the memset speed with/without non-temporal load/stores either.

Signed-off-by: Andrew Pinski <[email protected]>
Reviewed-by: Adhemerval Zanella <[email protected]>
1 parent eb5eeb4 commit e6590f0

1 file changed: +0 -26 lines changed

sysdeps/aarch64/multiarch/memset_oryon1.S

Lines changed: 0 additions & 26 deletions
@@ -93,8 +93,6 @@ L(set_long):
 	cmp	count, 256
 	ccmp	valw, 0, 0, cs
 	b.eq	L(try_zva)
-	cmp	count, #32768
-	b.hi	L(set_long_with_nontemp)
 	/* Small-size or non-zero memset does not use DC ZVA. */
 	sub	count, dstend, dst
 
@@ -117,30 +115,6 @@ L(set_long):
 	stp	val, val, [dstend, -16]
 	ret
 
-L(set_long_with_nontemp):
-	/* Small-size or non-zero memset does not use DC ZVA. */
-	sub	count, dstend, dst
-
-	/* Adjust count and bias for loop. By subtracting extra 1 from count,
-	   it is easy to use tbz instruction to check whether loop tailing
-	   count is less than 33 bytes, so as to bypass 2 unnecessary stps. */
-	sub	count, count, 64+16+1
-
-1:	stnp	val, val, [dst, 16]
-	stnp	val, val, [dst, 32]
-	stnp	val, val, [dst, 48]
-	stnp	val, val, [dst, 64]
-	add	dst, dst, #64
-	subs	count, count, 64
-	b.hs	1b
-
-	tbz	count, 5, 1f	/* Remaining count is less than 33 bytes? */
-	stnp	val, val, [dst, 16]
-	stnp	val, val, [dst, 32]
-1:	stnp	val, val, [dstend, -32]
-	stnp	val, val, [dstend, -16]
-	ret
-
 L(try_zva):
 	/* Write the first and last 64 byte aligned block using stp rather
 	   than using DC ZVA as it is faster. */
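
The comment in the removed L(set_long_with_nontemp) block says that biasing count by 64+16+1 lets a single tbz on bit 5 decide whether fewer than 33 bytes remain. The C sketch below models that arithmetic on a byte buffer so it can be checked outside the assembler. It is an illustration, not the glibc code: store16, model_set_long and model_is_exact are hypothetical names, the non-temporal hint of stnp is not modelled, and two things that the hunks do not show are assumed (dst is dstin rounded down to 16 bytes, and the first 16 bytes at dstin are stored before this path is reached).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: one `stnp val, val, [addr]`, i.e. a 16-byte store.
   The non-temporal hint does not change which bytes are written.  */
static void store16(uint8_t *mem, size_t addr, uint8_t c)
{
    memset(mem + addr, c, 16);
}

/* Model of the removed path.  `mem` stands in for memory; `dstin` and `n`
   are a byte offset and a length inside it.  */
void model_set_long(uint8_t *mem, size_t dstin, size_t n, uint8_t c)
{
    assert(n >= 256);                   /* only sizes this model is meant for */
    size_t dstend = dstin + n;
    size_t dst = dstin & ~(size_t)15;   /* ASSUMED: dstin rounded down to 16 */
    store16(mem, dstin, c);             /* ASSUMED: head store done earlier */

    uint64_t count = dstend - dst;      /* sub count, dstend, dst */
    count -= 64 + 16 + 1;               /* sub count, count, 64+16+1 */

    int no_borrow;
    do {
        store16(mem, dst + 16, c);
        store16(mem, dst + 32, c);
        store16(mem, dst + 48, c);
        store16(mem, dst + 64, c);
        dst += 64;                      /* add dst, dst, #64 */
        no_borrow = count >= 64;        /* subs sets carry when no borrow occurs */
        count -= 64;                    /* wraps below zero, like the register */
    } while (no_borrow);                /* b.hs 1b */

    if ((count >> 5) & 1) {             /* tbz count, 5, 1f skips when bit 5 is 0 */
        store16(mem, dst + 16, c);
        store16(mem, dst + 32, c);
    }
    store16(mem, dstend - 32, c);       /* tail stores may overlap earlier ones */
    store16(mem, dstend - 16, c);
}

/* Returns 1 if the model writes exactly the bytes [dstin, dstin + n).  */
int model_is_exact(size_t dstin, size_t n)
{
    size_t total = dstin + n + 64;      /* 64 guard bytes after the range */
    uint8_t *mem = calloc(total, 1);
    if (mem == NULL)
        return 0;
    model_set_long(mem, dstin, n, 0xAB);
    int ok = 1;
    for (size_t i = 0; i < total; i++) {
        uint8_t want = (i >= dstin && i < dstin + n) ? 0xAB : 0;
        if (mem[i] != want)
            ok = 0;
    }
    free(mem);
    return ok;
}
```

In this model, between 1 and 64 bytes are still unwritten when the loop exits, and count equals that remainder minus 65. Its low six bits therefore hold the remainder minus 1, so bit 5 is set exactly when 33 or more bytes remain, which is the case that needs the two extra stores before the final 32-byte tail.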

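
The first hunk changes which store path L(set_long) selects. A minimal C sketch of that selection before and after the patch follows, assuming that count holds the length in bytes and valw the fill byte at this point (neither is shown in the hunk). The enum and function names are hypothetical and only mirror the labels in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three store paths named in the diff.  */
enum path { PATH_TRY_ZVA, PATH_STP_LOOP, PATH_NONTEMP };

/* Path selection before the patch.  */
enum path select_path_before(uint64_t count, uint32_t valw)
{
    /* cmp count, 256; ccmp valw, 0, 0, cs; b.eq L(try_zva)
       The ccmp compares valw with 0 only when count >= 256; otherwise it
       clears the flags, so b.eq is not taken.  */
    if (count >= 256 && valw == 0)
        return PATH_TRY_ZVA;
    /* cmp count, #32768; b.hi L(set_long_with_nontemp)  (removed lines) */
    if (count > 32768)
        return PATH_NONTEMP;
    return PATH_STP_LOOP;
}

/* Path selection after the patch: the non-temporal branch is gone.  */
enum path select_path_after(uint64_t count, uint32_t valw)
{
    if (count >= 256 && valw == 0)
        return PATH_TRY_ZVA;
    return PATH_STP_LOOP;
}
```

Under these assumptions, only non-zero fills larger than 32768 bytes change behavior: they used to reach the stnp loop and now fall through to the ordinary stp loop.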