README.txt revision dbd8eb26ce1e7de9b69f5c46f45ba011a706c9b9
//===---------------------------------------------------------------------===//
// Random notes about and ideas for the SystemZ backend.
//===---------------------------------------------------------------------===//

The initial backend is deliberately restricted to z10.  We should add support
for later architectures at some point.

--

SystemZDAGToDAGISel::SelectInlineAsmMemoryOperand() is passed "m" for all
inline asm memory constraints; it doesn't get to see the original constraint.
This means that it must conservatively treat all inline asm constraints
as the most restricted type, "R".

--

If an inline asm ties an i32 "r" result to an i64 input, the input
will be treated as an i32, leaving the upper bits uninitialised.
For example:

define void @f4(i32 *%dst) {
  %val = call i32 asm "blah $0", "=r,0" (i64 103)
  store i32 %val, i32 *%dst
  ret void
}

from CodeGen/SystemZ/asm-09.ll will use LHI rather than LGHI
to load 103.  This seems to be a general target-independent problem.

--

The tuning of the choice between LOAD ADDRESS (LA) and addition in
SystemZISelDAGToDAG.cpp is suspect.  It should be tweaked based on
performance measurements.

--

We don't support tail calls at present.

--

We don't support prefetching yet.

--

There is no scheduling support.

--

We don't use the BRANCH ON COUNT or BRANCH ON INDEX families of instructions.

--

We might want to use BRANCH ON CONDITION for conditional indirect calls
and conditional returns.

--

We don't use the combined COMPARE AND BRANCH instructions.

--

We don't use the condition code results of anything except comparisons.

Implementing this may need something more finely grained than the z_cmp
and z_ucmp that we have now.  It might (or might not) also be useful to
have a mask of "don't care" values in conditional branches.
For example,
integer comparisons never set CC to 3, so the bottom bit of the CC mask
isn't particularly relevant.  JNLH and JE are equally good for testing
equality after an integer comparison, etc.

--

We don't use the LOAD AND TEST or TEST DATA CLASS instructions.

--

We could use the generic floating-point forms of LOAD COMPLEMENT,
LOAD NEGATIVE and LOAD POSITIVE in cases where we don't need the
condition codes.  For example, we could use LCDFR instead of LCDBR.

--

We don't optimize block memory operations.

It's definitely worth using things like MVC, CLC, NC, XC and OC with
constant lengths.  MVCIN may be worthwhile too.

We should probably implement things like memcpy using MVC with EXECUTE.
Likewise memcmp and CLC.  MVCLE and CLCLE could be useful too.

--

We don't optimize string operations.

MVST, CLST, SRST and CUSE could be useful here.  Some of the TRANSLATE
family might be too, although they are probably more difficult to exploit.

--

We don't take full advantage of builtins like fabsl because the calling
conventions require f128s to be returned by invisible reference.

--

ADD LOGICAL WITH SIGNED IMMEDIATE could be useful when we need to
produce a carry.  SUBTRACT LOGICAL IMMEDIATE could be useful when we
need to produce a borrow.  (Note that there are no memory forms of
ADD LOGICAL WITH CARRY and SUBTRACT LOGICAL WITH BORROW, so the high
part of 128-bit memory operations would probably need to be done
via a register.)

--

We don't use the halfword forms of LOAD REVERSED and STORE REVERSED
(LRVH and STRVH).

--

We could take advantage of the various ... UNDER MASK instructions,
such as ICM and STCM.

--

We could make more use of the ROTATE AND ... SELECTED BITS instructions.
At the moment we only use RISBG, and only then for subword atomic operations.
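As an illustration of the kind of pattern RISBG could cover (a hypothetical
sketch, not from the backend's test suite; the function name extract_bits is
invented): extracting a contiguous range of bits with a shift and mask is, in
principle, a single rotate-and-select operation.

```c
/* Hypothetical example: a contiguous bit-range extract.  A
   shift-and-mask like this is the sort of pattern that a single
   ROTATE AND INSERT SELECTED BITS (RISBG) could in principle
   implement, rather than a separate shift and AND. */
unsigned long extract_bits(unsigned long x)
{
  return (x >> 8) & 0xfff;      /* selects bits 8..19 of x */
}
```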

--

DAGCombiner can detect integer absolute, but there's not yet an associated
ISD opcode.  We could add one and implement it using LOAD POSITIVE.
Negated absolutes could use LOAD NEGATIVE.

--

DAGCombiner doesn't yet fold truncations of extended loads.  Functions like:

    unsigned long f (unsigned long x, unsigned short *y)
    {
      return (x << 32) | *y;
    }

therefore end up as:

        sllg    %r2, %r2, 32
        llgh    %r0, 0(%r3)
        lr      %r2, %r0
        br      %r14

but truncating the load would give:

        sllg    %r2, %r2, 32
        lh      %r2, 0(%r3)
        br      %r14

--

Functions like:

define i64 @f1(i64 %a) {
  %and = and i64 %a, 1
  ret i64 %and
}

ought to be implemented as:

        lhi     %r0, 1
        ngr     %r2, %r0
        br      %r14

but two-address optimisations reverse the order of the AND and force:

        lhi     %r0, 1
        ngr     %r0, %r2
        lgr     %r2, %r0
        br      %r14

CodeGen/SystemZ/and-04.ll has several examples of this.

--

Out-of-range displacements are usually handled by loading the full
address into a register.  In many cases it would be better to create
an anchor point instead.  E.g. for:

define void @f4a(i128 *%aptr, i64 %base) {
  %addr = add i64 %base, 524288
  %bptr = inttoptr i64 %addr to i128 *
  %a = load volatile i128 *%aptr
  %b = load i128 *%bptr
  %add = add i128 %a, %b
  store i128 %add, i128 *%aptr
  ret void
}

(from CodeGen/SystemZ/int-add-08.ll) we load %base+524288 and %base+524296
into separate registers, rather than using %base+524288 as a base for both.

--

Dynamic stack allocations round the size to 8 bytes and then allocate
that rounded amount.  It would be simpler to subtract the unrounded
size from the copy of the stack pointer and then align the result.
See CodeGen/SystemZ/alloca-01.ll for an example.

--

Atomic loads and stores use the default compare-and-swap based implementation.
This is much too conservative in practice, since the architecture guarantees
that 1-, 2-, 4- and 8-byte loads and stores to aligned addresses are
inherently atomic.

--

If needed, we can support 16-byte atomics using LPQ, STPQ and CSDG.

--

We might want to model all access registers and use them to spill
32-bit values.
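--

Returning to the atomic load/store note above: a minimal C11 sketch of the
accesses that should not need a compare-and-swap loop on this architecture
(the names counter, read_counter and write_counter are invented for
illustration).  Since a naturally aligned 8-byte access is inherently atomic,
each of these could lower to a plain load or store.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A naturally aligned 8-byte atomic object.  Per the note above, a
   relaxed load or store of this could be a single LG/STG rather than
   a compare-and-swap loop. */
static _Atomic uint64_t counter;

uint64_t read_counter(void)
{
  return atomic_load_explicit(&counter, memory_order_relaxed);
}

void write_counter(uint64_t v)
{
  atomic_store_explicit(&counter, v, memory_order_relaxed);
}
```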