README_ALTIVEC.txt revision 9f7e1271336954d5319189c64f96bf187bb55cd9
//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

//===----------------------------------------------------------------------===//

Altivec support.  The first should be a single lvx from the constant pool, the
second should be a xor/stvx:

void foo(void) {
  int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 1, 1, 1, 1, 1 };
  bar (x);
}

#include <string.h>
void foo(void) {
  int x[8] __attribute__((aligned(128)));
  memset (x, 0, sizeof (x));
  bar (x);
}

//===----------------------------------------------------------------------===//

Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

When -ffast-math is on, we can use 0.0.

//===----------------------------------------------------------------------===//

Consider this:
  v4f32 Vector;
  v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };

Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into a lve*x instruction, instead of doing
a load/store/lve*x sequence.

//===----------------------------------------------------------------------===//

There is a wide range of vector constants we can generate with combinations of
altivec instructions.  Examples:
  GCC does: "t=vsplti*, r = t+t" for constants it can't generate with one
  vsplti instruction.

  -0.0 (sign bit): vspltisw v0,-1 / vslw v0,v0,v0

//===----------------------------------------------------------------------===//

Missing intrinsics:

ds*
mf*
vavg*
vmax*
vmin*
vmladduhm
vmr*
vsel (some aliases only accessible using builtins)

//===----------------------------------------------------------------------===//

FABS/FNEG can be codegen'd with the appropriate and/xor of -0.0.
//===----------------------------------------------------------------------===//

For functions that use altivec AND have calls, we are VRSAVE'ing all call
clobbered regs.

//===----------------------------------------------------------------------===//

VSPLTW and friends are expanded by the FE into insert/extract element ops.  Make
sure that the dag combiner puts them back together in the appropriate
vector_shuffle node and that this gets pattern matched appropriately.

//===----------------------------------------------------------------------===//

Implement passing/returning vectors by value.

//===----------------------------------------------------------------------===//

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.

//===----------------------------------------------------------------------===//

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a lve*x/vperm.  We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it.  If the value is already
in memory, this is a huge win.

//===----------------------------------------------------------------------===//

Do not generate the MFCR/RLWINM sequence for predicate compares when the
predicate compare is used immediately by a branch.  Just branch on the right
cond code on CR6.

//===----------------------------------------------------------------------===//

SROA should turn "vector unions" into the appropriate insert/extract element
instructions.

//===----------------------------------------------------------------------===//

We need an LLVM 'shuffle' instruction that corresponds to the VECTOR_SHUFFLE
node.
//===----------------------------------------------------------------------===//

We need a way to teach tblgen that some operands of an intrinsic are required to
be constants.  The verifier should enforce this constraint.

//===----------------------------------------------------------------------===//

We should instcombine the lvx/stvx intrinsics into loads/stores if we know that
the loaded address is 16-byte aligned.

//===----------------------------------------------------------------------===//

Instead of writing a pattern for type-agnostic operations (e.g. gen-zero, load,
store, and, ...) in every supported type, make legalize do the work.  We should
have a canonical type that we want operations changed to (e.g. v4i32 for
build_vector) and legalize should change non-identical types to these.  This is
similar to what it does for operations that are only supported in some types,
e.g. x86 cmov (not supported on bytes).

This would fix two problems:
1. Writing patterns multiple times.
2. Identical operations in different types are not getting CSE'd (e.g.
   { 0U, 0U, 0U, 0U } and { 0.0, 0.0, 0.0, 0.0 }).

//===----------------------------------------------------------------------===//

Instcombine llvm.ppc.altivec.vperm with an immediate into a shuffle operation.

//===----------------------------------------------------------------------===//

Handle VECTOR_SHUFFLE nodes with the appropriate shuffle mask with vsldoi,
vpkuhum and vpkuwum.