bf37bf9e21653f2439960d906a9c28cc19042bb0 |
|
18-Mar-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Make some vector integer to float casts cheaper

The default logic marks them as too expensive. For example, before this patch we estimated:
  cost of 16 for instruction: %r = uitofp <4 x i16> %v0 to <4 x float>
while this translates to:
  vmovl.u16 q8, d16
  vcvt.f32.u32 q8, q8
All other costs are left to the values assigned by the fallback logic. These costs are mostly reasonable in the sense that they get progressively more expensive as the emitted instruction sequences get longer.

radar://13445992
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177334 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
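The entries added by this commit presumably live in a per-target conversion-cost table that is consulted before the generic fallback logic. The standalone C++ sketch below only illustrates that table-plus-fallback pattern; every name in it (CastCostEntry, lookupCastCost, the Op/Ty enums) is hypothetical and is not the LLVM API.

  // Standalone illustration of a conversion-cost table with a generic fallback.
  // All names are invented; the real logic lives in ARMTargetTransformInfo.cpp.
  #include <cstdio>

  enum class Op { UIToFP, FPToUI };
  enum class Ty { v4i16, v4f32 };

  struct CastCostEntry {
    Op Opcode;
    Ty Dst, Src;
    unsigned Cost; // roughly: number of NEON instructions emitted
  };

  // uitofp <4 x i16> -> <4 x float> lowers to vmovl.u16 + vcvt.f32.u32,
  // so the override records a cost of 2 instead of the pessimistic default.
  static const CastCostEntry Table[] = {
      {Op::UIToFP, Ty::v4f32, Ty::v4i16, 2},
  };

  unsigned lookupCastCost(Op Opcode, Ty Dst, Ty Src, unsigned FallbackCost) {
    for (const CastCostEntry &E : Table)
      if (E.Opcode == Opcode && E.Dst == Dst && E.Src == Src)
        return E.Cost;
    return FallbackCost; // everything not listed keeps the generic estimate
  }

  int main() {
    std::printf("cost = %u\n",
                lookupCastCost(Op::UIToFP, Ty::v4f32, Ty::v4i16, 16));
  }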
|
01f25710148721f9fc2dece5eec17899ca414bcc |
|
18-Mar-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Correct cost for some cheap float to integer conversions

Fix the cost of some "cheap" cast instructions. Before this patch we used to estimate, for example:
  cost of 16 for instruction: %r = fptoui <4 x float> %v0 to <4 x i16>
while we would emit:
  vcvt.s32.f32 q8, q8
  vmovn.i32 d16, q8
  vuzp.8 d16, d17
All other costs are left to the values assigned by the fallback logic. These costs are mostly reasonable in the sense that they get progressively more expensive as the emitted instruction sequences get longer.

radar://13434072
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177333 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
5193e4ebe216dd5a07ab9cc58d40de5aafaa990c |
|
15-Mar-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Fix costs for some vector selects

I was too pessimistic in r177105. Vector selects that fit into a legal register type lower just fine. I was misled by the code fragment I was using: the stores/loads I saw in those cases came from lowering the conditional off an address. Changing the code fragment to:

  %T0_3 = type <8 x i18>
  %T1_3 = type <8 x i1>

  define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
                           %T1_3* %blend, %T0_3* %storeaddr) {
    %v0 = load %T0_3* %loadaddr
    %v1 = load %T0_3* %loadaddr2
  ==> FROM:
    ;%c = load %T1_3* %blend
  ==> TO:
    %c = icmp slt %T0_3 %v0, %v1
  ==> USE:
    %r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1
    store %T0_3 %r, %T0_3* %storeaddr
    ret void
  }

revealed this mistake.

radar://13403975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177170 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
c0d8dc0eb6e1df872affadba01f60e42275e2863 |
|
15-Mar-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Fix cost of fptrunc and fpext instructions

Vector fptrunc and fpext simply get split into scalar instructions.

radar://13192358
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177159 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
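Given the commit's statement that vector fptrunc/fpext are split into scalar instructions, the vector cost is roughly the element count times the scalar conversion cost (the real model may additionally charge for element extraction/insertion). A minimal sketch of that estimate, with a hypothetical function name:

  // Hypothetical sketch: a vector fp conversion that the backend scalarizes
  // costs approximately NumElements * (cost of one scalar conversion).
  #include <cstdio>

  unsigned scalarizedConversionCost(unsigned NumElements, unsigned ScalarCost) {
    return NumElements * ScalarCost;
  }

  int main() {
    // e.g. fptrunc <4 x double> to <4 x float>, assuming each scalar
    // conversion costs 1: the vector form is modeled as 4.
    std::printf("%u\n", scalarizedConversionCost(4, 1));
  }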
|
d81511f0a671a9271d7ba10cce6c27331b57f553 |
|
14-Mar-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Increase cost of some vector selects we do terribly on

By terribly I mean we store/load from the stack. This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update), where we decide to vectorize a loop with a VF of 8, resulting in a 25% degradation on a cortex-a8.

  LV: Found an estimated cost of 2 for VF 8 For instruction: icmp slt i32
  LV: Found an estimated cost of 2 for VF 8 For instruction: select i1, i32, i32

The bug that tracks the CodeGen part is PR14868.

radar://13403975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177105 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
b6f4872d29136637a3a5dfdf185f5afcbcdd3b2a |
|
12-Mar-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Increase the cost for vector casts that use the stack

Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend currently lowers those using stack accesses. This was responsible for a significant degradation on MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 where we vectorize one loop to a vector factor of 16. After this patch we select a vector factor of 4 which will generate reasonable code.

  unsigned char cle[32];

  void test(short c) {
    unsigned short compte;
    for (compte = 0; compte <= 31; compte++) {
      cle[compte] = cle[compte] ^ c;
    }
  }

radar://13220512
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176898 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
6851623c54b35673f6e9a0ed0fd12378c93f48c4 |
|
12-Feb-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Add vector reverse shuffle costs

A reverse shuffle is lowered to a vrev and possibly a vext instruction (for quad words).

radar://13171406
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174933 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
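Per the commit, reversing a double-word vector needs a single vrev, while a quad-word vector also needs a vext to swap the reversed halves. A hedged, standalone sketch of a cost rule along those lines (illustrative name, not the LLVM interface):

  // Illustrative only: NEON reverse-shuffle cost per the commit message.
  // A 64-bit (D register) vector reverses with one vrev; a 128-bit
  // (Q register) vector additionally needs a vext to swap the halves.
  #include <cstdio>

  unsigned reverseShuffleCost(unsigned VectorBits) {
    return VectorBits <= 64 ? 1 /* vrev */ : 2 /* vrev + vext */;
  }

  int main() {
    std::printf("64-bit vector:  %u\n", reverseShuffleCost(64));
    std::printf("128-bit vector: %u\n", reverseShuffleCost(128));
  }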
|
fb55a8fd7c38aa09d9c243d48a8a72d890f36a3d |
|
08-Feb-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Address computation in vector mem ops not free

Adds a function to target transform info to query the cost of address computation. The cost model analysis pass now also queries this interface. The code in LoopVectorize adds the cost of address computation as part of the memory instruction cost calculation; only there do we know whether the instruction will be scalarized or not.

Increase the penalty for inserting into D registers on Swift. This becomes necessary because we now always assume that address computation has a cost, and three is a value closer to the architecture.

radar://13097204
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174713 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
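A rough reading of this change: a memory access that stays a single wide vector operation pays for its address computation once, while an access the vectorizer knows will be scalarized pays for address arithmetic on every lane. The sketch below is an illustration under that assumption, with invented names; it is not the LoopVectorize code.

  // Rough sketch (hypothetical names): charge address computation per lane
  // only when a vector memory access is scalarized.
  #include <cstdio>

  unsigned memoryOpCost(unsigned VF, unsigned AccessCost,
                        unsigned AddrComputeCost, bool Scalarized) {
    if (!Scalarized)
      return AccessCost + AddrComputeCost;      // one wide access, one address
    return VF * (AccessCost + AddrComputeCost); // one address per scalar lane
  }

  int main() {
    std::printf("vectorized : %u\n", memoryOpCost(4, 1, 1, false));
    std::printf("scalarized : %u\n", memoryOpCost(4, 1, 1, true));
  }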
|
66f535a273e52d56199c7ce8f975796017b6cbb2 |
|
07-Feb-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Add costs for vector selects

Vector selects are cheap on NEON. They get lowered to a vbsl instruction.

radar://13158753
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174631 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
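vbsl is a bitwise select: each result bit is taken from the first operand where the corresponding mask bit is set and from the second operand otherwise, which is why a whole vector select maps to one instruction when the condition is an all-ones/all-zeros lane mask. A small C++ illustration of those semantics on a plain integer:

  // Bitwise-select semantics of NEON's vbsl, shown on a plain integer:
  // each result bit comes from 'a' where the mask bit is 1, else from 'b'.
  #include <cstdint>
  #include <cstdio>

  uint32_t bitwiseSelect(uint32_t mask, uint32_t a, uint32_t b) {
    return (mask & a) | (~mask & b);
  }

  int main() {
    // With an all-ones or all-zeros mask per lane (as produced by a vector
    // compare), this behaves exactly like a per-lane select.
    std::printf("0x%08x\n",
                bitwiseSelect(0xFFFF0000u, 0xAAAAAAAAu, 0x55555555u));
  }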
|
e2d5590c33f1b5203c0104c1c82bf8e0f28b828e |
|
05-Feb-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Cost for scalar integer casts and floating point conversions

Also adds some costs for vector integer float conversions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174371 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
a7ad84851b018602487779d97195bad0536f9a7a |
|
04-Feb-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
ARM cost model: Penalize insertelement into D subregisters

Swift has a renaming dependency if we load into D subregisters. We don't have a way of distinguishing insertelement operations whose values come from loads from those using other values, so we are pessimistic for now (the performance problem showed up in example 14 of gcc-loops).

radar://13096933
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174300 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
0261cea689c71a15175faf37fdc6bd1d9f69c46e |
|
30-Jan-2013 |
Renato Golin <renato.golin@linaro.org> |
Adding simple cast cost to ARM

Change ARMBaseTargetMachine to return ARMTargetLowering instead of the generic one (similar to the x86 code). Tests show which instructions are added for casts when necessary, or a cost of zero when not. Downcasts to 16 bits are not lowered in NEON, so those costs are not there yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173849 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
14925e6b885f8bd8cf448627386d412831f4bf1b |
|
09-Jan-2013 |
Nadav Rotem <nrotem@apple.com> |
ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172010 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
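The rule amounts to dividing the vector register width by the width of the widest scalar type the loop uses. A tiny worked sketch under that reading (the function name is hypothetical):

  // Hypothetical illustration: max VF = register width / widest element width.
  #include <cstdio>

  unsigned maxVectorizationFactor(unsigned RegisterBits, unsigned WidestTypeBits) {
    unsigned VF = RegisterBits / WidestTypeBits;
    return VF ? VF : 1; // never report a VF of 0
  }

  int main() {
    // NEON Q registers are 128 bits wide; a loop whose widest type is i32
    // gets a maximum VF of 4, while an i8-only loop could go up to 16.
    std::printf("i32: %u\n", maxVectorizationFactor(128, 32));
    std::printf("i8 : %u\n", maxVectorizationFactor(128, 8));
  }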
|
83be7b0dd3ae9a3cb22d36ae4c1775972553b94b |
|
09-Jan-2013 |
Nadav Rotem <nrotem@apple.com> |
Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171928 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
be04929f7fd76a921540e9901f24563e51dc1219 |
|
07-Jan-2013 |
Chandler Carruth <chandlerc@gmail.com> |
Move TargetTransformInfo to live under the Analysis library. This would no longer violate any dependency layering, and it is in fact an analysis. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171686 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
|
aeef83c6afa1e18d1cf9d359cc678ca0ad556175 |
|
07-Jan-2013 |
Chandler Carruth <chandlerc@gmail.com> |
Switch TargetTransformInfo from an immutable analysis pass that requires a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTransformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least one implementation.

The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis *pass*, allowing all but the NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API.

The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target-independent code generator.

The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations.

This fourth step required logic that was shared between the target-independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract.

The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis, which is a more natural layer for it to live in. Second, clients of this interface can depend on it *always* being available, which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits; this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171681 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
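The "delegation chaining trick" mentioned above means each implementation in the analysis group keeps a pointer to the previous one and forwards any query it does not specialize, so only the bottom-most implementation has to answer everything. The following self-contained sketch shows that pattern with invented class names and made-up cost numbers; it is not the LLVM code.

  // Delegation chaining, as used by analysis groups: a target-specific layer
  // answers the queries it knows about and forwards the rest to the previous
  // implementation in the chain. Class names are invented.
  #include <cstdio>

  class CostInfo {
  public:
    explicit CostInfo(CostInfo *Prev = nullptr) : Prev(Prev) {}
    virtual ~CostInfo() = default;
    // Default behaviour: delegate to the previous pass in the chain,
    // bottoming out at a conservative answer.
    virtual unsigned getCastCost() { return Prev ? Prev->getCastCost() : 8; }
    virtual unsigned getSelectCost() { return Prev ? Prev->getSelectCost() : 4; }
  private:
    CostInfo *Prev;
  };

  // A generic, code-generator-style fallback that only refines select costs.
  class BasicCostInfo : public CostInfo {
  public:
    using CostInfo::CostInfo;
    unsigned getSelectCost() override { return 1; }
  };

  // A target-specific layer that only overrides cast costs and lets
  // everything else flow down the chain.
  class TargetCostInfo : public CostInfo {
  public:
    using CostInfo::CostInfo;
    unsigned getCastCost() override { return 2; }
  };

  int main() {
    CostInfo NoInfo;               // conservative bottom of the chain
    BasicCostInfo Basic(&NoInfo);  // generic layer
    TargetCostInfo Target(&Basic); // target-specific layer on top
    std::printf("cast: %u, select: %u\n", Target.getCastCost(),
                Target.getSelectCost());
  }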
|