7251a75f6ee9ce38263be6580a235187475458ed |
|
12-Jul-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
X86 cost model: Add cost for vectorized gather/scatter radar://14351991 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186189 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
e6dc376eece3e48d7316b788846dac90181d2ffe |
|
27-Jun-2013 |
Nadav Rotem <nrotem@apple.com> |
Get rid of the unused class member. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185086 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
16d36a5cd1a581dfac79a4616b6b9602a43b6cd1 |
|
27-Jun-2013 |
Nadav Rotem <nrotem@apple.com> |
CostModel: improve the cost model for load/store of non-power-of-two types such as <3 x float>, which are popular in graphics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185085 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
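An illustrative, hypothetical graphics-style kernel (not from the commit) that produces loads and stores of a non-power-of-two vector type such as <3 x float> once vectorized:

    // Packed 3-float vertices: each element is three consecutive floats, so a
    // vectorized loop naturally wants <3 x float> loads and stores, which the
    // cost model previously priced poorly.
    struct Vec3 { float x, y, z; };

    void scale(Vec3 *v, float s, int n) {
      for (int i = 0; i < n; ++i) {
        v[i].x *= s;
        v[i].y *= s;
        v[i].z *= s;
      }
    }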
|
34eb2406b41854fc8df688fca7c0129f77d768f7 |
|
25-Jun-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
X86 cost model: Vectorizing integer division is a bad idea radar://14057959 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184872 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
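As an illustration (hypothetical kernel, not from the commit), the higher division cost keeps loops like the following scalar, since x86 has no vector integer divide instruction and a vectorized division would have to be scalarized lane by lane:

    // Per-element integer division: with this change the cost model reports
    // vector SDIV/UDIV as expensive, so the loop vectorizer leaves it scalar.
    void divide(int *out, const int *a, const int *b, int n) {
      for (int i = 0; i < n; ++i)
        out[i] = a[i] / b[i];
    }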
|
ef2d9e31940fc3121646e15effdfcc8f7f5e239b |
|
18-Jun-2013 |
Nadav Rotem <nrotem@apple.com> |
Fix 80 col violation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184228 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
9c63f0d687cf1130ee2e76a6fdc87d71ae9d3961 |
|
17-Apr-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
X86 cost model: Exit before calling getSimpleVT on non-simple VTs. getSimpleVT can only handle simple value types. radar://13676022 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179714 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
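A minimal, self-contained analog of the guard this commit adds (the type, names, and fallback value are hypothetical, not the upstream code): check that a value type is simple before asking for its simple form, and fall back conservatively otherwise.

    // Asking for the "simple" form of a value type is only valid when the type
    // actually is simple, so check first and return a conservative cost otherwise.
    struct VT {
      bool isSimple() const { return Simple; }
      int  getSimpleVT() const { return SimpleVT; }  // only meaningful if Simple
      bool Simple;
      int  SimpleVT;
    };

    unsigned castCost(const VT &Src, const VT &Dst) {
      if (!Src.isSimple() || !Dst.isSimple())
        return 4;  // hypothetical conservative fallback cost
      // ... otherwise key a cost-table lookup on Src.getSimpleVT()/Dst.getSimpleVT() ...
      return 1;
    }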
|
813456527e73f0c1468514c523c6258d360bcd91 |
|
08-Apr-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
X86 cost model: Model cost for uitofp and sitofp on SSE2. The costs are overfitted so that I can still use the legalization factor. For example, the following kernel has about half the throughput when vectorized compared to unvectorized when compiled with SSE2. Before this patch we would vectorize it. unsigned short A[1024]; double B[1024]; void f() { int i; for (i = 0; i < 1024; ++i) { B[i] = (double) A[i]; } } radar://13599001 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179033 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
2537f3c6597bc1b8eb14c76c8f8e7046be41c9ba |
|
05-Apr-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
X86 cost model: Differentiate cost for vector shifts of constants. SSE2 has efficient support for shifts by a scalar. My previous change of making shifts expensive did not take this into account, marking all shifts as expensive. This would prevent vectorization from happening where it is actually beneficial. With this change we differentiate between shifts of constants and other shifts. radar://13576547 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
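The distinction can be illustrated with two hypothetical loops (the names are illustrative): SSE2 can turn the first into a single vector shift per chunk, while the second needs per-lane work before variable vector shifts exist in AVX2.

    // Shift amount is a constant splat: cheap to vectorize on SSE2.
    void shl_const(unsigned *w, const unsigned *v, int n) {
      for (int i = 0; i < n; ++i)
        w[i] = v[i] << 2;
    }

    // Shift amount varies per element: expensive to vectorize before AVX2.
    void shl_var(unsigned *w, const unsigned *v, const unsigned *x, int n) {
      for (int i = 0; i < n; ++i)
        w[i] = v[i] << x[i];
    }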
|
6bf4f676413b8f7d97aaff289997aab344180957 |
|
05-Apr-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
CostModel: Add parameter to instruction cost to further classify operand values. On certain architectures we can support efficient vectorized versions of instructions if the operand value is uniform (splat) or a constant scalar. An example of this is a vector shift on x86. We can efficiently support for (i = 0 ; i < ; i += 4) w[0:3] = v[0:3] << <2, 2, 2, 2> but not for (i = 0; i < ; i += 4) w[0:3] = v[0:3] << x[0:3] This patch adds a parameter to getArithmeticInstrCost to further qualify operand values as uniform or uniform constant. Targets can then choose to return a different cost for instructions with such operand values. A follow-up commit will test this feature on x86. radar://13576547 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178807 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
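A self-contained sketch of the idea (an analog, not the actual getArithmeticInstrCost signature): the cost query carries an operand-kind tag so a target can report a cheaper cost when, for example, a shift amount is a uniform constant.

    // Stand-in classification mirroring the commit's description: an operand is
    // arbitrary, uniform (splat), or a uniform constant.
    enum class OperandKind { Any, Uniform, UniformConstant };

    // Illustrative cost hook: a vector shift by a uniform constant maps to one
    // instruction; otherwise assume a rough per-lane extract/shift/insert cost.
    unsigned vectorShiftCost(unsigned lanes, OperandKind amountKind) {
      if (amountKind == OperandKind::UniformConstant)
        return 1;
      return lanes * 3;  // illustrative numbers only
    }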
|
6b6050b229976a2f53184f6d6857e6f445a869d0 |
|
03-Apr-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
X86 cost model: Vector shifts are expensive in most cases. The default logic does not correctly identify costs of casts because they are marked as custom on x86. For some cases where the shift amount is a scalar, we would be able to generate better code. Unfortunately, when this is the case the value (the splat) will get hoisted out of the loop, thereby making it invisible to ISel. radar://13130673 radar://13537826 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
13497b3aa7589fc4f9e924f850a7e5151e9ddd2f |
|
01-Apr-2013 |
Benjamin Kramer <benny.kra@googlemail.com> |
X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178459 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
f74e9bf650d7c40d595d3bb60e3c901e2bccec4b |
|
20-Mar-2013 |
Michael Liao <michael.liao@intel.com> |
Correct cost model for vector shift on AVX2 - After moving the logic recognizing vector shifts with a scalar amount from DAG combining into DAG lowering, we declare all vector shifts as custom even when the vector shift on AVX is legal. As a result, the cost model needs special tuning to identify these legal cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
b05130e1b20ed17ae9d5ab3351933babd27213e1 |
|
19-Mar-2013 |
Nadav Rotem <nrotem@apple.com> |
Optimize sext <4 x i8> and <4 x i16> to <4 x i64>. Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177421 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
5f0d9dbdf48a9efe16bfadf88e5335f7b9a8ec3f |
|
02-Mar-2013 |
Arnold Schwaighofer <aschwaighofer@apple.com> |
X86 cost model: Adjust cost for custom lowered vector multiplies. This matters for example in the following matrix multiply: int **mmult(int rows, int cols, int **m1, int **m2, int **m3) { int i, j, k, val; for (i=0; i<rows; i++) { for (j=0; j<cols; j++) { val = 0; for (k=0; k<cols; k++) { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } return(m3); } Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into two 128-bit halves and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would indeed be more expensive when vectorized with 3 kernels: for (i ...) r += a[i] * 3; for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
52981c4b6016d9f0e295e0771ec0a50dd073b4b3 |
|
20-Feb-2013 |
Elena Demikhovsky <elena.demikhovsky@intel.com> |
I optimized the following patterns: sext <4 x i1> to <4 x i64>, sext <4 x i8> to <4 x i64>, and sext <4 x i16> to <4 x i64>. I'm running Combine on SIGN_EXTEND_IN_REG and reverting SEXT patterns: (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT))) The sext_in_reg (v4i32 x) may be lowered to shl+sar operations. The "sar" does not exist as a 64-bit vector operation, so lowering sext_in_reg (v4i64 x) has no vector solution. I also added a cost for these operations to the AVX costs table. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175619 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
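A scalar worked example of the shl+sar lowering mentioned above (an analogy for the vector nodes, not the DAG itself): sign-extending the low 8 bits of a 32-bit value in place is a left shift followed by an arithmetic right shift, and the arithmetic-shift half of that pattern has no 64-bit-element vector instruction, which is why sext_in_reg of v4i64 has no vector solution.

    // Shift the low byte into the top byte (unsigned to avoid overflow issues),
    // then arithmetic-shift back down so the sign bit is replicated.
    // Assumes the usual arithmetic right shift of a signed int, as on x86.
    int sext_in_reg_i8(int x) {
      return (int)((unsigned)x << 24) >> 24;
    }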
|
b3755e7fa2e386e9bd348eda6b1876ae09c1bf99 |
|
25-Jan-2013 |
Renato Golin <renato.golin@linaro.org> |
Moving Cost Tables up to share with other targets git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173382 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
053a2119835ac6ca3484f1b496cabd43c37e4279 |
|
20-Jan-2013 |
Renato Golin <renato.golin@linaro.org> |
Revert CostTable algorithm, will re-write git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172992 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
d3c965d6251e6d939f7797f8704d4e3a82f7e274 |
|
16-Jan-2013 |
Renato Golin <renato.golin@linaro.org> |
Change CostTable model to be global to all targets. Moving the X86CostTable to a common place, so that other back-ends can share the code. Also simplifying it a bit and commoning up tables for operations with one and two types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172658 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
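A self-contained sketch of the shared table shape (field names are illustrative, not the exact upstream structs): one entry kind for operations on a single type, one for conversions between two types, and a linear lookup helper that other back-ends can reuse.

    // Entry for an operation on one type (e.g. a vector multiply on v4i32).
    struct CostTblEntry { int ISD; int Type; unsigned Cost; };

    // Entry for a conversion between two types (e.g. sext v4i8 -> v4i64).
    struct ConvertCostTblEntry { int ISD; int Dst; int Src; unsigned Cost; };

    // Linear lookup shared across targets; returns -1 when no entry matches.
    int findCost(const CostTblEntry *Tbl, unsigned Len, int ISD, int Type) {
      for (unsigned i = 0; i < Len; ++i)
        if (Tbl[i].ISD == ISD && Tbl[i].Type == Type)
          return (int)Tbl[i].Cost;
      return -1;
    }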
|
14925e6b885f8bd8cf448627386d412831f4bf1b |
|
09-Jan-2013 |
Nadav Rotem <nrotem@apple.com> |
ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172010 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
83be7b0dd3ae9a3cb22d36ae4c1775972553b94b |
|
09-Jan-2013 |
Nadav Rotem <nrotem@apple.com> |
Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171928 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
bb00800ff46e7a2a628d0a6741a7f0422c74c198 |
|
07-Jan-2013 |
Chandler Carruth <chandlerc@gmail.com> |
Fix the enumerator names for ShuffleKind to match the coding standards, and make its comments doxygen comments. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171688 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
d1b8ef97c47d347f2a2261a0d6de4872f248321f |
|
07-Jan-2013 |
Chandler Carruth <chandlerc@gmail.com> |
Make the popcnt support enums and methods have clearer names and follow the coding conventions regarding enumerating a set of "kinds" of things. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171687 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
be04929f7fd76a921540e9901f24563e51dc1219 |
|
07-Jan-2013 |
Chandler Carruth <chandlerc@gmail.com> |
Move TargetTransformInfo to live under the Analysis library. This no longer would violate any dependency layering and it is in fact an analysis. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171686 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|
aeef83c6afa1e18d1cf9d359cc678ca0ad556175 |
|
07-Jan-2013 |
Chandler Carruth <chandlerc@gmail.com> |
Switch TargetTransformInfo from an immutable analysis pass that requires a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it. The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTransformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least one implementation. The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis *pass*, allowing all but the NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API. The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target independent code generator. The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes. The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations. The fourth step required logic that was shared between the target independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract. The final step of the conversion was just to delete all the old boilerplate. 
This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them. Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis which is a more natural layer for it to live. Second, clients of this interface can depend on it *always* being available which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits, this one is clearly big enough. Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks. Commits to update DragonEgg and Clang will be made presently. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171681 91177308-0d34-0410-b5e6-96231b3b80d8
/external/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
|