d4c20e82ae34b105fb2d06c8c412656aba2ca1b9 |
|
15-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst. For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
d3cdbf6fd817ae5e7a8a72bcc3f43cc1b04a709b |
|
09-Jul-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places. We have the prior list_foreach() all over the code, but I need to move where instructions live as part of adding support for control flow. Start by just converting to a helper iterator macro. (The simpler "qir_for_each_inst()" will be used for the for-each-inst-in-a-block iterator macro later)
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
71db7d3dc577e48da3689fd66989ec3b0a069089 |
|
22-Dec-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Replace the SSA-style SEL operators with conditional MOVs. I'm moving away from QIR being SSA (since NIR is doing lots of SSA optimization for us now) and instead having QIR just be QPU operations with virtual registers. By making our SELs be composed of two MOVs, we could potentially coalesce the registers for the MOV's src and dst and eliminate the MOV. total instructions in shared programs: 88448 -> 88028 (-0.47%) instructions in affected programs: 39845 -> 39425 (-1.05%) total estimated cycles in shared programs: 246306 -> 245762 (-0.22%) estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
076551116ed5fc1b0991cb84e1e5453f5a2e11db |
|
11-Dec-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add quick algebraic optimization for clamping of unpacked values. GL likes to saturate your incoming color, but if that color's coming from unpacking from unorms, there's no point. Ideally we'd have a range propagation pass that cleans these up in NIR, but that doesn't seem to be going to land soon. It seems like we could do a one-off optimization in nir_opt_algebraic, except that doesn't want to operate on expressions involving unpack_unorm_4x8, since it's sized. total instructions in shared programs: 87879 -> 87761 (-0.13%) instructions in affected programs: 6044 -> 5926 (-1.95%) total estimated cycles in shared programs: 349457 -> 349252 (-0.06%) estimated cycles in affected programs: 6172 -> 5967 (-3.32%) No SSPD on openarena (which had the biggest gains, in its VS/CSes), n=15.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
e3efc4b02334897e0103f8cf926f376159ca1293 |
|
11-Dec-2015 |
Eric Anholt <eric@anholt.net> |
vc4: When doing algebraic optimization into a MOV, use the right MOV. If there were src unpacks, changing to the integer MOV instead of float (for example) would change the unpack operation.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
b70a2f4d81940ef103c95ee51f2a84391a076ac0 |
|
11-Dec-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add missing progress note in opt_algebraic.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
a4bf28178f064082d3b818d2cd48abf9075cc459 |
|
11-Nov-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB. It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on). Cc: "11.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
99a9a5a345fab8bbf36ab4e42581f8ee04a59a63 |
|
25-Oct-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Switch the unpack ops to being unpack flags on a mov. This paves the way for copy propagating our unpacks. We end up with a small change on shader-db: total instructions in shared programs: 89390 -> 89251 (-0.16%) instructions in affected programs: 19041 -> 18902 (-0.73%) which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
8cae9f2fda37b9868ea973a665e1acc115172b45 |
|
19-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add algebraic opt for rcp(1.0). We're generating rcps as part of backend lowering of the packed coordinate in the CS, and we don't want to lower them in NIR because of the extra newton-raphson steps in the common case. However, GLB2.7 is moving a vertex attribute with a 1.0 W component to the position, and that makes us produce some silly RCPs. total instructions in shared programs: 97590 -> 97580 (-0.01%) instructions in affected programs: 74 -> 64 (-13.51%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
8b36d107fdd6f6b91556fcdc3498df16803d4181 |
|
19-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Pack the unorm-packing bits into a src MUL instruction when possible. Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's only used by the unorm packing and just stuff the pack bits into it. total instructions in shared programs: 98136 -> 97974 (-0.17%) instructions in affected programs: 4149 -> 3987 (-3.90%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
fd74da11c48dcd9098d4f64508aae65775c68b75 |
|
19-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Drop an unused algebraic op. NIR now handles this optimization for us.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
0bba4fa070583f5fd8a0f7208fbfa181dc25e71b |
|
04-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Allow QIR registers to be non-SSA. Now that we have NIR, most of the optimization we still need to do is peepholes on instruction selection rather than general dataflow operations. This means we want to be able to have QIR be a lot closer to the actual QPU instructions, just with virtual registers. Allowing multiple instructions writing the same register opens up a lot of possibilities.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
78c773bb3646295e4a4f1fe7d6d10f05758ee48b |
|
30-May-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Convert from simple_list.h to list.h. list.h is a nicer and more familiar set of list functions/macros.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
1dcc1ee314a6907213e2abd5337ec0bbba3bd1bf |
|
30-Mar-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Drop integer multiplies with 0 to moves of 0. This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
85316d059c899ac096331251de6b233229aa0b4f |
|
19-Feb-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Keep an array of pointers to instructions defining the temps around. The optimization passes are always regenerating it and throwing it away, but it's not hard to keep track of.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
3f1e1287fd960966eee8b12a75c8a8f62e11cdd2 |
|
12-Feb-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Make SF be a flag on the QIR instructions. Right now the places that used to emit a mov.sf just put the SF on the previous instruction when it generated the source of the SF value. Even without optimization to push the sf up further (and thus potentially kill more MOVs), this gets us: total uniforms in shared programs: 13455 -> 13457 (0.01%) uniforms in affected programs: 3 -> 5 (66.67%) total instructions in shared programs: 40296 -> 40198 (-0.24%) instructions in affected programs: 12595 -> 12497 (-0.78%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
753c327151ed7d23218879149950f0028b0e7b4d |
|
01-Feb-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Kill a bunch of color write calculation when colormask is all off. I could have done this in the bit that generates the ANDs and ORs, but it's probably generally useful. Sadly, I still need this even if I move to NIR, because I can't yet express my read of the destination color in NIR, which I would need to move my blend/logicop/colormask handling into NIR. total uniforms in shared programs: 13497 -> 13455 (-0.31%) uniforms in affected programs: 101 -> 59 (-41.58%) total instructions in shared programs: 40797 -> 40296 (-1.23%) instructions in affected programs: 1639 -> 1138 (-30.57%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
e473fbe4690b5cbe3769042a4917f22559e2ba8d |
|
10-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for turning constant uniforms into small immediates. Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
ff266483fb61fd69775daf5c931ca7a56a26f4ac |
|
11-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Move follow_movs() to common QIR code. I want this from other passes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
7c474f9f2e5e3161ad27129844139ee14d916726 |
|
09-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a). Cleans up some output to be more obvious in a piglit test I'm looking at.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
7e67ea994c34a6ebbaeb6a097036702c7a96496f |
|
09-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Optimize out adds of 0.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
0401f55fffc6e77807e6987e23d2709a1599d61e |
|
09-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Optimize fmul(x, 0) and fmul(x, 1). This was being generated frequently by matrix multiplies of 2 and 3-channel vertex attributes (which have the 0 or 1 loaded in the shader).
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
1cd8c1aab04c4da9aa6cbbd52460607b8416ce1b |
|
09-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Factor out the turn-it-into-a-mov in opt_algebraic. This will be used more in the next commits.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
5a1352289862a9bd695a15009c69cad54727c66b |
|
09-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Optimize SF(ITOF(x)) -> SF(x). This is a common production of st_glsl_to_tgsi, because CMP takes a float argument.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
00a9aebfe064ec252a95e0f3a38f4f6c967dadc4 |
|
09-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add some optimization of FADD(FSUB(0, x)). This is a common production of st_glsl_to_tgsi, which uses negate flags on source arguments to handle subtraction.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
76cd9955d96c1b0a13905e255571eb35b3aa2a99 |
|
25-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Optimize out silly SUBs of 0. Drops instructions on vs-temp-array-mat4-index-col-row-wr.shader_test, which I was looking at because it's failing to register allocate.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
64122b16ce74a3fb65269bab325c651c26ccd2d0 |
|
25-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Dump constant uniform values in VC4_DEBUG=qir. Definitely helps when trying to understand and optimize a program.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
3311513041b81fe4e2fcf2c0e8a363a6d292c7b1 |
|
25-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Turn a SEL_X_Y(x, 0) into SEL_X_0(x). This may reduce register pressure and uniform counts. Drops a bunch of 0 uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is failing to register allocate.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
4bca922878a4d433077d21d4918b1db71b3a15f7 |
|
13-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Merge qcompile and tgsi_to_qir. The split between these two didn't make much sense. I'm going to want the chance to look at uniform contents in optimization passes, and the QPU emit I think is going to end up rewriting the uniforms stream.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
874dfa8b2ecccf3c9a73453d7ccc6638363a59bd |
|
25-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Expose compares at a lower level in QIR. Before, we had some special opcodes like CMP and SNE that emitted multiple instructions. Now, we reduce those operations significantly, giving optimization more to look at for reducing redundant operations. The downside is that QOP_SF is pretty special -- we're going to have to track it separately when we're doing instruction scheduling, and we want to peephole it into the instruction generating the destination write in most cases (and not allocate the destination reg, probably. Unless it's used for some other purpose, as well).
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
b0a1e401a93b7b13870b936bc667b3fc15dba6d5 |
|
19-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Include stdio/stdlib in headers so I don't have to include it per file. There are a few tools I want to have always available, and fprintf() and abort() are among them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|
1d23d55ae97d07b6eb70a3e37a91ecb7de38d8d2 |
|
03-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add an initial pass of algebraic optimization. There was a lot of extra noise in my piglit shader dumps because of silly CMPs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
|