History log of /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
Revision	Date	Author	Comments
d4c20e82ae34b105fb2d06c8c412656aba2ca1b9	15-Nov-2016	Eric Anholt <eric@anholt.net>	vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst. For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
d3cdbf6fd817ae5e7a8a72bcc3f43cc1b04a709b	09-Jul-2016	Eric Anholt <eric@anholt.net>	vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places. We have the prior list_foreach() all over the code, but I need to move where instructions live as part of adding support for control flow. Start by just converting to a helper iterator macro. (The simpler "qir_for_each_inst()" will be used for the for-each-inst-in-a-block iterator macro later) /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
71db7d3dc577e48da3689fd66989ec3b0a069089	22-Dec-2015	Eric Anholt <eric@anholt.net>	vc4: Replace the SSA-style SEL operators with conditional MOVs. I'm moving away from QIR being SSA (since NIR is doing lots of SSA optimization for us now) and instead having QIR just be QPU operations with virtual registers. By making our SELs be composed of two MOVs, we could potentially coalesce the registers for the MOV's src and dst and eliminate the MOV. total instructions in shared programs: 88448 -> 88028 (-0.47%) instructions in affected programs: 39845 -> 39425 (-1.05%) total estimated cycles in shared programs: 246306 -> 245762 (-0.22%) estimated cycles in affected programs: 162887 -> 162343 (-0.33%) /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
076551116ed5fc1b0991cb84e1e5453f5a2e11db	11-Dec-2015	Eric Anholt <eric@anholt.net>	vc4: Add quick algebraic optimization for clamping of unpacked values. GL likes to saturate your incoming color, but if that color's coming from unpacking from unorms, there's no point. Ideally we'd have a range propagation pass that cleans these up in NIR, but that doesn't seem to be going to land soon. It seems like we could do a one-off optimization in nir_opt_algebraic, except that doesn't want to operate on expressions involving unpack_unorm_4x8, since it's sized. total instructions in shared programs: 87879 -> 87761 (-0.13%) instructions in affected programs: 6044 -> 5926 (-1.95%) total estimated cycles in shared programs: 349457 -> 349252 (-0.06%) estimated cycles in affected programs: 6172 -> 5967 (-3.32%) No SSPD on openarena (which had the biggest gains, in its VS/CSes), n=15. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
e3efc4b02334897e0103f8cf926f376159ca1293	11-Dec-2015	Eric Anholt <eric@anholt.net>	vc4: When doing algebraic optimization into a MOV, use the right MOV. If there were src unpacks, changing to the integer MOV instead of float (for example) would change the unpack operation. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
b70a2f4d81940ef103c95ee51f2a84391a076ac0	11-Dec-2015	Eric Anholt <eric@anholt.net>	vc4: Add missing progress note in opt_algebraic. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
a4bf28178f064082d3b818d2cd48abf9075cc459	11-Nov-2015	Eric Anholt <eric@anholt.net>	vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB. It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on). Cc: "11.0" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
99a9a5a345fab8bbf36ab4e42581f8ee04a59a63	25-Oct-2015	Eric Anholt <eric@anholt.net>	vc4: Switch the unpack ops to being unpack flags on a mov. This paves the way for copy propagating our unpacks. We end up with a small change on shader-db: total instructions in shared programs: 89390 -> 89251 (-0.16%) instructions in affected programs: 19041 -> 18902 (-0.73%) which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
8cae9f2fda37b9868ea973a665e1acc115172b45	19-Aug-2015	Eric Anholt <eric@anholt.net>	vc4: Add algebraic opt for rcp(1.0). We're generating rcps as part of backend lowering of the packed coordinate in the CS, and we don't want to lower them in NIR because of the extra newton-raphson steps in the common case. However, GLB2.7 is moving a vertex attribute with a 1.0 W component to the position, and that makes us produce some silly RCPs. total instructions in shared programs: 97590 -> 97580 (-0.01%) instructions in affected programs: 74 -> 64 (-13.51%) /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
8b36d107fdd6f6b91556fcdc3498df16803d4181	19-Aug-2015	Eric Anholt <eric@anholt.net>	vc4: Pack the unorm-packing bits into a src MUL instruction when possible. Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's only used by the unorm packing and just stuff the pack bits into it. total instructions in shared programs: 98136 -> 97974 (-0.17%) instructions in affected programs: 4149 -> 3987 (-3.90%) /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
fd74da11c48dcd9098d4f64508aae65775c68b75	19-Aug-2015	Eric Anholt <eric@anholt.net>	vc4: Drop an unused algebraic op. NIR now handles this optimization for us. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
0bba4fa070583f5fd8a0f7208fbfa181dc25e71b	04-Aug-2015	Eric Anholt <eric@anholt.net>	vc4: Allow QIR registers to be non-SSA. Now that we have NIR, most of the optimization we still need to do is peepholes on instruction selection rather than general dataflow operations. This means we want to be able to have QIR be a lot closer to the actual QPU instructions, just with virtual registers. Allowing multiple instructions writing the same register opens up a lot of possibilities. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
78c773bb3646295e4a4f1fe7d6d10f05758ee48b	30-May-2015	Eric Anholt <eric@anholt.net>	vc4: Convert from simple_list.h to list.h. list.h is a nicer and more familiar set of list functions/macros. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
1dcc1ee314a6907213e2abd5337ec0bbba3bd1bf	30-Mar-2015	Eric Anholt <eric@anholt.net>	vc4: Drop integer multiplies with 0 to moves of 0. This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%) /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
85316d059c899ac096331251de6b233229aa0b4f	19-Feb-2015	Eric Anholt <eric@anholt.net>	vc4: Keep an array of pointers to instructions defining the temps around. The optimization passes are always regenerating it and throwing it away, but it's not hard to keep track of. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
3f1e1287fd960966eee8b12a75c8a8f62e11cdd2	12-Feb-2015	Eric Anholt <eric@anholt.net>	vc4: Make SF be a flag on the QIR instructions. Right now the places that used to emit a mov.sf just put the SF on the previous instruction when it generated the source of the SF value. Even without optimization to push the sf up further (and thus potentially kill more MOVs), this gets us: total uniforms in shared programs: 13455 -> 13457 (0.01%) uniforms in affected programs: 3 -> 5 (66.67%) total instructions in shared programs: 40296 -> 40198 (-0.24%) instructions in affected programs: 12595 -> 12497 (-0.78%) /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
753c327151ed7d23218879149950f0028b0e7b4d	01-Feb-2015	Eric Anholt <eric@anholt.net>	vc4: Kill a bunch of color write calculation when colormask is all off. I could have done this in the bit that generates the ANDs and ORs, but it's probably generally useful. Sadly, I still need this even if I move to NIR, because I can't yet express my read of the destination color in NIR, which I would need to move my blend/logicop/colormask handling into NIR. total uniforms in shared programs: 13497 -> 13455 (-0.31%) uniforms in affected programs: 101 -> 59 (-41.58%) total instructions in shared programs: 40797 -> 40296 (-1.23%) instructions in affected programs: 1639 -> 1138 (-30.57%) /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
e473fbe4690b5cbe3769042a4917f22559e2ba8d	10-Dec-2014	Eric Anholt <eric@anholt.net>	vc4: Add support for turning constant uniforms into small immediates. Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
ff266483fb61fd69775daf5c931ca7a56a26f4ac	11-Dec-2014	Eric Anholt <eric@anholt.net>	vc4: Move follow_movs() to common QIR code. I want this from other passes. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
7c474f9f2e5e3161ad27129844139ee14d916726	09-Oct-2014	Eric Anholt <eric@anholt.net>	vc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a). Cleans up some output to be more obvious in a piglit test I'm looking at. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
7e67ea994c34a6ebbaeb6a097036702c7a96496f	09-Oct-2014	Eric Anholt <eric@anholt.net>	vc4: Optimize out adds of 0. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
0401f55fffc6e77807e6987e23d2709a1599d61e	09-Oct-2014	Eric Anholt <eric@anholt.net>	vc4: Optimize fmul(x, 0) and fmul(x, 1). This was being generated frequently by matrix multiplies of 2 and 3-channel vertex attributes (which have the 0 or 1 loaded in the shader). /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
1cd8c1aab04c4da9aa6cbbd52460607b8416ce1b	09-Oct-2014	Eric Anholt <eric@anholt.net>	vc4: Factor out the turn-it-into-a-mov in opt_algebraic. This will be used more in the next commits. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
5a1352289862a9bd695a15009c69cad54727c66b	09-Oct-2014	Eric Anholt <eric@anholt.net>	vc4: Optimize SF(ITOF(x)) -> SF(x). This is a common production of st_glsl_to_tgsi, because CMP takes a float argument. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
00a9aebfe064ec252a95e0f3a38f4f6c967dadc4	09-Oct-2014	Eric Anholt <eric@anholt.net>	vc4: Add some optimization of FADD(FSUB(0, x)). This is a common production of st_glsl_to_tgsi, which uses negate flags on source arguments to handle subtraction. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
76cd9955d96c1b0a13905e255571eb35b3aa2a99	25-Sep-2014	Eric Anholt <eric@anholt.net>	vc4: Optimize out silly SUBs of 0. Drops instructions on vs-temp-array-mat4-index-col-row-wr.shader_test, which I was looking at because it's failing to register allocate. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
64122b16ce74a3fb65269bab325c651c26ccd2d0	25-Sep-2014	Eric Anholt <eric@anholt.net>	vc4: Dump constant uniform values in VC4_DEBUG=qir. Definitely helps when trying to understand and optimize a program. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
3311513041b81fe4e2fcf2c0e8a363a6d292c7b1	25-Sep-2014	Eric Anholt <eric@anholt.net>	vc4: Turn a SEL_X_Y(x, 0) into SEL_X_0(x). This may reduce register pressure and uniform counts. Drops a bunch of 0 uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is failing to register allocate. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
4bca922878a4d433077d21d4918b1db71b3a15f7	13-Aug-2014	Eric Anholt <eric@anholt.net>	vc4: Merge qcompile and tgsi_to_qir. The split between these two didn't make much sense. I'm going to want the chance to look at uniform contents in optimization passes, and the QPU emit I think is going to end up rewriting the uniforms stream. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
874dfa8b2ecccf3c9a73453d7ccc6638363a59bd	25-Aug-2014	Eric Anholt <eric@anholt.net>	vc4: Expose compares at a lower level in QIR. Before, we had some special opcodes like CMP and SNE that emitted multiple instructions. Now, we reduce those operations significantly, giving optimization more to look at for reducing redundant operations. The downside is that QOP_SF is pretty special -- we're going to have to track it separately when we're doing instruction scheduling, and we want to peephole it into the instruction generating the destination write in most cases (and not allocate the destination reg, probably. Unless it's used for some other purpose, as well). /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
b0a1e401a93b7b13870b936bc667b3fc15dba6d5	19-Aug-2014	Eric Anholt <eric@anholt.net>	vc4: Include stdio/stdlib in headers so I don't have to include it per file. There are a few tools I want to have always available, and fprintf() and abort() are among them. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
1d23d55ae97d07b6eb70a3e37a91ecb7de38d8d2	03-Jul-2014	Eric Anholt <eric@anholt.net>	vc4: Add an initial pass of algebraic optimization. There was a lot of extra noise in my piglit shader dumps because of silly CMPs. /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
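
Several entries above describe the same peephole pattern: recognize an instruction whose result is known from a constant operand (fmul(x, 0), fmul(x, 1), adds of 0, rcp(1.0)), rewrite it into a MOV, and note that progress was made. The following is a minimal sketch of that pattern on a toy IR, not the code in vc4_opt_algebraic.c. Every name in it (toy_op, toy_src, toy_inst, toy_temp, toy_const, toy_opt_algebraic) is hypothetical and invented for illustration; the real pass works on QIR structures and handles details such as unpack flags that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR for illustration only; these are NOT the QIR types. */
enum toy_op { TOY_MOV, TOY_FADD, TOY_FMUL, TOY_RCP };
enum toy_src_kind { TOY_TEMP, TOY_CONST };

struct toy_src {
    enum toy_src_kind kind;
    int temp;     /* virtual register index when kind == TOY_TEMP */
    float value;  /* constant value when kind == TOY_CONST */
};

struct toy_inst {
    enum toy_op op;
    int dst;
    struct toy_src src[2];
};

/* Build a source operand that reads virtual register `index`. */
struct toy_src toy_temp(int index)
{
    struct toy_src s = { TOY_TEMP, index, 0.0f };
    return s;
}

/* Build a source operand holding a constant. */
struct toy_src toy_const(float value)
{
    struct toy_src s = { TOY_CONST, -1, value };
    return s;
}

static bool is_const(struct toy_src s, float value)
{
    return s.kind == TOY_CONST && s.value == value;
}

/* Shared "turn it into a MOV" helper: keep dst, replace op and sources. */
static void replace_with_mov(struct toy_inst *inst, struct toy_src src)
{
    inst->op = TOY_MOV;
    inst->src[0] = src;
    inst->src[1] = toy_const(0.0f); /* second source is unused by MOV */
}

/* One pass over the instructions; returns true if anything was rewritten,
 * so a caller can rerun its optimization loop until a fixed point. */
bool toy_opt_algebraic(struct toy_inst *insts, size_t count)
{
    bool progress = false;

    assert(insts != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        struct toy_inst *inst = &insts[i];
        struct toy_src a = inst->src[0];
        struct toy_src b = inst->src[1];

        switch (inst->op) {
        case TOY_FADD: /* x + 0 -> x, 0 + x -> x */
            if (is_const(b, 0.0f)) {
                replace_with_mov(inst, a);
                progress = true;
            } else if (is_const(a, 0.0f)) {
                replace_with_mov(inst, b);
                progress = true;
            }
            break;
        case TOY_FMUL: /* x * 0 -> 0 (ignores NaN/Inf inputs), x * 1 -> x */
            if (is_const(a, 0.0f) || is_const(b, 0.0f)) {
                replace_with_mov(inst, toy_const(0.0f));
                progress = true;
            } else if (is_const(b, 1.0f)) {
                replace_with_mov(inst, a);
                progress = true;
            } else if (is_const(a, 1.0f)) {
                replace_with_mov(inst, b);
                progress = true;
            }
            break;
        case TOY_RCP: /* rcp(1.0) -> 1.0 */
            if (is_const(a, 1.0f)) {
                replace_with_mov(inst, toy_const(1.0f));
                progress = true;
            }
            break;
        case TOY_MOV:
            break;
        }
    }
    return progress;
}
```

A caller would typically invoke toy_opt_algebraic() repeatedly, alongside other passes, until it returns false. Funnelling every rewrite through one helper keeps the progress bookkeeping and the MOV construction in a single place, which is the kind of factoring the "turn-it-into-a-mov" entry describes.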