History log of /external/mesa3d/src/gallium/drivers/vc4/vc4_opt_algebraic.c
Revision Date Author Comments
d4c20e82ae34b105fb2d06c8c412656aba2ca1b9 15-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.

For now we're still just generating MOVs, but this will let us fold into
other ops in the future. No difference on shader-db.
d3cdbf6fd817ae5e7a8a72bcc3f43cc1b04a709b 09-Jul-2016 Eric Anholt <eric@anholt.net> vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places.

We have the prior list_foreach() all over the code, but I need to move
where instructions live as part of adding support for control flow. Start
by just converting to a helper iterator macro. (The simpler
"qir_for_each_inst()" will be used for the for-each-inst-in-a-block
iterator macro later.)
71db7d3dc577e48da3689fd66989ec3b0a069089 22-Dec-2015 Eric Anholt <eric@anholt.net> vc4: Replace the SSA-style SEL operators with conditional MOVs.

I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers. By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs: 39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
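
As a rough illustration of the SEL-to-conditional-MOV change described above, the following C sketch models one select as two predicated moves into the same virtual register. It is a plain C model under assumed semantics, not QIR or QPU code; the function names are hypothetical and do not come from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* SSA-style select: a single operation producing a fresh value. */
uint32_t sel_ssa(bool flag, uint32_t a, uint32_t b)
{
    return flag ? a : b;
}

/*
 * The same result written as two predicated moves that target one
 * virtual register. Exactly one of the two moves takes effect, which
 * is only expressible once a register may be written more than once.
 */
uint32_t sel_as_cond_movs(bool flag, uint32_t a, uint32_t b)
{
    uint32_t dst = 0;    /* placeholder; always overwritten below */

    if (flag)
        dst = a;         /* models: MOV dst, a  (when flag is set) */
    if (!flag)
        dst = b;         /* models: MOV dst, b  (when flag is clear) */

    return dst;
}
```

If the register allocator can assign dst and one of the sources to the same physical register, the corresponding move becomes a no-op, which is the coalescing opportunity the commit message mentions.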
076551116ed5fc1b0991cb84e1e5453f5a2e11db 11-Dec-2015 Eric Anholt <eric@anholt.net> vc4: Add quick algebraic optimization for clamping of unpacked values.

GL likes to saturate your incoming color, but if that color's coming from
unpacking from unorms, there's no point. Ideally we'd have a range
propagation pass that cleans these up in NIR, but that doesn't seem to be
going to land soon. It seems like we could do a one-off optimization in
nir_opt_algebraic, except that doesn't want to operate on expressions
involving unpack_unorm_4x8, since it's sized.

total instructions in shared programs: 87879 -> 87761 (-0.13%)
instructions in affected programs: 6044 -> 5926 (-1.95%)
total estimated cycles in shared programs: 349457 -> 349252 (-0.06%)
estimated cycles in affected programs: 6172 -> 5967 (-3.32%)

No SSPD on openarena (which had the biggest gains, in its VS/CSes), n=15.
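
To see why the clamp is redundant, here is a small C sketch that checks every possible 8-bit unorm value. The helper names are hypothetical, and the division by 255 is the usual unorm definition assumed for illustration rather than something taken from the driver.

```c
#include <stdint.h>

/* Extract one 8-bit channel from a packed word and convert it to a
 * float using the usual unorm rule: value / 255. */
float unpack_unorm8(uint32_t packed, unsigned channel)
{
    uint32_t byte = (packed >> (8 * channel)) & 0xff;
    return (float)byte / 255.0f;
}

/* Clamp to [0, 1], as GL does for incoming colors. */
float saturate(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Returns 1 if saturate() leaves every unpacked unorm8 value
 * unchanged, 0 otherwise. */
int saturate_is_redundant_for_unorm8(void)
{
    for (uint32_t b = 0; b < 256; b++) {
        float v = unpack_unorm8(b, 0);
        if (saturate(v) != v)
            return 0;
    }
    return 1;
}
```

Because every unpacked value already lies in [0, 1], a clamp whose only input is such an unpack can be dropped.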
e3efc4b02334897e0103f8cf926f376159ca1293 11-Dec-2015 Eric Anholt <eric@anholt.net> vc4: When doing algebraic optimization into a MOV, use the right MOV.

If there were src unpacks, changing to the integer MOV instead of float
(for example) would change the unpack operation.
b70a2f4d81940ef103c95ee51f2a84391a076ac0 11-Dec-2015 Eric Anholt <eric@anholt.net> vc4: Add missing progress note in opt_algebraic.
a4bf28178f064082d3b818d2cd48abf9075cc459 11-Nov-2015 Eric Anholt <eric@anholt.net> vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.

It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
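
The general idea behind deriving an unsigned >= from a subtract's carry can be sketched in portable C. This is only a model: the helper names are hypothetical, and the exact polarity of the QPU carry flag is not stated in the message above, so the sketch reports a borrow explicitly instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model a 32-bit subtract that also reports whether it borrowed.
 * Subtracting in 64 bits makes the borrow visible in the upper half. */
bool sub_borrows(uint32_t a, uint32_t b, uint32_t *diff)
{
    uint64_t wide = (uint64_t)a - (uint64_t)b;

    *diff = (uint32_t)wide;
    return (wide >> 32) != 0;   /* nonzero upper bits mean a < b */
}

/* Unsigned a >= b holds exactly when a - b does not borrow. */
bool uge_via_borrow(uint32_t a, uint32_t b)
{
    uint32_t diff;

    return !sub_borrows(a, b, &diff);
}
```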
99a9a5a345fab8bbf36ab4e42581f8ee04a59a63 25-Oct-2015 Eric Anholt <eric@anholt.net> vc4: Switch the unpack ops to being unpack flags on a mov.

This paves the way for copy propagating our unpacks. We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs: 19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
8cae9f2fda37b9868ea973a665e1acc115172b45 19-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Add algebraic opt for rcp(1.0).

We're generating rcps as part of backend lowering of the packed coordinate
in the CS, and we don't want to lower them in NIR because of the extra
Newton-Raphson steps in the common case. However, GLB2.7 is moving a
vertex attribute with a 1.0 W component to the position, and that makes us
produce some silly RCPs.

total instructions in shared programs: 97590 -> 97580 (-0.01%)
instructions in affected programs: 74 -> 64 (-13.51%)
8b36d107fdd6f6b91556fcdc3498df16803d4181 19-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Pack the unorm-packing bits into a src MUL instruction when possible.

Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's
only used by the unorm packing and just stuff the pack bits into it.

total instructions in shared programs: 98136 -> 97974 (-0.17%)
instructions in affected programs: 4149 -> 3987 (-3.90%)
fd74da11c48dcd9098d4f64508aae65775c68b75 19-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Drop an unused algebraic op.

NIR now handles this optimization for us.
0bba4fa070583f5fd8a0f7208fbfa181dc25e71b 04-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Allow QIR registers to be non-SSA.

Now that we have NIR, most of the optimization we still need to do is
peepholes on instruction selection rather than general dataflow
operations. This means we want to be able to have QIR be a lot closer to
the actual QPU instructions, just with virtual registers. Allowing
multiple instructions writing the same register opens up a lot of
possibilities.
78c773bb3646295e4a4f1fe7d6d10f05758ee48b 30-May-2015 Eric Anholt <eric@anholt.net> vc4: Convert from simple_list.h to list.h

list.h is a nicer and more familiar set of list functions/macros.
1dcc1ee314a6907213e2abd5337ec0bbba3bd1bf 30-Mar-2015 Eric Anholt <eric@anholt.net> vc4: Drop integer multiplies with 0 to moves of 0.

This cleans up more instructions generated by uniform array indexing
multiplies.

total instructions in shared programs: 39989 -> 39961 (-0.07%)
instructions in affected programs: 896 -> 868 (-3.12%)
85316d059c899ac096331251de6b233229aa0b4f 19-Feb-2015 Eric Anholt <eric@anholt.net> vc4: Keep an array of pointers to instructions defining the temps around.

The optimization passes are always regenerating it and throwing it away,
but it's not hard to keep track of.
3f1e1287fd960966eee8b12a75c8a8f62e11cdd2 12-Feb-2015 Eric Anholt <eric@anholt.net> vc4: Make SF be a flag on the QIR instructions.

Right now the places that used to emit a mov.sf just put the SF on the
previous instruction when it generated the source of the SF value. Even
without optimization to push the SF up further (and thus potentially kill
more MOVs), this gets us:

total uniforms in shared programs: 13455 -> 13457 (0.01%)
uniforms in affected programs: 3 -> 5 (66.67%)
total instructions in shared programs: 40296 -> 40198 (-0.24%)
instructions in affected programs: 12595 -> 12497 (-0.78%)
753c327151ed7d23218879149950f0028b0e7b4d 01-Feb-2015 Eric Anholt <eric@anholt.net> vc4: Kill a bunch of color write calculation when colormask is all off.

I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful. Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.

total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs: 101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs: 1639 -> 1138 (-30.57%)
e473fbe4690b5cbe3769042a4917f22559e2ba8d 10-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Add support for turning constant uniforms into small immediates.

Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.

total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)

In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
ff266483fb61fd69775daf5c931ca7a56a26f4ac 11-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Move follow_movs() to common QIR code.

I want this from other passes.
7c474f9f2e5e3161ad27129844139ee14d916726 09-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a).

Cleans up some output to be more obvious in a piglit test I'm looking at.
7e67ea994c34a6ebbaeb6a097036702c7a96496f 09-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Optimize out adds of 0.
0401f55fffc6e77807e6987e23d2709a1599d61e 09-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Optimize fmul(x, 0) and fmul(x, 1).

This was being generated frequently by matrix multiplies of 2 and
3-channel vertex attributes (which have the 0 or 1 loaded in the shader).
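
A minimal sketch of this kind of peephole, written against a toy instruction struct rather than the real QIR types: every name below (toy_inst, toy_src, toy_make_mov, toy_opt_algebraic) is hypothetical. Note that folding fmul(x, 0) to 0 is not exact under IEEE rules for NaN, infinity, or signed zero; the sketch assumes that loss is acceptable.

```c
#include <stdbool.h>

enum toy_op { TOY_MOV, TOY_FMUL };

/* A source is either a known constant or a reference to a temp. */
struct toy_src {
    bool is_const;
    float value;   /* meaningful only when is_const is true */
    int temp;      /* meaningful only when is_const is false */
};

struct toy_inst {
    enum toy_op op;
    int dst;
    struct toy_src src[2];
};

static bool src_is(const struct toy_src *s, float v)
{
    return s->is_const && s->value == v;
}

/* Rewrite the instruction in place as "MOV dst, src". */
void toy_make_mov(struct toy_inst *inst, struct toy_src src)
{
    inst->op = TOY_MOV;
    inst->src[0] = src;
    inst->src[1] = (struct toy_src){ 0 };
}

/* Returns true if the instruction was simplified (progress made). */
bool toy_opt_algebraic(struct toy_inst *inst)
{
    if (inst->op != TOY_FMUL)
        return false;

    for (int i = 0; i < 2; i++) {
        const struct toy_src c = inst->src[i];
        const struct toy_src other = inst->src[1 - i];

        if (src_is(&c, 0.0f)) {        /* fmul(x, 0) -> mov 0 */
            toy_make_mov(inst, c);
            return true;
        }
        if (src_is(&c, 1.0f)) {        /* fmul(x, 1) -> mov x */
            toy_make_mov(inst, other);
            return true;
        }
    }
    return false;
}
```

Returning a progress flag lets a caller rerun its optimization passes until none of them changes anything.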
1cd8c1aab04c4da9aa6cbbd52460607b8416ce1b 09-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Factor out the turn-it-into-a-mov in opt_algebraic.

This will be used more in the next commits.
5a1352289862a9bd695a15009c69cad54727c66b 09-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Optimize SF(ITOF(x)) -> SF(x).

This is a common production of st_glsl_to_tgsi, because CMP takes a float
argument.
00a9aebfe064ec252a95e0f3a38f4f6c967dadc4 09-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Add some optimization of FADD(FSUB(0, x)).

This is a common production of st_glsl_to_tgsi, which uses negate flags on
source arguments to handle subtraction.
76cd9955d96c1b0a13905e255571eb35b3aa2a99 25-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Optimize out silly SUBs of 0.

Drops instructions on vs-temp-array-mat4-index-col-row-wr.shader_test,
which I was looking at because it's failing to register allocate.
64122b16ce74a3fb65269bab325c651c26ccd2d0 25-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Dump constant uniform values in VC4_DEBUG=qir.

Definitely helps when trying to understand and optimize a program.
3311513041b81fe4e2fcf2c0e8a363a6d292c7b1 25-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Turn a SEL_X_Y(x, 0) into SEL_X_0(x).

This may reduce register pressure and uniform counts. Drops a bunch of 0
uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is
failing to register allocate.
4bca922878a4d433077d21d4918b1db71b3a15f7 13-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Merge qcompile and tgsi_to_qir

The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
874dfa8b2ecccf3c9a73453d7ccc6638363a59bd 25-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Expose compares at a lower level in QIR.

Before, we had some special opcodes like CMP and SNE that emitted multiple
instructions. Now, we reduce those operations significantly, giving
optimization more to look at for reducing redundant operations.

The downside is that QOP_SF is pretty special -- we're going to have to
track it separately when we're doing instruction scheduling, and we want
to peephole it into the instruction generating the destination write in
most cases (and not allocate the destination reg, probably. Unless it's
used for some other purpose, as well).
b0a1e401a93b7b13870b936bc667b3fc15dba6d5 19-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Include stdio/stdlib in headers so I don't have to include it per file.

There are a few tools I want to have always available, and fprintf() and
abort() are among them.
1d23d55ae97d07b6eb70a3e37a91ecb7de38d8d2 03-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add an initial pass of algebraic optimization.

There was a lot of extra noise in my piglit shader dumps because of silly
CMPs.