History log of /external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
ce0eebc935b5fb91b29ca656f79bdfff39beb8f8 07-Feb-2017 Eric Anholt <eric@anholt.net> vc4: Avoid emitting small immediates for UBO indirect load address guards.

The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp rare is enough that it's not worth
making the kernel let us.

Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a tempoary.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b2309393039b2ec0cc00a8e6fd828c60c4ef1e11)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
4690a93b123a64f8730a870a336ae9756d11fd18 15-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.

This isn't as complete as I would like (can't merge interpolation because
of the implicit r5 dependency, doesn't work with control flow), but this
was cheap and easy.

Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16)

total instructions in shared programs: 99810 -> 99059 (-0.75%)
instructions in affected programs: 10705 -> 9954 (-7.02%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a025983dd9cfcba8a452205efbc5c0be8ff3da74 23-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Make qir_for_each_inst_inorder() safe against removal.

The dead code elimination wants it to be safe, and I actually got
segfaults due to it being unsafe with the new coalescing pass.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
27544ea8d330309a7f1604bece6d2fcb4e9a8ae3 15-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Split optimizing VPM writes from VPM reads.

The VPM write logic will be basically the same as the texture coordinate
write logic we need, and it's not really related to the VPM read logic
other than the reuse of the use_count array.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
d4c20e82ae34b105fb2d06c8c412656aba2ca1b9 15-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.

For now we're still just generating MOVs, but this will let us fold into
other ops in the future. No difference on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
314f0c57e4c00b0a5cb544fa43e356c1069acd8f 15-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).

Every caller was dereffing the qinst, and this will let us make the number
of sources vary depending on the destination of the qinst so that we can
have general ALU ops that store to tex_[strb] and get an implicit uniform.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
51087327f2ba929739719b2ae243d8c69d31346f 15-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Replace the qinst src[] with a fixed-size array.

This may have made a tiny bit of sense when we had one 4-arg inst per
shader, but if we only ever put 2 things in, having a pointer to 2 things
almost every instruction is pointless indirection.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a220f1b5a9beaba146096971354ae37c6f75d4ef 15-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Remove qir_inst4().

This was used originally for unorm4x8 packs, but we now represent those as
a series of packed movs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
414dbb2d5c48b7e9dc0dc8b14583f91415ca3960 22-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Don't conditionalize the src1 mov of qir_SEL().

My thought in having both arguments conditionally moved was that it should
theoretically save some power by not doing work in those channels.
However, it ends up costing us instructions because we can't
register-coalesce the first of the MOVs, and it also introduces extra
scheduling dependencies. The instruction cost would swamp whatever power
benefit I was hoping for.

shader-db results:
total instructions in shared programs: 100548 -> 99741 (-0.80%)
instructions in affected programs: 42450 -> 41643 (-1.90%)

With obvious outliers removed (I had an X11 emacs running over the network
in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps
improvement (n=18, 30).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
ace0d810e56a1e2978fc3ac237158918ebe2a23c 11-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Flag the last thread switch in the program as the last.

We don't allow the last thread switch to be inside control flow, to be
sure that we hit the last state exactly once. If the last texturing was
in control flow, fall back to single threaded.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
755037173d19b65777a97f55455c1f64bf618264 11-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Add support for register allocation for threaded shaders.

We have two major requirements: Make sure that only the bottom half of the
physical reg space is used, and make sure that none of our values are live
in an accumulator across a switch.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
4f527f12604269f15704bbd14a4962766afdfb9a 11-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Add a thread switch QIR instruction.

This will eventually be generated at the QIR level, so that
vc4_qir_schedule.c can arrange the separation of tex_strb from tex_result
correctly. It will also be important so that register allocation set the
register classes appropriately for values that are live across the switch.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
4d019bd703e7c20d56d5b858577607115b4926a3 07-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Don't abort when a shader compile fails.

It's much better to just skip the draw call entirely. Getting this
information out of register allocation will also be useful for
implementing threaded fragment shaders, which will need to retry
non-threaded if RA fails.

Cc: <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
d4ae5ca823227214dd1f536e5f4058bede20b2dd 05-Oct-2016 Eric Anholt <eric@anholt.net> vc4: Fix live intervals analysis for screening defs in if statements.

If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexcuted channels, we don't care what's in there).

Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
60bed14d0fdc3a05b6251b4ffc6013b5d3ca3e0f 26-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Handle discards while in control flow.

I missed this while adding loop support because the discard test inside a
loop was crashing before, anyway. Fixes piglit glsl-fs-discard-04.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
8ce65261789f085e657e6a487db93d38ee6bea63 25-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Add support for MUL output rotation.

Extracted from a patch by jonasarrow on github.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
074f1f3c0c2cd15213a62eb7f589423ece6391c8 25-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Add support for the 2-bit LOAD_IMM variants.

Extracted and fixed up from a patch by jonasarrow on github. This ended
up not getting used for ddx/ddy, but seems like it might still be useful.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
31da39ddc92e780dc539bf34d2de7f82fc65fa86 25-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Add a QIR value for the QPU element register.

This will be used in the ddx/ddy support for "Am I the top half?" or "Am I
the left half?" checks.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
87a88f2daabfe14b12d447b3d96b9f8938c5cf03 22-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Fix GPU hangs with >16 varying values.

Fixes glsl-routing in piglit and hangs in glbenchmark 2.0.2.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
e8378fee0c20ecd26451c079c725420077606cb9 27-Jul-2016 Eric Anholt <eric@anholt.net> nir: Define system values for vc4's blending-lowering arguments.

In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I
was calling the "state uniforms" that I was putting into the NIR fighting
with its other lowering passes. Instead of using magic uniform base
numbers in the backend, follow the lead of load_user_clip_plane and just
define system values for them.

v2: Fix unintended change to channel_num, drop unspecified const_index
value on blend_const_color_r_float.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2350569a78c60d32e3b751b4386ea7e6d7e2ebe9 03-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.

We don't want to bake the whole array into the FS key, because of the
hashing overhead. But we can keep a set of the arrays seen, and use a
pointer to the copy in as the array's proxy.

Between this and the previous patch, gl-1.0-blend-func now passes on
hardware, where previously it was filling the 256MB CMA area with shaders
and OOMing.

Drops 712 shaders from shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
bc1fc9c98539f38f5a29b314d4a993a2e2f7ca0a 03-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Avoid generating a custom shader per level in glGenerateMipmaps().

We were baking in the LOD of the source level to each shader. Instead,
pass it in as a uniform -- this requires storing it to a temp register,
but that's better than compiling a ton of separate shaders:

total instructions in shared programs: 115032 -> 115036 (0.00%)
instructions in affected programs: 96 -> 100 (4.17%)
LOST: 572
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
3bcd0f1912a60cc9d3813923d18d29465e41ff56 15-Jul-2016 Eric Anholt <eric@anholt.net> vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel.

To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary
miptree. However, if a single level is being selected, we can use the
existing miptree and force all the sampling to be from that particular
level.

This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses
base levels in the blit implementation in gallium. Improves "glmark2 -b
terrain" from 2 fps to 3 (perhaps some more precision would be useful?),
and cuts its CPU usage during the benchmarking from ~30% to ~10% (total
CPU time from 8.8s to 7.6s).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
9194473dd260fe72042807a97be0072c6f0537da 06-May-2016 Eric Anholt <eric@anholt.net> vc4: Emit resets of the uniform stream at the starts of blocks.

If a block might be entered from multiple locations, then the uniform
stream will (probably) be at different points, and we need to make sure
that it's pointing where we expect it to be. The kernel also enforces
that any block reading a uniform resets uniforms, to prevent reading
outside of the uniform stream by using looping.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
44df061aaad96fc5db630ae69fb2fe2a03bb5659 27-Apr-2016 Eric Anholt <eric@anholt.net> vc4: Add support for scheduling of branch instructions.

For now we don't fill the delay slots, and instead just drop in NOPs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a59da513d3229c883809ac2088c9612abcec1470 16-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Move the QPU instructions to schedule into each block.

We'll want to schedule them individually, to handle delay slots.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
420845acb2207cb9d903e67b66deaf08637ac3b2 02-May-2016 Eric Anholt <eric@anholt.net> vc4: Add support for NIR loops and break/continue.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
f505f66cd5a266dc70ad12e2b015e6c631651aec 28-Apr-2016 Eric Anholt <eric@anholt.net> vc4: Add support for storing to NIR registers in a non-SSA fashion.

Previously, there were occasionally NIR registers in our programs, but
they were always actually used SSA-only. Now that we're trying to support
control flow, we need to actually conditionally move to registers based on
whether channels are active or not.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
05bcd9dd960d5658801ab35d429ba9778f67cad0 15-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Define a QIR branch instruction

This uses the branch condition code in inst->cond to jump to either
successor[0] (condition matches) or successor[0] (condition doesn't
match).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
89918c1e74e454af119e7ae23f3ed66fc26abc4b 10-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Implement live intervals using a CFG.

Right now our CFG is always a trivial single basic block, but that will
change when enable loops.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
6c1f834a237540c344fa794d60501a69bf066fb5 09-Jul-2016 Eric Anholt <eric@anholt.net> vc4: Create a basic block structure and move the instructions into it.

The optimization passes and scheduling aren't actually ready for multiple
blocks with control flow yet (as seen by the "cur_block" references in
them instead of iterating over blocks), but this creates the structures
necessary for converting them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
d3cdbf6fd817ae5e7a8a72bcc3f43cc1b04a709b 09-Jul-2016 Eric Anholt <eric@anholt.net> vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places.

We have the prior list_foreach() all over the code, but I need to move
where instructions live as part of adding support for control flow. Start
by just converting to a helper iterator macro. (The simpler
"qir_for_each_inst()" will be used for the for-each-inst-in-a-block
iterator macro later)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
ac772b24a18bddc490dae171441795dedd85d7b2 26-Jun-2016 Eric Anholt <eric@anholt.net> vc4: Regularize instruction emit macros

ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did. This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
200b4e4bd5e87fea91193e3d1976b9cf0eabf8ba 03-Jun-2016 Eric Anholt <eric@anholt.net> vc4: Move SF removal to a separate peephole pass.

The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling. We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.

No change on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
eaa53f80d9da292ade219c609f8ac37f9a8ca0d7 26-Jun-2016 Eric Anholt <eric@anholt.net> vc4: Drop the dead QIR_PACK() macro.

This isn't used since we switched to using the dst.pack field instead of
custom instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
18260d05820eca971873407e939007c12600660c 17-May-2016 Eric Anholt <eric@anholt.net> vc4: Add support for vertex color clamping in the rasterizer.

This gets us precompile of vertex shaders at the state tracker level as
well.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a1f698881e13a4993e958815b79f8150d48e2739 06-May-2016 Eric Anholt <eric@anholt.net> vc4: Add support for loading immediate values in QIR.

This will be used for resetting the uniform stream in the presence of
branching, but may also be useful as an optimization to reduce how many
uniforms we have to copy out per draw call (in exchange for increasing
icache pressure).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
8e2d0843c02daf5280184f179ae8ed440ac90d7f 02-May-2016 Eric Anholt <eric@anholt.net> vc4: Add a small QIR validate pass.

This has caught a couple of bugs during loop development so far, and I
should probably have written it long ago.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
419fee92eef229314e28879a7b8a6a8dc3b4b549 02-May-2016 Eric Anholt <eric@anholt.net> vc4: When emitting an instruction to an existing temp, mark it non-SSA.

Prevents a bug in the later control-flow support series.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
226bd9294541f65c91cad44924ef68b6da18f2a2 22-Apr-2016 Eric Anholt <eric@anholt.net> vc4: Use NIR lowering for sRGB decode.

This should get us the same decode code generated, but with a lot less
custom code in the driver.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
84322b2f315d5dfdb15302ed6cbe5ed79d775d69 28-Apr-2016 Eric Anholt <eric@anholt.net> vc4: Remove the CSE pass.

It's not doing anything according to shader-db now that we're using NIR.
It would have had to be reworked significantly anyway, to handle control
flow.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
b145b731ab01937993e2bf7ecc072217932568ff 28-Apr-2016 Eric Anholt <eric@anholt.net> vc4: Emit only one FRAG_Z or FRAG_W QIR opcode.

We were generating piles of FRAG_W for interpolation, only to CSE them
away immediately. Since this is the only thing that CSE is doing for us
any more, just avoid making the CSE work necessary.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2402bb60955b56f915e49f6648eb6c6221fe0862 19-Jan-2016 Eric Anholt <eric@anholt.net> vc4: Remove unused "immediates" field

This was for TGSI, which we no longer have to deal with.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
30b818d5eb67c7427fbefb456c7bc2d876bf9eac 21-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Move FRAG_X/Y/REV_FLAG to a QFILE like VPM or TLB color writes.

This gives us one less set of special instruction generation cases, and
instead just the case for returning the correct register to read.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
f029932cac36859df5a6d04d1dd7343672ced83a 21-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Allow TLB Z/color/stencil writes from any ALU operation in QIR.

This lets us write the Z directly from the FTOI for computed Z, and may
let us coalesce color writes in the future.

No change in my shader-db, but clearly drops an instruction in piglit's
early-z test.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
44d7b8ad12df504058615901c7233c45e4f24a9f 21-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Add a helper function for the construction of qregs.

The separate declaration of the struct is not helping clarity, and I was
going to be writing a whole lot more of these in the upcoming patches.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
483c172989be74a992befce3c0a9058a82b35c80 21-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Drop the multi_instruction distinction for QIR instructions.

It wasn't correctly flagged everywhere, and QPU generation now handles the
only remaining case that was paying attention to it.

No change on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
99a759a4a3c29c283ae93612017d2f31c0ddbe73 08-Apr-2016 Eric Anholt <eric@anholt.net> vc4: Switch to using NIR_PASS macros.

This gets us better validation of our NIR transformations.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2b9f0dffe00bdc556436da02c099b8a50ecc4f49 16-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Move discard handling to the condition flag.

Now that the field exists in the instruction, we can make discards less
special. As a bonus, that means that we should be able to merge some more
.sf instructions together when we get around to that.

This causes some scheduling changes, as it allows tlb_color_reads to be
delayed past the discard condition setup. Since the tlb_color_read ends
up later, this may mean performance improvements, but I haven't tested.

total instructions in shared programs: 78114 -> 78035 (-0.10%)
instructions in affected programs: 1922 -> 1843 (-4.11%)
total estimated cycles in shared programs: 234318 -> 234329 (0.00%)
estimated cycles in affected programs: 8200 -> 8211 (0.13%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
e103b52aec773537d2821d8acc42ac9caa2a4b17 07-Mar-2016 Varad Gautam <varadgautam@gmail.com> vc4: Coalesce instructions using VPM reads into the VPM read.

This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.

shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs: 5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs: 5345 -> 4993 (-6.59%)

Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a39a8fbbaa129f4e52f2a3ad2747182e9a74d910 17-Jan-2016 Emil Velikov <emil.velikov@collabora.com> nir: move to compiler/

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
12519a972f53dba13289b0abebd558fd8506a539 19-Dec-2015 Eric Anholt <eric@anholt.net> vc4: Use NIR texture lowering for texture swizzling.

We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.

This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.

instructions in affected programs: 165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs: 461 -> 451 (-2.17%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
71db7d3dc577e48da3689fd66989ec3b0a069089 22-Dec-2015 Eric Anholt <eric@anholt.net> vc4: Replace the SSA-style SEL operators with conditional MOVs.

I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers. By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs: 39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
f1fb85e5440d8874997eea1df982cf02b6ca2ca2 19-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Do instruction scheduling on the QIR to hide texture fetch latency.

This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR. Texture
fetch can probably take as much as the rest of the cycles of the program,
so it's important to hide our other cycles during it (which is hard to do
after register allocation). Also, we can queue up multiple texture
requests before collecting the resulting samples, so that we keep the
texture unit busy more of the time.

High-settings openarena performance +2.35849% +/- 0.221154% (n=7). Also
about 2-3% on the multiarb demo. 8 piglit tests
(ext_framebuffer_multisample accuracy depthstencil) go from failing in
rendering to failing in register allocation, but hopefully I can fix that
up with some better register pressure handling here.

total instructions in shared programs: 87723 -> 88448 (0.83%)
instructions in affected programs: 78411 -> 79136 (0.92%)
total estimated cycles in shared programs: 276583 -> 246306 (-10.95%)
estimated cycles in affected programs: 265691 -> 235414 (-11.40%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
5989ef2b0feb40821a20768c7b4b196b3e793960 09-Dec-2015 Eric Anholt <eric@anholt.net> vc4: Add debugging of the estimated time to run the shader to shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
78b81be627734ea7fa50ea246c07b0d4a3a1638a 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Get rid of *_indirect variants of input/output load/store intrinsics

There is some special-casing needed in a competent back-end. However, they
can do their special-casing easily enough based on whether or not the
offset is a constant. In the mean time, having the *_indirect variants
adds special cases a number of places where they don't need to be and, in
general, only complicates things. To complicate matters, NIR had no way to
convdert an indirect load/store to a direct one in the case that the
indirect was a constant so we would still not really get what the back-ends
wanted. The best solution seems to be to get rid of the *_indirect
variants entirely.

This commit is a bunch of different changes squashed together:

- nir: Get rid of *_indirect variants of input/output load/store intrinsics
- nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect
- nir/lower_io: Get rid of load/store_foo_indirect
- i965/fs: Get rid of load/store_foo_indirect
- i965/vec4: Get rid of load/store_foo_indirect
- tgsi_to_nir: Get rid of load/store_foo_indirect
- ir3/nir: Use the new unified io intrinsics
- vc4: Do all uniform loads with byte offsets
- vc4/nir: Use the new unified io intrinsics
- vc4: Fix load_user_clip_plane crash
- vc4: add missing src for store outputs
- vc4: Fix state uniforms
- nir/lower_clip: Update to the new load/store intrinsics
- nir/lower_two_sided_color: Update to the new load intrinsic

NIR and i965 changes are

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

NIR indirect declarations and vc4 changes are

Reviewed-by: Eric Anholt <eric@anholt.net>

ir3 changes are

Reviewed-by: Rob Clark <robdclark@gmail.com>

NIR changes are

Acked-by: Rob Clark <robdclark@gmail.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
6b4dfd53ae9b4f86cda0377a4d67b79e9faf7cc8 23-Jun-2015 Eric Anholt <eric@anholt.net> vc4: Add support for texel fetches from MSAA resources.

This is the core of ARB_texture_multisample. Most of the piglit tests for
GL_ARB_texture_multisample require GL 3.0, but exposing support for this
lets us use the gallium blitter for multisample resolves. We can
sometimes multisample resolve using just the RCL, but that requires that
the blit is 1:1, unflipped, and aligned to tile boundaries.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a97b40dca4949b5b8b3320e76768e54f430c9e78 23-Jun-2015 Eric Anholt <eric@anholt.net> vc4: Add support for multisample framebuffer operations.

This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVAGE.

I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
74c4b3b80cc4246fd1eb503d97edb3d293eef5de 21-Nov-2015 Eric Anholt <eric@anholt.net> vc4: Add support for storing sample mask.

From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a664233042e1ad343184a0c237c3bd7ac5010779 21-Nov-2015 Eric Anholt <eric@anholt.net> vc4: Add support for loading sample mask.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a4bf28178f064082d3b818d2cd48abf9075cc459 11-Nov-2015 Eric Anholt <eric@anholt.net> vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.

It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
01ca4f207efac555ff5f729dce1687a68ba65400 26-Oct-2015 Eric Anholt <eric@anholt.net> vc4: Rewrite the pack instructions as a MOV with a dst pack flag

Another step in reducing the special-casing of instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
99a9a5a345fab8bbf36ab4e42581f8ee04a59a63 25-Oct-2015 Eric Anholt <eric@anholt.net> vc4: Switch the unpack ops to being unpack flags on a mov.

This paves the way for copy propagating our unpacks. We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs: 19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
652a864b257650e730ecec9e5882d765840a02e1 26-Oct-2015 Eric Anholt <eric@anholt.net> vc4: Fix up the test for whether the unpack can be from r4.

We can do 16a/16b from float as well. No difference on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
f09ed63f4342846e361242233162799140674d5f 25-Oct-2015 Eric Anholt <eric@anholt.net> vc4: Fix the test for skipping raw MOVs.

I don't know what previous test was trying to do, but it dates back to the
first add of vc4_qpu_emit.c. No change to shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
70b06fb5d55d639fd74596a2ff6971cb57c030ca 19-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Convert blending to being done in 4x8 unorm normally.

We can't do this all the time, because you want blending to be done in
linear space, and sRGB would lose too much precision being done in 4x8.
The win on instructions is pretty huge when you can, though.

total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs: 327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs: 15580 -> 12766 (-18.06%)

Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
8e701fda499af0387f5c72f7bc14510182738647 09-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Add QIR/QPU support for the 8-bit vector instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
fb064901e9bd83a63d486f246b9ea943cd00f6cd 21-Oct-2015 Eric Anholt <eric@anholt.net> vc4: Use Rob's NIR-based user clip lowering.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
cfa980f49356eb2d94178f8cc9d67d01b4e3d695 09-Sep-2015 Eric Anholt <eric@anholt.net> vc4: convert from tgsi semantic/index to varying-slot

(originally part of previous patch, split out to separate patch by Rob)

v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
89b1b33f44bc6ce71109ac8668529c30b6d6d910 21-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Fold the 16-bit integer pack into the instructions generating it.

total instructions in shared programs: 97580 -> 96798 (-0.80%)
instructions in affected programs: 52826 -> 52044 (-1.48%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
4ae137534a8718db4611782dbfec773504b6e3be 19-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Make _dest variants of qir ALU helpers to provide an explicit dest.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
8b36d107fdd6f6b91556fcdc3498df16803d4181 19-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Pack the unorm-packing bits into a src MUL instruction when possible.

Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's
only used by the unorm packing and just stuff the pack bits into it.

total instructions in shared programs: 98136 -> 97974 (-0.17%)
instructions in affected programs: 4149 -> 3987 (-3.90%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
572a48366d9dfac6a7f9ee8f4d29832c496125e2 19-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Add a QIR helper for whether the op is a MUL type.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
98728ce0718e49864b872beb76fc3afbf341b38a 06-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Switch QPU_PACK_SCALED to be two non-SSA instructions.

total instructions in shared programs: 98159 -> 98136 (-0.02%)
instructions in affected programs: 12279 -> 12256 (-0.19%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
69ef08d303cdf153fe2432a7e40faccae5d62aab 06-Aug-2015 Eric Anholt <eric@anholt.net> vc4: Make the pack-to-unorm instructions be non-SSA.

This helps ensure that the register allocator doesn't force the later pack
operations to insert extra MOVs.

total instructions in shared programs: 98170 -> 98159 (-0.01%)
instructions in affected programs: 2134 -> 2123 (-0.52%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
bf3c50fba221f216e38d3f60f89161ced4c684c0 14-Apr-2015 Eric Anholt <eric@anholt.net> vc4: Move all of our fixed function fragment color handling to NIR.

This massively reduces our dependency on VC4-specific optimization passes.

shader-db:
total uniforms in shared programs: 32077 -> 32067 (-0.03%)
uniforms in affected programs: 149 -> 139 (-6.71%)
total instructions in shared programs: 98208 -> 98182 (-0.03%)
instructions in affected programs: 2154 -> 2128 (-1.21%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
38c6c0f5b499e2bcff2cc9607f67c0f1836f305b 31-Jul-2015 Eric Anholt <eric@anholt.net> vc4: Add a helper for making driver-specific NIR load_uniform for GL state

In order to move more of our lowering into NIR, we need the ability to
reference various pipeline state (like texture rectangle scaling factors
or blend colors), so we just set those up as a load_uniform with a big
offset to indicate that it's not within the shader's uniform storage and
is one of our state values.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
cc8fb2904673588d31b660dbfaf692615b5202dd 31-Jul-2015 Eric Anholt <eric@anholt.net> vc4: Make r4-writes implicitly move to a temp, and allocate temps to r4.

Previously, SFU values always moved to a temporary, and TLB color reads
and texture reads always lived in r4. Instead, we can have these results
just be normal temporaries, and the register allocator can leave the
values in r4 when they don't interfere with anything else using r4.

shader-db results:
total instructions in shared programs: 100809 -> 100040 (-0.76%)
instructions in affected programs: 42383 -> 41614 (-1.81%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
9b403c0756ecf806a8ff768bd73a4cbf42986bdb 31-Jul-2015 Eric Anholt <eric@anholt.net> vc4: Drop a dead prototype.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
13ddd48b97474c261ef2d7412629748d6d91f2ad 30-Jul-2015 Eric Anholt <eric@anholt.net> vc4: Move program keys to the header file.

I want to be able to inspect them from other files for lowering passes in
NIR.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
b85f6ae4b24ee50948f14a9effa982eb0b9b3681 30-Jul-2015 Eric Anholt <eric@anholt.net> vc4: Start adding a NIR-based output lowering pass.

For now, this just splits up store_output intrinsics to be scalars, and
drops unused outputs in the coordinate shader. My goal is to be able to
drop a bunch of my VC4-specific optimization by letting NIR handle it.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
0f69d59b1c8f5314c1abe18659b96adcfc51a0e5 24-Jun-2015 Eric Anholt <eric@anholt.net> vc4: Make a helper for TLB color writes, too.

We've done so for all the other QIR instruction generation in this file.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
78c773bb3646295e4a4f1fe7d6d10f05758ee48b 30-May-2015 Eric Anholt <eric@anholt.net> vc4: Convert from simple_list.h to list.h

list.h is a nicer and more familiar set of list functions/macros.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
73e2d4837d7e4611f31532ab0ccc14369341e0cb 30-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Convert to consuming NIR.

NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.

total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs: 62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs: 15494 -> 15240 (-1.64%)

v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in mater.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
I'm not committing)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
8c5dcdbccb68b73d2856d9c1faafadc536e682e3 30-Mar-2015 Eric Anholt <eric@anholt.net> vc4: Add a constant folding pass.

This cleans up some pointless operations generated by the in-driver mul24
lowering (commonly generated by making a vec4 index for a matrix in a
uniform array).

I could fill in other operations, but pretty much anything else ought to
be getting handled at the NIR level, I think.

total uniforms in shared programs: 13423 -> 13421 (-0.01%)
uniforms in affected programs: 346 -> 344 (-0.58%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
85316d059c899ac096331251de6b233229aa0b4f 19-Feb-2015 Eric Anholt <eric@anholt.net> vc4: Keep an array of pointers to instructions defining the temps around.

The optimization passes are always regenerating it and throwing it away,
but it's not hard to keep track of.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
877b48a531adc397493e508e509aba2918915349 19-Feb-2015 Eric Anholt <eric@anholt.net> vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h.

I may want them in optimization passes, and they're not really particular
to the program translation stage.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
14dc281c1332518b6144718e1fb3845abbe23ff7 19-Feb-2015 Eric Anholt <eric@anholt.net> vc4: Enforce one-uniform-per-instruction after optimization.

This lets us more intelligently decide which uniform values should be put
into temporaries, by choosing the most reused values to push to temps
first.

total uniforms in shared programs: 13457 -> 13433 (-0.18%)
uniforms in affected programs: 1524 -> 1500 (-1.57%)
total instructions in shared programs: 40198 -> 40019 (-0.45%)
instructions in affected programs: 6027 -> 5848 (-2.97%)

I noticed this opportunity because with the NIR work, some programs were
happening to make different uniform copy propagation choices that
significantly increased instruction counts.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
3f1e1287fd960966eee8b12a75c8a8f62e11cdd2 12-Feb-2015 Eric Anholt <eric@anholt.net> vc4: Make SF be a flag on the QIR instructions.

Right now the places that used to emit a mov.sf just put the SF on the
previous instruction when it generated the source of the SF value. Even
without optimization to push the sf up further (and kill thus potentially
kill more MOVs), this gets us:

total uniforms in shared programs: 13455 -> 13457 (0.01%)
uniforms in affected programs: 3 -> 5 (66.67%)
total instructions in shared programs: 40296 -> 40198 (-0.24%)
instructions in affected programs: 12595 -> 12497 (-0.78%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
d70eb3851753ed7b57c56e4a7fd538857e4385ce 14-Nov-2014 Eric Anholt <eric@anholt.net> gallium: Replace u_simple_list.h with util/simple_list.h

The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
772c47aefe96694c5f3fa354bd6792d137824700 11-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Move the tests for src needing to be an A register to vc4_qir.c.

I want it from another location.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a8e14c293b19a2d298f91f283d6b6839f36fb518 10-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Allow dead code elimination of VPM reads.

This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.

total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
b920ecf793bd419558a240014624add08774765d 10-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Cook up the draw-time VPM setup info during shader compile.

This will give the compiler the chance to dead-code eliminate unused VPM
reads. This is particularly a big deal in the CS where a bunch of vattrs
are just not going to be used.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
c772c92153fdcd4ba4920b7ef1745ce83b09603b 10-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Split two notions of instructions having side effects.

Some ops can't be DCEd, while some of the ops that are just important due
to the args they have can be.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a58ae83882b3ad3ecb53271f42cf1fd8f9c2907c 10-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Redo VPM reads as a read file.

This will let us do copy propagation of the VPM reads.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
92a0b0bd7099b15320faaccfd70b3c8dc877810e 09-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Pack VPM attr contents according to just the size of the attribute.

total instructions in shared programs: 40960 -> 39753 (-2.95%)
instructions in affected programs: 20871 -> 19664 (-5.78%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
72cb6619cb75a92901d372d687505a747a384571 09-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Restructure color packing as a series of channel replacements.

I'm using this in some WIP commits for doing blending in 8888 instead of
vec4. But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:

total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs: 4381 -> 4314 (-1.53%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
e06b0778f59980429fececb1aa0de0f0a3f23427 18-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Coalesce MOVs into VPM with the instructions generating the values.

total instructions in shared programs: 41168 -> 40976 (-0.47%)
instructions in affected programs: 18156 -> 17964 (-1.06%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
a871eff16cc18232ee03b372d75cb6f633213e14 18-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Redefine VPM writes as a (destination) QIR register file.

This will let me coalesce the VPM writes into the instructions generating
the values.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
e473fbe4690b5cbe3769042a4917f22559e2ba8d 10-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Add support for turning constant uniforms into small immediates.

Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.

total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)

In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch is I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
ff266483fb61fd69775daf5c931ca7a56a26f4ac 11-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Move follow_movs() to common QIR code.

I want this from other passes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
48a2154520351a22fc860efcdaa4329a51d29c8d 15-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Add support for 16-bit signed/unsigned norm/scaled vertex attrs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2142fd1f6f36ef9a384ef298fec02111dc826308 15-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Add support for 8-bit unnormalized vertex attrs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
8e678de761e755564ade2794dbf68280a4972b66 15-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Rename UNPACK_8* to UNPACK_8*_F.

There is an equivalent unpack function without conversion to float if you
use an integer operation instead.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
3fe4d8e1e39b47c9c5c4bfdd87300abd0c336a7e 26-Nov-2014 Eric Anholt <eric@anholt.net> vc4: Introduce scheduling of QPU instructions.

This doesn't reschedule much currently, just tries to fit things into the
regfile A/B write-versus-read slots (the cause of the improvements in
shader-db), and hide texture fetch latency by scheduling setup early and
results collection late (haven't performance tested it). This
infrastructure will be important for doing instruction pairing, though.

shader-db2 results:
total instructions in shared programs: 61874 -> 59583 (-3.70%)
instructions in affected programs: 50677 -> 48386 (-4.52%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
f87c7008958cdb095efa1cfb29ca8f3c9b9066e4 02-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Add support for ARL and indirect register access on TGSI_FILE_CONSTANT.

Fixes 14 ARB_vp tests (which had no lowering done), and should improve
performance of indirect uniform array access in GLSL.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
52824811b9c0a9bb78a40fcb43af00b315f612d0 24-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Allow dead code elimination of unused varyings.

total instructions in shared programs: 39022 -> 37341 (-4.31%)
instructions in affected programs: 26979 -> 25298 (-6.23%)
total uniforms in shared programs: 11242 -> 10523 (-6.40%)
uniforms in affected programs: 5836 -> 5117 (-12.32%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
5d32e263357e562779bfc0d2af712d4c7538a32b 22-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Add debug output to match shaderdb info to program dumps.

I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but
when debugging regressions, I want to match shaderdb output to shader
disassembly.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
201d4c0b2a6f7f0c1d59c4fd5cce4916fc48a2d2 15-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Add support for user clip plane and gl_ClipVertex.

Fixes about 15 piglit tests about interpolation and clipping.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
5d72a1c95662109b1338605da83329dd25e00859 13-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Match VS outputs to FS inputs.

If the VS doesn't output a value that the FS needs, we still need to read
the right contents for the remaining FS inputs, by emitting padding. And
if the VS outputs something the FS doesn't need, we shouldn't put it in
the VPM at all (so the code producing it can get DCEed).

Fixes 77 piglit tests.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
24d998056275f4fab9bf3e98c962d91245ef1b7b 02-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Add support for sampling from sRGB.

This isn't perfect -- the filtering is happening on the srgb values, and
we're decoding afterwards, which is not what you want. I think that's the
cause of some additional texwrap(GL_CLAMP, LINEAR) failures, though many
other texwrap tests on srgb start to pass since unfiltered values come out
correct.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
75f8e0bc2ae03154038c3f17fec1bcad699856e0 01-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Make the last static array in vc4_program.c dynamically sized.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
d7a0502a5440359d1cecd42e58bdb85c2d857824 01-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Add support for the FACE semantic.

Fixes glsl-fs-frontfacing.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
ae22f5aa14db0a42f4b6adafa11aa9f7bfd5d115 29-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for texture border color.

One spot in the docs says that it's stored at a miplevel just beyond the
last miplevel, which was scary. But really, you just load it as the R
coordinate (which conflicts with cubemaps, but you don't do border
clamping on cubes).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
64122b16ce74a3fb65269bab325c651c26ccd2d0 25-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Dump constant uniform values in VC4_DEBUG=qir.

Definitely helps when trying to understand and optimize a program.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
730267eb23b418637c78662a77de0a93af91be35 28-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for texture cube maps.

It's not passing some of the piglit tests, because it looks like at small
miplevels some contents from surrounding faces are getting filtered in at
the corners. It does get 7 new tests passing.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
7a85ebf6e211423c98bb045ad21026c5ffeaa9bb 29-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Stop trying to reuse temporaries that store uniform values.

Almost always, the MOV will get copy propagated out. Even if it doesn't,
it's probably better to just reload the uniform at next use (to reduce
register pressure) rather than try to save instruction count.

I was looking at this because in the presence of texturing (which calls
add_uniform() directly to get the uniform load forced into the
instruction) the c->uniform_contents indices don't match 1:1 with the
temporary qregs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
66b7bd60e01fd17a356bf26d69ea351261080586 24-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for point size setting.

This is the support for both the global and per-vertex modes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
8cd165051b52a9d70512fd138463aa165bea849a 24-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for alpha test.

Fixes most of piglit fbo-alphatest-formats (but not RGB565/332).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
45b104e0a228595142ed4bc62bbc8948100b9325 24-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for flat shading.

This is just the GL 1.1 flat shading of colors -- we don't need to support
TGSI constant interpolation bits, because we don't do GLSL 1.30.

Fixes 7 piglit tests.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2e48b286bf21501ac06832799a4b7957bb8ac893 23-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add support for 8-bit unorm/snorm vertex inputs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
dcd03e74768bb4ba55b5742250f3ed15771b6f66 19-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Use the same method as for FRAG_Z to handle fragcoord W.

I need to get the non-reciprocal version of W for interpolation, anyway.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
19589147ef660c0bf7fcc52ca82dfbbadf3a9a23 18-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for stencil operations.

While depth test state is passed through the fragment shader as sideband,
data, the stencil test state has to be set by the fragment shader itself.

Many tests are still failing, but this gets most of hiz/ passing.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
5e90ed79f670cc1c5c12c8b733d4591af0acb5ab 17-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for reordering the uniform stream after optimization.

This allows for introducing dead code eliminating of uniforms, copy
propagation of uniforms, and instruction rescheduling between instructions
that both read uniforms.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2264925f85f349f57773d46114806a148eae6394 16-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Add support for computed depth writes.

Fixes piglit glsl-1.10-fragdepth and early-z.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
aae4223fbd2d94f922339baa11ffefdb88896770 16-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Restructure depth input/output in fragment shaders.

The goal here is to have an argument for the depth write opcode so that I
can do computed depth. In the process, this makes the calculations that
will be emitted more obvious in the QIR.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
5638b87d4c72e0ed7bb4544885829f27ae3a91f5 15-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Dynamically allocate the TGSI-to-qreg arrays.

Fixes buffer overflows in some piglit tests (which are still failing to
register allocate anyway).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2147dd96813d1faee5c55e84b332355ad05f070a 15-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Fix memory leaks of struct qinst.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
d952a98c5322e64cb436bd8b0f0064441f37ac77 07-Sep-2014 Eric Anholt <eric@anholt.net> vc4: Expose r4 to register allocation.

We potentially need to be careful that use of a value stored in r4 isn't
copy-propagated (or something) across another r4 write. That doesn't
appear to happen currently, and this makes the dataflow more obvious. It
also opens up not unpacking the r4 value, which will be useful for depth
textures.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
4bca922878a4d433077d21d4918b1db71b3a15f7 13-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Merge qcompile and tgsi_to_qir

The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
55d2a1626219ac041ce05477827b592efa1c7b81 25-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add a CSE optimization pass.

Debugging a regression in discard support was just too full of duplicate
instructions, so I decided to remove them instead of re-analyzing each of
them as I dumped their outputs in simulation.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
80b27ca2cd8cd2bb2937baa441c43a396887cc03 24-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Switch to using native integers.

There were troubles with bools without using native integers
(st_glsl_to_tgsi seemed to think bool true was 1.0f sometimes, when as a
uniform it's stored as ~0), and since I've got native integers other than
divide, I might as well just support them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
874dfa8b2ecccf3c9a73453d7ccc6638363a59bd 25-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Expose compares at a lower level in QIR.

Before, we had some special opcodes like CMP and SNE that emitted multiple
instructions. Now, we reduce those operations significantly, giving
optimization more to look at for reducing redundant operations.

The downside is that QOP_SF is pretty special -- we're going to have to
track it separately when we're doing instruction scheduling, and we want
to peephole it into the instruction generating the destination write in
most cases (and not allocate the destination reg, probably. Unless it's
used for some other purpose, as well).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
e51e20c35ef89409494161010f86750366faef4c 21-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add support for fragment discards.

Fixes piglit glsl-fs-discard-01 and -03, and allows a lot of mesa demos to
start running. glsl-fs-discard-02 has a problem where the first tile is
not getting stored on the first render.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
b0a1e401a93b7b13870b936bc667b3fc15dba6d5 19-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Include stdio/stdlib in headers so I don't have to include it per file.

There are a few tools I want to have always available, and fprintf() and
abort() are among them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
50b4293eb3f26609c28c37898877d15e3c597702 18-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add a helper for QOP_R4_UNPACK_[ABCD].
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
7c65b714ed974248f09dcc0b4f020b2e2bf50227 14-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add support for blending.

Passes blendminmax and blendsquare. glean's more serious blendFunc fails
in simulation due to binner memory overflow (I really need to work around
that), and fbo-blending-formats fails due to Mesa refusing one of the
getter requests, even before it could fail due to the driver not actually
supporting different formats yet.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
e63598aecb5d1cc2a20b8db1ef85790e301f4241 05-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add support for gl_FragCoord.

This isn't passing all tests (glsl-fs-fragcoord-zw-ortho, for example),
but it does get a bunch more tests passing.

v2: Rebase on helpers change.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
99070c6daad72c96d517ffb18185c8b21b9d67f2 02-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add proper translation from Zc to Zs for vertex output.

This fixes the remaining failure in depthfunc.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
4160ac5ee41630a5c9fc4e1f3520f0fabf42cb14 01-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add support for depth clears and tests within a tile.

This doesn't load/store the Z contents across submits yet. It also
disables early Z, since it's going to require tracking of Z functions
across multiple state updates to track the early Z direction and whether
it can be used.

v2: Move the key setup to before the search for the key.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
857dcc09fa89aa676fdc95d318ecc4f7ad9cd70a 17-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add support for texture rectangles

v2: Rebase on helpers change.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
66c6c401279aa4152a24681f64d0e101aa004593 15-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add support for texturing (under simulation)

Only rgba8888 works, and only a single texture unit, and it's only under
simulation because I haven't built the kernel interface yet.

v2: Rebase on helpers.
v3: Fold in the don't-break-the-arm-build fix.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
bf542cd37286decbd9fc0c939007b82176e16a81 05-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Add support for the TGSI TRUNC opcode.

v2: Rebase on helpers.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
75afa64ef85aa33dfed8325aae767f8a55fd1840 17-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add support for multiple attributes
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
6ff2129d5842898eb87d2a457ee018fe66471f06 16-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add support for the lit opcode.

v2: Fix how it was using the X channel for the real work of the opcode,
instead of Y. Fixes glean's LIT test.
v3: Rebase on the helpers.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
ec9da314baf11bea57f315346091ae941ac4f662 04-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add copy propagation between temps.

We put in a bunch of extra MOVs for program outputs, and this can clean
those up. We should do uniforms, too, though.

v2: Fix missing flagging of progress when we actually optimize. Caught by
Aaron Watry.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
d9d1c14430aaeb5b22aa66b269ba288e3df24103 04-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add dead code elimination.

This cleans up a bunch of noise in the compiled coordinate shaders (since
we don't need the varying outputs), and also from writemasked instructions
with negated src operands.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
1d23d55ae97d07b6eb70a3e37a91ecb7de38d8d2 03-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add an initial pass of algebraic optimization.

There was a lot of extra noise in my piglit shader dumps because of silly
CMPs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
4c53087c67a266d81653459baa08ece5c1b418d8 16-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add support for CMP.

This took a couple of tries, and this is the squash of those attempts.

v2: Fix register file conflicts on the args in the
destination-is-accumulator case.
v3: Rebase on helper change and qir_inst4 change.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
eea1d36915cb97ee1a6eb6aeaf15dd5689f03148 04-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Make scheduling of NOPs a separate step from QIR -> QPU translation.

This should also be used as a way to pair QIR instructions into QPU
instructions later.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
c29392751180e21a2857cade8d0b4902cbe9d001 04-Jul-2014 Eric Anholt <eric@anholt.net> vc4: Add WIP support for varyings.

It doesn't do all the interpolation yet, but more tests can run now.

v2: Rebase on helpers.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
2e35981d4d625d951328ef5b8f95798112997fb3 28-Jun-2014 Eric Anholt <eric@anholt.net> vc4: Add support for SNE/SEQ/SGE/SLT.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
792d1c92df6f58f219eb8b77e668424cdcc9c9af 27-Jun-2014 Eric Anholt <eric@anholt.net> vc4: Switch to actually generating vertex and fragment shader code from TGSI.

This introduces an IR (QIR, for QPU IR) to do optimization on. It's a
scalar, SSA IR in general. It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.

Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.

v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
instructions into temporary values, and make qir_inst4 take the 4 args
separately instead of an array (all later callers wanted individual
args).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h