History log of /external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
Revision Date Author Comments
98d7e874888db57841272402df4ce808c6271c99 30-Nov-2016 Eric Anholt <eric@anholt.net> vc4: Allow merging instructions with SF set where the other writes NOP.

I'm not sure how I managed to write the SF merge code
(7d8b79f398f18ed7bb48a74b1b82950e2f08abad) without allowing merges with
NOPs. *Everything* we try to merge with will have a NOP on one or the
other side of the instruction, and that's why that commit showed no
benefit.

total instructions in shared programs: 99347 -> 95128 (-4.25%)
instructions in affected programs: 91906 -> 87687 (-4.59%)

3DMMES performance +2.57105% +/- 0.135276% (n=6,8)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
8ce65261789f085e657e6a487db93d38ee6bea63 25-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Add support for MUL output rotation.

Extracted from a patch by jonasarrow on github.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
074f1f3c0c2cd15213a62eb7f589423ece6391c8 25-Aug-2016 Eric Anholt <eric@anholt.net> vc4: Add support for the 2-bit LOAD_IMM variants.

Extracted and fixed up from a patch by jonasarrow on github. This ended
up not getting used for ddx/ddy, but seems like it might still be useful.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
c73aa0a09b996feff5aec42e0347b99b35b2f981 15-Mar-2016 Eric Anholt <eric@anholt.net> vc4: Add QPU support for generating BRANCH instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
8d3b92af21afb58b6a65e18bb05785d7aae72c27 30-Aug-2015 Boyan Ding <boyan.j.ding@gmail.com> vc4: Try to pair up instructions when only one of them has PM bit

Instructions with difference in PM field can actually be paired up if
the one without PM doesn't do packing/unpacking and non-NOP
packing/unpacking operations from PM instruction aren't added to the
other without PM.

total instructions in shared programs: 48209 -> 47460 (-1.55%)
instructions in affected programs: 11688 -> 10939 (-6.41%)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
8f2fb68026d11feedb5d94cf17e719affe7b9423 11-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Don't swap the raddr on instructions doing unpacks.

It would mean different unpacking behavior, since only the A file does
unpack (with PM==0).
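
The rule above can be sketched with a simplified model. This is an illustration only, not the driver's code: `struct sketch_read`, its fields, and `sketch_can_swap_a_to_b()` are hypothetical names, and the real check operates on packed instruction bits.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one instruction's register read. */
struct sketch_read {
    bool reads_file_a; /* operand currently comes from regfile A */
    bool pm;           /* PM bit of the instruction */
    int  unpack;       /* 0 = no unpack, non-zero = some unpack mode */
};

/* May the operand be moved from regfile A to regfile B to ease pairing? */
static bool
sketch_can_swap_a_to_b(const struct sketch_read *r)
{
    if (!r->reads_file_a)
        return false;

    /* With PM == 0 the unpack applies to regfile A reads only, so moving
     * the read to file B would silently change what the instruction
     * computes. */
    if (!r->pm && r->unpack != 0)
        return false;

    return true;
}
```

A stricter variant that refuses the swap whenever any unpack is present would also be safe; the sketch follows the wording of the message above.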
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
5d5707707fb10712ba130c2dafb50e8fc92a2bcc 11-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Don't let pairing happen with badly mismatched unpack flags.

No difference on shader-db, but prevents definite regressions in the
blending changes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
3820866e40d16f5d05319f0390956fb6e6407239 11-Jan-2015 Eric Anholt <eric@anholt.net> vc4: Don't let pairing happen with badly mismatched pack flags.

No difference on shader-db, but will become more important as I introduce
more use of pack flags with the blending changes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
e473fbe4690b5cbe3769042a4917f22559e2ba8d 10-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Add support for turning constant uniforms into small immediates.

Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.

total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)

In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
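
As a rough illustration of which constants qualify, the sketch below maps a small integer to a small-immediate code. It is not the driver's code: the function name is hypothetical, and the encoding shown (0..15 as themselves, -16..-1 as 16..31) is an assumption based on a reading of the hardware documentation that should be checked against it. The floating-point power-of-two codes are left out.

```c
#include <stdint.h>

/* Returns the assumed small-immediate code for v, or -1 when v cannot be
 * expressed as an integer small immediate and has to stay a uniform. */
static int
sketch_small_imm_for_int(int32_t v)
{
    if (v >= 0 && v <= 15)
        return v;          /* assumed: 0..15 encode themselves */
    if (v >= -16 && v <= -1)
        return v + 32;     /* assumed: -16..-1 map to codes 16..31 */
    return -1;             /* not representable in this sketch */
}
```

A caller would fall back to the ordinary uniform load whenever the function returns -1.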
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
1f0e1060503e9e700c22a07fa050c47ef5257a40 16-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Add support for turning add-based MOVs to muls for pairing.

total instructions in shared programs: 43053 -> 40795 (-5.24%)
instructions in affected programs: 37996 -> 35738 (-5.94%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
f96bd9673edd79e4304d8e60a4cb4a0119b12a28 16-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Add a helper for changing a field in an instruction.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
8e18adea61d023ab0a04207fc1ff1e2e25b6ab99 16-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Fix the name of qpu_waddr_ignores_ws().

We're deciding about the WS bit, not PM.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
4da9e3d80556253a05179c398ffb1c3120fa3089 15-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Fix leak of a copy of the scheduled QPU instructions.

They're copied into a vc4_bo after compiling is done.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
cff8c96a0d418f41e00aa97a13dc55e3ed213eb7 10-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Refuse to merge instructions involving 32-bit immediate loads.

An immediate load overwrites the mul and add operations, so you can't
merge with them.
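
The restriction can be pictured as follows. This is a sketch under assumptions, not the driver's implementation: it assumes a 64-bit instruction word whose top four bits carry the signal and whose low 32 bits hold either the ALU opcode and operand fields or the immediate value. The names `SKETCH_SIG_SHIFT`, `SKETCH_SIG_LOAD_IMM` (and its value), `sketch_is_load_imm()`, and `sketch_can_merge()` are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SIG_SHIFT    60   /* assumed position of the signal field */
#define SKETCH_SIG_LOAD_IMM 14u  /* assumed signal value for a 32-bit load */

static bool
sketch_is_load_imm(uint64_t inst)
{
    return (inst >> SKETCH_SIG_SHIFT) == SKETCH_SIG_LOAD_IMM;
}

/* An immediate load reuses the bits that would otherwise describe the ADD
 * and MUL operations, so there is no room left for another operation. */
static bool
sketch_can_merge(uint64_t a, uint64_t b)
{
    return !sketch_is_load_imm(a) && !sketch_is_load_imm(b);
}
```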
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
befdff81423a1b6a05969dfde59bfa9c521c4621 05-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Try swapping the regfile A to B to pair instructions.

total instructions in shared programs: 56995 -> 56087 (-1.59%)
instructions in affected programs: 40503 -> 39595 (-2.24%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
7d8b79f398f18ed7bb48a74b1b82950e2f08abad 05-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Allow pairing of some instructions that disagree about the WS bit.

No difference on shader-db because we tend to have a lot of other
conflicts going on as well (like RADDR_A disagreements)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
bd4057a5d74fd12222801c55ee98346af9c1095d 03-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Refuse to merge two ops that both access shared functions.

Avoids assertion failures in vc4_qpu_validate.c if we happen to find the
right set of operations available.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
29c7cf2b2ba850cf467167548d53383e1338fd5c 01-Dec-2014 Eric Anholt <eric@anholt.net> vc4: Pair up QPU instructions when scheduling.

We've got two mostly-independent operations in each QPU instruction, so
try to pack two operations together. This is fairly naive (doesn't track
read and write separately in instructions, doesn't convert ADD-based MOVs
into MUL-based movs, doesn't reorder across uniform loads), but does show
a decent improvement on shader-db-2.

total instructions in shared programs: 59583 -> 57651 (-3.24%)
instructions in affected programs: 47361 -> 45429 (-4.08%)
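
The pairing idea can be illustrated with a simplified model. This is only a sketch, not the driver's code: `struct sketch_inst`, `SKETCH_OP_NOP`, and `sketch_try_pair()` are hypothetical names, and real QPU instructions carry many more fields (read addresses, signals, condition codes) that must also agree before two instructions can share a slot.

```c
#include <stdbool.h>

enum { SKETCH_OP_NOP = 0 };

/* Hypothetical model: every instruction has one ADD slot and one MUL slot. */
struct sketch_inst {
    int add_op; /* SKETCH_OP_NOP when the ADD ALU is idle */
    int mul_op; /* SKETCH_OP_NOP when the MUL ALU is idle */
};

/* Try to pack a and b into one instruction. Succeeds only when the two
 * inputs do not compete for the same ALU. */
static bool
sketch_try_pair(struct sketch_inst a, struct sketch_inst b,
                struct sketch_inst *out)
{
    if (a.add_op != SKETCH_OP_NOP && b.add_op != SKETCH_OP_NOP)
        return false; /* both need the ADD ALU */
    if (a.mul_op != SKETCH_OP_NOP && b.mul_op != SKETCH_OP_NOP)
        return false; /* both need the MUL ALU */

    out->add_op = (a.add_op != SKETCH_OP_NOP) ? a.add_op : b.add_op;
    out->mul_op = (a.mul_op != SKETCH_OP_NOP) ? a.mul_op : b.mul_op;
    return true;
}
```

In this model an ADD-only instruction pairs with a MUL-only one, while two ADD-only instructions do not, which is why converting ADD-based MOVs into MUL-based ones creates more pairing opportunities.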
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
3fe4d8e1e39b47c9c5c4bfdd87300abd0c336a7e 26-Nov-2014 Eric Anholt <eric@anholt.net> vc4: Introduce scheduling of QPU instructions.

This doesn't reschedule much currently, just tries to fit things into the
regfile A/B write-versus-read slots (the cause of the improvements in
shader-db), and hide texture fetch latency by scheduling setup early and
results collection late (haven't performance tested it). This
infrastructure will be important for doing instruction pairing, though.

shader-db2 results:
total instructions in shared programs: 61874 -> 59583 (-3.70%)
instructions in affected programs: 50677 -> 48386 (-4.52%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
2d5784c8254b4a0e3e04dd0f1e46ab1eb85612dd 27-Nov-2014 Eric Anholt <eric@anholt.net> vc4: Add another check for invalid TLB scoreboard handling.

This was caught by an assertion in the simulator.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
84caf5a8617b99b6453fb66cb371a89ea2205dba 03-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Set unused raddr fields to QPU_R_NOP.

The simulator assertion fails if you have a write to a reg and then a read
(for example, in the NOP side of an instruction), even if the read isn't
used for anything. By setting unused raddrs to NOP, we avoid the problem
(since only the physical registers are tracked).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
48af7426f295a02ea68c4b460e006c289b10192c 03-Oct-2014 Eric Anholt <eric@anholt.net> vc4: Abstract out the field-merging logic for instructions.

I'm going to be doing the same logic for some more fields next.
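
One plausible shape for such a shared helper is sketched below. It is an assumption for illustration, not the code from this commit: `sketch_merge_field()` and its parameters are hypothetical. The idea is that a field holding a designated "ignore" value (for example a NOP address) yields to the other instruction's value, and two real values must match exactly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Merge one bitfield (selected by mask) of instructions a and b into
 * *merged. ignore is the "don't care" value, already shifted into the
 * field's position. Returns false when both sides hold different real
 * values, leaving *merged untouched. */
static bool
sketch_merge_field(uint64_t *merged, uint64_t a, uint64_t b,
                   uint64_t mask, uint64_t ignore)
{
    uint64_t fa = a & mask;
    uint64_t fb = b & mask;

    if (fa == ignore)
        *merged = (*merged & ~mask) | fb; /* a does not care: take b */
    else if (fb == ignore || fa == fb)
        *merged = (*merged & ~mask) | fa; /* b does not care, or both agree */
    else
        return false;                     /* genuine conflict */

    return true;
}
```

Calling the helper once per field keeps the per-field policy (which value counts as "ignore") in one place.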
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
78d144f7de8cad42dfe588a667e105543f6b2e4b 25-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Set the other WADDR in the qpu instruction helpers.

Now you don't need to qpu_inst() your instruction with a NOP to get the
other waddr set.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
54499a85fff415e5c627a44d27a3592b6633bd4b 25-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Merge qpu_a_NOP() and qpu_m_NOP() to a single qpu_NOP() helper.

Now that qpu_inst() ignores the WADDR from the other half of the
instruction, we can set both the ADD and MUL WADDRs in the NOP helper.
Thanks to that, we also no longer need to qpu_inst(NOP, NOP).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
1a7035f386c4402b07e7a2073daf914f95bd0a02 25-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Ignore WADDRs from the other half of the instruction when merging.

This allows setting the opposite-side WADDR to NOP (a non-zero value) in
qpu_* helpers, so that we don't need to qpu_inst() merge them with NOPs
all the time just to get the waddr set.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
0f894b2795b7a1a33e0d8233eeb2e8eba9c8dcc0 21-Aug-2014 Eric Anholt <eric@anholt.net> vc4: Make some helpers for setting condition codes in instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
792d1c92df6f58f219eb8b77e668424cdcc9c9af 27-Jun-2014 Eric Anholt <eric@anholt.net> vc4: Switch to actually generating vertex and fragment shader code from TGSI.

This introduces an IR (QIR, for QPU IR) to do optimization on. It's a
scalar, SSA IR in general. It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.

Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.

v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
instructions into temporary values, and make qir_inst4 take the 4 args
separately instead of an array (all later callers wanted individual
args).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
1850d0a1cbf044dc4d29b7a9ede2c634f667d853 19-Jun-2014 Eric Anholt <eric@anholt.net> vc4: Initial skeleton driver import.

This mostly just takes every draw call and turns it into a sequence of
commands that clear the FBO and draw a single shaded triangle to it,
regardless of the actual input vertices or shaders. I copied the initial
driver skeleton mostly from freedreno, and I've preserved Rob Clark's
copyright for those. I also based my initial hardcoded shaders and
command lists on Scott Mansell (phire)'s "hackdriver" project, though the
bit patterns of the shaders emitted end up being different.

v2: Rebase on gallium megadrivers changes.
v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
v4: Rely on simpenrose actually being installed when building for
simulation.
v5: Add more header duplicate-include guards.
v6: Apply Emil's review (protection against vc4 sim and ilo at the same
time, and dropping the dricommon drm bits) and fix a copyright header
(thanks, Roland)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c