98d7e874888db57841272402df4ce808c6271c99 |
|
30-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Allow merging instructions with SF set where the other writes NOP. I'm not sure how I managed to write the SF merge code (7d8b79f398f18ed7bb48a74b1b82950e2f08abad) without allowing merges with NOPs. *Everything* we try to merge with will have a NOP on one or the other side of the instruction, and that's why that commit showed no benefit. total instructions in shared programs: 99347 -> 95128 (-4.25%) instructions in affected programs: 91906 -> 87687 (-4.59%) 3DMMES performance +2.57105% +/- 0.135276% (n=6,8)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
8ce65261789f085e657e6a487db93d38ee6bea63 |
|
25-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for MUL output rotation. Extracted from a patch by jonasarrow on github.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
074f1f3c0c2cd15213a62eb7f589423ece6391c8 |
|
25-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for the 2-bit LOAD_IMM variants. Extracted and fixed up from a patch by jonasarrow on github. This ended up not getting used for ddx/ddy, but seems like it might still be useful.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
c73aa0a09b996feff5aec42e0347b99b35b2f981 |
|
15-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add QPU support for generating BRANCH instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
8d3b92af21afb58b6a65e18bb05785d7aae72c27 |
|
30-Aug-2015 |
Boyan Ding <boyan.j.ding@gmail.com> |
vc4: Try to pair up instructions when only one of them has PM bit Instructions with difference in PM field can actually be paired up if the one without PM doesn't do packing/unpacking and non-NOP packing/unpacking operations from PM instruction aren't added to the other without PM. total instructions in shared programs: 48209 -> 47460 (-1.55%) instructions in affected programs: 11688 -> 10939 (-6.41%) Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
8f2fb68026d11feedb5d94cf17e719affe7b9423 |
|
11-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Don't swap the raddr on instructions doing unpacks. It would mean different unpacking behavior, since only the A file does unpack (with PM==0).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
5d5707707fb10712ba130c2dafb50e8fc92a2bcc |
|
11-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Don't let pairing happen with badly mismatched unpack flags. No difference on shader-db, but prevents definite regressions in the blending changes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
3820866e40d16f5d05319f0390956fb6e6407239 |
|
11-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Don't let pairing happen with badly mismatched pack flags. No difference on shader-db, but will become more important as I introduce more use of pack flags with the blending changes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
e473fbe4690b5cbe3769042a4917f22559e2ba8d |
|
10-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for turning constant uniforms into small immediates. Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch is I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
1f0e1060503e9e700c22a07fa050c47ef5257a40 |
|
16-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for turning add-based MOVs to muls for pairing. total instructions in shared programs: 43053 -> 40795 (-5.24%) instructions in affected programs: 37996 -> 35738 (-5.94%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
f96bd9673edd79e4304d8e60a4cb4a0119b12a28 |
|
16-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add a helper for changing a field in an instruction.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
8e18adea61d023ab0a04207fc1ff1e2e25b6ab99 |
|
16-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Fix the name of qpu_waddr_ignores_ws(). We're deciding about the WS bit, not PM.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
4da9e3d80556253a05179c398ffb1c3120fa3089 |
|
15-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Fix leak of a copy of the scheduled QPU instructions. They're copied into a vc4_bo after compiling is done.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
cff8c96a0d418f41e00aa97a13dc55e3ed213eb7 |
|
10-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Refuse to merge instructions involving 32-bit immediate loads. An immediate load overwrites the mul and add operations, so you can't merge with them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
befdff81423a1b6a05969dfde59bfa9c521c4621 |
|
05-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Try swapping the regfile A to B to pair instructions. total instructions in shared programs: 56995 -> 56087 (-1.59%) instructions in affected programs: 40503 -> 39595 (-2.24%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
7d8b79f398f18ed7bb48a74b1b82950e2f08abad |
|
05-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Allow pairing of some instructions that disagree about the WS bit. No difference on shader-db because we tend to have a lot of other conflicts going on as well (like RADDR_A disagreements)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
bd4057a5d74fd12222801c55ee98346af9c1095d |
|
03-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Refuse to merge two ops that both access shared functions. Avoids assertion failures in vc4_qpu_validate.c if we happen to find the right set of operations available.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
29c7cf2b2ba850cf467167548d53383e1338fd5c |
|
01-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Pair up QPU instructions when scheduling. We've got two mostly-independent operations in each QPU instruction, so try to pack two operations together. This is fairly naive (doesn't track read and write separately in instructions, doesn't convert ADD-based MOVs into MUL-based movs, doesn't reorder across uniform loads), but does show a decent improvement on shader-db-2. total instructions in shared programs: 59583 -> 57651 (-3.24%) instructions in affected programs: 47361 -> 45429 (-4.08%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
3fe4d8e1e39b47c9c5c4bfdd87300abd0c336a7e |
|
26-Nov-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Introduce scheduling of QPU instructions. This doesn't reschedule much currently, just tries to fit things into the regfile A/B write-versus-read slots (the cause of the improvements in shader-db), and hide texture fetch latency by scheduling setup early and results collection late (haven't performance tested it). This infrastructure will be important for doing instruction pairing, though. shader-db2 results: total instructions in shared programs: 61874 -> 59583 (-3.70%) instructions in affected programs: 50677 -> 48386 (-4.52%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
2d5784c8254b4a0e3e04dd0f1e46ab1eb85612dd |
|
27-Nov-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add another check for invalid TLB scoreboard handling. This was caught by an assertion in the simulator.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
84caf5a8617b99b6453fb66cb371a89ea2205dba |
|
03-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Set unused raddr fields to QPU_R_NOP. The simulator assertion fails if you have a write to a reg and then a read (for example, in the NOP side of an instruction), even if the read isn't used for anything. By setting unused raddrs to NOP, we avoid the problem (since only the phsyical registers are tracked).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
48af7426f295a02ea68c4b460e006c289b10192c |
|
03-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Abstract out the field-merging logic for instructions. I'm going to be doing the same logic for some more fields next.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
78d144f7de8cad42dfe588a667e105543f6b2e4b |
|
25-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Set the other WADDR in the qpu instruction helpers. Now you don't need to qpu_inst() your instruction with a NOP to get the other waddr set.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
54499a85fff415e5c627a44d27a3592b6633bd4b |
|
25-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Merge qpu_a_NOP() and qpu_m_NOP to a single qpu_NOP() helper. Now that qpu_inst() ignores the WADDR from the other half of the instruction, we can set both the ADD and MUL WADDRs in the NOP helper. Thanks to that, we also no longer need to qpu_inst(NOP, NOP).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
1a7035f386c4402b07e7a2073daf914f95bd0a02 |
|
25-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Ignore WADDRs from the other half of the instruction when merging. This allows setting the opposite-side WADDR to NOP (a non-zero value) in qpu_* helpers, so that we don't need to qpu_inst() merge them with NOPs all the time just to get the waddr set.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
0f894b2795b7a1a33e0d8233eeb2e8eba9c8dcc0 |
|
21-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Make some helpers for setting condition codes in instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
792d1c92df6f58f219eb8b77e668424cdcc9c9af |
|
27-Jun-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Switch to actually generating vertex and fragment shader code from TGSI. This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down. Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data. v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|
1850d0a1cbf044dc4d29b7a9ede2c634f667d853 |
|
19-Jun-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Initial skeleton driver import. This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different. v2: Rebase on gallium megadrivers changes. v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change. v4: Rely on simpenrose actually being installed when building for simulation. v5: Add more header duplicate-include guards. v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qpu.c
|