ce0eebc935b5fb91b29ca656f79bdfff39beb8f8 |
|
07-Feb-2017 |
Eric Anholt <eric@anholt.net> |
vc4: Avoid emitting small immediates for UBO indirect load address guards. The kernel will reject our shader if we emit one here, and having 4, 8, or 12 as the top end of our UBO clamp rare is enough that it's not worth making the kernel let us. Fixes piglit fs-const-array-of-struct and fs-const-array-of-struct-of-array since recent GLSL linking changes made us get this as an indirect load of a uniform, instead of a tempoary. Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org> (cherry picked from commit b2309393039b2ec0cc00a8e6fd828c60c4ef1e11)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
4690a93b123a64f8730a870a336ae9756d11fd18 |
|
15-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs. This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy. Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16) total instructions in shared programs: 99810 -> 99059 (-0.75%) instructions in affected programs: 10705 -> 9954 (-7.02%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a025983dd9cfcba8a452205efbc5c0be8ff3da74 |
|
23-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Make qir_for_each_inst_inorder() safe against removal. The dead code elimination wants it to be safe, and I actually got segfaults due to it being unsafe with the new coalescing pass.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
27544ea8d330309a7f1604bece6d2fcb4e9a8ae3 |
|
15-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Split optimizing VPM writes from VPM reads. The VPM write logic will be basically the same as the texture coordinate write logic we need, and it's not really related to the VPM read logic other than the reuse of the use_count array.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
d4c20e82ae34b105fb2d06c8c412656aba2ca1b9 |
|
15-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst. For now we're still just generating MOVs, but this will let us fold into other ops in the future. No difference on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
314f0c57e4c00b0a5cb544fa43e356c1069acd8f |
|
15-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *). Every caller was dereffing the qinst, and this will let us make the number of sources vary depending on the destination of the qinst so that we can have general ALU ops that store to tex_[strb] and get an implicit uniform.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
51087327f2ba929739719b2ae243d8c69d31346f |
|
15-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Replace the qinst src[] with a fixed-size array. This may have made a tiny bit of sense when we had one 4-arg inst per shader, but if we only ever put 2 things in, having a pointer to 2 things almost every instruction is pointless indirection.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a220f1b5a9beaba146096971354ae37c6f75d4ef |
|
15-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Remove qir_inst4(). This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
414dbb2d5c48b7e9dc0dc8b14583f91415ca3960 |
|
22-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Don't conditionalize the src1 mov of qir_SEL(). My thought in having both arguments conditionally moved was that it should theoretically save some power by not doing work in those channels. However, it ends up costing us instructions because we can't register-coalesce the first of the MOVs, and it also introduces extra scheduling dependencies. The instruction cost would swamp whatever power benefit I was hoping for. shader-db results: total instructions in shared programs: 100548 -> 99741 (-0.80%) instructions in affected programs: 42450 -> 41643 (-1.90%) With obvious outliers removed (I had an X11 emacs running over the network in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps improvement (n=18, 30).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
ace0d810e56a1e2978fc3ac237158918ebe2a23c |
|
11-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Flag the last thread switch in the program as the last. We don't allow the last thread switch to be inside control flow, to be sure that we hit the last state exactly once. If the last texturing was in control flow, fall back to single threaded.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
755037173d19b65777a97f55455c1f64bf618264 |
|
11-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for register allocation for threaded shaders. We have two major requirements: Make sure that only the bottom half of the physical reg space is used, and make sure that none of our values are live in an accumulator across a switch.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
4f527f12604269f15704bbd14a4962766afdfb9a |
|
11-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add a thread switch QIR instruction. This will eventually be generated at the QIR level, so that vc4_qir_schedule.c can arrange the separation of tex_strb from tex_result correctly. It will also be important so that register allocation set the register classes appropriately for values that are live across the switch.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
4d019bd703e7c20d56d5b858577607115b4926a3 |
|
07-Nov-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Don't abort when a shader compile fails. It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails. Cc: <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
d4ae5ca823227214dd1f536e5f4058bede20b2dd |
|
05-Oct-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Fix live intervals analysis for screening defs in if statements. If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexcuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
60bed14d0fdc3a05b6251b4ffc6013b5d3ca3e0f |
|
26-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Handle discards while in control flow. I missed this while adding loop support because the discard test inside a loop was crashing before, anyway. Fixes piglit glsl-fs-discard-04.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
8ce65261789f085e657e6a487db93d38ee6bea63 |
|
25-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for MUL output rotation. Extracted from a patch by jonasarrow on github.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
074f1f3c0c2cd15213a62eb7f589423ece6391c8 |
|
25-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for the 2-bit LOAD_IMM variants. Extracted and fixed up from a patch by jonasarrow on github. This ended up not getting used for ddx/ddy, but seems like it might still be useful.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
31da39ddc92e780dc539bf34d2de7f82fc65fa86 |
|
25-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add a QIR value for the QPU element register. This will be used in the ddx/ddy support for "Am I the top half?" or "Am I the left half?" checks.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
87a88f2daabfe14b12d447b3d96b9f8938c5cf03 |
|
22-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Fix GPU hangs with >16 varying values. Fixes glsl-routing in piglit and hangs in glbenchmark 2.0.2.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
e8378fee0c20ecd26451c079c725420077606cb9 |
|
27-Jul-2016 |
Eric Anholt <eric@anholt.net> |
nir: Define system values for vc4's blending-lowering arguments. In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I was calling the "state uniforms" that I was putting into the NIR fighting with its other lowering passes. Instead of using magic uniform base numbers in the backend, follow the lead of load_user_clip_plane and just define system values for them. v2: Fix unintended change to channel_num, drop unspecified const_index value on blend_const_color_r_float. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2350569a78c60d32e3b751b4386ea7e6d7e2ebe9 |
|
03-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far. We don't want to bake the whole array into the FS key, because of the hashing overhead. But we can keep a set of the arrays seen, and use a pointer to the copy in as the array's proxy. Between this and the previous patch, gl-1.0-blend-func now passes on hardware, where previously it was filling the 256MB CMA area with shaders and OOMing. Drops 712 shaders from shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
bc1fc9c98539f38f5a29b314d4a993a2e2f7ca0a |
|
03-Aug-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Avoid generating a custom shader per level in glGenerateMipmaps(). We were baking in the LOD of the source level to each shader. Instead, pass it in as a uniform -- this requires storing it to a temp register, but that's better than compiling a ton of separate shaders: total instructions in shared programs: 115032 -> 115036 (0.00%) instructions in affected programs: 96 -> 100 (4.17%) LOST: 572
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
3bcd0f1912a60cc9d3813923d18d29465e41ff56 |
|
15-Jul-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel. To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary miptree. However, if a single level is being selected, we can use the existing miptree and force all the sampling to be from that particular level. This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses base levels in the blit implementation in gallium. Improves "glmark2 -b terrain" from 2 fps to 3 (perhaps some more precision would be useful?), and cuts its CPU usage during the benchmarking from ~30% to ~10% (total CPU time from 8.8s to 7.6s).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
9194473dd260fe72042807a97be0072c6f0537da |
|
06-May-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Emit resets of the uniform stream at the starts of blocks. If a block might be entered from multiple locations, then the uniform stream will (probably) be at different points, and we need to make sure that it's pointing where we expect it to be. The kernel also enforces that any block reading a uniform resets uniforms, to prevent reading outside of the uniform stream by using looping.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
44df061aaad96fc5db630ae69fb2fe2a03bb5659 |
|
27-Apr-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for scheduling of branch instructions. For now we don't fill the delay slots, and instead just drop in NOPs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a59da513d3229c883809ac2088c9612abcec1470 |
|
16-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Move the QPU instructions to schedule into each block. We'll want to schedule them individually, to handle delay slots.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
420845acb2207cb9d903e67b66deaf08637ac3b2 |
|
02-May-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for NIR loops and break/continue.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
f505f66cd5a266dc70ad12e2b015e6c631651aec |
|
28-Apr-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for storing to NIR registers in a non-SSA fashion. Previously, there were occasionally NIR registers in our programs, but they were always actually used SSA-only. Now that we're trying to support control flow, we need to actually conditionally move to registers based on whether channels are active or not.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
05bcd9dd960d5658801ab35d429ba9778f67cad0 |
|
15-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Define a QIR branch instruction This uses the branch condition code in inst->cond to jump to either successor[0] (condition matches) or successor[0] (condition doesn't match).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
89918c1e74e454af119e7ae23f3ed66fc26abc4b |
|
10-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Implement live intervals using a CFG. Right now our CFG is always a trivial single basic block, but that will change when enable loops.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
6c1f834a237540c344fa794d60501a69bf066fb5 |
|
09-Jul-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Create a basic block structure and move the instructions into it. The optimization passes and scheduling aren't actually ready for multiple blocks with control flow yet (as seen by the "cur_block" references in them instead of iterating over blocks), but this creates the structures necessary for converting them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
d3cdbf6fd817ae5e7a8a72bcc3f43cc1b04a709b |
|
09-Jul-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places. We have the prior list_foreach() all over the code, but I need to move where instructions live as part of adding support for control flow. Start by just converting to a helper iterator macro. (The simpler "qir_for_each_inst()" will be used for the for-each-inst-in-a-block iterator macro later)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
ac772b24a18bddc490dae171441795dedd85d7b2 |
|
26-Jun-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Regularize instruction emit macros ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way ALU1 did. This should make the ALU[012] macros much clearer, by moving most of their contents to vc4_qir.c
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
200b4e4bd5e87fea91193e3d1976b9cf0eabf8ba |
|
03-Jun-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Move SF removal to a separate peephole pass. The DCE pass is going to change significantly to handle control flow, while we don't really need to change it for the SF handling. We also need to add some more SF peephole optimization for SF updates generated by control flow support. No change on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
eaa53f80d9da292ade219c609f8ac37f9a8ca0d7 |
|
26-Jun-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Drop the dead QIR_PACK() macro. This isn't used since we switched to using the dst.pack field instead of custom instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
18260d05820eca971873407e939007c12600660c |
|
17-May-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for vertex color clamping in the rasterizer. This gets us precompile of vertex shaders at the state tracker level as well.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a1f698881e13a4993e958815b79f8150d48e2739 |
|
06-May-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for loading immediate values in QIR. This will be used for resetting the uniform stream in the presence of branching, but may also be useful as an optimization to reduce how many uniforms we have to copy out per draw call (in exchange for increasing icache pressure).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
8e2d0843c02daf5280184f179ae8ed440ac90d7f |
|
02-May-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add a small QIR validate pass. This has caught a couple of bugs during loop development so far, and I should probably have written it long ago.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
419fee92eef229314e28879a7b8a6a8dc3b4b549 |
|
02-May-2016 |
Eric Anholt <eric@anholt.net> |
vc4: When emitting an instruction to an existing temp, mark it non-SSA. Prevents a bug in the later control-flow support series.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
226bd9294541f65c91cad44924ef68b6da18f2a2 |
|
22-Apr-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Use NIR lowering for sRGB decode. This should get us the same decode code generated, but with a lot less custom code in the driver.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
84322b2f315d5dfdb15302ed6cbe5ed79d775d69 |
|
28-Apr-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Remove the CSE pass. It's not doing anything according to shader-db now that we're using NIR. It would have had to be reworked significantly anyway, to handle control flow.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
b145b731ab01937993e2bf7ecc072217932568ff |
|
28-Apr-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Emit only one FRAG_Z or FRAG_W QIR opcode. We were generating piles of FRAG_W for interpolation, only to CSE them away immediately. Since this is the only thing that CSE is doing for us any more, just avoid making the CSE work necessary.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2402bb60955b56f915e49f6648eb6c6221fe0862 |
|
19-Jan-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Remove unused "immediates" field This was for TGSI, which we no longer have to deal with.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
30b818d5eb67c7427fbefb456c7bc2d876bf9eac |
|
21-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Move FRAG_X/Y/REV_FLAG to a QFILE like VPM or TLB color writes. This gives us one less set of special instruction generation cases, and instead just the case for returning the correct register to read.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
f029932cac36859df5a6d04d1dd7343672ced83a |
|
21-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Allow TLB Z/color/stencil writes from any ALU operation in QIR. This lets us write the Z directly from the FTOI for computed Z, and may let us coalesce color writes in the future. No change in my shader-db, but clearly drops an instruction in piglit's early-z test.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
44d7b8ad12df504058615901c7233c45e4f24a9f |
|
21-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Add a helper function for the construction of qregs. The separate declaration of the struct is not helping clarity, and I was going to be writing a whole lot more of these in the upcoming patches.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
483c172989be74a992befce3c0a9058a82b35c80 |
|
21-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Drop the multi_instruction distinction for QIR instructions. It wasn't correctly flagged everywhere, and QPU generation now handles the only remaining case that was paying attention to it. No change on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
99a759a4a3c29c283ae93612017d2f31c0ddbe73 |
|
08-Apr-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Switch to using NIR_PASS macros. This gets us better validation of our NIR transformations.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2b9f0dffe00bdc556436da02c099b8a50ecc4f49 |
|
16-Mar-2016 |
Eric Anholt <eric@anholt.net> |
vc4: Move discard handling to the condition flag. Now that the field exists in the instruction, we can make discards less special. As a bonus, that means that we should be able to merge some more .sf instructions together when we get around to that. This causes some scheduling changes, as it allows tlb_color_reads to be delayed past the discard condition setup. Since the tlb_color_read ends up later, this may mean performance improvements, but I haven't tested. total instructions in shared programs: 78114 -> 78035 (-0.10%) instructions in affected programs: 1922 -> 1843 (-4.11%) total estimated cycles in shared programs: 234318 -> 234329 (0.00%) estimated cycles in affected programs: 8200 -> 8211 (0.13%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
e103b52aec773537d2821d8acc42ac9caa2a4b17 |
|
07-Mar-2016 |
Varad Gautam <varadgautam@gmail.com> |
vc4: Coalesce instructions using VPM reads into the VPM read. This is done instead of copy propagating the VPM reads into the instructions using them, because VPM reads have to stay in order. shader-db results: total instructions in shared programs: 78509 -> 78114 (-0.50%) instructions in affected programs: 5203 -> 4808 (-7.59%) total estimated cycles in shared programs: 234670 -> 234318 (-0.15%) estimated cycles in affected programs: 5345 -> 4993 (-6.59%) Signed-off-by: Varad Gautam <varadgautam@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Rhys Kidd <rhyskidd@gmail.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a39a8fbbaa129f4e52f2a3ad2747182e9a74d910 |
|
17-Jan-2016 |
Emil Velikov <emil.velikov@collabora.com> |
nir: move to compiler/ Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
12519a972f53dba13289b0abebd558fd8506a539 |
|
19-Dec-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Use NIR texture lowering for texture swizzling. We can't use its other features currently (mostly because we don't want Newton-Raphson on rcps for texture coordinates), but it gets us started. This eliminates some comparisons with constants in GLB2.7 and ETQW traces at the QIR level by moving the comparisons into NIR, where they get constant-folded out. instructions in affected programs: 165 -> 156 (-5.45%) total uniforms in shared programs: 32087 -> 32085 (-0.01%) total estimated cycles in shared programs: 245762 -> 245752 (-0.00%) estimated cycles in affected programs: 461 -> 451 (-2.17%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
71db7d3dc577e48da3689fd66989ec3b0a069089 |
|
22-Dec-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Replace the SSA-style SEL operators with conditional MOVs. I'm moving away from QIR being SSA (since NIR is doing lots of SSA optimization for us now) and instead having QIR just be QPU operations with virtual registers. By making our SELs be composed of two MOVs, we could potentially coalesce the registers for the MOV's src and dst and eliminate the MOV. total instructions in shared programs: 88448 -> 88028 (-0.47%) instructions in affected programs: 39845 -> 39425 (-1.05%) total estimated cycles in shared programs: 246306 -> 245762 (-0.22%) estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
f1fb85e5440d8874997eea1df982cf02b6ca2ca2 |
|
19-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Do instruction scheduling on the QIR to hide texture fetch latency. This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR. Texture fetch can probably take as much as the rest of the cycles of the program, so it's important to hide our other cycles during it (which is hard to do after register allocation). Also, we can queue up multiple texture requests before collecting the resulting samples, so that we keep the texture unit busy more of the time. High-settings openarena performance +2.35849% +/- 0.221154% (n=7). Also about 2-3% on the multiarb demo. 8 piglit tests (ext_framebuffer_multisample accuracy depthstencil) go from failing in rendering to failing in register allocation, but hopefully I can fix that up with some better register pressure handling here. total instructions in shared programs: 87723 -> 88448 (0.83%) instructions in affected programs: 78411 -> 79136 (0.92%) total estimated cycles in shared programs: 276583 -> 246306 (-10.95%) estimated cycles in affected programs: 265691 -> 235414 (-11.40%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
5989ef2b0feb40821a20768c7b4b196b3e793960 |
|
09-Dec-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add debugging of the estimated time to run the shader to shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
78b81be627734ea7fa50ea246c07b0d4a3a1638a |
|
25-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
nir: Get rid of *_indirect variants of input/output load/store intrinsics There is some special-casing needed in a competent back-end. However, they can do their special-casing easily enough based on whether or not the offset is a constant. In the mean time, having the *_indirect variants adds special cases a number of places where they don't need to be and, in general, only complicates things. To complicate matters, NIR had no way to convdert an indirect load/store to a direct one in the case that the indirect was a constant so we would still not really get what the back-ends wanted. The best solution seems to be to get rid of the *_indirect variants entirely. This commit is a bunch of different changes squashed together: - nir: Get rid of *_indirect variants of input/output load/store intrinsics - nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect - nir/lower_io: Get rid of load/store_foo_indirect - i965/fs: Get rid of load/store_foo_indirect - i965/vec4: Get rid of load/store_foo_indirect - tgsi_to_nir: Get rid of load/store_foo_indirect - ir3/nir: Use the new unified io intrinsics - vc4: Do all uniform loads with byte offsets - vc4/nir: Use the new unified io intrinsics - vc4: Fix load_user_clip_plane crash - vc4: add missing src for store outputs - vc4: Fix state uniforms - nir/lower_clip: Update to the new load/store intrinsics - nir/lower_two_sided_color: Update to the new load intrinsic NIR and i965 changes are Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NIR indirect declarations and vc4 changes are Reviewed-by: Eric Anholt <eric@anholt.net> ir3 changes are Reviewed-by: Rob Clark <robdclark@gmail.com> NIR changes are Acked-by: Rob Clark <robdclark@gmail.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
6b4dfd53ae9b4f86cda0377a4d67b79e9faf7cc8 |
|
23-Jun-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for texel fetches from MSAA resources. This is the core of ARB_texture_multisample. Most of the piglit tests for GL_ARB_texture_multisample require GL 3.0, but exposing support for this lets us use the gallium blitter for multisample resolves. We can sometimes multisample resolve using just the RCL, but that requires that the blit is 1:1, unflipped, and aligned to tile boundaries.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a97b40dca4949b5b8b3320e76768e54f430c9e78 |
|
23-Jun-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for multisample framebuffer operations. This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and GL_SAMPLE_ALPHA_TO_COVAGE. I haven't implemented a dithering function yet, and gallium doesn't give me a good chance to do so for GL_SAMPLE_COVERAGE.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
74c4b3b80cc4246fd1eb503d97edb3d293eef5de |
|
21-Nov-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for storing sample mask. From the API perspective, writing 1 bits can't turn on pixels that were off, so we AND it with the sample mask from the payload.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a664233042e1ad343184a0c237c3bd7ac5010779 |
|
21-Nov-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for loading sample mask.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a4bf28178f064082d3b818d2cd48abf9075cc459 |
|
11-Nov-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB. It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on). Cc: "11.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
01ca4f207efac555ff5f729dce1687a68ba65400 |
|
26-Oct-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Rewrite the pack instructions as a MOV with a dst pack flag Another step in reducing the special-casing of instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
99a9a5a345fab8bbf36ab4e42581f8ee04a59a63 |
|
25-Oct-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Switch the unpack ops to being unpack flags on a mov. This paves the way for copy propagating our unpacks. We end up with a small change on shader-db: total instructions in shared programs: 89390 -> 89251 (-0.16%) instructions in affected programs: 19041 -> 18902 (-0.73%) which appears to be because we no longer convert MOVs for an FMAX dst, r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and this ends up with a slightly better schedule.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
652a864b257650e730ecec9e5882d765840a02e1 |
|
26-Oct-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Fix up the test for whether the unpack can be from r4. We can do 16a/16b from float as well. No difference on shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
f09ed63f4342846e361242233162799140674d5f |
|
25-Oct-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Fix the test for skipping raw MOVs. I don't know what previous test was trying to do, but it dates back to the first add of vc4_qpu_emit.c. No change to shader-db.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
70b06fb5d55d639fd74596a2ff6971cb57c030ca |
|
19-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Convert blending to being done in 4x8 unorm normally. We can't do this all the time, because you want blending to be done in linear space, and sRGB would lose too much precision being done in 4x8. The win on instructions is pretty huge when you can, though. total uniforms in shared programs: 32065 -> 32168 (0.32%) uniforms in affected programs: 327 -> 430 (31.50%) total instructions in shared programs: 92644 -> 89830 (-3.04%) instructions in affected programs: 15580 -> 12766 (-18.06%) Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
8e701fda499af0387f5c72f7bc14510182738647 |
|
09-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add QIR/QPU support for the 8-bit vector instructions.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
fb064901e9bd83a63d486f246b9ea943cd00f6cd |
|
21-Oct-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Use Rob's NIR-based user clip lowering.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
cfa980f49356eb2d94178f8cc9d67d01b4e3d695 |
|
09-Sep-2015 |
Eric Anholt <eric@anholt.net> |
vc4: convert from tgsi semantic/index to varying-slot (originally part of previous patch, split out to separate patch by Rob) v2: squash in some fixes from Eric v3: Another fix from Eric for point coords. Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
89b1b33f44bc6ce71109ac8668529c30b6d6d910 |
|
21-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Fold the 16-bit integer pack into the instructions generating it. total instructions in shared programs: 97580 -> 96798 (-0.80%) instructions in affected programs: 52826 -> 52044 (-1.48%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
4ae137534a8718db4611782dbfec773504b6e3be |
|
19-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Make _dest variants of qir ALU helpers to provide an explicit dest.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
8b36d107fdd6f6b91556fcdc3498df16803d4181 |
|
19-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Pack the unorm-packing bits into a src MUL instruction when possible. Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's only used by the unorm packing and just stuff the pack bits into it. total instructions in shared programs: 98136 -> 97974 (-0.17%) instructions in affected programs: 4149 -> 3987 (-3.90%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
572a48366d9dfac6a7f9ee8f4d29832c496125e2 |
|
19-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add a QIR helper for whether the op is a MUL type.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
98728ce0718e49864b872beb76fc3afbf341b38a |
|
06-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Switch QPU_PACK_SCALED to be two non-SSA instructions. total instructions in shared programs: 98159 -> 98136 (-0.02%) instructions in affected programs: 12279 -> 12256 (-0.19%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
69ef08d303cdf153fe2432a7e40faccae5d62aab |
|
06-Aug-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Make the pack-to-unorm instructions be non-SSA. This helps ensure that the register allocator doesn't force the later pack operations to insert extra MOVs. total instructions in shared programs: 98170 -> 98159 (-0.01%) instructions in affected programs: 2134 -> 2123 (-0.52%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
bf3c50fba221f216e38d3f60f89161ced4c684c0 |
|
14-Apr-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Move all of our fixed function fragment color handling to NIR. This massively reduces our dependency on VC4-specific optimization passes. shader-db: total uniforms in shared programs: 32077 -> 32067 (-0.03%) uniforms in affected programs: 149 -> 139 (-6.71%) total instructions in shared programs: 98208 -> 98182 (-0.03%) instructions in affected programs: 2154 -> 2128 (-1.21%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
38c6c0f5b499e2bcff2cc9607f67c0f1836f305b |
|
31-Jul-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add a helper for making driver-specific NIR load_uniform for GL state In order to move more of our lowering into NIR, we need the ability to reference various pipeline state (like texture rectangle scaling factors or blend colors), so we just set those up as a load_uniform with a big offset to indicate that it's not within the shader's uniform storage and is one of our state values.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
cc8fb2904673588d31b660dbfaf692615b5202dd |
|
31-Jul-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Make r4-writes implicitly move to a temp, and allocate temps to r4. Previously, SFU values always moved to a temporary, and TLB color reads and texture reads always lived in r4. Instead, we can have these results just be normal temporaries, and the register allocator can leave the values in r4 when they don't interfere with anything else using r4. shader-db results: total instructions in shared programs: 100809 -> 100040 (-0.76%) instructions in affected programs: 42383 -> 41614 (-1.81%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
9b403c0756ecf806a8ff768bd73a4cbf42986bdb |
|
31-Jul-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Drop a dead prototype.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
13ddd48b97474c261ef2d7412629748d6d91f2ad |
|
30-Jul-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Move program keys to the header file. I want to be able to inspect them from other files for lowering passes in NIR.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
b85f6ae4b24ee50948f14a9effa982eb0b9b3681 |
|
30-Jul-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Start adding a NIR-based output lowering pass. For now, this just splits up store_output intrinsics to be scalars, and drops unused outputs in the coordinate shader. My goal is to be able to drop a bunch of my VC4-specific optimization by letting NIR handle it.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
0f69d59b1c8f5314c1abe18659b96adcfc51a0e5 |
|
24-Jun-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Make a helper for TLB color writes, too. We've done so for all the other QIR instruction generation in this file.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
78c773bb3646295e4a4f1fe7d6d10f05758ee48b |
|
30-May-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Convert from simple_list.h to list.h list.h is a nicer and more familiar set of list functions/macros.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
73e2d4837d7e4611f31532ab0ccc14369341e0cb |
|
30-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Convert to consuming NIR. NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in mater. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
8c5dcdbccb68b73d2856d9c1faafadc536e682e3 |
|
30-Mar-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Add a constant folding pass. This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
85316d059c899ac096331251de6b233229aa0b4f |
|
19-Feb-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Keep an array of pointers to instructions defining the temps around. The optimization passes are always regenerating it and throwing it away, but it's not hard to keep track of.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
877b48a531adc397493e508e509aba2918915349 |
|
19-Feb-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h. I may want them in optimization passes, and they're not really particular to the program translation stage.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
14dc281c1332518b6144718e1fb3845abbe23ff7 |
|
19-Feb-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Enforce one-uniform-per-instruction after optimization. This lets us more intelligently decide which uniform values should be put into temporaries, by choosing the most reused values to push to temps first. total uniforms in shared programs: 13457 -> 13433 (-0.18%) uniforms in affected programs: 1524 -> 1500 (-1.57%) total instructions in shared programs: 40198 -> 40019 (-0.45%) instructions in affected programs: 6027 -> 5848 (-2.97%) I noticed this opportunity because with the NIR work, some programs were happening to make different uniform copy propagation choices that significantly increased instruction counts.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
3f1e1287fd960966eee8b12a75c8a8f62e11cdd2 |
|
12-Feb-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Make SF be a flag on the QIR instructions. Right now the places that used to emit a mov.sf just put the SF on the previous instruction when it generated the source of the SF value. Even without optimization to push the sf up further (and kill thus potentially kill more MOVs), this gets us: total uniforms in shared programs: 13455 -> 13457 (0.01%) uniforms in affected programs: 3 -> 5 (66.67%) total instructions in shared programs: 40296 -> 40198 (-0.24%) instructions in affected programs: 12595 -> 12497 (-0.78%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
d70eb3851753ed7b57c56e4a7fd538857e4385ce |
|
14-Nov-2014 |
Eric Anholt <eric@anholt.net> |
gallium: Replace u_simple_list.h with util/simple_list.h The code was exactly the same, except util/ has c++ guards and a struct simple_node declaration. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
772c47aefe96694c5f3fa354bd6792d137824700 |
|
11-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Move the tests for src needing to be an A register to vc4_qir.c. I want it from another location.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a8e14c293b19a2d298f91f283d6b6839f36fb518 |
|
10-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Allow dead code elimination of VPM reads. This gets a bunch of dead reads out of the CSes, which don't read most attributes generally. total instructions in shared programs: 39753 -> 39487 (-0.67%) instructions in affected programs: 4721 -> 4455 (-5.63%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
b920ecf793bd419558a240014624add08774765d |
|
10-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Cook up the draw-time VPM setup info during shader compile. This will give the compiler the chance to dead-code eliminate unused VPM reads. This is particularly a big deal in the CS where a bunch of vattrs are just not going to be used.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
c772c92153fdcd4ba4920b7ef1745ce83b09603b |
|
10-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Split two notions of instructions having side effects. Some ops can't be DCEd, while some of the ops that are just important due to the args they have can be.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a58ae83882b3ad3ecb53271f42cf1fd8f9c2907c |
|
10-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Redo VPM reads as a read file. This will let us do copy propagation of the VPM reads.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
92a0b0bd7099b15320faaccfd70b3c8dc877810e |
|
09-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Pack VPM attr contents according to just the size of the attribute. total instructions in shared programs: 40960 -> 39753 (-2.95%) instructions in affected programs: 20871 -> 19664 (-5.78%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
72cb6619cb75a92901d372d687505a747a384571 |
|
09-Jan-2015 |
Eric Anholt <eric@anholt.net> |
vc4: Restructure color packing as a series of channel replacements. I'm using this in some WIP commits for doing blending in 8888 instead of vec4. But it also gives us these results immediately, thanks to allowing more uniforms/immediates in the arguments: total instructions in shared programs: 41027 -> 40960 (-0.16%) instructions in affected programs: 4381 -> 4314 (-1.53%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
e06b0778f59980429fececb1aa0de0f0a3f23427 |
|
18-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Coalesce MOVs into VPM with the instructions generating the values. total instructions in shared programs: 41168 -> 40976 (-0.47%) instructions in affected programs: 18156 -> 17964 (-1.06%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
a871eff16cc18232ee03b372d75cb6f633213e14 |
|
18-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Redefine VPM writes as a (destination) QIR register file. This will let me coalesce the VPM writes into the instructions generating the values.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
e473fbe4690b5cbe3769042a4917f22559e2ba8d |
|
10-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for turning constant uniforms into small immediates. Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch is I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
ff266483fb61fd69775daf5c931ca7a56a26f4ac |
|
11-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Move follow_movs() to common QIR code. I want this from other passes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
48a2154520351a22fc860efcdaa4329a51d29c8d |
|
15-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for 16-bit signed/unsigned norm/scaled vertex attrs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2142fd1f6f36ef9a384ef298fec02111dc826308 |
|
15-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for 8-bit unnormalized vertex attrs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
8e678de761e755564ade2794dbf68280a4972b66 |
|
15-Dec-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Rename UNPACK_8* to UNPACK_8*_F. There is an equivalent unpack function without conversion to float if you use an integer operation instead.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
3fe4d8e1e39b47c9c5c4bfdd87300abd0c336a7e |
|
26-Nov-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Introduce scheduling of QPU instructions. This doesn't reschedule much currently, just tries to fit things into the regfile A/B write-versus-read slots (the cause of the improvements in shader-db), and hide texture fetch latency by scheduling setup early and results collection late (haven't performance tested it). This infrastructure will be important for doing instruction pairing, though. shader-db2 results: total instructions in shared programs: 61874 -> 59583 (-3.70%) instructions in affected programs: 50677 -> 48386 (-4.52%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
f87c7008958cdb095efa1cfb29ca8f3c9b9066e4 |
|
02-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for ARL and indirect register access on TGSI_FILE_CONSTANT. Fixes 14 ARB_vp tests (which had no lowering done), and should improve performance of indirect uniform array access in GLSL.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
52824811b9c0a9bb78a40fcb43af00b315f612d0 |
|
24-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Allow dead code elimination of unused varyings. total instructions in shared programs: 39022 -> 37341 (-4.31%) instructions in affected programs: 26979 -> 25298 (-6.23%) total uniforms in shared programs: 11242 -> 10523 (-6.40%) uniforms in affected programs: 5836 -> 5117 (-12.32%)
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
5d32e263357e562779bfc0d2af712d4c7538a32b |
|
22-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add debug output to match shaderdb info to program dumps. I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but when debugging regressions, I want to match shaderdb output to shader disassembly.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
201d4c0b2a6f7f0c1d59c4fd5cce4916fc48a2d2 |
|
15-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for user clip plane and gl_ClipVertex. Fixes about 15 piglit tests about interpolation and clipping.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
5d72a1c95662109b1338605da83329dd25e00859 |
|
13-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Match VS outputs to FS inputs. If the VS doesn't output a value that the FS needs, we still need to read the right contents for the remaining FS inputs, by emitting padding. And if the VS outputs something the FS doesn't need, we shouldn't put it in the VPM at all (so the code producing it can get DCEed). Fixes 77 piglit tests.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
24d998056275f4fab9bf3e98c962d91245ef1b7b |
|
02-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for sampling from sRGB. This isn't perfect -- the filtering is happening on the srgb values, and we're decoding afterwards, which is not what you want. I think that's the cause of some additional texwrap(GL_CLAMP, LINEAR) failures, though many other texwrap tests on srgb start to pass since unfiltered values come out correct.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
75f8e0bc2ae03154038c3f17fec1bcad699856e0 |
|
01-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Make the last static array in vc4_program.c dynamically sized.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
d7a0502a5440359d1cecd42e58bdb85c2d857824 |
|
01-Oct-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for the FACE semantic. Fixes glsl-fs-frontfacing.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
ae22f5aa14db0a42f4b6adafa11aa9f7bfd5d115 |
|
29-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for texture border color. One spot in the docs says that it's stored at a miplevel just beyond the last miplevel, which was scary. But really, you just load it as the R coordinate (which conflicts with cubemaps, but you don't do border clamping on cubes).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
64122b16ce74a3fb65269bab325c651c26ccd2d0 |
|
25-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Dump constant uniform values in VC4_DEBUG=qir. Definitely helps when trying to understand and optimize a program.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
730267eb23b418637c78662a77de0a93af91be35 |
|
28-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for texture cube maps. It's not passing some of the piglit tests, because it looks like at small miplevels some contents from surrounding faces are getting filtered in at the corners. It does get 7 new tests passing.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
7a85ebf6e211423c98bb045ad21026c5ffeaa9bb |
|
29-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Stop trying to reuse temporaries that store uniform values. Almost always, the MOV will get copy propagated out. Even if it doesn't, it's probably better to just reload the uniform at next use (to reduce register pressure) rather than try to save instruction count. I was looking at this because in the presence of texturing (which calls add_uniform() directly to get the uniform load forced into the instruction) the c->uniform_contents indices don't match 1:1 with the temporary qregs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
66b7bd60e01fd17a356bf26d69ea351261080586 |
|
24-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for point size setting. This is the support for both the global and per-vertex modes.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
8cd165051b52a9d70512fd138463aa165bea849a |
|
24-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for alpha test. Fixes most of piglit fbo-alphatest-formats (but not RGB565/332).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
45b104e0a228595142ed4bc62bbc8948100b9325 |
|
24-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for flat shading. This is just the GL 1.1 flat shading of colors -- we don't need to support TGSI constant interpolation bits, because we don't do GLSL 1.30. Fixes 7 piglit tests.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2e48b286bf21501ac06832799a4b7957bb8ac893 |
|
23-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for 8-bit unorm/snorm vertex inputs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
dcd03e74768bb4ba55b5742250f3ed15771b6f66 |
|
19-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Use the same method as for FRAG_Z to handle fragcoord W. I need to get the non-reciprocal version of W for interpolation, anyway.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
19589147ef660c0bf7fcc52ca82dfbbadf3a9a23 |
|
18-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for stencil operations. While depth test state is passed through the fragment shader as sideband, data, the stencil test state has to be set by the fragment shader itself. Many tests are still failing, but this gets most of hiz/ passing.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
5e90ed79f670cc1c5c12c8b733d4591af0acb5ab |
|
17-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for reordering the uniform stream after optimization. This allows for introducing dead code eliminating of uniforms, copy propagation of uniforms, and instruction rescheduling between instructions that both read uniforms.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2264925f85f349f57773d46114806a148eae6394 |
|
16-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for computed depth writes. Fixes piglit glsl-1.10-fragdepth and early-z.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
aae4223fbd2d94f922339baa11ffefdb88896770 |
|
16-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Restructure depth input/output in fragment shaders. The goal here is to have an argument for the depth write opcode so that I can do computed depth. In the process, this makes the calculations that will be emitted more obvious in the QIR.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
5638b87d4c72e0ed7bb4544885829f27ae3a91f5 |
|
15-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Dynamically allocate the TGSI-to-qreg arrays. Fixes buffer overflows in some piglit tests (which are still failing to register allocate anyway).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2147dd96813d1faee5c55e84b332355ad05f070a |
|
15-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Fix memory leaks of struct qinst.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
d952a98c5322e64cb436bd8b0f0064441f37ac77 |
|
07-Sep-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Expose r4 to register allocation. We potentially need to be careful that use of a value stored in r4 isn't copy-propagated (or something) across another r4 write. That doesn't appear to happen currently, and this makes the dataflow more obvious. It also opens up not unpacking the r4 value, which will be useful for depth textures.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
4bca922878a4d433077d21d4918b1db71b3a15f7 |
|
13-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Merge qcompile and tgsi_to_qir The split between these two didn't make much sense. I'm going to want the chance to look at uniform contents in optimization passes, and the QPU emit I think is going to end up rewriting the uniforms stream.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
55d2a1626219ac041ce05477827b592efa1c7b81 |
|
25-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add a CSE optimization pass. Debugging a regression in discard support was just too full of duplicate instructions, so I decided to remove them instead of re-analyzing each of them as I dumped their outputs in simulation.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
80b27ca2cd8cd2bb2937baa441c43a396887cc03 |
|
24-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Switch to using native integers. There were troubles with bools without using native integers (st_glsl_to_tgsi seemed to think bool true was 1.0f sometimes, when as a uniform it's stored as ~0), and since I've got native integers other than divide, I might as well just support them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
874dfa8b2ecccf3c9a73453d7ccc6638363a59bd |
|
25-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Expose compares at a lower level in QIR. Before, we had some special opcodes like CMP and SNE that emitted multiple instructions. Now, we reduce those operations significantly, giving optimization more to look at for reducing redundant operations. The downside is that QOP_SF is pretty special -- we're going to have to track it separately when we're doing instruction scheduling, and we want to peephole it into the instruction generating the destination write in most cases (and not allocate the destination reg, probably. Unless it's used for some other purpose, as well).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
e51e20c35ef89409494161010f86750366faef4c |
|
21-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for fragment discards. Fixes piglit glsl-fs-discard-01 and -03, and allows a lot of mesa demos to start running. glsl-fs-discard-02 has a problem where the first tile is not getting stored on the first render.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
b0a1e401a93b7b13870b936bc667b3fc15dba6d5 |
|
19-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Include stdio/stdlib in headers so I don't have to include it per file. There are a few tools I want to have always available, and fprintf() and abort() are among them.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
50b4293eb3f26609c28c37898877d15e3c597702 |
|
18-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add a helper for QOP_R4_UNPACK_[ABCD].
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
7c65b714ed974248f09dcc0b4f020b2e2bf50227 |
|
14-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for blending. Passes blendminmax and blendsquare. glean's more serious blendFunc fails in simulation due to binner memory overflow (I really need to work around that), and fbo-blending-formats fails due to Mesa refusing one of the getter requests, even before it could fail due to the driver not actually supporting different formats yet.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
e63598aecb5d1cc2a20b8db1ef85790e301f4241 |
|
05-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for gl_FragCoord. This isn't passing all tests (glsl-fs-fragcoord-zw-ortho, for example), but it does get a bunch more tests passing. v2: Rebase on helpers change.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
99070c6daad72c96d517ffb18185c8b21b9d67f2 |
|
02-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add proper translation from Zc to Zs for vertex output. This fixes the remaining failure in depthfunc.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
4160ac5ee41630a5c9fc4e1f3520f0fabf42cb14 |
|
01-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for depth clears and tests within a tile. This doesn't load/store the Z contents across submits yet. It also disables early Z, since it's going to require tracking of Z functions across multiple state updates to track the early Z direction and whether it can be used. v2: Move the key setup to before the search for the key.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
857dcc09fa89aa676fdc95d318ecc4f7ad9cd70a |
|
17-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for texture rectangles v2: Rebase on helpers change.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
66c6c401279aa4152a24681f64d0e101aa004593 |
|
15-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for texturing (under simulation) Only rgba8888 works, and only a single texture unit, and it's only under simulation because I haven't built the kernel interface yet. v2: Rebase on helpers. v3: Fold in the don't-break-the-arm-build fix.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
bf542cd37286decbd9fc0c939007b82176e16a81 |
|
05-Aug-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for the TGSI TRUNC opcode. v2: Rebase on helpers.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
75afa64ef85aa33dfed8325aae767f8a55fd1840 |
|
17-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for multiple attributes
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
6ff2129d5842898eb87d2a457ee018fe66471f06 |
|
16-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for the lit opcode. v2: Fix how it was using the X channel for the real work of the opcode, instead of Y. Fixes glean's LIT test. v3: Rebase on the helpers.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
ec9da314baf11bea57f315346091ae941ac4f662 |
|
04-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add copy propagation between temps. We put in a bunch of extra MOVs for program outputs, and this can clean those up. We should do uniforms, too, though. v2: Fix missing flagging of progress when we actually optimize. Caught by Aaron Watry.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
d9d1c14430aaeb5b22aa66b269ba288e3df24103 |
|
04-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add dead code elimination. This cleans up a bunch of noise in the compiled coordinate shaders (since we don't need the varying outputs), and also from writemasked instructions with negated src operands.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
1d23d55ae97d07b6eb70a3e37a91ecb7de38d8d2 |
|
03-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add an initial pass of algebraic optimization. There was a lot of extra noise in my piglit shader dumps because of silly CMPs.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
4c53087c67a266d81653459baa08ece5c1b418d8 |
|
16-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for CMP. This took a couple of tries, and this is the squash of those attempts. v2: Fix register file conflicts on the args in the destination-is-accumulator case. v3: Rebase on helper change and qir_inst4 change.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
eea1d36915cb97ee1a6eb6aeaf15dd5689f03148 |
|
04-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Make scheduling of NOPs a separate step from QIR -> QPU translation. This should also be used as a way to pair QIR instructions into QPU instructions later.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
c29392751180e21a2857cade8d0b4902cbe9d001 |
|
04-Jul-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add WIP support for varyings. It doesn't do all the interpolation yet, but more tests can run now. v2: Rebase on helpers.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
2e35981d4d625d951328ef5b8f95798112997fb3 |
|
28-Jun-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Add support for SNE/SEQ/SGE/SLT.
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|
792d1c92df6f58f219eb8b77e668424cdcc9c9af |
|
27-Jun-2014 |
Eric Anholt <eric@anholt.net> |
vc4: Switch to actually generating vertex and fragment shader code from TGSI. This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down. Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data. v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
/external/mesa3d/src/gallium/drivers/vc4/vc4_qir.h
|