9919542f1cfff70524bc6117d19bf88e59159caa |
|
15-Jan-2017 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Make DCE set null destinations on messages with side effects. (Co-authored by Matt Turner.) Image atomics, for example, return a value - but the shader may not want to use it. We assigned a useless VGRF destination. This seemed harmless, but it can actually be quite harmful. The register allocator has to assign that VGRF to a real register. It may assign the same actual GRF to the destination of an instruction that follows soon after. This results in a write-after-write (WAW) dependency, and stall. A number of "Deus Ex: Mankind Divided" shaders use image atomics, but don't use the return value. Several of these were hitting WAW stalls for nearly 14,000 (poorly estimated) cycles a pop. Making dead code elimination null out the destination avoids this issue. This patch cuts one shader's estimated cycles by -98.39%! Removing the message response should also help with data cluster bandwidth. On Skylake: (instruction counts remain identical) total cycles in shared programs: 255413890 -> 248081010 (-2.87%) cycles in affected programs: 12019948 -> 4687068 (-61.01%) helped: 24 HURT: 10 v2: Make can_omit_write independent of can_eliminate (Curro). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
90bf39cd2b39874557a7c492d92b85945d45f3c6 |
|
15-Dec-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Combine some dead code elimination NOP'ing code. In theory we might have incorrectly NOP'd instructions that write the flag, but where that flag value isn't used, and yet the instruction either writes the accumulator or has side effects. I don't believe any such instructions exist, so this is mostly a code cleanup. Curro pointed out that FS_OPCODE_FB_WRITE has a null destination and actually writes the flag on Gen4-5 to dynamically decide whether to write some payload data. The hunk removed in this patch might have NOP'd it, except that we don't actually mark flags_written() in the IR, so it doesn't think the flag is touched at all. That's sketchy, but it means it wouldn't hit this today (though there are likely other problems!). v2: Properly replace the inst->regs_written() check in the second hunk with the flag being live (mistake caught by Curro). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
be5f53e769deb936509efd6f0576b15b7a5432b9 |
|
18-Jan-2017 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Make DCE explicitly not eliminate any control flow instructions. According to Matt, the dead code pass explicitly avoided IF and WHILE because on Sandybridge, these could have conditional modifiers and null destination registers. Normally, those instructions use BAD_FILE for the destination register. Nowadays, we don't do that anymore, so we could technically drop these checks. However, it's clearer to explicitly leave control flow instructions alone, so change it to the more generic !inst->is_control_flow(). This should have no actual change. [This patch implements review feedback from Curro and Matt.] Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
4d4335c81a3f7d8434d9983881a63abcbc29dd5c |
|
11-Oct-2016 |
Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> |
i965/fs: fill allocated memory with zeros where needed Signed-off-by: Marek Olšák <marek.olsak@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
0bc46cc9619b8ae43e7a7c96bfe91f19371d301d |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Simplify result_live calculation in dead_code_eliminate(). No need to unroll the first iteration of the loop manually. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
c458eeb94620fbce0a37474fc292545002d67f76 |
|
08-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written. This is in preparation for dropping fs_inst::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
7d430fc05e8f0a6211fb587f1bc7b2a76ed7de10 |
|
19-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
0fec265373f269d116f6d4de900b208fffabe2a1 |
|
19-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Track flag register liveness with byte granularity. This is required for correctness in presence of multiple 8-wide flag writes (e.g. 8-wide instructions with a conditional mod set) which update a different portion of the same 16-bit flag subregister. Right now we keep track of flag dataflow with 16-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 16 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Additionally this makes sure that we handle 32-wide flag writes and reads which may span multiple flag subregisters so the current approach of just setting/testing a single bit from the live set wouldn't have worked. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
7e6a6f3e619e7dfed244043a95082f2168a5c953 |
|
30-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Do dead-code elimination in a single pass. The first pass marked dead instructions as opcode = NOP, and a second pass deleted those instructions so that the live ranges used in the first pass wouldn't change. But since we're walking the instructions in reverse order, we can just do everything in one pass. The only thing we have to do is walk the blocks in reverse as well. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
48b4e88d3d2cfa2ccd912184cfdcbe559cd36ff0 |
|
26-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Don't mark dead instructions' sources live. Removes dead code from glsl-mat-from-int-ctor-03.shader_test. Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
b163aa01487ab5f9b22c48b7badc5d65999c4985 |
|
27-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Rename GRF to VGRF. The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
c3d7caa1e006f00c3544a79a0be7d78904ce4177 |
|
22-Oct-2015 |
Alejandro Piñeiro <apinheiro@igalia.com> |
i965: check inst->predicate when clearing flag_live at dead code eliminate Detected by Matt Turner while reviewing commit a59359ecd22154cc2b3f88bb8c599f21af8a3934 Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
a3ee6c7d1991a90d22fae992c1cb94123e51ae54 |
|
06-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove dependency of fs_inst on the visitor class. The fs_visitor argument of fs_inst::regs_read() wasn't used at all. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
3759a89ad358139ef981bd5d46261ee115762b94 |
|
22-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Eliminate null-dst instructions without side-effects. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
b37273b92431a2d986235774f04a9fba2aa1bf74 |
|
29-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Use const fs_reg & rather than a copy or pointer. Also while we're touching var_from_reg, just make it an inline function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
60d507c3c5c7caed57119df0ab4d824ad1ea85dc |
|
29-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Dead code eliminate instructions writing the flag. Most prominently helps Natural Selection 2, which has a surprising number shaders that do very complicated things before drawing black. instructions in affected programs: 21052 -> 16978 (-19.35%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
13f660158573846d6b1bc30ed4c61d97405bea58 |
|
29-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use local pointer to block_data in live intervals. The next patch will be simplified because of this, and makes reading the code a lot easier. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
ab7234c8520499fcfeed153e0aefeb6b43758d1f |
|
09-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use the var_from_vgrf helper function instead of doing it manually Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
c24dd54f973d1a42b0e2cc81aa219bb58f7523d9 |
|
24-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fix a bug with dead_code_eliminate on large writes Previously, if an instruction wrote to more than one register, we implicitly assumed that it filled the entire register. We never hit this before because the only time we did multi-register writes was things like texturing which always wrote to all of the registers. However, with the upcoming ability to do 16-wide instructions in SIMD8 and things of that nature, we can have multi-register writes at offsets and we'll hit this. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
072ea414d04f1b9a7bf06a00b9011e8ad521c878 |
|
01-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Remove cfg-invalidating parameter from invalidate_live_intervals. Everything has been converted to preserve the CFG. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
20a849b4aa63c7fce96b04de674a4c70f054ed9c |
|
13-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use basic-block aware insertion/removal functions. To avoid invalidating and recreating the control flow graph. Also stop invalidating the CFG in places we didn't add or remove an instruction. cfg calculations: 202951 -> 80307 (-60.43%) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
596990d91e2a4c4a3a303c6c2da623bf1840771b |
|
12-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Add and use foreach_block macro. Use this as an opportunity to rename 'block_num' to 'num'. block->num is clear, and block->block_num has always been redundant.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
680fe0acb3e6569f7b9aab1913e9181d5a7eee2f |
|
12-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Add cfg to backend_visitor. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
2e90d1fb62a6ef53c15eff76e242c510145178a9 |
|
29-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Pass cfg to calculate_live_intervals(). We've often created the CFG immediately before, so use it when available. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
266109736a9a69c3fdbe49fe1665a7a63c5cc122 |
|
25-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use typed foreach_in_list_safe instead of foreach_list_safe. Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
bc2fbbafd216676ccc7c3abd794ecb7dd1fa631f |
|
24-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Add and use foreach_inst_in_block macros. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
b1dcdcde2e323f960833f5c7da65d5c2c20113c9 |
|
17-Mar-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Loop from 0 to inst->sources, not 0 to 3. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
306ed81b9363721058c568244f9860c5c8c819f4 |
|
04-Apr-2014 |
Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> |
i965: Add writes_accumulator flag Our hardware has an "accumulator" register, which can be used to store intermediate results across multiple instructions. Many instructions can implicitly write a value to the accumulator in addition to their normal destination register. This is enabled by the "AccWrEn" flag. This patch introduces a new flag, inst->writes_accumulator, which allows us to express the AccWrEn notion in the IR. It also creates a n ALU2_ACC macro to easily define emitters for instructions that implicitly write the accumulator. Previously, we only supported implicit accumulator writes from the ADDC, SUBB, and MACH instructions. We always enabled them on those instructions, and left them disabled for other instructions. To take advantage of the MAC (multiply-accumulate) instruction, we need to be able to set AccWrEn on other types of instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
18d12336b964cad54bbc0780380c3dcf625abb3d |
|
14-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Clear variable from live-set if it's completely overwritten. One program affected: instructions in affected programs: 246 -> 244 (-0.81%) Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|
f34f39330bb41fb0a86930908de10353193a841d |
|
13-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Reimplement dead_code_elimination(). total instructions in shared programs: 1653399 -> 1651790 (-0.10%) instructions in affected programs: 92157 -> 90548 (-1.75%) GAINED: 2 LOST: 2 Also significantly reduces the number of optimization loop iterations: total loop iterations in shared programs: 39724 -> 31651 (-20.32%) loop iterations in affected programs: 21617 -> 13544 (-37.35%) Including some great pathological cases, like 29 -> 3 in Strike Suit Zero and 24 -> 3 in Dota2. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
|