7bed52bb5fb4cfd5f91c902a654b3452f921da17 |
|
29-Nov-2016 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Reject copy propagation into SEL if not min/max. We shouldn't ever see a SEL with conditional mod other than GE (for max) or L (for min), but we might see one with predication and no conditional mod. total instructions in shared programs: 8241806 -> 8241902 (0.00%) instructions in affected programs: 13284 -> 13380 (0.72%) HURT: 62 total cycles in shared programs: 84165104 -> 84166244 (0.00%) cycles in affected programs: 75364 -> 76504 (1.51%) helped: 10 HURT: 34 Fixes generated code in at least Sanctum 2, Borderlands 2, Goat Simulator, XCOM: Enemy Unknown, and Shogun 2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
6014da50ec41d1ad43fec94a625962ac3f2f10cb |
|
28-Nov-2016 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Rename opt_copy_propagate -> opt_copy_propagation. Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
4d4335c81a3f7d8434d9983881a63abcbc29dd5c |
|
11-Oct-2016 |
Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> |
i965/fs: fill allocated memory with zeros where needed Signed-off-by: Marek Olšák <marek.olsak@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
f2d2156ba225a844723443d6f4356454e72112e0 |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Move region_contained_in to the IR header and fix for non-VGRF files. Also changed the argument names since 'src' and 'dst' don't make that much sense outside of the context of copy propagation. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
645261c4b2a12b5724946f9f6d35f74e28ce760f |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Change region_contained_in() to use byte units. This makes the function less annoying to use and more accurate -- We shouldn't propagate a copy into a register region that wasn't fully contained in the destination of the copy (IOW, a source region that wasn't fully defined by the copy) just because the number of registers written and read by each instruction happened to get rounded up to the same GRF multiple. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1c67e272475f576c8ab4b2be367f4c3c664cb23c |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Simplify copy propagation LOAD_PAYLOAD ACP setup. By keeping track of 'offset' in byte units. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
b42c13a5b8ac7d643bbf4c1592607811a81b4ebb |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap(). fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly so it should generally be used instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
5cc6425d708a9b8c660c2f43f5e277c507c98bf0 |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix can_propagate_from() source/destination overlap check. The previous overlap condition only made sure that the VGRF numbers or GRF-aligned offsets were different without taking the amount of data written and read by the instruction into consideration. Use the regions_overlap() helper instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
e1a918ba7be6b21303caa2d81671f2d3f17dd692 |
|
08-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_inst::regs_read with ::size_read using byte units. The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
69570bbad876bb9da609c3b651aacda28cecc542 |
|
07-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes. The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
be095e11e41158f91bcb3f6fcbc2e2a91a5d9124 |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes. The fs_reg::subreg_offset and ::offset fields are now redundant, the sub-GRF offset can just be added to the single ::offset field expressed in byte units. The current subreg_offset value can be recovered by applying the following rule: Replace each rvalue reference of subreg_offset like 'x = r.subreg_offset' with 'x = r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset = x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
86944e063ad40cac0860bfd85a3cc4e9a9805aa3 |
|
01-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes. The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
527f37199929932300acc1688d8160e1f3b1d753 |
|
23-Aug-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
intel: s/brw_device_info/gen_device_info/ Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7244dc1e0651958b62222cafb15e34487851a6cd |
|
03-Jun-2016 |
Francisco Jerez <currojerez@riseup.net> |
Revert "i965/fs: Allow scalar source regions on SNB math instructions." This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5. Apparently the hardware spec text I quoted in the commit message was outright lying about scalar source math being supported on SNB, the hardware seems to load 32 contiguous bits of data for each channel regardless of the regioning mode. Fixes regressions in the following CTS tests (which we didn't catch early due to CTS being temporarily disabled in our CI system): es2-cts.gtf.gl.atan.atan_vec3_frag_xvary es2-cts.gtf.gl.cos.cos_vec2_frag_xvary es2-cts.gtf.gl.atan.atan_vec2_frag_xvary es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf es2-cts.gtf.gl.cos.cos_float_frag_xvary es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf es2-cts.gtf.gl.cos.cos_vec3_frag_xvary es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346 Reported-by: Mark Janes <mark.a.janes@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
c1107cec44ab030c7fcc97c67baa12df1cc9d7b5 |
|
28-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Allow scalar source regions on SNB math instructions. I haven't found any evidence that this isn't supported by the hardware, in fact according to the SNB hardware spec: "The supported regioning modes for math instructions are align16, align1 with the following restrictions: - Scalar source is supported. [...] - Source and destination offset must be the same, except the case of scalar source." Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a9f00a9e535019747d041f4121c56404057465a3 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files. This will be useful in several places. The only externally visible difference (other than non-VGRF files being supported now) is that the region sizes are now passed in byte units instead of in GRF units because the loss of precision would have become a problem in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
48d743c5019076056739561f979e7101c04acf21 |
|
30-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Allow constant propagation into logical send sources. Logical sends are eventually lowered into a series of copies so they can take almost anything as source. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
ad8f66ed33172ab40d4679063780a501b6f80740 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix multiple ACP interference during copy propagation. This is more fallout from cf375a3333e54a01462f192202d609436e5fbec8. It's possible for multiple ACP entries to interfere with a given VGRF write, so we need to continue iterating even if an overlapping entry has already been found. Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
0bc5ad8d1997fe33dd43bb476c67163039f065ff |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Avoid constant propagation when the type sizes don't match. The case where the source type of the instruction is smaller than the immediate type could be handled by calculating the portion of the immediate read by the instruction (assuming that the source channels are aligned with the destination channels of the copy) and then representing the same value as an immediate of the source type (assuming such an immediate type exists), but the code below doesn't do that, so just bail for the moment. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
2db9dd5aeb9566c8480651989981cb1169957748 |
|
24-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix off-by-one region overlap comparison in copy propagation. This was introduced in cf375a3333e54a01462f192202d609436e5fbec8 but the blame is mine because the pseudocode I sent in my review comment for the original patch suggesting to do things this way already had the off-by-one error. This may have caused copy propagation to be unnecessarily strict while checking whether VGRF writes interfere with any ACP entries and possibly miss valid optimization opportunities in cases where multiple copy instructions write sequential locations of the same VGRF. Cc: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
9149fd681735b02e421b8cd9e7cea92f039d8590 |
|
11-Mar-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: fix copy/constant propagation regioning checks We were not accounting for subreg_offset in the check for the start of the region. Also, fs_reg::regs_read() already takes the stride into account, so we should not multiply its result by the stride again. This was making copy-propagation fail to copy-propagate cases that would otherwise be safe to copy-propagate. Again, this was observed in fp64 code, since there we use stride > 1 often. v2 (Sam): - Rename function and add comment (Jason, Curro). - Assert that register files and number are the same (Jason). - Fix code to take into account the assumption that src.subreg_offset is strictly less than the reg_offset unit (Curro). - Don't pass the registers by value to the function, use 'const fs_reg &' instead (Curro). - Remove obsolete comment in the commit log (Curro). v3 (Sam): - Remove the assert and put the condition in the return (Curro). - Fix function name (Curro). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
789eecdb79d899a070507355ecb4dc137600f700 |
|
12-Apr-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: fix copy propagation from load payload We were not considering the case where the load payload is writing to a destination with a reg_offset > 0. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
cf375a3333e54a01462f192202d609436e5fbec8 |
|
11-Mar-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: fix copy propagation of partially invalidated entries We were not invalidating entries with a src that reads more than one register when we find writes that overwrite any register read by entry->src after the first. This leads to incorrect copy propagation because we re-use entries from the ACP that have been partially invalidated. Same thing for entries with a dst that writes to more than one register. v2 (Sam): - Improve code by defining regions_overlap() and using it instead of a loop (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
ea1ef49a16dc429c50ece388e92bf206ccf282a7 |
|
11-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Reindent register offset calculation of try_copy_propagate(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
0fb19806c069cbf34aaf02e77f5ae37a9e4cf3b0 |
|
11-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Simplify and fix register offset calculation of try_copy_propagate(). try_copy_propagate() was special-casing UNIFORM registers (the BAD_FILE, ARF and FIXED_GRF cases are dead, see the assertion at the top of the function) and then failing to take into account the possibility of the instruction reading from a non-zero offset of the destination of the copy. The VGRF/ATTR handling takes it into account correctly, and there is no reason we couldn't use the exact same logic for the UNIFORM file aside from the fact that uniforms represent reg_offset in different units. We can work around that easily by defining an additional constant with the right unit reg_offset is expressed in. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7aa53cd725cc2287fc206033120e08cde74cde2a |
|
23-Mar-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: disallow type change in copy-propagation if types have different sizes Because the semantics of source modifiers are type-dependent, the type of the original source of the copy must be kept unmodified while propagating it into some instruction, which implies that we need to have the guarantee that the meaning of the instruction is going to remain the same after we have changed the types. Whenthe size of the new type is different from the size of the old type the new and old instructions cannot possibly be equivalent because the new instruction will be reading more data than the old one was. Prevents that we turn this: load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf mov(8) vgrf18:DF, vgrf17:DF 1sthalf load_payload(8) vgrf5:DF, vgrf18:DF, vgrf20:DF NoMask 1sthalf WE_all load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf mov(8) vgrf22:UD, vgrf21:UD 1sthalf into: load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf mov(8) vgrf18:DF, |vgrf4+0.0|:DF 1sthalf load_payload(8) vgrf5:DF, |vgrf4+0.0|:DF, |vgrf4+2.0|:DF NoMask 1sthalf WE_all load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf mov(8) vgrf22:DF, |vgrf4+0.4|<2>:DF 1sthalf where the semantics of the last instruccion have changed. v2 (Curro): - Update commit log and add comment to explain the problem better. - Simplify the condition. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
ac9b966aac4d0276de889990f3b170e0b939c542 |
|
18-Jan-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: Fix copy propagation of load payload for double operands Specifically, consider the size of the data type of the operand to compute the number of registers written. v2 (Sam): - Fix line width (Jordan). - Add an assert (Jordan). - Use REG_SIZE in the calculation of regs_written (Curro) v3 (Sam): - Fix assert and calculation of regs_written (Curro). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
70dc19f9d628f0459db93466fbf65af1bfe75af1 |
|
26-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix propagation of copies with strided source. This has likely been broken since we started propagating copies not matching the offset of the instruction exactly (1728e74957a62b1b4b9fbb62a7de2c12b77c8a75). The copy source stride needs to be taken into account to find out the offset at the origin that corresponds to the offset at the destination of the copy which is being read by the instruction. This has led to program miscompilation on both my SIMD32 branch and Igalia's FP64 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
ba582e58cd30c815137a11c9497b01d97842e525 |
|
05-May-2016 |
Connor Abbott <cwabbott0@gmail.com> |
i965/fs: add PACK opcode Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
0f2e227d5ce6de4697ba94ed57f5ff7ca2d86f69 |
|
03-Aug-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: don't propagate 64-bit immediates They can only be used with 1-src instructions, which practically (since we should've constant-propagated away all 1-src instructions with 64-bit immediates in NIR) means that they must be kept in separate MOV's and can't be propagated. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1cc7573162a7f0e8346d7abab50890c58a0dce9a |
|
28-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965: Pass devinfo pointer to is_3src() helpers. This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
9881eab197c70b85346d682b525b8ea9ed241862 |
|
17-Mar-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Don't constant-fold RCP No shader-db changes on Broadwell Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
c70b7c80e3e1dea150ece96e02ef3d364284812d |
|
27-Feb-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965: Don't try copy propagation if constant propagation succeeded. It cannot get any better. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
e2060aaf57f4146b1a0faec55f5f2e45190d427e |
|
16-Feb-2016 |
Rob Clark <robdclark@gmail.com> |
i965: fix new gcc6 warnings src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp:244:1: warning: ‘void {anonymous}::fs_copy_prop_dataflow::dump_block_data() const’ defined but not used [-Wunused-function] fs_copy_prop_dataflow::dump_block_data() const ^~~~~~~~~~~~~~~~~~~~~ From looking at git history, it looks like this is intended to be unused (ie. just for adding on-demand debug prints) Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
f36993b46962eab4446bc1964eb47149751aee26 |
|
23-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Clean up #includes in the compiler. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
2d8c5299032d229c8f6e936db5644cd53716e6c1 |
|
20-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Prevent implicit upcasts to brw_reg. Now that backend_reg inherits from brw_reg, we have to be careful to avoid the object slicing problem. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
d982922b184930a4ceed1d97b772cce5c371865d |
|
14-Aug-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: add stride restrictions for copy propagation There are various restrictions on what the hstride can be that depend on the Gen, and now that we're using hstride == 2 for packing/unpacking doubles, we're going to run into these restrictions a lot more often. Pull them out into a separate function, and move the one restriction we checked previously into it. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
b3315a6f56fb93f2884168cbf9358b2606641db5 |
|
27-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Replace HW_REG with ARF/FIXED_GRF. HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
b163aa01487ab5f9b22c48b7badc5d65999c4985 |
|
27-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Rename GRF to VGRF. The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7638e75cf99263c1ee8e31c6cc5a319feec2c943 |
|
26-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Use brw_reg's nr field to store register number. In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1392e45bfb396ccbfa5bb0c6063522e0550988d3 |
|
24-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Use immediate storage in inherited brw_reg. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
e42fb0c2a687cdcd6af2a590f6f5e24f64cfff3b |
|
23-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Make 'dw1' and 'bits' unnamed structures in brw_reg. Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for sometime "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7c81a6a647257c309cb1ca36c60aa4bfa8e2e022 |
|
26-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Replace default case with list of enum values. If we add a new file type, we'd like to get warnings if it's not handled. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
8f60dc83f7edba51037662c2637f830feeea3fc6 |
|
21-Oct-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965/fs: Allow copy propagating into new surface access opcodes Reviewed-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
9e17c36b8ba79e688011a5fd293ad5f42da21b66 |
|
14-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Extract can_change_source_types() functions. Make them members of fs_inst/vec4_instruction for use elsewhere. Also fix the fs version to check that dst.type == src[1].type and for !saturate. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
2ace64fd598816fd1be9877962734242fc27b87b |
|
03-Sep-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix copy propagation type changes. commit 472ef9a02f2e5c5d0caa2809cb736a0f4f0d4693 introduced code to change the types of SEL and MOV instructions for moves that simply "copy bits around". It didn't account for type conversion moves, however. So it would happily turn this: mov(8) vgrf6:D, -vgrf5:D mov(8) vgrf7:F, vgrf6:UD into this: mov(8) vgrf6:D, -vgrf5:D mov(8) vgrf7:D, -vgrf5:D which erroneously drops the conversion to float. Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
8f5d0988ea2ccaba7f049f113b652f331524d2a6 |
|
04-Aug-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Define virtual instruction to calculate the high 32 bits of a multiply. This instruction will translate to the MUL/MACH sequence that computes the high 32-bits of the result of a 64-bit multiply. Before Gen8 integer operations that used the accumulator were limited to 8-wide, but the SIMD lowering pass can easily be hooked up to sidestep this limitation, we just need a virtual opcode to represent the MUL/MACH sequence in the IR. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
f68ec2baf49e37f9ce4fffe95f13177eb7225015 |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Make sure that the type sizes are compatible during copy propagation. It's surprising that we weren't checking for this already. A future patch will cause code like the following to be emitted: MOV(16) tmp<1>:uw, src MOV(8) dst<1>:ud, tmp<8,8,1>:ud The second MOV comes from the expansion of a LOAD_PAYLOAD header copy, so I don't have control over its types. Copy propagation will happily turn this into: MOV(8) dst<1>:ud, src Which has different semantics. Fix it by preventing propagation in cases where a single channel of the instruction would span several channels of the copy (this requirement could in fact be relaxed if the copy is just a trivial memcpy, but this case is unusual enough that I don't think it matters in practice). I'm deliberately only checking if the type of the instruction is larger than the original, because the converse case seems to be handled correctly already in the code below. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
830f67046ace3c0b95a7f093fe373eeb417a1aad |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Remove the width field from fs_reg As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instructions execution size based entirely on register widths. With the builder, we can easiliy set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
81deefc45ba7b7d3b2b5e7ccf9e1680df6e31e3a |
|
15-May-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Unrestrict constant propagation into integer multiply. Gen8+'s MUL instruction doesn't ignore the high 16-bits of one source like on earlier platforms, so we can constant propagate into it without worry. Integer multiplies (not into the accumulator, which is done for imul_high) are lowered in lower_integer_multiplication(), so it's safe there as well. On Broadwell, fragment shaders only: total instructions in shared programs: 4377769 -> 4377451 (-0.01%) instructions in affected programs: 48064 -> 47746 (-0.66%) helped: 156 On Broadwell, vertex shaders only: total instructions in shared programs: 2858885 -> 2856313 (-0.09%) instructions in affected programs: 26380 -> 23808 (-9.75%) helped: 134 On Broadwell, vertex shaders only (with INTEL_USE_NIR=1): total instructions in shared programs: 2911688 -> 2865984 (-1.57%) instructions in affected programs: 1421715 -> 1376011 (-3.21%) helped: 6186 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
0c0ca557117edd3a57443f4f454c3a8da1d4e0b5 |
|
10-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Allow copy propagation on ATTR file registers. This especially helps with NIR because we currently emit MOVs at the top of the shader to copy from various ATTR registers to a giant VGRF array of all inputs. (This could potentially be done better, but since there's only ever one write to each register, it should be trivial to copy propagate away...) With NIR - only vertex shaders: total instructions in shared programs: 3129373 -> 2889581 (-7.66%) instructions in affected programs: 3119717 -> 2879925 (-7.69%) helped: 20833 Without NIR - only vertex shaders: total instructions in shared programs: 2745901 -> 2724483 (-0.78%) instructions in affected programs: 693426 -> 672008 (-3.09%) helped: 3516 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7a75b55a01d355090d186357896e3cb141b9775e |
|
02-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs_inst: Get rid of the effective_width field The effective_width field was an ill-concieved hack to get around issues in the LOAD_PAYLOAD instruction. Now that the LOAD_PAYLOAD instruction is far more sane, this field can die. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
f2fad0dc80627e853eea558498f18a9fa769992e |
|
19-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Perform basic optimizations on the BROADCAST opcode. v2: Style fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7f5a8ac155283e78df2da5b172a65361a80d38b6 |
|
25-Apr-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Disallow constant propagation into POW on Gen 6. Fixes assertion failures in three piglit tests on Gen 6 since commit 0087cf23e.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
51c61fff8f46472820ac413ad22e9f3edf670396 |
|
24-Apr-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't constant propagate into integer math instructions. Constant combining won't promote non-floats, so this isn't safe. Fixes regressions since commit 0087cf23e.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
0087cf23e8e399778e93369d67dd543e767ab526 |
|
17-Mar-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Allow 2-src math instructions to have immediate src1. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
28e9601d0e681411b60a7de8be9f401b0df77d29 |
|
16-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Add a devinfo field to backend_visitor and use it for gen checks Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
472ef9a02f2e5c5d0caa2809cb736a0f4f0d4693 |
|
03-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Change SEL and MOV types as needed to propagate source modifiers SEL and MOV instructions, as long as they don't have source modifiers, are just copying bits around. This commit adds support to copy propagation to switch the type of a SEL or MOV instruction as needed so that it can propagate source modifiers. This is needed because NIR generates integer SEL and MOV instructions whenver it doesn't know what else to generate. shader-db results with NIR: total FS instructions in shared programs: 4360910 -> 4360186 (-0.02%) FS instructions in affected programs: 59094 -> 58370 (-1.23%) helped: 341 HURT: 0 GAINED: 2 LOST: 0 Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
bb99a58e7710acd19463646c38cdddbd926e89c4 |
|
03-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use the source type when looking for UD negations in copy prop There can be problems with floats and conditional modifiers when copy-propagating a negated UD source. The problem arises when a source modifier is applied to a UD value. In this case, a 33-bit representation is internally used. If you do the following: 1: mov foo:UD 7U 2: mov bar:UD -foo:UD 3: mov out:F bar:UD the out register will have the value (float)(unt32_t)-7 which is some very large floating-point number. However, if we allow copy-propagation of the second mov, we get 1: mov foo:UD 7U 3: mov out:f -bar:UD and, since the negation is computed in 33-bits, we get a value of -7.0f which is clearly not the same. This is a similar problem if the instruction has a conditional modifier where the 33-bit value is used in the comparison and not the 32-bit version. Previously, we checked the source to be copied for the negate and then checked the source being propagated to for the type. This isn't quite what we want because we are really just looking for negated UD sources. A check later in the file ensures that both ends of the propagate have the right type so it works. However, if we relax the restriction that both ends of the propagation have the same type, it ends up causing us to bail early in cases we don't want. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
b53d035825ef3ad680470aa5c4f9dc51f8f5676b |
|
12-Feb-2015 |
Eric Anholt <eric@anholt.net> |
util: Move Mesa's bitset.h to util/. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
36bc5f06dd22cde0ba572c00ae7548fe8cb7c731 |
|
27-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Allow immediates in MAD and LRP instructions. And then the opt_combine_constants() pass will pull them out into registers. This will allow us to do some algebraic optimizations on MAD and LRP. total instructions in shared programs: 5946656 -> 5931320 (-0.26%) instructions in affected programs: 778247 -> 762911 (-1.97%) helped: 3780 HURT: 6 GAINED: 12 LOST: 12 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
aef83957e1e13ecb96df436d53373ecc4cedeb08 |
|
06-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Handle negated unsigned immediate values in constant propagation. Negation of UD/UW sources behaves the same as for D/W sources, taking the two's complement of the source, except for bitwise logical operations on Gen8 and up which take the one's complement. Fixes crash in a GLSL shader with subtraction of two unsigned values. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a3ee6c7d1991a90d22fae992c1cb94123e51ae54 |
|
06-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove dependency of fs_inst on the visitor class. The fs_visitor argument of fs_inst::regs_read() wasn't used at all. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
e87928a494a7cf0985a9d1cd78bda8729d17c614 |
|
29-Jan-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add support for constant propagating into sources with modifiers. All but 16 of the programs helped were ARB fp programs. total instructions in shared programs: 5949286 -> 5945470 (-0.06%) instructions in affected programs: 275162 -> 271346 (-1.39%) helped: 1197 GAINED: 1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
6dd346c2328df86f843c3a355a7919fb0404f6df |
|
19-Jan-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965: Fix negate with unsigned integers For code such as: uint tmp1 = uint(in0); uint tmp2 = -tmp1; float out0 = float(tmp2); We produce code like: mov(8) g5<1>.xF -g9<4,4,1>.xUD which does not produce correct results. This code produces the results we would expect if tmp1 and tmp2 were signed integers instead. It seems that a similar problem was detected and addressed when using negations with unsigned integers as part of condionals, but it looks like the problem has a wider impact than that. This patch fixes the problem by preventing copy-propagation of negated UD registers in all scenarios, not only in conditionals. Fixes the following 24 dEQP tests: dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_* Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
3a5c7e47fdcfb3e322c0756e960cbcf8403e4230 |
|
16-Oct-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Allow constant propagation between different types This will be needed for NIR because it is typeless and treats all constants as uint32 values and reinterprets them when they are used later. This commit allows those values to be properly propagated. Also, this helps some synmark shaders because it allows us to copy propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets us combine 4 load_payloads. instructions in affected programs: 2288 -> 2144 (-6.29%) Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
840e8fc9203390615f051259efeab0f61f48bbfc |
|
28-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Don't copy propagate constants from sources with saturate We don't propagate the saturate bit and some instructions can't saturate at all. If the source has saturate set, just skip propagation. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
29f4c5b5d5d142f19283c06e77bedd4b3793657a |
|
24-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Constant propagate into LOAD_PAYLOAD Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
5f41d052bf53e32761fb528f4be99a1af3a33ebc |
|
20-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor* Now that we have execution sizes, we can use that instead of the dispatch width. This way it also works for 8-wide instructions in SIMD16. i965/fs: Make effective_width a variable instead of a function i965/fs: Preserve effective width in constant propagation Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7210583eb84a5d49803dbe37b0960373b4224d10 |
|
18-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode This is actually the squash of a bunch of different changes. Individual commit titles follow: i965/fs: Always 2-align registers SIMD16 for gen <= 5 i965/fs: Use the register width when applying offsets This reworks both byte_offset() and offset() to be more intelligent. The byte_offset() function now supports offsets bigger than 32. The offset() function uses the byte_offset() function together with the register width and the type size to offset the register by the correct amount. i965/fs: Change regs_read to be in hardware registers i965/fs: Change regs_written to be actual hardware registers i965/fs: Properly handle register widths in LOAD_PAYLOAD The LOAD_PAYLOAD instruction is a bit special because it collects a bunch of registers (with possibly different widths) into a single payload block. Once the payload is constructed, it's treated as a single block of data and most of the information such as register widths doesn't matter anymore. In particular, the offset of any particular source register is the accumulation of the sizes of the previous source registers. i965/fs: Properly set writemasks in LOAD_PAYLOAD i965/fs: Handle register widths in demote_pull_constants i965/fs: Get rid of implicit register doubling in the allocator i965/fs: Reserve enough registers for PLN instructions i965/fs: Make sources and destinations interfere in 16-wide i965/fs: Properly handle register widths in CSE i965/fs: Properly handle register widths in register_coalesce i965/fs: Properly handle widths in copy propagation i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD i965/fs: Properly handle register widths and odd register sizes in spilling i965/fs: Don't waste a register on texture lookups for gen >= 7 Previously, we were waisting a register in SIMD16 mode because we could only allocate registers in pairs. Now that we can allocate and address odd-sized registers, let's get rid of this special-case. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1728e74957a62b1b4b9fbb62a7de2c12b77c8a75 |
|
24-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Copy propagate partial reads. This commit reworks copy propagation a bit to support propagating the copying of partial registers. This comes up every time we have pull constants because we do a pull constant read immediately followed by a move to splat the one component of the out to 8 or 16-wide. This allows us to eliminate the copy and simply use the one component of the register. Shader DB results: total instructions in shared programs: 5044937 -> 5044428 (-0.01%) instructions in affected programs: 66112 -> 65603 (-0.77%) GAINED: 0 LOST: 0 Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a4fb8897a2bd00eefa8a503ec17d45e791bced91 |
|
01-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Remove now unneeded calls to calculate_cfg(). Now that nothing invalidates the CFG, we can calculate_cfg() immediately after emit_fb_writes()/emit_thread_end() and never again. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
072ea414d04f1b9a7bf06a00b9011e8ad521c878 |
|
01-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Remove cfg-invalidating parameter from invalidate_live_intervals. Everything has been converted to preserve the CFG. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
40aeb558ce8a7ffaaa6f81be16419b9b238c16d8 |
|
03-Jul-2014 |
Abdiel Janulgue <abdiel.janulgue@linux.intel.com> |
i965/fs: Allow propagation of instructions with saturate flag to sel When sel conditon is bounded within 0 and 1.0. This allows code as: mov.sat a b sel.ge dst a 0.25F To be propagated as: sel.ge.sat dst b 0.25F v3: Syntax clarifications in inst->saturate assignment (Matt Turner) Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
20a849b4aa63c7fce96b04de674a4c70f054ed9c |
|
13-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use basic-block aware insertion/removal functions. To avoid invalidating and recreating the control flow graph. Also stop invalidating the CFG in places we didn't add or remove an instruction. cfg calculations: 202951 -> 80307 (-60.43%) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
596990d91e2a4c4a3a303c6c2da623bf1840771b |
|
12-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Add and use foreach_block macro. Use this as an opportunity to rename 'block_num' to 'num'. block->num is clear, and block->block_num has always been redundant.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
9b9dd22f4448da6a6e825faaa40cd601b6fb2b59 |
|
29-Jul-2014 |
Anuj Phogat <anuj.phogat@gmail.com> |
i965: Bail on FS copy propagation for scratch writes with source modifiers Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
e005c1148db04cb706a65392c2b0beda19efa0b1 |
|
11-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Return NONE from brw_swap_cmod on unknown input. Comparing ~0u with a packed enum (i.e., 1 byte) always evaluates to false. Shouldn't gcc warn about this? Reported-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1761671b0627ce8e1c0eae721e1fca5c2d04690e |
|
12-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Replace cfg instances with calls to calculate_cfg(). Avoids regenerating it unnecessarily. Every program in shader-db improved, none by an amount less than a 1/3 reduction. One Dota2 shader decreased from 62 -> 24. cfg calculations: 429492 -> 193197 (-55.02%) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a2de6562783ea87ca5fbcb67dbd36c2f345f2054 |
|
16-Jul-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Don't copy propagate abs into Broadwell logic instructions. It's not clear what abs on logical instructions means on Broadwell, and it doesn't appear to do anything sensible. Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
58270c2fac493497ed7923830f49051a53e86a07 |
|
08-Jul-2014 |
Connor Abbott <cwabbott0@gmail.com> |
exec_list: Make various places use the new length() method. Instead of hand-rolling it. v2 [mattst88]: Rename get_size to length. Expand comment in ir_reader. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1] Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Connor Abbott <connor.abbott@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
46e5b2a497216133be656b38ebfcf96da64b7744 |
|
30-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Make a brw_conditional_mod enum. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
53992a102ffddf2e0fad401252cfc1c034d022ad |
|
30-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use immediate storage in brw_reg for visitor regs. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
266109736a9a69c3fdbe49fe1665a7a63c5cc122 |
|
25-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use typed foreach_in_list_safe instead of foreach_list_safe. Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
c5030ac0ac15d3c91c4352789f94281da9a9dcad |
|
25-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use typed foreach_in_list instead of foreach_list. Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
bc2fbbafd216676ccc7c3abd794ecb7dd1fa631f |
|
24-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Add and use foreach_inst_in_block macros. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
038eb649b30dfddaf40888ea28b5e88de3af2214 |
|
24-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Constant propagate into 2-src math instructions on Gen8. total instructions in shared programs: 1878133 -> 1876986 (-0.06%) instructions in affected programs: 153007 -> 151860 (-0.75%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
aca4a951ea2bab855bcc2491a3b8996b54639ebd |
|
24-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Make try_constant_propagate() static. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
46659d46a8c2f7bbc8deb472faff2dccbde92d29 |
|
24-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Make can_do_source_mods() a member of the instruction classes. Pretty nonsensical to have it as a method of the visitor just for access to brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
18372a710028fcbe1ff74f2f727e986c223957ba |
|
18-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Copy propagate from load_payload. But only into non-load_payload instructions. Otherwise we would prevent register coalescing from combining identical payloads.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
609d00e13e1e3e61ce540c42250c35977d4bcaa1 |
|
05-Jun-2014 |
Abdiel Janulgue <abdiel.janulgue@linux.intel.com> |
i965/fs: skip copy-propate for logical instructions with negated src entries The negation source modifier on src registers has changed meaning in Broadwell when used with logical operations. Don't copy propagate when negate src modifier is set and when the destination instruction is a logical op. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a66660d2b75197814f5e36b9994b1e9eadff0a2e |
|
05-Jun-2014 |
Abdiel Janulgue <abdiel.janulgue@linux.intel.com> |
i965/fs: Refactor check for potential copy propagated instructions. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
b1dcdcde2e323f960833f5c7da65d5c2c20113c9 |
|
17-Mar-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Loop from 0 to inst->sources, not 0 to 3. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
58bcf5996dc60043eee5946a6f2f96256768fc9f |
|
12-May-2014 |
Matt Turner <mattst88@gmail.com> |
i965/cfg: Embed exec_node in bblock_link. In order to remove bblock_link's inheritance of exec_node. Also makes linked list walk code much nicer. Acked-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
63d57f3b086db1e403df5d8f61a368df5e2f5afc |
|
26-Mar-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Name temporary ralloc contexts something other than mem_ctx. Or else poor programmers might mistakenly use the temporary mem_ctx, instead of the fs_visitor's mem_ctx and wonder why their code is crashing. Also remove the parenting. These contexts are local to the optimization passes they're in and are freed at the end.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a76e5dce4fc8d50f8699c108833f24e80167d706 |
|
23-Dec-2013 |
Eric Anholt <eric@anholt.net> |
i965: Move compiler debugging output to stderr. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
019bf6ed8dd4843512e9d4924f4702ce36047ad5 |
|
15-Jan-2014 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove fs_reg::smear. The same effect can be achieved using a combination of ::stride and ::subreg_offset. Remove the less flexible ::smear to keep the data members of fs_reg orthogonal. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
756d37b1d6d09ad7ee3b8835888a49d4256e427b |
|
08-Dec-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add support for specifying register horizontal strides. v2: Some improvements for copy propagation with non-contiguous register strides and mismatching types. v3: Add example of the situation that the copy propagation changes are intended to avoid. Clarify that 'fs_reg::apply_stride()' is expected to work with zero strides too. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
4c7206bafdd7bde7617e14840812e43459682718 |
|
08-Dec-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add support for sub-register byte offsets to the FS back-end IR. It would be nice if we could have a single 'reg_offset' field expressed in bytes that would serve the purpose of both, but the semantics of 'reg_offset' are quite complex currently (it's measured in units of one, eight or sixteen dwords depending on the register file and the dispatch width) and changing it to bytes would be a very intrusive change at this stage. Add a separate 'subreg_offset' field for now. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
b1eb2ad8d159748034e7befc22b46a0b3b040186 |
|
28-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Allow commuting the operands of ADDC for const propagation. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
8786f381eca2c818e381af74feda8d4a22c0e411 |
|
26-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Allow constant propagation into ASR and BFI1. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
d2fcdd0973ee33a2627d1dee6d78091e605af160 |
|
29-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965/cfg: Clean up cfg_t constructors. parent_mem_ctx was unused since db47074a, so remove the two wrappers around create() and make create() the constructor. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
503fe278b070285b75a4000408873973d8d5f2b1 |
|
20-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965: s/Muchnik/Muchnick/. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
36fbe66d3a71df76fcb6f915846da4471b3a8442 |
|
10-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Convert gen7 to using GRFs for texture messages. Looking at Lightsmark's shaders, the way we used MRFs (or in gen7's case, GRFs) was bad in a couple of ways. One was that it prevented compute-to-MRF for the common case of a texcoord that gets used exactly once, but where the texcoord setup all gets emitted before the texture calls (such as when it's a bare fragment shader input, which gets interpolated before processing main()). Another was that it introduced a bunch of dependencies that constrained scheduling, and forced waits for texture operations to be done before they are required. For example, we can now move the compute-to-MRF interpolation for the second texture send down after the first send. The downside is that this generally prevents remove_duplicate_mrf_writes() from doing anything, whereas previously it avoided work for the case of sampling from the same texcoord twice. However, I suspect that most of the win that originally justified that code was in avoiding the WAR stall on the first send, which this patch also avoids, rather than the small cost of the extra instruction. We see instruction count regressions in shaders in unigine, yofrankie, savage2, hon, and gstreamer. Improves GLB2.7 performance by 0.633628% +/- 0.491809% (n=121/125, avg of ~66fps, outliers below 61 dropped). Improves openarena performance by 1.01092% +/- 0.66897% (n=425). No significant difference on Lightsmark (n=44). v2: Squash in the fix for register unspilling for send-from-GRF, fixing a segfault in lightsmark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
4b821a97b5fcdc4c530d5455c43196be09830322 |
|
06-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Create a helper function for invalidating live intervals. For now, this simply sets live_intervals_valid = false, but in the future it will do something more sophisticated. Based on a patch by Eric Anholt. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
014cce3dc49f5b0bfd7fbb1940ed661c9fc7bbd7 |
|
19-Sep-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Generate code for ir_binop_carry and ir_binop_borrow. Using the ADDC and SUBB instructions on Gen7. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
89f5f675ad27bd485d1c1be654ca10c49321957f |
|
06-Aug-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Allow immediates to be folded into logical and shift instructions. These instructions will be used with immediate arguments in the upcoming ldexp lowering pass and frexp implementation. v2: Add vec4 support as well. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
9d08756ac7d27b5a392d17c2a91a55cec6edab5d |
|
19-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Add code to print out global copy propagation sets. This was invaluable when debugging the global copy propagation algorithm. We may as well commit it in case someone needs to print out the sets in the future. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
44960ef918fff24cf7e49f4c89e845709aae3541 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Re-enable global copy propagation. I believe the data flow analysis actually works now, and it should be safe to re-enable global copy propagation. It even does things now. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
72f2249c115a6bfafc809ebb4cb78c860279e41f |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Fix computation of livein. Since the initial value for livein is an overestimation (0xffffffff), it's extremely likely that it will shrink, which means we can't simply OR in new bits - we need to fully recompute it based on the current liveout values. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
70b02a7facf88d5f17655be5e17f053d8531a278 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Fully recompute liveout at each step. Since we start with an overestimation of livein (0xffffffff), successive steps can actually take away values. This means we can't simply OR in new liveout values; we need to recompute it from scratch at each iteration of the fixed-point algorithm. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
d20b472d0a6b016e4827d0986a10df29277a3a5e |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Skip the initial block when updating livein/liveout. The starting block always has livein = 0 and liveout = copy. Since we start with real data, not estimates, there's no need to refine it with the fixed point algorithm. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
731145c5794c2831a833778b0940c999273ec984 |
|
12-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Drop unnecessary and incorrect liveout initialization. The previous commit properly initialized liveout. This previous (and incorrect) initialization is no longer necessary. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1d40c784f22dcbe814e7915d1fae45774a264526 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Properly initialize the livein/liveout sets. Previously, livein was initialized to 0 for all blocks. According to the textbook, it should be the universal set (~0) for all blocks except the one representing the start of the program (which should be 0). liveout also needs to be initialized to COPY for the initial block. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
f06826cece7ad6348c93760e473e5a35ad872431 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Use the COPY set in the calculation for liveout. According to page 360 of the textbook, the proper formula for liveout is: CPout(n) = COPY(i) union (CPin(i) - KILL(i)) Previously, we omitted COPY. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a291c59bbae7d9d96487a984f81a298a1fd71389 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Simplify liveout calculation. Excluding the existing liveout bits is a deviation from the textbook algorithm. The reason for doing so was to determine if the value changed, which means the fixed-point algorithm needs to run for another iteration. The simpler way to do that is to save the value from step (N-1) and compare it to the new value at step N. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
597efd2b67d1afb8a95be38145c4f977ed36b672 |
|
09-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Create the COPY() set for use in copy propagation dataflow. This is the "COPY" set from Muchnick's textbook, which is necessary to do the dataflow algorithm correctly. v2: Simplify initialization based on Paul Berry's observation that out_acp contains exactly what needs to be in the COPY set. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
669d4d7f77648948800abce59bc99a29a338a3ad |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Rename setup_kills() to setup_initial_values(). Although this function currently only initializes the KILL set, it will soon initialize other data flow sets as well. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
2ef81372dccc102d95b3dcec22b42406e1b55af9 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Separate the updating of liveout/livein. To compute the actual liveout/livein data flow values, we start with some initial values and apply a fixed-point algorithm until they settle. Previously, we iterated through all blocks, updating both liveout and livein together in one pass. This is awkward, since computing livein for a block requires knowing liveout for all parent blocks. Not all of those parent blocks may have been processed yet. This patch separates the two. First, we update liveout for all blocks. At iteration N of the fixed-point algorithm, this uses livein values from iteration N-1. Secondly, we update livein for all blocks. At step N, this uses the liveout information we just computed (in step N). This ensures each computation has a consistent picture of the data, rather than seeing an random mix of data from steps N-1 and N depending on the order of the blocks in the CFG data structure. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7d86042dee17dfd985dcab098fc97838c11a5662 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Rename "cont" to "progress" in dataflow algorithm. This variable indicates that the fixed-point algorithm made changes to the data at this step, so it needs to run for another iteration. "progress" seems a nicer name for that. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
0225dea6c49674a27d5be6e933447d8a4ba5a82e |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Switch to a do-while loop in copy propagation dataflow. The fixed-point algorithm needs to run at least once, so a do-while loop is more natural. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
3c68662bb1d41727b6c53fd58868cdcfe6a98492 |
|
10-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Skip global copy propagation step. The dataflow analysis used for global copy propagation is severely broken, and I believe it doesn't actually do anything. Fixing it will require a lot of changes, each of which might break things. Once all the fixes land, we can re-enable this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
9c48ae751ab28f35eb878551d24c071be0ce11b0 |
|
09-Aug-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Don't copy propagate bitcasts with source modifiers. Previously, copy propagation would cause bitcast_f2u(abs(float)) to be performed in a single step, but the application of source modifiers (abs, neg) happens after type conversion, leading to incorrect results. That is, for bitcast_f2u(abs(float)) we would in fact generate code to do abs(bitcast_f2u(float)). For example, whereas bitcast_f2u(abs(float)) might result in a register argument such as (abs)g2.2<0,1,0>UD v2: Set interfered = true and break in register_coalesce instead of returning false. Reviewed-by: Paul Berry <stereoytpe441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
2cb7f1e766d28dd238274f74d9568ab4438c4965 |
|
04-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a helper function for checking for partial register updates. These checks were all over, and every time I wrote one I had to try to decide again what the cases were for partial updates. v2: Fix inadvertent reladdr check removal. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
df25b4f3cf22282b06e622f3cf1f5855b8f767a8 |
|
04-Apr-2013 |
Eric Anholt <eric@anholt.net> |
mesa: Add a macro to bitset for determining bitset size. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1d6ead38042cc0d1e667d8ff55937c1e32d108b1 |
|
15-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow constant propagation into MACH. This happens quite a bit with varying-index uniform loads. We could also do better by avoiding the MACH entirely, but there's no reason not to at least take this step. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
eda434921d6d7980f8b116e4ebde2da6553b9094 |
|
22-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Improve performance of copy propagation dataflow using bitsets. Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
49bdebad3857bb9ebac53f593d08f0057f5a20d3 |
|
16-Feb-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix copy propagation with smearing. We were correctly relaying the smear from MOV's src, but if the MOV didn't do a smear, we don't want to smash the smear value from the instruction being propagated into. Prevents a regression in the upcoming UBO change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
2c4ad502ce12259160be6c73ebdd6e73a5d27c6f |
|
08-Jan-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix build error with clang. Technically, variable sized arrays are a required feature of C99, redacted to be optional in C11, and not actually part of C++ whatsoever. Gcc allows using them in C++ unless you specify -pedantic, and Clang appears to allow them for simple/POD types. exec_list is arguably POD, since it doesn't have virtual methods, but I can see why Clang would be like "meh, it's a C++ struct, say no", seeing as it's meant to support C99. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58970 Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
f22a909a080d603db122ac8517a80bd8f4006fe2 |
|
09-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Restrict optimization that would fail for gen7's SENDs from GRFs v2: Fix SNB math bug in register_coalesce() where I was looking at the instruction to be removed, not the instruction to be copy propagated into.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
29340d02dc38a9cc352d44412871dc9d4e3f878a |
|
07-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Rename the existing pull constant load opcode. We're going to use another send message for handling loads with a varying per-fragment array index.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
177c82555b24a80c15c34315ff17437cc39d1ba5 |
|
30-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for global copy propagation. It is common for complicated shaders, particularly code-generated ones, to have a big array of uniforms or attributes, and a prologue in the shader that dereferences from the big array to more informatively-named local variables. Then there will be some small control flow operation (like a ? : statement), and then use of those informatively-named variables. We were emitting extra MOVs in these cases, because copy propagation couldn't reach across control flow. Instead, implement dataflow analysis on the output of the first copy propagation pass and re-run it to propagate those extra MOVs out. On one future Steam release, reduces VS+FS instruction count from 42837 to 41437. No statistically significant performance difference (n=48), though, at least at the low resolution I'm running it at. shader-db results: total instructions in shared programs: 722170 -> 702545 (-2.72%) instructions in affected programs: 260618 -> 240993 (-7.53%) Some shaders do get hurt by up to 2 instructions, because a choice to copy propagate instead of coalesce or something like that results in a dead write sticking around. Given that we already have instances of those instructions in the affected programs (particularly unigine), we should just improve dead code elimination to fix the problem.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
9864a5b098d2a8087c8966f3e654969be51b660d |
|
30-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix a comment in copy propagation. We haven't been only tracking raw GRF-GRF moves since the constant propagation merge, and also the extension for source modifiers and uniforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
545b59b62a65a9cf1121668bd407e3365bbfa296 |
|
30-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow copy-propagation on pull constant load values. Given that we handle similarly-regioned GRFs registers for our copy propagation from our UNIFORM file, there's no reason not to allow it. The shader-db impact is negligible -- +90 instructions total, 2 shaders helped and 7 hurt (slightly increased register pressure increased spilling), but this is to prevent regression in other shaders when fixing copy_propagation to reduce register pressure in the shaders that are hurt here. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
c226b7a4d3f0fbba08a384e3bacd08b3a0a82531 |
|
03-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965: Make the cfg reusable from the VS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
54679fcbcae7a2d41cb439e52e386bd811a291b4 |
|
03-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965: Share the predicate field between FS and VS. Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a lot like the true/false we had in the FS before. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
7abfb67dc42ec3a96443ed025807267646c56e86 |
|
03-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965: Rename fs_cfg types to not mention fs. fs_bblock_link -> bblock_link fs_bblock -> bblock_t (to avoid conflicting with all the fs_bblock *bblock) fs_cfg -> cfg_t (to avoid conflicting with all the fs_cfg *cfg) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
5ed57d9543df1875d843638212a1f650fc0b17ec |
|
03-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965: Move brw_fs_cfg.* to brw_cfg.*. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
6a514494fa4c45e921bd6af7f3187a67c1e8d9d2 |
|
21-Sep-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Improve performance of copy/constant propagation. Use a simple chaining hash table for the ACP. This is not really very good, because we still do a full walk of the tree per destination write, but it still reduces fp-long-alu runtime from 5.3 to 3.9s. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
fb5bf03a2092159166229eacf57c71587f762c57 |
|
21-Sep-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Move constant propagation to the same codebase as copy prop. This means that we don't get constant prop across into the first block after a BRW_OPCODE_IF or a BRW_OPCODE_DO, but we have hope for properly doing it across control flow at some point. More importantly, with the next commit it will help avoid O(n^2) with instruction count runtime for shaders that have many constant moves. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
a454f8ec6df9334df42249be910cc2d57d913bff |
|
07-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs.h: Refactor tests for instructions modifying a register. There's one instance of a potential behavior change: propagate_constants may now propagate into a part of a vgrf after a different part of it was overwritten by a send that returns multiple registers. I don't think we ever generate IR that meets that condition, but it's something to note if we bisect behavior change to this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
1e28f55ab7909496d93ab1b552faad17453c10ac |
|
05-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Invalidate live intervals after copy propagation. For copy propgation, we've dropped the use of a GRF in favor of a (probably later) use of a different GRF. This definitely requires invalidating intervals. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
dd4282e38fd92c081875da6bce0b2345bd472532 |
|
07-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow copy propagation on uniforms. This is a big win for savage2, hon and yofrankie. 62 new programs for savage2/hon get 16-wide mode, along with one for humus demos and two for tropics. Even a few shaders from tropics see reductions of 15% or more. total instructions in shared programs: 216536 -> 207353 (-4.24%) instructions in affected programs: 123941 -> 114758 (-7.41%) In benchmarking Tropics, only a .040% +/- 034% performance improvement was observed (n=90). Rather disappointing, but I was primarily motivated to do this patch by a regression in the number of 16-wide shaders compiled after a GRF texturing on IVB patch I'm working on. Hopefully this helps avoid that regression. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
0c4630bae001139dea42b78cd08157de4d90542b |
|
06-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow copy propagation with source modifiers. This shaves a few instructions off of a ton of programs. For 12 shaders from tropics and sanctuary, it's enough reduction in register pressure to get 16-wide mode. 7 shaders from heroes of newerth and savage2 are hurt by about 1.1%, where copy propagation of negates ends up preventing coalescing, but we could regain that by doing dataflow analysis in our copy propagation. No significant performance difference in tropics (n=11) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
458f7f014139deb48a4cf0a9e6bdca3a57d24208 |
|
06-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Move copy propagation test out to a separate function. It's going to get more complicated in a moment. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|
d7787adda8006506545256547d8d590a282487af |
|
08-May-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for copy propagation. We could do more by handling abs/negate and non-GRF sources, but this is a good start. Improves tropics performance 0.30% +/- .17% (n=43). shader-db results: Total instructions: 208032 -> 207184 60/1246 programs affected (4.8%) 23286 -> 22438 instructions in affected programs (3.6% reduction) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
|