History log of /external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
7bed52bb5fb4cfd5f91c902a654b3452f921da17 29-Nov-2016 Matt Turner <mattst88@gmail.com> i965/fs: Reject copy propagation into SEL if not min/max.

We shouldn't ever see a SEL with conditional mod other than GE (for max)
or L (for min), but we might see one with predication and no conditional
mod.

total instructions in shared programs: 8241806 -> 8241902 (0.00%)
instructions in affected programs: 13284 -> 13380 (0.72%)
HURT: 62

total cycles in shared programs: 84165104 -> 84166244 (0.00%)
cycles in affected programs: 75364 -> 76504 (1.51%)
helped: 10
HURT: 34

Fixes generated code in at least Sanctum 2, Borderlands 2, Goat
Simulator, XCOM: Enemy Unknown, and Shogun 2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
6014da50ec41d1ad43fec94a625962ac3f2f10cb 28-Nov-2016 Matt Turner <mattst88@gmail.com> i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.

Matches the vec4 backend, cmod propagation, and saturate propagation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
4d4335c81a3f7d8434d9983881a63abcbc29dd5c 11-Oct-2016 Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> i965/fs: fill allocated memory with zeros where needed

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
f2d2156ba225a844723443d6f4356454e72112e0 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Move region_contained_in to the IR header and fix for non-VGRF files.

Also changed the argument names since 'src' and 'dst' don't make that
much sense outside of the context of copy propagation.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
645261c4b2a12b5724946f9f6d35f74e28ce760f 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Change region_contained_in() to use byte units.

This makes the function less annoying to use and more accurate -- We
shouldn't propagate a copy into a register region that wasn't fully
contained in the destination of the copy (IOW, a source region that
wasn't fully defined by the copy) just because the number of registers
written and read by each instruction happened to get rounded up to the
same GRF multiple.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1c67e272475f576c8ab4b2be367f4c3c664cb23c 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify copy propagation LOAD_PAYLOAD ACP setup.

By keeping track of 'offset' in byte units.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
b42c13a5b8ac7d643bbf4c1592607811a81b4ebb 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().

fs_inst::overwrites_reg is rather easy to misuse because it cannot
tell how large the register region starting at 'reg' is, so in cases
where the destination region starts after 'reg' it may give a
misleading result. regions_overlap() is somewhat more verbose to use
but handles arbitrary overlap correctly so it should generally be used
instead.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
5cc6425d708a9b8c660c2f43f5e277c507c98bf0 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix can_propagate_from() source/destination overlap check.

The previous overlap condition only made sure that the VGRF numbers or
GRF-aligned offsets were different without taking the amount of data
written and read by the instruction into consideration. Use the
regions_overlap() helper instead.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
e1a918ba7be6b21303caa2d81671f2d3f17dd692 08-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_inst::regs_read with ::size_read using byte units.

The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
69570bbad876bb9da609c3b651aacda28cecc542 07-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.

The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
be095e11e41158f91bcb3f6fcbc2e2a91a5d9124 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes.

The fs_reg::subreg_offset and ::offset fields are now redundant, the
sub-GRF offset can just be added to the single ::offset field
expressed in byte units. The current subreg_offset value can be
recovered by applying the following rule: Replace each rvalue
reference of subreg_offset like 'x = r.subreg_offset' with 'x =
r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset
= x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
86944e063ad40cac0860bfd85a3cc4e9a9805aa3 01-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.

The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.

This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.

Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
527f37199929932300acc1688d8160e1f3b1d753 23-Aug-2016 Jason Ekstrand <jason.ekstrand@intel.com> intel: s/brw_device_info/gen_device_info/

Generated by:

sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7244dc1e0651958b62222cafb15e34487851a6cd 03-Jun-2016 Francisco Jerez <currojerez@riseup.net> Revert "i965/fs: Allow scalar source regions on SNB math instructions."

This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode. Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):

es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_float_frag_xvary
es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
c1107cec44ab030c7fcc97c67baa12df1cc9d7b5 28-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Allow scalar source regions on SNB math instructions.

I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:

"The supported regioning modes for math instructions are align16,
align1 with the following restrictions:
- Scalar source is supported.
[...]
- Source and destination offset must be the same, except the case of
scalar source."

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a9f00a9e535019747d041f4121c56404057465a3 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files.

This will be useful in several places. The only externally visible
difference (other than non-VGRF files being supported now) is that the
region sizes are now passed in byte units instead of in GRF units
because the loss of precision would have become a problem in the SIMD
lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
48d743c5019076056739561f979e7101c04acf21 30-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Allow constant propagation into logical send sources.

Logical sends are eventually lowered into a series of copies so they
can take almost anything as source.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
ad8f66ed33172ab40d4679063780a501b6f80740 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix multiple ACP interference during copy propagation.

This is more fallout from cf375a3333e54a01462f192202d609436e5fbec8.
It's possible for multiple ACP entries to interfere with a given VGRF
write, so we need to continue iterating even if an overlapping entry
has already been found.

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
0bc5ad8d1997fe33dd43bb476c67163039f065ff 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Avoid constant propagation when the type sizes don't match.

The case where the source type of the instruction is smaller than the
immediate type could be handled by calculating the portion of the
immediate read by the instruction (assuming that the source channels
are aligned with the destination channels of the copy) and then
representing the same value as an immediate of the source type
(assuming such an immediate type exists), but the code below doesn't
do that, so just bail for the moment.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
2db9dd5aeb9566c8480651989981cb1169957748 24-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix off-by-one region overlap comparison in copy propagation.

This was introduced in cf375a3333e54a01462f192202d609436e5fbec8 but
the blame is mine because the pseudocode I sent in my review comment
for the original patch suggesting to do things this way already had
the off-by-one error. This may have caused copy propagation to be
unnecessarily strict while checking whether VGRF writes interfere with
any ACP entries and possibly miss valid optimization opportunities in
cases where multiple copy instructions write sequential locations of
the same VGRF.

Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
9149fd681735b02e421b8cd9e7cea92f039d8590 11-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: fix copy/constant propagation regioning checks

We were not accounting for subreg_offset in the check for the start
of the region.

Also, fs_reg::regs_read() already takes the stride into account, so we
should not multiply its result by the stride again. This was making
copy-propagation fail to copy-propagate cases that would otherwise be
safe to copy-propagate. Again, this was observed in fp64 code, since
there we use stride > 1 often.

v2 (Sam):
- Rename function and add comment (Jason, Curro).
- Assert that register files and number are the same (Jason).
- Fix code to take into account the assumption that src.subreg_offset
is strictly less than the reg_offset unit (Curro).
- Don't pass the registers by value to the function, use
'const fs_reg &' instead (Curro).
- Remove obsolete comment in the commit log (Curro).

v3 (Sam):
- Remove the assert and put the condition in the return (Curro).
- Fix function name (Curro).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
789eecdb79d899a070507355ecb4dc137600f700 12-Apr-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: fix copy propagation from load payload

We were not considering the case where the load payload is writing to
a destination with a reg_offset > 0.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
cf375a3333e54a01462f192202d609436e5fbec8 11-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: fix copy propagation of partially invalidated entries

We were not invalidating entries with a src that reads more than one register
when we find writes that overwrite any register read by entry->src after
the first. This leads to incorrect copy propagation because we re-use
entries from the ACP that have been partially invalidated. Same thing for
entries with a dst that writes to more than one register.

v2 (Sam):
- Improve code by defining regions_overlap() and using it instead of a
loop (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
ea1ef49a16dc429c50ece388e92bf206ccf282a7 11-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Reindent register offset calculation of try_copy_propagate().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
0fb19806c069cbf34aaf02e77f5ae37a9e4cf3b0 11-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify and fix register offset calculation of try_copy_propagate().

try_copy_propagate() was special-casing UNIFORM registers (the
BAD_FILE, ARF and FIXED_GRF cases are dead, see the assertion at the
top of the function) and then failing to take into account the
possibility of the instruction reading from a non-zero offset of the
destination of the copy. The VGRF/ATTR handling takes it into account
correctly, and there is no reason we couldn't use the exact same logic
for the UNIFORM file aside from the fact that uniforms represent
reg_offset in different units. We can work around that easily by
defining an additional constant with the right unit reg_offset is
expressed in.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7aa53cd725cc2287fc206033120e08cde74cde2a 23-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: disallow type change in copy-propagation if types have different sizes

Because the semantics of source modifiers are type-dependent, the type of the
original source of the copy must be kept unmodified while propagating it into
some instruction, which implies that we need to have the guarantee that the
meaning of the instruction is going to remain the same after we have changed
the types. Whenthe size of the new type is different from the size of the old
type the new and old instructions cannot possibly be equivalent because the new
instruction will be reading more data than the old one was.

Prevents that we turn this:

load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
mov(8) vgrf18:DF, vgrf17:DF 1sthalf
load_payload(8) vgrf5:DF, vgrf18:DF, vgrf20:DF NoMask 1sthalf WE_all
load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
mov(8) vgrf22:UD, vgrf21:UD 1sthalf

into:

load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
mov(8) vgrf18:DF, |vgrf4+0.0|:DF 1sthalf
load_payload(8) vgrf5:DF, |vgrf4+0.0|:DF, |vgrf4+2.0|:DF NoMask 1sthalf WE_all
load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
mov(8) vgrf22:DF, |vgrf4+0.4|<2>:DF 1sthalf

where the semantics of the last instruccion have changed.

v2 (Curro):
- Update commit log and add comment to explain the problem better.
- Simplify the condition.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
ac9b966aac4d0276de889990f3b170e0b939c542 18-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Fix copy propagation of load payload for double operands

Specifically, consider the size of the data type of the operand to compute
the number of registers written.

v2 (Sam):
- Fix line width (Jordan).
- Add an assert (Jordan).
- Use REG_SIZE in the calculation of regs_written (Curro)

v3 (Sam):
- Fix assert and calculation of regs_written (Curro).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
70dc19f9d628f0459db93466fbf65af1bfe75af1 26-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix propagation of copies with strided source.

This has likely been broken since we started propagating copies not
matching the offset of the instruction exactly
(1728e74957a62b1b4b9fbb62a7de2c12b77c8a75). The copy source stride
needs to be taken into account to find out the offset at the origin
that corresponds to the offset at the destination of the copy which is
being read by the instruction. This has led to program miscompilation
on both my SIMD32 branch and Igalia's FP64 branch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
ba582e58cd30c815137a11c9497b01d97842e525 05-May-2016 Connor Abbott <cwabbott0@gmail.com> i965/fs: add PACK opcode

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
0f2e227d5ce6de4697ba94ed57f5ff7ca2d86f69 03-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: don't propagate 64-bit immediates

They can only be used with 1-src instructions, which practically (since
we should've constant-propagated away all 1-src instructions with 64-bit
immediates in NIR) means that they must be kept in separate MOV's and
can't be propagated.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1cc7573162a7f0e8346d7abab50890c58a0dce9a 28-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965: Pass devinfo pointer to is_3src() helpers.

This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future. In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
9881eab197c70b85346d682b525b8ea9ed241862 17-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Don't constant-fold RCP

No shader-db changes on Broadwell

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
c70b7c80e3e1dea150ece96e02ef3d364284812d 27-Feb-2016 Francisco Jerez <currojerez@riseup.net> i965: Don't try copy propagation if constant propagation succeeded.

It cannot get any better.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
e2060aaf57f4146b1a0faec55f5f2e45190d427e 16-Feb-2016 Rob Clark <robdclark@gmail.com> i965: fix new gcc6 warnings

src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp:244:1: warning:
‘void {anonymous}::fs_copy_prop_dataflow::dump_block_data() const’ defined but not used [-Wunused-function]
fs_copy_prop_dataflow::dump_block_data() const
^~~~~~~~~~~~~~~~~~~~~

From looking at git history, it looks like this is intended to be unused
(ie. just for adding on-demand debug prints)

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
f36993b46962eab4446bc1964eb47149751aee26 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Clean up #includes in the compiler.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
2d8c5299032d229c8f6e936db5644cd53716e6c1 20-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Prevent implicit upcasts to brw_reg.

Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
d982922b184930a4ceed1d97b772cce5c371865d 14-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: add stride restrictions for copy propagation

There are various restrictions on what the hstride can be that depend on
the Gen, and now that we're using hstride == 2 for packing/unpacking
doubles, we're going to run into these restrictions a lot more often.
Pull them out into a separate function, and move the one restriction we
checked previously into it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
b3315a6f56fb93f2884168cbf9358b2606641db5 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Replace HW_REG with ARF/FIXED_GRF.

HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.

Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.

After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
b163aa01487ab5f9b22c48b7badc5d65999c4985 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Rename GRF to VGRF.

The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7638e75cf99263c1ee8e31c6cc5a319feec2c943 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Use brw_reg's nr field to store register number.

In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1392e45bfb396ccbfa5bb0c6063522e0550988d3 24-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Use immediate storage in inherited brw_reg.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
e42fb0c2a687cdcd6af2a590f6f5e24f64cfff3b 23-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.

Generated by

sed -i -e 's/\.bits\././g' *.c *.h *.cpp
sed -i -e 's/dw1\.//g' *.c *.h *.cpp

and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.

There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.

This is C11 (and gcc has apparently supported it for sometime
"compatibility with other compilers")

See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7c81a6a647257c309cb1ca36c60aa4bfa8e2e022 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Replace default case with list of enum values.

If we add a new file type, we'd like to get warnings if it's not
handled.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
8f60dc83f7edba51037662c2637f830feeea3fc6 21-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/fs: Allow copy propagating into new surface access opcodes

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
9e17c36b8ba79e688011a5fd293ad5f42da21b66 14-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Extract can_change_source_types() functions.

Make them members of fs_inst/vec4_instruction for use elsewhere.

Also fix the fs version to check that dst.type == src[1].type and for
!saturate.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
2ace64fd598816fd1be9877962734242fc27b87b 03-Sep-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix copy propagation type changes.

commit 472ef9a02f2e5c5d0caa2809cb736a0f4f0d4693 introduced code to
change the types of SEL and MOV instructions for moves that simply
"copy bits around". It didn't account for type conversion moves,
however. So it would happily turn this:

mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:F, vgrf6:UD

into this:

mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:D, -vgrf5:D

which erroneously drops the conversion to float.

Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
8f5d0988ea2ccaba7f049f113b652f331524d2a6 04-Aug-2015 Francisco Jerez <currojerez@riseup.net> i965: Define virtual instruction to calculate the high 32 bits of a multiply.

This instruction will translate to the MUL/MACH sequence that computes
the high 32-bits of the result of a 64-bit multiply. Before Gen8
integer operations that used the accumulator were limited to 8-wide,
but the SIMD lowering pass can easily be hooked up to sidestep this
limitation, we just need a virtual opcode to represent the MUL/MACH
sequence in the IR.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
f68ec2baf49e37f9ce4fffe95f13177eb7225015 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Make sure that the type sizes are compatible during copy propagation.

It's surprising that we weren't checking for this already. A future
patch will cause code like the following to be emitted:

MOV(16) tmp<1>:uw, src
MOV(8) dst<1>:ud, tmp<8,8,1>:ud

The second MOV comes from the expansion of a LOAD_PAYLOAD header copy,
so I don't have control over its types. Copy propagation will happily
turn this into:

MOV(8) dst<1>:ud, src

Which has different semantics. Fix it by preventing propagation in
cases where a single channel of the instruction would span several
channels of the copy (this requirement could in fact be relaxed if the
copy is just a trivial memcpy, but this case is unusual enough that I
don't think it matters in practice).

I'm deliberately only checking if the type of the instruction is
larger than the original, because the converse case seems to be
handled correctly already in the code below.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
830f67046ace3c0b95a7f093fe373eeb417a1aad 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove the width field from fs_reg

As of now, the width field is no longer used for anything. The width field
"seemed like a good idea at the time" but is actually entirely redundant
with the instruction's execution size. Initially, it gave us the ability
to easily set the instructions execution size based entirely on register
widths. With the builder, we can easiliy set the sizes explicitly and the
width field doesn't have as much purpose. At this point, it's just
redundant information that can get out of sync so it really needs to go.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
81deefc45ba7b7d3b2b5e7ccf9e1680df6e31e3a 15-May-2015 Matt Turner <mattst88@gmail.com> i965/fs: Unrestrict constant propagation into integer multiply.

Gen8+'s MUL instruction doesn't ignore the high 16-bits of one source
like on earlier platforms, so we can constant propagate into it without
worry. Integer multiplies (not into the accumulator, which is done for
imul_high) are lowered in lower_integer_multiplication(), so it's safe
there as well.

On Broadwell, fragment shaders only:
total instructions in shared programs: 4377769 -> 4377451 (-0.01%)
instructions in affected programs: 48064 -> 47746 (-0.66%)
helped: 156

On Broadwell, vertex shaders only:
total instructions in shared programs: 2858885 -> 2856313 (-0.09%)
instructions in affected programs: 26380 -> 23808 (-9.75%)
helped: 134

On Broadwell, vertex shaders only (with INTEL_USE_NIR=1):
total instructions in shared programs: 2911688 -> 2865984 (-1.57%)
instructions in affected programs: 1421715 -> 1376011 (-3.21%)
helped: 6186

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
0c0ca557117edd3a57443f4f454c3a8da1d4e0b5 10-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Allow copy propagation on ATTR file registers.

This especially helps with NIR because we currently emit MOVs at the top
of the shader to copy from various ATTR registers to a giant VGRF array
of all inputs. (This could potentially be done better, but since
there's only ever one write to each register, it should be trivial to
copy propagate away...)

With NIR - only vertex shaders:
total instructions in shared programs: 3129373 -> 2889581 (-7.66%)
instructions in affected programs: 3119717 -> 2879925 (-7.69%)
helped: 20833

Without NIR - only vertex shaders:
total instructions in shared programs: 2745901 -> 2724483 (-0.78%)
instructions in affected programs: 693426 -> 672008 (-3.09%)
helped: 3516

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7a75b55a01d355090d186357896e3cb141b9775e 02-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_inst: Get rid of the effective_width field

The effective_width field was an ill-concieved hack to get around issues in
the LOAD_PAYLOAD instruction. Now that the LOAD_PAYLOAD instruction is far
more sane, this field can die.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
f2fad0dc80627e853eea558498f18a9fa769992e 19-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Perform basic optimizations on the BROADCAST opcode.

v2: Style fixes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7f5a8ac155283e78df2da5b172a65361a80d38b6 25-Apr-2015 Matt Turner <mattst88@gmail.com> i965/fs: Disallow constant propagation into POW on Gen 6.

Fixes assertion failures in three piglit tests on Gen 6 since commit
0087cf23e.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
51c61fff8f46472820ac413ad22e9f3edf670396 24-Apr-2015 Matt Turner <mattst88@gmail.com> i965/fs: Don't constant propagate into integer math instructions.

Constant combining won't promote non-floats, so this isn't safe.

Fixes regressions since commit 0087cf23e.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
0087cf23e8e399778e93369d67dd543e767ab526 17-Mar-2015 Matt Turner <mattst88@gmail.com> i965/fs: Allow 2-src math instructions to have immediate src1.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
28e9601d0e681411b60a7de8be9f401b0df77d29 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Add a devinfo field to backend_visitor and use it for gen checks

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
472ef9a02f2e5c5d0caa2809cb736a0f4f0d4693 03-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Change SEL and MOV types as needed to propagate source modifiers

SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around. This commit adds support to copy propagation to
switch the type of a SEL or MOV instruction as needed so that it can
propagate source modifiers. This is needed because NIR generates integer
SEL and MOV instructions whenver it doesn't know what else to generate.

shader-db results with NIR:
total FS instructions in shared programs: 4360910 -> 4360186 (-0.02%)
FS instructions in affected programs: 59094 -> 58370 (-1.23%)
helped: 341
HURT: 0
GAINED: 2
LOST: 0

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
bb99a58e7710acd19463646c38cdddbd926e89c4 03-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the source type when looking for UD negations in copy prop

There can be problems with floats and conditional modifiers when
copy-propagating a negated UD source. The problem arises when a source
modifier is applied to a UD value. In this case, a 33-bit representation
is internally used. If you do the following:

1: mov foo:UD 7U
2: mov bar:UD -foo:UD
3: mov out:F bar:UD

the out register will have the value (float)(unt32_t)-7 which is some very
large floating-point number. However, if we allow copy-propagation of the
second mov, we get

1: mov foo:UD 7U
3: mov out:f -bar:UD

and, since the negation is computed in 33-bits, we get a value of -7.0f
which is clearly not the same. This is a similar problem if the
instruction has a conditional modifier where the 33-bit value is used in
the comparison and not the 32-bit version.

Previously, we checked the source to be copied for the negate and then
checked the source being propagated to for the type. This isn't quite what
we want because we are really just looking for negated UD sources. A check
later in the file ensures that both ends of the propagate have the right
type so it works. However, if we relax the restriction that both ends of
the propagation have the same type, it ends up causing us to bail early in
cases we don't want.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
b53d035825ef3ad680470aa5c4f9dc51f8f5676b 12-Feb-2015 Eric Anholt <eric@anholt.net> util: Move Mesa's bitset.h to util/.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
36bc5f06dd22cde0ba572c00ae7548fe8cb7c731 27-Oct-2014 Matt Turner <mattst88@gmail.com> i965/fs: Allow immediates in MAD and LRP instructions.

And then the opt_combine_constants() pass will pull them out into
registers. This will allow us to do some algebraic optimizations on MAD
and LRP.

total instructions in shared programs: 5946656 -> 5931320 (-0.26%)
instructions in affected programs: 778247 -> 762911 (-1.97%)
helped: 3780
HURT: 6
GAINED: 12
LOST: 12

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
aef83957e1e13ecb96df436d53373ecc4cedeb08 06-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Handle negated unsigned immediate values in constant propagation.

Negation of UD/UW sources behaves the same as for D/W sources, taking
the two's complement of the source, except for bitwise logical
operations on Gen8 and up which take the one's complement. Fixes
crash in a GLSL shader with subtraction of two unsigned values.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a3ee6c7d1991a90d22fae992c1cb94123e51ae54 06-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove dependency of fs_inst on the visitor class.

The fs_visitor argument of fs_inst::regs_read() wasn't used at all.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
e87928a494a7cf0985a9d1cd78bda8729d17c614 29-Jan-2015 Matt Turner <mattst88@gmail.com> i965/fs: Add support for constant propagating into sources with modifiers.

All but 16 of the programs helped were ARB fp programs.

total instructions in shared programs: 5949286 -> 5945470 (-0.06%)
instructions in affected programs: 275162 -> 271346 (-1.39%)
helped: 1197
GAINED: 1

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
6dd346c2328df86f843c3a355a7919fb0404f6df 19-Jan-2015 Iago Toral Quiroga <itoral@igalia.com> i965: Fix negate with unsigned integers

For code such as:

uint tmp1 = uint(in0);
uint tmp2 = -tmp1;
float out0 = float(tmp2);

We produce code like:
mov(8) g5<1>.xF -g9<4,4,1>.xUD

which does not produce correct results. This code produces the
results we would expect if tmp1 and tmp2 were signed integers
instead.

It seems that a similar problem was detected and addressed when
using negations with unsigned integers as part of condionals, but
it looks like the problem has a wider impact than that.

This patch fixes the problem by preventing copy-propagation of
negated UD registers in all scenarios, not only in conditionals.

Fixes the following 24 dEQP tests:

dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_*

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
3a5c7e47fdcfb3e322c0756e960cbcf8403e4230 16-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Allow constant propagation between different types

This will be needed for NIR because it is typeless and treats all constants
as uint32 values and reinterprets them when they are used later. This
commit allows those values to be properly propagated.

Also, this helps some synmark shaders because it allows us to copy
propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets us
combine 4 load_payloads.

instructions in affected programs: 2288 -> 2144 (-6.29%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
840e8fc9203390615f051259efeab0f61f48bbfc 28-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Don't copy propagate constants from sources with saturate

We don't propagate the saturate bit and some instructions can't
saturate at all. If the source has saturate set, just skip propagation.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
29f4c5b5d5d142f19283c06e77bedd4b3793657a 24-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Constant propagate into LOAD_PAYLOAD

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
5f41d052bf53e32761fb528f4be99a1af3a33ebc 20-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*

Now that we have execution sizes, we can use that instead of the
dispatch width. This way it also works for 8-wide instructions in
SIMD16.

i965/fs: Make effective_width a variable instead of a function

i965/fs: Preserve effective width in constant propagation

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7210583eb84a5d49803dbe37b0960373b4224d10 18-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode

This is actually the squash of a bunch of different changes. Individual
commit titles follow:

i965/fs: Always 2-align registers SIMD16 for gen <= 5

i965/fs: Use the register width when applying offsets

This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.

i965/fs: Change regs_read to be in hardware registers

i965/fs: Change regs_written to be actual hardware registers

i965/fs: Properly handle register widths in LOAD_PAYLOAD

The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.

i965/fs: Properly set writemasks in LOAD_PAYLOAD

i965/fs: Handle register widths in demote_pull_constants

i965/fs: Get rid of implicit register doubling in the allocator

i965/fs: Reserve enough registers for PLN instructions

i965/fs: Make sources and destinations interfere in 16-wide

i965/fs: Properly handle register widths in CSE

i965/fs: Properly handle register widths in register_coalesce

i965/fs: Properly handle widths in copy propagation

i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD

i965/fs: Properly handle register widths and odd register sizes in spilling

i965/fs: Don't waste a register on texture lookups for gen >= 7

Previously, we were waisting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1728e74957a62b1b4b9fbb62a7de2c12b77c8a75 24-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Copy propagate partial reads.

This commit reworks copy propagation a bit to support propagating the
copying of partial registers. This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat the one component of the out to 8 or 16-wide. This allows us to
eliminate the copy and simply use the one component of the register.

Shader DB results:

total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs: 66112 -> 65603 (-0.77%)
GAINED: 0
LOST: 0

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a4fb8897a2bd00eefa8a503ec17d45e791bced91 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965: Remove now unneeded calls to calculate_cfg().

Now that nothing invalidates the CFG, we can calculate_cfg() immediately
after emit_fb_writes()/emit_thread_end() and never again.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
072ea414d04f1b9a7bf06a00b9011e8ad521c878 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965: Remove cfg-invalidating parameter from invalidate_live_intervals.

Everything has been converted to preserve the CFG.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
40aeb558ce8a7ffaaa6f81be16419b9b238c16d8 03-Jul-2014 Abdiel Janulgue <abdiel.janulgue@linux.intel.com> i965/fs: Allow propagation of instructions with saturate flag to sel

When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F

To be propagated as:
sel.ge.sat dst b 0.25F

v3: Syntax clarifications in inst->saturate assignment (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
20a849b4aa63c7fce96b04de674a4c70f054ed9c 13-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Use basic-block aware insertion/removal functions.

To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.

cfg calculations: 202951 -> 80307 (-60.43%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
596990d91e2a4c4a3a303c6c2da623bf1840771b 12-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Add and use foreach_block macro.

Use this as an opportunity to rename 'block_num' to 'num'. block->num is
clear, and block->block_num has always been redundant.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
9b9dd22f4448da6a6e825faaa40cd601b6fb2b59 29-Jul-2014 Anuj Phogat <anuj.phogat@gmail.com> i965: Bail on FS copy propagation for scratch writes with source modifiers

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
e005c1148db04cb706a65392c2b0beda19efa0b1 11-Aug-2014 Matt Turner <mattst88@gmail.com> i965: Return NONE from brw_swap_cmod on unknown input.

Comparing ~0u with a packed enum (i.e., 1 byte) always evaluates to
false. Shouldn't gcc warn about this?

Reported-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1761671b0627ce8e1c0eae721e1fca5c2d04690e 12-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Replace cfg instances with calls to calculate_cfg().

Avoids regenerating it unnecessarily.

Every program in shader-db improved, none by an amount less than a 1/3
reduction. One Dota2 shader decreased from 62 -> 24.

cfg calculations: 429492 -> 193197 (-55.02%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a2de6562783ea87ca5fbcb67dbd36c2f345f2054 16-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Don't copy propagate abs into Broadwell logic instructions.

It's not clear what abs on logical instructions means on Broadwell, and
it doesn't appear to do anything sensible.

Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
58270c2fac493497ed7923830f49051a53e86a07 08-Jul-2014 Connor Abbott <cwabbott0@gmail.com> exec_list: Make various places use the new length() method.

Instead of hand-rolling it.

v2 [mattst88]: Rename get_size to length. Expand comment in ir_reader.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
46e5b2a497216133be656b38ebfcf96da64b7744 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Make a brw_conditional_mod enum.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
53992a102ffddf2e0fad401252cfc1c034d022ad 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use immediate storage in brw_reg for visitor regs.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
266109736a9a69c3fdbe49fe1665a7a63c5cc122 25-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use typed foreach_in_list_safe instead of foreach_list_safe.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
c5030ac0ac15d3c91c4352789f94281da9a9dcad 25-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use typed foreach_in_list instead of foreach_list.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
bc2fbbafd216676ccc7c3abd794ecb7dd1fa631f 24-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Add and use foreach_inst_in_block macros.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
038eb649b30dfddaf40888ea28b5e88de3af2214 24-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Constant propagate into 2-src math instructions on Gen8.

total instructions in shared programs: 1878133 -> 1876986 (-0.06%)
instructions in affected programs: 153007 -> 151860 (-0.75%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
aca4a951ea2bab855bcc2491a3b8996b54639ebd 24-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Make try_constant_propagate() static.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
46659d46a8c2f7bbc8deb472faff2dccbde92d29 24-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Make can_do_source_mods() a member of the instruction classes.

Pretty nonsensical to have it as a method of the visitor just for access
to brw.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
18372a710028fcbe1ff74f2f727e986c223957ba 18-Apr-2014 Matt Turner <mattst88@gmail.com> i965/fs: Copy propagate from load_payload.

But only into non-load_payload instructions. Otherwise we would prevent
register coalescing from combining identical payloads.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
609d00e13e1e3e61ce540c42250c35977d4bcaa1 05-Jun-2014 Abdiel Janulgue <abdiel.janulgue@linux.intel.com> i965/fs: skip copy-propate for logical instructions with negated src entries

The negation source modifier on src registers has changed meaning in Broadwell when
used with logical operations. Don't copy propagate when negate src modifier is set
and when the destination instruction is a logical op.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a66660d2b75197814f5e36b9994b1e9eadff0a2e 05-Jun-2014 Abdiel Janulgue <abdiel.janulgue@linux.intel.com> i965/fs: Refactor check for potential copy propagated instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
b1dcdcde2e323f960833f5c7da65d5c2c20113c9 17-Mar-2014 Matt Turner <mattst88@gmail.com> i965/fs: Loop from 0 to inst->sources, not 0 to 3.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
58bcf5996dc60043eee5946a6f2f96256768fc9f 12-May-2014 Matt Turner <mattst88@gmail.com> i965/cfg: Embed exec_node in bblock_link.

In order to remove bblock_link's inheritance of exec_node. Also makes
linked list walk code much nicer.

Acked-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
63d57f3b086db1e403df5d8f61a368df5e2f5afc 26-Mar-2014 Matt Turner <mattst88@gmail.com> i965/fs: Name temporary ralloc contexts something other than mem_ctx.

Or else poor programmers might mistakenly use the temporary mem_ctx,
instead of the fs_visitor's mem_ctx and wonder why their code is
crashing.

Also remove the parenting. These contexts are local to the optimization
passes they're in and are freed at the end.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a76e5dce4fc8d50f8699c108833f24e80167d706 23-Dec-2013 Eric Anholt <eric@anholt.net> i965: Move compiler debugging output to stderr.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
019bf6ed8dd4843512e9d4924f4702ce36047ad5 15-Jan-2014 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove fs_reg::smear.

The same effect can be achieved using a combination of ::stride and
::subreg_offset. Remove the less flexible ::smear to keep the data
members of fs_reg orthogonal.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
756d37b1d6d09ad7ee3b8835888a49d4256e427b 08-Dec-2013 Francisco Jerez <currojerez@riseup.net> i965/fs: Add support for specifying register horizontal strides.

v2: Some improvements for copy propagation with non-contiguous
register strides and mismatching types.
v3: Add example of the situation that the copy propagation changes are
intended to avoid. Clarify that 'fs_reg::apply_stride()' is expected
to work with zero strides too.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
4c7206bafdd7bde7617e14840812e43459682718 08-Dec-2013 Francisco Jerez <currojerez@riseup.net> i965/fs: Add support for sub-register byte offsets to the FS back-end IR.

It would be nice if we could have a single 'reg_offset' field
expressed in bytes that would serve the purpose of both, but the
semantics of 'reg_offset' are quite complex currently (it's measured
in units of one, eight or sixteen dwords depending on the register
file and the dispatch width) and changing it to bytes would be a very
intrusive change at this stage. Add a separate 'subreg_offset' field
for now.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
b1eb2ad8d159748034e7befc22b46a0b3b040186 28-Nov-2013 Matt Turner <mattst88@gmail.com> i965: Allow commuting the operands of ADDC for const propagation.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
8786f381eca2c818e381af74feda8d4a22c0e411 26-Nov-2013 Matt Turner <mattst88@gmail.com> i965: Allow constant propagation into ASR and BFI1.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
d2fcdd0973ee33a2627d1dee6d78091e605af160 29-Nov-2013 Matt Turner <mattst88@gmail.com> i965/cfg: Clean up cfg_t constructors.

parent_mem_ctx was unused since db47074a, so remove the two wrappers
around create() and make create() the constructor.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
503fe278b070285b75a4000408873973d8d5f2b1 20-Oct-2013 Matt Turner <mattst88@gmail.com> i965: s/Muchnik/Muchnick/.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
36fbe66d3a71df76fcb6f915846da4471b3a8442 10-Oct-2013 Eric Anholt <eric@anholt.net> i965/fs: Convert gen7 to using GRFs for texture messages.

Looking at Lightsmark's shaders, the way we used MRFs (or in gen7's
case, GRFs) was bad in a couple of ways. One was that it prevented
compute-to-MRF for the common case of a texcoord that gets used
exactly once, but where the texcoord setup all gets emitted before the
texture calls (such as when it's a bare fragment shader input, which
gets interpolated before processing main()). Another was that it
introduced a bunch of dependencies that constrained scheduling, and
forced waits for texture operations to be done before they are
required. For example, we can now move the compute-to-MRF
interpolation for the second texture send down after the first send.

The downside is that this generally prevents
remove_duplicate_mrf_writes() from doing anything, whereas previously
it avoided work for the case of sampling from the same texcoord twice.
However, I suspect that most of the win that originally justified that
code was in avoiding the WAR stall on the first send, which this patch
also avoids, rather than the small cost of the extra instruction. We
see instruction count regressions in shaders in unigine, yofrankie,
savage2, hon, and gstreamer.

Improves GLB2.7 performance by 0.633628% +/- 0.491809% (n=121/125, avg of
~66fps, outliers below 61 dropped).

Improves openarena performance by 1.01092% +/- 0.66897% (n=425).

No significant difference on Lightsmark (n=44).

v2: Squash in the fix for register unspilling for send-from-GRF, fixing a
segfault in lightsmark.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
4b821a97b5fcdc4c530d5455c43196be09830322 06-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Create a helper function for invalidating live intervals.

For now, this simply sets live_intervals_valid = false, but in the
future it will do something more sophisticated.

Based on a patch by Eric Anholt.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
014cce3dc49f5b0bfd7fbb1940ed661c9fc7bbd7 19-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Generate code for ir_binop_carry and ir_binop_borrow.

Using the ADDC and SUBB instructions on Gen7.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
89f5f675ad27bd485d1c1be654ca10c49321957f 06-Aug-2013 Matt Turner <mattst88@gmail.com> i965: Allow immediates to be folded into logical and shift instructions.

These instructions will be used with immediate arguments in the upcoming
ldexp lowering pass and frexp implementation.

v2: Add vec4 support as well.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
9d08756ac7d27b5a392d17c2a91a55cec6edab5d 19-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add code to print out global copy propagation sets.

This was invaluable when debugging the global copy propagation
algorithm. We may as well commit it in case someone needs to print
out the sets in the future.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
44960ef918fff24cf7e49f4c89e845709aae3541 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Re-enable global copy propagation.

I believe the data flow analysis actually works now, and it should be
safe to re-enable global copy propagation. It even does things now.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
72f2249c115a6bfafc809ebb4cb78c860279e41f 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Fix computation of livein.

Since the initial value for livein is an overestimation (0xffffffff),
it's extremely likely that it will shrink, which means we can't simply
OR in new bits - we need to fully recompute it based on the current
liveout values.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
70b02a7facf88d5f17655be5e17f053d8531a278 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Fully recompute liveout at each step.

Since we start with an overestimation of livein (0xffffffff), successive
steps can actually take away values. This means we can't simply OR in
new liveout values; we need to recompute it from scratch at each
iteration of the fixed-point algorithm.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
d20b472d0a6b016e4827d0986a10df29277a3a5e 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Skip the initial block when updating livein/liveout.

The starting block always has livein = 0 and liveout = copy. Since we
start with real data, not estimates, there's no need to refine it with
the fixed point algorithm.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
731145c5794c2831a833778b0940c999273ec984 12-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Drop unnecessary and incorrect liveout initialization.

The previous commit properly initialized liveout. This previous
(and incorrect) initialization is no longer necessary.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1d40c784f22dcbe814e7915d1fae45774a264526 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Properly initialize the livein/liveout sets.

Previously, livein was initialized to 0 for all blocks. According to
the textbook, it should be the universal set (~0) for all blocks except
the one representing the start of the program (which should be 0).

liveout also needs to be initialized to COPY for the initial block.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
f06826cece7ad6348c93760e473e5a35ad872431 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Use the COPY set in the calculation for liveout.

According to page 360 of the textbook, the proper formula for liveout
is:

CPout(n) = COPY(i) union (CPin(i) - KILL(i))

Previously, we omitted COPY.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a291c59bbae7d9d96487a984f81a298a1fd71389 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Simplify liveout calculation.

Excluding the existing liveout bits is a deviation from the textbook
algorithm. The reason for doing so was to determine if the value
changed, which means the fixed-point algorithm needs to run for another
iteration.

The simpler way to do that is to save the value from step (N-1) and
compare it to the new value at step N.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
597efd2b67d1afb8a95be38145c4f977ed36b672 09-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Create the COPY() set for use in copy propagation dataflow.

This is the "COPY" set from Muchnick's textbook, which is necessary
to do the dataflow algorithm correctly.

v2: Simplify initialization based on Paul Berry's observation that
out_acp contains exactly what needs to be in the COPY set.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
669d4d7f77648948800abce59bc99a29a338a3ad 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Rename setup_kills() to setup_initial_values().

Although this function currently only initializes the KILL set, it will
soon initialize other data flow sets as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
2ef81372dccc102d95b3dcec22b42406e1b55af9 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Separate the updating of liveout/livein.

To compute the actual liveout/livein data flow values, we start with
some initial values and apply a fixed-point algorithm until they settle.

Previously, we iterated through all blocks, updating both liveout and
livein together in one pass. This is awkward, since computing livein
for a block requires knowing liveout for all parent blocks. Not all
of those parent blocks may have been processed yet.

This patch separates the two. First, we update liveout for all blocks.
At iteration N of the fixed-point algorithm, this uses livein values
from iteration N-1. Secondly, we update livein for all blocks. At
step N, this uses the liveout information we just computed (in step N).

This ensures each computation has a consistent picture of the data,
rather than seeing an random mix of data from steps N-1 and N depending
on the order of the blocks in the CFG data structure.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7d86042dee17dfd985dcab098fc97838c11a5662 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Rename "cont" to "progress" in dataflow algorithm.

This variable indicates that the fixed-point algorithm made changes to
the data at this step, so it needs to run for another iteration.

"progress" seems a nicer name for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
0225dea6c49674a27d5be6e933447d8a4ba5a82e 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Switch to a do-while loop in copy propagation dataflow.

The fixed-point algorithm needs to run at least once, so a do-while loop
is more natural.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
3c68662bb1d41727b6c53fd58868cdcfe6a98492 10-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Skip global copy propagation step.

The dataflow analysis used for global copy propagation is severely
broken, and I believe it doesn't actually do anything. Fixing it will
require a lot of changes, each of which might break things.

Once all the fixes land, we can re-enable this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
9c48ae751ab28f35eb878551d24c071be0ce11b0 09-Aug-2013 Matt Turner <mattst88@gmail.com> i965: Don't copy propagate bitcasts with source modifiers.

Previously, copy propagation would cause bitcast_f2u(abs(float)) to
be performed in a single step, but the application of source modifiers
(abs, neg) happens after type conversion, leading to incorrect results.

That is, for bitcast_f2u(abs(float)) we would in fact generate code to
do abs(bitcast_f2u(float)).

For example, whereas bitcast_f2u(abs(float)) might result in a register
argument such as
(abs)g2.2<0,1,0>UD

v2: Set interfered = true and break in register_coalesce instead of
returning false.

Reviewed-by: Paul Berry <stereoytpe441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
2cb7f1e766d28dd238274f74d9568ab4438c4965 04-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Add a helper function for checking for partial register updates.

These checks were all over, and every time I wrote one I had to try to
decide again what the cases were for partial updates.

v2: Fix inadvertent reladdr check removal.
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
df25b4f3cf22282b06e622f3cf1f5855b8f767a8 04-Apr-2013 Eric Anholt <eric@anholt.net> mesa: Add a macro to bitset for determining bitset size.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1d6ead38042cc0d1e667d8ff55937c1e32d108b1 15-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Allow constant propagation into MACH.

This happens quite a bit with varying-index uniform loads. We could also
do better by avoiding the MACH entirely, but there's no reason not to at
least take this step.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
eda434921d6d7980f8b116e4ebde2da6553b9094 22-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Improve performance of copy propagation dataflow using bitsets.

Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
49bdebad3857bb9ebac53f593d08f0057f5a20d3 16-Feb-2013 Eric Anholt <eric@anholt.net> i965/fs: Fix copy propagation with smearing.

We were correctly relaying the smear from MOV's src, but if the MOV
didn't do a smear, we don't want to smash the smear value from the
instruction being propagated into. Prevents a regression in the
upcoming UBO change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
2c4ad502ce12259160be6c73ebdd6e73a5d27c6f 08-Jan-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Fix build error with clang.

Technically, variable sized arrays are a required feature of C99,
redacted to be optional in C11, and not actually part of C++ whatsoever.

Gcc allows using them in C++ unless you specify -pedantic, and Clang
appears to allow them for simple/POD types.

exec_list is arguably POD, since it doesn't have virtual methods, but I
can see why Clang would be like "meh, it's a C++ struct, say no", seeing as
it's meant to support C99.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58970
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
f22a909a080d603db122ac8517a80bd8f4006fe2 09-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Restrict optimization that would fail for gen7's SENDs from GRFs

v2: Fix SNB math bug in register_coalesce() where I was looking at the
instruction to be removed, not the instruction to be copy propagated
into.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
29340d02dc38a9cc352d44412871dc9d4e3f878a 07-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Rename the existing pull constant load opcode.

We're going to use another send message for handling loads with a varying
per-fragment array index.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
177c82555b24a80c15c34315ff17437cc39d1ba5 30-Oct-2012 Eric Anholt <eric@anholt.net> i965/fs: Add support for global copy propagation.

It is common for complicated shaders, particularly code-generated ones, to
have a big array of uniforms or attributes, and a prologue in the shader that
dereferences from the big array to more informatively-named local variables.
Then there will be some small control flow operation (like a ? : statement),
and then use of those informatively-named variables. We were emitting extra
MOVs in these cases, because copy propagation couldn't reach across control
flow.

Instead, implement dataflow analysis on the output of the first copy
propagation pass and re-run it to propagate those extra MOVs out.

On one future Steam release, reduces VS+FS instruction count from 42837 to
41437. No statistically significant performance difference (n=48), though, at
least at the low resolution I'm running it at.

shader-db results:

total instructions in shared programs: 722170 -> 702545 (-2.72%)
instructions in affected programs: 260618 -> 240993 (-7.53%)

Some shaders do get hurt by up to 2 instructions, because a choice to copy
propagate instead of coalesce or something like that results in a dead write
sticking around. Given that we already have instances of those instructions
in the affected programs (particularly unigine), we should just improve dead
code elimination to fix the problem.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
9864a5b098d2a8087c8966f3e654969be51b660d 30-Oct-2012 Eric Anholt <eric@anholt.net> i965/fs: Fix a comment in copy propagation.

We haven't been only tracking raw GRF-GRF moves since the constant propagation
merge, and also the extension for source modifiers and uniforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
545b59b62a65a9cf1121668bd407e3365bbfa296 30-Oct-2012 Eric Anholt <eric@anholt.net> i965/fs: Allow copy-propagation on pull constant load values.

Given that we handle similarly-regioned GRFs registers for our copy
propagation from our UNIFORM file, there's no reason not to allow it.

The shader-db impact is negligible -- +90 instructions total, 2 shaders helped
and 7 hurt (slightly increased register pressure increased spilling), but this
is to prevent regression in other shaders when fixing copy_propagation to
reduce register pressure in the shaders that are hurt here.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
c226b7a4d3f0fbba08a384e3bacd08b3a0a82531 03-Oct-2012 Eric Anholt <eric@anholt.net> i965: Make the cfg reusable from the VS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
54679fcbcae7a2d41cb439e52e386bd811a291b4 03-Oct-2012 Eric Anholt <eric@anholt.net> i965: Share the predicate field between FS and VS.

Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a
lot like the true/false we had in the FS before.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
7abfb67dc42ec3a96443ed025807267646c56e86 03-Oct-2012 Eric Anholt <eric@anholt.net> i965: Rename fs_cfg types to not mention fs.

fs_bblock_link -> bblock_link
fs_bblock -> bblock_t (to avoid conflicting with all the fs_bblock *bblock)
fs_cfg -> cfg_t (to avoid conflicting with all the fs_cfg *cfg)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
5ed57d9543df1875d843638212a1f650fc0b17ec 03-Oct-2012 Eric Anholt <eric@anholt.net> i965: Move brw_fs_cfg.* to brw_cfg.*.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
6a514494fa4c45e921bd6af7f3187a67c1e8d9d2 21-Sep-2012 Eric Anholt <eric@anholt.net> i965/fs: Improve performance of copy/constant propagation.

Use a simple chaining hash table for the ACP. This is not really very good,
because we still do a full walk of the tree per destination write, but it
still reduces fp-long-alu runtime from 5.3 to 3.9s.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
fb5bf03a2092159166229eacf57c71587f762c57 21-Sep-2012 Eric Anholt <eric@anholt.net> i965/fs: Move constant propagation to the same codebase as copy prop.

This means that we don't get constant prop across into the first block after a
BRW_OPCODE_IF or a BRW_OPCODE_DO, but we have hope for properly doing it
across control flow at some point. More importantly, with the next commit it
will help avoid O(n^2) with instruction count runtime for shaders that have
many constant moves.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
a454f8ec6df9334df42249be910cc2d57d913bff 07-Jul-2012 Eric Anholt <eric@anholt.net> i965/fs.h: Refactor tests for instructions modifying a register.

There's one instance of a potential behavior change: propagate_constants may
now propagate into a part of a vgrf after a different part of it was
overwritten by a send that returns multiple registers. I don't think we ever
generate IR that meets that condition, but it's something to note if we bisect
behavior change to this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
1e28f55ab7909496d93ab1b552faad17453c10ac 05-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Invalidate live intervals after copy propagation.

For copy propgation, we've dropped the use of a GRF in favor of a
(probably later) use of a different GRF. This definitely requires
invalidating intervals.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
dd4282e38fd92c081875da6bce0b2345bd472532 07-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Allow copy propagation on uniforms.

This is a big win for savage2, hon and yofrankie. 62 new programs for
savage2/hon get 16-wide mode, along with one for humus demos and two
for tropics. Even a few shaders from tropics see reductions of 15% or
more.

total instructions in shared programs: 216536 -> 207353 (-4.24%)
instructions in affected programs: 123941 -> 114758 (-7.41%)

In benchmarking Tropics, only a .040% +/- 034% performance improvement
was observed (n=90). Rather disappointing, but I was primarily
motivated to do this patch by a regression in the number of 16-wide
shaders compiled after a GRF texturing on IVB patch I'm working on.
Hopefully this helps avoid that regression.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
0c4630bae001139dea42b78cd08157de4d90542b 06-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Allow copy propagation with source modifiers.

This shaves a few instructions off of a ton of programs. For 12
shaders from tropics and sanctuary, it's enough reduction in register
pressure to get 16-wide mode. 7 shaders from heroes of newerth and
savage2 are hurt by about 1.1%, where copy propagation of negates ends
up preventing coalescing, but we could regain that by doing dataflow
analysis in our copy propagation.

No significant performance difference in tropics (n=11)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
458f7f014139deb48a4cf0a9e6bdca3a57d24208 06-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Move copy propagation test out to a separate function.

It's going to get more complicated in a moment.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
d7787adda8006506545256547d8d590a282487af 08-May-2012 Eric Anholt <eric@anholt.net> i965/fs: Add support for copy propagation.

We could do more by handling abs/negate and non-GRF sources, but this is
a good start. Improves tropics performance 0.30% +/- .17% (n=43).

shader-db results:
Total instructions: 208032 -> 207184
60/1246 programs affected (4.8%)
23286 -> 22438 instructions in affected programs (3.6% reduction)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp