History log of /external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
Revision Date Author Comments
7d3a10c516eb5b1b4058bee4640c5fbb22617f5b 20-Feb-2017 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: detect different bit size accesses to uniforms to push them in proper locations

Previously, if we had accesses with different sizes to the same uniform, we might not
push it aligned with the bigger one. This is a problem on BSW/BXT when we access
an array of DF uniforms with both direct and indirect addressing, because for the latter
we use 32-bit MOV INDIRECT instructions. However, this problem can happen with other
generations and bit sizes.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit a497ab6838ae5a9898abfed82f7bc8295b490911)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d4caa4249c28a4941e9f6a57ea97955f5d63797f 21-Feb-2017 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: mark last DF uniform array element as 64 bit live one

This bug can cause us to detect the end of a contiguous area
incorrectly and push larger areas than the real ones.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 7427425247d80c9f59a3c3ad2dfeeb2429de6f67)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0d5071db5e50629a63490639a3c86dfc65bf27ab 13-Jan-2017 Kenneth Graunke <kenneth@whitecape.org> i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data.

This fixes glxgears rendering, which had surprisingly been broken since
late October! Specifically, commit 91d61fbf7cb61a44adcaae51ee08ad0dd6b.

glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of the
gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-shaded
inner portion of the gears. This results in the same fragment program
having two different state-dependent interpolation maps: one where
gl_Color is flat, and another where it's smooth.

The problem is that there's only one gen4_fragment_program, so it can't
store both. Each FS compile would trash the last one. But, the FS
compiles are cached, so the first one would store FLAT, and the second
would see a matching program in the cache and never bother to compile
one with SMOOTH. (Clearing the program cache on every draw made it
render correctly.)

Instead, move it to brw_wm_prog_data, where we can keep a copy for
every specialization of the program. The only downside is bloating
the structure a bit, but we can tighten that up a bit if we need to.
This also lets us kill gen4_fragment_program entirely!
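
The caching problem described above can be sketched as follows. This is only an illustration, not the driver's code: `ProgKey`, `ProgData`, `ProgCache` and `compile_cached` are hypothetical names. The point is that state-dependent data kept alongside each cached specialization survives later compiles of the same program, whereas a single per-program slot would be overwritten.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical stand-ins for the cache key and the per-compile data.
enum class Interp { Flat, Smooth };

struct ProgKey {
    int program_id;
    bool flat_shade;  // state that changes between the two glxgears draws
    bool operator<(const ProgKey &o) const {
        return std::tie(program_id, flat_shade) <
               std::tie(o.program_id, o.flat_shade);
    }
};

struct ProgData {
    Interp color_interp;  // stored per specialization, not per program
};

using ProgCache = std::map<ProgKey, ProgData>;

// Look up a specialization and "compile" it on a cache miss.
const ProgData &compile_cached(ProgCache &cache, const ProgKey &key)
{
    assert(key.program_id >= 0);
    auto it = cache.find(key);
    if (it == cache.end()) {
        ProgData data{key.flat_shade ? Interp::Flat : Interp::Smooth};
        it = cache.emplace(key, data).first;
    }
    return it->second;
}
```

With this layout, a cache hit for the flat variant still reports flat interpolation after the smooth variant has been compiled.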

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5edc3381628d1db4468f31b1c66bb518146e35b5 09-Jan-2017 Kenneth Graunke <kenneth@whitecape.org> compiler: Merge shader_info's tcs and tes structs.

Annoyingly, SPIR-V lets you specify all of these fields in either the
TCS or TES, which means that we need to be able to store all of them
for either shader stage. Putting them in a union won't work.

Combining both is an easy solution, and given that the TCS struct only
had a single field, it's pretty inexpensive.

This patch renames the combined struct to "tess" to indicate that it's
for tessellation in general, not one of the two stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c2acf97fcc9b32eaa9778771282758e5652a8ad4 16-Dec-2016 Juan A. Suarez Romero <jasuarez@igalia.com> nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes

So far, inputs_read was a bitmap tracking which vertex input locations
were being used.

In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, like any other smaller attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.

But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.

To avoid handling two different situations in NIR, we just choose the
latter one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.

As an example, if in our GLSL/IR shader we have three attributes:

layout(location = 0) vec3 attr0;
layout(location = 1) dvec4 attr1;
layout(location = 2) dvec3 attr2;

then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in locations 3 and 4.

Checking carefully, basically we are using slots rather than locations
in NIR.

When emitting the vertices, we do an inverse map to know the
corresponding location for each slot.
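
The location-to-slot mapping and its inverse can be sketched as below. This is a minimal model under stated assumptions, not the NIR code: `Attr`, `first_slots` and `location_of_slot` are hypothetical names, and each dvec3/dvec4 is assumed to take exactly two slots as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one vertex input attribute.
struct Attr {
    unsigned location;   // GLSL location
    bool is_dvec3_or_4;  // true for dvec3/dvec4, which take two slots
};

// First slot of each attribute: its location, shifted by one for every
// dvec3/dvec4 attribute at a lower location.
std::vector<unsigned> first_slots(const std::vector<Attr> &attrs)
{
    std::vector<unsigned> slots;
    for (const Attr &a : attrs) {
        unsigned shift = 0;
        for (const Attr &b : attrs)
            if (b.is_dvec3_or_4 && b.location < a.location)
                ++shift;
        slots.push_back(a.location + shift);
    }
    return slots;
}

// Inverse map: the location of the attribute that covers a given slot.
unsigned location_of_slot(const std::vector<Attr> &attrs, unsigned slot)
{
    const std::vector<unsigned> slots = first_slots(attrs);
    for (std::size_t i = 0; i < attrs.size(); ++i) {
        const unsigned width = attrs[i].is_dvec3_or_4 ? 2 : 1;
        if (slot >= slots[i] && slot < slots[i] + width)
            return attrs[i].location;
    }
    assert(!"slot is not covered by any attribute");
    return ~0u;
}
```

For the three attributes in the example, `first_slots` gives 0, 1 and 3, and slots 1 and 2 both map back to location 1.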

v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.

v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e729504fb1799c3ae31cea76d73946530ef9806f 14-Sep-2016 Timothy Arceri <timothy.arceri@collabora.com> nir: pass compiler rather than devinfo to functions that call nir_optimize

Later we will pass compiler to nir_optimize to be used by the loop unroll
pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b56fa830c6095f8226456b2aeb62f2dfad804be5 09-Dec-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fetch one cacheline of pull constants at a time.

Asking the DC for less than one cacheline (4 owords) of data for
uniform pull constants is suboptimal because the DC cannot request
less than that from L3, resulting in wasted bandwidth and unnecessary
message dispatch overhead, and exacerbating the IVB L3 serialization
bug. The following table summarizes the overall framerate improvement
(with statistical significance of 5% and sample size ~10) from the
whole series up to this patch for several benchmarks and hardware
generations:

Benchmark                | SKL            | BDW           | HSW
SynMark2 OglShMapPcf     | 24.63% ±0.45%  | 4.01% ±0.70%  | 10.31% ±0.38%
GfxBench4 gl_manhattan31 | 5.93% ±0.35%   | 3.92% ±0.31%  | 6.62% ±0.22%
GfxBench4 gl_4           | 2.52% ±0.44%   | 1.23% ±0.10%  | N/A
Unigine Valley           | 0.83% ±0.17%   | 0.23% ±0.05%  | 0.74% ±0.45%

Note that there are two versions of the Manhattan demo shipped with
GfxBench4, one of them is the original gl_manhattan demo which doesn't
use UBOs, so this patch will have no effect on it, and another one is
the gl_manhattan31 demo based on GL 4.3/GLES 3.1, which this patch
benefits as shown above.

I haven't observed any statistically significant regressions in the
benchmarks I have at hand. Note that the comparatively huge
improvement on SKL in the OglShMapPcf test case is due to the combined
effect of this patch and the register pressure benefit on SKL+ of
"i965/fs: Switch to the constant cache for uniform pull constants.",
part of the same series.

Going up to 8 oword blocks would improve performance of pull constants
even more, but at the cost of some additional bandwidth and register
pressure, so it would have to be done on-demand based on the number of
constants actually used by the shader.
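
As a rough illustration of fetching a whole cacheline, the sketch below rounds a pull constant's byte offset down to its 4-oword block. It assumes 16-byte owords (so a 64-byte cacheline, as stated above); `PullBlock` and `cacheline_block` are hypothetical names and not part of the driver.

```cpp
#include <cassert>

constexpr unsigned OWORD_BYTES = 16;                   // one oword is 16 bytes
constexpr unsigned CACHELINE_BYTES = 4 * OWORD_BYTES;  // 4 owords per cacheline

struct PullBlock {
    unsigned base;    // byte offset of the cacheline-aligned block to fetch
    unsigned within;  // byte offset of the requested constant inside the block
};

// Round a pull constant's byte offset down to the start of its cacheline.
PullBlock cacheline_block(unsigned byte_offset)
{
    PullBlock b;
    b.base = byte_offset & ~(CACHELINE_BYTES - 1);
    b.within = byte_offset - b.base;
    assert(b.within < CACHELINE_BYTES);
    return b;
}
```

A constant at byte offset 100 would then be served from the block starting at byte 64, at offset 36 within it, and any other constant in bytes 64..127 could reuse the same fetch.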

v2: Fix for Gen4 and 5.
v3: Non-trivial rebase. Rework to allow the visitor to specify
arbitrary pull constant block sizes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9b22a0d295316b7547667ebbfe1e1b6182439186 09-Dec-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Expose arbitrary pull constant load sizes to the IR.

Change the FS generator to ask the dataport for enough owords worth of
constants to fill the execution size of the instruction -- Which means
that the visitor now needs to set the execution size correctly for
uniform pull constant load instructions, which we were kind of
neglecting until now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ad38ba113491869ab0dffed937f7b3dd50e8a735 26-Oct-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Switch to the constant cache for uniform pull constants.

This reverts to using the oword block read messages for uniform pull
constant loads, as used to be the case until
4c1fdae0a01b3f92ec03b61aac1d3df5. There are two important differences
though: Now the L3 cacheability bits are set up correctly for UBOs
(since 11f5d8a5d4fbb861ec161f68593e429cbd65d1cd), and we target the
constant cache instead of the data cache. The latter used to get no
L3 way allocation on boot on all platforms that existed at the time,
so oword read messages wouldn't get cached on L3 regardless of the
MOCS bits, which probably explains the apparent slowness of oword
fetches.

Constant cache loads seem to perform better than SIMD4x2 sampler loads
in a number of cases: they alleviate some of the cache thrashing
caused by the competition with textures for the L1/L2 sampler caches,
and they allow fetching up to 128B worth of constants with a single
oword fetch message.

Note that IVB devices suffer from a hardware bug that leads to
serialization of L3 read requests overlapping the same cacheline as a
result of a mechanism of the L3 (buggy on IVB) to preserve coherency.
Since read requests for matching cachelines from any L3 client are not
pipelined, throughput may decrease in cases where there are no
non-overlapping requests left in the queue that can be processed
between them.

This situation should be relatively uncommon as long as we make sure
that we don't use the 1/2 oword messages in cases where the shader
intends to read from any other location of the same cacheline at some
other point. This is generally a good idea anyway on all generations
because using the 1 and 2 oword messages is expected to waste
bandwidth since the minimum L3 request size for the DC is exactly 4
owords (i.e. one cacheline). A future commit will have this effect.
I haven't been able to find any real-world example where this would
still result in a regression on IVB, but if someone happens to find
one it shouldn't be too difficult to add an IVB-specific check to have
it fall back to the sampler cache for pull constant loads.

Note that on SKL+ this change has the additional benefit of reducing
the register footprint of pull constant loads. The following table
summarizes the effect of the whole series on several shader-db stats:

Gen  Total instructions            Total cycles
BWR: 4571248 -> 4568342 (-0.06%)   123375740 -> 123373296 (-0.00%)
ELK: 3989020 -> 3985402 (-0.09%)    98757068 ->  98754058 (-0.00%)
ILK: 6383591 -> 6376787 (-0.11%)   143649910 -> 143648914 (-0.00%)
SNB: 7528395 -> 7501446 (-0.36%)   103503796 -> 102460370 (-1.01%)
IVB: 6949221 -> 6943317 (-0.08%)    60592262 ->  60584422 (-0.01%)
HSW: 6409753 -> 6403702 (-0.09%)    60609070 ->  60604414 (-0.01%)
BDW: 8043467 -> 7976364 (-0.83%)    68427730 ->  68483042 (0.08%)
CHV: 8045019 -> 7977916 (-0.83%)    68297426 ->  68352756 (0.08%)
SKL: 8204037 -> 7939086 (-3.23%)    66583900 ->  65624378 (-1.44%)

Gen  Lost->Gained  Total spills             Total fills
BWR:  5 -> 5       1488 -> 1488 (0.00%)     1957 -> 1957 (0.00%)
ELK:  5 -> 5       1489 -> 1489 (0.00%)     1958 -> 1958 (0.00%)
ILK:  1 -> 4       1449 -> 1449 (0.00%)     1921 -> 1921 (0.00%)
SNB:  0 -> 0        549 ->  549 (0.00%)       52 ->   52 (0.00%)
IVB: 13 -> 3       1271 -> 1271 (0.00%)     1162 -> 1162 (0.00%)
HSW: 11 -> 0       1271 -> 1271 (0.00%)     1162 -> 1162 (0.00%)
BDW: 12 -> 0       1340 -> 1340 (0.00%)     1452 -> 1452 (0.00%)
CHV: 12 -> 0       1340 -> 1340 (0.00%)     1452 -> 1452 (0.00%)
SKL:  0 -> 120     1269 ->  375 (-70.45%)   1563 ->  690 (-55.85%)

v3: Non-trivial rebase.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fd249c803e3ae2acb83f5e3b7152728e73228b7b 12-Dec-2016 Ilia Mirkin <imirkin@alum.mit.edu> treewide: s/comparitor/comparator/

git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'

Just happened to notice this in a patch that was sent and included one
of the tokens in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6014da50ec41d1ad43fec94a625962ac3f2f10cb 28-Nov-2016 Matt Turner <mattst88@gmail.com> i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.

Matches the vec4 backend, cmod propagation, and saturate propagation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e9f17e9fb06a4389588f47be8c766b07e8d8b89f 25-Nov-2016 Lionel Landwerlin <lionel.g.landwerlin@intel.com> i965: enable INTEL_conservative_rasterization on Gen9+

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0ff74a8990d9fe37365beb35ed8abacfbf3ed567 06-Dec-2016 Plamena Manolova <plamena.manolova@intel.com> i965: Add i965 plumbing for ARB_post_depth_coverage for i965 (gen9+).

This extension allows the fragment shader to control whether values in
gl_SampleMaskIn[] reflect the coverage after application of the early
depth and stencil tests.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
faf20df143a63e58aa729446f21c38ae39a438f2 29-Nov-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Refactor handling of constant tg4 offsets

Previously, we had an OFFSET_VALUE source for logical texture instructions
that was intended to mean exactly what it says, "offset". In reality, we
only fully used it for tg4 offsets. We used offset_value.file == IMM to
mean, "you have a constant offset, go look in instr->offset" and didn't
actually use the contents of the register at all in that case except for
in nir_emit_texture where we used it as a temporary before we copy it into
instr->offset.

This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to
indirect tg4 offsets only. The nir_emit_texture code is refactored so that
we explicitly build a header_bits value which is placed in instr->offset
and the constant offset values (both for tg4 and regular texture
operations) are used to construct header_bits and don't go through the
offset source at all. Finally, we stop passing offset_value in to
lower_sampler_logical_send_gen5 because we can't do indirect offsets until
gen7 anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b9df2251c17e3ce52fa55c81f492591e08c3ee04 25-Oct-2016 Anuj Phogat <anuj.phogat@gmail.com> i965: Fix GPU hang related to multiple render targets and alpha testing

This patch should have been part of commit e592f7df.
When there are multiple render targets with alpha testing enabled and
the fragment shader doesn't write to draw buffer zero, it causes
a GPU hang on SKL. No GPU hang is seen on HSW. The simulator gives a
warning for all gen6+ h/w:
"Illegal render target write message length 0xa expected 0xc"

This patch fixes the GPU hang as well as the simulator warning with
new piglit test fbo-mrt-alphatest-no-buffer-zero-write:
https://patchwork.freedesktop.org/patch/118212

No regressions in Jenkins CI system.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
91d61fbf7cb61a44adcaae51ee08ad0dd6b2a03b 20-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: rewrite brw_setup_vue_interpolation()

Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier
array in gl_fragment_program, which will allow us to remove it.

This change also makes the code which is only used by gen4/5 more self-contained,
as it now has its own gen5_fragment_program struct rather than storing the map
in brw_context. This means the interpolation map will only get processed once
and will get stored in the in-memory cache rather than being processed every time
the fs changes.

Also by calling this from the fs compile code rather than from the upload code
and using the interpolation assigned there we can get rid of the
BRW_NEW_INTERPOLATION_MAP flag.

It might not seem ideal to add a gen5_fragment_program struct; however, by the end
of this series we will have gotten rid of all the brw_{shader_stage}_program
structs and replaced them with a generic brw_program struct, so there will only
be two program structs, which is better than what we have now.

V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following
patch to fix build error.

V3 - Suggestions by Jason:
- name struct gen4_fragment_program rather than gen5_fragment_program
- don't use enum with memset()
- create interp mode set helper and simplify logic to call it
- add assert when calling function to show prog will never be NULL for
gen4/5 i.e. no Vulkan

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b 13-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> nir/i965/anv/radv/gallium: make shader info a pointer

When restoring something from the shader cache we won't have, and don't
want to create, a nir_shader. This change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
59864e8e02057cc6fa0448a8af067a3cf53389da 13-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Don't use nir_assign_var_locations for VS/TES/GS outputs.

Fixes spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.

v2: Remove nir_outputs field from fs_visitor (caught by Tim and Iago).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
27715c73ff84349466f62df0023863acd477f262 15-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Make split_virtual_grfs() call compact_virtual_grfs().

Post-splitting, VGRFs have a maximum size (MAX_VGRF_SIZE). This is
required by the register allocator, as we have to create classes for
each size of VGRF.

We can (and do) allocate virtual registers larger than MAX_VGRF_SIZE,
but we must ensure that they are splittable. split_virtual_grfs()
asserts that the post-splitting register size is in range.

Unfortunately, these trip for completely dead registers which are too
large - we only set split points for live registers. So dead ones are
never split, and if they happened to be too large, they'd trip asserts.

To fix this, call compact_virtual_grfs() to eliminate dead registers
before splitting.
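
The ordering issue can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the pass itself: `Vgrf`, `compact_dead` and `split_live` are hypothetical names, the value of `MAX_VGRF_SIZE` is a placeholder, and split points are reduced to a single chunk size per register (equal to the full size when a register was never given split points, as happens for dead registers).

```cpp
#include <cassert>
#include <vector>

constexpr unsigned MAX_VGRF_SIZE = 16;  // placeholder value for this sketch

struct Vgrf {
    unsigned size;   // total size in registers
    unsigned piece;  // chunk size implied by split points; == size if unsplit
    bool live;       // referenced by at least one instruction
};

// Drop registers that no instruction references.
std::vector<Vgrf> compact_dead(const std::vector<Vgrf> &regs)
{
    std::vector<Vgrf> out;
    for (const Vgrf &r : regs)
        if (r.live)
            out.push_back(r);
    return out;
}

// Split the remaining registers. Compacting first means an oversized
// dead register never reaches the size assertion.
std::vector<unsigned> split_live(const std::vector<Vgrf> &regs)
{
    std::vector<unsigned> sizes;
    for (const Vgrf &r : compact_dead(regs)) {
        assert(r.piece > 0 && r.piece <= MAX_VGRF_SIZE);
        for (unsigned done = 0; done < r.size; done += r.piece)
            sizes.push_back(r.size - done < r.piece ? r.size - done : r.piece);
    }
    return sizes;
}
```

In this model a dead 40-register VGRF would trip the assertion if it were still present when splitting, but it is removed beforehand.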

v2: Add a comment written by Iago.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e51e055fcdf8107aafaba358fa65b00f963e1728 09-Sep-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Introduce downcast helpers for prog_data structures.

Similar to brw_context(...), intel_texture_object(...), and so on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
df4ff31d3c1d907c237ed0e699deec1e24e8a9d3 04-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add MAYBE_UNUSED to assert param

This fixes an unused variable warning on release builds.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
da274ba5f88ca76bb2e4369967cea381b9f219e4 09-Sep-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Drop pointless stage == MESA_SHADER_FRAGMENT checks.

There's an assert right above this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e5311ba1acba738346a18ef661b0f8bbc33bba8e 16-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/ir: Test thread dispatch packing assumptions.

Not [originally] intended for upstream. Should cause a GPU hang if
some thread is executed with a non-contiguous dispatch mask breaking
assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any
CTS, DEQP or Piglit regressions, while replacing
brw_stage_has_packed_dispatch() with a dummy implementation that
unconditionally returns true on top of this patch causes multiple GPU
hangs.

v2: Refactor into a separate function instead of emitting the test
code directly from emit_nir_code(), drop VEC4 test and clean up
slightly for upstream. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f57f526fc5cfaedf26b2becf8f1899d5de0d0461 16-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch.

The eliminate_find_live_channel optimization eliminates
FIND_LIVE_CHANNEL instructions in cases where control flow is known to
be uniform, and replaces them with 'MOV 0', which in turn unblocks
subsequent elimination of the BROADCAST instruction frequently used on
the result of FIND_LIVE_CHANNEL. This is however not correct in
per-sample fragment shader dispatch because the PSD can dispatch a
fully unlit sample under certain conditions. Disable the optimization
in that case.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

v2: Add devinfo argument to brw_stage_has_packed_dispatch() to
implement hardware generation check.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a2392cee48076f1fe6feab7d49214990cfa6a551 15-Sep-2016 Jason Ekstrand <jason@jlekstrand.net> i965/reg: Make brw_sr0_reg take a subnr and return a vec1 reg

The state register sr0 is really a collection of dwords not a SIMD8
anything. It's much more convenient for brw_sr0_reg to return the
particular dword you're looking for rather than a giant blob you have to
massage into what you want.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
[ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
111f6b250d01fa1937103f24b5cb54b15dd77fbf 14-Sep-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Roll set_default_interpolation into lower_fs_inputs

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
246db0063eb6e01aad961b1c73d32fca911ae1df 14-Sep-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use NIR for handling forced per-sample interpolation

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
114874b22beafb2d07006b197c62d717fc7f80cc 14-Sep-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode

From the ARB_gpu_shader5 spec:

The built-in functions interpolateAtCentroid() and interpolateAtSample()
will sample variables as though they were declared with the "centroid"
or "sample" qualifiers, respectively.

When running with persample dispatch forced by the API, we interpolate
anything that isn't flat as if it's qualified by "sample". In order to
keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
need to make interpolateAtCentroid() do sample interpolation instead.
Nothing in the GLSL spec guarantees that the result of
interpolateAtCentroid is uniform across samples in any way, so this is a
perfectly fine thing to do.

Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS
tests that specifically validate consistency between the "sample" qualifier
and interpolateAtSample()

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eb746a80e5e99bafd3957a1cb2d9db8548a1a6be 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/ir: Update several stale comments.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
47784e2346b56bea6a1111fecaa953239ff198ca 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/ir: Don't print ARF subnr values twice.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ec259f5307bc801f8482f2825ca9d52fe5ead95e 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Print fs_reg::offset field consistently for all register files.

The offset printing code in fs_visitor::dump_instruction() was doing
things differently for sources and destinations and for each register
file -- In some cases it would be added to the base register number
fs_reg::nr, in other cases it would follow the base register separated
with a plus sign, in other cases (uniforms) it would do both (!). The
sub-register offset was also being printed or not rather
inconsistently. Fix it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
80e1d670b4b4c080ce2092a3b52d2415bc4c6a42 01-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Get rid of fs_inst::set_smear().

component() was generally a better alternative because of several
issues set_smear() had:

- It wouldn't take the original stride and offset of the register
into account, which means that set_smear() on the result of
e.g. another set_smear() call or an offset() call would give a
bogus region as result.

- It was an inherently destructive operation. See the
'nir_intrinsic_shader_clock' hunk below for how this could lead to
subtle bugs in cases where set_smear() was called multiple times on
the same register like 'r.set_smear(0), r.set_smear(1)' with the
expectation that each call would return a separate value instead of
a reference to the same subsequently mutated object.
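
The two issues listed above can be illustrated with a toy register type. This is a sketch, not the fs_reg implementation: `Reg` is a hypothetical, heavily simplified region with an offset and stride counted in components, and the precondition in `component()` is this sketch's own.

```cpp
#include <cassert>

// Hypothetical, heavily simplified register region.
struct Reg {
    unsigned offset;  // in components
    unsigned stride;  // in components; 0 means scalar

    // Destructive style: mutates the object and returns a reference to it,
    // ignoring the previous offset and stride.
    Reg &set_smear(unsigned i)
    {
        offset = i;
        stride = 0;
        return *this;
    }
};

// Value style: returns a new scalar region relative to the original one.
Reg component(Reg r, unsigned i)
{
    assert(i == 0 || r.stride > 0);  // a scalar region has one component
    r.offset += i * r.stride;
    r.stride = 0;
    return r;
}

// Shows the aliasing pitfall: both references name the same mutated object.
bool smear_aliases()
{
    Reg r{0, 1};
    Reg &a = r.set_smear(0);
    Reg &b = r.set_smear(1);
    return &a == &b && a.offset == 1;
}
```

Calling `component(r, 0)` and `component(r, 1)` yields two independent values and leaves `r` untouched, while the two `set_smear()` results end up describing the same component.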

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8e58e4412f97be9c3b07d7a7d72d3884606411a2 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Use region_contained_in() in compute-to-mrf coalescing pass.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2d7d4a791083ff63f37ac1e40bfe8b448e7f8045 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify a bunch of fs_inst::size_written calculations by using component_size().

Using component_size() is easier and generally more correct because it
takes into account the register type and stride for you.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
62aaef6c83e4eb354bd7f15803db01e90d22fc34 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify and fix buggy stride/offset calculations using subscript().

These were bashing the 'offset' and 'stride' values of several
registers without taking the previous value into account, which
probably didn't matter in practice for optimize_frontfacing_ternary()
because the 'tmp' register already had a known region, but it would
have given the wrong region as result in the other cases in
lower_integer_multiplication(). subscript(..., i) is a more
straightforward way to take the i-th field of a given type from each
channel of a register which should give the right answer as result
regardless of the original 'offset' and 'stride' parameters of the
register region.
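
A simplified model of the idea behind subscript() is sketched below. It is an assumption-laden illustration rather than the helper's actual code: `Region` is a hypothetical struct with a byte offset, an element size in bytes, and a stride counted in elements.

```cpp
#include <cassert>

// Hypothetical simplified register region.
struct Region {
    unsigned offset;     // bytes from the start of the register
    unsigned type_size;  // bytes per element
    unsigned stride;     // distance between channels, in elements
};

// Take the i-th field of size new_size from each channel of r, building on
// the existing offset and stride instead of overwriting them.
Region subscript(Region r, unsigned new_size, unsigned i)
{
    assert(new_size > 0 && (i + 1) * new_size <= r.type_size);
    r.stride *= r.type_size / new_size;  // same byte distance between channels
    r.offset += i * new_size;            // relative to the previous offset
    r.type_size = new_size;
    return r;
}
```

Taking the upper 16-bit half of each 32-bit channel of a region at byte offset 8 gives byte offset 10 and a stride of two 16-bit elements, so the channel spacing in bytes is unchanged.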

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3b7b90878770530ad3da44c6beb1401c40f1ffd6 07-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify get_fpu_lowered_simd_width() by using inequalities instead of rounding.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bae3a411712d815bf8b8d4526c72c174512086d3 08-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix signedness of the return value of fs_inst::size_read().

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a384503c156e182560104e6c43a6bf0c64608791 03-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Switch mask_relative_to() used in compute-to-mrf to byte units.

This makes the helper function less annoying to use and somewhat more
accurate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
401fc228fd7214086ced0a887bbbefd2e60948fa 03-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix bogus sub-MRF offset calculation in compute-to-mrf.

The 'scan_inst->dst.offset % REG_SIZE' term in the final
'scan_inst->dst.offset' calculation is obviously bogus. The offset
from the start of the copy destination register 'inst->dst' where the
destination of the generating instruction 'scan_inst' would be written
to (before compute-to-mrf runs) is just the offset of 'scan_inst->dst'
relative to the source of the copy instruction (AKA rel_offset in the
code below).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cd0134072a7e088cf1ebcf1c4250aa13ac8a5c59 03-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Take into account copy register offset during compute-to-mrf.

This was dropping 'inst->dst.offset' on the floor. Nothing in the
code above seems to guarantee that it's zero and in that case the
offset of the register being coalesced into wouldn't be taken into
account while rewriting the generating instruction.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b42c13a5b8ac7d643bbf4c1592607811a81b4ebb 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().

fs_inst::overwrites_reg is rather easy to misuse because it cannot
tell how large the register region starting at 'reg' is, so in cases
where the destination region starts after 'reg' it may give a
misleading result. regions_overlap() is somewhat more verbose to use
but handles arbitrary overlap correctly so it should generally be used
instead.
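
The kind of check that handles arbitrary overlap can be sketched as an interval test. This is an illustration only, with hypothetical names (`ByteRegion`, `ranges_overlap`) and regions reduced to byte ranges within a single register file; it is not the signature of the driver's regions_overlap().

```cpp
#include <cassert>

// Hypothetical byte range within one register file.
struct ByteRegion {
    unsigned start;  // first byte
    unsigned size;   // length in bytes
};

// Two half-open ranges overlap when each one starts before the other ends.
bool ranges_overlap(ByteRegion a, ByteRegion b)
{
    assert(a.size > 0 && b.size > 0);
    return a.start < b.start + b.size && b.start < a.start + a.size;
}
```

Because both sizes are known, a write that starts after the beginning of the other region but still inside it is reported as an overlap, which is the case a start-only comparison can get wrong.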

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3a4ea7cf803cb5af2b7d0e7d71ee4825294a94aa 03-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Don't consider LOAD_PAYLOAD with stride > 1 source to behave like a raw copy.

Noticed the problem by inspection while typing in the previous commit.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1164aa1a1ba9d140a2b1435703b0029e0fe69f6f 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Don't consider LOAD_PAYLOAD with sub-GRF offset to behave like a raw copy.

This was likely the original intention, and at least register coalesce
relies on it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
717d8efd584d8db7fbbdbe7deb51371e28d6c492 07-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Take into account misalignment in regs_written() and regs_read().

There was a workaround for this in fs_inst::size_read() for the
SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file
*only*. We should take this possibility into account for the sources
and destinations of all instructions on all optimization passes that
need to quantize dataflow in 32B increments by adding the amount of
misalignment to the size read or written from the regs_read() and
regs_written() helpers respectively.
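
The effect of misalignment on a 32B-granular register count can be sketched as follows. The function names are hypothetical, and the 32-byte register size is taken from the "32B increments" mentioned above; this is an illustration of the idea, not the actual helper code.

```cpp
// One register, per the 32B quantization described above.
constexpr unsigned REG_SIZE_BYTES = 32;

constexpr unsigned div_round_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// Registers touched by an access of 'size' bytes starting at byte 'offset':
// the misalignment within the first register is added to the size before
// rounding up to whole registers.
constexpr unsigned regs_touched(unsigned offset, unsigned size)
{
    return div_round_up(offset % REG_SIZE_BYTES + size, REG_SIZE_BYTES);
}
```

For instance, a 32-byte access touches one register when aligned but two when it starts 16 bytes into a register, which a count based on the size alone would miss.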

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d6b60934aaf2d525f7d1072c0c21af8468254647 07-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Return more accurate read size for LINTERP from fs_inst::size_read.

The LINTERP virtual instruction only reads three scalar components
from the first 16B of the second source. We can now teach size_read()
about it since its return value is represented with byte granularity.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
31a40202b8bdf8bb65d33862144a03610fd57e3f 03-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Return more accurate read size from fs_inst::size_read for IMM and UNIFORM files.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e1a918ba7be6b21303caa2d81671f2d3f17dd692 08-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_inst::regs_read with ::size_read using byte units.

The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
69570bbad876bb9da609c3b651aacda28cecc542 07-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.

The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
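
The two rewrite rules can be written out as a pair of helpers. This is only a sketch: the function names are hypothetical, and reg_unit is passed as a plain parameter.

```cpp
constexpr unsigned div_round_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// Rvalue rule: recover the old register count from the byte-sized field.
constexpr unsigned regs_written_from_size(unsigned size_written, unsigned reg_unit)
{
    return div_round_up(size_written, reg_unit);
}

// Lvalue rule: convert an old register count into a byte size.
constexpr unsigned size_from_regs_written(unsigned regs_written, unsigned reg_unit)
{
    return regs_written * reg_unit;
}
```

Converting a register count to bytes and back is lossless, while a byte size that is not a multiple of reg_unit (say 48 bytes with a 32-byte unit) rounds up to the next whole register.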

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c458eeb94620fbce0a37474fc292545002d67f76 08-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written.

This is in preparation for dropping fs_inst::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
be095e11e41158f91bcb3f6fcbc2e2a91a5d9124 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes.

The fs_reg::subreg_offset and ::offset fields are now redundant; the
sub-GRF offset can just be added to the single ::offset field
expressed in byte units. The current subreg_offset value can be
recovered by applying the following rule: Replace each rvalue
reference of subreg_offset like 'x = r.subreg_offset' with 'x =
r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset
= x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.
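
As a sketch of this rule, assuming a register is reduced to its byte offset alone: the `Reg` type and the accessor names below are hypothetical, and `round_down_to` is a generic stand-in for ROUND_DOWN_TO.

```cpp
// Hypothetical register with a single offset field in bytes.
struct Reg {
    unsigned offset;
};

// Round 'value' down to a multiple of 'unit'.
constexpr unsigned round_down_to(unsigned value, unsigned unit)
{
    return value / unit * unit;
}

// Rvalue rule: the sub-register offset is the remainder within one unit.
constexpr unsigned get_subreg_offset(Reg r, unsigned reg_unit)
{
    return r.offset % reg_unit;
}

// Lvalue rule: keep the whole-register part and replace the remainder.
constexpr Reg set_subreg_offset(Reg r, unsigned x, unsigned reg_unit)
{
    return Reg{round_down_to(r.offset, reg_unit) + x};
}
```

Setting a sub-register offset of 16 on a register at byte 104 (with a 32-byte unit) yields byte 112: the whole-register part, 96, is preserved and only the remainder changes.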

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
86944e063ad40cac0860bfd85a3cc4e9a9805aa3 01-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.

The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make matters worse, the unit in which
reg_offset was expressed was rather inconsistent: for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.

This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
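
The rewrite rule can be sketched with a pair of accessors over a single byte offset. The `Reg` type and function names are hypothetical, and reg_unit is passed explicitly because, as noted above, its value depends on the context.

```cpp
// Hypothetical register with a single offset field in bytes.
struct Reg {
    unsigned offset;
};

// Rvalue rule: the old reg_offset is the byte offset in whole units.
constexpr unsigned get_reg_offset(Reg r, unsigned reg_unit)
{
    return r.offset / reg_unit;
}

// Lvalue rule: keep the sub-unit remainder and replace the whole-unit part.
constexpr Reg set_reg_offset(Reg r, unsigned x, unsigned reg_unit)
{
    return Reg{r.offset % reg_unit + x * reg_unit};
}
```

For a register at byte 40 with a 32-byte unit, setting reg_offset to 3 gives byte 104: the 8-byte remainder survives and the whole-unit part becomes 96.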

Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
175ac629be1396fb8566836e32961a22fc5cca21 08-Sep-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fail the shader compile instead of asserting when we can't spill

Blorp doesn't handle spilling so we set allow_spilling to false in that
case. The blorp 16x MSAA resolve shader spills in 16-wide but not 8-wide.
This commit makes it so that we fail the 16-wide compile and successfully
fall back to 8-wide instead of just assert-failing when trying to compile
the 16-wide shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
527f37199929932300acc1688d8160e1f3b1d753 23-Aug-2016 Jason Ekstrand <jason.ekstrand@intel.com> intel: s/brw_device_info/gen_device_info/

Generated by:

sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
db123df74773f458e573a9c034ee783570a3ed0f 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Define logical framebuffer read opcode and lower it to physical reads.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f2f75b0cf05d2519d618c71b19d2187b8ed0d545 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Define framebuffer read virtual opcode.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fe6abb5755e0368c993e6f7cf25a0712ee6503a9 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.

This will be required for the next commit since the non-coherent path
makes use of the fragment coordinates implicitly, so they need to be
calculated.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
98d61ee083de57da6b97c9fcf67003f56f5f5a6b 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.

The result of a framebuffer fetch from a multisample FBO is inherently
per-sample, so the spec requires at least those sections of the shader
that depend on the framebuffer fetch result to be executed once per
sample.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b2b621a0ec57f08586b9afcf666c0eadc0993ca0 08-Aug-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Switch to per-subspan discard jumps.

ANY4H is more efficient than ANY8H and ANY16H because it makes sure
that whenever a whole subspan hits a discard statement it gets
disabled by the EU until the end of the program, regardless of whether
the discard condition is uniform across all channels of the SIMD8-16
thread. OTOH ANY8H/ANY16H would cause the rest of the program to be
executed for *all* channels if only one of the channels hadn't taken
the discard branch, potentially increasing the bandwidth and ALU usage
of the program unnecessarily.
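
A simplified model of the difference, assuming channels are grouped in runs of N and a group keeps executing as long as at least one of its channels is still live. The function names and the bitmask representation are assumptions made for illustration; this does not model the actual EU predication logic.

```cpp
// Mask with the low 'n' bits set (n <= 16 in this sketch).
constexpr unsigned low_bits(unsigned n)
{
    return (1u << n) - 1u;
}

// Returns the mask of channels that keep executing after the discard jump:
// every group of 'group' channels with at least one live channel stays
// enabled as a whole, every fully discarded group is dropped.
constexpr unsigned still_running(unsigned live, unsigned group, unsigned width,
                                 unsigned base = 0)
{
    return base >= width
        ? 0u
        : ((((live >> base) & low_bits(group)) != 0u
                ? low_bits(group) << base
                : 0u)
           | still_running(live, group, width, base + group));
}
```

In a SIMD8 thread where only channel 0 survives the discard, groups of four keep just the first subspan running (mask 0x0F), whereas a group of eight keeps all eight channels running (mask 0xFF).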

This change increases the FPS of a simple micro-benchmark, which
discards a bunch of fragments and then does a single costly texturing
operation, by over 3x. I've just re-verified the FPS change on HSW and
SKL, but I expect all platforms from Gen6 up to get a similar benefit.

Note that we could potentially be more aggressive and use the NORMAL
predicate to discard individual channels, but that would need to
happen post-scheduling because the scheduler currently doesn't care to
reorder HALT instructions with respect to other instructions, and the
NORMAL predicate would cause the results of subsequent derivative
computations to become undefined -- If the scheduler didn't reorder
HALT instructions it would actually be safe to switch to NORMAL
because the behavior of derivative computations after a non-uniform
discard statement is undefined by the GLSL spec, but that would make
the optimization implemented by one of the following commits somewhat
more difficult.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4d436c011fd9f7ebcadbaebef05090d2056e9d48 12-Aug-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Estimate maximum sampler message execution size more accurately.

The current logic used to determine the execution size of sampler
messages was based on special-casing several argument and opcode
combinations, which unsurprisingly missed the possibility that some
messages could exceed the payload size limit or not depending on the
number of coordinate components present. In particular:

- The TXL, TXB and TEX messages (the latter on non-FS stages only)
would attempt to use SIMD16 on Gen7+ hardware even if a shadow
reference was present and the texture was a cubemap array, causing
it to overflow the maximum supported sampler payload size and
crash.

- The TG4_OFFSET message with shadow comparison was falling back to
SIMD8 regardless of the number of coordinate components, which is
unnecessary when two coordinates or less are present.

Both cases have been handled incorrectly ever since cubemap arrays and
texture gather were respectively enabled (the current logic used by
the SIMD lowering pass is almost unchanged from the previous no16
fall-back logic used pre-SIMD lowering times).
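
The underlying idea can be sketched as choosing the execution size from the number of payload components instead of from a list of special cases. The sketch assumes that each component occupies one register in SIMD8 and two in SIMD16, ignores any message header, and takes the payload limit as a parameter; the function name and the limit used in the usage note are illustrative assumptions, not values taken from the commit.

```cpp
// Largest execution size whose payload still fits within 'limit_regs'
// registers, assuming two registers per component in SIMD16.
constexpr unsigned max_sampler_exec_size(unsigned components, unsigned limit_regs)
{
    return 2 * components > limit_regs ? 8u : 16u;
}
```

With a hypothetical limit of 11 registers, a message carrying six components would need 12 registers in SIMD16 and so falls back to SIMD8, while one carrying five components can stay at SIMD16.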

Fixes the following GL4.5 conformance test on Gen7-8 (the bug also
affects Gen9+ in principle, but SKL passes the test by luck because it
manages to use the TXL_LZ message instead of TXL):

GL45-CTS.texture_cube_map_array.sampling

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
61a02fb74c07d574b726a8b27517a02251aa4be4 13-Aug-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Return zero from fs_inst::components_read for non-present sources.

This makes it easier for the caller to find out how many scalar
components are actually read by the instruction. As a bonus we no
longer need to special-case BAD_FILE in the implementation of
fs_inst::regs_read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0c754d1c4203d87dbb9d2dd882ef42686e6d01ec 12-Aug-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower TEX to TXL during NIR translation.

This simplifies the code slightly and will allow the SIMD lowering
pass to find out easily what the actual texturing opcode is in order
to determine the maximum execution size of texturing instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cec377eed3ab6420679dceef98ad0eea27b5f644 01-Aug-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: fix comparison warning

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ebdc82d06532f992aea592265c29a11330e698fa 26-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix move_interpolation_to_top() pass.

The pass I introduced in commit a2dc11a7818c04d8dc0324e8fcba98d60bae
was entirely broken. A missing "break" made the load_interpolated_input
case always fall through to "default" and hit a "continue", making it
not actually move any load_interpolated_input intrinsics at all.
It would only move the simple load_barycentric_* intrinsics, which
don't emit any code anyway, making it basically useless.
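
The class of bug described here can be reproduced in a few lines. The enum and function names are hypothetical and the control flow is only modelled on the description above, not copied from the pass; `return false` plays the role of the `continue` that skipped the instruction.

```cpp
enum class Intrinsic { LoadBarycentric, LoadInterpolatedInput, Other };

// Buggy shape: without a 'break', LoadInterpolatedInput falls through
// into 'default' and is reported as "do not move".
constexpr bool should_move_buggy(Intrinsic op)
{
    switch (op) {
    case Intrinsic::LoadBarycentric:
        break;
    case Intrinsic::LoadInterpolatedInput:
        // missing break here: falls through
    default:
        return false;
    }
    return true;
}

// Fixed shape: the 'break' lets LoadInterpolatedInput reach the code
// after the switch, so it is moved like the barycentric loads.
constexpr bool should_move_fixed(Intrinsic op)
{
    switch (op) {
    case Intrinsic::LoadBarycentric:
        break;
    case Intrinsic::LoadInterpolatedInput:
        break;
    default:
        return false;
    }
    return true;
}
```

Both versions still move the barycentric loads, which is why the buggy one appeared to work while skipping the intrinsics that mattered.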

The initial version I sent of the pass worked, but I apparently
failed to verify that the simplified version in v2 actually worked.

With the obvious fix applied (so we actually tried to move
load_interpolated_input intrinsics), I discovered a second bug: we
weren't moving the offset SSA def to the top, breaking SSA validation.

The new version of the pass actually moves load_interpolated_input
intrinsics and all their dependencies, as intended.

Papers over GPU hangs on Ivybridge and Baytrail caused by the
recent NIR FS input rework by restoring the old behavior.
(I'm not honestly sure why they hang with PLN not at the top.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2db357e4c3dcb49deabae7b68721d57ad9ea0000 21-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Include VUE handles for GS with invocations > 1.

We always resort to the pull model for instanced GS inputs. So, we'd
better include the VUE handles, or else we can't actually pull anything.

Ian reports that on his branch with OES_geometry_shader enabled,
this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests:

- instanced.draw_2_instances_geometry_2_invocations
- instanced.draw_2_instances_geometry_8_invocations
- instanced.draw_4_instances_geometry_2_invocations
- instanced.draw_4_instances_geometry_8_invocations
- instanced.draw_8_instances_geometry_2_invocations
- instanced.draw_8_instances_geometry_8_invocations
- instanced.geometry_2_invocations
- instanced.geometry_32_invocations
- instanced.geometry_8_invocations
- instanced.geometry_max_invocations
- instanced.geometry_output_different_2_invocations
- instanced.geometry_output_different_32_invocations
- instanced.geometry_output_different_8_invocations
- instanced.geometry_output_different_max_invocations
- instanced.invocation_output_vary_by_attribute
- instanced.invocation_output_vary_by_texture
- instanced.invocation_output_vary_by_uniform
- query.primitives_generated_instanced

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
09e46f99ad465ab253de3fc321f39062cfbe1984 19-Jul-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: bring back type_size_vec4_times_4()

We will use this for output varyings. To make component
packing simpler we will just treat all varyings as vec4s.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
160820995210e0b85fd25821f5ae785d6a539e08 16-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode.

We no longer use this message. As far as I can tell, it's fairly
useless - the equivalent information is provided in the payload.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1eef0b73aa323d94d5a080cd1efa81ccacdbd0d2 12-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rewrite FS input handling to use the new NIR intrinsics.

This eliminates the need to walk the list of input variables, recurse
into their types (via logic largely redundant with nir_lower_io), and
interpolate all possible inputs up front. The backend no longer has
to care about variables at all, which eliminates complications from
trying to pack multiple variables into the same location. Instead,
each intrinsic specifies exactly what's needed.

This should unblock Timothy's work on GL_ARB_enhanced_layouts.

Each load_interpolated_input intrinsic corresponds to PLN instructions,
while load_barycentric_at_* intrinsics correspond to pixel interpolator
messages. The pixel/centroid/sample barycentric intrinsics simply refer
to payload fields (delta_xy[]), and don't actually generate any code.

Because we use a single intrinsic for both centroid-qualified variables
and interpolateAtCentroid(), they become indistinguishable. We stop
sending pixel interpolator messages for those, and instead use the
payload provided data, which should be considerably faster.

On Broadwell:

total instructions in shared programs: 9067751 -> 9067570 (-0.00%)
instructions in affected programs: 145902 -> 145721 (-0.12%)
helped: 422
HURT: 209

total spills in shared programs: 2849 -> 2899 (1.76%)
spills in affected programs: 760 -> 810 (6.58%)
helped: 0
HURT: 10

total fills in shared programs: 3910 -> 3950 (1.02%)
fills in affected programs: 617 -> 657 (6.48%)
helped: 0
HURT: 10

LOST: 3
GAINED: 3

The differences mostly appear to be slight changes in MOVs.

v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics
flag rather than passing it directly to nir_lower_io. Use the
unreachable() macro rather than assert in one place. (Review
feedback from Chris Forbes.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a2dc11a7818c04d8dc0324e8fcba98d60baea529 18-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Move load_interpolated_input/barycentric_* intrinsics to the top.

Currently, i965 interpolates all FS inputs at the top of the program.
This has advantages and disadvantages, but I'd like to keep that policy
while reworking this code. We can consider changing it independently.

The next patch will make the compiler generate PLN instructions "on the
fly", when it encounters an input load intrinsic, rather than doing it
for all inputs at the start of the program.

To emulate this behavior, we introduce an ugly pass to move all NIR
load_interpolated_input and payload-based (not interpolator message)
load_barycentric_* intrinsics to the shader's start block.

This helps avoid regressions in shader-db for cases such as:

if (...) {
...load some input...
} else {
...load that same input...
}

which CSE can't handle, because there's no dominance relationship
between the two loads. Because the start block dominates all others,
we can CSE all inputs and emit PLNs exactly once, as we did before.

Ideally, global value numbering would eliminate these redundant loads,
while not forcing them all the way to the start block. When that lands,
we should consider dropping this hacky pass.

Again, this pass currently does nothing, as i965 doesn't generate these
intrinsics yet. But it will shortly, and I figured I'd separate this
code as it's relatively self-contained.

v2: Dramatically simplify pass - instead of creating new instructions,
just remove/re-insert their list nodes (suggested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
048a56c1fc8f66e74645cc5ff4b4eb3d5ee471a8 18-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Add a pass to demote sample interpolation intrinsics.

When working with a non-multisampled render target, asking for "sample"
interpolation locations doesn't make sense. We demote them to centroid.

In a couple of patches, brw_compute_barycentric_modes will begin looking
at these intrinsics to determine the barycentric modes. fs_visitor also
will use them to code-generate pixel interpolator messages or payload
references. Handling the "but what if it's not MSAA?" logic ahead of
time in a NIR pass simplifies things and prevents duplicated logic.

This patch doesn't actually do anything useful yet as we don't generate
these intrinsics. I decided to keep it separate as it's self-contained,
in the hopes of shrinking the "convert everything" patch for reviewers.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d7a47a76e0c6ccd1765f4c10c390e7d4f5f86414 28-Jun-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Update assertion to account for Gen < 7

Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the
assertion assumed the Gen7+ accumulator rules. A future patch will
allow this instruction on at least Gen6, so update the assertion.

v2: Use get_lowered_simd_width instead of open coding it. Suggested by
Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7ef7738a61ded5632105b8de6f8141307592e20a 15-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Write gl_FragCoord directly to the destination.

This patch makes emit_general_interpolation take a destination register
as an argument, and write directly to that. This is simpler than the
old approach of ralloc'ing a register, writing to that temporary, and
then making the caller emit per-component MOVs to copy it to the actual
destination.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a03812c32188f6d29d386165ca02771fe0865352 15-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Drop has_pln checks in unlit centroid workaround.

The unlit centroid workaround starts being necessary on Gen6, which
is the first platform with multisampling. PLN exists on G45+, so all
platforms which need this workaround have PLN.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b94890c19fa82003a03f960d9c3de091756233ac 14-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Drop VARYING_SLOT_FACE special case in barycentric setup.

glsl_to_nir always produces a system value for gl_FrontFacing, rather
than an input. So there should never be an input with this slot,
making this code dead.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ac1181ffbef5250cb3b651e047cce5116727c34c 07-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.

Likewise, rename the enum type to glsl_interp_mode.

Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input. Also, SPIR-V calls these "decorations" rather than "qualifiers".

Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
-e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
-e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f05770121fb165b28b06af9c502dd21300dee530 12-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Remove the emit_linterp() helper.

Rather than computing the barycentric mode each time we emit a LINTERP,
we can simply compute it once, as soon as we know we're doing non-flat
interpolation.

At that point, emit_linterp() doesn't do much, so fold it into the
call sites and drop it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
203243f5ffe438c7f7b5f92d8bc177b76880bf5b 12-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Reduce the number of fs_reg(brw_reg) calls in LINTERP handling.

A bit tidier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eefbbb943e81b182a1c5ef6cac8425686f5b636c 12-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Make a barycentric_mode() helper function.

This combines two copies of basically the same code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
783511e605160bcfc9132b6fbc83c8816262effd 12-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rename brw_wm_barycentric_interp_mode to brw_barycentric_mode.

brw_wm_barycentric_interp_mode is wordy; brw_barycentric_mode is less
typing and suffers from fewer line wrapping problems.

The enum values themselves don't really benefit from "WM" in the name,
either. Put "BARYCENTRIC" first instead of at the end and drop "WM".

Generated by:

for file in *.c *.cpp *.h; do sed -i \
-e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \
-e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \
-e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \
$file;
done

with a few whitespace changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2d6dd30a9b30cbbd12a32122249dbd0963209bf1 07-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Handle default interpolation modes and locations in NIR.

This consolidates a bunch of hacks in a single place - by setting
the interpolation modes and locations on variables appropriately,
we can simply trust them in the rest of the code. This avoids
having to handle INTERP_QUALIFIER_NONE, gl_Color overrides,
sample-shading overrides, and Gen4-5 centroid-overrides in a bunch
of places.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a2bd7334ed4faba5fc1cf3cad7e119f560c2c904 08-Jul-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: do d2x lowering before simd splitting

So that we can have gen7 split large writes produced by this lowering pass.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
376d7ee5874615c8e4208de3e70983a002617e26 01-Apr-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: do pack lowering before simd splitting

So that we can have gen7 split large writes produced by the pack lowering.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
aa4796ae815f38ff44283476f3553edc06114e80 30-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs/gen7: split instructions that run into exec masking bugs

In fp64 we can produce code like this:

mov(16) vgrf2<2>:UD, vgrf3<2>:UD

Our SIMD lowering pass would typically split this into instructions with a
width of 8, each writing to two consecutive registers. Unfortunately, gen7
hardware has a bug affecting execution masking and, as a result, the
second GRF register write won't work properly. Curro verified this:

"The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1
(where QtrCtrl is the 8-bit quarter of the execution mask signals
specified in the instruction control fields) for the second
compressed half of any single-precision instruction (for
double-precision instructions it's hardwired to use NibCtrl+1,
at least on HSW), which means that the EU will apply the wrong
execution controls for the second sequential GRF write if the number
of channels per GRF is not exactly eight in single-precision mode (or
four in double-float mode)."

In practice, this means that we cannot write more than one
consecutive GRF in a single instruction if the number of channels
per GRF is not exactly eight in single-precision mode (or four
in double-float mode).

This patch makes our SIMD lowering pass split this kind of instruction
so that the split versions only write to a single register. In the
example above this means that we split the write into 4 instructions, each
one writing 4 UD elements (width = 4) to a single register.
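
The arithmetic behind the example can be sketched as below. The helper names are hypothetical, a single element size stands in for both the destination and execution type, and only this one restriction is modelled; the real pass combines it with other limits.

```cpp
// One GRF is 32 bytes.
constexpr unsigned GRF_BYTES = 32;

constexpr unsigned div_round_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// GRFs written for a given execution size, destination stride (in
// elements) and element size (in bytes).
constexpr unsigned grfs_written(unsigned exec_size, unsigned stride,
                                unsigned type_bytes)
{
    return div_round_up(exec_size * stride * type_bytes, GRF_BYTES);
}

// Channels landing in each written GRF; avoids dividing by zero when the
// instruction writes nothing.
constexpr unsigned channels_per_grf(unsigned exec_size, unsigned grfs)
{
    return grfs == 0 ? exec_size : exec_size / grfs;
}

// Channel count the hardware assumes per GRF: 4 for 64-bit types, else 8.
constexpr unsigned assumed_channels(unsigned type_bytes)
{
    return type_bytes == 8 ? 4u : 8u;
}

// Widest execution size allowed by this restriction alone: if more than one
// GRF is written and the per-GRF channel count differs from what the
// hardware assumes, limit the width to one GRF's worth of channels.
constexpr unsigned max_width_sketch(unsigned exec_size, unsigned stride,
                                    unsigned type_bytes)
{
    return grfs_written(exec_size, stride, type_bytes) > 1 &&
           channels_per_grf(exec_size,
                            grfs_written(exec_size, stride, type_bytes)) !=
               assumed_channels(type_bytes)
        ? channels_per_grf(exec_size,
                           grfs_written(exec_size, stride, type_bytes))
        : exec_size;
}
```

For the mov(16) above (stride 2, 4-byte UD), the write spans 4 GRFs with 4 channels each, so the width is limited to 4 and the instruction is split into 16 / 4 = 4 pieces. A stride-1 UD mov(16) has 8 channels per GRF and is left alone by this rule.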

v2 (Curro):
- Make explicit that the thing about hardwiring NibCtrl+1 for the second
compressed half is known to happen in Haswell and the issue with IVB
might not be exactly the same.
- Assign max_width instead of returning early so that we can handle
multiple restrictions affecting the same instruction.
- Avoid division by 0 if the instruction does not write any registers.
- Ignore instructions that have WE_all set.
- Use the instruction execution type size instead of the dst type size.

v3 (Curro):
- Move the implementation down so it is not placed in the middle of another
workaround.
- Declare channels_per_grf as const.
- Don't break the loop early if we find a BAD_FILE source.
- Fix the number of channels that the hardware shifts for the second half
of a compressed instruction to be 8 in single precision and 4 in double
precision.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
034bd2532775a1f7da5379a523621458e273f619 26-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Print EOT in fs_visitor::dump_instruction().

This was useful when debugging the previous commit's issue.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
192813e50ee8888a9012f5adce3003d0ca2aee22 23-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Delete send-from-GRF only opcodes from implied_mrf_writes().

These only exist post-Sandybridge, and always use send-from-GRF.
So inst->base_mrf will be -1, and we will have already returned 0.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
255cff76d961e56199acab2ab523140e43ea2de2 23-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Drop unnecessary inst->base_mrf = -1 assignments.

These are now unnecessary, as base_mrf is -1 by default.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3e04e3758e90b2a65eaefb95155d43605f506961 23-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Set fs_inst::base_mrf = -1 by default.

On MRF platforms, we need to set base_mrf to the first MRF value we'd
like to use for the message. On send-from-GRF platforms, we set it to
-1 to indicate that the operation doesn't use MRFs.

As MRF platforms are becoming increasingly a thing of the past, we've
forgotten to bother with this. It makes more sense to set it to -1 by
default, so we don't have to think about it for new code.

I searched the code for every instance of 'mlen =' in brw_fs*cpp, and
it appears that all MRF-based messages correctly program a base_mrf.

Forgetting to set base_mrf = -1 can confuse the register allocator,
causing it to think we have a large fake-MRF region. This ends up
moving the send-with-EOT registers earlier, sometimes even out of
the g112-g127 range, which is illegal. For example, this fixes
illegal sends in Piglit's arb_gpu_shader_fp64-layout-std430-fp64-shader,
which had SSBO messages with mlen > 0 but base_mrf == 0.
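
A minimal sketch of the idea, using a hypothetical stand-in for fs_inst
with only the two fields discussed here. The names are illustrative and
the MRF count is deliberately simplified:

```cpp
#include <cassert>

// Hypothetical stand-in for fs_inst; not the real class.
struct inst_sketch {
   int mlen;      // message length in registers
   int base_mrf;  // first MRF of the message, or -1 if no MRFs are used

   // base_mrf defaults to -1, so new code only sets it for MRF messages.
   explicit inst_sketch(int mlen_, int base_mrf_ = -1)
      : mlen(mlen_), base_mrf(base_mrf_) {}
};

// Number of MRFs implicitly written by a message in this simplified model.
int
implied_mrf_writes_sketch(const inst_sketch &inst)
{
   assert(inst.mlen >= 0);
   if (inst.mlen == 0 || inst.base_mrf == -1)
      return 0;   // send-from-GRF message: no MRF region to account for
   return inst.mlen;
}
```

A message built as inst_sketch(4) reports no MRF writes, while
inst_sketch(4, 2) reports four.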

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
60a27ad122128145d28be37e9c0b0bc86a8e5181 23-Jun-2016 Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Remove wrongly repeated words in comments

Clean up misrepetitions ('if if', 'the the' etc) found throughout the
comments. This has been done manually, after grepping
case-insensitively for duplicate if, is, the, then, do, for, an,
plus a few other typos corrected in fly-by

v2:
* proper commit message and non-joke title;
* replace two 'as is' followed by 'is' to 'as-is'.
v3:
* 'a integer' => 'an integer' and similar (originally spotted by
Jason Ekstrand, I fixed a few other similar ones while at it)

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0195299c868ec99bc6c595c641da81bb2632252e 07-Jun-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use a default Y coordinate of 0 for TXF on gen9+

Previously, we were incrementing length but not actually putting anything
in the Y coordinate. This meant that 1-D TXF operations had a garbage
array index. If the surface is emitted as 1-D non-array, the coordinate
gets discarded and it works fine. If it happens to be bound as an array
surface, it may count as an out-of-bounds array access and you get zero.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
40013c50333caf7a4a66204ac29695aad0d9b06d 14-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Reorganize prog_data->total_scratch code a bit.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cd89c834a8b3b4e5f5874c8e1f90c9b01d541181 09-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix multiplication of immediates on Cherryview/Broxton.

Cherryview and Broxton don't support DW x DW multiplication. We have
piles of code to handle this, but apparently weren't retyping in the
immediate case.

For example,
tests/spec/arb_tessellation_shader/execution/dvec3-vs-tcs-tes
makes the simulator angry about instructions such as:

mul(8) r18<1>:D r10.0<8;8,1>:D 0x00000003:D

Just retype to W or UW. It should be safe on all platforms.
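
A rough sketch of the retyping decision, assuming it is only applied
when the immediate is representable in 16 bits. The enum and helper are
hypothetical, and the exact condition used by the driver may differ:

```cpp
#include <cassert>
#include <cstdint>

enum class imm_type { D, UD, W, UW };

// Hypothetical helper: pick a 16-bit type for a 32-bit immediate multiplier
// when the value is representable, otherwise keep the original type.
imm_type
retype_mul_immediate(imm_type type, uint32_t bits)
{
   assert(type == imm_type::D || type == imm_type::UD);

   if (type == imm_type::D) {
      // Signed immediate: must fit in the range of a signed word.
      const int32_t value = static_cast<int32_t>(bits);
      return (value >= INT16_MIN && value <= INT16_MAX) ? imm_type::W : type;
   }

   // Unsigned immediate: must fit in the range of an unsigned word.
   return bits <= UINT16_MAX ? imm_type::UW : type;
}
```

For the instruction above, the 0x00000003:D immediate would become a W
immediate under this sketch.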

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a8a9d1bf41c00123cefb6e757f3509c62e880a15 14-Jun-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: remove type_size_vec4_times_4()

type_size_vec4_times_4() was introduced as a fix in 8dcf807cb43383
however since 3810c1561 we can just use type_size_scalar() and
get the actual number of outputs we need.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bd9f9726519fad94e88b9266b0c255aa00251f4d 11-Jun-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix regs_written for SIMD-lowered instructions some more.

ISTR having suggested this during review of the recent FP64 changes to
the SIMD lowering pass, but it doesn't look like it was taken into
account in the end. Using the fs_reg::component_size helper instead
of this open-coded variant makes sure that the stride is taken into
account correctly. Fixes at least the following piglit tests with
spilling forced on (since otherwise regs_written would be calculated
incorrectly and the spilling code would be rather confused about how
much data needs to be spilled):

spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1db37ebecf5af55215ace3801f8dbb8b10c5305e 10-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Assert that the scratch spaces are in range.

I don't know that anything actually guarantees this, but if we exceed
the limits, we may end up overflowing and trashing random buffers that
happen to be nearby in the VMA space, leading to rendering corruption,
hangs, or worse.

We should really fix this properly. However, the pitfall has existed
for ages, so for now we should at least detect it.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a42a93dc123163f84058f3886e5ce1b02b9856f5 10-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix CS scratch size calculations on Ivybridge and Baytrail.

These are linear, not powers of two, and much more limited.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
147a90d82a5de637f968e0d5f383cabcb792f1ce 10-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix Haswell CS per-thread scratch space encoding.

Most scratch stages use power of two sizes, in kilobytes, where
0 means 1kB. But compute shaders on Haswell have a minimum of 2kB,
and use a representation where 0 = 2kB.

This meant that we were effectively telling the hardware to allocate
each thread twice as much space as we meant to, while simultaneously
not allocating that much space in the buffer, leading to overflows.
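
A minimal sketch of the two encodings described above. The helper name
and the is_hsw_cs flag are hypothetical, and the upper bound in the
assertion only keeps the arithmetic from overflowing:

```cpp
#include <cassert>

// Hypothetical helper: encode a per-thread scratch size as a power-of-two
// field. Field value 0 means 1kB, or 2kB for Haswell compute shaders.
unsigned
encode_per_thread_scratch(unsigned size_bytes, bool is_hsw_cs)
{
   assert(size_bytes > 0 && size_bytes <= (1u << 30));
   const unsigned min_bytes = is_hsw_cs ? 2048 : 1024;

   // Round up to the next power of two, starting from the minimum size,
   // and count how many doublings were needed.
   unsigned rounded = min_bytes;
   unsigned field = 0;
   while (rounded < size_bytes) {
      rounded *= 2;
      field++;
   }
   return field;
}
```

With this sketch, 2kB encodes as 1 for most stages but as 0 for a
Haswell compute shader. Writing 1 there asks for 4kB per thread, twice
the size of the buffer that was actually allocated.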

Note that the existing code is completely wrong for Ivybridge,
but that will take additional work to sort out, so I've left it
as is for now. A subsequent commit will take care of that.

Together with the previous patches, this fixes rendering corruption
on Synmark's Gl43CSDof on Haswell.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cb30727648fea301cfff1647d947bfab540c3bf6 26-May-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings

Data starts at suboffset 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is a double input varying, read it as
a vector of floats with twice the number of components.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
060c8d245deb83aeb412de98810cad6052aafb78 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Reindent emit_zip().

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7aa76d66a1f5edad9e8c1d54aafdce99ffa6c345 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Skip SIMD lowering destination zipping if possible.

Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.

v2: Don't handle partial source-destination overlap for simplicity
(Jason). No instruction count regressions with respect to v1 in
either shader-db or the few FP64 shader_runner test-cases with
partial overlap I've checked manually.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0a3acff5b53d409181dcd2f31a4a50af06f73a57 23-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Remove old CS local ID handling

The old method pushed the uvec3 gl_LocalInvocationID data for each
channel.

The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
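
The calculations can be sketched as follows. This assumes the pushed
dword is the index of the thread's first channel, so that adding the
channel number gives gl_LocalInvocationIndex. The function and type
names are hypothetical:

```cpp
#include <cassert>

struct uvec3 { unsigned x, y, z; };

// Assumption: thread_local_id is the index of the thread's first channel.
unsigned
local_invocation_index(unsigned thread_local_id, unsigned channel)
{
   return thread_local_id + channel;
}

// Invert index = z * size.x * size.y + y * size.x + x, where size is the
// workgroup size.
uvec3
local_invocation_id(unsigned index, uvec3 size)
{
   assert(size.x > 0 && size.y > 0 && size.z > 0);
   assert(index < size.x * size.y * size.z);
   return uvec3{ index % size.x,
                 (index / size.x) % size.y,
                 index / (size.x * size.y) };
}
```

For a 4x2x3 workgroup, index 13 maps back to the ID (1, 1, 1), since
1 * 8 + 1 * 4 + 1 = 13.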

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b1f22c6317940dac543e44dd638ea9f4fbcd6ca7 01-Jun-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Enable cross-thread constants and compact local IDs for hsw+

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.

Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.

To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.

v4:
* Minimize size of patch that switches from the old local ID layout
to the new layout (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d437798ace47e47dbcb1244734dc1af3ecb5ab84 23-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Add CS push constant info to brw_cs_prog_data

We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.

When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.

The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.
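
A rough sketch of how such a split could be computed, assuming the
local ID occupies the last push constant dword and that a register
holds 8 dwords. The struct and function are hypothetical and may not
match the code added by this commit:

```cpp
#include <cassert>

struct push_split {
   unsigned cross_thread_dwords;  // shared by all threads
   unsigned per_thread_dwords;    // uploaded once per thread
};

// Hypothetical helper: nr_params push constant dwords, with the thread
// local ID stored in the last one.
push_split
split_push_constants(unsigned nr_params, bool cross_thread_supported)
{
   assert(nr_params > 0);
   const unsigned id_index = nr_params - 1;

   push_split s;
   // Every whole register before the one holding the ID can be shared.
   s.cross_thread_dwords = cross_thread_supported ? 8 * (id_index / 8) : 0;
   s.per_thread_dwords = nr_params - s.cross_thread_dwords;
   return s;
}
```

With 20 dwords and cross-thread support, the first 16 dwords (two
registers) are shared and the last 4, including the ID, are per thread.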

For now we add the code that would calculate cross-thread constant
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.

v4:
* Support older local ID push constant layout as well. (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1b79e7ebbd77a7e714fafadd91459059aacf2407 26-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Store number of threads in brw_cs_prog_data

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3ef0957dac11edee7babc9746ec766dcb055d909 22-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Add nir based intrinsic lowering and thread ID uniform

We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.

We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.

We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.

v2:
* Create variable during lowering pass. (Ken)

v3:
* Don't create a variable, but instead just insert an intrinsic call
to load a uniform from the allocated location. (Jason)

v4:
* Don't run this pass if thread_local_id_index < 0

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
04fc72501a90af94b0b5699e57fea68ad6e8795b 23-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Put CS local thread ID uniform in last push register

This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.

It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.

The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)

v2:
* Add variable in intrinsics lowering pass
* Make sure the ID is pushed last in assign_constant_locations, and
that we save a spot for the ID in the push constants

v3:
* Simplify code based on Jason's suggestions.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fa279dfbf0fc89b07007141ad8850ac42206e397 29-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Add uniform for a CS thread local base ID

v4:
* Force thread_local_id_index to -1 for now, and have
fs_visitor::setup_cs_payload look at thread_local_id_index. This
enables us to more easily cut over from the old local ID layout to
the new layout, as suggested by Jason.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1205999c229b8e67af39fb9875bd87bc0a1404eb 02-Jun-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Copy the offset when lowering logical pull constant sends

This fixes 64 Vulkan CTS tests per gen

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
303ec22ed6124f7860de3856599ab4f02808b84b 25-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4fe4f6e8a776acc60633809693e4135f5c894aa3 28-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes.

Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already. The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.
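
A minimal sketch of the bookkeeping, assuming the generating
instructions are visited in reverse program order and the source VGRF
spans at most 32 GRFs. The types and names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One generating instruction: it writes num_regs GRFs starting at first_reg
// (both relative to the start of the source VGRF region).
struct grf_write { unsigned first_reg; unsigned num_regs; };

// Returns true once every GRF of the source region has been seen written.
bool
source_fully_initialized(unsigned regs_in_source,
                         const std::vector<grf_write> &writes_backwards)
{
   assert(regs_in_source > 0 && regs_in_source <= 32);
   const uint32_t full =
      regs_in_source == 32 ? ~0u : ((1u << regs_in_source) - 1);

   uint32_t seen = 0;   // one bit per GRF, instead of a single boolean
   for (const grf_write &w : writes_backwards) {
      for (unsigned r = w.first_reg;
           r < w.first_reg + w.num_regs && r < regs_in_source; r++)
         seen |= 1u << r;

      if (seen == full)
         return true;   // all generating instructions found
   }
   return false;        // some GRF was never written: do not coalesce
}
```

A boolean flag would stop at the first single-GRF write, whereas the
bitset keeps searching until both halves of a two-GRF source are found.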

Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa514f2d06d0e4b33bfe6bca83142fbf0 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1898673f586b9110fb2a3125e2781cbb1d795c73 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Teach compute_to_mrf() about the COMPR4 address transformation.

This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
485fbaff03f7d281ff4f22bd6321548512783799 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.

This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction. In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4b0ec9f4759bab68b51e2f410e9305e39c1e1e7f 28-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix compute-to-mrf VGRF region coverage condition.

Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ. Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.
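
The containment condition can be sketched on plain byte ranges. The
struct and helper below are hypothetical and ignore strides and
register files:

```cpp
#include <cassert>

// Half-open byte range [offset, offset + size).
struct byte_region { unsigned offset; unsigned size; };

// True if every byte of inner lies within outer.
bool
region_contained_in(const byte_region &inner, const byte_region &outer)
{
   assert(inner.size > 0 && outer.size > 0);
   return inner.offset >= outer.offset &&
          inner.offset + inner.size <= outer.offset + outer.size;
}
```

A 32-byte write at offset 48 is not contained in a 64-byte source
region starting at 0, even though it is a single-component write.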

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bb61e24787952a4796a687a86200a05cf83af7e9 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap().

Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.
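
For illustration, an overlap test on half-open byte ranges could look
like this. It is a hypothetical sketch, not the regions_overlap()
helper itself:

```cpp
#include <cassert>

// True if [r_offset, r_offset + r_size) and [s_offset, s_offset + s_size)
// share at least one byte.
bool
byte_ranges_overlap(unsigned r_offset, unsigned r_size,
                    unsigned s_offset, unsigned s_size)
{
   assert(r_size > 0 && s_size > 0);
   return r_offset < s_offset + s_size && s_offset < r_offset + r_size;
}
```

Adjacent ranges such as bytes 0-31 and 32-63 do not overlap here, so
they would not count as interfering.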

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4decc426c26a86beb76dc48658ce175d051464c2 25-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Skip gen4 pre/post-send dependency workarounds for the first/last block.

We know that there cannot be any destination dependency race if we
reach the beginning or end of the program without having found any
other instruction the send could possibly race with. This avoids
emitting a pile of useless moves at the beginning or end of the
program in the most common case in which the program has a single
basic block only.

On the original i965 I get the following shader-db results:

total instructions in shared programs: 3354165 -> 3215637 (-4.13%)
instructions in affected programs: 3183065 -> 3044537 (-4.35%)
helped: 13498
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
daf4a71883bffcedaf27ff046a1ddd4af9d41f7f 29-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Skip SIMD lowering source unzipping for regular scalar regions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6956015aa514f2d06d0e4b33bfe6bca83142fbf0 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.

Just to make sure we keep the SIMD lowering pass tidy when we
introduce additional logic to try to optimize out the copy
instructions used to zip and unzip the destination and source regions
into multiple packed regions of the lowered instruction width.
Shouldn't cause any functional changes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a5b4f63c1593cdcbc253cce2838c85b2fd796dac 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement opt_sampler_eot() in terms of logical sends.

This makes the whole LOAD_PAYLOAD munging unnecessary which simplifies
the code and will allow the optimization to succeed in more cases
independent of whether the LOAD_PAYLOAD instruction can be found or
not.

The following patch is squashed in:

SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().

The sampler EOT optimization pass naively assumes that the texturing
instruction provides all the data used by the FB write just because
they're standing next to each other. The least we should be checking
is whether the source and destination regions of the FB write and
texturing instructions match. Without this the previous seemingly
harmless patch would have caused opt_sampler_eot() to misoptimize a
shader from dota-2 causing DCE to eliminate all of its 78 instructions
except for the final sampler EOT message (!).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a0d9aed2682f78626f467cbc2b7fc3185d9f9034 30-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix UB list sentinel dereference in opt_sampler_eot().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2a166c13d4a6edecaffc56a8220dda146e3ce8a0 04-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Take opt_redundant_discard_jumps out of the optimization loop.

No shader-db regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d5f2f32b118331070507faf292bbe3da2671df4b 01-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Run SIMD and logical send lowering after the optimization loop.

There are two reasons why this is useful:

- It avoids the introduction of an amount of partial writes emitted
by the SIMD lowering pass to zip and unzip register regions early
during optimization, which can make subsequent optimization less
effective.

- It substantially reduces the burden on the compiler when a large
fraction of the instructions in the program need to be split (e.g.
during SIMD32 builds). Individual halves of split instructions
will be optimized identically (if they can still be optimized at
all), so doing it up front can duplicate the amount of instructions
the optimizer has to deal with which causes the compilation time to
explode in some cases due to the worse-than-linear runtime
behaviour of the back-end.

It seems helpful to re-run a few optimization passes in cases where
any of the lowering passes was able to make progress.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b0c8e5e0c88f7c5d7395715e58a8731e2ab55f7e 30-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Pass a BAD_FILE register to the logical FB write when oMask is unused.

This will let the optimizer know that the sample mask value is unused
so its definition can be DCE'ed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
46ce93ed22891455dbe3eb4c69f5eddd2a7dcf00 27-May-2016 Francisco Jerez <currojerez@riseup.net> i965: Add do32 debug option.

The do32 INTEL_DEBUG option causes the back-end to try to generate a
SIMD32 program when compiling a compute shader regardless of the
specified compute shader workgroup size, which will be useful for
testing SIMD32 code generation in the most common case in which the
workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen
wouldn't be automatically enabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
864737ce6cd5bae030079e749b8b18774a62d073 17-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Build 32-wide compute shader when needed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
37fd13ee2daf1dbd80cc7b43f7dcfdd1bb64bcc7 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Extend back-end interface for limiting the shader dispatch width.

This replaces the current fs_visitor::no16() interface with
fs_visitor::limit_dispatch_width(), which takes an additional
parameter allowing the caller to specify the maximum dispatch width a
shader can be compiled with.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2d288cb9ea5b1b46eb4fe0061d694560bf54943f 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement SIMD32 register allocation support.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1d5bf46ad1533ffdb30b5dc0f9244f60b0539285 01-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Don't mutate multi-component arguments in sampler payload set-up.

The Gen5+ sampler message payload construction code steps through the
coordinate and derivative components by induction like 'coordinate =
offset(coordinate, bld, 1)'. The problem is that while doing that it
may step one past the end of the coordinate vector causing an
assertion failure in offset() if it happens to be a (single component)
immediate. Right now coordinates and derivatives are typically passed
as actual registers but that will no longer be the case when we start
propagating constants into logical messages.

Instead express coordinate components in closed form like
'offset(coordinate, bld, i)' -- The end result seems slightly more
readable that way and it allows passing the coordinate and derivative
registers by const reference instead of by value, so it seems like a
clean-up in its own right.
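
A simplified illustration of the closed form, using hypothetical
stand-ins for fs_reg and offset() (the builder argument is omitted):

```cpp
#include <cassert>
#include <vector>

// Hypothetical register: a base location plus a component count.
struct reg_sketch { unsigned base; unsigned components; };

// Step delta components into r; stepping past the end is an error.
reg_sketch
offset_sketch(const reg_sketch &r, unsigned delta)
{
   assert(delta < r.components);
   return reg_sketch{ r.base + delta, r.components - delta };
}

// Closed form: component i is always computed from the unmodified argument,
// so the argument can be taken by const reference and no offset one past
// the last component is ever formed.
std::vector<unsigned>
gather_components(const reg_sketch &coordinate, unsigned count)
{
   assert(count <= coordinate.components);
   std::vector<unsigned> locations;
   for (unsigned i = 0; i < count; i++)
      locations.push_back(offset_sketch(coordinate, i).base);
   return locations;
}
```

The inductive version would call offset_sketch(coordinate, 1) once more
after the last component, which trips the assertion for a
single-component register.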

v2: Fold a few post-increment operators into the last MOV
statement. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8476233ae22c77ca26d8109f0f0d6c74457969f8 26-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Estimate number of registers written correctly in opt_register_renaming.

The current estimate is incorrect for non-32b types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
51dd6a60f5ef43a12d1b4384a2aded4d55d14056 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().

Prevents an assertion failure in the following commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b9eab911baa380fea1a3d3393f5944c00aa63076 26-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.

The pass is disabled in SIMD16 dispatch mode for the same reason: it
cannot handle instructions that write multiple MRF registers at once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7d430fc05e8f0a6211fb587f1bc7b2a76ed7de10 19-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
df1aec763eb972c69bc5127be102a9f281ce94f6 19-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.

v2: Codestyle fixes (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ece41df247af247fb573ae8ec208d50e895b7aef 21-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Expose arbitrary channel execution groups to the IR.

This generalizes the current fs_inst::force_sechalf flag to allow
specifying channel enable groups other than 0 or 8. At some point it
will likely make sense to fix the vec4 generator to support arbitrary
execution groups and then move the definition of fs_inst::group into
backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5db4d623956ceb5ffa8599e7797bd13470898158 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.

It's just a byte MOV with strided source.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cf5443f984da4eb500c9b1ad9b9f53bc8747fef3 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value.

Which is 16 or 8 in most cases. This will make sure that 32-wide
virtual instructions get chopped up into chunks of their maximum
execution size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
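
Note: the chunking described above can be pictured with a small standalone sketch. It is not the code from brw_fs.cpp; the Chunk struct and split_simd() are hypothetical names introduced only for illustration, and both widths are assumed to be powers of two.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one lowered chunk: the first channel it
// covers (its execution group) and how many channels it executes.
struct Chunk {
    unsigned group;
    unsigned width;
};

// Split an instruction of exec_size channels into chunks of at most
// max_width channels. Both values are assumed to be powers of two.
inline std::vector<Chunk> split_simd(unsigned exec_size, unsigned max_width)
{
    assert(exec_size > 0 && (exec_size & (exec_size - 1)) == 0);
    assert(max_width > 0 && (max_width & (max_width - 1)) == 0);

    // An instruction narrower than the limit is left as a single chunk.
    const unsigned width = exec_size < max_width ? exec_size : max_width;

    std::vector<Chunk> chunks;
    for (unsigned group = 0; group < exec_size; group += width)
        chunks.push_back({group, width});
    return chunks;
}
```

Under these assumptions a 32-wide instruction limited to 8 channels becomes four chunks covering channel groups 0, 8, 16 and 24.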
197833caa3d684c092ee76d1e9ff3fac28576b04 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.

Only per-channel LOAD_PAYLOAD instructions can be lowered, which
should cover everything that comes in from the front-end.

LOAD_PAYLOAD instructions used to construct actual message payloads
cannot be easily lowered because they contain headers and vectors of
variable type that aren't necessarily channel-aligned -- We shouldn't
find any of them in the program at SIMD lowering time though because
they're introduced during logical send lowering.

An alternative that may be worth considering would be to re-run the
SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9eea3df29f21eb7507354c3b1d85d238b671a211 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time

...on hardware lacking compressed Align16 support. Will allow
simplifying the generator code and fixing it for SIMD32 codegen.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
12ae87abb194e2fc5339d8944b6d0e9ddf54ea22 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Apply usual FPU-like execution size restrictions to MULH.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dea9c1df89cf58591cce83b67d3d905a28f0c101 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
122e0315480704a7c6777b994c42448d360e6774 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Assert that IF instruction with embedded compare has legal exec_size.

We shouldn't encounter these right now but if we did it wouldn't be
possible for the SIMD lowering pass to split it into multiple
instructions because of its side effects on control flow, so just
assert in order to kill the program.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
98c8bef01cae5fd70dda22fd7ac0b5694c4dfb5f 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
88d9cc15637559229fe725c0531de8ad7a0a60a7 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a6bf5f88c7be5ba1d1d9ebf1412e99886e0cf75c 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Enforce common regioning restrictions by SIMD splitting.

This change addresses a number of hardware restrictions on the source
and destination regions and other execution controls of regular
FPU-like instructions that in some cases can be avoided by reducing
the execution size of the instruction. Some of these restrictions
(e.g. the one about 3src instructions not supporting compression on
some hardware) are currently being worked around case by case in the
generator with ad-hoc splitting code that is buggy in several ways
(e.g. doesn't handle non-trivial execution controls which would break
SIMD32 code), but it seems cleaner to implement as many restrictions
as we can in a single lowering pass since that will allow us to
simplify some of the surrounding code considerably and also make sure
that we don't forget applying them in the future.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2b5adb942bad418058d266c85c396040d558f680 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Enforce extended math exec size limits during SIMD lowering.

This teaches the SIMD lowering pass about the hardware limits on the
execution size of math instructions, which will allow simplifying the
generator code and at the same time get rid of a number of bugs in the
manual SIMD unrolling done currently that prevent SIMD32 codegen from
working.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a8e7b4f1d9ec50d2214e7694da26af6a108e506f 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.

Seems like this texturing opcode was missing its logical counterpart
which would prevent it from taking advantage of the SIMD lowering
infrastructure, define it and plumb it through the back-end. At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
99b5476d33f967ac2a30c3f8f7f958a7169e7123 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends.

The benefit is we will be able to use the SIMD lowering pass to unroll
math instructions of unsupported width and then remove some cruft from
the generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ed4d0e41acb78f268b8b5c2dd03f654d11c4460b 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Rename Gen4 physical varying pull constant load opcode.

For consistency with the Gen7 variant. I'm not doing the same to the
uniform pull constant message at this point because the non-GEN7 one
is still overloaded to be either an expression-like logical
instruction or a Gen4-specific physical send message.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
64a6cb87f1fbfe2e410d6a4087450c2d4eb72228 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering.

Varying pull constant loads inherit the same limitation of pre-ILK
hardware that requires expanding SIMD8 texel fetch instructions to
SIMD16, we can deal with pull constant loads in the same way it's done
for texturing during SIMD lowering.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d8a3294ac21741c3a78eef72b832902e15fbd948 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Hide varying pull constant load message setup behind logical opcode.

This will allow the SIMD lowering pass to split 32-wide varying pull
constant loads (not natively supported by the hardware) into 16-wide
instructions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c5f224145a41079ddcc77c0d7df8b4b75ed2d4fe 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Handle instruction predication in SIMD lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1760c24b4bcf028477404e283f5768f2b6f25123 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering.

If the source value is going to be the same for all SIMD-lowered chunks
of the instruction there should be no need to unzip the value into
multiple temporary registers one for each lowered chunk. As a side
effect this fixes SIMD lowering of instructions with a vector
immediate source. In the long term it *might* still be worth fixing
offset() to handle vector immediates correctly though, this should be
good enough for the moment.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e79aa19d88b4d6dbd26c23287292e6bf9f41ce33 20-May-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965: fix double-precision vertex inputs measurement

For double-precision vertex inputs we need to measure them in dvec4
terms, and for single-precision vertex inputs we need to measure them in
vec4 terms.

For the latter case, we use the type_size_vec4() function. For the former
case, we had a wrong implementation based on type_size_vec4().

This commit introduces a proper type_size_dvec4() function, that we use
to measure vertex inputs.

Measuring double-precision vertex inputs as dvec4 is required because
ARB_vertex_attrib_64bit states that these use the same number of
locations as the single-precision version. That is, two consecutive
dvec4 would be located in location "x" and location "x+1", not "x+2".

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dac10e8a1390711f1f36f224644c4a33586cebe3 17-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965, anv: Use NIR FragCoord re-center and y-transform passes.

This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.

This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height. This led to a lot of
recompiles in many applications.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
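
Note: the idea of replacing a compile-time y flip with a multiply/add on uniform values can be sketched as follows. This is only an illustration and not the NIR pass itself; YTransform, make_y_transform() and apply_y_transform() are hypothetical names, and the flip is assumed to map y to (height - y).

```cpp
#include <cassert>

// Hypothetical pair of uniform values the driver would upload.
struct YTransform {
    float scale;
    float offset;
};

// Build the transform for a drawable. When flip_y is set, y is mapped
// to (height - y); otherwise it is left untouched.
inline YTransform make_y_transform(bool flip_y, float drawable_height)
{
    assert(drawable_height >= 0.0f);
    return flip_y ? YTransform{-1.0f, drawable_height}
                  : YTransform{1.0f, 0.0f};
}

// The shader-side operation: one multiply and one add, identical for
// both cases, so no recompile is needed when the drawable changes.
inline float apply_y_transform(float y, const YTransform &t)
{
    return y * t.scale + t.offset;
}
```

Because only the uniform values differ between the two cases, the same compiled shader can serve either kind of framebuffer.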
8a65b5135a167d4f12cef19408e0ca52fffe06bc 05-May-2016 Matt Turner <mattst88@gmail.com> i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz.

Ken suggested instead of a big and complicated optimization pass, to
just recognize the operations here. It's certainly less code and a lot
prettier, but it seems to actually perform worse for currently unknown
reasons.

total instructions in shared programs: 8923452 -> 8904108 (-0.22%)
instructions in affected programs: 814563 -> 795219 (-2.37%)
helped: 3336
HURT: 10

total cycles in shared programs: 66970734 -> 66651476 (-0.48%)
cycles in affected programs: 10582686 -> 10263428 (-3.02%)
helped: 2438
HURT: 691

total spills in shared programs: 1811 -> 1789 (-1.21%)
spills in affected programs: 85 -> 63 (-25.88%)
helped: 4

total fills in shared programs: 3143 -> 3109 (-1.08%)
fills in affected programs: 167 -> 133 (-20.36%)
helped: 4

LOST: 2
GAINED: 36

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
75dccf5ac2af716175990ae9eac44cc2c99b7e9c 05-May-2016 Matt Turner <mattst88@gmail.com> i965: Add infrastructure for sample lod-zero operations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
767168771376a3ee95d5b1f5b2f5fd577b76391e 17-May-2016 Eduardo Lima Mitev <elima@igalia.com> i965/fs: Silence warnings related to use of uninitialized values

brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...]
brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...]
prog_data->base.dispatch_grf_start_reg = simd16_grf_start;

brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here
uint8_t simd8_grf_start, simd16_grf_start;

brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...]
prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used);

brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here
unsigned simd8_grf_used, simd16_grf_used;

(and more)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
265487aedfabbcfb073f9d6053d1ceb510b78b27 16-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add an allow_spilling flag to brw_compile_fs

This allows us to disable spilling for blorp shaders since blorp state
setup doesn't handle spilling. Without this, blorp fails hard if you run
with INTEL_DEBUG=spill.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d6281a9d955ad97f993927bc214e4b641cfbe359 15-Apr-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965: take care of doubles when lowering VS inputs

Input attributes can require 2 vec4 or 1 vec4 depending on whether they
are double-precision or not.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7ea09511ca4f58640063cc1ee08386cce5300535 04-Apr-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965/fs: calculate first non-payload GRF using attrib slots

When computing where the first non-payload GRF starts, we can't rely on
the number of attributes, as each attribute can be using 1 or 2 slots
depending on whether they are a dvec3/4 or other.

Instead, we need to use the number of slots used by the attributes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
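
Note: the difference between counting attributes and counting slots can be shown with a short sketch. The Attrib struct and both helpers are hypothetical and exist only for this illustration; the rule encoded is the one stated above (dvec3/dvec4 take two slots, everything else one).

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of a vertex attribute.
struct Attrib {
    bool is_double;       // 64-bit components?
    unsigned components;  // 1..4
};

// A dvec3 or dvec4 does not fit in a single vec4 slot and needs two;
// every other attribute type is assumed to need one.
inline unsigned attrib_slots(const Attrib &a)
{
    assert(a.components >= 1 && a.components <= 4);
    return (a.is_double && a.components > 2) ? 2 : 1;
}

// Total slots used by the payload, which is what the first non-payload
// GRF has to be computed from.
inline unsigned total_attrib_slots(const std::vector<Attrib> &attribs)
{
    unsigned slots = 0;
    for (const Attrib &a : attribs)
        slots += attrib_slots(a);
    return slots;
}
```

For a vec4, a dvec4 and a dvec2 the attribute count is 3 but the slot count is 4, which is why the attribute count cannot be used directly.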
96c276dda909ddf12714b9e64b7207156e8fd4bb 23-Mar-2016 Alejandro Piñeiro <apinheiro@igalia.com> i965/fs: half exec_size when dealing with 64 bits attributes

The HW has a restriction that only vertical stride may cross register
boundaries. Until now this was only handled on VGRFs at
brw_reg_from_fs_reg, but it is also needed for attributes.

v2:
* Remove reference to commit id on commit message (Juan Suarez)
* Simplify code that computes final exec_size (Ian Romanick)
* Use REG_SIZE on that same code (Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
58f1804c4f38b76c20872d6887b7b5e6029e0454 18-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: fix pull constant load component selection for doubles

UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
constant offset that is 16-byte aligned. If we need to access an unaligned
offset we emit a load with an aligned offset and use the remaining constant
offset to select the component into the vec4 result that we are interested
in. This component must be computed in units of the type size, since that
is what fs_reg::set_smear expects.

This patch does this change in the two places where we use this message:
In demote_pull_constants when we lower uniform access with constant offset
into the pull constant buffer and in UBO loads with constant offset.

v2 (Sam):
- Fix set_smear() in fs_visitor::lower_constant_loads(), take into account
source type instead and remove MAX2 (Curro).
- Improve changes to nir_intrinsic_load_ubo case in nir_emit_intrinsic()
(Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
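
Note: the offset arithmetic described above can be sketched in isolation. PullLoad and split_pull_offset() are hypothetical names, not driver API; the sketch only shows why the component index has to be expressed in units of the type size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of splitting a constant byte offset.
struct PullLoad {
    uint32_t aligned_offset;  // 16-byte aligned offset used by the load
    uint32_t component;       // index into the loaded vec4, in type units
};

// Split a constant byte offset into the aligned load offset and the
// component to select, measured in units of type_size bytes.
inline PullLoad split_pull_offset(uint32_t byte_offset, uint32_t type_size)
{
    assert(type_size == 4 || type_size == 8);
    assert(byte_offset % type_size == 0);

    PullLoad r;
    r.aligned_offset = byte_offset & ~15u;
    r.component = (byte_offset & 15u) / type_size;
    return r;
}
```

With a byte offset of 24, a 32-bit access selects component 2 of the vec4 loaded from offset 16, while a 64-bit access selects component 1 of the same 16 bytes.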
e209134f717078fb6c1d4a6d048b4aba22c87993 14-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles

v2 (Curro):
- Assert on scale == 1 when shuffling 64-bit data.
- Remove type_slots, use type_sz(vec4_result.type) instead.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4d9c461e53440182de42d0a16ec66ad7f5c3b00a 04-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.

Instead of using the LOAD_PAYLOAD instruction (emitted through the
emit_transpose() helper that is no longer useful and this commit
removes) which had to be marked force_writemask_all in some cases,
emit a series of moves to apply proper channel enable signals to the
destination. Until now lower_simd_width() had mainly been used to
lower things that invariably had a basic block-local temporary as
destination so it didn't seem like a big deal, but I found it to be
the reason for several Piglit regressions in my SIMD32 branch and
Igalia discovered the same issue independently while working on FP64
support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a0e6e5f21ffea8acb9500ef699b204c557214b75 30-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use MRF0 for the repclear message

This is what BLORP does. Making them match cuts down on the noise when
looking at AUB diffs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bee160b31be9e09eeab83f62d26ac331f08955fa 29-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Organize prog_data by ksp number rather than SIMD width

The hardware packets organize kernel pointers and GRF start by slots that
don't map directly to dispatch width. This means that all of the state
setup code has to re-arrange the data from prog_data into these slots.
This logic has been duplicated 4 times in the GL driver and one more time
in the Vulkan driver. Let's just put it all in brw_fs.cpp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1ec466d0ff59ab17edef95c84ed733c1fea5655e 28-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Stop setting dispatch_grf_start_reg from the visitor

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
082768af30cb73050bda8103a29136afb2fd020f 28-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Clean up the logic in compile_fs a bit

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
712a980adde0b14eee8b4accd02af9b9740091a2 10-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Rework the persample shading key/prog_data bits

This commit reworks and simplifies the way we handle persample shading in
the shader key and prog_data. The previous approach had three different
key bits that had slightly different and hard-to-discern meanings while the
new bits are far more clear. This commit changes it to two easily
understood bits that communicate everything we need:

1) key->persample_interp: means that the user has requested persample
interpolation through the API. This is equivalent to having
SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high
enough that you actually get multiple per-sample invocations.

2) key->multisample_fbo: means that the shader will be running on an
actual multi-sampled framebuffer.

This commit also adds a new "persample_dispatch" bit to prog_data which
indicates that the shader should be run in persample mode. This way the
state setup code doesn't have to look at the fragment program or GL state
and can just pull that data out of the prog_data.

In theory, this shuffle could mean more recompiles. However, in practice,
we were shoving enough state into the key before that we were probably
hitting a recompile on every per-sample shader anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
607fb0f13df8e328ed5d173c98fc250449c55aee 10-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Reduce the SIMD8 GS push constant threshold from 32 to 24.

Three Shadow of Mordor geometry shaders increase by a single
instruction, but the number of spills/fills in Orbital Explorer
is reduced from 194:1279 -> 82:454. No other programs are affected.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
203c786a73847fb07d805c4cc799b7c7d028695c 10-May-2016 Jason Ekstrand <jason@jlekstrand.net> i965/fs: Default all constants to a location of -1

Otherwise constants which aren't live get an undefined constant location.
When we go to set up param and pull_param we end up assigning all unused
uniforms to slot 0. This causes the Vulkan driver to segfault because it
doesn't have pull_param.

This fixes bugs in the Vulkan driver introduced in c3fab3d000.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4c9006f95796e67cf2cac98795627c31b15b0371 20-Apr-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: fix MOV_INDIRECT exec_size for doubles

In that case, the writes need two times the size of a 32-bit value.
We need to adjust the exec_size, so it is not breaking any hardware
rule.

v2:
- Add an assert to verify type size is not less than 4 bytes (Jordan).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
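
Note: the exec_size adjustment can be pictured as clamping the width to a byte budget. This is a sketch under assumptions, not the driver's logic: limit_exec_size() is a hypothetical helper, and the budget (max_bytes) is left as a parameter because the real limit depends on the hardware generation.

```cpp
#include <cassert>

// Clamp an execution size so that exec_size * type_size does not
// exceed max_bytes. The type size is asserted to be at least 4 bytes,
// mirroring the assert mentioned in the v2 note.
inline unsigned limit_exec_size(unsigned exec_size, unsigned type_size,
                                unsigned max_bytes)
{
    assert(type_size >= 4);
    assert(max_bytes >= type_size);

    const unsigned max_channels = max_bytes / type_size;
    return exec_size < max_channels ? exec_size : max_channels;
}
```

With a 64-byte budget, a 16-wide move of 32-bit values is left alone, while the same move of 64-bit values is reduced to 8 channels.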
03687ab77fea7893f8786ce407d6f4d108b28012 27-Nov-2015 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: demote_pull_constants() did not take into account double types

The constants could be double, and it was allocating size for float types
for the destination register of varying pull constant loads.

Then the fs_visitor::validate() will complain.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c3fab3d00095ed4a5693d5272073298f07dcb9b5 05-May-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: push first double-based uniforms in push constant buffer

When there is a mix of definitions of uniforms with 32-bit or 64-bit
data type sizes, the driver ends up doing misaligned access to double
based variables in the push constant buffer.

To fix this, this patch pushes all the 64-bit variables first and
then the rest. That way, every variable is aligned to
its data type size.

v2:
- Fix typo and improve comment (Jordan).
- Use ralloc(NULL,...) instead of rzalloc(mem_ctx,...) (Jordan).
- Fix typo (Topi).
- Use pointers instead of references in set_push_pull_constant_loc() (Topi).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
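
Note: the ordering trick can be illustrated with a small sketch. assign_push_offsets() is a hypothetical helper that works on byte sizes only; it is not the driver's set_push_pull_constant_loc() and ignores the push/pull split entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assign a byte offset to each uniform, given its size in bytes (4 or
// 8). All 8-byte uniforms are placed first, then the 4-byte ones, so
// every uniform ends up aligned to its own size.
inline std::vector<unsigned>
assign_push_offsets(const std::vector<unsigned> &sizes)
{
    std::vector<unsigned> offsets(sizes.size(), 0);
    unsigned next = 0;

    for (unsigned want : {8u, 4u}) {
        for (std::size_t i = 0; i < sizes.size(); i++) {
            assert(sizes[i] == 4 || sizes[i] == 8);
            if (sizes[i] == want) {
                offsets[i] = next;
                next += want;
            }
        }
    }
    return offsets;
}
```

For sizes {4, 8, 4, 8} declared in that order, the two 8-byte uniforms land at offsets 0 and 8 and the 4-byte ones at 16 and 20, whereas assigning in declaration order would put the first double at offset 4.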
193cb67a84c1725382f62a2f3aa60564d275c2f8 31-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: recognize writes with a subreg_offset > 0 as partial

Usually, writes to a subreg_offset > 0 would also have a stride > 1
and we would recognize them as partial, however, there is one case
where this does not happen, that is when we generate code for 64-bit
immediates in gen7, where we produce something like this:

mov(8) vgrf10:UD, <low 32-bit>
mov(8) vgrf10+0.4:UD, <high 32-bit>

and then we use the result with a stride of 0, as in:

mov(8) vgrf13:DF, vgrf10<0>:DF

Although we could try to avoid this issue by producing different code
for this by using writes with a stride of 2, that runs into other
problems affecting gen7 and the fact is that any instruction that
writes to a subreg_offset > 0 is a partial write so we should really
recognize them as such.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
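
Note: a simplified predicate in the spirit of this change might look like the sketch below. The Write struct and its fields are hypothetical stand-ins for the instruction and destination register state, the register size is assumed to be 32 bytes, and the real check in the driver may consider more cases.

```cpp
#include <cassert>

// Hypothetical summary of an instruction's destination write.
struct Write {
    bool predicated;         // only some channels may execute
    unsigned exec_size;      // number of channels
    unsigned type_size;      // bytes per channel
    unsigned stride;         // destination stride in elements
    unsigned subreg_offset;  // byte offset into the first register
};

// A write is treated as partial if it may leave part of the register
// untouched: it is predicated, covers less than a full register, is
// strided, or (the case added here) starts at a non-zero subregister
// offset.
inline bool is_partial_write(const Write &w)
{
    const unsigned reg_size = 32;  // assumed GRF size in bytes
    assert(w.type_size > 0 && w.stride > 0);

    return w.predicated ||
           w.exec_size * w.type_size < reg_size ||
           w.stride != 1 ||
           w.subreg_offset > 0;
}
```

The second MOV in the example above (8 channels of UD at subregister offset 4 with a stride of 1) is only caught by the last condition.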
34ed61b33459c975074df0e83a2161fb76526621 15-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs/lower_simd_width: Fix registers written for split instructions

When the original instruction had a stride > 1, the combined registers
written by the split instructions won't amount to the same register space
written by the original instruction because the split instructions will
use a stride of 1. The current code assumed otherwise and computed the
number of registers written by split instructions as an equal share based
on the relation between the lowered width and the original execution size
of the instruction.

It is only after the split, when we interleave the components of the result
from the lowered instructions back into the original dst register, that the
original stride takes effect and we write all the registers specified by
the original instruction.

Just make the number of registers written the same as the vgrf space we
allocate for the dst of the split instruction.

Fixes crashes in fp64 tests caused by incorrectly assigning the
number of registers written by split instructions, which led to incorrect
validation of the size of the writes against the allocated vgrf space.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9741cff1ec3bdc0edf4122bf20aa3447dd8cb741 18-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: rename our lower_d2f pass to lower_d2x

Since it no longer handles conversions from double to float but from
double to various other 32-bit types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9e1b3ea199c3bd01fe89e6ab3eee4cae3da92264 01-Nov-2015 Connor Abbott <cwabbott0@gmail.com> i965/fs: add a pass for legalizing d2f

We need to do this late, in order to avoid partial writes during the
optimization loop.

v2: Use subscript() instead of stride().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6b6d68ae0786e456faa828a7eaf76c981c44b1cb 11-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: fix is_copy_payload() for doubles

v2 (Sam):
- LOAD_PAYLOAD treats each header source as a 32B block
regardless of the datatype. Drop the change (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4f3888c1caf3455f61b2e20ccf7c39e59f4feaf3 29-Jul-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: fix assign_constant_locations() for doubles

Uniform doubles will read two registers, in which case we need to mark
both as being live.

v2 (Sam):
- Use a formula to get the number of registers read with proper
units (Curro).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1f51aada3fbf73ffe601f743b5244df63e17f9d5 29-Jul-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: fix type_size() for doubles

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fd763177c13579ff51cb35fd8bc3b6d703073b61 05-May-2016 Connor Abbott <cwabbott0@gmail.com> i965/fs: add a pass for lowering PACK opcodes

v2: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ba582e58cd30c815137a11c9497b01d97842e525 05-May-2016 Connor Abbott <cwabbott0@gmail.com> i965/fs: add PACK opcode

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a308bae58f3e2dabd2ffaec98c1f91c9abf7a9f8 04-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: add support for printing double immediates

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3a886721ed449be0c87ece972acada96cc0811b6 04-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Silence unused variable warning

I added this when deleting some unnecessary code in a rebase.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1cc7573162a7f0e8346d7abab50890c58a0dce9a 28-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965: Pass devinfo pointer to is_3src() helpers.

This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future. In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c55dc77ab13420a9fe0177ccd21a6b0a950d9113 28-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965: Pass devinfo pointer to brw_instruction_name().

A future series will implement support for an instruction that happens
to have the same opcode number as another instruction we support
already on a disjoint set of hardware generations. In order to
disambiguate which instruction it is brw_instruction_name() will need
some way to find out which device we are generating code for.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7d9143ad885752184156b3a0d3e492aef09af3b0 15-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Write a scalar TCS backend that runs in SINGLE_PATCH mode.

Unlike most shader stages, the Hull Shader hardware makes us explicitly
tell it how many threads to dispatch and manually configure the channel
mask. One perk of this is that we have a lot of flexibility - we can
run it in either SIMD4x2 or SIMD8 mode.

Treating it as SIMD8 means that shaders with 8 or fewer output vertices
(which is overwhelmingly the common case) can be handled by a single
thread. This has several intriguing properties:

- Accessing input arrays with gl_InvocationID as the index is a simple
SIMD8 URB read with g1 as the header. No indirect addressing required.
- Barriers are no-ops.
- We could potentially do output shadowing to combine writes, as the
concurrency concerns are gone. (We don't do this yet, though.)

v2: Drop first_non_payload_grf change, as it was always adding 0
(caught by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
13195f7ef85e0923a7b7d5b8a35eb6b6c257db1c 23-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Reduce the response length of sampler messages on Skylake.

Often, we don't need a full 4 channels worth of data from the sampler.
For example, depth comparisons and red textures only return one value.
To handle this, the sampler message header contains a mask which can
be used to disable channels, and reduce the message length (in SIMD16
mode on all hardware, and SIMD8 mode on Broadwell and later).

We've never used it before, since it required setting up a message
header. This meant trading a smaller response length for a larger
message length and additional MOVs to set it up.

However, Skylake introduces a terrific new feature: for headerless
messages, you can simply reduce the response length, and it makes
the implicit header contain an appropriate mask. So to read only
RG, you would simply set the response length to 2 or 4 (SIMD8/16).

This means we can finally take advantage of this at no cost.

total instructions in shared programs: 9091831 -> 9073067 (-0.21%)
instructions in affected programs: 191370 -> 172606 (-9.81%)
helped: 2609
HURT: 0

total cycles in shared programs: 70868114 -> 68454752 (-3.41%)
cycles in affected programs: 35841154 -> 33427792 (-6.73%)
helped: 16357
HURT: 8188

total spills in shared programs: 3492 -> 1707 (-51.12%)
spills in affected programs: 2749 -> 964 (-64.93%)
helped: 74
HURT: 0

total fills in shared programs: 4266 -> 2647 (-37.95%)
fills in affected programs: 3029 -> 1410 (-53.45%)
helped: 74
HURT: 0

LOST: 1
GAINED: 143

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
acc2f1fe361af87ce4d50b7e2b58e0da093477e1 09-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use inst->regs_written for rlen for texture instructions

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
33565d67641142a68d537023e181b6dcd587e551 20-Apr-2016 Matt Turner <mattst88@gmail.com> i965/fs: Readd opt_drop_redundant_mov_to_flags().

This reverts commit b449366587b5f3f64c6fb45fe22c39e4bc8a4309.

I removed the pass thinking that it was now not useful, but that was not
true. I believe I ran shader-db on HSW and saw no results, but HSW does
not use the unlit centroid workaround code and as a result does not emit
redundant MOV_DISPATCH_TO_FLAGS instructions.

On IVB, the shader-db results are:

total instructions in shared programs: 6650806 -> 6646303 (-0.07%)
instructions in affected programs: 106893 -> 102390 (-4.21%)
helped: 793

total cycles in shared programs: 56195538 -> 56103720 (-0.16%)
cycles in affected programs: 873048 -> 781230 (-10.52%)
helped: 553
HURT: 209

On SNB, the shader-db results are:

total instructions in shared programs: 7173074 -> 7168541 (-0.06%)
instructions in affected programs: 119757 -> 115224 (-3.79%)
helped: 799

total cycles in shared programs: 98128032 -> 98072938 (-0.06%)
cycles in affected programs: 1437104 -> 1382010 (-3.83%)
helped: 454
HURT: 237

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
447d3eec6a869200612e5010f47335cb26789a3a 06-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix gl_SampleMaskIn[] in per-sample shading mode.

The coverage mask is not sufficient - in per-sample mode, we also need
to AND with a mask representing the samples being processed by the
current fragment shader invocation.
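
A minimal sketch of the masking step, assuming the simplest per-sample case
where each invocation handles exactly one sample identified by its sample ID.
The function and parameter names are hypothetical, and the real driver emits
GPU instructions rather than calling a helper like this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: in per-sample mode, restrict the coverage mask to
// the sample handled by this invocation. Assumes exactly one sample per
// invocation, identified by sample_id.
uint32_t sample_mask_in(uint32_t coverage_mask, unsigned sample_id,
                        bool per_sample)
{
    assert(sample_id < 32);
    if (!per_sample)
        return coverage_mask;            // per-pixel: coverage is enough
    return coverage_mask & (1u << sample_id);
}
```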

Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests:

sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8}
sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
66a725570c9f93ab0341e9479390c9d042d7cd00 05-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Only enable oMask output when there's a multisample FBO.

The ARB_sample_shading specification says that setting gl_SampleMask
bits to 0 means that the corresponding sample "should be considered
uncovered for the purposes of multisample fragment operations
(Section 4.1.3)."

The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment
Operations") specifies:

"No changes to the fragment alpha or coverage values are made at this
step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS
is not one."

oMask output alters coverage masks and can kill pixels. We need to
disable it in the above case, which conveniently corresponds to
key->multisample_fbo being false.

Khronos bug #12188 also spells this out clearly:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188

Fixes two Piglit tests:
tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0
tests/spec/arb_sample_shading/builtin-gl-sample-mask 0

Fixes 21 ES3 conformance tests:
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7

Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask.discard_half_per_pixel.default_framebuffer
sample_mask.discard_half_per_pixel.singlesample_rbo
sample_mask.discard_half_per_pixel.singlesample_texture
sample_mask.discard_half_per_sample.default_framebuffer
sample_mask.discard_half_per_sample.singlesample_rbo
sample_mask.discard_half_per_sample.singlesample_texture
sample_mask.discard_half_per_two_samples.default_framebuffer
sample_mask.discard_half_per_two_samples.singlesample_rbo
sample_mask.discard_half_per_two_samples.singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
81407531e0b8d2e6a7f9c39cb44ed6a72dc61e77 06-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.

I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch. This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.

The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value. So we
can just eliminate the check and generalize the field.

v2: Also update the Vulkan driver.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
57118a19da932b4b5756021a0d75e91f42a68d99 06-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Simplify gl_SampleID setup on Gen8+.

On Gen7+, the thread payload provides the sample ID - we can read it
in two instructions, without any elaborate calculations. We don't even
need a state dependency - this will properly produce zero in the
non-MSAA case. Unfortunately, we need the state flag anyway, so we
may as well continue to use it to produce a single MOV 0 instead of
SHR/AND.

For some reason, the sample ID field is always zero on Gen7/7.5, so
we can't use this yet. However, it works fine on Gen8+. So, land the
code and use it where it's working, and leave a TODO for later.

v2: Fix register types in the comment (caught by Matt Turner!).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
528255b0b1498d22c820cecc5d75591d25ddb375 19-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Flip key->compute_sample_id check.

This just moves the simple case first.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f1d29099b4eedafb0302a21c0673d12a6610c369 06-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Push everything if pull_param == NULL

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
963513bb24bdd542f1af3733fab53ad450d3221b 09-Dec-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Push small uniform arrays

Unfortunately, this also means that we need to use a slightly different
algorithm for assign_constant_locations. The old algorithm worked based on
the assumption that each read of a uniform value read exactly one float.
If it encountered a MOV_INDIRECT, it would immediately bail and push the
whole thing. Since we can now read ranges using MOV_INDIRECT, we need to
be able to push a series of floats without breaking them up. To do this,
we use an algorithm similar to the one in split_virtual_grfs.
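
One way to picture the "do not break up a range" requirement is the sketch
below. It is an assumption-laden illustration, not the code from
assign_constant_locations: every access is modelled as a (first slot, slot
count) pair, and find_uniform_chunks is a hypothetical name. Slots covered by
the same range, or by overlapping ranges, end up in one indivisible chunk.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch of grouping uniform slots into indivisible chunks.
// Each access is (first_slot, slot_count); a direct read has slot_count 1,
// an indirect read covers a whole range that must stay together.
std::vector<std::pair<unsigned, unsigned>>
find_uniform_chunks(unsigned num_slots,
                    const std::vector<std::pair<unsigned, unsigned>> &accesses)
{
    // joined[i] is true when slot i must stay adjacent to slot i + 1.
    std::vector<bool> joined(num_slots, false);
    for (const auto &a : accesses) {
        assert(a.second >= 1 && a.first + a.second <= num_slots);
        for (unsigned i = a.first; i + 1 < a.first + a.second; i++)
            joined[i] = true;
    }

    // Walk the slots and emit (start, length) for each maximal chunk.
    std::vector<std::pair<unsigned, unsigned>> chunks;
    unsigned start = 0;
    for (unsigned i = 0; i < num_slots; i++) {
        if (!joined[i]) {
            chunks.emplace_back(start, i + 1 - start);
            start = i + 1;
        }
    }
    return chunks;
}
```

A pass could then decide per chunk, rather than per float, whether to push or
pull.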

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
71f8039f728eb0a67e471321da61f0e88aec8035 09-Dec-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Rename demote_pull_constants to lower_constant_loads

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
479e38ad63ab1421afe4f25d36f434ac2e12e817 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Get rid of the param_size array

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
30874216cbaa21e9b757af7db1ef165b5c780a39 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Stop relying on param_size in assign_constant_locations

Now that we have MOV_INDIRECT opcodes, we have all of the size information
we need directly in the opcode. With a little restructuring of the
algorithm used in assign_constant_locations we don't need param_size
anymore. The big thing to watch out for now, however, is that you can have
two ranges overlap where neither contains the other. In order to deal with
this, we make the first pass just flag what needs pulling and defer
assigning pull constant locations until later.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
275855f315623923eff863265077a9a840885c9e 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Get rid of reladdr

We aren't using it anymore.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3c93cdfaf598bc3c28e3dc288da35675c666602b 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use MOV_INDIRECT for all indirect uniform loads

Instead of using reladdr, this commit changes the FS backend to emit a
MOV_INDIRECT whenever we need an indirect uniform load. We also have to
rework some of the other bits of the backend to handle this new form of
uniform load. The obvious change is that demote_pull_constants now acts
more like a lowering pass when it hits a MOV_INDIRECT.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
27bd8ac6f309b9f052a7fa9380ac5e12fb686e31 24-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware

While we're at it, we also add support for the possibility that the
indirect is, in fact, a constant. This shouldn't happen in the common case
(if it does, that means NIR failed to constant-fold something), but it's
possible so we should handle it.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
889e6054b7795baa789cc771e76e009d1605efae 24-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr

The subnr field is in bytes so we don't need to multiply by type_sz.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
40a8fe04dcee7a867e7d6044b23fafc20599c899 24-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add support for doing MOV_INDIRECT on uniforms

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
240d16ea94834eb2472e91fd4856381951a07007 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e61cc87c757f8bc0b6a3af318a512b22c072595c 06-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add a flat_inputs field to prog_data

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3921b64e63db39a3f19ebb8250081ba7ddf843a2 04-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make the repclear shader support either a uniform or a flat input

In the Vulkan driver we use a single flat input instead of a uniform
because setting up push constants is more disruptive to the pipeline than
setting up another vertex input. This uses the number of uniforms as a key
to keep it working for the GL driver.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
01425c45b32fa7f323515b05697c6cc0d245ad32 17-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Remove the RCP+RSQ algebraic optimizations

NIR already has this optimization and it can do much better than the little
peephole in the backend.

No shader-db change on Haswell or Broadwell.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7d021cb15e6d67ecef8b020fd36c4a680bcc9c39 18-Jan-2016 Jordan Justen <jordan.l.justen@intel.com> i965/nir: Lower nir compute shader shared variables

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
93be4158aed9accab06e3df2d8c526d3312bfff8 12-Mar-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Add missing analysis invalidation in fixup_3src_null_dest().

Bug found by the liveness analysis validation pass that will be
introduced in a later commit. fixup_3src_null_dest() was allocating
registers which makes the cached liveness analysis calculation
incomplete, so it must be invalidated.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6691c03fd39be463e1d222b56e3ec8da9f3b7f24 12-Mar-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Add missing analysis invalidation in opt_sampler_eot().

Bug found by the liveness analysis validation pass that will be
introduced in a later commit. opt_sampler_eot() was allocating
registers and inserting and removing instructions, which makes the
cached liveness analysis calculation inconsistent with the shader IR,
so it must be invalidated.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a100a57e30010da49c96f84a661cec9c57f9eebe 20-Feb-2016 Jordan Justen <jordan.l.justen@intel.com> i965/hsw: Initialize SLM index in state register

For Haswell, we need to initialize the SLM index in the state
register. This can be copied out of the CS header dword 0.

v2:
* Use UW move to avoid changing upper 16-bits of sr0.1 (mattst88)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94081
Fixes: piglit arb_compute_shader/execution/shared-atomics.shader_test
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d8347f12ead89c5a58f69ce9283a54ac8487159c 22-Feb-2016 Jordan Justen <jordan.l.justen@intel.com> i965/compute: Skip SIMD8 generation if it can't be used

If the local workgroup size is sufficiently large, then the SIMD8
program can't be used. In this case we can skip generating the SIMD8
program. For complex programs this can save a significant amount of
time.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e1d54b1ba5a9d579020fab058bb065866bc35554 22-Feb-2016 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Allow spilling for SIMD16 compute shaders

For fragment shaders, we can always use a SIMD8 program. Therefore, if
we detect spilling with a SIMD16 program, then it is better to skip
generating a SIMD16 program to only rely on a SIMD8 program.

Unfortunately, this doesn't work for compute shaders. For a compute
shader, we may be required to use SIMD16 if the local workgroup size
is bigger than a certain size. For example, on gen7, if the local
workgroup size is larger than 512, then a SIMD16 program is required.
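
The constraint can be sketched as follows. This is an illustration only:
min_simd_width is a hypothetical name, and max_threads is an assumed
per-workgroup hardware thread limit (64 reproduces the gen7 example above,
since 64 threads * 8 channels = 512 invocations).

```cpp
#include <cassert>

// Hypothetical sketch: smallest SIMD width (8 or 16) able to run a compute
// workgroup of local_size invocations, given an assumed limit on the number
// of hardware threads per workgroup. Returns 0 if neither width fits.
unsigned min_simd_width(unsigned local_size, unsigned max_threads)
{
    assert(local_size >= 1 && max_threads >= 1);
    for (unsigned width = 8; width <= 16; width *= 2) {
        // Threads needed at this width, rounding up.
        unsigned threads = (local_size + width - 1) / width;
        if (threads <= max_threads)
            return width;
    }
    return 0;
}
```

With max_threads = 64, a workgroup of 512 still fits in SIMD8, while 513
already needs SIMD16, so bailing out of SIMD16 on spills is not an option
there.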

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93840
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cfbd9831f89ef165e7998d0b8524a1aefedec404 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions.

Now that each stage is directly calling brw_nir_lower_io(), and we have
per-stage helper functions, it makes sense to just call the relevant one
directly, rather than going through multiple switch statements.

This also eliminates stupid function parameters, such as the two that
only apply to vertex attributes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b3cb6e78aa219ad73c145a25ee1bb48fd8b025d0 17-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Do lower_io late for fragment shaders

The Vulkan driver wants to be able to delete fragment outputs that are
beyond key.nr_color_regions; this is a lot easier if we lower outputs at
specialization time rather than link time.

(Rationale added to commit message by Ken)

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2f2c00c7279e7c43e520e21de1781f8cec263e92 11-Feb-2016 Matt Turner <mattst88@gmail.com> i965: Lower min/max after optimization on Gen4/5.

Gen4/5's SEL instruction cannot use conditional modifiers, so min/max
are implemented as CMP + SEL. Handling that after optimization lets us
CSE more.

On Ironlake:

total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
instructions in affected programs: 326604 -> 323322 (-1.00%)
helped: 1411

total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
cycles in affected programs: 18950290 -> 18867176 (-0.44%)
helped: 2419
HURT: 328

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
95ea9f770878517364ac2161eb943afbc77bfef9 10-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> glsl/types: Add support for function types

SPIR-V has a concept of a function type that's used fairly heavily. We
could special-case function types in SPIR-V -> NIR but it's easier if we
just add support to glsl_types.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5743fd957145040a4734b5542ee5187cfad4cf1d 11-Feb-2016 Ben Widawsky <benjamin.widawsky@intel.com> i965: Rename optimizer debug 00 filename

This allows ls and scripts to list the file names in the correct order of
optimization.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
56eb9c44adfa38f776689dd1a1bc42fe55c15dd8 11-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Pass usage of depth, W, and sample mask through prog_data

We really need to stop pulling information directly out of shaders for
state setup. For one thing, if we want any sort of an on-disk shader
cache, having all of this metadata in one place is going to be crucial.
Also, passing it all through prog_data cleans up the compiler <-> state
setup API substantially.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ae3543950c93ec4ac179013cb1c7baaf6f5ef4a7 11-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Refactor setup_payload_gen6 to assume FS

It's extremely FS specific so the fact that we have a stage check in the
middle of it is rather bogus. While we're here, we rename
setup_payload_gen4 and setup_payload_gen6 to make it obvious that they are
both FS specific.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a2c8b5ece5790825dba951c35e4c5aab003e3217 11-Feb-2016 Chris Forbes <chrisf@ijw.co.nz> i965: ir: dump floats as %-g rather than %f, so we can see denormals

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b8ab9c8c8674d67e09c1134ca44b37e0a611f5b5 06-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Plumb separate surfaces and samplers through from NIR

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a37b8110c13bf9e38220d6eb9e531b2acffcb4ed 06-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add an enum for keeping track of texture instruction sources

These logical texture instructions can have a *lot* of sources. It's much
safer if we have symbolic names for them.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
24f984f64ae58c274f79eaf9148aea37df67131c 18-Jan-2016 Emil Velikov <emil.velikov@collabora.com> nir: move glsl_types.{cpp,h} to compiler

Allows us to remove the SCons workaround :-)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
315cda671570a149af5117d9b265dc71396122ba 21-Jan-2016 Ben Widawsky <benjamin.widawsky@intel.com> i965/fs: Remove unused count from vs urb setup

This was originally removed here:
commit 031d3501322aee0a1474c7f2a9b79f9fa9947430
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Tue Aug 25 16:59:12 2015 -0700

i965/vs: Unify URB entry size/read length calculations between backends.

Then added back:
commit bd198b9f0a292a9ff4ffffec3a29bad23d62caba
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Fri Aug 14 16:01:33 2015 -0700

i965/vs: Simplify fs_visitor's ATTR file.

Note that the authorship dates are out of order, but the above reflects the
order of the commit dates.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9870f798beab701a9edda81ff7ccc39f1875d610 15-Jan-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs/generator: Take an actual shader stage rather than a string

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
97685ff10e0f866d809fc1e8f115fb6e92ce717c 29-Dec-2015 Marta Lofstedt <marta.lofstedt@intel.com> i965/gen8: Always use BRW_REGISTER_TYPE_UW for MUL on GEN8+

The imulExtended tests in the shader bitfield tests of the
OpenGL ES 3.1 CTS fail on gen8+ when BRW_REGISTER_TYPE_W
is used for SHADER_OPCODE_MULH.

Also, remove unused helper function:
static inline bool type_is_signed(unsigned type)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cddfc2cefa93b884c40329dcb193fe4fb22143ab 10-Dec-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Add support for gl_DrawIDARB and enable extension

We have to break open a new vec4 for gl_DrawIDARB. We've used up all
space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its
own separate vertex buffer anyway. This is because we point the vb for
base vertex and base instance into the draw parameter BO for indirect
draw calls, but the draw id is generated by mesa in a different buffer.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
17ebb55a14b5a9aa639845fbda9330ef9421834a 10-Dec-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB

We already have gl_BaseVertexARB in the .x component of the SGVS vec4
and plug gl_BaseInstanceARB into the last free component (.y).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a5038427c3624e559f954124d77304f9ae9b884c 10-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add tessellation evaluation shaders

The TES is essentially a post-tessellator VS, which has access to the
entire TCS output patch, and a special gl_TessCoord input. Otherwise,
they're very straightforward.

This patch implements SIMD8 tessellation evaluation shaders for Gen8+.
The tessellator can generate a lot of geometry, so operating in SIMD8
mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only
2 vertices per thread). I have another patch which implements SIMD4x2
mode for older hardware (or via an environment variable override).

We currently handle all inputs via the pull model.

v2: Improve comments (suggested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c51f133197437d01696abd9513fbcda4b16b897c 11-Dec-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move brw_cs_fill_local_id_payload() to libi965_compiler

This is a helper function for setting up the local invocation ID
payload according to the cs_prog_data generated by the compiler. It's
intended to be available to users of libi965_compiler so move it there.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
18069dce4a4c3d71e6afc6b10bfa7bee0560ba9c 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Make uniform offsets be in terms of bytes

This commit makes uniform offsets be in terms of bytes starting with
nir_lower_io. They get converted to be in terms of vec4s or floats when we
cram them in the UNIFORM register file but reladdr remains in terms of
bytes all the way down to the point where we lower it to a pull constant
load.
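
The unit conversions involved are simple; a sketch assuming 4-byte floats and
16-byte vec4s is given here. The helper names are hypothetical and do not
correspond to functions in the driver.

```cpp
#include <cassert>

// Hypothetical helpers: convert a uniform byte offset into the units
// mentioned above, assuming 4-byte floats and 16-byte vec4s.
unsigned byte_offset_to_float_index(unsigned byte_offset)
{
    assert(byte_offset % 4 == 0);   // must be float-aligned
    return byte_offset / 4;
}

unsigned byte_offset_to_vec4_index(unsigned byte_offset)
{
    assert(byte_offset % 16 == 0);  // must be vec4-aligned
    return byte_offset / 16;
}
```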

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
13ad8d03f201a4d09bf7ab9078b00807d61dfada 01-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use a stride of 1 and byte offsets for UBOs

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
83dedb6354d0e9b04e8ccad77e86bdb7bad44bdd 20-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add src/dst interference for certain instructions with hazards.

When working on tessellation shaders, I created some vec4 virtual
opcodes for creating message headers through a sequence like:

mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted };
mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all };
mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted };
mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all };

This is done in the generator since the vec4 backend can't handle align1
regioning. From the visitor's point of view, this is a single opcode:

hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD

Normally, there's no hazard between sources and destinations - an
instruction (naturally) reads its sources, then writes the result to the
destination. However, when the virtual instruction generates multiple
hardware instructions, we can get into trouble.

In the above example, if the register allocator assigned vgrf7 and vgrf8
to the same hardware register, then we'd clobber the source with 0 in
the first instruction, and read back the wrong value in the last one.
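
The clobbering can be reproduced with a toy model of the four MOVs above.
This is only a sketch: registers are modelled as arrays of eight dwords, the
register file as a vector, and set_output_urb_offsets is a hypothetical
stand-in for the generated instruction sequence, not the generator code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Reg = std::array<uint32_t, 8>; // one register modelled as 8 dwords

// Hypothetical model of the four MOVs above. `dst` and `src` are indices
// into the register file; register 0 plays the role of g0.
void set_output_urb_offsets(std::vector<Reg> &grf, unsigned dst, unsigned src)
{
    assert(!grf.empty() && dst < grf.size() && src < grf.size());
    grf[dst].fill(0);            // mov(8) dst<1>UD   0x00000000UD
    grf[dst][5] = 0x00000100;    // mov(1) dst.5<1>UD 0x00000100UD
    grf[dst][0] = grf[0][0];     // mov(1) dst<1>UD   g0<0,1,0>UD
    grf[dst][3] = grf[src][0];   // mov(1) dst.3<1>UD src<0,1,0>UD
}
```

When dst and src are distinct, dword 3 of the destination receives the
source's first dword. When they name the same register, the source has
already been overwritten by the time the last MOV reads it.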

It occurred to me that this is exactly the same problem we have with
SIMD16 instructions that use W/UW or B/UB types with 0 stride. The
hardware implicitly decodes them as two SIMD8 instructions, and with
the overlapping regions, the first would clobber the second.

Previously, we handled that by incrementing the live range end IP by 1,
which works, but is excessive: the next instruction doesn't actually
care about that. It might also be the end of control flow. This might
keep values alive too long. What we really want is to say "my source
and destinations interfere".

This patch creates new infrastructure for doing just that, and teaches
the register allocator to add interference when there's a hazard. For
my vec4 case, we can determine this by switching on opcodes. For the
SIMD16 case, we just move the existing code there.

I audited our existing virtual opcodes that generate multiple
instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
treatment as well, but no others.

v2: Rebased by mattst88.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3e9003e9cf55265ab1fb6522dc5cbb2f455ea1f9 20-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix fragment shader struct inputs.

Apparently we have literally no support for FS varying struct inputs.
This is somewhat surprising, given that we've had tests for that very
feature that have been passing for a long time.

Normally, varying packing splits up structures for us, so we don't see
them in the backend. However, with SSO, varying packing isn't around
to save us, and we get actual structs that we have to handle.

This patch changes fs_visitor::emit_general_interpolation() to work
recursively, properly handling nested structs/arrays/and so on.
(It's easier to read with diff -b, as indentation changes.)

When using the vec4 VS backend, this fixes rendering in an upcoming
game from Feral Interactive. (The scalar VS backend requires additional
bug fixes in the next patch.)

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f36993b46962eab4446bc1964eb47149751aee26 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Clean up #includes in the compiler.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6ba700c3c38f216987ebb9b8a1ce80ac784f2d5a 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Compile brw_cs_fill_local_id_payload() as C.

It's only called from C, it compiles as C, so just compile it as C.

Notice the missing extern "C" on the definition of the function, which
would screw things up if the prototype wasn't parsed before the
definition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ecac1aab538d65f0867fd93e23d0d020c1a5d0f1 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Push down inclusion of brw_program.h.

We were including it in headers, which then caused it to be included in
tons of places it wasn't needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2d8c5299032d229c8f6e936db5644cd53716e6c1 20-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Prevent implicit upcasts to brw_reg.

Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
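
The object slicing problem named in the "Prevent implicit upcasts to brw_reg" entry above can be shown with a minimal sketch. The two structs below are hypothetical stand-ins, not the real Mesa definitions; only the inheritance shape (a derived register type carrying extra state) is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for the hardware register description.
struct brw_reg {
   unsigned nr = 0;
};

// Hypothetical stand-in for the IR register: inherits brw_reg and adds
// state that the base type has no room for.
struct backend_reg : brw_reg {
   int reg_offset = 0;
};

// Returning (or passing) the base type by value copies only the brw_reg
// subobject. The conversion is implicit, so nothing warns that reg_offset
// has been dropped: this is object slicing.
inline brw_reg slice(const backend_reg &r)
{
   return r;
}
```

Calling slice() on a backend_reg with a non-zero reg_offset yields a brw_reg with the same nr but no trace of the offset, which is the kind of silent loss the commit guards against.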
799f924073c62c3a012c48a51895b46ad621e36c 24-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Use scope operator to ensure brw_reg is interpreted as a type.

In the next patch, I make backend_reg's inheritance from brw_reg
private, which confuses clang when it sees the type "struct brw_reg" in
the derived class constructors, thinking it is referring to the
privately inherited brw_reg:

brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg'
fs_reg::fs_reg(struct brw_reg reg) :
^
brw_shader.h:39:22: note: constrained by private inheritance here
struct backend_reg : private brw_reg
^~~~~~~~~~~~~~~
brw_reg.h:232:8: note: member is declared here
struct brw_reg {
^

Avoid this by marking brw_reg with the scope resolution operator.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
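
A minimal sketch of the name-lookup issue described in the "Use scope operator" entry above, assuming simplified hypothetical versions of the three types. Inside a class derived from backend_reg, the unqualified name brw_reg resolves to the injected class name inherited through the private base, which is inaccessible; qualifying it as ::brw_reg names the global type directly.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the real types.
struct brw_reg {
   unsigned nr;
};

struct backend_reg : private brw_reg {
   // Within backend_reg itself the private base is still accessible.
   explicit backend_reg(const ::brw_reg &r) : brw_reg(r) {}
   unsigned number() const { return nr; }
};

struct fs_reg : backend_reg {
   // Writing "const brw_reg &r" here would find the inaccessible injected
   // class name via backend_reg and be rejected. The leading "::" forces
   // lookup at global scope instead.
   explicit fs_reg(const ::brw_reg &r) : backend_reg(r) {}
};
```

Outside the class hierarchy, plain brw_reg still works; the qualification is only needed where the privately inherited name would otherwise be found first.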
309a44d63c75a7d688157486b094e555f49c907d 22-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Add and use backend_reg::equals().

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6c8ba59cff14a1a86273f4008ff2a8e68335ab25 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use nir_lower_tex for texture coordinate lowering

Previously, we had a rescale_texcoords helper in the FS backend for
handling rescaling of texture coordinates. Now that we can do variants in
NIR, we can use nir_lower_tex to do the rescaling for us. This allows us
to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and
GL_CLAMP handling in vertex and geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ce767bbdfff7c2a7829b652c111a11eb9ddba026 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Move postprocess_nir to codegen time

This allows us to insert NIR passes between initial NIR compilation and
optimization (link time) and actual backend code-gen. In particular, it
will allow us to do shader variants in NIR and share some of that shader
variant code between backends.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
718b9f52dd9ba780decf5bb59f5100cf590393a0 05-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: print non-1 strides when dumping instructions

v2:
- Simplify code (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3ccc41ecfc5e9345a1c291748d8840984f7413ae 02-Nov-2015 Matt Turner <mattst88@gmail.com> i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().

Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.

   text    data     bss      dec     hex  filename
5204535  214112   27784  5446431  531b1f  i965_dri.so before
5193977  214112   27784  5435873  52f1e1  i965_dri.so after

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fc19a0d2e422ea8e45bc5440a91f858f5f345884 08-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Allow indirect GS input indexing in the scalar backend.

This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.

All indirects are handled via the pull model. We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data. Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly. We'd need to be careful.

v2: Use updated MOV_INDIRECT opcode.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c531d409274328c9713221f33f1d24e0f4877451 17-Nov-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965: Add assertion for src_stencil payload size

This helps address a coverity warning and prevents future questions about this
code.

Reported-by: Coverity (via Ilia)
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d2f089ba17c6b17823fc3d244e15c0a18108d5ce 08-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Introduce a MOV_INDIRECT opcode.

The geometry and tessellation control shader stages both read from
multiple URB entries (one per vertex). The thread payload contains
several URB handles which reference these separate memory segments.

In GLSL, these inputs are represented as per-vertex arrays; the
outermost array index selects which vertex's inputs to read. This
array index does not necessarily need to be constant.

To handle that, we need to use indirect addressing on GRFs to select
which of the thread payload registers has the appropriate URB handle.
(This is before we can even think about applying the pull model!)

This patch introduces a new opcode which performs a MOV from a
source using VxH indirect addressing (which allows each of the 8
SIMD channels to select distinct data.)

Based on a patch by Jason Ekstrand.

v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
a bit more generic. Use regs_read() instead of hacking up the
register allocator. (Suggested by Jason Ekstrand.)

v3: Fix regs_read() to be more accurate for small unaligned regions.
Also rebase on Matt's work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3]
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [v1]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5480bbd90ea288877b6e56d4860feb8f97bcba80 07-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode.

We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index. We want to use
them for any non-constant index (even if uniform), as it lives in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f88c175a29bb287d41ef90343eb6670525475a06 12-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Make convert_attr_sources_to_hw_regs handle stride == 0.

This makes expressions like component(fs_reg(ATTR, n), 7) get a proper
<0,1,0> region instead of the invalid <0,8,0>.

Nobody uses this today, but I plan to.

v2: Rebase on Matt's changes; simplify.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
49b3215d7076db8b9afe8998b01ef250795b5892 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Combine register file field.

The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b3315a6f56fb93f2884168cbf9358b2606641db5 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Replace HW_REG with ARF/FIXED_GRF.

HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.

Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.

After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4b0fbebf024e564c195f3ce94e1ce43a3d6442ea 02-Nov-2015 Matt Turner <mattst88@gmail.com> i965/fs: Set stride correctly for immediates in fs_reg(brw_reg).

The fs_reg() constructors for immediates set stride to 0, except for
vector-immediates, which set stride to 1. This patch makes the fs_reg
constructor that takes a brw_reg do likewise, so that stride is set
correctly for cases such as fs_reg(brw_imm_v(...)).

The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).

In the next commit, calling this constructor with brw_imm_* will generate
an IMM file register rather than a HW_REG, making this change necessary
to avoid breakage with existing uses of brw_imm_v().

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b163aa01487ab5f9b22c48b7badc5d65999c4985 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Rename GRF to VGRF.

The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7638e75cf99263c1ee8e31c6cc5a319feec2c943 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Use brw_reg's nr field to store register number.

In addition to combining another field, we get to replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3048053908310eaf082058e5be34ae902e1fc02c 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Unwrap some lines.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
94b1031703b1b5759436fe215323727cffce5f86 25-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Remove fixed_hw_reg field from backend_reg.

Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1392e45bfb396ccbfa5bb0c6063522e0550988d3 24-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Use immediate storage in inherited brw_reg.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e42fb0c2a687cdcd6af2a590f6f5e24f64cfff3b 23-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.

Generated by

sed -i -e 's/\.bits\././g' *.c *.h *.cpp
sed -i -e 's/dw1\.//g' *.c *.h *.cpp

and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.

There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.

This is C11 (and gcc has apparently supported it for some time for
"compatibility with other compilers").

See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e42a29531ae3d5dedb72011da2947357dfa8715b 10-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Print force_writemask_all in dump_instructions().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8dcf807cb43383590ba193c7ff20b8a98e4a9f65 14-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix scalar VS float[] and vec2[] output arrays.

The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken). Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4. This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them. Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.

Normally, varying packing avoids this problem by turning varyings into
vec4s. So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
by Connor Abbott), and rename it to type_size_vec4_times_4()
(renaming suggested by Jason Ekstrand). Use type_size_vec4
rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
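
The slot arithmetic in the "Fix scalar VS float[] and vec2[] output arrays" entry above can be worked through with a small sketch. The simple_type struct and the helper bodies are hypothetical simplifications (arrays of vectors with at most four components); only the function names and the float[2] / vec2[2] figures come from the commit message.

```cpp
#include <cassert>

// Hypothetical miniature type: `length` array elements, each a vector of
// `components` floats. A non-array value has length == 1.
struct simple_type {
   unsigned components;
   unsigned length;
};

inline unsigned align_up(unsigned v, unsigned a)
{
   return (v + a - 1) / a * a;
}

// Tightly packed size in scalar components.
inline unsigned type_size_scalar(simple_type t)
{
   return t.components * t.length;
}

// Size in vec4 slots: each array element occupies its own slot.
inline unsigned type_size_vec4(simple_type t)
{
   assert(t.components >= 1 && t.components <= 4);
   return t.length;
}

// Slot-padded size, still counted in scalar components.
inline unsigned type_size_vec4_times_4(simple_type t)
{
   return type_size_vec4(t) * 4;
}

// The slot count the old loop bound produced.
inline unsigned old_slot_count(simple_type t)
{
   return align_up(type_size_scalar(t), 4) / 4;
}
```

For float[2] the old bound gives ALIGN(2, 4) / 4 = 1 slot and for vec2[2] it gives ALIGN(4, 4) / 4 = 1 slot, while the vec4 count is 2 in both cases, so the second element's slot was never visited.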
d7013988fb1d1c277e1fbce8623abddc43f78e05 30-Oct-2015 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
027b64a55afc0fe8efcf9f6217192807e285c830 30-Oct-2015 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const and remove useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a6804654283a9d03bee92d61eee5b1d036c8db68 09-Sep-2015 Neil Roberts <neil@linux.intel.com> i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA

In order to accommodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e386fb0dee40d0f2342b43b6750b64c8174463a9 08-Sep-2015 Neil Roberts <neil@linux.intel.com> i965/fs/skl+: Use ld2dms_w instead of ld2dms

In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.

v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
36fd65381756ed1b8f774f7fcdd555941a3d39e1 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add scalar geometry shader support.

This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support
instanced geometry shaders, and Orbital Explorer's shader spills like
crazy. But the infrastructure is in place, and it's largely working.

v2: Lots of rebasing.

v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid
tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7c81a6a647257c309cb1ca36c60aa4bfa8e2e022 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Replace default case with list of enum values.

If we add a new file type, we'd like to get warnings if it's not
handled.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
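
A short sketch of the pattern behind the "Replace default case with list of enum values" entry above. The enum and the helper are hypothetical; the value names are borrowed from other entries in this log, and the real enum at the time of this commit may have differed. With no default label, compilers that implement a switch-coverage warning (for example -Wswitch in gcc and clang) can report any enumerator that is left unhandled.

```cpp
#include <cassert>

// Hypothetical register-file enum; names borrowed from other log entries.
enum reg_file { ARF, FIXED_GRF, MRF, IMM, VGRF, ATTR, UNIFORM };

// Returns true for files that correspond to hardware encodings.
inline bool is_hardware_file(reg_file file)
{
   switch (file) {
   case ARF:
   case FIXED_GRF:
   case MRF:
   case IMM:
      return true;
   case VGRF:
   case ATTR:
   case UNIFORM:
      return false;
   }
   // No default label above: adding a new enumerator without handling it
   // lets the compiler warn. This line is only reached for invalid values.
   assert(!"invalid register file");
   return false;
}
```

A default label would silence that warning, which is why listing every value is preferred when new file types are expected.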
6a15517242214c739bfdd8b6a480ecca81e776d6 09-Oct-2015 Emil Velikov <emil.velikov@collabora.com> i965/fs: move the fs_reg::smear() from get_timestamp() to the callers

We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
486268bdb03a36faf09d84e0458ff49dd1325c40 06-Jun-2015 Connor Abbott <cwabbott0@gmail.com> i965: always run the post-RA scheduler

Before, we would only do scheduling after register allocation if we
spilled, despite the fact that the pre-RA scheduler was only supposed to
be for register pressure and set the latencies of every instruction to
1. This meant that unless we spilled, which we rarely do, then we never
considered instruction latencies at all, and we usually never bothered
to try and hide texture fetch latency. Although a later commit removes
the part that sets the latency to 1, we still want to always run the
post-RA scheduler since it's able to take the false dependencies that
the register allocator creates into account, and it can be more
aggressive than the pre-RA scheduler since it doesn't have to worry
about register pressure at all.

Test                 master    post-ra-sched   diff     %diff
bench_OglPSBump2     396.730   402.386          5.656   +1.400%
bench_OglPSBump8     244.370   247.591          3.221   +1.300%
bench_OglPSPhong     241.117   242.002          0.885   +0.300%
bench_OglPSPom        59.555    59.725          0.170   +0.200%
bench_OglShMapPcf     86.149   102.346         16.197   +18.800%
bench_OglVSTangent   388.849   395.489          6.640   +1.700%
bench_trex            65.471    65.862          0.390   +0.500%
bench_trexoff         69.562    70.150          0.588   +0.800%
bench_heaven          25.179    25.254          0.074   +0.200%

Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8c4151b866181198cb850137a6b65052e79554b1 29-Oct-2015 Matt Turner <mattst88@gmail.com> i965/fs: Use group(4, 0) to emit an exec-size 4 MOV.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8c902a580a490181e7cde29073b11181db4614f8 17-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Implement ARB_fragment_layer_viewport.

Normally, we could read gl_Layer from bits 26:16 of R0.0. However, the
specification requires that bogus out-of-range 32-bit values written by
previous stages need to appear in the fragment shader as-written.

Instead, we pass in the full 32-bit value from the VUE header as an
extra flat-shaded varying. We have the SF override the value to 0
when the previous stage didn't actually write a value (it's actually
defined to return 0).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
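
To illustrate why the "Implement ARB_fragment_layer_viewport" entry above does not simply read R0.0, here is a minimal sketch of extracting bits 26:16 from a 32-bit word. The function name is hypothetical and this models only the bit arithmetic, not the actual payload handling in the driver.

```cpp
#include <cassert>
#include <cstdint>

// Extract bits 26:16 (an 11-bit field) from a 32-bit payload word.
inline uint32_t layer_from_r0(uint32_t r0_0)
{
   return (r0_0 >> 16) & 0x7ffu;
}
```

An 11-bit field cannot hold an arbitrary out-of-range 32-bit value written by an earlier stage, which is consistent with the commit passing the full value as an extra flat-shaded varying instead.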
b94cdcdada251bb8e866cb7af0f2ff222b55a918 26-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Properly check for PAD in fragment shaders with > 16 varyings.

Commit 268008f98c3810b9f276df985dc93efc0c49f33e changed unused VUE map
slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT. I missed
updating this. It also means that commit message was wrong, as some
code *did* rely on slots being initialized to COUNT.

This may fix a bug with SSO programs with > 16 FS input varyings.
I think we probably just emitted extra pointless code, but probably
didn't break anything. We might also just have no tests for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
aedc0aab19c233cc084211959ef2b6be1c500bb7 21-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/fs: Use unsigned immediate 0 when eliminating SHADER_OPCODE_FIND_LIVE_CHANNEL

The destination for SHADER_OPCODE_FIND_LIVE_CHANNEL is always a UD
register. When we replace the opcode with a MOV, make sure we use a UD
immediate 0 so copy propagation doesn't bail because of non-matching
types.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
de5a450bd360d24db65cbba5b6633f800fda0d2e 17-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Don't use message headers for untyped reads

We always set the mask to 0xffff, which is what it defaults to when no
header is present. Let's drop the header instead.

v2: Only remove header for untyped reads. Typed reads always need the
header.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e2344e11ce8ddefb89a222bbf63a7c60e8ba5655 21-Oct-2015 Matt Turner <mattst88@gmail.com> i965/fs: Trim unneeded channels in SampleID setup.

The AND and SHR produce a scalar value that we had been replicating
across $dispatch_width channels. The immediate MOV produces only four
useful channels of data.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e10fc055e7dc5281f03a77088a24392098e3473b 21-Oct-2015 Matt Turner <mattst88@gmail.com> i965/fs: Use type-W for immediate in SampleID setup.

Not a functional difference, but register is loaded with a signed
immediate (V) and added to a signed type (D) producing a signed result
(D).

Also change the type of g0 to allow for compaction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1db44252d01bf7539452ccc2b5210c74b8dcd573 20-Oct-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965: Implement ARB_shader_stencil_export (gen9+)

v2: remove useless source_stencil_to_render_target (Ken)
Squash in the actual packing function, which also got to
v2:
Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt)
Reorder the regioning to be in VWH order (Matt)
Don't retype src in the backend, just assert instead (Matt)
Rename the debug prints to something better (Matt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5fa7114652068735347c8715d1fc1d2cef72c433 20-Oct-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965/fs: Enumerate logical fb writes arguments

Gen9 adds the ability to write out a stencil value, so we need to expand the
virtual payload by one. Abstracting this now makes that change easier to read.

I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.

v2:
Remove explicit numbering from the enumeration (Matt).
Use a real naming scheme, and reference it in the opcode definition (Curro)
Add a missed hardcoded logical position in get_lowered_simd_width (Ben)
Add an assertion to make sure the component numbering is correct (Ben)

Cc: Matt Turner <mattst88@gmail.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ac98888afdc121e6eaafc9c5393647a2df4baef6 29-Sep-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode.

In scalar mode, geometry shader inputs can easily take up hundreds of
registers. This makes pushing VUE entries impractical; we'll need to
resort to the pull model in some cases.

To support this, we introduce a new opcode corresponding to the "URB
Read SIMD8" message.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bea75227829512ab0e4766e00ac1b509c7586667 06-May-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes.

In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
There are many kinds of flags for SIMD4x2 messages.

However, there are really only two (per-slot offset, use channel masks)
for SIMD8 messages. Rather than adding a boolean flag for per-slot
offsets (polluting all instructions), I decided to just make three new
opcodes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ee77796a5c97105bf7e92e3a7931ee0f331a0545 20-Oct-2015 Neil Roberts <neil@linux.intel.com> i965/fs: Disable opt_sampler_eot for more message types

In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
message types because I found by experimentation that it doesn't work.
I wrote in the comment that I couldn't find any documentation for this
problem. However I've now found the documentation and it has
additional restrictions on further message types so this patch updates
the comment and adds the others.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
801f151917fedb13c5c6e96281a18d833dd6901f 20-Oct-2015 Neil Roberts <neil@linux.intel.com> i965: Remove block arg from foreach_inst_in_block_*_starting_from

Since 49374fab5d793 these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9e17c36b8ba79e688011a5fd293ad5f42da21b66 14-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Extract can_change_source_types() functions.

Make them members of fs_inst/vec4_instruction for use elsewhere.

Also fix the fs version to check that dst.type == src[1].type and for
!saturate.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4467344c829f1dccdf74e27bef2c5fda72552be6 09-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Rename brw_foo_emit to brw_compile_foo

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
67db9072b9fde74277f74f7303366b8bdd3a711e 09-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Move some of the prog_data setup into brw_wm_emit

This commit moves the common/modern stuff. Some legacy stuff such as
setting use_alt_mode was left because it needs to know whether or not we're
an ARB program.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4e711872d024ce41c8b07b1150d8a393de21e26d 09-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/cs: Rework cs_emit to take a nir_shader and a brw_compiler

This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
22ad44910e993e1acd0b4052722fe786626008b5 06-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler

This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5e86f5b3d21fe8e96676bb0608990d72dbf61b85 06-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove the gl_program from the generator

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b9b40ef9b7644ea24768bc8b7464b1719efe99bf 10-Oct-2015 Rob Clark <robclark@freedesktop.org> nir: remove dependency on glsl

Move glsl_types into NIR, now that the dependency on glsl_symbol_table
has been split out.

Possibly makes sense to rename things at this point, but if we do that
I'd like to keep it split out into a separate patch to make git history
easier to follow (IMHO).

v2: fix android build
v3: I f***ing hate scons.. but at least it builds

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
176e6930e6c24dfce7cc730faa2612d27689a4df 18-Jul-2015 Timothy Arceri <t_arceri@yahoo.com.au> i965: add arrays of arrays support for varyings

V2: get the correct vector elements value for outputs

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bd198b9f0a292a9ff4ffffec3a29bad23d62caba 15-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Simplify fs_visitor's ATTR file.

Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of
compilation, assign_vs_urb_setup() translated those into GRF units,
and converted ATTR to HW_REGs.

This patch moves the translation earlier, making ATTR work in terms of
GRF units from the beginning. assign_vs_urb_setup() simply has to add
the number of payload registers and push constants to obtain the final
hardware GRF number. (We can't do this earlier as those values aren't
known.)

ATTR still supports reg_offset; however, it's simply added to reg.
It's not clear whether this is valuable or not.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
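A minimal sketch of the arithmetic described in the ATTR commit above: once ATTR registers are expressed in GRF units, the final hardware register is just a sum. The struct and function names are hypothetical stand-ins, not Mesa's fs_reg or assign_vs_urb_setup().

```cpp
// Hypothetical stand-in for the two fields this sketch needs; not Mesa's fs_reg.
struct AttrReg {
    unsigned reg;         // attribute position, already in GRF units
    unsigned reg_offset;  // extra offset, simply added to reg
};

// Final hardware GRF number, computable only once the payload and
// push constant register counts are known.
constexpr unsigned attr_to_hw_grf(AttrReg attr,
                                  unsigned payload_regs,
                                  unsigned push_constant_regs)
{
    return payload_regs + push_constant_regs + attr.reg + attr.reg_offset;
}
```

With 1 payload register and 2 push constant registers, an ATTR at reg 4 with reg_offset 1 would land in GRF 8 under these assumptions.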
031d3501322aee0a1474c7f2a9b79f9fa9947430 26-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Unify URB entry size/read length calculations between backends.

Both the vec4 and scalar VS backends had virtually identical URB entry
size and read length calculations. We can move those up a level to
backend-agnostic code and reuse it for both.

Unfortunately, the backends need to know nr_attributes to compute
first_non_payload_grf, so I had to store that in prog_data. We could
use urb_read_length, but that's nr_attributes rounded up to a multiple
of two, so doing so would waste a register in some cases.

There's more code to be removed in the vec4 backend, but that will
come in a follow-on patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
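The register waste mentioned in the commit above can be illustrated with a small sketch. It takes the message at face value (urb_read_length behaves like nr_attributes rounded up to a multiple of two); the helper names are hypothetical and units are simplified to one register per attribute.

```cpp
// Round n up to the next multiple of two.
constexpr unsigned round_up_to_even(unsigned n)
{
    return (n + 1u) & ~1u;
}

// Registers that would be wasted if the rounded value were used in place
// of the exact attribute count when computing first_non_payload_grf.
constexpr unsigned wasted_regs(unsigned nr_attributes)
{
    return round_up_to_even(nr_attributes) - nr_attributes;
}

static_assert(wasted_regs(3) == 1, "odd counts waste one register");
static_assert(wasted_regs(4) == 0, "even counts waste nothing");
```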
9a2573e5fc63f48cde56efdb191c129e7d7fb7b1 07-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/cs: Get max_cs_threads from brw_compiler devinfo

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ee0f0108c8e87b9cfec25bade66670bbc4254139 07-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move brw_get_shader_time_index() call out of emit functions

brw_get_shader_time_index() is all tangled up in brw_context state and
we can't call it from the compiler. Thanks to Jason's recent
refactoring, we can just get the index and pass it to the emit functions
instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
469d0e449b78ad68e199dbe60e900487255a5d5d 06-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/cs: Split out helper for building local id payload

The initial motivation for this patch was to avoid calling
brw_cs_prog_local_id_payload_dwords() in gen7_cs_state.c from the
compiler. This commit ends up refactoring things a bit more so as to
split out the logic to build the local id payload to brw_fs.cpp. This
moves the payload building closer to the compiler code that uses the
payload layout and makes it available to other users of the compiler.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ba71d581aeb96c4626500eb5b19f3bef2f40d586 05-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move brw_dump_ir() out of brw_*_emit() functions

We move these calls one level up into the codegen functions.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3141906fa36839e9276cb65033857c85b39376e5 22-Sep-2015 Iago Toral Quiroga <itoral@igalia.com> i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in one place

That should make tracking where we do spills and pull loads a bit easier.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
36e82b137d4a77f24de0fc722c80e445b6e3375c 22-Sep-2015 Iago Toral Quiroga <itoral@igalia.com> i965: make pull constant loads in gen6 start at MRFs 16/17

So they do not conflict with our (un)spills (MRF 21..23) or our
URB writes (MRF 1..15)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
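The point of the gen6 MRF assignment in the commit above is that the three uses never collide. A small sketch of that check follows; reading "16/17" as the inclusive range 16..17 is an assumption, and the type and constant names are hypothetical.

```cpp
// Inclusive range of MRF register numbers.
struct MrfRange {
    unsigned first;
    unsigned last;
};

// Two inclusive ranges overlap if each starts no later than the other ends.
constexpr bool overlaps(MrfRange a, MrfRange b)
{
    return a.first <= b.last && b.first <= a.last;
}

// Ranges quoted in the commit message (gen6).
constexpr MrfRange urb_writes{1, 15};
constexpr MrfRange pull_loads{16, 17};
constexpr MrfRange spills{21, 23};

static_assert(!overlaps(urb_writes, pull_loads), "pull loads clear of URB writes");
static_assert(!overlaps(pull_loads, spills), "pull loads clear of spills");
static_assert(!overlaps(urb_writes, spills), "spills clear of URB writes");
```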
0c2add775192f3ee0325d61964ef67f7ca3f6d4e 22-Sep-2015 Iago Toral Quiroga <itoral@igalia.com> i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5a360dcad1fdb91f9129cb21775b9af60cbf57e4 03-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Generalize predicated break pass for use in vec4 backend.

instructions in affected programs: 44204 -> 43762 (-1.00%)
helped: 221

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bf7b6fd3fd6d98305d64ee6224ca9f9e7ba48444 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/shader: Get rid of the shader, prog, and shader_prog fields

Unfortunately, we can't get rid of them entirely. The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still
needs gl_shader_program for handling transform feedback. However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
404419ee1a57c79982d93eefe4de099d61ad2eee 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs,vec4: Get rid of the sanity_param_count

It doesn't exist for anything other than an assert that, as far as I can
tell, isn't possible to trip. Soon, we will remove prog from the visitor
entirely and this will become even more impossible to hit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
756613ed35d6fd2216b5138731c0c38886b8e14a 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the nir info instead of pulling things out of [shader_]prog

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7b974c5f902b3f652776471aa35306195247a8a7 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/cs: Remove the prog argument from local_id_payload_dwords

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ea006c4cb5eb2d98d6bfd5a6c32fcae10b636f17 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Move binding table setup to codegen time.

Setting up binding tables really has little to do with the actual process
of turning shaders into instructions; it's more part of setting up
prog_data. This commit moves it out of the visitors and with the rest of
the prog_data setup stuff.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
28709e37d96d6b64753ca4dcce5fbfeb75f5b499 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/shader: Pull assign_common_binding_table_offsets out of backend_shader

This really has nothing to do with the backend compiler and we'd like to
eventually be able to set this up earlier in the compile process.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3de81508ea513bf01f2c996c25a2cfdb5b3231d0 30-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/shader: Get rid of the setup_vec4_uniform_value helper

It's not used by anything anymore

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
604ce8253ae796ecf9763f1612e2fff25591cb07 26-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Print reg and reg_offset separately for ATTR files.

Reading this output was really confusing. reg represents attribute
slots; reg_offset is the x/y/z/w component (0..3) within a vec4 slot.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d1be9d21265cf4e344a5d78b17cea7ee2c8408a1 24-Sep-2015 Jordan Justen <jordan.l.justen@intel.com> i965/cs: Add a binding table entry for gl_NumWorkGroups

If glDispatchComputeIndirect is used, then the value for this variable
must be read from the indirect BO.

To allow the same generated code to support indirect and
glDispatchCompute, we will also set up a BO for the number of work
groups using the intel_upload_data mechanism. This will only be
required if the gl_NumWorkGroups variable is accessed.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
99df02ca26f6127c8fa24d38a8a069ac6159356a 10-Sep-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Don't re-layout varyings for separate shader programs.

Previously, our VUE map code always assigned slots to varyings
sequentially, in one contiguous block.

This was a bad fit for separate shaders - the GS input layout depended
on the VS output layout, so if we swapped out vertex shaders, we might
have to recompile the GS on the fly - which rather defeats the point of
using separate shader objects. (Tessellation would suffer from this
as well - we could have to recompile the HS, DS, and GS.)

Instead, this patch makes the VUE map for separate shaders use a fixed
layout, based on the input/output variable's location field. (This is
either specified by layout(location = ...) or assigned by the linker.)
Corresponding inputs/outputs will match up by location; if there's a
mismatch, we're allowed to have undefined behavior.

This may be less efficient - depending what locations were chosen, we
may have empty padding slots in the VUE. But applications presumably
use small consecutive integers for locations, so it hopefully won't be
much worse in practice.

3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions.
This seems like a small price to pay for avoiding recompiles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
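The trade-off described in the separate-shader commit above can be sketched as two slot assignment strategies. This is a simplified model, not Mesa's VUE map code: locations are assumed to be small non-negative integers, built-in slots are ignored, and all names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

constexpr int kPadding = -1;  // marks an empty slot

// Contiguous layout: slot i holds the i-th used location. The slot of a
// given location depends on which other varyings happen to exist.
std::vector<int> packed_layout(std::vector<int> locations)
{
    std::sort(locations.begin(), locations.end());
    return locations;
}

// Fixed layout: slot index equals the location, so a varying's slot does
// not depend on the other stage. Unused locations become padding.
std::vector<int> fixed_layout(const std::vector<int>& locations)
{
    if (locations.empty())
        return {};
    const int max_loc = *std::max_element(locations.begin(), locations.end());
    std::vector<int> slots(max_loc + 1, kPadding);
    for (int loc : locations)
        slots[loc] = loc;
    return slots;
}
```

For locations {0, 3}, the packed layout needs two slots while the fixed layout needs four, two of them padding. That is the cost accepted in exchange for avoiding recompiles.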
b23eb643ebab9ef250ce026a7e2f651de9be10f6 13-Apr-2015 Samuel Iglesias Gonsalvez <siglesias@igalia.com> i965/fs: Implement FS_OPCODE_GET_BUFFER_SIZE

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2ea16966ae66d4dd5c5dcb996d7996d9c734bbee 24-Sep-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Respect stride and subreg_offset for ATTR registers

When we assign hw regs to attributes, we don't incorporate the stride
and subreg_offset from the fs_reg. They are rarely used, but the integer
multiplication lowering uses an unusual stride and subreg_offset
combination that breaks when one source is an attribute.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f50645d05c6dffa6463856ded0b8461ac9d24535 15-Sep-2015 Iago Toral Quiroga <itoral@igalia.com> i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation

There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469
https://bugs.freedesktop.org/show_bug.cgi?id=90631

Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).

Notice that the hardware docs are not clear about this fact:

SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"

However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:

"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume that MRF addresses above m15 wrap to legal MRF registers."

Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests were helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:

Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461

Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461

This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
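A sketch of what a generation-aware MRF limit might look like, based only on the numbers in the commit above (24 registers on gen6, 16 otherwise). The function names are hypothetical, and placing spills in the top three MRFs is an inference from the two ranges quoted (13..15 and 21..23), not a statement of the actual macro definitions.

```cpp
// Number of MRF registers assumed available for a hardware generation.
constexpr unsigned max_mrf(unsigned gen)
{
    return gen == 6 ? 24u : 16u;
}

// First of the three MRFs used for spilling, taken from the top of the file.
constexpr unsigned first_spill_mrf(unsigned gen)
{
    return max_mrf(gen) - 3u;
}

static_assert(first_spill_mrf(6) == 21, "gen6 spills use MRFs 21..23");
static_assert(first_spill_mrf(5) == 13, "16-register parts spill to 13..15");
```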
47e18a595731c054ac254e26066e6dea804f34e8 15-Sep-2015 Jordan Justen <jordan.l.justen@intel.com> i965/fs: The barrier send uses only 1 payload register

When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.

fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.

These issues were flagged by:

commit cadd7dd384b33a779d46bd664f456bed4a21a5b7
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Thu Jul 2 15:41:02 2015 -0700

i965/fs: Add a very basic validation pass

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cadd7dd384b33a779d46bd664f456bed4a21a5b7 03-Jul-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add a very basic validation pass

Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's. We can add more as
we find it to be useful.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
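The kind of consistency check the validation commit above describes can be sketched as a bounds test: an access must stay inside the VGRF it names. The Access struct and function name are hypothetical, not the interface of the actual pass.

```cpp
#include <vector>

// One register access by an instruction (a source read or a destination write).
struct Access {
    unsigned vgrf;        // index of the virtual GRF
    unsigned reg_offset;  // starting register within that VGRF
    unsigned regs;        // registers read or written
};

// True if the access lies entirely within the allocated size of its VGRF.
bool access_in_bounds(const std::vector<unsigned>& vgrf_sizes, const Access& a)
{
    return a.vgrf < vgrf_sizes.size() &&
           a.reg_offset + a.regs <= vgrf_sizes[a.vgrf];
}
```

A barrier payload that claims to read two registers from a one-register VGRF is the sort of mismatch such a check would flag.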
a548c75e31b4146d55133cb8c57a82117c196584 05-Sep-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move perf_debug code to brw_codegen_*_prog()

We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
84f2ed2cfdab45aa949aa6affe46cfe2944759c1 05-Sep-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move brw_fs_precompile() to brw_wm.c

All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dc70c86b9b485cb5006a55cc2efd1f154dbfd469 05-Sep-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move compute shader code around

This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:

brw_vs.c, brw_wm.c, brw_gs.c:

code to drive code generation and implement precompiling and
cache search.

genX_<stage>_state.c

gen specific implementation of the state emission for the shader
stage.

The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).

To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files. Finally, move state setup
and atoms to gen7_cs_state.c.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c7161a3c3559f0450a90bb1228c74e8fdc9c939b 22-Nov-2014 Jordan Justen <jordan.l.justen@intel.com> i965/cs: Reserve local invocation id in payload regs

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
af48612b88cb51cd3b957e70490462c0c404f92c 04-Oct-2014 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Set first_non_payload_grf in assign_curb_setup

first_non_payload_grf may be updated in assign_urb_setup for FS or
assign_vs_urb_setup for VS.

We need to set this in assign_curb_setup for compute shaders since cs
does not have an assign_cs_urb_setup like assign_urb_setup (fs) or
assign_vs_urb_setup (vs).

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0b91bcea98c0fe201bba89abe1ca3aee4d04c56c 12-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> i965: add support for textureSamples function

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a2151560b8d65be31129c00872ea8d70c564b110 28-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Move brw_setup_tex_for_precompile to brw_program.[ch].

This living in brw_fs.{h,cpp} is a historical artifact of us supporting
texturing for fragment shaders before any other stages. It's kind of
awkward given that we use it for all stages.

This avoids having to include brw_fs.h in geometry shader code in order
to access this function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9390cb84593bda516e8c1521c87a08475574d1be 02-Sep-2015 Matt Turner <mattst88@gmail.com> i965/fs: Handle MRF destinations in lower_integer_multiplication().

The lowered code reads from the destination, which isn't possible from
message registers.

Fixes the following dEQP tests on SNB:

dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8765f1d7ddfb00dc5b202e4e679ebe640a547d50 18-Aug-2015 Matt Turner <mattst88@gmail.com> i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM.

Noticed when debugging things that lead to the next patch.

On G45 (and presumably ILK) this helps register coalescing:

total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs: 43751 -> 43718 (-0.08%)
helped: 52
HURT: 2

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
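The idea behind the equals() commit above is that a field should only take part in comparison when it is meaningful for the register file. A much simplified sketch follows; the Reg struct is hypothetical, and fixed_hw_reg is reduced to a plain integer for illustration.

```cpp
enum class RegFile { GRF, MRF, ATTR, HW_REG, IMM };

// Hypothetical, reduced register description.
struct Reg {
    RegFile file;
    unsigned reg;
    unsigned fixed_hw_reg;  // only meaningful for HW_REG and IMM
};

bool equals(const Reg& a, const Reg& b)
{
    if (a.file != b.file || a.reg != b.reg)
        return false;
    // Stale fixed_hw_reg contents must not make otherwise identical
    // virtual registers compare unequal.
    const bool uses_fixed = a.file == RegFile::HW_REG || a.file == RegFile::IMM;
    return !uses_fixed || a.fixed_hw_reg == b.fixed_hw_reg;
}
```

Treating two GRFs as equal despite differing leftover fixed_hw_reg bits is what would let a pass such as register coalescing recognize more matches.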
fee0c5af11dd0995de96e7053377d425a66d03a0 19-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Split VGRFs after lowering pull constants

The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.

This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit

Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f2e667172a6382f81d1f3e709f02c7ee6cfda4c7 19-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Refactor assign_constant_locations

Now that all constant locations are assigned in a single function, we can
refactor it a bit to unify things. In particular, we now handle
pull_constant_loc and push_constant_loc more similarly and we only modify
stage_prog_data->params[] in one place at the end of the function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dfacae3a56463e2df3a67e245f868e9f2be64dcd 19-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Combine assign_constant_locations and move_uniform_array_access_to_pull_constants

The comment above move_uniform_array_access_to_pull_constants was
completely bogus because it has nothing to do with lowering instructions.
Instead, it's assigning locations of pull constants.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
640c472fd075814972b1276c5b0ed3a769aacda5 12-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Move type_size() methods out of visitor classes.

I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c56899f41a904762225267cb9c543a0abd901ad5 19-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Make setup_vec4_uniform_value and _image_uniform_values take an offset

This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8d8b8f58540abbdb8a006a38830a08346a0edf34 19-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Rename setup_vector_uniform_values to setup_vec4_uniform_value

The new name more accurately represents what it does: Set up a single vec4
uniform value.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
84431c1f1d343c85f3b7fa265293a1d245ba9cf3 05-May-2015 Francisco Jerez <currojerez@riseup.net> i965: Teach type_size() about the size of an image uniform.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8a688bee83ced46eb4bff741f05d2da033c07ade 10-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make resolve_source_modifiers consistent with the vec4 version

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ee977183dcb543c919d0d70dde610cb191d5a3ea 04-Aug-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower arithmetic instructions with register regions of unsupported width.

This extends the SIMD lowering pass to enforce the hardware limitation
that no directly-addressed source may read more than 2 physical GRFs.
One can easily go over this limit when doing 64-bit arithmetic
(e.g. FP64 or extended-precision integer MULs) or SIMD32, so it's nice
to be able to just emit an instruction of the intended execution size
from the visitor and let the lowering pass deal with this restriction
transparently.

Some hardware arithmetic instructions are not handled here, including
all instructions that use the accumulator implicitly (which the SIMD
lowering pass deliberately doesn't handle), instructions with
non-per-channel sources (e.g. LINE or PLANE) and SEND-like
instructions, which need special handling most likely as virtual
opcodes.

Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
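The two-GRF source limit in the commit above can be turned into a rough width calculation. This sketch assumes a 32-byte GRF and tightly packed sources (stride 1), and simply halves the execution size until a source fits; the names are hypothetical and the real lowering pass considers more cases.

```cpp
constexpr unsigned kGrfBytes = 32;      // assumed size of one physical GRF
constexpr unsigned kMaxSourceGrfs = 2;  // limit stated in the commit message

// Largest execution size, obtained by repeated halving, at which a packed
// source of the given element size reads at most two GRFs.
constexpr unsigned lowered_exec_size(unsigned exec_size, unsigned type_bytes)
{
    return exec_size * type_bytes <= kMaxSourceGrfs * kGrfBytes
               ? exec_size
               : lowered_exec_size(exec_size / 2, type_bytes);
}

static_assert(lowered_exec_size(16, 8) == 8, "SIMD16 with 64-bit sources splits to SIMD8");
static_assert(lowered_exec_size(32, 4) == 16, "SIMD32 with 32-bit sources splits to SIMD16");
static_assert(lowered_exec_size(16, 4) == 16, "SIMD16 with 32-bit sources already fits");
```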
42a18ca76057621ae7d8812b29ea2245d6ff282d 05-Aug-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix fs_inst::regs_read() for sources in the ATTR file.

Otherwise it would crash on Gen8 with scalar VS. The issue can easily
be reproduced with the following patch, but I don't see any reason why
it wouldn't be possible to end up with an ATTR argument here even
without it.

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3b48a0eeda20f5cf2dbc8de5e36f8fe3461f41bf 06-Aug-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower the MULH virtual instruction.

Translate MULH into the MUL/MACH sequence. This does roughly the same
thing that nir_emit_alu() used to do but we can now handle 16-wide by
taking advantage of the SIMD lowering pass. The force_sechalf
workaround near the bottom is required because the SIMD lowering pass
will emit instructions with non-zero quarter control and we need to
make sure we avoid that on integer arithmetic instructions with
implicit accumulator access due to a known hardware bug on IVB.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
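For reference, the value MULH is expected to produce is the upper half of the full-width product. The sketch below shows the unsigned case on the CPU; it illustrates the semantics only, not the MUL/MACH instruction sequence, and the function name is hypothetical.

```cpp
#include <cstdint>

// High 32 bits of the 64-bit product of two unsigned 32-bit values.
constexpr std::uint32_t mulh_u32(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}

static_assert(mulh_u32(0x80000000u, 2u) == 1u, "2^31 * 2 = 2^32");
static_assert(mulh_u32(3u, 4u) == 0u, "small products have no high bits");
```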
2e731264382954beb1192cd7cc62e16e0b8e7978 05-Aug-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Indent the implementation of 32x32-bit MUL lowering by one level.

In order to make room for the code that will lower the MULH virtual
instruction. Also move the hardware generation and execution type
checks into the same branch, they are going to have to be different
for MULH.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f5b37fb1acad9cf044b7b6d4fa5f2582bd8bc7f4 05-Aug-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower 32x32 bit multiplication on BXT.

AFAIK BXT has the same annoying alignment limitation as CHV on the
source register regions of 32x32 bit MULs, give it the same treatment.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c1da15709a0c0c2775bd9e534f67c60f7dc95ce8 12-Jul-2015 Matt Turner <mattst88@gmail.com> i965: Use float calculations when double is unnecessary.

Literals without an f/F suffix are of type double, and implicit
conversion rules specify that the float in (float op double) be
converted to a double before the operation is performed. I believe float
execution was intended (in nearly all cases) or is sufficient (in the
case of gen7_urb.c).

Removes a lot of float <-> double conversion instructions and replaces
many double instructions with float instructions which are cheaper.

text data bss dec hex filename
4928659 195160 26192 5150011 4e953b i965_dri.so before
4928315 195152 26192 5149659 4e93db i965_dri.so after

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
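The conversion rule the commit above relies on can be checked at compile time. In this small sketch the function halve() is a hypothetical example; the static_asserts show that an unsuffixed literal drags the whole expression to double, while an f suffix keeps it in float.

```cpp
#include <type_traits>

// float * double literal: the float operand is converted, result is double.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "unsuffixed literal promotes the expression to double");

// float * float literal: the expression stays in single precision.
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "f suffix keeps the expression in float");

// Stays entirely in float; no float <-> double conversions are needed.
float halve(float x)
{
    return x * 0.5f;
}
```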
02425d3ec2af6945a03583cadcaa5f3f330bbc0e 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Make the default builder 64-wide before entering the optimization loop.

Not a typo. Replace the default builder with one of bogus width to
catch cases in which optimization passes assume that the default
dispatch width is good enough. The execution controls of instructions
emitted during optimization should in general match the original code
that is being manipulated. Many of the problems fixed in this series
were caught by the assertions introduced in this patch.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4529916dfd227af6c4e151f45261db22157fe45f 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Don't set exec_all on instructions wider than the original in lower_simd_width.

This could have led to somewhat increased bandwidth usage for lowered
texturing instructions on Gen4 (which is the only case in which
lower_width may be greater than inst->exec_size). After the previous
patches the invariant mentioned in the comment should no longer be
assumed by any of the other optimization and lowering passes, so the
exec_all() call shouldn't be necessary anymore.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eaba922582cfdccc7b198f9b23d8bd3c26197d03 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Initialize a builder explicitly in the gen4 send dependency work-arounds.

Instead of relying on the default one. This shouldn't lead to any
functional changes because DEP_RESOLVE_MOV overrides the execution
size of the instruction anyway and other execution controls are
irrelevant.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
992cda2c8a452ec86386a0f98eaf522afe206695 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Switch lower_logical_sends() to the fs_builder constructor from instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
930ebb258524762c765fa864ef7063bd8bb754a1 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Switch lower_load_payload() to the fs_builder constructor from instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bfad71606a987f14f20d2c3607846648f8537f2b 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Set up the builder execution size explicitly in opt_sampler_eot().

opt_sampler_eot() was relying on the default builder to have the same
width as the sampler and FB write opcodes it was eliminating, the
channel selects didn't matter because the builder was only being used
to allocate registers, no new instructions were being emitted with it.
A future commit will change the width of the default builder, which will
break this assumption, so initialize it explicitly here.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ff463af436bcf07430807512c9f0bf0f627288ce 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Set execution controls correctly in lower_integer_multiplication().

lower_integer_multiplication() was ignoring the execution controls of
the original MUL instruction. Fix it by using the new fs_builder
constructor.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ce90227c71c8cbe6ca4317f1873ff12c70081c4c 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Set execution controls correctly for lowered pull constant loads.

demote_pull_constants() was ignoring the execution size and channel
selects of the instruction that wanted the constant, which doesn't
matter for uniform pull constant loads because all channels get the
same scalar value, but it might for varying pull constant loads. Fix
it by using the new fs_builder() constructor that takes care of
setting execution controls compatible with the instruction passed as
argument.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3352724dfa4eb5c93290db92ae99d26d9b89e630 14-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement lowering of logical surface instructions.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
086d29f4d747bbcfe37beeb18ba77fb2cb84dbdc 18-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Hook up SIMD lowering to unroll surface instructions of unsupported width.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7a594a95a930f1658062e4d86d0f37d491b372b3 21-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Define logical typed and untyped surface opcodes.

Each logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects its arguments
separately as individual sources, like:

typed_surface_write_logical null, coordinates, source, surface,
num_coordinates, num_components

This patch defines the opcodes and usual instruction boilerplate,
including a placeholder lowering function provided mainly as
documentation for their source registers.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a0c02d2bbb765b0e997ad524d8e51838e529d9c0 28-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965: Define the setup_vector_uniform_values() backend_visitor interface.

This cleans up the VEC4 implementation of setup_uniform_values()
somewhat and will avoid duplication of the image uniform upload code
by having a common interface to upload a vector of uniforms on either
back-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4be99438e6e40280f9dc071882ce3bfbfabadb4a 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Hook up SIMD lowering to handle texturing opcodes of unsupported width.

This should match the set of cases in which we currently call fail()
or no16() from the emit_texture_*() methods and the ones in which
emit_texture_gen4() enables the SIMD16 workaround.

Hint for reviewers: It's not a big deal if I happen to have missed
some case here, it will just lead to an assertion failure down the
road which is easily fixable, however being stricter than necessary
won't cause any visible breakage, it would just decrease performance
silently due to the unnecessary message splitting, so feel free to
double-check that all cases listed here already cause a SIMD8/16
fall-back with the current texturing code -- You may want to skip over
the Gen5-6 cases though if you don't have pencil and paper at hand.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2cd466f6c3192015ea1794afc57eb453d7f13818 18-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement lowering of logical texturing opcodes on Gen4.

Unlike its Gen5 and Gen7 counterparts this patch isn't a plain
refactor of the previous Gen4 texturing code, it's more of a rewrite
largely based on emit_texture_gen4_simd16(). The reason is that on
the one hand the original emit_texture_gen4() code didn't seem easily
fixable to be SIMD width-invariant and had plenty of clutter to
support SIMD-width workarounds which are no longer required. On the
other hand emit_texture_gen4_simd16() was missing a number of
SIMD8-only opcodes. This should generalize both and roughly match
their current behaviour where there is overlap.

Incidentally this will fix the following piglits on Gen4:

arb_shader_texture_lod.execution.arb_shader_texture_lod-texgrad
arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 2d
arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 3d
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d_projvec4
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 3d

Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
501134b9fe02633ca0cdda66a9b670ae38e791f7 18-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement lowering of logical texturing opcodes on Gen5-6.

This should be largely equivalent to emit_texture_gen5() except for
slight codestyle changes and the use of i965 opcodes instead of the
ir_texture_opcode enum, see "i965/fs: Implement lowering of logical
texturing opcodes on Gen7+." for the mapping between them.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
03582f95b256e483fc1b0d78bd6a49203a448a23 17-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower SHADER_OPCODE_TXF_UMS/MCS_LOGICAL too on Gen7+.

These weren't being handled by emit_texture_gen7() but we can easily
lower them here for consistency with other texturing opcodes.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8be01e3548bdd900b7cadb5c9a77e52b01151cfe 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement lowering of logical texturing opcodes on Gen7+.

This should be largely equivalent to emit_texture_gen7() except that
we now get i965 sampling opcodes directly rather than
ir_texture_opcode enum values. The mapping is as follows:

- ir_tex -> SHADER_OPCODE_TEX
- ir_txb -> FS_OPCODE_TXB
- ir_txl -> SHADER_OPCODE_TXL
- ir_txd -> SHADER_OPCODE_TXD
- ir_txf -> SHADER_OPCODE_TXF
- ir_txf_ms -> SHADER_OPCODE_TXF_CMS
- ir_txs -> SHADER_OPCODE_TXS
- ir_query_levels -> SHADER_OPCODE_TXS too, the visitor will make
sure that the provided lod value is zero in this
case.
- ir_lod -> SHADER_OPCODE_LOD
- ir_tg4 -> SHADER_OPCODE_TG4_OFFSET if the offset value is not
immediate, SHADER_OPCODE_TG4 otherwise.
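
As a rough illustration, the mapping listed above could be written as a
single switch. This is only a sketch: the IrTexOp and ShaderOp enums and
the map_texture_opcode() function are hypothetical stand-ins invented for
this example, not the driver's actual types or code.

```cpp
#include <cassert>

// Hypothetical stand-ins for ir_texture_opcode and the i965 opcodes.
enum class IrTexOp { tex, txb, txl, txd, txf, txf_ms, txs, query_levels, lod, tg4 };
enum class ShaderOp { TEX, TXB, TXL, TXD, TXF, TXF_CMS, TXS, LOD, TG4, TG4_OFFSET };

// offset_is_immediate is only consulted for tg4, as in the list above.
ShaderOp map_texture_opcode(IrTexOp op, bool offset_is_immediate)
{
    switch (op) {
    case IrTexOp::tex:          return ShaderOp::TEX;
    case IrTexOp::txb:          return ShaderOp::TXB;      // FS_OPCODE_TXB
    case IrTexOp::txl:          return ShaderOp::TXL;
    case IrTexOp::txd:          return ShaderOp::TXD;
    case IrTexOp::txf:          return ShaderOp::TXF;
    case IrTexOp::txf_ms:       return ShaderOp::TXF_CMS;
    case IrTexOp::txs:          return ShaderOp::TXS;
    case IrTexOp::query_levels: return ShaderOp::TXS;      // caller passes lod == 0
    case IrTexOp::lod:          return ShaderOp::LOD;
    case IrTexOp::tg4:
        return offset_is_immediate ? ShaderOp::TG4 : ShaderOp::TG4_OFFSET;
    }
    assert(false && "unknown texture opcode");
    return ShaderOp::TEX;
}
```

Note that two IR opcodes (ir_txs and ir_query_levels) collapse onto the
same sampling opcode, so the mapping is not invertible.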

Other than that there are only minor changes and style fixes like the
implementation now being factored out in static functions to improve
encapsulation.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
44a8cf488e0370d7e5abe363c1fd2d21247a6e32 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix opt_zero_samples() for texturing ops not matching dispatch_width.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
33deff4f0582d2c073d34d4d6ec8344d2b1fbf7d 21-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Define logical texture sampling opcodes.

Each logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects the arguments
separately as individual sources, like:

tex_logical dst, coordinates, shadow_c, lod, lod2,
sample_index, mcs, sampler, offset,
num_coordinate_components, num_grad_components

This patch defines the opcodes and usual instruction boilerplate,
including a placeholder lowering function provided mostly as
documentation for their source registers.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
59e7e6f7a21f13ff8963cf21af2e969f1f7961f5 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement lowering of logical framebuffer writes.

This does essentially the same thing as
fs_visitor::emit_single_fb_write(), with some slight differences:

- We don't have to worry about exec_size and use_2nd_half anymore,
16-wide sources have already been lowered to 8-wide thanks to the
previous commit and the manual argument unzipping is no longer
required.

- The src/dst_depth and sample_mask values are now explicit sources
of the instruction instead of being taken from the visitor state
directly. The same goes for the kill-pixel mask that will be
passed to the instruction explicitly as predicate.

- Everything is now done in static functions to improve
encapsulation.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
633938afd349f2b423146969688c11f1e29ca17a 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Hook up SIMD lowering to unroll FB writes of unsupported width.

This shouldn't have any effect because we don't emit logical
framebuffer writes yet.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a9f31a032b0a1068a4e2ceed9ed4680ecf13e28b 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Define logical framebuffer write opcode.

The logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects its arguments
that make up the payload separately as individual sources, like:

fb_write_logical null, color0, color1, src0_alpha,
src_depth, dst_depth, sample_mask, num_components

This patch defines the opcode and usual instruction boilerplate,
including a placeholder lowering function provided mainly as
self-documentation.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8368939e5d94f8d4ae55a1f22a755922ee77132b 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement pass to lower instructions of unsupported SIMD width.

This lowering pass implements an algorithm to expand SIMDN
instructions into a sequence of SIMDM instructions in cases where the
hardware doesn't support the original execution size natively for some
particular instruction. The most important use-cases are:

- Lowering send message instructions that don't support SIMD16
natively into SIMD8 (several texturing, framebuffer write and typed
surface operations).

- Lowering messages that don't support SIMD8 natively into SIMD16
(*cough*gen4*cough*).

- 64-bit precision operations (e.g. FP64 and 64-bit integer
multiplication).

- SIMD32.

The algorithm works by splitting the sources of the original
instruction into chunks of width appropriate for the lowered
instructions, and then interleaving the results component-wise into
the destination of the original instruction. The pass is controlled
by the get_lowered_simd_width() function that currently just returns
the original execution size making the whole pass a no-op for the
moment until some user is introduced.
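
The split-and-interleave idea described above can be sketched on plain
arrays. This is a toy model under stated assumptions, not the pass itself:
lower_simd_width() and NarrowOp are hypothetical names, channels are plain
ints, and the destination is assumed to be laid out component-major
(component c of channel ch at dst[c * exec_size + ch]).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A "narrow" operation (hypothetical): reads lowered_width channels and
// returns `components` results per channel, laid out component-major,
// i.e. out[c * lowered_width + ch].
using NarrowOp = std::function<std::vector<int>(const std::vector<int> &)>;

std::vector<int> lower_simd_width(const std::vector<int> &src,
                                  std::size_t lowered_width,
                                  std::size_t components,
                                  const NarrowOp &op)
{
    const std::size_t exec_size = src.size();
    assert(lowered_width > 0 && exec_size % lowered_width == 0);

    std::vector<int> dst(components * exec_size);
    for (std::size_t i = 0; i < exec_size / lowered_width; i++) {
        // Split: take the i-th chunk of the wide source.
        const std::vector<int> chunk(src.begin() + i * lowered_width,
                                     src.begin() + (i + 1) * lowered_width);
        const std::vector<int> tmp = op(chunk);
        assert(tmp.size() == components * lowered_width);

        // Interleave: copy each component of the narrow result into the
        // matching slice of the wide destination.
        for (std::size_t c = 0; c < components; c++)
            for (std::size_t j = 0; j < lowered_width; j++)
                dst[c * exec_size + i * lowered_width + j] =
                    tmp[c * lowered_width + j];
    }
    return dst;
}
```

When lowered_width equals the original execution size the loop runs once
and the function degenerates to a plain call of op, which mirrors the
no-op behaviour the message describes for the initial version of the pass.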

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

v2: Reverse order of the source transformations and split_inst emit
call to make the code a bit easier to understand.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
86ae788baefefdb2fa77fe3c242ad2d81c8e834e 16-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix return value of fs_inst::regs_read() for BAD_FILE.

Typically BAD_FILE sources are used to mark a source as not present,
which implies that no registers are read. This will become much more
frequent with logical send opcodes which have a large number of
sources, many of them optionally used and marked as BAD_FILE when they
aren't applicable. It will prove to be useful to be able to rely on
the value of regs_read() regardless of whether a source is present or
not.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1dd3543ac1bebe089bfe3a8ae5efbe3f564e1144 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Add stub lowering pass for logical send-message opcodes.

This pass will house ad-hoc lowering code for several send
message-like virtual opcodes that will represent their logically
independent arguments as separate instruction sources rather than as a
single payload blob. This pass will basically just take the separate
arguments that are supposed to be part of the payload and concatenate
them to construct a message in the form required by the hardware.
Virtual instructions in separate-source form will eventually allow
some simplification of the visitor code and make several
transformations easier like lowering SIMD16 instructions to SIMD8
algorithmically in cases where the hardware doesn't support the former
natively.
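
A minimal sketch of the concatenation step, assuming a much simplified
model: Source and build_payload() are hypothetical names, registers are
bare numbers, and absent sources (the role BAD_FILE plays in the real IR)
are simply skipped. Real messages also carry headers and per-generation
layout rules that are ignored here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one logical source of a send-like instruction.
struct Source {
    bool present;                 // false plays the role of BAD_FILE
    std::vector<unsigned> regs;   // register numbers holding the argument
};

// Concatenate the present sources, in order, into one payload blob.
std::vector<unsigned> build_payload(const std::vector<Source> &sources)
{
    std::vector<unsigned> payload;
    for (const Source &src : sources) {
        if (!src.present)
            continue;             // optional argument not used
        payload.insert(payload.end(), src.regs.begin(), src.regs.end());
    }
    return payload;
}
```

Keeping the arguments separate until this point is what lets earlier
passes inspect or split individual sources without parsing a payload.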

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fb7eba97d7235d49ac712a21fb51009c86f3bc64 21-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Factor out source components calculation to a separate method.

This cleans up fs_inst::regs_read() slightly by disentangling the
calculation of "components" from the handling of message payload
arguments. This will also simplify the SIMD lowering and logical send
message lowering passes, because it will avoid expressions like
'regs_read * REG_SIZE / component_size' which are not only ugly, they
may be inaccurate because regs_read rounds up the result to the
closest register multiple so they could give incorrect results when
the component size is lower than one register (e.g. uniforms). This
didn't seem to be a problem right now because all such expressions
happen to be dealing with per-channel GRFs only currently, but that's
by no means obvious so better be safe than sorry.
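
The rounding problem can be seen with a few lines of arithmetic. The
sketch below assumes a 32-byte register; regs_read_for() and
components_from_regs() are hypothetical helpers written for this example
and only mimic the shape of the expressions discussed above.

```cpp
#include <cassert>

constexpr unsigned REG_SIZE = 32;   // bytes per register (assumed)

// Register count in the style of regs_read(): bytes rounded up to whole
// registers.
constexpr unsigned regs_read_for(unsigned components, unsigned component_size)
{
    return (components * component_size + REG_SIZE - 1) / REG_SIZE;
}

// The expression the message warns about: recover the component count
// from an already rounded register count.
constexpr unsigned components_from_regs(unsigned regs, unsigned component_size)
{
    return regs * REG_SIZE / component_size;
}
```

For a per-channel SIMD8 float source (component size 32 bytes) the round
trip is exact, but for a single 4-byte uniform the register count rounds
up to one and the expression reports eight components instead of one.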

v2: Split PIXEL_X/Y and LINTERP into separate case blocks.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
80511d176a49e754a18ce585bab413db7af63bf7 21-Jul-2015 Dave Airlie <airlied@redhat.com> i965: add support for ARB_shader_subroutine

This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9383664a9cbc5bc4858fc50d7fa565f43028d779 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix stride field for uniforms.

This fixes essentially the same problem as for immediates. Registers
of the UNIFORM file are typically accessed according to the formula:

read_uniform(r, channel_index, array_index) =
read_element(r, channel_index * 0 + array_index * 1)

Which matches the general direct addressing formula for stride=0:

read_direct(r, channel_index, array_index) =
read_element(r, channel_index * stride +
array_index * max{1, stride * width})

In either case if reladdr is present the access will be according to
the composition of two register regions, the first one determining the
per-channel array_index used for the second, like:

read_indirect(r, channel_index, array_index) =
read_direct(r, channel_index,
read(r.reladdr, channel_index, array_index))

where:
read(r, channel_index, array_index) = if r.reladdr == NULL
then read_direct(r, channel_index, array_index)
else read_indirect(r, channel_index, array_index)

In conclusion we can handle uniforms consistently with the other
register files if we set stride to zero. After lowering to a GRF
using VARYING_PULL_CONSTANT_LOAD in demote_pull_constant_loads() the
stride of the source is set to one again because the result of
VARYING_PULL_CONSTANT_LOAD is generally non-uniform.
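
The direct addressing formula above can be checked numerically. The
function name direct_element_index() is made up for this sketch; it only
computes the element offset passed to read_element() in the formulas and
does not model reladdr.

```cpp
#include <algorithm>
#include <cassert>

// Element offset for a direct access, following read_direct() above:
//   channel_index * stride + array_index * max{1, stride * width}
unsigned direct_element_index(unsigned channel_index, unsigned array_index,
                              unsigned stride, unsigned width)
{
    return channel_index * stride +
           array_index * std::max(1u, stride * width);
}
```

With stride 0 the channel index drops out and the array index advances by
one element, which is exactly the read_uniform() formula; with stride 1
and width 8 each array element spans eight channels.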

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5f8d9ae5a54961deb02eb52e924a84b99b60f035 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix stride for immediate registers.

When the width field was removed from fs_reg the BROADCAST handling
code in opt_algebraic() started to miss a number of trivial
optimization cases resulting in the ugly indirect-addressing sequence
being emitted unnecessarily for some variable-indexed texturing and
UBO loads regardless of one of the sources of BROADCAST being
immediate. Apparently the reason was that we were setting the stride
field to one for immediates even though they are typically uniform.
Width used to be set to one too which is why this optimization used to
work previously until the "reg.width == 1" check was removed.

The stride field of vector immediates is intentionally left equal to
one, because they are strictly speaking not uniform. The assertion in
fs_generator makes sure that immediates have the expected stride as
consistency check.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9f344b908a95440d215f29c0b05b8ea8dba2839e 01-Jul-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: fix regs_read() for LINTERP

The second source always stays within the same SIMD8 register.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7e337859ff98a0caf00fd201a5389933d42d0baa 17-Jul-2015 Jordan Justen <jordan.l.justen@intel.com> i965/cs: Return 1 for regs_read on CS_OPCODE_CS_TERMINATE

This prevents an assertion failure in brw_fs_live_variables.cpp,
fs_live_variables::setup_one_read: Assertion `var < num_vars' failed.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4bddd82bf3dae44c2b75cef34e9e85e15d63df7f 14-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Factor out universally broken calculation of the register component size.

This simple-in-principle calculation was being open-coded in a number
of places (in a series I haven't yet sent for review there will be a
couple more), all of which were subtly broken in one way or another:
None of them were handling the HW_REG case correctly as pointed out by
Connor, and fs_inst::regs_read() was handling the stride=0 case rather
naively. This patch solves both problems and factors out the
calculation as a new fs_reg method.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dabec9c293ee29335f5a6d5d1d3c2b7a715605c1 30-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Relax fs_builder channel group assertion when force_writemask_all is on.

This assertion was meant to catch code inadvertently escaping the
control flow jail determined by the group of channel enable signals
selected by some caller, however it seems useful to be able to
increase the default execution size as long as force_writemask_all is
enabled, because force_writemask_all is an explicit indication that
there is no longer a one-to-one correspondence between channels and
SIMD components so the restriction doesn't apply.

In addition reorder the calls to fs_builder::group and ::exec_all in a
couple of places to make sure that we don't temporarily break this
invariant in the future for instructions with exec_size higher than
the dispatch width.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ebe3043eeacb073c7dbb6162d8f0aee3bc66eeb1 01-Jul-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fix PIXEL_X/Y in regs_read()

PIXEL_X/Y takes a vec2 in the first argument
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
830f67046ace3c0b95a7f093fe373eeb417a1aad 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove the width field from fs_reg

As of now, the width field is no longer used for anything. The width field
"seemed like a good idea at the time" but is actually entirely redundant
with the instruction's execution size. Initially, it gave us the ability
to easily set the instruction's execution size based entirely on register
widths. With the builder, we can easily set the sizes explicitly and the
width field doesn't have as much purpose. At this point, it's just
redundant information that can get out of sync so it really needs to go.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
83458e7c53cfc1f344280da6eb9a3b4e2dfdbc00 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use exec_size instead of dst.width for computing component size

There are a variety of places where we use dst.width / 8 to compute the
size of a single logical channel. Instead, we should be using exec_size.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
21803b7b3304f053a48e313951ffddf1d2cd0bd9 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the builder dispatch width instead of dst.width for pull constants

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c9676329dd6c69b2e0b12405c3b4078f7d216f2f 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove exec_size guessing from fs_inst::init()

Now that all of the non-explicit constructors are gone, we don't need to
guess anymore.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
500525e96019aff551afa8fee841d00ca9ec4c4f 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use exec_size for determining regs read/written and partial writes

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
89bc4c78c394e50ddb16cc089bd3ec90681342d7 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove fs_inst constructors that don't take an explicit exec_size

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
67c4c9e1a709508b88d6d31eb1f7cb61d187189e 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make better use of the builder in shader_time

Previously, we were just depending on register widths to ensure that
various things were exec_size of 1 etc. Now, we do so explicitly using the
builder.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f7dcc1160331462a071c54ca1067f9e2f57b55be 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add a builder argument to offset()

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c5a8da5f24eae4479b4ebe6301d780f781e24ed2 01-Jul-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Properly handle LOAD_PAYLOAD in fs_inst::regs_read

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
12bc22ef58377191508af91a918efd18e2da7500 19-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/Y

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
aca5228011e7b9e96f3bd3a621c88e63ba47a4f3 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads

Previously, fs_inst::regs_read() fell back to depending on the register
width for the second source. This isn't really correct since it isn't a
SIMD8 value at all, but a SIMD4x2 value. This commit changes it to
explicitly be always one register.

v2: Use mlen for determining the number of registers read

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
241317d59ab440bdcda25bacaadacfb3b4c2dd93 19-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Actually set/use the mlen for gen7 uniform pull constant loads

Previously, we were allocating the payload with different sizes per gen and
then figuring out the mlen in the generator based on gen. This meant,
among other things, that the higher level passes knew nothing about it.

Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3258e1b80d66ec26f14a24a5eae0629a2d23a444 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use a switch statement in fs_inst::regs_read()

This makes things a little simpler, more efficient, and quite a bit more
readable.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
19a0ba130fd0d0f3b86181a8d05cf5391420360d 27-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Move compute_clip_distance() out of emit_urb_writes().

Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by
turning those into the modern gl_ClipDistance equivalents.

This is unnecessary in Core Profile: if user clipping is enabled, but
the shader doesn't write the corresponding gl_ClipDistance entry,
results are undefined. Hence, it is also unnecessary for geometry
shaders.

This patch moves the call up to run_vs(). This is equivalent for VS,
but removes the need to pass clip distances into emit_urb_writes().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
40801295d5a3d747661abb1e2ca64d44c0e3dc05 23-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Remove the brw_context from the visitors

As of this commit, nothing actually needs the brw_context.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
663f8d121d792edee5c012461bfd0b650011ff4a 20-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vs: Pass the current set of clip planes through run() and run_vs()

Previously, these were pulled out of the GL context conditionally based on
whether we were running ff/ARB or a GLSL program. Now, we just pass them
in so that the visitor doesn't have to grab them itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4af62c0f5cbadc762abb1bd2e59f44ca220e3f0a 20-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add a do_rep_send flag to run_fs

Previously, we were pulling it from brw->do_rep_send

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1b0f6ffa15b25e8601d60fe1ea74e893f7d33cf5 20-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Pull calls to get_shader_time_index out of the visitor

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c7893dc3c590b86787d8118e3920debaea3f16da 19-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use a single index per shader for shader_time.

Previously, each shader took 3 shader time indices which were potentially
at arbitrary points in the shader time buffer. Now, each shader gets a
single index which refers to 3 consecutive locations in the buffer. This
simplifies some of the logic at the cost of having a magic 3 in a few places.
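
One way to picture the single-index scheme is shown below. This is an
assumption-laden sketch: the slot names, SLOTS_PER_SHADER and
shader_time_location() are hypothetical, and it assumes the index is
scaled by 3 to find the first of the consecutive locations, which the
message does not spell out.

```cpp
#include <cassert>

// Hypothetical names for the three per-shader slots.
enum ShaderTimeSlot { SLOT_A = 0, SLOT_B = 1, SLOT_C = 2 };

constexpr unsigned SLOTS_PER_SHADER = 3;   // the "magic 3"

// Location in the shader time buffer for a given shader index and slot,
// assuming the three slots of each shader are stored back to back.
constexpr unsigned shader_time_location(unsigned shader_index,
                                        ShaderTimeSlot slot)
{
    return shader_index * SLOTS_PER_SHADER + static_cast<unsigned>(slot);
}
```

Under this assumption only one number per shader needs to be tracked, and
the three locations follow from it by simple arithmetic.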

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
073294d3ef20d0dbeffcc38aff3d69eda624ee75 23-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Plumb compiler debug logging through brw_compiler

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3fd457c9ddd4b9f730e70bfd19b2f9eeeeaef089 23-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Do the no16 perf logging directly in fs_visitor::no16()

While we're at it, we'll drop the note about 10-20% performance loss.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f45bf97f30f2feacf8f976271a43feea70e5c382 23-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make no16 non-variadic

We never used the fact that it was variadic anyway.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d7565b7d65f8203c20735a61b86e9158b8ec4447 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Remove the dependence on brw_context from the generators

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e639a6f68e701f23b977a49c45d646c164991d36 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Plumb compiler debug logging through a function pointer in brw_compiler

v2 (Ken): Make shader_debug_log a printf-like function.
v3 (Jason): Add a void * to pass the brw_context through

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
630764407aeba4acf9364739bafb0e3516f72e31 20-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Replace some instances of brw->gen with devinfo->gen
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a49328d58d1e3e143f9434976d9f3574acefc4ea 22-Jun-2015 Matt Turner <mattst88@gmail.com> i965/fs: Don't mess up stride for uniform integer multiplication.

If the stride is 0, the source is a uniform and we should not modify the
stride.

Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91047
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8d3c48eed24f351c86361707978647c78010bb7f 10-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove one more fixed brw_null_reg() from the visitor.

Instead use fs_builder::null_reg_f() which has the correct register
width. Avoids the assertion failure in fs_builder::emit() hit by the
"ES3-CTS.shaders.loops.for_dynamic_iterations.unconditional_break_fragment"
GLES3 conformance test introduced by 4af4cfba9ee1014baa4a777660fc9d53d57e4c82.

Reported-and-reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
44928b799adbbf2671c482431b3b7a390118725c 08-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove dead IR construction code from the visitor.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fe88c7ae38c72ea09ced69fb12ff00f58bdf1d6e 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate translation of NIR ALU instructions to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e32c16c47f7a3cf25e2b4d2f3b97d0f8f89669c0 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate FS framebuffer writes to the IR builder.

The explicit call to fs_builder::group() in emit_single_fb_write() is
required by the builder (otherwise the assertion in fs_builder::emit()
would fail) because the subsequent LOAD_PAYLOAD and FB_WRITE
instructions are in some cases emitted with a non-native execution
width. The previous code would always use the channel enables for the
first quarter, which is dubious but probably worked in practice
because FB writes are never emitted inside non-uniform control flow
and we don't pass the kill-pixel mask via predication in the cases
where we have to fall-back to SIMD8 writes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ad68853f17868081a69b3f73f4bf4c1bc8b2571d 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate FS discard handling to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
46f264638ad97a0b806e6fad7117d62a2cf914b6 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate FS gl_SamplePosition/ID computation code to the IR builder.

v2: Use fs_builder::AND/SHR/MOV instead of ::emit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
31477226ec6cbe956a4bbdcae81cc7ca5ad28cc6 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate FS interpolation code to the IR builder.

v2: Fix some preexisting trivial codestyle issues.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d3c10ad42729c1fe74a7f7c67465bd2beb7f9e75 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate shader time to the IR builder.

v2: Change null register destination type to UD so it can be compacted.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
546839ef639bf871feaa62ab7d811f2fc783bdcd 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate pull constant loads to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8f626c14989f005599f7841b89144d2bf58b5704 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate Gen4 send dependency workarounds to the IR builder.

v2: Change brw_null_reg() to bld.null_reg_f().

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4af4cfba9ee1014baa4a777660fc9d53d57e4c82 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate lower_integer_multiplication to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
efa60e49f2e5dd56f1c81487e9aad9f89136d8b4 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate lower_load_payload to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6114ba4dccfdb8f7c657feeed8f8c9b69debba91 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate opt_sampler_eot to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e04b4156a745fc09afa066c892c1913362eae9df 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Allocate a common IR builder object in fs_visitor.

v2: Call fs_builder::at_end() to point the builder at the end of the
program explicitly.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c820407ef0aac87546d1a778e169cfa1a915a219 03-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Print mlen in dump_instructions() output.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
78644ffc4d341deb431145108f0b2d377e59b61e 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove the ir_visitor code

Now that everything is running through NIR, this is all dead.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
114497afff4e49139b8c7d61f11a7872b81398bf 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Make NIR non-optional for scalar shaders

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
99cb4233205edcfa1a1e2967eef7bb16ff19bec4 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Rename backend_visitor to backend_shader

The backend_shader class really is a representation of a shader. The fact
that it inherits from ir_visitor is somewhat immaterial.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0596134410a0decc2f6bba77bfedb82d308aabbe 27-May-2015 Matt Turner <mattst88@gmail.com> i965/fs: Fix lowering of integer multiplication with cmod.

If the multiplication's result is unused, except by a conditional_mod,
the destination will be null. Since the final instruction in the lowered
sequence is a partial-write, we can't put the conditional mod on it and
we have to store the full result to a register and do a MOV with a
conditional mod.

Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90580
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6ca67f62e885f0e42c0cef2db5c0ae837adfe646 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fix implied_mrf_writes for scratch writes

We build the entire message in the generator so all the MRF writes are
implied.

Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f7df169ba13d22338e9276839a7e9629ca0a6b4f 14-May-2015 Matt Turner <mattst88@gmail.com> i965/fs: Implement integer multiply without mul/mach.

Ivybridge and Baytrail can't use mach with 2Q quarter control, so just
do it without the accumulator. Stupid accumulator.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4ec09c77471e39e6ff81c99f1edde2e1713a7f24 13-May-2015 Matt Turner <mattst88@gmail.com> i965/fs: Support integer multiplication in SIMD16 on Haswell.

Ivybridge (and presumably Baytrail) have a bug that prevents this from
working.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1e4e17fbd9296cc5064aabdb351a894d10190cb6 11-May-2015 Matt Turner <mattst88@gmail.com> i965/fs: Lower integer multiplication after optimizations.

32-bit x 32-bit integer multiplication requires multiple instructions
until Broadwell. This patch just lets us treat the MUL instruction in
the FS backend like it operates on Broadwell, and after optimizations
we lower it into a sequence of instructions on older platforms.

Doing this will allow us to do some extra optimization on integer
multiplies.
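
As a rough illustration of what such a lowering has to achieve, the low 32 bits of a 32-bit x 32-bit product can be assembled from two partial products in which one operand is at most 16 bits wide. This is a minimal standalone sketch in plain C++, not the driver's instruction sequence; the name mul32_via_16bit_halves is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Mesa): low 32 bits of a * b, built
// from multiplications where one operand is at most 16 bits wide.
uint32_t mul32_via_16bit_halves(uint32_t a, uint32_t b)
{
   const uint32_t b_lo = b & 0xffffu;   // low 16 bits of b
   const uint32_t b_hi = b >> 16;       // high 16 bits of b

   const uint32_t low  = a * b_lo;      // a * b[15:0]
   const uint32_t high = a * b_hi;      // a * b[31:16]

   // a * b == a * b_lo + ((a * b_hi) << 16), modulo 2^32.
   // Unsigned arithmetic wraps, so no explicit masking is needed.
   return low + (high << 16);
}
```

For any pair of inputs the result should equal the ordinary wrapping product a * b on 32-bit unsigned integers.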

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3687d752e51829b4723c9abb07ae56d2bbcda570 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Combine the fs_visitor constructors.

For scalar GS support, we either need to add a fourth constructor which
takes the GS structures, or combine the existing two and pass the shader
stage.

Given that they're not significantly different, I opted for the latter.

v2: Remove more stuff from the .h file (Jason and Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0db663503ea86579d3352fe83d428d573a8d2b03 07-May-2015 Francisco Jerez <currojerez@riseup.net> i965: Don't forget the force_sechalf flag in lower_load_payload().

Regression from commit 41868bb6824c6106a55c8442006c1e2215abf567.
Fixes a bunch of ARB_shader_image_load_store tests.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bfdae9149e00bd5c2521db3e75669ae043eed5cc 08-May-2015 Neil Roberts <neil@linux.intel.com> i965/fs: Disable opt_sampler_eot for textureGather

The opt_sampler_eot optimisation seems to break when the last
instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing
this so it causes a lot of regressions. I can't find any documentation
or known workarounds to indicate that this is expected behaviour, but
considering that this is probably a pretty unlikely situation in a
real use case we might as well disable it in order to avoid the
regressions. In total this fixes 451 tests.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f98c3f3e44abb0c8cb158c589418def111d72052 08-May-2015 Neil Roberts <neil@linux.intel.com> i965/fs: Improve a comment about stripping trailing zeroes

Originally I wrote that removing the first parameter doesn't work but
I didn't know why. I now found a mention of this in the PRM so it's
probably worth adding it to the comment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e51bad669a4c42845c44a925bbb5d8885799c28f 07-May-2015 Neil Roberts <neil@linux.intel.com> i965/skl: In opt_sampler_eot always set destination register to null

opt_sampler_eot enables a direct write to framebuffer from a sample.
In order to do this the sample message needs to have a message header
so if there wasn't one already then the function adds one. In addition
the function sets the destination register to null because it's no
longer used. However it was only doing this in cases where it was
adding a message header. This patch just moves setting the destination
so that it happens even if there's a message header. In practice this
doesn't seem to make any difference but it's a bit cleaner.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1c5de556c5972c3020b4095c586a9b439b20cf69 07-May-2015 Neil Roberts <neil@linux.intel.com> i965/fs: Set the header_size on LOAD_PAYLOAD in opt_sampler_eot

Commit 94ee908448 added a header size parameter to the function to
create the LOAD_PAYLOAD instruction. However this broke
opt_sampler_eot which manually constructs the instruction and so
wasn't setting the header_size. This ends up making the parameters for
the send message all have the wrong location and it all falls apart.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7a75b55a01d355090d186357896e3cb141b9775e 02-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_inst: Get rid of the effective_width field

The effective_width field was an ill-conceived hack to get around issues in
the LOAD_PAYLOAD instruction. Now that the LOAD_PAYLOAD instruction is far
more sane, this field can die.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
41868bb6824c6106a55c8442006c1e2215abf567 25-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction

The newly reworked instruction is far more straightforward than the
original. Before, the LOAD_PAYLOAD instruction was lowered by a
complicated and broken-by-design pile of heuristics to try and guess
force_writemask_all, exec_size, and a number of other factors on the
sources.

Instead, we use the header_size on the instruction to denote which sources
are "header sources". Header sources are required to be a single physical
hardware register that is copied verbatim. The registers that follow are
considered the actual payload registers and have a width that corresponds
to the LOAD_PAYLOAD's exec_size and are treated as being per-channel. This
gives us a fairly straightforward lowering:

1) All header sources are copied directly using force_writemask_all and,
since they are guaranteed to be a single register, there are no
force_sechalf issues.

2) All non-header sources are copied using the exact same force_sechalf
and force_writemask_all modifiers as the LOAD_PAYLOAD operation itself.

3) In order to accommodate older gens that need interleaved colors,
lower_load_payload detects when the destination is a COMPR4 register
and automatically interleaves the non-header sources. The
lower_load_payload pass does the right thing here regardless of whether
or not the hardware actually supports COMPR4.
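
Rules 1 and 2 above can be sketched with a toy model. The types and the function name below are illustrative stand-ins, not the driver's fs_inst or fs_visitor API, and the COMPR4 interleaving from rule 3 is deliberately left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a LOAD_PAYLOAD instruction (hypothetical type).
struct PayloadInst {
   std::vector<std::string> sources; // source register names, headers first
   unsigned header_size;             // number of leading header sources
   bool force_writemask_all;
   bool force_sechalf;
};

// One copy emitted by the lowering (hypothetical type).
struct LoweredMov {
   std::string src;
   bool force_writemask_all;
   bool force_sechalf;
};

std::vector<LoweredMov> lower_load_payload_sketch(const PayloadInst &inst)
{
   std::vector<LoweredMov> movs;
   for (std::size_t i = 0; i < inst.sources.size(); i++) {
      if (i < inst.header_size) {
         // Rule 1: a header source is a single register copied verbatim,
         // so it uses force_writemask_all and never force_sechalf.
         movs.push_back({inst.sources[i], true, false});
      } else {
         // Rule 2: a payload source inherits the modifiers of the
         // LOAD_PAYLOAD instruction itself.
         movs.push_back({inst.sources[i], inst.force_writemask_all,
                         inst.force_sechalf});
      }
   }
   return movs;
}
```

The point of the sketch is that header_size alone decides how each source is copied, so nothing has to be guessed from the sources themselves.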

This commit itself is made up of a bunch of smaller changes squashed
together. Individual change descriptions follow:

i965/fs: Rework fs_visitor::LOAD_PAYLOAD

We rework LOAD_PAYLOAD to verify that all of the sources that count as
headers are, indeed, exactly one register and that all of the non-header
sources match the destination width. We then take the exec_size for
LOAD_PAYLOAD directly from the destination width.

i965/fs: Make destinations of load_payload have the appropriate width

i965/fs: Rework fs_visitor::lower_load_payload

v2: Don't allow the saturate flag on LOAD_PAYLOAD instructions

i965/fs_cse: Support the new-style LOAD_PAYLOAD

i965/fs_inst::is_copy_payload: Support the new-style LOAD_PAYLOAD

i965/fs: Simplify setup_color_payload

Previously, setup_color_payload was a big helper function that did a
lot of gen-specific special casing for setting up the color sources of
the LOAD_PAYLOAD instruction. Now that lower_load_payload is much more
sane, most of that complexity isn't needed anymore. Instead, we can do
a simple fixup pass for color clamps and then just stash sources
directly in the LOAD_PAYLOAD. We can trust lower_load_payload to do the
right thing with respect to COMPR4.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
94ee908448405c8271e8662914a1c49df8d623b2 24-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make LOAD_PAYLOAD take a header size

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
32af7d4188e286a525081ada9965070dd41dbab7 02-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_inst: Add an is_copy_payload helper

This commit adds a new is_copy_payload helper to fs_inst that takes the
place of the similarly named functions in cse and register coalesce. The
two is_copy_payload functions in CSE and register coalesce were subtly
different and potentially subtly broken. The new version unifies the two
and should be more correct.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
76c1086f2dfb37a1edf6d2df6eebbe11ccbfc50b 24-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Change header_present to header_size in backend_instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3da9f708d4f1375d674fae4d6c6eb06e4c8d9613 20-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.

v2: Save some CPU cycles by doing 'return progress' rather than
'depth++' in the discard jump special case.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f2fad0dc80627e853eea558498f18a9fa769992e 19-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Perform basic optimizations on the BROADCAST opcode.

v2: Style fixes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f118e5d15fd9b35cf27a975a702c5fb81d3157aa 23-Apr-2015 Francisco Jerez <currojerez@riseup.net> i965: Add typed surface access opcodes.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0775d8835ac8d1f2ab75d04f0cddbad36b6787fe 23-Apr-2015 Francisco Jerez <currojerez@riseup.net> i965: Add untyped surface write opcode.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b750e14fbbeb20a6daa869ae642c0c1e1ce6e6d2 16-Apr-2015 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Add CS shader time support

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
17233f9bbcbf570f0c7633c63dbd5ed88634ed60 21-Apr-2015 Jordan Justen <jordan.l.justen@intel.com> i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS.

Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c380973a9564be57acdae5ab6c6a9efcb72cf6c9 31-Aug-2014 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Support compute programs in fs_visitor

v2:
* Clean out some unneeded code copied from run_fs (krh)
* Always use NIR
* Split shader time out into a separate commit

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
02e9773bc8526f64e4d79e3d9ac11f49882c022f 24-Apr-2015 Neil Roberts <neil@linux.intel.com> i965/fs: Strip trailing constant zeroes in sample messages

If a send message is emitted with a message length that is less than
required for the message then the remaining parameters default to
zero. We can take advantage of this to save a register when a shader
passes constant zeroes as the final coordinates to the sample
function.
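
The idea can be sketched as a small counting function. This is a simplified model, assuming the parameters are already known to be constant zero or not; the function name is hypothetical, and the real pass works on message lengths in registers rather than on a parameter list. The first parameter is always kept, since a neighbouring commit in this log notes that removing it does not work.

```cpp
#include <cstddef>
#include <vector>

// is_const_zero[i] is true when the i-th non-header parameter of the
// sample message is a constant zero. Returns how many parameters still
// have to be sent after dropping the trailing zeroes.
// Hypothetical helper, not the driver's implementation.
std::size_t count_params_after_stripping(const std::vector<bool> &is_const_zero)
{
   std::size_t n = is_const_zero.size();

   // Stop at one parameter: the first parameter is never removed.
   while (n > 1 && is_const_zero[n - 1])
      n--;

   return n;
}
```

A zero in the middle of the list is not removed, because only parameters past the end of a shortened message default to zero.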

I think this might be useful for GLES applications that are using 2D
textures to simulate 1D textures.

On Skylake it will be useful for shaders that do
texelFetch(tex,something,0) which I think is fairly common. This helps
more on Skylake because in that case the order of the instruction
operands is u,v,lod,r which is good for 2D textures whereas before
they were u,lod,v,r which is only good for 1D textures.

On Haswell:
total instructions in shared programs: 8535730 -> 8533261 (-0.03%)
instructions in affected programs: 236968 -> 234499 (-1.04%)
helped: 1174

On Skylake:
total instructions in shared programs: 10345646 -> 10341237 (-0.04%)
instructions in affected programs: 293011 -> 288602 (-1.50%)
helped: 1218

Reviewed-by: Matt Turner <mattst88@gmail.com>

v2: Applied suggestions by Kenneth Graunke:
- Only apply on Gen5+
- Apply to all texture opcodes, not just TEX and TXF.
Moved the optimisation into the loop as suggested by Matt Turner.
Fix the array index when there is a header.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1ac7db07b363207e8ded9259f84bbcaa084b8667 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Unhardcode a few more stage names and abbreviations.

The stage_abbrev and stage_name fields in backend_visitor provide what
we need without any additional effort. It also means we'll get the
right names for compute shaders, SIMD8 geometry shaders, and both kinds
of tessellation shaders.

This does unfortunately change the capitalization of the stage
abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't
seem worth adding code to handle, though.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5d4f085a43ccd1122301421f2013e42a3f0a7604 28-Apr-2015 Neil Roberts <neil@linux.intel.com> i965: Don't try to apply the opt_sampler_eot extension for vs

The opt_sampler_eot optimisation of fs_visitor effectively assumes
that it is running on a fragment shader because it casts the program
key to a brw_wm_prog_key. However on Skylake fs_visitor can also be
used for vertex shaders. It looks like this usually works anyway
because the optimisation is skipped if key->nr_color_regions != 1.
However for a vertex shader the key is actually a brw_vs_prog_key so
the space for nr_color_regions is probably taken up by
key->base.program_string_id. This can end up making nr_color_regions
be 1 in which case the function will later assert when the last
instruction is not FS_OPCODE_FB_WRITE. This was making the DEQP test
suite assert. Presumably this only happens there because that compiles
a lot of shaders so it would end up with a high value for
program_string_id.
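
The hazard and the guard can be sketched as follows. The types are simplified stand-ins invented for this example (the real keys are brw_wm_prog_key and brw_vs_prog_key), and the function name is hypothetical; the sketch only shows the pattern of checking the stage before interpreting the key.

```cpp
// Hypothetical, simplified model of a visitor that holds an untyped key.
enum class Stage { Vertex, Fragment };

struct FragmentKey {
   unsigned nr_color_regions;
};

struct Visitor {
   Stage stage;
   const void *key; // only points to a FragmentKey when stage == Fragment
};

bool may_apply_sampler_eot(const Visitor &v)
{
   // Bail out first: for other stages the key has a different layout,
   // so reading nr_color_regions from it would yield unrelated bits.
   if (v.stage != Stage::Fragment)
      return false;

   const FragmentKey *key = static_cast<const FragmentKey *>(v.key);
   return key->nr_color_regions == 1;
}
```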

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a85c4c9b3f75cac9ab133caa91a40eec2e4816ae 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Rename brw_compile to brw_codegen

This name better matches what it's actually used for. The patch was
generated with the following command:

for file in *; do
sed -i -e s/brw_compile/brw_codegen/g $file
done

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cfc56fcee36912d5fb41262c71463292a737160e 17-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use device_info instead of the context for computing vue maps

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
28e9601d0e681411b60a7de8be9f401b0df77d29 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Add a devinfo field to backend_visitor and use it for gen checks

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5af0604d528733af9113a6f8711c39796ce0ae40 07-Apr-2015 Matt Turner <mattst88@gmail.com> i965/fs: Calculate delta_x and delta_y together.

This lets SIMD16 programs on G45 and Gen5 use the PLN instruction.

On Ironlake:

total instructions in shared programs: 5634757 -> 5518055 (-2.07%)
instructions in affected programs: 1745837 -> 1629135 (-6.68%)
helped: 11439
HURT: 4

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a1dd2f0bb6f9bf61d4a40d033740140b86c060e0 12-Apr-2015 Matt Turner <mattst88@gmail.com> i965/fs: Add LINTERP's src0 to fs_inst::regs_read().

LINTERP's src0 is PLN's src1, and PLN's src1 reads exec_size / 4
registers.

Having that information lets us drop the delta_x/y special case code in
split_virtual_grfs().

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b069f9eafd945a86be633d8fff4e715fc6d7ec2d 08-Feb-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965/fs: Combine tex/fb_write operations (opt)

Certain platforms support the ability to sample from a texture and write it out
to the file RT, thus saving a costly send instruction (note that this is a
potential win if one wanted to backport to a tag that didn't have the patch
from Topi which removed excess MOVs from LOAD_PAYLOAD - 97caf5fa04dbd2).

v2: Modify the algorithm. Instead of iterating in reverse through blocks and
insts, only look at the last block/inst, since it is the only thing which can
benefit. Rebased on top of Ken's patch modifying is_last_send

v3: Rebased over almost 2 months, and incorporated feedback from Matt:
Some comment typo fixes and rewordings.
Whitespace
Move the optimization pass outside of the optimize loop

v4: Some cosmetic changes requested from Ken. These changes ensured that the
optimization function always returned true when an optimization occurred, and
false when one did not. This behavior did not exist with the original patch. As
a result, having the separate helper function which Matt did not like no longer
made sense, and so now I believe everyone should be happy.

Benchmark (n=20)     %diff
*OglBatch5           -1.4
*OglBatch7           -1.79
OglFillTexMulti       5.57
OglFillTexSingle      1.16
OglShMapPcf           0.05
OglTexFilterAniso     3.01
OglTexFilterTri       1.94

No piglit regressions:
(http://otc-gfxtest-01.jf.intel.com:8080/view/dev/job/bwidawsk/112/)

[*] I believe my measurements are incorrect for Batch5-7. If I add this new
optimization, but never emit the new instruction I see similar results.

v5: Remove declaration of combine_tex_header since v4 dropped that function
(Ben)
Remove check for impossible case of an empty block (Matt)
Set dest earlier to avoid extra special-casing in generate_tex (Matt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6866378cf42c86d03f38616804e6714a932ab70b 10-Apr-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965/fs: Only emit FS_OPCODE_PLACEHOLDER_HALT if there are discards

Based originally on a patch from Ken in May 2014 of the same title. Things
changed enough that I didn't feel comfortable leaving his authorship.

v2: Replace fp->UsesKill with wm_prog_data->uses_kill. Since Ken took the time
to also explain the difference to me, here is his explanation for posterity:

"fp->UsesKill indicates that an ARB_fragment_program shader uses the KIL
instruction, or that a GLSL shader uses the "discard" instruction
(which are analogous).

On Gen4-5, we sometimes have to simulate OpenGL's "Alpha Test" feature
by emitting shader code that implicitly does a "discard" instruction.

In the key setup, we do:

/* key->alpha_test_func means simulating alpha testing via discards,
* so the shader definitely kills pixels.
*/
prog_data.uses_kill = fp->program.UsesKill || key->alpha_test_func;

Even though the shader may not technically contain a "discard", we need
to act as if it does.

I've also been trying to move the i965 state setup code to use
brw_wm_prog_key for everything, rather than poking at core Mesa's
gl_program/gl_fragment_program/gl_shader/gl_shader_program structures.

--Ken"

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
38707e1478a4b6f4687c583d06fbd68e22900735 01-Apr-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965/fs: Create a has_side_effects for fs_inst

When an instruction has a side effect, it impacts the available options when
reordering an instruction. As the EOT flag is an implied write to the render
target in the FS, it can be considered a side effect.

This patch shouldn't actually have any impact on the current code since the EOT
flag implies that the opcode is already one with side effects,
FS_OPCODE_FB_WRITE. The next patch however will introduce an optimization
whereby the EOT flag can occur with an opcode SHADER_OPCODE_TEX, and as that
instruction will perform the same implied write to the render target, it cannot
be reordered.
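
A minimal sketch of the predicate described here, using a toy instruction type rather than the real fs_inst. The opcode list is an assumption made for the example and is much shorter than the driver's.

```cpp
// Hypothetical, reduced opcode set for illustration only.
enum class Opcode { MOV, TEX, FB_WRITE };

struct Inst {
   Opcode opcode;
   bool eot; // end-of-thread flag on the message
};

// An instruction has side effects if its opcode inherently has them, or
// if it carries EOT, which implies a write to the render target.
bool has_side_effects_sketch(const Inst &inst)
{
   return inst.eot || inst.opcode == Opcode::FB_WRITE;
}
```

With this shape, a TEX instruction is freely reorderable until it is given the EOT flag, at which point a scheduler consulting the predicate has to leave it in place.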

v2: Remove extra whitespace (Matt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
21d29124a719bdaf5794859a4a7441cc6be33df7 12-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix INTEL_DEBUG=shader_time for SIMD8 VS.

In commit 4ebeb71573ad44f7657810dc5dd2c9030e3e63db, I deleted the
emit_shader_time_end() call in emit_urb_writes(). But I failed to add
it to run_vs(), as I intended. So no data was recorded at all.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8aee87fe4cce0a883867df3546db0e0a36908086 20-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.

Gen5+ systems allow you to specify multiple shader programs - both SIMD8
and SIMD16 - and the hardware will automatically dispatch to the most
appropriate one, given the number of subspans to be processed.

However, that is not the case on Gen4. Instead, you program a single
shader. If you enable multiple dispatch modes (SIMD8 and SIMD16), the
shader is supposed to contain a series of jump instructions at the
beginning. The hardware will launch the shader at a small offset,
hitting one of the jumps.

We've always thought that sounds like a pain, and weren't clear how it
affected performance - is it worth having multiple shader types? So,
we never bothered with SIMD16 until now.

This patch takes a simpler approach: try and compile a SIMD16 shader.
If possible, set the no_8 flag, telling the hardware to just use the
SIMD16 variant all the time.
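
The decision can be sketched as below. This is a simplified model under stated assumptions: the struct, the function name and the plain integer generation number are invented for the example, and the real driver sets more state than these two flags.

```cpp
// Hypothetical result type: which dispatch modes end up enabled.
struct DispatchChoice {
   bool use_simd16; // a SIMD16 program is available and enabled
   bool no_8;       // SIMD8 dispatch is disabled entirely
};

DispatchChoice choose_dispatch_sketch(int gen, bool simd16_compiled)
{
   DispatchChoice choice{false, false};

   if (simd16_compiled) {
      choice.use_simd16 = true;
      // Gen5+ hardware picks between SIMD8 and SIMD16 on its own.
      // Gen4 runs a single shader, so SIMD8 is switched off instead
      // of emitting the jump table at the start of the program.
      choice.no_8 = (gen < 5);
   }

   return choice;
}
```

If the SIMD16 compile fails, both flags stay false and the SIMD8 program is used as before.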

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bff421332661bfd0f82ab9eee9e4fec9d06ed1a1 03-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Check the INTEL_USE_NIR environment variable once at context creation

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b9b66985c3d33fa0db2b49c0e0231aa6d341e183 20-Mar-2015 Carl Worth <cworth@cworth.org> i965: Rename do_<stage>_prog to brw_compile_<stage>_prog (and export)

This is in preparation for these functions to be called from other
files.

This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ac69ab7302dffa1350c64a9c69abd7721d0f0127 27-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Move env_var_as_boolean to intel_debug.c.

I need to use this in brw_vec4.cpp, so it can't be static anymore.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
826d3afb8f421a62020308813397e541e672381e 30-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add ARB_fragment_program support to the NIR backend.

Use prog_to_nir where we would normally call glsl_to_nir, handle program
parameter lists, and skip a few things that don't exist.

Using NIR generates much better shader code than Mesa IR, since we get
real optimizations, as opposed to prog_optimize:

total instructions in shared programs: 314007 -> 279892 (-10.86%)
instructions in affected programs: 285173 -> 251058 (-11.96%)
helped: 2001
HURT: 67
GAINED: 4
LOST: 7

v2: Change early return in nir_setup_uniforms to if/else (Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
74c7e5d35181d31e4448c614f6aa62c1e1f60694 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965: Define method to check whether a backend_reg is inside a given range.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8a0946f3b1522e5f91afe14c8c3b22ba6009ed04 06-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Make an emit_discard_jump() function to reduce duplication.

This is already copied in two places, and I want to copy it to a third
place.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b0d422cd2a99d2fd26ab11880d5d8410ebfc64b2 16-Mar-2015 Matt Turner <mattst88@gmail.com> i965/fs: Print spills:fills and number of promoted constants.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e1f3ddef8c9928d9b8e845b811dc08983c541f99 17-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Make our environment variable checking smarter

Before, we enabled NIR if you set INTEL_USE_NIR to anything, which meant that
INTEL_USE_NIR=false would actually turn on NIR. In preparation for turning
NIR on by default, this commit makes it smarter by allowing the
INTEL_USE_NIR variable to work as either a force-enable or a force-disable.
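
A tri-state check of this kind might look like the sketch below. The function name and the accepted spellings are assumptions made for the example; the value is passed in as a string (what getenv() would return) so the logic can be exercised without touching the environment.

```cpp
#include <cstring>

// Hypothetical helper: interpret an environment-variable value as a
// boolean. `value` is nullptr when the variable is unset.
bool parse_bool_setting(const char *value, bool default_value)
{
   if (value == nullptr)
      return default_value;          // unset: keep the built-in default

   if (std::strcmp(value, "1") == 0 ||
       std::strcmp(value, "true") == 0 ||
       std::strcmp(value, "yes") == 0)
      return true;                   // force-enable

   if (std::strcmp(value, "0") == 0 ||
       std::strcmp(value, "false") == 0 ||
       std::strcmp(value, "no") == 0)
      return false;                  // force-disable

   return default_value;             // unrecognised: fall back
}
```

Unlike a bare "is the variable set" test, a value such as "false" now disables the feature even when the default is on.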

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
627c68308683abbd6e563a09af6013a33938a790 16-Mar-2015 Tapani Pälli <tapani.palli@intel.com> i965/fs: in MAD optimizations, switch last argument to be immediate

Commit bb33a31 introduced optimizations that transform cases of MAD
into simpler forms but it did not take into account that src[0]
cannot be immediate and did not report progress. Patch switches
src[0] and src[1] if src[0] is immediate and adds progress
reporting. If both sources are immediates, this is taken care of by
the same opt_algebraic pass on later run.
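
The operand swap with progress reporting can be sketched on a toy two-source instruction. The types and the function name are hypothetical, and the sketch assumes the resulting operation is commutative so that swapping the sources does not change its result.

```cpp
#include <utility>

// Hypothetical, simplified operand and instruction types.
struct Operand {
   bool is_immediate;
   int reg_or_value; // register number, or the constant if immediate
};

struct BinaryInst {
   Operand src[2];
};

// Moves an immediate out of src[0]. Returns true ("progress") only when
// the instruction was actually changed. The both-immediate case is left
// alone here, matching the note that a later pass handles it.
bool move_immediate_to_src1(BinaryInst &inst)
{
   if (inst.src[0].is_immediate && !inst.src[1].is_immediate) {
      std::swap(inst.src[0], inst.src[1]);
      return true;
   }
   return false;
}
```

Returning the progress flag matters because an optimization loop typically reruns its passes until none of them reports a change.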

v2: Fix for all cases, use temporary fs_reg (Matt, Kenneth)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89569
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
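A minimal sketch of the operand swap described in the entry above, assuming the simplified instruction is a commutative two-source operation so that exchanging its sources does not change the result. The types and the function name are hypothetical stand-ins, not the driver's fs_inst or fs_reg.

```cpp
#include <cassert>
#include <utility>

// Hypothetical, much-reduced stand-ins for a source operand and a
// commutative two-source instruction (for example ADD or MUL).
struct Operand {
    bool is_immediate;
    float value;
};

struct TwoSrcInst {
    Operand src[2];
};

// If src[0] is an immediate and src[1] is not, exchange them so the
// immediate ends up last. Returns true when the instruction changed,
// so the caller can report progress. The both-immediate case is left
// alone here, on the assumption that a later pass folds it.
static bool move_immediate_to_last_source(TwoSrcInst &inst)
{
    if (inst.src[0].is_immediate && !inst.src[1].is_immediate) {
        std::swap(inst.src[0], inst.src[1]);
        return true;
    }
    return false;
}
```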
547c760964bcad23a056e5156e4fefd7487c0192 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Use NIR for scalar VS when INTEL_USE_NIR is set.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6ac1bc90c4a7a6f32901a9782e14b090f6fe5270 10-Mar-2015 Iago Toral Quiroga <itoral@igalia.com> i965: Fix out-of-bounds accesses into pull_constant_loc array

The piglit test glsl-fs-uniform-array-loop-unroll.shader_test was designed
to do an out of bounds access into a uniform array to make sure that we
handle that situation gracefully inside the driver, however, as Ken describes
in bug 79202, Valgrind reports that this is leading to an out-of-bounds access
in fs_visitor::demote_pull_constants().

Before accessing the pull_constant_loc array we should make sure that
the uniform we are trying to access is valid.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79202
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
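The guard described in the entry above amounts to validating the uniform index before using it as an array subscript. A rough sketch, with a std::vector standing in for the driver's pull_constant_loc array and -1 chosen here as a hypothetical "not a pull constant" marker:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Look up where a uniform was placed in the pull-constant buffer.
// Returns -1 (an assumption of this sketch) when the index does not
// name a valid uniform, instead of reading past the end of the array.
static int lookup_pull_constant_loc(const std::vector<int> &pull_constant_loc,
                                    std::size_t uniform)
{
    if (uniform >= pull_constant_loc.size())
        return -1;
    return pull_constant_loc[uniform];
}
```

A caller would then skip the demotion for any uniform where the lookup reports -1.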
4ebeb71573ad44f7657810dc5dd2c9030e3e63db 27-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Make emit_shader_time_end() insert before EOT.

Previously, we emitted the shader-time epilogue from emit_fb_writes(),
during the middle of looping through color regions (or emit_urb_writes
for the VS). This is duplicated several times and rather awkward.

I need to fix a bug in our FB write handling, and it will be a lot
easier if we move emit_shader_time_end() out of there.

Now, we simply emit FB writes/URB writes, and subsequently have
emit_shader_time_end() insert instructions before the final SEND with
EOT. Not only is this simpler, it's actually a slight improvement:
we now include the MOVs to set up the final FB write payload in our
shader-time measurements.

Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses
send-from-GRF. (In the past, we might have hit trouble where both
attempt to use MRFs for messages; that's not a problem now.)

v2: Rebase on v3 of the previous patch and other shader_time fixes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1]
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e43af8d09f919d02b5ac0810c1c0f1783cbef6ef 27-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Make get_timestamp() pass back the MOV rather than emitting it.

This makes another part of the INTEL_DEBUG=shader_time code emittable
at arbitrary locations, rather than just at the end of the instruction
stream.

v2: Don't lose smear! Caught by Topi Pohjolainen.
v3: Don't set smear on the destination of the MOV. Thanks Topi!

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bea854c7f33cc10b8292f931f114afc4f88a8dd4 27-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Make emit_shader_time_write return rather than emit.

Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)).
The advantage is that we can also insert a shader time write at an
arbitrary location in the instruction stream, rather than being
restricted to emitting at the end.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f1adc45dbe649cdd4538fb96f6d2a27328bbfba1 08-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Set smear on shader_time diff register.

The ADD(diff, diff, fs_reg(-2u)) instruction reads diff, which is a
width 1 register. We need to read it as <0,1,0> with a subreg of 0,
which is what smear accomplishes.

Fixes assertion:
brw_eu_emit.c:285: validate_reg: Assertion `hstride == 0' failed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ef9cc7d0c176669c03130abf576f2b700be39514 08-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Set force_writemask_all on shader_time instructions.

These computations don't have anything to do with the currently
executing channels, so they should use force_writemask_all.

This fixes assert failures.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f9779e4a8f2ca67423cded0203adac6ad3d5c448 28-Feb-2015 Ian Romanick <ian.d.romanick@intel.com> i965/fs: Silence unused parameter warning

Unused since b18fd23.

brw_fs.cpp:2878:44: warning: unused parameter 'dispatch_width' [-Wunused-parameter]
clear_deps_for_inst_src(fs_inst *inst, int dispatch_width, bool *deps,
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a84f66a9b6cf46bb19ca71faca5b1d6d81209caf 06-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Resolve source modifiers on Gen8+ logic operations.

On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and
negate changes meaning to bitwise-not (~, not -). This isn't what NIR
expects, so we should resolve the source modifiers via a MOV.

+30 Piglits (fs-op-bit{and,or,xor}-not-abs-*).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
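To see why the modifier has to be resolved first, the two readings of "negate" on a logic operation can be compared on plain integers. This is only an arithmetic illustration; the function names are made up, and the temporary models the extra MOV.

```cpp
#include <cassert>
#include <cstdint>

// What a Gen8+ AND does when its first source carries the negate
// modifier: the modifier is interpreted as bitwise NOT.
static uint32_t and_with_modifier_on_gen8(uint32_t a, uint32_t b)
{
    return ~a & b;
}

// What the IR meant: arithmetic negation. Resolving the modifier
// into a temporary first (the MOV) and then doing a plain AND
// preserves that meaning.
static uint32_t and_with_resolved_modifier(uint32_t a, uint32_t b)
{
    const uint32_t tmp = 0u - a; // MOV tmp, -a
    return tmp & b;              // AND dst, tmp, b
}
```

For a = 1 and b = 0xFF the first form yields 0xFE and the second 0xFF, which is the kind of mismatch the commit describes.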
237dcb4aa7c39c59bfd225ae3d73caf709be216d 05-Mar-2015 Mark Janes <mark.a.janes@intel.com> Fix invalid extern "C" around header inclusion.

System headers may contain C++ declarations, which cannot be given C
linkage. For this reason, include statements should never occur
inside extern "C".

This patch moves the C linkage statements to enclose only the
declarations within a single header.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e214000f258ae564e64d839cccee9418526f226b 14-Jan-2015 Matt Turner <mattst88@gmail.com> i965/fs: Don't use backend_visitor::instructions after creating the CFG.

This is a fix for a regression introduced in commit a9f8296d ("i965/fs:
Preserve the CFG in a few more places.").

The errata this code works around is described in a comment before the function:

"[DevBW, DevCL] Errata: A destination register from a send can not be
used as a destination register until after it has been sourced by an
instruction with a different destination register."

The framebuffer write's sources must be in message registers, which SEND
instructions cannot have as a destination. There's no way for this
errata to affect anything at the end of the program. Just remove the
code.

Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84613
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
49a7f8c919d23fec977116f218780a35896cc1dd 28-Feb-2015 Brian Paul <brianp@vmware.com> i965: replace Elements() with ARRAY_SIZE()

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ee3f6745723856419d7f5ecb17652e19855c4caa 06-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Remove redundant discard jumps.

With the previous optimization in place, some shaders wind up with
multiple discard jumps in a row, or jumps directly to the next
instruction. We can remove those.

Without NIR on Haswell:
total instructions in shared programs: 5777258 -> 5775872 (-0.02%)
instructions in affected programs: 20312 -> 18926 (-6.82%)
helped: 716

With NIR on Haswell:
total instructions in shared programs: 5773163 -> 5771785 (-0.02%)
instructions in affected programs: 21040 -> 19662 (-6.55%)
helped: 717

v2: Use the CFG rather than the old instructions list. Presumably
the placeholder halt will be in the last basic block.

v3: Make sure placeholder_halt->prev isn't the head sentinel (caught
twice by Eric Anholt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
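A simplified model of the clean-up described in the entry above: any run of discard jumps sitting directly in front of the placeholder halt only jumps to the next instruction, so it can be dropped. The flat instruction vector and the opcode names are assumptions for illustration; the real pass walks the CFG.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Other, DiscardJump, PlaceholderHalt };

// Remove the run of DiscardJump instructions that immediately
// precedes the placeholder halt. Returns true if anything was removed.
static bool remove_redundant_discard_jumps(std::vector<Op> &insts)
{
    // Locate the placeholder halt, if there is one.
    std::size_t halt = insts.size();
    for (std::size_t i = 0; i < insts.size(); ++i) {
        if (insts[i] == Op::PlaceholderHalt) {
            halt = i;
            break;
        }
    }
    if (halt == insts.size())
        return false;

    // Walk backwards over consecutive discard jumps, stopping at the
    // start of the list (the analogue of not stepping onto a sentinel).
    std::size_t first = halt;
    while (first > 0 && insts[first - 1] == Op::DiscardJump)
        --first;
    if (first == halt)
        return false;

    insts.erase(insts.begin() + static_cast<std::ptrdiff_t>(first),
                insts.begin() + static_cast<std::ptrdiff_t>(halt));
    return true;
}
```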
34c93fd7f119fa824062e05377de849b8a2da0e6 04-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix lower_load_payload() not to use an incorrect half for immediates and uniforms.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ea7b4d25c8da352f4ca0dcaefa4fadb9e202636e 06-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix lower_load_payload() to take into account non-zero reg_offset.

Fixes metadata guess when instructions in the program specify a
destination register with non-zero reg_offset and when the payload of
a LOAD_PAYLOAD spans several registers.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
08b4c8f7bf2cc2fe914a07a32bf4961894593e72 04-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().

MRFs cannot be read from anyway so they cannot possibly be a valid
source of LOAD_PAYLOAD.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8e47f51a5a7aba2bb56e7185988072431444d811 17-Jan-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Less broken handling of force_writemask_all in lower_load_payload().

It's perfectly fine to read the second half of a register written with
force_writemask_all from a first half MOV instruction or vice versa, and
lower_load_payload shouldn't mark the whole MOV as belonging to the second
half in that case. Replicate the same metadata to both halves of the
destination when writemasking is disabled.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6e62a52865787362ae1deb9dee80140d3a66c519 20-Feb-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965/skl: Use 1 register for uniform pull constant payload

When under dispatch_width=16 the previous code would allocate 2 registers for
the payload when only one is needed. This manifested itself through bugs on SKL
which needs to mess with this instruction.

Ken thought this might impact shader-db, but apparently it doesn't.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Timo Aaltonen <timo.aaltonen@canonical.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c442d0961e4ec6dcc304d652b637bb60687ce3cb 14-Aug-2014 Dave Airlie <airlied@gmail.com> i965: just avoid warnings with fp64

This just fills in some blanks to avoid warnings in the i965 driver.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2bd139e18c941e7ea0870ba43314a5c10fd5bb12 19-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Un-hardcode DEBUG_WM, "FS", and "fragment".

These code paths can (or will) be used for other shader stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bb33a31c3830945ae768ebdaeb686291bdf897fa 10-Nov-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add algebraic optimizations for MAD.

total instructions in shared programs: 5764176 -> 5763808 (-0.01%)
instructions in affected programs: 25121 -> 24753 (-1.46%)
helped: 164
HURT: 2

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2dad1e3abdb1ad153289455f3e273101e5bac1a8 12-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add pass to combine immediates.

total instructions in shared programs: 5885407 -> 5940958 (0.94%)
instructions in affected programs: 3617311 -> 3672862 (1.54%)
helped: 3
HURT: 23556
GAINED: 31
LOST: 165

... but will allow us to always emit MAD instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7a83f7d4814c9216316a742e97c33259f7b3ae76 09-Jan-2015 Matt Turner <mattst88@gmail.com> i965/fs: Handle W/UW-type immediates in dump_instructions().
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
74ef90acd751fc91ba9e20c2f16871fa9bf140e0 13-Feb-2015 Matt Turner <mattst88@gmail.com> i965: Let dump_instructions() work before calculate_cfg().

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fa124a337ca10d2c5d2d81a89dc8c21a7ba2f58b 13-Feb-2015 Matt Turner <mattst88@gmail.com> i965/fs: Call calculate_cfg() before optimize().

The CFG is fundamental to the FS IR, not merely a piece of optimization.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eb47d0efd39d73d4388389d6c0ebe458160f79fa 05-Feb-2015 Matt Turner <mattst88@gmail.com> i965: Optimize multiplication by -1 into a negated MOV.

instructions in affected programs: 968 -> 942 (-2.69%)
helped: 4

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
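The rewrite named in the entry above can be sketched on a toy instruction form: a MUL whose second source is the immediate -1 becomes a MOV of the first source with its negate modifier toggled. All types and field names here are hypothetical.

```cpp
#include <cassert>

enum class Opcode { MUL, MOV };

// Toy source operand: either a register with an optional negate
// modifier, or a float immediate.
struct Src {
    bool is_immediate;
    float imm;
    int reg;
    bool negate;
};

struct Inst {
    Opcode op;
    Src src[2];
};

// MUL dst, x, -1.0  ->  MOV dst, -x. Returns true on progress.
static bool opt_mul_by_negative_one(Inst &inst)
{
    if (inst.op != Opcode::MUL)
        return false;
    if (!inst.src[1].is_immediate || inst.src[1].imm != -1.0f)
        return false;

    inst.op = Opcode::MOV;
    inst.src[0].negate = !inst.src[0].negate; // fold the sign into the modifier
    inst.src[1] = Src{};                      // second source is no longer used
    return true;
}
```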
09d6ea9ae3c487be20fb3157368003d30856d3bc 11-Feb-2015 Matt Turner <mattst88@gmail.com> i965/fs: Remove conditional mod when optimizing a SEL into a MOV.

Missed in commit ca675b73, but got right in the companion commit 3c28b2c0.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3df2cb2f863836ec909f5259693c1eeef675a594 03-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix fs_inst::regs_written calculation for instructions with scalar dst.

Scalar registers are required to have zero stride, fix the
regs_written calculation not to assume that the instruction writes
zero registers in that case.

v2: Rename CEILING() to DIV_ROUND_UP(). (Matt, Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
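The scalar-destination problem in the entry above is easy to reproduce with arithmetic: multiplying the width by a zero stride gives zero bytes, hence zero registers. The sketch below clamps the element count to at least one before rounding up; the exact formula, the helper names, and the parameter list are assumptions, with only the 32-byte register size and the DIV_ROUND_UP name taken as given.

```cpp
#include <algorithm>
#include <cassert>

// Integer division rounding towards positive infinity.
static unsigned div_round_up(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

// Number of 32-byte registers written by an instruction whose
// destination has the given execution width, stride (in elements)
// and element size (in bytes). A stride of 0 denotes a scalar.
static unsigned regs_written(unsigned width, unsigned stride, unsigned type_size)
{
    const unsigned elements = std::max(width * stride, 1u);
    return div_round_up(elements * type_size, 32u);
}
```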
f2668f9f214201503419342b980d3afa8b796926 06-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix stack allocation of fs_inst and stop stealing src array provided on construction.

Using 'ralloc*(this, ...)' is wrong if the object has automatic
storage or was allocated through any other means. Use normal dynamic
memory instead.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a3ee6c7d1991a90d22fae992c1cb94123e51ae54 06-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove dependency of fs_inst on the visitor class.

The fs_visitor argument of fs_inst::regs_read() wasn't used at all.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
447879eb88b8df41ad32cf4406cc636b112b72d9 10-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Factor out virtual GRF allocation to a separate object.

Right now virtual GRF book-keeping and allocation is performed in each
visitor class separately (among a hundred other things),
leading to duplicated logic in each visitor and preventing layering as
it forces any code that manipulates i965 IR and needs to allocate
virtual registers to depend on the specific visitor that happens to be
used to translate from GLSL IR.

v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d4a461caaf00ae13b83f106f032d3f4125687a02 15-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix INTEL_DEBUG=shader_time for SIMD8 VS (and GS).

We were incorrectly attributing VS time to FS8 on Gen8+, which now use
fs_visitor for vertex shaders.

We don't hit this for geometry shaders yet, but we may as well add
support now - the fix is obvious, and we'll just forget later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6b3a301f611c9aabc090522951eda589e8302562 07-Jan-2015 Matt Turner <mattst88@gmail.com> i965: Set CMP's destination type to src0's type.

Allows CMP instructions with float sources to be compacted and coissued.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
94e7b59a75fc2ecc51a74196f6cd198546603b85 05-Jan-2015 Matt Turner <mattst88@gmail.com> i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.

total instructions in shared programs: 5952059 -> 5951603 (-0.01%)
instructions in affected programs: 138812 -> 138356 (-0.33%)
GAINED: 1
LOST: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
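The identity behind the entry above is that -|x| can only be greater than or equal to zero when x itself is zero. A small check on floats, with function names invented for this sketch:

```cpp
#include <cassert>
#include <cmath>

// The original comparison: CMP.GE with a negated absolute-value source.
static bool cmp_ge_neg_abs_zero(float x)
{
    return -std::fabs(x) >= 0.0f;
}

// The replacement comparison: CMP.Z on the unmodified source.
static bool cmp_z_zero(float x)
{
    return x == 0.0f;
}
```

Both forms agree for positive and negative values, for both signed zeros, and for NaN (where each comparison is false).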
19f9cb72c8b95febd53b80de137e7bf716fb45f1 22-Aug-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add pass to propagate conditional modifiers.

total instructions in shared programs: 5974160 -> 5959463 (-0.25%)
instructions in affected programs: 1743737 -> 1729040 (-0.84%)
GAINED: 0
LOST: 12

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eed7223243c35bba092dc0b26e592f6af1ba3fd7 30-Dec-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add a pass to fixup 3-src instructions that have a null dest.

3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give them a destination before register allocation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a5ca86a9833d6fd57ee609d8d1e630dc66ebd371 16-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Enable SIMD16 support in the NIR FS backend.

With the previous commits in place, it just works.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3f263ffbb37d77f97a86686e1d2d5eeabf4ecae6 16-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...).

brw_fs_nir.cpp creates almost all of its registers via:

fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components));

When we add SIMD16 support, we'll need to set reg->width = 16 and
double the VGRF size...on pretty much every VGRF it allocates.

This patch replaces that pattern with a new "vgrf" helper method:

fs_reg reg = vgrf(num_components);

The new function correctly takes reg_width into account. For now,
reg_width is always 1, so this should have no functional change.

v2: Just make vgrf() account for reg_width right away, rather than
changing the behavior in the next patch.

v3: Replace one last virtual_grf_alloc I missed. It's used in code
that only runs for dispatch_width == 8, so it doesn't matter,
but consistency is nice.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
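A rough sketch of what a width-aware vgrf() helper like the one in the entry above might look like. The allocator, the register struct, and the rule reg_width = dispatch_width / 8 are assumptions for illustration rather than the driver's definitions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical allocator: remembers the size (in registers) of each
// virtual GRF and hands back its index.
struct VgrfAllocator {
    std::vector<unsigned> sizes;

    unsigned allocate(unsigned size_in_regs)
    {
        sizes.push_back(size_in_regs);
        return static_cast<unsigned>(sizes.size() - 1);
    }
};

struct Reg {
    unsigned nr;    // virtual register number
    unsigned width; // channels covered by the register
};

struct Visitor {
    unsigned dispatch_width; // 8 or 16
    VgrfAllocator alloc;

    // One call site instead of repeating the allocation pattern
    // everywhere: the helper scales the size and sets the width.
    Reg vgrf(unsigned num_components)
    {
        const unsigned reg_width = dispatch_width / 8;
        const unsigned nr = alloc.allocate(num_components * reg_width);
        return Reg{nr, dispatch_width};
    }
};
```

Centralising the scaling means a later switch to SIMD16 touches the helper rather than every allocation site.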
d1533d87cc7e2c39e7ce9dc838b45a2c39c96e33 16-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type).

I dislike how fs_reg has a constructor that knows about fs_visitor.
Apart from that, it stands alone, with no need to interact with the
rest of the compiler. Which is sensible - a class that represents
a register should do just that. Allocating virtual register numbers
should be left up to the compiler (fs_visitor).

This patch replaces the constructor with a new fs_visitor::vgrf method,
eliminating fs_reg's dependency on fs_visitor. It ends up being no
more code.

v2: Rebase from May 2014 -> January 2015.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
faaca237341abc0f784edfb16df50104110365b8 16-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Make lower_load_payload etc. appear in INTEL_DEBUG=optimizer.

In order to support calling lower_load_payload() inside a condition,
this patch makes OPT() a statement expression:

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

We recently did the equivalent change in the vec4 backend (commit
9b8bd67768769b685c25e1276e053505aede5f93).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
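The statement-expression form mentioned in the entry above lets a pass-running macro yield a value, so it can appear inside a condition. The sketch below relies on the GNU extension (accepted by GCC and Clang, not by every compiler); the macro body and the pass functions are simplified inventions, not the driver's OPT().

```cpp
#include <cassert>

static int passes_run = 0;

// Two stand-in passes: one reports progress, the other does not.
static bool pass_that_helps()        { ++passes_run; return true; }
static bool pass_that_does_nothing() { ++passes_run; return false; }

// GNU statement expression: the value of the whole ({ ... }) is the
// value of its last expression, here whether this pass made progress.
#define OPT(pass, progress) ({                     \
    bool this_progress = pass();                   \
    (progress) = (progress) || this_progress;      \
    this_progress;                                 \
})

static bool run_passes()
{
    bool progress = false;

    OPT(pass_that_does_nothing, progress);

    // Because OPT() has a value, a follow-up pass can be made
    // conditional on the first one having changed something.
    if (OPT(pass_that_helps, progress)) {
        OPT(pass_that_does_nothing, progress);
    }

    return progress;
}
```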
b1fe8604c6b679768e880b5e1d7f18b92067721b 21-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Don't take an ir_variable for emit_general_interpolation

Previously, emit_general_interpolation took an ir_variable and pulled the
information it needed from that. This meant that in fs_fp, we were
constructing a dummy ir_variable just to pass into it. This commit makes
emit_general_interpolation take only the information it needs and gets rid
of the fs_fp cruft.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ae2880d131e3197114940fc7028397079840f97d 15-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Only use nir for 8-wide non-fast-clear shaders.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2faf7f87d6a1c00b3f3d3907178a2eeeefa5d2a9 15-Aug-2014 Connor Abbott <connor.abbott@intel.com> i965/fs: add a NIR frontend

This is similar to the GLSL IR frontend, except consuming NIR. This lets
us test NIR as part of an actual compiler.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Make brw_fs_nir build again
Only use NIR if INTEL_USE_NIR is set
whitespace fixes
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
616a48ebc6b858cf15ade15238f1a549b701ebc3 05-Aug-2014 Connor Abbott <connor.abbott@intel.com> i965/fs: make emit_fragcoord_interpolation() not take an ir_variable
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
68ed14d6adcaf4b91216fc1c53792e88d1fd024d 13-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Pass a shader stage abbreviation to fs_generator().

A lot of messages hardcoded the string "FS", which is confusing on
Broadwell, where we use this code for VS support as well.

shader-db particularly got confused, as it reported two "FS SIMD8"
shaders, and no vertex shaders at all. Craziness ensued.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0ac4c272755c75108a10a84ce33bf6a6234985d3 10-Dec-2014 Kristian Høgsberg <krh@bitplanet.net> i965/skl: Always use a header for SIMD4x2 sampler messages

SKL+ overloads the SIMD4x2 SIMD mode to mean either SIMD8D or SIMD4x2
depending on bit 22 in the message header. If the bit is 0 or there is
no header we get SIMD8D. We always want SIMD4x2 in vec4 and for fs pull
constants, so use a message header in those cases and set bit 22 there.

Based on an initial patch from Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
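Selecting the mode as the entry above describes comes down to setting one bit in a header word. The sketch only shows the bit manipulation; which header dword carries bit 22, and the constant and function names, are not stated in the log and are placeholders here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name for the selector bit the commit message refers to.
constexpr uint32_t SIMD4X2_SELECT_BIT = 1u << 22;

// Return the header word with the SIMD4x2 selector set, leaving every
// other bit untouched.
static uint32_t select_simd4x2(uint32_t header_word)
{
    return header_word | SIMD4X2_SELECT_BIT;
}

// True when the header word asks for SIMD4x2; a clear bit (or, per the
// commit message, no header at all) means SIMD8D.
static bool is_simd4x2(uint32_t header_word)
{
    return (header_word & SIMD4X2_SELECT_BIT) != 0;
}
```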
0b98b2bf535d6e6b6b02c0d47ea03f98adf42f15 01-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+.

Gen7.5+ platforms that support the "Shader Channel Select" feature leave
key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE
is GL_ALPHA (which is really uncommon). So, the precompile should leave
them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well.

We didn't notice this because prog->ShadowSamplers is not set correctly.
The next patch will fix that problem.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
408e298942ffb03c00e05dce2569c291df6bec49 01-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix INTEL_DEBUG=optimizer with VF types.

Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7bc6e455e231076bfac6c678c375ea4aca94ebf0 21-Dec-2014 Matt Turner <mattst88@gmail.com> i965: Add support for saturating immediates.

I don't feel great about assert(!"unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3978585bccf69ff8f607cad0de025ea91c418587 20-Dec-2014 Matt Turner <mattst88@gmail.com> i965: Add fs_reg/src_reg constructors that take vf[4].

Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.

I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a5481d6fbba9bcaa0c7d49ae0a3580fee21041a6 19-Dec-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add missing const qualifier.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fc016bc0f3d83bbf3eb968938f4bc9df55214ecd 16-Dec-2014 Mark Janes <mark.a.janes@intel.com> i965: remove includes of sampler.h from extern "C" blocks

C linkage was removed from functions in program/sampler.cpp. However,
some cpp files include program/sampler.h within extern "C" blocks,
causing link errors for test_vec4_copy_propagation.

Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7ff457b93028d1884c7952080edd919008edf141 28-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Clean up fs_visitor::run and rename to run_fs

Now that fs_visitor::run is back to being only fragment
shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT
conditions and rename it to run_fs.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8b6a797d743be38396fcaf4a2f7fb01d3bcd9ba3 28-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Add fs_visitor::run_vs() to generate scalar vertex shader code

This patch uses the previous refactoring to add a new run_vs() method
that generates vertex shader code using the scalar visitor and
optimizer.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3d10f0a98c6169dcf4b1a001e624b489abca8298 21-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Prepare for using the ATTR register file in the fs backend

The scalar vertex shader will use the ATTR register file for vertex
attributes. This patch adds support for the ATTR file to fs_visitor.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d9e29f5d88d2ddd8ee9d10b7d88377a60fd0094f 21-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Add SIMD8 URB write low-level IR instruction

This is all we need from the generator for SIMD8 vertex shaders. This
opcode is just the send instruction, all the hard work will happen
in the visitor using LOAD_PAYLOAD.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
686ef091a4f76fa68d9d9cd5ef00f40c1416a5da 28-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Remove shader program argument and member from fs_generator

Now that the caller passes in the shader debug name, we don't need this
anymore.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9a1af7b31824ca573b2609434cf8299bfc9bc5e2 28-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Set shader name for generator from call site

fs_generator no longer knows what stage it's generating code for, so
we have to set the debug name of the shader from the call site.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7bb9d33b8d6ecc03670078c3f9623f188135abb7 21-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Generalize fs_generator further

This removes all stage specific data from the generator, and lets us
create a generator for any stage.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
092c73a7c32b240a26ffeab2ee475f6d590540b2 06-Dec-2014 Chris Forbes <chrisf@ijw.co.nz> i965: Fix regs read for FS_OPCODE_INTERP_PER_SLOT_OFFSET

Dead code elimination was eating the Y offset.

Fixes the piglit test:
spec/ARB_gpu_shader5/arb_gpu_shader5-interpolateAtOffset-nonconst

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2881b123d00562fee8b7d2b4f7825f89a73e0d9f 02-Dec-2014 Matt Turner <mattst88@gmail.com> i965: Use ~0 to represent true on all generations.

Jason realized that we could fix the result of the CMP instruction on
Gen <= 5 by doing -(result & 1). Also do the resolves in the vec4
backend before use, rather than when the bool was created. The FS does
this and it saves some unnecessary resolves.

On Ironlake:

total instructions in shared programs: 4289762 -> 4287277 (-0.06%)
instructions in affected programs: 619430 -> 616945 (-0.40%)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
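The -(result & 1) trick from the entry above can be checked on plain integers: masking keeps only the low bit, and negating 1 in two's complement gives all ones. The function name is invented, and the garbage upper bits in the usage are just example values.

```cpp
#include <cassert>
#include <cstdint>

// Turn a comparison result whose only trustworthy bit is bit 0 into a
// canonical boolean: 0 for false, ~0 (all bits set) for true.
static uint32_t resolve_bool(uint32_t cmp_result)
{
    return 0u - (cmp_result & 1u);
}
```

Any upper bits in the input are discarded by the mask, so resolve_bool(0xABCD0001) and resolve_bool(1) both give 0xFFFFFFFF.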
2a4f5728ad27bd1605b3604908caa9ad4983e256 01-Dec-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Remove "disable_derivative_optimization" driconf option.

This was added in September 2013 when we first implemented the fast
(but lower quality) derivatives. A quick Google search didn't turn
up anyone using or recommending the option, so I suspect no one does.

Applications that want to control the quality of their derivatives can
use the new GL_ARB_derivative_control extension, or use the glHint
mechanism. The driconf option seems superfluous.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b449366587b5f3f64c6fb45fe22c39e4bc8a4309 03-Nov-2014 Matt Turner <mattst88@gmail.com> i965/fs: Remove opt_drop_redundant_mov_to_flags().

Dead code elimination now handles this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f1e5418f402c7ac087b1c127cb4476d0d02e0073 12-Nov-2014 Matt Turner <mattst88@gmail.com> i965: Don't treat IF or WHILE with cmod as writing the flag.

Sandybridge's IF and WHILE instructions can do an embedded comparison
with conditional mod.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
133280120b4bc714bbb7665e383f36ab262c280a 08-Nov-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Set prog_data->uses_kill if simulating alpha test via discards.

When using MRT on Gen4-5, we have to simulate GL's alpha test feature
by emitting discards in the fragment shader. In this case, it makes
sense to set prog_data->uses_kill, which means the fragment shader may
kill pixels via the discard mechanism.

This saves us from having to look at an extra key value in a couple of
places, including in the generator.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5d23721c1df3d1a05c49b705f0d63e409c89d25f 09-Mar-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add vector float immediate infrastructure.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b55777f39d00a0c54023eba012d326ff09fa530b 24-Nov-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Make precompile functions accessible from C.

Previously, the prototypes for brw_vs/gs/fs_precompile were scattered
between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also,
brw_fs_precompile had C++ linkage, while the others were C.

This patch moves all the prototypes to a central location (brw_shader.h)
and makes brw_fs_precompile have C linkage.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
62b425448ca92f568a571e656133e6d234434b4c 24-Nov-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Pass gl_program pointers into precompile functions.

We'd like to do precompiling for ARB vertex and fragment programs,
which only have gl_program structures - gl_shader_program is NULL.

This patch makes the various precompile functions take a gl_program
parameter directly, rather than accessing it via gl_shader_program.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
40c0d79d295657f30cb86b002003800844851703 12-Nov-2014 Matt Turner <mattst88@gmail.com> i965/fs: Remove is_valid_3src().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1fdc75fde418a231a91ef0e68ea92d54bf594ea1 12-Nov-2014 Matt Turner <mattst88@gmail.com> i965/fs: Remove unused apply_stride().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a4ffc2a445055a81a655e64d57ee393a14a2eb16 14-Nov-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers()

This will be reused for the scalar VS pass.

v2 (Ken): Rebase on master.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c50f2dadc588caa0cb350f44febce56d76d60ccb 14-Nov-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Move fs_visitor optimization pass into new method fs_visitor::optimize()

We'll reuse this toplevel optimization driver for the scalar VS.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5c4efc644e731178a07bb41c55cf96425166993f 14-Nov-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Move more code into codegen-branch of the fs_visitor::run() if statement

These last few operations all only apply when we've actually generated
code, optimized and allocated registers. The dummy and the repclear
shaders don't need the gen4 send workaround, and don't spill. This
means we can move these lines into the else-branch, which will make
the following refactoring easier.

v2 (Ken): Rebase on master, which removed the uncompressed stack.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f2bb655ac75d04dc033546479aabbbf4112cc54e 14-Nov-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Refactor fs_generator API

We split out SIMD8 and SIMD16 generation into separate calls to the
new method generate_code(), which returns the start offset for the
generated code. A new get_assembly() method returns the generated code.

This avoids asserting MESA_SHADER_FRAGMENT and accessing wm_prog_data
in the generator.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
497122a338e8a259abb43a71e79c1475fd44ce65 31-Oct-2014 Matt Turner <mattst88@gmail.com> i965/fs: Remove force uncompressed stack.

Last use was in shader_time.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7e19e6c877714e05e65ca2cecd1c782fdc260cb6 31-Oct-2014 Matt Turner <mattst88@gmail.com> i965/fs: Use execution size of 1 for some shader_time operations.

The ADDs depended on dispatch_width, which really isn't what we wanted.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ee7e6009a94d070f58a52001780d295798a28073 31-Oct-2014 Matt Turner <mattst88@gmail.com> i965/fs: Use mov(4) instructions to read timestamp.

We only want fields 0-2.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
799106d38734a867bf33add2994cb9d414d965e7 29-Oct-2014 Matt Turner <mattst88@gmail.com> i965/fs: Don't compute_to_mrf() on Gen >= 7.

No differences in shader-db on Haswell (Gen 7.5).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7d560a3861ff30aa9d8ec872cf9cd7d72a980eb2 21-Oct-2014 Ian Romanick <ian.d.romanick@intel.com> i965: Silence unused parameter warning in brw_dump_ir

Just remove the parameter. Silences:

brw_program.c: In function 'brw_dump_ir':
brw_program.c:566:33: warning: unused parameter 'brw' [-Wunused-parameter]

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
40492be2a4a339b02c38990ad8736644f3a8776b 24-Oct-2014 Matt Turner <mattst88@gmail.com> i965/fs: Silence uninitialized variable warning.

The compiler isn't privy to the knowledge that we're doing at least one
framebuffer write.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ffe582aa2076bc06f9d06e36287bdded45ab5b98 17-Oct-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Don't pass ir_variable * to emit_sampleid_setup().

gl_SampleID is a built-in variable that always is of type "int".

Suggested by Connor Abbott.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
50d0e2e118fb3e42dc83c83de34da3eac0a0d8a1 01-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add a MAX_GRF_SIZE define and use it various places

Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead.
However, some FB write messages can validly be longer than this so we need
something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on
its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for
FB writes.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eedbce9c63a3f385908bdc8a69e8be98dd3522ff 01-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fix the build
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
83669fac9d6d3f0633d19dcfebe7cf0286e69ab7 01-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fix an uninitialized value warning

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
94b68109fbe1cb60cc23a4c5a319039ada81ea81 27-Sep-2014 Matt Turner <mattst88@gmail.com> i965/fs: Optimize sqrt+inv into rsq.

Transform

sqrt a, b
rcp c, a

into

sqrt a, b
rsq c, b

The improvement here is that we've broken a dependency between these
instructions. Leads to 330 fewer INV instructions and 330 more RSQ.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
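The sqrt+rcp to rsq rewrite described in the commit above can be sketched on a toy instruction list. This is a minimal sketch under stated assumptions: Op, Inst and opt_rcp_of_sqrt are hypothetical names, registers are plain integers, and the real pass works on the driver's own IR rather than on a std::vector.

```cpp
#include <cstddef>
#include <vector>

// Toy IR for illustration only; these types are assumptions.
enum class Op { SQRT, RCP, RSQ, OTHER };

struct Inst {
    Op op;
    int dst;
    int src;
};

// Rewrites "rcp c, a" into "rsq c, b" when a was last written by
// "sqrt a, b" and b has not been overwritten since.
// Returns the number of instructions rewritten.
inline int opt_rcp_of_sqrt(std::vector<Inst> &insts)
{
    int progress = 0;
    for (std::size_t i = 0; i < insts.size(); i++) {
        if (insts[i].op != Op::RCP)
            continue;

        // Find the most recent write to the RCP source.
        std::size_t def = i;
        for (std::size_t j = i; j-- > 0;) {
            if (insts[j].dst == insts[i].src) {
                def = j;
                break;
            }
        }
        if (def == i || insts[def].op != Op::SQRT)
            continue;
        if (insts[def].dst == insts[def].src)
            continue; // "sqrt a, a" destroys the original operand

        // The SQRT source must still hold its value at the RCP.
        bool clobbered = false;
        for (std::size_t k = def + 1; k < i; k++)
            clobbered = clobbered || insts[k].dst == insts[def].src;
        if (clobbered)
            continue;

        insts[i].op = Op::RSQ;
        insts[i].src = insts[def].src;
        progress++;
    }
    return progress;
}
```

The SQRT is left in place, as in the commit message; the rewritten instruction simply no longer depends on its result.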
5aa8d8194c4975876276a9c57cdd672978a491ad 15-May-2014 Ian Romanick <ian.d.romanick@intel.com> glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private

Also move num_state_slots inside ir_variable_data for better packing.

The payoff for this will come in a few more patches.

No change in Valgrind massif results for a trimmed apitrace of dota2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4ddc25a8d4796316f0296eaa10eba26bd6dd1718 27-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Properly calculate the number of instructions in calculate_register_pressure

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
514fd1c55e617bb325979cbee4a89f0727c3b567 13-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the GRF for FB writes on gen >= 7

On gen 7, the MRF was removed and we gained the ability to do send
instructions directly from the GRF. This commit enables that
functionality for FB writes.

v2: Make handling of components more sane.

i965/fs: Force a high register for the final FB write

v2: Renamed the array for the range mappings and added a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1dd9b90ecd8e001b40febfb8908c0b9a0c08c7d5 17-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Handle COMPR4 in LOAD_PAYLOAD

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6d770ce93aacf29940bacb6fe2ae78cf716751dc 20-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add split_virtual_grfs and compute_to_mrf after lower_load_payload

If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then
we will need this.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9e1f52a6e2b0277de063a8d8b07c5e520795a23b 12-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d25aaf1cb1688b38b2a4025dbbff26d74291723c 12-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the GRF for UNTYPED_ATOMIC instructions

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
48ddd2889e15aaf8ddb6dff5d8b6dc275f7f3f8d 17-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use exec_size instead of force_uncompressed in dump_instruction

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b18fd234da275a0ec6b3c5cb77497a4c487c6366 16-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use instruction execution sizes instead of heuristics

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8f1adb59659617a682988bc503b8a0a7077abb84 05-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove unneeded uses of force_uncompressed

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5f41d052bf53e32761fb528f4be99a1af3a33ebc 20-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*

Now that we have execution sizes, we can use that instead of the
dispatch width. This way it also works for 8-wide instructions in
SIMD16.

i965/fs: Make effective_width a variable instead of a function

i965/fs: Preserve effective width in constant propagation

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6ba31cc000b096a3b1fe0e0a935a3ab2aa6803d2 12-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Better guess the width of LOAD_PAYLOAD

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
071ac3a467479ce1ada1b86e2f65d4cc7d07753e 14-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add an exec_size field to fs_inst

This will, eventually, allow us to manage execution sizes of
instructions in a much more natural way from the fs_visitor level.

i965/fs: Explicitly set instruction execute size a couple of places

i965/blorp: Explicitly set instruction execute sizes

Since blorp is all 16-wide and isn't, in general, very careful
about register width, we'll just set it all explicitly.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fbc0a798eef49c366437014134c59e16c39c7f95 30-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Determine partial writes based on the destination width

Now that we track both halves of a 16-wide vgrf, we no longer need to worry
about force_sechalf or force_uncompressed. The only real issue is if the
destination is too small.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7210583eb84a5d49803dbe37b0960373b4224d10 18-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode

This is actually the squash of a bunch of different changes. Individual
commit titles follow:

i965/fs: Always 2-align registers SIMD16 for gen <= 5

i965/fs: Use the register width when applying offsets

This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.

i965/fs: Change regs_read to be in hardware registers

i965/fs: Change regs_written to be actual hardware registers

i965/fs: Properly handle register widths in LOAD_PAYLOAD

The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.

i965/fs: Properly set writemasks in LOAD_PAYLOAD

i965/fs: Handle register widths in demote_pull_constants

i965/fs: Get rid of implicit register doubling in the allocator

i965/fs: Reserve enough registers for PLN instructions

i965/fs: Make sources and destinations interfere in 16-wide

i965/fs: Properly handle register widths in CSE

i965/fs: Properly handle register widths in register_coalesce

i965/fs: Properly handle widths in copy propagation

i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD

i965/fs: Properly handle register widths and odd register sizes in spilling

i965/fs: Don't waste a register on texture lookups for gen >= 7

Previously, we were wasting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
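The LOAD_PAYLOAD note in the commit above says that the offset of a source within the payload is the accumulation of the sizes of the previous sources. A minimal sketch of that prefix sum follows; payload_source_offsets is a hypothetical name, and sizes are assumed to be counted in whole registers.

```cpp
#include <vector>

// Hypothetical helper: given the size (in registers) of each payload
// source, return the register offset at which each source starts.
inline std::vector<int> payload_source_offsets(const std::vector<int> &source_regs)
{
    std::vector<int> offsets;
    offsets.reserve(source_regs.size());
    int next = 0;
    for (int regs : source_regs) {
        offsets.push_back(next); // starts where the previous sources end
        next += regs;
    }
    return offsets;
}
```

For sources of 1, 2, 2 and 1 registers, the starting offsets come out as 0, 1, 3 and 5.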
4232a776a699d80601496802ab2d817374a31f56 13-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Handle printing of registers better.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5390ca8ce93028d2d6016d4817e92427d09e4a21 25-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965: Explicitly set widths on gen5 math instruction destinations.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
004fbd53759a8993198883a32d93c9e3f6a65bbd 16-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make half() divide the register width by 2 and use it more

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
24d023b9fe18847158ec6c14e1e0e32ff022f060 13-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add a concept of a width to fs_reg

Every register in i965 assembly implicitly has a concept of a "width".
Usually, this is derived from the execution size of the instruction.
However, when writing a compiler it turns out that it is frequently
useful to have the width explicitly in the register and derive the
execution size of the instruction from the widths of the registers used in
it.

This commit adds a width field to fs_reg along with an effective_width()
helper function. The effective_width() function tells you how wide the
register effectively is when used in an instruction. For example, uniform
values have width 1 since the data is not actually repeated, but when used
in an instruction they take on the width of the instruction. However, for
some instructions (LOAD_PAYLOAD being the notable exception), the width is
not the same.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
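The distinction the commit above draws between a register's stored width and its effective width can be sketched as follows. This is an assumption-laden illustration: RegFile, Reg and this free-function form of effective_width are simplified stand-ins, and the LOAD_PAYLOAD exception mentioned in the message is deliberately left out.

```cpp
// Simplified stand-ins for illustration; not the driver's definitions.
enum class RegFile { GRF, UNIFORM };

struct Reg {
    RegFile file;
    int width; // number of channels actually stored in the register
};

// A uniform stores a single value (width 1) but behaves as if it were as
// wide as the instruction that reads it; other registers keep their width.
inline int effective_width(const Reg &reg, int exec_size)
{
    if (reg.file == RegFile::UNIFORM)
        return exec_size;
    return reg.width;
}
```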
4d5f0eb0487ad13e90f7248c95c023c35457eaf9 13-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Refactor fs_inst::is_send_from_grf()

A switch statement is much easier to read/edit than a big giant or
statement.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
72a3780f26951c405c35a1ae51598f7b0a65b92f 17-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Print BAD_FILE registers in dump_instruction

Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2af4b0aeaff53190b0e17a971119d1b77ddad25b 16-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make compact_virtual_grfs an optimization pass

Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shader because you
didn't have a sudden renaming. However, sometimes a bug is affected by
compact_virtual_grfs and, when this happens, you want to keep dumping
instructions with compact_virtual_grfs enabled. By turning it into an
optimization pass and dumping it along with the others, we retain the
ability to diff because you can just diff against the compact_virtual_grf
output.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a25db10c1248d70cf7f4097833fa03fdccd98fe8 10-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make immediate fs_reg constructors explicit

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f0d43c09b2fa32db66b7b6dc13becb0c7d3edeea 06-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use offset a lot more places

We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset do even more nifty things for us.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0089d025aa7f7497b3097c5067b589410cd40fbc 20-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: fix a comment in compact_virtual_grfs

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3dc3fccb7586e6198c50114d6245017fc9badde8 19-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Rewrite fs_visitor::split_virtual_grfs

The original vgrf splitting code was written with the assumption that vgrfs
came in two types: those that can be split into single registers and those
that can't be split at all. It was very conservative and bailed as soon as
more than one element of a register was read or written. This won't work
once we start allowing a regular MOV or ADD operation to operate on
multiple registers. This rewrite allows for the case where a vgrf of size
5 may appropriately be split into one register of size 1 and two registers
of size 2.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
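The splitting idea in the commit above (a vgrf of size 5 becoming pieces of size 1, 2 and 2) can be sketched as a boundary-marking pass. This is not the driver's implementation: Access and split_sizes are hypothetical names, and each access is assumed to be described by a starting register and a length in registers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one read or write of a virtual register.
struct Access {
    int offset; // first register touched within the vgrf
    int length; // number of consecutive registers touched
};

// Returns the sizes of the pieces the vgrf can be split into. A boundary
// between two registers survives only if no access spans it.
inline std::vector<int> split_sizes(int size, const std::vector<Access> &accesses)
{
    if (size <= 0)
        return {};

    // can_split[k] refers to the boundary just before register k.
    std::vector<bool> can_split(static_cast<std::size_t>(size), true);
    for (const Access &a : accesses) {
        for (int k = a.offset + 1; k < a.offset + a.length && k < size; k++) {
            if (k > 0)
                can_split[static_cast<std::size_t>(k)] = false;
        }
    }

    std::vector<int> sizes;
    int start = 0;
    for (int k = 1; k < size; k++) {
        if (can_split[static_cast<std::size_t>(k)]) {
            sizes.push_back(k - start);
            start = k;
        }
    }
    sizes.push_back(size - start);
    return sizes;
}
```

With one single-register access at offset 0 and two-register accesses at offsets 1 and 3, a size-5 vgrf splits into pieces of 1, 2 and 2, matching the example in the message.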
75afe17b7954984ea5b55c2a6d5d124f5eb03328 26-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Manually generate the meta fast-clear shader

Previously, we were generating the fast-clear shader from GLSL. The
problem is that fast clears require that we use a replicated write rather
than a regular write instruction. In order to get this we had a
complicated and somewhat fragile optimization pass that looked for places
where we can use a replicated write and used it. Since replicated writes
have a lot of restrictions, we only ever use them for fast-clear
operations.

This commit replaces the optimization pass with a function that just
generates the shader we want. This is a) less code, b) less fragile than
the optimization pass, and c) generates a more efficient shader.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
49374fab5d793ed426e01f7fef82c87442c14860 02-Sep-2014 Matt Turner <mattst88@gmail.com> i965: Make instruction lists local to the bblocks.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
153d148e9e1f89a567b5079003b4b8070925ddcd 25-Aug-2014 Matt Turner <mattst88@gmail.com> i965: Replace initialization loops with memset().

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f0598d413bc8eb7ab02318f1db2dbd446a3c736c 02-Sep-2014 Matt Turner <mattst88@gmail.com> i965/fs: Don't iterate between blocks with inst->next/prev.

When instruction lists are per-basic block, this won't work.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2ff0ff880c14f246a419ae3949b2462617e485e1 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965/fs: Don't use instruction list after calculating the cfg.

The only trick is changing a break into a return true in register
coalescing, since the macro is actually a double loop, and break will do
something different than you expect. (Wish I'd realized that earlier!)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
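The pitfall noted in the commit above, where break inside a macro that hides a double loop only leaves the inner loop, can be reproduced with a small sketch. FOREACH_BLOCK_AND_ITEM is a hypothetical macro written for this illustration; it is not the driver's iteration macro.

```cpp
#include <vector>

// Hypothetical macro that expands to two nested loops, in the spirit of a
// "for each block, for each instruction" helper.
#define FOREACH_BLOCK_AND_ITEM(block, item, blocks) \
    for (const auto &block : blocks)                \
        for (int item : block)

// break leaves only the inner loop, so later blocks are still visited.
inline int visited_with_break(const std::vector<std::vector<int>> &blocks, int stop)
{
    int visited = 0;
    FOREACH_BLOCK_AND_ITEM(block, item, blocks) {
        if (item == stop)
            break;
        visited++;
    }
    return visited;
}

// return leaves both loops at once, which is usually what was intended.
inline int visited_with_return(const std::vector<std::vector<int>> &blocks, int stop)
{
    int visited = 0;
    FOREACH_BLOCK_AND_ITEM(block, item, blocks) {
        if (item == stop)
            return visited;
        visited++;
    }
    return visited;
}
```

For blocks {1, 2, 3} and {4, 5} with stop set to 2, the break version still walks the second block, while the return version stops immediately.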
a4fb8897a2bd00eefa8a503ec17d45e791bced91 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965: Remove now unneeded calls to calculate_cfg().

Now that nothing invalidates the CFG, we can calculate_cfg() immediately
after emit_fb_writes()/emit_thread_end() and never again.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
072ea414d04f1b9a7bf06a00b9011e8ad521c878 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965: Remove cfg-invalidating parameter from invalidate_live_intervals.

Everything has been converted to preserve the CFG.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a9f8296dbb4ee9fba22c4c2af625eaa29676f002 25-Aug-2014 Matt Turner <mattst88@gmail.com> i965/fs: Preserve the CFG in a few more places.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
517e01b5c3db9ba750698096e823134b288e213f 22-Sep-2014 Eric Anholt <eric@anholt.net> mesa: Move register_allocate.c to util.

The r300 gallium driver is using it outside of the Mesa tree, and I wanted
to do so for vc4 as well. Rather than make the multiple-definitions
problem even more complicated, just move it to more-shared code.

v2: Don't forget to delete the symlink in r300 (review by Matt).
Delete more r300-helper references (review by Emil)
Don't prefix util/ header inclusion with "util/" (review by Emil)

Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
19b08e1bb3a3508049d0527743b2f50f855a24c2 29-Aug-2014 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Remove direct fs_visitor brw_wm_prog_key dependence

Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
49e5f76a65978a6188572c0197523dd9c312ebeb 29-Aug-2014 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Remove direct fs_visitor brw_wm_prog_data dependence

Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
78bd12619474e98503965541c61c5d7e9c408110 13-Sep-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Mark delta_x/y as BAD_FILE if remapped away completely.

Commit afe3d1556f6b77031f7025309511a0eea2a3e8df (i965: Stop doing
remapping of "special" regs.) stopped remapping delta_x/delta_y, and
additionally stopped considering them always-live. We later realized
delta_x was used in register allocation, so we actually needed to remap
it, which was fixed in commit 23d782067ae834ad53522b46638ea21c62e94ca3
(i965/fs: Keep track of the register that hold delta_x/delta_y.).

However, that commit didn't restore the "always consider it live" part.
If all the code using delta_x was eliminated, fs_visitor::delta_x would
be left pointing at its old register number. Later code in register
allocation would handle that register number specially...even though it
wasn't actually delta_x.

To combat this, set delta_x/y to BAD_FILE if they're eliminated, and
check for that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83127
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
936ca6f3cfb563719d8b51ae000d4f0594aba824 29-Aug-2014 Jordan Justen <jordan.l.justen@intel.com> i965: Add uses_kill to brw_wm_prog_data

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ecf6c2675783d369385b32a859b01491fb7fcf12 06-Sep-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Don't look at virtual_grf_sizes for uniforms

Uniform values are in the UNIFORM register file, not the GRF register file.
Looking in virtual_grf_sizes makes no sense and only makes the output of
dump_instructions confusing.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ef8477cddf9a6b1e13608e4fad9b55c86d0e5af4 05-Sep-2014 Matt Turner <mattst88@gmail.com> i965/fs: Fix basic block tracking in try_rep_send().

The 'start' instruction is always in the current block, except for the
case of shader time, which emits code in a pattern seen nowhere else.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
248eaff63d9a5484df1105a0c484d20e086f5f83 04-Sep-2014 Matt Turner <mattst88@gmail.com> i965/fs: Pass block to insert and remove functions missed earlier.

Otherwise, the basic block start/end IPs don't get updated properly,
leading to a broken CFG. This usually results in the following
assertion failure:

brw_fs_live_variables.cpp:141:
void brw::fs_live_variables::setup_def_use():
Assertion `ip == block->start_ip' failed.

Fixes KWin, WebGL demos, and a score of Piglit tests on Sandybridge and
earlier hardware.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
23e20f4687269f795e912a05bf12baaa94d0dd5a 29-Aug-2014 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Use prog rather than fp->Base in fs_visitor

Reduce fs_visitor's dependence on gl_fragment_program.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f92fbd554f2e9e702a2bd650c9b2571a3f4f1ab8 02-Sep-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Move curb_read_length/total_scratch to brw_stage_prog_data.

All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e8f83538dd4203befe63998b703afd2b488ad56a 29-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Don't segfault when debug-logging a null program

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cbfcb1b06992e4310683bb54a67d011b08010ec7 05-Aug-2014 Connor Abbott <connor.abbott@intel.com> i965/fs: don't pass ir_variable * to emit_samplepos_setup()

We were only using it to get at its type, which we already know because
it's a builtin variable.

Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ec3d06f591f9561289f7bc64a543a1e8a625faee 05-Aug-2014 Connor Abbott <connor.abbott@intel.com> i965/fs: don't pass ir_variable * to emit_frontfacing_interpolation()

We were only using it to get at its type, which we already know because
it's a builtin variable.

v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.

Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
70691f0c283ec4e03523f3a4690d9b897b36872e 30-Aug-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Fix GPU hangs when INTEL_DEBUG=no16 is set.

The replicated data clear shader needs to be SIMD16, or else the GPU
will hang. So, compile it even if INTEL_DEBUG=no16 is set.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
20a849b4aa63c7fce96b04de674a4c70f054ed9c 13-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Use basic-block aware insertion/removal functions.

To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.

cfg calculations: 202951 -> 80307 (-60.43%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a3d0ccb037082f3aa66bd558dfbe89f63a6eedd3 12-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Pass a cfg pointer to generate_{code,assembly}.

The loop over all instructions is now two-fold, over all of the blocks
and all of the instructions in each block.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
19c6617adfec618889bb52d5398b8ac3d5969c18 10-Aug-2014 Matt Turner <mattst88@gmail.com> i965/fs: Optimize gl_FrontFacing calculation on Gen4/5.

Doesn't use fewer instructions, but it does avoid writing the flag
register and if we want to switch the representation of true for Gen4/5
in the future, we can just delete the AND instruction.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d1c43ed48777115072809fdb394bccae88ffe83c 10-Aug-2014 Matt Turner <mattst88@gmail.com> i965/fs: Optimize gl_FrontFacing calculation on Gen6+.

total instructions in shared programs: 4288650 -> 4282838 (-0.14%)
instructions in affected programs: 595018 -> 589206 (-0.98%)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2e51dc838be177a09f60958da7d1d904f1038d9c 09-Aug-2014 Matt Turner <mattst88@gmail.com> i965: Use ~0 to represent true on Gen >= 6.

total instructions in shared programs: 4292303 -> 4288650 (-0.09%)
instructions in affected programs: 299670 -> 296017 (-1.22%)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f9dc7aabb3273d6d8a54c6778a5695a8527f4454 08-Jul-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Add optimization pass to let us use the replicate data message

The data port has a SIMD16 'replicate data' message, which lets us write
the same color for all 16 pixels by sending the four floats in the
lower half of a register instead of sending 4 times 16 identical
component values in 8 registers.

The message comes with a lot of restrictions and could be made generally
useful by recognizing when those restrictions are satisfied. For now,
this lets us enable the optimization when we know it's safe, but we don't
enable it by default. The optimization works for simple color clear shaders
only, but does recognize and support multiple render targets.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1effbf68983c924b3b70fd2fd9206af6b5475335 07-Jul-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Add an option to not generate the SIMD8 fragment shader

For now, this can only be triggered with a new 'no8' INTEL_DEBUG option
and a new context flag. We'll use the context flag later, but introducing
it now lets us bisect to this commit if it breaks something.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
35ca28816509a887538a6d0c62c96279b38ef8e4 15-Apr-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add pass to rename registers to break live ranges.

The pass breaks live ranges of virtual registers by allocating new
registers when it sees an assignment to a virtual GRF it's already seen
written.

total instructions in shared programs: 4337879 -> 4335014 (-0.07%)
instructions in affected programs: 343865 -> 341000 (-0.83%)
GAINED: 46
LOST: 1

[mattst88]: Make pass not break in presence of control flow.
invalidate_live_intervals() only if progress.
Fix up delta_x/delta_y.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
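The renaming idea described above can be sketched on straight-line code. This is a simplified illustration under stated assumptions: `Inst` and `rename_registers` are hypothetical, every write is treated as a full write, and control flow is ignored, whereas the commit notes that the real pass had to be adjusted for control flow.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified instruction: one destination virtual register
// and any number of source virtual registers.
struct Inst {
    int dst;
    std::vector<int> srcs;
};

// Give a fresh register to every write of a register that has already been
// written, and redirect later reads to the newest name.
// `next_reg` is the first unused virtual register number.
// Returns true if any destination was renamed.
inline bool rename_registers(std::vector<Inst> &insts, int next_reg)
{
    assert(next_reg >= 0);
    std::unordered_map<int, int> current;  // original name -> latest name
    bool progress = false;

    for (Inst &inst : insts) {
        // Reads must see the most recent name of each register.
        for (int &src : inst.srcs) {
            auto found = current.find(src);
            if (found != current.end())
                src = found->second;
        }

        auto found = current.find(inst.dst);
        if (found == current.end()) {
            current[inst.dst] = inst.dst;  // first write keeps its name
        } else {
            found->second = next_reg++;    // later writes get a new register
            inst.dst = found->second;
            progress = true;
        }
    }
    return progress;
}
```

Returning a progress flag matches the note in the commit message that live intervals are invalidated only if the pass changed something.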
2c50212b14da27de4e3da62488ae4e35c069d84e 11-Aug-2014 Neil Roberts <neil@linux.intel.com> i965: Store uniform constant values in a gl_constant_value instead of float

The brw_stage_prog_data struct previously contained an array of float pointers
to the values of parameters. These were then copied into a batch buffer to
upload the values using a regular assignment. However the float values were
also being overloaded to store integer values for integer uniforms. This can
break if x87 floating-point registers are used to do the assignment because
the fst instruction tries to fix up invalid float values. If an integer
constant happened to look like an invalid float value then it would get
altered when it was copied into the batch buffer.

This patch changes the pointers to be gl_constant_value instead so that the
assignment should end up copying without any alteration. This also makes it
more obvious that the values being stored here are overloaded for multiple
types.

There are some static asserts where the values are uploaded to ensure that the
size of gl_constant_value is the same as a float.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
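A minimal sketch of the idea behind the change above, assuming a hypothetical `constant_value` union modeled on (but not copied from) gl_constant_value. The helper names are illustrative only. The point is that the slots are copied as raw 32-bit words, so an integer whose bit pattern happens to look like a NaN is never routed through a floating-point load and store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for gl_constant_value: one 32-bit slot that may
// hold a float, a signed integer, or an unsigned integer.
union constant_value {
    float         f;
    std::int32_t  i;
    std::uint32_t u;
};

// Analogue of the static asserts mentioned in the commit message: the
// upload code relies on each slot being exactly float-sized.
static_assert(sizeof(constant_value) == sizeof(float),
              "constant_value must be the size of a float");

// Copy `count` slots into a destination buffer as raw 32-bit words.
// memcpy never interprets the bits as a float.
inline void upload_constants(std::uint32_t *dst, const constant_value *src,
                             std::size_t count)
{
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, count * sizeof(constant_value));
}

// Does this bit pattern decode as a NaN when viewed as an IEEE 754 float?
inline bool looks_like_nan(std::uint32_t bits)
{
    return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0;
}
```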
c66d928f2c9fa59e162c391fbdd37df969959718 17-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Enable INTDIV in SIMD16 mode.

All we need to do is decompose this to two SIMD8 instructions, like we
do in many other cases. We even already have code for that.

I apparently just botched this last time I tried, and it was easy.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
24878f31c4287a6cc4cfd0fabc34075f9dad4e03 08-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Drop "do dual source blending" generator parameter.

When dual source blending, the visitor already stores a flag in
brw_wm_prog_data (dual_src_blend) for the state upload code to use.
The generator also receives this, so there's no need to pass an
additional flag.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f17bfc9ba954608c58fd0560f255e40eef7e7cea 11-Aug-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Never use the Gen8 code generators.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
074d472398b3cc7f32fe5c0cc742853cf66fabed 30-Jun-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Switch to the EU emit layer for code generation on Broadwell.

Everything should be in place to unify code generation between Gen4-7
and Gen8+. We should be able to drop the Gen8 generators at this point.

However, leave them hooked up for a brief moment, for testing and
comparison purposes. Set GEN8=1 to use the old Gen8+ code generator
paths.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
23d782067ae834ad53522b46638ea21c62e94ca3 11-Aug-2014 Matt Turner <mattst88@gmail.com> i965/fs: Keep track of the registers that hold delta_x/delta_y.

They're needed in register allocation. Fixes a regression since
afe3d155.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78875
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0f4c5a70c6e759e3a7bddd7f1c2d2b8d219552a4 03-Aug-2014 Chris Forbes <chrisf@ijw.co.nz> i965: Get rid of backend_instruction::sampler

The generators no longer use this.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
72e55bb6888ff4d6b69b10d9c58573e4c3d492ec 25-Feb-2014 Kenneth Graunke <kenneth@whitecape.org> util: Move the open-addressing linear-probing hash_table to src/util.

This hash table is used in core Mesa, the GLSL compiler, and the i965
driver, which makes it a good candidate for the new src/util module.

It's much faster than program/hash_table.[ch] (see commit 6991c2922f5
for data), and José's u_hash_table.c has a comment saying Gallium should
probably consider switching to a linear probing hash table at some point.
So this seems like the best candidate for a shared data structure.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

v2 (Jason Ekstrand): Pick up another hash_table use and patch up scons

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
38ffef7840edddada23bac48f669d2070e6f158c 18-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.

We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.

This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5d9f5cd35b63e8d7fdb42a5ad26c53d2a19f6985 15-Jul-2014 Anuj Phogat <anuj.phogat@gmail.com> Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs"

This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92.
Fixes the 11 regressions caused in framebuffer_blit tests in
Khronos GLES3 CTS tests:

Original patch reduced the instruction count but had no performance
benefits. So, it's safe to revert it without causing any performance
regressions.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cf1b5eee7f36af29d1d5caba3538ad4985e51f81 16-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Use WE_all for gl_SampleID header register munging.

This code should execute without regard to the currently executing
channels. Asking for gl_SampleID inside control flow might break in
strange ways. It appears to break even at the top of the program in
SIMD16 mode occasionally as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e5adc560cc8544200faa3e04504202839626ab37 11-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Set force_uncompressed and force_sechalf on samplepos setup.

gen8_fs_generator uses these to decide whether to set the execution size
to 8 or 16, so we incorrectly made both of these MOVs the full width in
SIMD16 shaders. (It happened to work out on Gen4-7.)

Setting them should also help inform optimization passes what's really
going on, which could help avoid bugs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
58270c2fac493497ed7923830f49051a53e86a07 08-Jul-2014 Connor Abbott <cwabbott0@gmail.com> exec_list: Make various places use the new length() method.

Instead of hand-rolling it.

v2 [mattst88]: Rename get_size to length. Expand comment in ir_reader.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6e91f2df958c835a1973e32d71578fa295ef00a8 18-Nov-2013 Chris Forbes <chrisf@ijw.co.nz> i965/fs: add generator support for pixel interpolator query

V5: - Split into separate opcodes
- Pass message data in src1 immediate
- Put noperspective bit in fs_inst rather than adding any junk to
backend_instruction

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bbefb15e01e1c16af69646898918982ae00f8c92 08-Jul-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Extend compute-to-mrf pass to understand blocks of MOVs

The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders
that end with a texture fetch followed by an fb write are left like this:

0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g2<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000040: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

This patch lets compute-to-mrf recognize blocks of MOVs and match them to
instructions (typically SEND) that write multiple registers. With this,
the above shader becomes:

0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x00000010: send(8) g113<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x00000020: sendc(8) null g113<8,8,1>F
render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

which is the bulk of the shader db results:

total instructions in shared programs: 987040 -> 986720 (-0.03%)
instructions in affected programs: 844 -> 524 (-37.91%)
GAINED: 0
LOST: 0

The optimization also applies to MRT shaders that write the same
color value to multiple RTs, in which case we can eliminate four MOVs in
a similar fashion. See fbo-drawbuffers2-blend in piglit for an example.

No measurable performance impact. No piglit regressions.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ce706b4a9bd53fbe274687025965333541a0e70d 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Make a brw_predicate enum.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
46e5b2a497216133be656b38ebfcf96da64b7744 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Make a brw_conditional_mod enum.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3de11cacf0cb307ff3b4130746732d9db73d7583 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use enum brw_reg_type for register types.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
34ef6a7651d6651e0bca77c4d4b890af582ad360 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Move is_zero/one/null/accumulator into backend_reg.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
53992a102ffddf2e0fad401252cfc1c034d022ad 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use immediate storage in brw_reg for visitor regs.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
489ec685542590c7412db81623952c1aa75d946f 19-May-2014 Eric Anholt <eric@anholt.net> i965: Update a ton of comments about constant buffers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c0f1929dd23bbc558e9eef0f8fd40e10dfef3c21 19-May-2014 Eric Anholt <eric@anholt.net> i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.

I wanted to access this value from stage-generic code, so stop storing it
under two different names.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3d826729dabab53896cdbb1f453c76fab1c7e696 29-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use unreachable() instead of unconditional assert().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
266109736a9a69c3fdbe49fe1665a7a63c5cc122 25-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use typed foreach_in_list_safe instead of foreach_list_safe.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c5030ac0ac15d3c91c4352789f94281da9a9dcad 25-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use typed foreach_in_list instead of foreach_list.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e8e5f0a342505a4d10cbcdee03592c96d286b57c 24-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Use is_head_sentinel() instead of ->prev == NULL.

Makes it more clear what we're doing and requires less knowledge of
exec_list.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
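A sketch of why a named sentinel check reads better than a raw pointer test. This assumes a hypothetical `Node` type loosely in the style of exec_node. The real exec_list layout differs in detail, and `count_before` is an illustrative helper only.

```cpp
#include <cassert>

// Hypothetical, simplified list node with explicit sentinel nodes at both
// ends of the list.
struct Node {
    Node *prev = nullptr;
    Node *next = nullptr;

    // The head sentinel is the only node with no predecessor.
    bool is_head_sentinel() const { return prev == nullptr; }
};

// Count the real nodes strictly before `inst`, walking backwards and
// stopping at the head sentinel without treating it as a real node.
inline int count_before(const Node *inst)
{
    assert(inst != nullptr && !inst->is_head_sentinel());
    int n = 0;
    for (const Node *scan = inst->prev; !scan->is_head_sentinel();
         scan = scan->prev)
        n++;
    return n;
}
```

The loop condition states its intent directly, so the reader does not need to know which pointer of which node is null at the start of the list.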
1bfc0a11027449ae7ab7c28eb695f26de530eccf 29-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Mark predicated PLN instructions with dependency hints.

To implement the unlit_centroid_workaround, previously we emitted

(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q };
(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q };

where the flag register contains the channel enable bits from g0.

Since the predicates are complementary, the pair of pln instructions
write to non-overlapping components of the destination, which is the
case that the dependency control hints are designed for.

Typically setting dependency control hints on predicated instructions
isn't safe (if an instruction doesn't execute due to the predicate, it
won't update the scoreboard, leaving it in a bad state) but since we
must have at least one channel executing (i.e., +f0 is true for some
channel) by virtue of the fact that the thread is running, we can put
the +f0 pln instruction last and set the hints:

(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q };
(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q };

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4fe53ee5d7c418d1ed51c5e8dfe5a2b1f48127a3 29-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Predicate PLN instructions used in unlit centroid WA.

Maybe lets us skip some PLN instructions if whole subspans are disabled?

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e58992aedd9693f0356f3691d510a5e976473a0c 29-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Pass const references to emit functions.

Cuts 10k of .text and saves a bunch of useless struct copies.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e4b05af5d42b192ead493bc6ef9061ae57390058 28-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Pass const references to instruction functions.

   text    data     bss     dec     hex filename
4270747  123200   39648 4433595  43a6bb i965_dri.so
4244821  123200   39648 4407669  434175 i965_dri.so

Cuts 25k of .text and saves a bunch of useless struct copies.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
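The struct copies that the change above avoids can be made visible with a small counter. This is an illustration only: `Reg`, `by_value` and `by_const_ref` are hypothetical and far smaller than the register types in the driver.

```cpp
#include <cassert>

// Hypothetical register descriptor that counts how often it is copied.
struct Reg {
    int nr = 0;
    static int copies;

    Reg() = default;
    explicit Reg(int n) : nr(n) { assert(n >= 0); }
    Reg(const Reg &other) : nr(other.nr) { copies++; }
};
int Reg::copies = 0;

// Passing by value copy-constructs the parameter from the caller's object.
int by_value(Reg r) { return r.nr; }

// Passing by const reference binds to the caller's object without a copy.
int by_const_ref(const Reg &r) { return r.nr; }
```

Calling `by_value` with an existing `Reg` increments `Reg::copies`, while `by_const_ref` leaves it unchanged. Multiplied across many call sites, that per-call copy is the sort of code the commit reports removing from .text.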
46659d46a8c2f7bbc8deb472faff2dccbde92d29 24-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Make can_do_source_mods() a member of the instruction classes.

Pretty nonsensical to have it as a method of the visitor just for access
to brw.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
48f1143c64e46b3d11dc318d7825b6167a2b78e5 23-Jun-2014 Matt Turner <mattst88@gmail.com> i965/fs: Don't fix_math_operand() on Gen >= 8.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fab92fa1cba4196a4947731e7105bd1494dfffc4 18-Apr-2014 Matt Turner <mattst88@gmail.com> i965/fs: Optimize SEL with the same sources into a MOV.

instructions in affected programs: 474 -> 462 (-2.53%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
138905d728fd1f4b38ff6a7137a5bbcac1d0875a 18-Apr-2014 Matt Turner <mattst88@gmail.com> i965/fs: Lower LOAD_PAYLOAD and clean up.

Clean up with register_coalesce()/dead_code_eliminate().
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b996216384679e9bce5a62e417198da704c09c19 28-May-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add SHADER_OPCODE_LOAD_PAYLOAD.

Will be used to simplify the handling of large virtual GRFs in SSA form.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
237aac39b1994b0fa1e8cd3490ad415b144a8b5f 09-Jun-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Invalidate live intervals when inserting Gen4 SEND workarounds.

We need to invalidate the live intervals when inserting new
instructions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ecc78eab119ac8fa3df380a80bc94975e986523c 09-Jun-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Don't use the head sentinel as an fs_inst in Gen4 workaround code.

When walking backwards, we want to stop at the head sentinel, which is
where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL.

Fixes random crashes, as well as valgrind errors.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6e61892aea542593875ebb8ae209af18bbad84bd 05-Jun-2014 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Let the gen < 8 generator know about runtime_check_aads_emit

In gen < 6 we need to produce conditional code based on this flag when doing
framebuffer writes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
84e0a5c406f2a8f060352eaa4b5c138e3f1a5a86 27-May-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add fs_inst constructor that takes a list of sources.

Also add an emit() function that calls it.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
521f9b9a48da586ca3352cea7f8bf7c49741cf0d 20-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add a function to resize fs_inst's sources array.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
07af0abef024f8a17a00975265eff79aa069c9b5 27-May-2014 Matt Turner <mattst88@gmail.com> i965/fs: Clean up fs_inst constructors.

In a fashion suggested by Ken.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b1dcdcde2e323f960833f5c7da65d5c2c20113c9 17-Mar-2014 Matt Turner <mattst88@gmail.com> i965/fs: Loop from 0 to inst->sources, not 0 to 3.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
27e12a8ea933e2f978e0ce9286422e6025c7377d 20-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Store the number of sources an fs_inst has.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1b60391ed48dc18b034fc3dc837919f4c8b7905c 20-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: ralloc fs_inst's fs_reg sources.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6d3a15223aedaff26dd3aab900e02c8548956973 20-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Add and use an fs_inst copy constructor.

Will get more complicated when fs_reg src becomes a pointer.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
55bd8b8b660f983b486e699ca74fe5652297331d 07-Apr-2014 Matt Turner <mattst88@gmail.com> i965/fs: Debug the optimization passes by dumping instr to file.

With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a
file each time an optimization pass makes progress. This lets you easily
diff successive files to see what an optimization pass did.

Example filenames written when running glxgears:
fs8-0000-00-start
fs8-0000-01-04-opt_copy_propagate
fs8-0000-01-06-dead_code_eliminate
fs8-0000-01-12-compute_to_mrf
fs8-0000-02-06-dead_code_eliminate
| | | |
| | | `-- optimization pass name
| | |
| | `-- optimization pass number in the loop
| |
| `-- optimization loop iteration
|
`-- shader program number

Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs,
so that we can diff instruction lists across loop iterations without
the register numbers being changed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
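The naming scheme in the example above can be reproduced with a small formatter. This is a sketch inferred from the listed file names, not the code in the driver: `dump_filename` and its parameter names are hypothetical, and reading the `fs8` prefix as a stage name followed by the dispatch width is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Build a dump file name of the form
//   <stage><width>-<program>-<iteration>-<pass>-<pass name>
// with the program number padded to four digits and the loop iteration and
// pass number padded to two digits each.
inline std::string dump_filename(const char *stage, int dispatch_width,
                                 int program, int iteration, int pass,
                                 const char *pass_name)
{
    char buf[128];
    int n = std::snprintf(buf, sizeof(buf), "%s%d-%04d-%02d-%02d-%s",
                          stage, dispatch_width, program, iteration, pass,
                          pass_name);
    // The name must have been formatted without error or truncation.
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return std::string(buf);
}
```

Because every numeric field is zero-padded, a plain alphabetical listing of the directory shows the dumps in the order they were written, which makes diffing successive files straightforward.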
e9bf1662b048e5927f841e84719a3180650a2b0a 29-May-2014 Matt Turner <mattst88@gmail.com> i965: Give dump_instructions() a filename argument.

This will allow debugging code to dump the IR after an optimization pass
makes progress (the next patch). Only let it open and write to a file if
the effective user isn't root.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
56d6dcf4f771d57d2759b2a5c5006f24444c696f 29-May-2014 Matt Turner <mattst88@gmail.com> i965: Give dump_instruction() a FILE* argument.

Use function overloading rather than default arguments, since gdb
doesn't know about default arguments.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
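The overloading pattern mentioned above can be sketched as follows. The `Inst` type and the printed format are hypothetical. Only the shape of the two overloads is the point.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for an IR instruction.
struct Inst {
    int opcode;
};

// Full version: the caller chooses the output stream.
// Returns the number of characters written.
int dump_instruction(const Inst &inst, std::FILE *file)
{
    assert(file != nullptr);
    return std::fprintf(file, "opcode %d\n", inst.opcode);
}

// Convenience overload: a real one-parameter function, so it can be called
// from a debugger as dump_instruction(inst) without relying on default
// arguments.
int dump_instruction(const Inst &inst)
{
    return dump_instruction(inst, stderr);
}
```

With a default argument there would be only one two-parameter function, and a caller that cannot apply the default (such as gdb, per the commit message) would have to spell out the stream every time.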
c938be8ad272a06bc0e91c4e718b61a0c5de400e 17-Apr-2014 Matt Turner <mattst88@gmail.com> i965/fs: Don't use brw_imm_* unnecessarily.

Using brw_imm_* creates a source with file=HW_REG, and the scheduler
inserts barrier dependencies when it sees HW_REG. None of these are
hardware-registers in the sense that they're special and scheduling
shouldn't touch them. A few of the modified cases already have HW_REGs
for other sources, so it won't allow extra flexibility in some cases.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cd1c1d302b60bdcc131d0feb048c9bc03896ee2f 15-May-2014 Matt Turner <mattst88@gmail.com> i965/fs: Don't hardcode DEBUG_WM in generic fs code.

Similar to Paul's commit e9fa3a944 except brw_fs_generator's debug_flag
is for DEBUG_WM and DEBUG_BLORP.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1472584397f7b5ef70dfdffda0aab4a0a38a4db0 26-Jan-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Assume fragment color clamping is off when precompiling.

Modern applications frequently use both UNORM buffers and FLOAT buffers
with color clamping disabled. (FLOAT with clamping explicitly enabled
and SNORM buffers appear to be less common.) We don't need to emit
saturates in the fragment shader in either of the common cases.

Mesa sets ctx->Color._ClampFragmentColor to false if all the color
buffers are UNORM. Also, for GL_FIXED_ONLY mode (the default in
legacy OpenGL), it will be false if any FLOAT buffers are bound.
Since the common case is false, that should be our default.

Thanks to Roland Scheidegger for pointing out some faulty logic
in v1 of this patch (unnecessary code and incorrect explanations).

v2: Drop superfluous code and reword commit message.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cca6dc9f0fd43db366730d67baae1affdca8c6de 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Rip struct brw_wm_compile out of the visitors and generators.

Instead, just pass the key and prog_data as separate parameters.

This moves it up a level - one step further toward getting rid of it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2d4ac9b5b825b745257e935dd9b33a2d3507c72a 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Plumb a mem_ctx all the way through the FS compile.

'c' is going away, but we still need a memory context that lives
for the duration of the compile.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
65b2df3ec8906c51ae5b28df9c0b2c71981080d0 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Replace c->key with a direct reference in fs_visitor.

'c' is going away. This is also shorter.

Marking the key pointer as const will also deter people from changing
it in fs_visitor, as it's absolutely not OK to modify it there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8a04e0de8bbf4caf08c0759f2abaa94de64ee5fd 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Replace c->prog_data with a direct reference in fs_visitor.

'c' is going away. This is also a bit shorter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
55f4e3a06b52c3e8b6bfad851e1d4e5243f1e2c0 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move some flags that affect code generation to fs_visitor.

runtime_check_aads_emit isn't actually used currently, but I believe
we should be using it on Gen4-5, so I haven't eliminated it.
See https://bugs.freedesktop.org/show_bug.cgi?id=78679 for details.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8ef78828fadb0f35b07be93492b3d7c297bb9ffd 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move payload register info from brw_wm_compile to fs_visitor.

This data is created by fs_visitor and only used when emitting code,
so keeping it in fs_visitor makes sense. I decided it would be
reasonable to group these all together in a struct, since they're
highly related.

v2: s/nr_payload_regs/payload.num_regs/ in some comments (chrisf).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c76e6db05f9256711a226de8562124a5f14aae2d 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Simplify gl_SampleMaskIn handling.

As far as I can tell, there's no point in allocating an extra register
and generating a MOV---we can just use the copy provided as part of our
thread payload directly. It's already in the right format.

Of course, there are zero Piglit tests for this. We don't actually ship
the extension (GL_ARB_gpu_shader5) that exposes this functionality
either.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5cd7cf58e66ebb4e87a7fe6bba3b43f062ace47f 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Rename c->sample_mask_reg to sample_mask_in_reg.

This is actually for gl_SampleMaskIn, which is quite different than
gl_SampleMask. Renaming should help avoid confusion.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
db9c915abcc5ad78d2d11d0e732f04cc94631350 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move c->last_scratch into fs_visitor.

Nothing outside of fs_visitor uses it, so we may as well keep it
internal.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7e28bd797dbe1721e5d97916f041493d1f30220d 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move total_scratch calculation into fs_visitor::run().

With this one use gone, c->last_scratch is now only used inside
fs_visitor. The rest of the driver uses prog_data->total_scratch.

We already compute similar prog_data fields in fs_visitor, so this
seems reasonable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 14-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move perf_debug about register spilling to a more obvious spot.

The if (!allocated_without_spills) block is an obvious spot for this
performance warning message.

In the Vec4 backend, scratch is also used for indirect access of
temporary arrays. The FS backend doesn't implement that yet, but
if it did, this message would be inaccurate, since scratch access
wouldn't necessarily mean spilling. Moving it preemptively fixes that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
afe3d1556f6b77031f7025309511a0eea2a3e8df 07-May-2014 Eric Anholt <eric@anholt.net> i965: Stop doing remapping of "special" regs.

Now that we aren't using pixel_[xy] in live variables, nothing is looking
at these regs after the visitor stage.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
da0c3b02e71c7552ba9324a01a73602094105fcc 28-Mar-2014 Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> i965/fs: Add support for the MAC instruction.

This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
306ed81b9363721058c568244f9860c5c8c819f4 04-Apr-2014 Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> i965: Add writes_accumulator flag

Our hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions. Many instructions
can implicitly write a value to the accumulator in addition to their
normal destination register. This is enabled by the "AccWrEn" flag.

This patch introduces a new flag, inst->writes_accumulator, which
allows us to express the AccWrEn notion in the IR. It also creates an
ALU2_ACC macro to easily define emitters for instructions that
implicitly write the accumulator.

Previously, we only supported implicit accumulator writes from the
ADDC, SUBB, and MACH instructions. We always enabled them on those
instructions, and left them disabled for other instructions.

To take advantage of the MAC (multiply-accumulate) instruction, we
need to be able to set AccWrEn on other types of instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
30c35d1dcb2fde19b1c968751fda5151b795d257 09-Apr-2014 Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> i965: Add is_accumulator() function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
73400d8f70bab8549fb4cbcdc9ba905bf93b8716 14-Apr-2014 Matt Turner <mattst88@gmail.com> i965/fs: Remove dead_code_eliminate_local().

Subsumed by the new dead_code_eliminate() function. No shader-db
changes.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f34f39330bb41fb0a86930908de10353193a841d 13-Apr-2014 Matt Turner <mattst88@gmail.com> i965/fs: Reimplement dead_code_elimination().

total instructions in shared programs: 1653399 -> 1651790 (-0.10%)
instructions in affected programs: 92157 -> 90548 (-1.75%)
GAINED: 2
LOST: 2

Also significantly reduces the number of optimization loop iterations:

total loop iterations in shared programs: 39724 -> 31651 (-20.32%)
loop iterations in affected programs: 21617 -> 13544 (-37.35%)

Including some great pathological cases, like 29 -> 3 in Strike Suit
Zero and 24 -> 3 in Dota2.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6230b646a5a3f100f4d3dc05dff6c3ace85ee96c 26-Mar-2014 Eric Anholt <eric@anholt.net> i965/fs: Track whether we're doing dual source in a more obvious way.

I'm going to be turning dual_src_output into an array in a moment.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
14b85e3a47b19ffe9c96f67b43f780f8abc86061 01-Apr-2014 Eric Anholt <eric@anholt.net> i965/fs: Add a couple more global special regs to special[]

Nothing bad came of this because they weren't used after visitor running,
but leaving them in a bad state seems like a recipe for pain later.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4303d26f93919914fa58c0418a3935235d5ae359 26-Mar-2014 Eric Anholt <eric@anholt.net> i965/fs: Handle arrays of special regs more cleanly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
72b845e6409ad353af1abd420162917dadda5a7e 26-Mar-2014 Eric Anholt <eric@anholt.net> i965/fs: Fix dump_instructions() on uniforms.

All of a vec4 uniform was being printed as "u0"

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6bda3a526759f655cad62178b491264584119ae1 07-Apr-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Fix "SIMD16 unsupported" messages via KHR_debug.

Performance warnings are logged via KHR_debug in addition to when the
INTEL_DEBUG=perf environment variable is set. Without this, messages in
debug contexts would have "(null)" for the reason.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0fbcdec2f6d6fd98db82c680d8bae8eee77ff9f2 08-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Split fs_visitor::register_coalesce() into its own file.

The function has gotten large, and brw_fs.cpp is the largest source file
in the driver.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8b1ab5c93bfcc22b7f50a5c10958e43d0571f8a0 27-Mar-2014 Matt Turner <mattst88@gmail.com> i965/fs: Mark appropriate fs_inst members as const.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
26012c16737a8542316062ef17fa9a0b34e274b7 26-Mar-2014 Matt Turner <mattst88@gmail.com> i965/fs: Recalculate live intervals in calculate_register_pressure().

Otherwise calling dump_instructions() after declaring a new fs_reg would
segfault when calculate_register_pressure()'s loop over reg walked off
the end of the virtual_grf_start[] array that calculate_live_intervals()
would have reallocated for you, if it had known there was a new
register.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0d99aef6c8a940e52afcbffa7091ff9c854ba120 24-Mar-2014 Eric Anholt <eric@anholt.net> i965: Fix compiler warning about signed/unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
de7ad2c88f4ec243c95eaed22c41d0e537912e01 07-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Accurately bail on SIMD16 compiles.

Ideally, we'd like to never even attempt the SIMD16 compile if we could
know ahead of time that it won't succeed---it's purely a waste of time.
This is especially important for state-based recompiles, which happen at
draw time.

The fragment shader compiler has a number of checks like:

   if (dispatch_width == 16)
      fail("...some reason...");

This patch introduces a new no16() function which replaces the above
pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag.
Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and
issue a helpful performance warning if INTEL_DEBUG=perf is set. (In
SIMD16 mode, no16() calls fail(), for safety's sake.)

The great part is that this is not a heuristic---if the flag is set, we
know with 100% certainty that the SIMD16 compile would fail. (It might
fail anyway if we run out of registers, but it's always worth trying.)

v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).
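
A minimal sketch of the no16() pattern described above, assuming a hypothetical ToyVisitor type rather than the real fs_visitor. The member names (failed, simd16_unsupported, message) and the helper should_try_simd16() are illustrative assumptions, not the driver's actual fields:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the compiler visitor; not the real fs_visitor.
struct ToyVisitor {
    int dispatch_width;               // 8 or 16
    bool failed = false;              // this compile has failed outright
    bool simd16_unsupported = false;  // SIMD8 pass learned SIMD16 cannot work
    std::string message;              // reason, for a perf warning

    explicit ToyVisitor(int width) : dispatch_width(width) {
        assert(width == 8 || width == 16);
    }

    // In SIMD8 mode, record that SIMD16 will never work.
    // In SIMD16 mode, fail the compile, for safety's sake.
    void no16(const char *format, ...) {
        char buf[256];
        va_list va;
        va_start(va, format);
        std::vsnprintf(buf, sizeof(buf), format, va);
        va_end(va);  // ended before any branch, so no return path skips it
        message = buf;

        if (dispatch_width == 16)
            failed = true;
        else
            simd16_unsupported = true;
    }
};

// Hypothetical caller-side check: skip the SIMD16 compile when the SIMD8
// pass has already proven it would fail.
bool should_try_simd16(const ToyVisitor &simd8) {
    return !simd8.failed && !simd8.simd16_unsupported;
}
```

Formatting the message and calling va_end before any branching is one way to avoid the missing-va_end problem that v2 mentions; the real patch may structure this differently.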

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b207e88b25e526d0f1ada7b19605b880a27866dc 08-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Support pull parameters in SIMD16 mode.

This is just a matter of reusing the pull/push constant information set
up by the SIMD8 compile.

This gains us 78 SIMD16 programs in shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
229319e0f0f872cfb19de3eb0ab620ca611d65d8 12-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Use a single instance of the pull_constant_loc[] array.

Now that we don't renumber uniform registers, assign_constant_locations
and move_uniform_array_access_to_pull_constants use the same names.
So, they can share a single copy of the pull_constant_loc[] array.

This simplifies the code considerably. assign_constant_locations()
doesn't need to walk through pull_params[] to rediscover reladdr
demotions; it just has that information in pull_constant_loc[]. We also
only need to rewrite the instruction stream once, instead of twice.

Even better, we now have a single array describing the layout of
all pull parameters, which we can pass to the SIMD16 program.

This actually hurts a few shaders in Serious Sam 3, and one in KWin:
total instructions in shared programs: 1841957 -> 1842035 (0.00%)
instructions in affected programs: 1165 -> 1243 (6.70%)
Comparing dump_instructions() before and after the pull constant
transformations with and without this patch, it appears that there is
a uniform array with variable indexing (reladdr) and constant indexing
(of array element 0). Previously, we uploaded array element 0 as both
a pull constant (for reladdr) /and/ a push constant.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
542f2e47f2f22522b963a7ab1f8b485d1c9985ba 11-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Don't renumber UNIFORM registers.

Previously, remove_dead_constants() would renumber the UNIFORM registers
to be sequential starting from zero, and the resulting register number
would be used directly as an index into the params[] array.

This renumbering made it difficult to collect and save information about
pull constant locations, since setup_pull_constants() and
move_uniform_array_access_to_pull_constants() used different names.

This patch generalizes setup_pull_constants() to decide whether each
uniform register should be a pull constant, push constant, or neither
(because it's unused). Then, it stores mappings from UNIFORM register
numbers to params[] or pull_params[] indices in the push_constant_loc
and pull_constant_loc arrays. (We already did this for pull constants.)

Then, assign_curb_setup() just needs to consult the push_constant_loc
array to get the real index into the params[] array.

This effectively folds all the remove_dead_constants() functionality
into assign_constant_locations(), while being less irritating to work
with.

v2: Add assert(remapped <= i), requested by Topi.
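
A rough sketch of the mapping described above, under stated assumptions: the ConstantLayout struct, the is_live / needs_pull inputs, and the max_push limit are hypothetical simplifications, and -1 is used here to mean "no location". Only the array names push_constant_loc and pull_constant_loc come from the commit message:

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: one entry per UNIFORM register number.
struct ConstantLayout {
    std::vector<int> push_constant_loc;  // index into params[], or -1
    std::vector<int> pull_constant_loc;  // index into pull_params[], or -1
    int num_push = 0;
    int num_pull = 0;
};

// Decide, for each UNIFORM register, whether it becomes a push constant,
// a pull constant, or neither (because it is unused). Register numbers
// themselves are never renumbered; only the mapping arrays are filled in.
ConstantLayout assign_locations(const std::vector<bool> &is_live,
                                const std::vector<bool> &needs_pull,
                                int max_push) {
    assert(is_live.size() == needs_pull.size());
    const int n = static_cast<int>(is_live.size());

    ConstantLayout layout;
    layout.push_constant_loc.assign(n, -1);
    layout.pull_constant_loc.assign(n, -1);

    for (int i = 0; i < n; i++) {
        if (!is_live[i])
            continue;  // dead uniform: gets no location at all

        if (needs_pull[i] || layout.num_push >= max_push) {
            layout.pull_constant_loc[i] = layout.num_pull++;
        } else {
            int remapped = layout.num_push++;
            assert(remapped <= i);  // packing can only move entries down
            layout.push_constant_loc[i] = remapped;
        }
    }
    return layout;
}
```

With this shape, a later stage only needs to look up push_constant_loc[reg] to find the real params[] index, which is the role the message assigns to assign_curb_setup().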

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d9f339eccd87413d9f6bf6dd6217db01630f12f8 10-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Split pull parameter decision making from mechanical demoting.

move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:

1. Decide which UNIFORM registers to demote to pull constants, and
assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
value into a temporary VGRF and use that, eliminating the UNIFORM
file access.

In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.

This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.

For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things. So, we get an ugly boolean parameter saying
which to do. This will go away.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2163e0fd5a6bf2ac95aef331c30f010cb6e39cab 08-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Record pull constant locations for all array elements.

When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).

Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7c7627781feca0c8738da66425d6c530ea598dc4 07-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Save push constant location information.

Previously, both move_uniform_array_access_to_pull_constants() and
setup_pull_constants() maintained stack-local arrays with this
information. Storing this information will allow it to be used from
multiple functions, allowing us to split and move code around.

We'll also eventually want to pass pull constant location information
to the SIMD16 compile. Saving this information will help us do that.

Unfortunately, the two functions *cannot* share the contents of the
array just yet. remove_dead_constants() renumbers all the UNIFORM
registers to be contiguous starting at zero, so the two functions
talk about uniforms using different names. We can't even remap them,
since move_uniform_array_access_to_pull_constants() deletes UNIFORM
registers that are only accessed with reladdr, so remove_dead_constants
can't even see them.

This situation will improve in the next few patches.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
de77efde91401919fe7282a4b07300a10185792b 11-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Delete dead code to fail compiles with SIMD16 pull parameters.

The SIMD8 compile will determine whether pull parameters are necessary.
If so, it will set prog_data->nr_pull_params to a value greater than 0.

brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile
altogether. So, this code should never occur.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7554539d7ebbed5f5048ddeadaf5a5dc6e2ce2a6 11-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Invalidate live intervals when demoting uniforms to pull params.

Normally, nothing uses live intervals at this point, so this isn't
necessary. However, dump_instructions() calculates them and uses them
to show register pressure. So, calling dump_instructions() in this area
of the code would segfault due to the arrays being the wrong size.

This is not a candidate for stable branches because it only serves to
fix internal debugging code that you manually have to invoke by altering
the source code or using gdb.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
13782dcf9d34a1bd276312cdecc44deb8f7caafd 11-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Print "+reladdr" on variably-indexed uniform arrays.

Previously, dump_instruction() would print output such as:
{ 2} 3: mov vgrf1:F, u0:F
{ 3} 4: mov vgrf7:F, u0:F
{ 4} 5: mov vgrf8:F, u0:F
which looked like either a scalar access or perhaps a constant-indexed
access of element 0, when it was really a variable index.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
01d9023a9b9a50b42f7a4ef4799d0e35e0b045ca 11-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Fix register types in dump_instructions(), again.

In commit e57d77280efcbfd6579a88f071426653287ef833, I fixed this for
destinations in the Vec4 backend, and sources in the scalar backend.
But not both types in both backends.

To prevent this mess from continuing, make the reg_encoding table
static, so only the disassembler can use it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a76e5dce4fc8d50f8699c108833f24e80167d706 23-Dec-2013 Eric Anholt <eric@anholt.net> i965: Move compiler debugging output to stderr.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f28c9208652143b4925bd97ce9823728c34d34a5 21-Feb-2014 Eric Anholt <eric@anholt.net> i965: Refactor debug dumping of GLSL IR.

This was only going to get worse when tessellation shows up, and was
causing too much extra duplication in my stderr changes coming up.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7770b026937948e1be3ed55f9ff97e6521c500df 22-Feb-2014 Matt Turner <mattst88@gmail.com> Revert "i965/fs: Make fs_reg's type an enum for better debugging."

This reverts commit 5ceadd29b0af835d741bcf09b9622c628e549ae6.

I rebased and apparently failed to build test.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75355
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
760c6777a0530b4894dec564cdf218f5364b4df1 22-Feb-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Drop the emit(fs_inst) overload.

Using this emit function implicitly creates three copies, which
is pointlessly inefficient.

1. Code creates the original instruction.
2. Calling emit(fs_inst) copies it into the function.
3. It then allocates a new fs_inst and copies it into that.

The second could be eliminated by changing the signature to

fs_inst(const fs_inst &)

but that wouldn't eliminate the third. Making callers heap allocate the
instruction and call emit(fs_inst *) allows us to just use the original
one, with no extra copies, and isn't much more of a burden.
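
The copy counting above can be illustrated with a small sketch. ToyInst and ToyEmitter are hypothetical types, not the driver's fs_inst or fs_visitor; the static counter exists only to make the copies visible:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical instruction type that counts copy constructions.
struct ToyInst {
    int opcode;
    static int copies;
    explicit ToyInst(int op) : opcode(op) {}
    ToyInst(const ToyInst &other) : opcode(other.opcode) { copies++; }
};
int ToyInst::copies = 0;

struct ToyEmitter {
    std::vector<std::unique_ptr<ToyInst>> instructions;

    // By-value form: the argument is copied into the parameter, then
    // copied again into the heap allocation. Together with the caller's
    // original object that makes three instances.
    ToyInst *emit_by_value(ToyInst inst) {
        instructions.push_back(std::unique_ptr<ToyInst>(new ToyInst(inst)));
        return instructions.back().get();
    }

    // Pointer form: the caller heap-allocates and the emitter takes
    // ownership of that same object, so nothing is copied.
    ToyInst *emit(ToyInst *inst) {
        assert(inst != nullptr);
        instructions.push_back(std::unique_ptr<ToyInst>(inst));
        return inst;
    }
};
```

Ownership via std::unique_ptr is a choice made for this sketch only; the commit message does not say how the driver manages instruction lifetime.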

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
326fc60ee9457d17fb97a7f49c977743426b0859 20-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Pass fs_regs by constant reference where possible.

These functions (modulo emit_lrp, necessitating the small fix-up) pass
these arguments by value unmodified to other functions. No point in
making an additional copy.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
070f20272fcfdcafe5d843d240e876ef5cfda560 20-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Move setting opcode = NOP to its one useful location.

All other callers of init() immediately set opcode to something else.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5ceadd29b0af835d741bcf09b9622c628e549ae6 20-Feb-2014 Matt Turner <mattst88@gmail.com> i965/fs: Make fs_reg's type an enum for better debugging.

Since the enum is marked as packed, it'll still take only one byte.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c2ebbe2728cd709029313f4b9c9cc53432c510a1 20-Feb-2014 Eric Anholt <eric@anholt.net> i965: Stop throwing away our double precision for time calculations.

Fixes negative times being reported in our perf debug.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9e3cab8881626edd72d222f35c5d2a5fd9661bce 15-Feb-2014 Eric Anholt <eric@anholt.net> i965/fs: Add an optimization pass to remove redundant flags movs.

We generate steaming piles of these for the centroid workaround, and this
quickly cleans them up.

total instructions in shared programs: 1591228 -> 1590047 (-0.07%)
instructions in affected programs: 26111 -> 24930 (-4.52%)
GAINED: 0
LOST: 0

(Improved apps are l4d2, csgo, and dolphin)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eef710fc53113a5b3d6bbf7d9a20f63d7add7911 19-Feb-2014 Francisco Jerez <currojerez@riseup.net> i965/fs: Use a separate variable to keep track of the last uniform index seen.

Like the VEC4 back-end does. It will make dynamic allocation of the
param_size array easier in a future commit.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6f56d5dc6047d0f926706e28fe1d809622c5b7e3 08-Dec-2013 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove fs_reg::retype.

There doesn't seem to be any reason for it to be a method, and it's
surprising that the expression 'reg.retype(t)' doesn't retype its
object but rather it creates a temporary with the new type. Use
'retype(reg, t)' instead.
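
A small sketch of why the free-function form reads more honestly. ToyReg and ToyRegType are hypothetical stand-ins for fs_reg and its register type; only the name retype() comes from the commit message:

```cpp
#include <cassert>

// Hypothetical register type tags.
enum ToyRegType { TOY_TYPE_F, TOY_TYPE_D, TOY_TYPE_UD };

// Hypothetical stand-in for fs_reg.
struct ToyReg {
    int nr;
    ToyRegType type;
};

// Free function: takes the register by value and returns a modified copy.
// The call syntax retype(reg, t) makes it plain that 'reg' is an input,
// not an object being mutated.
inline ToyReg retype(ToyReg reg, ToyRegType type) {
    reg.type = type;
    return reg;
}

// Usage: the original register keeps its type; only the result changes.
inline void retype_usage_example() {
    ToyReg r = {3, TOY_TYPE_F};
    ToyReg u = retype(r, TOY_TYPE_UD);
    assert(r.type == TOY_TYPE_F);
    assert(u.type == TOY_TYPE_UD);
    assert(u.nr == r.nr);
}
```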

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ae8b066da5862b4cfc510b3a9a0e1273f9f6edd4 19-Feb-2014 Francisco Jerez <currojerez@riseup.net> i965: Move up duplicated fields from stage-specific prog_data to brw_stage_prog_data.

There doesn't seem to be any reason for nr_params, nr_pull_params,
param, and pull_param to be duplicated in the stage-specific
subclasses of brw_stage_prog_data. Moving their definition to the
common base class will allow some code sharing in a future commit, the
removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and
the simplification of the stage-specific brw_*_prog_data_compare.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
422679835479a053d5b5ac9cf75e2fbb7e827755 15-Feb-2014 Eric Anholt <eric@anholt.net> i965/fs: Drop dead comment about the old proj_attrib_mask optimization.

The code was removed early last year.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a3a55067bdf608402aeb98d515c52e2436a8f226 15-Jan-2014 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove fs_reg::sechalf.

The same effect can be achieved using ::subreg_offset. Remove the
less flexible alternative and define a convenience function to keep
the fs_reg interface sane.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
019bf6ed8dd4843512e9d4924f4702ce36047ad5 15-Jan-2014 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove fs_reg::smear.

The same effect can be achieved using a combination of ::stride and
::subreg_offset. Remove the less flexible ::smear to keep the data
members of fs_reg orthogonal.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
756d37b1d6d09ad7ee3b8835888a49d4256e427b 08-Dec-2013 Francisco Jerez <currojerez@riseup.net> i965/fs: Add support for specifying register horizontal strides.

v2: Some improvements for copy propagation with non-contiguous
register strides and mismatching types.
v3: Add example of the situation that the copy propagation changes are
intended to avoid. Clarify that 'fs_reg::apply_stride()' is expected
to work with zero strides too.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4c7206bafdd7bde7617e14840812e43459682718 08-Dec-2013 Francisco Jerez <currojerez@riseup.net> i965/fs: Add support for sub-register byte offsets to the FS back-end IR.

It would be nice if we could have a single 'reg_offset' field
expressed in bytes that would serve the purpose of both, but the
semantics of 'reg_offset' are quite complex currently (it's measured
in units of one, eight or sixteen dwords depending on the register
file and the dispatch width) and changing it to bytes would be a very
intrusive change at this stage. Add a separate 'subreg_offset' field
for now.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8a2508ee0726b349318c1e05122edbe5a545480a 25-Nov-2013 Francisco Jerez <currojerez@riseup.net> glsl: Add image type to the GLSL IR.

v2: Reuse the glsl_sampler_dim enum for images. Reuse the
glsl_type::sampler_* fields instead of creating new ones specific
to image types. Reuse the same constructor as for samplers adding
a new 'base_type' argument.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d3e948340be3fe61d3724f1b96651c2097b4026e 07-Feb-2014 Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> i965: Add missing null check in fs_visitor::dead_code_eliminate_local()

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e57d77280efcbfd6579a88f071426653287ef833 05-Feb-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Fix register types in dump_instructions().

This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract
type that doesn't match the hardware description. dump_instruction()
was using reg_encoding[] from brw_disasm.c, which no longer matches
(and was incorrect for Gen8+ anyway).

This patch introduces a new function to convert the abstract enum values
into the letter suffix we expect.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5eeb12c0bcd3d25fee9749d797f8541a96935192 25-Jan-2014 Chris Forbes <chrisf@ijw.co.nz> i965/fs: Assume FBO rendering in precompile if MRT.

If multiple color outputs are written, this shader is unlikely to be
useful with a winsys framebuffer.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
046f8d8a6fcae641ed0e7e06e24ab5da39a57c86 25-Jan-2014 Chris Forbes <chrisf@ijw.co.nz> i965/fs: Guess nr_color_regions better in precompile

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
947c828d5cbffe9640ac63103a6223112eeff27f 12-Dec-2013 Matt Turner <mattst88@gmail.com> i965/fs: Add a saturation propagation optimization pass.

Transforms, for example,

mul vgrf3, vgrf2, vgrf1
mov.sat vgrf4, vgrf3

into

mul.sat vgrf3, vgrf2, vgrf1
mov vgrf4, vgrf3

which gives register_coalescing an opportunity to remove the MOV
instruction.
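The transformation above can be sketched on a toy instruction list. This is only an illustration: `ToyInst` and `propagate_saturate` are hypothetical names, not Mesa's fs_inst or the actual pass, and the sketch assumes straight-line code in which every instruction can accept a saturate modifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; not Mesa's fs_inst.
struct ToyInst {
    std::string op;        // e.g. "mul", "mov"
    int dst;               // destination virtual register number
    std::vector<int> src;  // source virtual register numbers
    bool saturate;         // true for the ".sat" modifier
};

// Moves ".sat" from a MOV onto the instruction that produced the MOV's
// source, provided the MOV is the only later reader of that register.
bool propagate_saturate(std::vector<ToyInst> &insts)
{
    bool progress = false;
    for (std::size_t i = 0; i < insts.size(); i++) {
        if (insts[i].op != "mov" || !insts[i].saturate)
            continue;
        assert(insts[i].src.size() == 1);  // a MOV has exactly one source
        const int reg = insts[i].src[0];

        // Find the nearest earlier instruction that writes `reg`.
        std::size_t def = i;
        for (std::size_t j = i; j-- > 0;) {
            if (insts[j].dst == reg) {
                def = j;
                break;
            }
        }
        if (def == i)
            continue;  // no producer found in this block

        // Conservatively give up if anything else reads `reg` after the
        // producer: that reader may expect the unsaturated value.
        bool other_reader = false;
        for (std::size_t k = def + 1; k < insts.size(); k++) {
            if (k == i)
                continue;
            for (int s : insts[k].src)
                other_reader = other_reader || (s == reg);
        }
        if (other_reader)
            continue;

        insts[def].saturate = true;
        insts[i].saturate = false;
        progress = true;
    }
    return progress;
}
```

Running it on the two-instruction example from this message leaves the MUL saturated and the MOV plain, which is the form a coalescing pass can then clean up.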

total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
instructions in affected programs: 798586 -> 788181 (-1.30%)
GAINED: 0
LOST: 4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ce527a6722491fa7d696266d5dec13f0b72bf8e8 10-Dec-2013 Topi Pohjolainen <topi.pohjolainen@intel.com> i965: rename tex_ms to tex_cms

Prepares for the introduction of non-compressed multi-sampled
lookup used in the blorp programs.

v2: now also taking into account gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f5cfb4ae21df8eebfc6b86c0ce858b1c0a9160dd 15-Jan-2014 Anuj Phogat <anuj.phogat@gmail.com> i965: Ignore 'centroid' interpolation qualifier in case of persample shading

This patch handles the use of the 'centroid' qualifier with 'in' variables
in a fragment shader when per-sample shading is enabled. Per-sample
shading for the whole fragment shader can be enabled either by calling
glEnable(GL_SAMPLE_SHADING) or by using the {gl_SamplePosition, gl_SampleID}
builtin variables in the fragment shader. The details are explained
below.

/* Enable sample shading using OpenGL API */
glEnable(GL_SAMPLE_SHADING);
glMinSampleShading(1.0);

Example fragment shader:
in vec4 a;
centroid in vec4 b;
main()
{
...
}

Variable 'a' will be interpolated at the sample location. But what
interpolation should we use for variable 'b'?

ARB_sample_shading recommends interpolation at sample position for
all the variables. GLSL 400 (and earlier) spec says that:

"When an interpolation qualifier is used, it overrides settings
established through the OpenGL API."
But, this text got deleted in later versions of GLSL.

NVIDIA's and AMD's proprietary Linux drivers (at OpenGL 4.3)
interpolate at sample position. This convinces me to use
a similar approach on Intel hardware.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a92e5f7cf63d496ad7830b5cea4bbab287c25b8e 06-Jan-2014 Anuj Phogat <anuj.phogat@gmail.com> i965: Use sample barycentric coordinates with per sample shading

Current implementation of arb_sample_shading doesn't set 'Barycentric
Interpolation Mode' correctly. We use pixel barycentric coordinates
for per sample shading. Instead we should select perspective sample
or non-perspective sample barycentric coordinates.

It also enables using sample barycentric coordinates in case of a
fragment shader variable declared with 'sample' qualifier.
e.g. sample in vec4 pos;

A piglit test to verify the implementation has been posted on piglit
mailing list for review.

V2: Do not interpolate all the 'in' variables at sample position
if fragment shader uses 'sample' qualifier with one of them.
For example we have a fragment shader:
#version 330
#extension GL_ARB_gpu_shader5: require
sample in vec4 a;
in vec4 b;
main()
{
...
}

Only 'a' should be sampled at sample location, not 'b'.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bf0773aeca86669371d99eadb928c6dc92d5840a 10-Jan-2014 Matt Turner <mattst88@gmail.com> i965/fs: Optimize LRP with x == y into a MOV.

total instructions in shared programs: 1487331 -> 1485988 (-0.09%)
instructions in affected programs: 45638 -> 44295 (-2.94%)
GAINED: 7
LOST: 0

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
413622fbefb63c54d331ce5d708479ab847e6709 15-Dec-2013 Matt Turner <mattst88@gmail.com> i965/fs: Print the maximum register pressure.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
391eaa59bd2b71078a28ff34dd3d4eed470653ee 05-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Show register pressure in dump_instructions() output.

Dumping the number of live registers at each IP allows us to see
register pressure and identify any local maxima. This should
aid in debugging passes designed to reduce register pressure, as
well as optimizations that suddenly trigger spilling.
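The idea of counting live registers per IP can be sketched as follows. This is a simplified illustration, not the driver's code: `LiveRange`, `regs_live_at_ip` and `max_register_pressure` are hypothetical names, and every register is assumed to have size 1 and a single contiguous live interval.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical live interval of one virtual register, expressed in
// instruction pointers (IPs); both ends are inclusive.
struct LiveRange {
    int start;
    int end;
};

// Returns, for each IP in [0, num_insts), how many registers are live there.
std::vector<int> regs_live_at_ip(const std::vector<LiveRange> &ranges,
                                 int num_insts)
{
    assert(num_insts >= 0);
    std::vector<int> live(num_insts, 0);
    for (const LiveRange &r : ranges) {
        assert(r.start <= r.end);
        // Clamp the interval to the valid IP range before counting.
        const int first = std::max(r.start, 0);
        const int last = std::min(r.end, num_insts - 1);
        for (int ip = first; ip <= last; ip++)
            live[ip]++;
    }
    return live;
}

// The maximum register pressure is the largest per-IP count.
int max_register_pressure(const std::vector<int> &live)
{
    return live.empty() ? 0 : *std::max_element(live.begin(), live.end());
}
```

Printing the per-IP count next to each dumped instruction is what makes local maxima easy to spot.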

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3b74f4b2333704bc7dbe5714e1f2aa4d201669ee 05-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Compute the number of live registers at each IP.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0ea600ef1ada70bc2280909d86abe29dfd3e8f73 16-Dec-2013 Matt Turner <mattst88@gmail.com> i965/fs: Call opt_peephole_sel later in the optimization loop.

Calling it after value numbering (added in the next commit) prevents
some instruction count regressions.

total instructions in shared programs: 1524387 -> 1523905 (-0.03%)
instructions in affected programs: 13112 -> 12630 (-3.68%)
GAINED: 0
LOST: 3

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ede6c341f686def647bf8ee4912e759b3d9933a6 16-Dec-2013 Matt Turner <mattst88@gmail.com> i965/fs: Calculate interference better in register_coalesce.

Previously we simply considered two registers whose live ranges
overlapped to interfere. Cases such as

   set A        ------
   ...                |
   mov B, A     --    |
   ...            | B | A
   use B        --    |
   ...                |
   use A        ------

would be considered to interfere, even though B is an unmodified copy of
A whose live range fits wholly inside that of A.

If no writes to A or B occur between the mov B, A and the use of B then
we can safely coalesce them.

Instead of removing MOV instructions, we make them NOPs and remove them
at once after the main pass is finished in order to avoid recomputing
live intervals (which are needed to perform the previous step).
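The safety rule described above (no writes to A or B between the MOV and the last use of B) can be sketched on a toy instruction list. `ToyInst` and `can_coalesce` are hypothetical names used only for illustration; the sketch covers straight-line code and deliberately checks the last use itself as well, which is more conservative than strictly necessary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; not Mesa's fs_inst.
struct ToyInst {
    std::string op;
    int dst;               // destination register, or -1 if none is written
    std::vector<int> src;  // source registers
};

static bool reads(const ToyInst &inst, int reg)
{
    for (int s : inst.src)
        if (s == reg)
            return true;
    return false;
}

// Returns true if the copy "mov B, A" at mov_ip can be coalesced, i.e.
// neither A nor B is written between the MOV and the last read of B.
bool can_coalesce(const std::vector<ToyInst> &insts, std::size_t mov_ip)
{
    assert(mov_ip < insts.size());
    const ToyInst &mov = insts[mov_ip];
    assert(mov.op == "mov" && mov.src.size() == 1);
    const int b = mov.dst;
    const int a = mov.src[0];

    // Locate the last read of B after the MOV.
    std::size_t last_use = mov_ip;
    for (std::size_t k = mov_ip + 1; k < insts.size(); k++)
        if (reads(insts[k], b))
            last_use = k;

    // A write to either register in that window means B may no longer be
    // an unmodified copy of A.
    for (std::size_t k = mov_ip + 1; k <= last_use; k++)
        if (insts[k].dst == a || insts[k].dst == b)
            return false;
    return true;
}
```

For the diagram above (set A, mov B A, use B, use A) the check succeeds even though the two live ranges overlap.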

total instructions in shared programs: 1543768 -> 1513077 (-1.99%)
instructions in affected programs: 951563 -> 920872 (-3.23%)
GAINED: 46
LOST: 22

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4a7d0c550e28ae3d434da81c9029272d22fa315e 11-Dec-2013 Matt Turner <mattst88@gmail.com> i965/fs: Support coalescing registers of size > 1.

total instructions in shared programs: 1550048 -> 1549880 (-0.01%)
instructions in affected programs: 1896 -> 1728 (-8.86%)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9bb4d71fd2ff8ed24cb4d1485df1f1ff667bcb3c 11-Dec-2013 Matt Turner <mattst88@gmail.com> i965/fs: Add a comment explaining how register coalescing works.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
71bc11a37508542662132b16a53acd5f541cd2b4 05-Dec-2013 Matt Turner <mattst88@gmail.com> i965: Print reg_offset for vgrf of size > 1 in dump_instruction().

Previously we wouldn't print the +0 for the first part of a VGRF of size
greater than 1.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
11f6882e1daf73cead8bc9febe5e29ada98f4add 07-Dec-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Create a new fragment shader backend for Broadwell.

This replaces the old fs_generator backend.

v2: Port to the C-based representation of assembly instructions.
Fix texturing after the texture-grf merge.

v3: Add high quality derivative support. Fix SET_SIMD4X2_OFFSET.

v4: Pass brw_context to gen8_instruction functions as required.

v5: Fixes for MRT, as well as zero render targets (alpha test only).

v6: Replace n-wide with SIMDn in comments and messages; port over
Topi's blorp-generator changes; add missing TXF_MCS opcode,
fix missing high quality derivatives for DDX; fix typo (all caught
by Eric). Simplify ADDC/SUBB handling; drop "Used only on Gen6+"
comment (caught by Matt). Emit SIMD16 versions of three source
instructions (caught by both Eric and Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
746e3e3b3ad20a29ee6de64d663d2dc11deac06e 13-Nov-2013 Eric Anholt <eric@anholt.net> i965: Replace 8-wide and 16-wide with SIMD8 and SIMD16.

Those are the terms used in the docs, and I think "n-wide" was something I
just happened to say. Note that shader-db needs updating for the
INTEL_DEBUG=fs parsing.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
26a3bf5c726199d7664d5878ef1f73592e55caa7 28-Nov-2013 Eric Anholt <eric@anholt.net> i965: Stop doing our optimization on a copy of the GLSL IR.

The original intent was that we'd keep a driver-private copy, and there
would be the normal copy for swrast to make use of without the tuning (or
anything more invasive we might do) specific to i965. Only, we don't
generate swrast code any more, because swrast can't render current shaders
anyway. Thus, our private copy is rather a waste, and we can just do our
backend-specific operations on the linked shader.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
544869377d6ec8c150d4d91d17a01f22cd84d479 08-Dec-2013 Chris Forbes <chrisf@ijw.co.nz> i965/fs: add support for gl_SampleMaskIn[]

v2: - add assert so we don't run into trouble on Gen6.
- adjust for Tapani's rearrangement of ir_variable

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
447bb9029f7e03b03e507053b9f63536d8fc74ac 12-Dec-2013 Tapani Pälli <tapani.palli@intel.com> glsl: move variables in to ir_variable::data, part II

This patch moves the following bitfields and variables to the data
structure:

explicit_location, explicit_index, explicit_binding, has_initializer,
is_unmatched_generic_inout, location_frac, from_named_ifc_block_nonarray,
from_named_ifc_block_array, depth_layout, location, index, binding,
max_array_access, atomic

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
33ee2c67c0a4e8f2fefbf37dacabd14918060af5 12-Dec-2013 Tapani Pälli <tapani.palli@intel.com> glsl: move variables in to ir_variable::data, part I

This patch moves the following bitfields into the data structure:

used, assigned, how_declared, mode, interpolation,
origin_upper_left, pixel_center_integer

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c1d3080ee86cd3d914712ffe0bb533c5d6a6b271 11-Dec-2013 Tapani Pälli <tapani.palli@intel.com> glsl: introduce data section to ir_variable

The data section helps with serialization and cloning of an ir_variable. This
patch includes the helper bits used for read-only ir_variables.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7629c489c88a6f6dd47b311a90ad64e216c9a37c 29-Nov-2013 Chris Forbes <chrisf@ijw.co.nz> i965: Add shader opcode for sampling MCS surface

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d30b2ed5f83841531b4c5aa21bde50acad35560a 23-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: New peephole optimization to flatten IF/BREAK/ENDIF.

total instructions in shared programs: 1550713 -> 1550449 (-0.02%)
instructions in affected programs: 7931 -> 7667 (-3.33%)

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
13de9f03f177d3ae0921fded1a102b66130f8b40 23-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: New peephole optimization to generate SEL.

fs_visitor::try_replace_with_sel optimizes only if statements whose
"then" and "else" bodies contain a single MOV instruction. It also
could not handle constant arguments, since they cause an extra MOV
immediate to be generated (since we haven't run constant propagation,
there are more than the single MOV).

This peephole fixes both of these and operates as a normal optimization
pass.

fs_visitor::try_replace_with_sel is still arguably necessary, since it
runs before pull constant loads are lowered.

total instructions in shared programs: 1559129 -> 1545833 (-0.85%)
instructions in affected programs: 167120 -> 153824 (-7.96%)
GAINED: 13
LOST: 6

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fa227e7cbca279cd70ea7028a33d520579385f9f 23-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: Add SEL() convenience function.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8814806c97ed60c5bb4d6cb1927cd05445864388 21-Oct-2013 Matt Turner <mattst88@gmail.com> i965: Print conditional mod in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
637dda1c307aee921ecc646b75f891deab6585a9 02-Dec-2013 Matt Turner <mattst88@gmail.com> i965: Print argument types in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
942151af300e067f72572cd8785fa3526132570c 26-Nov-2013 Matt Turner <mattst88@gmail.com> i965/fs: Print ARF registers properly in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0e4053234df5e3461e80c90dfd743c3ac96006eb 26-Nov-2013 Matt Turner <mattst88@gmail.com> i965: Don't print extra (null) arguments in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
04d83396eef7a8c8603f55bc0a0b04c80a9f6cf5 30-Nov-2013 Matt Turner <mattst88@gmail.com> i965/fs: Rename register_coalesce_2() -> register_coalesce().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9a6b14f6745206eb018c8474feafae4bafdcb8e5 30-Nov-2013 Matt Turner <mattst88@gmail.com> i965/fs: Remove now useless register_coalesce() pass.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1520ae48b880d9bee287583d15ac40c89d0ced8b 29-Nov-2013 Matt Turner <mattst88@gmail.com> i965/fs: Let register_coalesce_2() eliminate self-moves.

This is the last thing that register_coalesce() still handled.

total instructions in shared programs: 1561060 -> 1560908 (-0.01%)
instructions in affected programs: 15758 -> 15606 (-0.96%)

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c4815f6cd6f659acd361f1b4cf63473a46ca7de9 26-Nov-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Always reserve binding table space for at least one render target.

In brw_update_renderbuffer_surfaces(), if there are no color draw
buffers, we always set up a null render target at surface index 0 so we
have something to use with the FB write marking the end of thread.

However, when we recently began computing surface indexes dynamically,
we failed to reserve space for it. This meant that the first texture
would be assigned surface index 0, and our closing FB write would
clobber the texture.

Fixes Piglit's EXT_packed_depth_stencil/fbo-blit-d24s8 test on Gen4-5,
which regressed as of commit 4e5306453da6a1c076309e543ec92d999e02f67a
("i965/fs: Dynamically set up the WM binding table offsets.")

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70605
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: lu hua <huax.lu@intel.com>
Cc: "10.0" mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4f64dabb5fc0361a86146ce095c11131f14dfc49 23-Nov-2013 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix misleading comment.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
46cf80fb366cb14827724a7fea004e81400cc602 19-Nov-2013 Eric Anholt <eric@anholt.net> i965/fs: Make the first pre-allocation heuristic be the post heuristic.

I recently made us try two different things that tried to reduce register
pressure so that we would be more likely to allocate successfully. But
now that we have the logic for trying two, we can make the first thing we
try be the normal, not-prioritizing-register-pressure heuristic.

This means one less scheduling pass in the common case of that heuristic
not producing spills, plus the best schedule we know how to produce, if
that one happens to succeed. This is important, because our register
allocation produces a lot of possibly avoidable dependencies for the
post-register-allocation schedule, despite ra_set_allocate_round_robin().

GLB2.7: 1.04127% +/- 0.732461% fps improvement (n=31)
nexuiz: No difference (n=5)
lightsmark: 0.838512% +/- 0.300147% fps improvement (n=86)
minecraft apitrace: No difference (n=15)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a97cd0f4d7902965d5173f4bcbf2ad27c0eb5d12 30-Oct-2013 Matt Turner <mattst88@gmail.com> i965: Add a pass to remove dead control flow.

Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions.
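A minimal sketch of such a pass, operating on a plain list of opcode names, might look like this. It is an illustration under simplifying assumptions: `remove_dead_control_flow` here is a hypothetical standalone function, opcodes are bare strings, and removing an IF is assumed to have no other effect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Removes "if endif" and "if else endif" sequences that contain no other
// instructions. Returns true if anything was removed.
bool remove_dead_control_flow(std::vector<std::string> &ops)
{
    bool progress = false;
    std::size_t i = 0;
    while (i < ops.size()) {
        std::size_t len = 0;
        if (ops[i] == "if") {
            if (i + 1 < ops.size() && ops[i + 1] == "endif")
                len = 2;
            else if (i + 2 < ops.size() && ops[i + 1] == "else" &&
                     ops[i + 2] == "endif")
                len = 3;
        }

        if (len == 0) {
            i++;
            continue;
        }

        assert(i + len <= ops.size());
        ops.erase(ops.begin() + i, ops.begin() + i + len);
        progress = true;
        // Step back far enough to recheck an enclosing IF (possibly
        // followed by ELSE) that may just have become empty.
        i = (i >= 2) ? i - 2 : 0;
    }
    return progress;
}
```

Stepping back after each removal lets nested empty blocks collapse in a single call instead of needing another trip through the optimization loop.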

total instructions in shared programs: 1360393 -> 1360387 (-0.00%)
instructions in affected programs: 157 -> 151 (-3.82%)

(no change in vertex shaders)

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9793fc1335f11b4131d6db680bec567dcfccfb5f 15-Nov-2013 Matt Turner <mattst88@gmail.com> i965/fs: Use source's original type in register_coalesce().

Previously, register_coalesce() would modify

mov vgrf1:f vgrf2:f
cmp null vgrf3:d vgrf1:d

to be

cmp null vgrf3:d vgrf2:f

and incorrectly use vgrf2's type in the instruction that the mov was
coalesced into.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ec8cc65926de3e7391f3bcec8ee26fc8f4d36159 02-Jan-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Remove force_sechalf stack

Only Gen4 color write setup uses the force_sechalf flag, and it only
sets it on a single instruction. It also already has to get a pointer
to the instruction and manually set the saturate flag, so we may as well
just set force_sechalf the same way and avoid the complexity of a stack.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e133c0103d4336c47911e89cc8a17a1c78bfdbb8 14-Nov-2013 Matt Turner <mattst88@gmail.com> i965: Assert that IF with cmod is Gen6 only.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e9daead784921e453906853a4a78a2f3135af2e0 07-Nov-2013 Eric Anholt <eric@anholt.net> i965/fs: Try a different pre-scheduling heuristic if the first spills.

Since LIFO fails on some shaders in one particular way, and non-LIFO
systematically fails in another way on different kinds of shaders, try
them both, and pick whichever one successfully register allocates first.
Slightly prefer non-LIFO in case we produce extra dependencies in register
allocation, since it should start out with fewer stalls than LIFO.

This is madness, but I haven't come up with another way to get unigine
tropics to not spill while keeping other programs from spilling and
retaining the non-unigine performance wins from texture-grf.

total instructions in shared programs: 1626728 -> 1626288 (-0.03%)
instructions in affected programs: 1015 -> 575 (-43.35%)
GAINED: 50
LOST: 0

Improves Unigine Tropics performance by 14.5257% +/- 0.241838% (n=38)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fbd8303a943d0d491b7c2415eb237a0731c7dec5 07-Nov-2013 Eric Anholt <eric@anholt.net> i965/fs: Do instruction pre-scheduling just before register allocation.

Long ago, the HW_REG usages in assign_curb/urb_setup() were scheduling
barriers, so we had to run the scheduler before them in order for it to be
able to do basically anything. Now that that's fixed, we can delay the
scheduling until we go to allocate (which will make the next change less
scary).

Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f7e15fcf56595aac99644292386a6e6d06dc6ec0 26-Oct-2013 Chris Forbes <chrisf@ijw.co.nz> i965/fs: Gen4-5: Implement alpha test in shader for MRT

V2: Add comment explaining what emit_alpha_test() is for;
fix spurious temp and bogus whitespace.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ca82ba90dd7ef78be2b95972dc19913c76d5e6a8 27-Oct-2013 Chris Forbes <chrisf@ijw.co.nz> i965/fs: Gen4-5: Setup discard masks for MRT alpha test

The same setup is required here as when the user-provided shader
explicitly uses KIL or discard.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
34fe051e215107dddbaae71e2edf15f88d839936 20-Oct-2013 Francisco Jerez <currojerez@riseup.net> i965: Add a 'has_side_effects' back-end instruction predicate.

This patch fixes the three dead code elimination passes and the
VEC4/FS instruction scheduling passes so they leave instructions with
side effects alone.
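How such a predicate changes dead code elimination can be sketched on a toy instruction list. Everything here is illustrative: `ToyInst`, the string opcode "untyped_atomic", and this particular `dead_code_eliminate` are assumptions for the example and cover straight-line code only, not the driver's three passes.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction; not Mesa's backend_instruction.
struct ToyInst {
    std::string op;
    int dst;               // destination virtual register
    std::vector<int> src;  // source virtual registers
};

// Toy predicate: only the illustrative atomic opcode touches memory.
bool has_side_effects(const ToyInst &inst)
{
    return inst.op == "untyped_atomic";
}

// Backward pass over straight-line code. An instruction is dropped only if
// its result is never read afterwards AND it has no side effects.
bool dead_code_eliminate(std::vector<ToyInst> &insts,
                         const std::set<int> &live_out)
{
    std::set<int> live = live_out;
    std::vector<ToyInst> kept;
    bool progress = false;

    for (auto it = insts.rbegin(); it != insts.rend(); ++it) {
        const bool result_unused = live.count(it->dst) == 0;
        if (result_unused && !has_side_effects(*it)) {
            progress = true;
            continue;
        }
        // The instruction stays: its destination is defined here and its
        // sources become live above this point.
        live.erase(it->dst);
        live.insert(it->src.begin(), it->src.end());
        kept.push_back(*it);
    }

    assert(kept.size() <= insts.size());
    std::reverse(kept.begin(), kept.end());
    insts = kept;
    return progress;
}
```

Without the predicate, an atomic whose return value is ignored would look like dead code and be removed along with its memory update.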

At some point it might be interesting to have the instruction
scheduler calculate the exact memory dependencies between atomic ops,
but they're rare enough that it seems unlikely that it will make any
practical difference.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e12bbb503f71b60b9f212e82fdd3ed9aaf3ab318 25-Oct-2013 Anuj Phogat <anuj.phogat@gmail.com> i965: Add FS backend for builtin gl_SampleID

V2:
- Update comments
- Add compute_sample_id variables in brw_wm_prog_key
- Add a special backend instruction to compute sample_id.

V3:
- Make changes to support simd16 mode.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
65d0452bbc14c69ecd2cffdb38f711cfbaab348e 25-Oct-2013 Anuj Phogat <anuj.phogat@gmail.com> i965: Add FS backend for builtin gl_SamplePosition

V2:
- Update comments.
- Add compute_pos_offset variable in brw_wm_prog_key.
- Add variable uses_pos_offset in brw_wm_prog_data.

V3:
- Make changes to support simd16 mode.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3c28b2c09f491bfa55dc9e5d7858a8b900c25432 28-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: Optimize saturating SEL.G(E) with imm val <= 0.0f.

Only one program's instruction count is changed, but a shader in Tropics
is also affected.

instructions in affected programs: 326 -> 320 (-1.84%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ca675b73d3ac2e1b57ec385c2c80b05b6382f6b6 28-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.

total instructions in shared programs: 1409124 -> 1406971 (-0.15%)
instructions in affected programs: 158376 -> 156223 (-1.36%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a8f76d829bdcdb5f238ba6206f1b768098745022 28-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: Optimize OR with identical sources into a MOV.

Helps a lot of Steam games.

total instructions in shared programs: 1409360 -> 1409124 (-0.02%)
instructions in affected programs: 20842 -> 20606 (-1.13%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
20d0297ff2d507aab42e59ebfde375d5205642cb 20-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: Add reads_flag() and writes_flag() to fs_inst.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f768f998e0e5885c36af1efee6ca70fdf90deb96 22-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: Add is_null() method to fs_reg.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6032261682388ced64bd33328a5025f561927a38 16-Oct-2013 Eric Anholt <eric@anholt.net> i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE

I'm going to be introducing gen7 variants, and the previous naming was
going to get confusing.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
44ec2f1751ec4a9f0ba9035f2343ffe5e16e693c 16-Oct-2013 Eric Anholt <eric@anholt.net> i965/fs: Fix broken register spilling debug code.

Now that reg spilling generates new vgrfs, we were looping forever if you
ever turned it on.

Instead, move the debug code into the register allocator right near where
we'd be doing spilling anyway, which should more accurately reflect how
register spilling occurs in the wild.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
774b787d6b7abe601309cf437b09b592fea0394d 29-Oct-2013 Eric Anholt <eric@anholt.net> i965/fs: Drop our dead push constants before overflowing to pull constants.

The idea of the original order was that you'd dead code eliminate accesses
to push constants. But I've never seen a case of that (nor has
shader-db), while we frequently see sparse accesses of large constant
arrays that would overflow into pull constants.

Cuts pull constant use on csgo, serious sam, planeshift, and the cave:

total instructions in shared programs: 1695103 -> 1688795 (-0.37%)
instructions in affected programs: 92024 -> 85716 (-6.85%)
GAINED: 339
LOST: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5e621cb9fef7eada5a3c131d27f5b0b142658758 11-Sep-2013 Francisco Jerez <currojerez@riseup.net> i965/gen7: Implement code generation for untyped surface read instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cfaaa9bbb7a6ab5819f4fa9e38352b72d6293cff 11-Sep-2013 Francisco Jerez <currojerez@riseup.net> i965/gen7: Implement code generation for untyped atomic instructions.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
26db3b933f7fbc81d6c2bead2a8b0479a3691424 20-Oct-2013 Francisco Jerez <currojerez@riseup.net> glsl: Add new atomic_uint built-in GLSL type.

v2: Fix GLSL version in which the type became available. Add
contains_atomic() convenience method. Split off atomic counter
comparison error checking to a separate patch that will handle all
opaque types. Include new ir_variable fields for atomic types.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6bb2cf2107c4461ea9dd100edaf110b839311b90 08-Oct-2013 Chris Forbes <chrisf@ijw.co.nz> i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets.

The generator code ends up clearer this way than if we had to sniff
via the message length. Implemented via the gather4_po message in
hardware, which is present in Gen7 and later.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
43b05b8fac68784bc8d61851125bd49783e5ebd0 20-Oct-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Only emit interpolation setup if there are actual FS inputs.

Dead code elimination would get rid of the extra instructions, but
skipping this saves iterations through the optimization loop.

From shader-db:

    N           Min           Max        Median           Avg        Stddev
x 14672             3            16             3     3.1334515    0.59904168
+ 14672             1            16             3     2.8955153    0.77732963
Difference at 95.0% confidence
        -0.237936 +/- 0.0158798
        -7.59342% +/- 0.506783%
        (Student's t, pooled s = 0.693935)

Embarrassingly, the classic shadow mapping shader:

void main() { }

used to require three iterations through the optimization loop.
With this patch, it only requires one (which makes no progress).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
197f3a33fbce525e8f7799466935304d9e24c0f1 09-Oct-2013 Matt Turner <mattst88@gmail.com> i965/fs: Handle printing HW_REGS in dump_instruction().

Scheduling debugging now prints:

Instructions before scheduling (reg_alloc 1)
0: linterp vgrf20, hw_reg2, hw_reg3, hw_reg4,
1: linterp vgrf21, hw_reg2, hw_reg3, hw_reg4+16,

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
caf9cef7eee77f736ff76a65f385bf718efd1dc1 01-Sep-2013 Paul Berry <stereotype441@gmail.com> i965/fs: Remove bogus field prog_data->dispatch_width.

Despite the name, this field wasn't being set to the dispatch width at
all; it was always 8. The only place it was used was that the
constant buffer read length was aligned to it, and as far as I can
tell from the docs, there is no need to align this value to the
dispatch width; aligning it to a multiple of 8 is sufficient. So I've
just replaced it with a hardcoded 8.
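The alignment arithmetic can be sketched as below. The helper names are hypothetical, and the sketch assumes that the read length is counted in units of eight 32-bit values, so the parameter count is rounded up to a multiple of 8 and then divided by 8.

```cpp
#include <cassert>

// Rounds `value` up to the next multiple of `alignment`, which must be a
// power of two.
unsigned align_up(unsigned value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: number of 8-value units needed to hold `nr_params`
// 32-bit push constants, independent of the dispatch width.
unsigned push_constant_read_length(unsigned nr_params)
{
    return align_up(nr_params, 8) / 8;
}
```

With this, 9 parameters need two units whether the shader is compiled for SIMD8 or SIMD16.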

v2: In gen6_wm_state, use brw->wm.base.push_const_size for consistency
with VS and GS state upload.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
705a90e30435490c2de84f4f6741cab335fa7608 03-Oct-2013 Eric Anholt <eric@anholt.net> i965: Move the common binding table offset code to brw_shader.cpp.

Now that both vec4 and fs are dynamically assigning offsets, a lot of the
code is the same.

v2: Avoid passing around the next offset through the class. (Review by
Paul)

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4e5306453da6a1c076309e543ec92d999e02f67a 03-Oct-2013 Eric Anholt <eric@anholt.net> i965/fs: Dynamically set up the WM binding table offsets.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 02-Oct-2013 Eric Anholt <eric@anholt.net> i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.

It would be nice to be able to pack our binding table so that programs
that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1
binding table entries. To do that, we need the compiled program to have
information on where its surfaces go.

v2: Rename size to size_bytes to be more explicit.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a5ec01fb1bd4ad5418eb16cb05e6f6929d1444e8 20-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Don't copy prop source mods into instructions that can't take them.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
36fbe66d3a71df76fcb6f915846da4471b3a8442 10-Oct-2013 Eric Anholt <eric@anholt.net> i965/fs: Convert gen7 to using GRFs for texture messages.

Looking at Lightsmark's shaders, the way we used MRFs (or in gen7's
case, GRFs) was bad in a couple of ways. One was that it prevented
compute-to-MRF for the common case of a texcoord that gets used
exactly once, but where the texcoord setup all gets emitted before the
texture calls (such as when it's a bare fragment shader input, which
gets interpolated before processing main()). Another was that it
introduced a bunch of dependencies that constrained scheduling, and
forced waits for texture operations to be done before they are
required. For example, we can now move the compute-to-MRF
interpolation for the second texture send down after the first send.

The downside is that this generally prevents
remove_duplicate_mrf_writes() from doing anything, whereas previously
it avoided work for the case of sampling from the same texcoord twice.
However, I suspect that most of the win that originally justified that
code was in avoiding the WAR stall on the first send, which this patch
also avoids, rather than the small cost of the extra instruction. We
see instruction count regressions in shaders in unigine, yofrankie,
savage2, hon, and gstreamer.

Improves GLB2.7 performance by 0.633628% +/- 0.491809% (n=121/125, avg of
~66fps, outliers below 61 dropped).

Improves openarena performance by 1.01092% +/- 0.66897% (n=425).

No significant difference on Lightsmark (n=44).

v2: Squash in the fix for register unspilling for send-from-GRF, fixing a
segfault in lightsmark.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b6af650a095034eaa2de93bf6cf2985d7fdfce89 30-Apr-2013 Eric Anholt <eric@anholt.net> i965/fs: Use per-channel interference for register_coalesce_2().

This will let us coalesce into texture-from-GRF arguments, which would
otherwise be prevented due to the live interval for the whole vgrf
extending across all the MOVs setting up the channels of the message

v2 (Kenneth Graunke): Rebase for renames.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3093085847db0455a88e45f20e29660b2b7f8515 05-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Use the new per-channel live ranges for dead code elimination.

v2 (Kenneth Graunke): Rebase on s/live_variables/live_intervals/g.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3ea84beb1687f20074efdb1bcc790370bed2fc65 07-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Invalidate live intervals when compacting; don't fix them.

When compacting the list of VGRFs, we patch up the live interval ranges
(which are indexed by VGRF number). Unfortunately, once we make
per-component data available, this will become too complicated to
maintain. Instead, simply invalidate them.

This was pulled out of a patch by Eric Anholt.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4b821a97b5fcdc4c530d5455c43196be09830322 06-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Create a helper function for invalidating live intervals.

For now, this simply sets live_intervals_valid = false, but in the
future it will do something more sophisticated.

Based on a patch by Eric Anholt.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a26e17a36503c5387447cd560c81dbea6f2d89f9 26-Sep-2013 Chia-I Wu <olv@lunarg.com> i965: keep SecHalf flag after register coalescing

Copy sechalf to the new register, otherwise we would read the wrong HW registers.

Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b645913ff6c74228d8c05dd236a545ef2e734071 28-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Remove the "ARF" register file.

The registers in the architecture register file don't share much in
common, so there's no point in grouping them together. Use the HW_REG
class instead. The vec4 backend already does this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e7dc88026a821a31bf2afeb934dded11c91401a1 20-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Fixup for don't dead-code eliminate instructions that write to the accumulator.

Accidentally pushed an old version of the patch.

v2: Set destination register using brw_null_reg().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
92dc16c3e2e2b9e3e71baaccc67bbe727e9d68ab 20-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Don't dead-code eliminate instructions that write to the accumulator.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
014cce3dc49f5b0bfd7fbb1940ed661c9fc7bbd7 19-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Generate code for ir_binop_carry and ir_binop_borrow.

Using the ADDC and SUBB instructions on Gen7.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fb455500bfb11cca0f45076a9eaccc0ddd764731 31-Mar-2013 Chris Forbes <chrisf@ijw.co.nz> i965: add SHADER_OPCODE_TG4

Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and
low-level support for emitting it via generate_tex().

V3: Updated for changes in master.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
848c0e72f36d0e1e460193a2d30b2f631529156f 12-Sep-2013 Chia-I Wu <olv@lunarg.com> i965: compute DDX in a subspan based only on top row

Consider only the top-left and top-right pixels to approximate DDX in a 2x2
subspan, unless the application requests a more accurate approximation via
GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the
new driconf option disable_derivative_optimization.

This results in a less accurate approximation. However, it improves the
performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0%
confidence) on Haswell. No noticeable image quality difference observed.

The improvement comes from faster sample_d. It seems, on Haswell, some
optimizations are introduced to allow faster sample_d when all pixels in a
subspan have the same derivative. I considered SAMPLE_STATE too, which allows
one to control the quality of sample_d on Haswell. But it gave much worse
image quality without giving better performance compared to this change.

No piglit quick.tests regression on Haswell (tested with v1).

v2: better guess for precompile program key

Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
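
The difference between the two DDX approximations discussed in the commit above can be sketched as follows. This is a software model, not driver code: the Subspan type, the function names, and the pixel ordering (top-left, top-right, bottom-left, bottom-right) are assumptions made for illustration.

```cpp
#include <array>

// Values of one quantity at the four pixels of a 2x2 subspan.
// Assumed order: [0] top-left, [1] top-right, [2] bottom-left, [3] bottom-right.
using Subspan = std::array<float, 4>;

// More accurate approximation: each row uses its own horizontal difference.
static Subspan ddx_per_row(const Subspan &v)
{
   const float top = v[1] - v[0];
   const float bottom = v[3] - v[2];
   return Subspan{top, top, bottom, bottom};
}

// Cheaper approximation: every pixel in the subspan shares the
// difference taken from the top row only.
static Subspan ddx_top_row(const Subspan &v)
{
   const float top = v[1] - v[0];
   return Subspan{top, top, top, top};
}
```

With the top-row variant all four pixels carry the same derivative, which is the property the commit message associates with faster sample_d on Haswell.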
0e72db9f9729b8fe62213452751fed1cd337a7bc 11-Sep-2013 Francisco Jerez <currojerez@riseup.net> mesa: Fix misplaced includes of "main/uniforms.h".

Several C++ source files include "main/uniforms.h" from an extern "C"
block, which is both unnecessary, because "uniforms.h" already checks
for a C++ compiler and sets the right linkage, and incorrect, because
the header file includes other C++ headers ("glsl_types.h" and
"ir_uniform.h") that are supposed to get C++ linkage.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
875972029eddfd53cb90a8e34e9f27b2afed119f 03-Sep-2013 Paul Berry <stereotype441@gmail.com> i965/fs: When >64 input components, order them to match prev pipeline stage.

Since the SF/SBE stage is only capable of performing arbitrary
reorderings of 16 varying slots, we can't arrange the fragment shader
inputs in an arbitrary order if there are more than 16 input varying
slots in use. We need to make sure that slots 16-31 match the
corresponding outputs of the previous pipeline stage.

The easiest way to accomplish this is to just make all varying slots
match up with the previous pipeline stage.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a4546ec114853235db375b20fb47ddcd6a7f21e7 03-Sep-2013 Paul Berry <stereotype441@gmail.com> i965/fs: Simplify computation of key.input_slots_valid during precompile.

The for loop was rather silly. In addition to checking brw->gen < 6
on each loop iteration, it took pains to exclude bits from
fp->Base.InputsRead that don't correspond to fragment shader inputs.
But those bits would never have been set in the first place, since the
only bits that are ever set in fp->Base.InputsRead are fragment shader
inputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3a83b20dcccf21ec184e35bcfa9bc577379dfd51 03-Sep-2013 Paul Berry <stereotype441@gmail.com> i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.

Previously, if a fragment shader accessed gl_FragCoord or
gl_FrontFacing, we would assign them their own slots in the fragment
shader input attribute array, using up space that could be made
available to real varyings. This was not strictly necessary (since
these values are not true varyings, and are instead computed from
other data available in the FS payload). But we had to do it anyway
because the SF/SBE setup code assumed that every 1 bit in the
gl_program::InputsRead bitfield corresponded to a genuine varying
variable.

Now that the SF/SBE code consults brw_wm_prog_data and only sets up
the attributes that the fragment shader actually needs, we don't have
to do this anymore.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8c69eaba1a8a5e8a82112eb5c51b2f8978dd2c23 03-Sep-2013 Paul Berry <stereotype441@gmail.com> i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.

On gen4-5, the FS stage reads varying inputs from URB entries that
were output by the SF thread, where each register stores the
interpolation setup for two components of a vec4, therefore the FS
urb_read_length is twice the number of FS input varyings. On gen6+,
varying inputs are directly deposited in the FS payload by the SF/SBE
fixed function logic, so urb_read_length is irrelevant.

However, in future patches, it will be nice to be able to consult
brw_wm_prog_data to determine how many varying inputs the FS expects
(rather than inferring it from gl_program::InputsRead). So instead of
storing urb_read_length, we simply store num_varying_inputs in
brw_wm_prog_data. On gen4-5, we multiply this by 2 to recover the URB
read length.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
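
The gen4-5 relationship stated in the commit above is simple arithmetic. A minimal sketch, with a hypothetical function name, assuming only what the message says (each register holds the interpolation setup for two components of a vec4, so the read length is twice the number of varying inputs):

```cpp
// Recover the gen4-5 FS URB read length from the number of varying
// inputs. gen4_urb_read_length is a hypothetical name for this sketch.
static unsigned gen4_urb_read_length(unsigned num_varying_inputs)
{
   return num_varying_inputs * 2;
}
```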
58f01bd17d5587c21d7f543b8f3769f3405dc420 03-Sep-2013 Paul Berry <stereotype441@gmail.com> i965/fs: Expose "urb_setup" as part of brw_wm_prog_data.

At the moment, for Gen6+, the FS assumes that all varying inputs are
delivered to it in the order in which they appear in the
gl_program::InputsRead bitfield, and the SF/SBE setup code ensures
that they are delivered in this order.

When we add support for more than 64 varying components, this will no
longer always be possible, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots.

To allow extra flexibility in the ordering of FS varyings, this patch
causes the FS to advertise exactly what ordering it expects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4b3c0a797f89830fd5ba0943b061abf4fc38337e 02-Sep-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Use brw_stage_state for WM data as well.

This gets the VS, GS, and PS all using the same data structure.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a35b32025011eeac01f2e5a476dbf3ac132a61b3 28-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Detect GRF sources in split_virtual_grfs send-from-GRF code.

It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the
GRF. For example, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD uses src[1] for
the GRF.

To be safe, loop over all the source registers and mark any GRFs. We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.

Not observed to fix anything yet, but likely to. Parallels the bug fix
in the previous commit, which actually does fix known failures.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
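
The "check all three sources" approach from the commit above might look roughly like this. The types below are simplified stand-ins rather than the driver's fs_inst and fs_reg, and mark_grf_sources and no_split are hypothetical names; the sketch assumes every register number is within the bounds of the vector.

```cpp
#include <vector>

// Simplified stand-ins for the backend's register and instruction types.
enum RegFile { BAD_FILE, GRF, UNIFORM, IMM };
struct Reg { RegFile file; int reg; };
struct Inst { Reg src[3]; };

// Mark every GRF source of a send-from-GRF instruction as unsplittable,
// without assuming that the GRF is always src[0].
static void mark_grf_sources(const Inst &inst, std::vector<bool> &no_split)
{
   for (int i = 0; i < 3; i++) {
      if (inst.src[i].file == GRF)
         no_split[inst.src[i].reg] = true;
   }
}
```

An instruction that carries its GRF in src[1], as the message says FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD does, is handled the same way as one that carries it in src[0].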
530842127eabd41a809ee4d7136ff52857a4e685 24-Apr-2013 Matt Turner <mattst88@gmail.com> i965/fs: Add support for translating ir_triop_fma into MAD.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a4ff1fd388369dbf80d324c84502b28b5f9d3da4 15-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Shorten sampler loops in precompile key setup.

Now that we have the number of samplers available, we don't need to
iterate over all 16. This should be particularly helpful for vertex
shaders.

v2: Use the correct shader program (caught by Paul Berry).

This needs to initialize the exact same set of sampler swizzles as
the actual key setup, or else we end up doing recompiles due to some
being XYZW and others being 0.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9c48ae751ab28f35eb878551d24c071be0ce11b0 09-Aug-2013 Matt Turner <mattst88@gmail.com> i965: Don't copy propagate bitcasts with source modifiers.

Previously, copy propagation would cause bitcast_f2u(abs(float)) to
be performed in a single step, but the application of source modifiers
(abs, neg) happens after type conversion, leading to incorrect results.

That is, for bitcast_f2u(abs(float)) we would in fact generate code to
do abs(bitcast_f2u(float)).

For example, whereas bitcast_f2u(abs(float)) might result in a register
argument such as
(abs)g2.2<0,1,0>UD

v2: Set interfered = true and break in register_coalesce instead of
returning false.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
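
The ordering problem described in the commit above can be reproduced on the CPU. This sketch assumes IEEE 754 single-precision floats and models the abs modifier on an unsigned value as a no-op, which is an assumption made for illustration; the helper names other than bitcast_f2u are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit unsigned integer.
static uint32_t bitcast_f2u(float f)
{
   uint32_t u;
   std::memcpy(&u, &f, sizeof(u));
   return u;
}

// Intended order: take the absolute value as a float, then reinterpret.
static uint32_t abs_then_bitcast(float f)
{
   return bitcast_f2u(std::fabs(f));
}

// Order produced by the bad propagation: reinterpret first, then apply
// the modifier to the unsigned value. Assumption: abs on an unsigned
// value leaves it unchanged, so the sign bit of the float survives.
static uint32_t bitcast_then_abs(float f)
{
   return bitcast_f2u(f);
}
```

For -1.0f the first function yields 0x3f800000 and the second yields 0xbf800000, so folding the modifier into the bitcast changes the result.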
4d95efd14617d4a96a89d8e52d0cf684a5d6c4b1 05-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add dump_instruction() support for ARF destinations.

CMP instructions use BRW_ARF_NULL as a destination. Prior to this
patch, dump_instruction() decoded the destination as "???".

Now it decodes BRW_ARF_NULL as "(null)" and other ARFs numerically.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ee7bfab06805bff508c31b3ad3fb13d181f3fbf1 05-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Remove extraneous newline in dump_instruction() for CMP.

This resulted in printouts like:

246: cmp.cmod.f0.0
???, vgrf152, 0.000000f, (null),

With this patch, CMP is properly printed on one line.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2c32c3985ca6232a81d21feb9ac6443145b42d0e 06-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Consider predicated SEL instructions as whole variable writes.

The instruction

(+f0.0) SEL dst, src0, src1

will write either src0 or src1 to dst, depending on the predicate.
Unlike most predicated instructions, it always writes to dst.

fs_inst::is_partial_write() is supposed to return false if the whole
register is guaranteed to be written. The !inst->predicated check makes
sense for most instructions, which might not write the whole register,
but SEL is a special case.

This caused live interval analysis to ignore the destination of
predicated SEL instructions when computing "def" information.

Requires the previous commit to avoid regressions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
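
The special case described in the commit above can be modelled in a few lines. This is a reduced sketch, not the real fs_inst::is_partial_write(): the Opcode and Inst types are hypothetical, and any other conditions the real function checks are left out.

```cpp
// Hypothetical, reduced instruction model for this sketch.
enum class Opcode { MOV, ADD, SEL };
struct Inst { Opcode opcode; bool predicated; };

// A predicated instruction may leave some channels of its destination
// untouched, so it counts as a partial write. SEL is the exception:
// the predicate only chooses between src0 and src1, and the
// destination is always written.
static bool is_partial_write(const Inst &inst)
{
   return inst.predicated && inst.opcode != Opcode::SEL;
}
```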
53d8cff63b30326eaaafe3019d00354d4775a622 04-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Log a performance warning if skipping 16-wide due to pulls.

Usually, the driver creates both 8-wide and 16-wide variants of every
fragment shader. When 16-wide compilation fails, it logs a performance
warning explaining why only an 8-wide program exists.

However, when there are pull parameters, the driver won't even bother
trying the 16-wide compile (since it would fail). In this case, it
failed to emit a performance warning, leaving no explanation for the
missing 16-wide program.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
21922cb70d0a2de23f6080c8b9c4324cba5a2fff 06-Jul-2013 Chris Forbes <chrisf@ijw.co.nz> i965 Gen4/5: Generalize SF interpolation setup for GLSL1.3

Previously the SF only handled the builtin color varying specially.
This patch generalizes that support to cover user-defined varyings,
driven by the interpolation mode array set up alongside the VUE map.

Based on the following patches from Olivier Galibert:
- http://lists.freedesktop.org/archives/mesa-dev/2012-July/024335.html
- http://lists.freedesktop.org/archives/mesa-dev/2012-July/024339.html

With this patch, all the GLSL 1.3 interpolation tests that do not clip
(spec/glsl-1.30/execution/interpolation/*-none.shader_test) pass.

V5: Move key.do_flat_shading to brw_sf_compile.has_flat_shading; drop
vestigial hunks.
V6: Real bools.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
53631be4ebaa4fb13a7f129727c1cdd32fcc6f3d 06-Jul-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move intel_context::gen and gt fields to brw_context.

Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
794de2f3873bcedc78300b3ba69656adc755894c 06-Jul-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move intel_context::is_<platform> flags to brw_context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b15f1fc3c6b3b9dc4422940c412f80e581c9900d 03-Jul-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move intel_context::perf_debug to brw_context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
329779a0b45b63be17627f026533c80b2c8f7991 03-Jul-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move intel_context::batch to brw_context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
426ca34b7a2c3b9edfc0189daece8de3aff80627 13-Jun-2013 Eric Anholt <eric@anholt.net> glsl: Remove ir_print_visitor.h includes and usage

We have ir->print() to replace the old pattern of declaring a visitor and having
the IR accept the visitor (yuck!). And now you can call _mesa_print_ir()
safely anywhere that you know what an ir_instruction is.

A couple of missing printf("\n")s are added in error paths -- when an
expression is handed to the visitor, it doesn't print '\n' (since it might
be a step in printing a whole expression tree).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0677ea063cd96adefe87c1fb01ef7c66d905535b 30-May-2013 Dave Airlie <airlied@gmail.com> i965: fix problem with constant out of bounds access (v3)

Okay I now understand why Frank would want to run away, this is
my attempt at fixing the CVE out of bounds access to constants
outside the range. This attempt converts any illegal constant access
to constant 0, which the GL spec permits since the access is undefined behaviour.

A future patch should add some debug for users to find this out,
but this needs to be backported to stable branches.

CVE-2013-1872

v2: drop the last hunk which was a separate fix (now in master).
hopefully fix the indentations.

v3: don't fail piglit, the whole 8/16 dispatch stuff was over
my head, and I spent a while figuring it out, but this one is
definitely safe, one piglit pass extra on my Ironlake.

NOTE: This is a candidate for stable branches.

Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
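
The idea of redirecting out-of-range constant accesses to constant 0 can be sketched as below. The function name is hypothetical, and the upper bound uses >= nr_params as an assumption on the grounds that index nr_params is itself one past the last valid entry; the actual patch may draw the line differently.

```cpp
// Map a constant index onto a valid one: anything outside
// [0, nr_params) is redirected to constant 0.
// sanitize_param_index is a hypothetical name for this sketch.
static unsigned sanitize_param_index(int index, unsigned nr_params)
{
   if (index < 0 || static_cast<unsigned>(index) >= nr_params)
      return 0;
   return static_cast<unsigned>(index);
}
```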
60f9b722ef80c499a94b4e5ab7304dcd739ea569 30-May-2013 Kenneth Graunke <kenneth@whitecape.org> Revert "i965: fix problem with constant out of bounds access (v2)"

This reverts commit 98dfd59a0445666060c97b0dccaf0e9f030b547a.

The patch was clearly not Piglit tested, as it caused at least 225
tests to start crashing with assertion failures. That was before my
desktop tanked and the test run died completely.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
98dfd59a0445666060c97b0dccaf0e9f030b547a 30-May-2013 Dave Airlie <airlied@redhat.com> i965: fix problem with constant out of bounds access (v2)

This is my attempt at fixing this as the CVE is making RH security team
care enough to make me look at this. (please upstream, security fixes are
more important than whatever else you are doing, if for no other reason than
it saves me having to fix stuff I've no real clue about).

Since Frank's original fix was denied, here is my attempt to just
alias all constants that are out of bounds < 0 or > nr_params to constant 0,
hopefully this provides the undefined behaviour idr requires..

CVE-2013-1872

v2: drop the last hunk which was a separate fix (now in master).
hopefully fix the indentations.

NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e290372542d0475e612e4d10a27b22eae3158ecd 01-May-2013 Eric Anholt <eric@anholt.net> i965/fs: Make virtual grf live intervals actually cover their used range.

Previously, we would sometimes not consider a write to a register to
extend the end of the interval, nor would we consider a read before a
write to extend the start. This made for a bunch of complicated logic
related to how to treat the results when dead code might be present.
Instead, just extend the interval and fix dead code elimination to know
how to remove it.

Interestingly, this actually results in a tiny bit more optimization:
total instructions in shared programs: 1391220 -> 1390799 (-0.03%)
instructions in affected programs: 14037 -> 13616 (-3.00%)

v2: Fix a theoretical problem with the simd16 workaround if dst == src,
where we would revert the bump of the live range.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
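
A live interval that "actually covers its used range", as the commit above puts it, can be computed by letting every read and every write extend the interval. The sketch below uses hypothetical types and names and ignores control flow entirely; it only shows the min/max rule.

```cpp
#include <algorithm>
#include <vector>

// One read or write of a virtual GRF at instruction index ip.
struct Access { int ip; int vgrf; };

// Inclusive range of instruction indices over which a VGRF is used.
struct Interval { int start; int end; };

// Every access, read or write, extends the interval of its register.
// A register that is never accessed keeps start == num_insts, end == -1.
static std::vector<Interval>
compute_intervals(const std::vector<Access> &accesses,
                  int num_vgrfs, int num_insts)
{
   std::vector<Interval> iv(num_vgrfs, Interval{num_insts, -1});
   for (const Access &a : accesses) {
      iv[a.vgrf].start = std::min(iv[a.vgrf].start, a.ip);
      iv[a.vgrf].end = std::max(iv[a.vgrf].end, a.ip);
   }
   return iv;
}
```

Because writes also push the end of the interval out, a write whose value is never read still shows up in the range, and it is then up to dead code elimination to remove it.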
1f0f26d60c148e360908af34130c4e00dba8f3df 10-Apr-2013 Matt Turner <mattst88@gmail.com> i965/fs: Add support for bit instructions.

Don't bother scalarizing ir_binop_bfm, since its results are
identical for all channels.

v2: Subtract result of FBH from 31 (unless an error) to convert
MSB counts to LSB counts.
v3: Use op0->clone() in ir_triop_bfi to prevent (var_ref
channel_expressions) from appearing multiple times in the IR.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
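
The v2 note in the commit above (subtract the FBH result from 31 unless it is an error) can be modelled in software. fbh_model is a hypothetical stand-in for the hardware instruction applied to an unsigned value, and the error value 0xFFFFFFFF for an input with no bits set is an assumption of this sketch.

```cpp
#include <cstdint>

// Software model of FBH on an unsigned value: the number of zero bits
// above the highest set bit, counted from the MSB side.
// Assumption: returns 0xFFFFFFFF when no bit is set.
static uint32_t fbh_model(uint32_t v)
{
   if (v == 0)
      return 0xFFFFFFFFu;
   uint32_t count = 0;
   while ((v & 0x80000000u) == 0) {
      v <<= 1;
      ++count;
   }
   return count;
}

// Convert the MSB-side count into an LSB-side bit index, passing the
// error value through as -1.
static int32_t find_msb(uint32_t v)
{
   const uint32_t r = fbh_model(v);
   if (r == 0xFFFFFFFFu)
      return -1;
   return 31 - static_cast<int32_t>(r);
}
```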
ab04f3b2d74af061a0d2ebf3d1a02d8fcf73ff09 30-Apr-2013 Eric Anholt <eric@anholt.net> i965: Share the register file enum between the two backends.

I need this so I can look at vec4 and fs registers' files from the same
.cpp file without namespaces. As far as I can tell we never rely on the
particular numerical values of the files, though I thought it sounded like
a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
63c8155b09bca7917631ec678a0d0db6e7965a1a 29-Apr-2013 Eric Anholt <eric@anholt.net> i965: Make dump_instructions be a virtual method of the visitor.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
61ca2c4f73f84eec29454698188309ab311eb503 26-Apr-2013 Eric Anholt <eric@anholt.net> i965/fs: Allow LRPs with uniform registers.

Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62).

v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken)

Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5e46482993dfd30b888d5219f6fecf4b4d1f42de 28-Apr-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move is_math/is_tex/is_control_flow() to backend_instruction.

These are entirely based on the opcode, which is available in
backend_instruction. It makes sense to only implement them in one
place.

This changes the VS implementation of is_tex() slightly, which now
accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those
aren't generated in the VS anyway, it should be fine.

This also makes is_control_flow() available in the VS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
79f786f9367e0071916e1d3c25bfff00d114339c 27-Apr-2013 Chris Forbes <chrisf@ijw.co.nz> i965/fs: Don't try to use bogus interpolation modes pre-Gen6.

Interpolation modes other than perspective-barycentric-pixel-center (and
their associated coefficients in the WM payload) only exist in Gen6 and
later.

Unfortunately, if a varying was declared as `centroid`, we would blindly
read the nonexistent values, and so produce all manner of bad behavior
-- texture swimming, snow, etc.

Fixes rendering in Counter-Strike Source and Team Fortress 2 on
Ironlake.

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
578987ce1c17d17cfa538eb70d07a751fda55eb1 16-Apr-2013 Eric Anholt <eric@anholt.net> i965: Avoid recompiles for fragment clamping on non-clamping APIs.

Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are
due to FBO-rendering size predictions). We currently expose
GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm
about to send a patch for removing that silly extension in that case.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dcb1b89c65b963ccc0e37cb7ace1e69c42f8cd26 11-Apr-2013 Eric Anholt <eric@anholt.net> i965: Silence one more compile warning.

We don't want to store this thing in the class, and we do need the
definition to be at the top of the function and held onto until the end
here, so there's not much to do besides (void) reference it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
248175ab3b90139580f7e9403ac5243d7aeac823 11-Apr-2013 Eric Anholt <eric@anholt.net> i965: Fix an unused variable warning in the release build.

It's used in an assert, but we have this as a member of the class anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
adf251406b5cf5e74f88c91799228dd9cc8dac29 11-Apr-2013 Eric Anholt <eric@anholt.net> i965/fs: Fix some untriggered optimization bugs with uncompressed/sechalf.

We have this support for firsthalf/sechalf instructions, which would be
called in the !has_compr4 (aka original gen4) 16-wide case. We currently
only support 16-wide for gen5+, so we weren't tripping over this, but it
would have been a problem if we ever try to enable it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eaca8a94e27d5ec13fcbe5158212310292270e51 09-Apr-2013 Eric Anholt <eric@anholt.net> i965/fs: Add basic-block-level dead code elimination.

This is a poor substitute for proper global dead code elimination that
could replace both our current paths, but it was very easy to write. It
particularly helps with Valve's shaders that are translated out of DX
assembly, which has been register allocated and thus have a bunch of
unrelated uses of the same variable (some of which get copy-propagated
from and then left for dead).

shader-db results:
total instructions in shared programs: 1735753 -> 1731698 (-0.23%)
instructions in affected programs: 492620 -> 488565 (-0.82%)

v2: Fix comment typo

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
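
One way to do dead code elimination within a single basic block is to track, per register, the most recent write that has not yet been read; if a second write arrives first, the earlier one is dead. The sketch below illustrates that general technique under simplifying assumptions (whole-register writes only, no predication, no side effects). It is not the pass added by the commit above, and all names in it are hypothetical.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Simplified instruction: one destination register (-1 for none) and
// a list of source registers.
struct Inst { int dst; std::vector<int> srcs; };

// Return a flag per instruction telling whether its result is
// overwritten inside the block before anything reads it.
static std::vector<bool> find_dead_in_block(const std::vector<Inst> &block)
{
   std::vector<bool> dead(block.size(), false);
   // Register number -> index of its latest write that is still unread.
   std::unordered_map<int, std::size_t> pending;

   for (std::size_t i = 0; i < block.size(); ++i) {
      // Sources are read before the destination is written.
      for (int s : block[i].srcs)
         pending.erase(s);

      if (block[i].dst >= 0) {
         auto it = pending.find(block[i].dst);
         if (it != pending.end())
            dead[it->second] = true;
         pending[block[i].dst] = i;
      }
   }
   return dead;
}
```

Writes still pending at the end of the block are kept, since a later block may read them. That limitation is what separates this block-local approach from a global pass.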
36d0fde603015066fce0ff37fd9be609800243e8 09-Apr-2013 Eric Anholt <eric@anholt.net> i965/fs: Remove incorrect note of writing attr in centroid workaround.

This instruction doesn't update its IR destination, it just moves from
payload to f0. This caused the dead code elimination pass I'm adding to
dead-code-eliminate the first step of interpolation.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2cb7f1e766d28dd238274f74d9568ab4438c4965 04-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Add a helper function for checking for partial register updates.

These checks were all over, and every time I wrote one I had to try to
decide again what the cases were for partial updates.

v2: Fix inadvertent reladdr check removal.
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
32a8e877666f7c3798d736bb6f05ad2f41356ebf 11-Apr-2013 Matt Turner <mattst88@gmail.com> i965: NULL check prog on shader compilation failure.

Also change if (shader) to if (prog) for consistency.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fe97f26c86d65b1b0e026c725c7da348a91093d9 09-Apr-2013 Paul Berry <stereotype441@gmail.com> i965: Rename backend_visitor::prog to shader_prog.

The next patch is going to change the type of vec4_visitor::vp from
struct gl_vertex_program * to struct gl_program *, and rename it. The
sensible name to change it to is vec4_visitor::prog. However, prog is
already used in backend_visitor (which vec4_visitor derives from).
Since backend_visitor::prog is of type struct gl_shader_program *, it
makes sense to rename it to shader_prog.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
705c8247fa0eb50587b6c19561eb31e4d3a1b876 13-Mar-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field.

The previous commit removed the last user of this field, so there's no
longer any point in setting it. Removing this should eliminate
state-dependent recompiles, and make the precompile more reliable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7183568869a0ce856b2f3d4cd9e1d7bd63ff9092 13-Mar-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Remove fixed-function texture projection avoidance optimization.

This optimization attempts to avoid extra attribute interpolation
instructions for texture coordinates where the W-component is 1.0.

Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes
state atom (all the brw_vs_constval.c code) needs to run on each draw.
It computes the input_size_masks array, then uses that to compute
proj_attrib_mask. Differences in proj_attrib_mask can cause
state-dependent fragment shader recompiles. We also often fail to guess
proj_attrib_mask for the fragment shader precompile, causing us to
needlessly compile it twice.

Furthermore, this optimization only applies to fixed-function programs;
it does not help modern GLSL-based programs at all. Generally, older
fixed-function programs run fine on modern hardware anyway.

The optimization has existed in some form since the initial commit. When
we rewrote the fragment shader backend, we dropped it for a while. Eric
readded it in commit eb30820f268608cf451da32de69723036dddbc62 as part of
an attempt to cure a ~1% performance regression caused by converting the
fixed-function fragment shader generation code from Mesa IR to GLSL IR.
However, no performance data was included in the commit message, so it's
unclear whether or not it was successful.

Time has passed, so I decided to re-measure this. Surprisingly,
Eric's OpenArena timedemo actually runs /faster/ after removing this and
the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a
1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was
no statistically significant difference (n = 37).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
70b27e0e4b5d15e575ea477d63c0f6cb19d645c2 18-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Use LD messages for pre-gen7 varying-index uniform loads

This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.

v2: Fix comment about channels in the other message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ce316f62efa208b1a43fe81831126fc75c5807c5 20-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Don't double-emit SEND dependency workarounds at control flow.

We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too. We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3cf69b228404791cf15231321b6a18b5701be0a6 18-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Bake regs_written into the IR instead of recomputing it later.

For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dca5fc14358a8b267b3854c39c976a822885898f 13-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Improve performance of varying-index uniform loads on IVB.

Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug. This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).

With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This is a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.

v2: Move old offset computation into the pre-gen7 path.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bc0e1591f64b8b3f2693fceaaa8bba9198e26171 15-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Avoid inappropriate optimization with regs_written > 1.

Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
740350c982bd2735b9eb9063c2b91856b6f1ad31 14-Mar-2013 Eric Anholt <eric@anholt.net> i965: Make the fragment shader pull constants index by dwords, not vec4s.

We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency. But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.

Note that this change only affects those messages that use the surface
format, like sampler LDs, but not the untyped data cache loads we've
used in other cases.

No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8c694dfe6478ce9355c866ae70db45e49e499de3 13-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Move varying uniform offset computation into the helper func.

I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
59e858861caad2649f4c282eb277a7fc6202ab65 13-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Remove creation of a MOV instruction that's never used.

We weren't inserting it into the list, so it did nothing. This line was
replaced by the MOV/MUL block above.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
57a502518e79d42b014517bf36b297cc68947389 28-Mar-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.

"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader. Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue. This is normally good.

However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp. The frontend emits IR for this just before
FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering:

1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue

This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written. This obviously
led to inaccurate results.

This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections. This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:

1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue

For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.

One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
20d846ce8b46604ced835eb68079a0dbae2e19dc 12-Mar-2013 Eric Anholt <eric@anholt.net> i965: Add names for all instructions to dump_instruction() in FS and VS.

I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b8aa9f7d3a146cff9c2c530abf815a1b316374ca 06-Mar-2013 Matt Turner <mattst88@gmail.com> i965/fs: Generate LOD sampler message from ir_lod.

v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
47e795d8612e5fde70740450d02370514ecc79e3 19-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Include everything but the final FB write in shader_time.

Previously, if you just wrote a constant color to the render target, no
time got noted at all. This is convenient for doing single-instruction
timings, but not so much for actual program analysis.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5c5218ea6163f694a256562df1d73a108396e40d 19-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Switch shader_time writes to using GRFs.

This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling. This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d2ba1c24b440ee74436335d8e815be9b72b1ba7f 19-Mar-2013 Eric Anholt <eric@anholt.net> i965: Track ARB program state along with GLSL state for shader_time.

This will let us do much better printouts for non-GLSL programs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0a0deb92d9e25067ac4b89cbbd8f8f8f3b4d05db 20-Mar-2013 Paul Berry <stereotype441@gmail.com> i965/fs: Rename vp_outputs_written to input_slots_valid.

With the introduction of geometry shaders, fragment inputs will no
longer come exclusively from the vertex shader; sometimes they come
from the geometry shader. So the name "vp_outputs_written" will
become a misnomer. This patch renames vp_outputs_written to
input_slots_valid, to reflect the true meaning of the bitfield from
the fragment shader's point of view: it indicates which of the
possible input slots contain valid data that was written by the
previous shader stage.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
995bbc22564b22de2ef6aac4e6881fd4c23e3162 23-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/fs: Avoid unnecessary recompiles due to POS bit of proj_attrib_mask.

Previous to this patch, when using fixed function fragment shading,
bit VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask was being set
differently during precompiles and normal usage. During precompiles
it was being set only if the fragment shader reads from window
position (which it never does), so it was always being set to 0.
During normal usage it was being set if the vertex shader writes to
all 4 components of gl_Position (which it usually does), so it was
usually being set to 1. As a result, we were almost always doing an
extra recompile for the fixed function fragment shader.

The recompile was totally unnecessary, though, because
brw_wm_prog_key::proj_attrib_mask is only consulted for
fs_visitor::emit_general_interpolation(), which isn't used for
VARYING_SLOT_POS.

This patch avoids the unnecessary recompile by always setting bit
VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask to 1.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eed6baf7621fa94e7888f8079b155fc67a08540c 23-Feb-2013 Paul Berry <stereotype441@gmail.com> Replace gl_frag_attrib enum with gl_varying_slot.

This patch makes the following search-and-replace changes:

gl_frag_attrib -> gl_varying_slot
FRAG_ATTRIB_* -> VARYING_SLOT_*
FRAG_BIT_* -> VARYING_BIT_*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
10a131211ef18fa1d368dee394045df945dcbb6e 23-Feb-2013 Paul Berry <stereotype441@gmail.com> Get rid of _mesa_vert_result_to_frag_attrib().

Now that there is no difference between the enums that represent
vertex outputs and fragment inputs, there's no need for a conversion
function. But we still need to be able to detect when a given vertex
output has no corresponding fragment input. So it is replaced by a
new function, _mesa_varying_slot_in_fs(), which tells whether the
given varying slot exists as an FS input or not.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
36b252e94724b2512ea941eff2b3a3abeb80be79 23-Feb-2013 Paul Berry <stereotype441@gmail.com> Replace gl_vert_result enum with gl_varying_slot.

This patch makes the following search-and-replace changes:

gl_vert_result -> gl_varying_slot
VERT_RESULT_* -> VARYING_SLOT_*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6bec74bfd98e2f9c090c550c18c02f71ea80d04e 24-Feb-2013 Paul Berry <stereotype441@gmail.com> i965: Change fragment input related bitfields to 64-bit.

This patch updates the bitfields brw_context::wm.input_size_masks,
tracker::size_masks, and brw_wm_prog_key::proj_attrib_mask, all of
which are indexed by gl_frag_attrib, from 32-bit to 64-bit.

This paves the way for supporting geometry shaders, and for merging
the gl_frag_attrib and gl_vert_result enums. The combination of these
two will require at least 55 bits in the bitfields.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
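
The widening described in the entry above can be illustrated with a minimal sketch. The names `kNumSlots`, `slot_bit`, `slot_is_set` and `truncate_to_32` are hypothetical and are not taken from the driver; the slot count of 55 simply mirrors the "at least 55 bits" estimate in the commit message.

```cpp
#include <cstdint>

// Hypothetical slot count, chosen to match the "at least 55 bits" estimate.
constexpr unsigned kNumSlots = 55;

// Build the mask bit for a slot using a 64-bit shift. Shifting a plain
// 32-bit `1` by 32 or more would be undefined behaviour in C++.
constexpr uint64_t slot_bit(unsigned slot)
{
    return uint64_t{1} << slot;
}

// Test whether a given slot is marked in a 64-bit mask.
bool slot_is_set(uint64_t mask, unsigned slot)
{
    return (mask & slot_bit(slot)) != 0;
}

// What a 32-bit field would retain of the same mask: every slot numbered
// 32 or higher is silently dropped.
uint32_t truncate_to_32(uint64_t mask)
{
    return static_cast<uint32_t>(mask);
}
```

Under these assumptions, a mask holding only slot 54 truncates to zero in a 32-bit field, which is the kind of loss the move to 64-bit bitfields avoids.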
db3a0f13ef13b6d392dfc3b7346351533600d343 11-Mar-2013 Eric Anholt <eric@anholt.net> i965: Split shader_time entries into separate cachelines.

This avoids some snooping overhead between EUs processing separate shaders
(so VS versus FS).

Improves performance of a minecraft trace with shader_time by 28.9% +/-
18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).

v2: Add a define for the stride with a comment explaining its units and
why.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4dc7e6dcbf0d9c360e257c704774c9b083511b47 07-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Also do the gen4 SEND dependency workaround against other SENDs.

We were handling the dependency workaround for the first written reg
of a send preceding the one we're fixing up, but didn't consider the other
regs. Thus if you had two sampler calls that got allocated to the same
set of regs, one might, rarely, overwrite the other. This was occurring in
XBMC's GLSL shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4c1fdae0a01b3f92ec03b61aac1d3df500d51fc6 06-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Switch to using sampler LD messages for uniform pull constants.

When forcing the compiler to always generate pull constants instead of
push constants (in order to have an easy to use testcase), improves
performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1323772543083dec23baf5a50222bdfc88ff6c3a 07-Mar-2013 Eric Anholt <eric@anholt.net> i965/fs: Fix broken rendering in large shaders with UBO loads.

The lowering process creates a new vgrf on gen7 that should be represented
in live interval analysis. As-is, it was getting a conflicting allocation
with gl_FragDepth in the dolphin emulator, producing broken rendering.

NOTE: This is a candidate for the 9.1 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
14cec07177f438717cc6fb9252525e16d6b3d8dd 22-Feb-2013 Eric Anholt <eric@anholt.net> i965: Make perf_debug() output to GL_ARB_debug_output in a debug context.

I tried to ensure that performance in the non-debug case doesn't change
(we still just check one condition up front), and I think the impact is
small enough in the debug context case to warrant including all of it.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f52ce6a0ca73d1cd89091689efd8ea2e14748723 24-Jan-2013 Chris Forbes <chrisf@ijw.co.nz> i965: add a new virtual opcode: SHADER_OPCODE_TXF_MS

This is very similar to the TXF opcode, but lowers to `ld2dms` rather
than `ld` on Gen7.

V4: - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks
it actually writes the correct number of registers. Otherwise in
nontrivial shaders some of the registers tend to get clobbered,
producing bad results.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0a1d145e5f1e6120e70e9b46e069167a0d653579 02-Dec-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Use the LRP instruction for ir_triop_lrp when possible.

v2 [mattst88]:
- Add BRW_OPCODE_LRP to list of CSE-able expressions.
- Fix op_var[] array size.
- Rename arguments to emit_lrp to (x, y, a) to clear confusion.
- Add LRP function to brw_fs.cpp/.h.
- Corrected comment about LRP instruction arguments in emit_lrp.
v3 [mattst88]:
- Duplicate MAD code for LRP instead of using a function pointer.
- Check for != GRF instead of == IMM in emit_lrp.
- Lower LRP on gen < 6.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4eeb9ded9d0165a81aa9d2ac01197e30ddf9d835 12-Feb-2013 Matt Turner <mattst88@gmail.com> i965/fs/gen7: Allow MATH instructions to have MRF as a destination

total instructions in shared programs: 1376297 -> 1375626 (-0.05%)
instructions in affected programs: 35977 -> 35306 (-1.87%)

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b9f6795e34ad0b85b1f4f288dc6d1e5fcee30697 11-Feb-2013 Matt Turner <mattst88@gmail.com> i965/fs: Remove duplicate scan_inst->mlen check

It is already checked 20 lines below.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7b0731d940c758ca9c1e883cdea454d8787255c1 21-Feb-2013 Eric Anholt <eric@anholt.net> i965/fs: Fix broken math on values loaded from uniform buffers on gen6.

In a debug build this led to assertion failures, but on a non-debug
build the hardware would just reference the whole vec8 instead of the
same channel 8 times.

Fixes the new piglit glsl-1.40/uniform-buffer/fs-exp2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57121
Note: This is a candidate for the stable branches
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
aebd3f46e305829ebfcc817cafa8592edc2f80ab 16-Feb-2013 Eric Anholt <eric@anholt.net> i965/fs: Delay setup of uniform loads until after pre-regalloc scheduling.

This should fix the register allocation explosion on the GLES 3.0 test
on gen6. It also gives us an instruction that will fit our CSE handling.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
de7cb1cff3dbe30bbd691ed56e61c9d37ba5f2da 16-Feb-2013 Eric Anholt <eric@anholt.net> i965/fs: Add a bit more instruction dumping useful for upcoming work.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c37992c54d753e732783f712dea2d483450371dd 06-Feb-2013 Eric Anholt <eric@anholt.net> i965/fs: Do a general SEND dependency workaround for the original 965.

We'd been ad-hoc inserting instructions in some SEND messages with no
knowledge of when it was required (so extra instructions), but not all SENDs
(so not often enough). This should do much better than that, though it's
still flow-control-ignorant.

v2: Use BRW_MAX_MRF instead of magic numbers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58960
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: Candidate for the stable branches.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bf91f0b03942d966cf453201dc52c4aa4049f8fa 06-Feb-2013 Eric Anholt <eric@anholt.net> i965/fs: Use a helper function for checking for flow control instructions.

In 2 of our checks, we were missing BREAK and CONTINUE.

NOTE: Candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
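
A single helper of the kind the entry above describes might look like the following sketch. The `Opcode` enum and the name `is_control_flow` are hypothetical stand-ins rather than the driver's real opcode list or function; the point is only that one shared predicate makes it hard to forget cases such as BREAK and CONTINUE.

```cpp
// Hypothetical opcode set for illustration; the real IR has many more.
enum class Opcode { MOV, ADD, IF, ELSE, ENDIF, DO, WHILE, BREAK, CONTINUE };

// Return true for every opcode that changes control flow, so callers do
// not each maintain their own (possibly incomplete) list.
bool is_control_flow(Opcode op)
{
    switch (op) {
    case Opcode::IF:
    case Opcode::ELSE:
    case Opcode::ENDIF:
    case Opcode::DO:
    case Opcode::WHILE:
    case Opcode::BREAK:
    case Opcode::CONTINUE:
        return true;
    default:
        return false;
    }
}
```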
491364e1f34ddb2c8ea439e871dd42aaa5cc9b28 11-Dec-2012 Ian Romanick <ian.d.romanick@intel.com> glsl: Add GLSL_TYPE_INTERFACE

Interfaces are structurally identical to structures from the compiler's
point of view. They have some additional restrictions, and generally
GPUs use different instructions to access them. Using a different base
type should make this a bit easier.

This commit also adds the glsl_type::interface_packing fields. For
GLSL_TYPE_INTERFACE types, this will track the specified packing mode.
It is analogous to gl_uniform_buffer::_Packing.

v2: Add several missing GLSL_TYPE_INTERFACE cases in switch-statements.

v3: Add information about glsl_type::interface_packing. Move row_major
checking in glsl_type::record_key_compare from this patch to the
previous patch. Both suggested by Paul Berry.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ecfb404e8d4fcd35524d1c4b3421e24980fe3976 11-Dec-2012 Ian Romanick <ian.d.romanick@intel.com> glsl: Replace most default cases in switches on GLSL type

This makes it easier to find switch-statements that need to be updated
after a new GLSL_TYPE_* is added because the compiler will generate a
warning.

Switch-statements that only had a small number of cases (e.g.,
everything in ir_constant_expression.cpp) were not modified. I may
regret that decision when we eventually add support for doubles.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4a6753926f51accd6f71d9caea18b15a99b8be24 11-Jan-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Add an INTEL_DEBUG=no16 option.

Often when debugging, I don't want to see SIMD16 shaders. It makes
INTEL_DEBUG=vs/fs output much easier to read, especially when a program
dumps many shaders. Plus, I also want to verify that SIMD8 works before
even considering SIMD16.

v2: Fix the likeliness check (caught by Chris and Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c0d1f508d6d471cf44329f43d8a79230ed8db0b6 21-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Reference the core GL uniform storage for non-builtin uniforms.

There's no reason to use an external copy if the relayout in the
external copy isn't serving us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f189570ccf60ab665cbe9feeff52685600f8163d 21-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Remove the param_index/param_offset indirection.

Now that ParameterValues doesn't change across the visitor, we don't
need to go through this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d5efc14635cf25bc130bfa77737913913d9202ce 21-Nov-2012 Eric Anholt <eric@anholt.net> i965: Add asserts to check that we don't realloc ParameterValues.

Things are even more restrictive than they used to be, so I've made
mistakes in this area.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7171c45d3a6392b947d96c10362ce0459b741669 01-Dec-2012 Eric Anholt <eric@anholt.net> i965/fs: Drop an unnecessary _safe on a list walk.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
78ce522932e8c356880c7ca10dace4b6fe6cf313 01-Dec-2012 Eric Anholt <eric@anholt.net> i965/fs: Add a note explaining a detail of register_coalesce_2().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3c560633548f4b0298a372903de32639706f8c40 05-Dec-2012 Eric Anholt <eric@anholt.net> i965/fs: Move the failure for gen7 16-wide intdiv to emit_math().

The cube map array code adds another caller of emit_math(), which
needs this check.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6e34723ac9fa2d5c34cb2a38118ecf5b856c4992 28-Nov-2012 Chris Forbes <chrisf@ijw.co.nz> i965: fs: fix gen6+ math operands in one place

V4: Fix various style nits as pointed out by Eric, and expand IMM
operands on both Gen6 and Gen7.
v5: minor style nits (by anholt)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
461a29783a28e579a9a5a236e5f47ffb6d18a328 05-Dec-2012 Eric Anholt <eric@anholt.net> i965/fs: Set up gen7 UBO loads as sends from GRFs.

This gives the instruction scheduler a chance to schedule between the
loads, whereas before it was restricted due to the dependencies between
the MRFs for setting them up.

For one shader in gles3conform, it goes from getting stuck in register
allocation for as long as anybody's bothered to leave it running down
to 23 seconds, thanks to the LIFO scheduling.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7a9f940cab8e5a3bbbab3e302de2311b36159d91 04-Dec-2012 Eric Anholt <eric@anholt.net> i965/fs: Schedule instructions both before and after register allocation.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f74560f3fb516971e6a7b03a2382db2f58699f59 10-Dec-2012 Eric Anholt <eric@anholt.net> i965: Scale shader_time to compensate for resets.

Some shaders experience resets more than others, which skews the numbers
reported. Attempt to correct for this by linearly scaling according to
the number of resets that happen.

Note that this will not be accurate if invocations of shaders have varying
times and longer invocations are more likely to reset. However, this
should at least be better than the previous situation.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
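
The linear scaling mentioned in the entry above could be sketched as follows. This is an illustration under stated assumptions, not the driver's code: the function name `scale_for_resets` and the split into `written` and `reset` invocation counts are hypothetical, and the sketch assumes that invocations lost to a reset cost the same on average as the recorded ones.

```cpp
#include <cstdint>

// Hypothetical helper: extrapolate the recorded time to cover invocations
// whose timing was discarded because of a timestamp reset.
double scale_for_resets(uint64_t recorded_time, uint64_t written, uint64_t reset)
{
    if (written == 0)
        return 0.0; // nothing was recorded, so there is nothing to scale

    // Average time per recorded invocation, multiplied by all invocations.
    return static_cast<double>(recorded_time) / written * (written + reset);
}
```

As the commit message notes, this kind of estimate is biased when longer invocations are more likely to hit a reset, because the recorded average then understates the true cost.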
338b5f887d462bbe7ef58a233cd00619e43415f0 10-Dec-2012 Eric Anholt <eric@anholt.net> i965: Adjust the split between shader_time_end() and shader_time_write().

I'm about to emit other kinds of writes besides time deltas, and it
turns out with the frequency of resets, we couldn't really use the old
time delta write() function more than once in a shader.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d5016495cc1b50b1673d0d3ab8e6af8249b071d5 06-Dec-2012 Eric Anholt <eric@anholt.net> i965/fs: Rewrite discards to use a flag subreg to track discarded pixels.

This makes much more sense on gen6+, and will also prove useful for
early exit of shaders on discard.

v2: fix up a stale comment from before converting gen4-5.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b278f65e1c5295794dcf08d100356e6ded6c1f32 06-Dec-2012 Eric Anholt <eric@anholt.net> i965/fs: Add an instruction flag for choosing the flag subregister.

We're going to redo discard handling to track discards in the other flag
subregister, saving instructions in the discard and allowing predicated
jumps out to the end of the shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
71f06344a0d72a6bd27750ceca571fc016b8de85 27-Nov-2012 Eric Anholt <eric@anholt.net> i965: Add a debug flag for counting cycles spent in each compiled shader.

This can be used for two purposes: Using hand-coded shaders to determine
per-instruction timings, or figuring out which shader to optimize in a
whole application.

Note that this doesn't cover the instructions that set up the message to
the URB/FB write -- we'd need to convert the MRF usage in these
instructions to GRFs so that our offsets/times don't overwrite our
shader outputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)

v2: Check the timestamp reset flag in the VS, which is apparently
getting set fairly regularly in the range we watch, resulting in
negative numbers getting added to our 32-bit counter, and thus large
values added to our uint64_t.
v3: Rebase on reladdr changes, removing a new safety check that proved
impossible to satisfy. Add a comment to the AOP defs from Ken's
review, and put them in a slightly more sensible spot.
v4: Check timestamp reset in the FS as well.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
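
The failure mode noted under v2 in the entry above can be reproduced in a small sketch. The function names and the `reset_seen` parameter are hypothetical; the real shader emits hardware instructions rather than C++, so this only models the arithmetic.

```cpp
#include <cstdint>

// Naive accumulation: if the timestamp was reset between the two reads,
// `end` is smaller than `start`, the unsigned 32-bit subtraction wraps to
// a value near 2^32, and that large value is added to the 64-bit total.
uint64_t accumulate_naive(uint64_t total, uint32_t start, uint32_t end)
{
    uint32_t delta = end - start;
    return total + delta;
}

// Checked accumulation: skip the sample when a reset was observed.
uint64_t accumulate_checked(uint64_t total, uint32_t start, uint32_t end,
                            bool reset_seen)
{
    if (reset_seen)
        return total; // discard the bogus delta
    uint32_t delta = end - start;
    return total + delta;
}
```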
a64c1eb9b110f29b8abf803a8256306702629bdc 09-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Add support for uniform array access with a variable index.

Serious Sam 3 had a shader hitting this path, but it's used rarely so it
didn't show a significant performance difference (n=7). It does reduce
compile time massively, though -- one shader goes from 14s compile time
and 11723 instructions generated to .44s and 499 instructions.

Note that some shaders lose 16-wide mode because we don't support
16-wide and pull constants at the moment (generally, things looping over
a few-element array where the loop isn't getting unrolled). Given that
those shaders are being generated with 15-20% fewer instructions, it
probably outweighs the loss of 16-wide.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f22a909a080d603db122ac8517a80bd8f4006fe2 09-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Restrict optimization that would fail for gen7's SENDs from GRFs

v2: Fix SNB math bug in register_coalesce() where I was looking at the
instruction to be removed, not the instruction to be copy propagated
into.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9156d0cba1090c4bcc3a6c0c7b2ad8921a295be4 26-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Allow source mods on gen7+ math.

This gen6 restriction was removed in gen7, where the hardware finished
merging the mathbox so that math acts more like a normal instruction.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d8214e4384aaf0ee412ad9aea80f9fec522c1e4a 07-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Add instruction emit for varying-index reads of uniforms.

The gen7 send-from-GRF path is sufficiently different from the perspective of
IR generation and optimization that I just made it a separate opcode.

v2: fix whitespace, rebase on Ken's recent refactor.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
29340d02dc38a9cc352d44412871dc9d4e3f878a 07-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Rename the existing pull constant load opcode.

We're going to use another send message for handling loads with a varying
per-fragment array index.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b126228f1247fb0fed686ee3ef2c87461f2fc7a7 30-Nov-2012 Eric Anholt <eric@anholt.net> i965: Include codegen time in the INTEL_DEBUG=perf stall detection.

In the VS case, we were missing the entire compile time in the stall
detection!

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0f06864ba566eaff5b739a9d0fba5ed7eaadd60b 30-Nov-2012 Eric Anholt <eric@anholt.net> i965: Don't leak the IR annotation into later instructions.

After walking our IR instructions (Mesa or GLSL), we don't want to also
mark the start of the FB/URB writes or whatever as being that IR. This
can end up being misleading when the end of the IR visit got copy
propagated out to a later instruction in the URB writes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8d0bb74a11f1905e32f6db23fbf8bb29ff8fa367 18-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add fs_reg::is_zero() and is_one(); use for opt_algebraic().

These helper functions save you from writing nasty expressions like:

   if ((inst->src[1].type == BRW_REGISTER_TYPE_F &&
        inst->src[1].imm.f == 1.0) ||
       ((inst->src[1].type == BRW_REGISTER_TYPE_D ||
         inst->src[1].type == BRW_REGISTER_TYPE_UD) &&
        inst->src[1].imm.u == 1)) {

Instead, you simply get to write inst->src[1].is_one(). Simple.
Also, this makes the FS backend match the VS backend (which has these).

This patch also converts opt_algebraic to use the new helper functions.
As a consequence, it will now also optimize integer-typed expressions.
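
A minimal sketch of what such helpers can look like, assuming a simplified register struct (sketch_reg, reg_file and reg_type are illustrative stand-ins, not the driver's fs_reg definition). Because the check dispatches on the immediate's type, integer immediates are recognized as well as floats.

```cpp
#include <cstdint>

// Simplified stand-ins for the driver's register types (assumptions).
enum reg_file { GRF, IMM };
enum reg_type { TYPE_F, TYPE_D, TYPE_UD };

struct sketch_reg {
   reg_file file;
   reg_type type;
   union { float f; int32_t i; uint32_t u; } imm;

   // True only for an immediate whose value is exactly 0 in its own type.
   bool is_zero() const {
      if (file != IMM) return false;
      switch (type) {
      case TYPE_F:  return imm.f == 0.0f;
      case TYPE_D:  return imm.i == 0;
      case TYPE_UD: return imm.u == 0;
      }
      return false;
   }

   // True only for an immediate whose value is exactly 1 in its own type.
   bool is_one() const {
      if (file != IMM) return false;
      switch (type) {
      case TYPE_F:  return imm.f == 1.0f;
      case TYPE_D:  return imm.i == 1;
      case TYPE_UD: return imm.u == 1;
      }
      return false;
   }
};
```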

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
154ef07aa74e1d91e16cf9f2492cae33790b0998 30-Oct-2012 Eric Anholt <eric@anholt.net> i965/fs: Add some minimal backend-IR dumping.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9136723214136a95a3c915d580943c888cd99503 21-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move struct brw_compile (p) entirely inside fs_generator.

The brw_compile structure contains the brw_instruction store and the
brw_eu_emit.c state tracking fields. These are only useful for the
final assembly generation pass; the earlier compilation stages don't
need them.

This also means that the code generator for future hardware won't have
access to the brw_compile structure, which is extremely desirable
because it prevents accidental generation of Gen4-7 code.

v2: rzalloc p, as suggested by Eric.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ea681a0d64ecde3a2e729fe3b71d3f3fe4cedff0 09-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Split final assembly code generation out of fs_visitor.

Compiling shaders requires several main steps:

1. Generating FS IR from either GLSL IR or Mesa IR
2. Optimizing the IR
3. Register allocation
4. Generating assembly code

This patch splits out step 4 into a separate class named "fs_generator."

There are several reasons for doing so:

1. Future hardware has a different instruction encoding. Splitting
this out will allow us to replace fs_generator (which relies
heavily on the brw_eu_emit.c code and struct brw_instruction) with
a new code generator that writes the new format.

2. It reduces the size of the fs_visitor monolith. (Arguably, a lot
more should be split out, but that's left for "future work.")

3. Separate namespaces allow us to make helper functions for
generating instructions in both classes: ADD() can exist in
fs_visitor and create IR, while ADD() in fs_generator can
create brw_instructions. (Patches for this upcoming.)

Furthermore, this patch changes the order of operations slightly.
Rather than doing steps 1-4 for SIMD8, then 1-4 for SIMD16, we now:

- Do steps 1-3 for SIMD8, then repeat 1-3 for SIMD16
- Generate final assembly code for both modes together

This is because the frontend work can be done independently, but final
assembly generation needs to pack both into a single program store to
feed the GPU.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4d09fe938e72b26d814b6b52caee5112cf6f1103 21-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move uses of brw_compile from do_wm_prog to brw_wm_fs_emit.

The brw_compile structure is closely tied to the Gen4-7 hardware
encoding. However, do_wm_prog is very generic: it just calls out to
get a compiled program and then uploads it.

This isn't ultimately where we want it, but it's a step in the right
direction: it's now closer to the code generator.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3417b2f2b249d89fc71379bfc0eaa1055de365ba 20-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Pass the brw_context pointer into fs_visitor explicitly.

We used to steal it out of the brw_compile struct...but fs_visitor
isn't going to have one of those in the future.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1f74002a9817e000d3f5633dd5eb6adfd1d51ba5 20-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move brw_wm_compile::fp to fs_visitor.

Also change it from a brw_fragment_program to a gl_fragment_program,
since that seems to be what everything wants anyway.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7b0d30eb8765066b9f3b5f2a50c426ccbac675fa 20-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Remove struct brw_shader * parameter to fs_visitor constructor.

We can easily recover it from prog, and this makes it clear that we
aren't passing additional information in.

v2: Use an if-statement rather than the ?: operator (suggested by Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a303df86de96a428f82377a8c38db8b7e3223447 20-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move brw_wm_compile::dispatch_width into fs_visitor.

Also, rather than having brw_wm_fs_emit poke at it directly, make it a
parameter to the fs_visitor constructor.

All other changes generated by search and replace (with occasional
whitespace fixup).

v2: Make dispatch_width const (as suggested by Paul); fix doxygen
mistake (pointed out by Eric); update for rebase.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
47a6a7b51b774091f46aed264b3591fd36c8baed 19-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move brw_wm_lookup_iz() to fs_visitor::setup_payload_gen4().

This necessitates compiling brw_wm_iz.c as C++.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2429c9d347fe1c6e98a248c1039041f6a59fd749 14-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Move brw_wm_payload_setup() to fs_visitor::setup_payload_gen6()

Now that we only have the one backend, there's no real point in keeping
this separate. Moving it should allow some future simplifications.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d82b873a501606d62b9f208b6d5cda79c9a6b4b8 09-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Add helper functions for IF and CMP and use them.

v2: Rebase on gen6-if fix.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
32d6809bb50a08d9b80ed8b3d13cc6b76580a3a9 09-Nov-2012 Eric Anholt <eric@anholt.net> i965/fs: Add helper functions for generating ALU ops, like in the VS.

This gives us checking of our arguments (no more passing 1 operand to
BRW_OPCODE_MUL!), at the cost of a couple of extra parens.
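
The argument checking can be sketched as below, assuming hypothetical opcode, reg and inst types (none of these are the driver's definitions). Each helper's signature fixes the operand count, so the compiler rejects a two-source op built with a single source.

```cpp
#include <vector>

enum opcode { OP_MOV, OP_ADD, OP_MUL };   // hypothetical opcode names

struct reg { int nr; };

struct inst {
   opcode op;
   reg dst;
   std::vector<reg> src;
};

// One helper per opcode. MUL(dst, a) does not compile, because the
// signature requires both sources.
inline inst MOV(reg dst, reg src0)           { return {OP_MOV, dst, {src0}}; }
inline inst ADD(reg dst, reg src0, reg src1) { return {OP_ADD, dst, {src0, src1}}; }
inline inst MUL(reg dst, reg src0, reg src1) { return {OP_MUL, dst, {src0, src1}}; }
```

The extra parens mentioned above would come from nesting the helper inside an emit call, for example emit(MUL(dst, a, b)), where emit is assumed here.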

v2: Rebase on gen6-if fix.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5cea0273414bd5897c318b4d632b08ce8080a2fe 15-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Properly patch special values during VGRF compaction.

In addition to registers used by instructions, fs_visitor maintains
direct references to certain "special" values used for inputs/outputs.

When I added VGRF compaction, I overlooked these, believing that these
direct references weren't used once instructions were generated. That
was wrong. For example, pixel_x/y are used in virtual_grf_interferes(),
which is called by optimization passes and register allocation.

This patch treats all of them as used and patches them after compacting.
While it's not strictly necessary to patch all of them (as some aren't
used after emitting code), it seems safer to simply fix them all.
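
A rough sketch of compaction with special-value patching, under simplifying assumptions: registers are plain integers, -1 means "no register", and the special references are passed as distinct pointers. The names vgrf_inst and compact_vgrfs are hypothetical and the driver's data structures differ.

```cpp
#include <vector>

struct vgrf_inst { int dst; std::vector<int> src; };   // VGRF numbers, -1 = none

// Renumber the VGRFs that are actually referenced to 0..n-1 and return n.
// `special` holds pointers to direct references kept outside the
// instruction list; they count as used and are patched as well.
inline int compact_vgrfs(std::vector<vgrf_inst> &insts,
                         const std::vector<int *> &special, int old_count)
{
   std::vector<int> remap(old_count, -1);
   auto mark = [&](int r) { if (r >= 0) remap[r] = 0; };

   // Pass 1: mark every register referenced anywhere.
   for (const vgrf_inst &i : insts) {
      mark(i.dst);
      for (int s : i.src) mark(s);
   }
   for (const int *p : special) mark(*p);

   // Pass 2: hand out dense new numbers to the marked registers.
   int new_count = 0;
   for (int &r : remap)
      if (r == 0) r = new_count++;

   // Pass 3: rewrite instructions and the special references.
   auto patch = [&](int &r) { if (r >= 0) r = remap[r]; };
   for (vgrf_inst &i : insts) {
      patch(i.dst);
      for (int &s : i.src) patch(s);
   }
   for (int *p : special) patch(*p);
   return new_count;
}
```

Skipping the two loops over `special` would leave those references pointing at old register numbers after compaction.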

Fixes oglconform's textureswizzle/advanced.shader.targets, piglit's
glsl-fs-lots-of-tex, and glean's texCombine on pre-Gen6 hardware.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56790
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d010e70a07ab4a0b24aad8c9693a7f9c680d6164 03-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Don't calculate_live_intervals() in opt_algebraic().

There's no point: opt_algebraic() doesn't use any liveness information.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cf26b4569a2effc59d072ffd2b2bf9b055faab43 31-Oct-2012 Eric Anholt <eric@anholt.net> i965/fs: Do dead code elimination just after copy propagation.

If we put the register coalescing in between the two, then we end up with code
sequences involving dead writes that the dead code elimination doesn't know
how to remove. In place of making dead code elimination smart (which we
should do, too), make it less important for the moment.

shader-db results:

total instructions in shared programs: 722240 -> 721275 (-0.13%)
instructions in affected programs: 50573 -> 49608 (-1.91%)

(no shaders regressed).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
05882b0d3b69ac14e9bc93460c77f9dc203c2ff9 02-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Compact the virtual GRF arrays.

During code generation, we create tons of temporary variables, many of
which get immediately killed and are never used. Later optimization and
analysis passes, such as compute_live_intervals, loop over all the
virtual GRFs. By compacting them, we can save a lot of overhead.

Reduces compilation time in L4D2's largest fragment shader from 10.2
seconds to 5.2 seconds (50%). Drops compute_live_variables() from
10-12% of another game's startup time to 8%.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
54679fcbcae7a2d41cb439e52e386bd811a291b4 03-Oct-2012 Eric Anholt <eric@anholt.net> i965: Share the predicate field between FS and VS.

Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a
lot like the true/false we had in the FS before.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
24aeeb2fdcde7a0c257db6469c6b0f064d53d3cf 03-Oct-2012 Eric Anholt <eric@anholt.net> i965: Make the FS and VS share a few visitor/instruction fields.

This will let us reuse brw_fs_cfg.cpp from brw_vec4_*.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
41954107c00d68869f0316126908e873662b4c6d 15-Oct-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Fix segfault when using INTEL_DEBUG=perf with non-GLSL.

Now that ARB programs and fixed function are routed through the new
backend, shader might be NULL. Don't do INTEL_DEBUG=perf support in
that case, since it relies on shader->compiled_once.

Since INTEL_DEBUG=perf wasn't previously supported, this maintains the
status quo. It might be nice to support it someday, however.

This could be moved to brw_shader_program instead of brw_shader, but
it appears even prog can be NULL in that case.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fb5bf03a2092159166229eacf57c71587f762c57 21-Sep-2012 Eric Anholt <eric@anholt.net> i965/fs: Move constant propagation to the same codebase as copy prop.

This means that we don't get constant prop across into the first block after a
BRW_OPCODE_IF or a BRW_OPCODE_DO, but we have hope for properly doing it
across control flow at some point. More importantly, with the next commit it
will help avoid runtime that is O(n^2) in instruction count for shaders that
have many constant moves.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
97615b2d8c7c3cea6fd3a43bcb1739a96e2046c4 27-Aug-2012 Eric Anholt <eric@anholt.net> i965: Replace brw_wm_* with dumping code into the fs_visitor.

This makes a giant pile of code newly dead. It also fixes TXB on newer
chipsets, which has been totally broken (I now have a piglit test for that).
It passes the same set of Ian's ARB_fragment_program tests. It also improves
high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better
optimization and having 8-wide along with 16-wide shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=24355
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9cfc00a84c7a7176a385808e0b92705e79955505 20-Sep-2012 Eric Anholt <eric@anholt.net> i965/fs: Add a couple more algebraic cases that help some ARB_fp patterns.

ARB_fp doesn't go through the GLSL optimizer, and these were things you see
frequently thanks to conditionals being lowered to SLT/SGE and MUL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
077d01b673ec255005a1a847faf3be897517f4e7 01-Feb-2012 Eric Anholt <eric@anholt.net> i965: Add support for instruction compaction.

This reduces program size by using some smaller encodings for common bit
patterns in the Gen ISA, with the hope of making programs fit in the
instruction cache better.

v2: Use larger bitshifts for the uncompressed field setups, in line with the
way it's described in the spec. Consistently name a brw_compile "p" like
all other code. Add a couple more tests. Consistently call things
"compacted" not "compressed" (which is a different feature). Drop the
explicit check for not compacting SENDs, which is unjustified and already
implied by our lack of support for immediate values.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
23cd6c43da6fb1ff89b994664df2658a7929402e 11-Sep-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Remove incorrect comment above opt_algebraic.

The comment was cut-and-pasted from propagate_constants(), and had no
relation at all to opt_algebraic().
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f144b78dfbb97a70121be6f20d10bad8111267e3 27-Aug-2012 Eric Anholt <eric@anholt.net> i965: Make the param pointer arrays for the WM dynamically sized.

Saves 26.5MB of wasted memory allocation in the l4d2 demo.

v2: Rebase on compare func change, fix comments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4d9abd96cc177cade79b64544096eb45bf8313a2 31-Aug-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Don't use brw->fragment_program in calculate_urb_setup().

Reading brw->fragment_program is nonsensical in compiler code: it
contains the currently active program (if any), not the one currently
being compiled. Attempting to access it may either lead to crashes
(null pointer dereference if no program is active) or wrong results.

Fixes piglit regressions since 9ef710575b914ddfc8e9a162d98ad554c1c217f7
on pre-Sandybridge hardware. The actual bug was created in commit
7b1fbc688999fd568e65211d79d7678562061594.

NOTE: This is a candidate for the 9.0 and 8.0 branches.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54183
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
85b24b07512c5f3f05c5a3eb9561598ace97526c 26-Aug-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Assume shadow sampler swizzling is <X, X, X, 1>.

Our previous assumption, SWIZZLE_XYZW, was completely bogus for depth
textures. There are no Y, Z, or W components.

DEPTH_TEXTURE_MODE has three options:
- GL_LUMINANCE: <X, X, X, 1>
- GL_INTENSITY: <X, X, X, X>
- GL_ALPHA: <0, 0, 0, X>

The default value is GL_LUMINANCE, and most applications don't seem to
alter DEPTH_TEXTURE_MODE. Make that our precompile guess.
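
The three options above can be written as a small lookup. This is only a sketch: the enum names are stand-ins rather than the GL or Mesa constants, and depth_swizzle() is a hypothetical helper.

```cpp
#include <array>

enum depth_mode { MODE_LUMINANCE, MODE_INTENSITY, MODE_ALPHA };   // stand-ins
enum swz { SWZ_X, SWZ_ZERO, SWZ_ONE };

// Returns the <r, g, b, a> swizzle applied to the single depth component X.
inline std::array<swz, 4> depth_swizzle(depth_mode mode)
{
   switch (mode) {
   case MODE_INTENSITY: return {{SWZ_X, SWZ_X, SWZ_X, SWZ_X}};
   case MODE_ALPHA:     return {{SWZ_ZERO, SWZ_ZERO, SWZ_ZERO, SWZ_X}};
   case MODE_LUMINANCE:
   default:             return {{SWZ_X, SWZ_X, SWZ_X, SWZ_ONE}};   // the default mode
   }
}
```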

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f3d0daf7ea7e42ff9ce11e8bd6fba1059a2406e8 26-Aug-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Index sampler program key data by linker-assigned index.

Now that most things are based on the linker-assigned index, it makes
sense to convert the arrays in the VS/WM program key as well. It seems
silly to leave them indexed by texture unit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ab17762c70852ca8fc400d7b5c6696d412ff2afe 14-Aug-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Only set proj_attrib_mask for fixed function.

brw_wm_prog_key's proj_attrib_mask field is designed to enable an
optimization for fixed-function programs, letting us avoid projecting
attributes where the divisor is 1.0.

However, for shaders, this is not useful, and is pretty much impossible
to guess when building the FS precompile key. Turning it off for
shaders should allow the precompile to work and not lose much.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b6b1fc1261e86e2aa03ae8d2dd587c88a207354f 14-Aug-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Don't set vp_outputs_written in the WM program key on Gen6+.

It's only used on pre-Sandybridge hardware.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a3685544e1e88828c4931059686cf3acc199079c 14-Aug-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Don't set iz_lookup in the FS precompile's program key on Gen6+.

We already changed the actual program key builder to only set these bits
on gen < 6; this patch just brings the precompile state back in line so
it doesn't mismatch every time.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
43e3a7533d5537e48cef23588131dd25d938ee4b 14-Aug-2012 Eric Anholt <eric@anholt.net> i965: Fix the scaling of seconds to ms in perf debug.

*headdesk*
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
006c1a3c652803e2ff8d5f7ea55c9cb5d8353279 07-Aug-2012 Eric Anholt <eric@anholt.net> i965: Add perf debug for stalls during shader compiles.

v2: fix bad comment from before I gave up and decided to just use doubles.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fc3b7c9b56701f23b002543de33a8d8c43f9bdc2 12-Jul-2012 Eric Anholt <eric@anholt.net> i965: Add performance debug for shader recompiles.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d72ff03e699e78381049e29d89163519e6730dd4 12-Jul-2012 Eric Anholt <eric@anholt.net> i965: Add INTEL_DEBUG=perf for failure to compile 16-wide shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7426d9d7699452f15f3288e781e1791d8d00a64a 19-Jul-2012 Olivier Galibert <galibert@pobox.com> i965/fs: Fix the FS inputs setup when some SF outputs aren't used in the FS.

If there was an edge flag or a two-side-color pair present, we'd end up
mismatched and read values from earlier in the VUE for later FS inputs.

v2: Fix regression in gles2conform shaders generating point size. (change by
anholt)

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
90de96ff0d6d54ba0f9a337a6a107acf4134682d 21-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Add support for loading uniform buffer variables as pull constants.

Variable array indexing isn't finished, because the lowering pass
turns it all into conditional moves of constant index accesses so I
can't test it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
454dc83f66643e66ea7ee9117368211f0cfe84d7 21-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Communicate the pull constant block read parameters through fs_regs.

I wanted to add the surface index as a variable value for UBO support,
and a reg seemed like the obvious way to go. This exposes more of the
information to CSE, which we'll probably want to apply to pull
constant loads for UBOs eventually (you might access 4 floats in a
row, each of which would produce an oword block read of the same
block).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cc44aa77490e1360b099eb0b887266f434298b4f 21-Jul-2012 Eric Anholt <eric@anholt.net> i965: Remove unused param conversion code.

Ever since ctx->NativeIntegers was set, the conversion flag has been
PARAM_NO_CONVERT.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d08fdacd58dfa6b1926e9df4707dd9e8dd5370c5 20-Jun-2012 Paul Berry <stereotype441@gmail.com> i965: Avoid unnecessary recompiles for shaders that don't use dFdy().

The i965 back-end needs to compile dFdy() differently for FBOs and
window system framebuffers, because Y coordinates are flipped between
the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs).
This patch avoids unnecessarily recompiling shaders that don't use
dFdy(), by only setting render_to_fbo in the wm program key if the
shader actually uses dFdy().
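
The idea can be sketched as follows, assuming a hypothetical one-field key and a populate_key() helper (the real key has many more fields). Shaders that never call dFdy() get the same key for FBO and window-system rendering, so they hit the same cached program.

```cpp
// Hypothetical, reduced program key; only the field discussed here.
struct sketch_wm_key {
   bool render_to_fbo = false;
};

inline sketch_wm_key populate_key(bool shader_uses_dfdy, bool drawing_to_fbo)
{
   sketch_wm_key key;
   // Only let the framebuffer type influence the key when it can change
   // the generated code, i.e. when the shader uses dFdy().
   if (shader_uses_dfdy)
      key.render_to_fbo = drawing_to_fbo;
   return key;
}
```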

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a454f8ec6df9334df42249be910cc2d57d913bff 07-Jul-2012 Eric Anholt <eric@anholt.net> i965/fs.h: Refactor tests for instructions modifying a register.

There's one instance of a potential behavior change: propagate_constants may
now propagate into a part of a vgrf after a different part of it was
overwritten by a send that returns multiple registers. I don't think we ever
generate IR that meets that condition, but it's something to note if we bisect
a behavior change to this commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fc01376c50c15938f3b78431023ca3281304663d 06-Jul-2012 Eric Anholt <eric@anholt.net> i965/fs: Replace usage of is_tex() with regs_written() checks.

In these places, we care about any sort of send that hits more than one reg,
not just textures. We don't yet have anything else returning more than one
reg, so there's no change.

v2: Use mlen instead of is_tex() for the is-it-a-send check.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a6411520b40d59a8806289c7aaea4a6b26a54443 06-Jul-2012 Eric Anholt <eric@anholt.net> i965/fs: Rename virtual_grf_next to virtual_grf_count.

"count" is a more useful name, since most of the time we're using it for
looping over the variables.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b546aebae922214dced54c75e6f64830aabd5d1c 10-Jul-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Delete previous workaround for textureGrad with shadow samplers.

It had many problems:
- The shadow comparison was done post-filtering.
- It required state-dependent recompiles whenever the comparison
function changed.
- It didn't even work: many cases hit assertion failures.
- I never implemented it for the VS.

The new lowering pass which converts textureGrad to textureLod by
computing the LOD value works much better.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2343fe9a5d1786413453e6e8e5c7700143d68a26 05-Jun-2012 Eric Anholt <eric@anholt.net> i965/fs: Invalidate live intervals in passes that remove an instruction.

Since live intervals are based on ip, removing an instruction trashes
the intervals unless we were to go do some surgery. These happen to
usually remove a use of a grf, so it's time to recalculate, anyway.
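
A minimal sketch of the pattern, assuming a hypothetical visitor with a cached-validity flag (ip_inst, sketch_visitor and remove_dead() are illustrative names, not the driver's): any pass that deletes an instruction marks the cached intervals stale so the next consumer recomputes them.

```cpp
#include <cstddef>
#include <vector>

struct ip_inst { bool dead; };

struct sketch_visitor {
   std::vector<ip_inst> insts;
   bool live_intervals_valid = false;

   // Removes instructions flagged dead; returns true if anything changed.
   bool remove_dead()
   {
      std::size_t before = insts.size();
      std::vector<ip_inst> kept;
      for (const ip_inst &i : insts)
         if (!i.dead)
            kept.push_back(i);
      insts.swap(kept);

      bool progress = insts.size() != before;
      if (progress)
         live_intervals_valid = false;   // instruction indices shifted
      return progress;
   }
};
```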

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 8.0 release branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fe27916ddf41b9fb60c334c47c1aa81b8dd9005e 04-Jul-2012 Eric Anholt <eric@anholt.net> i965/fs: Move class functions from the header to .cpp files.

Cuts compile time for brw_fs.h changes from 2.7s to .7s and reduces
i965_dri.so size by 70k.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8313f44409ceb733e9f8835926364164237b3111 21-Jun-2012 Paul Berry <stereotype441@gmail.com> i965/msaa: Fix centroid interpolation of unlit pixels.

From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM:
Barycentric Interpolation Mode):

"Errata: When Centroid Barycentric mode is required, HW may
produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."

To work around this problem, after doing centroid interpolation, we
replace the centroid-interpolated values for unlit pixels with
non-centroid-interpolated values (which are interpolated at pixel
centers). This produces correct rendering at the expense of a slight
increase in shader execution time.
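
The replacement step amounts to a per-pixel select. A sketch over one 2x2 block, assuming plain arrays in place of hardware registers and a hypothetical fix_unlit() helper:

```cpp
#include <array>

// Lit pixels keep the centroid-interpolated value; unlit pixels fall back
// to the value interpolated at the pixel center.
inline std::array<float, 4> fix_unlit(const std::array<float, 4> &centroid,
                                      const std::array<float, 4> &center,
                                      const std::array<bool, 4> &lit)
{
   std::array<float, 4> out{};
   for (int i = 0; i < 4; i++)
      out[i] = lit[i] ? centroid[i] : center[i];
   return out;
}
```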

I've conditioned the workaround with a runtime flag
(brw->needs_unlit_centroid_workaround) in the hopes that we won't need
it in future chip generations.

Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation
tests pass now.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d1056541e239dfcee0ad6af2fd2d9fab37dbf025 18-Jun-2012 Paul Berry <stereotype441@gmail.com> i965/msaa: Add backend support for centroid interpolation.

This patch causes the fragment shader to be configured correctly (and
the correct code to be generated) for centroid interpolation. This
required two changes: brw_compute_barycentric_interp_modes() needs to
determine when centroid barycentric coordinates need to be included in
the pixel shader thread payload, and
fs_visitor::emit_general_interpolation() needs to interpolate using
the correct set of barycentric coordinates.

Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
centroid-edges" on i965.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cf0e7aa9f8bc9c175ebd9b2ab3a8bfec4afc5abf 21-Jun-2012 Paul Berry <stereotype441@gmail.com> i965/fs: Refactor interpolation code to prepare for adding centroid support.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
82d25963a838cfebdeb9b080169979329ee850ea 20-Jun-2012 Paul Berry <stereotype441@gmail.com> i965: Compute dFdy() correctly for FBOs.

On i965, dFdx() and dFdy() are computed by taking advantage of the
fact that each consecutive set of 4 pixels dispatched to the fragment
shader always constitutes a contiguous 2x2 block of pixels in a fixed
arrangement known as a "sub-span". So we calculate dFdx() by taking
the difference between the values computed for the left and right
halves of the sub-span, and we calculate dFdy() by taking the
difference between the values computed for the top and bottom halves
of the sub-span.

However, there's a subtlety when FBOs are in use: since FBOs use a
coordinate system where the origin is at the upper left, and window
system framebuffers use a coordinate system where the origin is at the
lower left, the computation of dFdy() needs to be negated for FBOs.

This patch modifies the fragment shader back-ends to negate the value
of dFdy() when an FBO is in use. It also modifies the code that
populates the program key (brw_wm_populate_key() and
brw_fs_precompile()) so that they always record in the program key
whether we are rendering to an FBO or to a window system framebuffer;
this ensures that the fragment shader will get recompiled when
switching between FBO and non-FBO use.
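
The sub-span arithmetic can be sketched on the CPU as below. The pixel order within the block and the sign convention (right minus left, top minus bottom) are assumptions made for illustration; only the negation for FBOs is taken from the description above.

```cpp
#include <array>

// One sub-span: values for a 2x2 block, assumed order
// {top-left, top-right, bottom-left, bottom-right}.
using subspan = std::array<float, 4>;

// Horizontal derivative: right minus left (using the top row).
inline float ddx(const subspan &v) { return v[1] - v[0]; }

// Vertical derivative: top minus bottom (using the left column), negated
// when rendering to an FBO because its Y origin is flipped.
inline float ddy(const subspan &v, bool render_to_fbo)
{
   float d = v[0] - v[2];
   return render_to_fbo ? -d : d;
}
```

Since the sign of ddy() depends on render_to_fbo, that flag has to be part of the program key, which is why switching between FBO and non-FBO rendering triggers a recompile.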

This will result in unnecessary recompiles of fragment shaders that
don't use dFdy(). To fix that, we will need to adapt the GLSL and
NV_fragment_program front-ends to record whether or not a given shader
uses dFdy(). I plan to implement this in a future patch series; I've
left FIXME comments in the code as a reminder.

Fixes Piglit test "fbo-deriv".

NOTE: This is a candidate for stable release branches.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f220f73b9c5aca16ca21ea8bbbbf8718703b12cf 08-May-2012 Eric Anholt <eric@anholt.net> i965/fs: Do more register coalescing by using the interference graph.

By using the live variables code for determining interference, we can
handle coalescing in the presence of control flow, which the other
register coalescing path couldn't.

Total instructions: 207184 -> 206990
74/1246 programs affected (5.9%)
33993 -> 33799 instructions in affected programs (0.6% reduction)

There is a newerth shader that loses out, because of some extra MOVs
that now get their dead-code nature obscured by coalescing. This
should be fixed by doing better at dead code elimination.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d7787adda8006506545256547d8d590a282487af 08-May-2012 Eric Anholt <eric@anholt.net> i965/fs: Add support for copy propagation.

We could do more by handling abs/negate and non-GRF sources, but this is
a good start. Improves tropics performance 0.30% +/- .17% (n=43).

shader-db results:
Total instructions: 208032 -> 207184
60/1246 programs affected (4.8%)
23286 -> 22438 instructions in affected programs (3.6% reduction)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a4e9b5a768d2d9e59b6054148afb6a6b94c0e4e6 11-May-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add a local common subexpression elimination pass.

Total instructions: 18210 -> 17836
49/163 programs affected (30.1%)
12888 -> 12514 instructions in affected programs (2.9% reduction)

This reduces Lightsmark's "Scale down filter" shader from 395
instructions to 283, a whopping 28%. It also reduces register pressure
significantly: the SIMD8 program now uses 29 registers instead of 101,
giving us more than enough room for a SIMD16 program.

v2: Add && !inst->conditional_mod to the "skip some instructions" check.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d1029f99884e2ba7f663765274cd6bdb4f82feed 11-May-2012 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Use a const reference in fs_reg::equals instead of a pointer.

This lets you omit some ampersands and is more idiomatic C++. Using
const also marks the function as not altering either register (which
was obvious, but nice to enforce).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4433b0302d0aa9dc61002e8bb4fd1b752b0be338 20-Apr-2012 Brian Paul <brianp@vmware.com> intel: use _mesa_is_winsys/user_fbo() helpers

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
34b17ee598e855e1090a455c2dac31ed8104954b 11-Apr-2012 Eric Anholt <eric@anholt.net> i965: Move the old live interval analysis code next to the new live vars code.

I'm about to replace the insides of this using the new analysis.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
32ae8d3b321185a85b73ff703d8fc26bd5f48fa7 10-Mar-2012 Eric Anholt <eric@anholt.net> i965/fs: Try to avoid generating extra MOVs to do saturates.

This change (before the previous two) produced a .23% +/- .11%
performance improvement in Unigine Tropics at 1024x768 on IVB.

Total instructions: 269270 -> 262649
614/2148 programs affected (28.6%)
179386 -> 172765 instructions in affected programs (3.7% reduction)

v2: Move some of the logic of finding the instruction that produced
the result of an expression tree to a helper.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
43af02ac731dac7d80f7e47feb0c80e4da156769 27-Feb-2012 Yuanhan Liu <yuanhan.liu@linux.intel.com> i965: handle gl_PointCoord for Gen4 and Gen5 platforms

This patch adds support for the gl_PointCoord GL builtin variable on
gen4 and gen5 (ILK) platforms.

Unlike gen6+, these platforms have no hardware support for
gl_PointCoord, which means the hardware will not calculate the
interpolation coefficients for you. Instead, you have to handle it
yourself in the SF shader stage.

Unfortunately, gl_PointCoord is an FS builtin variable rather than a VS
one, so it is not included in the c.vue_map generated in the VS stage,
and the current code isn't aware of this attribute. To handle it
correctly, we need to add it to c.vue_map manually so that the SF shader
generates the interpolation coefficients needed by the FS shader. The SF
stage has its own copy of vue_map, so I think it's safe to do this manually.

Since handling gl_PointCoord on gen4 and gen5 platforms is somewhat
special, I added a lot of comments and hope I didn't overdo it ;)

v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency
and also add the _NEW_BUFFERS dirty mask (Eric).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975
Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord

NOTE: This is a candidate for stable release branches.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a7f46eadea4555ed377928d4e3f89db4a445312e 09-Feb-2012 Eric Anholt <eric@anholt.net> i965: Report the failure message when failing to compile the fragment shader.

We just abort later, but at least this should result in more
informative bug reports.

NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a4586d2e2e444d1212d4abfd1ea5bbeff4503feb 27-Jan-2012 Eric Anholt <eric@anholt.net> intel: Comment typo fix.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
30f86aec01e1e1df4265d10a4618e34e9b8fec95 06-Jan-2012 Eric Anholt <eric@anholt.net> i965/fs: Fix projector==1.0 optimization pre-gen6.

The optimization was supposed to turn an attribute component that was
always 1.0 into a mov of 1.0. But because the loop this patch removes
was left outside of that test, we applied the projection correction to
the 1.0 and got some other value, breaking openarena once it was
converted to using the new compiler backend.

Originally this hunk was separate from the former loop to make the
generated instructions slightly better pipelined. We now have
automatic instruction scheduling to handle that, and the generated
instruction sequence looked the same to me after this change (except
for the bugfix).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
069901e2f5a8f4a58047d25335f2526f1acc7234 19-Dec-2011 Eric Anholt <eric@anholt.net> i965/fs: Allow constant propagation into IF with embedded compare.

This saves a couple of instructions on most programs with control
flow. More interestingly, 6 shaders from unigine sanctuary now fit
into 16-wide without register spilling.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1b05fc7cdd0e5d77b50bc8ee2f2c851da5884d72 07-Dec-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Factor out texturing related data from brw_wm_prog_key.

The idea is to reuse this for the VS and (in the future) GS as well.

v2: Include yuvtex data since we're not dropping GL_MESA_ycbycr.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
febad1779ae5cb5c85d66c2635baea62da52d2fa 26-Oct-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Rename texturing ops from FS_OPCODE to SHADER_OPCODE, except TXB.

We'll be reusing most of these for the VS shortly. The one exception is
TXB (texturing with LOD bias), which is explicitly forbidden in the VS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a73c65c5342bf41fa0dfefe7daa9197ce6a11db4 18-Oct-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Enable faster workaround-free math on Ivybridge.

According to the documentation, Ivybridge's math instruction works in
SIMD16 mode for the fragment shader, and no longer forbids align16 mode
for the vertex shader.

The documentation claims that SIMD16 mode isn't supported for INT DIV,
but empirical evidence shows that it works fine. Presumably the note
is trying to warn us that the variant that returns both quotient and
remainder in (dst, dst + 1) doesn't work in SIMD16 mode since dst + 1
would be sechalf(dst), trashing half your results. Since we don't use
that variant, we don't care and can just enable SIMD16 everywhere.

The documentation also still claims that source modifiers and
conditional modifiers aren't supported, but empirical evidence and
study of the simulator both show that they work just fine.

Goodbye workarounds. Math just works now.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8fad0f99989866eeb72889a84f12f6f817334ddb 02-Nov-2011 Paul Berry <stereotype441@gmail.com> i965: Fix constant propagation into 32-bit integer MUL.

i965's MUL instruction can't take an immediate value as its first
argument. So normally, if constant propagation wants to propagate a
constant into the first argument of a MUL instruction, it swaps the
order of the two arguments.

This doesn't work for 32-bit integer (and unsigned integer)
multiplies, because the MUL operation is asymmetric in that case (it
multiplies 16 bits of one operand by 32 bits of the other).
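
Illustration (not part of the patch): a toy model of why swapping the
operands is unsafe for integer MUL. Which source is truncated to 16 bits is
an assumption made for this sketch, and mul_32x16() and can_commute_mul()
are hypothetical names, not driver functions.

```cpp
#include <cstdint>

// Toy model (assumption): the multiplier uses all 32 bits of src0 but
// only the low 16 bits of src1.
constexpr uint32_t mul_32x16(uint32_t src0, uint32_t src1) {
    return src0 * (src1 & 0xffffu);
}

enum class RegType { Float, Int32, UInt32 };

// Constant propagation may swap MUL operands only when the operation is
// symmetric, which in this model rules out the 32-bit integer types.
constexpr bool can_commute_mul(RegType type) {
    return type == RegType::Float;
}
```

In this model mul_32x16(0x10000, 3) gives 0x30000, but with the operands
swapped the result is 0, so the swap changes the answer.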

Fixes piglit tests {vs,fs}-multiply-const-{ivec4,uvec4}.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9734bd05608c00a1d84851f3d46d5deb52e75d5e 25-Oct-2011 Paul Berry <stereotype441@gmail.com> i965: Fix flat integral varyings.

Previously, the vertex and fragment shader back-ends assumed that all
varyings were floats. In GLSL 1.30 this is no longer true--they can
also be of integral types provided that they have an interpolation
qualifier of "flat".

This required two changes in each back-end: assigning the correct type
to the register that holds the varying value during shader execution,
and assigning the correct type to the register that ties the varying
value to the rest of the graphics pipeline (the message register in
the case of VS, and the payload register in the case of FS).

Fixes piglit tests fs-int-interpolation and fs-uint-interpolation.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5aa96286e7e1a5380673eb75e8653616b48751fd 22-Oct-2011 Paul Berry <stereotype441@gmail.com> i965/gen6+: Add support for noperspective interpolation.

This required the following changes:

- WM setup now makes the appropriate set of barycentric coordinates
(perspective vs. noperspective) available to the fragment shader,
based on whether the shader requires perspective interpolation,
noperspective interpolation, both, or neither.

- The fragment shader backend now uses the appropriate set of
barycentric coordinates when interpolating, based on the
interpolation mode returned by
ir_variable::determine_interpolation_mode().

- SF setup now uses gl_fragment_program::InterpQualifier to determine
which attributes are to be flat shaded (as opposed to the old logic,
which only flat shaded colors).

- CLIP setup now ensures that the clipper outputs non-perspective
barycentric coordinates when they are needed by the fragment shader.

Fixes the remaining piglit tests of interpolation qualifiers that were
failing:
- interpolation-flat-*-smooth-none
- interpolation-flat-other-flat-none
- interpolation-noperspective-*
- interpolation-smooth-gl_*Color-flat-*

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f8386a29f07c6a41c4afb99fc3ecd9f18e9151e8 21-Oct-2011 Paul Berry <stereotype441@gmail.com> i965/fs: use determine_interpolation_mode().

This patch changes how fs_visitor::emit_general_interpolation()
decides what kind of interpolation to do. Previously, it used the
shade model to determine how to interpolate colors, and used smooth
interpolation on everything else. Now it uses
ir_variable::determine_interpolation_mode(), so that it respects GLSL
1.30 interpolation qualifiers.
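
Illustration (not part of the patch): a rough sketch of the selection
logic described above, assuming that an explicit qualifier always wins and
that the shade model only affects unqualified colors. The enum and
pick_interpolation() are hypothetical names, not the actual
ir_variable::determine_interpolation_mode() implementation.

```cpp
enum class InterpQualifier { None, Smooth, Flat, NoPerspective };

// An explicit GLSL 1.30 qualifier is respected as-is. Without one, colors
// follow the shade model and everything else is interpolated smoothly.
constexpr InterpQualifier pick_interpolation(InterpQualifier declared,
                                             bool is_color,
                                             bool flat_shade) {
    return declared != InterpQualifier::None ? declared
         : (is_color && flat_shade)          ? InterpQualifier::Flat
                                             : InterpQualifier::Smooth;
}
```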

Fixes piglit tests interpolation-flat-*-smooth-{distance,fixed,vertex}
and interpolation-flat-other-flat-{distance,fixed,vertex}.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e04bdeae82797dbdcf6f544a997a4626fdfd4aee 22-Oct-2011 Paul Berry <stereotype441@gmail.com> i965/gen6+: Parameterize barycentric interpolation modes.

This patch modifies the fragment shader back-end so that instead of
using a single delta_x/delta_y register pair to store barycentric
coordinates, it uses an array of such register pairs, one for each
possible interpolation mode.

When setting up the WM, we instruct it to only provide the
barycentric coordinates that are actually needed by the fragment
shader--that is computed by brw_compute_barycentric_interp_modes().
Currently this function returns just
BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only
interpolation mode we support. However, that will change in a later
patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
102bdd26e1acf1ebf75ef85b62df2400239fd480 21-Oct-2011 Paul Berry <stereotype441@gmail.com> i965/fs: Fix split_virtual_grfs() when delta_xy not in a virtual register.

This patch modifies the special case in
fs_visitor::split_virtual_grfs() that prevents splitting from being
applied to the delta_x/delta_y register pair (this register pair needs
to remain contiguous so that it can be used by the PLN instruction).

When gen>=6, this register pair is in a fixed location, not a virtual
register, so it was in no danger of being split. And
split_virtual_grfs' attempt not to split it was preventing some other
unrelated register from being split.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
73b0a28ba8b3e2ab917d4c729f34ddbde52c9e88 04-Oct-2011 Eric Anholt <eric@anholt.net> i965/fs: Fix comparisons with uint negation.

The condmod instruction ends up generating garbage condition codes,
because apparently the comparison happens on the accumulator value (33
bits for UD), not the truncated value that would be written.

Fixes fs-op-neg-*

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2e5a1a254ed81b1d3efa6064f48183eefac784d0 07-Oct-2011 Kenneth Graunke <kenneth@whitecape.org> intel: Convert from GLboolean to 'bool' from stdbool.h.

I initially produced the patch using this bash command:
for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i
's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i
's/GL_FALSE/false/g' $file; done

Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.

Finally, I cleaned up some whitespace issues introduced by the change.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad@chad-versace.us>
Acked-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d06cc42c3c85382600176d118d8bf492b4de6a55 07-Oct-2011 Paul Berry <stereotype441@gmail.com> i965: Fix computation of abs(-x) in FS

When updating a register reference to reflect the fact that we were
taking its absolute value, the fragment shader back-end failed to
clear the negate flag, resulting in abs(-x) getting computed as
-abs(x).
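
Illustration (not part of the patch): a small model of the bug, assuming
source modifiers are applied as abs first and negate second. SrcReg and the
helper functions are hypothetical stand-ins for the real register classes.

```cpp
// Hypothetical source register with modifier flags.
struct SrcReg {
    float value;
    bool negate;
    bool abs;
};

// Value seen by an instruction: abs is applied first, then negate.
constexpr float read(SrcReg r) {
    return (r.negate ? -1.0f : 1.0f) *
           ((r.abs && r.value < 0.0f) ? -r.value : r.value);
}

// Buggy version: sets abs but leaves a pending negate in place.
constexpr SrcReg apply_abs_buggy(SrcReg r) {
    return SrcReg{r.value, r.negate, true};
}

// Fixed version: taking the absolute value discards any pending negate.
constexpr SrcReg apply_abs_fixed(SrcReg r) {
    return SrcReg{r.value, false, true};
}
```

Starting from a register that represents -x with x = 2 (value 2, negate
set), the buggy helper reads back as -2, i.e. -abs(x), while the fixed one
reads back as 2.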

I also found (and fixed) a similar problem in brw_eu.h, but I'm not
aware of an actual manifestation of that problem.

Fixes piglit test glsl-fs-abs-neg-with-intermediate.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
de772c402215b956ab3aa0875330fc1bf7cdf95b 21-Aug-2011 Ian Romanick <ian.d.romanick@intel.com> mesa: Use gl_shader_program::_LinkedShaders instead of FragmentProgram

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4170227407eea7fd8287b17480a37309bf73f4e4 07-Oct-2011 Brian Paul <brianp@vmware.com> i965: silence unused var warnings in non-debug builds

Reviewed-by: Chad Versace <chad@chad-versace.us>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b9af592dfa8f8d0fe9f29c2d48bf6846cbd5c50f 29-Sep-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Reverse the operands for INT DIV prior to Gen6.

Apparently on Gen4 and 5, the denominator comes first.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ff8f272b0d02b41a0ce34ab6af7119b9e06f4961 29-Sep-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Implement integer quotient and remainder math operations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
30be2cc6c7c3378ee17885b5bf41d7ae53bf6fe0 26-Aug-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Implement texelFetch() on Ironlake and Sandybridge.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
23eec54bb0f368d9c88894b544b4af8f01cae2ae 07-Sep-2011 Brian Paul <brianp@vmware.com> i965: add casts to silence int/enum conversion warnings
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
51e7b058750cc480c296d45f773d7a5a662457f5 06-Sep-2011 Brian Paul <brianp@vmware.com> mesa: put _mesa_ prefix on vert_result_to_frag_attrib()
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6489a1d5bab75589569658d374257bf23cb67a23 30-Aug-2011 Paul Berry <stereotype441@gmail.com> Refactor code that converts between gl_vert_result and gl_frag_attrib.

Previously, this conversion was duplicated in several places in the
i965 driver. This patch moves it to a common location in mtypes.h,
near the declaration of gl_vert_result and gl_frag_attrib.

I've also added comments to remind us that we may need to revisit the
conversion code when adding elements to gl_vert_result and
gl_frag_attrib.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2f0edc60f4bd2ae5999a6afa656e3bb3f181bf0f 26-Aug-2011 Chad Versace <chad@chad-versace.us> i965: Fix Android build by removing relative includes

Replace each occurrence of
#include "../glsl/*.h"
with
#include "glsl/*.h"

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ecf8963754489abfb5097c130a9bcd4cdb76b6bd 19-Jun-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Implement textureSize (TXS) on Gen5+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e98ee06776e0ba055e0194836d5813a0bc7e7795 12-Aug-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Don't double-convert integer/boolean uniforms.

When ctx->Const.NativeIntegers is set, Core Mesa loads integer/boolean
uniforms directly, rather than loading the floating point equivalent.
So, when that's set, we don't need to perform any conversions.

Unfortunately, we can't properly support native integers with the old
vertex shader backend, so this patch leaves them disabled for now.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7fbe7fe13359d3f349664410ec73d7bd48824ed6 11-Aug-2011 Eric Anholt <eric@anholt.net> i965/vs: Run the shader backend at link time and return compile failures.

Link failure is something that shouldn't happen, but we sometimes want
it during development. The precompile also allows analysis of shader
codegen with shader-db.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
65b5cbbcf783f6c668ab5b31a0734680dd396794 05-Aug-2011 Eric Anholt <eric@anholt.net> i965: Rename math FS_OPCODE_* to SHADER_OPCODE_*.

I want to just use the same enums in the VS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6034b9a5124475d300d0678bd2fb6160865fa972 03-May-2011 Eric Anholt <eric@anholt.net> i965: Create a shared enum for hardware and compiler-internal opcodes.

This should make gdbing more pleasant, and it might be used in sharing
part of the codegen between the VS and FS backends.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c9e81fe14f36933617c862efb15ae09194485eab 15-May-2011 Eric Anholt <eric@anholt.net> i965: Drop the reg/hw_reg distinction.

"reg" was set in only one case, virtual GRFs pre register allocation,
and would be unset and have hw_reg set after allocation. Since we
never bothered with looking at virtual GRF number after allocation
anyway, just use the same storage and avoid confusion.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b76378d46a211521582cfab56dc05031a57502a6 04-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Eliminate the magic nature of virtual GRF 0.

This was a debugging aid at one point -- virtual grf 0 should never be
allocated, and it would be used if undefined register access occurred
in codegen. However, it made the confusing register allocation code
even more confusing by indexing things off of 1 all over.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ee0373b833155804bb8846c6f05f897b9ee5afa6 26-Jul-2011 Eric Anholt <eric@anholt.net> i965/fs: Don't upload unused uniform components.

This saves both register space and upload bandwidth for unused values.

Note that previously we were relying on the visitor not initially
generating references to different sets of uniforms between the 8-wide
and 16-wide code generation, and now we're relying on them dead-code
eliminating the same stuff, too.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4683529048ee133481b2d8f1cae1685aa1736f9a 04-Aug-2011 Bryan Cain <bryancain3@gmail.com> Merge branch 'glsl-to-tgsi'

Conflicts:
src/mesa/state_tracker/st_atom_pixeltransfer.c
src/mesa/state_tracker/st_program.c
54db6e618e43abbd69b59e0a03e2b6ec83d3120f 30-Jun-2011 Bryan Cain <bryancain3@gmail.com> r200, r600c, i965: fix build
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f710b8c7501f29f5f8941e757ea1066cbeb03305 23-Jul-2011 Eric Anholt <eric@anholt.net> i965/fs: Allow register coalescing where the source is a uniform.

Removes 0.8% of the fragment shader instructions on Unigine Tropics.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a8b86459a1bb74cfdf0d63572a9fe194b2b5b53f 23-Jul-2011 Eric Anholt <eric@anholt.net> i965/fs: Optimize a * 1.0 -> a.

This appears in our instruction stream as a result of the
brw_vs_constval.c handling.
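
Illustration (not part of the patch): a minimal sketch of this kind of
algebraic simplification on a toy instruction type. Opcode, Src, Inst, and
simplify() are hypothetical and far simpler than the real fs_inst.

```cpp
enum class Opcode { MOV, MUL };

// Toy source operand: either an immediate float or a register number.
struct Src {
    bool is_imm;
    float imm;
    int reg;
};

struct Inst {
    Opcode op;
    Src src0;
    Src src1;
};

constexpr bool is_imm_one(Src s) { return s.is_imm && s.imm == 1.0f; }

// Rewrite "MUL dst, a, 1.0" into "MOV dst, a"; leave everything else alone.
constexpr Inst simplify(Inst inst) {
    return (inst.op == Opcode::MUL && is_imm_one(inst.src1))
               ? Inst{Opcode::MOV, inst.src0, Src{false, 0.0f, 0}}
               : inst;
}
```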
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6d8d6b41b85a18685351f3023a4cd41266ba9e68 23-Jul-2011 Eric Anholt <eric@anholt.net> i965/fs: If we see a RCP of a constant, try to constant fold it.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
eb30820f268608cf451da32de69723036dddbc62 23-Jul-2011 Eric Anholt <eric@anholt.net> i965/fs: Port texture projection avoidance optimization from the old backend.

This is part of fixing a ~1% performance regression in OpenArena when
changing the fixed function fragment shader to using the new backend.
Right now this just avoids the LINTERP of the projector, not the math
using it.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
44ffb4ae207e48f78fae55925601b8708ed09c1d 29-Jul-2011 Eric Anholt <eric@anholt.net> i965/fs: Stop using the exec_list iterator.

The old style has gone out of favor in the project, but I kept copy
and pasting from existing iterator code.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6430df37736d71dd2bd6f1fe447d39f0b68cb567 10-Jun-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add support for TXD with shadow comparisons.

Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.

This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
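
Illustration (not part of the patch): a sketch of the manual comparison
step, assuming the usual GL convention that the reference value is the
left-hand operand and the sampled texel the right-hand one. CompareFunc and
shadow_compare() are hypothetical names for this example only.

```cpp
enum class CompareFunc {
    Never, Less, Equal, LEqual, Greater, NotEqual, GEqual, Always
};

// Compare the reference value against the texel fetched by the plain
// gradient sample and produce the 0.0 / 1.0 shadow result.
constexpr float shadow_compare(CompareFunc func, float ref, float texel) {
    return (func == CompareFunc::Always     ? true
            : func == CompareFunc::Less     ? ref < texel
            : func == CompareFunc::LEqual   ? ref <= texel
            : func == CompareFunc::Greater  ? ref > texel
            : func == CompareFunc::GEqual   ? ref >= texel
            : func == CompareFunc::Equal    ? ref == texel
            : func == CompareFunc::NotEqual ? ref != texel
                                            : false)
               ? 1.0f
               : 0.0f;
}
```

Because the comparison function is baked into the generated code in this
scheme, changing it requires a different program, which is why it has to be
part of the program key.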

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ad9481e12813d5f1dec95ce123927e132fa935fb 11-Jun-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Check for compilation failure and bail before optimizing.

Prior to this patch, it would attempt to optimize and allocate registers
for the program even if it failed to compile. This seems wasteful.

More importantly, the "message length > 11" failure seems to choke the
instruction scheduler, making it somehow use an undefined value and
segmentation fault.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c173541d9769d41a85cc899bc49699a3587df4bf 27-Apr-2011 Eric Anholt <eric@anholt.net> i965: Use state streaming on programs, and state base address on gen5+.

There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.

On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
23ef4a6063668c187d00a0502207f0c03be5f994 10-Jun-2011 Eugeni Dodonov <eugeni@mandriva.com> Fix format not a string literal error with -Werror=format-security

A trivial fix for "error: format not a string literal and no format
arguments" when compiling with the -Werror=format-security flag.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c331b3123ecda127919458e24848b7c1596525ac 12-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Use the embedded compare in SEL on gen6+.

This avoids the extra CMP and the predication on SEL, so in addition
to one less instruction, it makes scheduling less constrained.

Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces
FS instruction count across affected shaders in shader-db by 1.3%
without regressing any.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8752764076e5b3f052a57e0134424a37bf2e9164 17-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Do a FS compile up front at link time to produce link errors.

At glLinkProgram time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d1f70a8a6c6ec7007bad22d3d6013415be2d243a 25-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Split the GLSL IR -> FS LIR visitor to brw_fs_visitor.cpp.

We now have:
brw_fs.cpp handles calling out to everything and optimization.
brw_fs_visitor.cpp handles translating to our LIR.
brw_fs_emit.cpp handles emitting from our LIR to native code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
11dd9e9c0fcf9985b90ff4b63b2833345fece027 25-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Split the BRW native code emit to brw_fs_emit.cpp

This is all separate from the visitor and the optimization passes
which feed into it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b7b700aeb0eab2cae26a01d9db42feea969333c7 26-May-2011 Eric Anholt <eric@anholt.net> i965: Move a couple of GLSL IR -> BRW helper functions to brw_shader.cpp.

These will be used by the VS backend as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
14b86f3c9131c1b26b01e07679cc899df0885b23 26-May-2011 Eric Anholt <eric@anholt.net> i965: Move non-FS-specific shader support to brw_shader.cpp.

These only existed in brw_fs.cpp because it was the only .cpp file in
the area when I wrote them.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
53c89c67f33639afef951e178f93f4e29acc5d53 27-Apr-2011 Eric Anholt <eric@anholt.net> i965: Avoid generating MOVs for assignments of expressions.

No statistically significant difference measured in 3dbenchmark
egypt/pro. It does reduce fragment shader instructions across
shader-db by 0.3%.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1791857d7d950d3d2834bbb09b495f51f43ef7c1 17-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Move the computation of register block count from unit to compile.

No net code size change, but unit update is down 0.8% code size
pre-gen6.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
615117ce4efd041459f7d4b0c77aa8e248345e66 23-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Track fixed GRF regs separate from allocated GRF file in scheduling.

There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.

This fixes a regression in 0f7325b89038937bd428f7c89ed9859189a0ab0b in
Lightsmark from the texture instructions now containing g0 references
instead of having that be implied. Performance is improved 15.2% +/-
3.6% (n=3).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f147599ef4b0d14c25a7e0d3f9f1c9b0229bb6fc 19-May-2011 Eric Anholt <eric@anholt.net> i965: Remove linear_color for GL_PERSPECTIVE_CORRECTION_HINT.

From the GL 2.1 spec:

"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fa42de5ad7ebbc0b81ce6ba0553742f0413690a7 24-May-2011 Eric Anholt <eric@anholt.net> i965: Fix assertion failures in unused brw_reg setup by deleting it.

I was using undefined values to create an unused value. Go me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37366
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9be8524af753791d26fbd65417c5380b4d934296 21-May-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Fix sampling on Ivybridge after headerless change.

Fixes a regression since 90e922267a89fa9bef254bb257405531ceff7356.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
24de02acaca2ed2e5149a6a026b8707cd0d6d27f 21-May-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Remove "TXD" from justification of sampler message headers.

The coordinate offsets set in the m1 header are for textureOffset;
they have nothing to do with textureGrad (TXD).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
90e922267a89fa9bef254bb257405531ceff7356 12-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Don't emit a header on gen5+ sample messages unless required.

Improves glbenchmark egypt performance 0.6% +/- 0.4% (n=6).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4bbc7915f16a8b0dcead3f34aa1b4f0328147bea 12-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Fix GPU hang on texture2d-bias on pre-Ironlake.

In the 16-wide rework, I missed that we were setting some things to be
SIMD16 mode (corresponding to their setup in emit_texture_gen4()).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b126a0c0cb30b1e2f2df1953fe14d8596d1cf4f7 02-Nov-2010 Eric Anholt <eric@anholt.net> i965: Add support for correct GL_CLAMP behavior by clamping coordinates.

This removes the stupid strict-conformance fallback code I broke when
adding ARB_sampler_objects.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36572
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7592f005608e6c03d53c18d27d9af84bde802014 11-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Drop the viewport index/rtai clearing in gen6 fb writes.

These fields are documented to be in the payload, and though the FB
write docs say they *aren't* in the payload, for all other fields the
payload and header is structured so that no overwriting is required
except for non-default options.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
136eb2bde769713b100351ff96bceb970f068c0a 10-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for "if" statements in 16-wide mode on gen6+.

It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.

Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
27b03926618ddcafabb7b61e652fe6458b017b24 11-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Fix discard and alpha test in 16-wide.

As of gen6, alt-mode (which we use) MOVs of floats are not raw --
they'll modify infs/nans. This broke discard and alpha test in
16-wide, where apparently the upper 8 bits of the pixel enables being
set were causing the whole value to get trashed upon being moved.
Treating the values as UD instead of float makes sure they get
preserved. While I'm here, replace the two 8-wide moves of the halves
of the header with a single compressed move.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
51761a1aefd31b7df12edd9467ac630b9cbbbbc9 11-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Cut an instruction and a temporary from gen6 discard statements.

I thought I was thwarted initially when I couldn't do conditional mod
on a MOV, and couldn't use two immediate constants in one instruction.
But g0 != g0 is also a way to produce a failing comparison.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5dd5be69f099211db027b6e39150cacefcfdf8b6 09-May-2011 Eric Anholt <eric@anholt.net> i965/fs: Fix compiler warnings about dead code from 963431829055f63ec94d
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2a95568f64a6641a49a2d4855272e9be2ac2db6d 11-May-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Avoid register coalescing away MATH workarounds on Ivybridge.

The MATH instruction cannot handle source modifiers, even on Gen7.
So, apply this workaround for Sandybridge on Ivybridge as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
64ce592679a5b08d66e3cbbf964f9e695e14aee1 16-Mar-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Add support for IF/ELSE/ENDIF control flow on Ivybridge.

Ivybridge's IF instruction doesn't support conditional modifiers.
It also introduces UIP, which must point to the ENDIF instruction.

ELSE and ENDIF remain the same except that JIP moves from dst to src1.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ff6e3c73f6553cd29b915497b5b00e3ef158a27d 29-Apr-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Add support for Ivybridge texturing messages.

Ivybridge puts the shadow comparator first, then lod/bias, and finally
the coordinate---unlike previous generations which always reserved four
slots for the coordinate at the beginning.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5936d96d33e767aa99f6afa92f2a6582ff04df23 16-May-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Move IF stack handling into the EU abstraction layer/brw_compile.

This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
37642518b8864ce751754957b08cdb437998f4e7 29-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for compute-to-mrf in 16-wide mode.

This is more painful than instruction scheduling, as we have to
compare two MRF writes to see if they coincide, and have to handle
partial GRF writes before that (for example, the result of a math
instruction written to color).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
445289b5093acb9abaf7e0a89bfa319fcb4a1c31 29-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Typo fix a comment.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0834607a891f7c2529d1f2cdeca28b6e98899f8b 25-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Enable constant propagation in 16-wide.

All that needed fixing was skipping the newly-possible
uncompressed/sechalf partial GRF constant writes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3b20f999bb7e9056e83ca09a842a9747d4ac1674 23-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for 16-wide dispatch with uniforms in use.

This is glued in in a bit of an ugly way -- we rely on the uniforms
having been set up by 8-wide dispatch, and we just reuse them without
the ability to add new uniforms for any reason, since the 8-wide
compile is already completed. Today, this all works out because our
optimization passes are effectively the same for both and even if they
weren't, we don't reduce the set of uniforms pushed after
optimization.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b943b9b1a696cf51adfb2a18bcb9cf503fb2737f 23-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add a little whitespace between shader dumping debug.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9c57780dc0604f871650c5d23c06d627d964d803 28-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for compr4 MRF writes.

These reduce an emitted (not decoded) instruction per shader on
g4x/gen5, but may allow for additional register coalescing as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
42ad2f0b9b6a18f1613f6d915a46b4a4a89c5aa2 14-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for 16-wide dispatch on gen5.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
662f1b48bd1a02907bb42ecda889a3aa52a5755d 12-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add initial support for 16-wide dispatch on gen6.

At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
76b7a0c1af23838cb5100424a2a88d621b881d05 24-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for discard instructions in 16-wide mode.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
148a32e622c5b95a4dbd9a8776fddf85ef484147 29-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for math instructions in 16-wide mode.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
54990673a65b72fd222aeafc19f3a384ce597146 24-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Fix interference calculation of pixel_[xy] in 16-wide.

Fixes glsl-fs-ceil in that mode, which produced the code in the comment.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
af20328271425c217630b5114ee172bd8387a91a 23-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Disable some optimization passes under 16-wide for now.

These are fixable for 16, but that can wait until after it's basically
working.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8575d1836249309048d77d342671aad65c7fa7ff 13-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for 16-wide texturing on gen5+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
141b0bb2779c80d3cd3fd21d2e9d10efa0433f26 21-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Add support for computing pixel_[xy] in 16-wide.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7c647a2fe98a645723fa5eace7f7f6c5c26f4f8e 14-Mar-2011 Eric Anholt <eric@anholt.net> i965: Move the destination reg setup for 8/16 wide to the emit code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4847f802c28e595130bda14055cd52c9b1f51cd7 09-Apr-2011 Eric Anholt <eric@anholt.net> i965/fs: Constant-fold immediates in src0 of SEL instructions.

This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.

This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
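The operand swap described in 4847f802 above can be sketched roughly as follows. This is an illustration only: `Operand`, `Sel`, `eval` and `fold_immediate_into_src1` are hypothetical names, not the driver's fs_inst API, and the sketch assumes that SEL picks src0 when the predicate holds and that only src1 may carry an immediate.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the driver's fs_inst/fs_reg.
struct Operand {
    bool is_immediate;
    float value;
};

struct Sel {
    Operand src0;           // chosen when the (possibly inverted) predicate holds
    Operand src1;           // chosen otherwise
    bool predicate_inverse; // true: test the negated predicate
};

// Evaluate the SEL for one channel's predicate bit.
float eval(const Sel &s, bool predicate) {
    bool take_src0 = s.predicate_inverse ? !predicate : predicate;
    return take_src0 ? s.src0.value : s.src1.value;
}

// Move an immediate out of src0 by swapping the sources and inverting the
// predicate, so that the instruction still selects the same value.
Sel fold_immediate_into_src1(Sel s) {
    if (s.src0.is_immediate && !s.src1.is_immediate) {
        Operand tmp = s.src0;
        s.src0 = s.src1;
        s.src1 = tmp;
        s.predicate_inverse = !s.predicate_inverse;
        assert(!s.src0.is_immediate);
    }
    return s;
}
```

Swapping alone would select the wrong source; inverting the predicate at the same time keeps the result unchanged for both predicate values.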
811c147220d2630b769e505ce4d40ef9108fe034 09-Apr-2011 Eric Anholt <eric@anholt.net> i965/fs: Constant-fold immediates in src0 of CMP instructions.

This is like what we do with add/mul, but we also have to flip the
conditional test.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
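A minimal sketch of the "flip the conditional test" step from 811c1472 above, assuming the fold works by swapping the two CMP operands. The `Cond` enum and both functions are hypothetical; the driver itself uses BRW_CONDITIONAL_* values.

```cpp
#include <cassert>

// Hypothetical condition codes for illustration.
enum class Cond { Less, LessEqual, Greater, GreaterEqual, Equal, NotEqual };

// Evaluate "a <cond> b".
bool compare(Cond c, float a, float b) {
    switch (c) {
    case Cond::Less:         return a < b;
    case Cond::LessEqual:    return a <= b;
    case Cond::Greater:      return a > b;
    case Cond::GreaterEqual: return a >= b;
    case Cond::Equal:        return a == b;
    case Cond::NotEqual:     return a != b;
    }
    assert(false && "unknown condition");
    return false;
}

// Condition to use once the two operands have traded places:
// "a < b" becomes "b > a", while equality tests are symmetric.
Cond flip_for_swapped_operands(Cond c) {
    switch (c) {
    case Cond::Less:         return Cond::Greater;
    case Cond::LessEqual:    return Cond::GreaterEqual;
    case Cond::Greater:      return Cond::Less;
    case Cond::GreaterEqual: return Cond::LessEqual;
    case Cond::Equal:        return Cond::Equal;
    case Cond::NotEqual:     return Cond::NotEqual;
    }
    assert(false && "unknown condition");
    return c;
}
```

Note that flipping is not logical negation: "less" maps to "greater", not to "greater or equal".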
963431829055f63ec94d88c97a5d07d30e49833a 03-Apr-2011 Eric Anholt <eric@anholt.net> i965/fs: Remove broken optimization for live intervals in loops.

The theory here was to detect a temporary variable used within a loop,
and avoid considering it live across the entire loop. However, it was
overeager and failed when the first definition of the variable
appeared within the loop but was only conditionally defined.

Fixes glsl-fs-loop-redundant-condition.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5d7fefb9afbcc6f1d58a92d07c390e6b912c3b00 03-Apr-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Switch W and 1/W in Sandybridge interpolation setup.

Various documentation mentions that "W" is handed to the WM stage,
but further digging seems to indicate that they really mean 1/W.

The code here is still unclear, but changing this fixes piglit
test "fragcoord_w" on Sandybridge as well as a Khronos ES2 conformance
test. I also tested 3DMarkMobile ES2.0's taiji and hoverjet demos, as
well as Nexuiz, just to be safe.

NOTE: This is a candidate for the 7.10 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a99e80d795f7c6aec0e73369a31d1728577b9727 25-Mar-2011 Ian Romanick <ian.d.romanick@intel.com> mesa: Fix ugly indentation left from previous commit

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
89d81ab16c05818b290ed735c1343d3abde449bf 25-Jan-2011 Ian Romanick <ian.d.romanick@intel.com> glsl: Calculate Mesa state slots in front-end instead of back-end

This should be the last bit of infrastructure changes before
generating GLSL IR for assembly shaders.

This commit leaves some odd code formatting in ir_to_mesa and brw_fs.
This was done to minimize whitespace changes / reindentation in some
loops. The following commit will restore formatting sanity.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
252eaa765e69a70036ec33a7e1e0ffeac1aab2ff 29-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk> i965: Avoid name clash of loop counter and member

src/mesa/drivers/dri/i965/brw_fs.cpp:565 warning: name lookup of ‘c’ changed

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0c8beb0ab5e72a9d2ecaad51db16a7d5291e120b 27-Mar-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Fix linear gl_Color interpolation on pre-gen6 hardware.

Civilization 4's shaders make heavy use of gl_Color and don't use
perspective interpolation. This resulted in rivers, units, trees, and
so on being rendered almost entirely white. This is a regression
compared to the old fragment shader backend.

Found by inspection (comparing the old and new FS backend code).

References: https://bugs.freedesktop.org/show_bug.cgi?id=32949

NOTE: This is a candidate for the 7.10 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4e994e150f65c854229b4af12eae5519ebd9dda1 25-Mar-2011 Ian Romanick <ian.d.romanick@intel.com> i965/fs: Use different name for inner loop counter

'i' is already used for the outer loop. This caused some problems
while doing other work in this area. No bug exists here... until you
want to use the outer loop counter in the inner loop.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2911fa0cca86f7acbc5423cab4dd328a412253cd 13-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Make compile failure more verbose with INTEL_DEBUG=wm.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4a101f957714dea2bc956d516d34c5b56ecb2c64 13-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Clean up reg_undef args from long ago lack of fs_inst overloads.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
53d78be3bde68bfb6416fb9c1abfbc24030f390e 13-Mar-2011 Eric Anholt <eric@anholt.net> i965/fs: Clean up the emit calls by introducing emit() overload helpers.

I think the code ends up a lot more legible this way, though we've
still got the overloads in the fs_inst as well (even though there's
only one caller left currently).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5e9aa9926b9bdf1260ce7350b88908bda337388b 26-Feb-2011 Kenneth Graunke <kenneth@whitecape.org> mesa: Remove the CompileShader driver hook; it's just a no-op.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2279156fe7ac9718533b8b0de90ae96100486680 16-Mar-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Rename brw_(IF|CONT)_gen6 functions to gen6_(IF|CONT).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cc48d663f7282411d88c6187ce3d03f21df0acd3 16-Mar-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Rename BRW_SAMPLER_MESSAGE_..._GEN5 to GEN5_SAMPLER_MESSAGE.

We already have lots of GEN6_* defines; this seems more consistent.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a99447314ca1cfce60f2a22285398fb222b2a440 12-Mar-2011 Eric Anholt <eric@anholt.net> i965: Fix alpha testing when there is no color buffer in the FBO.

We were alpha testing against an unwritten value, resulting in garbage.
(part of) Bug #35073.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b60651a17ba85af14e9d7b9a1398e065adb58665 11-Mar-2011 Eric Anholt <eric@anholt.net> i965: Do our lowering passes before the loop of optimization.

The optimization loop won't reinsert noise instructions or quadop
vectors, so we were traversing the tree for nothing. Lowering vector
indexing was in the loop after do_common_optimization() to avoid the
work if it ended up that the index was actually constant, but that has
been called already in the core.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
25e31140952c328f70f804e0134664d7ed6248a6 14-Mar-2011 Kenneth Graunke <kenneth@whitecape.org> i965: Enable texture lookups whose return type is 'float'

This enables the new shadow texture functions in GLSL 1.30.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
403be1111190a3fe63ae03bc0111e0a0b026495b 13-Mar-2011 Eric Anholt <eric@anholt.net> Revert "i965: Use the fixed function GLSL program instead of the ARB program."

This reverts commit 81b34a4e3a7aec9cdf2781757408dc5e9eec79cb. There
were regressions in the core change that this depends on.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
81b34a4e3a7aec9cdf2781757408dc5e9eec79cb 12-Jan-2011 Eric Anholt <eric@anholt.net> i965: Use the fixed function GLSL program instead of the ARB program.

This gets one more piece of the pipeline onto the new codegen backend.
Once ARB_fragment_program can generate GLSL programs, we can nuke the
old backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
186d3bc7a3389b78a851e34d8f970c28b8db1608 01-Mar-2011 Kenneth Graunke <kenneth@whitecape.org> Revert "i965/fs: Correctly set up gl_FragCoord.w on Sandybridge."

This reverts commit 4a3b28113c3d23ba21bb8b8f5ebab7c567083a6d, as it
caused a regression on Ironlake (bug #34646).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
58f7c9c72ee52527610b26ca8a137dd88c082c89 25-Feb-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Initial plumbing to support TXD.

This adds the opcode and the code to convert ir_txd to OPCODE_TXD;
it doesn't actually add support yet.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2830b1ae9032666e62460de5aece8db843c51c14 28-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Complete TXL support on gen5+.

Initial plumbing existed to turn the ir_txl into OPCODE_TXL, but it was
never handled.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4ddd11aad6a396e98ae30e3e78f6736804eae541 28-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Complete TXL support on gen4.

Initial plumbing existed to turn the ir_txl into OPCODE_TXL, but it was
never handled.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e54d62b89677624b5806442cc5053c0ceedd79b0 28-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Use a properly named constant in TXB handling.

The old value, BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE makes it sound like we're
doing a non-bias texture lookup. It has the same value as the new constant
BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE, so there should be no
functional changes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4a3b28113c3d23ba21bb8b8f5ebab7c567083a6d 20-Feb-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Correctly set up gl_FragCoord.w on Sandybridge.

pixel_w is the final result; wpos_w is used on gen4 to compute it.

NOTE: This is a candidate for the 7.10 branch.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
df2aef0e197f9276f60a8e755260420c90841269 20-Feb-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Refactor control flow stack handling.

We can't safely use fixed size arrays since Gen6+ supports unlimited
nesting of control flow.

NOTE: This is a candidate for the 7.10 branch.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
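The motivation in df2aef0e above (no fixed-size arrays once nesting is unbounded) can be illustrated with a growable stack. This is a sketch under assumptions: `IfStack` and its members are hypothetical names, and the real code tracks instruction pointers rather than plain indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical growable stack of open IF instructions (by index).
struct IfStack {
    std::vector<int> pending;

    // Record an IF that still needs its jump target patched.
    void push_if(int inst_index) { pending.push_back(inst_index); }

    // Return the innermost open IF when its ENDIF is reached.
    int pop_if() {
        assert(!pending.empty());
        int inst_index = pending.back();
        pending.pop_back();
        return inst_index;
    }

    std::size_t depth() const { return pending.size(); }
};
```

Because the storage grows on demand, a shader nested deeper than any compile-time limit cannot overrun the stack.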
2c2686b912de19a430aba9f5ea5fa679eabdc5c6 19-Feb-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Avoid register coalescing away gen6 MATH workarounds.

The code that generates MATH instructions attempts to work around
the hardware ignoring source modifiers (abs and negate) by emitting
moves into temporaries. Unfortunately, this pass coalesced those
registers, restoring the original problem. Avoid doing that.

Fixes several OpenGL ES2 conformance failures on Sandybridge.

NOTE: This is a candidate for the 7.10 branch.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
72cd7e87d35e96fad9643f1cee706a8568fa3fa1 19-Feb-2011 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Apply source modifier workarounds to POW as well.

Single-operand math already had these workarounds, but POW (the only two
operand function) did not. It needs them too - otherwise we can hit
assertion failures in brw_eu_emit.c when code is actually generated.

NOTE: This is a candidate for the 7.10 branch.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0f7325b89038937bd428f7c89ed9859189a0ab0b 27-Dec-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Emit texel offsets in sampler messages.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d3073f58c17d8675a2ecdd5dfa83e5520c78e1a8 21-Jan-2011 Kenneth Graunke <kenneth@whitecape.org> Convert everything from the talloc API to the ralloc API.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e256e4743c3f8f924f0d191759d9428f33f3e329 19-Jan-2011 Kenneth Graunke <kenneth@whitecape.org> glsl, i965: Remove unnecessary talloc includes.

These are already picked up by ir.h or glsl_types.h.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
382c2d99da3f219a5b82f391a81b534b6b44ebce 19-Jan-2011 Eric Anholt <eric@anholt.net> i965/fs: Add a helper function for detecting math opcodes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1991d92207cf629ba4ceead4bfc3f768d7b9e402 19-Jan-2011 Eric Anholt <eric@anholt.net> i965/fs: Assign URB/CURB register numbers after instruction scheduling.

This fixes a bunch of unnecessary barriers due to the scheduler not
knowing what that arbitrary register description refers to when trying
to reason about its dependencies.

The result is rescheduling in the convolution kernel shader in
Lightsmark, which results in avoiding register spilling and increasing
the performance of the first scene from 6-7 fps midway through the
panning to 11fps. The register spilling was a regression from Mesa
7.9 to Mesa 7.10.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
63879d90ace519749fed228ca0e21b5b56c7e1c0 19-Jan-2011 Eric Anholt <eric@anholt.net> i965/fs: Add an instruction scheduler.

Improves performance of my GLSL demo by 5.1% (+/- 1.4%, n=7). It also
reschedules the giant multiply tree at the end of
glsl-fs-convolution-1 so that we end up not spilling registers,
producing the expected level of performance.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3f2fe31eee1667ef9cad99aaad69e52a09c9effa 19-Jan-2011 Eric Anholt <eric@anholt.net> i965/fs: Add a helper for detecting texturing opcodes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
568e0083651dd29e5bce94ade8625a64a0e85e88 18-Jan-2011 Eric Anholt <eric@anholt.net> i965: Fix a comment typo.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8ce425f3e3e330bda859c439b915c4e59b1a2bf4 18-Jan-2011 Eric Anholt <eric@anholt.net> i965: Fix a bug in i965 compute-to-MRF.

Fixes piglit glsl-fs-texture2d-branching. I couldn't come up with a
testcase that didn't involve dead code, but it's still worthwhile to
fix I think.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e4be665bbddcb6ddfd7b9b13f01152a97097b35c 18-Jan-2011 Eric Anholt <eric@anholt.net> i965: Fix dead pointers to fp->Parameters->ParameterValues[] after realloc.

Fixes texrect-many regression with ff_fragment_shader -- as we added
refs to the subsequent texcoord scaling parameters, the array got
realloced to a new address while our params[] still pointed at the old
location.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
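The hazard in e4be665b above is the general one of keeping pointers into storage that may be reallocated. One common remedy, which is not necessarily what the commit itself did, is to remember indices and resolve them only when the value is needed. `ParamList` below is a hypothetical type used purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parameter list; callers keep indices, never pointers.
struct ParamList {
    std::vector<float> values;

    // Append a value and return its index, which stays valid even if
    // the underlying storage is reallocated by later additions.
    std::size_t add(float v) {
        values.push_back(v);
        return values.size() - 1;
    }

    // Resolve an index at the moment of use.
    float get(std::size_t index) const { return values.at(index); }
};
```

A pointer such as `&values[0]` taken before further `add` calls may dangle afterwards, whereas the index returned by `add` keeps referring to the same element.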
a6e4614ca1284c5731876bb88732b326bf13aba0 14-Jan-2011 Eric Anholt <eric@anholt.net> i965: Replace broken handling of dead code with an assert.

This code should never have been triggered, but I often did anyway
when I disabled optimization passes during debugging, then spent my
time debugging that this code doesn't work.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7c7df146b59bae9dcb3a271bd3c671e273015617 14-Jan-2011 Eric Anholt <eric@anholt.net> i965: Add an invalidation of live intervals after register splitting.

No effect, since it was called before live intervals were calculated.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c3f000b3926988124a44ce7e8cd6588e46063058 12-Jan-2011 Eric Anholt <eric@anholt.net> i965/fs: Do flat shading when appropriate.

We were trying to interpolate, which would end up doing unnecessary
math, and doing so on undefined values. Fixes glsl-fs-flat-color.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e880a57a71bbd5152ed26367dcc7051f21c20981 12-Jan-2011 Eric Anholt <eric@anholt.net> i965: Clarify when we need to (re-)calculate live intervals.

The ad-hoc placement of recalculation somewhere between when they got
invalidated and when they were next needed was confusing. This should
clarify what's going on here.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ab56e3be9aae54602372427755305c354821e105 12-Jan-2011 Eric Anholt <eric@anholt.net> i965/fs: When producing ir_unop_abs of an operand, strip negate.

We were returning the negative absolute value, instead of the absolute
value. Fixes glsl-fs-abs-neg.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
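A small sketch of the bug fixed in ab56e3be above. It assumes a source operand carries separate abs and negate modifier flags and that abs is applied before negate when the operand is read; `SrcReg`, `read_src` and `apply_abs` are hypothetical names, not the driver's fs_reg interface.

```cpp
#include <cmath>

// Hypothetical source operand with modifier flags.
struct SrcReg {
    float value;
    bool negate;
    bool abs;
};

// Read the operand as assumed here: abs first, then negate.
float read_src(const SrcReg &r) {
    float v = r.abs ? std::fabs(r.value) : r.value;
    return r.negate ? -v : v;
}

// Taking the absolute value of an operand must also clear any negate
// flag already set on it; otherwise the read yields -|x|.
SrcReg apply_abs(SrcReg r) {
    r.abs = true;
    r.negate = false;
    return r;
}
```

Setting only the abs flag on an operand that already had negate set would read back as the negative absolute value, which matches the symptom the commit describes.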
4eb7284ef98c24331761cbe683c5bd89058e3ad3 12-Jan-2011 Eric Anholt <eric@anholt.net> i965: Tighten up the check for flow control interfering with coalescing.

This greatly improves codegen for programs with flow control by
allowing coalescing for all instructions at the top level, not just
ones that follow the last flow control in the program.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
df4d83dca4618eb7077637865763d3e9ab750d11 29-Dec-2010 Eric Anholt <eric@anholt.net> i965: Do lowering of array indexing of a vector in the FS.

Fixes a regression in ember since switching to the native FS backend,
and the new piglit tests glsl-fs-vec4-indexing-{2,3} for catching this.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
54df8e48bcceacbfa468d5237f2981b26493df29 28-Dec-2010 Eric Anholt <eric@anholt.net> i965: Fix regression in FS comparisons on original gen4 due to gen6 changes.

Fixes 26 piglit cases on my GM965.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
74dffb39c3434b590b36833905f2b12a6e3477e9 28-Dec-2010 Eric Anholt <eric@anholt.net> i965: Factor out the ir comparison to BRW_CONDITIONAL_* code.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
634a7dce9c1d9e4a8576ff8197c8adaea7e9ddd1 27-Dec-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Flatten if-statements beyond depth 16 on pre-gen6.

Gen4 and Gen5 hardware can have a maximum supported nesting depth of 16.
Previously, shaders with control flow nested 17 levels deep would
cause a driver assertion or segmentation fault.

Gen6 (Sandybridge) hardware no longer has this restriction.

Fixes fd.o bug #31967.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4374703a9b2ce0be105ee544c8402a932e3e1f52 22-Dec-2010 Zhenyu Wang <zhenyuw@linux.intel.com> i965: explicit tell header present for fb write on sandybridge

Determining whether a header is present for an fb write from the msg length
is not right for SIMD16 dispatch, and if there are more output attributes,
header presence is not easy to tell from the msg length. This explicitly adds
a new param for fb write to say whether the header is present or not.

Fixes many hangs and failures in the GL conformance tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
036c817f77f71e7c4b17571ae100a9bc93d8fe5b 13-Dec-2010 Eric Anholt <eric@anholt.net> i965: Fix gl_FragCoord.z setup on gen6.

Fixes glsl-bug-22603.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c3ca384e7190656afcd9f5143e811843efa2b3cb 09-Dec-2010 Vinson Lee <vlee@vmware.com> i965: Silence uninitialized variable warning.

Fixes this GCC warning.
brw_fs.cpp: In function 'brw_reg brw_reg_from_fs_reg(fs_reg*)':
brw_fs.cpp:3255: warning: 'brw_reg' may be used uninitialized in this function
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b2167a6c013c057e731b96486e3363c1d1171d60 08-Dec-2010 Eric Anholt <eric@anholt.net> i965: Fix flipped value of the not-embedded-in-if on gen6.

Fixes:
glean/glsl1-! (not) operator (1, fail)
glean/glsl1-! (not) operator (1, pass)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7ca7e9b626389dd6dac683c6664b8478e6d5c3b9 07-Dec-2010 Eric Anholt <eric@anholt.net> i965: Work around gen6 ignoring source modifiers on math instructions.

With the change of extended math from having the arguments moved into
mrfs and handed off through message passing to being directly hooked
up to the EU, it looks like the piece for doing source modifiers
(negate and abs) was left out.

Fixes:
fog-modes
glean/fp1-ARB_fog_exp test
glean/fp1-ARB_fog_exp2 test
glean/fp1-Computed fog exp test
glean/fp1-Computed fog exp2 test
ext_fog_coord-modes
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6848e27e1462e98dd91826a06f96c203c9eeebd0 07-Dec-2010 Ian Romanick <ian.d.romanick@intel.com> i965: Correctly emit constants for aggregate types (array, matrix, struct)

Previously the code only handled scalars and vectors. This new code
is modeled somewhat after similar code in ir_to_mesa.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
16f8c823898fd71a3545457eacd2dc31ddeb3592 11-Nov-2010 Eric Anholt <eric@anholt.net> i965: Move payload reg setup to compile, not lookup time.

Payload reg setup on gen6 depends more on the dispatch width as well
as the uses_depth, computes_depth, and other flags. That's something
we want to decide at compile time, not at cache lookup. As a bonus,
the fragment shader program cache lookup should be cheaper now that
there's less to compute for the hash key.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
843a6a308e05bd4bf2056e08ec65ac4770097b93 01-Dec-2010 Eric Anholt <eric@anholt.net> i965: Add support for gen6 CONTINUE instruction emit.

At this point, piglit tests for fragment shader loops are working.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
00e5a743e2ee3981a34b95067a97fa73c0f5d779 01-Dec-2010 Eric Anholt <eric@anholt.net> i965: Add support for gen6 BREAK ISA emit.

There are now two targets: the hop-to-end-of-block target, and the
target for where to resume execution for active channels.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4890e0f09c934e3ffb692b417e5444e43685c876 01-Dec-2010 Eric Anholt <eric@anholt.net> i965: Add support for gen6 DO/WHILE ISA emit.

There's no more DO since there's no more mask stack, and WHILE has
been shuffled like IF was.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2927b6c21202fd0f9a661665e0093e7193c5df6e 30-Nov-2010 Eric Anholt <eric@anholt.net> i965: Fix type of gl_FragData[] dereference for FB write.

Fixes glsl-fs-fragdata-1, and hopefully Eve Online where I noticed
this bug in the generated shader. Bug #31952.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b6b91fa02911f5dfc5d528d822674ee5557800d9 19-Nov-2010 Eric Anholt <eric@anholt.net> i965: Remove duplicate MRF writes in the FS backend.

This is quite common for multitexture sampling, and not only cuts down
on the second and later set of MOVs, but typically also allows
compute-to-MRF on the first set.

No statistically significant performance difference in nexuiz (n=3),
but it reduces instruction count in one of its shaders and seems like
a good idea.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
47b1aac1cf0aefae4df58a60bb7eb26d21e25913 18-Nov-2010 Eric Anholt <eric@anholt.net> i965: Improve compute-to-mrf.

We were skipping it if the instruction producing the value we were
going to compute-to-mrf used its result reg as a source reg. This
meant that the typical "write interpolated color to fragment color" or
"texture from interpolated texcoord" shader didn't compute-to-MRF.
Just don't check for the interference cases until after we've checked
if this is the instruction we wanted to compute-to-MRF.

Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08%
(n=3).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
19631fab35ca4d5ca64d606922f3f20774b27645 19-Nov-2010 Eric Anholt <eric@anholt.net> i965: Recognize saturates and turn them into a saturated mov.

On pre-gen6, this turns 4 instructions into 1. We could still do
better by folding the saturate into the instruction generating the
value if nobody else uses it, but that should be a separate pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
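As a rough illustration of the pattern the saturate commit above recognizes: clamping a value to [0, 1] through a max followed by a min yields the same result as a single saturating operation. The function names below are hypothetical and are not taken from the driver.

```cpp
#include <algorithm>

// Hypothetical helper: the unoptimized form, a max followed by a min.
float clamp_via_min_max(float x) {
    return std::min(std::max(x, 0.0f), 1.0f);
}

// Hypothetical helper: what a single saturated move computes.
float saturate(float x) {
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}
```

Both functions agree for every finite input, which is why the longer sequence can be collapsed into one instruction.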
602ae2441aaca6a652d3fc78114bb60852132f98 18-Nov-2010 Eric Anholt <eric@anholt.net> i965: Fold constants into the second arg of BRW_SEL as well.

This hits a common case with min/max operations.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f9b420d3bda25ea517b66c5ee2c6bde4fdff3935 18-Nov-2010 Eric Anholt <eric@anholt.net> i965: Remove extra \n at the end of every instruction in INTEL_DEBUG=wm.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
61126278a39fbff9a66aff9ecc37893e87950091 19-Nov-2010 Eric Anholt <eric@anholt.net> i965: Fix compute_to_mrf to not move a MRF write up into another live range.

Fixes glsl-fs-copy-propagation-texcoords-1.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
63684a9ae7a66f68df1f2c68cd9358e5622122a3 19-Nov-2010 Kenneth Graunke <kenneth@whitecape.org> glsl: Combine many instruction lowering passes into one.

This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
11d6f1c69871d0b7edc28f639256460839fccd2d 16-Nov-2010 Ian Romanick <ian.d.romanick@intel.com> glsl: Add ir_quadop_vector expression

The vector operator collects 2, 3, or 4 scalar components into a
vector. Doing this has several advantages. First, it will make
ud-chain tracking for components of vectors much easier. Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915). It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fc92e87b9757eda01caf0bb3e2c31b1dbbd73aa0 11-Nov-2010 Ian Romanick <ian.d.romanick@intel.com> glsl: Eliminate assumptions about size of ir_expression::operands

This may grow in the near future.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f2616e56de8a48360cae8f269727b58490555f4d 18-Nov-2010 Ian Romanick <ian.d.romanick@intel.com> glsl: Add ir_unop_sin_reduced and ir_unop_cos_reduced

They operate just like ir_unop_sin and ir_unop_cos except that they
expect their inputs to be limited to the range [-pi, pi]. Several
GPUs require this limited range for their sine and cosine
instructions, so having these as operations (along with a to-be-written
lowering pass) helps these architectures.

These new operations also match the semantics of the
GL_ARB_fragment_program SCS instruction. Having these as operations
helps in generating GLSL IR directly from assembly fragment programs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
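The commit above leaves the lowering pass unwritten. One common way to bring an arbitrary angle into [-pi, pi] before a reduced-range sine or cosine is to subtract the nearest multiple of 2*pi. This is a minimal sketch of that idea, not the pass Mesa eventually used; reduce_to_pi_range is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical helper: map x to an equivalent angle in [-pi, pi]
// by subtracting the nearest whole multiple of 2*pi.
double reduce_to_pi_range(double x) {
    const double two_pi = 2.0 * 3.14159265358979323846;
    return x - two_pi * std::round(x / two_pi);
}
```

Because sine and cosine are periodic in 2*pi, evaluating them on the reduced angle gives the same value as on the original, up to floating-point error.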
50b4508319cc5277d51a38065850eaa092afc0d4 18-Nov-2010 Eric Anholt <eric@anholt.net> i965: Eliminate dead code more aggressively.

If an instruction writes reg but nothing later uses it, then we don't
need to bother doing it. Before, we were just killing code that was
never read after it was ever written.

This removes many interpolation instructions for attributes with only
a few components used. Improves nexuiz high-settings performance .46%
+/- .12% (n=3) on my Ironlake.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
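The rule in the dead code commit above (drop a write when nothing later reads it) can be sketched as a backward pass over straight-line code. This is only an illustration under simplifying assumptions: no control flow, one destination per instruction, and a hypothetical Inst struct. The driver itself works from live intervals.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical instruction: dst is a register number, or -1 for none.
struct Inst {
    int dst;
    std::vector<int> srcs;
    bool has_side_effect;  // e.g. a framebuffer write; never removed
};

// Walk backwards, tracking which registers are read later on.
std::vector<Inst> eliminate_dead_code(const std::vector<Inst>& insts) {
    std::set<int> live;
    std::vector<Inst> kept;
    for (auto it = insts.rbegin(); it != insts.rend(); ++it) {
        // A write that nobody reads afterwards is dead.
        if (!it->has_side_effect && live.count(it->dst) == 0)
            continue;
        live.erase(it->dst);                      // this write satisfies later reads
        for (int s : it->srcs) live.insert(s);    // its sources are now needed
        kept.push_back(*it);
    }
    std::reverse(kept.begin(), kept.end());
    return kept;
}
```

Note that this also removes an early write that is overwritten before any read, which is the case the older "never read after it was ever written" check missed.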
da35388044db4aa6fc66c08a087d8d703b5a6008 17-Nov-2010 Eric Anholt <eric@anholt.net> i965: Fail on loops on gen6 for now until we write the EU emit code for it.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d512cbf58f9039575dbbb5ab65dbbf7b742a0854 17-Nov-2010 Eric Anholt <eric@anholt.net> i965: Shut up spurious gcc warning about GLSL_TYPE enums.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9935fe705df44bb633039ca74332cc0c126ccc30 17-Nov-2010 Kenneth Graunke <kenneth@whitecape.org> glsl: Remove the ir_binop_cross opcode.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3b337f5cd94384d2d5918fb630aa8089e49b1d8d 13-Nov-2010 Eric Anholt <eric@anholt.net> i965: Fix gl_FragCoord inversion when drawing to an FBO.

This showed up as cairo-gl gradients being inverted on everyone but
Intel, where I'd apparently tweaked the transformation to work around
the bug. Fixes piglit fbo-fragcoord.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d11db2a857141c556378fde9f9f5ec08c7f8636f 14-Nov-2010 Vinson Lee <vlee@vmware.com> i965: Silence uninitialized variable warning.

Silences this GCC warning.
brw_fs.cpp: In member function 'void fs_visitor::split_virtual_grfs()':
brw_fs.cpp:2516: warning: unused variable 'reg'
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9effc1adf1e7ba57fb3b10909762b76c1ae12f61 12-Oct-2010 Eric Anholt <eric@anholt.net> i965: re-enable gen6 IF statements in the fragment shader.

IF statements were getting flattened while they were broken. With
Zhenyu's last fix for ENDIF's type, everything appears to have lined
up to actually work.

This regresses two tests:
glsl1-! (not) operator (1, fail)
glsl1-! (not) operator (1, pass)

but fixes tests that couldn't work before because the IFs couldn't be
flattened:
glsl-fs-discard-01
occlusion-query-discard

(and, naturally, this should be a performance improvement for apps
that actually use IF statements to avoid executing a bunch of code).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bb1540835056cdea5db6f55b19c0c87358f14cd1 03-Nov-2010 Eric Anholt <eric@anholt.net> intel: Annotate debug printout checks with unlikely().

This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
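For readers unfamiliar with the idiom in the commit above, unlikely() is conventionally a thin wrapper over the GCC/Clang __builtin_expect builtin. The sketch below shows that conventional definition with a fallback for other compilers; the debug_flags variable, DEBUG_WM bit, and emit_instruction function are hypothetical stand-ins, not the driver's actual names.

```cpp
#include <cstdio>

#if defined(__GNUC__) || defined(__clang__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

// Hypothetical debug flag state, normally all zero.
static unsigned debug_flags = 0;
constexpr unsigned DEBUG_WM = 1u << 0;

// Hypothetical hot-path function with a rarely-taken debug printout.
int emit_instruction(int opcode) {
    if (unlikely(debug_flags & DEBUG_WM))
        std::fprintf(stderr, "emit opcode %d\n", opcode);
    return opcode;
}
```

The hint does not change behavior; it only tells the optimizer to lay out the code so that the non-debug path is the fall-through case.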
cbc966b57bdb61f5bc158352a9c8dd57bf31b81e 19-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Add bit operation support to the fragment shader backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9e3641bd0d739a87a6998300ca29580cb557f380 25-Oct-2010 Eric Anholt <eric@anholt.net> i965: Make FS uniforms be the actual type of the uniform at upload time.

This fixes some insanity that would otherwise be required for GLSL
1.30 bit ops or gen6 integer uniform operations in general, at the
cost of upload-time pain. Given that we only have that pain because
mesa's mangling our integer uniforms to be floats, this is something that
should be fixed outside of the shader codegen.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
84eba3ef71dfa822e5ff0463032cdd2e3515b888 13-Oct-2010 Ian Romanick <ian.d.romanick@intel.com> Track separate programs for each stage

The assumption is that all stages are the same program or that
varyings are passed between stages using built-in varyings.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
62452e7d94a6353b59dfe0a8891d0709670dbeac 26-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add support for discard instructions on gen6.

It's a little more painful than before because we don't have the handy
mask register any more, and have to make do with cooking up a value
out of the flag register.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0e8c834ffa2f6d943a927e1a32a273d2f8600694 26-Oct-2010 Eric Anholt <eric@anholt.net> i965: Clear some undefined fields of g0 when using them for gen6 FB writes.

This doesn't appear to help any testcases I'm looking at, but it looks
like it's required.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
07cd8f46acc34b04308f81de2faf05ba33da264b 22-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add support for pull constants to the new FS backend.

Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ff622d5528c8cca465e29081c0792ca210cdd092 22-Oct-2010 Eric Anholt <eric@anholt.net> i965: Move the FS disasm/annotation printout to codegen time.

This makes it a lot easier to track down where we failed when some
code emit triggers an assert. Plus, less memory allocation for
codegen.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1d91f8d9164b38b4c924f43ec4fc5ceb65c96a78 22-Oct-2010 Eric Anholt <eric@anholt.net> i965: Be more aggressive in tracking live/dead intervals within loops.

Fixes glsl-fs-convolution-2, which was blowing up due to the array
access insanity getting at the uniform values within the loop. Each
temporary was considered live across the whole loop.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4e7252510976d8d3ff12437ea8842129f24d88f5 22-Oct-2010 Eric Anholt <eric@anholt.net> i965: Correct scratch space allocation.

One, it was allocating increments of 1kb, but per thread scratch space
is a power of two. Two, the new FS wasn't getting total_scratch set
at all, so everyone thought they had 1kb and writes beyond 1kb would
go stomping on a neighbor thread.

With this plus the previous register spilling for the new FS,
glsl-fs-convolution-1 passes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
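The first point of the scratch space commit above, that per-thread scratch must be a power of two rather than a multiple of 1kb, can be sketched as a rounding helper. The 1 KB minimum and the function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a per-thread scratch request up to the
// next power of two, assuming a 1 KB minimum allocation.
uint32_t scratch_size_pow2(uint32_t bytes_needed) {
    assert(bytes_needed <= (1u << 31));  // avoid overflow while doubling
    uint32_t size = 1024;
    while (size < bytes_needed)
        size *= 2;
    return size;
}
```

With this rounding, a 3000-byte request gets 4096 bytes, where rounding to 1 KB increments would have given 3072, which is not a valid power-of-two size.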
99b2c8570ea6f46c6564681631f0e0750a0641cc 19-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add support for register spilling.

It can be tested with if (0) replaced with if (1) to force spilling for all
virtual GRFs. Some simple tests work, but large texturing tests fail.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7a3f113e79f983222ecc95c33655a8c9354fcfad 21-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix gl_FrontFacing emit on pre-gen6.

It's amazing this code worked. Basically, we would get lucky in
register allocation and the tests using frontfacing would happen to
allocate gl_FrontFacing storage and the instructions generating
gl_FrontFacing but pointing at another register to the same hardware
register. Noticed during register spilling debug, when suddenly they
didn't get allocated the same storage.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5ac6c4ecfe77bf7e02ae61981b2c8b1fe73027cd 20-Oct-2010 Eric Anholt <eric@anholt.net> i965: Split register allocation out of the ever-growing brw_fs.cpp.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ae5698e60467db2a7e3f730788cdcdd3711da101 19-Oct-2010 Eric Anholt <eric@anholt.net> i965: Use the new style of IF statement with embedded comparison on gen6.

"Everyone else" does it this way, so follow suit. It's fewer
instructions, anyway.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
80c9f756b28d15ca097963af35915f5b073f081d 19-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Remove unused variable.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
746e68c50b4ae1566b342fbc965557b6dbcfaa2e 18-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix a weirdness in NOT handling.

XOR makes much more sense. Note that the previous code would have
failed for not(not(x)), but that gets optimized out.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
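A minimal sketch of why XOR is a natural fit for the NOT handling commit above, assuming booleans are stored as the integers 0 and 1 (an assumption about the backend's representation at the time). The previous code is not shown in the log, so only the XOR form appears here; logical_not is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: logical NOT of a 0/1 boolean via XOR with 1.
int logical_not(int b) {
    assert(b == 0 || b == 1);  // only valid for 0/1 booleans
    return b ^ 1;
}
```

Applying it twice returns the original value, so not(not(x)) stays correct even if the optimizer does not remove it.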
ea213417f14a8b2734cb2a88d8aa1ac05a70b7d5 18-Oct-2010 Eric Anholt <eric@anholt.net> i965: Disable the debug printf I added for FS disasm.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
65d4234c2398aaa48eb5e29e6e7bede40fe2fd36 18-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Add missing "break" statement.

Otherwise, it would try to handle arrays as structures, use
uninitialized memory, and crash.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
81d0a1fb3f1e5b7bcf43145f8a096691e3a5fdfb 15-Oct-2010 Eric Anholt <eric@anholt.net> i965: Set the type of the null register to fix gen6 FS comparisons.

We often use reg_null as the destination when setting up the flag
regs. However, on gen6 there aren't general implicit conversions to
destination types from src types, so the comparison to produce the
flag regs would be done on the integer result interpreted as a float.
Hilarity ensued.

Fixes 20 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
20b39c7760487bae73489b9812408e12d1d56dd5 15-Oct-2010 Ian Romanick <ian.d.romanick@intel.com> i965: Fix indentation after commit 3322fbaf
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3322fbaf3b5e305ce00c1d08c26965bb98e0cef0 14-Oct-2010 Ian Romanick <ian.d.romanick@intel.com> glsl: Slightly change the semantic of _LinkedShaders

Previously _LinkedShaders was a compact array of the linked shaders
for each shader stage. Now it is arranged such that each slot,
indexed by the MESA_SHADER_* defines, refers to a specific shader
stage. As a result, some slots will be NULL. This makes things a
little more complex in the linker, but it simplifies things in other
places.

As a side effect _NumLinkedShaders is removed.

NOTE: This may be a candidate for the 7.9 branch. If there are other
patches that get backported to 7.9 that use _LinkedShader, this patch
should be cherry picked also.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
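The layout change in the _LinkedShaders commit above can be pictured as follows: instead of a compact list plus a count, there is one slot per stage, and unused stages hold a null pointer. The enum, struct, and function names below are hypothetical and simplified, not Mesa's definitions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stage indices standing in for the MESA_SHADER_* defines.
enum ShaderStage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_GEOMETRY, STAGE_COUNT };

struct Shader { const char* name; };

// One slot per stage; a null slot means no shader is linked for that stage.
using LinkedShaders = std::array<const Shader*, STAGE_COUNT>;

// Callers that used a separate count now skip the null slots instead.
std::size_t count_linked(const LinkedShaders& shaders) {
    std::size_t n = 0;
    for (const Shader* s : shaders)
        if (s != nullptr) ++n;
    return n;
}
```

A consumer that wants the fragment shader can index the slot directly and check it for null, rather than searching a compact array.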
4b4284c9c9b472f750663352485290c22f8c3921 15-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix texturing on pre-gen5.

I broke it in 06fd639c519214b6ebcbf29127b6d9ed429f8641 by only testing
2 generations of hardware :(
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f157812bbbcf9caac1f84988e738fc9d1e051056 14-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Add support for ir_unop_round_even via the RNDE instruction.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a81d423d93f22a948f3aa4bf73dc6b1a3b70192f 14-Oct-2010 Eric Anholt <eric@anholt.net> i965: Enable the new FS backend on pre-gen6 as well.

It is now to the point where we have no regressing piglit tests. It
also fixes Yo Frankie! and Humus DynamicBranching, probably due to the
piglit bias tests that work that didn't on the Mesa IR backend.

As a downside, performance takes about a 5-10% hit at the
moment (e.g. nexuiz 19.8fps -> 18.8fps), which I plan to resolve by
reintroducing 16-wide fragment shaders where possible. It is a win,
though, for fragment shaders using flow control.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f541b685aaf404fa7c8142f51d91c2720d82f264 14-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Use RNDZ for ir_unop_trunc in the new FS.

The existing code used RNDD, which rounds down, rather than toward zero.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
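The difference between rounding down and rounding toward zero in the RNDZ commit above only shows up for negative inputs. A small sketch using the standard library equivalents (the wrapper names are hypothetical; std::floor stands in for RNDD and std::trunc for RNDZ):

```cpp
#include <cmath>

// Round toward negative infinity, as RNDD does.
double round_down(double x) { return std::floor(x); }

// Round toward zero, as RNDZ does; this is what trunc() requires.
double round_toward_zero(double x) { return std::trunc(x); }
```

For -1.5, rounding down gives -2.0 while rounding toward zero gives -1.0, so using RNDD for ir_unop_trunc would be off by one for negative non-integers.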
c4226142f3b5d1c931fcc781be8a3aafdfabf316 14-Oct-2010 Kenneth Graunke <kenneth@whitecape.org> i965: Use logical-not when emitting ir_unop_ceil.

Fixes piglit test glsl-fs-ceil.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5dd07b442e02696bf0ec5d4e3b4be1674519664a 14-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add peepholing of conditional mod generation from expressions.

This cuts usually 2 out of 3 instructions for flag reg generation (if
statements, conditional assignment) by producing the conditional mod
in the expression representing the boolean value.

Fixes glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined (register
allocation no longer fails for the conditional generation
proliferation)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d5599c0b6a22cd0bbc475ec715824660144d02a0 14-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add a function for handling the move of boolean values to flag regs.

This will be a place to peephole comparisons directly to the flag
regs, and for now avoids using MOV with conditional mod on gen6, which
is now illegal.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4f88550ba0e1ad07e39903f268975921c0101e85 14-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add a pass to the FS to split virtual GRFs to float channels.

Improves nexuiz performance 0.91% (+/- 0.54%, n=8)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b8613d70da34217b98edb9ac9e0a4c9a6598d0b3 14-Oct-2010 Eric Anholt <eric@anholt.net> i965: Update the live interval when coalescing regs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0c6752026c405dc3ab5fe85c6a40ac3f04c685c3 14-Oct-2010 Eric Anholt <eric@anholt.net> i965: Set class_sizes[] for the aligned reg pair class.

So far, I've only seen this be a valgrind warning and not a real failure.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a57ef244fc55476660f9fb76982130c5c0b25163 14-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add support for rescaling GL_TEXTURE_RECTANGLE coords to new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f9995b30756140724f41daf963fa06167912be7f 12-Oct-2010 Kristian Høgsberg <krh@bitplanet.net> Drop GLcontext typedef and use struct gl_context instead
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
080e7aface81e6a055ac61988ca27a88ad70f879 12-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix missing "break;" in i2b/f2b, and missing AND of CMP result.

Fixes glsl-fs-i2b.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bcec03d527561e2df56bf9ebfa250cef56bb732b 12-Oct-2010 Eric Anholt <eric@anholt.net> i965: Always use the new FS backend on gen6.

It's now much more correct for gen6 than the old backend, with just 2
regressions I've found (one of which is common with pre-gen6 and will
be fixed by an array splitting IR pass).

This does leave the old Mesa IR backend getting used still when we
don't have GLSL IR, but the plan is to get GLSL IR input to the driver
for the ARB programs and fixed function by the next release.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0cadd32b6dc80455802c04b479ec8e768f93ffe1 12-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix gen6 pixel_[xy] setup to avoid mixing int and float src operands.

Pre-gen6, you could mix int and float just fine. Now, you get goofy
results.

Fixes:
glsl-arb-fragment-coord-conventions
glsl-fs-fragcoord
glsl-fs-if-greater
glsl-fs-if-greater-equal
glsl-fs-if-less
glsl-fs-if-less-equal
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
720ed3c906b0f6d5822fe9fa442294c9828e1560 11-Oct-2010 Eric Anholt <eric@anholt.net> i965: Expand uniform args to gen6 math to full registers to get hstride == 1.

This is a hw requirement in math args. This also is inefficient, as
we're calculating the same result 8 times, but then we've been doing
that on pre-gen6 as well. If we're doing math on uniforms, though,
we'd probably be better served by having some sort of mechanism for
precalculating those results into another uniform value to use.

Fixes 7 piglit math tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
317dbf4613ebf56ca14ee70c1ad6e620ad7942c2 11-Oct-2010 Eric Anholt <eric@anholt.net> i965: Don't compute-to-MRF in gen6 math instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
25cf241540007088936a6df16c849441087f722c 11-Oct-2010 Eric Anholt <eric@anholt.net> i965: Don't consider gen6 math instructions to write to MRFs.

This was leftover from the pre-gen6 cleanups. One test regresses
where compute-to-MRF now occurs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c6dbf253d284f68b0d0e4a3c145583880855324b 08-Oct-2010 Eric Anholt <eric@anholt.net> i965: Compute to MRF in the new FS backend.

This didn't produce a statistically significant performance difference
in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea
and is recommended by the HW team.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
06fd639c519214b6ebcbf29127b6d9ed429f8641 09-Oct-2010 Eric Anholt <eric@anholt.net> i965: Give the FB write and texture opcodes the info on base MRF, like math.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
0cd6cea8a3e9339fc69f9de0da6b40e4f9d5f4fe 08-Oct-2010 Eric Anholt <eric@anholt.net> i965: Give the math opcodes information on base mrf/mrf len.

This is progress towards enabling a compute-to-MRF pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
37758fb1cbb1ddcd106553763c1b1f222f4cfb47 11-Oct-2010 Eric Anholt <eric@anholt.net> i965: Move FS backend structures to a header.

It's time to start splitting some of this up.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
251fe2785484f7ba0c194c92fe0feff9c78b52ca 10-Oct-2010 Eric Anholt <eric@anholt.net> i965: Reduce register interference checks for changed FS_OPCODE_DISCARD.

While I don't know of any performance changes from this (one extra
reg available out of 128), it makes the generated asm a lot cleaner
looking.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
90c402204018c78f4a0b8a79515cf8c582092963 10-Oct-2010 Eric Anholt <eric@anholt.net> i965: Split FS_OPCODE_DISCARD into two steps.

Having the single opcode write then read the reg meant that single
instruction opcodes had to consider their source regs to interfere
with their dest regs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c52a0b5c7d4b55fb183c8ab68aa3561432287283 05-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add register coalescing to the new FS backend.

Improves performance of my GLSL demo 14.3% (+/- 4%, n=4) by
eliminating the moves used in ir_assignment and ir_swizzle handling.
Still 16.5% to go to catch up to the Mesa IR backend, presumably
because instructions are almost perfectly mis-scheduled now.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
cac04a93974e7ae773b84e000a2b26391ee2f4bb 08-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix new FS gen6 interpolation for sparsely-populated arrays.

We'd overwrite the same element twice.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bbb840049e7a92af6e0e8c2c5c21c63caec9e826 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Normalize cubemap coordinates like is done in the Mesa IR path.

Fixes glsl-fs-texturecube-2-*
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4d202da7a4951eb534f77014238e7cdca9f781e9 07-Oct-2010 Eric Anholt <eric@anholt.net> i965: Disable emitting if () statements on gen6 until we really fix them.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b380531fd40e0876218b1116502bafea7911bd3d 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Don't assume that WPOS is always provided on gen6 in the new FS.

We sensibly only provide it if the FS asks for it. We could actually
skip WPOS unless the FS needed WPOS.zw, but that's something for
later.

Fixes: glsl-texture2d and probably many others.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1fdc8c007ea66b4c9866bf2c679653a005307fa5 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add support for gl_FrontFacing on gen6.

Fixes glsl1-gl_FrontFacing var (2) with new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a760b5b509f85991a10400977576afabcedbb3c5 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Refactor gl_FrontFacing setup out of general variable setup.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
75270f705f319b0ecf297d1bdd328e52a8a956aa 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Gen6's sampler messages are the same as Ironlake.

This should fix texturing in the new FS backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fe6efc25ed3c1edf26073c4e6b6a3a45c857c1eb 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Don't do 1/w multiplication in new FS for gen6

Not needed now that we're doing barycentric.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5eeaf3671e2f913d38187fd1401c4b22a2900d57 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix botch in the header_present case in the new FS.

I only set it on the color_regions == 0 case, missing the important
case, causing GPU hangs on pre-gen6.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3c97c00e3810d31c3aa26173eb9fdef91b3e7c87 06-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add back gen6 headerless FB writes to the new FS backend.

It's not that hard to detect when we need the header.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
634abbf7b2e6ea21db30aafc0de9472ee31d4173 05-Oct-2010 Eric Anholt <eric@anholt.net> i965: Also do constant propagation for the second operand of CMP.

We could do the first operand as well by flipping the comparison, but
this covered several CMPs in code I was looking at.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dcd0261affc293b75d231e612091ec7b1076fff6 05-Oct-2010 Eric Anholt <eric@anholt.net> i965: Enable the constant propagation code.

A debug disable had slipped in.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ea909be58dda7e916cb9ce434ecb78597881ad33 05-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add support for gen6 FB writes to the new FS.

This uses message headers for now, since we'll need it for MRT. We
can cut out the header later.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3bf8774e9c293fcad654d1bd67d4b43247b82f97 04-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add initial folding of constants into operand immediate slots.

We could try to detect this in expression handling and do it
proactively there, but it seems like less logic to do it in one
optional pass at the end.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e27c88d8e6c9d18bfa793f884d02ce6011c4bdde 04-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add trivial dead code elimination in the new FS backend.

The glsl core should be handling most dead code issues for us, but we
generate some things in codegen that may not get used, like the 1/w
value or pixel deltas. It seems a lot easier this way than trying to
work out up front whether we're going to use those values or not.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9faf64bc32cf7c1a06a302fff9f80d7e2e2685d5 04-Oct-2010 Eric Anholt <eric@anholt.net> i965: Be more conservative on live interval calculation.

This also means that our intervals now highlight dead code.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4fb0c92c6986cf4e88296bab8837320210f1794f 03-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add support for EXT_texture_swizzle to the new FS backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
64a9fc3fc15603a8e25d0e1146fe5da5a5bde55b 02-Oct-2010 Eric Anholt <eric@anholt.net> i965: Don't try to emit code if we failed register allocation.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6397addd6146661689a0e315b06e543ef12d8868 02-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix off-by-ones in handling the last members of register classes.

Luckily, one of them would result in failing out register allocation
when the other bugs were encountered. Applies to
glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined, which still
fails register allocation, but now legitimately.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
afb64311e3484002e06aeac62187b68467610449 02-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add a sanity check for register allocation sizes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5ee09413162f4ec83cc7a738e807ffde8c89cca7 02-Oct-2010 Eric Anholt <eric@anholt.net> i965: When producing a single channel swizzle, don't make a temporary.

This quickly cuts 8% of the instructions in my glsl demo.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a0799725f52386cef911d3e104c5514a2811290b 02-Oct-2010 Eric Anholt <eric@anholt.net> i965: Restore the forcing of aligned pairs for delta_xy on chips with PLN.

By doing so using the register allocator now, we avoid wasting a
register to make the alignment happen.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e9bcc8328968f05a5688a020bfa8165260865a9b 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix up copy'n'pasteo from moving coordinate setup around for gen4.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
bfd9715c3c9d40b3f937638073ff2f0969ebd143 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add real support for pre-gen5 texture sampling to the new FS.

Fixes 36 testcases, including glsl-fs-shadow2d*-bias which fail on the
Mesa IR backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
8f63a44636e4fef2f35fe73f24c27db9b04389b1 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Pre-gen6, map VS outputs (not FS inputs) to URB setup in the new FS.

We should fix the SF to actually give us just the data we need, but
this fixes regressions in the new FS until then.

Fixes:
glsl-kwin-blur
glsl-routing
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ff5ce9289b5159e7de34706b31be771d3e3cefd6 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Also increment attribute location when skipping unused slots.

Fixes glsl1-texcoord varying.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
354c40a62411262d1223f439fdaf2176ca9adbe9 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Fix the gen6 jump size for BREAK/CONT in new FS.

Since gen5, jumps are in increments of 64 bits instead of increments
of 128-bit instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
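The unit change in the jump size commit above amounts to a scale factor: with 128-bit instructions and jump counts expressed in 64-bit units, each instruction skipped counts as two. A minimal sketch, with a hypothetical function name and a plain integer generation number:

```cpp
#include <cassert>

// Hypothetical helper: convert a distance in instructions into the
// value encoded in a jump field. Assumes 128-bit native instructions.
int jump_distance(int gen, int instructions) {
    assert(gen >= 4);
    const int scale = (gen >= 5) ? 2 : 1;  // 64-bit units from gen5 on
    return instructions * scale;
}
```

Forgetting the factor of two on gen5 and later would make a BREAK or CONT land halfway to its intended target.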
efc4a6f7909dbf554ee440210233c4b0f89ac89e 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Add gen6 attribute interpolation to new FS backend.

Untested, since my hardware is not booting at the moment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1d073cb2d920d1c0b8c6d598055b14048fedc96e 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Split the gen4 and gen5 sampler handling apart.

Trying to track the insanity of the different argument layouts for
normal/shadow crossed with normal/lod/bias one generation at a time is
enough.

Fixes: glsl1-texture2D() with bias.
(first test passing in this code that doesn't pass without it!)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5f237a1ccb28399fbbceecea694f5d18ebba9938 01-Oct-2010 Eric Anholt <eric@anholt.net> i965: Use the lowering pass for texture projection.

We should end up with the same code, but anyone else with this issue
could share the handling (which I got wrong for shadow comparisons in
the driver before).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c6960e4471abe287448b9d0e7e6519d588cdf43c 30-Sep-2010 Eric Anholt <eric@anholt.net> i965: Fix new FS handling of builtin uniforms with packed scalars in structs.

We were pointing each element at the .x channel of the
ParameterValues.

Fixes glsl1-linear fog.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6f6542a483ec726538f8a4555bddaeb0be6b2146 30-Sep-2010 Eric Anholt <eric@anholt.net> i965: Fix whole-structure/array assignment in new FS.

We need to walk the type tree to get the right register types for
structure components. Fixes glsl-fs-statevar-call.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ad1506c5ac61b75e45f24a2e18c91dc8a49a3bb0 30-Sep-2010 Eric Anholt <eric@anholt.net> i965: Remove my "safety counter" code from loops.

I've screwed this up enough times that I don't think it's worth it.
This time, it was that I was doing it once per top-level body
instruction instead of just once at the end of the loop body.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b90c7d1713c5a52fd85cb9dacad5828ae2fdbf6c 30-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add live interval analysis and hook it up to the register allocator.

Fixes 13 piglit cases that failed at register allocation before.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e1261d3c493ff48348483a0084f3017c7e663dc0 29-Sep-2010 Eric Anholt <eric@anholt.net> i965: First cut at register allocation using graph coloring.

The interference is totally bogus (maximal), so this is equivalent to
our trivial register assignment before. As in, passes the same set of
piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
21148e1c0a3cf9cf25ded006a3d5ce2b12803ea9 29-Sep-2010 Eric Anholt <eric@anholt.net> i965: Clean up the virtual GRF handling.

Now, virtual GRFs are consecutive integers, rather than offsetting the
next one by the size. We need the size information to still be around
for real register allocation, anyway.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
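A minimal sketch of the numbering scheme described in commit 21148e1c above, assuming a side table of sizes. The `VirtualGrfTable` type and its `alloc` method are hypothetical names for illustration, not the driver's actual code.

```cpp
#include <vector>

// Hypothetical bookkeeping: virtual GRF numbers are consecutive integers,
// and the size of each one is kept in a separate table.
struct VirtualGrfTable {
    std::vector<int> sizes;  // sizes[i] = registers needed by virtual GRF i

    // Returns the next consecutive virtual GRF number, regardless of size.
    int alloc(int size) {
        sizes.push_back(size);
        return static_cast<int>(sizes.size()) - 1;
    }
};
```

The numbers stay dense (0, 1, 2, ...) even when some virtual GRFs span several registers, while the size stays available for real register allocation.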
0efea25c4b9c6b5505fdbba25b525efb27468de4 30-Sep-2010 Eric Anholt <eric@anholt.net> i965: Make new FS discard do its work in a temp, not the null reg!

Fixes:
glsl-fs-discard-02 (GPU hang)
glsl1-discard statement (2)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1747aa6755088398108febb121a80d9572c1533e 29-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for builtin uniforms to the new FS backend.

Fixes 8 piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9ac910cfcddf1b6e7c520261371e78fc9bcbddcf 29-Sep-2010 Eric Anholt <eric@anholt.net> i965: Clean up obsolete FINISHME comment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ff0eb45f47ebf2fcc1af06a8b6b934c79dff1d41 29-Sep-2010 Eric Anholt <eric@anholt.net> i965: Fix array indexing of arrays of matrices.

The deleted code was meant to be handling indexing of a matrix, which
would have been a noop if it had been correct.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
17f3b8097d01a63917afaaefccd6eea070271652 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Don't try to emit interpolation for unused varying slots.

Fixes:
glsl-fs-varying-array
glsl-texcoord-array
glsl-texcoord-array-2
glsl-vs-varying-array
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5272c6a7a23ba74c696608fc2cb07fbfaf9e822a 03-Sep-2010 Eric Anholt <eric@anholt.net> i965: Do interpolation for varying matrices and arrays in the FS backend.

Fixes:
glsl-array-varying-01
glsl-vs-mat-add-1
glsl-vs-mat-div-1
glsl-vs-mat-div-2
glsl-vs-mat-mul-2
glsl-vs-mat-mul-3
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b9a59f0358f6f6afc7fafc1b417fa1b2c4cdaf37 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for ARB_fragment_coord_conventions to the new FS backend.

Fixes:
glsl-arb-frag-coord-conventions
glsl-fs-fragcoord
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
701c5f11c9102047c8962f053843469ada3b3a1a 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for ir_loop counters to the new FS backend.

Fixes:
glsl1-discard statement in for loop
glsl-fs-loop-two-counter-02
glsl-fs-loop-two-counter-04
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
89f6783d1769c61b835b49a5fb4405a3249031f4 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for MRT to the new FS backend.

Fixes these tests using gl_FragData or just gl_FragDepth:
glsl1-Preprocessor test (extension test 1)
glsl1-Preprocessor test (extension test 2)
glsl-bug-22603
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
86fd11262cb5697e5c3563e876781b3587788737 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for non-color render target write data to new FS backend.

This is the first time these payload bits have made sense to me,
outside of brw_wm_pass* structure.

Fixes: glsl1-gl_FragDepth writing
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2999a44968a045b5516ff23d70b711b01bd696a5 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Set up sampler numbers in the FS backend.

+10 piglits
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9e96c737f8cb6faebf7c7339cfcf14f80ed8e73c 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Subtract instead of adding when computing y delta in new FS backend.

Fixes 7 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5f7bd68149e59b6940e891928faa532bce0271f6 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for gl_FrontFacing to the new FS backend.

Fixes:
glsl1-gl_FrontFacing var (1)
glsl1-gl_FrontFacing var (2)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6bf12c8b7366a9db8c88b9cacaa06266b41a73b5 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for struct, array, and matrix uniforms to FS backend.

Fixes 16 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
ba481f2046e6427c8bd7fc5f8cb8ef3059a7881a 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for dereferencing structs to the new FS backend.

Fixes: glsl1-struct(2)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
07fc8eed8f0398063d87acf3a7ee392da4184822 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Set the variable type when dereferencing an array.

We don't set the type on the array virtual reg as a whole, so here's
the right place.

Fixes:
glsl1-GLSL 1.20 arrays
glsl1-temp array with constant indexing, fragment shader
glsl1-temp array with swizzled variable indexing
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
719f84d9aba6b016e1069e0461cbfc4211f5a3b5 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Fix up the FS backend for the variable array indexing pass.

We need to re-run channel expressions afterwards as it generates new
vector expressions, and we need to successfully support conditional
assignment (brw_CMP takes 2 operands, not 1).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
57edd7c5c116926325e3a86cef618bfd1b5881c1 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Fix valgrind complaint about base_ir for new FS debugging.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1723fdb3f0004a685351d005ba0f5bfc1c2a852e 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Apply the same set of lowering passes to new FS as to Mesa IR.

While much of this we will want to support natively, this should make
the task of reaching the Mesa IR backend's quality easier.

Fixes:
glsl-fs-main-return.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e10508812aed4c41c62ea27ac540c8d079bece07 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Actually track the "if" depth in loop in the new FS backend.

Fixes:
glsl-fs-if-nested-loop.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fceb78e3cc67d035a69613826f46a18e62235f5c 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Fix negation in the new FS backend.

Fixes:
glsl1-Negation
glsl1-Negation2
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
94d44c33c0ced34e222517ed9c3b72d3c5e3b9f0 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add support for dFdx()/dFdy() to the FS backend.

Fixes:
glsl-fwidth
glsl-derivs-swizzle
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
169ff0cc9d189f5a00a2a94313a6ce1503d1d5b9 28-Sep-2010 Eric Anholt <eric@anholt.net> i965: Handle all_equal/any_nequal in the new FS.

These are generated for scalar operands instead of plain equal/nequal.
But for scalars, they're the same anyway. +30 piglits.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
11ba8bafdbb31f40ecbb6478e26496b547d34c68 27-Sep-2010 Eric Anholt <eric@anholt.net> i965: Fix up writemasked assignments in the new FS.

Not sure how I managed to get tests to succeed without this. +54 piglits.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
03923ff95ed2c1ee54f0132e87e277b6cf07b7f5 22-Sep-2010 Eric Anholt <eric@anholt.net> i965: Warning fix for vector result any_nequal/all_equal change.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
6ef5f212343c0557c4fca272d8236226c1a7c87a 10-Sep-2010 Eric Anholt <eric@anholt.net> i965: Add switch cases for ir_unop_noise, which should have been lowered.

Fixes compiler warnings.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e591c4625cae63660c5000fbab366e40fe154ab0 05-Sep-2010 Luca Barbieri <luca@luca-barbieri.com> glsl: add several EmitNo* options, and MaxUnrollIterations

This increases the chance that GLSL programs will actually work.

Note that continues and returns are not yet lowered, so linking
will just fail if not supported.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
32b84ef4ca50998914184fc4600d8e43674a9a22 05-Sep-2010 Eric Anholt <eric@anholt.net> i965: Make pixel_xy results UW.

There is a restriction on the destination of an operation involving a
vector immediate being 128-bit aligned and the destination horizontal
stride being equivalent to 2 bytes. Fixes bad pixel_x results from
gl_FragCoord, where each pair had the same value.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
5afdfa222fa9ec8c54e7d6957d2680c37a9eb715 06-Sep-2010 Eric Anholt <eric@anholt.net> i965: Don't bother with RNDZ for f2i.

The default type conversion for MOV should be fine, and RNDZ actually
requires two instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3fb5377ba57aea356a81c521c0cf1975dc290b61 04-Sep-2010 Eric Anholt <eric@anholt.net> i965: Align the start of attribute interp coefficients in FS to use PLN.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3dbc9ea0a35653a0484d3b0a65a305626c251789 03-Sep-2010 Eric Anholt <eric@anholt.net> i965: Just assert when we flagged a compile error in the FS for now.

Dumping back to potentially 16-wide dispatch doesn't really work out
at the moment, and hopefully I'll just be able to resolve all the
failures so we never have to do this at all.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
42fc60cadcea920e9d67581de133a47effcc8441 03-Sep-2010 Eric Anholt <eric@anholt.net> i965: Clean up fs_reg setup by using a helper for constructors.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1fcb5a9858b7513c5130006933edc224b69be82d 29-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add support for loops to the new FS backend.

This includes a handy little safety check to prevent the loop from
going "too long", as permitted by the spec. I haven't gone out of my
way to test it, though…

Fixes 20 more piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
b0a933a4d91c47e697459921073f8afe668bac31 29-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add "discard" support to the new FS backend.

Fixes 3 testcases related to discard.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
4ff25c2106fb981334bdc1b032fcf37d8753ba62 29-Aug-2010 Eric Anholt <eric@anholt.net> i965: Fix the new implementation of ir_unop_sign to match brw_wm_emit.c

Like the comparison operations, this suffered from CMP only setting
the low bit. Doing the AND instructions would be the same instruction
count as the more obvious conditional moves, so do cond moves.

Fixes glsl-fs-sign and 6 other cases, like trig functions that use
sign() internally.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
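The conditional-move approach to sign() mentioned in commit 4ff25c21 above can be sketched in scalar form. The exact instruction sequence in the driver is not shown in this log, so the ordering below is an assumption, and `sign_with_cond_moves` is a hypothetical name.

```cpp
// Sketch of sign() as a default value plus two conditional moves.
// Each ternary stands in for a compare followed by a predicated move.
float sign_with_cond_moves(float x) {
    float result = 0.0f;                   // default: sign(0) is 0
    result = (x > 0.0f) ? 1.0f : result;   // overwrite when x is positive
    result = (x < 0.0f) ? -1.0f : result;  // overwrite when x is negative
    return result;
}
```

This never depends on the raw value a compare leaves in its destination, only on its condition, which sidesteps the low-bit issue the commit describes.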
40aadafa91ef5b931436d400fedafd720d59deff 29-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add support for texturing with bias to i965 FS backend.

Fixes 5 piglit tests for bias. Note that LOD is a 1.30 feature and
not yet supported.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
735af3959f4a4eb5940835c5a4117a020f103414 28-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add initial support for texturing to the new FS backend.

Fixes 11 piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3d4597f9d4c93d285825d5a6505d4ee7ce6e2c3e 29-Aug-2010 Cedric Vivier <cedricv@neonux.com> i965: Move libdrm/C++ hack introduced in fa2deb3d to intel_context.h

Fixes build on Linux/GCC 4.4 as libdrm includes are also used by other
brw_fs_*.cpp files.

Bug #29855
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
d20c2766182b632fba296eff7328bf14c802096e 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Don't strip negate/abs flags when assigning uniform locations.

Fixes glsl-algebraic-sub-zero-4.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
f0aa2d6118b1af7434b7551227cd72c588568e65 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add missing handling for BRW_OPCODE_SEL.

Fixes 4 piglit tests about min, max, and clamp.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
38d01c5b272d28a805e7598bad2f2ef5c8da732a 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Mask out higher bits of the result of BRW_CMP producing a boolean.

When it says it sets the LSB, that's not just a hint as to where the
result goes. Only the LSB is modified. Fixes 20 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
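The masking described in commit 38d01c5b above can be illustrated with a small model. Both functions are hypothetical: `cmp_sets_lsb_only` imitates a compare that writes only bit 0 of its destination, and `to_boolean` is the AND with 1 that cleans up the rest.

```cpp
#include <cstdint>

// Hypothetical model of a compare that modifies only the least significant
// bit of the destination and leaves the other 31 bits untouched.
uint32_t cmp_sets_lsb_only(uint32_t stale_dst, bool condition) {
    return (stale_dst & ~1u) | (condition ? 1u : 0u);
}

// Masking with 1 drops the stale upper bits, leaving exactly 0 or 1.
uint32_t to_boolean(uint32_t raw_cmp_result) {
    return raw_cmp_result & 1u;
}
```

Without the mask, a false comparison written over stale data is still nonzero and would be read as true.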
4229a93cc756b3ade02dcf93d806610f95497ad3 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Fix the types of immediate integer values.

When we're trying to do integer ops, handing a float in doesn't help.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
41e75cde2605e62ab691fd725a8a7259f40f5122 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add translation for RNDD and RNDZ.

Fixes:
glsl-fs-any.
glsl1-integer division with uniform var
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
31c9f468f35637ce3b82e59a43c49c949d59ee9e 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add support for ir_binop_mod using do_mod_to_fract.

Fixes glsl-fs-mod.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
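As a rough illustration of the lowering named in commit 31c9f468 above, mod(x, y) can be rewritten as y * fract(x / y). The helper names below are hypothetical, and this is a scalar sketch rather than the IR pass itself.

```cpp
#include <cmath>

// Fractional part as GLSL defines fract(): v - floor(v).
float fract_part(float v) {
    return v - std::floor(v);
}

// mod(x, y) expressed through fract, so no dedicated mod operation is needed.
float mod_via_fract(float x, float y) {
    return y * fract_part(x / y);
}
```

Because fract is built on floor, the result takes the sign of y, matching the GLSL definition x - y * floor(x / y) up to rounding.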
53290900db2f13fd9ab56b8f9780fa309d31780f 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Fix swapped instructions in ir_unop_abs and ir_unop_neg.

Fixes glsl-fs-neg and 5 other tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
2776ad2641469d3bdb6f53b99fbd748efd277c51 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add generate() handling for AND, OR, XOR.

10 more piglit tests pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
130368f910a806a12287c7561df7dddd0fc8be40 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add support for if instructions in the new FS backend.

20 more piglit tests pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a0ffee2cd79deb5a437784e25de6512d7f8e6bb8 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: When encountering an unknown opcode in new FS backend, print its name.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
40932c1752b0fa918d764e3367f5ab450033304a 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Fix the maximum grf counting in the new FS backend.

glsl-algebraic-rcp-rsq managed to use 33 registers, and we claimed to
only use 32, so the write to g32 would go stomping over the precious
g0 of some other thread.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
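The off-by-one described in commit 40932c17 above comes down to the difference between the highest register index and the number of registers. A minimal sketch, with the hypothetical helper `grf_count`:

```cpp
#include <algorithm>
#include <vector>

// Number of registers a program needs, given the indices it touches.
// Registers are numbered from g0, so the count is the highest index plus one.
int grf_count(const std::vector<int>& used_indices) {
    int max_index = -1;  // -1 so that an empty program needs 0 registers
    for (int index : used_indices)
        max_index = std::max(max_index, index);
    return max_index + 1;
}
```

A program that writes g32 therefore needs 33 registers; reporting 32 leaves g32 outside the thread's allocation.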
166b3fa29d4b5af8d4e8c410ed71e4348b65bbd9 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Validate the IR tree after doing our custom optimization passes.

This wouldn't catch the last failure fixed in them, because we don't
validate assignments well (due to the fact that we've got a pretty
glaring inconsistency in how we handle assignment writemasking), but
it could catch other failures we may produce.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
55ced3367543994bd21b48326c64edb743001145 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add a bit of support for matrices to the new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
91a037b5e1374fe0574480a579bd36c71b75f9c2 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Fix destination writemasking in the new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
a4d97d3726046fca66f3dbcfbe7b276c5eb80b3b 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add preliminary support for uniforms to the new FS backend.

+269 piglits
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3dff682b6595c8771655307ed00bd8844f22238c 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Abort on gl_FragDepth in the new FS backend for now.

It hangs the GPU due to FB_WRITE handling being incomplete. There are
bigger issues to handle first.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
1a3de23509b8170ee87223dc63e992e195a04de5 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Fix up and actually enable the NewShader and NewShaderProgram hooks.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
fa2deb3ddc8dc9e3eedf7f3dc1d2d2945a95f79b 27-Aug-2010 Eric Anholt <eric@anholt.net> i965: Hack in avoidance of c++ reserved keyword in libdrm.

I'm also fixing this upstream in libdrm, but this avoids new libdrm
dependency for the moment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
363d0f6774b4c6b825f5b903284da1cd51a91986 26-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add GLSL IR-level source annotation and comments to new FS debug.

This should make debugging way easier, as now we have context for
reading large programs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
7268bd82f60b1c9642a48dcfff6d77b2897222cd 26-Aug-2010 Eric Anholt <eric@anholt.net> i965: Use the implied move in brw_math() in the new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
e85f8272d0757989aeab650fbf929b382d671492 17-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add support for in varyings to the new FS codegen.

At least some tests, like glsl-vs-sign, now work.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
dcb7c0009bf0a1e0c4fb1aae4b7b07efcc0ed173 16-Aug-2010 Eric Anholt <eric@anholt.net> i965: Start building the codegen visitor.

This can successfully emit a real program that generates magenta now.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
9763d0a82a1ee605a8794f199d432824fb972b6a 26-Aug-2010 Eric Anholt <eric@anholt.net> i965: Start building direct GLSL2 IR to 965 assembly codegen.

Our channel-expressions and vector-splitting changes now happen into a
private copy of the IR that we maintain for ourselves. Uniform
assignment still happens by the core, so we continue using Mesa IR
generation not just for swrast fallbacks but also for uniform values
(since there's no storage for their contents other than
shader_program->FragmentProgram->Parameters->ParameterValues). And
most importantly, at the moment no actual codegen is hooked up other
than emitting our favorite color to the framebuffer.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
c1dfdcb93a8991788032d4906c5bf1a5b48cdc48 26-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add new pass to split vectors into scalar variables

Combined with the previous pass, this lets other optimization passes
do their work thanks to ir_tree_grafting. Still have regression in
instruction count with INTEL_NEW_FS, but register count is even
better.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
3a8ad33dde2f059b82ebf09f5cffa66c86f2e734 13-Aug-2010 Eric Anholt <eric@anholt.net> i965: Add a pass for the FS to reduce vector expressions down to scalar.

This is a step towards implementing a GLSL IR backend for the 965
fragment shader. Because it has downsides with the current codegen,
it is hidden under the environment variable INTEL_NEW_FS.

This results in an increase in instruction count at the moment (1444
-> 1752 for glsl-fs-raytrace, 345 -> 359 on my demo), because dot
products are turned into a series of multiplies and adds instead of a
custom expansion of MULs and MACs, and by not splitting the variable
types up we don't get tree grafting and thus there are extra moves of
temporary storage. However, register count drops for the non-GLSL
path (64 -> 56 on my demo shader) because the register allocator sees
all the sub-operations.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
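To make the instruction-count remark in commit 3a8ad33d above concrete, here is what a 4-component dot product looks like once it is reduced to scalar operations. The function name is hypothetical and the code is only a sketch of the scalarized form, not the pass.

```cpp
#include <array>

// A vec4 dot product written as one multiply followed by three
// multiply-then-add steps on individual channels.
float dot4_scalarized(const std::array<float, 4>& a,
                      const std::array<float, 4>& b) {
    float acc = a[0] * b[0];
    acc = acc + a[1] * b[1];
    acc = acc + a[2] * b[2];
    acc = acc + a[3] * b[3];
    return acc;
}
```

Each line after the first is a separate multiply and a separate add, which is where the extra instructions come from compared with an expansion that uses multiply-accumulate.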
a1bebf73dfdaf2cd23286aa74271b87166589901 11-Aug-2010 Eric Anholt <eric@anholt.net> i965: Start building 965 FS backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp