History log of /external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
f77cecf08cf9fba5e8f62e8ac1731c1916a97618 30-Mar-2017 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Always provide a default LOD of 0 for TXS and TXL

We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages. However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations. This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3503b2714b98684a2ceba5f4fd9a5bfbfbcaad38)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e1e27b0917249448a481b6681aac375505f728c3 16-Feb-2017 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: fix source type when emitting MOV_INDIRECT to read ICP handles

When generating the MOV INDIRECT instruction, the source type is ignored
and it is set to destination's type. However, this is going to change in a
later patch, so we need to explicitly set the proper source type.

brw_vec8_grf() creates an float type's fs_reg by default, when the
ICP handle is actually unsigned. This patch fixes these cases before
applying the aforementioned patch.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit d8122128bc6bd291ff0abcb7f2e52d9cdc631527)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
59e6c0d8aee718cf58198d5a5b2adce3e01391a6 13-Feb-2017 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: fix indirect load DF uniforms on BSW/BXT

The lowered BSW/BXT indirect move instructions had incorrect
source types, which luckily wasn't causing incorrect assembly to be
generated due to the bug fixed in the next patch, but would have
confused the remaining back-end IR infrastructure due to the mismatch
between the IR source types and the emitted machine code.

v2:
- Improve commit log (Curro)
- Fix read_size (Curro)
- Fix DF uniform array detection in assign_constant_locations() when
it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.

v3:
- Move changes in assign_constant_locations() to other patch.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 56266df7ed9dbdf63acfd58944442893b4cd0c0b)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a594bd19dc2344260904c51ea7b22bdc71428d64 15-Feb-2017 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Fix the inline nir_op_pack_double optimization

We can only do the optimization if the source *is* SSA.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a4393bd97fe62e8299273bae769201c5c9c816ea)

Squashed with commit:

i965/fs: Remove the inline pack_double_2x32 optimization

It's broken in a number of ways. In particular, a bunch of the
conditions are backwards so it doesn't actually detect what it's
supposed to detect. Since it's been broken, it hasn't actually been
helping anything so just deleting it isn't a regression.

This (and removing another optimization) were done on master in commit
b07381161777ba5d5f4a1d713f7655bcaede4139.

Cc: "Kenneth Grunke" <kenneth@whitecape.org>
Cc: "Mark Janes" <mark.a.janes@intel.com>

[Emil Velikov: patch is a backport of the below "cherry pick"]
Fixes: a4393bd97fe ("i965/fs: Fix the inline nir_op_pack_double optimization")

(cherry picked from commit b07381161777ba5d5f4a1d713f7655bcaede4139)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e6ae19944d977dc91bc45adff679337182c20683 24-Nov-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rework gl_TessLevel*[] handling to use NIR compact arrays.

Treating everything as scalar arrays allows us to drop a bunch of
special case input/output munging all throughout the backend.
Instead, we just need to remap the TessLevel components to the
appropriate patch URB header locations in remap_patch_urb_offsets().

We also switch to treating the TES input versions of these as ordinary
shader inputs rather than system values, as remap_patch_urb_offsets()
just makes everything work out without special handling.

This regresses one Piglit test:
arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_SHADER-array-at-limit

The compiler starts promoting the constant arrays assigned to gl_TessLevel*
to uniform arrays. Since the shader also has a uniform array that uses
the maximum number of uniform components, this puts it over the uniform
component limit enforced by the linker. This is arguably a bug in the
constant array promotion code (it should avoid pushing us over limits),
but is unlikely to penalize any real application.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c5ae6e78fc3bed83c6e18be6dbc8eb86a8db0898 23-Dec-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: fix exec_size when emitting DIM instruction

Otherwise, DIM instructions will be emitted with the default exec size
which could be 16 in some cases, that is not legal.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b56fa830c6095f8226456b2aeb62f2dfad804be5 09-Dec-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Fetch one cacheline of pull constants at a time.

Asking the DC for less than one cacheline (4 owords) of data for
uniform pull constants is suboptimal because the DC cannot request
less than that from L3, resulting in wasted bandwidth and unnecessary
message dispatch overhead, and exacerbating the IVB L3 serialization
bug. The following table summarizes the overall framerate improvement
(with statistical significance of 5% and sample size ~10) from the
whole series up to this patch for several benchmarks and hardware
generations:

| SKL | BDW | HSW
SynMark2 OglShMapPcf | 24.63% ±0.45% | 4.01% ±0.70% | 10.31% ±0.38%
GfxBench4 gl_manhattan31 | 5.93% ±0.35% | 3.92% ±0.31% | 6.62% ±0.22%
GfxBench4 gl_4 | 2.52% ±0.44% | 1.23% ±0.10% | N/A
Unigine Valley | 0.83% ±0.17% | 0.23% ±0.05% | 0.74% ±0.45%

Note that there are two versions of the Manhattan demo shipped with
GfxBench4, one of them is the original gl_manhattan demo which doesn't
use UBOs, so this patch will have no effect on it, and another one is
the gl_manhattan31 demo based on GL 4.3/GLES 3.1, which this patch
benefits as shown above.

I haven't observed any statistically significant regressions in the
benchmarks I have at hand. Note that the comparatively huge
improvement on SKL in the OglShMapPcf test case is due to the combined
effect of this patch and the register pressure benefit on SKL+ of
"i965/fs: Switch to the constant cache for uniform pull constants.",
part of the same series.

Going up to 8 oword blocks would improve performance of pull constants
even more, but at the cost of some additional bandwidth and register
pressure, so it would have to be done on-demand based on the number of
constants actually used by the shader.

v2: Fix for Gen4 and 5.
v3: Non-trivial rebase. Rework to allow the visitor specifiy
arbitrary pull constant block sizes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
9b22a0d295316b7547667ebbfe1e1b6182439186 09-Dec-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Expose arbitrary pull constant load sizes to the IR.

Change the FS generator to ask the dataport for enough owords worth of
constants to fill the execution size of the instruction -- Which means
that the visitor now needs to set the execution size correctly for
uniform pull constant load instructions, which we were kind of
neglecting until now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
fd249c803e3ae2acb83f5e3b7152728e73228b7b 12-Dec-2016 Ilia Mirkin <imirkin@alum.mit.edu> treewide: s/comparitor/comparator/

git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'

Just happened to notice this in a patch that was sent and included one
of the tokens in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4f2d1d6ea713df8f8d816b48b9e99c7117cf36d7 28-Nov-2016 Ilia Mirkin <imirkin@alum.mit.edu> i965: support constant gather offsets larger than 4 bits

Offsets that don't fit into 4 bits need to force gather_po to be
selected. Adjust the logic so that this happens.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
faf20df143a63e58aa729446f21c38ae39a438f2 29-Nov-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Refactor handling of constant tg4 offsets

Previously, we had an OFFSET_VALUE source for logical texture instructions
that was intended to mean exactly what it says, "offset". In reality, we
only fully used it for tg4 offsets. We used offset_value.file == IMM to
mean, "you have a constant offset, go look in instr->offset" and didn't
actually use the contents of the register at all in that case except for
in nir_emit_texture where we used it as a temporary before we copy it into
instr->offset.

This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to
indirect tg4 offsets only. The nir_emit_texture code is refactored so that
we explicitly build a header_bits value which is placed in instr->offset
and the constant offset values (both for tg4 and regular texture
operations) are used to construct header_bits and don't go through the
offset source at all. Finally, we stop passing offset_value in to
lower_sampler_logical_send_gen5 because we can't do indirect offsets until
gen7 anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2e311e421122e0232987fdca3645c6bd39fe2470 16-Nov-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Implement load_layer_id for fragment shaders

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b63f7671a3eafa4ab293a13f45f58837bd840a46 04-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Handle compact outputs.

We need to calculate the number of vec4 slots correctly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c4be6e0b8d91746eccf334b9e20861af4036d06a 15-Nov-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix GS push inputs with enhanced layouts.

We weren't taking first_component into account when handling GS push
inputs. We hardly ever push GS inputs, so this was not caught by
existing tests. When I started using component qualifiers for the
gl_ClipDistance arrays, glsl-1.50-transform-feedback-type-and-size
started catching this.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b 13-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> nir/i965/anv/radv/gallium: make shader info a pointer

When restoring something from shader cache we won't have and don't
want to create a nir_shader this change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
59864e8e02057cc6fa0448a8af067a3cf53389da 13-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Don't use nir_assign_var_locations for VS/TES/GS outputs.

Fixes spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.

v2: Remove nir_outputs field from fs_visitor (caught by Tim and Iago).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3728ee000aecb19793dec56d45aff9d6cfce3e5b 13-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Drop unnecessary switch statement in nir_setup_outputs()

TCS and FS are skipped above. CS has no output variables.
All remaining cases take the same path.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e51e055fcdf8107aafaba358fa65b00f963e1728 09-Sep-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Introduce downcast helpers for prog_data structures.

Similar to brw_context(...), intel_texture_object(...), and so on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
40dd45d0c6aa4a9d727c09225967e9c3b1f45854 30-Jun-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Enable ARB_shader_atomic_counter_ops

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3d2011cb33317b0fe9b8fe989916efc1841c6ce0 30-Jun-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Refactor emission of atomic counter operations

This will make it easier to add more operations.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
950af5ed40895ba7eb664a64e869cf4ae1104fc7 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Misc simplification.

Get rid of some leftover redundant arithmetic introduced during the
conversion to byte offsets and sizes that can be simplified easily.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
80e1d670b4b4c080ce2092a3b52d2415bc4c6a42 01-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Get rid of fs_inst::set_smear().

component() was generally a better alternative because of several
issues set_smear() had:

- It wouldn't take the original stride and offset of the register
into account, which means that set_smear() on the result of
e.g. another set_smear() call or an offset() call would give a
bogus region as result.

- It was an inherently destructive operation. See the
'nir_intrinsic_shader_clock' hunk below for how this could lead to
subtle bugs in cases where set_smear() was called multiple times on
the same register like 'r.set_smear(0), r.set_smear(1)' with the
expectation that each call would return a separate value instead of
a reference to the same subsequently mutated object.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2d7d4a791083ff63f37ac1e40bfe8b448e7f8045 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify a bunch of fs_inst::size_written calculations by using component_size().

Using component_size() is easier and generally more correct because it
takes into account the register type and stride for you.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
62aaef6c83e4eb354bd7f15803db01e90d22fc34 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Simplify and fix buggy stride/offset calculations using subscript().

These were bashing the 'offset' and 'stride' values of several
registers without taking the previous value into account, which
probably didn't matter in practice for optimize_frontfacing_ternary()
because the 'tmp' register already had a known region, but it would
have given the wrong region as result in the other cases in
lower_integer_multiplication(). subscript(..., i) is a more
straightforward way to take the i-th field of a given type from each
channel of a register which should give the right answer as result
regardless of the original 'offset' and 'stride' parameters of the
register region.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c057278c065747c1f53579504bf109cafb7cb390 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Stop using fs_reg::in_range() in favor of regions_overlap().

Its only use left in the FS back-end should be using regions_overlap()
instead to avoid getting a false negative result in cases where source
and destination overlap but the former starts before the latter in the
VGRF file.

v2: Put back lost components factor (Iago).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
69570bbad876bb9da609c3b651aacda28cecc542 07-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.

The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
be095e11e41158f91bcb3f6fcbc2e2a91a5d9124 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes.

The fs_reg::subreg_offset and ::offset fields are now redundant, the
sub-GRF offset can just be added to the single ::offset field
expressed in byte units. The current subreg_offset value can be
recovered by applying the following rule: Replace each rvalue
reference of subreg_offset like 'x = r.subreg_offset' with 'x =
r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset
= x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
86944e063ad40cac0860bfd85a3cc4e9a9805aa3 01-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.

The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.

This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.

Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
979d0aca6277975986f5f278cad0f37616c9d91f 26-Aug-2016 Jason Ekstrand <jason.ekstrand@intel.com> intel: Rename brw_get_device_name/info to gen_get_device_name/info

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
527f37199929932300acc1688d8160e1f3b1d753 23-Aug-2016 Jason Ekstrand <jason.ekstrand@intel.com> intel: s/brw_device_info/gen_device_info/

Generated by:

sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4135fc22ff735a40c36fcf051c1735fe23d154f2 19-Aug-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Hook up coherent framebuffer reads to the NIR front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f24e393bd5caee85994b00b93f141e6c4b99e273 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Translate nir_intrinsic_load_output on a fragment output.

This gets the non-coherent framebuffer fetch path hooked up to the NIR
front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b00a236d6a6212323f77248ba923c65eeb02592b 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Allocate fragment output temporaries on demand.

This gets rid of the duplication of logic between nir_setup_outputs()
and get_frag_output() by allocating fragment output temporaries lazily
whenever get_frag_output() is called. This makes nir_setup_outputs()
a no-op for the fragment shader stage.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7dac8820730777756c00d7024330517848dc3b9f 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Rework representation of fragment output locations in NIR.

The problem with the current approach is that driver output locations
are represented as a linear offset within the nir_outputs array, which
makes it rather difficult for the back-end to figure out what color
output and index some nir_intrinsic_load/store_output was meant for,
because the offset of a given output within the nir_output array is
dependent on the type and size of all previously allocated outputs.
Instead this defines the driver location of an output to be the pair
formed by its GLSL-assigned location and index (I've borrowed the
bitfield macros from brw_defines.h in order to represent the pair of
integers as a single scalar value that can be assigned to
nir_variable_data::driver_location). nir_assign_var_locations is no
longer useful for fragment outputs.

Because fragment outputs are now allocated independently rather than
within the nir_outputs array, the get_frag_output() helper becomes
necessary in order to obtain the right temporary register for a given
location-index pair.

The type_size helper passed to nir_lower_io is now type_size_dvec4
rather than type_size_vec4_times_4 so that output array offsets are
provided in terms of whole array elements rather than in terms of
scalar components (dvec4 is the largest vector type supported by the
GLSL so this will cause all individual fragment outputs to have a size
of one regardless of the type).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f3cb2c34f29d35088879a6b8101c3ac648e0febf 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.

I'm about to change how fragment shader output locations are
represented, so the generic nir_intrinsic_store_output implementation
that assumes that outputs are just contiguous elements in the big
nir_outputs array won't work anymore. This somewhat simplified
implementation of nir_intrinsic_store_output for fragment shaders
should be functionally equivalent to the current fall-back one.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
af0cc743e607293146861518bb6ef96f411aeca9 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.

v2: Memoize sample ID, misc codestyle changes. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4a87e4ade778e56d43333c65a58752b15a00ce69 21-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Get rid of fs_visitor::do_dual_src.

This boolean flag was being used for two different things:

- To set the brw_wm_prog_data::dual_src_blend flag. Instead we can
just set it based on whether the dual_src_output register is valid,
which will be the case if the shader writes the secondary blending
color.

- To decide whether to call emit_single_fb_write() once, or in a loop
that would iterate only once, which seems pretty useless.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d14dd727f4aded5bd34a78dc2c81374a78114440 17-Aug-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix barrier count shift in scalar TCS backend.

The "Barrier Count" field goes in 14:9 of m0.2. The vec4 backend
correctly shifts by 9, but the scalar backend only shifted by 8.

It's not like this changed - I think I just made a typo when writing
the original scalar TCS backend code.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
159f0377556c45630cdc0721b193f34217a329b0 17-Aug-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix execution size of scalar TCS barrier setup code.

Previously, the scalar TCS backend was generating:

mov(8) g17<1>UD 0x00000000UD { align1 WE_all 1Q compacted };
and(8) g17.2<1>UD g0.2<0,1,0>UD 0x0001e000UD { align1 WE_all 1Q };
shl(8) g17.2<1>UD g17.2<8,8,1>UD 0x0000000bUD { align1 WE_all 1Q };
or(8) g17.2<1>UD g17.2<8,8,1>UD 0x00008200UD { align1 WE_all 1Q };
send(8) null<1>UW g17<8,8,1>UD
gateway (barrier msg) mlen 1 rlen 0 { align1 WE_all 1Q };

This is rubbish - g17.2<8,8,1>UD spans two registers, and is an illegal
region. Not to mention it clobbers 8 channels of data when we only
wanted to touch m0.2.

Instead, we want:

mov(8) g17<1>UD 0x00000000UD { align1 WE_all 1Q compacted };
and(1) g17.2<1>UD g0.2<0,1,0>UD 0x0001e000UD { align1 WE_all };
shl(1) g17.2<1>UD g17.2<0,1,0>UD 0x0000000bUD { align1 WE_all };
or(1) g17.2<1>UD g17.2<0,1,0>UD 0x00008200UD { align1 WE_all };
send(8) null<1>UW g17<8,8,1>UD
gateway (barrier msg) mlen 1 rlen 0 { align1 WE_all 1Q };

Using component() accomplishes this.

Fixes GL44-CTS.tessellation_shader.tessellation_shader_tc_barriers.
barrier_guarded_read_write_calls on Skylake. Probably fixes other
barrier issues on Gen8+.

v2: Use a group(1, 0) builder so inst->exec_size is set correctly
(thanks to Francisco Jerez for catching that it was incorrect).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0c754d1c4203d87dbb9d2dd882ef42686e6d01ec 12-Aug-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Lower TEX to TXL during NIR translation.

This simplifies the code slightly and will allow the SIMD lowering
pass to find out easily what the actual texturing opcode is in order
to determine the maximum execution size of texturing instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
027cbf00f248bda325521db8f56a3718898da46b 02-Aug-2016 Mathias Fröhlich <mathias.froehlich@web.de> util: Move _mesa_fsl/util_last_bit into util/bitscan.h

As requested with the initial creation of util/bitscan.h
now move other bitscan related functions into util.

v2: Split into two patches.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
875341c69b99dea7942a68c9060aa31a459e93fc 02-Aug-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rework the unlit centroid workaround.

Previously, for every input, we moved the dispatch mask to the flag
register, then emitted two predicated PLN instructions, one with
centroid barycentric coordinates (for normal pixels), and one with
pixel barycentric coordinates (for unlit helper pixels).

Instead, we can simply emit a set of predicated MOVs at the top of
the program which copy the pixel barycentric coordinates over the
centroid ones for unlit helper pixel channels. Then, we can just
use normal PLNs.

On Sandybridge:

total instructions in shared programs: 7538470 -> 7534500 (-0.05%)
instructions in affected programs: 101268 -> 97298 (-3.92%)
helped: 705
HURT: 9 (all of which are SIMD16 programs)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
12a912586f11ccbc4612532d5ceaf1bdd0cdb45a 29-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Use a separate register for every access to an SSA undef.

Previously, we allocated a new VGRF for every undefined definition.
Instead, this patch makes us allocate a new VGRF for every use of an
undefined definition. This makes sure that undefined values are
fully independent of one another, and have live ranges limited to
their single use. This allows register coalescing to combine the
source and destination of MOVs from undefined sources, eliminating
the MOV altogether.

On Broadwell:

total instructions in shared programs: 11641187 -> 11640214 (-0.01%)
instructions in affected programs: 70199 -> 69226 (-1.39%)
helped: 213
HURT: 1

v2: Add a comment (based on Iago's suggested one).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a2b3c146d2017a626be66dcf43753d545e902c52 22-Jul-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: fix varying output setup

Since 7f53fead5c we treat every location as using all
four components so we only need special handling for
doubles when they cross multiple locations.

This fixes a crash in GL45-CTS.enhanced_layouts.varying_locations
where the outputs array would overflow when a dmat2 was stored at
the max varying location i.e 30.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
be1c53d2cf2b12655ff69caac49cca75a55e63e0 22-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix "operation operation" in comment.

From the redundant redundant department.

Reported-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
76e161056a424e5b9c35b02a9f4e520c8c44cf2b 18-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix shared atomic intrinsics to pay attention to base.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ad5dd39984467b29d20e03ec8bd26f6f1d2e97ad 14-Jun-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add component packing support for load_output intrinsics

Here we use the component qualifier (which is the first component)
as an offset when loading output varyings.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7f53fead5cf9a85c74a94d359dd5fccfbb87856c 23-May-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: enable component packing for vs and fs

Rather than trying to work out the total number of components
used at a location we simply treat all outputs as vec4s. This
removes the need for complex code looping over varyings to match
packed locations and the need for storing the total number of
components used at each location.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3dba8516d6468866f2534f517358a6243eb0995e 20-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Move VS load_input handling to nir_emit_vs_intrinsic().

TCS/TES/GS and now FS all handle these in stage-specific functions.
CS don't have inputs, so VS was the only one left using this code.

Move it to the VS-specific function for clarity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1eef0b73aa323d94d5a080cd1efa81ccacdbd0d2 12-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rewrite FS input handling to use the new NIR intrinsics.

This eliminates the need to walk the list of input variables, recurse
into their types (via logic largely redundant with nir_lower_io), and
interpolate all possible inputs up front. The backend no longer has
to care about variables at all, which eliminates complications from
trying to pack multiple variables into the same location. Instead,
each intrinsic specifies exactly what's needed.

This should unblock Timothy's work on GL_ARB_enhanced_layouts.

Each load_interpolated_input intrinsic corresponds to PLN instructions,
while load_barycentric_at_* intrinsics correspond to pixel interpolator
messages. The pixel/centroid/sample barycentric intrinsics simply refer
to payload fields (delta_xy[]), and don't actually generate any code.

Because we use a single intrinsic for both centroid-qualified variables
and interpolateAtCentroid(), they become indistinguishable. We stop
sending pixel interpolator messages for those, and instead use the
payload provided data, which should be considerably faster.

On Broadwell:

total instructions in shared programs: 9067751 -> 9067570 (-0.00%)
instructions in affected programs: 145902 -> 145721 (-0.12%)
helped: 422
HURT: 209

total spills in shared programs: 2849 -> 2899 (1.76%)
spills in affected programs: 760 -> 810 (6.58%)
helped: 0
HURT: 10

total fills in shared programs: 3910 -> 3950 (1.02%)
fills in affected programs: 617 -> 657 (6.48%)
helped: 0
HURT: 10

LOST: 3
GAINED: 3

The differences mostly appear to be slight changes in MOVs.

v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics
flag rather than passing it directly to nir_lower_io. Use the
unreachable() macro rather than assert in one place. (Review
feedback from Chris Forbes.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
96dfed49e47eac7afc100e5b8d3b316dd6652fb6 19-Jul-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Stop muging cube array lengths by 6

From the Sky Lake PRM:

"For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port
Surfaces, the range of this field is [0,340], indicating the number of
cube array elements (equal to the number of underlying 2D array elements
divided by 6). For other surfaces, this field must be zero."

In other words, the depth field for cube maps is in number of cubes not
number of 2-D slices so we need to divide by 6. ISL will do this correctly
for us assuming that we provide it with the correct array bounds which it
expects to be in 2-D slices. It appears as if we've been doing this wrong
ever since we first added cube map arrays for Sandy Bridge and the change
to ISL made things slightly worse. While we're at it, we now need to remoe
the shader hacks we've always done since they were only needed because we
were setting the depth field six times too large.

v2: Fix the vec4 backend as well (not sure how I missed this).

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3e7cebc8da5c9f16fa1b9a25ea72b8d31c86a440 22-Jun-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Use LZD to implement nir_op_find_lsb on Gen < 7

v2: Rebase on changes to previous two patches.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c2019c6c261d5c46a4e5d3edc88836bcedf75f30 22-Jun-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Use LZD to implement nir_op_ifind_msb on Gen < 7

v2: Retype LZD source as UD to avoid potential problems with 0x80000000.
Suggested by Matt. Also update comment about problem values with
LZD(abs(x)). Suggested by Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
de20086eed47e6bfe7c25835d72383114f99c7a9 22-Jun-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Use LZD to implement nir_op_ufind_msb

This uses one less instruction.

v2: Move emit_find_msb_using_lzd out of the visitor classes. Suggested
by Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0f2516d88f6607b2816445c2dc18607cdaf1beff 15-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/tes/scalar: fix 64-bit indirect input loads

We totally ignored this before because there were no piglit tests for
indirect loads in tessellation stages with doubles.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1737e75bfb85eb22a30e4f1c69a825b3abd946f6 15-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/tcs/scalar: only update imm_offset for second message in 64bit input loads

Our indirect URB read messages take both a direct and an indirect offset
so when we emit the second message for a 64-bit input load we can just
always incremement the immediate offset, even for the indirect case.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
18f67c8a69fcde5d3f585effeef670d0861b0730 14-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Move pulls_bary setting to emit_pixel_interpolator_send().

pulls_bary should be set when the shader uses a pixel interpolator
message. So, setting it from the function that emits pixel interpolator
messages makes a lot of sense.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7ef7738a61ded5632105b8de6f8141307592e20a 15-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Write gl_FragCoord directly to the destination.

This patch makes emit_general_interpolation take a destination register
as an argument, and write directly to that. This is simpler than the
old approach of ralloc'ing a register, writing to that temporary, and
then making the caller emit per-component MOVs to copy it to the actual
destination.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ac1181ffbef5250cb3b651e047cce5116727c34c 07-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.

Likewise, rename the enum type to glsl_interp_mode.

Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input. Also, SPIR-V calls these "decorations" rather than "qualifiers".

Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
-e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
-e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
94135e8736f2741684e978afac9d34c368f7bcb1 07-Jul-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: emit DIM instruction to load 64-bit immediates in HSW

v2 (Matt):
- Use brw_imm_df() as source argument of DIM instruction.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
87a13f598b1ecd50bc209088cf1dc60fd90df015 11-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: use the new helper function to create double immediates

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
9e196e907ee87bff2b8c215df5e31a0cd1d1a322 09-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: add a helper function to create double immediates

Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.

v2:
- Define setup_imm_df() as an independent function (Curro)
- Create a specific builder to get rid of some instruction field
assignments (Curro).

v3:
- Get devinfo from builder (Kenneth)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
27e28197e8e82e8c47fda5d6e912c5cb62c03f4a 10-Jun-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add double packing support to tess stages

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8b80e9c31db62ccf54ab593b47016ea514dec81c 10-Jun-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add double support packing support to gs inputs

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
9d9b0b54cdc212c372ac67cc14d7ba1a16cc69ef 22-May-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add indirect packing support to gs load inputs

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2477e6cfada55563631c654fce9250e4fe276f0e 23-May-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add indirect packing support for tcs and tes

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2bda4b062f62edac1011bf65f410eeca176b5e23 20-May-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add component packing support for tcs

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cfff71a47a655e8cf930e858d408dc4db942ec7c 19-May-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add component packing support for tes

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a102ef2d4fd01a946f949a45115d65abb6714a5b 19-May-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: add component packing support for gs

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
255cff76d961e56199acab2ab523140e43ea2de2 23-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Drop unnecessary inst->base_mrf = -1 assignments.

These are now unnecessary, as base_mrf is -1 by default.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
bdab572a86f27b92ba10124f85d278e9c8861fff 13-Jun-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: indirect addressing with doubles is not supported in CHV/BSW/BXT

From the Cherryview's PRM, Volume 7, 3D Media GPGPU Engine, Register Region
Restrictions, page 844:

"When source or destination datatype is 64b or operation is integer DWord
multiply, indirect addressing must not be used."

v2:
- Fix it for Broxton too.

v3:
- Simplify code by using subscript() and not creating a new num_components
variable (Kenneth).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0177dbb6c2fe876a9761a4a97eec44accfa4c007 13-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Fix single-precision to double-precision conversions for CHV/BSW/BXT

From the Cherryview PRM, Volume 7, 3D Media GPGPU Engine,
Register Region Restrictions:

"When source or destination is 64b (...), regioning in Align1
must follow these rules:

1. Source and destination horizontal stride must be aligned to
the same qword.
(...)"

v2:
- Fix it for Broxton too.

v3:
- Remove inst->regs_written change as it is not necessary (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a8a9d1bf41c00123cefb6e757f3509c62e880a15 14-Jun-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: remove type_size_vec4_times_4()

type_size_vec4_times_4() was introduced as a fix in 8dcf807cb43383
however since 3810c1561 we can just use type_size_scalar() and
get the actual number of outputs we need.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2b648ec17c2934802dd56452d11d78ec2d525a06 27-May-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/gs/scalar: Fix load input for doubles

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2d6f82a294ad1ab1eab0020cf65df5ecc9591272 26-May-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: fix offset when loading double vector input varyings

When we are not packing a double input varying, we might need to
read its data in a non-aligned to 64-bit offset, so we read
the wrong data. This is happening when using explicit locations
in varyings because Mesa disables packing varying for that case.

const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, const_index value could be not aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3fb289f957a8a27349a6f7df03983f92d9b6cf64 02-Jun-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs Add a wm_prog_data bit for has_side_effects

This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0a3acff5b53d409181dcd2f31a4a50af06f73a57 23-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Remove old CS local ID handling

The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.

The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8f48d23e0fcc0809f6397a67c26751a45a95e076 23-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965: Add nir channel_num system value

v2:
* simd16/32 fixes (curro)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
25e1b8d366a6131bc9d46fe27f6bc476f05a7a58 01-Jun-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix isoline reads in scalar TES.

Isolines aren't reversed. commit 5b2d8c2273c6f fixed this for the vec4
TES backend, but not the scalar one.

Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b27dfa5403ed1884999524417c08d2bc50365965 24-May-2016 Ian Romanick <ian.d.romanick@intel.com> i965: If control_data_header_size_bits is zero, don't do EndPrimitive

This can occur when max_vertices=0 is explicitly specified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
796238d9e6eee0b942d34c57bd8bdf0f9c98b6c3 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
29e471725115edf941458c5be0bb7e93218ddd0f 18-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a55452530f7525e9cf5d2619bef66a61b488b4af 26-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
29ce110be6d0d4e4df51be635810f528f7dd7f40 19-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove extract virtual opcodes.

These can be easily represented in the IR as a MOV instruction with
strided source so they seem rather redundant.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a8e7b4f1d9ec50d2214e7694da26af6a108e506f 20-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.

Seems like this texturing opcode was missing its logical counterpart
which would prevent it from taking advantage of the SIMD lowering
infrastructure, define it and plumb it through the back-end. At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
47e2a57fe955c04763c979ff4ca61c6867fa05bb 18-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965/compute: Fix uniform init issue when SIMD8 is skipped

In d8347f12ead89c5a58f69ce9283a54ac8487159c, we added support for
skipping SIMD8 generation when the program local size is too large for
SIMD8 to be usable. This change was missed in that commit.

This bug would impact gen7 platforms when the compute shader local
size is greater than 512, and gen8 platforms when the local size is
greater than 448.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e58fabc93a25ccc910369f3638b302d46de12271 24-May-2016 Jordan Justen <jordan.l.justen@intel.com> i965/gen7: Fix gl_HelperInvocation

It appears that UV immediates aren't working on Ivy Bridge. In this
case, a signed version will work, and this fixes the piglit
tests/spec/glsl-4.50/execution/helper-invocation.shader_test test.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
44997fc0c1cc7f24216e3b1c5d954919df946ee5 02-May-2016 Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com> i965: Support textures with multiple planes

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
015035027beb38fb9a3b06f8cd94aadc96a8f728 23-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Mark UBO uniform pull constant loads as force_writemask_all.

This lets the rest of the backend know that the uniform pull constant
load opcodes don't respect channel enables -- Without this the
register allocator has no way to know that the return payload of a
pull constant load is not per-channel and spills of the destination
will be broken under non-uniform control flow.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b46867cd378e5fb135fd060d50c8028d3dac622a 19-May-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: do not depend on std140 alignment rules for UBO loads

The previous implementation relied on the std140 alignment rules to
avoid handling misalignment in the case where we are loading more than
2 double components from a vector, which requires to emit a second load
message.

This alternative implementation deals with misalignment and is more
flexible going forward.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
dac10e8a1390711f1f36f224644c4a33586cebe3 17-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965, anv: Use NIR FragCoord re-center and y-transform passes.

This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.

This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height. This led to a lot of
recompiles in many applications.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
fb5dcb81cc121e4355b7eef014474a5c42a2f6db 19-May-2016 Matt Turner <mattst88@gmail.com> i965: Pass nir_src/nir_dest by reference.

Cuts 6K of .text.

text data bss dec hex filename
5772372 264648 29320 6066340 5c90a4 lib/i965_dri.so before
5766074 264648 29320 6060042 5c780a lib/i965_dri.so after

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
07353599e07529e98494057f556b9d96c1df5cfd 05-May-2016 Matt Turner <mattst88@gmail.com> i965/fs: Add and use get_nir_src_imm().

The next patch wants to inspect the LOD argument and do something
different if it's 0.0f. But at that point we've emitted a MOV for it and
we just have a register to look at.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cbb0e3a7e8fffa4d5c5af8660d99cd3da8af97ec 17-May-2016 Matt Turner <mattst88@gmail.com> i965/fs: Assert that nir_op_extract_*'s src1 is a constant.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ccfe25f7583dd8d0ff0609de3728c8b15fb0f8fb 31-Mar-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965/fs: shuffle 32bits into 64bits for doubles

VS Thread Payload handles attributes in URB as vec4, no matter if they
are actually single or double precision.

So with double-precision types, value ends up in the registers split in
32bits chunks, in different positions.

We need to shuffle the chunks to get the doubles correctly.

v2:
* Extra blank line. Add { } on if body (Ian Romanick)
* Use dest directly (Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
58f304defe804a6f01b0b961997ecfe61fe00d34 09-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/tes/scalar: Fix load input for doubles

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
61197b8d5dd963bd9288385308feb3f0dcaf6742 09-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/tcs/scalar: fix store output for doubles

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cda3435ea85904a17c5c23a7c044e59ba0181b96 09-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/tcs/scalar: fix load input for doubles

v2: do not write to the original indirect_offset since that is
an expression that could be used somewhere else (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
66192b3c16b09fa7ba97574103fc3d883b3cbfdb 09-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: fix nir_intrinsic_store_output for doubles

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3cce67aff09a4c248e9a69a8b05a63ac6b3e4878 09-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: fix number of output components for doubles

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8c6d147373cbdefef5945b00626bb62bb03198ca 26-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: support doubles with shared variable stores

This is pretty much the same we do with SSBOs.

v2: do not shuffle in-place, it is not safe since the original 64-bit data
could be used after the write, instead use a temporary like we do
for SSBO stores (Iago)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
943f9442bf7943a992730e642e91ed874d50790c 25-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: support doubles with ssbo stores

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b9aa66aa516c100d5476ee966f428aaf743d786c 25-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: add shuffle_64bit_data_for_32bit_write helper

This does the inverse operation of shuffle_32bit_load_result_to_64bit_data
and we will use it when we need to write 64-bit data in the layout expected
by untyped write messages.

v2 (curro):
- Use subscript() instead of stride()
- Assert on the input types rather than silently retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Drop the temporary vgrf and force_writemask_all.
- Make component_i const.
- Move to brw_fs_nir.cpp

v3 (curro):
- Pass dst and src by reference.
- Simplify allocation of tmp register.
- Move to brw_fs_nir.cpp.
- Get rid of the temporary.

v3 (Iago):
- Check that the src and dst regions do not overlap, since that would
typically be a bug in the caller.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
33f7ec18ac399719df06ab7031cb43965e6793be 25-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: support doubles with SSBO loads

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8aa01ac596fc0722058e10808c8141533c3fd1fe 05-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: support doubles with shared variable loads

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
6eab06b866916d4fd52adf7b8bb6113948a3811a 05-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Add do_untyped_vector_read helper

We are going to need the same logic for anything that reads
doubles via untyped messages (CS shared variables and SSBOs). Add a
helper function with that logic so that we can reuse it.

v2:
- Make this a static function instead of a method of fs_visitor (Iago)
- We only support types with a size of 4 or 8 (Curro)
- Avoid retypes by using a separate vgrf for the packed result (Curro)
- Put dst parameter before source parameters (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b86d4780ed203b2a22afba5f95c73b15165a7259 13-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: support doubles with UBO loads

UBO loads with constant offset use the UNIFORM_PULL_CONSTANT_LOAD
instruction, which reads 16 bytes (a vec4) of data from memory. For dvec
types this only provides components x and y. Thus, if we are reading
more than 2 components we need to issue a second load at offset+16 to
read the next 16-byte chunk with components w and z.

UBO loads with non-constant offset emit a load for each component
in the vector (and rely in CSE to fix redundant loads), so we only
need to consider the size of the data type when computing the offset
of each element in a vector.

v2 (Sam):
- Adapt the code to use component() (Curro).

v3 (Sam):
- Use type_sz(dest.type) in VARYING_PULL_CONSTANT_LOAD() call (Curro).
- Add asserts to ensure std140 vector alignment rules are followed
(Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
58f1804c4f38b76c20872d6887b7b5e6029e0454 18-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: fix pull constant load component selection for doubles

UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
constant offset that is 16-byte aligned. If we need to access an unaligned
offset we emit a load with an aligned offset and use the remaining constant
offset to select the component into the vec4 result that we are interested
in. This component must be computed in units of the type size, since that
is what fs_reg::set_smear expects.

This patch does this change in the two places where we use this message:
In demote_pull_constants when we lower uniform access with constant offset
into the pull constant buffer and in UBO loads with constant offset.

v2 (Sam):
- Fix set_smear() in fs_visitor::lower_constant_loads(), take into account
source type instead and remove MAX2 (Curro).
- Improve changes to nir_intrinsic_load_ubo case in nir_emit_intrinsic()
(Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
50b7676dc46bae39c5e9b779828ef4fb2e1fbefc 22-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: add shuffle_32bit_load_result_to_64bit_data helper

There will be a few places where we need to shuffle the result of a 32-bit
load into valid 64-bit data, so extract this logic into a separate helper
that we can reuse.

v2 (Curro):
- Use subscript() instead of stride()
- Assert on the input types rather than retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Don't use force_writemask_all.
- Mark component_i as const.
- Make the function name lower case.

v3 (Curro):
- Pass src and dst by reference.
- Move to brw_fs_nir.cpp

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c907ca6c8d256f4b8c271bcf0901661ef943ae08 13-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Flip interpolateAtOffset's y offset when necessary.

Fixes 4 dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_offset.no_qualifiers.default_framebuffer
- interpolate_at_offset.centroid_qualifier.default_framebuffer
- interpolate_at_offset.sample_qualifier.default_framebuffer
- interpolate_at_offset.array_element.default_framebuffer

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
50e5e1f747ad820eb491e093600a4bde9c13efba 03-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Implement the new NIR MCS texturing

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3aa542c65760c7e9b92a41d850677a44879cc5c7 09-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Delete bogus assertion in emit_gs_input_load().

This looks like leftover cruft from an earlier attempt at writing
point size hacks. Each vertex has its own copy of gl_PointSize,
so accessing any vertex other than 0 would cause this to fail.

The tests seem to work fine without it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1c41cb58def637c9e033cb7bf108f1096c9ae63c 08-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Support instanced GS inputs in the scalar backend.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
5fc37726501bc65f3bbaef2573ac89e980f1a412 08-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Use an early return for the push case in emit_gs_input_load().

Just trying to keep things from getting too ugly in the next commit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
75ada43a3af88835de6a83ed453d4ed512df0412 19-Apr-2016 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs: take into account doubles when calculating read_size for MOV_INDIRECT

v2:
- Fix assert's line width (Topi).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
efaf62a40a95b240cab7b0f371c7178aa19b7f3a 12-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: implement i2d and u2d

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c63a6f21494685d41d51887901298639c4d32c22 18-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: implement d2i and d2u

These need the same treatment as d2f, so generalize our d2f lowering to cover
these too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e0c45182e3d865d7f187dc35e70832f1fa7c9fad 18-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: implement d2b

v2: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
80f60a4302c8bd805882baaf60db72cf785593e3 07-Jan-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: implement fsign() for doubles

v2 (Sam):
- Fix indentation (Kenneth)
- Simplify code (Kenneth)

v3: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e8a8fc956358fb5e0f776b39fdbce9247bb5538a 10-Nov-2015 Iago Toral Quiroga <itoral@igalia.com> i965/fs: We only support 32-bit integer ALU operations for now

Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a644b0939dd8284bca25042bccd2439c173dd7d7 30-Jul-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: add support for f2d and d2f

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e83f51d54e9c3db11526b66a741352135eae6f52 04-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: fix compares for doubles

The destination has to have the same source as the type, or else the
simulator will complain. As a result, we need to emit a CMP that
outputs a 64-bit wide result and then do a strided MOV to pick out the
low 32 bits of each channel.

v2: Use subscript() instead of stride() (Curro)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
935e0e305dd7a4f67557e969513a30357d308efb 19-Apr-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: optimize unpack double

When we are actually unpacking from a double that we have previously
packed from its 32-bit components we can bypass the pack operation
and source from its arguments directly.

v2 (Sam):
- Fix line overflow (Topi)
- Bail if the parent instruction's source is not SSA (Connor)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ba1907f040e9d61be932a8e098061d94d4ba30cb 19-Apr-2016 Iago Toral Quiroga <itoral@igalia.com> i965/fs: optimize pack double

When we are actually creating a double using values obtained from a
previous unpack operation we can bypass the unpack and source from
the original double value directly.

v2:
- Style changes (Topi)
- Bail is parent instruction's src is not SSA (Connor)

v3: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7782f39e759798975ace6f3272dd3f263ddc8702 14-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs/nir: translate double pack/unpack

v2 (Sam):
- Fix line overflow (Topi).

v3: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d17cdacba37cff8ee172322c9ba2c4a58bf57d8b 29-Jul-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: always pass the bitsize to brw_type_for_nir_type()

v2 (Sam):
- Add bitsize to brw_type_for_nir_type() in optimize_extract_to_float()

v3 (Sam):
- Fix line width (Topi).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0f1690fd9514f7a282141a7ad57a06b334b6c1a4 29-Jul-2015 Connor Abbott <connor.w.abbott@intel.com> i965/fs: use the NIR bit size when creating registers

v2 (Iago):
- Squashed bits from 'support double precission constant operands for
the implementation of 64-bit emit_load_const'.
- Do not use BRW_REGISTER_TYPE_D for all 32-bit registers since that breaks
asserts and functionality for some piglit tests. Just keep 32-bit types
untouched and add 64-bit support.
- Use DF instead of Q for 64-bit registers. Otherwise the code we generate
will use Q sometimes and DF others and we hit unwanted DF/Q conversions,
so always use DF.

v3 (Sam):
- Mark 'reg_type' occurrences as const (Topi).

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Tapani Palli <tapani.palli@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7bc987abe0dc863b091bf77f5b02138ebe79e559 03-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Move handling of samples_identical into the switch statement

This is where we handle texop_texture_samples so it makes things more
consistent.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3ba228f9978cbabc2b4731327454dd91a208c317 03-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Simplify texture destination fixups

There are a few different fixups that we have to do for texture
destinations that re-arrange channels, fix hardware vs. API mismatches, or
just shrink the result to fit in the NIR destination. These were all being
done in a somewhat haphazard manner. This commit replaces all of the
shuffling with a single LOAD_PAYLOAD operation at the end and makes it much
easier to insert fixups between the texture instruction itself and the
LOAD_PAYLOAD.

Shader-db results on Haswell:

total instructions in shared programs: 6227035 -> 6226669 (-0.01%)
instructions in affected programs: 19119 -> 18753 (-1.91%)
helped: 85
HURT: 0

total cycles in shared programs: 56491626 -> 56476126 (-0.03%)
cycles in affected programs: 672420 -> 656920 (-2.31%)
helped: 92
HURT: 42
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a815499294afb485fe6773fba9ba12fa6773c654 03-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Merge nir_emit_texture and emit_texture

The fs_visitor::emit_texture helper originated when we still had both NIR
and IR visitors for the FS backend. Since the old visitor was removed,
emit_texture serves no real purpose beyond arbitrarily splitting
heavily-linked code across two functions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a808ba59657b3e5c6399e51fa1f4ebe9cad201a9 03-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rework passthrough TCS checks.

According to Timothy, using program_string_id == 0 to identify the
passthrough TCS is going to be problematic for his shader cache work.

So, change it to strcmp() the name at visitor creation time.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7d9143ad885752184156b3a0d3e492aef09af3b0 15-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Write a scalar TCS backend that runs in SINGLE_PATCH mode.

Unlike most shader stages, the Hull Shader hardware makes us explicitly
tell it how many threads to dispatch and manually configure the channel
mask. One perk of this is that we have a lot of flexibility - we can
run it in either SIMD4x2 or SIMD8 mode.

Treating it as SIMD8 means that shaders with 8 or fewer output vertices
(which is overwhemingly the common case) can be handled by a single
thread. This has several intriguing properties:

- Accessing input arrays with gl_InvocationID as the index is a simple
SIMD8 URB read with g1 as the header. No indirect addressing required.
- Barriers are no-ops.
- We could potentially do output shadowing to combine writes, as the
concurrency concerns are gone. (We don't do this yet, though.)

v2: Drop first_non_payload_grf change, as it was always adding 0
(caught by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
9464d8c49813aba77285e7465b96e92a91ed327c 27-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir: Switch the arguments to nir_foreach_function

This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:

s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
707e72f13bb78869ee95d3286980bf1709cba6cf 27-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir: Switch the arguments to nir_foreach_instr

This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:

s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/

and similar expressions for nir_foreach_instr_safe etc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7efff10585122d484dc3adab14af9380b9b8f309 13-Apr-2016 Connor Abbott <cwabbott0@gmail.com> i965/nir: fixup for new foreach_block()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
13195f7ef85e0923a7b7d5b8a35eb6b6c257db1c 23-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Reduce the response length of sampler messages on Skylake.

Often, we don't need a full 4 channels worth of data from the sampler.
For example, depth comparisons and red textures only return one value.
To handle this, the sampler message header contains a mask which can
be used to disable channels, and reduce the message length (in SIMD16
mode on all hardware, and SIMD8 mode on Broadwell and later).

We've never used it before, since it required setting up a message
header. This meant trading a smaller response length for a larger
message length and additional MOVs to set it up.

However, Skylake introduces a terrific new feature: for headerless
messages, you can simply reduce the response length, and it makes
the implicit header contain an appropriate mask. So to read only
RG, you would simply set the message length to 2 or 4 (SIMD8/16).

This means we can finally take advantage of this at no cost.

total instructions in shared programs: 9091831 -> 9073067 (-0.21%)
instructions in affected programs: 191370 -> 172606 (-9.81%)
helped: 2609
HURT: 0

total cycles in shared programs: 70868114 -> 68454752 (-3.41%)
cycles in affected programs: 35841154 -> 33427792 (-6.73%)
helped: 16357
HURT: 8188

total spills in shared programs: 3492 -> 1707 (-51.12%)
spills in affected programs: 2749 -> 964 (-64.93%)
helped: 74
HURT: 0

total fills in shared programs: 4266 -> 2647 (-37.95%)
fills in affected programs: 3029 -> 1410 (-53.45%)
helped: 74
HURT: 0

LOST: 1
GAINED: 143

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c7a09c057162ed0b7e9e039470c76bb79518876c 10-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Properly report regs_written from SAMPLEINFO

The previous behavior would only allocate one register and then write
four thus potentially stomping three innocent bystanders.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0bd956b34b376bdc1eaf91a2a8463d13dd59e641 24-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Don't force a header for texture offsets of 0.

Calling textureOffset() with an offset of <0, 0, 0> is equivalent to
calliing texture(). We don't actually need to set up an offset,
which causes a message header to be created.

A fairly common pattern is to sample at a point with a bunch of
offsets, and average them. It's natural to write all the lookups
as textureOffset, but use <0, 0> for the center sample.

shader-db results on Skylake:

total instructions in shared programs: 9092095 -> 9092087 (-0.00%)
instructions in affected programs: 2826 -> 2818 (-0.28%)
helped: 12
HURT: 2

total cycles in shared programs: 70870166 -> 70870144 (-0.00%)
cycles in affected programs: 15924 -> 15902 (-0.14%)
helped: 2
HURT: 0

This also helps prevent code quality regressions in a future patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f310c02b94fba0a0a5ea7f5573f906de823cc5fe 16-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_surface_builder: Take a GL format enum instead of mesa_format

Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d8c8f4203f8bb18152af0d0c120f3582a93c07c2 06-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix interpolateAtSample() on single sampled buffers.

Fixes dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_sample.non_multisample_buffer.sample_n_default_framebuffer
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_rbo
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
447d3eec6a869200612e5010f47335cb26789a3a 06-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix gl_SampleMaskIn[] in per-sample shading mode.

The coverage mask is not sufficient - in per-sample mode, we also need
to AND with a mask representing the samples being processed by the
current fragment shader invocation.

Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests:

sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8}
sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b6dc940ec273252678d40707d300851fa1c85ea5 13-Apr-2016 Connor Abbott <cwabbott0@gmail.com> nir: rename nir_foreach_block*() to nir_foreach_block*_call()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f30f6e26252ed09eca1922f7c8633c7c7b6e50fe 15-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Don't allow OOB array access of images

We have had a guard against OOB array access of images on IVB for a long
time, but it can actually cause hangs on any GPU generation. This can
happen due to getting an untyped SURFACE_STATE for a typed message. We
didn't used to hit this with the piglit test on anything other than IVB
because the OOB in the test would cause us to go past the top of the pull
constant UBO and we would get a surface index of 0 which is was always a
valid surface. Now that we're pushing small arrays, we can end up grabbing
garbage from the GRF and going to some random index which causes a hang.
The solution is to just do the bounds check on all hardware.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94944
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
479e38ad63ab1421afe4f25d36f434ac2e12e817 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Get rid of the param_size array

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3c93cdfaf598bc3c28e3dc288da35675c666602b 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use MOV_INDIRECT for all indirect uniform loads

Instead of using reladdr, this commit changes the FS backend to emit a
MOV_INDIRECT whenever we need an indirect uniform load. We also have to
rework some of the other bits of the backend to handle this new form of
uniform load. The obvious change is that demote_pull_constants now acts
more like a lowering pass when it hits a MOV_INDIRECT.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
240d16ea94834eb2472e91fd4856381951a07007 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD

Reveiewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
765dd6534937e125b95c7998862b1a4ec76a22d8 25-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Implement the new imod and irem opcodes

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
bfd17c76c1267756ea16051cbe174cb23ff49f44 08-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Port INTEL_PRECISE_TRIG=1 to NIR.

This makes the extra multiply visible to NIR's algebraic optimizations
(for constant reassociation) as well as constant folding. This means
that when the result of sin/cos are multiplied by an constant, we can
eliminate the extra multiply altogether, reducing the cost of the
workaround.

It also means we only have to implement it one place, rather than in
both backends.

This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion,
which has a ton of sin() calls, but always multiplies them by an
immediate constant. The extra multiply gets folded away.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
5ea3647f89abccea5496824815b5b729f38f7a23 25-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Move the code for load/store_shared to emit_cs_intrinsic

They are compute-shader only and that's where the code for doing atomics on
shared variables lives so it seemes to make sense.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
80c72a8ea7b1018661da0e6509a7f88ca1f5086f 25-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Provide a default LOD for buffer textures

Our hardware requires an LOD for all texelFetch commands even if they are
on buffer textures. GLSL IR gives us an LOD of 0 in that case, but the LOD
is really rather meaningless. This commit allows other NIR producers to be
more lazy and not provide one at all.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
65fbc43d54403905e3eaea02372b5a364dc1d773 27-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Add an INTEL_PRECISE_TRIG=1 option to fix SIN/COS output range.

The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.

According to an internal presentation, the COS instruction can produce
a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One
suggested workaround is to multiply by 0.99997, scaling down the
amplitude slightly. Apparently this also minimizes the error function,
reducing the maximum error from 0.00006 to about 0.00003.

When enabled, fixes 16 dEQP precision tests

dEQP-GLES31.functional.shaders.builtin_functions.precision.
{cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.

at the cost of making every sin and cos call more expensive (about
twice the number of cycles on recent hardware). Enabling this
option has been shown to reduce GPUTest Volplosion performance by
about 10%.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
14c46954c910efb1db94a068a866c7259deaa9d9 25-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Add an implemnetation of nir_op_fquantize2f16

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
084b24f5582567ebf5aa94b7f40ae3bdcb71316b 16-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> nir: rename nir_const_value fields to include bitsize information

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ef76ea4ba97d0ac122491fd3f1b2bbb8e4163150 04-Mar-2016 Alejandro Piñeiro <apinheiro@igalia.com> i965/fs/nir: "surface_access::" prefix not needed

"using namespace brw::surface_access" is already present at the
top of the source file.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1f862e923cba1d5cd54a707f70f0be113635e855 21-Jan-2016 Matt Turner <mattst88@gmail.com> i965/fs: Optimize float conversions of byte/word extract.

instructions in affected programs: 31535 -> 29966 (-4.98%)
helped: 23

cycles in affected programs: 272648 -> 266022 (-2.43%)
helped: 14
HURT: 1

The patch decreases the number of instructions in the two Unigine
programs by:

#1721: 4374 -> 4155 instructions (-5.01%)
#1706: 3582 -> 3363 instructions (-6.11%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0e9dc59a58e632979b3bdebb19d184bd22a0c182 11-Feb-2016 Matt Turner <mattst88@gmail.com> i965: Make emit_minmax return an instruction*.

And use it in brw_fs_nir.cpp.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2f2c00c7279e7c43e520e21de1781f8cec263e92 11-Feb-2016 Matt Turner <mattst88@gmail.com> i965: Lower min/max after optimization on Gen4/5.

Gen4/5's SEL instruction cannot use conditional modifiers, so min/max
are implemented as CMP + SEL. Handling that after optimization lets us
CSE more.

On Ironlake:

total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
instructions in affected programs: 326604 -> 323322 (-1.00%)
helped: 1411

total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
cycles in affected programs: 18950290 -> 18867176 (-0.44%)
helped: 2419
HURT: 328

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ac089126b9b647f930ee2657aa16ea8e8f6a5dd7 09-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> glsl/types: Rename sampler_type to sampled_type

It's a bit more descriptive since it is the base type that you get when you
sample from it. Also, the next commit adds a bare "sampler" type and we
need glsl_type::sampler_type available for a public static member.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8750299a420af76cebd3067f6f603eacde06ae06 09-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir: Remove the const_offset from nir_tex_instr

When NIR was originally drafted, there was no easy way to determine if
something was constant or not. The result was that we had lots of
special-casing for constant values such as this. Now that load_const
instructions are SSA-only, it's really easy to find constants and this
isn't really needed anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b8ab9c8c8674d67e09c1134ca44b37e0a611f5b5 06-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Plumb separate surfaces and samplers through from NIR

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
5ec456375e4fdd0b6c7d797f99191044e19ead74 03-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Separate texture from sampler in nir_tex_instr

This commit adds the capability to NIR to support separate textures and
samplers. As it currently stands, glsl_to_nir only sets the texture deref
and leaves the sampler deref alone as it did before and nir_lower_samplers
assumes this. Backends can still assume that they are combined and only
look at only at the texture index. Or, if they wish, they can assume that
they are separate because nir_lower_samplers, tgsi_to_nir, and prog_to_nir
all set both texture and sampler index whenever a sampler is required (the
two indices are the same in this case).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ee85014b90af1d94d637ec763a803479e9bac5dc 06-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir/tex_instr: Rename sampler to texture

We're about to separate the two concepts. When we do, the sampler will
become optional. Doing a rename first makes the separation a bit more
safe because drivers that depend on GLSL or TGSI behaviour will be fine to
just use the texture index all the time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1dc312e295c66ab8674d2f47f859e310f607b2ed 21-Jan-2016 Matt Turner <mattst88@gmail.com> i965/fs: Implement support for extract_word.

The vec4 backend will lower it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
eb63640c1d38a200a7b1540405051d3ff79d0d8a 17-Jan-2016 Emil Velikov <emil.velikov@collabora.com> glsl: move to compiler/

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b3340cd32acf5935891f19833de0cfc500a93e0b 21-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Implement a drirc workaround for broken dual color blending.

OpenGL's dual color blending feature was specified so that an
implementation could support both multiple render targets (MRT) and
dual source blending. Fragment shader outputs specify both "location"
(the render target number) and "index" (either color 0 or 1).

I believe DirectX only has the notion of "location" - if using dual
color blending, location 0 or 1 will specify the operands. If not,
then location means the render target index. The two features can't
be used together.

As such, some applications mistakenly try to use <loc = 0, index = 0>
and <loc = 1, index = 0> in a shader used for dual color blending with
a single render target, rather than the correct <loc = 0, index = 0>
and <loc = 0, index = 1>.

In particular, Unigine Heaven 4.0 and Valley 1.0 suffer from this bug.
Unigine is aware of the problem, and quickly developed a fix, but has
not bothered to change the download link on their website to a working
copy in over a year. People were still using the broken version and
complaining. We tried working around this by disabling dual color
blending, but that apparently hurts performance, and people were once
again unhappy.

On i965, dual source blending is achieved by using different framebuffer
write messages than normal rendering. So, we have to compile different
code for the two cases. We're not being pedantic: we actually have to
know in order to function.

Normally, dual source blending is detectable in the shader: if a shader
has an output with index = 1, then it's meant for blending, not MRT.
With the broken inputs, they're indistinguishable, so we can only tell
by looking at the current GL state.

This patch implements a new drirc workaround:

export dual_color_blend_by_location=true

which makes the i965 driver detect when OpenGL state is configured for
dual source blending, and recompile the fragment shader to use the right
messages. In that case, we allow either location = 1 or index = 1 to
specify the second source for the blending equations.

It also re-enables GL_ARB_blend_func_extended for Unigine.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b82e26a6a4d6baf121f44c61c862bfa79ba0d172 13-Jan-2016 Matt Turner <mattst88@gmail.com> nir: Lower bitfield_extract.

The OpenGL specifications for bitfieldExtract() says:

The result will be undefined if <offset> or <bits> is negative, or if
the sum of <offset> and <bits> is greater than the number of bits
used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.

This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to to handle the trivial case of <bits> = 32 as

bitfieldExtract:
bits > 31 ? value : bfe(value, offset, bits)

Fixes:
ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b85a229e1f542426b1c8000569d89cd4768b9339 08-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> glsl: Delete the ir_binop_bfm and ir_triop_bfi opcodes.

TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly. NIR can handle ir_quadop_bitfield_insert as well.

These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4a1c8a3037cd29938b2a6e2c680c341e9903cfbe 28-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Push most TES inputs in SIMD8 mode.

Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache. Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.

However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs. An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.

Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.

Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.

Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).

Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b022150d70a1cfdda2007fa16b04c601eef45d6f 28-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.

We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.

All input loads can use the same header - we don't need to re-expand
g0 every time. CSE accomplishes this, saving instructions.

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0

total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cddfc2cefa93b884c40329dcb193fe4fb22143ab 10-Dec-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Add support for gl_DrawIDARB and enable extension

We have to break open a new vec4 for gl_DrawIDARB. We've used up all
space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its
own separate vertex buffer anyway. This is because we point the vb for
base vertex and base instance into the draw parameter BO for indirect
draw calls, but the draw id is generated by mesa in a different buffer.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
17ebb55a14b5a9aa639845fbda9330ef9421834a 10-Dec-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB

We already have gl_BaseVertexARB in the .x component of the SGVS vec4
and plug gl_BaseInstanceARB into the last free component (.y).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
237f2f2d8b45d9d956102eec6f9be63193e5269b 26-Dec-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Get rid of function overloads

When Connor originally drafted NIR, he copied the same function+overload
system that GLSL IR had with a few names changed. However, this
double-indirection is not really needed and has only served to confuse
people. Instead, let's just have functions which may not have unique names
and may or may not have an implementation. If someone wants to do overload
resolving, they can hav a hash table based function+overload system in the
overload resolving pass. There's no good reason to keep it in core NIR.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>

ir3 bits are

Reviewed-by: Rob Clark <robclark@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a5038427c3624e559f954124d77304f9ae9b884c 10-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add tessellation evaluation shaders

The TES is essentially a post-tessellator VS, which has access to the
entire TCS output patch, and a special gl_TessCoord input. Otherwise,
they're very straightforward.

This patch implements SIMD8 tessellation evaluation shaders for Gen8+.
The tessellator can generate a lot of geometry, so operating in SIMD8
mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only
2 vertices per thread). I have another patch which implements SIMD4x2
mode for older hardware (or via an environment variable override).

We currently handle all inputs via the pull model.

v2: Improve comments (suggested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c8a74e3a4ea6ac5dfa35adac06af14a8fa4ff773 30-Nov-2015 Matt Turner <mattst88@gmail.com> nir: Delete bany, ball, fany, fall.

As in the previous patches, these can be implemented as

any(v) -> any_nequal(v, false)
all(v) -> all_equal(v, true)

and their removal simplifies the code in the next patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b8425bb1e845bef19dac8d8a9fd672e958018802 11-Dec-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the correct source for local memory load offsets

The offset for loads is in src[0]. This was a copy+paste error in the
nir_intrinsic_load/store refactoring. This commit fixes a segfault in
ES31-CTS.compute_shader.work-group-size. I have no idea how piglit failed
to catch this...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93348
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
78b81be627734ea7fa50ea246c07b0d4a3a1638a 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Get rid of *_indirect variants of input/output load/store intrinsics

There is some special-casing needed in a competent back-end. However, they
can do their special-casing easily enough based on whether or not the
offset is a constant. In the mean time, having the *_indirect variants
adds special cases a number of places where they don't need to be and, in
general, only complicates things. To complicate matters, NIR had no way to
convdert an indirect load/store to a direct one in the case that the
indirect was a constant so we would still not really get what the back-ends
wanted. The best solution seems to be to get rid of the *_indirect
variants entirely.

This commit is a bunch of different changes squashed together:

- nir: Get rid of *_indirect variants of input/output load/store intrinsics
- nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect
- nir/lower_io: Get rid of load/store_foo_indirect
- i965/fs: Get rid of load/store_foo_indirect
- i965/vec4: Get rid of load/store_foo_indirect
- tgsi_to_nir: Get rid of load/store_foo_indirect
- ir3/nir: Use the new unified io intrinsics
- vc4: Do all uniform loads with byte offsets
- vc4/nir: Use the new unified io intrinsics
- vc4: Fix load_user_clip_plane crash
- vc4: add missing src for store outputs
- vc4: Fix state uniforms
- nir/lower_clip: Update to the new load/store intrinsics
- nir/lower_two_sided_color: Update to the new load intrinsic

NIR and i965 changes are

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

NIR indirect declarations and vc4 changes are

Reviewed-by: Eric Anholt <eric@anholt.net>

ir3 changes are

Reviewed-by: Rob Clark <robdclark@gmail.com>

NIR changes are

Acked-by: Rob Clark <robdclark@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f3970fad9e5b04e04de366a65fed5a30da618f9d 08-Dec-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Refactor store_output, load_input, and load_uniform

There was way too much incrementing of things going on. Instead, let's
just start everything off at the right base location, and then increment in
the loop.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e288b4a133f1ea8208cd219545a72805ed5a91c6 10-Oct-2015 Jordan Justen <jordan.l.justen@intel.com> i965/nir: Implement shared variable atomic operations

v3:
* Update based on latest SSBO code (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
faddb301ff72bd7ac8d4274e0d895ca37a4d3bce 29-Jul-2015 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Handle nir shared variable store intrinsic

v4:
* Apply similar optimization for shared variable stores as
0cb7d7b4b7c32246d4c4225a1d17d7ff79a7526d. This was causing a
OpenGLES 3.1 CTS failure, but
867c436ca841b4196b4dde4786f5086c76b20dd7 fixes that.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8613206bd3dd80dc916b6ce7c47bf59cd4d114c8 29-Jul-2015 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Handle nir shared variable load intrinsic

v3:
* Remove extra #includes (Iago)
* Use recently added GEN7_BTI_SLM instead of BRW_SLM_SURFACE_INDEX (curro)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
18069dce4a4c3d71e6afc6b10bfa7bee0560ba9c 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Make uniform offsets be in terms of bytes

This commit pushes makes uniform offsets be terms of bytes starting with
nir_lower_io. They get converted to be in terms of vec4s or floats when we
cram them in the UNIFORM register file but reladdr remains in terms of
bytes all the way down to the point where we lower it to a pull constant
load.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
22c273de2b97743587310f7bbf66767191bde866 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Remove unused indirect handling

The one and only place where the FS backend allows reladdr is on uniforms.
For locals, inputs, and outputs, we lower it away before the backend ever
sees it. This commit gets rid of the dead indirect handling code.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
13ad8d03f201a4d09bf7ab9078b00807d61dfada 01-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use a stride of 1 and byte offsets for UBOs

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3810c1561401aba336765d64d1a5a3e44eb58eb3 25-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix scalar vertex shader struct outputs.

While we correctly set output[] for composite varyings, we set completely
bogus values for output_components[], making emit_urb_writes() output
zeros instead of the actual values.

Unfortunately, our simple approach goes out the window, and we need to
recurse into structs to get the proper value of vector_elements for each
field.

Together with the previous patch, this fixes rendering in an upcoming
game from Feral Interactive.

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3e9003e9cf55265ab1fb6522dc5cbb2f455ea1f9 20-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix fragment shader struct inputs.

Apparently we have literally no support for FS varying struct inputs.
This is somewhat surprising, given that we've had tests for that very
feature that have been passing for a long time.

Normally, varying packing splits up structures for us, so we don't see
them in the backend. However, with SSO, varying packing isn't around
to save us, and we get actual structs that we have to handle.

This patch changes fs_visitor::emit_general_interpolation() to work
recursively, properly handling nested structs/arrays/and so on.
(It's easier to read with diff -b, as indentation changes.)

When using the vec4 VS backend, this fixes rendering in an upcoming
game from Feral Interactive. (The scalar VS backend requires additional
bug fixes in the next patch.)

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f36993b46962eab4446bc1964eb47149751aee26 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Clean up #includes in the compiler.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ecac1aab538d65f0867fd93e23d0d020c1a5d0f1 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Push down inclusion of brw_program.h.

We were including it in headers, which then caused it to be included in
tons of places it wasn't needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
6fe9ea78fa413ca3f0359f62881876f6b7a12f03 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Remove duplicate #includes.

Added in commits 36fd65381 and 337dad8ce even though the existing
include was in view.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
6c8ba59cff14a1a86273f4008ff2a8e68335ab25 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use nir_lower_tex for texture coordinate lowering

Previously, we had a rescale_texcoords helper in the FS backend for
handling rescaling of texture coordinates. Now that we can do variants in
NIR, we can use nir_lower_tex to do the rescaling for us. This allows us
to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and
GL_CLAMP handling in vertex and geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c875e3cdd21811ad6669160d59fa39a4526ef872 14-Nov-2015 Matt Turner <mattst88@gmail.com> i965/fs: Add support for gl_HelperInvocation system value.

In most cases (when the negate is copy propagated and the MOV removed),
this is two instructions on Gen >= 8 and only two instructions on
earlier platforms -- and it doesn't use the flag register.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
99840eb983f74cd447546f7205c8c9f505ef82c8 18-Nov-2015 Ian Romanick <ian.d.romanick@intel.com> i965: Enable EXT_shader_samples_identical

On the vec4 backend, textureSamplesIdentical() will always return
false. There are currently no test cases for the vec4 backend, so we
don't have much confidence in any implementation. We also don't think
anyone is likely to miss it.

v2: Handle immediate value for MCS smarter. Rebase on changes to
nir_texop_sampels_identical (missing second parameter). Suggested by
Jason.

v3: Add Neil's code to handle 16x MSAA in the FS. Also rebase on top of
f9a9ba5e. Stub out the vec4 implementation.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v2]
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
457bb290efc162ea3c7c51a820ab7cf88a4efb8d 18-Nov-2015 Ian Romanick <ian.d.romanick@intel.com> nir: Add nir_texop_samples_identical opcode

This is the NIR analog to GLSL IR ir_samples_identical.

v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by
Ken and Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
9b978046eb1d1657060365e8dcde4aad41b50af9 02-Nov-2015 Matt Turner <mattst88@gmail.com> i965/fs: Use brw_imm_uw().

W/UW immediates are 16-bits, but those 16-bits must be replicated
in the high 16-bits of the 32-bit field.

Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3ccc41ecfc5e9345a1c291748d8840984f7413ae 02-Nov-2015 Matt Turner <mattst88@gmail.com> i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().

Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.

text data bss dec hex filename
5204535 214112 27784 5446431 531b1f i965_dri.so before
5193977 214112 27784 5435873 52f1e1 i965_dri.so after

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
fc19a0d2e422ea8e45bc5440a91f858f5f345884 08-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Allow indirect GS input indexing in the scalar backend.

This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.

All indirects are handled via the pull model. We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data. Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly. We'd need to be careful.

v2: Use updated MOV_INDIRECT opcode.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b163aa01487ab5f9b22c48b7badc5d65999c4985 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Rename GRF to VGRF.

The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
dba309fc14d1ca99251c8f8115d2a26ac86f14f6 30-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Initialize registers.

The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero. The next commit will move
BAD_FILE in the enum so that it's no longer zero.

In the case of this->outputs, the constructor was being run implicitly,
and we were unnecessarily memsetting is to zero.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d4fdb84f80dd3dbad2b71ea6e877f24dc625aa2a 10-Nov-2015 Samuel Iglesias Gonsálvez <siglesias@igalia.com> i965/fs/nir: fix the number of register written by FS_OPCODE_GET_BUFFER_SIZE

FS_OPCODE_GET_BUFFER_SIZE is calculated with a resinfo's sampler message.

This patch adjusts the number of registers written by the opcode
following what the PRM spec says about the number of registers written
by the SIMD8 and SIMD16's writeback messages for sampler messages.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
918bda23dda36004c95f6441328ecc892e068886 05-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Split nir_emit_intrinsic by stage with a general fallback.

Many intrinsics only apply to a particular stage (such as discard).
In other cases, we may want to interpret them differently based on
the stage (such as load_primitive_id or load_input).

The current method isn't that pretty - we handle all intrinsics in
one giant function. Sometimes we assert on stage, sometimes we forget.
Different behaviors are handled via if-ladders based on stage.

This commit introduces new nir_emit_<stage>_intrinsic() functions,
and makes nir_emit_instr() call those. In turn, those fall back to
the generic nir_emit_intrinsic() function for cases they don't want
to handle specially.

This makes it clear which intrinsics only exist in one stage, and makes
it easy to handle inputs/outputs differently for various stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
51694072218b5ae84b5d8f98ee2172d7c5d61b31 06-Nov-2015 Francisco Jerez <currojerez@riseup.net> i965/nir/fs: Add comment for no-op memory barrier functions

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
faa119307035787f5e421dd6a9eb4d0101de963b 10-Oct-2015 Jordan Justen <jordan.l.justen@intel.com> i965/nir/fs: Implement new barrier functions for compute shaders

For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:

* nir_intrinsic_memory_barrier_atomic_counter
* nir_intrinsic_memory_barrier_buffer
* nir_intrinsic_memory_barrier_image

We treat these nir intrinsics as no-ops:
* nir_intrinsic_group_memory_barrier
* nir_intrinsic_memory_barrier_shared

v3:
* Add comment for no-op cases (curro)

v4:
* Moving comment to a separate patch authored by curro

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8dcf807cb43383590ba193c7ff20b8a98e4a9f65 14-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix scalar VS float[] and vec2[] output arrays.

The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken). Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4. This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them. Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.

Normally, varying packing avoids this problem by turning varyings into
vec4s. So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
by Connor Abbott), and rename it to type_size_vec4_times_4()
(renaming suggested by Jason Ekstrand). Use type_size_vec4
rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
eea3c907cc480a105224b21be51d62bc64ea1057 30-Oct-2015 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE

Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
027b64a55afc0fe8efcf9f6217192807e285c830 30-Oct-2015 Iago Toral Quiroga <itoral@igalia.com> i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const and remove useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
fb77da89f51fd82d5cee95400acb20ad74d9e7bc 31-Oct-2015 Timothy Arceri <t_arceri@yahoo.com.au> i965: add support for image AoA

V3: clamp array index to the correct size (the size of the current array
rather than the inner array) Francisco Jerez.

V2: avoid useless zero-initialization and addition for the first AoA level,
avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
to element_size, assign the indirect indexing temporary directly to
image.reladdr, and replace while loop with a for loop. All suggested
by Francisco Jerez.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
36fd65381756ed1b8f774f7fcdd555941a3d39e1 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add scalar geometry shader support.

This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support
instanced geometry shaders, and Orbital Explorer's shader spills like
crazy. But the infrastructure is in place, and it's largely working.

v2: Lots of rebasing.

v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid
tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0b19f651958c3888588190c8c8a9e701173a2aa2 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965/fs: Clean up FBH code.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4379ca22f18f5731248ee794ab651db721ba38b2 07-Oct-2015 Emil Velikov <emil.velikov@collabora.com> i965: Implement nir_intrinsic_shader_clock

v2:
- Add a few const qualifiers for good measure.
- Drop unneeded retype()s (Matt)
- Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
SIMD4 (Connor)

v3:
- Remove unneeded temporary + MOV (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8c902a580a490181e7cde29073b11181db4614f8 17-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Implement ARB_fragment_layer_viewport.

Normally, we could read gl_Layer from bits 26:16 of R0.0. However, the
specification requires that bogus out-of-range 32-bit values written by
previous stages need to appear in the fragment shader as-written.

Instead, we pass in the full 32-bit value from the VUE header as an
extra flat-shaded varying. We have the SF override the value to 0
when the previous stage didn't actually write a value (it's actually
defined to return 0).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0cb7d7b4b7c32246d4c4225a1d17d7ff79a7526d 22-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/fs: Optimize ssbo stores

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Write groups of enabled components together.

Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
feff21d1a6ba49a0d6f7526e1ff473a0b574c92e 22-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/fs: Drop offset_reg temporary in ssbo load

Now that we don't read each component one-by-one, we don't need the
temoprary vgrf for the offset. More importantly, this register was type
UD while the nir source was type D. This broke copy propagation and left
a redundant MOV in the generated code.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a19bf6d3ccbab6170ccfb7e04316a58f3e19396c 21-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/fs: Don't uniformize surface index twice

The emit_untyped_read and emit_untyped_write helpers already uniformize
the surface index argument. No need to do it before calling them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
24a3a697e5e029767c2d210a94d47c52c5e5e299 17-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965/fs: Read all components of a SSBO field with one send

Instead of looping through single-component reads, read all components
in one go.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1db44252d01bf7539452ccc2b5210c74b8dcd573 20-Oct-2015 Ben Widawsky <benjamin.widawsky@intel.com> i965: Implement ARB_shader_stencil_export (gen9+)

v2: remove useless source_stencil_to_render_target (Ken)
Squash in the actual packing function, which also got to
v2:
Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt)
Reorder the regioning to be in VWH order (Matt)
Don't retype src in the backend, just assert instead (Matt)
Rename the debug prints to something better (Matt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
48c76eae8e52fba2fe22d2cfa7f3c94a5420feb2 10-Jul-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Implement gl_InvocationID.

It's stored in bits 31:27 of g1 (along with the URB handles).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c5ae34f38f239d346090212a9f33a947a3b7642e 24-Sep-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Implement nir_intrinsic_load_primitive.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
6f9ca3026693e061ee55fa6d5f16d9ec0e744b59 15-Oct-2015 Iago Toral Quiroga <itoral@igalia.com> i965/fs: use the right number of UBOs

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
176e6930e6c24dfce7cc730faa2612d27689a4df 18-Jul-2015 Timothy Arceri <t_arceri@yahoo.com.au> i965: add arrays of arrays support for varyings

V2: get the correct vector elements value for outputs

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d3f45888045c84b2bc382a34d169a0ede4774a24 09-Oct-2015 Iago Toral Quiroga <itoral@igalia.com> i965: Adapt SSBOs to work with their own separate index space

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
baee16bf02eedc6a32381d79da6c7ac942f782ae 28-Sep-2015 Iago Toral Quiroga <itoral@igalia.com> nir: split SSBO min/max atomic instrinsics into signed/unsigned versions

NIR is typeless so this is the only way to keep track of the
type to select the proper atomic to use.

v2:
- Use imin,imax,umin,umax for the intrinsic names (Connor Abbott)
- Change message for unreachable paths (Michael Schellenberger)

Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2953c3d76178d7589947e6ea1dbd902b7b02b3d4 15-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.

Previously, we used nir_lower_io with the scalar type_size function,
which mapped VERT_ATTRIB_* locations to...some numbers. Then, in
fs_visitor::nir_setup_inputs(), we created temporaries indexed by
those numbers, and emitted MOVs from the actual ATTR registers to
those temporaries. Virtually all of these were copy propagated away,
but it's still ugly.

This patch reworks our input lowering to produce NIR lower_input
intrinsics that properly index into the ATTR file, so we can access
it directly.

No changes in shader-db.

v2: Fix unreachable() message (Ken), update commit message (Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
da361acd1c899d533caec6cae5a336f6ab35e076 17-Jul-2015 Neil Roberts <neil@linux.intel.com> i965/fs: Handle non-const sample number in interpolateAtSample

If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.

The generator will try to determine if the sample ID is dynamically
uniform via nir_src_is_dynamically_uniform. If not it will query the
pixel interpolator in a loop, once for each different live sample
number. The next live sample number is found using emit_uniformize. If
multiple live channels have the same sample number then they will be
handled in a single iteration of the loop. The loop is necessary
because the indirect send message doesn't seem to have a way to
specify a different value for each fragment.

This fixes the following two Piglit tests:

arb_gpu_shader5-interpolateAtSample-nonconst
arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform

v2: Handle dynamically non-uniform sample ids.
v3: Remove the BREAK instruction and predicate the WHILE directly.
Make the tokens arrays const. (Matt Turner)
v4: Iterate over the live channels instead of each possible sample
number.
v5: Don't special case immediate values in
brw_pixel_interpolator_query. Make a better wrapper for the
function to set up the PI send instruction. Ensure that the SHL
instructions are scalar. (Francisco Jerez).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1e3c1b107e075b210998998423901092b8fcd79b 03-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use nir_foreach_variable

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
bf7b6fd3fd6d98305d64ee6224ca9f9e7ba48444 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/shader: Get rid of the shader, prog, and shader_prog fields

Unfortunately, we can't get rid of them entirely. The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still
needs gl_shader_program for handling transfom feedback. However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
756613ed35d6fd2216b5138731c0c38886b8e14a 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use the nir info instead of pulling things out of [shader_]prog

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b62e36d18fac4a9c9977ddfa4bc2c2dbbcdad1b4 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Move sampler unit lookup into rescale_texcoord

The texunit variable we create and assign in nir_emit_texture gets passed
through two more layers of function calls before it gets to its sole use in
rescale_texcoord. The best part is that we already pass the sampler into
rescale_texcoord so we can just look it up there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7926c3ea7d8f455cbee390d20c78dadf5432b9bc 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/backend_shader: Add a field to store the NIR shader

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
30c63571133ed50907ec14172c2f3ef82ee8a34e 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Move prog_data uniform setup to the codegen level

As of now, uniform setup is more-or-less unified between vec4 and fs and no
longer requires the fs_visitor. This makes uniform setup more of a
language/API thing than a backend compiler thing. This commit moves
setting up the stage_prog_data.params arrays to the same place as we set up
the rest of stage_prog_data.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cdf314cb21377ee7caca05bd1abab6a2b921d213 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Simplify uniform setup

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7fee8b6f055831bc070bb36d02a8b1c4d601652a 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Pull GLSL uniform handling into a common function

The way we deal with GLSL uniforms and builtins is basically the same in
both the vec4 and the fs backend. This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
03c4171b577b06b1d8dde50b6eb9507d8ef4c1ce 29-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Pull common ARB program uniform handling into a common function

The way we deal with ARB program uniforms is basically the same in both the
vec4 and the fs backend. This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
58cea0c2b63db236e6efcae930c5fb936181c2a9 30-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/shader: Pull setup_image_uniform_values out of backend_shader

I tried to do this once before but Curro pointed out that having it in
backend_shader meant it could use the setup_vec4_uniform_values helper
which did different things in vec4 and fs. Now the setup_uniform_values
function differs only by an assert in the two backends so there's no real
good reason to be using it anymore.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
681b4badaedec5c9503887c4afb32485ce22c30e 24-Sep-2015 Jordan Justen <jordan.l.justen@intel.com> i965/cs: Generate code to load gl_NumWorkGroups

This code also sets cs_prog_data->uses_num_work_groups which is later
used by state setup to indicate that the gl_NumWorkGroups surface
needs to be setup.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
6668eb5a451c43ac78a784711cf239fdf7ca75ef 11-Sep-2015 Samuel Iglesias Gonsalvez <siglesias@igalia.com> mesa: rename gl_shader_program's NumUniformBlocks to NumBufferInterfaceBlocks

Because it counts shader storage blocks too.

v2:
- Use NumBufferInterfaceBlocks instead (Jordan).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
14af6f4698a9f60c080b9adda4d3b4c45b157bd7 01-Jun-2015 Iago Toral Quiroga <itoral@igalia.com> i965/nir/fs: Implement nir_intrinsic_ssbo_atomic_*

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
5b186aafe7a8d3f96a99ad2fddd2bff99d99e923 01-Jun-2015 Iago Toral Quiroga <itoral@igalia.com> i965/nir/fs: Implement nir_intrinsic_load_ssbo

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
337dad8ceeb4f313a47b4ddb31805f355c3fc3a5 01-Jun-2015 Iago Toral Quiroga <itoral@igalia.com> i965/nir/fs: Implement nir_intrinsic_store_ssbo

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f5dd2c182275a9de57e5186491012c402a6248e0 01-Jun-2015 Samuel Iglesias Gonsalvez <siglesias@igalia.com> i965/fs/nir: implement nir_intrinsic_get_buffer_size

v2:
- Remove inst->regs_written assignment as the instruction only
writes to one register.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c5743a5d7fa62a339222ceb96d568a525d77fe0c 13-Mar-2015 Jordan Justen <jordan.l.justen@intel.com> i965/nir: Support gl_WorkGroupID variable

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
49f999b9cb6ecb32cb27d10b47d234a176ae4c77 13-Mar-2015 Jordan Justen <jordan.l.justen@intel.com> i965/nir: Support gl_LocalInvocationID variable

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
34cff76fc2da1ce9abad6e2b1856fec6a950d19c 05-Nov-2014 Jordan Justen <jordan.l.justen@intel.com> i965/cs: Enable barrier in MEDIA_INTERFACE_DESCRIPTOR

Enable barrier in MEDIA_INTERFACE_DESCRIPTOR if the program uses the
barrier() GLSL function.

On Ivy Bridge and Haswell, this allows the piglit test
tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
to pass. On gen8, this enables a similar test with a local group size
of 896 to pass.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
55ebaa6d003b69c0a159a00d82a1e96f685062d6 28-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> i965: add handling for imageSamples

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0b91bcea98c0fe201bba89abe1ca3aee4d04c56c 12-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> i965: add support for textureSamples function

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0cc331dddd1a99c7af3619c92c48b5c32e17f6b3 04-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Use nir_system_value_from_intrinsic to reduce duplication.

This code is all pretty much identical. We just needed the translation
from one enum value to the other.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c676c432f30158190c260e7f3731ee6667ad4103 17-Aug-2015 Matt Turner <mattst88@gmail.com> i965/fs: Remove fs_visitor::try_replace_with_sel().

No shader-db changes on g4x, snb, hsw, or bdw.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2581fe931a48478123d8054ce7a291cffa851de9 28-Aug-2015 Marta Lofstedt <marta.lofstedt@intel.com> i965/fs: Do not set the size for zero-size uniforms

Zero sized uniforms can exist in the list, but they don't get get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them. This was previously fixed in:

commit: 781dc7c0e1f41502f18e07c0940af949a78d2792.

However,

commit: 259f7291de2387aa3ac5f856b39b7b934a1d8e7d

removed the fix.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
97f4efd573aed7ffc0ea9395f4e69ccdeb5041f6 27-May-2015 Nanley Chery <nanley.g.chery@intel.com> mesa/macros: add power-of-two assertions for alignment macros

ALIGN and ROUND_DOWN_TO both require that the alignment value passed
into the macro be a power of two in the comments. Using software assertions
verifies this to be the case.

v2: use static inline functions instead of gcc-specific statement expressions (Brian).
v3: fix indendation (Brian).
v4: add greater than zero requirement (Anuj).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
259f7291de2387aa3ac5f856b39b7b934a1d8e7d 18-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Rework uniform handling

Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect. This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform. This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate. This should allow us
to more easily make decisions about what to push.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0db8e87b4a16b123f7c0b44d54f23b535a136ee6 18-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir/intrinsics: Add a second const index to load_uniform

In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path. In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load. Previously, it was completely flattened by
nir_lower_io which made things difficult. This adds more information to
the intrinsic to make this easier for us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
6c33d6bbf9b54784e4498a81c73b712dca5dd737 12-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Pass a type_size() function pointer into nir_lower_io().

Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.

In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play. The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.

This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer. This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.

v2 (Jason Ekstrand):
- One side-effect of passing in a function pointer is that nir_lower_io is
now aware of and properly allocates space for image uniforms, allowing
us to drop hacks in the backend

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
640c472fd075814972b1276c5b0ed3a769aacda5 12-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Move type_size() methods out of visitor classes.

I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c56899f41a904762225267cb9c543a0abd901ad5 19-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Make setup_vec4_uniform_value and _image_uniform_values take an offset

This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
56ebd3314bfc5895fab47586fc8cda024aac4fd8 20-Aug-2015 Martin Peres <martin.peres@linux.intel.com> i965: Fix "handle nir_intrinsic_image_size"

I pushed a half-baked version of "i965: handle nir_intrinsic_image_size" by
accident. Not having the Reviewed-by: tags on the last two commits should
have been a red flag but I somehow missed it after the QA check.

This patch should fix image-size for non-int images. I will add support to
the piglit test for all the other image types.

Sorry for the noise.

Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
50db9c1db645c1a4d5777d2cacfd7ac74ebbe544 28-Apr-2015 Martin Peres <martin.peres@linux.intel.com> i965: handle nir_intrinsic_image_size

v2, Review from Francisco Jerez:
- avoid the camelCase for the booleans
- init the booleans using the sampler type
- force the initialization of all the components of the output register

v3:
- Rename a variable from CubeMapArray to CubeArray to re-use GLSL's name (Ilia)
- Fix some indentation and drop parenthesis (Topi)
- Fix a signed/unsigned comparaison warning

Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
13a04abc277089275217dce119e18acf4d4ce52d 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Clamp image array indices to the array bounds on IVB.

This fixes the spec@arb_shader_image_load_store@invalid index bounds
piglit tests on IVB, which were causing a GPU hang and then a crash
due to the invalid binding table index result of the array index
calculation. Other generations seem to behave sensibly when an
invalid surface is provided so it doesn't look like we need to care.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a47ae8de2cf30fbe45318a18a2ea032f30ab7d10 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Translate image load, store and atomic NIR intrinsics.

v2: Move array coordinate workaround into the surface builder.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
912ef52c29fdc373889594b963cc93c89fa9e3f7 28-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Handle image uniforms in NIR programs.

v2: Move the image_params array back to brw_stage_prog_data.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8a688bee83ced46eb4bff741f05d2da033c07ade 10-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Make resolve_source_modifiers consistent with the vec4 version

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e77a4a9b1f66de383043df95aada40fd5a004913 04-Aug-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Implement nir_op_imul/umul_high in terms of MULH.

And get rid of another no16() call.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
db8a6de571bb72ef43209a415e5492001a87b1d8 17-Jun-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir: Add new utility method brw_glsl_base_type_for_nir_type()

This method returns the glsl_base_type corresponding to a nir_alu_type.
It will factorize code currently present in fs_nir, that can be reused
in vec4_nir on its upcoming emit_texture support.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
97e205fd35bf77fd761caf24c611ff72cc0d85e2 17-Apr-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir: Move brw_type_for_nir_type() to brw_nir to allow reuse

Upcoming NIR->vec4 pass can benefit from this method, so lets move it up.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
781dc7c0e1f41502f18e07c0940af949a78d2792 30-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Fix regression with SIMD8 VS since b5f1a48e234d47b24df38cb562cffb8941d43795.

With num_direct_uniforms == 0 there's no space allocated in the
param_size array for the one block of direct uniforms -- On the FS
stage this would be a harmless no-op because it would simply re-set
one of the param_size entries allocated for the sampler units to zero,
but on the VS stage it has been reported to cause memory corruption
followed by a crash -- Surprising how a full piglit run on Gen8 didn't
catch it.

Reported-and-reviewed-by: "Lofstedt, Marta" <marta.lofstedt@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7cb60d770fc24bf00b6f7e5898cca1426e55c026 27-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Translate memory barrier NIR intrinsics.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b5f1a48e234d47b24df38cb562cffb8941d43795 28-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Execute nir_setup_uniforms, _inputs and _outputs unconditionally.

Images take up zero uniform slots in the nir_shader::num_uniforms
calculation, but nir_setup_uniforms needs to be executed even if the
program has no non-image uniforms so the driver-specific image
parameters are uploaded. nir_setup_uniforms is a no-op if there are
really no uniforms, so checking the num_uniform count is useless in
any case.

The nir_setup_inputs and _outputs changes shouldn't lead to any
functional change, they are just meant to preserve the symmetry
between them and nir_setup_uniforms.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3e5a90792d14aeb599dd236f830e6e344b35c905 05-May-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Don't overwrite fs_visitor::uniforms and ::param_size during the SIMD16 run.

Image variables need to allocate additional uniform slots over
nir_shader::num_uniforms. nir_setup_uniforms() overwrites the values
imported from the SIMD8 visitor and then exits early before entering
the nir_shader::uniforms loop, so image uniforms are never re-created.
Instead leave the imported values alone, they *must* be the same for
the uniform layout of both runs to be compatible.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ea0ac53f059c418d5797c495b87020f2ca2ec842 29-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Drop unused untyped surface read and atomic emit methods.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
854c4d8b37416d3e5593099a8e5441f3cf861173 05-May-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Revisit NIR atomic counter intrinsic translation.

Rewrite the NIR atomic counter intrinsics translation code making use
of the recently introduced surface builder. This will allow the
removal of some of the functionality duplicated between the visitor
and surface builder.

v2: Drop VEC4 suport.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3af2623da5167aa686bcb2cff01d27058a507026 20-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965: Lift the constness restriction on surface indices passed to untyped ops.

v2: Update NIR atomic intrinsic handling too (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b406c34a65677cac2517336d93ab279c3d35fce6 23-Jul-2015 Dave Airlie <airlied@redhat.com> i965: fix warning since tess merge.

Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
fadf34773527779eef4622b2586d87ec00476c0f 13-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965: Fix stride field for the result of emit_uniformize().

This is essentially the same problem fixed in an earlier patch for
immediates. Setting the stride to zero will be particularly useful
for my future SIMD lowering pass, because we will be able to just
check whether the stride of a source register is zero and skip
emitting the copies required to unzip it in that case.

Instead of setting stride to zero in every caller of emit_uniformize()
I've changed the function to return the result as its return value
(previously it was being written into a caller-provided destination
register), because this way we can enforce that the result is used with
the correct regioning from the function itself.

The changes to the prototype of its VEC4 counterpart are mainly for
the sake of symmetry, VEC4 registers don't have stride.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8ba1982b1e37aa69680e243fe391254211ae273a 17-Jul-2015 Alejandro Piñeiro <apinheiro@igalia.com> i965/nir/fs: removed unneeded support for global variables

As functions are inlined, and nir_lower_global_vars_to_local gets
run, all global variables are lowered to local variables.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b00cd6e4a0f9a84d514f428428be348900236e2e 09-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965: Implement nir_op_uadd_carry and _usub_borrow without accumulator.

This gets rid of two no16() fall-backs and should allow better
scheduling of the generated IR. There are no uses of usubBorrow() or
uaddCarry() in shader-db so no changes are expected. However the
"arb_gpu_shader5/execution/built-in-functions/fs-usubBorrow" and
"arb_gpu_shader5/execution/built-in-functions/fs-uaddCarry" piglit
tests go from 40 to 28 instructions. The reason is that the plain ADD
instruction can easily be CSE'ed with the original addition, and the
b2i negation can easily be propagated into the source modifier of
another instruction, so effectively both operations are performed with
just one instruction.

v2: Rely on carry_to_arith() and borrow_to_arith() to lower these
(Ilia Mirkin).

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3ee2daf23dc91b8dfc017b5c89c10ab1376ba4df 10-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965: Implement b2f and b2i using negation.

Booleans are represented as 0/-1 on modern hardware which means we can
just negate them to convert them into a numeric type. Negation has
the benefit that it can be implemented using a source modifier which
can easily be propagated into some other instruction. shader-db
results on HSW:

total instructions in shared programs: 6349082 -> 6346693 (-0.04%)
instructions in affected programs: 40948 -> 38559 (-5.83%)
helped: 123
HURT: 1
GAINED: 1
LOST: 0

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
308c0bf74307af0f3385cdcbb00aa0534ec3e5da 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Switch on shader stage in nir_setup_outputs().

Adding new shader stages to a switch statement is less confusing than an
if-else-if ladder where all but the first case are fragment shader
specific (but don't claim to be).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
73d0e7f3451eaeb62ac039d2dcee1e1c6787e3db 02-Jul-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Fix matNxM vertex attributes where M != 4.

Matrix vertex attributes have their columns padded out to vec4s, which
I was failing to account for. Scalar NIR expects them to be packed,
however.

Fixes 1256 dEQP tests on Broadwell.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7009e2683ebb917393d87639f549588f22c03a32 06-Jul-2015 Francisco Jerez <currojerez@riseup.net> i965/gen4-5: Enable 16-wide dispatch on shaders with control flow.

This was probably disabled due to a combination of several bugs in the
generator code (fixed earlier in this series) and a misunderstanding
of the hardware spec. The documentation for most control flow
instructions mentions among other restrictions:

"Instruction compression is not allowed."

This however doesn't have any implications on 16 wide not being
supported, because none of the control flow instructions have
multi-register operands (control flow instructions are not compressed
on more recent hardware either, except maybe SNB's IF with inline
compare). In fact Gen4-5 had 16-wide control flow masks and stacks,
and the spec mentions in several places that control flow instructions
push and pop 16 channels worth of data -- Otherwise there doesn't seem
to be any indication that it shouldn't work.

Causes no piglit regressions, and gives the following shader-db
results on ILK:

total instructions in shared programs: 4711384 -> 4711384 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 1215
LOST: 0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
493af150fb3b1c007d791b24dcd5ea8a92ad763c 03-Jul-2015 Neil Roberts <neil@linux.intel.com> i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRA

On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
the shader sends a message to the pixel interpolator. This fixes the
interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
but that is not implemented anywhere so it's not a regression.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7abc1e3286bc4729e144d3a247c2a275e46aaf53 02-Jul-2015 Neil Roberts <neil@linux.intel.com> i965/fs: Don't disable SIMD16 when using the pixel interpolator

There was a comment saying that in SIMD16 mode the pixel interpolator
returns coords interleaved 8 channels at a time and that this requires
extra work to support. However, this interleaved format is exactly
what the PLN instruction requires so I don't think anything needs to
be done to support it apart from removing the line to disable it and
to ensure that the message lengths for the send message are correct.

I am more convinced that this is correct because as it says in the
comment this interleaved output is identical to what is given in the
thread payload. The code generated to apply the plane equation to
these coordinates is identical on SIMD16 and SIMD8 except that the
dispatch width is larger which implies no special unmangling is
needed.

Perhaps the confusion stems from the fact that the description of the
PLN instruction in the IVB PRM seems to imply that the src1 inputs are
not interleaved so it wouldn't work. However, in the HSW and BDW PRMs,
the pseudo-code is different and looks like it expects the interleaved
format. Mesa doesn't seem to generate different code on IVB to
uninterleave the payload registers and everything is working so I can
only assume that the PRM is wrong.

I tested the interpolateAt tests on HSW and did a full Piglit run on
IVB on there were no regressions.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
89bc4c78c394e50ddb16cc089bd3ec90681342d7 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove fs_inst constructors that don't take an explicit exec_size

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f7dcc1160331462a071c54ca1067f9e2f57b55be 18-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Add a builder argument to offset()

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0ecdf04060518149e99a098caf4f6025fd6482a4 26-Jun-2015 Connor Abbott <cwabbott0@gmail.com> i965/fs: emit constants only once

Before, we would lazily emit a MOV whenever we encountered a use of a
constant. Now that we have a dedicated file for SSA values, we can
instead only emit the MOV's once, which is more consistent and prevents
us from relying on CSE to re-combine the constants when they aren't
absorbed into the instruction.

total instructions in shared programs: 6078991 -> 6073118 (-0.10%)
instructions in affected programs: 402221 -> 396348 (-1.46%)
helped: 1527
HURT: 0
GAINED: 8
LOST: 2

v2: split this out from the previous commit (Jason)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
864907e2f14523c130e6ff24c081789bb079bae1 24-Jun-2015 Connor Abbott <cwabbott0@gmail.com> i965/fs: use SSA values directly

Before, we would use registers, but set a magical "parent_instr" field
to indicate that it was actually purely an SSA value (i.e., it wasn't
involved in any phi nodes). Instead, just use SSA values directly, which
lets us get rid of the hack and reduces memory usage since we're not
allocating a nir_register for every value. It also makes our handling of
load_const more consistent compared to the other instructions.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f0e772392f1c61df6e3f253dc236eb9737fb6146 13-Mar-2015 Jordan Justen <jordan.l.justen@intel.com> i965/nir: Support barrier intrinsic function

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cfc175b40995ca4e590cd30897f6bb017e1376a3 10-Jun-2015 Chad Versace <chad.versace@intel.com> i965/fs: Fix unused variable warning

Annotate offset_components with attribute 'unused'.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
44928b799adbbf2671c482431b3b7a390118725c 08-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Remove dead IR construction code from the visitor.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
bf83a1a219af8bf82c3c721888bbe0dfc3eced34 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate translation of NIR texturing instructions to the IR builder.

v2: Don't remove assignments of base_ir just yet.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
979fe2ffee3956186017fe6c115aed53fc87ad3d 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate translation of NIR intrinsics to the IR builder.

v2: Use fs_builder::SEL instead of ::emit. Use set_condmod().

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
fe88c7ae38c72ea09ced69fb12ff00f58bdf1d6e 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate translation of NIR ALU instructions to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3632c28bde071950dc57e42eb62a65fb838c8bdc 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate translation of NIR control flow to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
9976731485abb68eb3b5ae6f11a7838977b95b5b 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate NIR variable handling to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
09733f220ac9921ce7d8c3524bc5327d8203c446 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate NIR emit_percomp() to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
546839ef639bf871feaa62ab7d811f2fc783bdcd 03-Jun-2015 Francisco Jerez <currojerez@riseup.net> i965/fs: Migrate pull constant loads to the IR builder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
87a4bc511811327a00f9bbc1b6870b7fa46675f7 21-May-2015 Martin Peres <martin.peres@linux.intel.com> mesa: reference built-in uniforms into gl_uniform_storage

This change introduces a new field in gl_uniform_storage to
explicitely say that a uniform is built-in. In the case where it is,
no storage is defined to make it clear that it is read-only from the
mesa side. I fixed all the places in the code that made use of the
structure that I changed. Any place making a wrong assumption and using
the storage straight away will just crash.

This patch seems to implement the path of least resistance towards
listing built-in uniforms in GL_ACTIVE_UNIFORM (and other APIs).

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2126c68e5cba79709e228f12eb3062a9be634a0e 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Get rid of the array elements parameter on load/store intrinsics

Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics. However, this set to 1
by every pass that ever creates a load/store intrinsic. Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1. Let's just delete it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1e4e17fbd9296cc5064aabdb351a894d10190cb6 11-May-2015 Matt Turner <mattst88@gmail.com> i965/fs: Lower integer multiplication after optimizations.

32-bit x 32-bit integer multiplication requires multiple instructions
until Broadwell. This patch just lets us treat the MUL instruction in
the FS backend like it operates on Broadwell, and after optimizations
we lower it into a sequence of instructions on older platforms.

Doing this will allow us to some extra optimization on integer
multiplies.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3bdbc1e436828606d0b549b9480e7cc28b42d159 07-May-2015 Ian Romanick <ian.d.romanick@intel.com> nir: Delete all traces of nir_op_flog

Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e0a17f6e31a8cefc173ced5f53cb2d28a842fbb6 07-May-2015 Ian Romanick <ian.d.romanick@intel.com> nir: Delete all traces of nir_op_fexp

Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e1ae0c3bc37be7b1de21ee248d674671d01da8e6 19-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Fix variable indexing of sampler arrays under non-uniform control flow.

ARB_gpu_shader5 requires sampler array indexing expressions to be
dynamically uniform, this however doesn't have any implications on the
control flow that leads to the evaluation of that expression being
uniform. Use emit_uniformize() to obtain an arbitrary live value from
the binding table index calculation instead of assuming that the first
channel is always live.

Fixes the following Piglit test cases:
arb_gpu_shader5/execution/sampler_array_indexing/fs-nonuniform-control-flow.shader_test
arb_gpu_shader5/execution/sampler_array_indexing/vs-nonuniform-control-flow.shader_test

part of the series:
http://lists.freedesktop.org/archives/piglit/2015-February/014615.html

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b234537cc3e513ded9b5385d876e4c531f72af94 19-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Fix variable indexing of UBO arrays under non-uniform control flow.

ARB_gpu_shader5 requires UBO array indexing expressions to be
dynamically uniform, this however doesn't have any implications on the
control flow that leads to the evaluation of that expression being
uniform. Use emit_uniformize() to obtain an arbitrary live value from
the binding table index calculation instead of assuming that the first
channel is always live.

Fixes the following Piglit tests:
arb_gpu_shader5/execution/ubo_array_indexing/fs-nonuniform-control-flow.shader_test
arb_gpu_shader5/execution/ubo_array_indexing/vs-nonuniform-control-flow.shader_test

part of the series:
http://lists.freedesktop.org/archives/piglit/2015-February/014616.html

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0c06d019bcf626b289ae94ca791dc25c216c1e5c 24-Apr-2015 Matt Turner <mattst88@gmail.com> i965/fs: Fix code emission for imul_high in NIR.

Copy over from brw_fs_visitor.cpp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c68364ac341d5fbbc5b6dcf74812a776359c0168 10-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Use the correct offsets when handling register indirects

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
28e9601d0e681411b60a7de8be9f401b0df77d29 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Add a devinfo field to backend_visitor and use it for gen checks

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ceb6e5eebe13b85f57cf5a7a22371c10170943a3 14-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Remove the context parameter from brw_texture_offset

It wasn't really being used anyway. We used it to assert that gpu_shader5
is supported in the back-end but that should be caught by the front-end.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
5af0604d528733af9113a6f8711c39796ce0ae40 07-Apr-2015 Matt Turner <mattst88@gmail.com> i965/fs: Calculate delta_x and delta_y together.

This lets SIMD16 programs on G45 and Gen5 use the PLN instruction.

On Ironlake:

total instructions in shared programs: 5634757 -> 5518055 (-2.07%)
instructions in affected programs: 1745837 -> 1629135 (-6.68%)
helped: 11439
HURT: 4

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b6354d9bb077815d2e388dc5d0e7411ea6d89748 24-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Make INTEL_DEBUG=ann work with NIR.

Now that we store a copy of the NIR shader, and don't immediately free
it, we can use it in annotations as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
89c1feb78d010bc457f5d02be84c955eebf3549f 08-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Create NIR during LinkShader() and ProgramStringNotify().

Previously, we translated into NIR and did all the optimizations and
lowering as part of running fs_visitor. This meant that we did all of
that work twice for fragment shaders - once for SIMD8, and again for
SIMD16. We also had to redo it every time we hit a state based
recompile.

We now generate NIR once at link time. ARB programs don't have linking,
so we instead generate it at ProgramStringNotify time.

Mesa's fixed function vertex program handling doesn't bother to inform
the driver about new programs at all (which is rather mean), so we
generate NIR at the last minute, if it hasn't happened already.

shader-db runs ~9.4% faster on my i7-5600U, with a release build.

v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother
using _mesa_program_enum_to_shader_stage as we already know it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b3e286c4575bf6af343c1a03471fd876cdfb5c43 08-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Store num_direct_uniforms in the nir_shader.

Storing this here is pretty sketchy - I don't know if any driver other
than i965 will want to use it. But this will make it a lot easier to
generate NIR code at link time. We'll probably rework it anyway.

(Ian suggested making nir_assign_var_locations_scalar_direct_first
simply modify the nir_shader's fields, rather than passing pointers
to them. If this stays long term, we should do that. But Jason and
I suspect we'll be reworking this area again in the near future.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f41f07f685e7f585e433b5fd1fadf602e74f0f1e 08-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Move lower_output_reads to brw_link_shader().

This makes it so emit_nir_code() doesn't modify the GLSL IR.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
024ecc783b763712d2896fd315d8b5222c27b1ec 11-Apr-2015 Matt Turner <mattst88@gmail.com> i965/fs/nir: Mark fallthrough.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
99264b7f37dc92bcb3a9ae226e00c9300414431c 08-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Make nir_lower_samplers take a gl_shader_stage, not a gl_program *.

We don't actually need a gl_program struct. We only used it to
translate prog->Target (i.e. GL_VERTEX_PROGRAM) to the gl_shader_stage
(i.e. MESA_SHADER_VERTEX). We may as well just pass that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d131630c0825f199768965c504b6fa1e593d03d5 02-Apr-2015 Matt Turner <mattst88@gmail.com> nir: Remove fsin_reduced/fcos_reduced.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1bd1fc248ce5ecc6882309ab64ec61835fea1eda 03-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use brw_nir_cubemap_normalize for NIR shaders

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cb966fb2bea77b1d7b1bdb6597b7b85d810f2d0a 01-Apr-2015 Eric Anholt <eric@anholt.net> i965: Use the tex projector lowering pass instead of hand-rolling it.

This only impacts the ARB_fp path. We can't quite disable the GLSL-level
lowering pass, because it needs to apply before
brw_do_lower_unnormalized_offset().

total instructions in shared programs: 5667857 -> 5667847 (-0.00%)
instructions in affected programs: 1114 -> 1104 (-0.90%)
helped: 16
HURT: 6

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b9d7454571029ab330f28164fe6869f5e455ca90 01-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Run DCE again before going out of SSA

We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS cse from having to clean it up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
37703040a142da6bc7c458479a70e35118e10e6b 23-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Run the ffma peephole after the rest of the optimizations

The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering. Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem. We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.

shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs: 989359 -> 978057 (-1.14%)
helped: 5308
HURT: 97
GAINED: 78
LOST: 5

This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program. However, the damage is caused by changing a single
instruction from a ffma to an add. This, in turn, *decreases* register
pressure in one part of the program causing it to fail to register allocate
and spill. Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a8c8b3b8720bb7ce8ac1cb94815ed36d8c881f66 21-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Add a dedicated ffma peephole optimization

i965/nir: Use the dedicated ffma peephole

total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs: 1292790 -> 1268660 (-1.87%)
helped: 5999
HURT: 457
GAINED: 4
LOST: 9

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
da294f9b2f666f487001b2a25627c867c40eb3d9 24-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir/algebraic: Add a seperate section for "late" optimizations

i965/nir: Use the late optimizations

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
826d3afb8f421a62020308813397e541e672381e 30-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add ARB_fragment_program support to the NIR backend.

Use prog_to_nir where we would normally call glsl_to_nir, handle program
parameter lists, and skip a few things that don't exist.

Using NIR generates much better shader code than Mesa IR, since we get
real optimizations, as opposed to prog_optimize:

total instructions in shared programs: 314007 -> 279892 (-10.86%)
instructions in affected programs: 285173 -> 251058 (-11.96%)
helped: 2001
HURT: 67
GAINED: 4
LOST: 7

v2: Change early return in nir_setup_uniforms to if/else (Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
649173b473ded2d7b1aded91cd4aab42eaeb5766 01-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Implement texture projection support.

Our fragment program backend implements support for TXP directly, and
there's no NIR lowering pass to remove the projection. When we switch
fragment program support over to NIR, we need to support it somehow.

It's easy enough to support directly.

v2: Split out offset/tex_offset rename (requested by Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
0a9bcf9e39409ea5acfdfbcf0c388e41e0f9ea45 25-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Rename offset to tex_offset to avoid shadowing offset().

fs_visitor::nir_emit_texture() created an fs_reg variable called offset,
which shadowed the offset() helper function in brw_ir_fs.h.

Rename the variable to tex_offset so we can still call offset().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a6d4a108d27f2b635748c583fe0507f04b3b493e 18-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Use signed integer type for booleans

FS instructions with NIR on i965:
total instructions in shared programs: 2663561 -> 2619051 (-1.67%)
instructions in affected programs: 1612965 -> 1568455 (-2.76%)
helped: 5455
HURT: 12

FS instructions with NIR on g4x:
total instructions in shared programs: 2352633 -> 2307908 (-1.90%)
instructions in affected programs: 1441842 -> 1397117 (-3.10%)
helped: 5463
HURT: 11

FS instructions with NIR on ilk:
total instructions in shared programs: 3997305 -> 3934278 (-1.58%)
instructions in affected programs: 2189409 -> 2126382 (-2.88%)
helped: 8969
HURT: 22

FS instructions with NIR on hsw (snb and ivb were similar):
total instructions in shared programs: 4109389 -> 4109242 (-0.00%)
instructions in affected programs: 109869 -> 109722 (-0.13%)
helped: 339
HURT: 190

No SIMD16 programs were gained or lost on any platform

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
41d64fa184671d372f6630deaf2401b00d4e984a 17-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Do boolean resolves on GEN <= 5

v2: A couple comment clean-ups from Matt

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2612e569e04e29500f81ed233bd86b45ef583495 17-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Properly set the predicate on the SEL used in min/max

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
235c728020af352ee0f4b7d598c951f4a4e83232 17-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Use emit_lrp for emitting flrp

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8a0946f3b1522e5f91afe14c8c3b22ba6009ed04 06-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Make an emit_discard_jump() function to reduce duplication.

This is already copied in two places, and I want to copy it to a third
place.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
46c35c61e9c5c1b56fdd9fcd4eb45591dd16d21d 18-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Sort uniforms direct-first and use two different uniform registers

Previously, we put all the uniforms into one big array. The problem with
this approach is that, as soon as there was one indirect array acces, the
backend would decide that the entire large array should be pull constants.
This commit splits the array in half: first direct-only uniforms and then
potentially-indirect uniforms. This may not be optimal, but it does let
the backend promote things to push constants.

Shader-db results on HSW:
total instructions in shared programs: 4114840 -> 4112172 (-0.06%)
instructions in affected programs: 43316 -> 40648 (-6.16%)
helped: 116
HURT: 0

v2: Set param_size[num_direct_uniforms] only if we have indirect uniforms.
This caused a bug that, strangely enough, only showed up on Broadwell
vertex shaders.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
25db44a84597960a6aea6b252bcf2c3d7e17fc74 18-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir/lower_io: Make variable location assignment a manual operation

Previously, we just assigned variable locations in nir_lower_io. Now, we
force the user to assign variable locations for us. This gives the backend
a bit more control over where variables are placed.

v2: Rename from _packed to _scalar

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
639115123efe7f71d432e24b1719adda7d23e97e 18-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Use a list instead of a hash_table for inputs, outputs, and uniforms

We never did a single hash table lookup in the entire NIR code base that I
found so there was no real benifit to doing it that way. I suppose that
for linking, we'll probably want to be able to lookup by name but we can
leave building that hash table to the linker. In the mean time this was
causing problems with GLSL IR -> NIR because GLSL IR doesn't guarantee us
unique names of uniforms, etc. This was causing massive rendering isues in
the unreal4 Sun Temple demo.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7ef0b6b367f73e24e6dd47a15d439775d3dd1297 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Add VS output support to nir_setup_outputs().

Adapted from fs_visitor::visit(ir_variable *).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
eb137117b7db6c78d6a1662730524d622301c708 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Handle VS inputs in the NIR backend.

(Jason noted that this is not a good long term solution, and we should
instead improve nir_lower_io so that this extra set of MOVs is
unnecessary. I tend to agree, but decided we could do that as a
follow-up improvement.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a5c4e7fcf52c048c02e4ee14413a574b4ff3695e 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Refactor fs_visitor::nir_setup_inputs().

No functional change. In preparation for supporting vertex shaders,
this adds a switch statement on shader stage (since vertex attributes
and fragment shader varyings will need different handling). It also
renames "varying" to "input", to be more general.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
34628a838aa96643be02cd23eb55af50025dd422 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Implement NIR intrinsics for loading VS system values.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b9dea9bc45299f19c445170a4cac27810547de00 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Lower to registers a bit later.

We can't safely call nir_optimize() with register present, since several
passes called in the loop can't handle registers, and will fail asserts.

Notably, nir_lower_vec_alus() and nir_opt_algebraic() really don't want
registers.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1f0067811c059fb3b284a2169e94fbdec7a4b909 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Optimize after nir_lower_var_copies().

Array variable copy splitting generates a bunch of stuff we want to
clean up before proceeding.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
1d8ef6ba606a88239de633e5abcc19471c9d3cf4 09-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Store a pointer to brw_sampler_prog_key_data in the visitor.

The NIR backend hardcodes brw_wm_prog_key at the moment, which won't
work when we support scalar VS. We could use get_tex(), but it's a
static method. I was going to promote it to fs_visitor, but then
realized that both parameters (stage and key) are already members.

It then occured to me that we could just set up a pointer in the
constructor, and skip having a function altogether.

This patch also converts all existing users to use key_tex.

v2: Make key_tex a "const brw_sampler_prog_key_data *" instead of
non-const; word-wrap some lines. (Review comments from Topi.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e4f26acc08a3d852e60a27d0f0da7001944cb607 28-Feb-2015 Ian Romanick <ian.d.romanick@intel.com> i965/fs: Silence unused parameter warning

brw_fs_visitor.cpp:2162:56: warning: unused parameter 'offset_components' [-Wunused-parameter]
fs_reg offset_value, unsigned offset_components,
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c6f2abe67e38c52361a1d342dca6ec5ed7747913 06-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Plumb the shader stage into glsl_to_nir().

The next commit needs to know the shader stage in glsl_to_nir().
To facilitate that, we pass the gl_shader rather than the raw exec_list
of instructions. This has both the exec_list and the stage.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b200cbb0a41aaebb007668f870a483f0b9ecd898 06-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Add native_integers to nir_shader_compiler_options.

glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the
driver supports native integers. Presumably other passes may as well.

Adding this to nir_shader_compiler_options is an easy way to provide
that information, as it's accessible via nir_shader::options.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a55da73be46b4576015417b2dff71a719bc8b797 06-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Try to make sense of the nir_shader_compiler_options code.

The code in glsl_to_nir is entirely dead, as we translate from GLSL to
NIR at link time, when there isn't a _mesa_glsl_parse_state to pass,
so every caller passes NULL.

glsl_to_nir seems like the wrong place to try and create the shader
compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other
translators all would have to duplicate that code. The driver should
set this up once with whatever settings it wants, and pass it in.

Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[]
and left a comment saying: "The memory for the options is expected to be
kept in a single static copy by the driver." This suggests the plan was
to do exactly that. That pointer was not marked const, however, and the
dead code used a mix of static structures and ralloced ones.

This patch deletes the dead code in glsl_to_nir, instead making it take
the shader compiler options as a mandatory argument. It creates an
(empty) options struct in the i965 driver, and makes NirOptions point
to that. It marks the pointer const so that we can actually do so
without generating "discards const qualifier" compiler warnings.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a84f66a9b6cf46bb19ca71faca5b1d6d81209caf 06-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Resolve source modifiers on Gen8+ logic operations.

On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and
negate changes meaning to bitwise-not (~, not -). This isn't what NIR
expects, so we should resolve the source modifers via a MOV.

+30 Piglits (fs-op-bit{and,or,xor}-not-abs-*).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
5666d9266fd43d552c76ce7b472abc0afde6c32b 28-Feb-2015 Matt Turner <mattst88@gmail.com> i965/fs/nir: Mark fallthrough.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
54cd2f7c9655ccbb00209b1f49692196df2a33a1 28-Feb-2015 Matt Turner <mattst88@gmail.com> i965/fs/nir: Mark fallthrough.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b8a1637119249c1d5e76c27d0053360bbb7f4e77 27-Feb-2015 Ian Romanick <ian.d.romanick@intel.com> i965/fs/nir: Use emit_math for nir_op_fpow

It appears that all the other instructions that need it already use it.
This one just got missed.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8eb6c109994de2827b0a1340a2dc8d933edaf5e0 20-Aug-2014 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Handle conditional discards.

The discard condition tells us which channels we want killed. We want
to invert that condition to get the channels that should survive (remain
live) in f0.1. Emit a CMP to negate it.

Nothing generates these today, but that will change shortly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b8582d18e6b0737c4a34777837c10898ed177e30 15-Feb-2015 Matt Turner <mattst88@gmail.com> i965/fs/nir: Optimize integer multiply by a 16-bit constant.

Gen8+ support was just broken, since MUL now consumes 32-bits from both
sources. Fixes 986 piglit tests on my BDW.

total instructions in shared programs: 7753873 -> 7753522 (-0.00%)
instructions in affected programs: 28164 -> 27813 (-1.25%)
helped: 77
GAINED: 47

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7a997a386375a98b70ae5e1d880c8d47f236de8d 15-Feb-2015 Matt Turner <mattst88@gmail.com> i965/fs/nir: Optimize (gl_FrontFacing ? x : y) where x and y are ±1.0.

total instructions in shared programs: 7756214 -> 7753873 (-0.03%)
instructions in affected programs: 455452 -> 453111 (-0.51%)
helped: 2333

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2bd139e18c941e7ea0870ba43314a5c10fd5bb12 19-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Un-hardcode DEBUG_WM, "FS", and "fragment".

These code paths can (or will) be used for other shader stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
231267bf011e1fa6edb52ffad27fcbca8e0e28e1 31-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Use VARYING_SLOT checks rather than strcmp().

Comparing the location field is equivalent and more efficient.

We'll also need this when we start using NIR for ARB programs, as our
NIR converter will set the location field correctly, but probably won't
use the GLSL names for these concepts.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3c57a595276d0614940d70315e78de0d83bf74ac 14-Feb-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Don't support gl_FrontFacing as an input variable

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
785b22caee28892d9d995a743de1dee5434c9ce1 14-Feb-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Add support for nir_intrinsic_load_front_face

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
6489cb1ae6f3cb999b1a9c60d941ef4c388febd1 11-Feb-2015 Eric Anholt <eric@anholt.net> i965: Shut up a compiler warning about uninitialized var.

We always pass this argument, even if it won't be used by the particular
texture op.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ccbe15f3325d7a6d04d0ea18227a08f53decec16 03-Feb-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Fix saturate on MAD and LRP with the NIR backend.

Fixes misrendering in "Witcher 2" with INTEL_USE_NIR=1, and probably
many other programs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ab24e1270674192d2aeb4ba0cc39497edb3342f8 03-Feb-2015 Connor Abbott <cwabbott0@gmail.com> i965/nir: use redundant phi optimization

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
8776b1b14b229d110f283f5da8c3c36261068ede 22-Jan-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Get rid of get_alu_src

Originally, get_alu_src was supposed to handle resolving swizzles and
things like that. However, now that basically every instruction we have
only takes scalar sources, we don't really need it anymore. The only case
where it's still marginally useful is for the mov and vecN operations that
are left over from SSA form. We can handle those cases as a special case
easily enough. As a side-effect, we don't need the vec_to_movs pass
anymore.

v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we detect if we need an extra copy for swizzling. The
old code involved a pile of confusing switch fall-throughs; we now use a
loop.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
112d738b91aac44c2509aafe68bdbf9ab74bb3c1 23-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use NIR's scalarizing abilities and stop handling vectors

Now that we can scalarize with NIR, there's no need for all this code
anymore. Let's get rid of it and just do scalar operations.

v2: run copy prop before lowering phi nodes

v3: Get rid of the "emit(...)->saturate = foo" pattern

v4: Run alu_to_scalar as an optimization pass

total instructions in shared programs: 5998321 -> 5974070 (-0.40%)
instructions in affected programs: 732075 -> 707824 (-3.31%)
helped: 3137
HURT: 191
GAINED: 18
LOST: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f02f1af9f7582bc9ca685ef240751aa57ce42638 23-Jan-2015 Ian Romanick <ian.d.romanick@intel.com> i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful

If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.

No piglit regressions on any i965 platform in Jenkins.

total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 2089
LOST: 0

No other platforms affected in shader-db.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d7743bb1c2d5cfe44a018251d21def18eb6d4b97 21-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.

This allows us to count NIR instructions via shader-db.

Use "run" as normal. The results file will contain both NIR and
assembly.

Then, to generate a NIR report:
./report.py <(grep NIR results/foo) <(grep NIR results/bar)

Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f3e06fcc6add67ed3eeecbce600994ef3220ec1c 20-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Print NIR on INTEL_DEBUG=fs.

This is useful for debugging and looking for optimization opportunities.

It will need to be expanded when we add support for other scalar stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
faa38e16aadd9f2a2416fcb5087d7f8fc8178bf2 20-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Do optimizations again just before lowering source mods.

We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.

total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs: 22406 -> 21984 (-1.88%)
helped: 47
HURT: 0
GAINED: 0
LOST: 0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
45123ee8186cff6bb819b9c9e44d6d5a1bb41923 16-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Use offset() instead of altering reg_offset directly.

offset() properly handles reg_width, so it'll work for SIMD16.

While we're in the area, simplify a few cases, and use retype() to cut a
few more lines of code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3f263ffbb37d77f97a86686e1d2d5eeabf4ecae6 16-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...).

brw_fs_nir.cpp creates almost all of its registers via:

fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components));

When we add SIMD16 support, we'll need to set reg->width = 16 and
double the VGRF size...on pretty much every VGRF it allocates.

This patch replaces that pattern with a new "vgrf" helper method:

fs_reg reg = vgrf(num_components);

The new function correctly takes reg_width into account. For now,
reg_width is always 1, so this should have no functional change.

v2: Just make vgrf() account for reg_width right away, rather than
changing the behavior in the next patch.

v3: Replace one last virtual_grf_alloc I missed. It's used in code
that only runs for dispatch_width == 8, so it doesn't matter,
but consistency is nice.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d1533d87cc7e2c39e7ce9dc838b45a2c39c96e33 16-May-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type).

I dislike how fs_reg has a constructor that knows about fs_visitor.
Apart from that, it stands alone, with no need to interact with the
rest of the compiler. Which is sensible - a class that represents
a register should do just that. Allocating virtual register numbers
should be left up to the compiler (fs_visitor).

This patch replaces the constructor with a new fs_visitor::vgrf method,
eliminating fs_reg's dependency on fs_visitor. It ends up being no
more code.

v2: Rebase from May 2014 -> January 2015.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c56adc68e2e75276785fd933b47621c87f9fd3ee 15-Jan-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Do a final copy lowering pass before lowering locals to regs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
55b5058e69859ba28c2f32de6edf5f0df3c6c28c 14-Jan-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Rename lower_variables to lower_vars_to_ssa

The original name wasn't particularly descriptive. This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4aa6162f6ecf96c7400c17c310eba0cfd0f5e083 10-Jan-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir/tex_instr: Add a nir_tex_src struct and dynamically allocate the src array

This solves a number of problems. First is the ability to change the
number of sources that a texture instruction has. Second, it solves the
delema that may occur if a texture instruction has more than 4 sources.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cb53aacaa1555b98fa77146492e96a7e3d7631ba 17-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Handle sample ID, position, and mask better

Before, we were emitting the full pile of setup instructions for sample_id
and sample_pos every time they were used. With this commit, we emit them
in their own pass once at the beginning of the shader and simply emit uses
later on. When it comes time for setting up VS, we can put setup for its
special values in the same pass.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2c7da78805175f36879111306ac37c12d33bf65b 16-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Make load_const SSA-only

As it was, we weren't ever using load_const in a non-SSA way. This allows
us to substantially simplify the load_const instruction. If we ever need a
non-SSA constant load, we can do a load_const and an imov.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
951a7f23a076c1570f68b50fc7d03a33eb5145e7 16-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Move the other lowering passes to before out-of-SSA

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
821e75a16038aba23aa0d46c081c99f07ee44ecd 16-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir/lower_atomics: Use/support SSA

Previously, lower_atomics was non-SSA only. We assert-failed if the
destination of an atomic operation intrinsic was an SSA def and we used
temporary registers for computing offsets. This commit changes both of
these behaviors. We now use SSA values for computing offsets (so we can
optimize them) and we handle SSA destinations. We also move the pass to
run before we go out of SSA on i965 as it now generates SSA values.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
dfb3abbaecfbe30b8858a5428c604f9d90f65505 13-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Remove predication

We stopped generating predicates in glsl_to_nir some time ago. Right now,
it's all dead untested code that I'm not convinced always worked in the
first place. If we decide we want them back, we can revert this patch.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b3fd098e7daa491637d66d03366b67c989937a1f 13-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Make bcsel a fully vector operation

Previously, the condition was a scalar that applied to all components
simultaneously. As of this commit, the condition is a vector and each
component is switched seperately.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
3c2c0a164c2308a5777d7a59b6da4b44a57ba6e2 06-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Add support for indirect texture arrays

v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
62ac0ee804027d1a1fa9864e03428ced7bd8510a 05-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir/tex_instr: Rename the indirect source type and add an array size

In particular, we rename nir_tex_src_sampler_index to _sampler_offset and
add a sampler_array_size field to nir_tex_instr. This way we can pass the
size of sampler arrays through to backends even after removing the variable
information and, with it, the type.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
534d145e5ea039d57833395a36eed90721f6b272 09-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Use a source for uniform buffer indices instead of an index

In GLSL-to-NIR we were just setting the base index to 0 whenever there was
an indirect so having it expressed as a sum makes no sense. Also, while a
base offset may make sense for the memory location (first element in the
array, etc.) it makes less sense for the actual uniform buffer index. This
may change later, but it seems to make more sense for now.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
cd4b995254fe29bae9ab5a9563cc615274d361ed 05-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Make texture instruction names more consistent

This commit renames nir_instr_as_texture to nir_instr_as_tex and renames
nir_instr_type_texture to nir_instr_type_tex to be consistent with
nir_tex_instr.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
f77f4c00ce4834ca14dd27bed28949dc012e7daf 15-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Add a basic constant folding pass

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
d5410bd8f65b8d0f845dc8beccd498b6fa098660 12-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Add an algebraic optimization pass

This pass uses the previously built algebraic transformations framework and
should act as an example for anyone else wanting to make an algebraic
transformation pass for NIR.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
919426631b7bd32f012eb9b6ffd8a9aff74788e1 13-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Add a lowering pass for adding source modifiers where possible

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
a1c259d6668bf934a79e7815dff3636783adea9f 05-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Implement the ARB_gpu_shader5 interpolation intrinsics

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e257a5112476c47928b2fa2a2f2ea3108d13264b 04-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Add a has_indirect flag and clean up some of the input/output code

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
27663dbe8edfb7583d9d8fc3704a04a5c837fe05 04-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Vectorize intrinsics

We used to have the number of components built into the intrinsic. This
meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4
variants. This lead to piles of switch statements to generate the correct
intrinsic names, and introspection to figure out the number of components.
We can make things much nicer by allowing "vectorized" intrinsics.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
26865f858d48dd473fc294f7fe14c964715cd55e 27-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Use the new variable lowering code

This commit switches us over to the new variable lowering code which is
capable of properly handling lowering indirects as we go.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7c5284d0e52add862821ab13be61228e53867e62 02-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Don't dump the shader.

This is killing piglit. I'll leave the logging local

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
04fb073344b03a02d56291dd273bdef96147e857 14-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Properly saturate multiplies

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c2abfc0b86628bb1b756e4ef125c97cb4386aea2 13-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Handle SSA constants

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
e0aa4c6272851ed418dfa18ee6014f40b0e266c2 12-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Use an array rather than a hash table for register lookup

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
20adc516e27e390b1558703720a2a2129c9e8ad5 12-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Add the CSE pass and actually run in a loop

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
20a581260633cb6d0d8ca571e7f3e886298a5733 11-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Add a fused multiply-add peephole
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c937bdb3c2c41c5bf914ae7ead9223b8b87e9fe2 08-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Turn on the peephole select optimization

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2bd5a24a5e440ba0072528fdb32892cf8c935a8e 07-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Validate optimization passes

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
10adf8fc858c21cd95b3e02a8d6abee563ca1046 07-Nov-2014 Jason Ekstrand <jason.ekstrand@intel.com> nir: Differentiate between signed and unsigned versions of find_msb

We also make the return types match GLSL. The GLSL spec specifies that
findMSB and findLSB return a signed integer. Previously, nir had them
return unsigned. This updates nir's behavior to match what GLSL expects.

We also update the nir-to-fs generator to take the new instructions. While
we're at it, we fix the case where the input to findMSB is zero.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4285aaecdceac55005e1ea2e75e17c6490d158a9 12-Dec-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Do retyping for ALU srouces in get_nir_alu_src

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
63eb32950e64715a7a686ae9da82b55954db9ab8 22-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Convert the shader to/from SSA

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
ff0a9fcf332ce319fae1eb53f3e5d863d0289cbf 21-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Don't duplicate emit_general_interpolation

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
744b4e9348db1767a772fda2a5cbe33abbba7db1 16-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Add atomic counters support

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
95fbd6e1eed58f1f87aaa425bb5312a92db29d21 15-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Handle coarse/fine derivatives

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
4582341ea74a076c981c962f1a01311bfa3bf991 16-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Add support for sample_pos and sample_id
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
7cd1537aae28b9189b1251688ac1a5dc9d36cc80 16-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> Fix up varying pull constants

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b092bc9805f0f28209fc70fb367e0dc26e294317 16-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Use the correct texture offset immediate

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c181ff268e4787056fdee417d30d52b1098fe211 15-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Use the correct types for texture inputs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
c2ded36bb60d3dfad0036dac7adbf7718968ccf2 15-Oct-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs_nir: Make the sampler register always unsigned

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
2faf7f87d6a1c00b3f3d3907178a2eeeefa5d2a9 15-Aug-2014 Connor Abbott <connor.abbott@intel.com> i965/fs: add a NIR frontend

This is similar to the GLSL IR frontend, except consuming NIR. This lets
us test NIR as part of an actual compiler.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Make brw_fs_nir build again
Only use NIR of INTEL_USE_NIR is set
whitespace fixes
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs_nir.cpp