History log of /external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
0d5071db5e50629a63490639a3c86dfc65bf27ab 13-Jan-2017 Kenneth Graunke <kenneth@whitecape.org> i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data.

This fixes glxgears rendering, which had surprisingly been broken since
late October! Specifically, commit 91d61fbf7cb61a44adcaae51ee08ad0dd6b.

glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of the
gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-shaded
inner portion of the gears. This results in the same fragment program
having two different state-dependent interpolation maps: one where
gl_Color is flat, and another where it's smooth.

The problem is that there's only one gen4_fragment_program, so it can't
store both. Each FS compile would trash the last one. But, the FS
compiles are cached, so the first one would store FLAT, and the second
would see a matching program in the cache and never bother to compile
one with SMOOTH. (Clearing the program cache on every draw made it
render correctly.)

Instead, move it to brw_wm_prog_data, where we can keep a copy for
every specialization of the program. The only downside is bloating
the structure a bit, but we can tighten that up a bit if we need to.
This also lets us kill gen4_fragment_program entirely!

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
56ee2df4bf9b1e8c26cf8689f5ef20237c95466b 13-Jan-2017 Juan A. Suarez Romero <jasuarez@igalia.com> i965/vec4: Fix mapping attributes

This patch reverts 57bab6708f2bbc1ab8a3d202e9a467963596d462, which was
causing issues with ILK and earlier VS programs.

1. brw_nir.c: Revert "i965/vec4/nir: vec4 also needs to remap vs attributes"

Do not perform a remap in vec4 backend. Rather, do it later when
setup attributes

2. brw_vec4.cpp: This fixes mapping ATTRx to proper GRFn.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391
[jordan.l.justen@intel.com: merge Juan's two patches from bugzilla]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
57bab6708f2bbc1ab8a3d202e9a467963596d462 22-Apr-2016 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4/nir: vec4 also needs to remap vs attributes

Doubles need extra space, so we would need to do a remapping for vec4
too in order to take that into account. We reuse the already
existing remap_vs_attrs, but passing is_scalar, so they could
remap accordingly.

v2: code-format remap_vs_attrs_params initialization (Matt)

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b4c44ff08c1154ff3790de89f29520d178e9e0ef 08-Aug-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Use the nir_move_comparisons pass.

While the below stats are encouraging this pass will also become
very usefull for avoiding regression once
brw_do_channel_expressions() and brw_do_vector_splitting() are
disabled.

On Broadwell:

total instructions in shared programs: 13078787 -> 13060898 (-0.14%)
instructions in affected programs: 1809827 -> 1791938 (-0.99%)
helped: 4527
HURT: 157

total cycles in shared programs: 256562762 -> 256590424 (0.01%)
cycles in affected programs: 159749392 -> 159777054 (0.02%)
helped: 5583
HURT: 2289

total spills in shared programs: 14929 -> 14923 (-0.04%)
spills in affected programs: 62 -> 56 (-9.68%)
helped: 1
HURT: 0

total fills in shared programs: 20144 -> 20141 (-0.01%)
fills in affected programs: 253 -> 250 (-1.19%)
helped: 1
HURT: 3

LOST: 0
GAINED: 2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b5e682a1efa0a929574e739807ac2b046e004561 10-Aug-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Move nir_lower_locals_to_regs a bit later.

I'm going to add a boolean scheduling pass that I want run late, but
after copy propagation and dead code elimination. Yet, I don't want
to have to think about registers. So, move the register conversion
a little later.

No impact on shader-db. Suggested by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
5edc3381628d1db4468f31b1c66bb518146e35b5 09-Jan-2017 Kenneth Graunke <kenneth@whitecape.org> compiler: Merge shader_info's tcs and tes structs.

Annoyingly, SPIR-V lets you specify all of these fields in either the
TCS or TES, which means that we need to be able to store all of them
for either shader stage. Putting them in a union won't work.

Combining both is an easy solution, and given that the TCS struct only
had a single field, it's pretty inexpensive.

This patch renames the combined struct to "tess" to indicate that it's
for tessellation in general, not one of the two stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
c2acf97fcc9b32eaa9778771282758e5652a8ad4 16-Dec-2016 Juan A. Suarez Romero <jasuarez@igalia.com> nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes

So far, input_reads was a bitmap tracking which vertex input locations
were being used.

In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, any other small attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.

But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.

To avoid handling two different situations in NIR, we just choose the
latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.

As example, if in our GLSL/IR shader we have three attributes:

layout(location = 0) vec3 attr0;
layout(location = 1) dvec4 attr1;
layout(location = 2) dvec3 attr2;

then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in location 3 and 4.

Checking carefully, basically we are using slots rather than locations
in NIR.

When emitting the vertices, we do a inverse map to know the
corresponding location for each slot.

v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.

v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
45912fb908f7a1d2efbce0f1dbe81e5bc975fbe1 10-Dec-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/compiler: Use the new nir_opt_copy_prop_vars pass

We run this after nir_lower_vars_to_ssa so that as many load/store_var
intrinsics as possible before copy_prop_vars executes. This is because the
pass isn't particularly efficient (it does a lot of linear walks of a
linked list) so we'd like as much of the work as possible to be done before
copy_prop_vars runs.

Shader DB results on Sky Lake:

total instructions in shared programs: 12020290 -> 12013627 (-0.06%)
instructions in affected programs: 26033 -> 19370 (-25.59%)
helped: 16
HURT: 13

total cycles in shared programs: 137772848 -> 137549012 (-0.16%)
cycles in affected programs: 6955660 -> 6731824 (-3.22%)
helped: 217
HURT: 237

total loops in shared programs: 3208 -> 3208 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 4112 -> 4057 (-1.34%)
spills in affected programs: 483 -> 428 (-11.39%)
helped: 2
HURT: 0

total fills in shared programs: 5519 -> 5102 (-7.56%)
fills in affected programs: 993 -> 576 (-41.99%)
helped: 2
HURT: 0

LOST: 0
GAINED: 0

Broadwell had similar results. On older hardware, the impact isn't as
large because they don't advertise GL 4.5. Of the hurt programs, all but
one are hurt by a single instruction and the one is hurt by 3 instructions.
All of the helped programs, on the other hand, are helped by at least 3
instructions and one kerbal space program shader is helped by 44.59%.

The real star of the show, however, is the Gl43CSDof synmark2 benchmark
which has two shaders which are cut by 28% and 40% and the over-all runtime
performance of the benchmark on my Sky Lake laptop is improved by around
25-30% (it's a bit hard to be exact due to thermal throttling).

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
e6ae19944d977dc91bc45adff679337182c20683 24-Nov-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rework gl_TessLevel*[] handling to use NIR compact arrays.

Treating everything as scalar arrays allows us to drop a bunch of
special case input/output munging all throughout the backend.
Instead, we just need to remap the TessLevel components to the
appropriate patch URB header locations in remap_patch_urb_offsets().

We also switch to treating the TES input versions of these as ordinary
shader inputs rather than system values, as remap_patch_urb_offsets()
just makes everything work out without special handling.

This regresses one Piglit test:
arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_SHADER-array-at-limit

The compiler starts promoting the constant arrays assigned to gl_TessLevel*
to uniform arrays. Since the shader also has a uniform array that uses
the maximum number of uniform components, this puts it over the uniform
component limit enforced by the linker. This is arguably a bug in the
constant array promotion code (it should avoid pushing us over limits),
but is unlikely to penalize any real application.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
8962cc96ec2bc1eb561a438512adc5042e2c8d34 17-Dec-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use nir_opt_trivial_continues and nir_opt_if

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
715f0d06d19e7c33d98f99c764c5c3249d13b1c0 13-Dec-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: use nir loop unrolling pass

shader-db results for BDW:

total instructions in shared programs: 12589614 -> 12590119 (0.00%)
instructions in affected programs: 50525 -> 51030 (1.00%)
helped: 7
HURT: 145

total cycles in shared programs: 241524604 -> 241490502 (-0.01%)
cycles in affected programs: 1941404 -> 1907302 (-1.76%)
helped: 302
HURT: 449

total loops in shared programs: 4245 -> 2947 (-30.58%)
loops in affected programs: 1535 -> 237 (-84.56%)
helped: 1142
HURT: 0

total spills in shared programs: 14453 -> 14453 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 18984 -> 18984 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST: 26
GAINED: 15

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
e729504fb1799c3ae31cea76d73946530ef9806f 14-Sep-2016 Timothy Arceri <timothy.arceri@collabora.com> nir: pass compiler rather than devinfo to functions that call nir_optimize

Later we will pass compiler to nir_optimise to be used by the loop unroll
pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
eda3ec7957ec9324641ee75847b892885e77335f 05-Dec-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: use nir_lower_indirect_derefs() for GLSL

This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()

We want to do this pass in nir to be able to move loop unrolling
to nir.

There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.
The changes seem to be caused be the difference in the GLSL IR vs
NIR variable index lowering passes. The GLSL IR pass creates a
simple if ladder for arrays of size 4 or less, while the NIR pass
implements a binary search for all arrays regardless of size.

Shader-db results BDW:

total instructions in shared programs: 13021176 -> 13021819 (0.00%)
instructions in affected programs: 57693 -> 58336 (1.11%)
helped: 20
HURT: 190

total cycles in shared programs: 299805580 -> 299750826 (-0.02%)
cycles in affected programs: 2290024 -> 2235270 (-2.39%)
helped: 337
HURT: 442

total fills in shared programs: 19984 -> 19984 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST: 4
GAINED: 0

V2: remove the do_copy_propagation() call from the i965 GLSL IR
linking code. This call was added in f7741c52111 but since we are
moving the variable index lowering to NIR we no longer need it and
can just rely on the nir copy propagation pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
77f65b3b643d135a5eab3799a600302177dccb26 13-Dec-2016 Iago Toral Quiroga <itoral@igalia.com> i965/nir: enable lowering of texture gradient for shadow samplers

This gets the lowering on the Vulkan driver too, which is required for
hardware that does not have the sample_l_d message (up to IvyBridge).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
f90da64fc65d8da99a5a5a140be7b64c8cf5ee6a 30-Nov-2016 Iago Toral Quiroga <itoral@igalia.com> i965/nir: enable lowering of texture gradient for cube maps

This gets the lowering on the Vulkan driver too.

Fixes Vulkan CTS cube map texture gradient tests in:
dEQP-VK.glsl.texture_functions.texturegrad.*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
0291bf4db2affbfb6a7daeb562c0cf56f6a85829 06-Dec-2016 Jason Ekstrand <jason.ekstrand@intel.com> Revert "i965: use nir_lower_indirect_derefs() for GLSL"

This reverts commit 9404439a754e5640ccd98df40fa694835c0d8759. I didn't
intend to push it and it breaks clip and cull distance.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9404439a754e5640ccd98df40fa694835c0d8759 15-Aug-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: use nir_lower_indirect_derefs() for GLSL

This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()

We want to do this pass in nir to be able to move loop unrolling
to nir.

There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.

Shader-db results BDW:

total instructions in shared programs: 8705873 -> 8706194 (0.00%)
instructions in affected programs: 32515 -> 32836 (0.99%)
helped: 3
HURT: 79

total cycles in shared programs: 74618120 -> 74583476 (-0.05%)
cycles in affected programs: 528104 -> 493460 (-6.56%)
helped: 47
HURT: 37

LOST: 2
GAINED: 0
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
a8ef92b031729fca86f8615d0ca113ddf0965af9 08-Nov-2016 Jason Ekstrand <jason@jlekstrand.net> i965/compiler: Disable trig workarounds on KBL+

The precision of our trig instructions appears to have been fixed on Kaby
Lake. Neither Ben nor I can find any documentation for this. However, the
dEQP precision tests now pass with INTEL_PRECISE_TRIG=0 where they fail on
Sky Lake.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
38a8507f79b8da71b309654ce56854bbea1bcf94 17-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Use NIR-based clip/cull lowering for OpenGL as well.

The old approach works fine, and this approach isn't necessarily better.
But it at least has the advantage that Vulkan and GL use the same
approach. I originally wrote it to gain additional testing for the
new paths.

shader-db statistics show 0 instruction count changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
2e423ca1477bd212c01676c5e4828ebdb83310d8 25-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> nir: stop adjusting driver location for varying packing

As of 59864e8e020 we just use the location assigned by the front-end and
no longer need this for i965.

Since there were some issues in the logic with assigning arrays the same
driver location if they didn't start at the same location just remove it
and let other drivers implement a solution if needed when they add
ARB_enhanced_layouts support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
91d61fbf7cb61a44adcaae51ee08ad0dd6b2a03b 20-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: rewrite brw_setup_vue_interpolation()

Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier
array in gl_fragment_program which will allow us to remove it.

This change also makes the code which is only used by gen4/5 more self contained
as it now has its own gen5_fragment_program struct rather than storing the map
in brw_context. This means the interpolation map will only get processed once
and will get stored in the in memory cache rather than being processed everytime
the fs changes.

Also by calling this from the fs compile code rather than from the upload code
and using the interpolation assigned there we can get rid of the
BRW_NEW_INTERPOLATION_MAP flag.

It might not seem ideal to add a gen5_fragment_program struct however by the end
of this series we will have gotten rid of all the brw_{shader_stage}_program
structs and replaced them with a generic brw_program struct so there will only
be two program structs which is better than what we have now.

V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following
patch to fix build error.

V3 - Suggestions by Jason:
- name struct gen4_fragment_program rather than gen5_fragment_program
- don't use enum with memset()
- create interp mode set helper and simplify logic to call it
- add assert when calling function to show prog will never be NULL for
gen4/5 i.e. no Vulkan

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b 13-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> nir/i965/anv/radv/gallium: make shader info a pointer

When restoring something from shader cache we won't have and don't
want to create a nir_shader this change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
094fe3a9591ce200162d955635eee577c13f9324 13-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> nir: move nir_shader_info to a common compiler header

This will allow use to stop copying values between structs and
will also simplify handling handling these values in the shader cache.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
59864e8e02057cc6fa0448a8af067a3cf53389da 13-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Don't use nir_assign_var_locations for VS/TES/GS outputs.

Fixes spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.

v2: Remove nir_outputs field from fs_visitor (caught by Tim and Iago).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
89e1436e2d4ff0c15202708979eb36761cae4167 11-Oct-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Silence unused parameter warnings

brw_link.cpp:76:44: warning: unused parameter ‘shader_type’ [-Wunused-parameter]
gl_shader_stage shader_type,
^
brw_nir.c: In function ‘brw_nir_lower_vs_inputs’:
brw_nir.c:194:55: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
const struct gen_device_info *devinfo,
^
brw_vec4_visitor.cpp:914:37: warning: unused parameter ‘sampler’ [-Wunused-parameter]
uint32_t sampler,
^
brw_vec4_visitor.cpp:1146:34: warning: unused parameter ‘stream_id’ [-Wunused-parameter]
vec4_visitor::gs_emit_vertex(int stream_id)
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
36f0f0318275f65f8744ec6f9471702e2f58e6d5 07-Sep-2016 Eric Anholt <eric@anholt.net> nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.

VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.

This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise. However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.

For now, just turn on aggressive flattening on vc4. i965 will need some
tuning to avoid regressions. It does looks like this may be useful to
replace freedreno code.

Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.

vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs: 17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs: 3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs: 26085 -> 24649 (-5.51%)

v2: Update shader-db output.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
111f6b250d01fa1937103f24b5cb54b15dd77fbf 14-Sep-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Roll set_default_interpolation into lower_fs_inputs

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
246db0063eb6e01aad961b1c73d32fca911ae1df 14-Sep-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Use NIR for handling forced per-sample interpolation

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
ed65e6ef49e17e9cae93a8f98e2968346de2bc6e 14-Sep-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir: Add a flag to lower_io to force "sample" interpolation

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
2d8a3fa7ea994ad02a40ff497109f966e3fcbeec 14-Sep-2016 Kenneth Graunke <kenneth@whitecape.org> nir: Report progress from nir_lower_phis_to_scalar.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
32630e211e60a2b41388d403cfbd4f43344d8590 14-Sep-2016 Kenneth Graunke <kenneth@whitecape.org> nir: Report progress from nir_lower_alu_to_scalar.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
527f37199929932300acc1688d8160e1f3b1d753 23-Aug-2016 Jason Ekstrand <jason.ekstrand@intel.com> intel: s/brw_device_info/gen_device_info/

Generated by:

sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
7dac8820730777756c00d7024330517848dc3b9f 22-Jul-2016 Francisco Jerez <currojerez@riseup.net> i965/fs: Rework representation of fragment output locations in NIR.

The problem with the current approach is that driver output locations
are represented as a linear offset within the nir_outputs array, which
makes it rather difficult for the back-end to figure out what color
output and index some nir_intrinsic_load/store_output was meant for,
because the offset of a given output within the nir_output array is
dependent on the type and size of all previously allocated outputs.
Instead this defines the driver location of an output to be the pair
formed by its GLSL-assigned location and index (I've borrowed the
bitfield macros from brw_defines.h in order to represent the pair of
integers as a single scalar value that can be assigned to
nir_variable_data::driver_location). nir_assign_var_locations is no
longer useful for fragment outputs.

Because fragment outputs are now allocated independently rather than
within the nir_outputs array, the get_frag_output() helper becomes
necessary in order to obtain the right temporary register for a given
location-index pair.

The type_size helper passed to nir_lower_io is now type_size_dvec4
rather than type_size_vec4_times_4 so that output array offsets are
provided in terms of whole array elements rather than in terms of
scalar components (dvec4 is the largest vector type supported by the
GLSL so this will cause all individual fragment outputs to have a size
of one regardless of the type).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9f32721f8695f3e55849dce015da3b53d1af5d57 21-Jul-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Enable NIR lowering of txf and rect offsets

This fixes the following piglit tests on gen6+:

tex-miplevel-selection textureProjGradOffset 2DRect
tex-miplevel-selection textureGradOffset 2DRect
tex-miplevel-selection textureGradOffset 2DRectShadow
tex-miplevel-selection textureProjGradOffset 2DRect_ProjVec4
tex-miplevel-selection textureProjGradOffset 2DRectShadow

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
7f53fead5cf9a85c74a94d359dd5fccfbb87856c 23-May-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: enable component packing for vs and fs

Rather than trying to work out the total number of components
used at a location we simply treat all outputs as vec4s. This
removes the need for complex code looping over varyings to match
packed locations and the need for storing the total number of
components used at each location.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
1eef0b73aa323d94d5a080cd1efa81ccacdbd0d2 12-Jul-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rewrite FS input handling to use the new NIR intrinsics.

This eliminates the need to walk the list of input variables, recurse
into their types (via logic largely redundant with nir_lower_io), and
interpolate all possible inputs up front. The backend no longer has
to care about variables at all, which eliminates complications from
trying to pack multiple variables into the same location. Instead,
each intrinsic specifies exactly what's needed.

This should unblock Timothy's work on GL_ARB_enhanced_layouts.

Each load_interpolated_input intrinsic corresponds to PLN instructions,
while load_barycentric_at_* intrinsics correspond to pixel interpolator
messages. The pixel/centroid/sample barycentric intrinsics simply refer
to payload fields (delta_xy[]), and don't actually generate any code.

Because we use a single intrinsic for both centroid-qualified variables
and interpolateAtCentroid(), they become indistinguishable. We stop
sending pixel interpolator messages for those, and instead use the
payload provided data, which should be considerably faster.

On Broadwell:

total instructions in shared programs: 9067751 -> 9067570 (-0.00%)
instructions in affected programs: 145902 -> 145721 (-0.12%)
helped: 422
HURT: 209

total spills in shared programs: 2849 -> 2899 (1.76%)
spills in affected programs: 760 -> 810 (6.58%)
helped: 0
HURT: 10

total fills in shared programs: 3910 -> 3950 (1.02%)
fills in affected programs: 617 -> 657 (6.48%)
helped: 0
HURT: 10

LOST: 3
GAINED: 3

The differences mostly appear to be slight changes in MOVs.

v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics
flag rather than passing it directly to nir_lower_io. Use the
unreachable() macro rather than assert in one place. (Review
feedback from Chris Forbes.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
448adfbc67f4f6d0268a2f94dac311a26dc19864 18-May-2016 Timothy Arceri <timothy.arceri@collabora.com> nir: use the same driver location for packed varyings

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
a8a9d1bf41c00123cefb6e757f3509c62e880a15 14-Jun-2016 Timothy Arceri <timothy.arceri@collabora.com> i965: remove type_size_vec4_times_4()

type_size_vec4_times_4() was introduced as a fix in 8dcf807cb43383
however since 3810c1561 we can just use type_size_scalar() and
get the actual number of outputs we need.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
5e43ba7e9e9bfce451f9caa3845136f8a5b6eda0 26-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Move brw_create_nir to brw_program.c

This way it's no longer part of libi965_compiler.la since it depends on
GLSL and ARB program stuff.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
86a2447eec7e87e46e842ca7a3ad5cd9fadb1ca5 26-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Move the type_size_*_bytes functions to brw_nir.h

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
32210dea8e474f8e93f5df681fb6a8265a0cda4b 26-May-2016 Jason Ekstrand <jason.ekstrand@intel.com> compiler: Move glsl_to_nir to libglsl.la

Right now libglsl.la depends on libnir.la so putting it in libnir.la
adds a dependency on libglsl.la that goes the wrong direction.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
654e950cba55dabd2d9accb60db8e5f4c1495716 02-May-2016 Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com> i965: Invoke lowering pass for YUV textures

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
0970c563d6e4a30ab0852ef664dc97a995997f88 20-May-2016 Dave Airlie <airlied@redhat.com> nir: remove dead glsl variables before lowering io.

For cull distance GLSL will let unsized unused arrays get
into the backend, we should nuke those straight away, to
save caring about them later.

This fixes:
arb_separate_shader_objects/linker/large-number-of-unused-varyings
as a side effect (even without culling changes).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
dac10e8a1390711f1f36f224644c4a33586cebe3 17-May-2016 Kenneth Graunke <kenneth@whitecape.org> i965, anv: Use NIR FragCoord re-center and y-transform passes.

This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.

This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height. This led to a lot of
recompiles in many applications.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
d6281a9d955ad97f993927bc214e4b641cfbe359 15-Apr-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965: take care of doubles when lowering VS inputs

Input attributes can require 2 vec4 or 1 vec4 depending on whether they
are double-precision or not.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b0fb08e179d784ca319c3c547a874fd24ce93c3f 01-Apr-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965: take care of doubles when remapping VS attributes

Double-precision types require 1 slot in VUE for double and dvec2, and 2 slots for
anything else.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
dfbabc6bad775e1575ff4a97a3c871341cd57f77 25-Mar-2016 Rob Clark <robclark@freedesktop.org> nir/lower-io: add support for lowering inputs

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b085016f94721a6c18f7076fc37c450a98e6bdbc 25-Mar-2016 Rob Clark <robclark@freedesktop.org> nir: rename lower_outputs_to_temporaries -> lower_io_to_temporaries

Since it will gain support to lower inputs, give it a more generic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
47fcef9a209e533103a7ecf4d69440a67aa463ed 25-Mar-2016 Rob Clark <robclark@freedesktop.org> nir: move callsite of lower_outputs_to_temporaries

Going to convert this pass to parameterized lower_io_to_temporaries, and
we want the user to be able to specify whether to lower outputs or
inputs or both. The restriction of running this pass before validate
to avoid output reads no longer applies.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
30424fd25a2f6554c35272d8edeacab0299ad8cc 07-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965: use pack/unpackDouble lowering

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
bea2f8beb53450bd07e5a33d48f00a9e9520645d 04-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965: use double lowering pass

v2: also lower trunc, ceil, floor, fract and roundEven (Iago)
v3: also lower mod for doubles (Sam)

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9464d8c49813aba77285e7465b96e92a91ed327c 27-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir: Switch the arguments to nir_foreach_function

This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:

s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
707e72f13bb78869ee95d3286980bf1709cba6cf 27-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir: Switch the arguments to nir_foreach_instr

This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:

s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/

and similar expressions for nir_foreach_instr_safe etc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
7efff10585122d484dc3adab14af9380b9b8f309 13-Apr-2016 Connor Abbott <cwabbott0@gmail.com> i965/nir: fixup for new foreach_block()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b6dc940ec273252678d40707d300851fa1c85ea5 13-Apr-2016 Connor Abbott <cwabbott0@gmail.com> nir: rename nir_foreach_block*() to nir_foreach_block*_call()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b63a98b1211d22f759ae9c80b2270fe2d3b2639e 25-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> nir/dead_variables: Configurably work with any variable mode

The old version of the pass only worked on globals and locals and always
left inputs, outputs, uniforms, etc. alone.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
bfd17c76c1267756ea16051cbe174cb23ff49f44 08-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Port INTEL_PRECISE_TRIG=1 to NIR.

This makes the extra multiply visible to NIR's algebraic optimizations
(for constant reassociation) as well as constant folding. This means
that when the result of sin/cos are multiplied by an constant, we can
eliminate the extra multiply altogether, reducing the cost of the
workaround.

It also means we only have to implement it one place, rather than in
both backends.

This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion,
which has a ton of sin() calls, but always multiplies them by an
immediate constant. The extra multiply gets folded away.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b0dffdc616801a1fd8534502e11ac840369041ab 08-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Pass brw_compiler into brw_preprocess_nir() instead of is_scalar.

I want to be able to read other fields.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
084b24f5582567ebf5aa94b7f40ae3bdcb71316b 16-Mar-2016 Iago Toral Quiroga <itoral@igalia.com> nir: rename nir_const_value fields to include bitsize information

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9076c4e289de0debf1fb2a7237bdeb9c11002347 14-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> nir: update opcode definitions for different bit sizes

Some opcodes need explicit bitsizes, and sometimes we need to use the
double version when constant folding.

v2: fix output type for u2f (Iago)

v3: do not change vecN opcodes to be float. The next commit will add
infrastructure to enable 64-bit integer constant folding so this is isn't
really necessary. Also, that created problems with source modifiers in
some cases (Iago)

v4 (Jason):
- do not change bcsel to work in terms of floats
- leave ldexp generic

Squashed changes to handle different bit sizes when constant
folding since otherwise we would break the build.

v2:
- Use the bit-size information from the opcode information if defined (Iago)
- Use helpers to get type size and base type of nir_alu_type enum (Sam)
- Do not fallback to sized types to guess bit-size information. (Jason)

Squashed changes in i965 and gallium/nir drivers to support sized types.
These functions should only see sized types, but we can't make that change
until we make sure that nir uses the sized versions in all the relevant places.
A later commit will address this.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
7d021cb15e6d67ecef8b020fd36c4a680bcc9c39 18-Jan-2016 Jordan Justen <jordan.l.justen@intel.com> i965/nir: Lower nir compute shader shared variables

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
a0294c2cf3d23943c7f365abf84765afa0f383a2 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Simplify brw_nir_lower_vue_inputs() slightly.

The same code appeared in both branches; pull it above the if statement.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
8151003ade952c3e9d8284fada9237e1311cf173 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Avoid recalculating the normal VUE map for IO lowering.

The caller already computes it. Now that we have stage specific
functions, it's really easy to pass this in.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
15b3639bf1b0676e74b107d74653185eedbc6688 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Avoid recalculating the tessellation VUE map for IO lowering.

The caller already computes it. Now that we have stage specific
functions, it's really easy to pass this in.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
cfbd9831f89ef165e7998d0b8524a1aefedec404 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions.

Now that each stage is directly calling brw_nir_lower_io(), and we have
per-stage helper functions, it makes sense to just call the relevant one
directly, rather than going through multiple switch statements.

This also eliminates stupid function parameters, such as the two that
only apply to vertex attributes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b96ddd2e52e7205e5820714f2ad1028b666426c6 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Split brw_nir_lower_inputs/outputs into per-stage functions.

These functions are both giant switch statements where most cases don't
overlap at all. Let's put the bulk of the work in per-stage helpers.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
d33c478bedc6ac0b2bd2646e443e690cdfb2b640 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Remove catch-all nir_lower_io call with specific cases.

Most cases already call nir_lower_io explicitly for input and output
lowering. This catch all isn't very useful anymore - we can just add it
to the remaining cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
51f87979934100af8c40cfa17f670bc38417ddc0 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Move optimizations from brw_nir_lower_io to brw_postprocess_nir.

This simplifies things. Every caller of brw_nir_lower_io() immediately
calls brw_postprocess_nir(). The only real change this will have is
that we get an extra brw_nir_optimize() call when compiling compute
shaders, but that seems fine.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
dcd4a841e9096b988ea3ca2779e7c8b1ca5e5747 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Always do NIR IO lowering at specialization time.

We've now hit literally every case other than geometry shaders (and
compute shaders, but those are a no-op). So, let's just move geometry
shaders over too and be done with it.

The only advantage to doing this at link time was to save the expense
of running the pass on recompiles. But we're already running a lot of
passes, and the extra code complexity isn't worth it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
b3cb6e78aa219ad73c145a25ee1bb48fd8b025d0 17-Feb-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Do lower_io late for fragment shaders

The Vulkan driver wants to be able to delete fragment outputs that are
beyond key.nr_color_regions; this is a lot easier if we lower outputs at
specialization time rather than link time.

(Rationale added to commit message by Ken)

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
d56ae2d1605fc1b5a3fdf5aba9aefc3c7692a4ba 14-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Apply VS attribute workarounds in NIR.

This patch re-implements the pre-Haswell VS attribute workarounds.
Instead of emitting shader code in the vec4 backend, we now simply
call a NIR pass to emit the necessary code.

This simplifies the vec4 backend. Beyond deleting code, it removes
the primary use of ATTR as a destination. It also eliminates the
requirement that the vec4 VS backend express the ATTR file in terms
of VERT_ATTRIB_* locations, giving us a bit more flexibility.

This approach is a little different: rather than munging the attributes
at the top, we emit code to fix them up when they're accessed. However,
we run the optimizer afterwards, so CSE should eliminate the redundant
math. It may even be able to fuse it with other calculations based on
the input value.

shader-db does not handle non-default NOS settings, so I have no
statistics about this patch.

Note that the scalar backend does not implement VS attribute
workarounds, as they are unnecessary on hardware which allows SIMD8 VS.

v2: Do one multiply for FIXED rescaling and select components from
either the original or scaled copy, rather than multiplying each
component separately (suggested by Matt Turner).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
74f956c416d5b0b37b4c2d6b957167bb203502c3 22-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Use nir_lower_load_const_to_scalar().

I don't know why, but we never hooked up this pass Eric wrote.
Otherwise, you can end up with stupid scalarized code such as:

vec4 ssa_7 = load_const (0.0, 0.0, 0.0, 0.0)
vec4 ssa_8 = ...
vec1 ssa_9 = feq ssa_8, ssa_7
vec1 ssa_10 = feq ssa_8.y, ssa_7.y
vec1 ssa_11 = feq ssa_8, ssa_7.z
vec1 ssa_12 = feq ssa_8.y, ssa_7.w

ssa_8.xyxy == <0, 0, 0, 0> should only take two feq instructions.

shader-db on Skylake:

total instructions in shared programs: 9121153 -> 9120749 (-0.00%)
instructions in affected programs: 32421 -> 32017 (-1.25%)
helped: 277
HURT: 69

total cycles in shared programs: 69003364 -> 69000912 (-0.00%)
cycles in affected programs: 899186 -> 896734 (-0.27%)
helped: 313
HURT: 403

This also prevents regressions when disabling channel expressions.

v2: Don't call opt_cse afterwards (requested by Matt). It should
happen in the optimization loop below anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
a39a8fbbaa129f4e52f2a3ad2747182e9a74d910 17-Jan-2016 Emil Velikov <emil.velikov@collabora.com> nir: move to compiler/

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
3657cbf24f3b0baf7e3382e572d97a36b0ed4103 14-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Apply add_const_offset_to_base for vec4 VS inputs too.

This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
a3500f943e2c61c0aed043108132f35b79d16676 14-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Make add_const_offset_to_base() work at the shader level.

This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
317628dbb35d03d1e855332c892594ae491c5d24 18-Nov-2015 Rob Clark <robclark@freedesktop.org> nir: extract out helper macros for running passes

Note these are a bit uglier, due to avoidance of GNU C extensions. But
drivers which do not need to be built with compilers that don't support
the extension can wrap these macros with their own.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
237f2f2d8b45d9d956102eec6f9be63193e5269b 26-Dec-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Get rid of function overloads

When Connor originally drafted NIR, he copied the same function+overload
system that GLSL IR had with a few names changed. However, this
double-indirection is not really needed and has only served to confuse
people. Instead, let's just have functions which may not have unique names
and may or may not have an implementation. If someone wants to do overload
resolving, they can hav a hash table based function+overload system in the
overload resolving pass. There's no good reason to keep it in core NIR.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>

ir3 bits are

Reviewed-by: Rob Clark <robclark@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
794eb9d7270456ab3d2cadbaf302192eca7f4dbc 08-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Handle mix-and-match TCS/TES with separate shader objects.

GL_ARB_separate_shader_objects allows the application to mix-and-match
TCS and TES programs separately. This means that the interface between
the two stages isn't known until the final SSO pipeline is in place.

This isn't a great match for our hardware: the TCS and TES have to agree
on the Patch URB entry layout. Since we store data as per-patch slots
followed by per-vertex slots, changing the number of per-patch slots can
significantly alter the layout. This can easily happen with SSO.

To handle this, we store the [Patch]OutputsWritten and [Patch]InputsRead
bitfields in the TCS/TES program keys, introducing program recompiles.
brw_upload_programs() decides the layout for both TCS and TES, and
passes it to brw_upload_tcs/tes(), which store it in the key.

When creating the NIR for a shader specialization, we override
nir->info.inputs_read (and friends) to the program key's values.
Since everything uses those, no further compiler changes are needed.
This also replaces the hack in brw_create_nir().

To avoid recompiles, brw_precompile_tes() looks to see if there's a
TCS in the linked shader. If so, it accounts for the TCS outputs,
just as brw_upload_programs() would. This eliminates all recompiles
in the non-SSO case. In the SSO case, there should only be recompiles
when using a TCS and TES that have different input/output interfaces.

Fixes Piglit's mix-and-match-tcs-tes test.

v2: Pull the brw_upload_programs code into a brw_upload_tess_programs()
helper function (requested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
01b1b44d31adde3954d1f1404ca66f90d87d4ae5 08-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Defer input lowering for tessellation stages until specialization.

With tessellation shaders and SSO, we won't be able to always decide on
VUE map layouts at LinkProgram time. Unfortunately, we have to delay it
until shader specialization time.

However, uniform lowering cannot be deferred - brw_codegen_*_prog()
reads nir->num_uniforms. Fortunately, we don't need to defer it -
uniform, system value, atomic, and sampler lowering can safely stay
where it is. This patch moves those to brw_lower_nir()'s only caller,
renames brw_lower_nir() to brw_nir_lower_io(), and introduces calls
to that.

For non-tessellation stages, I chose to call brw_nir_lower_io() from
brw_create_nir(), so it's still done at the same time. There's no
need to defer it, and doing it at LinkProgram time is nice.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9f0944d15b9d2cd85f501f80eea7e6b6fc7f3487 10-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Make TES inputs match TCS outputs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
4fac9500100273424450b5687c4e04dfd066d08e 10-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Force VS -> TCS varyings to use the SSO VUE map layout.

The compact VUE map only works when varying packing is in use.
Unfortunately, varying packing is disabled for TCS inputs.

This is needed to fix Piglit's tcs-input-read-array-interface test.

v2: Make lines fit in 80 columns (caught by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
bee42cc1f78c480d13da879682268ee14b0d6fe7 10-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Handle TCS outputs and TES inputs.

TCS outputs and TES inputs both refer to a common "patch URB entry"
shared across all invocations. First, there are some number of
per-patch entries. Then, there are per-vertex entries accessed via
an offset for the variable and a stride times the vertex index.

Because these calculations need to be done in both the vec4 and scalar
backends, it's simpler to just compute the offset calculations in NIR.
It doesn't necessarily make much sense to use per-vertex intrinsics
afterwards, but that at least means we don't lose the per-patch vs.
per-vertex information.

v2: Use is_input/is_output helpers (suggested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
31140d097a7939e0f917aa76bd37b5c682898e63 10-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Handle TCS inputs and TES outputs.

TES outputs work exactly like VS outputs, so we can simply add a case
statement for those.

TCS inputs are very similar to geometry shaders - they're arrays of
per-vertex data. We use the same method I used for the scalar GS
backend.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9f3917bf372aa19f85875dbe30ca12adc9b67b90 10-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix partial variable access for geometry shaders in SSO mode.

Without varying packing, if a VS writes a compound variable, and the GS
only reads part of it, the base location of the variable may not
actually be in the VUE map.

To cope with this, we do lowering in terms of varying slots, add any
constant offsets to the base, and then do the VUE map remapping. This
ensures we only look up VUE map entries for slots which actually exist.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
8c4deb10dfbef683a2052a7bd62450aa76ad8fde 09-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Separate base offset/constant offset combining from remapping.

My tessellation branch has two additional remap functions. I don't want
to replicate this logic there.

v2: Handle inputs/outputs separately (suggested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
78b81be627734ea7fa50ea246c07b0d4a3a1638a 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: Get rid of *_indirect variants of input/output load/store intrinsics

There is some special-casing needed in a competent back-end. However, they
can do their special-casing easily enough based on whether or not the
offset is a constant. In the mean time, having the *_indirect variants
adds special cases a number of places where they don't need to be and, in
general, only complicates things. To complicate matters, NIR had no way to
convdert an indirect load/store to a direct one in the case that the
indirect was a constant so we would still not really get what the back-ends
wanted. The best solution seems to be to get rid of the *_indirect
variants entirely.

This commit is a bunch of different changes squashed together:

- nir: Get rid of *_indirect variants of input/output load/store intrinsics
- nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect
- nir/lower_io: Get rid of load/store_foo_indirect
- i965/fs: Get rid of load/store_foo_indirect
- i965/vec4: Get rid of load/store_foo_indirect
- tgsi_to_nir: Get rid of load/store_foo_indirect
- ir3/nir: Use the new unified io intrinsics
- vc4: Do all uniform loads with byte offsets
- vc4/nir: Use the new unified io intrinsics
- vc4: Fix load_user_clip_plane crash
- vc4: add missing src for store outputs
- vc4: Fix state uniforms
- nir/lower_clip: Update to the new load/store intrinsics
- nir/lower_two_sided_color: Update to the new load intrinsic

NIR and i965 changes are

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

NIR indirect declarations and vc4 changes are

Reviewed-by: Eric Anholt <eric@anholt.net>

ir3 changes are

Reviewed-by: Rob Clark <robdclark@gmail.com>

NIR changes are

Acked-by: Rob Clark <robdclark@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
18069dce4a4c3d71e6afc6b10bfa7bee0560ba9c 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Make uniform offsets be in terms of bytes

This commit pushes makes uniform offsets be terms of bytes starting with
nir_lower_io. They get converted to be in terms of vec4s or floats when we
cram them in the UNIFORM register file but reladdr remains in terms of
bytes all the way down to the point where we lower it to a pull constant
load.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
aa35b0c2c71f054f72df5a85779d0862fa7d6e4a 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Get rid of the nir_inputs array

It's not really buying us anything at this point. It's just a way of
remapping one offset namespace onto another. We can just use the location
namespace the whole way through.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
f36993b46962eab4446bc1964eb47149751aee26 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Clean up #includes in the compiler.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
d278e31459374feb18edd97d5adaacccc08f978a 18-Nov-2015 Rob Clark <robclark@freedesktop.org> util: move brw_env_var_as_boolean() to util

Kind of a handy function. And I'll want it available outside of i965
for common nir-pass helpers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nhaehnle@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
d9b8fde963a53d4e06570d8bece97f806714507a 12-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use NIR for lowering texture swizzle

Now that nir_lower_tex can do texture swizzle lowering, we can use that
instead of repeating more-or-less the same code in both backends. This
both allows us to share code and means that things like the tg4
work-arounds are somewhat simpler because they don't have to take the
swizzle into account.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
6c8ba59cff14a1a86273f4008ff2a8e68335ab25 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use nir_lower_tex for texture coordinate lowering

Previously, we had a rescale_texcoords helper in the FS backend for
handling rescaling of texture coordinates. Now that we can do variants in
NIR, we can use nir_lower_tex to do the rescaling for us. This allows us
to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and
GL_CLAMP handling in vertex and geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
1417f6a216b46dbbaa1bfe0cef97e2b4a48224c0 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir/lower_tex: Report progress

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
ce767bbdfff7c2a7829b652c111a11eb9ddba026 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Move postprocess_nir to codegen time

This allows us to insert NIR passes between initial NIR compilation and
optimization (link time) and actual backend code-gen. In particular, it
will allow us to do shader variants in NIR and share some of that shader
variant code between backends.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9cf108193b61c342c94c4cd980c4b403638e1051 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Split shader optimization and lowering into three stages

At the moment, brw_create_nir just calls the three stages in sequence so
there's not much difference. Soon, however, we will want to start doing
variants in NIR at which point the postprocessing step will have to move
from shader create time to codegen time.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
f58813842bcece3498f55ec5d582466ccff92a5e 15-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir: s/nir_type_unsigned/nir_type_uint

v2: do the same in tgsi_to_nir (Samuel)

v3: added missing cases after rebase (Iago)

v4: Add a blank space after '#' in one of the comments (Matt)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
006e4f070f08ff1e1731863940bc51de9e97b865 19-Oct-2015 Rob Clark <robdclark@gmail.com> nir: add nir_var_all enum

Otherwise, passing -1 gets you:

error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive]

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
0bee3acc2a303b4cbbac0f6f54ffc8be79bc7470 16-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Add hooks for testing nir_shader_clone

This commit adds code for testing nir_shader_clone by running it after each
and every optimization pass and throwing away the old shader. Testing
nir_shader_clone is hidden behind a new INTEL_CLONE_NIR environment
variable.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9ff71b649b4b3808a9e17ce69743c6037fd6603c 03-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Validate that NIR passes call nir_metadata_preserve().

Failing to call nir_metadata_preserve() can have nasty consequences:
some pass breaks dominance information, but leaves it marked as valid,
causing some subsequent pass to go haywire and probably crash.

This pass adds a simple validation mechanism to ensure passes handle
this properly. We add a new bogus metadata flag that isn't used for
anything in particular, set it before each pass, and ensure it *isn't*
still set after the pass. nir_metadata_preserve will reset the flag,
so correct passes will work, and bad passes will assert fail.

(I would have made these functions static inline, but nir.h is included
in C++, so we can't bit-or enums without lots of casting...)

Thanks to Dylan Baker for the idea.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
7bc097899924f40140981567c7bb52297dd801f2 03-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Add OPT() and OPT_V() macros for invoking NIR passes.

OPT() is the normal macro for passes that return booleans, while OPT_V()
is a variant that works for passes that don't properly report progress.
(Such passes should be fixed to return a boolean, eventually.)

These macros take care of calling nir_validate_shader() and setting
progress appropriately. In the future, it would be easy to add shader
dumping similar to INTEL_DEBUG=optimizer by extending the macro.

v2 (Jason Ekstrand):
- Fix an unused variable warning

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
94ff35204dba0ddbd7f5c4342206c8acba22d32f 22-Oct-2015 Eduardo Lima Mitev <elima@igalia.com> nir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver

Because the next patch will add an optimization that is specific to i965,
we want to move this loweing pass to that driver altogether.

This is safe because i965 is the only consumer.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
8dcf807cb43383590ba193c7ff20b8a98e4a9f65 14-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix scalar VS float[] and vec2[] output arrays.

The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken). Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4. This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them. Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.

Normally, varying packing avoids this problem by turning varyings into
vec4s. So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
by Connor Abbott), and rename it to type_size_vec4_times_4()
(renaming suggested by Jason Ekstrand). Use type_size_vec4
rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
c9541a74e4d179ad844bdf8af1e3de541c5b14c2 24-Sep-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add scalar GS input lowering code.

We really ought to compute the VUE map at link time and stash it, rather
than recomputing it here, but with the mess of program structures I
wasn't sure where to put it. We can improve that later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
a3d0359aff7a9be90149c416844f330b4f9a15ed 26-Oct-2015 Timothy Arceri <timothy.arceri@collabora.com> glsl: keep track of intra-stage indices for atomics

This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.

This also fixes a bug where since commit c0cd5b var->data.binding was
being used as a replacement for atomic buffer index, but they don't have
to be the same value they just happened to end up the same when binding is 0.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
dbac0a6352053bd6106feff88d95b0fd38b82afe 16-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Switch on shader stage in nir_lower_outputs().

VS, GS, and FS continue doing the same thing they did before. We can
simplify the FS code a bit because it is always scalar.

Compute shaders now assert that there are no outputs instead of doing
a loop over 0 outputs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
0d1eef536bc744f5c4dcdf854ad6adfdfe4f4dcb 14-Oct-2015 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Ignore compute shaders in brw_nir_lower_inputs

The commit shown below caused compute shaders to hit the unreachable
in the default of the switch block. Since compute shaders don't have
any inputs, we can make brw_nir_lower_inputs a no-op for CS.

commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Fri Aug 14 15:15:11 2015 -0700

i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
63728dac57c18df0f45bb2482f60188fac2d1efe 13-Oct-2015 Jordan Justen <jordan.l.justen@intel.com> i965/fs: Simplify FS in brw_nir_lower_inputs to only support scalar mode

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
bd198b9f0a292a9ff4ffffec3a29bad23d62caba 15-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Simplify fs_visitor's ATTR file.

Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of
compilation, assign_vs_urb_setup() translated those into GRF units,
and converted ATTR to HW_REGs.

This patch moves the transslation earlier, making ATTR work in terms of
GRF units from the beginning. assign_vs_urb_setup() simply has to add
the number of payload registers and push constants to obtain the final
hardware GRF number. (We can't do this earlier as those values aren't
known.)

ATTR still supports reg_offset; however, it's simply added to reg.
It's not clear whether this is valuable or not.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
2953c3d76178d7589947e6ea1dbd902b7b02b3d4 15-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.

Previously, we used nir_lower_io with the scalar type_size function,
which mapped VERT_ATTRIB_* locations to...some numbers. Then, in
fs_visitor::nir_setup_inputs(), we created temporaries indexed by
those numbers, and emitted MOVs from the actual ATTR registers to
those temporaries. Virtually all of these were copy propagated away,
but it's still ugly.

This patch reworks our input lowering to produce NIR lower_input
intrinsics that properly index into the ATTR file, so we can access
it directly.

No changes in shader-db.

v2: Fix unreachable() message (Ken), update commit message (Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
5d7f8cb5a511977e256e773716fac3415d01443e 01-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Introduce new nir_intrinsic_load_per_vertex_input intrinsics.

Geometry and tessellation shaders process multiple vertices; their
inputs are arrays indexed by the vertex number. While GLSL makes
this look like a normal array, it can be very different behind the
scenes.

On Intel hardware, all inputs for a particular vertex are stored
together - as if they were grouped into a single struct. This means
that consecutive elements of these top-level arrays are not contiguous.
In fact, they may sometimes be in completely disjoint memory segments.

NIR's existing load_input intrinsics are awkward for this case, as they
distill everything down to a single offset. We'd much rather keep the
vertex ID separate, but build up an offset as normal beyond that.

This patch introduces new nir_intrinsic_load_per_vertex_input
intrinsics to handle this case. They work like ordinary load_input
intrinsics, but have an extra source (src[0]) which represents the
outermost array index.

v2: Rebase on earlier refactors.
v3: Use ssa defs instead of nir_srcs, rebase on earlier refactors.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
1e3c1b107e075b210998998423901092b8fcd79b 03-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use nir_foreach_variable

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
ca941799ce76eac8afe2503fbacffee057e949d3 03-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/nir: Remove the prog parameter from brw_nir_lower_inputs

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
cd1ae6ebfac22f76d26a5b8659423969b2aeddce 06-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> nir/glsl: Take a gl_shader_program and a stage rather than a gl_shader

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
193d29516ddb76f469fea17119493e2b685bc6b7 26-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Refactor input/output lowering setup into helpers.

The code for input lowering is going to get significantly more
complicated shortly, so I wanted to pull it out. Vertex shader inputs
are handled nearly identically regardless of vec4/scalar mode, so I
opted to not split that.

I thought about having each function actually do the lowering, but one
pass through nir_lower_io that handles all types (which weren't handled
earlier) is probably more efficient.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
39a1d36a67974dd9fc3c0d834d6a117cdfed8f33 13-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Allow nir_lower_io() to only lower one type of variable.

We may want to use different type_size functions for (e.g.) inputs
vs. uniforms. Passing in -1 for mode ignores this, handling all
modes as before.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
df31c1850d14729e27513ae733110a668f6b6e95 05-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/gs: Use new NIR intrinsics.

By performing the vertex counting in NIR, we're able to elide a ton of
useless safety checks around every EmitVertex() call:

total instructions in shared programs: 3952 -> 3720 (-5.87%)
instructions in affected programs: 3491 -> 3259 (-6.65%)
helped: 11
HURT: 0

Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621)
on Haswell GT3e at 1024x768.

This should also make it easier to implement Broadwell's "Static Vertex
Count" feature someday.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
faf5f174ddbc7680f6947ceababb94fdb552bcdb 16-Sep-2015 Rob Clark <robclark@freedesktop.org> nir/lower_tex: support projector lowering per sampler type

Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types. In this case we want more fine grained control over
which texture projectors get lowered.

v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
d9b9ff76f17ee36b87b2722fa2a19e1d9f036c26 17-Sep-2015 Rob Clark <robclark@freedesktop.org> nir: rename nir_lower_tex_projector

Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
fc11dbe13f3470ff2a4cb91c6b063db2456664da 09-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Use nir_move_vec_src_uses_to_dest

The idea here is not that it gives register coalescing a little bit of a
helping hand. It doesn't actually fix the coalescing problems, but it
seems to help a good bit.

Shader-db results for vec4 programs on Haswell:

total instructions in shared programs: 1746280 -> 1683959 (-3.57%)
instructions in affected programs: 1259166 -> 1196845 (-4.95%)
helped: 11363
HURT: 148

v2 (Jason Ekstrand):
- Run nir_move_vec_src_uses_to_dest after going out of SSA
- New shader-db numbers

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
c951bb83056724df02ba7e6fe2dfa720c0f45c1f 09-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4_nir: Use partial SSA form rather than full non-SSA

We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier. In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
1484d8c9aa2e7e78462ffb5c207394bef77af89b 01-May-2015 Connor Abbott <cwabbott0@gmail.com> i965/nir: enable the dead control flow optimization

total instructions in shared programs: 7541551 -> 7541381 (-0.00%)
instructions in affected programs: 3054 -> 2884 (-5.57%)
helped: 29

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
86c57ebe0ed1acc98545746058862db7429412da 21-Aug-2015 Boyan Ding <boyan.j.ding@gmail.com> i965/nir: Make use of nir_opt_undef

Shader-db result on Ivy Bridge:
total instructions in shared programs: 145484 -> 145445 (-0.03%)
instructions in affected programs: 225 -> 186 (-17.33%)
helped: 5
HURT: 0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
5f14c417c86ced1847746c64d4db54c7e5ddc187 18-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Use nir_shader::stage rather than passing it around.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
259f7291de2387aa3ac5f856b39b7b934a1d8e7d 18-Aug-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Rework uniform handling

Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect. This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform. This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate. This should allow us
to more easily make decisions about what to push.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
6c33d6bbf9b54784e4498a81c73b712dca5dd737 12-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> nir: Pass a type_size() function pointer into nir_lower_io().

Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.

In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play. The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.

This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer. This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.

v2 (Jason Ekstrand):
- One side-effect of passing in a function pointer is that nir_lower_io is
now aware of and properly allocates space for image uniforms, allowing
us to drop hacks in the backend

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
da1b1bf85cdc691ec27f379de84dec495cdd51e0 15-Jul-2015 Iago Toral Quiroga <itoral@igalia.com> i965/nir: Do not scalarize phis in non-scalar setups

Significantly reduces register pressure in some piglit tests.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
db8a6de571bb72ef43209a415e5492001a87b1d8 17-Jun-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir: Add new utility method brw_glsl_base_type_for_nir_type()

This method returns the glsl_base_type corresponding to a nir_alu_type.
It will factorize code currently present in fs_nir, that can be reused
in vec4_nir on its upcoming emit_texture support.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
e4f02f47e70d384531ac68e6d33a62fdcdbd1f28 16-Jun-2015 Antia Puentes <apuentes@igalia.com> i965/nir/vec4: Lower "vecN" instructions and mark them unreachable

This enables NIR pass "lower_vec_to_movs" on shaders that work on vec4.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
9e5d827f455f3c72af6cb8d60b97890bab8d5ad0 25-Jun-2015 Alejandro Piñeiro <apinheiro@igalia.com> i965/nir: Disable alu_to_scalar pass on non-scalar shaders

Disables nir_lower_alu_to_scalar when the shader stage being processed work
on vec4 vectors, like the upcoming NIR->vec4 backend.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
662c4c99065381b8e265310d176cfdef6698ca57 16-Jun-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir/vec4: Implement store_output intrinsic

This implementation is based on the current URB setup in vec4_visitor, which
requires the output register to be stored in the output_reg array at variable's
original shader location index. But since nir_lower_io() pass uses the value
in var->data.driver_location, we need to put there var->data.location instead,
prior to calling nir_lower_io(), so that we end up with the correct index
in const_index[0].

The driver_location is not used at all, so this patch also disables the
nir_assign_var_locations pass on non-scalar shaders.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
97e205fd35bf77fd761caf24c611ff72cc0d85e2 17-Apr-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir: Move brw_type_for_nir_type() to brw_nir to allow reuse

Upcoming NIR->vec4 pass can benefit from this method, so lets move it up.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
f7152525374015594e037fa11bb64e1c7174829b 01-Jul-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir/vec4: Implement load_const intrinsic

Similar to fs_nir backend, a nir_local_values map will be filled with
newly allocated registers as the load_const instrinsic instructions are
processed. Later, get_nir_src() will fetch the registers from this map
for sources that are ssa.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
6e58fc56a5a396020cd299db11895120ec3da520 03-Jul-2015 Iago Toral Quiroga <itoral@igalia.com> i965/nir: Dot not assign direct uniform locations first for vec4-based shaders

In the vec4 backend we want uniform locations to be assigned consecutively
since that way the offsets produced by nir_lower_io are exactly what we
need to implement nir_intrinsic_load_uniform. Otherwise we would need a
mapping to match the output of nir_lower_io to the actual uniform registers
we need to use.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
01f6235020f9f0c2bc1a6e6ea9bd15c22fb2bcf5 18-Jun-2015 Iago Toral Quiroga <itoral@igalia.com> nir/nir_lower_io: Add vec4 support

The current implementation operates in scalar mode only, so add a vec4
mode where types are padded to vec4 sizes.

This will be useful in the i965 driver for its vec4 nir backend
(and possbly other drivers that have vec4-based shaders).

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
5e839727ed2378a01d3b657bad83abd4728e8da6 22-Jul-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir: Pass a is_scalar boolean to brw_create_nir()

The upcoming introduction of NIR->vec4 pass will require that some NIR
lowering passes are enabled/disabled depending on the type of shader
(scalar vs. vector).

With this patch we pass a 'is_scalar' variable to the process of
constructing the NIR, to let an external context decide how the shader
should be handled.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
864907e2f14523c130e6ff24c081789bb079bae1 24-Jun-2015 Connor Abbott <cwabbott0@gmail.com> i965/fs: use SSA values directly

Before, we would use registers, but set a magical "parent_instr" field
to indicate that it was actually purely an SSA value (i.e., it wasn't
involved in any phi nodes). Instead, just use SSA values directly, which
lets us get rid of the hack and reduces memory usage since we're not
allocating a nir_register for every value. It also makes our handling of
load_const more consistent compared to the other instructions.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
2b1a1d8b1294f91b7ac563da1f395deba4384765 24-Jun-2015 Connor Abbott <cwabbott0@gmail.com> nir/from_ssa: add a flag to not convert everything from SSA

We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.

The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.

v2: rename and flip meaning of flag (Jason)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
c8b8e8b29b755cd3d80fc5e470f441cb3716152a 22-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Don't count NIR instructions for shader-db.

Matt, Jason, and I haven't found this useful in a long time.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
f4310cdbd08f20276237fbefa3eba406aa109636 10-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Re-index SSA definitions before printing NIR code.

This makes the SSA definitions use sequential numbers (0, 1, 2, ...)
instead of seemingly random ones. There's not much point normally,
but it makes debug output much easier to read.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
58aed1031d40e62c9f41f7c512b3165dd5913d1e 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> prog_to_nir: Use a variable for uniform data

Previously, the prog_to_nir pass was directly generating uniform load/store
intrinsics. This converts it to use a single giant "parameters" variable
and we now depend on lowering to get the uniform load/store intrinsics.
One advantage of this is that we now have one code-path after we do the
initial conversion into NIR.

No shader-db changes.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c
89c1feb78d010bc457f5d02be84c955eebf3549f 08-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Create NIR during LinkShader() and ProgramStringNotify().

Previously, we translated into NIR and did all the optimizations and
lowering as part of running fs_visitor. This meant that we did all of
that work twice for fragment shaders - once for SIMD8, and again for
SIMD16. We also had to redo it every time we hit a state based
recompile.

We now generate NIR once at link time. ARB programs don't have linking,
so we instead generate it at ProgramStringNotify time.

Mesa's fixed function vertex program handling doesn't bother to inform
the driver about new programs at all (which is rather mean), so we
generate NIR at the last minute, if it hasn't happened already.

shader-db runs ~9.4% faster on my i7-5600U, with a release build.

v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother
using _mesa_program_enum_to_shader_stage as we already know it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_nir.c