History log of /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
56ee2df4bf9b1e8c26cf8689f5ef20237c95466b 13-Jan-2017 Juan A. Suarez Romero <jasuarez@igalia.com> i965/vec4: Fix mapping attributes

This patch reverts 57bab6708f2bbc1ab8a3d202e9a467963596d462, which was
causing issues with ILK and earlier VS programs.

1. brw_nir.c: Revert "i965/vec4/nir: vec4 also needs to remap vs attributes"

Do not perform a remap in vec4 backend. Rather, do it later when
setup attributes

2. brw_vec4.cpp: This fixes mapping ATTRx to proper GRFn.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391
[jordan.l.justen@intel.com: merge Juan's two patches from bugzilla]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
58fdb85f0f413d1a144d4beb6519da59bc52c974 21-Apr-2016 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4: take into account doubles when creating attribute mapping

Doubles needs more that one slot per attribute. So when filling the
attribute_map we check if it is a double in order to allocate one
extra register.

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f8310189f4a31c443657cd0c1aef35db02b86c95 21-Apr-2016 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4: use attribute slots for first non payload GRF

As part of the payload setup, setup_attributes is called with the first
GRF that can be used for the attributes (first ones are used for
uniforms for example) and returns the first GRF that is not part of the
payload. Before this patch, it adds directly the number of attributes.
But as with 64-bit attributes can consume more than one slot, that is
not valid anymore. This patch change the addition to use the number of
slots consumed.

gen >= 8 would not be affected, as they use the scalar mode. For that
case, the vs configuration is done at fs_visitor::assign_vs_urb_setup.

v2: add explanation in commit log (Jordan)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c2acf97fcc9b32eaa9778771282758e5652a8ad4 16-Dec-2016 Juan A. Suarez Romero <jasuarez@igalia.com> nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes

So far, input_reads was a bitmap tracking which vertex input locations
were being used.

In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, any other small attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.

But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.

To avoid handling two different situations in NIR, we just choose the
latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.

As example, if in our GLSL/IR shader we have three attributes:

layout(location = 0) vec3 attr0;
layout(location = 1) dvec4 attr1;
layout(location = 2) dvec3 attr2;

then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in location 3 and 4.

Checking carefully, basically we are using slots rather than locations
in NIR.

When emitting the vertices, we do a inverse map to know the
corresponding location for each slot.

v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.

v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7c6b714cd0fe06044c9a810186f5ce3690152574 05-Jan-2017 Kenneth Graunke <kenneth@whitecape.org> i965: Print VS output VUE map in Vulkan too.

We need to move this to the shared layer.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c762809e49daf61fc986721006ce6a520e6e735f 01-Sep-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: run scalarize_df() after spilling

Spilling of 64-bit data requires data shuffling for the corresponding
scratch read/write messages. This produces unsupported swizzle regions
and writemasks that we need to scalarize.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
73610384a8357287cef64434c789ff03c2f6f37a 01-Sep-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: prevent src/dst hazards during 64-bit register allocation

8-wide compressed DF operations are executed as two separate 4-wide
DF operations. In that scenario, we have to be careful when we allocate
register space for their operands to prevent the case where the first
half of the instruction overwrites the source of the second half.

To do this we mark compressed instructions as having hazards to make
sure that ther register allocators assigns a register regions for the
destination that does not overlap with the region assigned for any
of its source operands.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2b57adad0056273e38d9a9736cd98be95c0deb07 18-Aug-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4/scalarize_df: support more swizzles via vstride=0

By exploiting gen7's hardware decompression bug with vstride=0 we gain the
capacity to support additional swizzle combinations.

This also fixes ZW writes from X/Y channels like in:

mov r2.z:df r0.xxxx:df

Because DF regions use 2-wide rows with a vstride of 2, the region generated
for the source would be r0<2,2,1>.xyxy:DF, which is equivalent to r0.xxzz, so
we end up writing r0.z in r2.z instead of r0.x. Using a vertical stride of 0
in these cases we get to replicate the XX swizzle and write what we want.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c3edacaa288ae01c0f37e645737feeeb48f2c3f2 19-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively

Certain swizzles like XYZW can be supported by translating only the first two
64-bit swizzle channels to 32-bit channels. This happens with swizzles such
that the first two logical components, when translated to 32-bit channels and
replicated across the second dvec2 row, select the same channels specified by
the 3rd and 4th logical swizzle components.

Notice that this opens up the possibility that some instructions are not
scalarized and can end up with XY or ZW 32-bit writemasks. Make sure we always
scalarize in such cases.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2f0bc54e2bf6c7d218f30acc88f5cb94bd6214f7 01-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: split instructions that read 64-bit interleaved attributes

Stages that use interleaved attributes generate regions with a vstride=0
that can hit the gen7 hardware decompression bug.

v2:
- Make static the function and fix indent (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0579c85e5ca7d406cad42db7c1501d6b1fb9696b 01-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: dump subnr for FIXED_GRF

This came in handy when debugging the payload setup for Tess Eval,
since it prints correct subnr for attributes that can be loaded
in the second half of a register.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5fe8d567d8dadeb2b77addd73762f6bde4acfac2 06-Oct-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: fix attribute setup for doubles

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6a01259d8a13aace16e4f1ce9e09e0e41bd52273 06-Oct-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: fix indentation in lower_attributes_to_hw_regs()

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ae400e38d90ea2fddf1b050ff94f52bdec94e150 15-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: make emit_pull_constant_load support 64-bit loads

This way callers don't need to know about 64-bit particularities and
we reuse some code.

v2:
- use byte_offset() instead of offset()
- only mark the surface as used once

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
df6e3aa6ae23346bad59d071d340a67be0e2a2c5 13-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data

v2: adapt to changes in offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eee2c0d7854e55e92e0e72eb0fb94ab83d702754 13-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: fix indentation in move_push_constants_to_pull_constants()

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
07bc6a35d3d6d94d45b81bd10002f0e420d855c2 28-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: do not split scratch read/write opcodes

64-bit scratch read/writes require to shuffle data around so we need
to have access to the full 64-bit data. We will do the right thing
for these when we emit the messages.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2a857104e41167cef3c6a5132a45c88056c75dff 23-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: Do not use DepCtrl with 64-bit instructions

The BDW PRM says that it is not supported, but it seems that gen7 is also
affected, since doing DepCtrl on double-float instructions leads to
GPU hangs in some cases, which is probably not surprising knowing that
this is not supported in new hardware iterations. The SKL PRMs do not
mention this restriction, so it is probably fine.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
506154f704dcb9185dadcd655fd6d0603916ea97 23-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms

v2:
- Add Broxton as Intel's internal PRMs says that it is needed (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b3a7d0ee9d5f792ab68fbe77da5e3ea85d4bc4c0 08-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: Lower 64-bit MAD

The previous patch made sure that we do not generate MAD instructions
for any NIR's 64-bit ffma, but there is nothing preventing i965 from
producing MAD instructions as a result of lowerings or optimization
passes. This patch makes sure that any 64-bit MAD produced inside the
driver after translating from NIR is also converted to MUL+ADD before
we generate code.

v2:
- Use a copy constructor to copy all relevant instruction fields from
the original mad into the add and mul instructions

v3:
- Rename the lowering and fix commit log (Matt)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
83dcd146020f5e54d1e0a46c585ed672e75abaa0 01-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands

We make scalar sources in 3src instructions use subnr instead of
swizzles because they don't really use swizzles.

With doubles it is more complicated because we use vstride=0 in
more scenarios in which they don't produce scalar regions. Also
RepCtrl=1 is not allowed with 64-bit operands, so we should avoid
this.

v2: Fix typo (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
49be3abbe7afd64f9e3435e9a9e341e30acacb52 30-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: fix indentation in pack_uniform_registers

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bdf5498c6b7870c14139279e76f1e4b281bed2cd 29-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: fix pack_uniform_registers for doubles

We need to consider the fact that dvec3/4 require two vec4 slots.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
23278a75ce06c3c083892b2a20d9efdf794167d6 18-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: teach register coalescing about 64-bit

Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice the size, so prevent
coalescing from instructions with a different type size.

Also, we should check that if we are coalescing a register from another
MOV we should be writing the same amount of data in both operations, otherwise
we end up wiring more or less than the original instruction. This can happen,
for example, when we have split fp64 MOVs with an exec size of 4 that only
write one register each and then a MOV with exec size of 8 that reads both.
We want to avoid the pass to think that it can coalesce from the first split
MOV alone. Ideally we would like the pass to see that it can coalesce from both
split MOVs instead, but for now we keep it simple.

Finally, the pass doesn't support coalescing of multiple registers but in the
case of normal SIMD4x2 double-precision instructions they naturally write two
registers (one per vertex) and there is no reason why we should not allow
coalescing in this case. Change the restriction to bail if we see instructions
that write more than 8 channels, where the channels can be 32-bit or 64-bit.

v2:
- Make sure that scan_inst and inst write the same amount of data.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ac5a06ff83c32ab14e01e526e729b2fbfe3a2426 18-Jul-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: implement access to DF source components Z/W

The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the the 16B offset into the register and use X/Y swizzles.

The above, however, has the caveat that we can't do that without
violating register region restrictions unless we probably do some
sort of SIMD splitting.

Alternatively, we can accomplish what we need without SIMD splitting
by exploiting the gen7 hardware decompression bug for instructions
with a vstride=0. For example, an instruction like this:

mov(8) r2.x:DF r0.2<0>xyzw:DF

Activates the hardware bug and produces this region:

Component: x0 y0 z0 w0 x1 y1 z1 w1
Register: r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3

Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2
execution and r1.2 and r1.3 are the same for the second vertex.

Using this to our advantage we can select r0.z:DF by doing
r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing
to split the instruction.

Of course, this only works for gen7, but that is the only hardware
platform were we implement align16/fp64 at the moment.

v2: Adapted to the fact that we now do this after converting to
hardware registers (Iago)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e238601a2da9512c0fd263e8378f30498a0a1507 24-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: translate 64-bit swizzles to 32-bit

The hardware can only operate with 32-bit swizzles, which is a rather
limiting restriction. However, the idea is not to expose this to the
optimization passes, which would be a mess to deal with. Instead, we let
the bulk of the vec4 backend ignore this fact and we fix the swizzles right
at codegen time.

At the moment the pass only needs to handle single value swizzles thanks to
the scalarization pass that runs before it.

Notice that this only works for X/Y swizzles. We will add support for Z/W
swizzles in the next patch, since they need a bit more work.

v2 (Sam):
- Do not expand swizzle of 64-bit immediate values.

v3:
- Do this after translation to hardware registers instead of doing it right
before so we don't need the force_vstride0 flag (Curro).
- Squashed patch that included FIXED_GRF in the list of register files that
need this translation (Iago).
- Remove swizzle assignments for VGRF and UNIFORM files in
convert_to_hw_regs(), they will be set by apply_logical_swizzle() (Iago).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fb7cb853c964db44ab99c1592e1ef7dec2f0c25b 24-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: add a scalarization pass for double-precision instructions

The hardware only supports 32-bit swizzles, which means that we can
only access directly channels XY of a DF making access to channels ZW
more difficult, specially considering the various regioning restrictions
imposed by the hardware. The combination of both things makes handling
ramdom swizzles on DF operands rather difficult, as there are many
combinations that can't be represented at all, at least not without
some work and some level of instruction splitting depending on the case.

Writemasks are 64-bit in general, however XY and ZW writemasks also work
in 32-bit, which means these writemasks can't be represented natively,
adding to the complexity.

For now, we decided to try and simplify things as much as possible to
avoid dealing with all this from the get go by adding a scalarization
pass that runs after the main optimization loop. By fully scalarizing
DF instructions in align16 we avoid most of the complexity introduced
by the aforementioned hardware restrictions and we have an easier path
to an initial fully functional version for the vector backend in Haswell
and IvyBridge.

Later, we can improve the implementation so we don't necessarily
scalarize everything, iteratively adding more complexity and building
on top of a framework that is already working. Curro drafted some ideas
for how this could be done here:
https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82

v2:
- Use a copy constructor for the scalar instructions so we copy all
relevant instructions fields from the original instruction.

v3: Fix indention in one switch (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f4b8649233fa10e89205b6b5f6f334279b198f22 17-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: split double-precision SEL

There is a hardware bug affecting compressed double-precision SEL
instructions in align16 mode by which they won't read predication mask
properly. The bug does not affect other predicated instructions
and it does not affect SEL in Align1 mode either. This was found
empirically and verified by Curro in the simulator.

Fix this by splitting double-precision SEL in Align16 mode to use an
execution size of 4.

v2: Check that the dst type is 64-bit, since we can have 16-wide single
precision bcsel instructions that also write 2 registers.

v3: Replace bcsel by SEL in all the comments as bcsel is the nir opcode
but SEL is the actual assembly instruction (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a83608f50483ac397545d3815bfe8dc3be5126b6 29-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: dump NibCtrl for instructions with execsize != 8

v2: do it in the same fashion as the FS backend for consistency (Curro)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
58767f0fec7809c3408adbc4d147dd56f2ee3d4d 29-Aug-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: add a SIMD lowering pass

Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, that's why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain hardware bugs and limitations
that we need to work around by splitting the instructions so we only
write to 1 register at a time. This patch implements a SIMD splitting
pass similar to the one in the scalar backend.

Because we only use double-precision instructions in Align16 mode
in gen7 (gen8+ is fully scalar and gens < 7 do not implement fp64)
the pass should be a no-op on any other generation.

For now the pass only handles the gen7 restriction where any
instruction that writes 2 registers also needs to read 2 registers.
This affects double-precision instructions reading uniforms, for
example. Later patches will extend the lowering pass adding a few
more cases.

v2:
- Move the simd lowering pass after the main optimization loop and
run copy-propagation and dce if it reports progress (Curro)
- Compute number of registers written instead of fixing it to 1 (Iago)
- Use group from backend_instruction (Iago)
- Drop assertion that checked that we only split 8-wide instructions
into 4-wide. (Curro)
- Don't assume that instructions can only be 8-wide, we might want
to use 16-wide instructions in the future too (Curro)
- Wrap gen7 workarounds in a conditional to ease adding workarounds
for other gens in the future (Curro)
- Handle dst/src overlap hazard (Curro)
- Use the horiz_offset() helper to simplify the implementation (Curro)
- Drop the assertion that checks that each split instruction writes
exactly one register (Curro)
- Use the copy constructor to generate split instructions with all
the relevant fields initialized to the values in the original
instruction instead of copying only a handful of them manually (Curro)

v3 (Iago):
- When copying to a temporary, allocate the number of registers required
for the copy based on the size written of the lowered instruction
instead of assuming that all lowered instructions produce single-register
writes
- Adapt to changes in offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4ea3bf8ebb56c8db6e885a77d81502a0b2adca4f 10-Jun-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965/vec4: handle 32 and 64 bit channels in liveness analysis

Our current data flow analysis does not take into account that channels
on 64-bit operands are 64-bit. This is a problem when the same register
is accessed using both 64-bit and 32-bit channels. This is very common
in operations where we need to access 64-bit data in 32-bit chunks,
such as the double packing and packing operations.

This patch changes the analysis by checking the bits that each source
or destination datatype needs. Actually, rather than bits, we use
blocks of 32bits, which is the minimum channel size.

Because a vgrf can contain a dvec4 (256 bits), we reserve 8
32-bit blocks to map the channels.

v2 (Curro):
- Simplify code by making the var_from_reg helpers take an extra
argument with the register component we want.
- Fix a couple of cases where we had to update the code to the new
way of representing live variables.

v3:
- Fix indent in multiline expressions (Matt)
- Fix comment's closing tag (Matt)
- Use DIV_ROUND_UP(inst->size_written, 16) instead of 2 * regs_written(inst)
to avoid rounding issues. The same for regs_read(i). (Curro).
- Add asserts in var_from_reg() to avoid exceeding the allocated
registers (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
29dd5cf9d64ac998cb313db8a908272a6154ec46 30-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: dump the instruction execution size

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f79547840a1951dbf82c7b6629935c6e89020e27 30-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: fix regs_read() for doubles

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c722a8e61ebc72d7d21c2bed0f623218d739fdb7 17-Feb-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: Rename DF to/from F generator opcodes

The opcodes are not specific for conversions to/from float since we need
the same for conversions to/from other 32-bit types. Rename the opcodes
accordingly and change the asserts to check the size of the types involved
instead.

v2:
- Rename to VEC4_OPCODE_TO_DOUBLE and VEC4_OPCODE_FROM_DOUBLE (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
21cf6f14d5abdf7d0f9641404387e0c00de6f56f 12-Feb-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: make opt_vector_float ignore doubles

The pass does not support doubles in its current form.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
54b998e0e488189307d2614fe56a3b78b442d316 17-Jun-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes

These opcodes will set the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this to implement packDouble2x32.

We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in terms of 64-bit logical swizzles for DF operands
all the way up to codegen.

v2:
- use suboffset() instead of get_element_ud()
- no need to set the width on the dst

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6979e5a41241993b9e7bedea80f29fb43d96aa47 31-May-2016 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes

These opcodes will pick the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this, for example, to do things like
unpackDouble2x32.

We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in terms of 64-bit logical swizzles for DF operands
all the way up to codegen.

v2:
- use suboffset() instead of get_element_ud()
- no need to set the width on the dst

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c35fa7ac5507a64943aa518b2dac8bddfdc9e14b 18-Nov-2015 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: set correct register regions for 32-bit and 64-bit

For 32-bit instructions we want to use <4,4,1> regions for VGRF
sources so we should really set a width of 4 (we were setting 8).

For 64-bit instructions we want to use a width of 2 because the
hardware uses 32-bit swizzles, meaning that we can only address 2
consecutive 64-bit components in a row. Also, Curro suggested that
the hardware is probably fixing the width to 2 for 64-bit instructions
anyway, so just go with that and use <2,2,1>.

v2:
- No need to explicitly set the vertical stride of 64-bit regions to 2,
brw_vecn_grf with a width of 2 will do that for us.
- No need to adjust the width of dst registers.

v3 (Ian):
- Make type_size and width const.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
558f27953101c438747c3e9d3c3f98ce21e79007 14-Aug-2015 Iago Toral Quiroga <itoral@igalia.com> i965/vec4: add double/float conversion pseudo-opcodes

These need to be emitted as align1 MOV's, since they need to have a
stride of 2 on the float register (whether src or dest) so that data
from another thread doesn't cross the middle of a SIMD8 register.

v2 (Iago):
- The float-to-double needs to align 32-bit data to 64-bit before doing the
conversion. This was doable in align16 when we tried to use an execsize
of 4, but with an execsize of 8 we would need another align1 opcode to do
that (since we need data to cross the middle of a SIMD register). Just
making the opcode handle this internally seems more practical that adding
another opcode just for this purpose and having the caller know about this
before converting.
- The double-to-float conversion produces 32-bit elements aligned to 64-bit
so we make the opcode re-pack the result to 32-bit and fit in one register,
as expected by SIMD4x2 operation. This still requires that callers reserve
two registers for the float data destination because we need to produce
64-bit aligned data first, and repack it later on the same destination
register, but it saves the need for a re-pack opcode only to achieve this
making the operation complete in a single opcode. Hopefully that is worth
the weirdness of the double register allocation...

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2d6eee3144ce16b39909522be466bdb3871f4c1b 13-Aug-2015 Connor Abbott <connor.w.abbott@intel.com> i965/vec4: add support for printing DF immediates

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e729504fb1799c3ae31cea76d73946530ef9806f 14-Sep-2016 Timothy Arceri <timothy.arceri@collabora.com> nir: pass compiler rather than devinfo to functions that call nir_optimize

Later we will pass compiler to nir_optimise to be used by the loop unroll
pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a1a292d17710a2bfb33f798c9f5fda73a5985261 04-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Store a clip_distance_mask field similar to cull_distance_mask.

This isn't useful for legacy GL, but will be used in Vulkan.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
19c652b29ce7271374cd0951bdadc9840964e78e 04-Oct-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Use shader_info for brw_vue_prog_data::cull_distance_mask.

This also allows us to move it from a GL specific location to a
part of the compiler shared by both GL and Vulkan.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b 13-Oct-2016 Timothy Arceri <timothy.arceri@collabora.com> nir/i965/anv/radv/gallium: make shader info a pointer

When restoring something from shader cache we won't have and don't
want to create a nir_shader this change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89e1436e2d4ff0c15202708979eb36761cae4167 11-Oct-2016 Ian Romanick <ian.d.romanick@intel.com> i965: Silence unused parameter warnings

brw_link.cpp:76:44: warning: unused parameter ‘shader_type’ [-Wunused-parameter]
gl_shader_stage shader_type,
^
brw_nir.c: In function ‘brw_nir_lower_vs_inputs’:
brw_nir.c:194:55: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
const struct gen_device_info *devinfo,
^
brw_vec4_visitor.cpp:914:37: warning: unused parameter ‘sampler’ [-Wunused-parameter]
uint32_t sampler,
^
brw_vec4_visitor.cpp:1146:34: warning: unused parameter ‘stream_id’ [-Wunused-parameter]
vec4_visitor::gs_emit_vertex(int stream_id)
^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f57f526fc5cfaedf26b2becf8f1899d5de0d0461 16-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch.

The eliminate_find_live_channel optimization eliminates
FIND_LIVE_CHANNEL instructions in cases where control flow is known to
be uniform, and replaces them with 'MOV 0', which in turn unblocks
subsequent elimination of the BROADCAST instruction frequently used on
the result of FIND_LIVE_CHANNEL. This is however not correct in
per-sample fragment shader dispatch because the PSD can dispatch a
fully unlit sample under certain conditions. Disable the optimization
in that case.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

v2: Add devinfo argument to brw_stage_has_packed_dispatch() to
implement hardware generation check.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5ca35c63673dad28854c00ce34ec6f085ba4ec5e 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Assert that ATTR regions are register-aligned.

It might be useful to actually handle this once copy propagation
becomes smarter about register-misaligned offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8bed1adfc144d9ae8d55ccb9b277942da8a78064 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Assign correct destination offset to rewritten instruction in register coalesce.

Because the pass already checks that the destination offset of each
'scan_inst' that needs to be rewritten matches 'inst->src[0].offset'
exactly, the final offset of the rewritten instruction is just the
original destination offset of the copy. This is in preparation for
adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3a74e437fdec02c28749c94bc1bcf21c3c4b48d7 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Don't coalesce registers with overlapping writes not matching the MOV source.

In preparation for adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1bb5074474445ea9f54d0f52383f99ac0fa6128f 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Compare full register offsets in opt_register_coalesce nop move check.

In preparation for adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3be0d6d040753c62b25077fb6b85ad1f0808b258 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Check that the write offsets match when setting dependency controls.

For simplicity just assume that two writes to the same GRF with
different sub-GRF offsets will potentially interfere and break the
dependency control chain. This is in preparation for adding sub-GRF
offset support to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b52fefc4d55a4627bf0d59c78ac531603fa08fda 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Change opt_vector_float to keep track of the last offset seen in bytes.

This simplifies things slightly and makes the pass more correct in
presence of sub-GRF offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
230615e2280e6d28456e7d6a42b1e42645515b4d 09-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Simplify src/dst_reg to brw_reg conversion by using byte_offset().

This should also have the side effect of fixing convert_to_hw_regs()
to handle sub-GRF register offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eb746a80e5e99bafd3957a1cb2d9db8548a1a6be 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/ir: Update several stale comments.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
47784e2346b56bea6a1111fecaa953239ff198ca 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/ir: Don't print ARF subnr values twice.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5d65d51e78c2f73389a0d30dac6dda4561e91bec 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Print src/dst_reg::offset field consistently for all register files.

C.f. 'i965/fs: Print fs_reg::offset field consistently for all
register files.'.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fcd9d1badcd97486eea5d87bf701a3b0a16b4ba9 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().

This makes sure that overlap checks are done correctly throughout the
back-end when the '*this' register starts before the register/size
pair provided as argument, and is actually less annoying to use than
in_range() at this point since regions_overlap() takes its size
arguments in bytes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
728dd30c0ac0078653974de36087456065d2e3ae 08-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte units.

The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
69fdf13c215c2970feaca76f178a5c2c11ba8fec 03-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Replace vec4_instruction::regs_written with ::size_written field in bytes.

The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d28cfa35fec75c367b940ff829ba8eaa035fbd22 02-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ::regs_written.

This is in preparation for dropping vec4_instruction::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fba020e5af49d9d9a2c6e4d4b79115ed1e74a127 01-Sep-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset expressed in bytes.

The dst/src_reg::offset field in byte units introduced in the previous
patch is a more straightforward alternative to an offset
representation split between ::reg_offset and ::subreg_offset fields.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple FS
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.

This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.

Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.

v2: Fix division by the wrong reg_unit in the UNIFORM case of
convert_to_hw_regs(). (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
527f37199929932300acc1688d8160e1f3b1d753 23-Aug-2016 Jason Ekstrand <jason.ekstrand@intel.com> intel: s/brw_device_info/gen_device_info/

Generated by:

sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e7c376adfdecd4c1333997c8be8bb066a87c67b4 19-Aug-2016 Matt Turner <mattst88@gmail.com> i965/vec4: Ignore swizzle of VGRF for use by var_range_end().

var_range_end(v, n) loops over the n components of variable number v and
finds the maximum value, giving the last use of any component of v.
Therefore it expects v to correspond to the variable associated with the
.x channel of the VGRF.

var_from_reg() however returns the variable for the first channel of the
VGRF, post-swizzle.

So, if the last register had a swizzle with y, z, or w in the swizzle
component, we would read out of bounds. For any other register, we would
read liveness information from the next register.

The fix is to convert the src_reg to a dst_reg in order to call the
dst_reg version of var_from_reg() that doesn't consider the swizzle.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4c3a6b07e2960266adca634f8607ef38f71b8318 20-Jul-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Make opt_vector_float reset at the top of each block

The pass isn't really control-flow aware and you can get into case where it
tries to combine instructions from different blocks. This can actually
lead to an assertion failure when removing unneeded instructions if part of
the vector is set in one block and part in another. This prevents
regressions in the next commit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7ea09511ca4f58640063cc1ee08386cce5300535 04-Apr-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965/fs: calculate first non-payload GRF using attrib slots

When computing where the first non-payload GRF starts, we can't rely on
the number of attributes, as each attribute can be using 1 or 2 slots
depending on whether they are a dvec3/4 or other.

Instead, we need to use the number of slots used by the attributes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b7423b485e11b768f68e8d5865fbc74b07ee6d48 04-Apr-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965/vec4: use attribute slots to calculate URB read length

Do not use total attributes because a dvec3/dvec4 attribute requires two
slots. So rather use total attribute slots.

v2: do not use loop to calculate required attribute slots (Kenneth
Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1ec466d0ff59ab17edef95c84ed733c1fea5655e 28-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Stop setting dispatch_grf_start_reg from the visitor

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1cc7573162a7f0e8346d7abab50890c58a0dce9a 28-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965: Pass devinfo pointer to is_3src() helpers.

This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future. In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c55dc77ab13420a9fe0177ccd21a6b0a950d9113 28-Apr-2016 Francisco Jerez <currojerez@riseup.net> i965: Pass devinfo pointer to brw_instruction_name().

A future series will implement support for an instruction that happens
to have the same opcode number as another instruction we support
already on a disjoint set of hardware generations. In order to
disambiguate which instruction it is brw_instruction_name() will need
some way to find out which device we are generating code for.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
60a17d071825da4a06303cb699e4417edaaa6386 14-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Properly handle integer types in opt_vector_float().

Previously, opt_vector_float() always interpreted MOV sources as
floating point, and always created a MOV with a F-type destination.

This meant that we could mess up sequences of integer loads, such as:

mov vgrf6.0.x:D, 0D
mov vgrf6.0.y:D, 1D
mov vgrf6.0.z:D, 2D
mov vgrf6.0.w:D, 3D

Here, integer 0/1/2/3 become approximately 0.0f, so we generated:

mov vgrf6.0:F, [0F, 0F, 0F, 0F]

which is clearly wrong. We can properly handle this by converting
integer values to float (rather than bitcasting), and emitting a type
converting MOV:

mov vgrf6.0:D, [0F, 1F, 2F, 3F]

To do this, see first see if the integer values (converted to float)
are representable. If so, we use a D-type MOV. If not, we then try
the floating point values and an F-type MOV. We make zero not impose
type restrictions. This is important because 0D would imply a D-type
MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D,
where we want to use an F-type MOV.

Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend. This
recently became visible due to changes in opt_vector_float() which
made it optimize more cases, but it was a pre-existing bug.

Apparently it also manages to turn more integer loads into VFs,
producing the following shader-db statistics on Haswell:

total instructions in shared programs: 7084195 -> 7082191 (-0.03%)
instructions in affected programs: 246027 -> 244023 (-0.81%)
helped: 1937

total cycles in shared programs: 65669642 -> 65651968 (-0.03%)
cycles in affected programs: 531064 -> 513390 (-3.33%)
helped: 1177

v2: Handle the type of zero better.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1aa28f3509b033e0f86510a6d4c7993fca650b3b 14-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Make opt_vector_float() only handle non-type-conversion MOVs.

We don't handle this properly - we'd have to perform the type conversion
before trying to convert the value to a VF.

While we could do that, it doesn't seem particularly useful - most
vector loads should be consistently typed (all float or all integer).

As a special case, we do allow type-converting MOVs of integer 0, as
it's represented the same regardless of the type. I believe this case
does actually come up.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2a25a5142bd78b22cc9ada41b8988bb282c2a7ac 14-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fold vectorize_mov() back into the one caller.

After the previous patch, this helper is only called in one place.
So, just fold it back in - there are a lot of parameters here and
not much code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9967561158acd94edff0fa93ceaf4bc527e271ed 14-Apr-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Rework opt_vector_float() control flow.

This reworks opt_vector_float() so that there's only one place that
flushes out any accumulated state and emits a VF.

v2: Don't break the sequence for non-representable numbers - just skip
recording their values. Only break it for non-MOVs or register
changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a112391d52a458c588b8770cbf1ca9fce8863b79 06-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Handle MOV_INDIRECT in pack_uniform_registers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
aaac8a18904f44e93a2223c93727086358d6a655 24-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
61ee5e62a2beeb2e405ff3aa5e3eb26d1bf5437d 05-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Use can_do_writemask in can_reswizzle

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
75b68f9114dc3ba1b501fb7de8198c03b3dcb1fd 05-Apr-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Move can_do_writemask to vec4_instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8e76f664beb845f8dca30ca5635f9369618563b0 09-Dec-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Get rid of the uniform_size array

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
056849772f66582fd7e8a181c3fb16955f84243b 25-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants

This commit moves us to an instruction based model rather than a
register-based model for indirects. This is more accurate anyway as we
have to emit instructions to resolve the reladdr. It's also a lot simpler
because it gets rid of the recursive reladdr problem by design.

One side-effect of this is that we need a whole new algorithm in
move_uniform_array_access_to_pull_constants. This new algorithm is much
more straightforward than the old one and is fairly similar to what we're
already doing in the FS backend.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
01425c45b32fa7f323515b05697c6cc0d245ad32 17-Mar-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965: Remove the RCP+RSQ algebraic optimizations

NIR already has this optimization and it can do much better than the little
peephole in the backend.

No shader-db change on Haswell or Broadwell.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7d7990cf657550be4d038a0424ffdc0ef7fd8faa 14-Mar-2016 Francisco Jerez <currojerez@riseup.net> i965/vec4: Consider removal of no-op MOVs as progress during register coalesce.

Bug found by the liveness analysis validation pass that will be
introduced in a later commit. The no-op MOV check in
opt_register_coalesce() was removing instructions which makes the
cached liveness analysis calculation inconsistent with the shader IR.
We were failing to set progress to true in that case though, which
means that invalidate_live_intervals() wouldn't necessarily be called
at the end of the function.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2f76a9924e7b0b33a508ee3651b0cb2ab536a7dc 02-Mar-2016 Juan A. Suarez Romero <jasuarez@igalia.com> i965/vec4: add opportunistic behaviour to opt_vector_float()

opt_vector_float() transforms several scalar MOV operations to a single
vectorial MOV.

This is done when those MOV covers all the components of the destination
register. So something like:

mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf3.0.z:D, 0D

is transformed in:

mov vgrf3.0:F, [0F, 0F, 0F, 1F]

But there are cases where not all the components are written. For
example, in:

mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf4.0.xy:D, 1065353216D
mov vgrf4.0.w:D, 0D
mov vgrf6.0:UD, u4.xyzw:UD

Nor vgrf3 nor vgrf4 .z components are written, so the optimization is
not applied.

But it could be applied anyway with the components covered, using a
writemask to select the ones written. So we could transform it in:

mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F]
mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F]
mov vgrf6.0:UD, u4.xyzw:UD

This commit does precisely that: opportunistically apply
opt_vector_float() when possible.

total instructions in shared programs: 7124660 -> 7114784 (-0.14%)
instructions in affected programs: 443078 -> 433202 (-2.23%)
helped: 4998
HURT: 0

total cycles in shared programs: 64757760 -> 64728016 (-0.05%)
cycles in affected programs: 1401686 -> 1371942 (-2.12%)
helped: 3243
HURT: 38

v2: change vectorize_mov() signature (Matt).
v3: take in account predicates (Juan).
v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cfbd9831f89ef165e7998d0b8524a1aefedec404 25-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions.

Now that each stage is directly calling brw_nir_lower_io(), and we have
per-stage helper functions, it makes sense to just call the relevant one
directly, rather than going through multiple switch statements.

This also eliminates stupid function parameters, such as the two that
only apply to vertex attributes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2f2c00c7279e7c43e520e21de1781f8cec263e92 11-Feb-2016 Matt Turner <mattst88@gmail.com> i965: Lower min/max after optimization on Gen4/5.

Gen4/5's SEL instruction cannot use conditional modifiers, so min/max
are implemented as CMP + SEL. Handling that after optimization lets us
CSE more.

On Ironlake:

total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
instructions in affected programs: 326604 -> 323322 (-1.00%)
helped: 1411

total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
cycles in affected programs: 18950290 -> 18867176 (-0.44%)
helped: 2419
HURT: 328

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8122d21d1507b4d6d351299f88fff0c645c0b4ff 13-Feb-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Fix gl_DrawID in the vec4 backend.

brw_draw_upload.c uploads VertexID/InstanceID first, then DrawID.
So we need to assign the attribute mapping in that order as well.

Fixes the following Pigit tests with the vec4 backend:
- arb_shader_draw_parameters-drawid vertexid
- arb_shader_draw_parameters-drawid-indirect basevertex

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5743fd957145040a4734b5542ee5187cfad4cf1d 11-Feb-2016 Ben Widawsky <benjamin.widawsky@intel.com> i965: Rename optimizer debug 00 filename

This allows ls, and scripts to get the file names in the correct order of
optimization.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
85f5c18fef1ff2f19d698f150e23a02acd6f59b9 14-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Drop support for ATTR as an instruction destination.

This is no longer necessary...and it doesn't make much sense to
have inputs as destinations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d56ae2d1605fc1b5a3fdf5aba9aefc3c7692a4ba 14-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Apply VS attribute workarounds in NIR.

This patch re-implements the pre-Haswell VS attribute workarounds.
Instead of emitting shader code in the vec4 backend, we now simply
call a NIR pass to emit the necessary code.

This simplifies the vec4 backend. Beyond deleting code, it removes
the primary use of ATTR as a destination. It also eliminates the
requirement that the vec4 VS backend express the ATTR file in terms
of VERT_ATTRIB_* locations, giving us a bit more flexibility.

This approach is a little different: rather than munging the attributes
at the top, we emit code to fix them up when they're accessed. However,
we run the optimizer afterwards, so CSE should eliminate the redundant
math. It may even be able to fuse it with other calculations based on
the input value.

shader-db does not handle non-default NOS settings, so I have no
statistics about this patch.

Note that the scalar backend does not implement VS attribute
workarounds, as they are unnecessary on hardware which allows SIMD8 VS.

v2: Do one multiply for FIXED rescaling and select components from
either the original or scaled copy, rather than multiplying each
component separately (suggested by Matt Turner).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
830b075e86e3e9af1bf12316d0f9d888a85a973b 05-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Explicitly write the "TR DS Cache Disable" bit at TCS EOT.

Bit 0 of the Patch Header is "TR DS Cache Disable". Setting that bit
disables the DS Cache for tessellator-output topologies resulting in
stitch-transition regions (but leaves it enabled for other cases).

We probably shouldn't leave this to chance - the URB could contain
garbage - which could result in the cache randomly being turned on
or off.

This patch makes the final EOT write 0 to the first DWord (which
only contains this one bit). This ensures the cache is always on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9870f798beab701a9edda81ff7ccc39f1875d610 15-Jan-2016 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs/generator: Take an actual shader stage rather than a string

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
824d82025d0bff9841647942aca501fba16fc1a9 14-Jan-2016 Kenneth Graunke <kenneth@whitecape.org> i965: Make an is_scalar boolean in brw_compile_vs().

Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
53a9b6223f4ebf66e8892e04ffe47eb5586eda5c 31-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Move 3-src subnr swizzle handling into the vec4 backend.

While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums. When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.

In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum. This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].

This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.

This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend. I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cddfc2cefa93b884c40329dcb193fe4fb22143ab 10-Dec-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Add support for gl_DrawIDARB and enable extension

We have to break open a new vec4 for gl_DrawIDARB. We've used up all
space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its
own separate vertex buffer anyway. This is because we point the vb for
base vertex and base instance into the draw parameter BO for indirect
draw calls, but the draw id is generated by mesa in a different buffer.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
17ebb55a14b5a9aa639845fbda9330ef9421834a 10-Dec-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB

We already have gl_BaseVertexARB in the .x component of the SGVS vec4
and plug gl_BaseInstanceARB into the last free component (.y).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bd8ab8dedb2cc557ea3cb58d507f237743b3f7f9 24-Dec-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Don't set interleave or complete on TCS EOT message.

Setting interleave on the TCS EOT message causes Ivybridge hardware to
GPU hang like crazy. Individual tests would pass, but running even a
simple test like nop.shader_test in a loop would hang within 1-3 runs.
Adding sleep delays worked around the problem, somehow.

Interleave doesn't make much sense given that we only have one patch
URB handle, not two. Complete doesn't seem useful either.

There's no reason to actually set those bits. We were just being lazy.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b7793783b3df94880655234bc2a9054eddf01913 26-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Relase input URB Handles on Gen7/7.5 when TCS threads finish.

Pre-Broadwell hardware requires us to manually release the ICP Handles
by issuing URB read messages with the "Complete" bit set. We can do
this in pairs to use fewer URB read messages.

Based heavily on work from Chris Forbes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1245724f728915694ecb9c318a68107c01ccc808 17-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Port tessellation evaluation shaders to vec4 mode.

This can be used on Broadwell by setting INTEL_SCALAR_TES=0.
More importantly, it will be used for Ivybridge and Haswell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
24be658d13b13fdb8a1977208038b4ba43bce4ac 17-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add tessellation control shaders.

The TCS is the first tessellation shader stage, and the most
complicated. It has access to each of the control points in the input
patch, and computes a new output patch. There is one logical invocation
per output control point; all invocations run in parallel, and can
communicate by reading and writing output variables.

One of the main responsibilities of the TCS is to write the special
gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which
control how much new geometry the hardware tessellation engine will
produce. Otherwise, it simply writes outputs that are passed along
to the TES.

We run in SIMD4x2 mode, handling two logical invocations per EU thread.
The hardware doesn't properly manage the dispatch mask for us; it always
initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block
to handle an odd number of invocations, essentially falling back to
SIMD4x1 on the last thread.

v2: Update comments (requested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
83dedb6354d0e9b04e8ccad77e86bdb7bad44bdd 20-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Add src/dst interference for certain instructions with hazards.

When working on tessellation shaders, I created some vec4 virtual
opcodes for creating message headers through a sequence like:

mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted };
mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all };
mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted };
mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all };

This is done in the generator since the vec4 backend can't handle align1
regioning. From the visitor's point of view, this is a single opcode:

hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD

Normally, there's no hazard between sources and destinations - an
instruction (naturally) reads its sources, then writes the result to the
destination. However, when the virtual instruction generates multiple
hardware instructions, we can get into trouble.

In the above example, if the register allocator assigned vgrf7 and vgrf8
to the same hardware register, then we'd clobber the source with 0 in
the first instruction, and read back the wrong value in the last one.

It occured to me that this is exactly the same problem we have with
SIMD16 instructions that use W/UW or B/UB types with 0 stride. The
hardware implicitly decodes them as two SIMD8 instructions, and with
the overlapping regions, the first would clobber the second.

Previously, we handled that by incrementing the live range end IP by 1,
which works, but is excessive: the next instruction doesn't actually
care about that. It might also be the end of control flow. This might
keep values alive too long. What we really want is to say "my source
and destinations interfere".

This patch creates new infrastructure for doing just that, and teaches
the register allocator to add interference when there's a hazard. For
my vec4 case, we can determine this by switching on opcodes. For the
SIMD16 case, we just move the existing code there.

I audited our existing virtual opcodes that generate multiple
instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
treatment as well, but no others.

v2: Rebased by mattst88.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f36993b46962eab4446bc1964eb47149751aee26 23-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Clean up #includes in the compiler.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2d8c5299032d229c8f6e936db5644cd53716e6c1 20-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Prevent implicit upcasts to brw_reg.

Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
799f924073c62c3a012c48a51895b46ad621e36c 24-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Use scope operator to ensure brw_reg is interpreted as a type.

In the next patch, I make backend_reg's inheritance from brw_reg
private, which confuses clang when it sees the type "struct brw_reg" in
the derived class constructors, thinking it is referring to the
privately inherited brw_reg:

brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg'
fs_reg::fs_reg(struct brw_reg reg) :
^
brw_shader.h:39:22: note: constrained by private inheritance here
struct backend_reg : private brw_reg
^~~~~~~~~~~~~~~
brw_reg.h:232:8: note: member is declared here
struct brw_reg {
^

Avoid this by marking brw_reg with the scope resolution operator.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f093c842e65b251e24ea3a2d6daaa91326a4f862 21-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Use implicit backend_reg copy-constructor.

In order to do this, we have to change the signature of the
backend_reg(brw_reg) constructor to take a reference to a brw_reg in
order to avoid unresolvable ambiguity about which constructor is
actually being called in the other modifications in this patch.

As far as I understand it, the rule in C++ is that if multiple
constructors are available for parent classes, the one closest to you in
the class heirarchy is closen, but if one of them didn't take a
reference, that screws things up.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
309a44d63c75a7d688157486b094e555f49c907d 22-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Add and use backend_reg::equals().

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6c8ba59cff14a1a86273f4008ff2a8e68335ab25 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use nir_lower_tex for texture coordinate lowering

Previously, we had a rescale_texcoords helper in the FS backend for
handling rescaling of texture coordinates. Now that we can do variants in
NIR, we can use nir_lower_tex to do the rescaling for us. This allows us
to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and
GL_CLAMP handling in vertex and geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ce767bbdfff7c2a7829b652c111a11eb9ddba026 11-Nov-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Move postprocess_nir to codegen time

This allows us to insert NIR passes between initial NIR compilation and
optimization (link time) and actual backend code-gen. In particular, it
will allow us to do shader variants in NIR and share some of that shader
variant code between backends.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a5b3115f0a9ede775b332b1a669de570668e871c 02-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Drop IMM fs_reg/src_reg -> brw_reg conversions.

The previous two commits make this unnecessary.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f9a9ba5eac2f1934bd7fecc92cd309f22411164b 02-Nov-2015 Matt Turner <mattst88@gmail.com> i965/vec4: Replace src_reg(imm) constructors with brw_imm_*().

Cuts 1.5k of .text.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
44d6c0c805d2911cc5dfe853e5bc5a505f87775f 12-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Convert scalar_* flags to a scalar_stage array.

I was going to add scalar_tcs and scalar_tes flags, and then thought
better of it and decided to convert this to an array. Simpler.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0eb3db117b56b081ee2674cc8940c193ffc3c41b 02-Nov-2015 Matt Turner <mattst88@gmail.com> i965: Use BRW_MRF_COMPR4 macro in more places.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
49b3215d7076db8b9afe8998b01ef250795b5892 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Combine register file field.

The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b3315a6f56fb93f2884168cbf9358b2606641db5 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Replace HW_REG with ARF/FIXED_GRF.

HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.

Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.

After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b163aa01487ab5f9b22c48b7badc5d65999c4985 27-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Rename GRF to VGRF.

The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7638e75cf99263c1ee8e31c6cc5a319feec2c943 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Use brw_reg's nr field to store register number.

In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3048053908310eaf082058e5be34ae902e1fc02c 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Unwrap some lines.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
58fa9d47b536403c4e3ca5d6a2495691338388fd 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965/vec4: Remove swizzle/writemask fields from src/dst_reg.

Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
94b1031703b1b5759436fe215323727cffce5f86 25-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Remove fixed_hw_reg field from backend_reg.

Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1392e45bfb396ccbfa5bb0c6063522e0550988d3 24-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Use immediate storage in inherited brw_reg.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e42fb0c2a687cdcd6af2a590f6f5e24f64cfff3b 23-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.

Generated by

sed -i -e 's/\.bits\././g' *.c *.h *.cpp
sed -i -e 's/dw1\.//g' *.c *.h *.cpp

and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.

There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.

This is C11 (and gcc has apparently supported it for sometime
"compatibility with other compilers")

See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e42a29531ae3d5dedb72011da2947357dfa8715b 10-Nov-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Print force_writemask_all in dump_instructions().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4ef27745c8ed5153464db22950a90d74d2ef4435 09-Sep-2015 Neil Roberts <neil@linux.intel.com> i965/vec4/skl+: Use ld2dms_w instead of ld2dms

In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7c81a6a647257c309cb1ca36c60aa4bfa8e2e022 26-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Replace default case with list of enum values.

If we add a new file type, we'd like to get warnings if it's not
handled.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4cba8f5d21e4b50343e7c7bfbeb603b59c5d71dd 23-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Wrap vec4_generator in a C function.

vec4_generator is a class for convenience, but only exports a single
method as its public API. It makes much more sense to just export a
single function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
73ff0ead3688519eb76ea8bc32eabb9004e6f37b 23-Oct-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.

This patch makes the visitor convert registers to the HW_REG file at the
very end, after register allocation, post-RA scheduling, and dependency
control flagging. After that, everything is in fixed brw_regs.

This simplifies the code generator, as it can just use the hardware
registers rather than having to interpret our abstract files. In
particular, interpreting the UNIFORM file meant reading prog_data
to figure out where push constants are supposed to start.

Having the part of the code that performs register allocation also
translate everything to hardware registers seems sensible.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bfc73ff10eafad59b6ae9ca3991f9f1a3700b3a1 07-Oct-2015 Emil Velikov <emil.l.velikov@gmail.com> i965: remove unneeded src_reg copy in emit_shader_time_write

The variable is already of type src_reg. creating a new instance only to
destroy it seems unnecessary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8cf84a7e470dbd3b46ce4081459d2ecfab22c2d5 09-Oct-2015 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4: print predicate control at brw_vec4 dump_instruction

v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding
a copy on brw_vec4.c, as suggested by Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
627f94b72e0e9443ad116f072599a7342269f297 28-Sep-2015 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4: adding vec4_cmod_propagation optimization

vec4 port of fs_cmod_propagation.

Shader-db results (no vec4 grepping):
total instructions in shared programs: 6240413 -> 6235841 (-0.07%)
instructions in affected programs: 401933 -> 397361 (-1.14%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 2265
HURT: 0

v2: remove extra space and combine two if blocks, as suggested by
Matt Turner
v3: add condition check to bail out if current inst and inst being
scanned has different writemask, as pointed by Matt Turner
v3: updated shader-db numbers
v4: remove block from foreach_inst_in_block_*_starting_from after
commit 801f151917fedb13c5c6e96281a18d833dd6901f

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
801f151917fedb13c5c6e96281a18d833dd6901f 20-Oct-2015 Neil Roberts <neil@linux.intel.com> i965: Remove block arg from foreach_inst_in_block_*_starting_from

Since 49374fab5d793 these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9e17c36b8ba79e688011a5fd293ad5f42da21b66 14-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Extract can_change_source_types() functions.

Make them members of fs_inst/vec4_instruction for use elsewhere.

Also fix the fs version to check that dst.type == src[1].type and for
!saturate.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
41c474df53d9dcd5fd8e24eba5b7acc2b3c32795 15-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vs: Move URB entry_size and read_length calculations to compile_vs

Reviewed-By: Eduardo Lima Mitev <elima@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4467344c829f1dccdf74e27bef2c5fda72552be6 09-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Rename brw_foo_emit to brw_compile_foo

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5d8bf6de6166a686a006478a420bcd373860e9ee 08-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

v2 (Jason Ekstrand):
- Patch use_legacy_snorm_formula through as a function argument rather
than trying to go through the shader key.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8f1d968704858d78d7e78a6b88db3ea2bc0cf749 06-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Remove gl_program and gl_shader_program from the generator

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e86f5b3d21fe8e96676bb0608990d72dbf61b85 06-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs: Remove the gl_program from the generator

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
031d3501322aee0a1474c7f2a9b79f9fa9947430 26-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Unify URB entry size/read length calculations between backends.

Both the vec4 and scalar VS backends had virtually identical URB entry
size and read length calculations. We can move those up a level to
backend-agnostic code and reuse it for both.

Unfortunately, the backends need to know nr_attributes to compute
first_non_payload_grf, so I had to store that in prog_data. We could
use urb_read_length, but that's nr_attributes rounded up to a multiple
of two, so doing so would waste a register in some cases.

There's more code to be removed in the vec4 backend, but that will
come in a follow-on patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ee0f0108c8e87b9cfec25bade66670bbc4254139 07-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move brw_get_shader_time_index() call out of emit functions

brw_get_shader_time_index() is all tangled up in brw_context state and
we can't call it from the compiler. Thanks the Jasons recent
refactoring, we can just get the index and pass to the emit functions
instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ba71d581aeb96c4626500eb5b19f3bef2f40d586 05-Oct-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move brw_dump_ir() out of brw_*_emit() functions

We move these calls one level up into the codegen functions.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5a360dcad1fdb91f9129cb21775b9af60cbf57e4 03-Oct-2015 Matt Turner <mattst88@gmail.com> i965: Generalize predicated break pass for use in vec4 backend.

instructions in affected programs: 44204 -> 43762 (-1.00%)
helped: 221

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bf7b6fd3fd6d98305d64ee6224ca9f9e7ba48444 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/shader: Get rid of the shader, prog, and shader_prog fields

Unfortunately, we can't get rid of them entirely. The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still
needs gl_shader_program for handling transfom feedback. However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
404419ee1a57c79982d93eefe4de099d61ad2eee 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/fs,vec4: Get rid of the sanity_param_count

It doesn't exist for anything other than an assert that, as far as I can
tell, isn't possible to trip. Soon, we will remove prog from the visitor
entirely and this will become even more impossible to hit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ca6a436f12cb55e9415049a217229c99b02ad3b8 02-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Use nir info instead of pulling things out of [shader_]prog

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ea006c4cb5eb2d98d6bfd5a6c32fcae10b636f17 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Move binding table setup to codegen time.

Setting up binding tables really has little to do with the actual process
of turning shaders into instructions; it's more part of setting up
prog_data. This commit moves it out of the visitors and with the rest of
the prog_data setup stuff.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
28709e37d96d6b64753ca4dcce5fbfeb75f5b499 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/shader: Pull assign_common_binding_table_offsets out of backend_shader

This really has nothing to do with the backend compiler and we'd like to
eventually be able to set this up earlier in the compile process.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5609e0d7b41e861a3359991e8d0f2053b255fc31 30-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Get rid of the uniform_vector_size array

The uniform_vector_size array was only ever used by pack_uniform_registers
which no longer needs it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ea35fb0fbead2902b1ba37e7cdb1523853fabd8b 30-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Use the actual channels used in pack_uniform_registers

Previously, pack_uniform_registers worked based on the size of the uniform
as given to us when we initially set up the uniforms. However, we have to
walk through the uniforms and figure out liveness anyway, so we migh as
well record the number of channels used as we go. This may also allow us
to pack things tighter in a few cases.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fc3f45234b4ff9545c84fbe8ec5261604d5ab611 01-Oct-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vs: Move lazy NIR creation to codegen_vs_prog

The next commit will add code to codegen_vs_prog that requires the NIR
shader to be there in all cases. It doesn't hurt anything to just move it
from brw_vs_emit to its only caller.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b85761d11d2abff4d45a4938b34c1c7840c97339 21-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Always use NIR

GLSL IR vs. NIR shader-db results for vec4 programs on i965:

total instructions in shared programs: 1499328 -> 1388354 (-7.40%)
instructions in affected programs: 1245199 -> 1134225 (-8.91%)
helped: 7469
HURT: 2440

GLSL IR vs. NIR shader-db results for vec4 programs on G4x:

total instructions in shared programs: 1436799 -> 1325825 (-7.72%)
instructions in affected programs: 1205599 -> 1094625 (-9.20%)
helped: 7469
HURT: 2440

GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake:

total instructions in shared programs: 1436654 -> 1325682 (-7.72%)
instructions in affected programs: 1205503 -> 1094531 (-9.21%)
helped: 7468
HURT: 2440

GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge:

total instructions in shared programs: 2016249 -> 1787033 (-11.37%)
instructions in affected programs: 1850547 -> 1621331 (-12.39%)
helped: 14856
HURT: 1481

GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge:

total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369

GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail:

total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369

GLSL IR vs. NIR shader-db results for vec4 programs on Haswell:

total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369

I also ran our full suite of benchmarks on a Haswell and had the following
statistically significant (according to ministat) changes:

Test master-glsl master-nir diff
bench_OglGeomPoint 461.556 463.006 1.450
bench_OglTerrainFlyInst 184.484 187.574 3.090
bench_OglTerrainPanInst 132.412 136.307 3.895
bench_OglTexFilterAniso 19.653 19.645 -0.008
bench_OglTexFilterTri 58.333 58.009 -0.324
bench_OglVSInstancing 65.049 65.327 0.278
bench_trexoff 69.474 69.694 0.220
bench_valley 40.708 41.125 0.417

v2 (Jason Ekstrand):
- Remove more uses of NirOptions as a switch
- New shader-db numbers
- Added benchmark numbers

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6485880232df46c0cdded0b063b8841a7855bd32 28-Aug-2015 Samuel Iglesias Gonsalvez <siglesias@igalia.com> i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZE

Notice that Skylake needs to include a header in the sampler message
so it will need some tweaks to work there.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f2e75ac88a92ab2180de576aca298929cfce03f2 22-Sep-2015 Antia Puentes <apuentes@igalia.com> i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed

Gen6 MATH instructions can not execute in align16 mode, so swizzles or
writemasking are not allowed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
10da96887c785930c2553b2d5bde91e52b8b034a 21-Sep-2015 Matt Turner <mattst88@gmail.com> i965/vec4: Detect and delete useless MOVs.

With NIR:

instructions in affected programs: 111508 -> 109193 (-2.08%)
helped: 507

Without NIR:

instructions in affected programs: 28763 -> 28474 (-1.00%)
helped: 186

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a548c75e31b4146d55133cb8c57a82117c196584 05-Sep-2015 Kristian Høgsberg Kristensen <krh@bitplanet.net> i965: Move perf_debug code to brw_codegen_*_prog()

We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
79f1a7ae28c37f77e08e550cd077959a2a1f8341 05-Aug-2015 Antia Puentes <apuentes@igalia.com> i965/vec4: Fix saturation errors when coalescing registers

If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.

This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.

For example,
mov vgrf7:D vgrf2:D
mov.sat m4:F vgrf7:F

is coalesced to:
mov.sat m4:D vgrf2:D

The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.

Shader-db results in HSW (only vec4 instructions shown):

total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs: 74 -> 75 (1.35%)
helped: 0
HURT: 1
GAINED: 0
LOST: 0

Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1037e0a84f61f4b1815093bcfd548d4b58ca106f 11-Sep-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Don't reswizzle hardware registers

Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91719
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d4e29af2344c06490913efc35430f93a966061bb 11-Sep-2015 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4: check writemask when bailing out at register coalesce

opt_register_coalesce stopped to check previous instructions to
coalesce with if somebody else was writing on the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.

Shader DB results (taking into account only vec4):

total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs: 1238390 -> 1191754 (-3.77%)
helped: 12782
HURT: 0
GAINED: 0
LOST: 0

v2: removed some parenthesis, fixed indentation, as suggested by
Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0b91bcea98c0fe201bba89abe1ca3aee4d04c56c 12-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> i965: add support for textureSamples function

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bd6e516fc24128d604f677a16f692d88d65a49f1 23-Jul-2015 Iago Toral Quiroga <itoral@igalia.com> i965: Add a debug option for spilling everything in vec4 code

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4f4b7c4711d98606270133dfd456acabfa8267a6 28-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Remove the brw_vue_prog_key base class.

The legacy userclip fields are only used for the vertex shader, and at
that point there's only program_string_id and the tex struct, which are
common to all keys. So there's no need for a "VUE" key base class.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
014b90221ad5cf833bfdd55b0336771d209f0f1d 28-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Move legacy clip plane handling to vec4_vs_visitor.

This is now only used for the vertex shader, so it makes sense to get it
out of any paths run by the geometry shader.

Instead of passing the gl_clip_plane array into the run() method (which
is shared among all subclasses), we add it as a vec4_vs_visitor
constructor parameter. This eliminates the bogus NULL parameter in the
GS case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
082b7f1876095f32578720f30fdc35771b2b3e0a 28-Aug-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Delete the brw_vue_program_key::userclip_active flag.

There are two uses of this flag.

The primary use is checking whether we need to emit code to convert
legacy gl_ClipVertex/gl_Position clipping to clip distances. In this
case, we also have to upload the clip planes as uniforms, which means
setting nr_userclip_plane_consts to a positive value. Checking if it's
> 0 works for detecting this case.

Gen4-5 also wants to know whether we're doing clipping at all, so it can
emit user clip flags. Checking if output_reg[VARYING_SLOT_CLIP_DIST0]
is set to a real register suffices for this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4de86e1371b0d59a5b9a787b726be3d373024647 01-Sep-2015 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4: fill src_reg type using the constructor type parameter

The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.

This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.

Shader DB results (taking into account only vec4):

total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs: 2918 -> 2833 (-2.91%)
helped: 79
HURT: 0
GAINED: 0
LOST: 0

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8765f1d7ddfb00dc5b202e4e679ebe640a547d50 18-Aug-2015 Matt Turner <mattst88@gmail.com> i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM.

Noticed when debugging things that lead to the next patch.

On G45 (and presumably ILK) this helps register coalescing:

total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs: 43751 -> 43718 (-0.08%)
helped: 52
HURT: 2

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34d162260f513a7eaec12611e3859bb34230cf33 08-Jul-2015 Antia Puentes <apuentes@igalia.com> i965/vec4: Handle uniform and GRF array access on vertex programs (NIR)

When the NIR-vec4 pass is enabled, handles uniform and GRF array access
on ARB_vertex_program like it is done on vertex shaders.

When the old IR-vec4 pass is used, emit_program_code() emits pull constant
loads directly instead of using relative addressing, hence to call to
move_uniform_array_access_to_pull_constants() is not needed and it is enough
to call to split_uniform_registers().

The patch also calls to move_grf_array_access_to_scratch() like it is
done for shaders, however I suspect this is a no-op for vertex programs and
we could remove it.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
90825e3ca977057c8f3d6ad2d1aa38277cc3ff11 08-Jul-2015 Antia Puentes <apuentes@igalia.com> i965/vec4: Enable NIR-vec4 pass on ARB_vertex_programs

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
38fc4a91cd5c04fdd5921b8776f8e203513ab517 01-Jul-2015 Iago Toral Quiroga <itoral@igalia.com> i965/nir: Enable NIR-vec4 pass on geometry shaders

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0d43d27df742ad95a086580bae2ee08a0bc00e69 23-May-2015 Alejandro Piñeiro <apinheiro@igalia.com> i965/vec4: Add a new dst_reg constructor accepting a brw_reg_type

This is useful for the upcoming texture support in NIR->vec4 pass,
as we found several cases where the brw_type is available, but not
the glsl_type.

Without this new constructor, the alternative would be:
dst_reg reg(MRF, <reg>)
reg.type = <brw_type>
reg.writemask = <mask>

Adding a new constructor makes code easier to read.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e839727ed2378a01d3b657bad83abd4728e8da6 22-Jul-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir: Pass a is_scalar boolean to brw_create_nir()

The upcoming introduction of NIR->vec4 pass will require that some NIR
lowering passes are enabled/disabled depending on the type of shader
(scalar vs. vector).

With this patch we pass a 'is_scalar' variable to the process of
constructing the NIR, to let an external context decide how the shader
should be handled.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
47d68908f2c3ad3e9011a2cf910b04cd3300673a 16-Jun-2015 Eduardo Lima Mitev <elima@igalia.com> i965/nir/vec4: Select between new nir_vec4 or current vec4_visitor code-paths

The NIR->vec4 pass will be activated if both the following conditions are met:

* INTEL_USE_NIR environment variable is defined and is positive (1 or true)
* The stage is vertex shader (support for geometry shaders and
ARB_vertex_program will be added later).

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f12302b89836a24255674a251f7a6902b4e9af7c 29-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Get rid of brw_vs_compile completely.

After tearing it out another level or two, and just passing the key and
vp directly, we can finally remove this struct. It also eliminates a
pointless memcpy() of the key.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
64390967c1abc326875e495f233afec6e685db72 30-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Remove 'c'/vs_compile from vec4_vs_visitor.

At this point, the brw_vs_compile structure only contains the key and
gl_vertex_program pointer. We may as well pass and store them directly;
it's simpler and more convenient (key-> instead of vs_compile->key...).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
13372a0ce746cde6fa6e0aa3c5130e4227f123e0 29-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Move c->last_scratch into vec4_visitor.

Nothing outside of vec4_visitor uses it, so we may as well keep it
internal.

Commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for the vec4 backend.

(The empty class will be going away soon.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8524deb8c8fc37abc2cb2717be64a533746a92f9 29-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Move total_scratch calculation into the visitor.

This is more consistent with how we do it in the FS backend, and reduces
a tiny bit of duplication. It'll also allow for a bit more tidying.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dc776ffb900b21421158ef8efbd675bdd47593bc 29-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Move perf_debug about register spilling into the visitor.

This patch makes us only issue the performance warning about register
spilling if we actually spilled registers. We also use scratch space
for indirect addressing and the like.

This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for
the vec4 backend.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0163c99e8f6959b5d6c7a937a322127cfdf9315f 30-Jun-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Plumb log_data through so the backend_shader field gets set.

Jason plumbed this through a while back in the FS backend, but
apparently we were just passing NULL in the vec4 backend.

This patch passes brw in as intended.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
40801295d5a3d747661abb1e2ca64d44c0e3dc05 23-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Remove the brw_context from the visitors

As of this commit, nothing actually needs the brw_context.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bcaf4a3f077e3e3fbc66f264fe9124fa920ee70c 23-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4_vs: Add an explicit use_legacy_snorm_formula flag

This way we can stop doing is_gles3 checks inside of the compiler.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
663f8d121d792edee5c012461bfd0b650011ff4a 20-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vs: Pass the current set of clip planes through run() and run_vs()

Previously, these were pulled out of the GL context conditionally based on
whether we were running ff/ARB or a GLSL program. Now, we just pass them
in so that the visitor doesn't have to grab them itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1b0f6ffa15b25e8601d60fe1ea74e893f7d33cf5 20-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Pull calls to get_shader_time_index out of the visitor

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c7893dc3c590b86787d8118e3920debaea3f16da 19-Jun-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Use a single index per shader for shader_time.

Previously, each shader took 3 shader time indices which were potentially
at arbirary points in the shader time buffer. Now, each shader gets a
single index which refers to 3 consecutive locations in the buffer. This
simplifies some of the logic at the cost of having a magic 3 a few places.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6e255a3299c9ec5208cb5519b5da2edb0ce2972b 17-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Add compiler options to brw_compiler

This creates the options at screen cration time and then we just copy them
into the context at context creation time. We also move is_scalar to the
brw_compiler structure.

We also end up manually setting some values that the core would have set by
default for us. Fortunately, there are only two non-zero shader compiler
option defaults that we aren't overriding anyway so this isn't a big deal.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d7565b7d65f8203c20735a61b86e9158b8ec4447 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Remove the dependance on brw_context from the generators

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e639a6f68e701f23b977a49c45d646c164991d36 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Plumb compiler debug logging through a function pointer in brw_compiler

v2 (Ken): Make shader_debug_log a printf-like function.
v3 (Jason): Add a void * to pass the brw_context through

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0f8ec779ddff4126837a7d4216ecf1d4b97e93d2 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Create a shader_dispatch_mode enum to replace VS/GS fields.

We used to store the GS dispatch mode in brw_gs_prog_data while
separately storing the VS dispatch mode in brw_vue_prog_data::simd8.

This patch introduces an enum to represent all possible dispatch modes,
and stores it in brw_vue_prog_data::dispatch_mode, unifying the two.

Based on a suggestion by Matt Turner.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b95ec49e57f81bdd75795dc93022533704efe509 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965/vs: Rework the logic for generating NIR from ARB vertex programs

Whether or not to use NIR is now equivalent to brw->scalar_vs. We can
simplify the logic and make it far less confusing.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
99cb4233205edcfa1a1e2967eef7bb16ff19bec4 20-May-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Rename backend_visitor to backend_shader

The backend_shader class really is a representation of a shader. The fact
that it inherits from ir_visitor is somewhat immaterial.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3687d752e51829b4723c9abb07ae56d2bbcda570 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/fs: Combine the fs_visitor constructors.

For scalar GS support, we either need to add a fourth constructor which
takes the GS structures, or combine the existing two and pass the shader
stage.

Given that they're not significantly different, I opted for the latter.

v2: Remove more stuff from the .h file (Jason and Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
76c1086f2dfb37a1edf6d2df6eebbe11ccbfc50b 24-Mar-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Change header_present to header_size in backend_instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3da9f708d4f1375d674fae4d6c6eb06e4c8d9613 20-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.

v2: Save some CPU cycles by doing 'return progress' rather than
'depth++' in the discard jump special case.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f2fad0dc80627e853eea558498f18a9fa769992e 19-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Perform basic optimizations on the BROADCAST opcode.

v2: Style fixes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f118e5d15fd9b35cf27a975a702c5fb81d3157aa 23-Apr-2015 Francisco Jerez <currojerez@riseup.net> i965: Add typed surface access opcodes.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0775d8835ac8d1f2ab75d04f0cddbad36b6787fe 23-Apr-2015 Francisco Jerez <currojerez@riseup.net> i965: Add untyped surface write opcode.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20915130ace4cc0f700ece2a99c0353581a156bb 26-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Add support for untyped surface message sends from GRF.

This doesn't actually enable untyped surface message sends from GRF
yet, the upcoming atomic counter and image intrinsic lowering code
will.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
17233f9bbcbf570f0c7633c63dbd5ed88634ed60 21-Apr-2015 Jordan Justen <jordan.l.justen@intel.com> i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS.

Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1ac7db07b363207e8ded9259f84bbcaa084b8667 12-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Unhardcode a few more stage names and abbreviations.

The stage_abbrev and stage_name fields in backend_visitor provide what
we need without any additional effort. It also means we'll get the
right names for compute shaders, SIMD8 geometry shaders, and both kinds
of tessellation shaders.

This does unfortunately change the capitalization of the stage
abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't
seem worth adding code to handle, though.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dffc1a0ae3a75d426f10c5d3ba021de977467929 25-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Remove unnecessary NULL check on generate_code() result.

Code generation is not allowed to fail for any reason - in fact,
fs_generator has no mechanism for failing. The visitor is responsible
for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
28e9601d0e681411b60a7de8be9f401b0df77d29 16-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Add a devinfo field to backend_visitor and use it for gen checks

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89c1feb78d010bc457f5d02be84c955eebf3549f 08-Apr-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Create NIR during LinkShader() and ProgramStringNotify().

Previously, we translated into NIR and did all the optimizations and
lowering as part of running fs_visitor. This meant that we did all of
that work twice for fragment shaders - once for SIMD8, and again for
SIMD16. We also had to redo it every time we hit a state based
recompile.

We now generate NIR once at link time. ARB programs don't have linking,
so we instead generate it at ProgramStringNotify time.

Mesa's fixed function vertex program handling doesn't bother to inform
the driver about new programs at all (which is rather mean), so we
generate NIR at the last minute, if it hasn't happened already.

shader-db runs ~9.4% faster on my i7-5600U, with a release build.

v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother
using _mesa_program_enum_to_shader_stage as we already know it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bff421332661bfd0f82ab9eee9e4fec9d06ed1a1 03-Apr-2015 Jason Ekstrand <jason.ekstrand@intel.com> i965: Check the INTEL_USE_NIR environment variable once at context creation

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
31dc63d5ca090fed3f1adcd4fd0db2f1f7aa19f7 25-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965/nir: Use NIR for ARB_vertex_program support on Gen8+.

Everything is already in place; we simply have to take the scalar code
generation path. This gives us SIMD8 VS programs, instead of SIMD4x2.

v2: Rebase on the patch that drops brw->gen >= 8.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ef09cfb51e0c1cc9e3c6f370813a843a6ecaa4e2 25-Mar-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Drop unnecessary brw->gen >= 8 check from scalar VS code.

brw->scalar_vs already implies that brw->gen >= 8.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e6e655ef76bb22193b31af2841cb50fda0c39461 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Define helpers to calculate the common live interval of a range of variables.

These will be especially useful when we start keeping track of
liveness information for each subregister.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
588859e18cb597612e56980a65a762ef069363e4 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Fix handling of multiple register reads and writes in split_virtual_grfs().

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9304f60cbe7c348a4771a7746606730bea3ae45f 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Fix handling of multiple register reads and writes in opt_register_coalesce().

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
516d45f78a3bbab0288c49c0f876ebdf4ad05bff 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Some more trivial swizzle clean-up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
430c6bf70e48c08ba4dc9e00f2b88e2230793010 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Improve src_reg/dst_reg conversion constructors.

This simplifies the src_reg/dst_reg conversion constructors using the
swizzle utils introduced in a previous patch. It also makes them more
useful by changing their semantics slightly: dst_reg(src_reg) used to
set the writemask to XYZW if the src_reg swizzle was anything other
than XXXX, which was almost certainly not what the caller intended if
the swizzle was non-trivial. After this patch the same components
that are present in the swizzle will be enabled in the resulting
writemask.

src_reg(dst_reg) used to set the first components of the swizzle to
the enabled components of the writemask and then replicate the last
enabled component to fill the swizzle, which, in cases where the
writemask didn't have exactly the first n components set, would in
general not be compatible with the original dst_reg. E.g.:

| ADD(tmp, src_reg(tmp), src_reg(1));

would *not* do what one would expect (add one to each of the enabled
components of tmp) if tmp didn't have a writemask of the described
form (e.g. YZ, YW, XZW would all fail). This pattern actually occurs
in many different places in the VEC4 back-end, it's a wonder that it
hasn't caused piglit failures until now. After this patch
src_reg(dst_reg) will construct a swizzle with each enabled component
at its natural position (e.g. Y at the second position, Z at the
third, and so on). The resulting swizzle will behave like the
identity when used in any instruction with the original writemask.

I've manually verified that *none* of the callers of both conversion
constructors were relying on the previous broken semantics. There are
no piglit regressions on any generation.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
62fd3353387547504966d77f3350afc9b688ef93 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Pass argument by reference to src_reg/dst_reg conversion constructors.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
23bda945f570b4f566ed39b4c1de89a957247df7 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Remove swizzle_for_size() in favour of brw_swizzle_for_size().

It could be objected that swizzle_for_size() is "faster" than
brw_swizzle_for_size(). It's not measurably better in any reasonable
CPU-bound benchmark on VLV according to the Finnish benchmarking
system (including the SynMark2 DrvShComp shader compilation
benchmark).

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9a17e4e900256b5be73d935fa5f35c98b3b0d7fe 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Simplify opt_register_coalesce() using the swizzle utils.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
05ec72d8ecdba04a81745fbc3ca0df40c7fb8828 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Simplify reswizzle() using the swizzle utils.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7b30493dc4f0b1346fe4c1fe52211f0c0d7ed229 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Simplify opt_reduce_swizzle() using the swizzle utils.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7e816c7feb8cffa878546eee363240b1b66d5c42 18-Mar-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Fix signedness of dst_reg::writemask.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b0d422cd2a99d2fd26ab11880d5d8410ebfc64b2 16-Mar-2015 Matt Turner <mattst88@gmail.com> i965/fs: Print spills:fills and number of promoted constants.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
78df9d5e30fbca8b0795594448a3bcae05d5f5f2 05-Mar-2015 Matt Turner <mattst88@gmail.com> i965/vec4: Handle saturate in dump_instruction().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
63d6d09a3b3790c5ec00f2cbc06f58c82ae40b0c 03-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Don't attempt to reduce swizzles of send from GRF instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eb47d0efd39d73d4388389d6c0ebe458160f79fa 05-Feb-2015 Matt Turner <mattst88@gmail.com> i965: Optimize multiplication by -1 into a negated MOV.

instructions in affected programs: 968 -> 942 (-2.69%)
helped: 4

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
55de910f909ac668ec7ea8fd94ec4f235b0d0335 11-Feb-2015 Eric Anholt <eric@anholt.net> i965: Quiet another compiler warning about uninitialized values.

The compiler can't tell that we're always going to hit the first if block
on the first time through the loop.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b40bcd24e0c86fb02c226261c1fe46fb362be217 04-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Don't set any dependency control bits for F32TO16 on Gen8.

It's expanded to several instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
530445330b403d835a4027b41388b5eea8c2e1ab 03-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Init mlen for several send from GRF instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
de666fc102b805707c7033b203c5b76ccbbcef8d 05-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Fix the scheduler to take into account reads and writes of multiple registers.

v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8ad486077e122c19b603750e19dd678bb7793d5b 05-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Make vec4_visitor::implied_mrf_writes() return zero for sends from GRF.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
388b136e677e30249e062145b488c2d938c1ef17 05-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965/vec4: Implement equals() method for dst_reg too.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dfe957c02b753dbb5b372e768a5677f577daf9ef 06-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Move up fs_inst::flag_subreg to backend_instruction.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
447879eb88b8df41ad32cf4406cc636b112b72d9 10-Feb-2015 Francisco Jerez <currojerez@riseup.net> i965: Factor out virtual GRF allocation to a separate object.

Right now virtual GRF book-keeping and allocation is performed in each
visitor class separately (among other hundred different things),
leading to duplicated logic in each visitor and preventing layering as
it forces any code that manipulates i965 IR and needs to allocate
virtual registers to depend on the specific visitor that happens to be
used to translate from GLSL IR.

v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8030e269e911c4f90a44d9a77eb342dd2657d229 03-Dec-2014 Ben Widawsky <benjamin.widawsky@intel.com> i965/vec4: Correct MUL destination hazard

As it turns out, we were over-thinking the cause of the hang on
Cherryview. It's simply errata for Cherryview.

commit 88fea85f09e2252035bec66ab26c375b45b000f5
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Fri Nov 21 10:47:41 2014 -0800

i965/vec4/gen8: Handle the MUL dest hazard exception

This is an explanation to why we never saw the hang on BDW.

NOTE: The problem the original patch was trying to fix does still exist. It will
have to be fixed at some point.

v2: Modify commit message, s/CHV/BDW

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
94e7b59a75fc2ecc51a74196f6cd198546603b85 05-Jan-2015 Matt Turner <mattst88@gmail.com> i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.

total instructions in shared programs: 5952059 -> 5951603 (-0.01%)
instructions in affected programs: 138812 -> 138356 (-0.33%)
GAINED: 1
LOST: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
41d9f232b6a7f53086b9c428cca30e45905abd48 12-Jan-2015 Matt Turner <mattst88@gmail.com> i965/vec4: Make sure that imm writes are to registers in the same file.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87887
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3167a80bb1119616b70fbbcf2661d3fb511a6034 13-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix "vertex" vs. "geometry" and "VS" vs. "GS" in debug output.

We were happily printing "Native code for unnamed vertex shader" and
"VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output,
as well as the KHR_debug output used by shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
68ed14d6adcaf4b91216fc1c53792e88d1fd024d 13-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Pass a shader stage abbreviation to fs_generator().

A lot of messages hardcoded the string "FS", which is confusing on
Broadwell, where we use this code for VS support as well.

shader-db particularly got confused, as it reported two "FS SIMD8"
shaders, and no vertex shaders at all. Craziness ensued.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0b98b2bf535d6e6b6b02c0d47ea03f98adf42f15 01-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+.

Gen7.5+ platforms that support the "Shader Channel Select" feature leave
key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE
is GL_ALPHA (which is really uncommon). So, the precompile should leave
them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well.

We didn't notice this because prog->ShadowSamplers is not set correctly.
The next patch will fix that problem.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
408e298942ffb03c00e05dce2569c291df6bec49 01-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Fix INTEL_DEBUG=optimizer with VF types.

Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9b8bd67768769b685c25e1276e053505aede5f93 01-Jan-2015 Kenneth Graunke <kenneth@whitecape.org> i965: Show opt_vector_float() and later passes in INTEL_DEBUG=optimizer.

In order to support calling opt_vector_float() inside a condition, this
patch makes OPT() a statement expression:

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

We've used that elsewhere already.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
798c094e6266bf53b332f332e82a90c338c49915 21-Dec-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Do separate copy followed by constant propagation after opt_vector_float().

total instructions in shared programs: 5877012 -> 5876617 (-0.01%)
instructions in affected programs: 33140 -> 32745 (-1.19%)

From before the commit that allows VF constant propagation (which hurt
some programs) to here, the results are:

total instructions in shared programs: 5877951 -> 5876617 (-0.02%)
instructions in affected programs: 123444 -> 122110 (-1.08%)

with no programs hurt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bbdd3198a5f778ba55c037e4af86d88b06ca4e95 20-Dec-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().

total instructions in shared programs: 5869005 -> 5868220 (-0.01%)
instructions in affected programs: 70208 -> 69423 (-1.12%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
44573458bdc52acc304fb75d6df502312b8e149c 20-Dec-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Add pass to gather constants into a vector-float MOV.

Currently only handles consecutive instructions with the same
destination that collectively write all channels.

total instructions in shared programs: 5879798 -> 5869011 (-0.18%)
instructions in affected programs: 465236 -> 454449 (-2.32%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7bc6e455e231076bfac6c678c375ea4aca94ebf0 21-Dec-2014 Matt Turner <mattst88@gmail.com> i965: Add support for saturating immediates.

I don't feel great about assert(!"unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3978585bccf69ff8f607cad0de025ea91c418587 20-Dec-2014 Matt Turner <mattst88@gmail.com> i965: Add fs_reg/src_reg constructors that take vf[4].

Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.

I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8517e665bc4c378e8e7523827090fd1b06abaecd 12-Dec-2014 Andres Gomez <agomez@igalia.com> i965/brw_reg: struct constructor now needs explicit negate and abs values.

We were assuming, when constructing a new brw_reg struct, that the
negate and abs register modifiers would not be present by default in
the new register.

Now, we force explicitly setting these values when constructing a new
register.

This will avoid problems like forgetting to properly set them when we
are using a previous register to generate this new register, as it was
happening in the dFdx and dFdy generation functions.

Fixes piglit test shaders/glsl-deriv-varyings

Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ee5fb8d1ba7f50ed94e1a34fa0f6e15a0588145e 21-Oct-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Generate vs code using scalar backend for BDW+

With everything in place, we can now use the scalar backend compiler for
vertex shaders on BDW+. We make scalar vertex shaders the default on
BDW+ but add a new vec4vs debug option to force the vec4 backend.

No piglit regressions.

Performance impact is minimal, I see a ~1.5 improvement on the T-Rex
GLBenchmark case, but in general it's in the noise. Some of our
internal synthetic, vs bounded benchmarks show great improvement, 20%-40%
in some cases, but real-world cases are mostly unaffected.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bf2307937995212895375d1e258d50207da3d24e 25-Nov-2014 Kristian Høgsberg <krh@bitplanet.net> i965: Rename brw_vec4_prog_data/key to brw_bue_prog_data/key

These structs aren't vec4 specific, they are shared by shader stages
operating on Vertex URB Entries (VUEs). VUEs are the data structures in
the URB that hold vertex data between the pipeline geometry stages.
Using vue in the name instead of vec4 makes a lot more sense, especially
when we add scalar vertex shader support.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0d3cc01b0b092271938ce2cf2b77d27dc385e4d8 24-Oct-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Allow CSE on uniform-vec4 expansion MOVs.

Three source instructions cannot directly source a packed vec4 (<0,4,1>
regioning) like vec4 uniforms, so we emit a MOV that expands the vec4 to
both halves of a register.

If these uniform values are used by multiple three-source instructions,
we'll emit multiple expansion moves, which we cannot combine in CSE
(because CSE emits moves itself).

So emit a virtual instruction that we can CSE.

Sometimes we demote a uniform to to a pull constant after emitting an
expansion move for it. In that case, recognize in opt_algebraic that if
the .file of the new instruction is GRF then it's just a real move that
we can copy propagate and such.

total instructions in shared programs: 5822418 -> 5812335 (-0.17%)
instructions in affected programs: 351841 -> 341758 (-2.87%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
afd605f3461462ba1b9f522b079ff5a03e7ab55c 01-Dec-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Make vertex color clamp handling code VS specific.

Vertex color clamping only applies to gl_[Secondary]{Front,Back}Color,
which are compatibility-only built-in varyings. We only support GS in
core profile, so they can't exist in geometry shaders.

We can drop several dirty bits from the GS program key - they're
unnecessary for a core profile implementation.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5df88c2096281f416b2738debac1c4c329e29673 03-Nov-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Rewrite dead code elimination to use live in/out.

Improves 359 shaders by >=10%
114 shaders by >=20%
91 shaders by >=30%
82 shaders by >=40%
22 shaders by >=50%
4 shaders by >=60%
2 shaders by >=80%

total instructions in shared programs: 5845346 -> 5822422 (-0.39%)
instructions in affected programs: 364979 -> 342055 (-6.28%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e14c7c7faff3c204a5eefc1f2ea487d4730b8382 10-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Add VEC4_OPCODE_PACK_4_BYTES.

Will be used by emit_pack_{s,u}norm_4x8().
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cb0ba848d4176c1ed2c4542fd5875867f460fc3b 09-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Add vector float immediate infrastructure.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
88fea85f09e2252035bec66ab26c375b45b000f5 21-Nov-2014 Ben Widawsky <benjamin.widawsky@intel.com> i965/vec4/gen8: Handle the MUL dest hazard exception

Fix one of the few cases where we can't reliable touch the destination hazard
bits. I am explicitly doing this patch individually so it is easy to backport. I
was tempted to do this patch before the previous patch which reorganized the
code, but I believe even doing that first, this is still easy to backport.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
156f565f9eb36dad3cd959952724bc54f9ff21ea 21-Nov-2014 Ben Widawsky <benjamin.widawsky@intel.com> i965/vec4: Extract depctrl hazards

Move this to a separate function so that we can begin to add other little
caveats without making too big a mess.

NOTE: There is some desire to improve this function eventually, but we need to
fix a bug first.

v2:
Use const for the inst for the hazard check (Matt)
Invert safe logic to get rid of the double negative (Matt)
Add PRM reference for predicates (Matt)
Add note about empirical evidence for math (Matt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7d560a3861ff30aa9d8ec872cf9cd7d72a980eb2 21-Oct-2014 Ian Romanick <ian.d.romanick@intel.com> i965: Silence unused parameter warning in brw_dump_ir

Just remove the parameter. Silences:

brw_program.c: In function 'brw_dump_ir':
brw_program.c:566:33: warning: unused parameter 'brw' [-Wunused-parameter]

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b52126b44f40643aa2c0986c1d51330f4e4130b5 27-Sep-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Optimize sqrt+inv into rsq.

Transform

sqrt a, b
rcp c, a

into

sqrt a, b
rsq c, b

In most cases the sqrt's result is still used, so the improvement here
is that we've broken a dependency between these instructions. Leads to
80 fewer INV instructions and 80 more RSQ.

Occasionally the sqrt's result is no longer used, leading to:

instructions in affected programs: 5005 -> 4949 (-1.12%)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
189ac077644c4ef2c6c15080b6d094410c74abdc 27-Sep-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Call opt_algebraic after opt_cse.

The next patch adds an algebraic optimization for the pattern

sqrt a, b
rcp c, a

and turns it into

sqrt a, b
rsq c, b

but many vertex shaders do

a = sqrt(b);
var1 /= a;
var2 /= a;

which generates

sqrt a, b
rcp c, a
rcp d, a

If we apply the algebraic optimization before CSE, we'll end up with

sqrt a, b
rsq c, b
rcp d, a

Applying CSE combines the RCP instructions, preventing this from
happening.

No shader-db changes.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
72bb3f81c621931e42759148bc8bddc511266dd0 02-Sep-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Don't iterate between blocks with inst->next/prev.

The register coalescing portion of this patch hurts three shaders in
Guacamelee by one instruction each, but examining the diff makes me
believe that what we were generating was (perhaps harmlessly) incorrect.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
90bfeb22444df6ce779251522e47bf169e130f8e 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Don't use instruction list after calculating the cfg.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a4fb8897a2bd00eefa8a503ec17d45e791bced91 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965: Remove now unneeded calls to calculate_cfg().

Now that nothing invalidates the CFG, we can calculate_cfg() immediately
after emit_fb_writes()/emit_thread_end() and never again.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
072ea414d04f1b9a7bf06a00b9011e8ad521c878 01-Sep-2014 Matt Turner <mattst88@gmail.com> i965: Remove cfg-invalidating parameter from invalidate_live_intervals.

Everything has been converted to preserve the CFG.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
269b6e24d6ec61d8d8d0c5d1b3d1bfa4f4a55f5f 25-Aug-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Preserve CFG in spill_reg().

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b0b64c85e4a0dafbb46405e4b3c17be24b63347f 25-Aug-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Preserve the CFG in a few more places.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c66165ab2b15047792808433b788632a4b9df287 01-Aug-2014 Iago Toral Quiroga <itoral@igalia.com> i965/gen6/gs: Fix binding table clash between TF surfaces and textures.

For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the
binding table for transform feedback surfaces. However, vec4_visitor will
setup the binding table so that textures use the same space in the binding
table. This is done when calling assign_common_binding_table_offsets(0) as
part if its run() method.

To fix this clash we add a virtual method to the vec4_visitor hierarchy to
assign the binding table offsets, so that we can change this behavior
specifically for gen6 geometry shaders by mapping textures right after the
first BRW_MAX_SOL_BINDINGS entries.

Also, when there is no user-provided geometry shader, we only need to upload
the binding table if we have transform feedback, however, in the case of a
user-provided geometry shader, we can't only look into transform feedback
to make that decision.

This fixes multiple piglit tests for textureSize() and texelFetch() when these
functions are called from a geometry shader in gen6, like these:

bin/textureSize gs sampler2D -fbo -auto
bin/texelFetch gs usampler2D -fbo -auto

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2c85132e511bbef9a0965c69848981b1bffb5bad 09-Jul-2014 Iago Toral Quiroga <itoral@igalia.com> i965/gen6/gs: Implement GS_OPCODE_URB_WRITE_ALLOCATE.

Gen6 geometry shaders need to allocate URB handles for each new vertex they
emit after the first (the URB handle for the first vertex is obtained via the
FF_SYNC message).

This opcode adds the URB allocation mechanism to regular URB writes.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d0bdd4ce983ddd52f9f4b70dced4e471c60a130c 09-Jul-2014 Iago Toral Quiroga <itoral@igalia.com> i965/gen6/gs: Implement GS_OPCODE_FF_SYNC.

This implements the FF_SYNC message required in gen6 geometry shaders to
get the initial URB handle.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
667f758788f0796d9be16f0f361022d447f622f5 09-Sep-2014 Chris Forbes <chrisf@ijw.co.nz> i965/vec4: slightly improve insn dumping with no srcs

Previously, we would get a trailing ', ' which looked strange.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6b6145204dd4a1112f6e1fe10162636141495b79 11-Sep-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Separate gl_InstanceID and gl_VertexID uploading.

We always uploaded them together, mostly out of laziness - both required
an additional vertex element. However, gl_VertexID now also requires an
additional vertex buffer for storing gl_BaseVertex; for non-indirect
draws this also means uploading (a small amount of) data. This is extra
overhead we don't need if the shader only uses gl_InstanceID.

In particular, our clear shaders currently use gl_InstanceID for doing
layered clears, but don't need gl_VertexID.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
87472ae58cf2a5c812630f4eabd485931d243e0c 05-Sep-2014 Matt Turner <mattst88@gmail.com> i965/fs: Brown bag fix.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e8df6a6b32aae7695ce010f18588c51cb7d18978 31-Aug-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Add ability to reswizzle arbitrary swizzles.

Before commit 04895f5c we would only reswizzle dot product instructions
(since they wrote the same value into all channels, and we didn't have
to think about anything else). That commit extended reswizzling to cases
when the swizzle was single valued -- i.e., writing the same result into
all channels.

But allowing reswizzling of arbitrary things is actually really easy and
is even less code. (Why didn't we do this in the first place?!)

total instructions in shared programs: 4266079 -> 4261000 (-0.12%)
instructions in affected programs: 351933 -> 346854 (-1.44%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1ee1d8ab468cafd25cfcc513319f3f046492947f 31-Aug-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Reswizzle sources when necessary.

Despite the comment above the function claiming otherwise, the function
did not reswizzle sources, which would lead to bad code generation since
commit 04895f5c, which began claiming we could do such swizzling when we
could not.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f92fbd554f2e9e702a2bd650c9b2571a3f4f1ab8 02-Sep-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Move curb_read_length/total_scratch to brw_stage_prog_data.

All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1c573c9adbb8bb95bc10f6ade76a430684918160 28-Aug-2014 Jason Ekstrand <jason.ekstrand@intel.com> i965/vec4: Don't segfault when debug-logging a null program

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20a849b4aa63c7fce96b04de674a4c70f054ed9c 13-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Use basic-block aware insertion/removal functions.

To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.

cfg calculations: 202951 -> 80307 (-60.43%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
04895f5c601b240df547739da786b7c2b65bdd1e 15-Aug-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Allow reswizzling writemasks when swizzle is single-valued.

total instructions in shared programs: 4288033 -> 4266151 (-0.51%)
instructions in affected programs: 930915 -> 909033 (-2.35%)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9a071e3339afcf6fd937ae31121fa3b3face3bfe 18-Aug-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Add a pass to reduce swizzles.

total instructions in shared programs: 4344280 -> 4288033 (-1.29%)
instructions in affected programs: 397468 -> 341221 (-14.15%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a3d0ccb037082f3aa66bd558dfbe89f63a6eedd3 12-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Pass a cfg pointer to generate_{code,assembly}.

The loop over all instructions is now two-fold, over all of the blocks
and all of the instructions in each block.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
596990d91e2a4c4a3a303c6c2da623bf1840771b 12-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Add and use foreach_block macro.

Use this as an opportunity to rename 'block_num' to 'num'. block->num is
clear, and block->block_num has always been redundant.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
972e87ca30b4c4b7f6269e5f9fe8c5cb6356f744 14-Aug-2014 Pekka Paalanen <pekka.paalanen@collabora.co.uk> i965: fix compiler error in union initiliazer

gcc 4.6.3 chokes with the following error:

brw_vec4.cpp: In member function 'int brw::vec4_visitor::setup_uniforms(int)':
brw_vec4.cpp:1496:37: error: expected primary-expression before '.' token

Apparently C++ does not do named initializers for unions, except maybe
as a gcc extension, which is not present here.

As .f is the first element of the union, just drop it. Fixes the build
error.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2c50212b14da27de4e3da62488ae4e35c069d84e 11-Aug-2014 Neil Roberts <neil@linux.intel.com> i965: Store uniform constant values in a gl_constant_value instead of float

The brw_stage_prog_data struct previously contained an array of float pointers
to the values of parameters. These were then copied into a batch buffer to
upload the values using a regular assignment. However the float values were
also being overloaded to store integer values for integer uniforms. This can
break if x87 floating-point registers are used to do the assignment because
the fst instruction tries to fix up invalid float values. If an integer
constant happened to look like an invalid float value then it would get
altered when it was copied into the batch buffer.

This patch changes the pointers to be gl_constant_value instead so that the
assignment should end up copying without any alteration. This also makes it
more obvious that the values being stored here are overloaded for multiple
types.

There are some static asserts where the values are uploaded to ensure that the
size of gl_constant_value is the same as a float.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f17bfc9ba954608c58fd0560f255e40eef7e7cea 11-Aug-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Never use the Gen8 code generators.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
074d472398b3cc7f32fe5c0cc742853cf66fabed 30-Jun-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Switch to the EU emit layer for code generation on Broadwell.

Everything should be in place to unify code generation between Gen4-7
and Gen8+. We should be able to drop the Gen8 generators at this point.

However, leave them hooked up for a brief moment, for testing and
comparison purposes. Set GEN8=1 to use the old Gen8+ code generator
paths.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
36a4a6bbdca0c30e16d56e6b406ea7c94831048f 22-Jul-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Port INTEL_DEBUG=optimizer to the vec4 backend.

Largely via copy and paste.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3e9105f7eefae97c928034662f67019973b9e483 12-Jul-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Use foreach_inst_in_block a couple more places.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1761671b0627ce8e1c0eae721e1fca5c2d04690e 12-Jul-2014 Matt Turner <mattst88@gmail.com> i965: Replace cfg instances with calls to calculate_cfg().

Avoids regenerating it unnecessarily.

Every program in shader-db improved, none by an amount less than a 1/3
reduction. One Dota2 shader decreased from 62 -> 24.

cfg calculations: 429492 -> 193197 (-55.02%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1854ead64ca465ca03e8e5369cd1749bc92c315a 06-Jul-2014 Chris Forbes <chrisf@ijw.co.nz> i965: Avoid crashing while dumping vec4 insn operands

We'd otherwise go looking into virtual_grf_sizes for things that aren't
in there at all.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3c8dc48ad1d4061a2a1d0b9ea3126350b98274f0 06-Mar-2013 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Add basic common subexpression elimination.

[mattst88]: Modified to perform CSE on instructions with
the same writemask. Offered no improvement before.

total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs: 14410 -> 13962 (-3.11%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34ef6a7651d6651e0bca77c4d4b890af582ad360 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Move is_zero/one/null/accumulator into backend_reg.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
53992a102ffddf2e0fad401252cfc1c034d022ad 30-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use immediate storage in brw_reg for visitor regs.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
489ec685542590c7412db81623952c1aa75d946f 19-May-2014 Eric Anholt <eric@anholt.net> i965: Update a ton of comments about constant buffers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c0f1929dd23bbc558e9eef0f8fd40e10dfef3c21 19-May-2014 Eric Anholt <eric@anholt.net> i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.

I wanted to access this value from stage-generic code, so stop storing it
under two different names.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3d826729dabab53896cdbb1f453c76fab1c7e696 29-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use unreachable() instead of unconditional assert().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
266109736a9a69c3fdbe49fe1665a7a63c5cc122 25-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use typed foreach_in_list_safe instead of foreach_list_safe.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c5030ac0ac15d3c91c4352789f94281da9a9dcad 25-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Use typed foreach_in_list instead of foreach_list.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
46659d46a8c2f7bbc8deb472faff2dccbde92d29 24-Jun-2014 Matt Turner <mattst88@gmail.com> i965: Make can_do_source_mods() a member of the instruction classes.

Pretty nonsensical to have it as a method of the visitor just for access
to brw.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d0575d98fc595dcc17706dc73d1eb461027ca17a 14-Jun-2014 Kenneth Graunke <kenneth@whitecape.org> i965/vec4: Fix dead code elimination for VGRFs of size > 1.

When faced with code such as:

mov vgrf31.0:UD, 960D
mov vgrf31.1:UD, vgrf30.xxxx:UD

The dead code eliminator didn't consider reg_offsets, so it decided that
the second instruction was writing was writing to the same register as
the first one, and eliminated the first one. But they're actually
different registers.

This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above
code, vgrf31.0 represents the offset into the shader_time buffer where
the data should be written, and vgrf31.1 represents the actual time
data. With a completely undefined offset, results were...unexpected.

I think this is probably one of the few cases (maybe only case) where we
generate multiple MOVs to a large VGRF. Normally, we just use them as
texturing results; the other SEND-from-GRF uses a size 1 VGRF.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7b9cf797903a5ea70072a28c0486d3e99ee60645 06-Mar-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Make src_reg::equals() take a constant reference, not a pointer.

This is more typical C++ style.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
56d6dcf4f771d57d2759b2a5c5006f24444c696f 29-May-2014 Matt Turner <mattst88@gmail.com> i965: Give dump_instruction() a FILE* argument.

Use function overloading rather than default arguments, since gdb
doesn't know about default arguments.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
306ed81b9363721058c568244f9860c5c8c819f4 04-Apr-2014 Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> i965: Add writes_accumulator flag

Our hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions. Many instructions
can implicitly write a value to the accumulator in addition to their
normal destination register. This is enabled by the "AccWrEn" flag.

This patch introduces a new flag, inst->writes_accumulator, which
allows us to express the AccWrEn notion in the IR. It also creates a
n ALU2_ACC macro to easily define emitters for instructions that
implicitly write the accumulator.

Previously, we only supported implicit accumulator writes from the
ADDC, SUBB, and MACH instructions. We always enabled them on those
instructions, and left them disabled for other instructions.

To take advantage of the MAC (multiply-accumulate) instruction, we
need to be able to set AccWrEn on other types of instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
30c35d1dcb2fde19b1c968751fda5151b795d257 09-Apr-2014 Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> i965: Add is_accumulator() function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
602510395a96a1f6ca29189e4f5cfb3f07f21d23 13-Feb-2014 Mike Stroyan <mike@LunarG.com> i965: Avoid dependency hints on math opcodes

Putting NoDDClr and NoDDChk dependency control on instruction
sequences that include math opcodes can cause corruption of channels.
Treat math opcodes like send opcodes and suppress dependency hinting.

Signed-off-by: Mike Stroyan <mike@LunarG.com>
Tested-by: Tony Bertapelli <anthony.p.bertapelli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
596737ee91cc199a8edff5dc440736471e28f297 24-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Let DCE eliminate dead writes in other basic blocks.

We previously stopped searching for unread writes after encountering
control flow, but we can instead just search backwards until we hit
control flow.

instructions in affected programs: 22854 -> 22194 (-2.89%)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20dee82a75ac7415fba0b3540a1f99d60b2325db 01-Apr-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Consider sources of non-GRF-dst instructions for dead channels.

Previously we'd ignore the sources of instructions with non-GRF
destinations when calculating calculating the dead channels. This would
lead to us incorrectly removing the first instruction in this sequence:

mov vgrf11, ...
cmp.ne.f0 null, vgrf11, 1.0
mov vgrf11, ...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76616
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e14cc504f307a7fa88c8b6757df53026aaa39b08 02-Apr-2014 Tapani Pälli <tapani.palli@intel.com> i965/vec4: do not trim dead channels on gen6 for math

Do not set a writemask on Gen6 for math instructions, those are
executed using align1 mode that does not support a destination mask.

v2: cleanups, better comment (Matt)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76883

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3a8bd9724196075da76ddcb50eff4867c5a37398 29-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Don't trim writemasks of texture instructions.

It was my understanding that the writemask works in SIMD4x2 mode for
texturing instructions and doesn't require a message header. Some bit of
this logic must be wrong, so disable it until it's understood.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76617
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
764e25d79dad3096274ab2df04f5aa3ffb232119 19-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Eliminate dead writes to the flag register.

For each write, search previous instructions for unread writes to the
flag register and remove them. Note that this will not eliminate the
last unread write.

total instructions in shared programs: 788074 -> 788004 (-0.01%)
instructions in affected programs: 4930 -> 4860 (-1.42%)

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9cd51bb0c4608258199c69bc7738e72f055799d2 11-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Eliminate writes that are never read.

With an awful O(n^2) algorithm that searches previous instructions for
dead writes.

total instructions in shared programs: 805582 -> 788074 (-2.17%)
instructions in affected programs: 144561 -> 127053 (-12.11%)

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1b8f143a2302739de90cb643d732e12b55d4e4eb 12-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Factor code out of DCE into a separate function.

Will be reused in the next commit.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9630ba6c6e754b438cf67c7d76ec1c99488df3ba 11-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Let dead code eliminate trim dead channels.

That is, modify

mad dst, a, b, c

to be

mad dst.xyz, a, b, c

if dst.w is never read.

total instructions in shared programs: 811869 -> 805582 (-0.77%)
instructions in affected programs: 168287 -> 162000 (-3.74%)

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dc0f5099fa3cb564c25eb892fde93cacd29df8f1 11-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Track live ranges per-channel, not per vgrf.

Will be squashed with the next patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89ccd11eebeee884d581e831b61368ac97057b43 11-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Don't dead code eliminate instructions writing the flag.

A future patch adds support for removing dead writes to the flag
register. This patch simplifies the logic until then.

total instructions in shared programs: 811813 -> 811869 (0.01%)
instructions in affected programs: 3378 -> 3434 (1.66%)

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3a12f50f9ca7f03f470ee053b9076ac12c4d486d 11-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Preparatory clean up of dead_code_eliminate().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
10dd6eca89951e0cb40e21c3b53caa33d8fcb383 13-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Add is_null() method to dst_reg.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0884ce8f42d0e04e889c6d0e4dde91f9aa58e85e 13-Mar-2014 Matt Turner <mattst88@gmail.com> i965/vec4: Print the predicate in dump_instructions().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
01d9023a9b9a50b42f7a4ef4799d0e35e0b045ca 11-Mar-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Fix register types in dump_instructions(), again.

In commit e57d77280efcbfd6579a88f071426653287ef833, I fixed this for
destinations in the Vec4 backend, and sources in the scalar backend.
But not both types in both backends.

To prevent this mess from continuing, make the reg_encoding table
static, so only the disassembler can use it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a290cd039cc07330598a101e74d25289ce70bcee 18-Feb-2014 Topi Pohjolainen <topi.pohjolainen@intel.com> i965: Merge resolving of shader program source

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
59989a4a92e638415d50e9acdd0685eb56eb17f3 27-Feb-2014 Petri Latvala <petri.latvala@intel.com> i965: Assert array index on access to vec4_visitor's arrays.

v2: vec4_visitor::pack_uniform_registers(): Use correct comparison in the
assert, this->uniforms is already adjusted. Compare the actual value used to
index uniform_size and uniform_vector_size instead.

Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a76e5dce4fc8d50f8699c108833f24e80167d706 23-Dec-2013 Eric Anholt <eric@anholt.net> i965: Move compiler debugging output to stderr.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f28c9208652143b4925bd97ce9823728c34d34a5 21-Feb-2014 Eric Anholt <eric@anholt.net> i965: Refactor debug dumping of GLSL IR.

This was only going to get worse when tesselation shows up, and was
causing too much extra duplication in my stderr changes coming up.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c2ebbe2728cd709029313f4b9c9cc53432c510a1 20-Feb-2014 Eric Anholt <eric@anholt.net> i965: Stop throwing away our double precision for time calculations.

Fixes negative times being reported in our perf debug.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
42b226ef824ed61ccf51fa9a1198cba305ad5472 19-Feb-2014 Francisco Jerez <currojerez@riseup.net> i965: Make sure that backend_reg::type and brw_reg::type are consistent for fixed regs.

And define non-mutating helper functions to retype fixed and normal
regs with a common interface. At some point we may want to get rid of
::fixed_hw_reg completely and have fixed regs use the normal register
data members (e.g. backend_reg::reg to select a fixed GRF number,
src_reg::swizzle to store the swizzle, etc.), I have the feeling that
this is not the last headache we're going to get because of the
multiple ways to represent the same thing and the different register
interface depending on the file a register is stored in...

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ae8b066da5862b4cfc510b3a9a0e1273f9f6edd4 19-Feb-2014 Francisco Jerez <currojerez@riseup.net> i965: Move up duplicated fields from stage-specific prog_data to brw_stage_prog_data.

There doesn't seem to be any reason for nr_params, nr_pull_params,
param, and pull_param to be duplicated in the stage-specific
subclasses of brw_stage_prog_data. Moving their definition to the
common base class will allow some code sharing in a future commit, the
removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and
the simplification of the stage-specific brw_*_prog_data_compare.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7f00c5f1a3e0db20a89cfedefd53cbe817fec9e3 23-Nov-2013 Francisco Jerez <currojerez@riseup.net> i965/vec4: Add constructor of src_reg from a fixed hardware reg.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b424da4be07ab8d34986e6f3824c679b623df952 28-Nov-2013 Francisco Jerez <currojerez@riseup.net> i965/vec4: Fix confusion between SWIZZLE and BRW_SWIZZLE macros.

Most of the VEC4 back-end agrees on src_reg::swizzle being one of the
BRW_SWIZZLE macros defined in brw_reg.h, except in two places where we
use Mesa's SWIZZLE macros. There is even a doxygen comment saying
that Mesa's macros are the right ones. They are incompatible swizzle
representations (3 bits vs. 2 bits per component), and the code using
Mesa's works by pure luck. Fix it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e57d77280efcbfd6579a88f071426653287ef833 05-Feb-2014 Kenneth Graunke <kenneth@whitecape.org> i965: Fix register types in dump_instructions().

This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract
type that doesn't match the hardware description. dump_instruction()
was using reg_encoding[] from brw_disasm.c, which no longer matches
(and was incorrect for Gen8+ anyway).

This patch introduces a new function to convert the abstract enum values
into the letter suffix we expect.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ce527a6722491fa7d696266d5dec13f0b72bf8e8 10-Dec-2013 Topi Pohjolainen <topi.pohjolainen@intel.com> i965: rename tex_ms to tex_cms

Prepares for the introduction of non-compressed multi-sampled
lookup used in the blorp programs.

v2: now also taking into account gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
71bc11a37508542662132b16a53acd5f541cd2b4 05-Dec-2013 Matt Turner <mattst88@gmail.com> i965: Print reg_offset for vgrf of size > 1 in dump_instruction().

Previously we wouldn't print the +0 for the first part of a VGRF of size
greater than 1.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a4d68e9ee94cf4855a3240c3516279b4e7740268 17-Jan-2014 Paul Berry <stereotype441@gmail.com> i965: Add GS support to INTEL_DEBUG=shader_time.

Previously, time spent in geometry shaders would be counted as part of
the vertex shader time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9eb568d7531eb4715be24d5076353ea6c10c8ceb 07-Dec-2012 Kenneth Graunke <kenneth@whitecape.org> i965: Create a new vec4 backend for Broadwell.

This replaces the old vec4_generator backend.

v2: Port to use the C-based instruction representation. Also, remove
Geometry Shader offset hacks - the visitor will handle those instead
of this code.

v3: Texturing fixes (including adding textureGather support).

v4: Pass brw_context to gen8_instruction functions as required.

v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes
(caught by Eric). Simplify ADDC/SUBB handling; add comments to
gen8_set_dp_message calls (suggested by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
26a3bf5c726199d7664d5878ef1f73592e55caa7 28-Nov-2013 Eric Anholt <eric@anholt.net> i965: Stop doing our optimization on a copy of the GLSL IR.

The original intent was that we'd keep a driver-private copy, and there
would be the normal copy for swrast to make use of without the tuning (or
anything more invasive we might do) specific to i965. Only, we don't
generate swrast code any more, because swrast can't render current shaders
anyway. Thus, our private copy is rather a waste, and we can just do our
backend-specific operations on the linked shader.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7629c489c88a6f6dd47b311a90ad64e216c9a37c 29-Nov-2013 Chris Forbes <chrisf@ijw.co.nz> i965: Add shader opcode for sampling MCS surface

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8814806c97ed60c5bb4d6cb1927cd05445864388 21-Oct-2013 Matt Turner <mattst88@gmail.com> i965: Print conditional mod in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
637dda1c307aee921ecc646b75f891deab6585a9 02-Dec-2013 Matt Turner <mattst88@gmail.com> i965: Print argument types in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
729fe77e3bdf64768e8447c281f249ac80c1b9a2 02-Dec-2013 Matt Turner <mattst88@gmail.com> i965/vec4: Don't print swizzles for immediate values.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2b8e0a73fbc021305fdcab7a3c6661de7af911a9 02-Dec-2013 Matt Turner <mattst88@gmail.com> i965/vec4: Print negate and absolute value for src args.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a85f1b7adf1023667fea090242ba448d935eaa67 26-Nov-2013 Matt Turner <mattst88@gmail.com> i965/vec4: Add support for printing HW_REGs in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0e4053234df5e3461e80c90dfd743c3ac96006eb 26-Nov-2013 Matt Turner <mattst88@gmail.com> i965: Don't print extra (null) arguments in dump_instruction().

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d2fcdd0973ee33a2627d1dee6d78091e605af160 29-Nov-2013 Matt Turner <mattst88@gmail.com> i965/cfg: Clean up cfg_t constructors.

parent_mem_ctx was unused since db47074a, so remove the two wrappers
around create() and make create() the constructor.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a97cd0f4d7902965d5173f4bcbf2ad27c0eb5d12 30-Oct-2013 Matt Turner <mattst88@gmail.com> i965: Add a pass to remove dead control flow.

Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions.

total instructions in shared programs: 1360393 -> 1360387 (-0.00%)
instructions in affected programs: 157 -> 151 (-3.82%)

(no change in vertex shaders)

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1c263f8f4f767df0511e63377c17a95ebebba944 11-Nov-2013 Matt Turner <mattst88@gmail.com> i965/vec4: Add invalidate_live_intervals method.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34fe051e215107dddbaae71e2edf15f88d839936 20-Oct-2013 Francisco Jerez <currojerez@riseup.net> i965: Add a 'has_side_effects' back-end instruction predicate.

This patch fixes the three dead code elimination passes and the
VEC4/FS instruction scheduling passes so they leave instructions with
side effects alone.

At some point it might be interesting to have the instruction
scheduler calculate the exact memory dependencies between atomic ops,
but they're rare enough that it seems unlikely that it will make any
practical difference.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6032261682388ced64bd33328a5025f561927a38 16-Oct-2013 Eric Anholt <eric@anholt.net> i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE

I'm going to be introducing gen7 variants, and the previous naming was
going to get confusing.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e621cb9fef7eada5a3c131d27f5b0b142658758 11-Sep-2013 Francisco Jerez <currojerez@riseup.net> i965/gen7: Implement code generation for untyped surface read instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cfaaa9bbb7a6ab5819f4fa9e38352b72d6293cff 11-Sep-2013 Francisco Jerez <currojerez@riseup.net> i965/gen7: Implement code generation for untyped atomic instructions.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6bb2cf2107c4461ea9dd100edaf110b839311b90 08-Oct-2013 Chris Forbes <chrisf@ijw.co.nz> i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets.

The generator code ends up clearer this way than if we had to sniff
via the message length. Implemented via the gather4_po message in
hardware, which is present in Gen7 and later.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89647cffb31ee1ea42d581b1053b4bb147b3e58a 16-Oct-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: if register allocation fails, don't try to schedule.

Otherwise the scheduler would be invoked with prog_data->total_grf ==
0, causing havoc.

In a future patch, this will allow us to try compiling a geometry
shader in DUAL_OBJECT mode with spilling disabled, and then fall back
to DUAL_INSTANCED mode if that failed.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8bb15813e3047820a95724e4257aa2c862eeb31a 16-Oct-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: Add the ability for attributes to be interleaved.

When geometry shaders are operated in "single" or "dual instanced"
mode, a single set of geometry shader inputs is interleaved into the
thread payload (with each payload register containing a pair of
inputs) in order to save register space.

This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that
it can handle the interleaved format.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e0f34301b29ecf3fb7118b2e05872510c104a49b 23-Oct-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: Extract function to set up vec4 prog key for precompiling.

This will allow us to re-use it for precompiling geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
068df64ba6a8309427612836e5eb384721ca6d40 23-Oct-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: Remove uses_clip_distance from program key.

This should never have been in the program key in the first place,
since it's determined by the shader source, not by GL state. Change
the code to just refer to gl_program::UsesClipDistanceOut directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
705a90e30435490c2de84f4f6741cab335fa7608 03-Oct-2013 Eric Anholt <eric@anholt.net> i965: Move the common binding table offset code to brw_shader.cpp.

Now that both vec4 and fs are dynamically assigning offsets, a lot of the
code is the same.

v2: Avoid passing around the next offset through the class. (Review by
Paul)

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d395485e1df44853cdf86b0bd46b7af36c7e1c13 03-Oct-2013 Eric Anholt <eric@anholt.net> i965/vec4: Dynamically assign the VS/GS binding table offsets.

Note that the dropped comment in brw_context.h is mostly (better written)
in brw_binding_table.c as well.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 02-Oct-2013 Eric Anholt <eric@anholt.net> i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.

It would be nice to be able to pack our binding table so that programs
that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1
binding table entries. To do that, we need the compiled program to have
information on where its surfaces go.

v2: Rename size to size_bytes to be more explicit.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a5ec01fb1bd4ad5418eb16cb05e6f6929d1444e8 20-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Don't copy prop source mods into instructions that can't take them.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e7dc88026a821a31bf2afeb934dded11c91401a1 20-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Fixup for don't dead-code eliminate instructions that write to the accumulator.

Accidentally pushed an old version of the patch.

v2: Set destination register using brw_null_reg().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
92dc16c3e2e2b9e3e71baaccc67bbe727e9d68ab 20-Sep-2013 Matt Turner <mattst88@gmail.com> i965: Don't dead-code eliminate instructions that write to the accumulator.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fb455500bfb11cca0f45076a9eaccc0ddd764731 31-Mar-2013 Chris Forbes <chrisf@ijw.co.nz> i965: add SHADER_OPCODE_TG4

Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and
low-level support for emitting it via generate_tex().

V3: Updated for changes in master.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4e3d1712a223f9f0b4ff4a34b9b5447a92877347 28-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Detect GRF sources in split_virtual_grfs send-from-GRF code.

It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF.
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores the
GRF as src[1].

To be safe, loop over all the source registers and mark any GRFs. We
probably won't ever have more than one, but it's simpler to just check
all three rather than attempting to bail early.

Fixes assertion failures in Unigine Sanctuary since we started making
register allocation rely on split_virtual_grfs working. (The register
classes were actually sufficient, we were just interpreting an IMM as
a virtual GRF number.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
09e2df5961cfe04925bdd820e6ea59af3ba783f6 30-Aug-2013 Eric Anholt <eric@anholt.net> i965/vs: Fix regression on pre-gen6 with no VS uniforms in use.

df06745c5adb524e15d157f976c08f1718f08efa made it so that we didn't
allocate extra uniform space for unused clip planes, which also
incidentally made us not allocate any space at all, which we were relying
on for this no-uniforms case. Instead of putting the knowledge of this
special HW exception into the thing that normally preallocates prog_data
for us, just allocate it here.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68766
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4416cb79926f089ff55dbbb352b94ec2890ae823 23-Mar-2013 Paul Berry <stereotype441@gmail.com> i965/gs: Add GS_OPCODE_THREAD_END.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
96eb2f353605b382cf4fc930cc1d322130b12270 21-Mar-2013 Paul Berry <stereotype441@gmail.com> i965/gs: Add GS_OPCODE_URB_WRITE.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7f57101ad53112b16e4a400f312f68a85dfbd108 13-Jul-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: Virtualize setup_payload instead of setup_attributes.

When I initially generalized the vec4_visitor class in preparation for
geometry shaders, I assumed that the setup_attributes() function would
need to be different between vertex and geometry shaders, but its
caller, setup_payload(), could be shared. So I made
setup_attributes() a virtual function.

It turns out this isn't true; setup_payload() needs to be different
too, since the geometry shader payload sometimes includes an extra
register (primitive ID) that has to come before uniforms.

So setup_payload() needs to be the virtual function instead of
setup_attributes().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
626495d269e2c2df9dae5c46c086ffff93c77a19 13-Jul-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: Allow for dispatch_grf_start_reg to vary.

Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control,
which determines the register where the hardware delivers data sourced
from the URB (push constants followed by per-vertex input data).

For vertex shaders, we always set dispatch_grf_start_reg to 1, since
R1 is always the first register available for push constants in vertex
shaders.

For geometry shaders, we'll need the flexibility to set
dispatch_grf_start_reg to different values depending on the behvaiour
of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to
set it to 2 to allow the primitive ID to be delivered to the thread in
R1.

This patch eliminates the assumption that dispatch_grf_start_reg is
always 1. In vec4_visitor, we record the regnum that was passed to
vec4_visitor::setup_uniforms() in prog_data for later use. In
vec4_generator, we consult this value when converting an abstract
UNIFORM register to a concrete hardware register. And in the code
that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the
value recorded in prog_data.

This will allow us to set dispatch_grf_start_reg to the appropriate
value when compiling geometry shaders. Vertex shaders will continue
to always use a dispatch_grf_start_reg of 1.

v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint".

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
72168f5f0069b2a0d8a2434ba80f4446952e84c7 15-Aug-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: Move vec4 data structures and functions to brw_vec4.{cpp,h}.

This patch moves the following things into brw_vec4.{cpp,h}:

- struct brw_vec4_compile
- struct brw_vec4_prog_key
- brw_vec4_prog_data_compare()
- brw_vec4_prog_data_free()

This will allow us to avoid having to include brw_vs.h in
geometry-shader-specific files.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5fb13d871e062354a77a427b3a3fe7f3d6908e5a 20-Mar-2013 Paul Berry <stereotype441@gmail.com> i965: Stop including brw_vs.h from brw_vec4.h.

This is backwards from what we are going to want in the long term, which is:

- brw_vec4.h declares general-purpose vec4 infrastructure needed by
both VS and GS
- brw_vs.h includes brw_vec4.h and adds VS-specific parts.
- brw_gs.h includes brw_vec4.h and adds GS-specific parts.

Note that at the moment brw_vec.h contains a fair amount of
VS-specific declarations--I plan to address that in a later patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c642bd3dcc1a6f1039732c614ab8a56dd3779427 15-Aug-2013 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Plumb brw_vec4_prog_data into vec4_generator().

This will be useful for the next commit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
53631be4ebaa4fb13a7f129727c1cdd32fcc6f3d 06-Jul-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move intel_context::gen and gt fields to brw_context.

Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b15f1fc3c6b3b9dc4422940c412f80e581c9900d 03-Jul-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move intel_context::perf_debug to brw_context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
329779a0b45b63be17627f026533c80b2c8f7991 03-Jul-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move intel_context::batch to brw_context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
426ca34b7a2c3b9edfc0189daece8de3aff80627 13-Jun-2013 Eric Anholt <eric@anholt.net> glsl: Remove ir_print_visitor.h includes and usage

We have ir->print() to do the old declaration of a visitor and having the
IR accept the visitor (yuck!). And now you can call _mesa_print_ir()
safely anywhere that you know what an ir_instruction is.

A couple of missing printf("\n")s are added in error paths -- when an
expression is handed to the visitor, it doesn't print '\n' (since it might
be a step in printing a whole expression tree).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6220cc931f15ddb428ea481e8b9a70ce26ca3304 28-May-2013 Eric Anholt <eric@anholt.net> i965/vs: Fix implied_mrf_writes() for integer division pre-gen6.

Previously it would assertion fail in debug builds (though the correct
value was returned in a non-debug build). Marking it as a candidate for
stable even though it has no current consumers in the stable branches, in
case one shows up in a later backport.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64727
NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0f3068a58bdbceb2cb93e3848b0e2145629cdf43 01-May-2013 Eric Anholt <eric@anholt.net> i965/vs: Make virtual grf live intervals actually cover their used range.

This is the same change as the previous commit to the FS. A very few VSes
are regressed by 1 or 2 instructions, which look recoverable with a bit
more dead code elimination.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
573d8813fdbb116f4500d2044c56d80aab73ab7f 01-Dec-2012 Eric Anholt <eric@anholt.net> i965/vs: Add instruction scheduling.

While this is ignorant of dependency control, it's still good for a 0.39%
+/- 0.08% performance improvement on GLBenchmark 2.7 (n=548)

v2: Rewrite as a subclass of the base class for the FS instruction
scheduler, inheriting the same latency information.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
63c8155b09bca7917631ec678a0d0db6e7965a1a 29-Apr-2013 Eric Anholt <eric@anholt.net> i965: Make dump_instructions be a virtual method of the visitor.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e46482993dfd30b888d5219f6fecf4b4d1f42de 28-Apr-2013 Kenneth Graunke <kenneth@whitecape.org> i965: Move is_math/is_tex/is_control_flow() to backend_instruction.

These are entirely based on the opcode, which is available in
backend_instruction. It makes sense to only implement them in one
place.

This changes the VS implementation of is_tex() slightly, which now
accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those
aren't generated in the VS anyway, it should be fine.

This also makes is_control_flow() available in the VS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
417d8917d4924652f1cd0c64dbf3677d4eddbf8c 16-Apr-2013 Paul Berry <stereotype441@gmail.com> i965/vec4: Fix hypothetical use of uninitialized data in attribute_map[].

Fixes issue identified by Klocwork analysis:

'attribute_map' array elements might be used uninitialized in this
function (vec4_visitor::lower_attributes_to_hw_regs).

The attribute_map array contains the mapping from shader input
attributes to the hardware registers they are stored in.
vec4_vs_visitor::setup_attributes() only populates elements of this
array which, according to core Mesa, are actually used by the shader.
Therefore, when vec4_visitor::lower_attributes_to_hw_regs() accesses
the array to lower a register access in the shader, it should in
principle only access elements of attribute_map that contain valid
data. However, if a bug ever caused the driver back-end to access an
input that was not flagged as used by core Mesa, then
lower_attributes_to_hw_regs() would access uninitialized memory, which
could cause illegal instructions to get generated, resulting in a
possible GPU hang.

This patch makes the situation more robust by using memset() to
pre-initialize the attribute_map array to zero, so that if such a bug
ever occurred, lower_attributes_to_hw_regs() would generate a (mostly)
harmless access to r0. In addition, it adds assertions to
lower_attributes_to_hw_regs() so that if we do have such a bug, we're
likely to discover it quickly.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dea70404eb615bfa148fbd0fec5670fb2657c47b 11-Apr-2013 Eric Anholt <eric@anholt.net> i965: Fix a warning in the release build.

This was copy and pasted from can_reswizzle_dst(), and we can just fold it
in instead to avoid the warning.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
195a6cca3cbc26eeea0f7f8dfc21dd3429911779 11-Apr-2013 Matt Turner <mattst88@gmail.com> i965/vs: Print error if vertex shader fails to compile.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
32a8e877666f7c3798d736bb6f05ad2f41356ebf 11-Apr-2013 Matt Turner <mattst88@gmail.com> i965: NULL check prog on shader compilation failure.

Also change if (shader) to if (prog) for consistency.

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e9fa3a94486d80da34542cfd24425c208a8d30fe 23-Mar-2013 Paul Berry <stereotype441@gmail.com> i965/vs: Don't hardcode DEBUG_VS in generic vec4 code.

Since the vec4_visitor and vec4_generator classes are going to be
re-used for geometry shaders, we can't enable their debug
functionality based on (INTEL_DEBUG & DEBUG_VS) anymore. Instead, add
a debug_flag boolean to these two classes, so that when they're
instantiated the caller can specify whether debug dumps are needed.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
444fce6398556118629ef01204a7d8ff7af0bea3 22-Mar-2013 Paul Berry <stereotype441@gmail.com> i965/vs: Generalize attribute setup code in preparation for GS.

This patch introduces a new function,
vec4_visitor::lower_attributes_to_hw_regs(), which replaces registers
of type ATTR in the instruction stream with the hardware registers
that store those attributes. This logic will need to be common
between the vertex and geometry shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9bb6840b28a9a77377d437198c62d705cade5370 17-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: Generalize data structures pointed to by vec4_generator.

This patch removes the following field from vec4_generator, since it
is not used:

- struct brw_vs_compile *c

And changes the following field:

- struct gl_vertex_program *vp => struct gl_program *prog

With these changes, vec4_generator no longer refers to any VS-specific
data structures. This will pave the way for re-using it for geometry
shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

v2: Use the name "prog" rather than "p".

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5743bea0ba1eda07be831d95c5b7729f9ba98a92 17-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: move VS-specific data members to vs_vec4_visitor.

This patch moves the following data structures from vec4_visitor to
vec4_vs_visitor, since they contain VS-specific data:

- struct brw_vs_compile *c (renamed to vs_compile)
- struct brw_vs_prog_data *prog_data (renamed to vs_prog_data)
- src_reg *vp_temp_regs
- src_reg vp_addr_reg

Since brw_vs_compile and brw_vs_prog_data also contain vec4-generic
data, the following pointers are added to the base class, to allow it
to access the vec4-generic portions of these data structures:

- struct brw_vec4_compile *c
- struct brw_vec4_prog_key *key
- struct brw_vec4_prog_data *prog_data

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

v2: Use shorter names in the base class and longer names in the
derived class.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8941f73c7ccb3c6cfa965a19f346e4b6ead6abdb 17-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: Make some vec4_visitor functions virtual.

This patch makes the following vec4_visitor functions virtual, since
they will need to be implemented differently for vertex and geometry
shaders. Some of the functions are renamed to reflect their generic
purpose, rather than their VS-specific behaviour:

- setup_attributes
- emit_attribute_fixups (renamed to emit_prolog)
- emit_vertex_program_code (renamed to emit_program_code)
- emit_urb_writes (renamed to emit_thread_end)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e9be5a05f70be7cff58b29bff07af71e6d339085 16-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: Make vec4_vs_visitor class derived from vec4_visitor.

This patch just creates the derived class; later patches will migrate
VS-specific functions and data structures from the base class into the
derived class.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5fff3752c88255ea3f4eb26cddb2c996694b33b1 17-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: split brw_vs_prog_data into generic and VS-specific parts.

This will allow the generic parts to be re-used for geometry shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

v2: Put urb_read_length and urb_entry_size in the generic struct.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0c994f181ce1a09cdbb7db27e4ad5565248bf8e1 16-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: split brw_vs_prog_key into generic and VS-specific parts.

This will allow the generic parts to be re-used for geometry shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
09cd6e06d2c7a54ca6eb8d3102822efa78e01a9c 16-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: Remove brw_vs_prog_data pointer from brw_vs_compile.

In patches that follow, we'll be splitting structs brw_vs_prog_data
and brw_vs_compile into a vec4-generic base struct and a VS-specific
derived struct (this will allow the vec4-generic code to be re-used
for geometry shaders). Having brw_vs_compile point to
brw_vs_prog_data makes it difficult to do this cleanly.

Fortunately most of the functions that use brw_vs_compile (those in
the vec4_visitor class) already have access to brw_vs_prog_data
through a separate pointer (vec4_visitor::prog_data). So all we have
to do is use that pointer consistently, and plumb prog_data through
the few remaining functions that need access to it.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b29613371c316e9273ebe29ba37fb2f04c2ed58d 16-Feb-2013 Paul Berry <stereotype441@gmail.com> i965/vs: Make type of vec4_visitor::vp more generic.

The vec4_visitor functions don't use any VS specific data from
vec4_visitor::vp. So rename it to "prog" and change its type from
struct gl_vertex_program * to struct gl_program *. This will allow
the code to be re-used for geometry shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

v2: Use the name "prog" rather than "p".

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fe97f26c86d65b1b0e026c725c7da348a91093d9 09-Apr-2013 Paul Berry <stereotype441@gmail.com> i965: Rename backend_visitor::prog to shader_prog.

The next patch is going to change the type of vec4_visitor::vp from
struct gl_vertex_program * to struct gl_program *, and rename it. The
sensible name to change it to is vec4_visitor::prog. However, prog is
already used in backend_visitor (which vec4_visitor derives from).
Since backend_visitor::prog is of type struct gl_shader_program *, it
makes sense to rename it to shader_prog.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d5f7aebac2b1afbc5023cd114174860d8d763d06 04-Apr-2013 Eric Anholt <eric@anholt.net> i965/vs: Use GRFs for pull constant offsets on gen7.

This allows the computation of the offset to get written directly into the
message source.

shader-db results:
total instructions in shared programs: 3308390 -> 3283025 (-0.77%)
instructions in affected programs: 442998 -> 417633 (-5.73%)

No difference in GLB2.7 low res (n=9).

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3badbf7f7fa4898c69545fea3c60ea29cf61ae3b 05-Apr-2013 Eric Anholt <eric@anholt.net> i965/vs: When asked to make a dst_reg for a src.xxxx, just write to src.x.

We have several places in our pull constant handling where we make a
temporary src_reg for an int, and then turn it into a dst. In doing so,
we were writing to the dst.xyzw, so we never register coalesced it with a
later mov from dst.x to real_dst.x.

These extra channels written would be removed if we had channel-wise DCE
in the backend, but we don't. Fix it for now by just not writing these
extra channels that won't get used.

Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4fee05b020af72ee802d4349de76fbc36cdd53a9 01-Dec-2012 Eric Anholt <eric@anholt.net> i965/vs: Add a pass to set dependency control fields on instructions.

This is a more aggressive version of the old brw_optimize() path. Reduces
cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20d846ce8b46604ced835eb68079a0dbae2e19dc 12-Mar-2013 Eric Anholt <eric@anholt.net> i965: Add names for all instructions to dump_instruction() in FS and VS.

I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6192e9b377c6fa4f36da42af6c06ca32b10e7e62 20-Mar-2013 Eric Anholt <eric@anholt.net> i965/vs: Include URB payload setup in shader_time.

This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
55feb19704ae69c580f431d6498344521de369cd 18-Dec-2012 Eric Anholt <eric@anholt.net> i965/vs: Use a send from a 2-register VGRF for shader time writes.

This will let us emit it later, after we're setting up MRFs for the
URB write.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
130138030a3dc8bda20766146ca9fda4047133d3 18-Dec-2012 Eric Anholt <eric@anholt.net> i965/vs: Teach copy propagation about sends from GRFs.

This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c3a22d42a88c299561dd913d0a00bb986921eeba 18-Dec-2012 Eric Anholt <eric@anholt.net> i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.

v2: Fix silly bool handling, and don't add new tabs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d2ba1c24b440ee74436335d8e815be9b72b1ba7f 19-Mar-2013 Eric Anholt <eric@anholt.net> i965: Track ARB program state along with GLSL state for shader_time.

This will let us do much better printouts for non-GLSL programs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d24819dce8cf6dac23f27df46fabbf756a732229 11-Mar-2013 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Add IR dumping for immediates.

This makes dump_instructions more useful.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
db3a0f13ef13b6d392dfc3b7346351533600d343 11-Mar-2013 Eric Anholt <eric@anholt.net> i965: Split shader_time entries into separate cachelines.

This avoids some snooping overhead between EUs processing separate shaders
(so VS versus FS).

Improves performance of a minecraft trace with shader_time by 28.9% +/-
18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).

v2: Add a define for the stride with a comment explaining its units and
why.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
14cec07177f438717cc6fb9252525e16d6b3d8dd 22-Feb-2013 Eric Anholt <eric@anholt.net> i965: Make perf_debug() output to GL_ARB_debug_output in a debug context.

I tried to ensure that performance in the non-debug case doesn't change
(we still just check one condition up front), and I think the impact is
small enough in the debug context case to warrant including all of it.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f52ce6a0ca73d1cd89091689efd8ea2e14748723 24-Jan-2013 Chris Forbes <chrisf@ijw.co.nz> i965: add a new virtual opcode: SHADER_OPCODE_TXF_MS

This is very similar to the TXF opcode, but lowers to `ld2dms` rather
than `ld` on Gen7.

V4: - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks
it actually writes the correct number of registers. Otherwise in
nontrivial shaders some of the registers tend to get clobbered,
producing bad results.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f0213b124259804ce8e114575fe9058dffdf5864 13-Feb-2013 Matt Turner <mattst88@gmail.com> i965/vs/gen7: Allow MATH instructions to have MRF as a destination

total instructions in shared programs: 346873 -> 346847 (-0.01%)
instructions in affected programs: 364 -> 338 (-7.14%)

(All affected shaders are from Lightsmark)

Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d5efc14635cf25bc130bfa77737913913d9202ce 21-Nov-2012 Eric Anholt <eric@anholt.net> i965: Add asserts to check that we don't realloc ParameterValues.

Things are even more restrictive than they used to be, so I've made
mistakes in this area.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c9e48e5b083b6cf97ecdb2d17c874ea631203b06 02-Aug-2012 Eric Anholt <eric@anholt.net> i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too.

No statistically significant performance difference on glbenchmark 2.7
(n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8%
(n=5), but that's only about .3% of all cycles spent according to the
fixed shader_time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
471af25fc57dc43a8277b4b17ec82547287621d0 01-Dec-2012 Eric Anholt <eric@anholt.net> i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling"

The way our visitor works, scalar expression/swizzle results that get
stored in channels other than .x will have an intermediate MOV from
their result in the .x channel to the real .y (or whatever) channel, and
similarly for vec2/vec3 results.

By knowing how to adjust DP4-type instructions for optimizing out a
swizzled MOV, we can reduce instructions in common matrix multiplication
cases.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f74560f3fb516971e6a7b03a2382db2f58699f59 10-Dec-2012 Eric Anholt <eric@anholt.net> i965: Scale shader_time to compensate for resets.

Some shaders experience resets more than others, which skews the numbers
reported. Attempt to correct for this by linearly scaling according to
the number of resets that happen.

Note that will not be accurate if invocations of shaders have varying
times and longer invocations are more likely to reset. However, this
should at least be better than the previous situation.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
338b5f887d462bbe7ef58a233cd00619e43415f0 10-Dec-2012 Eric Anholt <eric@anholt.net> i965: Adjust the split between shader_time_end() and shader_time_write().

I'm about to emit other kinds of writes besides time deltas, and it
turns out with the frequency of resets, we couldn't really use the old
time delta write() function more than once in a shader.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
71f06344a0d72a6bd27750ceca571fc016b8de85 27-Nov-2012 Eric Anholt <eric@anholt.net> i965: Add a debug flag for counting cycles spent in each compiled shader.

This can be used for two purposes: Using hand-coded shaders to determine
per-instruction timings, or figuring out which shader to optimize in a
whole application.

Note that this doesn't cover the instructions that set up the message to
the URB/FB write -- we'd need to convert the MRF usage in these
instructions to GRFs so that our offsets/times don't overwrite our
shader outputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)

v2: Check the timestamp reset flag in the VS, which is apparently
getting set fairly regularly in the range we watch, resulting in
negative numbers getting added to our 32-bit counter, and thus large
values added to our uint64_t.
v3: Rebase on reladdr changes, removing a new safety check that proved
impossible to satisfy. Add a comment to the AOP defs from Ken's
review, and put them in a slightly more sensible spot.
v4: Check timestamp reset in the FS as well.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b126228f1247fb0fed686ee3ef2c87461f2fc7a7 30-Nov-2012 Eric Anholt <eric@anholt.net> i965: Include codegen time in the INTEL_DEBUG=perf stall detection.

In the VS case, we were missing the entire compile time in the stall
detection!

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0f06864ba566eaff5b739a9d0fba5ed7eaadd60b 30-Nov-2012 Eric Anholt <eric@anholt.net> i965: Don't leak the IR annotation into later instructions.

After walking our IR instructions (Mesa or GLSL), we don't want to also
mark the start of the FB/URB writes or whatever as being that IR. This
can end up being misleading when the end of the IR visit got copy
propagated out to a later instruction in the URB writes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c1023608002c985b9d72edc64732cd666de2a206 27-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Move struct brw_compile (p) entirely inside vec4_generator.

The brw_compile structure contains the brw_instruction store and the
brw_eu_emit.c state tracking fields. These are only useful for the
final assembly generation pass; the earlier compilation stages doesn't
need them.

This also means that the code generator for future hardware won't have
access to the brw_compile structure, which is extremely desirable
because it prevents accidental generation of Gen4-7 code.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eda9726ef51dcfd3895924eb0f74df8e67aa9c3a 27-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Split final assembly code generation out of vec4_visitor.

Compiling shaders requires several main steps:

1. Generating VS IR from either GLSL IR or Mesa IR
2. Optimizing the IR
3. Register allocation
4. Generating assembly code

This patch splits out step 4 into a separate class named "vec4_generator."

There are several reasons for doing so:

1. Future hardware has a different instruction encoding. Splitting
this out will allow us to replace vec4_generator (which relies
heavily on the brw_eu_emit.c code and struct brw_instruction) with
a new code generator that writes the new format.

2. It reduces the size of the vec4_visitor monolith. (Arguably, a lot
more should be split out, but that's left for "future work.")

3. Separate namespaces allow us to make helper functions for
generating instructions in both classes: ADD() can exist in
vec4_visitor and create IR, while ADD() in vec4_generator() can
create brw_instructions. (Patches for this upcoming.)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8af8a26480e9e71fb1501b675f21a469c1699b9b 27-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Move uses of brw_compile from do_vs_prog to brw_vs_emit.

The brw_compile structure is closely tied to the Gen4-7 hardware
encoding. However, do_vs_prog is very generic: it just calls out to
get a compiled program and then uploads it.

This isn't ultimately where we want it, but it's a step in the right
direction: it's now closer to the code generator.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
746fc346eae21d227b06799f3e82a1404c75bdc9 27-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Rework memory contexts for shader compilation data.

During compilation, we allocate a bunch of things: the IR needs to last
at least until code generation...and then the program store needs to
last until after we upload the program.

For simplicity's sake, just keep it all around until we upload the
program. After that, it can all be freed.

This will also save a lot of headaches during the upcoming refactoring.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
403bb1d306c5bc23ad9e2c26fd39071e6e41f665 27-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Pass the brw_context pointer into vec4_visitor and do_vs_prog.

We used to steal it out of the brw_compile struct...but vec4_visitor
isn't going to have one of those in the future.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dd50c88386c8220f4631115b68a10930378ead6c 27-Nov-2012 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Move some functions from brw_vec4_emit.cpp to brw_vec4.cpp.

This leaves only the final code generation stage in brw_vec4_emit.cpp,
moving the payload setup, run(), and brw_vs_emit functions to brw_vec4.cpp.

The fragment shader backend puts these functions in brw_fs.cpp, so this
patch also helps with consistency.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
10ff6772c8054aea12ac0f08e2e3898fd4a7f76b 25-Oct-2012 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Don't lose the MRF writemask when doing compute-to-MRF.

Consider the following code sequence:

mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mov.sat(8) m1<1>.xyF g4<4,4,1>F
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F

The compute-to-MRF pass will discover the first mov.sat and attempt to
replace it by rewriting earlier instructions. Everything works out,
so it replaces scan_inst's destination file, reg, and reg_offset,
resulting in:

mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF
mov.sat(8) m1<1>.zwF g4<4,4,1>F

Unfortunately, it loses the .xy writemask on the mov.sat's MRF
destination. While this doesn't pose an immediate problem, it then
proceeds to transform the second mov.sat, resulting in:

mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF
mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF

Instead of writing both halves of the vector (like the original code),
it overwrites the full vector both times, clobbering the desired .xy
values.

When encountering a MOV, the compute-to-MRF code scans for instructions
which generate channels of the MOV source. It ensures that all
necessary channels are available (possibly written by several
instructions). In this case, *more* channels are available than
necessary, so we want to take the subset that's actually used.
Taking the bitwise and of both writemasks should accomplish that.

This was discovered by analyzing an ARB_vertex_program test
(glean/vertProg1/MUL test (with swizzle and masking)) with my new
Mesa IR -> Vec4 IR translator code. However, it should be possible
with GLSL programs as well.

NOTE: This is a candidate for stable release branches.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f593acd5778d4fdfa3493bb90c99b52e45667bc0 19-Oct-2012 Tapani Pälli <tapani.palli@intel.com> i965/vs: include format argument in debug printf

otherwise some compilers will throw error
"error: format not a string literal and no format arguments"

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20ebebac5153affcbd44350332678a2fb04d4c96 03-Oct-2012 Eric Anholt <eric@anholt.net> i965/vs: Improve live interval calculation.

This is derived from the FS visitor code for the same, but tracks each channel
separately (otherwise, some typical fill-a-channel-at-a-time patterns would
produce excessive live intervals across loops and cause spilling).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48375
(crash -> failure, can turn into pass by forcing unrolling still)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
914d8f9f84a3539758716d676d59a1fee4cc559f 04-Oct-2012 Eric Anholt <eric@anholt.net> i965/vs: Add a little bit of IR-level debug ability.

This is super basic, but it let me visualize a problem I had with
opt_compute_to_mrf().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34c58acb59bc0b827e28ef9e89044621ab0b3ee1 03-Oct-2012 Eric Anholt <eric@anholt.net> i965/vs: Add support for splitting virtual GRFs.

This should improve our ability to register allocate without spilling.
Unfortuantely, due to the live variable analysis being ignorant of loops, we
still have register allocation failures on some programs.

v2: Add more context to the comment explaining the function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
25ca9cc8236845a4be32a6f39b4a6d1664d4b403 04-Jul-2012 Eric Anholt <eric@anholt.net> i965/vs: Move the other two src_reg/dst_reg constructors to brw_vec4.cpp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b2f5d4c3ec9ec2fec8b39c87eb00121a24107276 04-Jul-2012 Eric Anholt <eric@anholt.net> i965/vs: Move class functions to brw_vec4.cpp.

This has less impact than for the FS (4k savings), because it was partially
done already, but makes things more consistent.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7e7c40ff98cc2b930bc3113609ace5430f2bdc95 26-Oct-2011 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Add vec4_instruction::is_tex() query.

Copy and pasted from fs_inst::is_tex(), but without TXB.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1d4f3ca8f0442821c914b758b323e6e5124149a3 29-Sep-2011 Kenneth Graunke <kenneth@whitecape.org> i965/vs: Implement integer quotient and remainder math operations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c662764f4f9d9d0303fb2685dfdc93824fa15dca 06-Sep-2011 Eric Anholt <eric@anholt.net> i965/vs: Add support for compute-to-MRF.

Removes 1.8% of the instructions from 97% of the vertex shaders in
shader-db.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
160848d8ef96cf3a760c02cc576df7dbffc1f669 06-Sep-2011 Eric Anholt <eric@anholt.net> i965/vs: Add a function for how many MRFs get written as part of a SEND.

This will be used for compute-to-mrf, which needs to know when MRFs
get overwritten.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f0c04e6c22babf2aee2ad1ee85dbd6f996be3712 03-Sep-2011 Eric Anholt <eric@anholt.net> i965/vs: Add support for simple algebraic optimizations.

We generate silly code for array access, and it's easier to generally
support the cleanup than to specifically avoid the bad code in each
place we might generate it.

Removes 4.6% of instructions from 41.6% of shaders in shader-db,
particularly savage2/hon and unigine.

v2: Fixes by Ken: Make is_zero/one member functions, and fix a
progress flag.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cc9eb936c220267b6130b705fc696d05906a31df 02-Sep-2011 Eric Anholt <eric@anholt.net> i965/vs: Add support for copy propagation of the UNIFORM and ATTR files.

Removes 2.0% of the instructions from 35.7% of vertex shaders in shader-db.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
42ce13195b94d0d51ca8e7fa5eed07fde8f37988 30-Aug-2011 Eric Anholt <eric@anholt.net> i965/vs: Add constant propagation to a few opcodes.

This differs from the FS in that we track constants in each
destination channel, and we we have to look at all the swizzled source
channels. Also, the instruction stream walk is done in an O(n) manner
instead of O(n^2).

Across shader-db, this reduces 8.0% of the instructions from 60.0% of
the vertex shaders, leaving us now behind the old backend by 11.1%
overall.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
df35d691807656d3627b6fa6f51a08674bdc043e 07-Sep-2011 Eric Anholt <eric@anholt.net> i965/vs: Add support for overflowing the number of available push constants.

Fixes glsl-vs-uniform-array-4.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33742

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
72cfc6f3778d8297e52c254a5861a88eb62e4d67 23-Aug-2011 Eric Anholt <eric@anholt.net> i965/vs: Pack live uniform vectors together in the push constant upload.

At some point we need to also move uniform accesses out to pull
constants when there are just too many in use, but we lack tests for
that at the moment.

Fixes glsl-vs-large-uniform-array.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7c84b9d303345fa5075dba8c4ea7af449d93b0f8 23-Aug-2011 Eric Anholt <eric@anholt.net> i965/vs: Track uniforms as separate vectors once we've done array access.

This will make it easier to figure out which elements are totally
unused and not upload them.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8174945d3346dc049ae56dcb4bf1eab39f5c88aa 17-Aug-2011 Eric Anholt <eric@anholt.net> i965/vs: Add simple dead code elimination.

This is copied right from the fragment shader. It is needed for real
register allocation to work correctly.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3dadc1e3cceac80a1b63cad2e10f0e0f8904531b 17-Aug-2011 Eric Anholt <eric@anholt.net> i965/vs: Copy the live intervals calculation over from the FS.

This is a rather pessimistic calculation, since it doesn't distinguish
individual channels of a vec4, or elements of an array, but should be
a minimum start for register allocation.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp