History log of /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
Revision	Date	Author	Comments
56ee2df4bf9b1e8c26cf8689f5ef20237c95466b	13-Jan-2017	Juan A. Suarez Romero <jasuarez@igalia.com>	i965/vec4: Fix mapping attributes This patch reverts 57bab6708f2bbc1ab8a3d202e9a467963596d462, which was causing issues with ILK and earlier VS programs. 1. brw_nir.c: Revert "i965/vec4/nir: vec4 also needs to remap vs attributes" Do not perform a remap in vec4 backend. Rather, do it later when setup attributes 2. brw_vec4.cpp: This fixes mapping ATTRx to proper GRFn. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391 [jordan.l.justen@intel.com: merge Juan's two patches from bugzilla] Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
58fdb85f0f413d1a144d4beb6519da59bc52c974	21-Apr-2016	Alejandro Piñeiro <apinheiro@igalia.com>	i965/vec4: take into account doubles when creating attribute mapping Doubles need more than one slot per attribute. So when filling the attribute_map we check if it is a double in order to allocate one extra register. Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f8310189f4a31c443657cd0c1aef35db02b86c95	21-Apr-2016	Alejandro Piñeiro <apinheiro@igalia.com>	i965/vec4: use attribute slots for first non payload GRF As part of the payload setup, setup_attributes is called with the first GRF that can be used for the attributes (first ones are used for uniforms for example) and returns the first GRF that is not part of the payload. Before this patch, it directly added the number of attributes. But since 64-bit attributes can consume more than one slot, that is not valid anymore. This patch changes the addition to use the number of slots consumed. gen >= 8 would not be affected, as they use the scalar mode. For that case, the vs configuration is done at fs_visitor::assign_vs_urb_setup. v2: add explanation in commit log (Jordan) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c2acf97fcc9b32eaa9778771282758e5652a8ad4	16-Dec-2016	Juan A. Suarez Romero <jasuarez@igalia.com>	nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes So far, inputs_read was a bitmap tracking which vertex input locations were being used. In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4) consumes just one location, like any other smaller attribute. So we mark the proper bit in inputs_read, and also the same bit in double_inputs_read if the attribute is a dvec3/dvec4. But in Vulkan, this is slightly different: a dvec3/dvec4 attribute consumes two locations, not just one. And hence two bits would be marked in inputs_read for the same vertex input attribute. To avoid handling two different situations in NIR, we just choose the latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4 vertex input attribute is marked with two bits in the inputs_read bitmap (and also in the double_inputs_read), and following attributes are adjusted accordingly. As an example, if in our GLSL/IR shader we have three attributes: layout(location = 0) vec3 attr0; layout(location = 1) dvec4 attr1; layout(location = 2) dvec3 attr2; then in our NIR shader we put attr0 in location 0, attr1 in locations 1 and 2, and attr2 in locations 3 and 4. Checking carefully, basically we are using slots rather than locations in NIR. When emitting the vertices, we do an inverse map to know the corresponding location for each slot. v2 (Jason): - use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR. v3 (Jason): - Fix commit log error. - Use ladder ifs and fix braces. - elements_double is divisible by 2, don't need DIV_ROUND_UP(). - Use if ladder instead of a switch. - Add comment about hardware restriction in 64bit vertex attributes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
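The location-to-slot adjustment described in the entry above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Mesa implementation: the `VertexInput` and `SlotRange` types and the `locations_to_slots` function are hypothetical names, and the inputs are assumed to be sorted by location.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one vertex input; not a Mesa type.
struct VertexInput {
    int location;    // GLSL layout(location = N)
    bool dual_slot;  // true for dvec3/dvec4, which need two slots
};

// Hypothetical result: the slots an input occupies after the adjustment.
struct SlotRange {
    int first;  // first slot used by the input
    int count;  // 1, or 2 for dvec3/dvec4
};

// Assumes `inputs` is sorted by location. Every dual-slot input pushes all
// later inputs up by one extra slot.
std::vector<SlotRange> locations_to_slots(const std::vector<VertexInput>& inputs)
{
    std::vector<SlotRange> slots;
    int extra = 0;  // extra slots consumed by earlier dvec3/dvec4 inputs
    for (const VertexInput& in : inputs) {
        const int count = in.dual_slot ? 2 : 1;
        slots.push_back({in.location + extra, count});
        extra += count - 1;
    }
    return slots;
}
```

With the three attributes from the commit message (vec3 at location 0, dvec4 at location 1, dvec3 at location 2), this sketch yields slot 0, slots 1 and 2, and slots 3 and 4 respectively.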
7c6b714cd0fe06044c9a810186f5ce3690152574	05-Jan-2017	Kenneth Graunke <kenneth@whitecape.org>	i965: Print VS output VUE map in Vulkan too. We need to move this to the shared layer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c762809e49daf61fc986721006ce6a520e6e735f	01-Sep-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: run scalarize_df() after spilling Spilling of 64-bit data requires data shuffling for the corresponding scratch read/write messages. This produces unsupported swizzle regions and writemasks that we need to scalarize. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
73610384a8357287cef64434c789ff03c2f6f37a	01-Sep-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: prevent src/dst hazards during 64-bit register allocation 8-wide compressed DF operations are executed as two separate 4-wide DF operations. In that scenario, we have to be careful when we allocate register space for their operands to prevent the case where the first half of the instruction overwrites the source of the second half. To do this we mark compressed instructions as having hazards to make sure that the register allocator assigns a register region for the destination that does not overlap with the region assigned for any of its source operands. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2b57adad0056273e38d9a9736cd98be95c0deb07	18-Aug-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4/scalarize_df: support more swizzles via vstride=0 By exploiting gen7's hardware decompression bug with vstride=0 we gain the capacity to support additional swizzle combinations. This also fixes ZW writes from X/Y channels like in: mov r2.z:df r0.xxxx:df Because DF regions use 2-wide rows with a vstride of 2, the region generated for the source would be r0<2,2,1>.xyxy:DF, which is equivalent to r0.xxzz, so we end up writing r0.z in r2.z instead of r0.x. Using a vertical stride of 0 in these cases we get to replicate the XX swizzle and write what we want. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c3edacaa288ae01c0f37e645737feeeb48f2c3f2	19-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively Certain swizzles like XYZW can be supported by translating only the first two 64-bit swizzle channels to 32-bit channels. This happens with swizzles such that the first two logical components, when translated to 32-bit channels and replicated across the second dvec2 row, select the same channels specified by the 3rd and 4th logical swizzle components. Notice that this opens up the possibility that some instructions are not scalarized and can end up with XY or ZW 32-bit writemasks. Make sure we always scalarize in such cases. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2f0bc54e2bf6c7d218f30acc88f5cb94bd6214f7	01-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: split instructions that read 64-bit interleaved attributes Stages that use interleaved attributes generate regions with a vstride=0 that can hit the gen7 hardware decompression bug. v2: - Make static the function and fix indent (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0579c85e5ca7d406cad42db7c1501d6b1fb9696b	01-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: dump subnr for FIXED_GRF This came in handy when debugging the payload setup for Tess Eval, since it prints correct subnr for attributes that can be loaded in the second half of a register. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5fe8d567d8dadeb2b77addd73762f6bde4acfac2	06-Oct-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: fix attribute setup for doubles Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6a01259d8a13aace16e4f1ce9e09e0e41bd52273	06-Oct-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: fix indentation in lower_attributes_to_hw_regs() Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ae400e38d90ea2fddf1b050ff94f52bdec94e150	15-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: make emit_pull_constant_load support 64-bit loads This way callers don't need to know about 64-bit particularities and we reuse some code. v2: - use byte_offset() instead of offset() - only mark the surface as used once Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
df6e3aa6ae23346bad59d071d340a67be0e2a2c5	13-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data v2: adapt to changes in offset() Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eee2c0d7854e55e92e0e72eb0fb94ab83d702754	13-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: fix indentation in move_push_constants_to_pull_constants() Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
07bc6a35d3d6d94d45b81bd10002f0e420d855c2	28-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: do not split scratch read/write opcodes 64-bit scratch read/writes require to shuffle data around so we need to have access to the full 64-bit data. We will do the right thing for these when we emit the messages. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2a857104e41167cef3c6a5132a45c88056c75dff	23-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: Do not use DepCtrl with 64-bit instructions The BDW PRM says that it is not supported, but it seems that gen7 is also affected, since doing DepCtrl on double-float instructions leads to GPU hangs in some cases, which is probably not surprising knowing that this is not supported in new hardware iterations. The SKL PRMs do not mention this restriction, so it is probably fine. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
506154f704dcb9185dadcd655fd6d0603916ea97	23-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms v2: - Add Broxton as Intel's internal PRMs says that it is needed (Matt). Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b3a7d0ee9d5f792ab68fbe77da5e3ea85d4bc4c0	08-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: Lower 64-bit MAD The previous patch made sure that we do not generate MAD instructions for any NIR's 64-bit ffma, but there is nothing preventing i965 from producing MAD instructions as a result of lowerings or optimization passes. This patch makes sure that any 64-bit MAD produced inside the driver after translating from NIR is also converted to MUL+ADD before we generate code. v2: - Use a copy constructor to copy all relevant instruction fields from the original mad into the add and mul instructions v3: - Rename the lowering and fix commit log (Matt) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
83dcd146020f5e54d1e0a46c585ed672e75abaa0	01-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands We make scalar sources in 3src instructions use subnr instead of swizzles because they don't really use swizzles. With doubles it is more complicated because we use vstride=0 in more scenarios in which they don't produce scalar regions. Also RepCtrl=1 is not allowed with 64-bit operands, so we should avoid this. v2: Fix typo (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
49be3abbe7afd64f9e3435e9a9e341e30acacb52	30-May-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: fix indentation in pack_uniform_registers Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bdf5498c6b7870c14139279e76f1e4b281bed2cd	29-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: fix pack_uniform_registers for doubles We need to consider the fact that dvec3/4 require two vec4 slots. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
23278a75ce06c3c083892b2a20d9efdf794167d6	18-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: teach register coalescing about 64-bit Specifically, at least for now, we don't want to deal with the fact that channel sizes for fp64 instructions are twice the size, so prevent coalescing from instructions with a different type size. Also, we should check that if we are coalescing a register from another MOV we should be writing the same amount of data in both operations, otherwise we end up writing more or less than the original instruction. This can happen, for example, when we have split fp64 MOVs with an exec size of 4 that only write one register each and then a MOV with exec size of 8 that reads both. We want to avoid the pass thinking that it can coalesce from the first split MOV alone. Ideally we would like the pass to see that it can coalesce from both split MOVs instead, but for now we keep it simple. Finally, the pass doesn't support coalescing of multiple registers but in the case of normal SIMD4x2 double-precision instructions they naturally write two registers (one per vertex) and there is no reason why we should not allow coalescing in this case. Change the restriction to bail if we see instructions that write more than 8 channels, where the channels can be 32-bit or 64-bit. v2: - Make sure that scan_inst and inst write the same amount of data. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ac5a06ff83c32ab14e01e526e729b2fbfe3a2426	18-Jul-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: implement access to DF source components Z/W The general idea is that with 32-bit swizzles we cannot address DF components Z/W directly, so instead we select the region that starts at the 16B offset into the register and use X/Y swizzles. The above, however, has the caveat that we can't do that without violating register region restrictions unless we probably do some sort of SIMD splitting. Alternatively, we can accomplish what we need without SIMD splitting by exploiting the gen7 hardware decompression bug for instructions with a vstride=0. For example, an instruction like this: mov(8) r2.x:DF r0.2<0>xyzw:DF Activates the hardware bug and produces this region: Component: x0 y0 z0 w0 x1 y1 z1 w1 Register: r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3 Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2 execution and r1.2 and r1.3 are the same for the second vertex. Using this to our advantage we can select r0.z:DF by doing r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing to split the instruction. Of course, this only works for gen7, but that is the only hardware platform where we implement align16/fp64 at the moment. v2: Adapted to the fact that we now do this after converting to hardware registers (Iago) Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
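As a rough illustration of the translation described in the entry above, a single 64-bit logical component can be mapped to a region start offset plus a 32-bit swizzle. This is a simplified sketch, not the driver code: the `Region32` type and the `translate_df_component` function are hypothetical names, and only single-component swizzles are handled. For Z and W, the mapping relies on the gen7 vstride=0 behaviour that the commit message describes.

```cpp
#include <array>
#include <cassert>

// Hypothetical result type for this sketch; not a Mesa structure.
struct Region32 {
    int byte_offset;             // start of the region within the register
    std::array<int, 4> swizzle;  // 32-bit channels: 0 = X, 1 = Y, 2 = Z, 3 = W
};

// Translate one 64-bit logical component (0 = X ... 3 = W) into a region
// offset and a 32-bit swizzle.
Region32 translate_df_component(int comp)
{
    assert(comp >= 0 && comp <= 3);
    Region32 r{0, {0, 0, 0, 0}};
    if (comp >= 2) {
        // Z/W cannot be reached with a 32-bit swizzle from the start of the
        // register, so start at the 16-byte offset and reuse the X/Y pattern.
        // Per the commit message this depends on vstride=0 on gen7.
        r.byte_offset = 16;
        comp -= 2;
    }
    // Each 64-bit component spans two consecutive 32-bit channels.
    const int lo = 2 * comp;
    r.swizzle = {lo, lo + 1, lo, lo + 1};
    return r;
}
```

Under these assumptions, Z maps to offset 16 with swizzle XYXY and W maps to offset 16 with swizzle ZWZW, matching the `r0.2<0,2,1>.xyxy` and `r0.2<0,2,1>.zwzw` examples in the commit message.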
e238601a2da9512c0fd263e8378f30498a0a1507	24-May-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: translate 64-bit swizzles to 32-bit The hardware can only operate with 32-bit swizzles, which is a rather limiting restriction. However, the idea is not to expose this to the optimization passes, which would be a mess to deal with. Instead, we let the bulk of the vec4 backend ignore this fact and we fix the swizzles right at codegen time. At the moment the pass only needs to handle single value swizzles thanks to the scalarization pass that runs before it. Notice that this only works for X/Y swizzles. We will add support for Z/W swizzles in the next patch, since they need a bit more work. v2 (Sam): - Do not expand swizzle of 64-bit immediate values. v3: - Do this after translation to hardware registers instead of doing it right before so we don't need the force_vstride0 flag (Curro). - Squashed patch that included FIXED_GRF in the list of register files that need this translation (Iago). - Remove swizzle assignments for VGRF and UNIFORM files in convert_to_hw_regs(), they will be set by apply_logical_swizzle() (Iago). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fb7cb853c964db44ab99c1592e1ef7dec2f0c25b	24-May-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: add a scalarization pass for double-precision instructions The hardware only supports 32-bit swizzles, which means that we can only directly access channels XY of a DF, making access to channels ZW more difficult, especially considering the various regioning restrictions imposed by the hardware. The combination of both things makes handling random swizzles on DF operands rather difficult, as there are many combinations that can't be represented at all, at least not without some work and some level of instruction splitting depending on the case. Writemasks are 64-bit in general, however XY and ZW writemasks also work in 32-bit, which means these writemasks can't be represented natively, adding to the complexity. For now, we decided to try and simplify things as much as possible to avoid dealing with all this from the get go by adding a scalarization pass that runs after the main optimization loop. By fully scalarizing DF instructions in align16 we avoid most of the complexity introduced by the aforementioned hardware restrictions and we have an easier path to an initial fully functional version for the vector backend in Haswell and IvyBridge. Later, we can improve the implementation so we don't necessarily scalarize everything, iteratively adding more complexity and building on top of a framework that is already working. Curro drafted some ideas for how this could be done here: https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82 v2: - Use a copy constructor for the scalar instructions so we copy all relevant instructions fields from the original instruction. v3: Fix indentation in one switch (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f4b8649233fa10e89205b6b5f6f334279b198f22	17-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: split double-precision SEL There is a hardware bug affecting compressed double-precision SEL instructions in align16 mode by which they won't read predication mask properly. The bug does not affect other predicated instructions and it does not affect SEL in Align1 mode either. This was found empirically and verified by Curro in the simulator. Fix this by splitting double-precision SEL in Align16 mode to use an execution size of 4. v2: Check that the dst type is 64-bit, since we can have 16-wide single precision bcsel instructions that also write 2 registers. v3: Replace bcsel by SEL in all the comments as bcsel is the nir opcode but SEL is the actual assembly instruction (Matt). Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a83608f50483ac397545d3815bfe8dc3be5126b6	29-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: dump NibCtrl for instructions with execsize != 8 v2: do it in the same fashion as the FS backend for consistency (Curro) Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
58767f0fec7809c3408adbc4d147dd56f2ee3d4d	29-Aug-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: add a SIMD lowering pass Generally, instructions in Align16 mode only ever write to a single register and don't need any form of SIMD splitting, that's why we have never had a SIMD splitting pass in the vec4 backend. However, double-precision instructions typically write 2 registers and in some cases they run into certain hardware bugs and limitations that we need to work around by splitting the instructions so we only write to 1 register at a time. This patch implements a SIMD splitting pass similar to the one in the scalar backend. Because we only use double-precision instructions in Align16 mode in gen7 (gen8+ is fully scalar and gens < 7 do not implement fp64) the pass should be a no-op on any other generation. For now the pass only handles the gen7 restriction where any instruction that writes 2 registers also needs to read 2 registers. This affects double-precision instructions reading uniforms, for example. Later patches will extend the lowering pass adding a few more cases. v2: - Move the simd lowering pass after the main optimization loop and run copy-propagation and dce if it reports progress (Curro) - Compute number of registers written instead of fixing it to 1 (Iago) - Use group from backend_instruction (Iago) - Drop assertion that checked that we only split 8-wide instructions into 4-wide. (Curro) - Don't assume that instructions can only be 8-wide, we might want to use 16-wide instructions in the future too (Curro) - Wrap gen7 workarounds in a conditional to ease adding workarounds for other gens in the future (Curro) - Handle dst/src overlap hazard (Curro) - Use the horiz_offset() helper to simplify the implementation (Curro) - Drop the assertion that checks that each split instruction writes exactly one register (Curro) - Use the copy constructor to generate split instructions with all the relevant fields initialized to the values in the original instruction instead of copying only a handful of them manually (Curro) v3 (Iago): - When copying to a temporary, allocate the number of registers required for the copy based on the size written of the lowered instruction instead of assuming that all lowered instructions produce single-register writes - Adapt to changes in offset() Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4ea3bf8ebb56c8db6e885a77d81502a0b2adca4f	10-Jun-2016	Juan A. Suarez Romero <jasuarez@igalia.com>	i965/vec4: handle 32 and 64 bit channels in liveness analysis Our current data flow analysis does not take into account that channels on 64-bit operands are 64-bit. This is a problem when the same register is accessed using both 64-bit and 32-bit channels. This is very common in operations where we need to access 64-bit data in 32-bit chunks, such as the double packing and unpacking operations. This patch changes the analysis by checking the bits that each source or destination datatype needs. Actually, rather than bits, we use blocks of 32 bits, which is the minimum channel size. Because a vgrf can contain a dvec4 (256 bits), we reserve 8 32-bit blocks to map the channels. v2 (Curro): - Simplify code by making the var_from_reg helpers take an extra argument with the register component we want. - Fix a couple of cases where we had to update the code to the new way of representing live variables. v3: - Fix indent in multiline expressions (Matt) - Fix comment's closing tag (Matt) - Use DIV_ROUND_UP(inst->size_written, 16) instead of 2 * regs_written(inst) to avoid rounding issues. The same for regs_read(i). (Curro). - Add asserts in var_from_reg() to avoid exceeding the allocated registers (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
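The idea of tracking liveness in 32-bit blocks, as described in the entry above, can be sketched like this. It is a deliberately simplified model and not the var_from_reg() helper itself, which also has to account for register offsets and allocation sizes; the `blocks_for_channel` function is a hypothetical name, and the layout assumed here simply places channel c of a type at consecutive 32-bit blocks.

```cpp
#include <cassert>
#include <cstdint>

// Bitmask over the 8 32-bit blocks reserved per VGRF in this sketch.
// `channel` is the logical component (0 = X ... 3 = W) and `type_size` is
// the channel size in bytes (4 for 32-bit types, 8 for 64-bit types).
uint8_t blocks_for_channel(int channel, int type_size)
{
    assert(channel >= 0 && channel < 4);
    assert(type_size == 4 || type_size == 8);
    const int blocks_per_channel = type_size / 4;
    const int first = channel * blocks_per_channel;
    uint8_t mask = 0;
    for (int k = 0; k < blocks_per_channel; k++)
        mask |= static_cast<uint8_t>(1u << (first + k));
    return mask;
}
```

In this model a 64-bit X channel covers blocks 0 and 1, so it overlaps the 32-bit X and Y channels of the same register but not the 32-bit Z channel, which is the kind of aliasing the analysis needs to see when 64-bit data is accessed in 32-bit chunks.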
29dd5cf9d64ac998cb313db8a908272a6154ec46	30-May-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: dump the instruction execution size Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f79547840a1951dbf82c7b6629935c6e89020e27	30-May-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: fix regs_read() for doubles Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c722a8e61ebc72d7d21c2bed0f623218d739fdb7	17-Feb-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: Rename DF to/from F generator opcodes The opcodes are not specific for conversions to/from float since we need the same for conversions to/from other 32-bit types. Rename the opcodes accordingly and change the asserts to check the size of the types involved instead. v2: - Rename to VEC4_OPCODE_TO_DOUBLE and VEC4_OPCODE_FROM_DOUBLE (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
21cf6f14d5abdf7d0f9641404387e0c00de6f56f	12-Feb-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: make opt_vector_float ignore doubles The pass does not support doubles in its current form. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
54b998e0e488189307d2614fe56a3b78b442d316	17-Jun-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes These opcodes will set the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this to implement packDouble2x32. We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for high), but the IR works in terms of 64-bit logical swizzles for DF operands all the way up to codegen. v2: - use suboffset() instead of get_element_ud() - no need to set the width on the dst Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6979e5a41241993b9e7bedea80f29fb43d96aa47	31-May-2016	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes These opcodes will pick the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this, for example, to do things like unpackDouble2x32. We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for high), but the IR works in terms of 64-bit logical swizzles for DF operands all the way up to codegen. v2: - use suboffset() instead of get_element_ud() - no need to set the width on the dst Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c35fa7ac5507a64943aa518b2dac8bddfdc9e14b	18-Nov-2015	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: set correct register regions for 32-bit and 64-bit For 32-bit instructions we want to use <4,4,1> regions for VGRF sources so we should really set a width of 4 (we were setting 8). For 64-bit instructions we want to use a width of 2 because the hardware uses 32-bit swizzles, meaning that we can only address 2 consecutive 64-bit components in a row. Also, Curro suggested that the hardware is probably fixing the width to 2 for 64-bit instructions anyway, so just go with that and use <2,2,1>. v2: - No need to explicitly set the vertical stride of 64-bit regions to 2, brw_vecn_grf with a width of 2 will do that for us. - No need to adjust the width of dst registers. v3 (Ian): - Make type_size and width const. Signed-off-by: Connor Abbott <connor.w.abbott@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
558f27953101c438747c3e9d3c3f98ce21e79007	14-Aug-2015	Iago Toral Quiroga <itoral@igalia.com>	i965/vec4: add double/float conversion pseudo-opcodes These need to be emitted as align1 MOV's, since they need to have a stride of 2 on the float register (whether src or dest) so that data from another thread doesn't cross the middle of a SIMD8 register. v2 (Iago): - The float-to-double needs to align 32-bit data to 64-bit before doing the conversion. This was doable in align16 when we tried to use an execsize of 4, but with an execsize of 8 we would need another align1 opcode to do that (since we need data to cross the middle of a SIMD register). Just making the opcode handle this internally seems more practical than adding another opcode just for this purpose and having the caller know about this before converting. - The double-to-float conversion produces 32-bit elements aligned to 64-bit so we make the opcode re-pack the result to 32-bit and fit in one register, as expected by SIMD4x2 operation. This still requires that callers reserve two registers for the float data destination because we need to produce 64-bit aligned data first, and repack it later on the same destination register, but it saves the need for a re-pack opcode only to achieve this making the operation complete in a single opcode. Hopefully that is worth the weirdness of the double register allocation... Signed-off-by: Connor Abbott <connor.w.abbott@intel.com> Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2d6eee3144ce16b39909522be466bdb3871f4c1b	13-Aug-2015	Connor Abbott <connor.w.abbott@intel.com>	i965/vec4: add support for printing DF immediates Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e729504fb1799c3ae31cea76d73946530ef9806f	14-Sep-2016	Timothy Arceri <timothy.arceri@collabora.com>	nir: pass compiler rather than devinfo to functions that call nir_optimize Later we will pass compiler to nir_optimize to be used by the loop unroll pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a1a292d17710a2bfb33f798c9f5fda73a5985261	04-Oct-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Store a clip_distance_mask field similar to cull_distance_mask. This isn't useful for legacy GL, but will be used in Vulkan. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
19c652b29ce7271374cd0951bdadc9840964e78e	04-Oct-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Use shader_info for brw_vue_prog_data::cull_distance_mask. This also allows us to move it from a GL specific location to a part of the compiler shared by both GL and Vulkan. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b	13-Oct-2016	Timothy Arceri <timothy.arceri@collabora.com>	nir/i965/anv/radv/gallium: make shader info a pointer When restoring something from shader cache we won't have and don't want to create a nir_shader; this change detaches the two. There are other advantages such as being able to reuse the shader info populated by GLSL IR. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89e1436e2d4ff0c15202708979eb36761cae4167	11-Oct-2016	Ian Romanick <ian.d.romanick@intel.com>	i965: Silence unused parameter warnings brw_link.cpp:76:44: warning: unused parameter ‘shader_type’ [-Wunused-parameter] gl_shader_stage shader_type, ^ brw_nir.c: In function ‘brw_nir_lower_vs_inputs’: brw_nir.c:194:55: warning: unused parameter ‘devinfo’ [-Wunused-parameter] const struct gen_device_info *devinfo, ^ brw_vec4_visitor.cpp:914:37: warning: unused parameter ‘sampler’ [-Wunused-parameter] uint32_t sampler, ^ brw_vec4_visitor.cpp:1146:34: warning: unused parameter ‘stream_id’ [-Wunused-parameter] vec4_visitor::gs_emit_vertex(int stream_id) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Engestrom <eric@engestrom.ch> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f57f526fc5cfaedf26b2becf8f1899d5de0d0461	16-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch. The eliminate_find_live_channel optimization eliminates FIND_LIVE_CHANNEL instructions in cases where control flow is known to be uniform, and replaces them with 'MOV 0', which in turn unblocks subsequent elimination of the BROADCAST instruction frequently used on the result of FIND_LIVE_CHANNEL. This is however not correct in per-sample fragment shader dispatch because the PSD can dispatch a fully unlit sample under certain conditions. Disable the optimization in that case. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> v2: Add devinfo argument to brw_stage_has_packed_dispatch() to implement hardware generation check. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5ca35c63673dad28854c00ce34ec6f085ba4ec5e	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Assert that ATTR regions are register-aligned. It might be useful to actually handle this once copy propagation becomes smarter about register-misaligned offsets. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8bed1adfc144d9ae8d55ccb9b277942da8a78064	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Assign correct destination offset to rewritten instruction in register coalesce. Because the pass already checks that the destination offset of each 'scan_inst' that needs to be rewritten matches 'inst->src[0].offset' exactly, the final offset of the rewritten instruction is just the original destination offset of the copy. This is in preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3a74e437fdec02c28749c94bc1bcf21c3c4b48d7	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Don't coalesce registers with overlapping writes not matching the MOV source. In preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1bb5074474445ea9f54d0f52383f99ac0fa6128f	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Compare full register offsets in opt_register_coalesce nop move check. In preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3be0d6d040753c62b25077fb6b85ad1f0808b258	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Check that the write offsets match when setting dependency controls. For simplicity just assume that two writes to the same GRF with different sub-GRF offsets will potentially interfere and break the dependency control chain. This is in preparation for adding sub-GRF offset support to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b52fefc4d55a4627bf0d59c78ac531603fa08fda	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Change opt_vector_float to keep track of the last offset seen in bytes. This simplifies things slightly and makes the pass more correct in presence of sub-GRF offsets. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
230615e2280e6d28456e7d6a42b1e42645515b4d	09-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Simplify src/dst_reg to brw_reg conversion by using byte_offset(). This should also have the side effect of fixing convert_to_hw_regs() to handle sub-GRF register offsets. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eb746a80e5e99bafd3957a1cb2d9db8548a1a6be	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/ir: Update several stale comments. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
47784e2346b56bea6a1111fecaa953239ff198ca	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/ir: Don't print ARF subnr values twice. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5d65d51e78c2f73389a0d30dac6dda4561e91bec	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Print src/dst_reg::offset field consistently for all register files. C.f. 'i965/fs: Print fs_reg::offset field consistently for all register files.'. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fcd9d1badcd97486eea5d87bf701a3b0a16b4ba9	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap(). This makes sure that overlap checks are done correctly throughout the back-end when the '*this' register starts before the register/size pair provided as argument, and is actually less annoying to use than in_range() at this point since regions_overlap() takes its size arguments in bytes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
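The byte-based overlap check this entry describes comes down to a half-open interval intersection. The following is a minimal sketch under that assumption; `ByteRegion` and `byte_regions_overlap` are hypothetical names for illustration and are not the Mesa `regions_overlap()` signature.

```cpp
#include <cstdint>

// Hypothetical stand-in for a register region: a start offset and a size,
// both expressed in bytes.
struct ByteRegion {
    uint32_t offset;
    uint32_t size;
};

// Two half-open ranges [offset, offset + size) overlap when each one starts
// before the other ends. The test is symmetric, so it also holds when the
// first region starts before the second one.
inline bool byte_regions_overlap(const ByteRegion &a, const ByteRegion &b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

Because the comparison is symmetric, a caller does not need to care which of the two regions starts first.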
728dd30c0ac0078653974de36087456065d2e3ae	08-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte units. The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
69fdf13c215c2970feaca76f178a5c2c11ba8fec	03-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Replace vec4_instruction::regs_written with ::size_written field in bytes. The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
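The two rewrite rules quoted in this entry can be sketched as a pair of helpers. This is only an illustration: the helper names are hypothetical, and the 32-byte `kRegUnit` is an assumed register size rather than a value taken from this entry.

```cpp
#include <cstdint>

// Assumed size of one register in bytes (hypothetical constant for this sketch).
constexpr uint32_t kRegUnit = 32;

// Integer division that rounds up, mirroring DIV_ROUND_UP in the log entry.
constexpr uint32_t div_round_up(uint32_t n, uint32_t d) {
    return (n + d - 1) / d;
}

// rvalue rule: x = DIV_ROUND_UP(i.size_written, reg_unit)
constexpr uint32_t regs_written_from_size(uint32_t size_written) {
    return div_round_up(size_written, kRegUnit);
}

// lvalue rule: i.size_written = x * reg_unit
constexpr uint32_t size_from_regs_written(uint32_t regs_written) {
    return regs_written * kRegUnit;
}
```

Rounding up matters for partial writes: a 16-byte write still touches one whole register in a register-granular analysis.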
d28cfa35fec75c367b940ff829ba8eaa035fbd22	02-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ::regs_written. This is in preparation for dropping vec4_instruction::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fba020e5af49d9d9a2c6e4d4b79115ed1e74a127	01-Sep-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset expressed in bytes. The dst/src_reg::offset field in byte units introduced in the previous patch is a more straightforward alternative to an offset representation split between ::reg_offset and ::subreg_offset fields. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple FS back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. v2: Fix division by the wrong reg_unit in the UNIFORM case of convert_to_hw_regs(). (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
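The rvalue and lvalue rewrite rules for `reg_offset` in this entry can be illustrated as follows. `RegSketch` and both helpers are hypothetical names; `reg_unit` is passed explicitly because, as the entry notes, its value differed between register files.

```cpp
#include <cstdint>

// Hypothetical register with a single offset field in bytes.
struct RegSketch {
    uint32_t offset;
};

// rvalue rule: x = r.offset / reg_unit
inline uint32_t get_reg_offset(const RegSketch &r, uint32_t reg_unit) {
    return r.offset / reg_unit;
}

// lvalue rule: r.offset = r.offset % reg_unit + x * reg_unit
// The sub-register part of the byte offset is preserved.
inline void set_reg_offset(RegSketch &r, uint32_t x, uint32_t reg_unit) {
    r.offset = r.offset % reg_unit + x * reg_unit;
}
```

For example, with a 32-byte unit, a register at byte offset 40 has register offset 1 and sub-register offset 8; setting the register offset to 2 yields byte offset 72.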
527f37199929932300acc1688d8160e1f3b1d753	23-Aug-2016	Jason Ekstrand <jason.ekstrand@intel.com>	intel: s/brw_device_info/gen_device_info/ Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/.h sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.c sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.cpp sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e7c376adfdecd4c1333997c8be8bb066a87c67b4	19-Aug-2016	Matt Turner <mattst88@gmail.com>	i965/vec4: Ignore swizzle of VGRF for use by var_range_end(). var_range_end(v, n) loops over the n components of variable number v and finds the maximum value, giving the last use of any component of v. Therefore it expects v to correspond to the variable associated with the .x channel of the VGRF. var_from_reg() however returns the variable for the first channel of the VGRF, post-swizzle. So, if the last register had a swizzle with y, z, or w in the swizzle component, we would read out of bounds. For any other register, we would read liveness information from the next register. The fix is to convert the src_reg to a dst_reg in order to call the dst_reg version of var_from_reg() that doesn't consider the swizzle. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4c3a6b07e2960266adca634f8607ef38f71b8318	20-Jul-2016	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Make opt_vector_float reset at the top of each block The pass isn't really control-flow aware and you can get into a case where it tries to combine instructions from different blocks. This can actually lead to an assertion failure when removing unneeded instructions if part of the vector is set in one block and part in another. This prevents regressions in the next commit. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7ea09511ca4f58640063cc1ee08386cce5300535	04-Apr-2016	Juan A. Suarez Romero <jasuarez@igalia.com>	i965/fs: calculate first non-payload GRF using attrib slots When computing where the first non-payload GRF starts, we can't rely on the number of attributes, as each attribute can be using 1 or 2 slots depending on whether they are a dvec3/4 or other. Instead, we need to use the number of slots used by the attributes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b7423b485e11b768f68e8d5865fbc74b07ee6d48	04-Apr-2016	Juan A. Suarez Romero <jasuarez@igalia.com>	i965/vec4: use attribute slots to calculate URB read length Do not use total attributes because a dvec3/dvec4 attribute requires two slots. So rather use total attribute slots. v2: do not use loop to calculate required attribute slots (Kenneth Graunke) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
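Counting attribute slots without a loop, as the v2 note in this entry suggests, might look like the sketch below. The two bitmask parameters and the function name are assumptions made for illustration; the entry does not say how the driver tracks which attributes are dvec3/dvec4.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical inputs: one bit per vertex attribute that is read, and one bit
// per attribute that needs a second slot (dvec3/dvec4).
inline unsigned count_attribute_slots(uint64_t inputs_read,
                                      uint64_t two_slot_inputs) {
    // Every attribute read takes one slot; two-slot attributes that are
    // actually read take one more.
    return static_cast<unsigned>(
        std::bitset<64>(inputs_read).count() +
        std::bitset<64>(inputs_read & two_slot_inputs).count());
}
```

With three attributes read and one of them a dvec4, the count is four slots rather than three.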
1ec466d0ff59ab17edef95c84ed733c1fea5655e	28-Apr-2016	Jason Ekstrand <jason.ekstrand@intel.com>	i965/fs: Stop setting dispatch_grf_start_reg from the visitor Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1cc7573162a7f0e8346d7abab50890c58a0dce9a	28-Apr-2016	Francisco Jerez <currojerez@riseup.net>	i965: Pass devinfo pointer to is_3src() helpers. This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c55dc77ab13420a9fe0177ccd21a6b0a950d9113	28-Apr-2016	Francisco Jerez <currojerez@riseup.net>	i965: Pass devinfo pointer to brw_instruction_name(). A future series will implement support for an instruction that happens to have the same opcode number as another instruction we support already on a disjoint set of hardware generations. In order to disambiguate which instruction it is brw_instruction_name() will need some way to find out which device we are generating code for. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
60a17d071825da4a06303cb699e4417edaaa6386	14-Apr-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Properly handle integer types in opt_vector_float(). Previously, opt_vector_float() always interpreted MOV sources as floating point, and always created a MOV with a F-type destination. This meant that we could mess up sequences of integer loads, such as: mov vgrf6.0.x:D, 0D mov vgrf6.0.y:D, 1D mov vgrf6.0.z:D, 2D mov vgrf6.0.w:D, 3D Here, integer 0/1/2/3 become approximately 0.0f, so we generated: mov vgrf6.0:F, [0F, 0F, 0F, 0F] which is clearly wrong. We can properly handle this by converting integer values to float (rather than bitcasting), and emitting a type converting MOV: mov vgrf6.0:D, [0F, 1F, 2F, 3F] To do this, we first see if the integer values (converted to float) are representable. If so, we use a D-type MOV. If not, we then try the floating point values and an F-type MOV. We make zero not impose type restrictions. This is important because 0D would imply a D-type MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D, where we want to use an F-type MOV. Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend. This recently became visible due to changes in opt_vector_float() which made it optimize more cases, but it was a pre-existing bug. Apparently it also manages to turn more integer loads into VFs, producing the following shader-db statistics on Haswell: total instructions in shared programs: 7084195 -> 7082191 (-0.03%) instructions in affected programs: 246027 -> 244023 (-0.81%) helped: 1937 total cycles in shared programs: 65669642 -> 65651968 (-0.03%) cycles in affected programs: 531064 -> 513390 (-3.33%) helped: 1177 v2: Handle the type of zero better. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
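The difference between bitcasting and converting an integer immediate, which is the root of the bug this entry describes, can be shown in a few lines. This is a standalone sketch with hypothetical helper names, not code from the pass itself.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a 32-bit integer as a float. Small integers such as
// 1, 2 and 3 turn into tiny denormal values that are approximately 0.0f.
inline float bitcast_to_float(int32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Convert the integer's numeric value to float, as a type-converting MOV does.
inline float convert_to_float(int32_t value) {
    return static_cast<float>(value);
}
```

Note that `bitcast_to_float(0x3f800000)` is exactly `1.0f`, which is why a sequence such as MOV 0D, MOV 0x3f800000D is better served by an F-type MOV, while zero has the same bit pattern under either interpretation.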
1aa28f3509b033e0f86510a6d4c7993fca650b3b	14-Apr-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Make opt_vector_float() only handle non-type-conversion MOVs. We don't handle this properly - we'd have to perform the type conversion before trying to convert the value to a VF. While we could do that, it doesn't seem particularly useful - most vector loads should be consistently typed (all float or all integer). As a special case, we do allow type-converting MOVs of integer 0, as it's represented the same regardless of the type. I believe this case does actually come up. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2a25a5142bd78b22cc9ada41b8988bb282c2a7ac	14-Apr-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Fold vectorize_mov() back into the one caller. After the previous patch, this helper is only called in one place. So, just fold it back in - there are a lot of parameters here and not much code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9967561158acd94edff0fa93ceaf4bc527e271ed	14-Apr-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Rework opt_vector_float() control flow. This reworks opt_vector_float() so that there's only one place that flushes out any accumulated state and emits a VF. v2: Don't break the sequence for non-representable numbers - just skip recording their values. Only break it for non-MOVs or register changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a112391d52a458c588b8770cbf1ca9fce8863b79	06-Apr-2016	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Handle MOV_INDIRECT in pack_uniform_registers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
aaac8a18904f44e93a2223c93727086358d6a655	24-Nov-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
61ee5e62a2beeb2e405ff3aa5e3eb26d1bf5437d	05-Apr-2016	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Use can_do_writemask in can_reswizzle Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
75b68f9114dc3ba1b501fb7de8198c03b3dcb1fd	05-Apr-2016	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Move can_do_writemask to vec4_instruction Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8e76f664beb845f8dca30ca5635f9369618563b0	09-Dec-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Get rid of the uniform_size array Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
056849772f66582fd7e8a181c3fb16955f84243b	25-Nov-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants This commit moves us to an instruction based model rather than a register-based model for indirects. This is more accurate anyway as we have to emit instructions to resolve the reladdr. It's also a lot simpler because it gets rid of the recursive reladdr problem by design. One side-effect of this is that we need a whole new algorithm in move_uniform_array_access_to_pull_constants. This new algorithm is much more straightforward than the old one and is fairly similar to what we're already doing in the FS backend. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
01425c45b32fa7f323515b05697c6cc0d245ad32	17-Mar-2016	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Remove the RCP+RSQ algebraic optimizations NIR already has this optimization and it can do much better than the little peephole in the backend. No shader-db change on Haswell or Broadwell. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7d7990cf657550be4d038a0424ffdc0ef7fd8faa	14-Mar-2016	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Consider removal of no-op MOVs as progress during register coalesce. Bug found by the liveness analysis validation pass that will be introduced in a later commit. The no-op MOV check in opt_register_coalesce() was removing instructions which makes the cached liveness analysis calculation inconsistent with the shader IR. We were failing to set progress to true in that case though, which means that invalidate_live_intervals() wouldn't necessarily be called at the end of the function. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2f76a9924e7b0b33a508ee3651b0cb2ab536a7dc	02-Mar-2016	Juan A. Suarez Romero <jasuarez@igalia.com>	i965/vec4: add opportunistic behaviour to opt_vector_float() opt_vector_float() transforms several scalar MOV operations to a single vectorial MOV. This is done when those MOVs cover all the components of the destination register. So something like: mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf3.0.z:D, 0D is transformed into: mov vgrf3.0:F, [0F, 0F, 0F, 1F] But there are cases where not all the components are written. For example, in: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf4.0.xy:D, 1065353216D mov vgrf4.0.w:D, 0D mov vgrf6.0:UD, u4.xyzw:UD Neither the vgrf3 nor the vgrf4 .z component is written, so the optimization is not applied. But it could be applied anyway with the components covered, using a writemask to select the ones written. So we could transform it into: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F] mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F] mov vgrf6.0:UD, u4.xyzw:UD This commit does precisely that: opportunistically apply opt_vector_float() when possible. total instructions in shared programs: 7124660 -> 7114784 (-0.14%) instructions in affected programs: 443078 -> 433202 (-2.23%) helped: 4998 HURT: 0 total cycles in shared programs: 64757760 -> 64728016 (-0.05%) cycles in affected programs: 1401686 -> 1371942 (-2.12%) helped: 3243 HURT: 38 v2: change vectorize_mov() signature (Matt). v3: take into account predicates (Juan). v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
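The opportunistic merge in this entry can be pictured as accumulating per-component values together with a writemask, then emitting one MOV that covers only the components seen. The sketch below assumes a hypothetical `PendingVectorMov` type and an x/y/z/w bit encoding chosen for illustration; as in the entry's example, integer immediates are reinterpreted as float bit patterns.

```cpp
#include <cstdint>
#include <cstring>

// Writemask bits in x, y, z, w order (hypothetical encoding for this sketch).
enum : unsigned { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

// Accumulated state for the scalar MOVs targeting one destination register.
struct PendingVectorMov {
    float values[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    unsigned writemask = 0;

    // Record a MOV of a raw 32-bit immediate to the components in `mask`.
    void record(unsigned mask, int32_t bits) {
        float f;
        std::memcpy(&f, &bits, sizeof f);  // reinterpret the bits as a float
        for (unsigned c = 0; c < 4; ++c) {
            if (mask & (1u << c))
                values[c] = f;
        }
        writemask |= mask;  // remember which components were written
    }
};
```

Recording the two vgrf3 MOVs from the entry's second example (`.xy` with 0D, `.w` with 1065353216D) leaves a writemask of xyw and the values [0F, 0F, 0F, 1F], where the unwritten `.z` slot is simply masked off.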
cfbd9831f89ef165e7998d0b8524a1aefedec404	25-Feb-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions. Now that each stage is directly calling brw_nir_lower_io(), and we have per-stage helper functions, it makes sense to just call the relevant one directly, rather than going through multiple switch statements. This also eliminates stupid function parameters, such as the two that only apply to vertex attributes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2f2c00c7279e7c43e520e21de1781f8cec263e92	11-Feb-2016	Matt Turner <mattst88@gmail.com>	i965: Lower min/max after optimization on Gen4/5. Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more. On Ironlake: total instructions in shared programs: 6426035 -> 6422753 (-0.05%) instructions in affected programs: 326604 -> 323322 (-1.00%) helped: 1411 total cycles in shared programs: 129184700 -> 129101586 (-0.06%) cycles in affected programs: 18950290 -> 18867176 (-0.44%) helped: 2419 HURT: 328 Reviewed-by: Francisco Jerez <currojerez@riseup.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8122d21d1507b4d6d351299f88fff0c645c0b4ff	13-Feb-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Fix gl_DrawID in the vec4 backend. brw_draw_upload.c uploads VertexID/InstanceID first, then DrawID. So we need to assign the attribute mapping in that order as well. Fixes the following Piglit tests with the vec4 backend: - arb_shader_draw_parameters-drawid vertexid - arb_shader_draw_parameters-drawid-indirect basevertex Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5743fd957145040a4734b5542ee5187cfad4cf1d	11-Feb-2016	Ben Widawsky <benjamin.widawsky@intel.com>	i965: Rename optimizer debug 00 filename This allows ls, and scripts to get the file names in the correct order of optimization. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
85f5c18fef1ff2f19d698f150e23a02acd6f59b9	14-Jan-2016	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Drop support for ATTR as an instruction destination. This is no longer necessary...and it doesn't make much sense to have inputs as destinations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d56ae2d1605fc1b5a3fdf5aba9aefc3c7692a4ba	14-Jan-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Apply VS attribute workarounds in NIR. This patch re-implements the pre-Haswell VS attribute workarounds. Instead of emitting shader code in the vec4 backend, we now simply call a NIR pass to emit the necessary code. This simplifies the vec4 backend. Beyond deleting code, it removes the primary use of ATTR as a destination. It also eliminates the requirement that the vec4 VS backend express the ATTR file in terms of VERT_ATTRIB_* locations, giving us a bit more flexibility. This approach is a little different: rather than munging the attributes at the top, we emit code to fix them up when they're accessed. However, we run the optimizer afterwards, so CSE should eliminate the redundant math. It may even be able to fuse it with other calculations based on the input value. shader-db does not handle non-default NOS settings, so I have no statistics about this patch. Note that the scalar backend does not implement VS attribute workarounds, as they are unnecessary on hardware which allows SIMD8 VS. v2: Do one multiply for FIXED rescaling and select components from either the original or scaled copy, rather than multiplying each component separately (suggested by Matt Turner). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
830b075e86e3e9af1bf12316d0f9d888a85a973b	05-Jan-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Explicitly write the "TR DS Cache Disable" bit at TCS EOT. Bit 0 of the Patch Header is "TR DS Cache Disable". Setting that bit disables the DS Cache for tessellator-output topologies resulting in stitch-transition regions (but leaves it enabled for other cases). We probably shouldn't leave this to chance - the URB could contain garbage - which could result in the cache randomly being turned on or off. This patch makes the final EOT write 0 to the first DWord (which only contains this one bit). This ensures the cache is always on. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9870f798beab701a9edda81ff7ccc39f1875d610	15-Jan-2016	Jason Ekstrand <jason.ekstrand@intel.com>	i965/fs/generator: Take an actual shader stage rather than a string Cc: "11.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
824d82025d0bff9841647942aca501fba16fc1a9	14-Jan-2016	Kenneth Graunke <kenneth@whitecape.org>	i965: Make an is_scalar boolean in brw_compile_vs(). Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can help with line-wrapping. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
53a9b6223f4ebf66e8892e04ffe47eb5586eda5c	31-Dec-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Move 3-src subnr swizzle handling into the vec4 backend. While most align16 instructions only support a SubRegNum of 0 or 4 (using swizzling to control the other channels), 3-src instructions actually support arbitrary SubRegNums. When the RepCtrl bit is set, we believe it ignores the swizzle and uses the equivalent of a <0,1,0> region from the subnr. In the past, we adopted a vec4-centric approach of specifying subnr of 0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper SubRegNum. This isn't a great fit for the scalar backend, where we don't set swizzles at all, and happily set subnrs in the range [0, 7]. This patch changes brw_eu_emit.c to use subnr and swizzle directly, relying on the higher levels to set them sensibly. This should fix problems where scalar sources get copy propagated into 3-src instructions in the FS backend. I've only observed this with TES push model inputs, but I suppose it could happen in other cases. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cddfc2cefa93b884c40329dcb193fe4fb22143ab	10-Dec-2015	Kristian Høgsberg Kristensen <krh@bitplanet.net>	i965: Add support for gl_DrawIDARB and enable extension We have to break open a new vec4 for gl_DrawIDARB. We've used up all space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its own separate vertex buffer anyway. This is because we point the vb for base vertex and base instance into the draw parameter BO for indirect draw calls, but the draw id is generated by mesa in a different buffer. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
17ebb55a14b5a9aa639845fbda9330ef9421834a	10-Dec-2015	Kristian Høgsberg Kristensen <krh@bitplanet.net>	i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB We already have gl_BaseVertexARB in the .x component of the SGVS vec4 and plug gl_BaseInstanceARB into the last free component (.y). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bd8ab8dedb2cc557ea3cb58d507f237743b3f7f9	24-Dec-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Don't set interleave or complete on TCS EOT message. Setting interleave on the TCS EOT message causes Ivybridge hardware to GPU hang like crazy. Individual tests would pass, but running even a simple test like nop.shader_test in a loop would hang within 1-3 runs. Adding sleep delays worked around the problem, somehow. Interleave doesn't make much sense given that we only have one patch URB handle, not two. Complete doesn't seem useful either. There's no reason to actually set those bits. We were just being lazy. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b7793783b3df94880655234bc2a9054eddf01913	26-Nov-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Release input URB Handles on Gen7/7.5 when TCS threads finish. Pre-Broadwell hardware requires us to manually release the ICP Handles by issuing URB read messages with the "Complete" bit set. We can do this in pairs to use fewer URB read messages. Based heavily on work from Chris Forbes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1245724f728915694ecb9c318a68107c01ccc808	17-Nov-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Port tessellation evaluation shaders to vec4 mode. This can be used on Broadwell by setting INTEL_SCALAR_TES=0. More importantly, it will be used for Ivybridge and Haswell. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
24be658d13b13fdb8a1977208038b4ba43bce4ac	17-Nov-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Add tessellation control shaders. The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables. One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES. We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread. v2: Update comments (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
83dedb6354d0e9b04e8ccad77e86bdb7bad44bdd	20-Nov-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Add src/dst interference for certain instructions with hazards. When working on tessellation shaders, I created some vec4 virtual opcodes for creating message headers through a sequence like: mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted }; mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all }; mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted }; mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all }; This is done in the generator since the vec4 backend can't handle align1 regioning. From the visitor's point of view, this is a single opcode: hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD Normally, there's no hazard between sources and destinations - an instruction (naturally) reads its sources, then writes the result to the destination. However, when the virtual instruction generates multiple hardware instructions, we can get into trouble. In the above example, if the register allocator assigned vgrf7 and vgrf8 to the same hardware register, then we'd clobber the source with 0 in the first instruction, and read back the wrong value in the last one. It occurred to me that this is exactly the same problem we have with SIMD16 instructions that use W/UW or B/UB types with 0 stride. The hardware implicitly decodes them as two SIMD8 instructions, and with the overlapping regions, the first would clobber the second. Previously, we handled that by incrementing the live range end IP by 1, which works, but is excessive: the next instruction doesn't actually care about that. It might also be the end of control flow. This might keep values alive too long. What we really want is to say "my source and destinations interfere". This patch creates new infrastructure for doing just that, and teaches the register allocator to add interference when there's a hazard. For my vec4 case, we can determine this by switching on opcodes.
For the SIMD16 case, we just move the existing code there. I audited our existing virtual opcodes that generate multiple instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this treatment as well, but no others. v2: Rebased by mattst88. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f36993b46962eab4446bc1964eb47149751aee26	23-Nov-2015	Matt Turner <mattst88@gmail.com>	i965: Clean up #includes in the compiler. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2d8c5299032d229c8f6e936db5644cd53716e6c1	20-Nov-2015	Matt Turner <mattst88@gmail.com>	i965: Prevent implicit upcasts to brw_reg. Now that backend_reg inherits from brw_reg, we have to be careful to avoid the object slicing problem. Reviewed-by: Francisco Jerez <currojerez@riseup.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
799f924073c62c3a012c48a51895b46ad621e36c	24-Nov-2015	Matt Turner <mattst88@gmail.com>	i965: Use scope operator to ensure brw_reg is interpreted as a type. In the next patch, I make backend_reg's inheritance from brw_reg private, which confuses clang when it sees the type "struct brw_reg" in the derived class constructors, thinking it is referring to the privately inherited brw_reg: brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg' fs_reg::fs_reg(struct brw_reg reg) : ^ brw_shader.h:39:22: note: constrained by private inheritance here struct backend_reg : private brw_reg ^~~~~~~~~~~~~~~ brw_reg.h:232:8: note: member is declared here struct brw_reg { ^ Avoid this by marking brw_reg with the scope resolution operator. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f093c842e65b251e24ea3a2d6daaa91326a4f862	21-Nov-2015	Matt Turner <mattst88@gmail.com>	i965: Use implicit backend_reg copy-constructor. In order to do this, we have to change the signature of the backend_reg(brw_reg) constructor to take a reference to a brw_reg in order to avoid unresolvable ambiguity about which constructor is actually being called in the other modifications in this patch. As far as I understand it, the rule in C++ is that if multiple constructors are available for parent classes, the one closest to you in the class hierarchy is chosen, but if one of them didn't take a reference, that screws things up. Reviewed-by: Francisco Jerez <currojerez@riseup.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
309a44d63c75a7d688157486b094e555f49c907d	22-Nov-2015	Matt Turner <mattst88@gmail.com>	i965: Add and use backend_reg::equals(). Reviewed-by: Francisco Jerez <currojerez@riseup.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6c8ba59cff14a1a86273f4008ff2a8e68335ab25	11-Nov-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Use nir_lower_tex for texture coordinate lowering Previously, we had a rescale_texcoords helper in the FS backend for handling rescaling of texture coordinates. Now that we can do variants in NIR, we can use nir_lower_tex to do the rescaling for us. This allows us to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and GL_CLAMP handling in vertex and geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ce767bbdfff7c2a7829b652c111a11eb9ddba026	11-Nov-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Move postprocess_nir to codegen time This allows us to insert NIR passes between initial NIR compilation and optimization (link time) and actual backend code-gen. In particular, it will allow us to do shader variants in NIR and share some of that shader variant code between backends. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a5b3115f0a9ede775b332b1a669de570668e871c	02-Nov-2015	Matt Turner <mattst88@gmail.com>	i965: Drop IMM fs_reg/src_reg -> brw_reg conversions. The previous two commits make this unnecessary. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f9a9ba5eac2f1934bd7fecc92cd309f22411164b	02-Nov-2015	Matt Turner <mattst88@gmail.com>	i965/vec4: Replace src_reg(imm) constructors with brw_imm_*(). Cuts 1.5k of .text. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
44d6c0c805d2911cc5dfe853e5bc5a505f87775f	12-Nov-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Convert scalar_* flags to a scalar_stage array. I was going to add scalar_tcs and scalar_tes flags, and then thought better of it and decided to convert this to an array. Simpler. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0eb3db117b56b081ee2674cc8940c193ffc3c41b	02-Nov-2015	Matt Turner <mattst88@gmail.com>	i965: Use BRW_MRF_COMPR4 macro in more places. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
49b3215d7076db8b9afe8998b01ef250795b5892	27-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Combine register file field. The first four values (2-bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b3315a6f56fb93f2884168cbf9358b2606641db5	27-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Replace HW_REG with ARF/FIXED_GRF. HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b163aa01487ab5f9b22c48b7badc5d65999c4985	27-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Rename GRF to VGRF. The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7638e75cf99263c1ee8e31c6cc5a319feec2c943	26-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Use brw_reg's nr field to store register number. In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3048053908310eaf082058e5be34ae902e1fc02c	26-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Unwrap some lines. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
58fa9d47b536403c4e3ca5d6a2495691338388fd	26-Oct-2015	Matt Turner <mattst88@gmail.com>	i965/vec4: Remove swizzle/writemask fields from src/dst_reg. Also allows us to handle HW_REGs in the swizzle() and writemask() functions. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
94b1031703b1b5759436fe215323727cffce5f86	25-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Remove fixed_hw_reg field from backend_reg. Since backend_reg now inherits brw_reg, we can use it in place of the fixed_hw_reg field. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1392e45bfb396ccbfa5bb0c6063522e0550988d3	24-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Use immediate storage in inherited brw_reg. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e42fb0c2a687cdcd6af2a590f6f5e24f64cfff3b	23-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Make 'dw1' and 'bits' unnamed structures in brw_reg. Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for sometime "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e42a29531ae3d5dedb72011da2947357dfa8715b	10-Nov-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Print force_writemask_all in dump_instructions(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4ef27745c8ed5153464db22950a90d74d2ef4435	09-Sep-2015	Neil Roberts <neil@linux.intel.com>	i965/vec4/skl+: Use ld2dms_w instead of ld2dms In order to support 16x MSAA, skl+ has a wider version of ld2dms that takes two parameters for the MCS data. The MCS data in the response still fits in a single register so we just need to ensure we copy both values rather than just the lower one. Acked-by: Ben Widawsky <ben@bwidawsk.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7c81a6a647257c309cb1ca36c60aa4bfa8e2e022	26-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Replace default case with list of enum values. If we add a new file type, we'd like to get warnings if it's not handled. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4cba8f5d21e4b50343e7c7bfbeb603b59c5d71dd	23-Oct-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Wrap vec4_generator in a C function. vec4_generator is a class for convenience, but only exports a single method as its public API. It makes much more sense to just export a single function. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
73ff0ead3688519eb76ea8bc32eabb9004e6f37b	23-Oct-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor. This patch makes the visitor convert registers to the HW_REG file at the very end, after register allocation, post-RA scheduling, and dependency control flagging. After that, everything is in fixed brw_regs. This simplifies the code generator, as it can just use the hardware registers rather than having to interpret our abstract files. In particular, interpreting the UNIFORM file meant reading prog_data to figure out where push constants are supposed to start. Having the part of the code that performs register allocation also translate everything to hardware registers seems sensible. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bfc73ff10eafad59b6ae9ca3991f9f1a3700b3a1	07-Oct-2015	Emil Velikov <emil.l.velikov@gmail.com>	i965: remove unneeded src_reg copy in emit_shader_time_write The variable is already of type src_reg. creating a new instance only to destroy it seems unnecessary. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8cf84a7e470dbd3b46ce4081459d2ecfab22c2d5	09-Oct-2015	Alejandro Piñeiro <apinheiro@igalia.com>	i965/vec4: print predicate control at brw_vec4 dump_instruction v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding a copy on brw_vec4.c, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
627f94b72e0e9443ad116f072599a7342269f297	28-Sep-2015	Alejandro Piñeiro <apinheiro@igalia.com>	i965/vec4: adding vec4_cmod_propagation optimization vec4 port of fs_cmod_propagation. Shader-db results (no vec4 grepping): total instructions in shared programs: 6240413 -> 6235841 (-0.07%) instructions in affected programs: 401933 -> 397361 (-1.14%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 2265 HURT: 0 v2: remove extra space and combine two if blocks, as suggested by Matt Turner v3: add condition check to bail out if current inst and inst being scanned has different writemask, as pointed by Matt Turner v3: updated shader-db numbers v4: remove block from foreach_inst_in_block_*_starting_from after commit 801f151917fedb13c5c6e96281a18d833dd6901f Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
801f151917fedb13c5c6e96281a18d833dd6901f	20-Oct-2015	Neil Roberts <neil@linux.intel.com>	i965: Remove block arg from foreach_inst_in_block_*_starting_from Since 49374fab5d793 these macros no longer actually use the block argument. I think this is worth doing to make the macros easier to use because they already have really long names and a confusing set of arguments. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9e17c36b8ba79e688011a5fd293ad5f42da21b66	14-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Extract can_change_source_types() functions. Make them members of fs_inst/vec4_instruction for use elsewhere. Also fix the fs version to check that dst.type == src[1].type and for !saturate. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
41c474df53d9dcd5fd8e24eba5b7acc2b3c32795	15-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vs: Move URB entry_size and read_length calculations to compile_vs Reviewed-By: Eduardo Lima Mitev <elima@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4467344c829f1dccdf74e27bef2c5fda72552be6	09-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Rename brw_foo_emit to brw_compile_foo Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5d8bf6de6166a686a006478a420bcd373860e9ee	08-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler This commit removes all dependence on GL state by getting rid of the brw_context parameter and the GL data structures. v2 (Jason Ekstrand): - Patch use_legacy_snorm_formula through as a function argument rather than trying to go through the shader key. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8f1d968704858d78d7e78a6b88db3ea2bc0cf749	06-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Remove gl_program and gl_shader_program from the generator Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e86f5b3d21fe8e96676bb0608990d72dbf61b85	06-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/fs: Remove the gl_program from the generator Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
031d3501322aee0a1474c7f2a9b79f9fa9947430	26-Aug-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Unify URB entry size/read length calculations between backends. Both the vec4 and scalar VS backends had virtually identical URB entry size and read length calculations. We can move those up a level to backend-agnostic code and reuse it for both. Unfortunately, the backends need to know nr_attributes to compute first_non_payload_grf, so I had to store that in prog_data. We could use urb_read_length, but that's nr_attributes rounded up to a multiple of two, so doing so would waste a register in some cases. There's more code to be removed in the vec4 backend, but that will come in a follow-on patch. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ee0f0108c8e87b9cfec25bade66670bbc4254139	07-Oct-2015	Kristian Høgsberg Kristensen <krh@bitplanet.net>	i965: Move brw_get_shader_time_index() call out of emit functions brw_get_shader_time_index() is all tangled up in brw_context state and we can't call it from the compiler. Thanks to Jason's recent refactoring, we can just get the index and pass it to the emit functions instead. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ba71d581aeb96c4626500eb5b19f3bef2f40d586	05-Oct-2015	Kristian Høgsberg Kristensen <krh@bitplanet.net>	i965: Move brw_dump_ir() out of brw_*_emit() functions We move these calls one level up into the codegen functions. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5a360dcad1fdb91f9129cb21775b9af60cbf57e4	03-Oct-2015	Matt Turner <mattst88@gmail.com>	i965: Generalize predicated break pass for use in vec4 backend. instructions in affected programs: 44204 -> 43762 (-1.00%) helped: 221 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bf7b6fd3fd6d98305d64ee6224ca9f9e7ba48444	02-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/shader: Get rid of the shader, prog, and shader_prog fields Unfortunately, we can't get rid of them entirely. The FS backend still needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still needs gl_shader_program for handling transform feedback. However, the VS needs neither and we can substantially reduce the amount they are used. One day we will be free from their tyranny. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
404419ee1a57c79982d93eefe4de099d61ad2eee	02-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/fs,vec4: Get rid of the sanity_param_count It doesn't exist for anything other than an assert that, as far as I can tell, isn't possible to trip. Soon, we will remove prog from the visitor entirely and this will become even more impossible to hit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ca6a436f12cb55e9415049a217229c99b02ad3b8	02-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Use nir info instead of pulling things out of [shader_]prog Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ea006c4cb5eb2d98d6bfd5a6c32fcae10b636f17	01-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Move binding table setup to codegen time. Setting up binding tables really has little to do with the actual process of turning shaders into instructions; it's more part of setting up prog_data. This commit moves it out of the visitors and with the rest of the prog_data setup stuff. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
28709e37d96d6b64753ca4dcce5fbfeb75f5b499	01-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/shader: Pull assign_common_binding_table_offsets out of backend_shader This really has nothing to do with the backend compiler and we'd like to eventually be able to set this up earlier in the compile process. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5609e0d7b41e861a3359991e8d0f2053b255fc31	30-Sep-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Get rid of the uniform_vector_size array The uniform_vector_size array was only ever used by pack_uniform_registers which no longer needs it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ea35fb0fbead2902b1ba37e7cdb1523853fabd8b	30-Sep-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Use the actual channels used in pack_uniform_registers Previously, pack_uniform_registers worked based on the size of the uniform as given to us when we initially set up the uniforms. However, we have to walk through the uniforms and figure out liveness anyway, so we might as well record the number of channels used as we go. This may also allow us to pack things tighter in a few cases. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fc3f45234b4ff9545c84fbe8ec5261604d5ab611	01-Oct-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vs: Move lazy NIR creation to codegen_vs_prog The next commit will add code to codegen_vs_prog that requires the NIR shader to be there in all cases. It doesn't hurt anything to just move it from brw_vs_emit to its only caller. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b85761d11d2abff4d45a4938b34c1c7840c97339	21-Sep-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Always use NIR GLSL IR vs. NIR shader-db results for vec4 programs on i965: total instructions in shared programs: 1499328 -> 1388354 (-7.40%) instructions in affected programs: 1245199 -> 1134225 (-8.91%) helped: 7469 HURT: 2440 GLSL IR vs. NIR shader-db results for vec4 programs on G4x: total instructions in shared programs: 1436799 -> 1325825 (-7.72%) instructions in affected programs: 1205599 -> 1094625 (-9.20%) helped: 7469 HURT: 2440 GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake: total instructions in shared programs: 1436654 -> 1325682 (-7.72%) instructions in affected programs: 1205503 -> 1094531 (-9.21%) helped: 7468 HURT: 2440 GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge: total instructions in shared programs: 2016249 -> 1787033 (-11.37%) instructions in affected programs: 1850547 -> 1621331 (-12.39%) helped: 14856 HURT: 1481 GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge: total instructions in shared programs: 1848027 -> 1648216 (-10.81%) instructions in affected programs: 1660279 -> 1460468 (-12.03%) helped: 14668 HURT: 1369 GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail: total instructions in shared programs: 1848027 -> 1648216 (-10.81%) instructions in affected programs: 1660279 -> 1460468 (-12.03%) helped: 14668 HURT: 1369 GLSL IR vs. 
NIR shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1848027 -> 1648216 (-10.81%) instructions in affected programs: 1660279 -> 1460468 (-12.03%) helped: 14668 HURT: 1369 I also ran our full suite of benchmarks on a Haswell and had the following statistically significant (according to ministat) changes: Test master-glsl master-nir diff bench_OglGeomPoint 461.556 463.006 1.450 bench_OglTerrainFlyInst 184.484 187.574 3.090 bench_OglTerrainPanInst 132.412 136.307 3.895 bench_OglTexFilterAniso 19.653 19.645 -0.008 bench_OglTexFilterTri 58.333 58.009 -0.324 bench_OglVSInstancing 65.049 65.327 0.278 bench_trexoff 69.474 69.694 0.220 bench_valley 40.708 41.125 0.417 v2 (Jason Ekstrand): - Remove more uses of NirOptions as a switch - New shader-db numbers - Added benchmark numbers Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6485880232df46c0cdded0b063b8841a7855bd32	28-Aug-2015	Samuel Iglesias Gonsalvez <siglesias@igalia.com>	i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZE Notice that Skylake needs to include a header in the sampler message so it will need some tweaks to work there. Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f2e75ac88a92ab2180de576aca298929cfce03f2	22-Sep-2015	Antia Puentes <apuentes@igalia.com>	i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed Gen6 MATH instructions can not execute in align16 mode, so swizzles or writemasking are not allowed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033 Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
10da96887c785930c2553b2d5bde91e52b8b034a	21-Sep-2015	Matt Turner <mattst88@gmail.com>	i965/vec4: Detect and delete useless MOVs. With NIR: instructions in affected programs: 111508 -> 109193 (-2.08%) helped: 507 Without NIR: instructions in affected programs: 28763 -> 28474 (-1.00%) helped: 186 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a548c75e31b4146d55133cb8c57a82117c196584	05-Sep-2015	Kristian Høgsberg Kristensen <krh@bitplanet.net>	i965: Move perf_debug code to brw_codegen_*_prog() We're trying to avoid a libdrm dependency in the core compiler, so let's move the perf_debug code one level up from the brw_*_emit() helpers to the brw_codegen_*_prog() helpers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
79f1a7ae28c37f77e08e550cd077959a2a1f8341	05-Aug-2015	Antia Puentes <apuentes@igalia.com>	i965/vec4: Fix saturation errors when coalescing registers If the register types do not match and the instruction that contains the final destination is saturated, register coalescing generated non-equivalent code. This did not happen when using IR because types usually matched, but it is visible in nir-vec4. For example, mov vgrf7:D vgrf2:D mov.sat m4:F vgrf7:F is coalesced to: mov.sat m4:D vgrf2:D The patch prevents coalescing in such scenario, unless the instruction we want to coalesce into is a MOV (without type conversion implied). In that case, the patch sets the register types to the type of the final destination. Shader-db results in HSW (only vec4 instructions shown): total instructions in shared programs: 1754415 -> 1754416 (0.00%) instructions in affected programs: 74 -> 75 (1.35%) helped: 0 HURT: 1 GAINED: 0 LOST: 0 Only one extra instruction in one of the shaders, that comes from eliminating a saturation error by preventing register coalesce. Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1037e0a84f61f4b1815093bcfd548d4b58ca106f	11-Sep-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Don't reswizzle hardware registers Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91719 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d4e29af2344c06490913efc35430f93a966061bb	11-Sep-2015	Alejandro Piñeiro <apinheiro@igalia.com>	i965/vec4: check writemask when bailing out at register coalesce opt_register_coalesce stopped checking previous instructions to coalesce with if another instruction was writing to the same destination. This can be refined to bail out only if another instruction was writing to the same channels of the same destination, as determined by the writemask. Shader DB results (taking into account only vec4): total instructions in shared programs: 1781593 -> 1734957 (-2.62%) instructions in affected programs: 1238390 -> 1191754 (-3.77%) helped: 12782 HURT: 0 GAINED: 0 LOST: 0 v2: removed some parentheses, fixed indentation, as suggested by Matt Turner v3: added brackets, for consistency, as suggested by Eduardo Lima Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
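The refinement described in the commit above comes down to a channel-overlap test. This is a rough Python sketch under stated assumptions: the function names and the string register labels are hypothetical, and only the one-bit-per-channel writemask encoding (X=1, Y=2, Z=4, W=8) is meant to mirror the driver.

```python
WRITEMASK_X, WRITEMASK_Y, WRITEMASK_Z, WRITEMASK_W = 1, 2, 4, 8


def blocks_coalescing_old(scan_dst, scan_mask, mov_dst, mov_mask):
    """Old rule: any write to the same destination stops the backward scan."""
    return scan_dst == mov_dst


def blocks_coalescing_new(scan_dst, scan_mask, mov_dst, mov_mask):
    """New rule: stop only if the write touches the same channels as well."""
    return scan_dst == mov_dst and (scan_mask & mov_mask) != 0


# An earlier instruction writes m4.x while the MOV being coalesced writes m4.y.
print(blocks_coalescing_old("m4", WRITEMASK_X, "m4", WRITEMASK_Y))  # True
print(blocks_coalescing_new("m4", WRITEMASK_X, "m4", WRITEMASK_Y))  # False
```

Writes to disjoint channels no longer end the scan, which is consistent with the commit reporting only helped shaders.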
0b91bcea98c0fe201bba89abe1ca3aee4d04c56c	12-Aug-2015	Ilia Mirkin <imirkin@alum.mit.edu>	i965: add support for textureSamples function Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> [v2: kayden-supplied code in fs_nir replacing need for logical opcode] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bd6e516fc24128d604f677a16f692d88d65a49f1	23-Jul-2015	Iago Toral Quiroga <itoral@igalia.com>	i965: Add a debug option for spilling everything in vec4 code Reviewed-by: Francisco Jerez <currojerez@riseup.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4f4b7c4711d98606270133dfd456acabfa8267a6	28-Aug-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Remove the brw_vue_prog_key base class. The legacy userclip fields are only used for the vertex shader, and at that point there's only program_string_id and the tex struct, which are common to all keys. So there's no need for a "VUE" key base class. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
014b90221ad5cf833bfdd55b0336771d209f0f1d	28-Aug-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Move legacy clip plane handling to vec4_vs_visitor. This is now only used for the vertex shader, so it makes sense to get it out of any paths run by the geometry shader. Instead of passing the gl_clip_plane array into the run() method (which is shared among all subclasses), we add it as a vec4_vs_visitor constructor parameter. This eliminates the bogus NULL parameter in the GS case. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
082b7f1876095f32578720f30fdc35771b2b3e0a	28-Aug-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Delete the brw_vue_program_key::userclip_active flag. There are two uses of this flag. The primary use is checking whether we need to emit code to convert legacy gl_ClipVertex/gl_Position clipping to clip distances. In this case, we also have to upload the clip planes as uniforms, which means setting nr_userclip_plane_consts to a positive value. Checking if it's > 0 works for detecting this case. Gen4-5 also wants to know whether we're doing clipping at all, so it can emit user clip flags. Checking if output_reg[VARYING_SLOT_CLIP_DIST0] is set to a real register suffices for this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4de86e1371b0d59a5b9a787b726be3d373024647	01-Sep-2015	Alejandro Piñeiro <apinheiro@igalia.com>	i965/vec4: fill src_reg type using the constructor type parameter The src_reg constructor that received the glsl_type was using it only to build the swizzle, but not to fill this->type as dst_reg is doing. This caused some type mismatch between movs and alu operations on the NIR path, so copy propagation optimization was not applied to remove unneeded movs if negate modifier was involved. This was first detected on minus (negate+add) operations. Shader DB results (taking into account only vec4): total instructions in shared programs: 20019 -> 19934 (-0.42%) instructions in affected programs: 2918 -> 2833 (-2.91%) helped: 79 HURT: 0 GAINED: 0 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8765f1d7ddfb00dc5b202e4e679ebe640a547d50	18-Aug-2015	Matt Turner <mattst88@gmail.com>	i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM. Noticed when debugging things that lead to the next patch. On G45 (and presumably ILK) this helps register coalescing: total instructions in shared programs: 4077373 -> 4077340 (-0.00%) instructions in affected programs: 43751 -> 43718 (-0.08%) helped: 52 HURT: 2 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34d162260f513a7eaec12611e3859bb34230cf33	08-Jul-2015	Antia Puentes <apuentes@igalia.com>	i965/vec4: Handle uniform and GRF array access on vertex programs (NIR) When the NIR-vec4 pass is enabled, handle uniform and GRF array access on ARB_vertex_program the same way it is done for vertex shaders. When the old IR-vec4 pass is used, emit_program_code() emits pull constant loads directly instead of using relative addressing, hence the call to move_uniform_array_access_to_pull_constants() is not needed and it is enough to call split_uniform_registers(). The patch also calls move_grf_array_access_to_scratch() as is done for shaders; however, I suspect this is a no-op for vertex programs and we could remove it. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
90825e3ca977057c8f3d6ad2d1aa38277cc3ff11	08-Jul-2015	Antia Puentes <apuentes@igalia.com>	i965/vec4: Enable NIR-vec4 pass on ARB_vertex_programs Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
38fc4a91cd5c04fdd5921b8776f8e203513ab517	01-Jul-2015	Iago Toral Quiroga <itoral@igalia.com>	i965/nir: Enable NIR-vec4 pass on geometry shaders Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0d43d27df742ad95a086580bae2ee08a0bc00e69	23-May-2015	Alejandro Piñeiro <apinheiro@igalia.com>	i965/vec4: Add a new dst_reg constructor accepting a brw_reg_type This is useful for the upcoming texture support in NIR->vec4 pass, as we found several cases where the brw_type is available, but not the glsl_type. Without this new constructor, the alternative would be: dst_reg reg(MRF, <reg>) reg.type = <brw_type> reg.writemask = <mask> Adding a new constructor makes code easier to read. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e839727ed2378a01d3b657bad83abd4728e8da6	22-Jul-2015	Eduardo Lima Mitev <elima@igalia.com>	i965/nir: Pass a is_scalar boolean to brw_create_nir() The upcoming introduction of NIR->vec4 pass will require that some NIR lowering passes are enabled/disabled depending on the type of shader (scalar vs. vector). With this patch we pass a 'is_scalar' variable to the process of constructing the NIR, to let an external context decide how the shader should be handled. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
47d68908f2c3ad3e9011a2cf910b04cd3300673a	16-Jun-2015	Eduardo Lima Mitev <elima@igalia.com>	i965/nir/vec4: Select between new nir_vec4 or current vec4_visitor code-paths The NIR->vec4 pass will be activated if both the following conditions are met: * INTEL_USE_NIR environment variable is defined and is positive (1 or true) * The stage is vertex shader (support for geometry shaders and ARB_vertex_program will be added later). Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f12302b89836a24255674a251f7a6902b4e9af7c	29-Jun-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Get rid of brw_vs_compile completely. After tearing it out another level or two, and just passing the key and vp directly, we can finally remove this struct. It also eliminates a pointless memcpy() of the key. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
64390967c1abc326875e495f233afec6e685db72	30-Jun-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Remove 'c'/vs_compile from vec4_vs_visitor. At this point, the brw_vs_compile structure only contains the key and gl_vertex_program pointer. We may as well pass and store them directly; it's simpler and more convenient (key-> instead of vs_compile->key...). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
13372a0ce746cde6fa6e0aa3c5130e4227f123e0	29-Jun-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Move c->last_scratch into vec4_visitor. Nothing outside of vec4_visitor uses it, so we may as well keep it internal. Commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for the vec4 backend. (The empty class will be going away soon.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8524deb8c8fc37abc2cb2717be64a533746a92f9	29-Jun-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Move total_scratch calculation into the visitor. This is more consistent with how we do it in the FS backend, and reduces a tiny bit of duplication. It'll also allow for a bit more tidying. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dc776ffb900b21421158ef8efbd675bdd47593bc	29-Jun-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Move perf_debug about register spilling into the visitor. This patch makes us only issue the performance warning about register spilling if we actually spilled registers. We also use scratch space for indirect addressing and the like. This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for the vec4 backend. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0163c99e8f6959b5d6c7a937a322127cfdf9315f	30-Jun-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Plumb log_data through so the backend_shader field gets set. Jason plumbed this through a while back in the FS backend, but apparently we were just passing NULL in the vec4 backend. This patch passes brw in as intended. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
40801295d5a3d747661abb1e2ca64d44c0e3dc05	23-Jun-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Remove the brw_context from the visitors As of this commit, nothing actually needs the brw_context. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bcaf4a3f077e3e3fbc66f264fe9124fa920ee70c	23-Jun-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4_vs: Add an explicit use_legacy_snorm_formula flag This way we can stop doing is_gles3 checks inside of the compiler. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
663f8d121d792edee5c012461bfd0b650011ff4a	20-Jun-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vs: Pass the current set of clip planes through run() and run_vs() Previously, these were pulled out of the GL context conditionally based on whether we were running ff/ARB or a GLSL program. Now, we just pass them in so that the visitor doesn't have to grab them itself. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1b0f6ffa15b25e8601d60fe1ea74e893f7d33cf5	20-Jun-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Pull calls to get_shader_time_index out of the visitor Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c7893dc3c590b86787d8118e3920debaea3f16da	19-Jun-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Use a single index per shader for shader_time. Previously, each shader took 3 shader time indices which were potentially at arbitrary points in the shader time buffer. Now, each shader gets a single index which refers to 3 consecutive locations in the buffer. This simplifies some of the logic at the cost of having a magic 3 in a few places. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6e255a3299c9ec5208cb5519b5da2edb0ce2972b	17-Apr-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Add compiler options to brw_compiler This creates the options at screen creation time and then we just copy them into the context at context creation time. We also move is_scalar to the brw_compiler structure. We also end up manually setting some values that the core would have set by default for us. Fortunately, there are only two non-zero shader compiler option defaults that we aren't overriding anyway so this isn't a big deal. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d7565b7d65f8203c20735a61b86e9158b8ec4447	16-Apr-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Remove the dependence on brw_context from the generators Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e639a6f68e701f23b977a49c45d646c164991d36	16-Apr-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Plumb compiler debug logging through a function pointer in brw_compiler v2 (Ken): Make shader_debug_log a printf-like function. v3 (Jason): Add a void * to pass the brw_context through Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0f8ec779ddff4126837a7d4216ecf1d4b97e93d2	12-Mar-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Create a shader_dispatch_mode enum to replace VS/GS fields. We used to store the GS dispatch mode in brw_gs_prog_data while separately storing the VS dispatch mode in brw_vue_prog_data::simd8. This patch introduces an enum to represent all possible dispatch modes, and stores it in brw_vue_prog_data::dispatch_mode, unifying the two. Based on a suggestion by Matt Turner. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b95ec49e57f81bdd75795dc93022533704efe509	20-May-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vs: Rework the logic for generating NIR from ARB vertex programs Whether or not to use NIR is now equivalent to brw->scalar_vs. We can simplify the logic and make it far less confusing. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
99cb4233205edcfa1a1e2967eef7bb16ff19bec4	20-May-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Rename backend_visitor to backend_shader The backend_shader class really is a representation of a shader. The fact that it inherits from ir_visitor is somewhat immaterial. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3687d752e51829b4723c9abb07ae56d2bbcda570	12-Mar-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/fs: Combine the fs_visitor constructors. For scalar GS support, we either need to add a fourth constructor which takes the GS structures, or combine the existing two and pass the shader stage. Given that they're not significantly different, I opted for the latter. v2: Remove more stuff from the .h file (Jason and Jordan). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
76c1086f2dfb37a1edf6d2df6eebbe11ccbfc50b	24-Mar-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Change header_present to header_size in backend_instruction Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3da9f708d4f1375d674fae4d6c6eb06e4c8d9613	20-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode. v2: Save some CPU cycles by doing 'return progress' rather than 'depth++' in the discard jump special case. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f2fad0dc80627e853eea558498f18a9fa769992e	19-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965: Perform basic optimizations on the BROADCAST opcode. v2: Style fixes. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f118e5d15fd9b35cf27a975a702c5fb81d3157aa	23-Apr-2015	Francisco Jerez <currojerez@riseup.net>	i965: Add typed surface access opcodes. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0775d8835ac8d1f2ab75d04f0cddbad36b6787fe	23-Apr-2015	Francisco Jerez <currojerez@riseup.net>	i965: Add untyped surface write opcode. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20915130ace4cc0f700ece2a99c0353581a156bb	26-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Add support for untyped surface message sends from GRF. This doesn't actually enable untyped surface message sends from GRF yet, the upcoming atomic counter and image intrinsic lowering code will. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
17233f9bbcbf570f0c7633c63dbd5ed88634ed60	21-Apr-2015	Jordan Justen <jordan.l.justen@intel.com>	i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS. Suggested-by: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1ac7db07b363207e8ded9259f84bbcaa084b8667	12-Mar-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Unhardcode a few more stage names and abbreviations. The stage_abbrev and stage_name fields in backend_visitor provide what we need without any additional effort. It also means we'll get the right names for compute shaders, SIMD8 geometry shaders, and both kinds of tessellation shaders. This does unfortunately change the capitalization of the stage abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't seem worth adding code to handle, though. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dffc1a0ae3a75d426f10c5d3ba021de977467929	25-Apr-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Remove unnecessary NULL check on generate_code() result. Code generation is not allowed to fail for any reason - in fact, fs_generator has no mechanism for failing. The visitor is responsible for that. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
28e9601d0e681411b60a7de8be9f401b0df77d29	16-Apr-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Add a devinfo field to backend_visitor and use it for gen checks Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89c1feb78d010bc457f5d02be84c955eebf3549f	08-Apr-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Create NIR during LinkShader() and ProgramStringNotify(). Previously, we translated into NIR and did all the optimizations and lowering as part of running fs_visitor. This meant that we did all of that work twice for fragment shaders - once for SIMD8, and again for SIMD16. We also had to redo it every time we hit a state based recompile. We now generate NIR once at link time. ARB programs don't have linking, so we instead generate it at ProgramStringNotify time. Mesa's fixed function vertex program handling doesn't bother to inform the driver about new programs at all (which is rather mean), so we generate NIR at the last minute, if it hasn't happened already. shader-db runs ~9.4% faster on my i7-5600U, with a release build. v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother using _mesa_program_enum_to_shader_stage as we already know it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bff421332661bfd0f82ab9eee9e4fec9d06ed1a1	03-Apr-2015	Jason Ekstrand <jason.ekstrand@intel.com>	i965: Check the INTEL_USE_NIR environment variable once at context creation Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
31dc63d5ca090fed3f1adcd4fd0db2f1f7aa19f7	25-Mar-2015	Kenneth Graunke <kenneth@whitecape.org>	i965/nir: Use NIR for ARB_vertex_program support on Gen8+. Everything is already in place; we simply have to take the scalar code generation path. This gives us SIMD8 VS programs, instead of SIMD4x2. v2: Rebase on the patch that drops brw->gen >= 8. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ef09cfb51e0c1cc9e3c6f370813a843a6ecaa4e2	25-Mar-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Drop unnecessary brw->gen >= 8 check from scalar VS code. brw->scalar_vs already implies that brw->gen >= 8. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e6e655ef76bb22193b31af2841cb50fda0c39461	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Define helpers to calculate the common live interval of a range of variables. These will be especially useful when we start keeping track of liveness information for each subregister. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
588859e18cb597612e56980a65a762ef069363e4	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Fix handling of multiple register reads and writes in split_virtual_grfs(). Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9304f60cbe7c348a4771a7746606730bea3ae45f	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Fix handling of multiple register reads and writes in opt_register_coalesce(). Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
516d45f78a3bbab0288c49c0f876ebdf4ad05bff	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Some more trivial swizzle clean-up. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
430c6bf70e48c08ba4dc9e00f2b88e2230793010	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Improve src_reg/dst_reg conversion constructors. This simplifies the src_reg/dst_reg conversion constructors using the swizzle utils introduced in a previous patch. It also makes them more useful by changing their semantics slightly: dst_reg(src_reg) used to set the writemask to XYZW if the src_reg swizzle was anything other than XXXX, which was almost certainly not what the caller intended if the swizzle was non-trivial. After this patch the same components that are present in the swizzle will be enabled in the resulting writemask. src_reg(dst_reg) used to set the first components of the swizzle to the enabled components of the writemask and then replicate the last enabled component to fill the swizzle, which, in cases where the writemask didn't have exactly the first n components set, would in general not be compatible with the original dst_reg. E.g.: ADD(tmp, src_reg(tmp), src_reg(1)); would not do what one would expect (add one to each of the enabled components of tmp) if tmp didn't have a writemask of the described form (e.g. YZ, YW, XZW would all fail). This pattern actually occurs in many different places in the VEC4 back-end, it's a wonder that it hasn't caused piglit failures until now. After this patch src_reg(dst_reg) will construct a swizzle with each enabled component at its natural position (e.g. Y at the second position, Z at the third, and so on). The resulting swizzle will behave like the identity when used in any instruction with the original writemask. I've manually verified that none of the callers of both conversion constructors were relying on the previous broken semantics. There are no piglit regressions on any generation. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
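The old and new conversion semantics from the commit above can be compared in a small Python model. This is a sketch, not the Mesa implementation: function names are hypothetical, swizzles are 4-element lists of channel indices (X=0 .. W=3), and writemasks use one bit per channel. The commit message does not say how disabled positions are filled in the new swizzle, so repeating the nearest enabled channel is an assumption made here.

```python
X, Y, Z, W = 0, 1, 2, 3


def enabled_channels(writemask):
    """Channel indices whose bit is set in the writemask, in XYZW order."""
    return [c for c in range(4) if writemask & (1 << c)]


def old_swizzle_for_mask(writemask):
    """Old src_reg(dst_reg): pack enabled channels first, replicate the last."""
    chans = enabled_channels(writemask)
    if not chans:
        return [X] * 4
    return (chans + [chans[-1]] * 4)[:4]


def new_swizzle_for_mask(writemask):
    """New src_reg(dst_reg): each enabled channel at its natural position.

    Disabled positions repeat the nearest enabled channel (assumed fill rule).
    """
    chans = enabled_channels(writemask)
    last = chans[0] if chans else X
    swizzle = []
    for c in range(4):
        if writemask & (1 << c):
            last = c
        swizzle.append(last)
    return swizzle


def mask_for_swizzle(swizzle):
    """New dst_reg(src_reg): enable exactly the channels the swizzle names."""
    mask = 0
    for c in swizzle:
        mask |= 1 << c
    return mask


def is_identity_under(swizzle, writemask):
    """True if every written channel reads its own component."""
    return all(swizzle[c] == c for c in enabled_channels(writemask))


YW = (1 << Y) | (1 << W)
print(old_swizzle_for_mask(YW), is_identity_under(old_swizzle_for_mask(YW), YW))
print(new_swizzle_for_mask(YW), is_identity_under(new_swizzle_for_mask(YW), YW))
```

With a YW writemask the old rule produces the swizzle YWWW, so the Y channel of the destination reads W; the new rule keeps Y and W in their own positions, so a pattern like ADD(tmp, src_reg(tmp), src_reg(1)) updates each enabled component from itself.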
62fd3353387547504966d77f3350afc9b688ef93	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Pass argument by reference to src_reg/dst_reg conversion constructors. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
23bda945f570b4f566ed39b4c1de89a957247df7	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Remove swizzle_for_size() in favour of brw_swizzle_for_size(). It could be objected that swizzle_for_size() is "faster" than brw_swizzle_for_size(). It's not measurably better in any reasonable CPU-bound benchmark on VLV according to the Finnish benchmarking system (including the SynMark2 DrvShComp shader compilation benchmark). Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9a17e4e900256b5be73d935fa5f35c98b3b0d7fe	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Simplify opt_register_coalesce() using the swizzle utils. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
05ec72d8ecdba04a81745fbc3ca0df40c7fb8828	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Simplify reswizzle() using the swizzle utils. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7b30493dc4f0b1346fe4c1fe52211f0c0d7ed229	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Simplify opt_reduce_swizzle() using the swizzle utils. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7e816c7feb8cffa878546eee363240b1b66d5c42	18-Mar-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Fix signedness of dst_reg::writemask. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b0d422cd2a99d2fd26ab11880d5d8410ebfc64b2	16-Mar-2015	Matt Turner <mattst88@gmail.com>	i965/fs: Print spills:fills and number of promoted constants. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
78df9d5e30fbca8b0795594448a3bcae05d5f5f2	05-Mar-2015	Matt Turner <mattst88@gmail.com>	i965/vec4: Handle saturate in dump_instruction(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
63d6d09a3b3790c5ec00f2cbc06f58c82ae40b0c	03-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Don't attempt to reduce swizzles of send from GRF instructions. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eb47d0efd39d73d4388389d6c0ebe458160f79fa	05-Feb-2015	Matt Turner <mattst88@gmail.com>	i965: Optimize multiplication by -1 into a negated MOV. instructions in affected programs: 968 -> 942 (-2.69%) helped: 4 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
55de910f909ac668ec7ea8fd94ec4f235b0d0335	11-Feb-2015	Eric Anholt <eric@anholt.net>	i965: Quiet another compiler warning about uninitialized values. The compiler can't tell that we're always going to hit the first if block on the first time through the loop. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b40bcd24e0c86fb02c226261c1fe46fb362be217	04-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Don't set any dependency control bits for F32TO16 on Gen8. It's expanded to several instructions. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
530445330b403d835a4027b41388b5eea8c2e1ab	03-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Init mlen for several send from GRF instructions. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
de666fc102b805707c7033b203c5b76ccbbcef8d	05-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Fix the scheduler to take into account reads and writes of multiple registers. v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8ad486077e122c19b603750e19dd678bb7793d5b	05-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Make vec4_visitor::implied_mrf_writes() return zero for sends from GRF. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
388b136e677e30249e062145b488c2d938c1ef17	05-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Implement equals() method for dst_reg too. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dfe957c02b753dbb5b372e768a5677f577daf9ef	06-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965: Move up fs_inst::flag_subreg to backend_instruction. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
447879eb88b8df41ad32cf4406cc636b112b72d9	10-Feb-2015	Francisco Jerez <currojerez@riseup.net>	i965: Factor out virtual GRF allocation to a separate object. Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
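To illustrate the layering argument in the commit above, here is a hedged Python sketch of virtual GRF book-keeping pulled out into its own object. The class and attribute names are hypothetical and the commit message does not describe the real interface; the point is only that code needing a fresh virtual register can depend on a small allocator rather than on a particular visitor class.

```python
class VgrfAllocator:
    """Toy stand-alone allocator for virtual GRFs (illustrative only)."""

    def __init__(self):
        self.sizes = []      # size in registers of each virtual GRF
        self.offsets = []    # running offset of each virtual GRF
        self.total_size = 0  # sum of all allocated sizes

    def allocate(self, size):
        """Reserve a virtual GRF of the given size and return its number."""
        if size < 1:
            raise ValueError("a virtual GRF needs at least one register")
        index = len(self.sizes)
        self.sizes.append(size)
        self.offsets.append(self.total_size)
        self.total_size += size
        return index


# Any pass can share one allocator without knowing which visitor built the IR.
alloc = VgrfAllocator()
a = alloc.allocate(1)
b = alloc.allocate(2)
c = alloc.allocate(1)
print(a, b, c, alloc.offsets, alloc.total_size)
```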
8030e269e911c4f90a44d9a77eb342dd2657d229	03-Dec-2014	Ben Widawsky <benjamin.widawsky@intel.com>	i965/vec4: Correct MUL destination hazard As it turns out, we were over-thinking the cause of the hang on Cherryview. It's simply errata for Cherryview. commit 88fea85f09e2252035bec66ab26c375b45b000f5 Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Fri Nov 21 10:47:41 2014 -0800 i965/vec4/gen8: Handle the MUL dest hazard exception This is an explanation to why we never saw the hang on BDW. NOTE: The problem the original patch was trying to fix does still exist. It will have to be fixed at some point. v2: Modify commit message, s/CHV/BDW Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
94e7b59a75fc2ecc51a74196f6cd198546603b85	05-Jan-2015	Matt Turner <mattst88@gmail.com>	i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0. total instructions in shared programs: 5952059 -> 5951603 (-0.01%) instructions in affected programs: 138812 -> 138356 (-0.33%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
41d9f232b6a7f53086b9c428cca30e45905abd48	12-Jan-2015	Matt Turner <mattst88@gmail.com>	i965/vec4: Make sure that imm writes are to registers in the same file. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87887 /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3167a80bb1119616b70fbbcf2661d3fb511a6034	13-Jan-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Fix "vertex" vs. "geometry" and "VS" vs. "GS" in debug output. We were happily printing "Native code for unnamed vertex shader" and "VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output, as well as the KHR_debug output used by shader-db. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
68ed14d6adcaf4b91216fc1c53792e88d1fd024d	13-Jan-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Pass a shader stage abbreviation to fs_generator(). A lot of messages hardcoded the string "FS", which is confusing on Broadwell, where we use this code for VS support as well. shader-db particularly got confused, as it reported two "FS SIMD8" shaders, and no vertex shaders at all. Craziness ensued. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0b98b2bf535d6e6b6b02c0d47ea03f98adf42f15	01-Jan-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+. Gen7.5+ platforms that support the "Shader Channel Select" feature leave key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE is GL_ALPHA (which is really uncommon). So, the precompile should leave them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well. We didn't notice this because prog->ShadowSamplers is not set correctly. The next patch will fix that problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
408e298942ffb03c00e05dce2569c291df6bec49	01-Jan-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Fix INTEL_DEBUG=optimizer with VF types. Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9b8bd67768769b685c25e1276e053505aede5f93	01-Jan-2015	Kenneth Graunke <kenneth@whitecape.org>	i965: Show opt_vector_float() and later passes in INTEL_DEBUG=optimizer. In order to support calling opt_vector_float() inside a condition, this patch makes OPT() a statement expression: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html We've used that elsewhere already. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
798c094e6266bf53b332f332e82a90c338c49915	21-Dec-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Do separate copy followed by constant propagation after opt_vector_float(). total instructions in shared programs: 5877012 -> 5876617 (-0.01%) instructions in affected programs: 33140 -> 32745 (-1.19%) From before the commit that allows VF constant propagation (which hurt some programs) to here, the results are: total instructions in shared programs: 5877951 -> 5876617 (-0.02%) instructions in affected programs: 123444 -> 122110 (-1.08%) with no programs hurt. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bbdd3198a5f778ba55c037e4af86d88b06ca4e95	20-Dec-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float(). total instructions in shared programs: 5869005 -> 5868220 (-0.01%) instructions in affected programs: 70208 -> 69423 (-1.12%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
44573458bdc52acc304fb75d6df502312b8e149c	20-Dec-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Add pass to gather constants into a vector-float MOV. Currently only handles consecutive instructions with the same destination that collectively write all channels. total instructions in shared programs: 5879798 -> 5869011 (-0.18%) instructions in affected programs: 465236 -> 454449 (-2.32%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7bc6e455e231076bfac6c678c375ea4aca94ebf0	21-Dec-2014	Matt Turner <mattst88@gmail.com>	i965: Add support for saturating immediates. I don't feel great about assert(!"unimplemented: ...") but these cases do only seem possible under some currently impossible circumstances. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3978585bccf69ff8f607cad0de025ea91c418587	20-Dec-2014	Matt Turner <mattst88@gmail.com>	i965: Add fs_reg/src_reg constructors that take vf[4]. Sometimes it's easier to generate 4x values into an array, and the memcpy is 1 instruction, rather than 11 to piece 4 arguments together. I'd forgotten to remove the prototype from fs_reg from a previous patch, so it's already there for us here. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8517e665bc4c378e8e7523827090fd1b06abaecd	12-Dec-2014	Andres Gomez <agomez@igalia.com>	i965/brw_reg: struct constructor now needs explicit negate and abs values. We were assuming, when constructing a new brw_reg struct, that the negate and abs register modifiers would not be present by default in the new register. Now, we force explicitly setting these values when constructing a new register. This will avoid problems like forgetting to properly set them when we are using a previous register to generate this new register, as it was happening in the dFdx and dFdy generation functions. Fixes piglit test shaders/glsl-deriv-varyings Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991 Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ee5fb8d1ba7f50ed94e1a34fa0f6e15a0588145e	21-Oct-2014	Kristian Høgsberg <krh@bitplanet.net>	i965: Generate vs code using scalar backend for BDW+ With everything in place, we can now use the scalar backend compiler for vertex shaders on BDW+. We make scalar vertex shaders the default on BDW+ but add a new vec4vs debug option to force the vec4 backend. No piglit regressions. Performance impact is minimal, I see a ~1.5 improvement on the T-Rex GLBenchmark case, but in general it's in the noise. Some of our internal synthetic, vs bounded benchmarks show great improvement, 20%-40% in some cases, but real-world cases are mostly unaffected. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
bf2307937995212895375d1e258d50207da3d24e	25-Nov-2014	Kristian Høgsberg <krh@bitplanet.net>	i965: Rename brw_vec4_prog_data/key to brw_vue_prog_data/key These structs aren't vec4 specific, they are shared by shader stages operating on Vertex URB Entries (VUEs). VUEs are the data structures in the URB that hold vertex data between the pipeline geometry stages. Using vue in the name instead of vec4 makes a lot more sense, especially when we add scalar vertex shader support. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0d3cc01b0b092271938ce2cf2b77d27dc385e4d8	24-Oct-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Allow CSE on uniform-vec4 expansion MOVs. Three source instructions cannot directly source a packed vec4 (<0,4,1> regioning) like vec4 uniforms, so we emit a MOV that expands the vec4 to both halves of a register. If these uniform values are used by multiple three-source instructions, we'll emit multiple expansion moves, which we cannot combine in CSE (because CSE emits moves itself). So emit a virtual instruction that we can CSE. Sometimes we demote a uniform to a pull constant after emitting an expansion move for it. In that case, recognize in opt_algebraic that if the .file of the new instruction is GRF then it's just a real move that we can copy propagate and such. total instructions in shared programs: 5822418 -> 5812335 (-0.17%) instructions in affected programs: 351841 -> 341758 (-2.87%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
afd605f3461462ba1b9f522b079ff5a03e7ab55c	01-Dec-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Make vertex color clamp handling code VS specific. Vertex color clamping only applies to gl_[Secondary]{Front,Back}Color, which are compatibility-only built-in varyings. We only support GS in core profile, so they can't exist in geometry shaders. We can drop several dirty bits from the GS program key - they're unnecessary for a core profile implementation. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5df88c2096281f416b2738debac1c4c329e29673	03-Nov-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Rewrite dead code elimination to use live in/out. Improves 359 shaders by >=10% 114 shaders by >=20% 91 shaders by >=30% 82 shaders by >=40% 22 shaders by >=50% 4 shaders by >=60% 2 shaders by >=80% total instructions in shared programs: 5845346 -> 5822422 (-0.39%) instructions in affected programs: 364979 -> 342055 (-6.28%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e14c7c7faff3c204a5eefc1f2ea487d4730b8382	10-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Add VEC4_OPCODE_PACK_4_BYTES. Will be used by emit_pack_{s,u}norm_4x8(). /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cb0ba848d4176c1ed2c4542fd5875867f460fc3b	09-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Add vector float immediate infrastructure. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
88fea85f09e2252035bec66ab26c375b45b000f5	21-Nov-2014	Ben Widawsky <benjamin.widawsky@intel.com>	i965/vec4/gen8: Handle the MUL dest hazard exception Fix one of the few cases where we can't reliably touch the destination hazard bits. I am explicitly doing this patch individually so it is easy to backport. I was tempted to do this patch before the previous patch which reorganized the code, but I believe even doing that first, this is still easy to backport. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
156f565f9eb36dad3cd959952724bc54f9ff21ea	21-Nov-2014	Ben Widawsky <benjamin.widawsky@intel.com>	i965/vec4: Extract depctrl hazards Move this to a separate function so that we can begin to add other little caveats without making too big a mess. NOTE: There is some desire to improve this function eventually, but we need to fix a bug first. v2: Use const for the inst for the hazard check (Matt) Invert safe logic to get rid of the double negative (Matt) Add PRM reference for predicates (Matt) Add note about empirical evidence for math (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7d560a3861ff30aa9d8ec872cf9cd7d72a980eb2	21-Oct-2014	Ian Romanick <ian.d.romanick@intel.com>	i965: Silence unused parameter warning in brw_dump_ir Just remove the parameter. Silences: brw_program.c: In function 'brw_dump_ir': brw_program.c:566:33: warning: unused parameter 'brw' [-Wunused-parameter] Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b52126b44f40643aa2c0986c1d51330f4e4130b5	27-Sep-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Optimize sqrt+inv into rsq. Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b In most cases the sqrt's result is still used, so the improvement here is that we've broken a dependency between these instructions. Leads to 80 fewer INV instructions and 80 more RSQ. Occasionally the sqrt's result is no longer used, leading to: instructions in affected programs: 5005 -> 4949 (-1.12%) Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
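The sqrt+rcp to rsq rewrite described in the entry above can be illustrated on a toy IR. This is only a sketch under stated assumptions: `Op`, `Inst`, and `opt_sqrt_rcp_to_rsq` are hypothetical names invented for this example, registers are plain strings, and the code does not reproduce the actual logic in brw_vec4.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not the i965 IR): one destination, one source.
enum class Op { SQRT, RCP, RSQ };

struct Inst {
    Op op;
    std::string dst;
    std::string src;
};

// Rewrite "rcp c, a" into "rsq c, b" when the closest earlier write to `a`
// is "sqrt a, b" and `b` still holds the value the sqrt consumed.
// Returns the number of instructions rewritten.
inline int opt_sqrt_rcp_to_rsq(std::vector<Inst> &insts)
{
    int progress = 0;
    for (std::size_t i = 0; i < insts.size(); i++) {
        if (insts[i].op != Op::RCP)
            continue;

        // Destinations written between the candidate definition and the RCP.
        std::vector<std::string> written_between;

        for (std::size_t j = i; j-- > 0;) {
            const Inst &def = insts[j];
            if (def.dst == insts[i].src) {
                // The sqrt source must not be clobbered by the sqrt itself
                // or by anything between the sqrt and the rcp.
                const bool src_intact =
                    def.src != def.dst &&
                    std::find(written_between.begin(), written_between.end(),
                              def.src) == written_between.end();
                if (def.op == Op::SQRT && src_intact) {
                    insts[i].op = Op::RSQ;
                    insts[i].src = def.src;
                    progress++;
                }
                break;  // only the closest definition matters
            }
            written_between.push_back(def.dst);
        }
    }
    return progress;
}
```

The sqrt is left in place: as the commit message notes, its result is usually still used, so the gain is the broken dependency between the two instructions, and a separate dead code pass can drop the sqrt when nothing else reads it.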
189ac077644c4ef2c6c15080b6d094410c74abdc	27-Sep-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Call opt_algebraic after opt_cse. The next patch adds an algebraic optimization for the pattern sqrt a, b rcp c, a and turns it into sqrt a, b rsq c, b but many vertex shaders do a = sqrt(b); var1 /= a; var2 /= a; which generates sqrt a, b rcp c, a rcp d, a If we apply the algebraic optimization before CSE, we'll end up with sqrt a, b rsq c, b rcp d, a Applying CSE combines the RCP instructions, preventing this from happening. No shader-db changes. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
72bb3f81c621931e42759148bc8bddc511266dd0	02-Sep-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Don't iterate between blocks with inst->next/prev. The register coalescing portion of this patch hurts three shaders in Guacamelee by one instruction each, but examining the diff makes me believe that what we were generating was (perhaps harmlessly) incorrect. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
90bfeb22444df6ce779251522e47bf169e130f8e	01-Sep-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Don't use instruction list after calculating the cfg. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a4fb8897a2bd00eefa8a503ec17d45e791bced91	01-Sep-2014	Matt Turner <mattst88@gmail.com>	i965: Remove now unneeded calls to calculate_cfg(). Now that nothing invalidates the CFG, we can calculate_cfg() immediately after emit_fb_writes()/emit_thread_end() and never again. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
072ea414d04f1b9a7bf06a00b9011e8ad521c878	01-Sep-2014	Matt Turner <mattst88@gmail.com>	i965: Remove cfg-invalidating parameter from invalidate_live_intervals. Everything has been converted to preserve the CFG. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
269b6e24d6ec61d8d8d0c5d1b3d1bfa4f4a55f5f	25-Aug-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Preserve CFG in spill_reg(). Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b0b64c85e4a0dafbb46405e4b3c17be24b63347f	25-Aug-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Preserve the CFG in a few more places. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c66165ab2b15047792808433b788632a4b9df287	01-Aug-2014	Iago Toral Quiroga <itoral@igalia.com>	i965/gen6/gs: Fix binding table clash between TF surfaces and textures. For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the binding table for transform feedback surfaces. However, vec4_visitor will set up the binding table so that textures use the same space in the binding table. This is done when calling assign_common_binding_table_offsets(0) as part of its run() method. To fix this clash we add a virtual method to the vec4_visitor hierarchy to assign the binding table offsets, so that we can change this behavior specifically for gen6 geometry shaders by mapping textures right after the first BRW_MAX_SOL_BINDINGS entries. Also, when there is no user-provided geometry shader, we only need to upload the binding table if we have transform feedback, however, in the case of a user-provided geometry shader, we can't only look into transform feedback to make that decision. This fixes multiple piglit tests for textureSize() and texelFetch() when these functions are called from a geometry shader in gen6, like these: bin/textureSize gs sampler2D -fbo -auto bin/texelFetch gs usampler2D -fbo -auto Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2c85132e511bbef9a0965c69848981b1bffb5bad	09-Jul-2014	Iago Toral Quiroga <itoral@igalia.com>	i965/gen6/gs: Implement GS_OPCODE_URB_WRITE_ALLOCATE. Gen6 geometry shaders need to allocate URB handles for each new vertex they emit after the first (the URB handle for the first vertex is obtained via the FF_SYNC message). This opcode adds the URB allocation mechanism to regular URB writes. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d0bdd4ce983ddd52f9f4b70dced4e471c60a130c	09-Jul-2014	Iago Toral Quiroga <itoral@igalia.com>	i965/gen6/gs: Implement GS_OPCODE_FF_SYNC. This implements the FF_SYNC message required in gen6 geometry shaders to get the initial URB handle. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
667f758788f0796d9be16f0f361022d447f622f5	09-Sep-2014	Chris Forbes <chrisf@ijw.co.nz>	i965/vec4: slightly improve insn dumping with no srcs Previously, we would get a trailing ', ' which looked strange. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6b6145204dd4a1112f6e1fe10162636141495b79	11-Sep-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Separate gl_InstanceID and gl_VertexID uploading. We always uploaded them together, mostly out of laziness - both required an additional vertex element. However, gl_VertexID now also requires an additional vertex buffer for storing gl_BaseVertex; for non-indirect draws this also means uploading (a small amount of) data. This is extra overhead we don't need if the shader only uses gl_InstanceID. In particular, our clear shaders currently use gl_InstanceID for doing layered clears, but don't need gl_VertexID. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "10.3" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
87472ae58cf2a5c812630f4eabd485931d243e0c	05-Sep-2014	Matt Turner <mattst88@gmail.com>	i965/fs: Brown bag fix. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e8df6a6b32aae7695ce010f18588c51cb7d18978	31-Aug-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Add ability to reswizzle arbitrary swizzles. Before commit 04895f5c we would only reswizzle dot product instructions (since they wrote the same value into all channels, and we didn't have to think about anything else). That commit extended reswizzling to cases when the swizzle was single valued -- i.e., writing the same result into all channels. But allowing reswizzling of arbitrary things is actually really easy and is even less code. (Why didn't we do this in the first place?!) total instructions in shared programs: 4266079 -> 4261000 (-0.12%) instructions in affected programs: 351933 -> 346854 (-1.44%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1ee1d8ab468cafd25cfcc513319f3f046492947f	31-Aug-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Reswizzle sources when necessary. Despite the comment above the function claiming otherwise, the function did not reswizzle sources, which would lead to bad code generation since commit 04895f5c, which began claiming we could do such swizzling when we could not. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f92fbd554f2e9e702a2bd650c9b2571a3f4f1ab8	02-Sep-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Move curb_read_length/total_scratch to brw_stage_prog_data. All shader stages have these fields, so it makes sense to store them in the common base structure, rather than duplicating them in each. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1c573c9adbb8bb95bc10f6ade76a430684918160	28-Aug-2014	Jason Ekstrand <jason.ekstrand@intel.com>	i965/vec4: Don't segfault when debug-logging a null program Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20a849b4aa63c7fce96b04de674a4c70f054ed9c	13-Jul-2014	Matt Turner <mattst88@gmail.com>	i965: Use basic-block aware insertion/removal functions. To avoid invalidating and recreating the control flow graph. Also stop invalidating the CFG in places we didn't add or remove an instruction. cfg calculations: 202951 -> 80307 (-60.43%) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
04895f5c601b240df547739da786b7c2b65bdd1e	15-Aug-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Allow reswizzling writemasks when swizzle is single-valued. total instructions in shared programs: 4288033 -> 4266151 (-0.51%) instructions in affected programs: 930915 -> 909033 (-2.35%) /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9a071e3339afcf6fd937ae31121fa3b3face3bfe	18-Aug-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Add a pass to reduce swizzles. total instructions in shared programs: 4344280 -> 4288033 (-1.29%) instructions in affected programs: 397468 -> 341221 (-14.15%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a3d0ccb037082f3aa66bd558dfbe89f63a6eedd3	12-Jul-2014	Matt Turner <mattst88@gmail.com>	i965: Pass a cfg pointer to generate_{code,assembly}. The loop over all instructions is now two-fold, over all of the blocks and all of the instructions in each block. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
596990d91e2a4c4a3a303c6c2da623bf1840771b	12-Jul-2014	Matt Turner <mattst88@gmail.com>	i965: Add and use foreach_block macro. Use this as an opportunity to rename 'block_num' to 'num'. block->num is clear, and block->block_num has always been redundant. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
972e87ca30b4c4b7f6269e5f9fe8c5cb6356f744	14-Aug-2014	Pekka Paalanen <pekka.paalanen@collabora.co.uk>	i965: fix compiler error in union initializer gcc 4.6.3 chokes with the following error: brw_vec4.cpp: In member function 'int brw::vec4_visitor::setup_uniforms(int)': brw_vec4.cpp:1496:37: error: expected primary-expression before '.' token Apparently C++ does not do named initializers for unions, except maybe as a gcc extension, which is not present here. As .f is the first element of the union, just drop it. Fixes the build error. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2c50212b14da27de4e3da62488ae4e35c069d84e	11-Aug-2014	Neil Roberts <neil@linux.intel.com>	i965: Store uniform constant values in a gl_constant_value instead of float The brw_stage_prog_data struct previously contained an array of float pointers to the values of parameters. These were then copied into a batch buffer to upload the values using a regular assignment. However the float values were also being overloaded to store integer values for integer uniforms. This can break if x87 floating-point registers are used to do the assignment because the fst instruction tries to fix up invalid float values. If an integer constant happened to look like an invalid float value then it would get altered when it was copied into the batch buffer. This patch changes the pointers to be gl_constant_value instead so that the assignment should end up copying without any alteration. This also makes it more obvious that the values being stored here are overloaded for multiple types. There are some static asserts where the values are uploaded to ensure that the size of gl_constant_value is the same as a float. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
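The idea behind the gl_constant_value change above can be sketched as follows. This is a hedged illustration, not Mesa code: `constant_value` and `upload_params` are hypothetical stand-ins, and the union only mirrors the commit's description of a value overloaded for several types. Copying the union as a whole copies its object representation, so an integer whose bits happen to look like an invalid float is not routed through a floating-point load/store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a value that may hold a float or an integer.
union constant_value {
    float f;
    std::int32_t i;
    std::uint32_t u;
};

// The upload path relies on each value occupying exactly one float-sized slot.
static_assert(sizeof(constant_value) == sizeof(float),
              "constant_value must be the same size as a float");

// Copy `count` parameter values into a (hypothetical) batch buffer.
// `params` is an array of pointers to values, as the commit message describes.
inline void upload_params(const constant_value *const *params,
                          std::size_t count,
                          constant_value *batch)
{
    for (std::size_t n = 0; n < count; n++) {
        // Whole-union assignment: a bitwise copy, never a float conversion.
        batch[n] = *params[n];
    }
}
```

For example, storing `0x7fa00000` (a signaling-NaN bit pattern when viewed as a float) through the `u` member and uploading it leaves `batch[0].u` unchanged, which is the property the commit wants to guarantee.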
f17bfc9ba954608c58fd0560f255e40eef7e7cea	11-Aug-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Never use the Gen8 code generators. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
074d472398b3cc7f32fe5c0cc742853cf66fabed	30-Jun-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Switch to the EU emit layer for code generation on Broadwell. Everything should be in place to unify code generation between Gen4-7 and Gen8+. We should be able to drop the Gen8 generators at this point. However, leave them hooked up for a brief moment, for testing and comparison purposes. Set GEN8=1 to use the old Gen8+ code generator paths. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
36a4a6bbdca0c30e16d56e6b406ea7c94831048f	22-Jul-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Port INTEL_DEBUG=optimizer to the vec4 backend. Largely via copy and paste. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3e9105f7eefae97c928034662f67019973b9e483	12-Jul-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Use foreach_inst_in_block a couple more places. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1761671b0627ce8e1c0eae721e1fca5c2d04690e	12-Jul-2014	Matt Turner <mattst88@gmail.com>	i965: Replace cfg instances with calls to calculate_cfg(). Avoids regenerating it unnecessarily. Every program in shader-db improved, none by an amount less than a 1/3 reduction. One Dota2 shader decreased from 62 -> 24. cfg calculations: 429492 -> 193197 (-55.02%) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1854ead64ca465ca03e8e5369cd1749bc92c315a	06-Jul-2014	Chris Forbes <chrisf@ijw.co.nz>	i965: Avoid crashing while dumping vec4 insn operands We'd otherwise go looking into virtual_grf_sizes for things that aren't in there at all. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3c8dc48ad1d4061a2a1d0b9ea3126350b98274f0	06-Mar-2013	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Add basic common subexpression elimination. [mattst88]: Modified to perform CSE on instructions with the same writemask. Offered no improvement before. total instructions in shared programs: 1995633 -> 1995185 (-0.02%) instructions in affected programs: 14410 -> 13962 (-3.11%) Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34ef6a7651d6651e0bca77c4d4b890af582ad360	30-Jun-2014	Matt Turner <mattst88@gmail.com>	i965: Move is_zero/one/null/accumulator into backend_reg. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
53992a102ffddf2e0fad401252cfc1c034d022ad	30-Jun-2014	Matt Turner <mattst88@gmail.com>	i965: Use immediate storage in brw_reg for visitor regs. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
489ec685542590c7412db81623952c1aa75d946f	19-May-2014	Eric Anholt <eric@anholt.net>	i965: Update a ton of comments about constant buffers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c0f1929dd23bbc558e9eef0f8fd40e10dfef3c21	19-May-2014	Eric Anholt <eric@anholt.net>	i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data. I wanted to access this value from stage-generic code, so stop storing it under two different names. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3d826729dabab53896cdbb1f453c76fab1c7e696	29-Jun-2014	Matt Turner <mattst88@gmail.com>	i965: Use unreachable() instead of unconditional assert(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
266109736a9a69c3fdbe49fe1665a7a63c5cc122	25-Jun-2014	Matt Turner <mattst88@gmail.com>	i965: Use typed foreach_in_list_safe instead of foreach_list_safe. Acked-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c5030ac0ac15d3c91c4352789f94281da9a9dcad	25-Jun-2014	Matt Turner <mattst88@gmail.com>	i965: Use typed foreach_in_list instead of foreach_list. Acked-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
46659d46a8c2f7bbc8deb472faff2dccbde92d29	24-Jun-2014	Matt Turner <mattst88@gmail.com>	i965: Make can_do_source_mods() a member of the instruction classes. Pretty nonsensical to have it as a method of the visitor just for access to brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d0575d98fc595dcc17706dc73d1eb461027ca17a	14-Jun-2014	Kenneth Graunke <kenneth@whitecape.org>	i965/vec4: Fix dead code elimination for VGRFs of size > 1. When faced with code such as: mov vgrf31.0:UD, 960D mov vgrf31.1:UD, vgrf30.xxxx:UD The dead code eliminator didn't consider reg_offsets, so it decided that the second instruction was writing to the same register as the first one, and eliminated the first one. But they're actually different registers. This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above code, vgrf31.0 represents the offset into the shader_time buffer where the data should be written, and vgrf31.1 represents the actual time data. With a completely undefined offset, results were...unexpected. I think this is probably one of the few cases (maybe only case) where we generate multiple MOVs to a large VGRF. Normally, we just use them as texturing results; the other SEND-from-GRF uses a size 1 VGRF. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
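The reg_offset bug described in the entry above can be reproduced on a toy model. The sketch below is an assumption-laden illustration: `Reg`, `Mov`, and `find_overwritten_writes` are hypothetical names, the analysis only looks at straight-line code, and it is not the dead code eliminator from brw_vec4.cpp. The `use_reg_offset` flag switches between the buggy comparison (VGRF number only) and the fixed one (VGRF number plus offset).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register reference: virtual GRF number plus offset within it.
struct Reg {
    int nr;
    int reg_offset;
};

// Hypothetical MOV: writes `dst`; reads `src` unless the source is an immediate.
struct Mov {
    Reg dst;
    bool has_src;
    Reg src;
};

inline bool same_reg(const Reg &a, const Reg &b, bool use_reg_offset)
{
    if (a.nr != b.nr)
        return false;
    return !use_reg_offset || a.reg_offset == b.reg_offset;
}

// Mark a write as dead if a later instruction overwrites the same register
// before anything reads it. Values still live at the end are kept.
inline std::vector<bool> find_overwritten_writes(const std::vector<Mov> &insts,
                                                 bool use_reg_offset)
{
    std::vector<bool> dead(insts.size(), false);
    for (std::size_t i = 0; i < insts.size(); i++) {
        for (std::size_t j = i + 1; j < insts.size(); j++) {
            if (insts[j].has_src &&
                same_reg(insts[j].src, insts[i].dst, use_reg_offset))
                break;  // value is read, so the write is live
            if (same_reg(insts[j].dst, insts[i].dst, use_reg_offset)) {
                dead[i] = true;  // overwritten before any read
                break;
            }
        }
    }
    return dead;
}
```

Fed the two MOVs from the commit message (a write to vgrf31.0 followed by a write to vgrf31.1), the number-only comparison marks the first MOV dead, while the offset-aware comparison keeps both.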
7b9cf797903a5ea70072a28c0486d3e99ee60645	06-Mar-2013	Kenneth Graunke <kenneth@whitecape.org>	i965: Make src_reg::equals() take a constant reference, not a pointer. This is more typical C++ style. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
56d6dcf4f771d57d2759b2a5c5006f24444c696f	29-May-2014	Matt Turner <mattst88@gmail.com>	i965: Give dump_instruction() a FILE* argument. Use function overloading rather than default arguments, since gdb doesn't know about default arguments. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
306ed81b9363721058c568244f9860c5c8c819f4	04-Apr-2014	Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>	i965: Add writes_accumulator flag Our hardware has an "accumulator" register, which can be used to store intermediate results across multiple instructions. Many instructions can implicitly write a value to the accumulator in addition to their normal destination register. This is enabled by the "AccWrEn" flag. This patch introduces a new flag, inst->writes_accumulator, which allows us to express the AccWrEn notion in the IR. It also creates an ALU2_ACC macro to easily define emitters for instructions that implicitly write the accumulator. Previously, we only supported implicit accumulator writes from the ADDC, SUBB, and MACH instructions. We always enabled them on those instructions, and left them disabled for other instructions. To take advantage of the MAC (multiply-accumulate) instruction, we need to be able to set AccWrEn on other types of instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
30c35d1dcb2fde19b1c968751fda5151b795d257	09-Apr-2014	Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>	i965: Add is_accumulator() function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
602510395a96a1f6ca29189e4f5cfb3f07f21d23	13-Feb-2014	Mike Stroyan <mike@LunarG.com>	i965: Avoid dependency hints on math opcodes Putting NoDDClr and NoDDChk dependency control on instruction sequences that include math opcodes can cause corruption of channels. Treat math opcodes like send opcodes and suppress dependency hinting. Signed-off-by: Mike Stroyan <mike@LunarG.com> Tested-by: Tony Bertapelli <anthony.p.bertapelli@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
596737ee91cc199a8edff5dc440736471e28f297	24-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Let DCE eliminate dead writes in other basic blocks. We previously stopped searching for unread writes after encountering control flow, but we can instead just search backwards until we hit control flow. instructions in affected programs: 22854 -> 22194 (-2.89%) /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20dee82a75ac7415fba0b3540a1f99d60b2325db	01-Apr-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Consider sources of non-GRF-dst instructions for dead channels. Previously we'd ignore the sources of instructions with non-GRF destinations when calculating the dead channels. This would lead to us incorrectly removing the first instruction in this sequence: mov vgrf11, ... cmp.ne.f0 null, vgrf11, 1.0 mov vgrf11, ... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76616 /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e14cc504f307a7fa88c8b6757df53026aaa39b08	02-Apr-2014	Tapani Pälli <tapani.palli@intel.com>	i965/vec4: do not trim dead channels on gen6 for math Do not set a writemask on Gen6 for math instructions, those are executed using align1 mode that does not support a destination mask. v2: cleanups, better comment (Matt) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76883 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3a8bd9724196075da76ddcb50eff4867c5a37398	29-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Don't trim writemasks of texture instructions. It was my understanding that the writemask works in SIMD4x2 mode for texturing instructions and doesn't require a message header. Some bit of this logic must be wrong, so disable it until it's understood. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76617 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
764e25d79dad3096274ab2df04f5aa3ffb232119	19-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Eliminate dead writes to the flag register. For each write, search previous instructions for unread writes to the flag register and remove them. Note that this will not eliminate the last unread write. total instructions in shared programs: 788074 -> 788004 (-0.01%) instructions in affected programs: 4930 -> 4860 (-1.42%) Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9cd51bb0c4608258199c69bc7738e72f055799d2	11-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Eliminate writes that are never read. With an awful O(n^2) algorithm that searches previous instructions for dead writes. total instructions in shared programs: 805582 -> 788074 (-2.17%) instructions in affected programs: 144561 -> 127053 (-12.11%) Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1b8f143a2302739de90cb643d732e12b55d4e4eb	12-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Factor code out of DCE into a separate function. Will be reused in the next commit. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9630ba6c6e754b438cf67c7d76ec1c99488df3ba	11-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Let dead code eliminate trim dead channels. That is, modify mad dst, a, b, c to be mad dst.xyz, a, b, c if dst.w is never read. total instructions in shared programs: 811869 -> 805582 (-0.77%) instructions in affected programs: 168287 -> 162000 (-3.74%) Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
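The channel trimming described in the entry above (turning `mad dst, a, b, c` into `mad dst.xyz, a, b, c` when `dst.w` is never read) amounts to a bitmask intersection. The following is a minimal sketch, not the pass itself: the `MASK_*` constants and `trim_writemask()` are hypothetical names, and the set of channels that are read later is assumed to come from a liveness analysis that is not shown.

```cpp
// One bit per vec4 channel (illustrative values for this sketch).
constexpr unsigned MASK_X = 1u << 0;
constexpr unsigned MASK_Y = 1u << 1;
constexpr unsigned MASK_Z = 1u << 2;
constexpr unsigned MASK_W = 1u << 3;
constexpr unsigned MASK_XYZW = MASK_X | MASK_Y | MASK_Z | MASK_W;

// Keep only the channels of the destination writemask that some later
// instruction actually reads. If the result is 0, the whole write is
// dead and the instruction is a candidate for removal instead.
constexpr unsigned trim_writemask(unsigned writemask, unsigned channels_read)
{
   return writemask & channels_read;
}
```

For a full-width write whose `w` channel is never read, `trim_writemask(MASK_XYZW, MASK_X | MASK_Y | MASK_Z)` yields the `.xyz` mask.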
dc0f5099fa3cb564c25eb892fde93cacd29df8f1	11-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Track live ranges per-channel, not per vgrf. Will be squashed with the next patch. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89ccd11eebeee884d581e831b61368ac97057b43	11-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Don't dead code eliminate instructions writing the flag. A future patch adds support for removing dead writes to the flag register. This patch simplifies the logic until then. total instructions in shared programs: 811813 -> 811869 (0.01%) instructions in affected programs: 3378 -> 3434 (1.66%) Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3a12f50f9ca7f03f470ee053b9076ac12c4d486d	11-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Preparatory clean up of dead_code_eliminate(). Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
10dd6eca89951e0cb40e21c3b53caa33d8fcb383	13-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Add is_null() method to dst_reg. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0884ce8f42d0e04e889c6d0e4dde91f9aa58e85e	13-Mar-2014	Matt Turner <mattst88@gmail.com>	i965/vec4: Print the predicate in dump_instructions(). Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
01d9023a9b9a50b42f7a4ef4799d0e35e0b045ca	11-Mar-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Fix register types in dump_instructions(), again. In commit e57d77280efcbfd6579a88f071426653287ef833, I fixed this for destinations in the Vec4 backend, and sources in the scalar backend. But not both types in both backends. To prevent this mess from continuing, make the reg_encoding table static, so only the disassembler can use it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a290cd039cc07330598a101e74d25289ce70bcee	18-Feb-2014	Topi Pohjolainen <topi.pohjolainen@intel.com>	i965: Merge resolving of shader program source Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
59989a4a92e638415d50e9acdd0685eb56eb17f3	27-Feb-2014	Petri Latvala <petri.latvala@intel.com>	i965: Assert array index on access to vec4_visitor's arrays. v2: vec4_visitor::pack_uniform_registers(): Use correct comparison in the assert, this->uniforms is already adjusted. Compare the actual value used to index uniform_size and uniform_vector_size instead. Signed-off-by: Petri Latvala <petri.latvala@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a76e5dce4fc8d50f8699c108833f24e80167d706	23-Dec-2013	Eric Anholt <eric@anholt.net>	i965: Move compiler debugging output to stderr. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f28c9208652143b4925bd97ce9823728c34d34a5	21-Feb-2014	Eric Anholt <eric@anholt.net>	i965: Refactor debug dumping of GLSL IR. This was only going to get worse when tessellation shows up, and was causing too much extra duplication in my stderr changes coming up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c2ebbe2728cd709029313f4b9c9cc53432c510a1	20-Feb-2014	Eric Anholt <eric@anholt.net>	i965: Stop throwing away our double precision for time calculations. Fixes negative times being reported in our perf debug. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
42b226ef824ed61ccf51fa9a1198cba305ad5472	19-Feb-2014	Francisco Jerez <currojerez@riseup.net>	i965: Make sure that backend_reg::type and brw_reg::type are consistent for fixed regs. And define non-mutating helper functions to retype fixed and normal regs with a common interface. At some point we may want to get rid of ::fixed_hw_reg completely and have fixed regs use the normal register data members (e.g. backend_reg::reg to select a fixed GRF number, src_reg::swizzle to store the swizzle, etc.), I have the feeling that this is not the last headache we're going to get because of the multiple ways to represent the same thing and the different register interface depending on the file a register is stored in... Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ae8b066da5862b4cfc510b3a9a0e1273f9f6edd4	19-Feb-2014	Francisco Jerez <currojerez@riseup.net>	i965: Move up duplicated fields from stage-specific prog_data to brw_stage_prog_data. There doesn't seem to be any reason for nr_params, nr_pull_params, param, and pull_param to be duplicated in the stage-specific subclasses of brw_stage_prog_data. Moving their definition to the common base class will allow some code sharing in a future commit, the removal of brw_vec4_prog_data_compare and brw__prog_data_free, and the simplification of the stage-specific brw__prog_data_compare. Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7f00c5f1a3e0db20a89cfedefd53cbe817fec9e3	23-Nov-2013	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Add constructor of src_reg from a fixed hardware reg. Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b424da4be07ab8d34986e6f3824c679b623df952	28-Nov-2013	Francisco Jerez <currojerez@riseup.net>	i965/vec4: Fix confusion between SWIZZLE and BRW_SWIZZLE macros. Most of the VEC4 back-end agrees on src_reg::swizzle being one of the BRW_SWIZZLE macros defined in brw_reg.h, except in two places where we use Mesa's SWIZZLE macros. There is even a doxygen comment saying that Mesa's macros are the right ones. They are incompatible swizzle representations (3 bits vs. 2 bits per component), and the code using Mesa's works by pure luck. Fix it. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
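The incompatibility described in the entry above (3 bits vs. 2 bits per component) can be made concrete with a small sketch. The `pack_swizzle()` helper is a hypothetical name for illustration and is not one of the macros in Mesa or in brw_reg.h. It only assumes that both representations pack four channel selectors (0 = X, 1 = Y, 2 = Z, 3 = W) starting at bit 0, with a different field width.

```cpp
// Pack four channel selectors using `bits` bits per component.
constexpr unsigned pack_swizzle(unsigned bits, unsigned x, unsigned y,
                                unsigned z, unsigned w)
{
   return x | (y << bits) | (z << (2 * bits)) | (w << (3 * bits));
}

// The identity swizzle XYZW encodes differently in the two layouts.
constexpr unsigned xyzw_3bit = pack_swizzle(3, 0, 1, 2, 3);
constexpr unsigned xyzw_2bit = pack_swizzle(2, 0, 1, 2, 3);

// A swizzle that selects only X (XXXX) is all zeros in both layouts,
// which is one way code that mixes the two could appear to work.
constexpr unsigned xxxx_3bit = pack_swizzle(3, 0, 0, 0, 0);
constexpr unsigned xxxx_2bit = pack_swizzle(2, 0, 0, 0, 0);
```

Any swizzle that selects a channel other than X in its second, third, or fourth component produces different bit patterns in the two layouts, so a value built for one cannot be safely interpreted as the other.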
e57d77280efcbfd6579a88f071426653287ef833	05-Feb-2014	Kenneth Graunke <kenneth@whitecape.org>	i965: Fix register types in dump_instructions(). This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract type that doesn't match the hardware description. dump_instruction() was using reg_encoding[] from brw_disasm.c, which no longer matches (and was incorrect for Gen8+ anyway). This patch introduces a new function to convert the abstract enum values into the letter suffix we expect. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reported-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
ce527a6722491fa7d696266d5dec13f0b72bf8e8	10-Dec-2013	Topi Pohjolainen <topi.pohjolainen@intel.com>	i965: rename tex_ms to tex_cms Prepares for the introduction of non-compressed multi-sampled lookup used in the blorp programs. v2: now also taking into account gen8 Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
71bc11a37508542662132b16a53acd5f541cd2b4	05-Dec-2013	Matt Turner <mattst88@gmail.com>	i965: Print reg_offset for vgrf of size > 1 in dump_instruction(). Previously we wouldn't print the +0 for the first part of a VGRF of size greater than 1. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a4d68e9ee94cf4855a3240c3516279b4e7740268	17-Jan-2014	Paul Berry <stereotype441@gmail.com>	i965: Add GS support to INTEL_DEBUG=shader_time. Previously, time spent in geometry shaders would be counted as part of the vertex shader time. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9eb568d7531eb4715be24d5076353ea6c10c8ceb	07-Dec-2012	Kenneth Graunke <kenneth@whitecape.org>	i965: Create a new vec4 backend for Broadwell. This replaces the old vec4_generator backend. v2: Port to use the C-based instruction representation. Also, remove Geometry Shader offset hacks - the visitor will handle those instead of this code. v3: Texturing fixes (including adding textureGather support). v4: Pass brw_context to gen8_instruction functions as required. v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes (caught by Eric). Simplify ADDC/SUBB handling; add comments to gen8_set_dp_message calls (suggested by Matt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
26a3bf5c726199d7664d5878ef1f73592e55caa7	28-Nov-2013	Eric Anholt <eric@anholt.net>	i965: Stop doing our optimization on a copy of the GLSL IR. The original intent was that we'd keep a driver-private copy, and there would be the normal copy for swrast to make use of without the tuning (or anything more invasive we might do) specific to i965. Only, we don't generate swrast code any more, because swrast can't render current shaders anyway. Thus, our private copy is rather a waste, and we can just do our backend-specific operations on the linked shader. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7629c489c88a6f6dd47b311a90ad64e216c9a37c	29-Nov-2013	Chris Forbes <chrisf@ijw.co.nz>	i965: Add shader opcode for sampling MCS surface Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8814806c97ed60c5bb4d6cb1927cd05445864388	21-Oct-2013	Matt Turner <mattst88@gmail.com>	i965: Print conditional mod in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
637dda1c307aee921ecc646b75f891deab6585a9	02-Dec-2013	Matt Turner <mattst88@gmail.com>	i965: Print argument types in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
729fe77e3bdf64768e8447c281f249ac80c1b9a2	02-Dec-2013	Matt Turner <mattst88@gmail.com>	i965/vec4: Don't print swizzles for immediate values. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
2b8e0a73fbc021305fdcab7a3c6661de7af911a9	02-Dec-2013	Matt Turner <mattst88@gmail.com>	i965/vec4: Print negate and absolute value for src args. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a85f1b7adf1023667fea090242ba448d935eaa67	26-Nov-2013	Matt Turner <mattst88@gmail.com>	i965/vec4: Add support for printing HW_REGs in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0e4053234df5e3461e80c90dfd743c3ac96006eb	26-Nov-2013	Matt Turner <mattst88@gmail.com>	i965: Don't print extra (null) arguments in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d2fcdd0973ee33a2627d1dee6d78091e605af160	29-Nov-2013	Matt Turner <mattst88@gmail.com>	i965/cfg: Clean up cfg_t constructors. parent_mem_ctx was unused since db47074a, so remove the two wrappers around create() and make create() the constructor. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a97cd0f4d7902965d5173f4bcbf2ad27c0eb5d12	30-Oct-2013	Matt Turner <mattst88@gmail.com>	i965: Add a pass to remove dead control flow. Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions. total instructions in shared programs: 1360393 -> 1360387 (-0.00%) instructions in affected programs: 157 -> 151 (-3.82%) (no change in vertex shaders) Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1c263f8f4f767df0511e63377c17a95ebebba944	11-Nov-2013	Matt Turner <mattst88@gmail.com>	i965/vec4: Add invalidate_live_intervals method. Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34fe051e215107dddbaae71e2edf15f88d839936	20-Oct-2013	Francisco Jerez <currojerez@riseup.net>	i965: Add a 'has_side_effects' back-end instruction predicate. This patch fixes the three dead code elimination passes and the VEC4/FS instruction scheduling passes so they leave instructions with side effects alone. At some point it might be interesting to have the instruction scheduler calculate the exact memory dependencies between atomic ops, but they're rare enough that it seems unlikely that it will make any practical difference. Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6032261682388ced64bd33328a5025f561927a38	16-Oct-2013	Eric Anholt <eric@anholt.net>	i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE I'm going to be introducing gen7 variants, and the previous naming was going to get confusing. Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e621cb9fef7eada5a3c131d27f5b0b142658758	11-Sep-2013	Francisco Jerez <currojerez@riseup.net>	i965/gen7: Implement code generation for untyped surface read instructions. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cfaaa9bbb7a6ab5819f4fa9e38352b72d6293cff	11-Sep-2013	Francisco Jerez <currojerez@riseup.net>	i965/gen7: Implement code generation for untyped atomic instructions. Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6bb2cf2107c4461ea9dd100edaf110b839311b90	08-Oct-2013	Chris Forbes <chrisf@ijw.co.nz>	i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets. The generator code ends up clearer this way than if we had to sniff via the message length. Implemented via the gather4_po message in hardware, which is present in Gen7 and later. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
89647cffb31ee1ea42d581b1053b4bb147b3e58a	16-Oct-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: if register allocation fails, don't try to schedule. Otherwise the scheduler would be invoked with prog_data->total_grf == 0, causing havoc. In a future patch, this will allow us to try compiling a geometry shader in DUAL_OBJECT mode with spilling disabled, and then fall back to DUAL_INSTANCED mode if that failed. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8bb15813e3047820a95724e4257aa2c862eeb31a	16-Oct-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: Add the ability for attributes to be interleaved. When geometry shaders are operated in "single" or "dual instanced" mode, a single set of geometry shader inputs is interleaved into the thread payload (with each payload register containing a pair of inputs) in order to save register space. This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that it can handle the interleaved format. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e0f34301b29ecf3fb7118b2e05872510c104a49b	23-Oct-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: Extract function to set up vec4 prog key for precompiling. This will allow us to re-use it for precompiling geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
068df64ba6a8309427612836e5eb384721ca6d40	23-Oct-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: Remove uses_clip_distance from program key. This should never have been in the program key in the first place, since it's determined by the shader source, not by GL state. Change the code to just refer to gl_program::UsesClipDistanceOut directly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
705a90e30435490c2de84f4f6741cab335fa7608	03-Oct-2013	Eric Anholt <eric@anholt.net>	i965: Move the common binding table offset code to brw_shader.cpp. Now that both vec4 and fs are dynamically assigning offsets, a lot of the code is the same. v2: Avoid passing around the next offset through the class. (Review by Paul) Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d395485e1df44853cdf86b0bd46b7af36c7e1c13	03-Oct-2013	Eric Anholt <eric@anholt.net>	i965/vec4: Dynamically assign the VS/GS binding table offsets. Note that the dropped comment in brw_context.h is mostly (better written) in brw_binding_table.c as well. Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3c9dc2d31b80fc73bffa1f40a91443a53229c8e2	02-Oct-2013	Eric Anholt <eric@anholt.net>	i965: Make a brw_stage_prog_data for storing the SURF_INDEX information. It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go. v2: Rename size to size_bytes to be more explicit. Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
a5ec01fb1bd4ad5418eb16cb05e6f6929d1444e8	20-Sep-2013	Matt Turner <mattst88@gmail.com>	i965: Don't copy prop source mods into instructions that can't take them. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e7dc88026a821a31bf2afeb934dded11c91401a1	20-Sep-2013	Matt Turner <mattst88@gmail.com>	i965: Fixup for don't dead-code eliminate instructions that write to the accumulator. Accidentally pushed an old version of the patch. v2: Set destination register using brw_null_reg(). Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
92dc16c3e2e2b9e3e71baaccc67bbe727e9d68ab	20-Sep-2013	Matt Turner <mattst88@gmail.com>	i965: Don't dead-code eliminate instructions that write to the accumulator. Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fb455500bfb11cca0f45076a9eaccc0ddd764731	31-Mar-2013	Chris Forbes <chrisf@ijw.co.nz>	i965: add SHADER_OPCODE_TG4 Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and low-level support for emitting it via generate_tex(). V3: Updated for changes in master. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4e3d1712a223f9f0b4ff4a34b9b5447a92877347	28-Aug-2013	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Detect GRF sources in split_virtual_grfs send-from-GRF code. It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF. VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 uses an IMM as src[0], and stores the GRF as src[1]. To be safe, loop over all the source registers and mark any GRFs. We probably won't ever have more than one, but it's simpler to just check all three rather than attempting to bail early. Fixes assertion failures in Unigine Sanctuary since we started making register allocation rely on split_virtual_grfs working. (The register classes were actually sufficient, we were just interpreting an IMM as a virtual GRF number.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68637 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Cc: mesa-stable@lists.freedesktop.org /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
09e2df5961cfe04925bdd820e6ea59af3ba783f6	30-Aug-2013	Eric Anholt <eric@anholt.net>	i965/vs: Fix regression on pre-gen6 with no VS uniforms in use. df06745c5adb524e15d157f976c08f1718f08efa made it so that we didn't allocate extra uniform space for unused clip planes, which also incidentally made us not allocate any space at all, which we were relying on for this no-uniforms case. Instead of putting the knowledge of this special HW exception into the thing that normally preallocates prog_data for us, just allocate it here. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68766 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4416cb79926f089ff55dbbb352b94ec2890ae823	23-Mar-2013	Paul Berry <stereotype441@gmail.com>	i965/gs: Add GS_OPCODE_THREAD_END. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
96eb2f353605b382cf4fc930cc1d322130b12270	21-Mar-2013	Paul Berry <stereotype441@gmail.com>	i965/gs: Add GS_OPCODE_URB_WRITE. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7f57101ad53112b16e4a400f312f68a85dfbd108	13-Jul-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: Virtualize setup_payload instead of setup_attributes. When I initially generalized the vec4_visitor class in preparation for geometry shaders, I assumed that the setup_attributes() function would need to be different between vertex and geometry shaders, but its caller, setup_payload(), could be shared. So I made setup_attributes() a virtual function. It turns out this isn't true; setup_payload() needs to be different too, since the geometry shader payload sometimes includes an extra register (primitive ID) that has to come before uniforms. So setup_payload() needs to be the virtual function instead of setup_attributes(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
626495d269e2c2df9dae5c46c086ffff93c77a19	13-Jul-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: Allow for dispatch_grf_start_reg to vary. Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control, which determines the register where the hardware delivers data sourced from the URB (push constants followed by per-vertex input data). For vertex shaders, we always set dispatch_grf_start_reg to 1, since R1 is always the first register available for push constants in vertex shaders. For geometry shaders, we'll need the flexibility to set dispatch_grf_start_reg to different values depending on the behaviour of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to set it to 2 to allow the primitive ID to be delivered to the thread in R1. This patch eliminates the assumption that dispatch_grf_start_reg is always 1. In vec4_visitor, we record the regnum that was passed to vec4_visitor::setup_uniforms() in prog_data for later use. In vec4_generator, we consult this value when converting an abstract UNIFORM register to a concrete hardware register. And in the code that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the value recorded in prog_data. This will allow us to set dispatch_grf_start_reg to the appropriate value when compiling geometry shaders. Vertex shaders will continue to always use a dispatch_grf_start_reg of 1. v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint". Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
72168f5f0069b2a0d8a2434ba80f4446952e84c7	15-Aug-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: Move vec4 data structures and functions to brw_vec4.{cpp,h}. This patch moves the following things into brw_vec4.{cpp,h}: - struct brw_vec4_compile - struct brw_vec4_prog_key - brw_vec4_prog_data_compare() - brw_vec4_prog_data_free() This will allow us to avoid having to include brw_vs.h in geometry-shader-specific files. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5fb13d871e062354a77a427b3a3fe7f3d6908e5a	20-Mar-2013	Paul Berry <stereotype441@gmail.com>	i965: Stop including brw_vs.h from brw_vec4.h. This is backwards from what we are going to want in the long term, which is: - brw_vec4.h declares general-purpose vec4 infrastructure needed by both VS and GS - brw_vs.h includes brw_vec4.h and adds VS-specific parts. - brw_gs.h includes brw_vec4.h and adds GS-specific parts. Note that at the moment brw_vec4.h contains a fair amount of VS-specific declarations--I plan to address that in a later patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c642bd3dcc1a6f1039732c614ab8a56dd3779427	15-Aug-2013	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Plumb brw_vec4_prog_data into vec4_generator(). This will be useful for the next commit. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
53631be4ebaa4fb13a7f129727c1cdd32fcc6f3d	06-Jul-2013	Kenneth Graunke <kenneth@whitecape.org>	i965: Move intel_context::gen and gt fields to brw_context. Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b15f1fc3c6b3b9dc4422940c412f80e581c9900d	03-Jul-2013	Kenneth Graunke <kenneth@whitecape.org>	i965: Move intel_context::perf_debug to brw_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
329779a0b45b63be17627f026533c80b2c8f7991	03-Jul-2013	Kenneth Graunke <kenneth@whitecape.org>	i965: Move intel_context::batch to brw_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
426ca34b7a2c3b9edfc0189daece8de3aff80627	13-Jun-2013	Eric Anholt <eric@anholt.net>	glsl: Remove ir_print_visitor.h includes and usage We have ir->print() to do the old declaration of a visitor and having the IR accept the visitor (yuck!). And now you can call _mesa_print_ir() safely anywhere that you know what an ir_instruction is. A couple of missing printf("\n")s are added in error paths -- when an expression is handed to the visitor, it doesn't print '\n' (since it might be a step in printing a whole expression tree). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6220cc931f15ddb428ea481e8b9a70ce26ca3304	28-May-2013	Eric Anholt <eric@anholt.net>	i965/vs: Fix implied_mrf_writes() for integer division pre-gen6. Previously it would assertion fail in debug builds (though the correct value was returned in a non-debug build). Marking it as a candidate for stable even though it has no current consumers in the stable branches, in case one shows up in a later backport. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64727 NOTE: This is a candidate for stable branches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0f3068a58bdbceb2cb93e3848b0e2145629cdf43	01-May-2013	Eric Anholt <eric@anholt.net>	i965/vs: Make virtual grf live intervals actually cover their used range. This is the same change as the previous commit to the FS. A very few VSes are regressed by 1 or 2 instructions, which look recoverable with a bit more dead code elimination. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
573d8813fdbb116f4500d2044c56d80aab73ab7f	01-Dec-2012	Eric Anholt <eric@anholt.net>	i965/vs: Add instruction scheduling. While this is ignorant of dependency control, it's still good for a 0.39% +/- 0.08% performance improvement on GLBenchmark 2.7 (n=548) v2: Rewrite as a subclass of the base class for the FS instruction scheduler, inheriting the same latency information. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
63c8155b09bca7917631ec678a0d0db6e7965a1a	29-Apr-2013	Eric Anholt <eric@anholt.net>	i965: Make dump_instructions be a virtual method of the visitor. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5e46482993dfd30b888d5219f6fecf4b4d1f42de	28-Apr-2013	Kenneth Graunke <kenneth@whitecape.org>	i965: Move is_math/is_tex/is_control_flow() to backend_instruction. These are entirely based on the opcode, which is available in backend_instruction. It makes sense to only implement them in one place. This changes the VS implementation of is_tex() slightly, which now accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those aren't generated in the VS anyway, it should be fine. This also makes is_control_flow() available in the VS. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
417d8917d4924652f1cd0c64dbf3677d4eddbf8c	16-Apr-2013	Paul Berry <stereotype441@gmail.com>	i965/vec4: Fix hypothetical use of uninitialized data in attribute_map[]. Fixes issue identified by Klocwork analysis: 'attribute_map' array elements might be used uninitialized in this function (vec4_visitor::lower_attributes_to_hw_regs). The attribute_map array contains the mapping from shader input attributes to the hardware registers they are stored in. vec4_vs_visitor::setup_attributes() only populates elements of this array which, according to core Mesa, are actually used by the shader. Therefore, when vec4_visitor::lower_attributes_to_hw_regs() accesses the array to lower a register access in the shader, it should in principle only access elements of attribute_map that contain valid data. However, if a bug ever caused the driver back-end to access an input that was not flagged as used by core Mesa, then lower_attributes_to_hw_regs() would access uninitialized memory, which could cause illegal instructions to get generated, resulting in a possible GPU hang. This patch makes the situation more robust by using memset() to pre-initialize the attribute_map array to zero, so that if such a bug ever occurred, lower_attributes_to_hw_regs() would generate a (mostly) harmless access to r0. In addition, it adds assertions to lower_attributes_to_hw_regs() so that if we do have such a bug, we're likely to discover it quickly. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
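The defensive pattern described in the entry above (zero the map up front, then assert on lookup) can be sketched roughly as follows. This is only an illustration: `AttributeMap`, `kNumAttribs`, `init_attribute_map` and `lookup_attribute` are hypothetical names, and the exact condition the real assertions check is an assumption here (a nonzero register).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical size; the real driver sizes the array from its own limits.
constexpr int kNumAttribs = 32;

// Stand-in for attribute_map[]: attribute index -> hardware register number.
struct AttributeMap {
    int reg[kNumAttribs];
};

// Pre-initialize every entry to zero so a stray lookup yields r0
// instead of whatever happened to be on the stack.
void init_attribute_map(AttributeMap &m)
{
    std::memset(m.reg, 0, sizeof(m.reg));
}

// Look up a register; in debug builds, catch out-of-range indices and
// entries that were never populated (assumed check: register is nonzero).
int lookup_attribute(const AttributeMap &m, int attr)
{
    assert(attr >= 0 && attr < kNumAttribs);
    int grf = m.reg[attr];
    assert(grf != 0);
    return grf;
}
```

In a release build the assertions compile out, so an unpopulated entry falls back to register 0, which matches the "(mostly) harmless access to r0" behaviour the entry describes.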
dea70404eb615bfa148fbd0fec5670fb2657c47b	11-Apr-2013	Eric Anholt <eric@anholt.net>	i965: Fix a warning in the release build. This was copy and pasted from can_reswizzle_dst(), and we can just fold it in instead to avoid the warning. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
195a6cca3cbc26eeea0f7f8dfc21dd3429911779	11-Apr-2013	Matt Turner <mattst88@gmail.com>	i965/vs: Print error if vertex shader fails to compile. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
32a8e877666f7c3798d736bb6f05ad2f41356ebf	11-Apr-2013	Matt Turner <mattst88@gmail.com>	i965: NULL check prog on shader compilation failure. Also change if (shader) to if (prog) for consistency. Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e9fa3a94486d80da34542cfd24425c208a8d30fe	23-Mar-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: Don't hardcode DEBUG_VS in generic vec4 code. Since the vec4_visitor and vec4_generator classes are going to be re-used for geometry shaders, we can't enable their debug functionality based on (INTEL_DEBUG & DEBUG_VS) anymore. Instead, add a debug_flag boolean to these two classes, so that when they're instantiated the caller can specify whether debug dumps are needed. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
444fce6398556118629ef01204a7d8ff7af0bea3	22-Mar-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: Generalize attribute setup code in preparation for GS. This patch introduces a new function, vec4_visitor::lower_attributes_to_hw_regs(), which replaces registers of type ATTR in the instruction stream with the hardware registers that store those attributes. This logic will need to be common between the vertex and geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
9bb6840b28a9a77377d437198c62d705cade5370	17-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: Generalize data structures pointed to by vec4_generator. This patch removes the following field from vec4_generator, since it is not used: - struct brw_vs_compile c And changes the following field: - struct gl_vertex_program vp => struct gl_program *prog With these changes, vec4_generator no longer refers to any VS-specific data structures. This will pave the way for re-using it for geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> v2: Use the name "prog" rather than "p". Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5743bea0ba1eda07be831d95c5b7729f9ba98a92	17-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: move VS-specific data members to vs_vec4_visitor. This patch moves the following data structures from vec4_visitor to vec4_vs_visitor, since they contain VS-specific data: - struct brw_vs_compile c (renamed to vs_compile) - struct brw_vs_prog_data prog_data (renamed to vs_prog_data) - src_reg vp_temp_regs - src_reg vp_addr_reg Since brw_vs_compile and brw_vs_prog_data also contain vec4-generic data, the following pointers are added to the base class, to allow it to access the vec4-generic portions of these data structures: - struct brw_vec4_compile c - struct brw_vec4_prog_key key - struct brw_vec4_prog_data prog_data Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> v2: Use shorter names in the base class and longer names in the derived class. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8941f73c7ccb3c6cfa965a19f346e4b6ead6abdb	17-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: Make some vec4_visitor functions virtual. This patch makes the following vec4_visitor functions virtual, since they will need to be implemented differently for vertex and geometry shaders. Some of the functions are renamed to reflect their generic purpose, rather than their VS-specific behaviour: - setup_attributes - emit_attribute_fixups (renamed to emit_prolog) - emit_vertex_program_code (renamed to emit_program_code) - emit_urb_writes (renamed to emit_thread_end) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
e9be5a05f70be7cff58b29bff07af71e6d339085	16-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: Make vec4_vs_visitor class derived from vec4_visitor. This patch just creates the derived class; later patches will migrate VS-specific functions and data structures from the base class into the derived class. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
5fff3752c88255ea3f4eb26cddb2c996694b33b1	17-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: split brw_vs_prog_data into generic and VS-specific parts. This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> v2: Put urb_read_length and urb_entry_size in the generic struct. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0c994f181ce1a09cdbb7db27e4ad5565248bf8e1	16-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: split brw_vs_prog_key into generic and VS-specific parts. This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
09cd6e06d2c7a54ca6eb8d3102822efa78e01a9c	16-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: Remove brw_vs_prog_data pointer from brw_vs_compile. In patches that follow, we'll be splitting structs brw_vs_prog_data and brw_vs_compile into a vec4-generic base struct and a VS-specific derived struct (this will allow the vec4-generic code to be re-used for geometry shaders). Having brw_vs_compile point to brw_vs_prog_data makes it difficult to do this cleanly. Fortunately most of the functions that use brw_vs_compile (those in the vec4_visitor class) already have access to brw_vs_prog_data through a separate pointer (vec4_visitor::prog_data). So all we have to do is use that pointer consistently, and plumb prog_data through the few remaining functions that need access to it. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b29613371c316e9273ebe29ba37fb2f04c2ed58d	16-Feb-2013	Paul Berry <stereotype441@gmail.com>	i965/vs: Make type of vec4_visitor::vp more generic. The vec4_visitor functions don't use any VS specific data from vec4_visitor::vp. So rename it to "prog" and change its type from struct gl_vertex_program * to struct gl_program *. This will allow the code to be re-used for geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> v2: Use the name "prog" rather than "p". Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
fe97f26c86d65b1b0e026c725c7da348a91093d9	09-Apr-2013	Paul Berry <stereotype441@gmail.com>	i965: Rename backend_visitor::prog to shader_prog. The next patch is going to change the type of vec4_visitor::vp from struct gl_vertex_program * to struct gl_program *, and rename it. The sensible name to change it to is vec4_visitor::prog. However, prog is already used in backend_visitor (which vec4_visitor derives from). Since backend_visitor::prog is of type struct gl_shader_program *, it makes sense to rename it to shader_prog. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d5f7aebac2b1afbc5023cd114174860d8d763d06	04-Apr-2013	Eric Anholt <eric@anholt.net>	i965/vs: Use GRFs for pull constant offsets on gen7. This allows the computation of the offset to get written directly into the message source. shader-db results: total instructions in shared programs: 3308390 -> 3283025 (-0.77%) instructions in affected programs: 442998 -> 417633 (-5.73%) No difference in GLB2.7 low res (n=9). Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3badbf7f7fa4898c69545fea3c60ea29cf61ae3b	05-Apr-2013	Eric Anholt <eric@anholt.net>	i965/vs: When asked to make a dst_reg for a src.xxxx, just write to src.x. We have several places in our pull constant handling where we make a temporary src_reg for an int, and then turn it into a dst. In doing so, we were writing to the dst.xyzw, so we never register coalesced it with a later mov from dst.x to real_dst.x. These extra channels written would be removed if we had channel-wise DCE in the backend, but we don't. Fix it for now by just not writing these extra channels that won't get used. Reviewed-by: Matt Turner <mattst88@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
4fee05b020af72ee802d4349de76fbc36cdd53a9	01-Dec-2012	Eric Anholt <eric@anholt.net>	i965/vs: Add a pass to set dependency control fields on instructions. This is a more aggressive version of the old brw_optimize() path. Reduces cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20d846ce8b46604ced835eb68079a0dbae2e19dc	12-Mar-2013	Eric Anholt <eric@anholt.net>	i965: Add names for all instructions to dump_instruction() in FS and VS. I'd previously added the minimum names to understand my dumps, but this makes dumps in general much easier to read. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
6192e9b377c6fa4f36da42af6c06ca32b10e7e62	20-Mar-2013	Eric Anholt <eric@anholt.net>	i965/vs: Include URB payload setup in shader_time. This much more accurately reflects the cost of the vertex shader, since the payload setup is often a significant fraction of the instructions in the VS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
55feb19704ae69c580f431d6498344521de369cd	18-Dec-2012	Eric Anholt <eric@anholt.net>	i965/vs: Use a send from a 2-register VGRF for shader time writes. This will let us emit it later, after we're setting up MRFs for the URB write. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
130138030a3dc8bda20766146ca9fda4047133d3	18-Dec-2012	Eric Anholt <eric@anholt.net>	i965/vs: Teach copy propagation about sends from GRFs. This incidentally also teaches it a bit about gen6 math -- we now allow unswizzled, unmodified GRF temps as the sources for math. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c3a22d42a88c299561dd913d0a00bb986921eeba	18-Dec-2012	Eric Anholt <eric@anholt.net>	i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs. v2: Fix silly bool handling, and don't add new tabs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d2ba1c24b440ee74436335d8e815be9b72b1ba7f	19-Mar-2013	Eric Anholt <eric@anholt.net>	i965: Track ARB program state along with GLSL state for shader_time. This will let us do much better printouts for non-GLSL programs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d24819dce8cf6dac23f27df46fabbf756a732229	11-Mar-2013	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Add IR dumping for immediates. This makes dump_instructions more useful. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
db3a0f13ef13b6d392dfc3b7346351533600d343	11-Mar-2013	Eric Anholt <eric@anholt.net>	i965: Split shader_time entries into separate cachelines. This avoids some snooping overhead between EUs processing separate shaders (so VS versus FS). Improves performance of a minecraft trace with shader_time by 28.9% +/- 18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4). v2: Add a define for the stride with a comment explaining its units and why. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
14cec07177f438717cc6fb9252525e16d6b3d8dd	22-Feb-2013	Eric Anholt <eric@anholt.net>	i965: Make perf_debug() output to GL_ARB_debug_output in a debug context. I tried to ensure that performance in the non-debug case doesn't change (we still just check one condition up front), and I think the impact is small enough in the debug context case to warrant including all of it. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f52ce6a0ca73d1cd89091689efd8ea2e14748723	24-Jan-2013	Chris Forbes <chrisf@ijw.co.nz>	i965: add a new virtual opcode: SHADER_OPCODE_TXF_MS This is very similar to the TXF opcode, but lowers to `ld2dms` rather than `ld` on Gen7. V4: - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks it actually writes the correct number of registers. Otherwise in nontrivial shaders some of the registers tend to get clobbered, producing bad results. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Paul Berry <stereotype441@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f0213b124259804ce8e114575fe9058dffdf5864	13-Feb-2013	Matt Turner <mattst88@gmail.com>	i965/vs/gen7: Allow MATH instructions to have MRF as a destination total instructions in shared programs: 346873 -> 346847 (-0.01%) instructions in affected programs: 364 -> 338 (-7.14%) (All affected shaders are from Lightsmark) Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
d5efc14635cf25bc130bfa77737913913d9202ce	21-Nov-2012	Eric Anholt <eric@anholt.net>	i965: Add asserts to check that we don't realloc ParameterValues. Things are even more restrictive than they used to be, so I've made mistakes in this area. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c9e48e5b083b6cf97ecdb2d17c874ea631203b06	02-Aug-2012	Eric Anholt <eric@anholt.net>	i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too. No statistically significant performance difference on glbenchmark 2.7 (n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8% (n=5), but that's only about .3% of all cycles spent according to the fixed shader_time. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
471af25fc57dc43a8277b4b17ec82547287621d0	01-Dec-2012	Eric Anholt <eric@anholt.net>	i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling" The way our visitor works, scalar expression/swizzle results that get stored in channels other than .x will have an intermediate MOV from their result in the .x channel to the real .y (or whatever) channel, and similarly for vec2/vec3 results. By knowing how to adjust DP4-type instructions for optimizing out a swizzled MOV, we can reduce instructions in common matrix multiplication cases. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f74560f3fb516971e6a7b03a2382db2f58699f59	10-Dec-2012	Eric Anholt <eric@anholt.net>	i965: Scale shader_time to compensate for resets. Some shaders experience resets more than others, which skews the numbers reported. Attempt to correct for this by linearly scaling according to the number of resets that happen. Note that this will not be accurate if invocations of shaders have varying times and longer invocations are more likely to reset. However, this should at least be better than the previous situation. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
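One plausible reading of "linearly scaling according to the number of resets" is sketched below: take the average cycle count of the samples that were recorded and extrapolate it to the samples that were lost to resets. The function name, its parameters and the formula itself are assumptions for illustration, not the driver's actual code.

```cpp
#include <cstdint>

// cycles:  total cycles accumulated from samples that were written
// written: number of samples successfully recorded
// reset:   number of samples discarded because the timestamp reset
// Returns an estimate of what the total would be had no sample been lost.
double scale_for_resets(uint64_t cycles, uint64_t written, uint64_t reset)
{
    if (written == 0)
        return 0.0;  // nothing was measured, so nothing to extrapolate
    double average = static_cast<double>(cycles) / static_cast<double>(written);
    return average * static_cast<double>(written + reset);
}
```

As the entry itself notes, this kind of estimate is biased when longer invocations are more likely to hit a reset, because the recorded samples then under-represent the expensive ones.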
338b5f887d462bbe7ef58a233cd00619e43415f0	10-Dec-2012	Eric Anholt <eric@anholt.net>	i965: Adjust the split between shader_time_end() and shader_time_write(). I'm about to emit other kinds of writes besides time deltas, and it turns out with the frequency of resets, we couldn't really use the old time delta write() function more than once in a shader. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
71f06344a0d72a6bd27750ceca571fc016b8de85	27-Nov-2012	Eric Anholt <eric@anholt.net>	i965: Add a debug flag for counting cycles spent in each compiled shader. This can be used for two purposes: Using hand-coded shaders to determine per-instruction timings, or figuring out which shader to optimize in a whole application. Note that this doesn't cover the instructions that set up the message to the URB/FB write -- we'd need to convert the MRF usage in these instructions to GRFs so that our offsets/times don't overwrite our shader outputs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) v2: Check the timestamp reset flag in the VS, which is apparently getting set fairly regularly in the range we watch, resulting in negative numbers getting added to our 32-bit counter, and thus large values added to our uint64_t. v3: Rebase on reladdr changes, removing a new safety check that proved impossible to satisfy. Add a comment to the AOP defs from Ken's review, and put them in a slightly more sensible spot. v4: Check timestamp reset in the FS as well. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
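The v2 note above describes a wraparound hazard: when the timestamp resets between the start and end reads, the 32-bit difference is effectively negative and shows up as a very large value once added to a 64-bit total. A minimal sketch of the hazard and of skipping flagged samples follows; `raw_delta`, `accumulate` and the `reset_seen` flag are hypothetical names, not the driver's interface.

```cpp
#include <cstdint>

// Unsigned 32-bit subtraction. If the counter was reset between the two
// reads, end < start and the result wraps around to a huge value.
uint32_t raw_delta(uint32_t start, uint32_t end)
{
    return end - start;
}

// Hypothetical accumulator: drop any sample whose reset flag was set,
// so a wrapped delta never reaches the 64-bit total.
void accumulate(uint64_t &total, uint32_t start, uint32_t end, bool reset_seen)
{
    if (reset_seen)
        return;
    total += raw_delta(start, end);
}
```

For example, a start of 100 and an end of 40 give a raw delta of 2^32 - 60, which would swamp any realistic cycle count if it were added unchecked.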
b126228f1247fb0fed686ee3ef2c87461f2fc7a7	30-Nov-2012	Eric Anholt <eric@anholt.net>	i965: Include codegen time in the INTEL_DEBUG=perf stall detection. In the VS case, we were missing the entire compile time in the stall detection! Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
0f06864ba566eaff5b739a9d0fba5ed7eaadd60b	30-Nov-2012	Eric Anholt <eric@anholt.net>	i965: Don't leak the IR annotation into later instructions. After walking our IR instructions (Mesa or GLSL), we don't want to also mark the start of the FB/URB writes or whatever as being that IR. This can end up being misleading when the end of the IR visit got copy propagated out to a later instruction in the URB writes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c1023608002c985b9d72edc64732cd666de2a206	27-Nov-2012	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Move struct brw_compile (p) entirely inside vec4_generator. The brw_compile structure contains the brw_instruction store and the brw_eu_emit.c state tracking fields. These are only useful for the final assembly generation pass; the earlier compilation stages don't need them. This also means that the code generator for future hardware won't have access to the brw_compile structure, which is extremely desirable because it prevents accidental generation of Gen4-7 code. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
eda9726ef51dcfd3895924eb0f74df8e67aa9c3a	27-Nov-2012	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Split final assembly code generation out of vec4_visitor. Compiling shaders requires several main steps: 1. Generating VS IR from either GLSL IR or Mesa IR 2. Optimizing the IR 3. Register allocation 4. Generating assembly code This patch splits out step 4 into a separate class named "vec4_generator." There are several reasons for doing so: 1. Future hardware has a different instruction encoding. Splitting this out will allow us to replace vec4_generator (which relies heavily on the brw_eu_emit.c code and struct brw_instruction) with a new code generator that writes the new format. 2. It reduces the size of the vec4_visitor monolith. (Arguably, a lot more should be split out, but that's left for "future work.") 3. Separate namespaces allow us to make helper functions for generating instructions in both classes: ADD() can exist in vec4_visitor and create IR, while ADD() in vec4_generator() can create brw_instructions. (Patches for this upcoming.) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8af8a26480e9e71fb1501b675f21a469c1699b9b	27-Nov-2012	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Move uses of brw_compile from do_vs_prog to brw_vs_emit. The brw_compile structure is closely tied to the Gen4-7 hardware encoding. However, do_vs_prog is very generic: it just calls out to get a compiled program and then uploads it. This isn't ultimately where we want it, but it's a step in the right direction: it's now closer to the code generator. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
746fc346eae21d227b06799f3e82a1404c75bdc9	27-Nov-2012	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Rework memory contexts for shader compilation data. During compilation, we allocate a bunch of things: the IR needs to last at least until code generation...and then the program store needs to last until after we upload the program. For simplicity's sake, just keep it all around until we upload the program. After that, it can all be freed. This will also save a lot of headaches during the upcoming refactoring. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
403bb1d306c5bc23ad9e2c26fd39071e6e41f665	27-Nov-2012	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Pass the brw_context pointer into vec4_visitor and do_vs_prog. We used to steal it out of the brw_compile struct...but vec4_visitor isn't going to have one of those in the future. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
dd50c88386c8220f4631115b68a10930378ead6c	27-Nov-2012	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Move some functions from brw_vec4_emit.cpp to brw_vec4.cpp. This leaves only the final code generation stage in brw_vec4_emit.cpp, moving the payload setup, run(), and brw_vs_emit functions to brw_vec4.cpp. The fragment shader backend puts these functions in brw_fs.cpp, so this patch also helps with consistency. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
10ff6772c8054aea12ac0f08e2e3898fd4a7f76b	25-Oct-2012	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Don't lose the MRF writemask when doing compute-to-MRF. Consider the following code sequence: mul(8) g4<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF mov.sat(8) m1<1>.xyF g4<4,4,1>F mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF mov.sat(8) m1<1>.zwF g4<4,4,1>F The compute-to-MRF pass will discover the first mov.sat and attempt to replace it by rewriting earlier instructions. Everything works out, so it replaces scan_inst's destination file, reg, and reg_offset, resulting in: mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF mul(8) g4<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF mov.sat(8) m1<1>.zwF g4<4,4,1>F Unfortunately, it loses the .xy writemask on the mov.sat's MRF destination. While this doesn't pose an immediate problem, it then proceeds to transform the second mov.sat, resulting in: mul(8) m1<1>F g1<0,4,1>.wzwwF g3<4,4,1>.wzwwF mul(8) m1<1>F g1<0,4,1>.xxyxF g3<4,4,1>.xxyxF Instead of writing both halves of the vector (like the original code), it overwrites the full vector both times, clobbering the desired .xy values. When encountering a MOV, the compute-to-MRF code scans for instructions which generate channels of the MOV source. It ensures that all necessary channels are available (possibly written by several instructions). In this case, more channels are available than necessary, so we want to take the subset that's actually used. Taking the bitwise and of both writemasks should accomplish that. This was discovered by analyzing an ARB_vertex_program test (glean/vertProg1/MUL test (with swizzle and masking)) with my new Mesa IR -> Vec4 IR translator code. However, it should be possible with GLSL programs as well. NOTE: This is a candidate for stable release branches. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
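The fix described in the entry above comes down to intersecting two writemasks when a producing instruction's destination is rewritten to the MRF. A minimal sketch, assuming one bit per vec4 channel; the constant names and `merged_writemask` are illustrative and not the driver's identifiers:

```cpp
#include <cstdint>

// Hypothetical channel bits, one per vec4 component.
constexpr uint8_t MASK_X = 1u << 0;
constexpr uint8_t MASK_Y = 1u << 1;
constexpr uint8_t MASK_Z = 1u << 2;
constexpr uint8_t MASK_W = 1u << 3;
constexpr uint8_t MASK_XYZW = MASK_X | MASK_Y | MASK_Z | MASK_W;

// When the producer (e.g. the mul) takes over the MOV's destination, keep
// only the channels that the producer wrote AND the MOV actually stored.
constexpr uint8_t merged_writemask(uint8_t producer_mask, uint8_t mov_mask)
{
    return static_cast<uint8_t>(producer_mask & mov_mask);
}
```

With the sequence from the entry, the first mul writes all four channels while its mov.sat stores only .xy, so the rewritten mul keeps `MASK_X | MASK_Y`; the second keeps `MASK_Z | MASK_W`, and neither clobbers the other's half of m1.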
f593acd5778d4fdfa3493bb90c99b52e45667bc0	19-Oct-2012	Tapani Pälli <tapani.palli@intel.com>	i965/vs: include format argument in debug printf otherwise some compilers will throw error "error: format not a string literal and no format arguments" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Chad Versace <chad.versace@linux.intel.com> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
20ebebac5153affcbd44350332678a2fb04d4c96	03-Oct-2012	Eric Anholt <eric@anholt.net>	i965/vs: Improve live interval calculation. This is derived from the FS visitor code for the same, but tracks each channel separately (otherwise, some typical fill-a-channel-at-a-time patterns would produce excessive live intervals across loops and cause spilling). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48375 (crash -> failure, can turn into pass by forcing unrolling still) /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
914d8f9f84a3539758716d676d59a1fee4cc559f	04-Oct-2012	Eric Anholt <eric@anholt.net>	i965/vs: Add a little bit of IR-level debug ability. This is super basic, but it let me visualize a problem I had with opt_compute_to_mrf(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
34c58acb59bc0b827e28ef9e89044621ab0b3ee1	03-Oct-2012	Eric Anholt <eric@anholt.net>	i965/vs: Add support for splitting virtual GRFs. This should improve our ability to register allocate without spilling. Unfortunately, due to the live variable analysis being ignorant of loops, we still have register allocation failures on some programs. v2: Add more context to the comment explaining the function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
25ca9cc8236845a4be32a6f39b4a6d1664d4b403	04-Jul-2012	Eric Anholt <eric@anholt.net>	i965/vs: Move the other two src_reg/dst_reg constructors to brw_vec4.cpp. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
b2f5d4c3ec9ec2fec8b39c87eb00121a24107276	04-Jul-2012	Eric Anholt <eric@anholt.net>	i965/vs: Move class functions to brw_vec4.cpp. This has less impact than for the FS (4k savings), because it was partially done already, but makes things more consistent. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7e7c40ff98cc2b930bc3113609ace5430f2bdc95	26-Oct-2011	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Add vec4_instruction::is_tex() query. Copy and pasted from fs_inst::is_tex(), but without TXB. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
1d4f3ca8f0442821c914b758b323e6e5124149a3	29-Sep-2011	Kenneth Graunke <kenneth@whitecape.org>	i965/vs: Implement integer quotient and remainder math operations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
c662764f4f9d9d0303fb2685dfdc93824fa15dca	06-Sep-2011	Eric Anholt <eric@anholt.net>	i965/vs: Add support for compute-to-MRF. Removes 1.8% of the instructions from 97% of the vertex shaders in shader-db. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
160848d8ef96cf3a760c02cc576df7dbffc1f669	06-Sep-2011	Eric Anholt <eric@anholt.net>	i965/vs: Add a function for how many MRFs get written as part of a SEND. This will be used for compute-to-mrf, which needs to know when MRFs get overwritten. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
f0c04e6c22babf2aee2ad1ee85dbd6f996be3712	03-Sep-2011	Eric Anholt <eric@anholt.net>	i965/vs: Add support for simple algebraic optimizations. We generate silly code for array access, and it's easier to generally support the cleanup than to specifically avoid the bad code in each place we might generate it. Removes 4.6% of instructions from 41.6% of shaders in shader-db, particularly savage2/hon and unigine. v2: Fixes by Ken: Make is_zero/one member functions, and fix a progress flag. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
cc9eb936c220267b6130b705fc696d05906a31df	02-Sep-2011	Eric Anholt <eric@anholt.net>	i965/vs: Add support for copy propagation of the UNIFORM and ATTR files. Removes 2.0% of the instructions from 35.7% of vertex shaders in shader-db. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
42ce13195b94d0d51ca8e7fa5eed07fde8f37988	30-Aug-2011	Eric Anholt <eric@anholt.net>	i965/vs: Add constant propagation to a few opcodes. This differs from the FS in that we track constants in each destination channel, and we have to look at all the swizzled source channels. Also, the instruction stream walk is done in an O(n) manner instead of O(n^2). Across shader-db, this reduces 8.0% of the instructions from 60.0% of the vertex shaders, leaving us now behind the old backend by 11.1% overall. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
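The per-channel tracking mentioned in the entry above might look roughly like the sketch below. It is an assumption-laden illustration rather than the pass itself: `ChannelConstants`, `record_immediate`, `invalidate`, and `constant_for_source` are hypothetical names, and the sketch simplifies by requiring all four swizzled source channels to hold the same known immediate before a source can be replaced.

```cpp
#include <cassert>

// Hypothetical per-register state: for each channel (x, y, z, w), whether the
// last write was an immediate, and if so which value.
struct ChannelConstants {
    bool known[4] = {false, false, false, false};
    float value[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    // Record a MOV of an immediate into the channels enabled in writemask.
    void record_immediate(unsigned writemask, float imm)
    {
        for (int ch = 0; ch < 4; ch++) {
            if (writemask & (1u << ch)) {
                known[ch] = true;
                value[ch] = imm;
            }
        }
    }

    // Any non-constant write makes the written channels unknown again.
    void invalidate(unsigned writemask)
    {
        for (int ch = 0; ch < 4; ch++) {
            if (writemask & (1u << ch))
                known[ch] = false;
        }
    }
};

// swizzle[i] names the source channel read for destination channel i.
// Returns true and stores the immediate in *out only if every swizzled
// channel holds the same known constant (simplifying assumption).
inline bool constant_for_source(const ChannelConstants &reg,
                                const int swizzle[4], float *out)
{
    for (int i = 0; i < 4; i++) {
        int ch = swizzle[i];
        assert(ch >= 0 && ch < 4);
        if (!reg.known[ch] || reg.value[ch] != reg.value[swizzle[0]])
            return false;
    }
    *out = reg.value[swizzle[0]];
    return true;
}
```

Under this model, a register with only .xy set to an immediate can feed a source swizzled as .xxyx, but not one swizzled as .xyzw, because .z and .w are still unknown.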
df35d691807656d3627b6fa6f51a08674bdc043e	07-Sep-2011	Eric Anholt <eric@anholt.net>	i965/vs: Add support for overflowing the number of available push constants. Fixes glsl-vs-uniform-array-4. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33742 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
72cfc6f3778d8297e52c254a5861a88eb62e4d67	23-Aug-2011	Eric Anholt <eric@anholt.net>	i965/vs: Pack live uniform vectors together in the push constant upload. At some point we need to also move uniform accesses out to pull constants when there are just too many in use, but we lack tests for that at the moment. Fixes glsl-vs-large-uniform-array. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
7c84b9d303345fa5075dba8c4ea7af449d93b0f8	23-Aug-2011	Eric Anholt <eric@anholt.net>	i965/vs: Track uniforms as separate vectors once we've done array access. This will make it easier to figure out which elements are totally unused and not upload them. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
8174945d3346dc049ae56dcb4bf1eab39f5c88aa	17-Aug-2011	Eric Anholt <eric@anholt.net>	i965/vs: Add simple dead code elimination. This is copied right from the fragment shader. It is needed for real register allocation to work correctly. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp
3dadc1e3cceac80a1b63cad2e10f0e0f8904531b	17-Aug-2011	Eric Anholt <eric@anholt.net>	i965/vs: Copy the live intervals calculation over from the FS. This is a rather pessimistic calculation, since it doesn't distinguish individual channels of a vec4, or elements of an array, but should be a minimum start for register allocation. /external/mesa3d/src/mesa/drivers/dri/i965/brw_vec4.cpp