7d3a10c516eb5b1b4058bee4640c5fbb22617f5b |
|
20-Feb-2017 |
Samuel Iglesias Gonsálvez <siglesias@igalia.com> |
i965/fs: detect different bit size accesses to uniforms to push them in proper locations Previously, if we had accesses with different sizes to the same uniform, we might not push it aligned with the bigger one. This is a problem in BSW/BXT when we access an array of DF uniforms with both direct and indirect addressing because for the latter we use 32-bit MOV INDIRECT instructions. However, this problem can happen with other generations and bit sizes. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> (cherry picked from commit a497ab6838ae5a9898abfed82f7bc8295b490911)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d4caa4249c28a4941e9f6a57ea97955f5d63797f |
|
21-Feb-2017 |
Samuel Iglesias Gonsálvez <siglesias@igalia.com> |
i965/fs: mark last DF uniform array element as 64 bit live one This bug can prevent us from detecting the end of a contiguous area correctly, so that we push larger areas than the real ones. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> (cherry picked from commit 7427425247d80c9f59a3c3ad2dfeeb2429de6f67)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0d5071db5e50629a63490639a3c86dfc65bf27ab |
|
13-Jan-2017 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data. This fixes glxgears rendering, which had surprisingly been broken since late October! Specifically, commit 91d61fbf7cb61a44adcaae51ee08ad0dd6b. glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of the gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-shaded inner portion of the gears. This results in the same fragment program having two different state-dependent interpolation maps: one where gl_Color is flat, and another where it's smooth. The problem is that there's only one gen4_fragment_program, so it can't store both. Each FS compile would trash the last one. But, the FS compiles are cached, so the first one would store FLAT, and the second would see a matching program in the cache and never bother to compile one with SMOOTH. (Clearing the program cache on every draw made it render correctly.) Instead, move it to brw_wm_prog_data, where we can keep a copy for every specialization of the program. The only downside is bloating the structure a bit, but we can tighten that up a bit if we need to. This also lets us kill gen4_fragment_program entirely! Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5edc3381628d1db4468f31b1c66bb518146e35b5 |
|
09-Jan-2017 |
Kenneth Graunke <kenneth@whitecape.org> |
compiler: Merge shader_info's tcs and tes structs. Annoyingly, SPIR-V lets you specify all of these fields in either the TCS or TES, which means that we need to be able to store all of them for either shader stage. Putting them in a union won't work. Combining both is an easy solution, and given that the TCS struct only had a single field, it's pretty inexpensive. This patch renames the combined struct to "tess" to indicate that it's for tessellation in general, not one of the two stages. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c2acf97fcc9b32eaa9778771282758e5652a8ad4 |
|
16-Dec-2016 |
Juan A. Suarez Romero <jasuarez@igalia.com> |
nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes So far, inputs_read was a bitmap tracking which vertex input locations were being used. In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4) consumes just one location, like any other smaller attribute. So we mark the proper bit in inputs_read, and also the same bit in double_inputs_read if the attribute is a dvec3/dvec4. But in Vulkan, this is slightly different: a dvec3/dvec4 attribute consumes two locations, not just one. And hence two bits would be marked in inputs_read for the same vertex input attribute. To avoid handling two different situations in NIR, we just choose the latter one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4 vertex input attribute is marked with two bits in the inputs_read bitmap (and also in the double_inputs_read), and following attributes are adjusted accordingly. As an example, if in our GLSL/IR shader we have three attributes: layout(location = 0) vec3 attr0; layout(location = 1) dvec4 attr1; layout(location = 2) dvec3 attr2; then in our NIR shader we put attr0 in location 0, attr1 in locations 1 and 2, and attr2 in locations 3 and 4. Checking carefully, basically we are using slots rather than locations in NIR. When emitting the vertices, we do an inverse map to know the corresponding location for each slot. v2 (Jason): - use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR. v3 (Jason): - Fix commit log error. - Use ladder ifs and fix braces. - elements_double is divisible by 2, don't need DIV_ROUND_UP(). - Use if ladder instead of a switch. - Add comment about hardware restriction in 64bit vertex attributes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
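Note on c2acf97 (dvec3/dvec4 vertex inputs): the location-to-slot adjustment described in that commit message can be sketched roughly as below. This is only an illustration, not Mesa's code: `VertexAttr`, `SlotInfo` and `assign_slots` are hypothetical names, attributes are assumed to be sorted by location, and setting both bits in `double_inputs_read` is an assumption based on the wording of the message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a vertex input attribute (not a Mesa type).
struct VertexAttr {
    unsigned location;    // GLSL location; one per attribute in OpenGL
    bool is_dvec3_or_4;   // true for dvec3/dvec4, which take two slots
};

// Slots assigned to one attribute (hypothetical helper type).
struct SlotInfo {
    unsigned first_slot;
    unsigned num_slots;
};

// Map GL locations to slots. Every dvec3/dvec4 seen so far shifts the
// following attributes by one extra slot. Assumes attrs is sorted by
// location and that all slots fit in a 64-bit bitmap.
std::vector<SlotInfo> assign_slots(const std::vector<VertexAttr> &attrs,
                                   uint64_t &inputs_read,
                                   uint64_t &double_inputs_read)
{
    std::vector<SlotInfo> slots;
    unsigned extra = 0;   // extra slots consumed by earlier double attributes
    inputs_read = 0;
    double_inputs_read = 0;

    for (const VertexAttr &a : attrs) {
        const unsigned first = a.location + extra;
        const unsigned n = a.is_dvec3_or_4 ? 2 : 1;
        const uint64_t mask = ((uint64_t(1) << n) - 1) << first;

        inputs_read |= mask;
        if (a.is_dvec3_or_4)
            double_inputs_read |= mask;   // assumption: both bits are set

        slots.push_back({first, n});
        extra += n - 1;
    }
    return slots;
}
```

With the three attributes from the commit message (vec3 at location 0, dvec4 at location 1, dvec3 at location 2), this sketch places attr0 in slot 0, attr1 in slots 1 and 2, and attr2 in slots 3 and 4.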
e729504fb1799c3ae31cea76d73946530ef9806f |
|
14-Sep-2016 |
Timothy Arceri <timothy.arceri@collabora.com> |
nir: pass compiler rather than devinfo to functions that call nir_optimize Later we will pass compiler to nir_optimize to be used by the loop unroll pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b56fa830c6095f8226456b2aeb62f2dfad804be5 |
|
09-Dec-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fetch one cacheline of pull constants at a time. Asking the DC for less than one cacheline (4 owords) of data for uniform pull constants is suboptimal because the DC cannot request less than that from L3, resulting in wasted bandwidth and unnecessary message dispatch overhead, and exacerbating the IVB L3 serialization bug. The following table summarizes the overall framerate improvement (with statistical significance of 5% and sample size ~10) from the whole series up to this patch for several benchmarks and hardware generations:

                         |      SKL      |      BDW      |      HSW
SynMark2 OglShMapPcf     | 24.63% ±0.45% |  4.01% ±0.70% | 10.31% ±0.38%
GfxBench4 gl_manhattan31 |  5.93% ±0.35% |  3.92% ±0.31% |  6.62% ±0.22%
GfxBench4 gl_4           |  2.52% ±0.44% |  1.23% ±0.10% |  N/A
Unigine Valley           |  0.83% ±0.17% |  0.23% ±0.05% |  0.74% ±0.45%

Note that there are two versions of the Manhattan demo shipped with GfxBench4, one of them is the original gl_manhattan demo which doesn't use UBOs, so this patch will have no effect on it, and another one is the gl_manhattan31 demo based on GL 4.3/GLES 3.1, which this patch benefits as shown above. I haven't observed any statistically significant regressions in the benchmarks I have at hand. Note that the comparatively huge improvement on SKL in the OglShMapPcf test case is due to the combined effect of this patch and the register pressure benefit on SKL+ of "i965/fs: Switch to the constant cache for uniform pull constants.", part of the same series. Going up to 8 oword blocks would improve performance of pull constants even more, but at the cost of some additional bandwidth and register pressure, so it would have to be done on-demand based on the number of constants actually used by the shader. v2: Fix for Gen4 and 5. v3: Non-trivial rebase. Rework to allow the visitor to specify arbitrary pull constant block sizes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
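Note on b56fa83 (one cacheline of pull constants at a time): the idea of always fetching a whole 4-oword block can be sketched as plain address arithmetic. This is a minimal illustration, assuming 16-byte owords and a 64-byte block as stated in the commit message; `PullLoad` and `pull_load_for` are hypothetical names and not part of the driver.

```cpp
#include <cassert>
#include <cstdint>

// Sizes taken from the commit message: one cacheline is 4 owords.
constexpr uint32_t OWORD_BYTES = 16;
constexpr uint32_t BLOCK_OWORDS = 4;
constexpr uint32_t BLOCK_BYTES = OWORD_BYTES * BLOCK_OWORDS;  // 64 bytes

// Hypothetical result type: which block to fetch and where the
// requested constant sits inside that block.
struct PullLoad {
    uint32_t block_base;       // byte offset of the 64B-aligned block
    uint32_t offset_in_block;  // byte offset of the constant in the block
};

// Round the requested byte offset down to the enclosing cacheline so a
// single block read covers it, instead of issuing a 1 or 2 oword read.
constexpr PullLoad pull_load_for(uint32_t byte_offset)
{
    return { byte_offset & ~(BLOCK_BYTES - 1),
             byte_offset & (BLOCK_BYTES - 1) };
}
```

For example, a constant at byte offset 100 would be served by the block starting at byte 64, at offset 36 within it; any other constant in bytes 64 to 127 can then reuse the same fetch.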
9b22a0d295316b7547667ebbfe1e1b6182439186 |
|
09-Dec-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Expose arbitrary pull constant load sizes to the IR. Change the FS generator to ask the dataport for enough owords worth of constants to fill the execution size of the instruction -- Which means that the visitor now needs to set the execution size correctly for uniform pull constant load instructions, which we were kind of neglecting until now. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ad38ba113491869ab0dffed937f7b3dd50e8a735 |
|
26-Oct-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Switch to the constant cache for uniform pull constants. This reverts to using the oword block read messages for uniform pull constant loads, as used to be the case until 4c1fdae0a01b3f92ec03b61aac1d3df5. There are two important differences though: Now the L3 cacheability bits are set up correctly for UBOs (since 11f5d8a5d4fbb861ec161f68593e429cbd65d1cd), and we target the constant cache instead of the data cache. The latter used to get no L3 way allocation on boot on all platforms that existed at the time, so oword read messages wouldn't get cached on L3 regardless of the MOCS bits, which probably explains the apparent slowness of oword fetches. Constant cache loads seem to perform better than SIMD4x2 sampler loads in a number of cases, they alleviate some of the cache thrashing caused by the competition with textures for the L1/L2 sampler caches, and they allow fetching up to 128B worth of constants with a single oword fetch message. Note that IVB devices suffer from a hardware bug that leads to serialization of L3 read requests overlapping the same cacheline as result of a (on IVB buggy) mechanism of the L3 to preserve coherency. Since read requests for matching cachelines from any L3 client are not pipelined, throughput may decrease in cases where there are no non-overlapping requests left in the queue that can be processed between them. This situation should be relatively uncommon as long as we make sure that we don't use the 1/2 oword messages in cases where the shader intends to read from any other location of the same cacheline at some other point. This is generally a good idea anyway on all generations because using the 1 and 2 oword messages is expected to waste bandwidth since the minimum L3 request size for the DC is exactly 4 owords (i.e. one cacheline). A future commit will have this effect. I haven't been able to find any real-world example where this would still result in a regression on IVB, but if someone happens to find one it shouldn't be too difficult to add an IVB-specific check to have it fall back to the sampler cache for pull constant loads. Note that on SKL+ this change has the additional benefit of reducing the register footprint of pull constant loads. The following table summarizes the effect of the whole series on several shader-db stats:

      Total instructions             Total cycles
BWR:  4571248 -> 4568342 (-0.06%)    123375740 -> 123373296 (-0.00%)
ELK:  3989020 -> 3985402 (-0.09%)    98757068 -> 98754058 (-0.00%)
ILK:  6383591 -> 6376787 (-0.11%)    143649910 -> 143648914 (-0.00%)
SNB:  7528395 -> 7501446 (-0.36%)    103503796 -> 102460370 (-1.01%)
IVB:  6949221 -> 6943317 (-0.08%)    60592262 -> 60584422 (-0.01%)
HSW:  6409753 -> 6403702 (-0.09%)    60609070 -> 60604414 (-0.01%)
BDW:  8043467 -> 7976364 (-0.83%)    68427730 -> 68483042 (0.08%)
CHV:  8045019 -> 7977916 (-0.83%)    68297426 -> 68352756 (0.08%)
SKL:  8204037 -> 7939086 (-3.23%)    66583900 -> 65624378 (-1.44%)

      Lost->Gained   Total spills              Total fills
BWR:  5 -> 5         1488 -> 1488 (0.00%)      1957 -> 1957 (0.00%)
ELK:  5 -> 5         1489 -> 1489 (0.00%)      1958 -> 1958 (0.00%)
ILK:  1 -> 4         1449 -> 1449 (0.00%)      1921 -> 1921 (0.00%)
SNB:  0 -> 0         549 -> 549 (0.00%)        52 -> 52 (0.00%)
IVB:  13 -> 3        1271 -> 1271 (0.00%)      1162 -> 1162 (0.00%)
HSW:  11 -> 0        1271 -> 1271 (0.00%)      1162 -> 1162 (0.00%)
BDW:  12 -> 0        1340 -> 1340 (0.00%)      1452 -> 1452 (0.00%)
CHV:  12 -> 0        1340 -> 1340 (0.00%)      1452 -> 1452 (0.00%)
SKL:  0 -> 120       1269 -> 375 (-70.45%)     1563 -> 690 (-55.85%)

v3: Non-trivial rebase. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fd249c803e3ae2acb83f5e3b7152728e73228b7b |
|
12-Dec-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
treewide: s/comparitor/comparator/ git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6014da50ec41d1ad43fec94a625962ac3f2f10cb |
|
28-Nov-2016 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Rename opt_copy_propagate -> opt_copy_propagation. Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e9f17e9fb06a4389588f47be8c766b07e8d8b89f |
|
25-Nov-2016 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
i965: enable INTEL_conservative_rasterization on Gen9+ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Forbes <chrisforbes@google.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0ff74a8990d9fe37365beb35ed8abacfbf3ed567 |
|
06-Dec-2016 |
Plamena Manolova <plamena.manolova@intel.com> |
i965: Add i965 plumbing for ARB_post_depth_coverage for i965 (gen9+). This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests. Signed-off-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
faf20df143a63e58aa729446f21c38ae39a438f2 |
|
29-Nov-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Refactor handling of constant tg4 offsets Previously, we had an OFFSET_VALUE source for logical texture instructions that was intended to mean exactly what it says, "offset". In reality, we only fully used it for tg4 offsets. We used offset_value.file == IMM to mean, "you have a constant offset, go look in instr->offset" and didn't actually use the contents of the register at all in that case except for in nir_emit_texture where we used it as a temporary before we copy it into instr->offset. This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to indirect tg4 offsets only. The nir_emit_texture code is refactored so that we explicitly build a header_bits value which is placed in instr->offset and the constant offset values (both for tg4 and regular texture operations) are used to construct header_bits and don't go through the offset source at all. Finally, we stop passing offset_value in to lower_sampler_logical_send_gen5 because we can't do indirect offsets until gen7 anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b9df2251c17e3ce52fa55c81f492591e08c3ee04 |
|
25-Oct-2016 |
Anuj Phogat <anuj.phogat@gmail.com> |
i965: Fix GPU hang related to multiple render targets and alpha testing This patch should have been part of commit e592f7df. In a situation when there are multiple render targets with alpha testing enabled, if the fragment shader doesn't write to draw buffer zero, it causes a GPU hang on SKL. No GPU hang is seen on HSW. The simulator gives a warning for all gen6+ h/w: "Illegal render target write message length 0xa expected 0xc" This patch fixes the GPU hang as well as the simulator warning with the new piglit test fbo-mrt-alphatest-no-buffer-zero-write: https://patchwork.freedesktop.org/patch/118212 No regressions in Jenkins CI system. Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
91d61fbf7cb61a44adcaae51ee08ad0dd6b2a03b |
|
20-Oct-2016 |
Timothy Arceri <timothy.arceri@collabora.com> |
i965: rewrite brw_setup_vue_interpolation() Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier array in gl_fragment_program, which will allow us to remove it. This change also makes the code which is only used by gen4/5 more self-contained, as it now has its own gen5_fragment_program struct rather than storing the map in brw_context. This means the interpolation map will only get processed once and will get stored in the in-memory cache rather than being processed every time the fs changes. Also, by calling this from the fs compile code rather than from the upload code and using the interpolation assigned there, we can get rid of the BRW_NEW_INTERPOLATION_MAP flag. It might not seem ideal to add a gen5_fragment_program struct; however, by the end of this series we will have gotten rid of all the brw_{shader_stage}_program structs and replaced them with a generic brw_program struct, so there will only be two program structs, which is better than what we have now. V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following patch to fix build error. V3 - Suggestions by Jason: - name struct gen4_fragment_program rather than gen5_fragment_program - don't use enum with memset() - create interp mode set helper and simplify logic to call it - add assert when calling function to show prog will never be NULL for gen4/5 i.e. no Vulkan Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b |
|
13-Oct-2016 |
Timothy Arceri <timothy.arceri@collabora.com> |
nir/i965/anv/radv/gallium: make shader info a pointer When restoring something from the shader cache we won't have, and don't want to create, a nir_shader; this change detaches the two. There are other advantages such as being able to reuse the shader info populated by GLSL IR. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
59864e8e02057cc6fa0448a8af067a3cf53389da |
|
13-Oct-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Don't use nir_assign_var_locations for VS/TES/GS outputs. Fixes spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3. v2: Remove nir_outputs field from fs_visitor (caught by Tim and Iago). Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
27715c73ff84349466f62df0023863acd477f262 |
|
15-Oct-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Make split_virtual_grfs() call compact_virtual_grfs(). Post-splitting, VGRFs have a maximum size (MAX_VGRF_SIZE). This is required by the register allocator, as we have to create classes for each size of VGRF. We can (and do) allocate virtual registers larger than MAX_VGRF_SIZE, but we must ensure that they are splittable. split_virtual_grfs() asserts that the post-splitting register size is in range. Unfortunately, these trip for completely dead registers which are too large - we only set split points for live registers. So dead ones are never split, and if they happened to be too large, they'd trip asserts. To fix this, call compact_virtual_grfs() to eliminate dead registers before splitting. v2: Add a comment written by Iago. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e51e055fcdf8107aafaba358fa65b00f963e1728 |
|
09-Sep-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Introduce downcast helpers for prog_data structures. Similar to brw_context(...), intel_texture_object(...), and so on. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
df4ff31d3c1d907c237ed0e699deec1e24e8a9d3 |
|
04-Oct-2016 |
Timothy Arceri <timothy.arceri@collabora.com> |
i965: add MAYBE_UNUSED to assert param This fixes an unused variable warning on release builds. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
da274ba5f88ca76bb2e4369967cea381b9f219e4 |
|
09-Sep-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Drop pointless stage == MESA_SHADER_FRAGMENT checks. There's an assert right above this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e5311ba1acba738346a18ef661b0f8bbc33bba8e |
|
16-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/ir: Test thread dispatch packing assumptions. Not [originally] intended for upstream. Should cause a GPU hang if some thread is executed with a non-contiguous dispatch mask breaking assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or Piglit regressions, while replacing brw_stage_has_packed_dispatch() with a dummy implementation that unconditionally returns true on top of this patch causes multiple GPU hangs. v2: Refactor into a separate function instead of emitting the test code directly from emit_nir_code(), drop VEC4 test and clean up slightly for upstream. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f57f526fc5cfaedf26b2becf8f1899d5de0d0461 |
|
16-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch. The eliminate_find_live_channel optimization eliminates FIND_LIVE_CHANNEL instructions in cases where control flow is known to be uniform, and replaces them with 'MOV 0', which in turn unblocks subsequent elimination of the BROADCAST instruction frequently used on the result of FIND_LIVE_CHANNEL. This is however not correct in per-sample fragment shader dispatch because the PSD can dispatch a fully unlit sample under certain conditions. Disable the optimization in that case. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> v2: Add devinfo argument to brw_stage_has_packed_dispatch() to implement hardware generation check.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a2392cee48076f1fe6feab7d49214990cfa6a551 |
|
15-Sep-2016 |
Jason Ekstrand <jason@jlekstrand.net> |
i965/reg: Make brw_sr0_reg take a subnr and return a vec1 reg The state register sr0 is really a collection of dwords not a SIMD8 anything. It's much more convenient for brw_sr0_reg to return the particular dword you're looking for rather than a giant blob you have to massage into what you want. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> [ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
111f6b250d01fa1937103f24b5cb54b15dd77fbf |
|
14-Sep-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/nir: Roll set_default_interpolation into lower_fs_inputs Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
246db0063eb6e01aad961b1c73d32fca911ae1df |
|
14-Sep-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use NIR for handling forced per-sample interpolation Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
114874b22beafb2d07006b197c62d717fc7f80cc |
|
14-Sep-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode From the ARB_gpu_shader5 spec: The built-in functions interpolateAtCentroid() and interpolateAtSample() will sample variables as though they were declared with the "centroid" or "sample" qualifiers, respectively. When running with persample dispatch forced by the API, we interpolate anything that isn't flat as if it's qualified by "sample". In order to keep interpolateAtCentroid() consistent with the "centroid" qualifier, we need to make interpolateAtCentroid() do sample interpolation instead. Nothing in the GLSL spec guarantees that the result of interpolateAtCentroid is uniform across samples in any way, so this is a perfectly fine thing to do. Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS tests that specifically validate consistency between the "sample" qualifier and interpolateAtSample() Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eb746a80e5e99bafd3957a1cb2d9db8548a1a6be |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/ir: Update several stale comments. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
47784e2346b56bea6a1111fecaa953239ff198ca |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/ir: Don't print ARF subnr values twice. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ec259f5307bc801f8482f2825ca9d52fe5ead95e |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Print fs_reg::offset field consistently for all register files. The offset printing code in fs_visitor::dump_instruction() was doing things differently for sources and destinations and for each register file -- In some cases it would be added to the base register number fs_reg::nr, in other cases it would follow the base register separated with a plus sign, in other cases (uniforms) it would do both (!). The sub-register offset was also being printed or not rather inconsistently. Fix it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
80e1d670b4b4c080ce2092a3b52d2415bc4c6a42 |
|
01-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Get rid of fs_inst::set_smear(). component() was generally a better alternative because of several issues set_smear() had: - It wouldn't take the original stride and offset of the register into account, which means that set_smear() on the result of e.g. another set_smear() call or an offset() call would give a bogus region as result. - It was an inherently destructive operation. See the 'nir_intrinsic_shader_clock' hunk below for how this could lead to subtle bugs in cases where set_smear() was called multiple times on the same register like 'r.set_smear(0), r.set_smear(1)' with the expectation that each call would return a separate value instead of a reference to the same subsequently mutated object. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
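Note on 80e1d67 (removal of set_smear()): the two problems listed in that commit message, ignoring the existing region and mutating the register in place, can be illustrated with a small stand-alone sketch. The `Reg` type below is hypothetical and much simpler than the real `fs_reg`; it assumes every element is 4 bytes wide.

```cpp
#include <cassert>

// Hypothetical, simplified register region; all elements are assumed
// to be 4 bytes wide. This is not the driver's fs_reg.
struct Reg {
    unsigned offset;  // byte offset of the first element
    unsigned stride;  // element stride; 0 means "replicate one element"

    // Destructive smear in the spirit of the removed helper: it
    // overwrites the fields of *this, ignoring the previous offset and
    // stride, and hands back a reference to the same object.
    Reg &set_smear(unsigned i)
    {
        offset = i * 4;
        stride = 0;
        return *this;
    }
};

// Value-returning alternative: works on a copy and takes the existing
// offset and stride of the region into account.
inline Reg component(Reg r, unsigned i)
{
    r.offset += i * r.stride * 4;
    r.stride = 0;
    return r;
}
```

In this sketch, calling `r.set_smear(0)` and then `r.set_smear(1)` leaves both results referring to the same object with offset 4, whereas `component(s, 0)` and `component(s, 1)` return two independent values and leave `s` untouched.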
8e58e4412f97be9c3b07d7a7d72d3884606411a2 |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Use region_contained_in() in compute-to-mrf coalescing pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2d7d4a791083ff63f37ac1e40bfe8b448e7f8045 |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Simplify a bunch of fs_inst::size_written calculations by using component_size(). Using component_size() is easier and generally more correct because it takes into account the register type and stride for you. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
62aaef6c83e4eb354bd7f15803db01e90d22fc34 |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Simplify and fix buggy stride/offset calculations using subscript(). These were bashing the 'offset' and 'stride' values of several registers without taking the previous value into account, which probably didn't matter in practice for optimize_frontfacing_ternary() because the 'tmp' register already had a known region, but it would have given the wrong region as result in the other cases in lower_integer_multiplication(). subscript(..., i) is a more straightforward way to take the i-th field of a given type from each channel of a register which should give the right answer as result regardless of the original 'offset' and 'stride' parameters of the register region. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
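Note on 62aaef6 (subscript()): taking the i-th smaller field from each channel of a region, while respecting the region's existing offset and stride, might look like the sketch below. `Region`, `subscript_sketch` and `bash_fields` are hypothetical names for illustration; strides are counted in elements of the region's current type, and the real helper also deals with register files that this sketch ignores.

```cpp
#include <cassert>

// Hypothetical region: byte offset, stride in elements of the current
// type, and the size of that type in bytes.
struct Region {
    unsigned offset;
    unsigned stride;
    unsigned type_size;
};

// Select the i-th field of new_type_size bytes from every channel.
// The stride is scaled so consecutive channels stay the same number of
// bytes apart, and the field offset is added to the existing offset.
inline Region subscript_sketch(Region r, unsigned new_type_size, unsigned i)
{
    assert(new_type_size != 0 && r.type_size % new_type_size == 0);
    r.stride *= r.type_size / new_type_size;
    r.type_size = new_type_size;
    r.offset += i * new_type_size;
    return r;
}

// The pattern the commit message warns about: overwriting offset and
// stride without looking at their previous values.
inline Region bash_fields(Region r, unsigned new_type_size, unsigned i)
{
    r.stride = r.type_size / new_type_size;
    r.type_size = new_type_size;
    r.offset = i * new_type_size;
    return r;
}
```

For a freshly allocated region (offset 0, stride 1) both functions agree, which matches the remark that the bug probably did not matter for a register with a known region. For a region that already has offset 8 and stride 2, only `subscript_sketch` keeps the channels 8 bytes apart and lands at byte 10 for the upper 16-bit half.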
3b7b90878770530ad3da44c6beb1401c40f1ffd6 |
|
07-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Simplify get_fpu_lowered_simd_width() by using inequalities instead of rounding. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bae3a411712d815bf8b8d4526c72c174512086d3 |
|
08-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix signedness of the return value of fs_inst::size_read(). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a384503c156e182560104e6c43a6bf0c64608791 |
|
03-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Switch mask_relative_to() used in compute-to-mrf to byte units. This makes the helper function less annoying to use and somewhat more accurate. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
401fc228fd7214086ced0a887bbbefd2e60948fa |
|
03-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix bogus sub-MRF offset calculation in compute-to-mrf. The 'scan_inst->dst.offset % REG_SIZE' term in the final 'scan_inst->dst.offset' calculation is obviously bogus. The offset from the start of the copy destination register 'inst->dst' where the destination of the generating instruction 'scan_inst' would be written to (before compute-to-mrf runs) is just the offset of 'scan_inst->dst' relative to the source of the copy instruction (AKA rel_offset in the code below). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cd0134072a7e088cf1ebcf1c4250aa13ac8a5c59 |
|
03-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Take into account copy register offset during compute-to-mrf. This was dropping 'inst->dst.offset' on the floor. Nothing in the code above seems to guarantee that it's zero and in that case the offset of the register being coalesced into wouldn't be taken into account while rewriting the generating instruction. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b42c13a5b8ac7d643bbf4c1592607811a81b4ebb |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap(). fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly so it should generally be used instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
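Note on b42c13a (overwrites_reg() versus regions_overlap()): the difference between testing a single starting point and testing two sized regions can be sketched with byte intervals. The types and function names below are hypothetical; they only model the behaviour described in the commit message and are not the driver's implementation.

```cpp
#include <cassert>

// Hypothetical sized region inside a virtual register: register
// number, byte offset, and size in bytes.
struct ByteRegion {
    unsigned nr;
    unsigned offset;
    unsigned size;
};

// Point-style check: does the write to dst cover the byte where the
// other register starts? It has no idea how large that register is.
inline bool overwrites_start(const ByteRegion &dst,
                             unsigned nr, unsigned offset)
{
    return dst.nr == nr &&
           offset >= dst.offset &&
           offset < dst.offset + dst.size;
}

// Interval-style check: two regions of the same register overlap
// unless one ends before the other begins.
inline bool regions_overlap_sketch(const ByteRegion &a, const ByteRegion &b)
{
    return a.nr == b.nr &&
           a.offset < b.offset + b.size &&
           b.offset < a.offset + a.size;
}
```

Consider a 32-byte write at offset 32 of register 5 and a 64-byte region starting at offset 0 of the same register: the point-style check reports no conflict because the write starts after the region's first byte, while the interval check correctly reports an overlap.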
3a4ea7cf803cb5af2b7d0e7d71ee4825294a94aa |
|
03-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Don't consider LOAD_PAYLOAD with stride > 1 source to behave like a raw copy. Noticed the problem by inspection while typing in the previous commit. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1164aa1a1ba9d140a2b1435703b0029e0fe69f6f |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Don't consider LOAD_PAYLOAD with sub-GRF offset to behave like a raw copy. This was likely the original intention, and at least register coalesce relies on it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
717d8efd584d8db7fbbdbe7deb51371e28d6c492 |
|
07-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Take into account misalignment in regs_written() and regs_read(). There was a workaround for this in fs_inst::size_read() for the SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file *only*. We should take this possibility into account for the sources and destinations of all instructions on all optimization passes that need to quantize dataflow in 32B increments by adding the amount of misalignment to the size read or written from the regs_read() and regs_written() helpers respectively. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
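Adding the misalignment to the size before quantizing to 32B units can be sketched like this. It is an illustration under assumptions: `REG_SIZE` is taken to be 32 bytes as the message implies, and `regs_touched` is a hypothetical helper name rather than the driver's `regs_read()` or `regs_written()`.

```cpp
#include <cassert>

// Size of one GRF in bytes, as implied by the "32B increments" above.
constexpr unsigned REG_SIZE = 32;

// Integer division rounding up.
inline unsigned div_round_up(unsigned n, unsigned d)
{
   assert(d > 0);
   return (n + d - 1) / d;
}

// Number of whole registers touched by an access of 'size' bytes that
// starts 'offset' bytes into the register file.  The misalignment
// (offset % REG_SIZE) is added to the size before rounding up, so an
// access that straddles a register boundary counts both registers.
inline unsigned regs_touched(unsigned offset, unsigned size)
{
   return div_round_up(offset % REG_SIZE + size, REG_SIZE);
}
```

A 32-byte access at offset 16 touches two registers here, whereas rounding the size alone would report one.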
|
d6b60934aaf2d525f7d1072c0c21af8468254647 |
|
07-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Return more accurate read size for LINTERP from fs_inst::size_read. The LINTERP virtual instruction only reads three scalar components from the first 16B of the second source, we can now teach size_read() about it since its return value is represented with byte granularity. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
31a40202b8bdf8bb65d33862144a03610fd57e3f |
|
03-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Return more accurate read size from fs_inst::size_read for IMM and UNIFORM files. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e1a918ba7be6b21303caa2d81671f2d3f17dd692 |
|
08-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_inst::regs_read with ::size_read using byte units. The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
69570bbad876bb9da609c3b651aacda28cecc542 |
|
07-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes. The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
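The rewrite rule for regs_written quoted above can be sketched as a pair of helpers. The `Inst` struct and the helper names are hypothetical; only the two formulas come from the commit message.

```cpp
#include <cassert>

// Integer division rounding up.
inline unsigned div_round_up(unsigned n, unsigned d)
{
   assert(d > 0);
   return (n + d - 1) / d;
}

// Hypothetical stand-in for fs_inst: only the byte count written matters.
struct Inst {
   unsigned size_written = 0;  // bytes written by the instruction
};

// Old rvalue use 'x = i.regs_written' becomes
// 'x = DIV_ROUND_UP(i.size_written, reg_unit)'.
inline unsigned get_regs_written(const Inst &i, unsigned reg_unit)
{
   return div_round_up(i.size_written, reg_unit);
}

// Old lvalue use 'i.regs_written = x' becomes
// 'i.size_written = x * reg_unit'.
inline void set_regs_written(Inst &i, unsigned x, unsigned reg_unit)
{
   i.size_written = x * reg_unit;
}
```

Note that the round trip is lossy in one direction only: a byte count that is not a multiple of `reg_unit` rounds up to the next whole register.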
|
c458eeb94620fbce0a37474fc292545002d67f76 |
|
08-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written. This is in preparation for dropping fs_inst::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
be095e11e41158f91bcb3f6fcbc2e2a91a5d9124 |
|
02-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes. The fs_reg::subreg_offset and ::offset fields are now redundant, the sub-GRF offset can just be added to the single ::offset field expressed in byte units. The current subreg_offset value can be recovered by applying the following rule: Replace each rvalue reference of subreg_offset like 'x = r.subreg_offset' with 'x = r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset = x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
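The subreg_offset rule above can be written out as two small helpers. This is a sketch: `SubReg`, `round_down_to` and the accessor names are hypothetical, and only the two expressions are taken from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for fs_reg holding a single byte offset.
struct SubReg {
   unsigned offset = 0;  // offset in bytes
};

// Round 'value' down to a multiple of 'unit'.
inline unsigned round_down_to(unsigned value, unsigned unit)
{
   assert(unit > 0);
   return value - value % unit;
}

// Old 'x = r.subreg_offset' becomes 'x = r.offset % reg_unit'.
inline unsigned get_subreg_offset(const SubReg &r, unsigned reg_unit)
{
   assert(reg_unit > 0);
   return r.offset % reg_unit;
}

// Old 'r.subreg_offset = x' becomes
// 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x', which keeps the
// whole-register part of the offset and replaces only the remainder.
inline void set_subreg_offset(SubReg &r, unsigned x, unsigned reg_unit)
{
   r.offset = round_down_to(r.offset, reg_unit) + x;
}
```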
|
86944e063ad40cac0860bfd85a3cc4e9a9805aa3 |
|
01-Sep-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes. The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make matters worse, the unit reg_offset was expressed in was rather inconsistent: for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
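The reg_offset rewrite rule described above can be sketched as follows. The `ByteReg` struct and the accessor names are hypothetical; the two expressions are the ones quoted in the commit message, and `reg_unit` is passed in explicitly because the message notes it differed between register files and back-ends.

```cpp
#include <cassert>

// Hypothetical stand-in for fs_reg: only the byte offset matters here.
struct ByteReg {
   unsigned offset = 0;  // offset in bytes
};

// Old 'x = r.reg_offset' becomes 'x = r.offset / reg_unit'.
inline unsigned get_reg_offset(const ByteReg &r, unsigned reg_unit)
{
   assert(reg_unit > 0);
   return r.offset / reg_unit;
}

// Old 'r.reg_offset = x' becomes
// 'r.offset = r.offset % reg_unit + x * reg_unit', which preserves the
// sub-register remainder while replacing the whole-register part.
inline void set_reg_offset(ByteReg &r, unsigned x, unsigned reg_unit)
{
   assert(reg_unit > 0);
   r.offset = r.offset % reg_unit + x * reg_unit;
}
```

Starting from a byte offset of 70 with a 32-byte unit, the register index reads as 2; setting it to 5 yields a byte offset of 166, with the 6-byte remainder left untouched.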
|
175ac629be1396fb8566836e32961a22fc5cca21 |
|
08-Sep-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fail the shader compile instead of asserting when we can't spill Blorp doesn't handle spilling so we set allow_spilling to false in that case. The blorp 16x MSAA resolve shader spills in 16-wide but not 8-wide. This commit makes it so that we fail the 16-wide compile and successfully fall back to 8-wide instead of just assert-failing when trying to compile the 16-wide shader. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
527f37199929932300acc1688d8160e1f3b1d753 |
|
23-Aug-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
intel: s/brw_device_info/gen_device_info/ Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
db123df74773f458e573a9c034ee783570a3ed0f |
|
22-Jul-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Define logical framebuffer read opcode and lower it to physical reads. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f2f75b0cf05d2519d618c71b19d2187b8ed0d545 |
|
22-Jul-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Define framebuffer read virtual opcode. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fe6abb5755e0368c993e6f7cf25a0712ee6503a9 |
|
22-Jul-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use. This will be required for the next commit since the non-coherent path makes use of the fragment coordinates implicitly, so they need to be calculated. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
98d61ee083de57da6b97c9fcf67003f56f5f5a6b |
|
22-Jul-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO. The result of a framebuffer fetch from a multisample FBO is inherently per-sample, so the spec requires at least those sections of the shader that depend on the framebuffer fetch result to be executed once per sample. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b2b621a0ec57f08586b9afcf666c0eadc0993ca0 |
|
08-Aug-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Switch to per-subspan discard jumps. ANY4H is more efficient than ANY8H and ANY16H because it makes sure that whenever a whole subspan hits a discard statement it gets disabled by the EU until the end of the program, regardless of whether the discard condition is uniform across all channels of the SIMD8-16 thread. OTOH ANY8H/ANY16H would cause the rest of the program to be executed for *all* channels if only one of the channels hadn't taken the discard branch, potentially increasing the bandwidth and ALU usage of the program unnecessarily. This change more than triples the FPS of a simple micro-benchmark that discards a bunch of fragments and then does a single costly texturing operation. I've just re-verified the FPS change on HSW and SKL, but I expect all platforms from Gen6 up to get a similar benefit. Note that we could potentially be more aggressive and use the NORMAL predicate to discard individual channels, but that would need to happen post-scheduling because the scheduler currently doesn't care to reorder HALT instructions with respect to other instructions, and the NORMAL predicate would cause the results of subsequent derivative computations to become undefined -- If the scheduler didn't reorder HALT instructions it would actually be safe to switch to NORMAL because the behavior of derivative computations after a non-uniform discard statement is undefined by the GLSL spec, but that would make the optimization implemented by one of the following commits somewhat more difficult. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
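The effect of the predicate group size can be modelled with a small bitmask sketch. This is a simplified model, not the hardware's or the driver's logic: it assumes a group of channels keeps executing whenever at least one channel in it is still live, and `channels_kept_running` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Given a mask of channels that are still live (not discarded), return
// the mask of channels that keep executing when the discard jump is
// decided per group of 'group_size' channels (4 for a subspan, 8 or 16
// for the wider variants).  A group is only disabled when none of its
// channels is live.
inline uint32_t channels_kept_running(uint32_t live_mask,
                                      unsigned exec_size,
                                      unsigned group_size)
{
   assert(group_size > 0 && exec_size <= 16 && exec_size % group_size == 0);
   uint32_t running = 0;
   for (unsigned base = 0; base < exec_size; base += group_size) {
      const uint32_t group = ((1u << group_size) - 1u) << base;
      if (live_mask & group)
         running |= group;
   }
   return running;
}
```

Under this model, a SIMD16 thread with a single live channel keeps 4 channels running with per-subspan groups but all 16 with a single 16-channel group, which illustrates where the extra bandwidth and ALU work would come from.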
|
4d436c011fd9f7ebcadbaebef05090d2056e9d48 |
|
12-Aug-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Estimate maximum sampler message execution size more accurately. The current logic used to determine the execution size of sampler messages was based on special-casing several argument and opcode combinations, which unsurprisingly missed the possibility that some messages could exceed the payload size limit or not depending on the number of coordinate components present. In particular: - The TXL, TXB and TEX messages (the latter on non-FS stages only) would attempt to use SIMD16 on Gen7+ hardware even if a shadow reference was present and the texture was a cubemap array, causing it to overflow the maximum supported sampler payload size and crash. - The TG4_OFFSET message with shadow comparison was falling back to SIMD8 regardless of the number of coordinate components, which is unnecessary when two coordinates or less are present. Both cases have been handled incorrectly ever since cubemap arrays and texture gather were respectively enabled (the current logic used by the SIMD lowering pass is almost unchanged from the previous no16 fall-back logic used pre-SIMD lowering times). Fixes the following GL4.5 conformance test on Gen7-8 (the bug also affects Gen9+ in principle, but SKL passes the test by luck because it manages to use the TXL_LZ message instead of TXL): GL45-CTS.texture_cube_map_array.sampling Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
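Choosing the execution size from the number of payload components, rather than from opcode special cases, can be sketched as below. The limit of 11 payload registers is an assumption made for illustration, the sketch ignores any message header, and `max_sampler_exec_size` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Pick the widest execution size whose payload still fits.  In SIMD16
// every payload component occupies two registers, in SIMD8 only one, so
// SIMD16 is only possible when twice the component count fits within
// the limit.  'max_payload_regs' defaults to an assumed limit of 11.
inline unsigned max_sampler_exec_size(unsigned exec_size,
                                      unsigned num_payload_components,
                                      unsigned max_payload_regs = 11)
{
   assert(exec_size == 8 || exec_size == 16);
   const unsigned widest =
      num_payload_components > max_payload_regs / 2 ? 8u : 16u;
   return std::min(exec_size, widest);
}
```

Under these assumptions a message with six components (for example four coordinates, a shadow reference and an LOD) drops to SIMD8, while one with five or fewer can stay at SIMD16.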
|
61a02fb74c07d574b726a8b27517a02251aa4be4 |
|
13-Aug-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Return zero from fs_inst::components_read for non-present sources. This makes it easier for the caller to find out how many scalar components are actually read by the instruction. As a bonus we no longer need to special-case BAD_FILE in the implementation of fs_inst::regs_read. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0c754d1c4203d87dbb9d2dd882ef42686e6d01ec |
|
12-Aug-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower TEX to TXL during NIR translation. This simplifies the code slightly and will allow the SIMD lowering pass to find out easily what the actual texturing opcode is in order to determine the maximum execution size of texturing instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cec377eed3ab6420679dceef98ad0eea27b5f644 |
|
01-Aug-2016 |
Timothy Arceri <timothy.arceri@collabora.com> |
i965: fix comparison warning Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ebdc82d06532f992aea592265c29a11330e698fa |
|
26-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix move_interpolation_to_top() pass. The pass I introduced in commit a2dc11a7818c04d8dc0324e8fcba98d60bae was entirely broken. A missing "break" made the load_interpolated_input case always fall through to "default" and hit a "continue", making it not actually move any load_interpolated_input intrinsics at all. It would only move the simple load_barycentric_* intrinsics, which don't emit any code anyway, making it basically useless. The initial version I sent of the pass worked, but I apparently failed to verify that the simplified version in v2 actually worked. With the obvious fix applied (so we actually tried to move load_interpolated_input intrinsics), I discovered a second bug: we weren't moving the offset SSA def to the top, breaking SSA validation. The new version of the pass actually moves load_interpolated_input intrinsics and all their dependencies, as intended. Papers over GPU hangs on Ivybridge and Baytrail caused by the recent NIR FS input rework by restoring the old behavior. (I'm not honestly sure why they hang with PLN not at the top.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2db357e4c3dcb49deabae7b68721d57ad9ea0000 |
|
21-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Include VUE handles for GS with invocations > 1. We always resort to the pull model for instanced GS inputs. So, we'd better include the VUE handles, or else we can't actually pull anything. Ian reports that on his branch with OES_geometry_shader enabled, this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests:: - instanced.draw_2_instances_geometry_2_invocations - instanced.draw_2_instances_geometry_8_invocations - instanced.draw_4_instances_geometry_2_invocations - instanced.draw_4_instances_geometry_8_invocations - instanced.draw_8_instances_geometry_2_invocations - instanced.draw_8_instances_geometry_8_invocations - instanced.geometry_2_invocations - instanced.geometry_32_invocations - instanced.geometry_8_invocations - instanced.geometry_max_invocations - instanced.geometry_output_different_2_invocations - instanced.geometry_output_different_32_invocations - instanced.geometry_output_different_8_invocations - instanced.geometry_output_different_max_invocations - instanced.invocation_output_vary_by_attribute - instanced.invocation_output_vary_by_texture - instanced.invocation_output_vary_by_uniform - query.primitives_generated_instanced Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
09e46f99ad465ab253de3fc321f39062cfbe1984 |
|
19-Jul-2016 |
Timothy Arceri <timothy.arceri@collabora.com> |
i965: bring back type_size_vec4_times_4() We will use this for output varyings. To make component packing simpler we will just treat all varyings as vec4s. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
160820995210e0b85fd25821f5ae785d6a539e08 |
|
16-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode. We no longer use this message. As far as I can tell, it's fairly useless - the equivalent information is provided in the payload. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1eef0b73aa323d94d5a080cd1efa81ccacdbd0d2 |
|
12-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Rewrite FS input handling to use the new NIR intrinsics. This eliminates the need to walk the list of input variables, recurse into their types (via logic largely redundant with nir_lower_io), and interpolate all possible inputs up front. The backend no longer has to care about variables at all, which eliminates complications from trying to pack multiple variables into the same location. Instead, each intrinsic specifies exactly what's needed. This should unblock Timothy's work on GL_ARB_enhanced_layouts. Each load_interpolated_input intrinsic corresponds to PLN instructions, while load_barycentric_at_* intrinsics correspond to pixel interpolator messages. The pixel/centroid/sample barycentric intrinsics simply refer to payload fields (delta_xy[]), and don't actually generate any code. Because we use a single intrinsic for both centroid-qualified variables and interpolateAtCentroid(), they become indistinguishable. We stop sending pixel interpolator messages for those, and instead use the payload provided data, which should be considerably faster. On Broadwell: total instructions in shared programs: 9067751 -> 9067570 (-0.00%) instructions in affected programs: 145902 -> 145721 (-0.12%) helped: 422 HURT: 209 total spills in shared programs: 2849 -> 2899 (1.76%) spills in affected programs: 760 -> 810 (6.58%) helped: 0 HURT: 10 total fills in shared programs: 3910 -> 3950 (1.02%) fills in affected programs: 617 -> 657 (6.48%) helped: 0 HURT: 10 LOST: 3 GAINED: 3 The differences mostly appear to be slight changes in MOVs. v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics flag rather than passing it directly to nir_lower_io. Use the unreachable() macro rather than assert in one place. (Review feedback from Chris Forbes.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a2dc11a7818c04d8dc0324e8fcba98d60baea529 |
|
18-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move load_interpolated_input/barycentric_* intrinsics to the top. Currently, i965 interpolates all FS inputs at the top of the program. This has advantages and disadvantages, but I'd like to keep that policy while reworking this code. We can consider changing it independently. The next patch will make the compiler generate PLN instructions "on the fly", when it encounters an input load intrinsic, rather than doing it for all inputs at the start of the program. To emulate this behavior, we introduce an ugly pass to move all NIR load_interpolated_input and payload-based (not interpolator message) load_barycentric_* intrinsics to the shader's start block. This helps avoid regressions in shader-db for cases such as: if (...) { ...load some input... } else { ...load that same input... } which CSE can't handle, because there's no dominance relationship between the two loads. Because the start block dominates all others, we can CSE all inputs and emit PLNs exactly once, as we did before. Ideally, global value numbering would eliminate these redundant loads, while not forcing them all the way to the start block. When that lands, we should consider dropping this hacky pass. Again, this pass currently does nothing, as i965 doesn't generate these intrinsics yet. But it will shortly, and I figured I'd separate this code as it's relatively self-contained. v2: Dramatically simplify pass - instead of creating new instructions, just remove/re-insert their list nodes (suggested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
048a56c1fc8f66e74645cc5ff4b4eb3d5ee471a8 |
|
18-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add a pass to demote sample interpolation intrinsics. When working with a non-multisampled render target, asking for "sample" interpolation locations doesn't make sense. We demote them to centroid. In a couple of patches, brw_compute_barycentric_modes will begin looking at these intrinsics to determine the barycentric modes. fs_visitor also will use them to code-generate pixel interpolator messages or payload references. Handling the "but what if it's not MSAA?" logic ahead of time in a NIR pass simplifies things and prevents duplicated logic. This patch doesn't actually do anything useful yet as we don't generate these intrinsics. I decided to keep it separate as it's self-contained, in the hopes of shrinking the "convert everything" patch for reviewers. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d7a47a76e0c6ccd1765f4c10c390e7d4f5f86414 |
|
28-Jun-2016 |
Ian Romanick <ian.d.romanick@intel.com> |
i965: Update assertion to account for Gen < 7 Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the assertion assumed the Gen7+ accumulator rules. A future patch will allow this instruction on at least Gen6, so update the assertion. v2: Use get_lowered_simd_width instead of open coding it. Suggested by Curro. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7ef7738a61ded5632105b8de6f8141307592e20a |
|
15-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Write gl_FragCoord directly to the destination. This patch makes emit_general_interpolation take a destination register as an argument, and write directly to that. This is simpler than the old approach of ralloc'ing a register, writing to that temporary, and then making the caller emit per-component MOVs to copy it to the actual destination. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a03812c32188f6d29d386165ca02771fe0865352 |
|
15-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Drop has_pln checks in unlit centroid workaround. The unlit centroid workaround starts being necessary on Gen6, which is the first platform with multisampling. PLN exists on G45+, so all platforms which need this workaround have PLN. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b94890c19fa82003a03f960d9c3de091756233ac |
|
14-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Drop VARYING_SLOT_FACE special case in barycentric setup. glsl_to_nir always produces a system value for gl_FrontFacing, rather than an input. So there should never be an input with this slot, making this code dead. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ac1181ffbef5250cb3b651e047cce5116727c34c |
|
07-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*. Likewise, rename the enum type to glsl_interp_mode. Beyond the GLSL front-end, talking about "interpolation modes" seems more natural than "interpolation qualifiers" - in the IR, we're removed from how exactly the source language specifies how to interpolate an input. Also, SPIR-V calls these "decorations" rather than "qualifiers". Generated by: $ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \ -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \ -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \; Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f05770121fb165b28b06af9c502dd21300dee530 |
|
12-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove the emit_linterp() helper. Rather than computing the barycentric mode each time we emit a LINTERP, we can simply compute it once, as soon as we know we're doing non-flat interpolation. At that point, emit_linterp() doesn't do much, so fold it into the call sites and drop it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
203243f5ffe438c7f7b5f92d8bc177b76880bf5b |
|
12-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Reduce the number of fs_reg(brw_reg) calls in LINTERP handling. A bit tidier. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eefbbb943e81b182a1c5ef6cac8425686f5b636c |
|
12-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Make a barycentric_mode() helper function. This combines two copies of basically the same code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
783511e605160bcfc9132b6fbc83c8816262effd |
|
12-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Rename brw_wm_barycentric_interp_mode to brw_barycentric_mode. brw_wm_barycentric_interp_mode is wordy, brw_barycentric_mode is less typing and suffers from fewer line wrapping problems. The enum values themselves don't really benefit from "WM" in the name, either. Put "BARYCENTRIC" first instead of at the end and drop "WM". Generated by: for file in *.c *.cpp *.h; do sed -i \ -e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \ -e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \ -e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \ $file; done with a few whitespace changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2d6dd30a9b30cbbd12a32122249dbd0963209bf1 |
|
07-Jul-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Handle default interpolation modes and locations in NIR. This consolidates a bunch of hacks in a single place - by setting the interpolation modes and locations on variables appropriately, we can simply trust them in the rest of the code. This avoids having to handle INTERP_QUALIFIER_NONE, gl_Color overrides, sample-shading overrides, and Gen4-5 centroid-overrides in a bunch of places. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a2bd7334ed4faba5fc1cf3cad7e119f560c2c904 |
|
08-Jul-2016 |
Samuel Iglesias Gonsálvez <siglesias@igalia.com> |
i965/fs: do d2x lowering before simd splitting So that we can have gen7 split large writes produced by this lowering pass. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
376d7ee5874615c8e4208de3e70983a002617e26 |
|
01-Apr-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: do pack lowering before simd splitting So that we can have gen7 split large writes produced by the pack lowering. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
aa4796ae815f38ff44283476f3553edc06114e80 |
|
30-Mar-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs/gen7: split instructions that run into exec masking bugs In fp64 we can produce code like this: mov(16) vgrf2<2>:UD, vgrf3<2>:UD which our SIMD lowering pass would typically split into instructions with a width of 8, writing to two consecutive registers each. Unfortunately, gen7 hardware has a bug affecting execution masking and as a result, the second GRF register write won't work properly. Curro verified this: "The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1 (where QtrCtrl is the 8-bit quarter of the execution mask signals specified in the instruction control fields) for the second compressed half of any single-precision instruction (for double-precision instructions it's hardwired to use NibCtrl+1, at least on HSW), which means that the EU will apply the wrong execution controls for the second sequential GRF write if the number of channels per GRF is not exactly eight in single-precision mode (or four in double-float mode)." In practice, this means that we cannot write more than one consecutive GRF in a single instruction if the number of channels per GRF is not exactly eight in single-precision mode (or four in double-float mode). This patch makes our SIMD lowering pass split this kind of instruction so that the split versions only write to a single register. In the example above this means that we split the write into 4 instructions, each one writing 4 UD elements (width = 4) to a single register. v2 (Curro): - Make explicit that the thing about hardwiring NibCtrl+1 for the second compressed half is known to happen in Haswell and the issue with IVB might not be exactly the same. - Assign max_width instead of returning early so that we can handle multiple restrictions affecting the same instruction. - Avoid division by 0 if the instruction does not write any registers. - Ignore instructions that have WE_all set. - Use the instruction execution type size instead of the dst type size.
v3 (Curro): - Move the implementation down so it is not placed in the middle of another workaround. - Declare channels_per_grf as const. - Don't break the loop early if we find a BAD_FILE source. - Fix the number of channels that the hardware shifts for the second half of a compressed instruction to be 8 in single precision and 4 in double precision. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
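The arithmetic behind the split in the entry above can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper names `max_single_grf_width` and `split_count` are hypothetical, a 32-byte GRF is assumed, and the sketch only models instructions that are actually subject to the restriction (it does not check WE_all or whether more than one register is written).

```cpp
#include <algorithm>
#include <cassert>

// Assumed size of one GRF register in bytes.
const unsigned GRF_SIZE_BYTES = 32;

// Hypothetical helper: widest execution size for which the destination of a
// restricted instruction stays inside a single GRF. `dst_stride` is the
// destination stride in elements, `exec_type_size` the execution type size
// in bytes.
inline unsigned max_single_grf_width(unsigned exec_size,
                                     unsigned dst_stride,
                                     unsigned exec_type_size)
{
    assert(dst_stride > 0 && exec_type_size > 0);
    const unsigned channels_per_grf =
        std::max(1u, GRF_SIZE_BYTES / (dst_stride * exec_type_size));
    return std::min(exec_size, channels_per_grf);
}

// Hypothetical helper: number of instructions the original is split into.
inline unsigned split_count(unsigned exec_size, unsigned max_width)
{
    assert(max_width > 0);
    return (exec_size + max_width - 1) / max_width;
}
```

For the `mov(16) vgrf2<2>:UD` example, the stride is 2 and the type size is 4 bytes, so `max_single_grf_width(16, 2, 4)` gives a width of 4 and `split_count(16, 4)` gives 4 instructions, which agrees with the numbers in the commit message.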
|
034bd2532775a1f7da5379a523621458e273f619 |
|
26-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Print EOT in fs_visitor::dump_instruction(). This was useful when debugging the previous commit's issue. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
192813e50ee8888a9012f5adce3003d0ca2aee22 |
|
23-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Delete send-from-GRF only opcodes from implied_mrf_writes(). These only exist post-Sandybridge, and always use send-from-GRF. So inst->base_mrf will be -1, and we will have already returned 0. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
255cff76d961e56199acab2ab523140e43ea2de2 |
|
23-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Drop unnecessary inst->base_mrf = -1 assignments. These are now unnecessary, as base_mrf is -1 by default. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3e04e3758e90b2a65eaefb95155d43605f506961 |
|
23-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Set fs_inst::base_mrf = -1 by default. On MRF platforms, we need to set base_mrf to the first MRF value we'd like to use for the message. On send-from-GRF platforms, we set it to -1 to indicate that the operation doesn't use MRFs. As MRF platforms are becoming increasingly a thing of the past, we've forgotten to bother with this. It makes more sense to set it to -1 by default, so we don't have to think about it for new code. I searched the code for every instance of 'mlen =' in brw_fs*cpp, and it appears that all MRF-based messages correctly program a base_mrf. Forgetting to set base_mrf = -1 can confuse the register allocator, causing it to think we have a large fake-MRF region. This ends up moving the send-with-EOT registers earlier, sometimes even out of the g112-g127 range, which is illegal. For example, this fixes illegal sends in Piglit's arb_gpu_shader_fp64-layout-std430-fp64-shader, which had SSBO messages with mlen > 0 but base_mrf == 0. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
60a27ad122128145d28be37e9c0b0bc86a8e5181 |
|
23-Jun-2016 |
Giuseppe Bilotta <giuseppe.bilotta@gmail.com> |
Remove wrongly repeated words in comments Clean up misrepetitions ('if if', 'the the' etc) found throughout the comments. This has been done manually, after grepping case-insensitively for duplicate if, is, the, then, do, for, an, plus a few other typos corrected in fly-by v2: * proper commit message and non-joke title; * replace two 'as is' followed by 'is' to 'as-is'. v3: * 'a integer' => 'an integer' and similar (originally spotted by Jason Ekstrand, I fixed a few other similar ones while at it) Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0195299c868ec99bc6c595c641da81bb2632252e |
|
07-Jun-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use a default Y coordinate of 0 for TXF on gen9+ Previously, we were incrementing length but not actually putting anything in the Y coordinate. This meant that 1-D TXF operations had a garbage array index. If the surface is emitted as 1-D non-array, the coordinate gets discarded and it works fine. If it happens to be bound as an array surface, it may count as an out-of-bounds array access and you get zero. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
40013c50333caf7a4a66204ac29695aad0d9b06d |
|
14-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Reorganize prog_data->total_scratch code a bit. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cd89c834a8b3b4e5f5874c8e1f90c9b01d541181 |
|
09-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix multiplication of immediates on Cherryview/Broxton. Cherryview and Broxton don't support DW x DW multiplication. We have piles of code to handle this, but apparently weren't retyping in the immediate case. For example, tests/spec/arb_tessellation_shader/execution/dvec3-vs-tcs-tes makes the simulator angry about instructions such as: mul(8) r18<1>:D r10.0<8;8,1>:D 0x00000003:D Just retype to W or UW. It should be safe on all platforms. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a8a9d1bf41c00123cefb6e757f3509c62e880a15 |
|
14-Jun-2016 |
Timothy Arceri <timothy.arceri@collabora.com> |
i965: remove type_size_vec4_times_4() type_size_vec4_times_4() was introduced as a fix in 8dcf807cb43383 however since 3810c1561 we can just use type_size_scalar() and get the actual number of outputs we need. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bd9f9726519fad94e88b9266b0c255aa00251f4d |
|
11-Jun-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix regs_written for SIMD-lowered instructions some more. ISTR having suggested this during review of the recent FP64 changes to the SIMD lowering pass, but it doesn't look like it was taken into account in the end. Using the fs_reg::component_size helper instead of this open-coded variant makes sure that the stride is taken into account correctly. Fixes at least the following piglit tests with spilling forced on (since otherwise regs_written would be calculated incorrectly and the spilling code would be rather confused about how much data needs to be spilled): spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1db37ebecf5af55215ace3801f8dbb8b10c5305e |
|
10-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Assert that the scratch spaces are in range. I don't know that anything actually guarantees this, but if we exceed the limits, we may end up overflowing and trashing random buffers that happen to be nearby in the VMA space, leading to rendering corruption, hangs, or worse. We should really fix this properly. However, the pitfall has existed for ages, so for now we should at least detect it. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a42a93dc123163f84058f3886e5ce1b02b9856f5 |
|
10-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix CS scratch size calculations on Ivybridge and Baytrail. These are linear, not powers of two, and much more limited. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
147a90d82a5de637f968e0d5f383cabcb792f1ce |
|
10-Jun-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix Haswell CS per-thread scratch space encoding. Most scratch stages use power of two sizes, in kilobytes, where 0 means 1kB. But compute shaders on Haswell have a minimum of 2kB, and use a representation where 0 = 2kB. This meant that we were effectively telling the hardware to allocate each thread twice as much space as we meant to, while simultaneously not allocating that much space in the buffer, leading to overflows. Note that the existing code is completely wrong for Ivybridge, but that will take additional work to sort out, so I've left it as is for now. A subsequent commit will take care of that. Together with the previous patches, this fixes rendering corruption on Synmark's Gl43CSDof on Haswell. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
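The encoding mismatch described in the entry above can be illustrated with a small sketch. It assumes only what the commit message states: sizes are powers of two, field value 0 means the minimum size, and each increment doubles it. The function names and the `min_kb` parameter are hypothetical and are not taken from the driver.

```cpp
#include <cassert>

// Round up to the next power of two (returns 1 for v <= 1).
inline unsigned next_pow2(unsigned v)
{
    unsigned p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

// log2 of a value that is already a power of two.
inline unsigned log2_pow2(unsigned v)
{
    unsigned n = 0;
    while (v > 1) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Hypothetical encoder: field value 0 means `min_kb` kilobytes, and each
// increment doubles the per-thread scratch size. `min_kb` is assumed to be
// a power of two (1 for most stages, 2 for Haswell compute per the commit).
inline unsigned encode_scratch(unsigned bytes, unsigned min_kb)
{
    assert(min_kb > 0);
    const unsigned min_bytes = min_kb * 1024;
    const unsigned size = next_pow2(bytes < min_bytes ? min_bytes : bytes);
    return log2_pow2(size / min_bytes);
}
```

With 4kB of scratch, the 1kB-based encoding yields 2, while the 2kB-based encoding yields 1. Programming 2 into a field whose zero point is 2kB would request 8kB per thread, which is the "twice as much space" overflow the message describes.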
|
cb30727648fea301cfff1647d947bfab540c3bf6 |
|
26-May-2016 |
Samuel Iglesias Gonsálvez <siglesias@igalia.com> |
i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings Data starts at suboffset 3 in 32-bit units (12 bytes), so it is not 64-bit aligned and the current implementation fails to read the data properly. Instead, when there is a double input varying, read it as vector of floats with twice the number of components. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
060c8d245deb83aeb412de98810cad6052aafb78 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Reindent emit_zip(). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7aa76d66a1f5edad9e8c1d54aafdce99ffa6c345 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Skip SIMD lowering destination zipping if possible. Skipping the temporary allocation and copy instructions is easy (just return dst), but the conditions used to find out whether the copy can be optimized out safely without breaking the program are rather complex: The destination must be exactly one component of at most the execution width of the lowered instruction, and all source regions of the instruction must be either fully disjoint from the destination or be aligned with it group by group. v2: Don't handle partial source-destination overlap for simplicity (Jason). No instruction count regressions with respect to v1 in either shader-db or the few FP64 shader_runner test-cases with partial overlap I've checked manually. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0a3acff5b53d409181dcd2f31a4a50af06f73a57 |
|
23-May-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Remove old CS local ID handling The old method pushed data for each channel's uvec3 data of gl_LocalInvocationID. The new method pushes 1 dword of data that is a 'thread local ID' value. Based on that value, we can generate gl_LocalInvocationIndex and gl_LocalInvocationID with some calculations. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b1f22c6317940dac543e44dd638ea9f4fbcd6ca7 |
|
01-Jun-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Enable cross-thread constants and compact local IDs for hsw+ The cross thread constant support appears on Haswell. It allows us to upload a set of uniform data for all threads without duplicating it per thread. One complication is that cross-thread constants are loaded into registers before per-thread constants. Previously, our local IDs were loaded before the uniform data and treated as 'payload' data, even though they were actually pushed into the registers like the other uniform data. Therefore, in this patch we simultaneously enable a newer layout where each thread now uses a single uniform slot for a unique local ID for the thread. This uniform is handled specially to make sure it is added last into the uniform push constant registers. This minimizes our usage of push constant registers, and maximizes our ability to use cross-thread constants for registers. To swap from the old to the new layout, we also need to flip some lowering pass switches to let our driver handle the lowering instead. We also no longer force thread_local_id_index to -1. v4: * Minimize size of patch that switches from the old local ID layout to the new layout (Jason) Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d437798ace47e47dbcb1244734dc1af3ecb5ab84 |
|
23-May-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Add CS push constant info to brw_cs_prog_data We need information about push constants in a few places for the GL driver, and another couple places for the vulkan driver. When we add support for uploading both a common (cross-thread) set of push constants, combined with the previous per-thread push constant data, things are going to get even more complicated. To simplify things, we add push constant info into the cs prog_data struct. The cross-thread constant support is added as of Haswell. To support it we need to make sure all push constants with uniform values are added to earlier registers. The register that varies per thread and holds the thread invocation's unique local ID needs to be added last. For now we add the code that would calculate cross-thread constant information for hsw+, but we force it (cross_thread_supported) off until the other parts of the driver support it. v4: * Support older local ID push constant layout as well. (Jason) Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1b79e7ebbd77a7e714fafadd91459059aacf2407 |
|
26-May-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Store number of threads in brw_cs_prog_data Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3ef0957dac11edee7babc9746ec766dcb055d909 |
|
22-May-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Add nir based intrinsic lowering and thread ID uniform We add a lowering pass for nir intrinsics. This pass can replace nir intrinsics with driver specific nir lower code. We lower the gl_LocalInvocationIndex intrinsic based on a uniform which is loaded with a thread specific ID. We also lower the gl_LocalInvocationID based on gl_LocalInvocationIndex. v2: * Create variable during lowering pass. (Ken) v3: * Don't create a variable, but instead just insert an intrinsic call to load a uniform from the allocated location. (Jason) v4: * Don't run this pass if thread_local_id_index < 0 Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
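The calculation that recovers gl_LocalInvocationID from gl_LocalInvocationIndex, as mentioned in the entry above, can be sketched like this. The decomposition follows the GLSL definition of gl_LocalInvocationIndex (z * size_x * size_y + y * size_x + x). The first helper rests on an assumption that is not stated in the log: that the pushed per-thread value is the index of the thread's first channel. All names here are hypothetical.

```cpp
#include <array>
#include <cassert>

// Assumption: the per-thread uniform holds the invocation index of the
// thread's first channel, so adding the channel number gives the index.
inline unsigned local_invocation_index(unsigned thread_local_id,
                                       unsigned channel)
{
    return thread_local_id + channel;
}

// Invert index = z * size_x * size_y + y * size_x + x for a workgroup of
// size_x by size_y by size_z invocations (size_z is not needed here).
inline std::array<unsigned, 3> local_invocation_id(unsigned index,
                                                   unsigned size_x,
                                                   unsigned size_y)
{
    assert(size_x > 0 && size_y > 0);
    return std::array<unsigned, 3>{{ index % size_x,
                                     (index / size_x) % size_y,
                                     index / (size_x * size_y) }};
}
```

For a 4x2x2 workgroup, index 13 maps to (1, 1, 1), since 1 * 8 + 1 * 4 + 1 = 13.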
|
04fc72501a90af94b0b5699e57fea68ad6e8795b |
|
23-May-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Put CS local thread ID uniform in last push register This thread ID uniform will be used to compute the gl_LocalInvocationIndex and gl_LocalInvocationID values. It is important for this uniform to be added in the last push constant register. fs_visitor::assign_constant_locations is updated to make sure this happens. The reason this is important is that the cross-thread push constant registers are loaded first, and the per-thread push constant registers are loaded after that. (Broadwell adds another push constant upload mechanism which reverses this order, but we are ignoring this for now.) v2: * Add variable in intrinsics lowering pass * Make sure the ID is pushed last in assign_constant_locations, and that we save a spot for the ID in the push constants v3: * Simplify code based on Jason's suggestions. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fa279dfbf0fc89b07007141ad8850ac42206e397 |
|
29-May-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Add uniform for a CS thread local base ID v4: * Force thread_local_id_index to -1 for now, and have fs_visitor::setup_cs_payload look at thread_local_id_index. This enables us to more easily cut over from the old local ID layout to the new layout, as suggested by Jason. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1205999c229b8e67af39fb9875bd87bc0a1404eb |
|
02-Jun-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Copy the offset when lowering logical pull constant sends This fixes 64 Vulkan CTS tests per gen. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299 Reviewed-by: Francisco Jerez <currojerez@riseup.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
303ec22ed6124f7860de3856599ab4f02808b84b |
|
25-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4fe4f6e8a776acc60633809693e4135f5c894aa3 |
|
28-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes. Which requires using a bitset instead of a boolean flag to keep track of the GRFs we've seen a generating instruction for already. The search loop continues until all instructions initializing the value of the source VGRF have been found, or it is determined that coalescing is not possible. Fixes a few piglit test cases on Gen4-6 which were regressed by 6956015aa514f2d06d0e4b33bfe6bca83142fbf0 due to the different (yet perfectly valid) ordering in which copy instructions are emitted now by the simd lowering pass, which had the side effect of causing this optimization pass to start corrupting the program in cases where a VGRF-to-MRF copy instruction would be eliminated but only the last instruction writing to the source VGRF region would be rewritten to point to the target MRF. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
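The bitset bookkeeping described in the entry above can be sketched roughly as follows. This is an illustration of the idea only, not the pass itself: `Write`, `fully_initialized`, and the 32-register limit are hypothetical, and the real pass has further conditions under which coalescing is abandoned.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one instruction writing part of a VGRF.
struct Write {
    unsigned first_reg; // first GRF-sized register written within the VGRF
    unsigned num_regs;  // number of consecutive registers written
};

// Walk the writes in reverse program order and track, one bit per register,
// which parts of the VGRF have a generating instruction. The search can stop
// as soon as every register is covered. Returns false if it never is.
inline bool fully_initialized(const std::vector<Write> &writes_backwards,
                              unsigned regs_in_vgrf)
{
    assert(regs_in_vgrf > 0 && regs_in_vgrf <= 32);
    const unsigned needed =
        regs_in_vgrf == 32 ? ~0u : (1u << regs_in_vgrf) - 1;
    unsigned seen = 0;
    for (const Write &w : writes_backwards) {
        for (unsigned r = w.first_reg;
             r < w.first_reg + w.num_regs && r < regs_in_vgrf; ++r)
            seen |= 1u << r;
        if (seen == needed)
            return true;
    }
    return false;
}
```

A single boolean flag would report success after the first matching write, which corresponds to the failure mode in the message where only the last instruction writing the source region was rewritten.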
|
1898673f586b9110fb2a3125e2781cbb1d795c73 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Teach compute_to_mrf() about the COMPR4 address transformation. This will be required to correctly transform the destination of 8-wide instructions that write a single GRF of a VGRF to MRF copy marked COMPR4. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
485fbaff03f7d281ff4f22bd6321548512783799 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops. This will allow compute_to_mrf to handle cases where the source of the VGRF-to-MRF copy is initialized by more than one instruction. In such cases we cannot rewrite the destination of any of the generating instructions until it's known whether the whole VGRF source region can be coalesced into the destination MRF, which will imply continuing the search until all generating instructions have been found or it has been determined that the VGRF and MRF registers cannot be coalesced. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4b0ec9f4759bab68b51e2f410e9305e39c1e1e7f |
|
28-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix compute-to-mrf VGRF region coverage condition. Compute-to-mrf was checking whether the destination of scan_inst is more than one component (making assumptions about the instruction data type) in order to find out whether the result is being fully copied into the MRF destination, which is rather inaccurate in cases where a single-component instruction is only partially contained in the source region, or when the execution size of the copy and scan_inst instructions differ. Instead check whether the destination region of the instruction is really contained within the bounds of the source region of the copy. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bb61e24787952a4796a687a86200a05cf83af7e9 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap(). Compute-to-mrf was being rather heavy-handed about checking whether instruction source or destination regions interfere with the copy instruction, which could conceivably lead to program miscompilation. Fix it by using regions_overlap() instead of the open-coded and dubiously correct overlap checks. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
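The kind of check that replaces the open-coded tests in the entry above is a plain interval intersection. The sketch below is a generic version under simplifying assumptions: `Region` and `byte_regions_overlap` are hypothetical names, regions are treated as contiguous byte ranges, and the actual `regions_overlap()` helper in the driver may differ in signature and detail.

```cpp
#include <cassert>

// Hypothetical, simplified register region: a byte range inside one
// register of a given file.
struct Region {
    unsigned file;   // register file identifier
    unsigned nr;     // register number within the file
    unsigned offset; // start of the region in bytes
    unsigned size;   // length of the region in bytes
};

// Two regions overlap when they name the same register and their half-open
// byte ranges [offset, offset + size) intersect.
inline bool byte_regions_overlap(const Region &a, const Region &b)
{
    return a.file == b.file && a.nr == b.nr &&
           a.offset < b.offset + b.size &&
           b.offset < a.offset + a.size;
}
```

Adjacent ranges such as bytes 0..31 and 32..63 of the same register do not overlap under this test, so an instruction touching only the second range would not be treated as interfering with a copy of the first.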
|
4decc426c26a86beb76dc48658ce175d051464c2 |
|
25-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Skip gen4 pre/post-send dependency workarounds for the first/last block. We know that there cannot be any destination dependency race if we reach the beginning or end of the program without having found any other instruction the send could possibly race with. This avoids emitting a pile of useless moves at the beginning or end of the program in the most common case in which the program has a single basic block only. On the original i965 I get the following shader-db results: total instructions in shared programs: 3354165 -> 3215637 (-4.13%) instructions in affected programs: 3183065 -> 3044537 (-4.35%) helped: 13498 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
daf4a71883bffcedaf27ff046a1ddd4af9d41f7f |
|
29-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Skip SIMD lowering source unzipping for regular scalar regions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6956015aa514f2d06d0e4b33bfe6bca83142fbf0 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass. Just to make sure we keep the SIMD lowering pass tidy when we introduce additional logic to try to optimize out the copy instructions used to zip and unzip the destination and source regions into multiple packed regions of the lowered instruction width. Shouldn't cause any functional changes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a5b4f63c1593cdcbc253cce2838c85b2fd796dac |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement opt_sampler_eot() in terms of logical sends. This makes the whole LOAD_PAYLOAD munging unnecessary which simplifies the code and will allow the optimization to succeed in more cases independent of whether the LOAD_PAYLOAD instruction can be found or not. The following patch is squashed in: SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot(). The sampler EOT optimization pass naively assumes that the texturing instruction provides all the data used by the FB write just because they're standing next to each other. The least we should be checking is whether the source and destination regions of the FB write and texturing instructions match. Without this the previous seemingly harmless patch would have caused opt_sampler_eot() to misoptimize a shader from dota-2 causing DCE to eliminate all of its 78 instructions except for the final sampler EOT message (!). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a0d9aed2682f78626f467cbc2b7fc3185d9f9034 |
|
30-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix UB list sentinel dereference in opt_sampler_eot(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2a166c13d4a6edecaffc56a8220dda146e3ce8a0 |
|
04-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Take opt_redundant_discard_jumps out of the optimization loop. No shader-db regressions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d5f2f32b118331070507faf292bbe3da2671df4b |
|
01-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Run SIMD and logical send lowering after the optimization loop. There are two reasons why this is useful: - It avoids the introduction of an amount of partial writes emitted by the SIMD lowering pass to zip and unzip register regions early during optimization, which can make subsequent optimization less effective. - It substantially reduces the burden on the compiler when a large fraction of the instructions in the program need to be split (e.g. during SIMD32 builds). Individual halves of split instructions will be optimized identically (if they can still be optimized at all), so doing it up front can duplicate the amount of instructions the optimizer has to deal with which causes the compilation time to explode in some cases due to the worse-than-linear runtime behaviour of the back-end. It seems helpful to re-run a few optimization passes in cases where any of the lowering passes was able to make progress. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b0c8e5e0c88f7c5d7395715e58a8731e2ab55f7e |
|
30-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Pass a BAD_FILE register to the logical FB write when oMask is unused. This will let the optimizer know that the sample mask value is unused so its definition can be DCE'ed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
46ce93ed22891455dbe3eb4c69f5eddd2a7dcf00 |
|
27-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965: Add do32 debug option. The do32 INTEL_DEBUG option causes the back-end to try to generate a SIMD32 program when compiling a compute shader regardless of the specified compute shader workgroup size, which will be useful for testing SIMD32 code generation in the most common case in which the workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen wouldn't be automatically enabled. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
864737ce6cd5bae030079e749b8b18774a62d073 |
|
17-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Build 32-wide compute shader when needed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
37fd13ee2daf1dbd80cc7b43f7dcfdd1bb64bcc7 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Extend back-end interface for limiting the shader dispatch width. This replaces the current fs_visitor::no16() interface with fs_visitor::limit_dispatch_width(), which takes an additional parameter allowing the caller to specify the maximum dispatch width a shader can be compiled with. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
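The shape of an interface that takes a maximum width, as opposed to a yes/no switch like no16(), might look like the sketch below. It is a hypothetical stand-in and not the fs_visitor code: the struct, its fields, and the policy of keeping the smallest requested limit are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the widest SIMD width a shader may be built at.
struct DispatchLimit {
    unsigned max_dispatch_width = 32;
    std::string reason; // message recorded for the most restrictive limit

    // Lower the maximum width to `n` if it is stricter than the current
    // limit. Assumption: repeated calls keep the smallest value seen.
    void limit_dispatch_width(unsigned n, const std::string &msg)
    {
        assert(n == 8 || n == 16 || n == 32);
        if (n < max_dispatch_width) {
            max_dispatch_width = n;
            reason = msg;
        }
    }

    // Whether a build at `width` channels is still permitted.
    bool can_compile(unsigned width) const
    {
        return width <= max_dispatch_width;
    }
};
```

Under this sketch, the old no16() behaviour corresponds to calling `limit_dispatch_width(8, msg)`, while a caller that only needs to rule out SIMD32 can pass 16.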
|
2d288cb9ea5b1b46eb4fe0061d694560bf54943f |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement SIMD32 register allocation support. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1d5bf46ad1533ffdb30b5dc0f9244f60b0539285 |
|
01-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Don't mutate multi-component arguments in sampler payload set-up. The Gen5+ sampler message payload construction code steps through the coordinate and derivative components by induction like 'coordinate = offset(coordinate, bld, 1)', the problem is that while doing that it may step one past the end of the coordinate vector causing an assertion failure in offset() if it happens to be a (single component) immediate. Right now coordinates and derivatives are typically passed as actual registers but that will no longer be the case when we start propagating constants into logical messages. Instead express coordinate components in closed form like 'offset(coordinate, bld, i)' -- The end result seems slightly more readable that way and it allows passing the coordinate and derivative registers by const reference instead of by value, so it seems like a clean-up in its own right. v2: Fold a few post-increment operators into the last MOV statement. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8476233ae22c77ca26d8109f0f0d6c74457969f8 |
|
26-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Estimate number of registers written correctly in opt_register_renaming. The current estimate is incorrect for non-32b types. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
51dd6a60f5ef43a12d1b4384a2aded4d55d14056 |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf(). Prevents an assertion failure in the following commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b9eab911baa380fea1a3d3393f5944c00aa63076 |
|
26-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs. The pass is disabled in SIMD16 dispatch mode for the same reason, it cannot handle instructions that write multiple MRF registers at once. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7d430fc05e8f0a6211fb587f1bc7b2a76ed7de10 |
|
19-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
df1aec763eb972c69bc5127be102a9f281ce94f6 |
|
19-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Define methods to calculate the flag subset read or written by an fs_inst. v2: Codestyle fixes (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ece41df247af247fb573ae8ec208d50e895b7aef |
|
21-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Expose arbitrary channel execution groups to the IR. This generalizes the current fs_inst::force_sechalf flag to allow specifying channel enable groups other than 0 or 8. At some point it will likely make sense to fix the vec4 generator to support arbitrary execution groups and then move the definition of fs_inst::group into backend_instruction (e.g. so we can do FP64 in the VEC4 back-end). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5db4d623956ceb5ffa8599e7797bd13470898158 |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction. It's just a byte MOV with strided source. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cf5443f984da4eb500c9b1ad9b9f53bc8747fef3 |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value. Which is 16 or 8 in most cases. This will make sure that 32-wide virtual instructions get chopped up into chunks of their maximum execution size. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
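The chopping described in the commit message above can be pictured with a minimal sketch. It is not the driver's SIMD lowering pass: `Chunk` and `split_simd` are hypothetical names, and the only assumption is that a wide instruction is split into equal pieces, each starting at its own channel group.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one lowered piece of a wide instruction.
struct Chunk {
    unsigned group;      // first channel covered by this piece
    unsigned exec_size;  // number of channels executed by this piece
};

// Split an instruction of width exec_size into pieces no wider than
// max_width. Both widths are assumed to be powers of two.
std::vector<Chunk> split_simd(unsigned exec_size, unsigned max_width)
{
    assert(exec_size > 0 && max_width > 0);
    const unsigned width = exec_size < max_width ? exec_size : max_width;
    assert(exec_size % width == 0);

    std::vector<Chunk> chunks;
    for (unsigned group = 0; group < exec_size; group += width)
        chunks.push_back(Chunk{group, width});
    return chunks;
}
```

Under these assumptions a 32-wide instruction limited to 8 channels becomes four pieces starting at channels 0, 8, 16 and 24, while an instruction already within the limit comes back as a single piece.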
|
197833caa3d684c092ee76d1e9ff3fac28576b04 |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width. Only per-channel LOAD_PAYLOAD instructions can be lowered, which should cover everything that comes in from the front-end. LOAD_PAYLOAD instructions used to construct actual message payloads cannot be easily lowered because they contain headers and vectors of variable type that aren't necessarily channel-aligned -- We shouldn't find any of them in the program at SIMD lowering time though because they're introduced during logical send lowering. An alternative that may be worth considering would be to re-run the SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9eea3df29f21eb7507354c3b1d85d238b671a211 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time ...on hardware lacking compressed Align16 support. Will allow simplifying the generator code and fixing it for SIMD32 codegen. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
12ae87abb194e2fc5339d8944b6d0e9ddf54ea22 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Apply usual FPU-like execution size restrictions to MULH. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dea9c1df89cf58591cce83b67d3d905a28f0c101 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
122e0315480704a7c6777b994c42448d360e6774 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Assert that IF instruction with embedded compare has legal exec_size. We shouldn't encounter these right now but if we did it wouldn't be possible for the SIMD lowering pass to split it into multiple instructions because of its side effects on control flow, so just assert in order to kill the program. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
98c8bef01cae5fd70dda22fd7ac0b5694c4dfb5f |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
88d9cc15637559229fe725c0531de8ad7a0a60a7 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a6bf5f88c7be5ba1d1d9ebf1412e99886e0cf75c |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Enforce common regioning restrictions by SIMD splitting. This change addresses a number of hardware restrictions on the source and destination regions and other execution controls of regular FPU-like instructions that in some cases can be avoided by reducing the execution size of the instruction. Some of these restrictions (e.g. the one about 3src instructions not supporting compression on some hardware) are currently being worked around case by case in the generator with ad-hoc splitting code that is buggy in several ways (e.g. doesn't handle non-trivial execution controls which would break SIMD32 code), but it seems cleaner to implement as many restrictions as we can in a single lowering pass since that will allow us to simplify some of the surrounding code considerably and also make sure that we don't forget applying them in the future. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2b5adb942bad418058d266c85c396040d558f680 |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Enforce extended math exec size limits during SIMD lowering. This teaches the SIMD lowering pass about the hardware limits on the execution size of math instructions, which will allow simplifying the generator code and at the same time get rid of a number of bugs in the manual SIMD unrolling done currently that prevent SIMD32 codegen from working. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a8e7b4f1d9ec50d2214e7694da26af6a108e506f |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Handle SAMPLEINFO consistently like other texturing instructions. Seems like this texturing opcode was missing its logical counterpart which would prevent it from taking advantage of the SIMD lowering infrastructure, define it and plumb it through the back-end. At some point we'll likely want to emit a single SAMPLEINFO message shared among all channels irrespective of this change, but for the moment this should be enough to get the intrinsic working in SIMD32 mode. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
99b5476d33f967ac2a30c3f8f7f958a7169e7123 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends. The benefit is we will be able to use the SIMD lowering pass to unroll math instructions of unsupported width and then remove some cruft from the generator. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ed4d0e41acb78f268b8b5c2dd03f654d11c4460b |
|
20-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Rename Gen4 physical varying pull constant load opcode. For consistency with the Gen7 variant. I'm not doing the same to the uniform pull constant message at this point because the non-GEN7 one is still overloaded to be either an expression-like logical instruction or a Gen4-specific physical send message. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
64a6cb87f1fbfe2e410d6a4087450c2d4eb72228 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering. Varying pull constant loads inherit the same limitation of pre-ILK hardware that requires expanding SIMD8 texel fetch instructions to SIMD16, we can deal with pull constant loads in the same way it's done for texturing during SIMD lowering. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d8a3294ac21741c3a78eef72b832902e15fbd948 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Hide varying pull constant load message setup behind logical opcode. This will allow the SIMD lowering pass to split 32-wide varying pull constant loads (not natively supported by the hardware) into 16-wide instructions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c5f224145a41079ddcc77c0d7df8b4b75ed2d4fe |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Handle instruction predication in SIMD lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1760c24b4bcf028477404e283f5768f2b6f25123 |
|
18-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering. If the source value is going to be the same for all SIMD-lowered chunks of the instruction there should be no need to unzip the value into multiple temporary registers one for each lowered chunk. As a side effect this fixes SIMD lowering of instructions with a vector immediate source. In the long term it *might* still be worth fixing offset() to handle vector immediates correctly though, this should be good enough for the moment. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e79aa19d88b4d6dbd26c23287292e6bf9f41ce33 |
|
20-May-2016 |
Juan A. Suarez Romero <jasuarez@igalia.com> |
i965: fix double-precision vertex inputs measurement For double-precision vertex inputs we need to measure them in dvec4 terms, and for single-precision vertex inputs we need to measure them in vec4 terms. For the latter case, we use type_size_vec4() function. For the former case, we had a wrong implementation based on type_size_vec4(). This commit introduces a proper type_size_dvec4() function, that we use to measure vertex inputs. Measuring double-precision vertex inputs as dvec4 is required because ARB_vertex_attrib_64bit states that these use the same number of locations as the single-precision version. That is, two consecutive dvec4 would be located in location "x" and location "x+1", not "x+2". Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dac10e8a1390711f1f36f224644c4a33586cebe3 |
|
17-May-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965, anv: Use NIR FragCoord re-center and y-transform passes. This handles gl_FragCoord transformations and other window system vs. user FBO coordinate system flipping by multiplying/adding uniform values, rather than recompiles. This is much better because we have no decent way to guess whether the application is going to use a shader with the window system FBO or a user FBO, much less the drawable height. This led to a lot of recompiles in many applications. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8a65b5135a167d4f12cef19408e0ca52fffe06bc |
|
05-May-2016 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz. Ken suggested instead of a big and complicated optimization pass, to just recognize the operations here. It's certainly less code and a lot prettier, but it seems to actually perform worse for currently unknown reasons. total instructions in shared programs: 8923452 -> 8904108 (-0.22%) instructions in affected programs: 814563 -> 795219 (-2.37%) helped: 3336 HURT: 10 total cycles in shared programs: 66970734 -> 66651476 (-0.48%) cycles in affected programs: 10582686 -> 10263428 (-3.02%) helped: 2438 HURT: 691 total spills in shared programs: 1811 -> 1789 (-1.21%) spills in affected programs: 85 -> 63 (-25.88%) helped: 4 total fills in shared programs: 3143 -> 3109 (-1.08%) fills in affected programs: 167 -> 133 (-20.36%) helped: 4 LOST: 2 GAINED: 36 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
75dccf5ac2af716175990ae9eac44cc2c99b7e9c |
|
05-May-2016 |
Matt Turner <mattst88@gmail.com> |
i965: Add infrastructure for sample lod-zero operations. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
767168771376a3ee95d5b1f5b2f5fd577b76391e |
|
17-May-2016 |
Eduardo Lima Mitev <elima@igalia.com> |
i965/fs: Silence warnings related to use of uninitialized values brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...] brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...] prog_data->base.dispatch_grf_start_reg = simd16_grf_start; brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here uint8_t simd8_grf_start, simd16_grf_start; brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...] prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used); brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here unsigned simd8_grf_used, simd16_grf_used; (and more) Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
265487aedfabbcfb073f9d6053d1ceb510b78b27 |
|
16-May-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add an allow_spilling flag to brw_compile_fs This allows us to disable spilling for blorp shaders since blorp state setup doesn't handle spilling. Without this, blorp fails hard if you run with INTEL_DEBUG=spill. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Tested-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d6281a9d955ad97f993927bc214e4b641cfbe359 |
|
15-Apr-2016 |
Juan A. Suarez Romero <jasuarez@igalia.com> |
i965: take care of doubles when lowering VS inputs Input attributes can require 2 vec4 or 1 vec4 depending on whether they are double-precision or not. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7ea09511ca4f58640063cc1ee08386cce5300535 |
|
04-Apr-2016 |
Juan A. Suarez Romero <jasuarez@igalia.com> |
i965/fs: calculate first non-payload GRF using attrib slots When computing where the first non-payload GRF starts, we can't rely on the number of attributes, as each attribute can be using 1 or 2 slots depending on whether they are a dvec3/4 or other. Instead, we need to use the number of slots used by the attributes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
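The slot counting described in the commit message above can be sketched as follows. This is an illustration only: `VsAttribute`, `attribute_slots` and `total_attribute_slots` are hypothetical names, and the rule encoded is just the one stated in the message (a dvec3 or dvec4 takes two slots, anything else takes one).

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one vertex attribute.
struct VsAttribute {
    unsigned components;  // 1..4
    bool is_double;       // true for double-precision types
};

// Slots used by one attribute: two for dvec3/dvec4, one otherwise.
unsigned attribute_slots(const VsAttribute &attr)
{
    assert(attr.components >= 1 && attr.components <= 4);
    return (attr.is_double && attr.components > 2) ? 2u : 1u;
}

// Total slots used by all attributes; this, rather than the attribute
// count, is what would decide where the first non-payload register starts.
unsigned total_attribute_slots(const std::vector<VsAttribute> &attrs)
{
    unsigned slots = 0;
    for (const VsAttribute &attr : attrs)
        slots += attribute_slots(attr);
    return slots;
}
```

With a vec4, a dvec3 and a dvec2 as inputs, the attribute count is 3 but the slot count is 4, which is the discrepancy the message points at.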
|
96c276dda909ddf12714b9e64b7207156e8fd4bb |
|
23-Mar-2016 |
Alejandro Piñeiro <apinheiro@igalia.com> |
i965/fs: half exec_size when dealing with 64 bits attributes The HW has a restriction that only vertical stride may cross register boundaries. Until now this was only handled on VGRFs at brw_reg_from_fs_reg, but it is also needed for attributes. v2: * Remove reference to commit id on commit message (Juan Suarez) * Simplify code that compute final exec_size (Ian Romanick) * Use REG_SIZE on that same code (Kenneth Graunke) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
58f1804c4f38b76c20872d6887b7b5e6029e0454 |
|
18-Jan-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: fix pull constant load component selection for doubles UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a constant offset that is 16-byte aligned. If we need to access an unaligned offset we emit a load with an aligned offset and use the remaining constant offset to select the component into the vec4 result that we are interested in. This component must be computed in units of the type size, since that is what fs_reg::set_smear expects. This patch does this change in the two places where we use this message: In demote_pull_constants when we lower uniform access with constant offset into the pull constant buffer and in UBO loads with constant offset. v2 (Sam): - Fix set_smear() in fs_visitor::lower_constant_loads(), take into account source type instead and remove MAX2 (Curro). - Improve changes to nir_intrinsic_load_ubo case in nir_emit_intrinsic() (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
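The offset split described in the commit message above can be illustrated with a small sketch. It is not the driver code: `PullLoad` and `split_pull_offset` are hypothetical names, and the sketch assumes only what the message states, namely a 16-byte aligned load plus a component index counted in units of the type size.

```cpp
#include <cassert>

// Hypothetical result of splitting a constant byte offset.
struct PullLoad {
    unsigned aligned_offset;  // 16-byte aligned offset used for the load
    unsigned component;       // index into the loaded data, in type-size units
};

// Split byte_offset into an aligned load offset and a component index.
// type_size is the size in bytes of the accessed type (4 or 8 here).
PullLoad split_pull_offset(unsigned byte_offset, unsigned type_size)
{
    assert(type_size == 4 || type_size == 8);
    assert(byte_offset % type_size == 0);

    PullLoad load;
    load.aligned_offset = byte_offset & ~15u;
    load.component = (byte_offset & 15u) / type_size;
    return load;
}
```

For byte offset 24 the aligned offset is 16 in both cases, but the component is 2 for a 32-bit type and 1 for a 64-bit type. Dividing by 4 regardless of the type would select the wrong double.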
|
e209134f717078fb6c1d4a6d048b4aba22c87993 |
|
14-Jan-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles v2 (Curro): - Assert on scale == 1 when shuffling 64-bit data. - Remove type_slots, use type_sz(vec4_result.type) instead. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4d9c461e53440182de42d0a16ec66ad7f5c3b00a |
|
04-May-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width. Instead of using the LOAD_PAYLOAD instruction (emitted through the emit_transpose() helper that is no longer useful and this commit removes) which had to be marked force_writemask_all in some cases, emit a series of moves to apply proper channel enable signals to the destination. Until now lower_simd_width() had mainly been used to lower things that invariably had a basic block-local temporary as destination so it didn't seem like a big deal, but I found it to be the reason for several Piglit regressions in my SIMD32 branch and Igalia discovered the same issue independently while working on FP64 support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a0e6e5f21ffea8acb9500ef699b204c557214b75 |
|
30-Apr-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use MRF0 for the repclear message This is what BLORP does. Making them match cuts down on the noise when looking at AUB diffs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bee160b31be9e09eeab83f62d26ac331f08955fa |
|
29-Apr-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Organize prog_data by ksp number rather than SIMD width The hardware packets organize kernel pointers and GRF start by slots that don't map directly to dispatch width. This means that all of the state setup code has to re-arrange the data from prog_data into these slots. This logic has been duplicated 4 times in the GL driver and one more time in the Vulkan driver. Let's just put it all in brw_fs.cpp. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1ec466d0ff59ab17edef95c84ed733c1fea5655e |
|
28-Apr-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Stop setting dispatch_grf_start_reg from the visitor Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
082768af30cb73050bda8103a29136afb2fd020f |
|
28-Apr-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Clean up the logic in compile_fs a bit Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
712a980adde0b14eee8b4accd02af9b9740091a2 |
|
10-May-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Rework the persample shading key/prog_data bits This commit reworks and simplifies the way we handle persample shading in the shader key and prog_data. The previous approach had three different key bits that had slightly different and hard-to-discern meanings while the new bits are far more clear. This commit changes it to two easily understood bits that communicate everything we need: 1) key->persample_interp: means that the user has requested persample interpolation through the API. This is equivalent to having SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high enough that you actually get multiple per-sample invocations. 2) key->multisample_fbo: means that the shader will be running on an actual multi-sampled framebuffer. This commit also adds a new "persample_dispatch" bit to prog_data which indicates that the shader should be run in persample mode. This way the state setup code doesn't have to look at the fragment program or GL state and can just pull that data out of the prog_data. In theory, this shuffle could mean more recompiles. However, in practice, we were shoving enough state into the key before that we were probably hitting a recompile on every per-sample shader anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
607fb0f13df8e328ed5d173c98fc250449c55aee |
|
10-May-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Reduce the SIMD8 GS push constant threshold from 32 to 24. Three Shadow of Mordor geometry shaders increase by a single instruction, but the number of spills/fills in Orbital Explorer is reduced from 194:1279 -> 82:454. No other programs are affected. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
203c786a73847fb07d805c4cc799b7c7d028695c |
|
10-May-2016 |
Jason Ekstrand <jason@jlekstrand.net> |
i965/fs: Default all constants to a location of -1 Otherwise constants which aren't live get an undefined constant location. When we go to set up param and pull_param we end up assigning all unused uniforms to slot 0. This causes the Vulkan driver to segfault because it doesn't have pull_param. This fixes bugs in the Vulkan driver introduced in c3fab3d000. Reviewed-by: Mark Janes <mark.a.janes@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4c9006f95796e67cf2cac98795627c31b15b0371 |
|
20-Apr-2016 |
Samuel Iglesias Gonsálvez <siglesias@igalia.com> |
i965/fs: fix MOV_INDIRECT exec_size for doubles In that case, the writes need two times the size of a 32-bit value. We need to adjust the exec_size, so it is not breaking any hardware rule. v2: - Add an assert to verify type size is not less than 4 bytes (Jordan). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
03687ab77fea7893f8786ce407d6f4d108b28012 |
|
27-Nov-2015 |
Samuel Iglesias Gonsálvez <siglesias@igalia.com> |
i965/fs: demote_pull_constants() did not take into account double types The constants could be double, and it was allocating size for float types for the destination register of varying pull constant loads. Then the fs_visitor::validate() will complain. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c3fab3d00095ed4a5693d5272073298f07dcb9b5 |
|
05-May-2016 |
Samuel Iglesias Gonsálvez <siglesias@igalia.com> |
i965/fs: push first double-based uniforms in push constant buffer When there is a mix of definitions of uniforms with 32-bit or 64-bit data type sizes, the driver ends up doing misaligned access to double based variables in the push constant buffer. To fix this, this patch pushes first all the 64-bit variables and then the rest. Then, all the variables would be aligned to its data type size. v2: - Fix typo and improve comment (Jordan). - Use ralloc(NULL,...) instead of rzalloc(mem_ctx,...) (Jordan). - Fix typo (Topi). - Use pointers instead of references in set_push_pull_constant_loc() (Topi). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
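The ordering described in the commit message above can be sketched as follows. This is a simplified illustration, not the driver's constant-location pass: `UniformSlot` and `assign_push_offsets` are hypothetical names, and offsets are counted in bytes from the start of an imaginary push buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical uniform description: only the data size matters here.
struct UniformSlot {
    unsigned size_bytes;  // 4 for 32-bit data, 8 for 64-bit data
};

// Assign a byte offset to every uniform, placing all 64-bit uniforms
// first and the 32-bit ones afterwards. Returns one offset per input.
std::vector<unsigned> assign_push_offsets(const std::vector<UniformSlot> &uniforms)
{
    std::vector<unsigned> offsets(uniforms.size(), 0);
    unsigned next = 0;

    for (unsigned pass_size : {8u, 4u}) {
        for (std::size_t i = 0; i < uniforms.size(); i++) {
            if (uniforms[i].size_bytes != pass_size)
                continue;
            // Holds by construction: everything placed so far is a
            // multiple of pass_size bytes long.
            assert(next % pass_size == 0);
            offsets[i] = next;
            next += pass_size;
        }
    }
    return offsets;
}
```

For uniforms declared as float, double, float, double, laying them out in declaration order would put the first double at byte 4. With the two-pass order the doubles land at bytes 0 and 8 and the floats at 16 and 20.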
|
193cb67a84c1725382f62a2f3aa60564d275c2f8 |
|
31-Mar-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: recognize writes with a subreg_offset > 0 as partial Usually, writes to a subreg_offset > 0 would also have a stride > 1 and we would recognize them as partial, however, there is one case where this does not happen, that is when we generate code for 64-bit immediates in gen7, where we produce something like this: mov(8) vgrf10:UD, <low 32-bit> mov(8) vgrf10+0.4:UD, <high 32-bit> and then we use the result with a stride of 0, as in: mov(8) vgrf13:DF, vgrf10<0>:DF Although we could try to avoid this issue by producing different code for this by using writes with a stride of 2, that runs into other problems affecting gen7 and the fact is that any instruction that writes to a subreg_offset > 0 is a partial write so we should really recognize them as such. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
34ed61b33459c975074df0e83a2161fb76526621 |
|
15-Jan-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs/lower_simd_width: Fix registers written for split instructions When the original instruction had a stride > 1, the combined registers written by the split instructions won't amount to the same register space written by the original instruction because the split instructions will use a stride of 1. The current code assumed otherwise and computed the number of registers written by split instructions as an equal share based on the relation between the lowered width and the original execution size of the instruction. It is only after the split, when we interleave the components of the result from the lowered instructions back into the original dst register, that the original stride takes effect and we write all the registers specified by the original instruction. Just make the number of register written the same as the vgrf space we allocate for the dst of the split instruction. Fixes crashes in fp64 tests produced as a result of assigning incorrectly the number of registers written by split instructions, which led to incorrect validation of the size of the writes against the allocated vgrf space. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9741cff1ec3bdc0edf4122bf20aa3447dd8cb741 |
|
18-Jan-2016 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: rename our lower_d2f pass to lower_d2x Since it no longer handles conversions from double to float but from double to various other 32-bit types. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9e1b3ea199c3bd01fe89e6ab3eee4cae3da92264 |
|
01-Nov-2015 |
Connor Abbott <cwabbott0@gmail.com> |
i965/fs: add a pass for legalizing d2f We need to do this late, in order to avoid partial writes during the optimization loop. v2: Use subscript() instead of stride(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6b6d68ae0786e456faa828a7eaf76c981c44b1cb |
|
11-Aug-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: fix is_copy_payload() for doubles v2 (Sam): - LOAD_PAYLOAD treats each header source as a 32B block regardless of the datatype. Drop the change (Curro) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4f3888c1caf3455f61b2e20ccf7c39e59f4feaf3 |
|
29-Jul-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: fix assign_constant_locations() for doubles Uniform doubles will read two registers, in which case we need to mark both as being live. v2 (Sam): - Use a formula to get the number of registers read with proper units (Curro). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1f51aada3fbf73ffe601f743b5244df63e17f9d5 |
|
29-Jul-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: fix type_size() for doubles Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fd763177c13579ff51cb35fd8bc3b6d703073b61 |
|
05-May-2016 |
Connor Abbott <cwabbott0@gmail.com> |
i965/fs: add a pass for lowering PACK opcodes v2: Use subscript() instead of stride() (Curro) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ba582e58cd30c815137a11c9497b01d97842e525 |
|
05-May-2016 |
Connor Abbott <cwabbott0@gmail.com> |
i965/fs: add PACK opcode Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a308bae58f3e2dabd2ffaec98c1f91c9abf7a9f8 |
|
04-Aug-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: add support for printing double immediates Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3a886721ed449be0c87ece972acada96cc0811b6 |
|
04-May-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Silence unused variable warning I added this when deleting some unnecessary code in a rebase.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1cc7573162a7f0e8346d7abab50890c58a0dce9a |
|
28-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965: Pass devinfo pointer to is_3src() helpers. This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c55dc77ab13420a9fe0177ccd21a6b0a950d9113 |
|
28-Apr-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965: Pass devinfo pointer to brw_instruction_name(). A future series will implement support for an instruction that happens to have the same opcode number as another instruction we support already on a disjoint set of hardware generations. In order to disambiguate which instruction it is brw_instruction_name() will need some way to find out which device we are generating code for. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7d9143ad885752184156b3a0d3e492aef09af3b0 |
|
15-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Write a scalar TCS backend that runs in SINGLE_PATCH mode. Unlike most shader stages, the Hull Shader hardware makes us explicitly tell it how many threads to dispatch and manually configure the channel mask. One perk of this is that we have a lot of flexibility - we can run it in either SIMD4x2 or SIMD8 mode. Treating it as SIMD8 means that shaders with 8 or fewer output vertices (which is overwhelmingly the common case) can be handled by a single thread. This has several intriguing properties: - Accessing input arrays with gl_InvocationID as the index is a simple SIMD8 URB read with g1 as the header. No indirect addressing required. - Barriers are no-ops. - We could potentially do output shadowing to combine writes, as the concurrency concerns are gone. (We don't do this yet, though.) v2: Drop first_non_payload_grf change, as it was always adding 0 (caught by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
13195f7ef85e0923a7b7d5b8a35eb6b6c257db1c |
|
23-Apr-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Reduce the response length of sampler messages on Skylake. Often, we don't need a full 4 channels worth of data from the sampler. For example, depth comparisons and red textures only return one value. To handle this, the sampler message header contains a mask which can be used to disable channels, and reduce the message length (in SIMD16 mode on all hardware, and SIMD8 mode on Broadwell and later). We've never used it before, since it required setting up a message header. This meant trading a smaller response length for a larger message length and additional MOVs to set it up. However, Skylake introduces a terrific new feature: for headerless messages, you can simply reduce the response length, and it makes the implicit header contain an appropriate mask. So to read only RG, you would simply set the response length to 2 or 4 (SIMD8/16). This means we can finally take advantage of this at no cost. total instructions in shared programs: 9091831 -> 9073067 (-0.21%) instructions in affected programs: 191370 -> 172606 (-9.81%) helped: 2609 HURT: 0 total cycles in shared programs: 70868114 -> 68454752 (-3.41%) cycles in affected programs: 35841154 -> 33427792 (-6.73%) helped: 16357 HURT: 8188 total spills in shared programs: 3492 -> 1707 (-51.12%) spills in affected programs: 2749 -> 964 (-64.93%) helped: 74 HURT: 0 total fills in shared programs: 4266 -> 2647 (-37.95%) fills in affected programs: 3029 -> 1410 (-53.45%) helped: 74 HURT: 0 LOST: 1 GAINED: 143 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
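The response-length arithmetic described in the commit above can be sketched as follows. This is an illustration only, not the driver's code: the helper name `sampler_response_length` is hypothetical, and it assumes each returned channel occupies one register in SIMD8 and two in SIMD16, which matches the "2 or 4 (SIMD8/16)" figure quoted for an RG read.

```cpp
#include <cassert>

// Hypothetical helper (not part of Mesa): number of registers the sampler
// writes back when only the first `channels` components are wanted.
// Assumption: one register per channel in SIMD8, two per channel in SIMD16.
static unsigned sampler_response_length(unsigned channels, unsigned simd_width)
{
    assert(channels >= 1 && channels <= 4);
    assert(simd_width == 8 || simd_width == 16);
    return channels * (simd_width / 8);
}
```

Under these assumptions a full RGBA read in SIMD16 would need eight registers, while an RG read needs only four.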
|
acc2f1fe361af87ce4d50b7e2b58e0da093477e1 |
|
09-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use inst->regs_written for rlen for texture instructions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
33565d67641142a68d537023e181b6dcd587e551 |
|
20-Apr-2016 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Readd opt_drop_redundant_mov_to_flags(). This reverts commit b449366587b5f3f64c6fb45fe22c39e4bc8a4309. I removed the pass thinking that it was now not useful, but that was not true. I believe I ran shader-db on HSW and saw no results, but HSW does not use the unlit centroid workaround code and as a result does not emit redundant MOV_DISPATCH_TO_FLAGS instructions. On IVB, the shader-db results are: total instructions in shared programs: 6650806 -> 6646303 (-0.07%) instructions in affected programs: 106893 -> 102390 (-4.21%) helped: 793 total cycles in shared programs: 56195538 -> 56103720 (-0.16%) cycles in affected programs: 873048 -> 781230 (-10.52%) helped: 553 HURT: 209 On SNB, the shader-db results are: total instructions in shared programs: 7173074 -> 7168541 (-0.06%) instructions in affected programs: 119757 -> 115224 (-3.79%) helped: 799 total cycles in shared programs: 98128032 -> 98072938 (-0.06%) cycles in affected programs: 1437104 -> 1382010 (-3.83%) helped: 454 HURT: 237 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
447d3eec6a869200612e5010f47335cb26789a3a |
|
06-Apr-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix gl_SampleMaskIn[] in per-sample shading mode. The coverage mask is not sufficient - in per-sample mode, we also need to AND with a mask representing the samples being processed by the current fragment shader invocation. Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests: sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8} sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8} sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8} sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8} Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
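A minimal sketch of the masking rule described in the commit above, assuming that in per-sample mode each invocation handles exactly one sample, identified by its sample ID. The function name and parameters are hypothetical; the real fix is emitted as shader instructions rather than host code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of gl_SampleMaskIn[0] (not the driver's code).
// Assumption: in per-sample mode one invocation processes one sample, so the
// "samples being processed" mask is a single bit at position sample_id.
static uint32_t sample_mask_in(uint32_t coverage_mask, unsigned sample_id,
                               bool per_sample)
{
    if (!per_sample)
        return coverage_mask;               // coverage alone is sufficient
    assert(sample_id < 32);
    return coverage_mask & (UINT32_C(1) << sample_id);
}
```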
|
66a725570c9f93ab0341e9479390c9d042d7cd00 |
|
05-Apr-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Only enable oMask output when there's a multisample FBO. The ARB_sample_shading specification says that setting gl_SampleMask bits to 0 means that the corresponding sample "should be considered uncovered for the purposes of multisample fragment operations (Section 4.1.3)." The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment Operations") specifies: "No changes to the fragment alpha or coverage values are made at this step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS is not one." oMask output alters coverage masks and can kill pixels. We need to disable it in the above case, which conveniently corresponds to key->multisample_fbo being false. Khronos bug #12188 also spells this out clearly: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188 Fixes two Piglit tests: tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0 tests/spec/arb_sample_shading/builtin-gl-sample-mask 0 Fixes 21 ES3 conformance tests: ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6 ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0 
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2 ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5 ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7 Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests: sample_mask.discard_half_per_pixel.default_framebuffer sample_mask.discard_half_per_pixel.singlesample_rbo sample_mask.discard_half_per_pixel.singlesample_texture sample_mask.discard_half_per_sample.default_framebuffer sample_mask.discard_half_per_sample.singlesample_rbo sample_mask.discard_half_per_sample.singlesample_texture sample_mask.discard_half_per_two_samples.default_framebuffer sample_mask.discard_half_per_two_samples.singlesample_rbo sample_mask.discard_half_per_two_samples.singlesample_texture Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
81407531e0b8d2e6a7f9c39cb44ed6a72dc61e77 |
|
06-Apr-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo. I'm going to need a key entry meaning "we have a multisample FBO, and multisampling is enabled" in an upcoming patch. This is basically wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID system value is read. The only use of wm_key->compute_sample_id is in emit_sampleid_setup(), which is only called when handling the SAMPLE_ID system value. So we can just eliminate the check and generalize the field. v2: Also update the Vulkan driver. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
57118a19da932b4b5756021a0d75e91f42a68d99 |
|
06-Apr-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Simplify gl_SampleID setup on Gen8+. On Gen7+, the thread payload provides the sample ID - we can read it in two instructions, without any elaborate calculations. We don't even need a state dependency - this will properly produce zero in the non-MSAA case. Unfortunately, we need the state flag anyway, so we may as well continue to use it to produce a single MOV 0 instead of SHR/AND. For some reason, the sample ID field is always zero on Gen7/7.5, so we can't use this yet. However, it works fine on Gen8+. So, land the code and use it where it's working, and leave a TODO for later. v2: Fix register types in the comment (caught by Matt Turner!). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
528255b0b1498d22c820cecc5d75591d25ddb375 |
|
19-Apr-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Flip key->compute_sample_id check. This just moves the simple case first. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f1d29099b4eedafb0302a21c0673d12a6610c369 |
|
06-Apr-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Push everything if pull_param == NULL Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
963513bb24bdd542f1af3733fab53ad450d3221b |
|
09-Dec-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Push small uniform arrays Unfortunately, this also means that we need to use a slightly different algorithm for assign_constant_locations. The old algorithm worked based on the assumption that each read of a uniform value read exactly one float. If it encountered a MOV_INDIRECT, it would immediately bail and push the whole thing. Since we can now read ranges using MOV_INDIRECT, we need to be able to push a series of floats without breaking them up. To do this, we use an algorithm similar to the one in split_virtual_grfs. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
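The idea of pushing a series of floats without breaking them up can be sketched roughly as below. This is not the code from assign_constant_locations: the `UniformRead` struct and both function names are hypothetical, and the sketch only shows the marking step, in which every slot of a ranged read except the last is flagged as needing to stay adjacent to its successor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one uniform read, in units of float slots.
struct UniformRead {
    unsigned first;   // first slot read
    unsigned count;   // number of slots read (1 for a plain scalar read)
};

// For each slot, record whether it must be kept adjacent to the next slot.
static std::vector<bool> mark_contiguous(unsigned num_slots,
                                         const std::vector<UniformRead> &reads)
{
    std::vector<bool> contiguous(num_slots, false);
    for (const UniformRead &r : reads) {
        assert(r.count >= 1 && r.first + r.count <= num_slots);
        // Every slot of the range except the last is tied to its successor.
        for (unsigned i = 0; i + 1 < r.count; i++)
            contiguous[r.first + i] = true;
    }
    return contiguous;
}

// Each slot not tied to its successor ends one indivisible chunk.
static unsigned count_chunks(const std::vector<bool> &contiguous)
{
    unsigned chunks = 0;
    for (std::size_t i = 0; i < contiguous.size(); i++)
        if (!contiguous[i])
            chunks++;
    return chunks;
}
```

A later pass could then decide per chunk, rather than per float, whether the data is pushed or pulled.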
|
71f8039f728eb0a67e471321da61f0e88aec8035 |
|
09-Dec-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Rename demote_pull_constants to lower_constant_loads Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
479e38ad63ab1421afe4f25d36f434ac2e12e817 |
|
25-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Get rid of the param_size array Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
30874216cbaa21e9b757af7db1ef165b5c780a39 |
|
25-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Stop relying on param_size in assign_constant_locations Now that we have MOV_INDIRECT opcodes, we have all of the size information we need directly in the opcode. With a little restructuring of the algorithm used in assign_constant_locations we don't need param_size anymore. The big thing to watch out for now, however, is that you can have two ranges overlap where neither contains the other. In order to deal with this, we make the first pass just flag what needs pulling and handle assigning pull constant locations until later. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
275855f315623923eff863265077a9a840885c9e |
|
25-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Get rid of reladdr We aren't using it anymore. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3c93cdfaf598bc3c28e3dc288da35675c666602b |
|
25-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use MOV_INDIRECT for all indirect uniform loads Instead of using reladdr, this commit changes the FS backend to emit a MOV_INDIRECT whenever we need an indirect uniform load. We also have to rework some of the other bits of the backend to handle this new form of uniform load. The obvious change is that demote_pull_constants now acts more like a lowering pass when it hits a MOV_INDIRECT. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
27bd8ac6f309b9f052a7fa9380ac5e12fb686e31 |
|
24-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware While we're at it, we also add support for the possibility that the indirect is, in fact, a constant. This shouldn't happen in the common case (if it does, that means NIR failed to constant-fold something), but it's possible so we should handle it. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
889e6054b7795baa789cc771e76e009d1605efae |
|
24-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr The subnr field is in bytes so we don't need to multiply by type_sz. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
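The unit fix described in the commit above can be illustrated with a small calculation. This is a sketch under assumptions, not the driver's regs_read(): the function name is hypothetical and `REG_SIZE` is assumed to be 32 bytes per register. The point is that a byte offset is added to a byte length directly, with no scaling by the element size.

```cpp
#include <cassert>

// Assumed register size in bytes (hypothetical constant for this sketch).
constexpr unsigned REG_SIZE = 32;

// Hypothetical helper: registers touched by a read of `read_size_bytes`
// starting `subnr_bytes` into the first register. Both values are already in
// bytes, so the offset is not multiplied by the element size.
static unsigned regs_read_indirect(unsigned subnr_bytes, unsigned read_size_bytes)
{
    assert(read_size_bytes > 0);
    return (subnr_bytes + read_size_bytes + REG_SIZE - 1) / REG_SIZE;
}
```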
|
40a8fe04dcee7a867e7d6044b23fafc20599c899 |
|
24-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add support for doing MOV_INDIRECT on uniforms Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
240d16ea94834eb2472e91fd4856381951a07007 |
|
25-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e61cc87c757f8bc0b6a3af318a512b22c072595c |
|
06-Apr-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add a flat_inputs field to prog_data Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3921b64e63db39a3f19ebb8250081ba7ddf843a2 |
|
04-Apr-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make the repclear shader support either a uniform or a flat input In the Vulkan driver we use a single flat input instead of a uniform because setting up push constants is more disruptive to the pipeline than setting up another vertex input. This uses the number of uniforms as a key to keep it working for the GL driver. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
01425c45b32fa7f323515b05697c6cc0d245ad32 |
|
17-Mar-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Remove the RCP+RSQ algebraic optimizations NIR already has this optimization and it can do much better than the little peephole in the backend. No shader-db change on Haswell or Broadwell. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7d021cb15e6d67ecef8b020fd36c4a680bcc9c39 |
|
18-Jan-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/nir: Lower nir compute shader shared variables Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
93be4158aed9accab06e3df2d8c526d3312bfff8 |
|
12-Mar-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add missing analysis invalidation in fixup_3src_null_dest(). Bug found by the liveness analysis validation pass that will be introduced in a later commit. fixup_3src_null_dest() was allocating registers which makes the cached liveness analysis calculation incomplete, so it must be invalidated. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6691c03fd39be463e1d222b56e3ec8da9f3b7f24 |
|
12-Mar-2016 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add missing analysis invalidation in opt_sampler_eot(). Bug found by the liveness analysis validation pass that will be introduced in a later commit. opt_sampler_eot() was allocating registers and inserting and removing instructions, which makes the cached liveness analysis calculation inconsistent with the shader IR, so it must be invalidated. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a100a57e30010da49c96f84a661cec9c57f9eebe |
|
20-Feb-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/hsw: Initialize SLM index in state register For Haswell, we need to initialize the SLM index in the state register. This can be copied out of the CS header dword 0. v2: * Use UW move to avoid changing upper 16-bits of sr0.1 (mattst88) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94081 Fixes: piglit arb_compute_shader/execution/shared-atomics.shader_test Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Cc: "11.2" <mesa-stable@lists.freedesktop.org> Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d8347f12ead89c5a58f69ce9283a54ac8487159c |
|
22-Feb-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/compute: Skip SIMD8 generation if it can't be used If the local workgroup size is sufficiently large, then the SIMD8 program can't be used. In this case we can skip generating the SIMD8 program. For complex programs this can save a significant amount of time. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e1d54b1ba5a9d579020fab058bb065866bc35554 |
|
22-Feb-2016 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: Allow spilling for SIMD16 compute shaders For fragment shaders, we can always use a SIMD8 program. Therefore, if we detect spilling with a SIMD16 program, then it is better to skip generating a SIMD16 program to only rely on a SIMD8 program. Unfortunately, this doesn't work for compute shaders. For a compute shader, we may be required to use SIMD16 if the local workgroup size is bigger than a certain size. For example, on gen7, if the local workgroup size is larger than 512, then a SIMD16 program is required. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93840 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Cc: "11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
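The relationship between local workgroup size and the required SIMD width, as described in the two compute commits above, can be sketched as follows. The function name is hypothetical, and the thread limit is a parameter; a value of 64 is an assumption inferred from the "larger than 512 on gen7" figure (64 threads of 8 channels each), not something stated in the log.

```cpp
#include <cassert>

// Hypothetical helper: smallest dispatch width that lets a whole workgroup
// fit within `max_threads` hardware threads.
static unsigned min_cs_simd_width(unsigned local_size, unsigned max_threads)
{
    assert(local_size > 0 && max_threads > 0);
    // Invocations each thread must cover, rounded up.
    unsigned per_thread = (local_size + max_threads - 1) / max_threads;
    if (per_thread <= 8)
        return 8;
    if (per_thread <= 16)
        return 16;
    return 32;
}
```

When this returns 16 or more, a SIMD8 program could never be dispatched, so generating it can be skipped, and spilling in the SIMD16 program has to be tolerated.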
|
cfbd9831f89ef165e7998d0b8524a1aefedec404 |
|
25-Feb-2016 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions. Now that each stage is directly calling brw_nir_lower_io(), and we have per-stage helper functions, it makes sense to just call the relevant one directly, rather than going through multiple switch statements. This also eliminates stupid function parameters, such as the two that only apply to vertex attributes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b3cb6e78aa219ad73c145a25ee1bb48fd8b025d0 |
|
17-Feb-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/nir: Do lower_io late for fragment shaders The Vulkan driver wants to be able to delete fragment outputs that are beyond key.nr_color_regions; this is a lot easier if we lower outputs at specialization time rather than link time. (Rationale added to commit message by Ken) Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2f2c00c7279e7c43e520e21de1781f8cec263e92 |
|
11-Feb-2016 |
Matt Turner <mattst88@gmail.com> |
i965: Lower min/max after optimization on Gen4/5. Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more. On Ironlake: total instructions in shared programs: 6426035 -> 6422753 (-0.05%) instructions in affected programs: 326604 -> 323322 (-1.00%) helped: 1411 total cycles in shared programs: 129184700 -> 129101586 (-0.06%) cycles in affected programs: 18950290 -> 18867176 (-0.44%) helped: 2419 HURT: 328 Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
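The CMP + SEL form of min/max mentioned in the commit above can be modelled on the host like this. It is only an illustration with hypothetical function names; it ignores NaN handling and is not the lowering pass itself.

```cpp
// Model of min() as a compare that sets a flag, followed by a select
// predicated on that flag (hypothetical sketch, NaN behaviour not modelled).
static float min_via_cmp_sel(float a, float b)
{
    bool flag = a < b;       // CMP: write the comparison result to the flag
    return flag ? a : b;     // SEL: pick a source based on the flag
}

// max() is the same pattern with the comparison reversed.
static float max_via_cmp_sel(float a, float b)
{
    bool flag = a > b;
    return flag ? a : b;
}
```

Keeping min/max as single operations until after optimization means two identical min() calls are still visibly identical when CSE runs.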
|
95ea9f770878517364ac2161eb943afbc77bfef9 |
|
10-Feb-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
glsl/types: Add support for function types SPIR-V has a concept of a function type that's used fairly heavily. We could special-case function types in SPIR-V -> NIR but it's easier if we just add support to glsl_types. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5743fd957145040a4734b5542ee5187cfad4cf1d |
|
11-Feb-2016 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965: Rename optimizer debug 00 filename This allows ls, and scripts to get the file names in the correct order of optimization. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
56eb9c44adfa38f776689dd1a1bc42fe55c15dd8 |
|
11-Feb-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Pass usage of depth, W, and sample mask through prog_data We really need to stop pulling information directly out of shaders for state setup. For one thing, if we want any sort of an on-disk shader cache, having all of this metadata in one place is going to be crucial. Also, passing it all through prog_data cleans up the compiler <-> state setup API substantially. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ae3543950c93ec4ac179013cb1c7baaf6f5ef4a7 |
|
11-Feb-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Refactor setup_payload_gen6 to assume FS It's extremely FS specific so the fact that we have a stage check in the middle of it is rather bogus. While we're here, we rename setup_payload_gen4 and setup_payload_gen6 to make it obvious that they are both FS specific. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a2c8b5ece5790825dba951c35e4c5aab003e3217 |
|
11-Feb-2016 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: ir: dump floats as %-g rather than %f, so we can see denormals Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b8ab9c8c8674d67e09c1134ca44b37e0a611f5b5 |
|
06-Feb-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Plumb separate surfaces and samplers through from NIR Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a37b8110c13bf9e38220d6eb9e531b2acffcb4ed |
|
06-Feb-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add an enum for keeping track of texture instruction sources These logical texture instructions can have a *lot* of sources. It's much safer if we have symbolic names for them. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
24f984f64ae58c274f79eaf9148aea37df67131c |
|
18-Jan-2016 |
Emil Velikov <emil.velikov@collabora.com> |
nir: move glsl_types.{cpp,h} to compiler Allows us to remove the SCons workaround :-) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
315cda671570a149af5117d9b265dc71396122ba |
|
21-Jan-2016 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965/fs: Remove unused count from vs urb setup This was originally removed here: commit 031d3501322aee0a1474c7f2a9b79f9fa9947430 Author: Kenneth Graunke <kenneth@whitecape.org> Date: Tue Aug 25 16:59:12 2015 -0700 i965/vs: Unify URB entry size/read length calculations between backends. Then added back: commit bd198b9f0a292a9ff4ffffec3a29bad23d62caba Author: Kenneth Graunke <kenneth@whitecape.org> Date: Fri Aug 14 16:01:33 2015 -0700 i965/vs: Simplify fs_visitor's ATTR file. Note that the authorship dates are out of order, but the above reflects the order of the commit dates. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9870f798beab701a9edda81ff7ccc39f1875d610 |
|
15-Jan-2016 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs/generator: Take an actual shader stage rather than a string Cc: "11.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
97685ff10e0f866d809fc1e8f115fb6e92ce717c |
|
29-Dec-2015 |
Marta Lofstedt <marta.lofstedt@intel.com> |
i965/gen8: Always use BRW_REGISTER_TYPE_UW for MUL on GEN8+ The imulExtended tests of the shader bitfield tests of the OpenGL ES 3.1 CTS fail on gen8+ when BRW_REGISTER_TYPE_W is used for SHADER_OPCODE_MULH. Also, remove unused helper function: static inline bool type_is_signed(unsigned type) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595 Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cddfc2cefa93b884c40329dcb193fe4fb22143ab |
|
10-Dec-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Add support for gl_DrawIDARB and enable extension We have to break open a new vec4 for gl_DrawIDARB. We've used up all space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its own separate vertex buffer anyway. This is because we point the vb for base vertex and base instance into the draw parameter BO for indirect draw calls, but the draw id is generated by mesa in a different buffer. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
17ebb55a14b5a9aa639845fbda9330ef9421834a |
|
10-Dec-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB We already have gl_BaseVertexARB in the .x component of the SGVS vec4 and plug gl_BaseInstanceARB into the last free component (.y). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a5038427c3624e559f954124d77304f9ae9b884c |
|
10-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add tessellation evaluation shaders The TES is essentially a post-tessellator VS, which has access to the entire TCS output patch, and a special gl_TessCoord input. Otherwise, they're very straightforward. This patch implements SIMD8 tessellation evaluation shaders for Gen8+. The tessellator can generate a lot of geometry, so operating in SIMD8 mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only 2 vertices per thread). I have another patch which implements SIMD4x2 mode for older hardware (or via an environment variable override). We currently handle all inputs via the pull model. v2: Improve comments (suggested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c51f133197437d01696abd9513fbcda4b16b897c |
|
11-Dec-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Move brw_cs_fill_local_id_payload() to libi965_compiler This is a helper function for setting up the local invocation ID payload according to the cs_prog_data generated by the compiler. It's intended to be available to users of libi965_compiler so move it there.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
18069dce4a4c3d71e6afc6b10bfa7bee0560ba9c |
|
11-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Make uniform offsets be in terms of bytes This commit makes uniform offsets be in terms of bytes starting with nir_lower_io. They get converted to be in terms of vec4s or floats when we cram them in the UNIFORM register file but reladdr remains in terms of bytes all the way down to the point where we lower it to a pull constant load. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
13ad8d03f201a4d09bf7ab9078b00807d61dfada |
|
01-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use a stride of 1 and byte offsets for UBOs Cc: "11.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
83dedb6354d0e9b04e8ccad77e86bdb7bad44bdd |
|
20-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add src/dst interference for certain instructions with hazards. When working on tessellation shaders, I created some vec4 virtual opcodes for creating message headers through a sequence like: mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted }; mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all }; mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted }; mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all }; This is done in the generator since the vec4 backend can't handle align1 regioning. From the visitor's point of view, this is a single opcode: hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD Normally, there's no hazard between sources and destinations - an instruction (naturally) reads its sources, then writes the result to the destination. However, when the virtual instruction generates multiple hardware instructions, we can get into trouble. In the above example, if the register allocator assigned vgrf7 and vgrf8 to the same hardware register, then we'd clobber the source with 0 in the first instruction, and read back the wrong value in the last one. It occurred to me that this is exactly the same problem we have with SIMD16 instructions that use W/UW or B/UB types with 0 stride. The hardware implicitly decodes them as two SIMD8 instructions, and with the overlapping regions, the first would clobber the second. Previously, we handled that by incrementing the live range end IP by 1, which works, but is excessive: the next instruction doesn't actually care about that. It might also be the end of control flow. This might keep values alive too long. What we really want is to say "my source and destinations interfere". This patch creates new infrastructure for doing just that, and teaches the register allocator to add interference when there's a hazard. For my vec4 case, we can determine this by switching on opcodes. For the SIMD16 case, we just move the existing code there. 
I audited our existing virtual opcodes that generate multiple instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this treatment as well, but no others. v2: Rebased by mattst88. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
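The hazard described in the commit above can be reproduced with a toy model. Everything here is hypothetical: `Reg` stands in for one 8-dword register, and `expand_header_setup` imitates a virtual opcode that expands into several writes (the copy from g0 in the original sequence is omitted for brevity). It is not the generator code, only a demonstration of why source and destination must not share a register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Reg = std::array<uint32_t, 8>;   // toy model of one 8-dword register

// Toy expansion of one virtual opcode into three hardware-style writes.
// Returns what lands in dst[3], which is meant to be the original src[0].
static uint32_t expand_header_setup(Reg *regs, unsigned dst, unsigned src)
{
    regs[dst].fill(0);                  // mov(8) dst    = 0  (clobbers src if dst == src)
    regs[dst][5] = 0x100;               // mov(1) dst.5  = 0x100
    regs[dst][3] = regs[src][0];        // mov(1) dst.3  = src.0
    return regs[dst][3];
}
```

With distinct registers the source value survives the first write; when the allocator assigns both to the same register, the first write zeroes the source before it is read, which is the case the added interference rules out.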
|
3e9003e9cf55265ab1fb6522dc5cbb2f455ea1f9 |
|
20-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix fragment shader struct inputs. Apparently we have literally no support for FS varying struct inputs. This is somewhat surprising, given that we've had tests for that very feature that have been passing for a long time. Normally, varying packing splits up structures for us, so we don't see them in the backend. However, with SSO, varying packing isn't around to save us, and we get actual structs that we have to handle. This patch changes fs_visitor::emit_general_interpolation() to work recursively, properly handling nested structs/arrays/and so on. (It's easier to read with diff -b, as indentation changes.) When using the vec4 VS backend, this fixes rendering in an upcoming game from Feral Interactive. (The scalar VS backend requires additional bug fixes in the next patch.) v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt). Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
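As a rough illustration of the recursive walk described above, the sketch below assigns one location per leaf vector while descending through nested structs and arrays. The `Type` model and every function name here are hypothetical stand-ins (the real code works on GLSL types), and the out-parameters are passed as pointers only to echo the v2 note.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified type model; not the GLSL type system.
struct Type {
    enum Kind { Vector, Array, Struct };
    Kind kind;
    unsigned length;             // number of elements (Array only)
    std::vector<Type> children;  // element type for Array, members for Struct
};

Type make_vector() { return Type{Type::Vector, 0, {}}; }
Type make_array(const Type &elem, unsigned n) {
    return Type{Type::Array, n, std::vector<Type>{elem}};
}
Type make_struct(const std::vector<Type> &members) {
    return Type{Type::Struct, 0, members};
}

// Recursively visit the type; each leaf vector consumes one location.
void emit_interpolation(const Type &type, unsigned *location,
                        std::vector<unsigned> *emitted) {
    assert(location != nullptr && emitted != nullptr);
    switch (type.kind) {
    case Type::Vector:
        emitted->push_back((*location)++);
        break;
    case Type::Array:
        for (unsigned i = 0; i < type.length; i++)
            emit_interpolation(type.children[0], location, emitted);
        break;
    case Type::Struct:
        for (const Type &member : type.children)
            emit_interpolation(member, location, emitted);
        break;
    }
}
```

A struct holding a vector plus an inner struct with a vector and a two-element array yields four consecutive locations, whatever the nesting depth.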
|
f36993b46962eab4446bc1964eb47149751aee26 |
|
23-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Clean up #includes in the compiler. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6ba700c3c38f216987ebb9b8a1ce80ac784f2d5a |
|
23-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Compile brw_cs_fill_local_id_payload() as C. It's only called from C, it compiles as C, so just compile it as C. Notice the missing extern "C" on the definition of the function, which would screw things up if the prototype wasn't parsed before the definition. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ecac1aab538d65f0867fd93e23d0d020c1a5d0f1 |
|
23-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Push down inclusion of brw_program.h. We were including it in headers, which then caused it to be included in tons of places it wasn't needed. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2d8c5299032d229c8f6e936db5644cd53716e6c1 |
|
20-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Prevent implicit upcasts to brw_reg. Now that backend_reg inherits from brw_reg, we have to be careful to avoid the object slicing problem. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
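For readers unfamiliar with the object slicing problem mentioned above, here is a small self-contained sketch. The type names are hypothetical, and private inheritance is shown only as one possible way to block the implicit upcast; it is not necessarily what this particular commit does.

```cpp
#include <cassert>
#include <type_traits>

struct hw_reg {
    unsigned nr = 0;
};

// Public inheritance: a derived object converts implicitly to hw_reg,
// silently dropping 'offset' (object slicing).
struct sliceable_reg : hw_reg {
    unsigned offset = 0;
};

// Private inheritance: the implicit upcast is rejected outside the class,
// so callers have to ask for the base part explicitly.
struct guarded_reg : private hw_reg {
    unsigned offset = 0;
    const hw_reg &as_hw_reg() const { return *this; }
};

// Taking the base by value is where slicing happens.
unsigned consume(hw_reg r) { return r.nr; }

static_assert(std::is_convertible<sliceable_reg, hw_reg>::value,
              "a public base allows the implicit upcast");
static_assert(!std::is_convertible<guarded_reg, hw_reg>::value,
              "a private base blocks the implicit upcast");
```

Calling `consume()` with a `sliceable_reg` compiles and loses `offset`; with a `guarded_reg` the call only compiles through `as_hw_reg()`, which makes the loss visible at the call site.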
|
799f924073c62c3a012c48a51895b46ad621e36c |
|
24-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Use scope operator to ensure brw_reg is interpreted as a type. In the next patch, I make backend_reg's inheritance from brw_reg private, which confuses clang when it sees the type "struct brw_reg" in the derived class constructors, thinking it is referring to the privately inherited brw_reg: brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg' fs_reg::fs_reg(struct brw_reg reg) : ^ brw_shader.h:39:22: note: constrained by private inheritance here struct backend_reg : private brw_reg ^~~~~~~~~~~~~~~ brw_reg.h:232:8: note: member is declared here struct brw_reg { ^ Avoid this by marking brw_reg with the scope resolution operator.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
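The lookup problem in the message above can be reproduced in a few lines. The sketch uses hypothetical names (`base_reg`, `backend`, `derived`); the point is only that an unqualified base-class name inside a further-derived class resolves to the inherited injected-class-name, which private inheritance makes inaccessible, whereas `::base_reg` names the namespace-scope type directly.

```cpp
#include <cassert>

struct base_reg {
    unsigned nr;
};

struct backend : private base_reg {
    unsigned number() const { return nr; }

protected:
    explicit backend(unsigned n) : base_reg{n} {}
};

struct derived : backend {
    // Writing "base_reg reg" here would find the injected-class-name
    // inherited through the private base and fail the access check.
    // The leading "::" forces lookup at global scope instead.
    explicit derived(::base_reg reg) : backend(reg.nr) {}
};
```

Outside the class hierarchy the unqualified name still works, so only the derived-class declarations need the qualifier.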
|
309a44d63c75a7d688157486b094e555f49c907d |
|
22-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Add and use backend_reg::equals(). Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6c8ba59cff14a1a86273f4008ff2a8e68335ab25 |
|
11-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Use nir_lower_tex for texture coordinate lowering Previously, we had a rescale_texcoords helper in the FS backend for handling rescaling of texture coordinates. Now that we can do variants in NIR, we can use nir_lower_tex to do the rescaling for us. This allows us to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and GL_CLAMP handling in vertex and geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ce767bbdfff7c2a7829b652c111a11eb9ddba026 |
|
11-Nov-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Move postprocess_nir to codegen time This allows us to insert NIR passes between initial NIR compilation and optimization (link time) and actual backend code-gen. In particular, it will allow us to do shader variants in NIR and share some of that shader variant code between backends. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
718b9f52dd9ba780decf5bb59f5100cf590393a0 |
|
05-Aug-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: print non-1 strides when dumping instructions v2: - Simplify code (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3ccc41ecfc5e9345a1c291748d8840984f7413ae |
|
02-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Replace fs_reg(imm) constructors with brw_imm_*(). Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor implementations themselves.

   text     data     bss     dec      hex     filename
   5204535  214112   27784   5446431  531b1f  i965_dri.so before
   5193977  214112   27784   5435873  52f1e1  i965_dri.so after

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fc19a0d2e422ea8e45bc5440a91f858f5f345884 |
|
08-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Allow indirect GS input indexing in the scalar backend. This allows arbitrary non-constant indices on GS input arrays, both for the vertex index, and any array offsets beyond that. All indirects are handled via the pull model. We could potentially handle indirect addressing of pushed data as well, but it would add additional code complexity, and we usually have to pull inputs anyway due to the sheer volume of input data. Plus, marking pushed inputs as live due to indirect addressing could exacerbate register pressure problems pretty badly. We'd need to be careful. v2: Use updated MOV_INDIRECT opcode. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c531d409274328c9713221f33f1d24e0f4877451 |
|
17-Nov-2015 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965: Add assertion for src_stencil payload size This helps address a coverity warning and prevents future questions about this code. Reported-by: Coverity (via Ilia) Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d2f089ba17c6b17823fc3d244e15c0a18108d5ce |
|
08-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Introduce a MOV_INDIRECT opcode. The geometry and tessellation control shader stages both read from multiple URB entries (one per vertex). The thread payload contains several URB handles which reference these separate memory segments. In GLSL, these inputs are represented as per-vertex arrays; the outermost array index selects which vertex's inputs to read. This array index does not necessarily need to be constant. To handle that, we need to use indirect addressing on GRFs to select which of the thread payload registers has the appropriate URB handle. (This is before we can even think about applying the pull model!) This patch introduces a new opcode which performs a MOV from a source using VxH indirect addressing (which allows each of the 8 SIMD channels to select distinct data.) Based on a patch by Jason Ekstrand. v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it a bit more generic. Use regs_read() instead of hacking up the register allocator. (Suggested by Jason Ekstrand.) v3: Fix regs_read() to be more accurate for small unaligned regions. Also rebase on Matt's work. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3] Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [v1]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
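To make the per-channel (VxH) addressing above concrete, here is a software model under simplifying assumptions: the register file is a flat byte array, there are 8 channels, and each channel adds its own byte offset to a common base. `mov_indirect` and `kChannels` are names made up for this sketch and do not correspond to driver functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr unsigned kChannels = 8;

// Hypothetical model: every channel reads one dword from
// reg_file[base + byte_offset[channel]], so channels may pick distinct data.
std::array<uint32_t, kChannels>
mov_indirect(const std::vector<uint8_t> &reg_file, unsigned base,
             const std::array<uint32_t, kChannels> &byte_offset) {
    std::array<uint32_t, kChannels> dst{};
    for (unsigned ch = 0; ch < kChannels; ch++) {
        const std::size_t addr = base + byte_offset[ch];
        assert(addr + sizeof(uint32_t) <= reg_file.size());
        std::memcpy(&dst[ch], &reg_file[addr], sizeof(uint32_t));
    }
    return dst;
}
```

A direct MOV corresponds to the special case where every channel's offset follows the regular region pattern; the indirect form lets the offsets come from a register computed at run time.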
|
5480bbd90ea288877b6e56d4860feb8f97bcba80 |
|
07-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode. We need to use per-slot offsets when there's non-uniform indexing, as each SIMD channel could have a different index. We want to use them for any non-constant index (even if uniform), as it lives in the message header instead of the descriptor, allowing us to set offsets in GRFs rather than immediates. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f88c175a29bb287d41ef90343eb6670525475a06 |
|
12-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Make convert_attr_sources_to_hw_regs handle stride == 0. This makes expressions like component(fs_reg(ATTR, n), 7) get a proper <0,1,0> region instead of the invalid <0,8,0>. Nobody uses this today, but I plan to. v2: Rebase on Matt's changes; simplify. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
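The region arithmetic behind the `<0,1,0>` versus `<0,8,0>` remark can be sketched like this. It is a guess at the shape of the rule, not a copy of the driver: `Region` and `attr_region` are hypothetical, and the sketch assumes a region is written `<vstride,width,hstride>` with the width capped at 8.

```cpp
#include <algorithm>
#include <cassert>

struct Region {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
};

// Assumed rule for illustration: a zero stride means one scalar is
// replicated to all channels, so the width has to collapse to 1.
Region attr_region(unsigned stride, unsigned exec_size) {
    assert(exec_size > 0);
    const unsigned width = (stride == 0) ? 1 : std::min(exec_size, 8u);
    return Region{width * stride, width, stride};
}
```

Without the stride check, a stride of 0 would keep the width at 8 and produce the `<0,8,0>` region that the message calls invalid.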
|
49b3215d7076db8b9afe8998b01ef250795b5892 |
|
27-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Combine register file field. The first four values (2-bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b3315a6f56fb93f2884168cbf9358b2606641db5 |
|
27-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Replace HW_REG with ARF/FIXED_GRF. HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4b0fbebf024e564c195f3ce94e1ce43a3d6442ea |
|
02-Nov-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Set stride correctly for immediates in fs_reg(brw_reg). The fs_reg() constructors for immediates set stride to 0, except for vector-immediates, which set stride to 1. This patch makes the fs_reg constructor that takes a brw_reg do likewise, so that stride is set correctly for cases such as fs_reg(brw_imm_v(...)). The generator asserts that this is true (and presumably it's useful in some optimization passes?) and the VF fs_reg constructors did this (by virtue of the fact that it doesn't override what init() does). In the next commit, calling this constructor with brw_imm_* will generate an IMM file register rather than a HW_REG, making this change necessary to avoid breakage with existing uses of brw_imm_v(). Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b163aa01487ab5f9b22c48b7badc5d65999c4985 |
|
27-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Rename GRF to VGRF. The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7638e75cf99263c1ee8e31c6cc5a319feec2c943 |
|
26-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Use brw_reg's nr field to store register number. In addition to combining another field, we get to replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3048053908310eaf082058e5be34ae902e1fc02c |
|
26-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Unwrap some lines. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
94b1031703b1b5759436fe215323727cffce5f86 |
|
25-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Remove fixed_hw_reg field from backend_reg. Since backend_reg now inherits brw_reg, we can use it in place of the fixed_hw_reg field. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1392e45bfb396ccbfa5bb0c6063522e0550988d3 |
|
24-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Use immediate storage in inherited brw_reg. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e42fb0c2a687cdcd6af2a590f6f5e24f64cfff3b |
|
23-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Make 'dw1' and 'bits' unnamed structures in brw_reg. Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for some time, for "compatibility with other compilers"). See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e42a29531ae3d5dedb72011da2947357dfa8715b |
|
10-Nov-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Print force_writemask_all in dump_instructions(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8dcf807cb43383590ba193c7ff20b8a98e4a9f65 |
|
14-Oct-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix scalar VS float[] and vec2[] output arrays. The scalar VS backend has never handled float[] and vec2[] outputs correctly (my original code was broken). Outputs need to be padded out to vec4 slots. In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However, this is wrong: type_size_scalar() for a float[2] would return 2, or for vec2[2] it would return 4. This looked like a single slot, even though in reality each array element would be stored in separate vec4 slots. Because of this bug, outputs[] and output_components[] would not get initialized for the second element's VARYING_SLOT, which meant emit_urb_writes() would skip writing them. Nothing used those values, and dead code elimination threw a party. To fix this, we introduce a new type_size_vec4_times_4() function which pads array elements correctly, but still counts in scalar components, generating correct indices in store_output intrinsics. Normally, varying packing avoids this problem by turning varyings into vec4s. So this doesn't actually fix any Piglit or dEQP tests today. However, if varying packing is disabled, things would be broken. Tessellation shaders can't use varying packing, so this fixes various tcs-input Piglit tests on a branch of mine. v2: Shorten the implementation of type_size_4x to a single line (caught by Connor Abbott), and rename it to type_size_vec4_times_4() (renaming suggested by Jason Ekstrand). Use type_size_vec4 rather than using type_size_vec4_times_4 and then dividing by 4. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
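The slot-count arithmetic in the message above is easy to check by hand. The sketch below models only arrays of small vectors; the function names echo those in the message, but the bodies are simplified stand-ins written for this example, and `ArrayType`, `align_up`, and `buggy_slot_count` are hypothetical.

```cpp
#include <cassert>

// An array of 'length' vectors with 'components' scalars each,
// e.g. float[2] is {1, 2} and vec2[2] is {2, 2}.
struct ArrayType {
    unsigned components;
    unsigned length;
};

unsigned align_up(unsigned value, unsigned alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Tightly packed size in scalar components.
unsigned type_size_scalar(ArrayType t) { return t.components * t.length; }

// Size in vec4 slots: each array element occupies a whole slot.
unsigned type_size_vec4(ArrayType t) {
    assert(t.components >= 1 && t.components <= 4);
    return t.length;
}

// Padded size, still counted in scalar components.
unsigned type_size_vec4_times_4(ArrayType t) { return 4 * type_size_vec4(t); }

// The loop bound the message describes as wrong.
unsigned buggy_slot_count(ArrayType t) {
    return align_up(type_size_scalar(t), 4) / 4;
}
```

For `float[2]` the old bound gives ALIGN(2, 4) / 4 = 1 slot and for `vec2[2]` it gives ALIGN(4, 4) / 4 = 1 slot, while both actually occupy 2 vec4 slots (8 padded components), which is why the second element's slot was never initialized.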
|
d7013988fb1d1c277e1fbce8623abddc43f78e05 |
|
30-Oct-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD Right now the generator marks direct surfaces as used but leaves marking of indirect surfaces to the caller. Just make the callers handle marking in both cases for consistency. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
027b64a55afc0fe8efcf9f6217192807e285c830 |
|
30-Oct-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD Right now the generator marks direct surfaces as used but leaves marking of indirect surfaces to the caller. Just make the callers handle marking in both cases for consistency. v2: Use const and remove useless surf_index temporary (Curro) Reviewed-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a6804654283a9d03bee92d61eee5b1d036c8db68 |
|
09-Sep-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA In order to accommodate 16x MSAA, the starting sample pair index is now 3 bits rather than 2 on SKL+. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e386fb0dee40d0f2342b43b6750b64c8174463a9 |
|
08-Sep-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/fs/skl+: Use ld2dms_w instead of ld2dms In order to support 16x MSAA, skl+ has a wider version of ld2dms that takes two parameters for the MCS data. The MCS data retrieved from the ld_mcs instruction already returns 4 or 8 registers and is documented to return zeroes for the mcsh value when the sample count is less than 16. v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when the message length would be too long in SIMD16. Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
36fd65381756ed1b8f774f7fcdd555941a3d39e1 |
|
12-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add scalar geometry shader support. This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support instanced geometry shaders, and Orbital Explorer's shader spills like crazy. But the infrastructure is in place, and it's largely working. v2: Lots of rebasing. v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7c81a6a647257c309cb1ca36c60aa4bfa8e2e022 |
|
26-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Replace default case with list of enum values. If we add a new file type, we'd like to get warnings if it's not handled. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
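The benefit of listing every enumerator instead of using `default:` can be shown with a small sketch. The enum below is a hypothetical stand-in for the register-file enum (its members are loosely based on the file names mentioned in this log), and the warning behaviour assumes a compiler with `-Wswitch` enabled, as GCC and Clang have under `-Wall`.

```cpp
#include <cassert>
#include <string>

// Hypothetical register-file enum for illustration only.
enum class RegFile { Arf, FixedGrf, Mrf, Imm, Vgrf, Attr, Uniform };

std::string file_name(RegFile file) {
    // No 'default:' label: if a new enumerator is added and not handled
    // here, -Wswitch reports the missing case at compile time.
    switch (file) {
    case RegFile::Arf:      return "arf";
    case RegFile::FixedGrf: return "g";
    case RegFile::Mrf:      return "m";
    case RegFile::Imm:      return "imm";
    case RegFile::Vgrf:     return "vgrf";
    case RegFile::Attr:     return "attr";
    case RegFile::Uniform:  return "u";
    }
    return "???";  // only reached for values outside the enumeration
}
```

With a `default:` label the compiler would treat any new file type as handled and stay silent.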
|
6a15517242214c739bfdd8b6a480ecca81e776d6 |
|
09-Oct-2015 |
Emil Velikov <emil.velikov@collabora.com> |
i965/fs: move the fs_reg::smear() from get_timestamp() to the callers We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock. In the latter the generalisation does not apply, so move the smear() where needed. This also makes the function analogous to the vec4 one. v2: Tweak the comment - The caller -> We (Matt, Connor). v3: More comment tweaks (Connor) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
486268bdb03a36faf09d84e0458ff49dd1325c40 |
|
06-Jun-2015 |
Connor Abbott <cwabbott0@gmail.com> |
i965: always run the post-RA scheduler Before, we would only do scheduling after register allocation if we spilled, despite the fact that the pre-RA scheduler was only supposed to be for register pressure and set the latencies of every instruction to 1. This meant that unless we spilled, which we rarely do, then we never considered instruction latencies at all, and we usually never bothered to try and hide texture fetch latency. Although a later commit removes the setting the latency to 1 part, we still want to always run the post-RA scheduler since it's able to take the false dependencies that the register allocator creates into account, and it can be more aggressive than the pre-RA scheduler since it doesn't have to worry about register pressure at all.

Test                 master    post-ra-sched   diff     %diff
bench_OglPSBump2     396.730   402.386         5.656    +1.400%
bench_OglPSBump8     244.370   247.591         3.221    +1.300%
bench_OglPSPhong     241.117   242.002         0.885    +0.300%
bench_OglPSPom       59.555    59.725          0.170    +0.200%
bench_OglShMapPcf    86.149    102.346         16.197   +18.800%
bench_OglVSTangent   388.849   395.489         6.640    +1.700%
bench_trex           65.471    65.862          0.390    +0.500%
bench_trexoff        69.562    70.150          0.588    +0.800%
bench_heaven         25.179    25.254          0.074    +0.200%

Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8c4151b866181198cb850137a6b65052e79554b1 |
|
29-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Use group(4, 0) to emit an exec-size 4 MOV. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8c902a580a490181e7cde29073b11181db4614f8 |
|
17-Jun-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Implement ARB_fragment_layer_viewport. Normally, we could read gl_Layer from bits 26:16 of R0.0. However, the specification requires that bogus out-of-range 32-bit values written by previous stages need to appear in the fragment shader as-written. Instead, we pass in the full 32-bit value from the VUE header as an extra flat-shaded varying. We have the SF override the value to 0 when the previous stage didn't actually write a value (it's actually defined to return 0). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
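The reason a bit-field read cannot satisfy the specification is visible from the field width alone: bits 26:16 hold only 11 bits. The helper below is a generic bit extraction written for this example (`extract_bits` and `layer_from_r0` are hypothetical names), not the code the driver emits.

```cpp
#include <cassert>
#include <cstdint>

// Extract bits hi:lo (inclusive) of a 32-bit payload dword.
uint32_t extract_bits(uint32_t value, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi - lo < 31);  // keeps the shift below 32
    const uint32_t mask = (uint32_t{1} << (hi - lo + 1)) - 1;
    return (value >> lo) & mask;
}

// What reading gl_Layer from bits 26:16 of R0.0 would amount to.
uint32_t layer_from_r0(uint32_t r0_0) { return extract_bits(r0_0, 26, 16); }
```

An 11-bit field tops out at 2047, so an out-of-range 32-bit value written by an earlier stage could not come back as written; passing the full value as a flat-shaded varying avoids that truncation.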
|
b94cdcdada251bb8e866cb7af0f2ff222b55a918 |
|
26-Oct-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Properly check for PAD in fragment shaders with > 16 varyings. Commit 268008f98c3810b9f276df985dc93efc0c49f33e changed unused VUE map slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT. I missed updating this. It also means that commit message was wrong, as some code *did* rely on slots being initialized to COUNT. This may fix a bug with SSO programs with > 16 FS input varyings. I think we probably just emitted extra pointless code, but probably didn't break anything. We might also just have no tests for that. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
aedc0aab19c233cc084211959ef2b6be1c500bb7 |
|
21-Oct-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965/fs: Use unsigned immediate 0 when eliminating SHADER_OPCODE_FIND_LIVE_CHANNEL The destination for SHADER_OPCODE_FIND_LIVE_CHANNEL is always a UD register. When we replace the opcode with a MOV, make sure we use a UD immediate 0 so copy propagation doesn't bail because of non-matching types. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
de5a450bd360d24db65cbba5b6633f800fda0d2e |
|
17-Oct-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Don't use message headers for untyped reads We always set the mask to 0xffff, which is what it defaults to when no header is present. Let's drop the header instead. v2: Only remove header for untyped reads. Typed reads always need the header. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e2344e11ce8ddefb89a222bbf63a7c60e8ba5655 |
|
21-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Trim unneeded channels in SampleID setup. The AND and SHR produce a scalar value that we had been replicating across $dispatch_width channels. The immediate MOV produces only four useful channels of data. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e10fc055e7dc5281f03a77088a24392098e3473b |
|
21-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Use type-W for immediate in SampleID setup. Not a functional difference, but register is loaded with a signed immediate (V) and added to a signed type (D) producing a signed result (D). Also change the type of g0 to allow for compaction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1db44252d01bf7539452ccc2b5210c74b8dcd573 |
|
20-Oct-2015 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965: Implement ARB_shader_stencil_export (gen9+) v2: remove useless source_stencil_to_render_target (Ken) Squash in the actual packing function, which also got to v2: Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt) Reorder the regioning to be in VWH order (Matt) Don't retype src in the backend, just assert instead (Matt) Rename the debug prints to something better (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5fa7114652068735347c8715d1fc1d2cef72c433 |
|
20-Oct-2015 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965/fs: Enumerate logical fb writes arguments Gen9 adds the ability to write out a stencil value, so we need to expand the virtual payload by one. Abstracting this now makes that change easier to read. I was admittedly confused early on about some of the hardcoding. If people believe the resulting code is inferior, I am not super attached to the patch. v2: Remove explicit numbering from the enumeration (Matt). Use a real naming scheme, and reference it in the opcode definition (Curro) Add a missed hardcoded logical position in get_lowered_simd_width (Ben) Add an assertion to make sure the component numbering is correct (Ben) Cc: Matt Turner <mattst88@gmail.com> Cc: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ac98888afdc121e6eaafc9c5393647a2df4baef6 |
|
29-Sep-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode. In scalar mode, geometry shader inputs can easily take up hundreds of registers. This makes pushing VUE entries impractical; we'll need to resort to the pull model in some cases. To support this, we introduce a new opcode corresponding to the "URB Read SIMD8" message. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bea75227829512ab0e4766e00ac1b509c7586667 |
|
06-May-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes. In the vec4 backend, we have a vec4_instruction::urb_write_flags field. There are many kinds of flags for SIMD4x2 messages. However, there are really only two (per-slot offset, use channel masks) for SIMD8 messages. Rather than adding a boolean flag for per-slot offsets (polluting all instructions), I decided to just make three new opcodes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ee77796a5c97105bf7e92e3a7931ee0f331a0545 |
|
20-Oct-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/fs: Disable opt_sampler_eot for more message types In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4 message types because I found by experimentation that it doesn't work. I wrote in the comment that I couldn't find any documentation for this problem. However I've now found the documentation and it has additional restrictions on further message types so this patch updates the comment and adds the others. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
801f151917fedb13c5c6e96281a18d833dd6901f |
|
20-Oct-2015 |
Neil Roberts <neil@linux.intel.com> |
i965: Remove block arg from foreach_inst_in_block_*_starting_from Since 49374fab5d793 these macros no longer actually use the block argument. I think this is worth doing to make the macros easier to use because they already have really long names and a confusing set of arguments. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9e17c36b8ba79e688011a5fd293ad5f42da21b66 |
|
14-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Extract can_change_source_types() functions. Make them members of fs_inst/vec4_instruction for use elsewhere. Also fix the fs version to check that dst.type == src[1].type and for !saturate. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4467344c829f1dccdf74e27bef2c5fda72552be6 |
|
09-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Rename brw_foo_emit to brw_compile_foo Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
67db9072b9fde74277f74f7303366b8bdd3a711e |
|
09-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Move some of the prog_data setup into brw_wm_emit This commit moves the common/modern stuff. Some legacy stuff such as setting use_alt_mode was left because it needs to know whether or not we're an ARB program. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4e711872d024ce41c8b07b1150d8a393de21e26d |
|
09-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/cs: Rework cs_emit to take a nir_shader and a brw_compiler This commit removes all dependence on GL state by getting rid of the brw_context parameter and the GL data structures. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
22ad44910e993e1acd0b4052722fe786626008b5 |
|
06-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler This commit removes all dependence on GL state by getting rid of the brw_context parameter and the GL data structures. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5e86f5b3d21fe8e96676bb0608990d72dbf61b85 |
|
06-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Remove the gl_program from the generator Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b9b40ef9b7644ea24768bc8b7464b1719efe99bf |
|
10-Oct-2015 |
Rob Clark <robclark@freedesktop.org> |
nir: remove dependency on glsl Move glsl_types into NIR, now that the dependency on glsl_symbol_table has been split out. Possibly makes sense to rename things at this point, but if we do that I'd like to keep it split out into a separate patch to make git history easier to follow (IMHO). v2: fix android build v3: I f***ing hate scons.. but at least it builds Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
176e6930e6c24dfce7cc730faa2612d27689a4df |
|
18-Jul-2015 |
Timothy Arceri <t_arceri@yahoo.com.au> |
i965: add arrays of arrays support for varyings V2: get the correct vector elements value for outputs Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bd198b9f0a292a9ff4ffffec3a29bad23d62caba |
|
15-Aug-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/vs: Simplify fs_visitor's ATTR file. Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of compilation, assign_vs_urb_setup() translated those into GRF units, and converted ATTR to HW_REGs. This patch moves the translation earlier, making ATTR work in terms of GRF units from the beginning. assign_vs_urb_setup() simply has to add the number of payload registers and push constants to obtain the final hardware GRF number. (We can't do this earlier as those values aren't known.) ATTR still supports reg_offset; however, it's simply added to reg. It's not clear whether this is valuable or not. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
031d3501322aee0a1474c7f2a9b79f9fa9947430 |
|
26-Aug-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/vs: Unify URB entry size/read length calculations between backends. Both the vec4 and scalar VS backends had virtually identical URB entry size and read length calculations. We can move those up a level to backend-agnostic code and reuse it for both. Unfortunately, the backends need to know nr_attributes to compute first_non_payload_grf, so I had to store that in prog_data. We could use urb_read_length, but that's nr_attributes rounded up to a multiple of two, so doing so would waste a register in some cases. There's more code to be removed in the vec4 backend, but that will come in a follow-on patch. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
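The register-waste argument in the message above can be sketched in a few lines. This is a minimal illustration, assuming one register per attribute as the message implies; the helper names round_up_to_even and wasted_registers are hypothetical and are not taken from the driver.

```cpp
// Hypothetical helper: round a count up to the next multiple of two,
// which is how the message characterizes urb_read_length.
constexpr unsigned round_up_to_even(unsigned n)
{
   return (n + 1u) & ~1u;
}

// Registers that would be skipped needlessly if the rounded value were
// used in place of the exact attribute count (assumption: one register
// per attribute).
constexpr unsigned wasted_registers(unsigned nr_attributes)
{
   return round_up_to_even(nr_attributes) - nr_attributes;
}

static_assert(wasted_registers(3) == 1, "odd counts waste one register");
static_assert(wasted_registers(4) == 0, "even counts waste nothing");
```

Under these assumptions only odd attribute counts lose a register, which matches the "in some cases" wording of the message.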
|
9a2573e5fc63f48cde56efdb191c129e7d7fb7b1 |
|
07-Oct-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965/cs: Get max_cs_threads from brw_compiler devinfo Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ee0f0108c8e87b9cfec25bade66670bbc4254139 |
|
07-Oct-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Move brw_get_shader_time_index() call out of emit functions brw_get_shader_time_index() is all tangled up in brw_context state and we can't call it from the compiler. Thanks to Jason's recent refactoring, we can just get the index and pass it to the emit functions instead. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
469d0e449b78ad68e199dbe60e900487255a5d5d |
|
06-Oct-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965/cs: Split out helper for building local id payload The initial motivation for this patch was to avoid calling brw_cs_prog_local_id_payload_dwords() in gen7_cs_state.c from the compiler. This commit ends up refactoring things a bit more so as to split out the logic to build the local id payload to brw_fs.cpp. This moves the payload building closer to the compiler code that uses the payload layout and makes it available to other users of the compiler. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ba71d581aeb96c4626500eb5b19f3bef2f40d586 |
|
05-Oct-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Move brw_dump_ir() out of brw_*_emit() functions We move these calls one level up into the codegen functions. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3141906fa36839e9276cb65033857c85b39376e5 |
|
22-Sep-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in one place That should make tracking where we do spills and pull loads a bit easier. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
36e82b137d4a77f24de0fc722c80e445b6e3375c |
|
22-Sep-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965: make pull constant loads in gen6 start at MRFs 16/17 So they do not conflict with our (un)spills (MRF 21..23) or our URB writes (MRF 1..15) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
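The non-conflict claim in the message above can be checked mechanically. The sketch below uses only the MRF ranges quoted in the message (URB writes 1..15, pull constant loads 16/17, spills 21..23); the MrfRange type and the overlaps helper are hypothetical names introduced for illustration.

```cpp
// Hypothetical inclusive range of message register numbers.
struct MrfRange {
   int first;
   int last;
};

// Two inclusive ranges overlap when each starts before the other ends.
constexpr bool overlaps(MrfRange a, MrfRange b)
{
   return a.first <= b.last && b.first <= a.last;
}

// Gen6 ranges as quoted in the commit message.
constexpr MrfRange urb_writes = {1, 15};
constexpr MrfRange pull_loads = {16, 17};
constexpr MrfRange spills     = {21, 23};

static_assert(!overlaps(pull_loads, urb_writes), "pull loads clear of URB writes");
static_assert(!overlaps(pull_loads, spills), "pull loads clear of spills");
static_assert(!overlaps(urb_writes, spills), "URB writes clear of spills");
```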
|
0c2add775192f3ee0325d61964ef67f7ca3f6d4e |
|
22-Sep-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5a360dcad1fdb91f9129cb21775b9af60cbf57e4 |
|
03-Oct-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Generalize predicated break pass for use in vec4 backend. instructions in affected programs: 44204 -> 43762 (-1.00%) helped: 221 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bf7b6fd3fd6d98305d64ee6224ca9f9e7ba48444 |
|
02-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/shader: Get rid of the shader, prog, and shader_prog fields Unfortunately, we can't get rid of them entirely. The FS backend still needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still needs gl_shader_program for handling transform feedback. However, the VS needs neither and we can substantially reduce the amount they are used. One day we will be free from their tyranny. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
404419ee1a57c79982d93eefe4de099d61ad2eee |
|
02-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs,vec4: Get rid of the sanity_param_count It doesn't exist for anything other than an assert that, as far as I can tell, isn't possible to trip. Soon, we will remove prog from the visitor entirely and this will become even more impossible to hit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
756613ed35d6fd2216b5138731c0c38886b8e14a |
|
02-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use the nir info instead of pulling things out of [shader_]prog Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7b974c5f902b3f652776471aa35306195247a8a7 |
|
01-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/cs: Remove the prog argument from local_id_payload_dwords Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ea006c4cb5eb2d98d6bfd5a6c32fcae10b636f17 |
|
01-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Move binding table setup to codegen time. Setting up binding tables really has little to do with the actual process of turning shaders into instructions; it's more part of setting up prog_data. This commit moves it out of the visitors and with the rest of the prog_data setup stuff. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
28709e37d96d6b64753ca4dcce5fbfeb75f5b499 |
|
01-Oct-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/shader: Pull assign_common_binding_table_offsets out of backend_shader This really has nothing to do with the backend compiler and we'd like to eventually be able to set this up earlier in the compile process. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3de81508ea513bf01f2c996c25a2cfdb5b3231d0 |
|
30-Sep-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/shader: Get rid of the setup_vec4_uniform_value helper It's not used by anything anymore Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
604ce8253ae796ecf9763f1612e2fff25591cb07 |
|
26-Aug-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Print reg and reg_offset separately for ATTR files. Reading this output was really confusing. reg represents attribute slots; reg_offset is the x/y/z/w component (0..3) within a vec4 slot. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d1be9d21265cf4e344a5d78b17cea7ee2c8408a1 |
|
24-Sep-2015 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/cs: Add a binding table entry for gl_NumWorkGroups If glDispatchComputeIndirect is used, then the value for this variable must be read from the indirect BO. To allow the same generated code to support indirect and glDispatchCompute, we will also set up a BO for the number of work groups using the intel_upload_data mechanism. This will only be required if the gl_NumWorkGroups variable is accessed. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
99df02ca26f6127c8fa24d38a8a069ac6159356a |
|
10-Sep-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Don't re-layout varyings for separate shader programs. Previously, our VUE map code always assigned slots to varyings sequentially, in one contiguous block. This was a bad fit for separate shaders - the GS input layout depended on the VS output layout, so if we swapped out vertex shaders, we might have to recompile the GS on the fly - which rather defeats the point of using separate shader objects. (Tessellation would suffer from this as well - we could have to recompile the HS, DS, and GS.) Instead, this patch makes the VUE map for separate shaders use a fixed layout, based on the input/output variable's location field. (This is either specified by layout(location = ...) or assigned by the linker.) Corresponding inputs/outputs will match up by location; if there's a mismatch, we're allowed to have undefined behavior. This may be less efficient - depending on what locations were chosen, we may have empty padding slots in the VUE. But applications presumably use small consecutive integers for locations, so it hopefully won't be much worse in practice. 3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions. This seems like a small price to pay for avoiding recompiles. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b23eb643ebab9ef250ce026a7e2f651de9be10f6 |
|
13-Apr-2015 |
Samuel Iglesias Gonsalvez <siglesias@igalia.com> |
i965/fs: Implement FS_OPCODE_GET_BUFFER_SIZE Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2ea16966ae66d4dd5c5dcb996d7996d9c734bbee |
|
24-Sep-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Respect stride and subreg_offset for ATTR registers When we assign hw regs to attributes, we don't incorporate the stride and subreg_offset from the fs_reg. It's rarely used, but the integer multiplication lowering uses an unusual stride and subreg_offset combination that breaks when one source is an attribute. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970 Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f50645d05c6dffa6463856ded0b8461ac9d24535 |
|
15-Sep-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation There are some bug reports about shaders failing to compile in gen6 because MRF 14 is used when we need to spill. For example: https://bugs.freedesktop.org/show_bug.cgi?id=86469 https://bugs.freedesktop.org/show_bug.cgi?id=90631 Discussion in bugzilla pointed to the fact that gen6 might actually have 24 MRF registers available instead of 16, so we could use other MRF registers and avoid these conflicts (we still need to investigate why some shaders need up to MRF 14 anyway, since this is not expected). Notice that the hardware docs are not clear about this fact: SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device Hardware" says "Number per Thread" - "24 registers" However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says: "Normal threads should construct their messages in m1..m15. (...) Regardless of actual hardware implementation, the thread should not assume that MRF addresses above m15 wrap to legal MRF registers." Therefore experimentation was necessary to evaluate if we had these extra MRF registers available or not. This was tested in gen6 using MRF registers 21..23 for spilling and doing a full piglit run (all.py) forcing spilling of everything on the FS backend. It was also tested by doing spilling of everything on both the FS and the VS backends with a piglit run of shader.py. In both cases no regressions were observed. In fact, many of these tests were helped in the cases where we forced spilling, since that triggered the same underlying problem described in the bug reports. Here are some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on gen6 hardware: Using MRFs 13..15 for spilling: crash: 2, fail: 113, pass: 6621, skip: 5461 Using MRFs 21..23 for spilling: crash: 2, fail: 12, pass: 6722, skip: 5461 This patch sets the ground for later patches to implement spilling using MRF registers 21..23 in gen6. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
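A generation-dependent MRF limit of the kind the message above describes might look like the sketch below. The counts (24 on gen6, 16 otherwise) and the spill ranges (21..23 versus 13..15) come from the message; the function names are hypothetical, and the actual macro in the driver may be written differently.

```cpp
// Hypothetical stand-in for a BRW_MAX_MRF-style macro: the number of
// message registers assumed to be usable on a given hardware generation.
constexpr int max_mrf(int gen)
{
   return gen == 6 ? 24 : 16;
}

// Assumption for illustration: the last three MRFs are reserved for
// spilling, which gives 21..23 on gen6 and 13..15 elsewhere.
constexpr int first_spill_mrf(int gen)
{
   return max_mrf(gen) - 3;
}

static_assert(first_spill_mrf(6) == 21, "gen6 spills start at m21");
static_assert(first_spill_mrf(7) == 13, "other gens spill from m13");
```

Taking the generation as a parameter keeps the gen6 special case in one place, so callers do not hard-code either limit.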
|
47e18a595731c054ac254e26066e6dea804f34e8 |
|
15-Sep-2015 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: The barrier send uses only 1 payload register When preparing the barrier payload, the instructions should operate in simd8 mode since we only use 1 payload register. fs_inst::regs_read is also updated to indicate that it only reads one register for SHADER_OPCODE_BARRIER. These issues were flagged by: commit cadd7dd384b33a779d46bd664f456bed4a21a5b7 Author: Jason Ekstrand <jason.ekstrand@intel.com> Date: Thu Jul 2 15:41:02 2015 -0700 i965/fs: Add a very basic validation pass Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cadd7dd384b33a779d46bd664f456bed4a21a5b7 |
|
03-Jul-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add a very basic validation pass Currently the validation pass only validates that regs_read and regs_written are consistent with the sizes of VGRF's. We can add more as we find it to be useful. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a548c75e31b4146d55133cb8c57a82117c196584 |
|
05-Sep-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Move perf_debug code to brw_codegen_*_prog() We're trying to avoid a libdrm dependency in the core compiler, so let's move the perf_debug code one level up from the brw_*_emit() helpers to the brw_codegen_*_prog() helpers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
84f2ed2cfdab45aa949aa6affe46cfe2944759c1 |
|
05-Sep-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Move brw_fs_precompile() to brw_wm.c All other precompile functions live in the brw_<stage>.c files, make fs follow the convention. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dc70c86b9b485cb5006a55cc2efd1f154dbfd469 |
|
05-Sep-2015 |
Kristian Høgsberg Kristensen <krh@bitplanet.net> |
i965: Move compute shader code around This moves the compute shader code around in order to make the way the code is split up more consistent. There should be no functional changes. Typically we have a few files per stage: brw_vs.c, brw_wm.c brw_gs.c: code to drive code generation and implement precompiling and cache search. genX_<stage>_state.c gen specific implementation of the state emission for the shader stage. The brw_*_emit() functions are all in the same files as the visitor classes they use (with the exception of VS, which may use either vec4 or fs). To make compute follow this convention, we move the brw_cs_emit() function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and do this in C like the other similar files. Finally, move state setup and atoms to gen7_cs_state.c. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c7161a3c3559f0450a90bb1228c74e8fdc9c939b |
|
22-Nov-2014 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/cs: Reserve local invocation id in payload regs Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
af48612b88cb51cd3b957e70490462c0c404f92c |
|
04-Oct-2014 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: Set first_non_payload_grf in assign_curb_setup first_non_payload_grf may be updated in assign_urb_setup for FS or assign_vs_urb_setup for VS. We need to set this in assign_curb_setup for compute shaders since cs does not have an assign_cs_urb_setup like assign_urb_setup (fs) or assign_vs_urb_setup (vs). Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0b91bcea98c0fe201bba89abe1ca3aee4d04c56c |
|
12-Aug-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
i965: add support for textureSamples function Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> [v2: kayden-supplied code in fs_nir replacing need for logical opcode] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a2151560b8d65be31129c00872ea8d70c564b110 |
|
28-Aug-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move brw_setup_tex_for_precompile to brw_program.[ch]. This living in brw_fs.{h,cpp} is a historical artifact of us supporting texturing for fragment shaders before any other stages. It's kind of awkward given that we use it for all stages. This avoids having to include brw_fs.h in geometry shader code in order to access this function. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9390cb84593bda516e8c1521c87a08475574d1be |
|
02-Sep-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Handle MRF destinations in lower_integer_multiplication(). The lowered code reads from the destination, which isn't possible from message registers. Fixes the following dEQP tests on SNB: dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org> Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8765f1d7ddfb00dc5b202e4e679ebe640a547d50 |
|
18-Aug-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM. Noticed when debugging things that lead to the next patch. On G45 (and presumably ILK) this helps register coalescing: total instructions in shared programs: 4077373 -> 4077340 (-0.00%) instructions in affected programs: 43751 -> 43718 (-0.08%) helped: 52 HURT: 2 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fee0c5af11dd0995de96e7053377d425a66d03a0 |
|
19-Aug-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Split VGRFs after lowering pull constants The split_virtual_grfs code doesn't properly rewrite reladdr so we need to make sure that any uniform indirects are lowered away first. This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit Cc: "10.6" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f2e667172a6382f81d1f3e709f02c7ee6cfda4c7 |
|
19-Aug-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Refactor assign_constant_locations Now that all constant locations are assigned in a single function, we can refactor it a bit to unify things. In particular, we now handle pull_constant_loc and push_constant_loc more similarly and we only modify stage_prog_data->params[] in one place at the end of the function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dfacae3a56463e2df3a67e245f868e9f2be64dcd |
|
19-Aug-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Combine assign_constant_locations and move_uniform_array_access_to_pull_constants The comment above move_uniform_array_access_to_pull_constants was completely bogus because it has nothing to do with lowering instructions. Instead, it's assigning locations of pull constants. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
640c472fd075814972b1276c5b0ed3a769aacda5 |
|
12-Aug-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move type_size() methods out of visitor classes. I want to use C function pointers to these, and they don't use anything in the visitor classes anyway. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c56899f41a904762225267cb9c543a0abd901ad5 |
|
19-Aug-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Make setup_vec4_uniform_value and _image_uniform_values take an offset This way they don't implicitly increment the uniforms variable and don't have to be called in-sequence during uniform setup. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8d8b8f58540abbdb8a006a38830a08346a0edf34 |
|
19-Aug-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Rename setup_vector_uniform_values to setup_vec4_uniform_value The new name more accurately represents what it does: Set up a single vec4 uniform value. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
84431c1f1d343c85f3b7fa265293a1d245ba9cf3 |
|
05-May-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Teach type_size() about the size of an image uniform. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8a688bee83ced46eb4bff741f05d2da033c07ade |
|
10-Aug-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make resolve_source_modifiers consistent with the vec4 version Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ee977183dcb543c919d0d70dde610cb191d5a3ea |
|
04-Aug-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower arithmetic instructions with register regions of unsupported width. This extends the SIMD lowering pass to enforce the hardware limitation that no directly-addressed source may read more than 2 physical GRFs. One can easily go over this limit when doing 64-bit arithmetic (e.g. FP64 or extended-precision integer MULs) or SIMD32, so it's nice to be able to just emit an instruction of the intended execution size from the visitor and let the lowering pass deal with this restriction transparently. Some hardware arithmetic instructions are not handled here, including all instructions that use the accumulator implicitly (which the SIMD lowering pass deliberately doesn't handle), instructions with non-per-channel sources (e.g. LINE or PLANE) and SEND-like instructions, which need special handling most likely as virtual opcodes. Reviewed-by: Connor Abbott <connor.w.abbott@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
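The two-GRF limit in the message above can be turned into a small width calculation. This is a rough sketch, assuming a 32-byte GRF and a densely packed source (no stride or offset); REG_SIZE_BYTES, regs_read and max_legal_width are hypothetical names, and the real pass considers more cases than this.

```cpp
// Assumption: one physical GRF holds 32 bytes.
constexpr unsigned REG_SIZE_BYTES = 32;

// Number of GRFs a packed source spans for a given execution size and
// per-channel type size in bytes (rounded up).
constexpr unsigned regs_read(unsigned exec_size, unsigned type_size)
{
   return (exec_size * type_size + REG_SIZE_BYTES - 1) / REG_SIZE_BYTES;
}

// Halve the execution size until a source fits in at most two GRFs.
constexpr unsigned max_legal_width(unsigned exec_size, unsigned type_size)
{
   return regs_read(exec_size, type_size) <= 2
             ? exec_size
             : max_legal_width(exec_size / 2, type_size);
}

static_assert(max_legal_width(16, 4) == 16, "SIMD16 with 32-bit sources fits");
static_assert(max_legal_width(16, 8) == 8, "SIMD16 with 64-bit sources splits");
static_assert(max_legal_width(32, 4) == 16, "SIMD32 with 32-bit sources splits");
```

Under these assumptions a SIMD16 instruction on 64-bit data would be unrolled into two SIMD8 halves, which is the 64-bit arithmetic case the message mentions.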
|
42a18ca76057621ae7d8812b29ea2245d6ff282d |
|
05-Aug-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix fs_inst::regs_read() for sources in the ATTR file. Otherwise it would crash on Gen8 with scalar VS. The issue can easily be reproduced with the following patch, but I don't see any reason why it wouldn't be possible to end up with an ATTR argument here even without it. CC: mesa-stable@lists.freedesktop.org Reviewed-by: Connor Abbott <connor.w.abbott@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3b48a0eeda20f5cf2dbc8de5e36f8fe3461f41bf |
|
06-Aug-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower the MULH virtual instruction. Translate MULH into the MUL/MACH sequence. This does roughly the same thing that nir_emit_alu() used to do but we can now handle 16-wide by taking advantage of the SIMD lowering pass. The force_sechalf workaround near the bottom is required because the SIMD lowering pass will emit instructions with non-zero quarter control and we need to make sure we avoid that on integer arithmetic instructions with implicit accumulator access due to a known hardware bug on IVB. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2e731264382954beb1192cd7cc62e16e0b8e7978 |
|
05-Aug-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Indent the implementation of 32x32-bit MUL lowering by one level. In order to make room for the code that will lower the MULH virtual instruction. Also move the hardware generation and execution type checks into the same branch, they are going to have to be different for MULH. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f5b37fb1acad9cf044b7b6d4fa5f2582bd8bc7f4 |
|
05-Aug-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower 32x32 bit multiplication on BXT. AFAIK BXT has the same annoying alignment limitation as CHV on the source register regions of 32x32 bit MULs, give it the same treatment. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
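For readers unfamiliar with this kind of lowering, a 32x32-bit multiply can be rebuilt from two multiplies that each take one 16-bit operand. The sketch below shows the arithmetic identity only; it is an assumption about the general approach, not the exact instruction sequence the pass emits, and mul32_via_16bit_halves is a hypothetical name.

```cpp
#include <cstdint>

// Low 32 bits of a * b, computed from two 32x16-bit partial products.
// With b = b_lo + (b_hi << 16):
//   a * b = a * b_lo + ((a * b_hi) << 16)   (mod 2^32)
constexpr uint32_t mul32_via_16bit_halves(uint32_t a, uint32_t b)
{
   return a * (b & 0xffffu) + ((a * (b >> 16)) << 16);
}

static_assert(mul32_via_16bit_halves(7u, 6u) == 42u, "small operands");
static_assert(mul32_via_16bit_halves(0xdeadbeefu, 0x12345678u) ==
                 0xdeadbeefu * 0x12345678u,
              "matches the native wrapping multiply");
```

Unsigned arithmetic wraps modulo 2^32, so the bits shifted out of the high partial product are exactly the ones a native 32-bit multiply would discard.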
|
c1da15709a0c0c2775bd9e534f67c60f7dc95ce8 |
|
12-Jul-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Use float calculations when double is unnecessary. Literals without an f/F suffix are of type double, and implicit conversion rules specify that the float in (float op double) be converted to a double before the operation is performed. I believe float execution was intended (in nearly all cases) or is sufficient (in the case of gen7_urb.c). Removes a lot of float <-> double conversion instructions and replaces many double instructions with float instructions which are cheaper. text data bss dec hex filename 4928659 195160 26192 5150011 4e953b i965_dri.so before 4928315 195152 26192 5149659 4e93db i965_dri.so after Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
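The promotion rule the message above relies on can be verified at compile time. This is a minimal sketch of the language behavior only; the function names half_via_double and half_via_float are hypothetical and do not appear in the driver.

```cpp
#include <type_traits>

// An unsuffixed literal is a double, so float * 0.5 is evaluated in double.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float op double yields double");

// With the f suffix both operands are float and no promotion happens.
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float op float stays float");

// Multiplies in double precision, then narrows the result back to float.
constexpr float half_via_double(float x)
{
   return x * 0.5;
}

// Multiplies directly in single precision.
constexpr float half_via_float(float x)
{
   return x * 0.5f;
}
```

Both functions return the same value for exactly representable inputs; the difference is the extra float-to-double and double-to-float conversions the first one requires.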
|
02425d3ec2af6945a03583cadcaa5f3f330bbc0e |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Make the default builder 64-wide before entering the optimization loop. Not a typo. Replace the default builder with one of bogus width to catch cases in which optimization passes assume that the default dispatch width is good enough. The execution controls of instructions emitted during optimization should in general match the original code that is being manipulated. Many of the problems fixed in this series were caught by the assertions introduced in this patch. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4529916dfd227af6c4e151f45261db22157fe45f |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Don't set exec_all on instructions wider than the original in lower_simd_width. This could have led to somewhat increased bandwidth usage for lowered texturing instructions on Gen4 (which is the only case in which lower_width may be greater than inst->exec_size). After the previous patches the invariant mentioned in the comment should no longer be assumed by any of the other optimization and lowering passes, so the exec_all() call shouldn't be necessary anymore. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eaba922582cfdccc7b198f9b23d8bd3c26197d03 |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Initialize a builder explicitly in the gen4 send dependency work-arounds. Instead of relying on the default one. This shouldn't lead to any functional changes because DEP_RESOLVE_MOV overrides the execution size of the instruction anyway and other execution controls are irrelevant. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
992cda2c8a452ec86386a0f98eaf522afe206695 |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Switch lower_logical_sends() to the fs_builder constructor from instruction. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
930ebb258524762c765fa864ef7063bd8bb754a1 |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Switch lower_load_payload() to the fs_builder constructor from instruction. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bfad71606a987f14f20d2c3607846648f8537f2b |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Set up the builder execution size explicitly in opt_sampler_eot(). opt_sampler_eot() was relying on the default builder to have the same width as the sampler and FB write opcodes it was eliminating, the channel selects didn't matter because the builder was only being used to allocate registers, no new instructions were being emitted with it. A future commit will change the width of the default builder, which will break this assumption, so initialize it explicitly here. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ff463af436bcf07430807512c9f0bf0f627288ce |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Set execution controls correctly in lower_integer_multiplication(). lower_integer_multiplication() was ignoring the execution controls of the original MUL instruction. Fix it by using the new fs_builder constructor. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ce90227c71c8cbe6ca4317f1873ff12c70081c4c |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Set execution controls correctly for lowered pull constant loads. demote_pull_constants() was ignoring the execution size and channel selects of the instruction that wanted the constant, which doesn't matter for uniform pull constant loads because all channels get the same scalar value, but it might for varying pull constant loads. Fix it by using the new fs_builder() constructor that takes care of setting execution controls compatible with the instruction passed as argument. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
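A minimal sketch of the idea behind a builder constructed from an instruction, as described in the entry above: the builder copies the execution controls of the original instruction so that anything it emits runs on the same channels. `InstSketch` and `BuilderSketch` are hypothetical stand-ins for illustration, not the real fs_inst or fs_builder types.

```cpp
#include <cassert>

// Hypothetical, simplified instruction: only the execution controls.
struct InstSketch {
    unsigned exec_size;        // number of channels executed
    unsigned group;            // first channel of the channel group
    bool force_writemask_all;  // ignore channel enables when set
};

// Hypothetical builder that inherits its execution controls from an
// existing instruction instead of using fixed defaults.
class BuilderSketch {
public:
    explicit BuilderSketch(const InstSketch &inst)
        : exec_size_(inst.exec_size), group_(inst.group),
          force_writemask_all_(inst.force_writemask_all)
    {
        assert(exec_size_ > 0);
    }

    // Every emitted instruction carries the inherited controls.
    InstSketch emit() const
    {
        InstSketch inst = { exec_size_, group_, force_writemask_all_ };
        return inst;
    }

private:
    unsigned exec_size_;
    unsigned group_;
    bool force_writemask_all_;
};
```

With this shape, a lowering pass that replaces one instruction with several cannot silently drop the original execution size or channel group.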
|
3352724dfa4eb5c93290db92ae99d26d9b89e630 |
|
14-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement lowering of logical surface instructions. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
086d29f4d747bbcfe37beeb18ba77fb2cb84dbdc |
|
18-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Hook up SIMD lowering to unroll surface instructions of unsupported width. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7a594a95a930f1658062e4d86d0f37d491b372b3 |
|
21-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Define logical typed and untyped surface opcodes. Each logical variant is largely equivalent to the original opcode but instead of taking a single payload source it expects its arguments separately as individual sources, like: typed_surface_write_logical null, coordinates, source, surface, num_coordinates, num_components This patch defines the opcodes and usual instruction boilerplate, including a placeholder lowering function provided mainly as documentation for their source registers. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a0c02d2bbb765b0e997ad524d8e51838e529d9c0 |
|
28-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Define the setup_vector_uniform_values() backend_visitor interface. This cleans up the VEC4 implementation of setup_uniform_values() somewhat and will avoid duplication of the image uniform upload code by having a common interface to upload a vector of uniforms on either back-end. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4be99438e6e40280f9dc071882ce3bfbfabadb4a |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Hook up SIMD lowering to handle texturing opcodes of unsupported width. This should match the set of cases in which we currently call fail() or no16() from the emit_texture_*() methods and the ones in which emit_texture_gen4() enables the SIMD16 workaround. Hint for reviewers: It's not a big deal if I happen to have missed some case here, it will just lead to an assertion failure down the road which is easily fixable, however being stricter than necessary won't cause any visible breakage, it would just decrease performance silently due to the unnecessary message splitting, so feel free to double-check that all cases listed here already cause a SIMD8/16 fall-back with the current texturing code -- You may want to skip over the Gen5-6 cases though if you don't have pencil and paper at hand. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2cd466f6c3192015ea1794afc57eb453d7f13818 |
|
18-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement lowering of logical texturing opcodes on Gen4. Unlike its Gen5 and Gen7 counterparts this patch isn't a plain refactor of the previous Gen4 texturing code, it's more of a rewrite largely based on emit_texture_gen4_simd16(). The reason is that on the one hand the original emit_texture_gen4() code didn't seem easily fixable to be SIMD width-invariant and had plenty of clutter to support SIMD-width workarounds which are no longer required. On the other hand emit_texture_gen4_simd16() was missing a number of SIMD8-only opcodes. This should generalize both and roughly match their current behaviour where there is overlap. Incidentally this will fix the following piglits on Gen4: arb_shader_texture_lod.execution.arb_shader_texture_lod-texgrad arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 2d arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 3d arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d_projvec4 arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 3d Acked-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
501134b9fe02633ca0cdda66a9b670ae38e791f7 |
|
18-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement lowering of logical texturing opcodes on Gen5-6. This should be largely equivalent to emit_texture_gen5() except for slight code style changes and the use of i965 opcodes instead of the ir_texture_opcode enum; see "i965/fs: Implement lowering of logical texturing opcodes on Gen7+." for the mapping between them. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
03582f95b256e483fc1b0d78bd6a49203a448a23 |
|
17-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Lower SHADER_OPCODE_TXF_UMS/MCS_LOGICAL too on Gen7+. These weren't being handled by emit_texture_gen7() but we can easily lower them here for consistency with other texturing opcodes. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8be01e3548bdd900b7cadb5c9a77e52b01151cfe |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement lowering of logical texturing opcodes on Gen7+. This should be largely equivalent to emit_texture_gen7() except that we now get i965 sampling opcodes directly rather than ir_texture_opcode enum values. The mapping is as follows:
 - ir_tex -> SHADER_OPCODE_TEX
 - ir_txb -> FS_OPCODE_TXB
 - ir_txl -> SHADER_OPCODE_TXL
 - ir_txd -> SHADER_OPCODE_TXD
 - ir_txf -> SHADER_OPCODE_TXF
 - ir_txf_ms -> SHADER_OPCODE_TXF_CMS
 - ir_txs -> SHADER_OPCODE_TXS
 - ir_query_levels -> SHADER_OPCODE_TXS too, the visitor will make sure that the provided lod value is zero in this case.
 - ir_lod -> SHADER_OPCODE_LOD
 - ir_tg4 -> SHADER_OPCODE_TG4_OFFSET if the offset value is not immediate, SHADER_OPCODE_TG4 otherwise.
Other than that there are only minor changes and style fixes like the implementation now being factored out in static functions to improve encapsulation. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
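The mapping listed in the entry above can be written as a small lookup function. This is only an illustrative sketch: the two enums are redeclared locally with the names used in the commit message, and `map_texture_opcode` is a hypothetical helper, not a function from the driver.

```cpp
#include <cassert>

// Local copies of the names used in the commit message (illustration only).
enum ir_texture_opcode_sketch {
    ir_tex, ir_txb, ir_txl, ir_txd, ir_txf, ir_txf_ms,
    ir_txs, ir_query_levels, ir_lod, ir_tg4
};

enum sampler_opcode_sketch {
    SHADER_OPCODE_TEX, FS_OPCODE_TXB, SHADER_OPCODE_TXL, SHADER_OPCODE_TXD,
    SHADER_OPCODE_TXF, SHADER_OPCODE_TXF_CMS, SHADER_OPCODE_TXS,
    SHADER_OPCODE_LOD, SHADER_OPCODE_TG4, SHADER_OPCODE_TG4_OFFSET
};

// Hypothetical helper implementing the mapping from the commit message.
// offset_is_immediate only matters for ir_tg4.
inline sampler_opcode_sketch
map_texture_opcode(ir_texture_opcode_sketch op, bool offset_is_immediate)
{
    switch (op) {
    case ir_tex:          return SHADER_OPCODE_TEX;
    case ir_txb:          return FS_OPCODE_TXB;
    case ir_txl:          return SHADER_OPCODE_TXL;
    case ir_txd:          return SHADER_OPCODE_TXD;
    case ir_txf:          return SHADER_OPCODE_TXF;
    case ir_txf_ms:       return SHADER_OPCODE_TXF_CMS;
    case ir_txs:          return SHADER_OPCODE_TXS;
    case ir_query_levels: return SHADER_OPCODE_TXS; // lod is expected to be zero
    case ir_lod:          return SHADER_OPCODE_LOD;
    case ir_tg4:
        return offset_is_immediate ? SHADER_OPCODE_TG4
                                   : SHADER_OPCODE_TG4_OFFSET;
    }
    assert(!"unknown texture opcode");
    return SHADER_OPCODE_TEX;
}
```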
|
44a8cf488e0370d7e5abe363c1fd2d21247a6e32 |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix opt_zero_samples() for texturing ops not matching dispatch_width. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
33deff4f0582d2c073d34d4d6ec8344d2b1fbf7d |
|
21-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Define logical texture sampling opcodes. Each logical variant is largely equivalent to the original opcode but instead of taking a single payload source it expects the arguments separately as individual sources, like: tex_logical dst, coordinates, shadow_c, lod, lod2, sample_index, mcs, sampler, offset, num_coordinate_components, num_grad_components This patch defines the opcodes and usual instruction boilerplate, including a placeholder lowering function provided mostly as documentation for their source registers. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
59e7e6f7a21f13ff8963cf21af2e969f1f7961f5 |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement lowering of logical framebuffer writes. This does essentially the same thing as fs_visitor::emit_single_fb_write(), with some slight differences:
 - We don't have to worry about exec_size and use_2nd_half anymore, 16-wide sources have already been lowered to 8-wide thanks to the previous commit and the manual argument unzipping is no longer required.
 - The src/dst_depth and sample_mask values are now explicit sources of the instruction instead of being taken from the visitor state directly. The same goes for the kill-pixel mask that will be passed to the instruction explicitly as predicate.
 - Everything is now done in static functions to improve encapsulation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
633938afd349f2b423146969688c11f1e29ca17a |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Hook up SIMD lowering to unroll FB writes of unsupported width. This shouldn't have any effect because we don't emit logical framebuffer writes yet. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a9f31a032b0a1068a4e2ceed9ed4680ecf13e28b |
|
27-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Define logical framebuffer write opcode. The logical variant is largely equivalent to the original opcode but instead of taking a single payload source it expects its arguments that make up the payload separately as individual sources, like: fb_write_logical null, color0, color1, src0_alpha, src_depth, dst_depth, sample_mask, num_components This patch defines the opcode and usual instruction boilerplate, including a placeholder lowering function provided mainly as self-documentation. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8368939e5d94f8d4ae55a1f22a755922ee77132b |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Implement pass to lower instructions of unsupported SIMD width. This lowering pass implements an algorithm to expand SIMDN instructions into a sequence of SIMDM instructions in cases where the hardware doesn't support the original execution size natively for some particular instruction. The most important use-cases are:
 - Lowering send message instructions that don't support SIMD16 natively into SIMD8 (several texturing, framebuffer write and typed surface operations).
 - Lowering messages that don't support SIMD8 natively into SIMD16 (*cough*gen4*cough*).
 - 64-bit precision operations (e.g. FP64 and 64-bit integer multiplication).
 - SIMD32.
The algorithm works by splitting the sources of the original instruction into chunks of width appropriate for the lowered instructions, and then interleaving the results component-wise into the destination of the original instruction. The pass is controlled by the get_lowered_simd_width() function that currently just returns the original execution size making the whole pass a no-op for the moment until some user is introduced. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2: Reverse order of the source transformations and split_inst emit call to make the code a bit easier to understand.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
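The splitting step described in the entry above can be illustrated on plain arrays. This is a rough sketch under simplifying assumptions: each source and the destination have a single component per channel, so no component-wise interleaving is needed, and `lower_simd_width_sketch`, `ChannelOp` and `add_op` are hypothetical names rather than anything from the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-channel operation that is only "supported" at the
// narrower width.
using ChannelOp = float (*)(float, float);

inline float add_op(float a, float b) { return a + b; }

// Expand one operation of width src0.size() into several operations of
// width lowered_width. Each chunk reads the matching slice of both
// sources and writes the matching slice of the destination.
inline std::vector<float>
lower_simd_width_sketch(const std::vector<float> &src0,
                        const std::vector<float> &src1,
                        std::size_t lowered_width, ChannelOp op)
{
    const std::size_t exec_size = src0.size();
    assert(src1.size() == exec_size);
    assert(lowered_width > 0 && exec_size % lowered_width == 0);

    std::vector<float> dst(exec_size);
    for (std::size_t base = 0; base < exec_size; base += lowered_width) {
        // One lowered instruction covering channels
        // [base, base + lowered_width).
        for (std::size_t ch = 0; ch < lowered_width; ++ch)
            dst[base + ch] = op(src0[base + ch], src1[base + ch]);
    }
    return dst;
}
```

Calling it with two 16-element sources and a lowered width of 8 runs the inner loop twice, which mirrors a SIMD16 instruction being replaced by two SIMD8 instructions.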
|
86ae788baefefdb2fa77fe3c242ad2d81c8e834e |
|
16-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix return value of fs_inst::regs_read() for BAD_FILE. Typically BAD_FILE sources are used to mark a source as not present, which implies that no registers are read. This will become much more frequent with logical send opcodes which have a large number of sources, many of them optionally used and marked as BAD_FILE when they aren't applicable. It will prove to be useful to be able to rely on the value of regs_read() regardless of whether a source is present or not. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1dd3543ac1bebe089bfe3a8ae5efbe3f564e1144 |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add stub lowering pass for logical send-message opcodes. This pass will house ad-hoc lowering code for several send message-like virtual opcodes that will represent their logically independent arguments as separate instruction sources rather than as a single payload blob. This pass will basically just take the separate arguments that are supposed to be part of the payload and concatenate them to construct a message in the form required by the hardware. Virtual instructions in separate-source form will eventually allow some simplification of the visitor code and make several transformations easier like lowering SIMD16 instructions to SIMD8 algorithmically in cases where the hardware doesn't support the former natively. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
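As a rough illustration of what the entry above describes, the sketch below concatenates the separate sources of a "logical" instruction into one payload and skips sources that are not present. `SourceSketch`, `LogicalSendSketch` and `build_payload` are hypothetical names, and the `present` flag stands in for the BAD_FILE convention mentioned in the neighbouring entries.

```cpp
#include <cassert>
#include <vector>

// Hypothetical source operand: absent sources play the role of BAD_FILE.
struct SourceSketch {
    bool present;
    std::vector<float> data;
};

// Hypothetical logical instruction with its arguments kept separate.
struct LogicalSendSketch {
    std::vector<SourceSketch> sources;
};

// Concatenate the present sources, in order, into a single payload.
inline std::vector<float> build_payload(const LogicalSendSketch &inst)
{
    std::vector<float> payload;
    for (const SourceSketch &src : inst.sources) {
        if (!src.present)
            continue; // an absent source contributes nothing
        payload.insert(payload.end(), src.data.begin(), src.data.end());
    }
    return payload;
}
```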
|
fb7eba97d7235d49ac712a21fb51009c86f3bc64 |
|
21-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Factor out source components calculation to a separate method. This cleans up fs_inst::regs_read() slightly by disentangling the calculation of "components" from the handling of message payload arguments. This will also simplify the SIMD lowering and logical send message lowering passes, because it will avoid expressions like 'regs_read * REG_SIZE / component_size' which are not only ugly, they may be inaccurate because regs_read rounds up the result to the closest register multiple so they could give incorrect results when the component size is lower than one register (e.g. uniforms). This didn't seem to be a problem right now because all such expressions happen to be dealing with per-channel GRFs only currently, but that's by no means obvious so better be safe than sorry. v2: Split PIXEL_X/Y and LINTERP into separate case blocks. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
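The rounding problem mentioned in the entry above can be reproduced with a few lines of arithmetic. The sketch assumes a register size of 32 bytes and uses hypothetical helper names; it only demonstrates why 'regs_read * REG_SIZE / component_size' can overcount when a component is smaller than a register.

```cpp
#include <cassert>

// Assumed register size in bytes for this illustration.
const unsigned REG_SIZE_SKETCH = 32;

// Registers touched by `components` components of `component_size` bytes,
// rounded up to whole registers (the rounding the commit refers to).
inline unsigned regs_read_sketch(unsigned components, unsigned component_size)
{
    assert(component_size > 0);
    return (components * component_size + REG_SIZE_SKETCH - 1) / REG_SIZE_SKETCH;
}

// The open-coded inverse that the commit message warns about.
inline unsigned naive_components(unsigned regs_read, unsigned component_size)
{
    assert(component_size > 0);
    return regs_read * REG_SIZE_SKETCH / component_size;
}
```

For two 4-byte uniform components, `regs_read_sketch(2, 4)` rounds 8 bytes up to one register, and feeding that back into `naive_components(1, 4)` yields 8 rather than 2. For full-width per-channel values such as `regs_read_sketch(2, 32)` the round trip is exact, which is consistent with the commit's remark that the problem had not shown up so far.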
|
80511d176a49e754a18ce585bab413db7af63bf7 |
|
21-Jul-2015 |
Dave Airlie <airlied@redhat.com> |
i965: add support for ARB_shader_subroutine This just adds some missing pieces to nir/i965, it is lightly tested on my Haswell. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9383664a9cbc5bc4858fc50d7fa565f43028d779 |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix stride field for uniforms. This fixes essentially the same problem as for immediates. Registers of the UNIFORM file are typically accessed according to the formula:
  read_uniform(r, channel_index, array_index) = read_element(r, channel_index * 0 + array_index * 1)
Which matches the general direct addressing formula for stride=0:
  read_direct(r, channel_index, array_index) = read_element(r, channel_index * stride + array_index * max{1, stride * width})
In either case if reladdr is present the access will be according to the composition of two register regions, the first one determining the per-channel array_index used for the second, like:
  read_indirect(r, channel_index, array_index) = read_direct(r, channel_index, read(r.reladdr, channel_index, array_index))
where:
  read(r, channel_index, array_index) = if r.reladdr == NULL then read_direct(r, channel_index, array_index) else read_indirect(r, channel_index, array_index)
In conclusion we can handle uniforms consistently with the other register files if we set stride to zero. After lowering to a GRF using VARYING_PULL_CONSTANT_LOAD in demote_pull_constant_loads() the stride of the source is set to one again because the result of VARYING_PULL_CONSTANT_LOAD is generally non-uniform. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
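The direct addressing formula quoted in the entry above is easy to check numerically. The function below is a direct transcription of that formula; the name `read_direct_index` is made up for this sketch and it returns only the element index, not the element itself.

```cpp
#include <cassert>

inline unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

// Element index used by read_direct() in the commit message:
//   channel_index * stride + array_index * max{1, stride * width}
inline unsigned read_direct_index(unsigned stride, unsigned width,
                                  unsigned channel_index,
                                  unsigned array_index)
{
    assert(width > 0);
    return channel_index * stride + array_index * max_u(1u, stride * width);
}
```

With stride 0 the channel index drops out, so `read_direct_index(0, 8, 5, 3)` gives 3 for every channel, which is the read_uniform() behaviour. With stride 1 and width 8 the same call gives 5 + 3 * 8 = 29.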
|
5f8d9ae5a54961deb02eb52e924a84b99b60f035 |
|
13-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix stride for immediate registers. When the width field was removed from fs_reg the BROADCAST handling code in opt_algebraic() started to miss a number of trivial optimization cases, resulting in the ugly indirect-addressing sequence being emitted unnecessarily for some variable-indexed texturing and UBO loads regardless of one of the sources of BROADCAST being immediate. Apparently the reason was that we were setting the stride field to one for immediates even though they are typically uniform. Width used to be set to one too, which is why this optimization used to work previously until the "reg.width == 1" check was removed. The stride field of vector immediates is intentionally left equal to one, because they are strictly speaking not uniform. The assertion in fs_generator makes sure that immediates have the expected stride as a consistency check. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9f344b908a95440d215f29c0b05b8ea8dba2839e |
|
01-Jul-2015 |
Connor Abbott <connor.w.abbott@intel.com> |
i965/fs: fix regs_read() for LINTERP The second source always stays within the same SIMD8 register. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7e337859ff98a0caf00fd201a5389933d42d0baa |
|
17-Jul-2015 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/cs: Return 1 for regs_read on CS_OPCODE_CS_TERMINATE This prevents an assertion failure in brw_fs_live_variables.cpp, fs_live_variables::setup_one_read: Assertion `var < num_vars' failed. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4bddd82bf3dae44c2b75cef34e9e85e15d63df7f |
|
14-Jul-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Factor out universally broken calculation of the register component size. This in principle simple calculation was being open-coded in a number of places (in a series I haven't yet sent for review there will be a couple more), all of them were subtly broken in one way or another: None of them were handling the HW_REG case correctly as pointed out by Connor, and fs_inst::regs_read() was handling the stride=0 case rather naively. This patch solves both problems and factors out the calculation as a new fs_reg method. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
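One plausible shape for the factored-out calculation, shown only as a sketch: the size of one logical component is the number of elements it spans times the element size, with a stride of zero treated as a single element. The helper name and its parameters are assumptions for illustration, and the HW_REG case mentioned in the entry above is not modelled.

```cpp
#include <cassert>

// Bytes occupied by one component of a register region, assuming
// `width` channels, an element stride of `stride` and elements of
// `type_size` bytes. A stride of zero means every channel reads the
// same element, so the component is one element wide.
inline unsigned component_size_sketch(unsigned width, unsigned stride,
                                      unsigned type_size)
{
    assert(type_size > 0);
    const unsigned elements = width * stride;
    return (elements > 1u ? elements : 1u) * type_size;
}
```

Under these assumptions an 8-wide float region with stride 1 takes 32 bytes per component, while the same region with stride 0 takes 4 bytes.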
|
dabec9c293ee29335f5a6d5d1d3c2b7a715605c1 |
|
30-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Relax fs_builder channel group assertion when force_writemask_all is on. This assertion was meant to catch code inadvertently escaping the control flow jail determined by the group of channel enable signals selected by some caller, however it seems useful to be able to increase the default execution size as long as force_writemask_all is enabled, because force_writemask_all is an explicit indication that there is no longer a one-to-one correspondence between channels and SIMD components so the restriction doesn't apply. In addition reorder the calls to fs_builder::group and ::exec_all in a couple of places to make sure that we don't temporarily break this invariant in the future for instructions with exec_size higher than the dispatch width. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ebe3043eeacb073c7dbb6162d8f0aee3bc66eeb1 |
|
01-Jul-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fix PIXEL_X/Y in regs_read() PIXEL_X/Y takes a vec2 in the first argument
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
830f67046ace3c0b95a7f093fe373eeb417a1aad |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Remove the width field from fs_reg As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instruction's execution size based entirely on register widths. With the builder, we can easily set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
83458e7c53cfc1f344280da6eb9a3b4e2dfdbc00 |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use exec_size instead of dst.width for computing component size There are a variety of places where we use dst.width / 8 to compute the size of a single logical channel. Instead, we should be using exec_size. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
21803b7b3304f053a48e313951ffddf1d2cd0bd9 |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use the builder dispatch width instead of dst.width for pull constants Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c9676329dd6c69b2e0b12405c3b4078f7d216f2f |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Remove exec_size guessing from fs_inst::init() Now that all of the non-explicit constructors are gone, we don't need to guess anymore. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
500525e96019aff551afa8fee841d00ca9ec4c4f |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use exec_size for determining regs read/written and partial writes Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
89bc4c78c394e50ddb16cc089bd3ec90681342d7 |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Remove fs_inst constructors that don't take an explicit exec_size Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
67c4c9e1a709508b88d6d31eb1f7cb61d187189e |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make better use of the builder in shader_time Previously, we were just depending on register widths to ensure that various things were exec_size of 1 etc. Now, we do so explicitly using the builder. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f7dcc1160331462a071c54ca1067f9e2f57b55be |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add a builder argument to offset() Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c5a8da5f24eae4479b4ebe6301d780f781e24ed2 |
|
01-Jul-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Properly handle LOAD_PAYLOAD in fs_inst::regs_read Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
12bc22ef58377191508af91a918efd18e2da7500 |
|
19-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/Y Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
aca5228011e7b9e96f3bd3a621c88e63ba47a4f3 |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads Previously, fs_inst::regs_read() fell back to depending on the register width for the second source. This isn't really correct since it isn't a SIMD8 value at all, but a SIMD4x2 value. This commit changes it to explicitly be always one register. v2: Use mlen for determining the number of registers read Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
241317d59ab440bdcda25bacaadacfb3b4c2dd93 |
|
19-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Actually set/use the mlen for gen7 uniform pull constant loads Previously, we were allocating the payload with different sizes per gen and then figuring out the mlen in the generator based on gen. This meant, among other things, that the higher level passes knew nothing about it. Acked-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3258e1b80d66ec26f14a24a5eae0629a2d23a444 |
|
18-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use a switch statement in fs_inst::regs_read() This makes things a little simpler, more efficient, and quite a bit more readable. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
19a0ba130fd0d0f3b86181a8d05cf5391420360d |
|
27-Jun-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/vs: Move compute_clip_distance() out of emit_urb_writes(). Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by turning those into the modern gl_ClipDistance equivalents. This is unnecessary in Core Profile: if user clipping is enabled, but the shader doesn't write the corresponding gl_ClipDistance entry, results are undefined. Hence, it is also unnecessary for geometry shaders. This patch moves the call up to run_vs(). This is equivalent for VS, but removes the need to pass clip distances into emit_urb_writes(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
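For context on the conversion mentioned in the entry above: a legacy user clip plane is turned into a clip distance by taking the dot product of the clip vertex with the plane equation. The sketch below shows just that arithmetic; `Vec4` and `clip_distance` are illustrative names and not the driver's compute_clip_distance().

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Distance of a vertex from one user clip plane: dot(vertex, plane).
// A negative result means the vertex is on the clipped side.
inline float clip_distance(const Vec4 &vertex, const Vec4 &plane)
{
    return vertex[0] * plane[0] + vertex[1] * plane[1] +
           vertex[2] * plane[2] + vertex[3] * plane[3];
}
```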
|
40801295d5a3d747661abb1e2ca64d44c0e3dc05 |
|
23-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Remove the brw_context from the visitors As of this commit, nothing actually needs the brw_context. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
663f8d121d792edee5c012461bfd0b650011ff4a |
|
20-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/vs: Pass the current set of clip planes through run() and run_vs() Previously, these were pulled out of the GL context conditionally based on whether we were running ff/ARB or a GLSL program. Now, we just pass them in so that the visitor doesn't have to grab them itself. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4af62c0f5cbadc762abb1bd2e59f44ca220e3f0a |
|
20-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add a do_rep_send flag to run_fs Previously, we were pulling it from brw->do_rep_send Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1b0f6ffa15b25e8601d60fe1ea74e893f7d33cf5 |
|
20-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Pull calls to get_shader_time_index out of the visitor Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c7893dc3c590b86787d8118e3920debaea3f16da |
|
19-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Use a single index per shader for shader_time. Previously, each shader took 3 shader time indices which were potentially at arbitrary points in the shader time buffer. Now, each shader gets a single index which refers to 3 consecutive locations in the buffer. This simplifies some of the logic at the cost of having a magic 3 in a few places. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
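The indexing scheme from the entry above amounts to a multiply and an add. In this sketch the constant and function names are hypothetical, and the meaning of each of the three slots is left unspecified because the commit message does not state it.

```cpp
#include <cassert>

// The "magic 3": consecutive buffer locations owned by each shader.
const unsigned SHADER_TIME_SLOTS_PER_SHADER = 3;

// Buffer location of slot `slot` (0, 1 or 2) for the shader with the
// given single index.
inline unsigned shader_time_location(unsigned shader_index, unsigned slot)
{
    assert(slot < SHADER_TIME_SLOTS_PER_SHADER);
    return shader_index * SHADER_TIME_SLOTS_PER_SHADER + slot;
}
```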
|
073294d3ef20d0dbeffcc38aff3d69eda624ee75 |
|
23-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Plumb compiler debug logging through brw_compiler Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3fd457c9ddd4b9f730e70bfd19b2f9eeeeaef089 |
|
23-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Do the no16 perf logging directly in fs_visitor::no16() While we're at it, we'll drop the note about 10-20% performance loss. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f45bf97f30f2feacf8f976271a43feea70e5c382 |
|
23-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make no16 non-variadic We never used the fact that it was variadic anyway. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d7565b7d65f8203c20735a61b86e9158b8ec4447 |
|
16-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Remove the dependence on brw_context from the generators Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e639a6f68e701f23b977a49c45d646c164991d36 |
|
16-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Plumb compiler debug logging through a function pointer in brw_compiler v2 (Ken): Make shader_debug_log a printf-like function. v3 (Jason): Add a void * to pass the brw_context through Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
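A printf-like logging callback with an opaque user pointer, as described in the v2 and v3 notes above, could look roughly like this. `compiler_sketch` and `log_to_string` are hypothetical; only the general pattern (a function pointer taking a `void *` plus a format string) comes from the commit message.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical compiler object holding the logging hook.
struct compiler_sketch {
    void (*shader_debug_log)(void *data, const char *fmt, ...);
};

// Example callback: formats the message and appends it to the
// std::string passed through the opaque pointer.
inline void log_to_string(void *data, const char *fmt, ...)
{
    assert(data != nullptr);
    char buf[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    static_cast<std::string *>(data)->append(buf);
}
```

The opaque pointer lets the caller route messages to its own context without the compiler knowing the context type, which fits the goal of removing brw_context from the compiler code.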
|
630764407aeba4acf9364739bafb0e3516f72e31 |
|
20-Jun-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Replace some instances of brw->gen with devinfo->gen
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a49328d58d1e3e143f9434976d9f3574acefc4ea |
|
22-Jun-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't mess up stride for uniform integer multiplication. If the stride is 0, the source is a uniform and we should not modify the stride. Cc: "10.6" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91047 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8d3c48eed24f351c86361707978647c78010bb7f |
|
10-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove one more fixed brw_null_reg() from the visitor. Instead use fs_builder::null_reg_f() which has the correct register width. Avoids the assertion failure in fs_builder::emit() hit by the "ES3-CTS.shaders.loops.for_dynamic_iterations.unconditional_break_fragment" GLES3 conformance test introduced by 4af4cfba9ee1014baa4a777660fc9d53d57e4c82. Reported-and-reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
44928b799adbbf2671c482431b3b7a390118725c |
|
08-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove dead IR construction code from the visitor. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fe88c7ae38c72ea09ced69fb12ff00f58bdf1d6e |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate translation of NIR ALU instructions to the IR builder. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e32c16c47f7a3cf25e2b4d2f3b97d0f8f89669c0 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate FS framebuffer writes to the IR builder. The explicit call to fs_builder::group() in emit_single_fb_write() is required by the builder (otherwise the assertion in fs_builder::emit() would fail) because the subsequent LOAD_PAYLOAD and FB_WRITE instructions are in some cases emitted with a non-native execution width. The previous code would always use the channel enables for the first quarter, which is dubious but probably worked in practice because FB writes are never emitted inside non-uniform control flow and we don't pass the kill-pixel mask via predication in the cases where we have to fall-back to SIMD8 writes. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ad68853f17868081a69b3f73f4bf4c1bc8b2571d |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate FS discard handling to the IR builder. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
46f264638ad97a0b806e6fad7117d62a2cf914b6 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate FS gl_SamplePosition/ID computation code to the IR builder. v2: Use fs_builder::AND/SHR/MOV instead of ::emit. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
31477226ec6cbe956a4bbdcae81cc7ca5ad28cc6 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate FS interpolation code to the IR builder. v2: Fix some preexisting trivial codestyle issues. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d3c10ad42729c1fe74a7f7c67465bd2beb7f9e75 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate shader time to the IR builder. v2: Change null register destination type to UD so it can be compacted. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
546839ef639bf871feaa62ab7d811f2fc783bdcd |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate pull constant loads to the IR builder. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8f626c14989f005599f7841b89144d2bf58b5704 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate Gen4 send dependency workarounds to the IR builder. v2: Change brw_null_reg() to bld.null_reg_f(). Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4af4cfba9ee1014baa4a777660fc9d53d57e4c82 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate lower_integer_multiplication to the IR builder. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
efa60e49f2e5dd56f1c81487e9aad9f89136d8b4 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate lower_load_payload to the IR builder. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6114ba4dccfdb8f7c657feeed8f8c9b69debba91 |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Migrate opt_sampler_eot to the IR builder. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e04b4156a745fc09afa066c892c1913362eae9df |
|
03-Jun-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Allocate a common IR builder object in fs_visitor. v2: Call fs_builder::at_end() to point the builder at the end of the program explicitly. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c820407ef0aac87546d1a778e169cfa1a915a219 |
|
03-Jun-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Print mlen in dump_instructions() output. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
78644ffc4d341deb431145108f0b2d377e59b61e |
|
20-May-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Remove the ir_visitor code Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
114497afff4e49139b8c7d61f11a7872b81398bf |
|
20-May-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Make NIR non-optional for scalar shaders Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
99cb4233205edcfa1a1e2967eef7bb16ff19bec4 |
|
20-May-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Rename backend_visitor to backend_shader The backend_shader class really is a representation of a shader. The fact that it inherits from ir_visitor is somewhat immaterial. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0596134410a0decc2f6bba77bfedb82d308aabbe |
|
27-May-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Fix lowering of integer multiplication with cmod. If the multiplication's result is unused, except by a conditional_mod, the destination will be null. Since the final instruction in the lowered sequence is a partial-write, we can't put the conditional mod on it and we have to store the full result to a register and do a MOV with a conditional mod. Cc: "10.6" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90580 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6ca67f62e885f0e42c0cef2db5c0ae837adfe646 |
|
20-May-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fix implied_mrf_writes for scratch writes We build the entire message in the generator so all the MRF writes are implied. Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f7df169ba13d22338e9276839a7e9629ca0a6b4f |
|
14-May-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Implement integer multiply without mul/mach. Ivybridge and Baytrail can't use mach with 2Q quarter control, so just do it without the accumulator. Stupid accumulator. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4ec09c77471e39e6ff81c99f1edde2e1713a7f24 |
|
13-May-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Support integer multiplication in SIMD16 on Haswell. Ivybridge (and presumably Baytrail) have a bug that prevents this from working. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1e4e17fbd9296cc5064aabdb351a894d10190cb6 |
|
11-May-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Lower integer multiplication after optimizations. 32-bit x 32-bit integer multiplication requires multiple instructions until Broadwell. This patch just lets us treat the MUL instruction in the FS backend like it operates on Broadwell, and after optimizations we lower it into a sequence of instructions on older platforms. Doing this will allow us to do some extra optimization on integer multiplies. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
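The entry above says a 32-bit x 32-bit multiply needs several instructions before Broadwell. As a rough illustration of why such a lowering is possible, the sketch below rebuilds the low 32 bits of the product from two partial products that each use only 16 bits of one operand. It shows the arithmetic only; the function name `mul32_via_16bit_halves` is hypothetical and this is not claimed to be the instruction sequence the driver emits.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of a * b, computed from two 32x16 partial products.
// Hypothetical helper for illustration; not taken from brw_fs.cpp.
uint32_t mul32_via_16bit_halves(uint32_t a, uint32_t b)
{
    const uint32_t b_lo = b & 0xffffu;  // low 16 bits of b
    const uint32_t b_hi = b >> 16;      // high 16 bits of b

    // a * b == a * b_lo + ((a * b_hi) << 16)   (mod 2^32)
    const uint32_t low_part  = a * b_lo;
    const uint32_t high_part = (a * b_hi) << 16;
    return low_part + high_part;
}
```

Keeping a single MUL in the IR until after optimization, and only then expanding it into partial products like these, is what lets earlier passes treat the multiply as one operation.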
|
3687d752e51829b4723c9abb07ae56d2bbcda570 |
|
12-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Combine the fs_visitor constructors. For scalar GS support, we either need to add a fourth constructor which takes the GS structures, or combine the existing two and pass the shader stage. Given that they're not significantly different, I opted for the latter. v2: Remove more stuff from the .h file (Jason and Jordan). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0db663503ea86579d3352fe83d428d573a8d2b03 |
|
07-May-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Don't forget the force_sechalf flag in lower_load_payload(). Regression from commit 41868bb6824c6106a55c8442006c1e2215abf567. Fixes a bunch of ARB_shader_image_load_store tests. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bfdae9149e00bd5c2521db3e75669ae043eed5cc |
|
08-May-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/fs: Disable opt_sampler_eot for textureGather The opt_sampler_eot optimisation seems to break when the last instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing this so it causes a lot of regressions. I can't find any documentation or known workarounds to indicate that this is expected behaviour, but considering that this is probably a pretty unlikely situation in a real use case we might as well disable it in order to avoid the regressions. In total this fixes 451 tests. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f98c3f3e44abb0c8cb158c589418def111d72052 |
|
08-May-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/fs: Improve a comment about stripping trailing zeroes Originally I wrote that removing the first parameter doesn't work but I didn't know why. I now found a mention of this in the PRM so it's probably worth adding it to the comment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e51bad669a4c42845c44a925bbb5d8885799c28f |
|
07-May-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/skl: In opt_sampler_eot always set destination register to null opt_sampler_eot enables a direct write to framebuffer from a sample. In order to do this the sample message needs to have a message header so if there wasn't one already then the function adds one. In addition the function sets the destination register to null because it's no longer used. However it was only doing this in cases where it was adding a message header. This patch just moves setting the destination so that it happens even if there's a message header. In practice this doesn't seem to make any difference but it's a bit cleaner. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1c5de556c5972c3020b4095c586a9b439b20cf69 |
|
07-May-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/fs: Set the header_size on LOAD_PAYLOAD in opt_sampler_eot Commit 94ee908448 added a header size parameter to the function to create the LOAD_PAYLOAD instruction. However this broke opt_sampler_eot which manually constructs the instruction and so wasn't setting the header_size. This ends up making the parameters for the send message all have the wrong location and it all falls apart. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7a75b55a01d355090d186357896e3cb141b9775e |
|
02-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs_inst: Get rid of the effective_width field The effective_width field was an ill-conceived hack to get around issues in the LOAD_PAYLOAD instruction. Now that the LOAD_PAYLOAD instruction is far more sane, this field can die. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
41868bb6824c6106a55c8442006c1e2215abf567 |
|
25-Mar-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction The newly reworked instruction is far more straightforward than the original. Before, the LOAD_PAYLOAD instruction was lowered by a complicated and broken-by-design pile of heuristics to try and guess force_writemask_all, exec_size, and a number of other factors on the sources. Instead, we use the header_size on the instruction to denote which sources are "header sources". Header sources are required to be a single physical hardware register that is copied verbatim. The registers that follow are considered the actual payload registers and have a width that corresponds to the LOAD_PAYLOAD's exec_size and are treated as being per-channel. This gives us a fairly straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and force_writemask_all modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. This commit itself is made up of a bunch of smaller changes squashed together. Individual change descriptions follow: i965/fs: Rework fs_visitor::LOAD_PAYLOAD We rework LOAD_PAYLOAD to verify that all of the sources that count as headers are, indeed, exactly one register and that all of the non-header sources match the destination width. We then take the exec_size for LOAD_PAYLOAD directly from the destination width.
i965/fs: Make destinations of load_payload have the appropriate width i965/fs: Rework fs_visitor::lower_load_payload v2: Don't allow the saturate flag on LOAD_PAYLOAD instructions i965/fs_cse: Support the new-style LOAD_PAYLOAD i965/fs_inst::is_copy_payload: Support the new-style LOAD_PAYLOAD i965/fs: Simplify setup_color_payload Previously, setup_color_payload was a big helper function that did a lot of gen-specific special casing for setting up the color sources of the LOAD_PAYLOAD instruction. Now that lower_load_payload is much more sane, most of that complexity isn't needed anymore. Instead, we can do a simple fixup pass for color clamps and then just stash sources directly in the LOAD_PAYLOAD. We can trust lower_load_payload to do the right thing with respect to COMPR4. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
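Rules 1) and 2) of the lowering described in the entry above can be sketched as a small model. Everything here is a simplified assumption for illustration: the `LoadPayload` and `PayloadCopy` structs and the function `lower_load_payload_sketch` are hypothetical and do not mirror the real fs_inst layout, the header copy width of 8 channels is an assumed stand-in for "one physical register", and the COMPR4 interleaving of rule 3) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified description of one copy produced by the lowering.
struct PayloadCopy {
    std::size_t source_index;  // which LOAD_PAYLOAD source is copied
    unsigned exec_size;        // channels written by the copy
    bool force_writemask_all;  // ignore the channel enables
    bool force_sechalf;        // act on the second half of the channels
};

// Hypothetical, simplified stand-in for a LOAD_PAYLOAD instruction.
struct LoadPayload {
    std::size_t num_sources;
    std::size_t header_size;   // number of leading header sources
    unsigned exec_size;
    bool force_writemask_all;
    bool force_sechalf;
};

std::vector<PayloadCopy> lower_load_payload_sketch(const LoadPayload &inst)
{
    assert(inst.header_size <= inst.num_sources);

    std::vector<PayloadCopy> copies;
    for (std::size_t i = 0; i < inst.num_sources; ++i) {
        if (i < inst.header_size) {
            // Rule 1: a header source is a single register copied verbatim,
            // so the copy ignores channel enables and never needs sechalf.
            // The width of 8 is an assumption made for this sketch.
            copies.push_back({i, 8u, true, false});
        } else {
            // Rule 2: per-channel sources inherit the instruction's modifiers.
            copies.push_back({i, inst.exec_size,
                              inst.force_writemask_all, inst.force_sechalf});
        }
    }
    return copies;
}
```

The point of the design is that header_size alone decides how each source is copied, so the lowering no longer has to guess modifiers from the sources themselves.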
|
94ee908448405c8271e8662914a1c49df8d623b2 |
|
24-Mar-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make LOAD_PAYLOAD take a header size Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
32af7d4188e286a525081ada9965070dd41dbab7 |
|
02-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs_inst: Add an is_copy_payload helper This commit adds a new is_copy_payload helper to fs_inst that takes the place of the similarly named functions in cse and register coalesce. The two is_copy_payload functions in CSE and register coalesce were subtly different and potentially subtly broken. The new version unifies the two and should be more correct. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
76c1086f2dfb37a1edf6d2df6eebbe11ccbfc50b |
|
24-Mar-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Change header_present to header_size in backend_instruction Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3da9f708d4f1375d674fae4d6c6eb06e4c8d9613 |
|
20-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode. v2: Save some CPU cycles by doing 'return progress' rather than 'depth++' in the discard jump special case. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f2fad0dc80627e853eea558498f18a9fa769992e |
|
19-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Perform basic optimizations on the BROADCAST opcode. v2: Style fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f118e5d15fd9b35cf27a975a702c5fb81d3157aa |
|
23-Apr-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Add typed surface access opcodes. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0775d8835ac8d1f2ab75d04f0cddbad36b6787fe |
|
23-Apr-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Add untyped surface write opcode. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b750e14fbbeb20a6daa869ae642c0c1e1ce6e6d2 |
|
16-Apr-2015 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: Add CS shader time support Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
17233f9bbcbf570f0c7633c63dbd5ed88634ed60 |
|
21-Apr-2015 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS. Suggested-by: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c380973a9564be57acdae5ab6c6a9efcb72cf6c9 |
|
31-Aug-2014 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: Support compute programs in fs_visitor v2: * Clean out some unneeded code copied from run_fs (krh) * Always use NIR * Split shader time out into a separate commit Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
02e9773bc8526f64e4d79e3d9ac11f49882c022f |
|
24-Apr-2015 |
Neil Roberts <neil@linux.intel.com> |
i965/fs: Strip trailing constant zeroes in sample messages If a send message is emitted with a message length that is less than required for the message then the remaining parameters default to zero. We can take advantage of this to save a register when a shader passes constant zeroes as the final coordinates to the sample function. I think this might be useful for GLES applications that are using 2D textures to simulate 1D textures. On Skylake it will be useful for shaders that do texelFetch(tex,something,0) which I think is fairly common. This helps more on Skylake because in that case the order of the instruction operands are u,v,lod,r which is good for 2D textures whereas before they were u,lod,v,r which is only good for 1D textures. On Haswell: total instructions in shared programs: 8535730 -> 8533261 (-0.03%) instructions in affected programs: 236968 -> 234499 (-1.04%) helped: 1174 On Skylake: total instructions in shared programs: 10345646 -> 10341237 (-0.04%) instructions in affected programs: 293011 -> 288602 (-1.50%) helped: 1218 Reviewed-by: Matt Turner <mattst88@gmail.com> v2: Applied suggestions by Kenneth Graunke: - Only apply on Gen5+ - Apply to all texture opcodes, not just TEX and TXF. Moved the optimisation into the loop as suggested by Matt Turner. Fix the array index when there is a header.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
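The trimming idea in the entry above can be sketched as follows. This is a minimal model under stated assumptions: the `Param` struct, the function `trimmed_param_count`, and the plain `gen` integer are hypothetical names, and real sample messages carry more state than a list of floats. The Gen5+ restriction comes from the v2 notes, and keeping the first parameter follows the later comment fix noting that removing it does not work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one sample-message parameter.
struct Param {
    bool is_constant;  // value is known at compile time
    float value;       // only meaningful when is_constant is true
};

// Returns how many leading parameters still have to be sent once trailing
// constant zeroes are dropped. Omitted trailing parameters default to zero.
std::size_t trimmed_param_count(const std::vector<Param> &params, int gen)
{
    assert(!params.empty());

    // The optimization is only applied on Gen5 and later.
    if (gen < 5)
        return params.size();

    std::size_t count = params.size();
    // Always keep the first parameter, even if it is a constant zero.
    while (count > 1 &&
           params[count - 1].is_constant &&
           params[count - 1].value == 0.0f) {
        --count;
    }
    return count;
}
```

For a texelFetch-style call whose last argument is a constant 0, this drops the final parameter and so shortens the message by one register's worth of payload.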
|
1ac7db07b363207e8ded9259f84bbcaa084b8667 |
|
12-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Unhardcode a few more stage names and abbreviations. The stage_abbrev and stage_name fields in backend_visitor provide what we need without any additional effort. It also means we'll get the right names for compute shaders, SIMD8 geometry shaders, and both kinds of tessellation shaders. This does unfortunately change the capitalization of the stage abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't seem worth adding code to handle, though. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5d4f085a43ccd1122301421f2013e42a3f0a7604 |
|
28-Apr-2015 |
Neil Roberts <neil@linux.intel.com> |
i965: Don't try to apply the opt_sampler_eot extension for vs The opt_sampler_eot optimisation of fs_visitor effectively assumes that it is running on a fragment shader because it casts the program key to a brw_wm_prog_key. However on Skylake fs_visitor can also be used for vertex shaders. It looks like this usually works anyway because the optimisation is skipped if key->nr_color_regions != 1. However for a vertex shader the key is actually a brw_vs_prog_key so the space for nr_color_regions is probably taken up by key->base.program_string_id. This can end up making nr_color_regions be 1 in which case the function will later assert when the last instruction is not FS_OPCODE_FB_WRITE. This was making the DEQP test suite assert. Presumably this only happens there because that compiles a lot of shaders so it would end up with a high value for program_string_id. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a85c4c9b3f75cac9ab133caa91a40eec2e4816ae |
|
16-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Rename brw_compile to brw_codegen This name better matches what it's actually used for. The patch was generated with the following command: for file in *; do sed -i -e s/brw_compile/brw_codegen/g $file done Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cfc56fcee36912d5fb41262c71463292a737160e |
|
17-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Use device_info instead of the context for computing vue maps Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
28e9601d0e681411b60a7de8be9f401b0df77d29 |
|
16-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Add a devinfo field to backend_visitor and use it for gen checks Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5af0604d528733af9113a6f8711c39796ce0ae40 |
|
07-Apr-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Calculate delta_x and delta_y together. This lets SIMD16 programs on G45 and Gen5 use the PLN instruction. On Ironlake: total instructions in shared programs: 5634757 -> 5518055 (-2.07%) instructions in affected programs: 1745837 -> 1629135 (-6.68%) helped: 11439 HURT: 4 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a1dd2f0bb6f9bf61d4a40d033740140b86c060e0 |
|
12-Apr-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add LINTERP's src0 to fs_inst::regs_read(). LINTERP's src0 is PLN's src1, and PLN's src1 reads exec_size / 4 registers. Having that information lets us drop the delta_x/y special case code in split_virtual_grfs(). Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b069f9eafd945a86be633d8fff4e715fc6d7ec2d |
|
08-Feb-2015 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965/fs: Combine tex/fb_write operations (opt) Certain platforms support the ability to sample from a texture, and write it out to the file RT - thus saving a costly send instruction (note that this is a potential win if one wanted to backport to a tag that didn't have the patch from Topi which removed excess MOVs from LOAD_PAYLOAD - 97caf5fa04dbd2). v2: Modify the algorithm. Instead of iterating in reverse through blocks and insts, since the last block/inst is the only thing which can benefit. Rebased on top of Ken's patch modifying is_last_send v3: Rebased over almost 2 months, and Incorporated feedback from Matt: Some comment typo fixes and rewordings. Whitespace Move the optimization pass outside of the optimize loop v4: Some cosmetic changes requested from Ken. These changes ensured that the optimization function always returned true when an optimization occurred, and false when one did not. This behavior did not exist with the original patch. As a result, having the separate helper function which Matt did not like no longer made sense, and so now I believe everyone should be happy. Benchmark (n=20) %diff *OglBatch5 -1.4 *OglBatch7 -1.79 OglFillTexMulti 5.57 OglFillTexSingle 1.16 OglShMapPcf 0.05 OglTexFilterAniso 3.01 OglTexFilterTri 1.94 No piglit regressions: (http://otc-gfxtest-01.jf.intel.com:8080/view/dev/job/bwidawsk/112/) [*] I believe my measurements are incorrect for Batch5-7. If I add this new optimization, but never emit the new instruction I see similar results. v5: Remove declaration of combine_tex_header since v4 dropped that function (Ben) Remove check for impossible case of an empty block (Matt) Set dest earlier to avoid extra special-casing in generate_tex (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6866378cf42c86d03f38616804e6714a932ab70b |
|
10-Apr-2015 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965/fs: Only emit FS_OPCODE_PLACEHOLDER_HALT if there are discards Based originally on a patch from Ken in May 2014 of the same title. Things changed enough that I didn't feel comfortable leaving his authorship. v2: Replace fp->UsesKill with wm_prog_data->uses_kill. Since Ken took the time to also explain the difference to me, here is his explanation for posterity: "fp->UsesKill indicates that an ARB_fragment_program shader uses the KIL instruction, or that a GLSL shader uses the "discard" instruction (which are analogous). On Gen4-5, we sometimes have to simulate OpenGL's "Alpha Test" feature by emitting shader code that implicitly does a "discard" instruction. In the key setup, we do: /* key->alpha_test_func means simulating alpha testing via discards, * so the shader definitely kills pixels. */ prog_data.uses_kill = fp->program.UsesKill || key->alpha_test_func; Even though the shader may not technically contain a "discard", we need to act as if it does. I've also been trying to move the i965 state setup code to use brw_wm_prog_key for everything, rather than poking at core Mesa's gl_program/gl_fragment_program/gl_shader/gl_shader_program structures. --Ken" Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
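The distinction Ken draws in the entry above can be modeled in a few lines. The structs and function names below are hypothetical stand-ins for the fragment program and the program key; only the `uses_kill = UsesKill || alpha_test_func` expression and the "emit the placeholder halt only if there are discards" rule come from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for the fragment program's own discard flag.
struct FragmentProgramInfo {
    bool uses_kill;        // shader source contains KIL or "discard"
};

// Hypothetical stand-in for the relevant part of the program key.
struct WmKeyInfo {
    int alpha_test_func;   // nonzero: alpha test is simulated via discards
};

// A shader "kills pixels" if it discards explicitly or if alpha testing
// has to be simulated with an implicit discard.
bool compute_uses_kill(const FragmentProgramInfo &fp, const WmKeyInfo &key)
{
    return fp.uses_kill || key.alpha_test_func != 0;
}

// The placeholder halt is only needed when some discard can actually occur.
bool needs_placeholder_halt(const FragmentProgramInfo &fp, const WmKeyInfo &key)
{
    const bool uses_kill = compute_uses_kill(fp, key);
    assert(!fp.uses_kill || uses_kill);  // an explicit discard always counts
    return uses_kill;
}
```

Checking only the shader's own flag would miss the simulated alpha test case, which is why the combined value is the one to test.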
|
38707e1478a4b6f4687c583d06fbd68e22900735 |
|
01-Apr-2015 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965/fs: Create a has_side_effects for fs_inst When an instruction has a side effect, it impacts the available options when reordering an instruction. As the EOT flag is an implied write to the render target in the FS, it can be considered a side effect. This patch shouldn't actually have any impact on the current code since the EOT flag implies that the opcode is already one with side effects, FS_OPCODE_FB_WRITE. The next patch however will introduce an optimization whereby the EOT flag can occur with an opcode SHADER_OPCODE_TEX, and as that instruction will perform the same implied write to the render target, it cannot be reordered. v2: Remove extra whitespace (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
21d29124a719bdaf5794859a4a7441cc6be33df7 |
|
12-Apr-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix INTEL_DEBUG=shader_time for SIMD8 VS. In commit 4ebeb71573ad44f7657810dc5dd2c9030e3e63db, I deleted the emit_shader_time_end() call in emit_urb_writes(). But I failed to add it to run_vs(), as I intended. So no data was recorded at all. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8aee87fe4cce0a883867df3546db0e0a36908086 |
|
20-Feb-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Use SIMD16 instead of SIMD8 on Gen4 when possible. Gen5+ systems allow you to specify multiple shader programs - both SIMD8 and SIMD16 - and the hardware will automatically dispatch to the most appropriate one, given the number of subspans to be processed. However, that is not the case on Gen4. Instead, you program a single shader. If you enable multiple dispatch modes (SIMD8 and SIMD16), the shader is supposed to contain a series of jump instructions at the beginning. The hardware will launch the shader at a small offset, hitting one of the jumps. We've always thought that sounds like a pain, and weren't clear how it affected performance - is it worth having multiple shader types? So, we never bothered with SIMD16 until now. This patch takes a simpler approach: try and compile a SIMD16 shader. If possible, set the no_8 flag, telling the hardware to just use the SIMD16 variant all the time. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
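The dispatch policy described in the entry above can be summarized as a small decision function. This is a sketch under assumptions: `FsDispatch` and `choose_fs_dispatch` are hypothetical names, and the real driver expresses the Gen4 case through the no_8 flag rather than a pair of booleans.

```cpp
#include <cassert>

// Hypothetical summary of which fragment shader variants get enabled.
struct FsDispatch {
    bool use_simd8;
    bool use_simd16;
};

FsDispatch choose_fs_dispatch(int gen, bool simd16_compiled)
{
    assert(gen >= 4);

    if (gen >= 5) {
        // Gen5+: both variants can be programmed; the hardware picks one
        // per dispatch based on the number of subspans.
        return {true, simd16_compiled};
    }

    // Gen4: only one shader is programmed. If the SIMD16 compile succeeded,
    // use it exclusively (the no_8 case); otherwise fall back to SIMD8.
    if (simd16_compiled)
        return {false, true};
    return {true, false};
}
```

This avoids the jump table at the start of the shader that enabling both modes on Gen4 would require.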
|
bff421332661bfd0f82ab9eee9e4fec9d06ed1a1 |
|
03-Apr-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Check the INTEL_USE_NIR environment variable once at context creation Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b9b66985c3d33fa0db2b49c0e0231aa6d341e183 |
|
20-Mar-2015 |
Carl Worth <cworth@cworth.org> |
i965: Rename do_<stage>_prog to brw_compile_<stage>_prog (and export) This is in preparation for these functions to be called from other files. This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ac69ab7302dffa1350c64a9c69abd7721d0f0127 |
|
27-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move env_var_as_boolean to intel_debug.c. I need to use this in brw_vec4.cpp, so it can't be static anymore. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
826d3afb8f421a62020308813397e541e672381e |
|
30-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Add ARB_fragment_program support to the NIR backend. Use prog_to_nir where we would normally call glsl_to_nir, handle program parameter lists, and skip a few things that don't exist. Using NIR generates much better shader code than Mesa IR, since we get real optimizations, as opposed to prog_optimize: total instructions in shared programs: 314007 -> 279892 (-10.86%) instructions in affected programs: 285173 -> 251058 (-11.96%) helped: 2001 HURT: 67 GAINED: 4 LOST: 7 v2: Change early return in nir_setup_uniforms to if/else (Jordan). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
74c7e5d35181d31e4448c614f6aa62c1e1f60694 |
|
18-Mar-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Define method to check whether a backend_reg is inside a given range. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8a0946f3b1522e5f91afe14c8c3b22ba6009ed04 |
|
06-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Make an emit_discard_jump() function to reduce duplication. This is already copied in two places, and I want to copy it to a third place. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Carl Worth <cworth@cworth.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b0d422cd2a99d2fd26ab11880d5d8410ebfc64b2 |
|
16-Mar-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Print spills:fills and number of promoted constants. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e1f3ddef8c9928d9b8e845b811dc08983c541f99 |
|
17-Mar-2015 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/nir: Make our environment variable checking smarter Before, we enabled NIR if you set INTEL_USE_NIR to anything, which meant that INTEL_USE_NIR=false would actually turn on NIR. In preparation for turning NIR on by default, this commit makes it smarter by allowing the INTEL_USE_NIR variable to work as either a force-enable or a force-disable. Reviewed-by: Mark Janes <mark.a.janes@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
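The force-enable / force-disable behavior described in the commit above can be illustrated with a small sketch. This is not the driver's code: `parse_bool_setting` is a hypothetical helper, and the accepted spellings ("1"/"true"/"yes", "0"/"false"/"no") are an assumption made for the example. It takes the variable's text directly (nullptr meaning "unset") so that it can be exercised without touching the process environment.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not the driver's API): interpret the text of a
// boolean-like environment variable. `value` is nullptr when unset.
// Unset or unrecognized text falls back to `default_value`, so the
// variable can force the feature on or off regardless of the default.
static bool parse_bool_setting(const char *value, bool default_value)
{
   if (value == nullptr)
      return default_value;

   std::string s(value);
   for (char &c : s)
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

   if (s == "1" || s == "true" || s == "yes")
      return true;
   if (s == "0" || s == "false" || s == "no")
      return false;

   return default_value;   // unrecognized text: keep the default
}

// Usage sketch: "false" no longer enables the feature.
static void example_usage()
{
   assert(!parse_bool_setting("false", true));
   assert(parse_bool_setting("1", false));
}
```

A caller would typically pass the result of `getenv("INTEL_USE_NIR")` together with whatever default the driver currently uses.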
|
627c68308683abbd6e563a09af6013a33938a790 |
|
16-Mar-2015 |
Tapani Pälli <tapani.palli@intel.com> |
i965/fs: in MAD optimizations, switch last argument to be immediate Commit bb33a31 introduced optimizations that transform cases of MAD into simpler forms but it did not take into account that src[0] cannot be immediate and did not report progress. Patch switches src[0] and src[1] if src[0] is immediate and adds progress reporting. If both sources are immediates, this is taken care of by the same opt_algebraic pass on later run. v2: Fix for all cases, use temporary fs_reg (Matt, Kenneth) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89569 Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v1) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "10.5" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
547c760964bcad23a056e5156e4fefd7487c0192 |
|
09-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Use NIR for scalar VS when INTEL_USE_NIR is set. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6ac1bc90c4a7a6f32901a9782e14b090f6fe5270 |
|
10-Mar-2015 |
Iago Toral Quiroga <itoral@igalia.com> |
i965: Fix out-of-bounds accesses into pull_constant_loc array The piglit test glsl-fs-uniform-array-loop-unroll.shader_test was designed to do an out of bounds access into a uniform array to make sure that we handle that situation gracefully inside the driver, however, as Ken describes in bug 79202, Valgrind reports that this is leading to an out-of-bounds access in fs_visitor::demote_pull_constants(). Before accessing the pull_constant_loc array we should make sure that the uniform we are trying to access is valid. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79202 Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
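The guard described in the commit above amounts to validating the uniform index before indexing the location table. A minimal sketch, assuming the table is a plain array of integers and that -1 means "no pull location"; `lookup_pull_constant_loc` and its return convention are hypothetical and only illustrate the pattern, not the driver's actual code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: return the pull-constant slot for `uniform`, or -1
// when the index is outside the table (for example, a shader that indexes
// past the end of a uniform array).
static int lookup_pull_constant_loc(const std::vector<int> &pull_constant_loc,
                                    unsigned uniform)
{
   if (uniform >= pull_constant_loc.size())
      return -1;                      // invalid uniform: do not touch the array
   return pull_constant_loc[uniform];
}

// Usage sketch.
static void example_usage()
{
   const std::vector<int> table = {4, -1, 7};
   assert(lookup_pull_constant_loc(table, 2) == 7);
   assert(lookup_pull_constant_loc(table, 3) == -1);
}
```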
|
4ebeb71573ad44f7657810dc5dd2c9030e3e63db |
|
27-Feb-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Make emit_shader_time_end() insert before EOT. Previously, we emitted the shader-time epilogue from emit_fb_writes(), during the middle of looping through color regions (or emit_urb_writes for the VS). This is duplicated several times and rather awkward. I need to fix a bug in our FB write handling, and it will be a lot easier if we move emit_shader_time_end() out of there. Now, we simply emit FB writes/URB writes, and subsequently have emit_shader_time_end() insert instructions before the final SEND with EOT. Not only is this simpler, it's actually a slight improvement: we now include the MOVs to set up the final FB write payload in our shader-time measurements. Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses send-from-GRF. (In the past, we might have hit trouble where both attempt to use MRFs for messages; that's not a problem now.) v2: Rebase on v3 of the previous patch and other shader_time fixes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1] Acked-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e43af8d09f919d02b5ac0810c1c0f1783cbef6ef |
|
27-Feb-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Make get_timestamp() pass back the MOV rather than emitting it. This makes another part of the INTEL_DEBUG=shader_time code emittable at arbitrary locations, rather than just at the end of the instruction stream. v2: Don't lose smear! Caught by Topi Pohjolainen. v3: Don't set smear on the destination of the MOV. Thanks Topi! Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bea854c7f33cc10b8292f931f114afc4f88a8dd4 |
|
27-Feb-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Make emit_shader_time_write return rather than emit. Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)). The advantage is that we can also insert a shader time write at an arbitrary location in the instruction stream, rather than being restricted to emitting at the end. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f1adc45dbe649cdd4538fb96f6d2a27328bbfba1 |
|
08-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Set smear on shader_time diff register. The ADD(diff, diff, fs_reg(-2u)) instruction reads diff, which is a width 1 register. We need to read it as <0,1,0> with a subreg of 0, which is what smear accomplishes. Fixes assertion: brw_eu_emit.c:285: validate_reg: Assertion `hstride == 0' failed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ef9cc7d0c176669c03130abf576f2b700be39514 |
|
08-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Set force_writemask_all on shader_time instructions. These computations don't have anything to do with the currently executing channels, so they should use force_writemask_all. This fixes assert failures. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f9779e4a8f2ca67423cded0203adac6ad3d5c448 |
|
28-Feb-2015 |
Ian Romanick <ian.d.romanick@intel.com> |
i965/fs: Silence unused parameter warning Unused since b18fd23. brw_fs.cpp:2878:44: warning: unused parameter 'dispatch_width' [-Wunused-parameter] clear_deps_for_inst_src(fs_inst *inst, int dispatch_width, bool *deps, ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a84f66a9b6cf46bb19ca71faca5b1d6d81209caf |
|
06-Mar-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/nir: Resolve source modifiers on Gen8+ logic operations. On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and negate changes meaning to bitwise-not (~, not -). This isn't what NIR expects, so we should resolve the source modifiers via a MOV. +30 Piglits (fs-op-bit{and,or,xor}-not-abs-*). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
237dcb4aa7c39c59bfd225ae3d73caf709be216d |
|
05-Mar-2015 |
Mark Janes <mark.a.janes@intel.com> |
Fix invalid extern "C" around header inclusion. System headers may contain C++ declarations, which cannot be given C linkage. For this reason, include statements should never occur inside extern "C". This patch moves the C linkage statements to enclose only the declarations within a single header. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e214000f258ae564e64d839cccee9418526f226b |
|
14-Jan-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't use backend_visitor::instructions after creating the CFG. This is a fix for a regression introduced in commit a9f8296d ("i965/fs: Preserve the CFG in a few more places."). The errata this code works around is described in a comment before the function: "[DevBW, DevCL] Errata: A destination register from a send can not be used as a destination register until after it has been sourced by an instruction with a different destination register." The framebuffer write's sources must be in message registers, which SEND instructions cannot have as a destination. There's no way for this errata to affect anything at the end of the program. Just remove the code. Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84613 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
49a7f8c919d23fec977116f218780a35896cc1dd |
|
28-Feb-2015 |
Brian Paul <brianp@vmware.com> |
i965: replace Elements() with ARRAY_SIZE() Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ee3f6745723856419d7f5ecb17652e19855c4caa |
|
06-Jul-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove redundant discard jumps. With the previous optimization in place, some shaders wind up with multiple discard jumps in a row, or jumps directly to the next instruction. We can remove those. Without NIR on Haswell: total instructions in shared programs: 5777258 -> 5775872 (-0.02%) instructions in affected programs: 20312 -> 18926 (-6.82%) helped: 716 With NIR on Haswell: total instructions in shared programs: 5773163 -> 5771785 (-0.02%) instructions in affected programs: 21040 -> 19662 (-6.55%) helped: 717 v2: Use the CFG rather than the old instructions list. Presumably the placeholder halt will be in the last basic block. v3: Make sure placeholder_halt->prev isn't the head sentinel (caught twice by Eric Anholt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
34c93fd7f119fa824062e05377de849b8a2da0e6 |
|
04-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix lower_load_payload() not to use an incorrect half for immediates and uniforms. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ea7b4d25c8da352f4ca0dcaefa4fadb9e202636e |
|
06-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix lower_load_payload() to take into account non-zero reg_offset. Fixes metadata guess when instructions in the program specify a destination register with non-zero reg_offset and when the payload of a LOAD_PAYLOAD spans several registers. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
08b4c8f7bf2cc2fe914a07a32bf4961894593e72 |
|
04-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload(). MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8e47f51a5a7aba2bb56e7185988072431444d811 |
|
17-Jan-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Less broken handling of force_writemask_all in lower_load_payload(). It's perfectly fine to read the second half of a register written with force_writemask_all from a first half MOV instruction or vice versa, and lower_load_payload shouldn't mark the whole MOV as belonging to the second half in that case. Replicate the same metadata to both halves of the destination when writemasking is disabled. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6e62a52865787362ae1deb9dee80140d3a66c519 |
|
20-Feb-2015 |
Ben Widawsky <benjamin.widawsky@intel.com> |
i965/skl: Use 1 register for uniform pull constant payload When under dispatch_width=16 the previous code would allocate 2 registers for the payload when only one is needed. This manifested itself through bugs on SKL which needs to mess with this instruction. Ken thought this might impact shader-db, but apparently it doesn't. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Timo Aaltonen <timo.aaltonen@canonical.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c442d0961e4ec6dcc304d652b637bb60687ce3cb |
|
14-Aug-2014 |
Dave Airlie <airlied@gmail.com> |
i965: just avoid warnings with fp64 This just fills in some blanks to avoid warnings in the i965 driver. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2bd139e18c941e7ea0870ba43314a5c10fd5bb12 |
|
19-Feb-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Un-hardcode DEBUG_WM, "FS", and "fragment". These code paths can (or will) be used for other shader stages. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bb33a31c3830945ae768ebdaeb686291bdf897fa |
|
10-Nov-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add algebraic optimizations for MAD. total instructions in shared programs: 5764176 -> 5763808 (-0.01%) instructions in affected programs: 25121 -> 24753 (-1.46%) helped: 164 HURT: 2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2dad1e3abdb1ad153289455f3e273101e5bac1a8 |
|
12-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add pass to combine immediates. total instructions in shared programs: 5885407 -> 5940958 (0.94%) instructions in affected programs: 3617311 -> 3672862 (1.54%) helped: 3 HURT: 23556 GAINED: 31 LOST: 165 ... but will allow us to always emit MAD instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7a83f7d4814c9216316a742e97c33259f7b3ae76 |
|
09-Jan-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Handle W/UW-type immediates in dump_instructions().
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
74ef90acd751fc91ba9e20c2f16871fa9bf140e0 |
|
13-Feb-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Let dump_instructions() work before calculate_cfg(). Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fa124a337ca10d2c5d2d81a89dc8c21a7ba2f58b |
|
13-Feb-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Call calculate_cfg() before optimize(). The CFG is fundamental to the FS IR, not merely a piece of optimization. Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eb47d0efd39d73d4388389d6c0ebe458160f79fa |
|
05-Feb-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Optimize multiplication by -1 into a negated MOV. instructions in affected programs: 968 -> 942 (-2.69%) helped: 4 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
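The rewrite named in the commit above (x * -1 becomes a MOV of -x) can be shown on a toy instruction representation. The `Op`, `Src` and `Inst` types below are hypothetical stand-ins invented for this sketch, not the backend's IR classes, and only the case where the immediate is the second source is handled:

```cpp
#include <cassert>

// Toy IR, assumed for illustration only.
enum class Op { MUL, MOV };

struct Src {
   bool  is_imm;   // true when the source is an immediate
   float imm;      // immediate value (meaningful only if is_imm)
   bool  negate;   // source negate modifier
};

struct Inst {
   Op  op;
   Src src[2];
};

// Turn "MUL dst, a, -1.0" into "MOV dst, -a". Returns true on progress.
static bool opt_mul_by_neg_one(Inst &inst)
{
   if (inst.op != Op::MUL)
      return false;
   if (!inst.src[1].is_imm || inst.src[1].imm != -1.0f)
      return false;

   inst.op = Op::MOV;
   inst.src[0].negate = !inst.src[0].negate;   // fold the sign into the source
   inst.src[1] = Src{false, 0.0f, false};      // second source is now unused
   return true;
}

// Usage sketch.
static void example_usage()
{
   Inst mul{Op::MUL, {{false, 0.0f, false}, {true, -1.0f, false}}};
   assert(opt_mul_by_neg_one(mul));
   assert(mul.op == Op::MOV && mul.src[0].negate);
}
```

Toggling the negate flag, rather than setting it, keeps the result correct when the source was already negated.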
|
09d6ea9ae3c487be20fb3157368003d30856d3bc |
|
11-Feb-2015 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove conditional mod when optimizing a SEL into a MOV. Missed in commit ca675b73, but got right in the companion commit 3c28b2c0.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3df2cb2f863836ec909f5259693c1eeef675a594 |
|
03-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix fs_inst::regs_written calculation for instructions with scalar dst. Scalar registers are required to have zero stride, fix the regs_written calculation not to assume that the instruction writes zero registers in that case. v2: Rename CEILING() to DIV_ROUND_UP(). (Matt, Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
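The zero-stride problem described in the commit above is easy to see numerically. The sketch below is an illustration under stated assumptions: a 32-byte register, and a hypothetical `regs_written` free function taking width, stride and element size directly. Clamping the element count to at least one before rounding up means a scalar (stride 0) destination still counts as writing one register.

```cpp
#include <cassert>

// Assumption for this sketch: one register holds 32 bytes.
constexpr unsigned REG_SIZE_BYTES = 32;

constexpr unsigned div_round_up(unsigned a, unsigned b)
{
   return (a + b - 1) / b;
}

// Hypothetical helper: number of registers written by an instruction whose
// destination has the given execution width, stride (in elements) and
// element size in bytes.
constexpr unsigned regs_written(unsigned width, unsigned stride,
                                unsigned type_size)
{
   // A stride of 0 marks a scalar destination; it still writes one element.
   return div_round_up((width * stride > 1 ? width * stride : 1) * type_size,
                       REG_SIZE_BYTES);
}

// Usage sketch.
static void example_usage()
{
   assert(regs_written(8, 1, 4) == 1);    // SIMD8, 32-bit elements
   assert(regs_written(16, 1, 4) == 2);   // SIMD16, 32-bit elements
   assert(regs_written(8, 0, 4) == 1);    // scalar: one register, not zero
}
```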
|
f2668f9f214201503419342b980d3afa8b796926 |
|
06-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix stack allocation of fs_inst and stop stealing src array provided on construction. Using 'ralloc*(this, ...)' is wrong if the object has automatic storage or was allocated through any other means. Use normal dynamic memory instead. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a3ee6c7d1991a90d22fae992c1cb94123e51ae54 |
|
06-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove dependency of fs_inst on the visitor class. The fs_visitor argument of fs_inst::regs_read() wasn't used at all. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
447879eb88b8df41ad32cf4406cc636b112b72d9 |
|
10-Feb-2015 |
Francisco Jerez <currojerez@riseup.net> |
i965: Factor out virtual GRF allocation to a separate object. Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d4a461caaf00ae13b83f106f032d3f4125687a02 |
|
15-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix INTEL_DEBUG=shader_time for SIMD8 VS (and GS). We were incorrectly attributing VS time to FS8 on Gen8+, which now use fs_visitor for vertex shaders. We don't hit this for geometry shaders yet, but we may as well add support now - the fix is obvious, and we'll just forget later. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6b3a301f611c9aabc090522951eda589e8302562 |
|
07-Jan-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Set CMP's destination type to src0's type. Allows CMP instructions with float sources to be compacted and coissued. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
94e7b59a75fc2ecc51a74196f6cd198546603b85 |
|
05-Jan-2015 |
Matt Turner <mattst88@gmail.com> |
i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0. total instructions in shared programs: 5952059 -> 5951603 (-0.01%) instructions in affected programs: 138812 -> 138356 (-0.33%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
19f9cb72c8b95febd53b80de137e7bf716fb45f1 |
|
22-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add pass to propagate conditional modifiers. total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eed7223243c35bba092dc0b26e592f6af1ba3fd7 |
|
30-Dec-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add a pass to fixup 3-src instructions that have a null dest. 3-src instructions can only have GRF/MRF destinations. It's really difficult to deal with that restriction in dead code elimination (that wants to give instructions null destinations to show that their result isn't used) while allowing 3-src instructions to have conditional mod, so don't, and just give them a destination before register allocation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a5ca86a9833d6fd57ee609d8d1e630dc66ebd371 |
|
16-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/nir: Enable SIMD16 support in the NIR FS backend. With the previous commits in place, it just works. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3f263ffbb37d77f97a86686e1d2d5eeabf4ecae6 |
|
16-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...). brw_fs_nir.cpp creates almost all of its registers via: fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components)); When we add SIMD16 support, we'll need to set reg->width = 16 and double the VGRF size...on pretty much every VGRF it allocates. This patch replaces that pattern with a new "vgrf" helper method: fs_reg reg = vgrf(num_components); The new function correctly takes reg_width into account. For now, reg_width is always 1, so this should have no functional change. v2: Just make vgrf() account for reg_width right away, rather than changing the behavior in the next patch. v3: Replace one last virtual_grf_alloc I missed. It's used in code that only runs for dispatch_width == 8, so it doesn't matter, but consistency is nice. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d1533d87cc7e2c39e7ce9dc838b45a2c39c96e33 |
|
16-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type). I dislike how fs_reg has a constructor that knows about fs_visitor. Apart from that, it stands alone, with no need to interact with the rest of the compiler. Which is sensible - a class that represents a register should do just that. Allocating virtual register numbers should be left up to the compiler (fs_visitor). This patch replaces the constructor with a new fs_visitor::vgrf method, eliminating fs_reg's dependency on fs_visitor. It ends up being no more code. v2: Rebase from May 2014 -> January 2015. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
faaca237341abc0f784edfb16df50104110365b8 |
|
16-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Make lower_load_payload etc. appear in INTEL_DEBUG=optimizer. In order to support calling lower_load_payload() inside a condition, this patch makes OPT() a statement expression: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html We recently did the equivalent change in the vec4 backend (commit 9b8bd67768769b685c25e1276e053505aede5f93). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
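The statement-expression form mentioned in the commit above is a GCC extension (also accepted by Clang) that lets a macro both run statements and yield a value, so it can sit inside an `if`. The sketch below is a simplified illustration, not the driver's macro: `fake_lower_pass` and `run_with_opt` are hypothetical, and the real macro also does debug dumping that is omitted here.

```cpp
#include <cassert>

// Simplified sketch of an OPT()-style macro written as a GCC statement
// expression. It expects a `bool progress` variable in the enclosing scope,
// accumulates into it, and evaluates to this pass's own result.
#define OPT(pass, ...) ({                          \
      bool this_progress = pass(__VA_ARGS__);      \
      progress = progress || this_progress;        \
      this_progress;                               \
   })

// Hypothetical pass: reports progress once if there is pending work.
static bool fake_lower_pass(int &pending)
{
   if (pending <= 0)
      return false;
   pending = 0;
   return true;
}

// Returns how many times the conditional follow-up ran (0 or 1).
static int run_with_opt(int pending)
{
   bool progress = false;
   int followups = 0;

   if (OPT(fake_lower_pass, pending))   // usable as a condition
      followups++;

   assert(progress == (followups == 1));
   return followups;
}
```

Because the macro's value is the pass's own result rather than the accumulated flag, follow-up work can be tied to that specific pass.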
|
b1fe8604c6b679768e880b5e1d7f18b92067721b |
|
21-Oct-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Don't take an ir_variable for emit_general_interpolation Previously, emit_general_interpolation took an ir_variable and pulled the information it needed from that. This meant that in fs_fp, we were constructing a dummy ir_variable just to pass into it. This commit makes emit_general_interpolation take only the information it needs and gets rid of the fs_fp cruft. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ae2880d131e3197114940fc7028397079840f97d |
|
15-Oct-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Only use nir for 8-wide non-fast-clear shaders. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2faf7f87d6a1c00b3f3d3907178a2eeeefa5d2a9 |
|
15-Aug-2014 |
Connor Abbott <connor.abbott@intel.com> |
i965/fs: add a NIR frontend This is similar to the GLSL IR frontend, except consuming NIR. This lets us test NIR as part of an actual compiler. v2: Jason Ekstrand <jason.ekstrand@intel.com>: Make brw_fs_nir build again; only use NIR if INTEL_USE_NIR is set; whitespace fixes
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
616a48ebc6b858cf15ade15238f1a549b701ebc3 |
|
05-Aug-2014 |
Connor Abbott <connor.abbott@intel.com> |
i965/fs: make emit_fragcoord_interpolation() not take an ir_variable
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
68ed14d6adcaf4b91216fc1c53792e88d1fd024d |
|
13-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Pass a shader stage abbreviation to fs_generator(). A lot of messages hardcoded the string "FS", which is confusing on Broadwell, where we use this code for VS support as well. shader-db particularly got confused, as it reported two "FS SIMD8" shaders, and no vertex shaders at all. Craziness ensued. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0ac4c272755c75108a10a84ce33bf6a6234985d3 |
|
10-Dec-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965/skl: Always use a header for SIMD4x2 sampler messages SKL+ overloads the SIMD4x2 SIMD mode to mean either SIMD8D or SIMD4x2 depending on bit 22 in the message header. If the bit is 0 or there is no header we get SIMD8D. We always want SIMD4x2 in vec4 and for fs pull constants, so use a message header in those cases and set bit 22 there. Based on an initial patch from Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0b98b2bf535d6e6b6b02c0d47ea03f98adf42f15 |
|
01-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+. Gen7.5+ platforms that support the "Shader Channel Select" feature leave key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE is GL_ALPHA (which is really uncommon). So, the precompile should leave them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well. We didn't notice this because prog->ShadowSamplers is not set correctly. The next patch will fix that problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
408e298942ffb03c00e05dce2569c291df6bec49 |
|
01-Jan-2015 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix INTEL_DEBUG=optimizer with VF types. Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7bc6e455e231076bfac6c678c375ea4aca94ebf0 |
|
21-Dec-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Add support for saturating immediates. I don't feel great about assert(!"unimplemented: ...") but these cases do only seem possible under some currently impossible circumstances. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3978585bccf69ff8f607cad0de025ea91c418587 |
|
20-Dec-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Add fs_reg/src_reg constructors that take vf[4]. Sometimes it's easier to generate 4x values into an array, and the memcpy is 1 instruction, rather than 11 to piece 4 arguments together. I'd forgotten to remove the prototype from fs_reg from a previous patch, so it's already there for us here. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a5481d6fbba9bcaa0c7d49ae0a3580fee21041a6 |
|
19-Dec-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add missing const qualifier.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fc016bc0f3d83bbf3eb968938f4bc9df55214ecd |
|
16-Dec-2014 |
Mark Janes <mark.a.janes@intel.com> |
i965: remove includes of sampler.h from extern "C" blocks C linkage was removed from functions in program/sampler.cpp. However, some cpp files include program/sampler.h within extern "C" blocks, causing link errors for test_vec4_copy_propagation. Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7ff457b93028d1884c7952080edd919008edf141 |
|
28-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Clean up fs_visitor::run and rename to run_fs Now that fs_visitor::run is back to being only fragment shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT conditions and rename it to run_fs. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8b6a797d743be38396fcaf4a2f7fb01d3bcd9ba3 |
|
28-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Add fs_visitor::run_vs() to generate scalar vertex shader code This patch uses the previous refactoring to add a new run_vs() method that generates vertex shader code using the scalar visitor and optimizer. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3d10f0a98c6169dcf4b1a001e624b489abca8298 |
|
21-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Prepare for using the ATTR register file in the fs backend The scalar vertex shader will use the ATTR register file for vertex attributes. This patch adds support for the ATTR file to fs_visitor. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d9e29f5d88d2ddd8ee9d10b7d88377a60fd0094f |
|
21-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Add SIMD8 URB write low-level IR instruction This is all we need from the generator for SIMD8 vertex shaders. This opcode is just the send instruction, all the hard work will happen in the visitor using LOAD_PAYLOAD. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
686ef091a4f76fa68d9d9cd5ef00f40c1416a5da |
|
28-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Remove shader program argument and member from fs_generator Now that the caller passes in the shader debug name, we don't need this anymore. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9a1af7b31824ca573b2609434cf8299bfc9bc5e2 |
|
28-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Set shader name for generator from call site fs_generator no longer knows what stage it's generating code for, so we have to set the debug name of the shader from the call site. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7bb9d33b8d6ecc03670078c3f9623f188135abb7 |
|
21-Oct-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Generalize fs_generator further This removes all stage specific data from the generator, and lets us create a generator for any stage. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
092c73a7c32b240a26ffeab2ee475f6d590540b2 |
|
06-Dec-2014 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: Fix regs read for FS_OPCODE_INTERP_PER_SLOT_OFFSET Dead code elimination was eating the Y offset. Fixes the piglit test: spec/ARB_gpu_shader5/arb_gpu_shader5-interpolateAtOffset-nonconst Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2881b123d00562fee8b7d2b4f7825f89a73e0d9f |
|
02-Dec-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use ~0 to represent true on all generations. Jason realized that we could fix the result of the CMP instruction on Gen <= 5 by doing -(result & 1). Also do the resolves in the vec4 backend before use, rather than when the bool was created. The FS does this and it saves some unnecessary resolves. On Ironlake: total instructions in shared programs: 4289762 -> 4287277 (-0.06%) instructions in affected programs: 619430 -> 616945 (-0.40%) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
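Note on the entry above: the `-(result & 1)` fix can be sketched in a few lines of C++. This is only an illustration, assuming the raw comparison result has a meaningful low bit and unspecified upper bits; `resolve_bool` is a hypothetical name, not a function from the driver.

```cpp
#include <cstdint>

// Hypothetical helper (not from Mesa): normalize a raw comparison result
// whose only trustworthy bit is bit 0 into the "~0 means true" convention.
constexpr std::uint32_t resolve_bool(std::uint32_t raw_cmp_result)
{
    // Keep bit 0, then negate: 1 becomes 0xFFFFFFFF (~0), 0 stays 0.
    // Unsigned arithmetic wraps, so the negation is well defined.
    return 0u - (raw_cmp_result & 1u);
}

// Compile-time checks of the two cases, including garbage in the upper bits.
static_assert(resolve_bool(0xABCD0001u) == 0xFFFFFFFFu, "true maps to ~0");
static_assert(resolve_bool(0xABCD0000u) == 0u, "false maps to 0");
```

With every generation producing 0 or ~0, later code can presumably consume booleans directly as masks without a per-generation special case.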
2a4f5728ad27bd1605b3604908caa9ad4983e256 |
|
01-Dec-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove "disable_derivative_optimization" driconf option. This was added in September 2013 when we first implemented the fast (but lower quality) derivatives. A quick Google search didn't turn up anyone using or recommending the option, so I suspect no one does. Applications that want to control the quality of their derivatives can use the new GL_ARB_derivative_control extension, or use the glHint mechanism. The driconf option seems superfluous. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b449366587b5f3f64c6fb45fe22c39e4bc8a4309 |
|
03-Nov-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove opt_drop_redundant_mov_to_flags(). Dead code elimination now handles this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f1e5418f402c7ac087b1c127cb4476d0d02e0073 |
|
12-Nov-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Don't treat IF or WHILE with cmod as writing the flag. Sandybridge's IF and WHILE instructions can do an embedded comparison with conditional mod. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
133280120b4bc714bbb7665e383f36ab262c280a |
|
08-Nov-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Set prog_data->uses_kill if simulating alpha test via discards. When using MRT on Gen4-5, we have to simulate GL's alpha test feature by emitting discards in the fragment shader. In this case, it makes sense to set prog_data->uses_kill, which means the fragment shader may kill pixels via the discard mechanism. This saves us from having to look at an extra key value in a couple of places, including in the generator. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5d23721c1df3d1a05c49b705f0d63e409c89d25f |
|
09-Mar-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add vector float immediate infrastructure. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b55777f39d00a0c54023eba012d326ff09fa530b |
|
24-Nov-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Make precompile functions accessible from C. Previously, the prototypes for brw_vs/gs/fs_precompile were scattered between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also, brw_fs_precompile had C++ linkage, while the others were C. This patch moves all the prototypes to a central location (brw_shader.h) and makes brw_fs_precompile have C linkage. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
62b425448ca92f568a571e656133e6d234434b4c |
|
24-Nov-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Pass gl_program pointers into precompile functions. We'd like to do precompiling for ARB vertex and fragment programs, which only have gl_program structures - gl_shader_program is NULL. This patch makes the various precompile functions take a gl_program parameter directly, rather than accessing it via gl_shader_program. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
40c0d79d295657f30cb86b002003800844851703 |
|
12-Nov-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove is_valid_3src(). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1fdc75fde418a231a91ef0e68ea92d54bf594ea1 |
|
12-Nov-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove unused apply_stride(). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a4ffc2a445055a81a655e64d57ee393a14a2eb16 |
|
14-Nov-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers() This will be reused for the scalar VS pass. v2 (Ken): Rebase on master. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c50f2dadc588caa0cb350f44febce56d76d60ccb |
|
14-Nov-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Move fs_visitor optimization pass into new method fs_visitor::optimize() We'll reuse this toplevel optimization driver for the scalar VS. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5c4efc644e731178a07bb41c55cf96425166993f |
|
14-Nov-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Move more code into codegen-branch of the fs_visitor::run() if statement These last few operations all only apply when we've actually generated code, optimized and allocated registers. The dummy and the repclear shaders don't need the gen4 send workaround, and don't spill. This means we can move these lines into the else-branch, which will make the following refactoring easier. v2 (Ken): Rebase on master, which removed the uncompressed stack. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f2bb655ac75d04dc033546479aabbbf4112cc54e |
|
14-Nov-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Refactor fs_generator API We split out SIMD8 and SIMD16 generation into separate calls to new method generate_code(), which returns the start offset for the generated code. A new get_assembly() method returns the generated code. This avoids asserting MESA_SHADER_FRAGMENT and accessing wm_prog_data in the generator. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
497122a338e8a259abb43a71e79c1475fd44ce65 |
|
31-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove force uncompressed stack. Last use was in shader_time. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7e19e6c877714e05e65ca2cecd1c782fdc260cb6 |
|
31-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Use execution size of 1 for some shader_time operations. The ADDs depended on dispatch_width, which really isn't what we wanted. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ee7e6009a94d070f58a52001780d295798a28073 |
|
31-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Use mov(4) instructions to read timestamp. We only want fields 0-2.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
799106d38734a867bf33add2994cb9d414d965e7 |
|
29-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't compute_to_mrf() on Gen >= 7. No differences in shader-db on Haswell (Gen 7.5). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7d560a3861ff30aa9d8ec872cf9cd7d72a980eb2 |
|
21-Oct-2014 |
Ian Romanick <ian.d.romanick@intel.com> |
i965: Silence unused parameter warning in brw_dump_ir Just remove the parameter. Silences: brw_program.c: In function 'brw_dump_ir': brw_program.c:566:33: warning: unused parameter 'brw' [-Wunused-parameter] Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
40492be2a4a339b02c38990ad8736644f3a8776b |
|
24-Oct-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Silence uninitialized variable warning. The compiler isn't privy to the knowledge that we're doing at least one framebuffer write. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ffe582aa2076bc06f9d06e36287bdded45ab5b98 |
|
17-Oct-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Don't pass ir_variable * to emit_sampleid_setup(). gl_SampleID is a built-in variable that always is of type "int". Suggested by Connor Abbott. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
50d0e2e118fb3e42dc83c83de34da3eac0a0d8a1 |
|
01-Oct-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add a MAX_GRF_SIZE define and use it in various places Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead. However, some FB write messages can validly be longer than this so we need something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for FB writes. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539 Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eedbce9c63a3f385908bdc8a69e8be98dd3522ff |
|
01-Oct-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fix the build
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
83669fac9d6d3f0633d19dcfebe7cf0286e69ab7 |
|
01-Oct-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Fix an uninitialized value warning Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
94b68109fbe1cb60cc23a4c5a319039ada81ea81 |
|
27-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize sqrt+inv into rsq. Transform "sqrt a, b; rcp c, a" into "sqrt a, b; rsq c, b". The improvement here is that we've broken a dependency between these instructions. Leads to 330 fewer INV instructions and 330 more RSQ. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
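Note on the entry above: the sqrt+rcp rewrite can be sketched as a small peephole pass over a toy straight-line instruction list. Everything here (`Op`, `Inst`, `opt_sqrt_rcp_to_rsq`) is a hypothetical stand-in for illustration and is not the driver's IR or its actual pass. The sketch assumes one source per instruction and no control flow.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy IR, assumed for illustration only.
enum class Op { SQRT, RCP, RSQ, OTHER };

struct Inst {
    Op op;
    std::string dst;
    std::string src;
};

// Rewrites "rcp c, a" into "rsq c, b" when a was last written by
// "sqrt a, b" and b still holds the same value at the rcp.
// Returns true if any instruction was changed.
inline bool opt_sqrt_rcp_to_rsq(std::vector<Inst> &insts)
{
    bool progress = false;
    for (std::size_t i = 0; i < insts.size(); ++i) {
        if (insts[i].op != Op::RCP)
            continue;
        // Walk backwards to the most recent write of the rcp's source.
        for (std::size_t j = i; j-- > 0;) {
            const Inst &def = insts[j];
            if (def.dst != insts[i].src)
                continue;
            if (def.op == Op::SQRT && def.src != def.dst) {
                // b must not be overwritten between the sqrt and the rcp.
                bool clobbered = false;
                for (std::size_t k = j + 1; k < i; ++k)
                    if (insts[k].dst == def.src)
                        clobbered = true;
                if (!clobbered) {
                    insts[i].op = Op::RSQ;
                    insts[i].src = def.src;
                    progress = true;
                }
            }
            break; // only the reaching definition matters
        }
    }
    return progress;
}
```

The sqrt itself is left in place because other instructions may still read a; the rewritten rsq simply no longer has to wait for it.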
5aa8d8194c4975876276a9c57cdd672978a491ad |
|
15-May-2014 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private Also move num_state_slots inside ir_variable_data for better packing. The payoff for this will come in a few more patches. No change in Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4ddc25a8d4796316f0296eaa10eba26bd6dd1718 |
|
27-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Properly calculate the number of instructions in calculate_register_pressure Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
514fd1c55e617bb325979cbee4a89f0727c3b567 |
|
13-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use the GRF for FB writes on gen >= 7 On gen 7, the MRF was removed and we gained the ability to do send instructions directly from the GRF. This commit enables that functionality for FB writes. v2: Make handling of components more sane. i965/fs: Force a high register for the final FB write v2: Renamed the array for the range mappings and added a comment Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1dd9b90ecd8e001b40febfb8908c0b9a0c08c7d5 |
|
17-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Handle COMPR4 in LOAD_PAYLOAD Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6d770ce93aacf29940bacb6fe2ae78cf716751dc |
|
20-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add split_virtual_grfs and compute_to_mrf after lower_load_payload If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then we will need this. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9e1f52a6e2b0277de063a8d8b07c5e520795a23b |
|
12-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d25aaf1cb1688b38b2a4025dbbff26d74291723c |
|
12-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use the GRF for UNTYPED_ATOMIC instructions Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
48ddd2889e15aaf8ddb6dff5d8b6dc275f7f3f8d |
|
17-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use exec_size instead of force_uncompressed in dump_instruction Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b18fd234da275a0ec6b3c5cb77497a4c487c6366 |
|
16-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use instruction execution sizes instead of heuristics Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8f1adb59659617a682988bc503b8a0a7077abb84 |
|
05-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Remove unneeded uses of force_uncompressed Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5f41d052bf53e32761fb528f4be99a1af3a33ebc |
|
20-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor* Now that we have execution sizes, we can use that instead of the dispatch width. This way it also works for 8-wide instructions in SIMD16. i965/fs: Make effective_width a variable instead of a function i965/fs: Preserve effective width in constant propagation Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6ba31cc000b096a3b1fe0e0a935a3ab2aa6803d2 |
|
12-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Better guess the width of LOAD_PAYLOAD Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
071ac3a467479ce1ada1b86e2f65d4cc7d07753e |
|
14-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add an exec_size field to fs_inst This will, eventually, allow us to manage execution sizes of instructions in a much more natural way from the fs_visitor level. i965/fs: Explicitly set instruction execute size a couple of places i965/blorp: Explicitly set instruction execute sizes Since blorp is all 16-wide and isn't, in general, very careful about register width, we'll just set it all explicitly. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fbc0a798eef49c366437014134c59e16c39c7f95 |
|
30-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Determine partial writes based on the destination width Now that we track both halves of a 16-wide vgrf, we no longer need to worry about force_sechalf or force_uncompressed. The only real issue is if the destination is too small. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7210583eb84a5d49803dbe37b0960373b4224d10 |
|
18-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode This is actually the squash of a bunch of different changes. Individual commit titles follow: i965/fs: Always 2-align registers SIMD16 for gen <= 5 i965/fs: Use the register width when applying offsets This reworks both byte_offset() and offset() to be more intelligent. The byte_offset() function now supports offsets bigger than 32. The offset() function uses the byte_offset() function together with the register width and the type size to offset the register by the correct amount. i965/fs: Change regs_read to be in hardware registers i965/fs: Change regs_written to be actual hardware registers i965/fs: Properly handle register widths in LOAD_PAYLOAD The LOAD_PAYLOAD instruction is a bit special because it collects a bunch of registers (with possibly different widths) into a single payload block. Once the payload is constructed, it's treated as a single block of data and most of the information such as register widths doesn't matter anymore. In particular, the offset of any particular source register is the accumulation of the sizes of the previous source registers. i965/fs: Properly set writemasks in LOAD_PAYLOAD i965/fs: Handle register widths in demote_pull_constants i965/fs: Get rid of implicit register doubling in the allocator i965/fs: Reserve enough registers for PLN instructions i965/fs: Make sources and destinations interfere in 16-wide i965/fs: Properly handle register widths in CSE i965/fs: Properly handle register widths in register_coalesce i965/fs: Properly handle widths in copy propagation i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD i965/fs: Properly handle register widths and odd register sizes in spilling i965/fs: Don't waste a register on texture lookups for gen >= 7 Previously, we were wasting a register in SIMD16 mode because we could only allocate registers in pairs. 
Now that we can allocate and address odd-sized registers, let's get rid of this special-case. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4232a776a699d80601496802ab2d817374a31f56 |
|
13-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Handle printing of registers better. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5390ca8ce93028d2d6016d4817e92427d09e4a21 |
|
25-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965: Explicitly set widths on gen5 math instruction destinations. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
004fbd53759a8993198883a32d93c9e3f6a65bbd |
|
16-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make half() divide the register width by 2 and use it more Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
24d023b9fe18847158ec6c14e1e0e32ff022f060 |
|
13-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Add a concept of a width to fs_reg Every register in i965 assembly implicitly has a concept of a "width". Usually, this is derived from the execution size of the instruction. However, when writing a compiler it turns out that it is frequently useful to have the width explicitly in the register and derive the execution size of the instruction from the widths of the registers used in it. This commit adds a width field to fs_reg along with an effective_width() helper function. The effective_width() function tells you how wide the register effectively is when used in an instruction. For example, uniform values have width 1 since the data is not actually repeated, but when used in an instruction they take on the width of the instruction. However, for some instructions (LOAD_PAYLOAD being the notable exception), the width is not the same. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4d5f0eb0487ad13e90f7248c95c023c35457eaf9 |
|
13-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Refactor fs_inst::is_send_from_grf() A switch statement is much easier to read/edit than a big giant or statement. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
72a3780f26951c405c35a1ae51598f7b0a65b92f |
|
17-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Print BAD_FILE registers in dump_instruction Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be able to see them. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2af4b0aeaff53190b0e17a971119d1b77ddad25b |
|
16-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make compact_virtual_grfs an optimization pass Previously we disabled compact_virtual_grfs when dumping optimizations. The idea here was to make it easier to diff the dumped shader because you didn't have a sudden renaming. However, sometimes a bug is affected by compact_virtual_grfs and, when this happens, you want to keep dumping instructions with compact_virtual_grfs enabled. By turning it into an optimization pass and dumping it along with the others, we retain the ability to diff because you can just diff against the compact_virtual_grf output. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a25db10c1248d70cf7f4097833fa03fdccd98fe8 |
|
10-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Make immediate fs_reg constructors explicit Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f0d43c09b2fa32db66b7b6dc13becb0c7d3edeea |
|
06-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Use offset a lot more places We have this wonderful offset() function for advancing registers, but we're not using it. Using offset() allows us to do some sanity checking and avoid manually touching fs_reg::reg_offset. In a few commits, we will make offset do even more nifty things for us. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0089d025aa7f7497b3097c5067b589410cd40fbc |
|
20-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: fix a comment in compact_virtual_grfs Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3dc3fccb7586e6198c50114d6245017fc9badde8 |
|
19-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Rewrite fs_visitor::split_virtual_grfs The original vgrf splitting code was written with the assumption that vgrfs came in two types: those that can be split into single registers and those that can't be split at all. It was very conservative and bailed as soon as more than one element of a register was read or written. This won't work once we start allowing a regular MOV or ADD operation to operate on multiple registers. This rewrite allows for the case where a vgrf of size 5 may appropriately be split into one register of size 1 and two registers of size 2. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
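Note on the entry above: the "size 5 into 1 + 2 + 2" case can be illustrated with a small split-point computation. This is a sketch under stated assumptions, not the driver's code: `split_sizes` is a hypothetical function, and each access is modeled as a (first register offset, register count) pair within a single vgrf. A boundary between two registers stays a valid split point unless some access spans it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: given the size of one virtual register and the
// multi-register accesses made to it, return the sizes of the pieces it
// could be split into.
inline std::vector<int>
split_sizes(int vgrf_size, const std::vector<std::pair<int, int>> &accesses)
{
    assert(vgrf_size >= 0);
    // can_split[k] refers to the boundary just before register k.
    std::vector<bool> can_split(static_cast<std::size_t>(vgrf_size), true);
    for (const auto &a : accesses) {
        // Every boundary strictly inside an access must stay joined.
        for (int k = a.first + 1; k < a.first + a.second; ++k)
            if (k >= 1 && k < vgrf_size)
                can_split[static_cast<std::size_t>(k)] = false;
    }

    std::vector<int> sizes;
    int run = 1; // length of the piece currently being accumulated
    for (int k = 1; k < vgrf_size; ++k) {
        if (can_split[static_cast<std::size_t>(k)]) {
            sizes.push_back(run);
            run = 1;
        } else {
            ++run;
        }
    }
    if (vgrf_size > 0)
        sizes.push_back(run);
    return sizes;
}
```

For a vgrf of size 5 accessed as one single register at offset 0 and two register pairs at offsets 1 and 3, the boundaries before registers 2 and 4 are blocked, which yields pieces of size 1, 2 and 2.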
75afe17b7954984ea5b55c2a6d5d124f5eb03328 |
|
26-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Manually generate the meta fast-clear shader Previously, we were generating the fast-clear shader from GLSL. The problem is that fast clears require that we use a replicated write rather than a regular write instruction. In order to get this we had a complicated and somewhat fragile optimization pass that looked for places where we can use a replicated write and used it. Since replicated writes have a lot of restrictions, we only ever use them for fast-clear operations. This commit replaces the optimization pass with a function that just generates the shader we want. This is a) less code, b) less fragile than the optimization pass, and c) generates a more efficient shader. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
49374fab5d793ed426e01f7fef82c87442c14860 |
|
02-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Make instruction lists local to the bblocks. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
153d148e9e1f89a567b5079003b4b8070925ddcd |
|
25-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Replace initialization loops with memset(). Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f0598d413bc8eb7ab02318f1db2dbd446a3c736c |
|
02-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't iterate between blocks with inst->next/prev. When instruction lists are per-basic block, this won't work. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2ff0ff880c14f246a419ae3949b2462617e485e1 |
|
01-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't use instruction list after calculating the cfg. The only trick is changing a break into a return true in register coalescing, since the macro is actually a double loop, and break will do something different than you expect. (Wish I'd realized that earlier!) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a4fb8897a2bd00eefa8a503ec17d45e791bced91 |
|
01-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Remove now unneeded calls to calculate_cfg(). Now that nothing invalidates the CFG, we can calculate_cfg() immediately after emit_fb_writes()/emit_thread_end() and never again. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
072ea414d04f1b9a7bf06a00b9011e8ad521c878 |
|
01-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Remove cfg-invalidating parameter from invalidate_live_intervals. Everything has been converted to preserve the CFG. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a9f8296dbb4ee9fba22c4c2af625eaa29676f002 |
|
25-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Preserve the CFG in a few more places. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
517e01b5c3db9ba750698096e823134b288e213f |
|
22-Sep-2014 |
Eric Anholt <eric@anholt.net> |
mesa: Move register_allocate.c to util. The r300 gallium driver is using it outside of the Mesa tree, and I wanted to do so for vc4 as well. Rather than make the multiple-definitions problem even more complicated, just move it to more-shared code. v2: Don't forget to delete the symlink in r300 (review by Matt). Delete more r300-helper references (review by Emil) Don't prefix util/ header inclusion with "util/" (review by Emil) Reviewed-by: Matt Turner <mattst88@gmail.com> (v1) Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
19b08e1bb3a3508049d0527743b2f50f855a24c2 |
|
29-Aug-2014 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: Remove direct fs_visitor brw_wm_prog_key dependence Instead we store a void pointer to the key, and cast it to brw_wm_prog_key for fragment shader specific code paths. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
49e5f76a65978a6188572c0197523dd9c312ebeb |
|
29-Aug-2014 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: Remove direct fs_visitor brw_wm_prog_data dependence Instead we store a brw_stage_prog_data pointer, and cast it to brw_wm_prog_data for fragment shader specific code paths. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
78bd12619474e98503965541c61c5d7e9c408110 |
|
13-Sep-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Mark delta_x/y as BAD_FILE if remapped away completely. Commit afe3d1556f6b77031f7025309511a0eea2a3e8df (i965: Stop doing remapping of "special" regs.) stopped remapping delta_x/delta_y, and additionally stopped considering them always-live. We later realized delta_x was used in register allocation, so we actually needed to remap it, which was fixed in commit 23d782067ae834ad53522b46638ea21c62e94ca3 (i965/fs: Keep track of the register that hold delta_x/delta_y.). However, that commit didn't restore the "always consider it live" part. If all the code using delta_x was eliminated, fs_visitor::delta_x would be left pointing at its old register number. Later code in register allocation would handle that register number specially...even though it wasn't actually delta_x. To combat this, set delta_x/y to BAD_FILE if they're eliminated, and check for that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83127 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "10.3" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
936ca6f3cfb563719d8b51ae000d4f0594aba824 |
|
29-Aug-2014 |
Jordan Justen <jordan.l.justen@intel.com> |
i965: Add uses_kill to brw_wm_prog_data Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ecf6c2675783d369385b32a859b01491fb7fcf12 |
|
06-Sep-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Don't look at virtual_grf_sizes for uniforms Uniform values are in the UNIFORM register file, not the GRF register file. Looking in virtual_grf_sizes makes no sense and only makes the output of dump_instructions confusing. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ef8477cddf9a6b1e13608e4fad9b55c86d0e5af4 |
|
05-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Fix basic block tracking in try_rep_send(). The 'start' instruction is always in the current block, except for the case of shader time, which emits code in a pattern seen nowhere else. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
248eaff63d9a5484df1105a0c484d20e086f5f83 |
|
04-Sep-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Pass block to insert and remove functions missed earlier. Otherwise, the basic block start/end IPs don't get updated properly, leading to a broken CFG. This usually results in the following assertion failure: brw_fs_live_variables.cpp:141: void brw::fs_live_variables::setup_def_use(): Assertion `ip == block->start_ip' failed. Fixes KWin, WebGL demos, and a score of Piglit tests on Sandybridge and earlier hardware. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
23e20f4687269f795e912a05bf12baaa94d0dd5a |
|
29-Aug-2014 |
Jordan Justen <jordan.l.justen@intel.com> |
i965/fs: Use prog rather than fp->Base in fs_visitor Reduce fs_visitor's dependence on gl_fragment_program. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f92fbd554f2e9e702a2bd650c9b2571a3f4f1ab8 |
|
02-Sep-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move curb_read_length/total_scratch to brw_stage_prog_data. All shader stages have these fields, so it makes sense to store them in the common base structure, rather than duplicating them in each. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e8f83538dd4203befe63998b703afd2b488ad56a |
|
29-Aug-2014 |
Jason Ekstrand <jason.ekstrand@intel.com> |
i965/fs: Don't segfault when debug-logging a null program Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cbfcb1b06992e4310683bb54a67d011b08010ec7 |
|
05-Aug-2014 |
Connor Abbott <connor.abbott@intel.com> |
i965/fs: don't pass ir_variable * to emit_samplepos_setup() We were only using it to get at its type, which we already know because it's a builtin variable. Signed-off-by: Connor Abbott <connor.abbott@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ec3d06f591f9561289f7bc64a543a1e8a625faee |
|
05-Aug-2014 |
Connor Abbott <connor.abbott@intel.com> |
i965/fs: don't pass ir_variable * to emit_frontfacing_interpolation() We were only using it to get at its type, which we already know because it's a builtin variable. v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations. Signed-off-by: Connor Abbott <connor.abbott@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
70691f0c283ec4e03523f3a4690d9b897b36872e |
|
30-Aug-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix GPU hangs when INTEL_DEBUG=no16 is set. The replicated data clear shader needs to be SIMD16, or else the GPU will hang. So, compile it even if INTEL_DEBUG=no16 is set. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
20a849b4aa63c7fce96b04de674a4c70f054ed9c |
|
13-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use basic-block aware insertion/removal functions. To avoid invalidating and recreating the control flow graph. Also stop invalidating the CFG in places we didn't add or remove an instruction. cfg calculations: 202951 -> 80307 (-60.43%) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a3d0ccb037082f3aa66bd558dfbe89f63a6eedd3 |
|
12-Jul-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Pass a cfg pointer to generate_{code,assembly}. The loop over all instructions is now two-fold, over all of the blocks and all of the instructions in each block. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
19c6617adfec618889bb52d5398b8ac3d5969c18 |
|
10-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize gl_FrontFacing calculation on Gen4/5. Doesn't use fewer instructions, but it does avoid writing the flag register and if we want to switch the representation of true for Gen4/5 in the future, we can just delete the AND instruction.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d1c43ed48777115072809fdb394bccae88ffe83c |
|
10-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize gl_FrontFacing calculation on Gen6+. total instructions in shared programs: 4288650 -> 4282838 (-0.14%) instructions in affected programs: 595018 -> 589206 (-0.98%) Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2e51dc838be177a09f60958da7d1d904f1038d9c |
|
09-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use ~0 to represent true on Gen >= 6. total instructions in shared programs: 4292303 -> 4288650 (-0.09%) instructions in affected programs: 299670 -> 296017 (-1.22%) Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f9dc7aabb3273d6d8a54c6778a5695a8527f4454 |
|
08-Jul-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Add optimization pass to let us use the replicate data message The data port has a SIMD16 'replicate data' message, which lets us write the same color for all 16 pixels by sending the four floats in the lower half of a register instead of sending 4 times 16 identical component values in 8 registers. The message comes with a lot of restrictions and could be made generally useful by recognizing when those restrictions are satisfied. For now, this lets us enable the optimization when we know it's safe, but we don't enable it by default. The optimization works for simple color clear shaders only, but does recognize and support multiple render targets. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1effbf68983c924b3b70fd2fd9206af6b5475335 |
|
07-Jul-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Add an option to not generate the SIMD8 fragment shader For now, this can only be triggered with a new 'no8' INTEL_DEBUG option and a new context flag. We'll use the context flag later, but introducing it now lets us bisect to this commit if it breaks something. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
35ca28816509a887538a6d0c62c96279b38ef8e4 |
|
15-Apr-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Add pass to rename registers to break live ranges. The pass breaks live ranges of virtual registers by allocating new registers when it sees an assignment to a virtual GRF it's already seen written. total instructions in shared programs: 4337879 -> 4335014 (-0.07%) instructions in affected programs: 343865 -> 341000 (-0.83%) GAINED: 46 LOST: 1 [mattst88]: Make pass not break in presence of control flow. invalidate_live_intervals() only if progress. Fix up delta_x/delta_y. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2c50212b14da27de4e3da62488ae4e35c069d84e |
|
11-Aug-2014 |
Neil Roberts <neil@linux.intel.com> |
i965: Store uniform constant values in a gl_constant_value instead of float The brw_stage_prog_data struct previously contained an array of float pointers to the values of parameters. These were then copied into a batch buffer to upload the values using a regular assignment. However the float values were also being overloaded to store integer values for integer uniforms. This can break if x87 floating-point registers are used to do the assignment because the fst instruction tries to fix up invalid float values. If an integer constant happened to look like an invalid float value then it would get altered when it was copied into the batch buffer. This patch changes the pointers to be gl_constant_value instead so that the assignment should end up copying without any alteration. This also makes it more obvious that the values being stored here are overloaded for multiple types. There are some static asserts where the values are uploaded to ensure that the size of gl_constant_value is the same as a float. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c66d928f2c9fa59e162c391fbdd37df969959718 |
|
17-Jul-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Enable INTDIV in SIMD16 mode. All we need to do is decompose this to two SIMD8 instructions, like we do in many other cases. We even already have code for that. I apparently just botched this last time I tried, and it was easy. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
24878f31c4287a6cc4cfd0fabc34075f9dad4e03 |
|
08-Jul-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Drop "do dual source blending" generator parameter. When dual source blending, the visitor already stores a flag in brw_wm_prog_data (dual_src_blend) for the state upload code to use. The generator also receives this, so there's no need to pass an additional flag. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f17bfc9ba954608c58fd0560f255e40eef7e7cea |
|
11-Aug-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Never use the Gen8 code generators. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
074d472398b3cc7f32fe5c0cc742853cf66fabed |
|
30-Jun-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Switch to the EU emit layer for code generation on Broadwell. Everything should be in place to unify code generation between Gen4-7 and Gen8+. We should be able to drop the Gen8 generators at this point. However, leave them hooked up for a brief moment, for testing and comparison purposes. Set GEN8=1 to use the old Gen8+ code generator paths. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
23d782067ae834ad53522b46638ea21c62e94ca3 |
|
11-Aug-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Keep track of the register that hold delta_x/delta_y. They're needed in register allocation. Fixes a regression since afe3d155. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78875 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0f4c5a70c6e759e3a7bddd7f1c2d2b8d219552a4 |
|
03-Aug-2014 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: Get rid of backend_instruction::sampler The generators no longer use this. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
72e55bb6888ff4d6b69b10d9c58573e4c3d492ec |
|
25-Feb-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
util: Move the open-addressing linear-probing hash_table to src/util. This hash table is used in core Mesa, the GLSL compiler, and the i965 driver, which makes it a good candidate for the new src/util module. It's much faster than program/hash_table.[ch] (see commit 6991c2922f5 for data), and José's u_hash_table.c has a comment saying Gallium should probably consider switching to a linear probing hash table at some point. So this seems like the best candidate for a shared data structure. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> v2 (Jason Ekstrand): Pick up another hash_table use and patch up scons Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
38ffef7840edddada23bac48f669d2070e6f158c |
|
18-Jul-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode. We might be able to do this without an extra program key field, but this is non-invasive and fixes the bug, for now. This fixes the following Piglit tests on Broadwell: - ARB_sample_shading/builtin-gl-sample-id 2 - ARB_sample_shading/builtin-gl-sample-position 2 - EXT_framebuffer_multisample/multisample-blit 2 color - EXT_framebuffer_multisample/multisample-blit 2 color linear - EXT_framebuffer_multisample/multisample-blit 2 depth - EXT_framebuffer_multisample/no-color 2 depth combined - EXT_framebuffer_multisample/no-color 2 depth separate - EXT_framebuffer_multisample/no-color 2 depth single - EXT_framebuffer_multisample/no-color 2 depth-computed combined - EXT_framebuffer_multisample/no-color 2 depth-computed separate - EXT_framebuffer_multisample/no-color 2 depth-computed single - EXT_framebuffer_multisample/unaligned-blit 2 color msaa - EXT_framebuffer_multisample/unaligned-blit 2 depth msaa Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991 Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5d9f5cd35b63e8d7fdb42a5ad26c53d2a19f6985 |
|
15-Jul-2014 |
Anuj Phogat <anuj.phogat@gmail.com> |
Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs" This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92. Fixes the 11 regressions caused in framebuffer_blit tests in Khronos GLES3 CTS tests: Original patch reduced the instruction count but had no performance benefits. So, it's safe to revert it without causing any performance regressions. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Acked-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cf1b5eee7f36af29d1d5caba3538ad4985e51f81 |
|
16-Jul-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Use WE_all for gl_SampleID header register munging. This code should execute without regard to the currently executing channels. Asking for gl_SampleID inside control flow might break in strange ways. It appears to break even at the top of the program in SIMD16 mode occasionally as well. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e5adc560cc8544200faa3e04504202839626ab37 |
|
11-Jul-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Set force_uncompressed and force_sechalf on samplepos setup. gen8_fs_generator uses these to decide whether to set the execution size to 8 or 16, so we incorrectly made both of these MOVs the full width in SIMD16 shaders. (It happened to work out on Gen4-7.) Setting them should also help inform optimization passes what's really going on, which could help avoid bugs. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
58270c2fac493497ed7923830f49051a53e86a07 |
|
08-Jul-2014 |
Connor Abbott <cwabbott0@gmail.com> |
exec_list: Make various places use the new length() method. Instead of hand-rolling it. v2 [mattst88]: Rename get_size to length. Expand comment in ir_reader. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1] Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Connor Abbott <connor.abbott@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6e91f2df958c835a1973e32d71578fa295ef00a8 |
|
18-Nov-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965/fs: add generator support for pixel interpolator query V5: - Split into separate opcodes - Pass message data in src1 immediate - Put noperspective bit in fs_inst rather than adding any junk to backend_instruction Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bbefb15e01e1c16af69646898918982ae00f8c92 |
|
08-Jul-2014 |
Kristian Høgsberg <krh@bitplanet.net> |
i965: Extend compute-to-mrf pass to understand blocks of MOVs The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders that end with a texture fetch followed by an fb write are left like this: 0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000010: send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q }; 0x00000020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000040: sendc(8) null g113<8,8,1>F render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT }; This patch lets compute-to-mrf recognize blocks of MOVs and match them to instructions (typically SEND) that write multiple registers. With this, the above shader becomes: 0x00000000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted }; 0x00000010: send(8) g113<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q }; 0x00000020: sendc(8) null g113<8,8,1>F render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT }; which is the bulk of the shader db results: total instructions in shared programs: 987040 -> 986720 (-0.03%) instructions in affected programs: 844 -> 524 (-37.91%) GAINED: 0 LOST: 0 The optimization also applies to MRT shaders that write the same color value to multiple RTs, in which case we can eliminate four MOVs in a similar fashion. See fbo-drawbuffers2-blend in piglit for an example. No measurable performance impact. No piglit regressions. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ce706b4a9bd53fbe274687025965333541a0e70d |
|
30-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Make a brw_predicate enum. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
46e5b2a497216133be656b38ebfcf96da64b7744 |
|
30-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Make a brw_conditional_mod enum. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3de11cacf0cb307ff3b4130746732d9db73d7583 |
|
30-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use enum brw_reg_type for register types. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
34ef6a7651d6651e0bca77c4d4b890af582ad360 |
|
30-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Move is_zero/one/null/accumulator into backend_reg. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
53992a102ffddf2e0fad401252cfc1c034d022ad |
|
30-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use immediate storage in brw_reg for visitor regs. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
489ec685542590c7412db81623952c1aa75d946f |
|
19-May-2014 |
Eric Anholt <eric@anholt.net> |
i965: Update a ton of comments about constant buffers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c0f1929dd23bbc558e9eef0f8fd40e10dfef3c21 |
|
19-May-2014 |
Eric Anholt <eric@anholt.net> |
i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data. I wanted to access this value from stage-generic code, so stop storing it under two different names. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3d826729dabab53896cdbb1f453c76fab1c7e696 |
|
29-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use unreachable() instead of unconditional assert(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
266109736a9a69c3fdbe49fe1665a7a63c5cc122 |
|
25-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use typed foreach_in_list_safe instead of foreach_list_safe. Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c5030ac0ac15d3c91c4352789f94281da9a9dcad |
|
25-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Use typed foreach_in_list instead of foreach_list. Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e8e5f0a342505a4d10cbcdee03592c96d286b57c |
|
24-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Use is_head_sentinel() instead of ->prev == NULL. Makes it more clear what we're doing and requires less knowledge of exec_list. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1bfc0a11027449ae7ab7c28eb695f26de530eccf |
|
29-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Mark predicated PLN instructions with dependency hints. To implement the unlit_centroid_workaround, previously we emitted (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q }; (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q }; where the flag register contains the channel enable bits from g0. Since the predicates are complementary, the pair of pln instructions write to non-overlapping components of the destination, which is the case that the dependency control hints are designed for. Typically setting dependency control hints on predicated instructions isn't safe (if an instruction doesn't execute due to the predicate, it won't update the scoreboard, leaving it in a bad state) but since we must have at least one channel executing (i.e., +f0 is true for some channel) by virtue of the fact that the thread is running, we can put the +f0 pln instruction last and set the hints: (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q }; (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q }; Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4fe53ee5d7c418d1ed51c5e8dfe5a2b1f48127a3 |
|
29-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Predicate PLN instructions used in unlit centroid WA. Maybe lets us skip some PLN instructions if whole subspans are disabled? Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e58992aedd9693f0356f3691d510a5e976473a0c |
|
29-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Pass const references to emit functions. Cuts 10k of .text and saves a bunch of useless struct copies.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e4b05af5d42b192ead493bc6ef9061ae57390058 |
|
28-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Pass const references to instruction functions. text data bss dec hex filename 4270747 123200 39648 4433595 43a6bb i965_dri.so 4244821 123200 39648 4407669 434175 i965_dri.so Cuts 25k of .text and saves a bunch of useless struct copies. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
46659d46a8c2f7bbc8deb472faff2dccbde92d29 |
|
24-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Make can_do_source_mods() a member of the instruction classes. Pretty nonsensical to have it as a method of the visitor just for access to brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
48f1143c64e46b3d11dc318d7825b6167a2b78e5 |
|
23-Jun-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't fix_math_operand() on Gen >= 8. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fab92fa1cba4196a4947731e7105bd1494dfffc4 |
|
18-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize SEL with the same sources into a MOV. instructions in affected programs: 474 -> 462 (-2.53%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
138905d728fd1f4b38ff6a7137a5bbcac1d0875a |
|
18-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Lower LOAD_PAYLOAD and clean up. Clean up with register_coalesce()/dead_code_eliminate().
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b996216384679e9bce5a62e417198da704c09c19 |
|
28-May-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add SHADER_OPCODE_LOAD_PAYLOAD. Will be used to simplify the handling of large virtual GRFs in SSA form. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
237aac39b1994b0fa1e8cd3490ad415b144a8b5f |
|
09-Jun-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Invalidate live intervals when inserting Gen4 SEND workarounds. We need to invalidate the live intervals when inserting new instructions. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ecc78eab119ac8fa3df380a80bc94975e986523c |
|
09-Jun-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Don't use the head sentinel as an fs_inst in Gen4 workaround code. When walking backwards, we want to stop at the head sentinel, which is where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL. Fixes random crashes, as well as valgrind errors. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6e61892aea542593875ebb8ae209af18bbad84bd |
|
05-Jun-2014 |
Iago Toral Quiroga <itoral@igalia.com> |
i965/fs: Let the gen < 8 generator know about runtime_check_aads_emit In gen < 6 we need to produce conditional code based on this flag when doing framebuffer writes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
84e0a5c406f2a8f060352eaa4b5c138e3f1a5a86 |
|
27-May-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add fs_inst constructor that takes a list of sources. Also add an emit() function that calls it. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
521f9b9a48da586ca3352cea7f8bf7c49741cf0d |
|
20-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add a function to resize fs_inst's sources array. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
07af0abef024f8a17a00975265eff79aa069c9b5 |
|
27-May-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Clean up fs_inst constructors. In a fashion suggested by Ken. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b1dcdcde2e323f960833f5c7da65d5c2c20113c9 |
|
17-Mar-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Loop from 0 to inst->sources, not 0 to 3. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
27e12a8ea933e2f978e0ce9286422e6025c7377d |
|
20-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Store the number of sources an fs_inst has. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1b60391ed48dc18b034fc3dc837919f4c8b7905c |
|
20-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: ralloc fs_inst's fs_reg sources. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6d3a15223aedaff26dd3aab900e02c8548956973 |
|
20-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add and use an fs_inst copy constructor. Will get more complicated when fs_reg src becomes a pointer. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
55bd8b8b660f983b486e699ca74fe5652297331d |
|
07-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Debug the optimization passes by dumping instr to file. With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a file each time an optimization pass makes progress. This lets you easily diff successive files to see what an optimization pass did. Example filenames written when running glxgears: fs8-0000-00-start fs8-0000-01-04-opt_copy_propagate fs8-0000-01-06-dead_code_eliminate fs8-0000-01-12-compute_to_mrf fs8-0000-02-06-dead_code_eliminate | | | | | | | `-- optimization pass name | | | | | `-- optimization pass number in the loop | | | `-- optimization loop iteration | `-- shader program number Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs, so that we can diff instruction lists across loop iterations without the register numbers being changed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
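For readers of the INTEL_DEBUG=optimizer entry above, here is a minimal sketch of how a dump filename of that shape could be assembled. The helper name optimizer_dump_name and its parameters are hypothetical, and reading the 8 in "fs8" as a dispatch width is an assumption; the commit message only documents the fields after that prefix.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from the driver: builds a name such as
// "fs8-0000-01-04-opt_copy_propagate" from its parts.
inline std::string optimizer_dump_name(int dispatch_width,   // assumed meaning of the "8" in "fs8"
                                       unsigned program,     // shader program number
                                       int iteration,        // optimization loop iteration
                                       int pass_num,         // pass number within the loop
                                       const char *pass_name) {
    assert(pass_name != nullptr);
    char buf[128];
    // Zero-padded fields keep successive dumps sorted in the order they were written.
    std::snprintf(buf, sizeof(buf), "fs%d-%04u-%02d-%02d-%s",
                  dispatch_width, program, iteration, pass_num, pass_name);
    return std::string(buf);
}
```

Because the numeric fields are zero-padded, a plain directory listing shows the dumps in pass order, which is what makes diffing successive files convenient.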
e9bf1662b048e5927f841e84719a3180650a2b0a |
|
29-May-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Give dump_instructions() a filename argument. This will allow debugging code to dump the IR after an optimization pass makes progress (the next patch). Only let it open and write to a file if the effective user isn't root. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
56d6dcf4f771d57d2759b2a5c5006f24444c696f |
|
29-May-2014 |
Matt Turner <mattst88@gmail.com> |
i965: Give dump_instruction() a FILE* argument. Use function overloading rather than default arguments, since gdb doesn't know about default arguments. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c938be8ad272a06bc0e91c4e718b61a0c5de400e |
|
17-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't use brw_imm_* unnecessarily. Using brw_imm_* creates a source with file=HW_REG, and the scheduler inserts barrier dependencies when it sees HW_REG. None of these are hardware-registers in the sense that they're special and scheduling shouldn't touch them. A few of the modified cases already have HW_REGs for other sources, so it won't allow extra flexibility in some cases. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cd1c1d302b60bdcc131d0feb048c9bc03896ee2f |
|
15-May-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Don't hardcode DEBUG_WM in generic fs code. Similar to Paul's commit e9fa3a944 except brw_fs_generator's debug_flag is for DEBUG_WM and DEBUG_BLORP. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1472584397f7b5ef70dfdffda0aab4a0a38a4db0 |
|
26-Jan-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Assume fragment color clamping is off when precompiling. Modern applications frequently use both UNORM buffers and FLOAT buffers with color clamping disabled. (FLOAT with clamping explicitly enabled and SNORM buffers appear to be less common.) We don't need to emit saturates in the fragment shader in either of the common cases. Mesa sets ctx->Color._ClampFragmentColor to false if all the color buffers are UNORM. Also, for GL_FIXED_ONLY mode (the default in legacy OpenGL), it will be false if any FLOAT buffers are bound. Since the common case is false, that should be our default. Thanks to Roland Scheidegger for pointing out some faulty logic in v1 of this patch (unnecessary code and incorrect explanations). v2: Drop superfluous code and reword commit message. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cca6dc9f0fd43db366730d67baae1affdca8c6de |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Rip struct brw_wm_compile out of the visitors and generators. Instead, just pass the key and prog_data as separate parameters. This moves it up a level - one step further toward getting rid of it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2d4ac9b5b825b745257e935dd9b33a2d3507c72a |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Plumb a mem_ctx all the way through the FS compile. 'c' is going away, but we still need a memory context that lives for the duration of the compile. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
65b2df3ec8906c51ae5b28df9c0b2c71981080d0 |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Replace c->key with a direct reference in fs_visitor. 'c' is going away. This is also shorter. Marking the key pointer as const will also deter people from changing it in fs_visitor, as it's absolutely not OK to modify it there. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8a04e0de8bbf4caf08c0759f2abaa94de64ee5fd |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Replace c->prog_data with a direct reference in fs_visitor. 'c' is going away. This is also a bit shorter. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
55f4e3a06b52c3e8b6bfad851e1d4e5243f1e2c0 |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move some flags that affect code generation to fs_visitor. runtime_check_aads_emit isn't actually used currently, but I believe we should be using it on Gen4-5, so I haven't eliminated it. See https://bugs.freedesktop.org/show_bug.cgi?id=78679 for details. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8ef78828fadb0f35b07be93492b3d7c297bb9ffd |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move payload register info from brw_wm_compile to fs_visitor. This data is created by fs_visitor and only used when emitting code, so keeping it in fs_visitor makes sense. I decided it would be reasonable to group these all together in a struct, since they're highly related. v2: s/nr_payload_regs/payload.num_regs/ in some comments (chrisf). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c76e6db05f9256711a226de8562124a5f14aae2d |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Simplify gl_SampleMaskIn handling. As far as I can tell, there's no point in allocating an extra register and generating a MOV---we can just use the copy provided as part of our thread payload directly. It's already in the right format. Of course, there are zero Piglit tests for this. We don't actually ship the extension (GL_ARB_gpu_shader5) that exposes this functionality either. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5cd7cf58e66ebb4e87a7fe6bba3b43f062ace47f |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Rename c->sample_mask_reg to sample_mask_in_reg. This is actually for gl_SampleMaskIn, which is quite different than gl_SampleMask. Renaming should help avoid confusion. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
db9c915abcc5ad78d2d11d0e732f04cc94631350 |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move c->last_scratch into fs_visitor. Nothing outside of fs_visitor uses it, so we may as well keep it internal. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7e28bd797dbe1721e5d97916f041493d1f30220d |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move total_scratch calculation into fs_visitor::run(). With this one use gone, c->last_scratch is now only used inside fs_visitor. The rest of the driver uses prog_data->total_scratch. We already compute similar prog_data fields in fs_visitor, so this seems reasonable. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 |
|
14-May-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move perf_debug about register spilling to a more obvious spot. The if (!allocated_without_spills) block is an obvious spot for this performance warning message. In the Vec4 backend, scratch is also used for indirect access of temporary arrays. The FS backend doesn't implement that yet, but if it did, this message would be inaccurate, since scratch access wouldn't necessarily mean spilling. Moving it preemptively fixes that. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
afe3d1556f6b77031f7025309511a0eea2a3e8df |
|
07-May-2014 |
Eric Anholt <eric@anholt.net> |
i965: Stop doing remapping of "special" regs. Now that we aren't using pixel_[xy] in live variables, nothing is looking at these regs after the visitor stage. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
da0c3b02e71c7552ba9324a01a73602094105fcc |
|
28-Mar-2014 |
Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> |
i965/fs: Add support for the MAC instruction. This allows us to generate the MAC (multiply-accumulate) instruction, which can be used to implement some expressions in fewer instructions than doing a series of MUL and ADDs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
306ed81b9363721058c568244f9860c5c8c819f4 |
|
04-Apr-2014 |
Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> |
i965: Add writes_accumulator flag. Our hardware has an "accumulator" register, which can be used to store intermediate results across multiple instructions. Many instructions can implicitly write a value to the accumulator in addition to their normal destination register. This is enabled by the "AccWrEn" flag. This patch introduces a new flag, inst->writes_accumulator, which allows us to express the AccWrEn notion in the IR. It also creates an ALU2_ACC macro to easily define emitters for instructions that implicitly write the accumulator. Previously, we only supported implicit accumulator writes from the ADDC, SUBB, and MACH instructions. We always enabled them on those instructions, and left them disabled for other instructions. To take advantage of the MAC (multiply-accumulate) instruction, we need to be able to set AccWrEn on other types of instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
30c35d1dcb2fde19b1c968751fda5151b795d257 |
|
09-Apr-2014 |
Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> |
i965: Add is_accumulator() function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
73400d8f70bab8549fb4cbcdc9ba905bf93b8716 |
|
14-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove dead_code_eliminate_local(). Subsumed by the new dead_code_eliminate() function. No shader-db changes. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f34f39330bb41fb0a86930908de10353193a841d |
|
13-Apr-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Reimplement dead_code_elimination(). total instructions in shared programs: 1653399 -> 1651790 (-0.10%) instructions in affected programs: 92157 -> 90548 (-1.75%) GAINED: 2 LOST: 2 Also significantly reduces the number of optimization loop iterations: total loop iterations in shared programs: 39724 -> 31651 (-20.32%) loop iterations in affected programs: 21617 -> 13544 (-37.35%) Including some great pathological cases, like 29 -> 3 in Strike Suit Zero and 24 -> 3 in Dota2. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6230b646a5a3f100f4d3dc05dff6c3ace85ee96c |
|
26-Mar-2014 |
Eric Anholt <eric@anholt.net> |
i965/fs: Track whether we're doing dual source in a more obvious way. I'm going to be turning dual_src_output into an array in a moment. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
14b85e3a47b19ffe9c96f67b43f780f8abc86061 |
|
01-Apr-2014 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a couple more global special regs to special[]. Nothing bad came of this because they weren't used after visitor running, but leaving them in a bad state seems like a recipe for pain later. Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4303d26f93919914fa58c0418a3935235d5ae359 |
|
26-Mar-2014 |
Eric Anholt <eric@anholt.net> |
i965/fs: Handle arrays of special regs more cleanly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
72b845e6409ad353af1abd420162917dadda5a7e |
|
26-Mar-2014 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix dump_instructions() on uniforms. All of a vec4 uniform was being printed as "u0". Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6bda3a526759f655cad62178b491264584119ae1 |
|
07-Apr-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix "SIMD16 unsupported" messages via KHR_debug. Performance warnings are logged via KHR_debug in addition to when the INTEL_DEBUG=perf environment variable is set. Without this, messages in debug contexts would have "(null)" for the reason. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0fbcdec2f6d6fd98db82c680d8bae8eee77ff9f2 |
|
08-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Split fs_visitor::register_coalesce() into its own file. The function has gotten large, and brw_fs.cpp is the largest source file in the driver. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8b1ab5c93bfcc22b7f50a5c10958e43d0571f8a0 |
|
27-Mar-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Mark appropriate fs_inst members as const. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
26012c16737a8542316062ef17fa9a0b34e274b7 |
|
26-Mar-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Recalculate live intervals in calculate_register_pressure(). Otherwise calling dump_instructions() after declaring a new fs_reg would segfault when calculate_register_pressure()'s loop over reg walked off the end of the virtual_grf_start[] array that calculate_live_intervals() would have reallocated for you, if it had known there was a new register.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0d99aef6c8a940e52afcbffa7091ff9c854ba120 |
|
24-Mar-2014 |
Eric Anholt <eric@anholt.net> |
i965: Fix compiler warning about signed/unsigned. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
de7ad2c88f4ec243c95eaed22c41d0e537912e01 |
|
07-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Accurately bail on SIMD16 compiles. Ideally, we'd like to never even attempt the SIMD16 compile if we could know ahead of time that it won't succeed---it's purely a waste of time. This is especially important for state-based recompiles, which happen at draw time. The fragment shader compiler has a number of checks like: if (dispatch_width == 16) fail("...some reason..."); This patch introduces a new no16() function which replaces the above pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag. Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and issue a helpful performance warning if INTEL_DEBUG=perf is set. (In SIMD16 mode, no16() calls fail(), for safety's sake.) The great part is that this is not a heuristic---if the flag is set, we know with 100% certainty that the SIMD16 compile would fail. (It might fail anyway if we run out of registers, but it's always worth trying.) v2: Fix missing va_end in early-return case (caught by Ilia Mirkin). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1] Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1] Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
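The no16() pattern described in the entry above can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the driver's code: CompileSketch, simd16_unsupported, and should_try_simd16 are hypothetical names, and the real function takes a printf-style format (the v2 note mentions va_end) rather than a std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified compile state; the real visitor carries far more.
struct CompileSketch {
    int dispatch_width;               // 8 or 16
    bool failed = false;              // this compile has failed
    bool simd16_unsupported = false;  // set by the SIMD8 compile: SIMD16 can never work
    std::string reason;               // message for the failure or the perf warning

    explicit CompileSketch(int width) : dispatch_width(width) {
        assert(width == 8 || width == 16);
    }

    void fail(const std::string &msg) {
        if (failed)
            return;                   // keep the first failure message
        failed = true;
        reason = msg;
    }

    // In SIMD16 mode this fails the compile outright. In SIMD8 mode it only
    // records that a later SIMD16 attempt would be pointless.
    void no16(const std::string &msg) {
        if (dispatch_width == 16) {
            fail(msg);
        } else {
            simd16_unsupported = true;
            reason = msg;
        }
    }
};

// Caller side: skip the SIMD16 compile when the SIMD8 compile flagged it.
inline bool should_try_simd16(const CompileSketch &simd8) {
    return !simd8.failed && !simd8.simd16_unsupported;
}
```

The point of the split is that the flag is exact rather than heuristic: it is only set at sites that would have called fail() in SIMD16 mode anyway, so skipping the second compile loses nothing.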
b207e88b25e526d0f1ada7b19605b880a27866dc |
|
08-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Support pull parameters in SIMD16 mode. This is just a matter of reusing the pull/push constant information set up by the SIMD8 compile. This gains us 78 SIMD16 programs in shader-db. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
229319e0f0f872cfb19de3eb0ab620ca611d65d8 |
|
12-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Use a single instance of the pull_constant_loc[] array. Now that we don't renumber uniform registers, assign_constant_locations and move_uniform_array_access_to_pull_constants use the same names. So, they can share a single copy of the pull_constant_loc[] array. This simplifies the code considerably. assign_constant_locations() doesn't need to walk through pull_params[] to rediscover reladdr demotions; it just has that information in pull_constant_loc[]. We also only need to rewrite the instruction stream once, instead of twice. Even better, we now have a single array describing the layout of all pull parameters, which we can pass to the SIMD16 program. This actually hurts a few shaders in Serious Sam 3, and one in KWin: total instructions in shared programs: 1841957 -> 1842035 (0.00%) instructions in affected programs: 1165 -> 1243 (6.70%) Comparing dump_instructions() before and after the pull constant transformations with and without this patch, it appears that there is a uniform array with variable indexing (reladdr) and constant indexing (of array element 0). Previously, we uploaded array element 0 as both a pull constant (for reladdr) /and/ a push constant. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
542f2e47f2f22522b963a7ab1f8b485d1c9985ba |
|
11-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Don't renumber UNIFORM registers. Previously, remove_dead_constants() would renumber the UNIFORM registers to be sequential starting from zero, and the resulting register number would be used directly as an index into the params[] array. This renumbering made it difficult to collect and save information about pull constant locations, since setup_pull_constants() and move_uniform_array_access_to_pull_constants() used different names. This patch generalizes setup_pull_constants() to decide whether each uniform register should be a pull constant, push constant, or neither (because it's unused). Then, it stores mappings from UNIFORM register numbers to params[] or pull_params[] indices in the push_constant_loc and pull_constant_loc arrays. (We already did this for pull constants.) Then, assign_curb_setup() just needs to consult the push_constant_loc array to get the real index into the params[] array. This effectively folds all the remove_dead_constants() functionality into assign_constant_locations(), while being less irritable to work with. v2: Add assert(remapped <= i), requested by Topi. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
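The location assignment described in the entry above (each UNIFORM register becomes a push constant, a pull constant, or neither) might look something like the sketch below. The struct, the function name assign_locations, the is_live and needs_pull inputs, and the max_push limit are all assumptions made for illustration; only the push_constant_loc / pull_constant_loc idea comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: maps each UNIFORM register number to an index
// in the pushed or pulled parameter list, or -1 if it lives in neither.
struct ConstantLocations {
    std::vector<int> push_loc;
    std::vector<int> pull_loc;
    int num_push = 0;
    int num_pull = 0;
};

// is_live[u]:    uniform u is read by some instruction.
// needs_pull[u]: uniform u was already chosen for demotion
//                (for example, a variably indexed array element).
// max_push:      assumed limit on the number of push constants.
inline ConstantLocations assign_locations(const std::vector<bool> &is_live,
                                          const std::vector<bool> &needs_pull,
                                          int max_push) {
    assert(is_live.size() == needs_pull.size());
    ConstantLocations out;
    out.push_loc.assign(is_live.size(), -1);
    out.pull_loc.assign(is_live.size(), -1);
    for (std::size_t u = 0; u < is_live.size(); ++u) {
        if (!is_live[u] && !needs_pull[u])
            continue;                          // unused: neither pushed nor pulled
        if (needs_pull[u] || out.num_push >= max_push)
            out.pull_loc[u] = out.num_pull++;  // demote to a pull constant
        else
            out.push_loc[u] = out.num_push++;  // keep as a push constant
    }
    return out;
}
```

Since the uniform numbers themselves never change, later passes can index both arrays by the same register number, and a remapped index is never larger than the register number it came from.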
d9f339eccd87413d9f6bf6dd6217db01630f12f8 |
|
10-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Split pull parameter decision making from mechanical demoting. move_uniform_array_access_to_pull_constants() and setup_pull_constants() both have two parts: 1. Decide which UNIFORM registers to demote to pull constants, and assign locations. 2. Mechanically rewrite the instruction stream to pull the uniform value into a temporary VGRF and use that, eliminating the UNIFORM file access. In order to support pull constants in SIMD16 mode, we will need to make decisions exactly once, but rewrite both instruction streams. Separating these two tasks will make this easier. This patch introduces a new helper, demote_pull_constants(), which takes care of rewriting the instruction stream, in both cases. For the moment, a single invocation of demote_pull_constants can't safely handle both reladdr and non-reladdr tasks, since the two callers still use different names for uniforms due to remove_dead_constants() remapping of things. So, we get an ugly boolean parameter saying which to do. This will go away. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2163e0fd5a6bf2ac95aef331c30f010cb6e39cab |
|
08-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Record pull constant locations for all array elements. When demoting a variably indexed uniform array to pull constants, we only recorded the location for the base of the array (element 0). Recording locations for all array elements is a trivial amount of code and will make subsequent refactoring easier. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7c7627781feca0c8738da66425d6c530ea598dc4 |
|
07-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Save push constant location information. Previously, both move_uniform_array_access_to_pull_constants() and setup_pull_constants() maintained stack-local arrays with this information. Storing this information will allow it to be used from multiple functions, allowing us to split and move code around. We'll also eventually want to pass pull constant location information to the SIMD16 compile. Saving this information will help us do that. Unfortunately, the two functions *cannot* share the contents of the array just yet. remove_dead_constants() renumbers all the UNIFORM registers to be contiguous starting at zero, so the two functions talk about uniforms using different names. We can't even remap them, since move_uniform_array_access_to_pull_constants() deletes UNIFORM registers that are only accessed with reladdr, so remove_dead_constants can't even see them. This situation will improve in the next few patches. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
de77efde91401919fe7282a4b07300a10185792b |
|
11-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Delete dead code to fail compiles with SIMD16 pull parameters. The SIMD8 compile will determine whether pull parameters are necessary. If so, it will set prog_data->nr_pull_params to a value greater than 0. brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile altogether. So, this code should never occur. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7554539d7ebbed5f5048ddeadaf5a5dc6e2ce2a6 |
|
11-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Invalidate live intervals when demoting uniforms to pull params. Normally, nothing uses live intervals at this point, so this isn't necessary. However, dump_instructions() calculates them and uses them to show register pressure. So, calling dump_instructions() in this area of the code would segfault due to the arrays being the wrong size. This is not a candidate for stable branches because it only serves to fix internal debugging code that you manually have to invoke by altering the source code or using gdb. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
13782dcf9d34a1bd276312cdecc44deb8f7caafd |
|
11-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Print "+reladdr" on variably-indexed uniform arrays. Previously, dump_instruction() would print output such as: { 2} 3: mov vgrf1:F, u0:F { 3} 4: mov vgrf7:F, u0:F { 4} 5: mov vgrf8:F, u0:F which looked like either a scalar access or perhaps a constant-indexed access of element 0, when it was really a variable index. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
01d9023a9b9a50b42f7a4ef4799d0e35e0b045ca |
|
11-Mar-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix register types in dump_instructions(), again. In commit e57d77280efcbfd6579a88f071426653287ef833, I fixed this for destinations in the Vec4 backend, and sources in the scalar backend. But not both types in both backends. To prevent this mess from continuing, make the reg_encoding table static, so only the disassembler can use it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a76e5dce4fc8d50f8699c108833f24e80167d706 |
|
23-Dec-2013 |
Eric Anholt <eric@anholt.net> |
i965: Move compiler debugging output to stderr. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f28c9208652143b4925bd97ce9823728c34d34a5 |
|
21-Feb-2014 |
Eric Anholt <eric@anholt.net> |
i965: Refactor debug dumping of GLSL IR. This was only going to get worse when tessellation shows up, and was causing too much extra duplication in my stderr changes coming up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7770b026937948e1be3ed55f9ff97e6521c500df |
|
22-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
Revert "i965/fs: Make fs_reg's type an enum for better debugging." This reverts commit 5ceadd29b0af835d741bcf09b9622c628e549ae6. I rebased and apparently failed to build test. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75355
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
760c6777a0530b4894dec564cdf218f5364b4df1 |
|
22-Feb-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Drop the emit(fs_inst) overload. Using this emit function implicitly creates three copies, which is pointlessly inefficient. 1. Code creates the original instruction. 2. Calling emit(fs_inst) copies it into the function. 3. It then allocates a new fs_inst and copies it into that. The second could be eliminated by changing the signature to fs_inst(const fs_inst &) but that wouldn't eliminate the third. Making callers heap allocate the instruction and call emit(fs_inst *) allows us to just use the original one, with no extra copies, and isn't much more of a burden. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
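The copy counting in the entry above can be made concrete with a small sketch. Inst and Emitter are hypothetical stand-ins for fs_inst and the visitor, and std::unique_ptr is used here only to keep the example self-contained; the driver allocates instructions with ralloc rather than new.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical instruction type that counts copy constructions.
struct Inst {
    static int copies;
    int opcode;
    explicit Inst(int op) : opcode(op) {}
    Inst(const Inst &other) : opcode(other.opcode) { ++copies; }
};
int Inst::copies = 0;

struct Emitter {
    std::vector<std::unique_ptr<Inst>> list;

    // By-value overload: the argument is copied into the function, then
    // copied a second time into a fresh heap allocation.
    Inst *emit_by_value(Inst inst) {
        list.push_back(std::unique_ptr<Inst>(new Inst(inst)));
        return list.back().get();
    }

    // Pointer overload: takes over the caller's heap allocation, no copies.
    Inst *emit(Inst *inst) {
        assert(inst != nullptr);
        list.push_back(std::unique_ptr<Inst>(inst));
        return list.back().get();
    }
};
```

Passing an existing Inst to emit_by_value() bumps Inst::copies by two (the parameter and the heap object), which together with the caller's original gives the three instances the commit message counts; emit(new Inst(...)) leaves the counter untouched.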
326fc60ee9457d17fb97a7f49c977743426b0859 |
|
20-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Pass fs_regs by constant reference where possible. These functions (modulo emit_lrp, necessitating the small fix-up) pass these arguments by value unmodified to other functions. No point in making an additional copy. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
070f20272fcfdcafe5d843d240e876ef5cfda560 |
|
20-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Move setting opcode = NOP to its one useful location. All other callers of init() immediately set opcode to something else. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5ceadd29b0af835d741bcf09b9622c628e549ae6 |
|
20-Feb-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Make fs_reg's type an enum for better debugging. Since the enum is marked as packed, it'll still take only one byte. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c2ebbe2728cd709029313f4b9c9cc53432c510a1 |
|
20-Feb-2014 |
Eric Anholt <eric@anholt.net> |
i965: Stop throwing away our double precision for time calculations. Fixes negative times being reported in our perf debug. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9e3cab8881626edd72d222f35c5d2a5fd9661bce |
|
15-Feb-2014 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add an optimization pass to remove redundant flags movs. We generate steaming piles of these for the centroid workaround, and this quickly cleans them up. total instructions in shared programs: 1591228 -> 1590047 (-0.07%) instructions in affected programs: 26111 -> 24930 (-4.52%) GAINED: 0 LOST: 0 (Improved apps are l4d2, csgo, and dolphin) Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eef710fc53113a5b3d6bbf7d9a20f63d7add7911 |
|
19-Feb-2014 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Use a separate variable to keep track of the last uniform index seen. Like the VEC4 back-end does. It will make dynamic allocation of the param_size array easier in a future commit. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6f56d5dc6047d0f926706e28fe1d809622c5b7e3 |
|
08-Dec-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove fs_reg::retype. There doesn't seem to be any reason for it to be a method, and it's surprising that the expression 'reg.retype(t)' doesn't retype its object but rather it creates a temporary with the new type. Use 'retype(reg, t)' instead. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ae8b066da5862b4cfc510b3a9a0e1273f9f6edd4 |
|
19-Feb-2014 |
Francisco Jerez <currojerez@riseup.net> |
i965: Move up duplicated fields from stage-specific prog_data to brw_stage_prog_data. There doesn't seem to be any reason for nr_params, nr_pull_params, param, and pull_param to be duplicated in the stage-specific subclasses of brw_stage_prog_data. Moving their definition to the common base class will allow some code sharing in a future commit, the removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and the simplification of the stage-specific brw_*_prog_data_compare. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
422679835479a053d5b5ac9cf75e2fbb7e827755 |
|
15-Feb-2014 |
Eric Anholt <eric@anholt.net> |
i965/fs: Drop dead comment about the old proj_attrib_mask optimization. The code was removed early last year. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a3a55067bdf608402aeb98d515c52e2436a8f226 |
|
15-Jan-2014 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove fs_reg::sechalf. The same effect can be achieved using ::subreg_offset. Remove the less flexible alternative and define a convenience function to keep the fs_reg interface sane. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
019bf6ed8dd4843512e9d4924f4702ce36047ad5 |
|
15-Jan-2014 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Remove fs_reg::smear. The same effect can be achieved using a combination of ::stride and ::subreg_offset. Remove the less flexible ::smear to keep the data members of fs_reg orthogonal. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
756d37b1d6d09ad7ee3b8835888a49d4256e427b |
|
08-Dec-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add support for specifying register horizontal strides. v2: Some improvements for copy propagation with non-contiguous register strides and mismatching types. v3: Add example of the situation that the copy propagation changes are intended to avoid. Clarify that 'fs_reg::apply_stride()' is expected to work with zero strides too. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4c7206bafdd7bde7617e14840812e43459682718 |
|
08-Dec-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Add support for sub-register byte offsets to the FS back-end IR. It would be nice if we could have a single 'reg_offset' field expressed in bytes that would serve the purpose of both, but the semantics of 'reg_offset' are quite complex currently (it's measured in units of one, eight or sixteen dwords depending on the register file and the dispatch width) and changing it to bytes would be a very intrusive change at this stage. Add a separate 'subreg_offset' field for now. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8a2508ee0726b349318c1e05122edbe5a545480a |
|
25-Nov-2013 |
Francisco Jerez <currojerez@riseup.net> |
glsl: Add image type to the GLSL IR. v2: Reuse the glsl_sampler_dim enum for images. Reuse the glsl_type::sampler_* fields instead of creating new ones specific to image types. Reuse the same constructor as for samplers adding a new 'base_type' argument. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d3e948340be3fe61d3724f1b96651c2097b4026e |
|
07-Feb-2014 |
Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> |
i965: Add missing null check in fs_visitor::dead_code_eliminate_local() Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e57d77280efcbfd6579a88f071426653287ef833 |
|
05-Feb-2014 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix register types in dump_instructions(). This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract type that doesn't match the hardware description. dump_instruction() was using reg_encoding[] from brw_disasm.c, which no longer matches (and was incorrect for Gen8+ anyway). This patch introduces a new function to convert the abstract enum values into the letter suffix we expect. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reported-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5eeb12c0bcd3d25fee9749d797f8541a96935192 |
|
25-Jan-2014 |
Chris Forbes <chrisf@ijw.co.nz> |
i965/fs: Assume FBO rendering in precompile if MRT. If multiple color outputs are written, this shader is unlikely to be useful with a winsys framebuffer. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
046f8d8a6fcae641ed0e7e06e24ab5da39a57c86 |
|
25-Jan-2014 |
Chris Forbes <chrisf@ijw.co.nz> |
i965/fs: Guess nr_color_regions better in precompile Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
947c828d5cbffe9640ac63103a6223112eeff27f |
|
12-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add a saturation propagation optimization pass. Transforms, for example,
    mul vgrf3, vgrf2, vgrf1
    mov.sat vgrf4, vgrf3
into
    mul.sat vgrf3, vgrf2, vgrf1
    mov vgrf4, vgrf3
which gives register_coalescing an opportunity to remove the MOV instruction.
total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
instructions in affected programs: 798586 -> 788181 (-1.30%)
GAINED: 0 LOST: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ce527a6722491fa7d696266d5dec13f0b72bf8e8 |
|
10-Dec-2013 |
Topi Pohjolainen <topi.pohjolainen@intel.com> |
i965: rename tex_ms to tex_cms Prepares for the introduction of non-compressed multi-sampled lookup used in the blorp programs. v2: now also taking into account gen8 Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f5cfb4ae21df8eebfc6b86c0ce858b1c0a9160dd |
|
15-Jan-2014 |
Anuj Phogat <anuj.phogat@gmail.com> |
i965: Ignore 'centroid' interpolation qualifier in case of persample shading This patch handles the use of the 'centroid' qualifier with 'in' variables in a fragment shader when per-sample shading is enabled. Per-sample shading for the whole fragment shader can be enabled by calling glEnable(GL_SAMPLE_SHADING) or by using the {gl_SamplePosition, gl_SampleID} builtin variables in the fragment shader. In more detail:
    /* Enable sample shading using OpenGL API */
    glEnable(GL_SAMPLE_SHADING);
    glMinSampleShading(1.0);
Example fragment shader:
    in vec4 a;
    centroid in vec4 b;
    main() { ... }
Variable 'a' will be interpolated at the sample location. But what interpolation should we use for variable 'b'? ARB_sample_shading recommends interpolation at sample position for all the variables. The GLSL 400 (and earlier) spec says: "When an interpolation qualifier is used, it overrides settings established through the OpenGL API." But this text was deleted in later versions of GLSL. NVIDIA's and AMD's proprietary Linux drivers (at OpenGL 4.3) interpolate at sample position. This convinces me to use a similar approach on Intel hardware. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a92e5f7cf63d496ad7830b5cea4bbab287c25b8e |
|
06-Jan-2014 |
Anuj Phogat <anuj.phogat@gmail.com> |
i965: Use sample barycentric coordinates with per sample shading Current implementation of arb_sample_shading doesn't set 'Barycentric Interpolation Mode' correctly. We use pixel barycentric coordinates for per sample shading. Instead we should select perspective sample or non-perspective sample barycentric coordinates. It also enables using sample barycentric coordinates in case of a fragment shader variable declared with 'sample' qualifier, e.g.
    sample in vec4 pos;
A piglit test to verify the implementation has been posted on piglit mailing list for review. V2: Do not interpolate all the 'in' variables at sample position if fragment shader uses 'sample' qualifier with one of them. For example we have a fragment shader:
    #version 330
    #extension ARB_gpu_shader5: require
    sample in vec4 a;
    in vec4 b;
    main() { ... }
Only 'a' should be sampled at sample location, not 'b'. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bf0773aeca86669371d99eadb928c6dc92d5840a |
|
10-Jan-2014 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize LRP with x == y into a MOV. total instructions in shared programs: 1487331 -> 1485988 (-0.09%) instructions in affected programs: 45638 -> 44295 (-2.94%) GAINED: 7 LOST: 0 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
413622fbefb63c54d331ce5d708479ab847e6709 |
|
15-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Print the maximum register pressure. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
391eaa59bd2b71078a28ff34dd3d4eed470653ee |
|
05-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Show register pressure in dump_instructions() output. Dumping the number of live registers at each IP allows us to see register pressure and identify any local maxima. This should aid in debugging passes designed to reduce register pressure, as well as optimizations that suddenly trigger spilling. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3b74f4b2333704bc7dbe5714e1f2aa4d201669ee |
|
05-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Compute the number of live registers at each IP. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0ea600ef1ada70bc2280909d86abe29dfd3e8f73 |
|
16-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Call opt_peephole_sel later in the optimization loop. Calling it after value numbering (added in the next commit) prevents some instruction count regressions. total instructions in shared programs: 1524387 -> 1523905 (-0.03%) instructions in affected programs: 13112 -> 12630 (-3.68%) GAINED: 0 LOST: 3 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ede6c341f686def647bf8ee4912e759b3d9933a6 |
|
16-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Calculate interference better in register_coalesce. Previously we simply considered two registers whose live ranges overlapped to interfere. Cases such as
    set A     ------
    ...             |
    mov B, A  --    |
    ...         | B | A
    use B     --    |
    ...             |
    use A     ------
would be considered to interfere, even though B is an unmodified copy of A whose live range fit wholly inside that of A. If no writes to A or B occur between the mov B, A and the use of B then we can safely coalesce them. Instead of removing MOV instructions, we make them NOPs and remove them at once after the main pass is finished in order to avoid recomputing live intervals (which are needed to perform the previous step).
total instructions in shared programs: 1543768 -> 1513077 (-1.99%)
instructions in affected programs: 951563 -> 920872 (-3.23%)
GAINED: 46 LOST: 22
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4a7d0c550e28ae3d434da81c9029272d22fa315e |
|
11-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Support coalescing registers of size > 1. total instructions in shared programs: 1550048 -> 1549880 (-0.01%) instructions in affected programs: 1896 -> 1728 (-8.86%) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9bb4d71fd2ff8ed24cb4d1485df1f1ff667bcb3c |
|
11-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add a comment explaining how register coalescing works. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
71bc11a37508542662132b16a53acd5f541cd2b4 |
|
05-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Print reg_offset for vgrf of size > 1 in dump_instruction(). Previously we wouldn't print the +0 for the first part of a VGRF of size greater than 1. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
11f6882e1daf73cead8bc9febe5e29ada98f4add |
|
07-Dec-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Create a new fragment shader backend for Broadwell. This replaces the old fs_generator backend. v2: Port to the C-based representation of assembly instructions. Fix texturing after the texture-grf merge. v3: Add high quality derivative support. Fix SET_SIMD4X2_OFFSET. v4: Pass brw_context to gen8_instruction functions as required. v5: Fixes for MRT, as well as zero render targets (alpha test only). v6: Replace n-wide with SIMDn in comments and messages; port over Topi's blorp-generator changes; add missing TXF_MCS opcode, fix missing high quality derivatives for DDX; fix typo (all caught by Eric). Simplify ADDC/SUBB handling; drop "Used only on Gen6+" comment (caught by Matt). Emit SIMD16 versions of three source instructions (caught by both Eric and Matt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
746e3e3b3ad20a29ee6de64d663d2dc11deac06e |
|
13-Nov-2013 |
Eric Anholt <eric@anholt.net> |
i965: Replace 8-wide and 16-wide with SIMD8 and SIMD16. Those are the terms used in the docs, and I think "n-wide" was something I just happened to say. Note that shader-db needs updating for the INTEL_DEBUG=fs parsing. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
26a3bf5c726199d7664d5878ef1f73592e55caa7 |
|
28-Nov-2013 |
Eric Anholt <eric@anholt.net> |
i965: Stop doing our optimization on a copy of the GLSL IR. The original intent was that we'd keep a driver-private copy, and there would be the normal copy for swrast to make use of without the tuning (or anything more invasive we might do) specific to i965. Only, we don't generate swrast code any more, because swrast can't render current shaders anyway. Thus, our private copy is rather a waste, and we can just do our backend-specific operations on the linked shader. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
544869377d6ec8c150d4d91d17a01f22cd84d479 |
|
08-Dec-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965/fs: add support for gl_SampleMaskIn[] v2: - add assert so we don't run into trouble on Gen6. - adjust for Tapani's rearrangement of ir_variable Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
447bb9029f7e03b03e507053b9f63536d8fc74ac |
|
12-Dec-2013 |
Tapani Pälli <tapani.palli@intel.com> |
glsl: move variables in to ir_variable::data, part II This patch moves the following bitfields and variables to the data structure: explicit_location, explicit_index, explicit_binding, has_initializer, is_unmatched_generic_inout, location_frac, from_named_ifc_block_nonarray, from_named_ifc_block_array, depth_layout, location, index, binding, max_array_access, atomic Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
33ee2c67c0a4e8f2fefbf37dacabd14918060af5 |
|
12-Dec-2013 |
Tapani Pälli <tapani.palli@intel.com> |
glsl: move variables in to ir_variable::data, part I This patch moves the following bitfields in to the data structure: used, assigned, how_declared, mode, interpolation, origin_upper_left, pixel_center_integer Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c1d3080ee86cd3d914712ffe0bb533c5d6a6b271 |
|
11-Dec-2013 |
Tapani Pälli <tapani.palli@intel.com> |
glsl: introduce data section to ir_variable Data section helps serialization and cloning of an ir_variable. This patch includes the helper bits used for read only ir_variables. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7629c489c88a6f6dd47b311a90ad64e216c9a37c |
|
29-Nov-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: Add shader opcode for sampling MCS surface Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d30b2ed5f83841531b4c5aa21bde50acad35560a |
|
23-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: New peephole optimization to flatten IF/BREAK/ENDIF. total instructions in shared programs: 1550713 -> 1550449 (-0.02%) instructions in affected programs: 7931 -> 7667 (-3.33%) Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
13de9f03f177d3ae0921fded1a102b66130f8b40 |
|
23-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: New peephole optimization to generate SEL. fs_visitor::try_replace_with_sel optimizes only if statements whose "then" and "else" bodies contain a single MOV instruction. It also could not handle constant arguments, since they cause an extra MOV immediate to be generated (since we haven't run constant propagation, there are more than the single MOV). This peephole fixes both of these and operates as a normal optimization pass. fs_visitor::try_replace_with_sel is still arguably necessary, since it runs before pull constant loads are lowered. total instructions in shared programs: 1559129 -> 1545833 (-0.85%) instructions in affected programs: 167120 -> 153824 (-7.96%) GAINED: 13 LOST: 6 Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fa227e7cbca279cd70ea7028a33d520579385f9f |
|
23-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add SEL() convenience function. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8814806c97ed60c5bb4d6cb1927cd05445864388 |
|
21-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Print conditional mod in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
637dda1c307aee921ecc646b75f891deab6585a9 |
|
02-Dec-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Print argument types in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
942151af300e067f72572cd8785fa3526132570c |
|
26-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Print ARF registers properly in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0e4053234df5e3461e80c90dfd743c3ac96006eb |
|
26-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Don't print extra (null) arguments in dump_instruction(). Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
04d83396eef7a8c8603f55bc0a0b04c80a9f6cf5 |
|
30-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Rename register_coalesce_2() -> register_coalesce(). Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9a6b14f6745206eb018c8474feafae4bafdcb8e5 |
|
30-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove now useless register_coalesce() pass. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1520ae48b880d9bee287583d15ac40c89d0ced8b |
|
29-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Let register_coalesce_2() eliminate self-moves. This is the last thing that register_coalesce() still handled. total instructions in shared programs: 1561060 -> 1560908 (-0.01%) instructions in affected programs: 15758 -> 15606 (-0.96%) Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c4815f6cd6f659acd361f1b4cf63473a46ca7de9 |
|
26-Nov-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Always reserve binding table space for at least one render target. In brw_update_renderbuffer_surfaces(), if there are no color draw buffers, we always set up a null render target at surface index 0 so we have something to use with the FB write marking the end of thread. However, when we recently began computing surface indexes dynamically, we failed to reserve space for it. This meant that the first texture would be assigned surface index 0, and our closing FB write would clobber the texture. Fixes Piglit's EXT_packed_depth_stencil/fbo-blit-d24s8 test on Gen4-5, which regressed as of commit 4e5306453da6a1c076309e543ec92d999e02f67a ("i965/fs: Dynamically set up the WM binding table offsets.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70605 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Tested-by: lu hua <huax.lu@intel.com> Cc: "10.0" mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4f64dabb5fc0361a86146ce095c11131f14dfc49 |
|
23-Nov-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/fs: Fix misleading comment. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
46cf80fb366cb14827724a7fea004e81400cc602 |
|
19-Nov-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Make the first pre-allocation heuristic be the post heuristic. I recently made us try two different things that tried to reduce register pressure so that we would be more likely to allocate successfully. But now that we have the logic for trying two, we can make the first thing we try be the normal, not-prioritizing-register-pressure heuristic. This means one less scheduling pass in the common case of that heuristic not producing spills, plus the best schedule we know how to produce, if that one happens to succeed. This is important, because our register allocation produces a lot of possibly avoidable dependencies for the post-register-allocation schedule, despite ra_set_allocate_round_robin(). GLB2.7: 1.04127% +/- 0.732461% fps improvement (n=31) nexuiz: No difference (n=5) lightsmark: 0.838512% +/- 0.300147% fps improvement (n=86) minecraft apitrace: No difference (n=15) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a97cd0f4d7902965d5173f4bcbf2ad27c0eb5d12 |
|
30-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Add a pass to remove dead control flow. Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions. total instructions in shared programs: 1360393 -> 1360387 (-0.00%) instructions in affected programs: 157 -> 151 (-3.82%) (no change in vertex shaders) Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9793fc1335f11b4131d6db680bec567dcfccfb5f |
|
15-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Use source's original type in register_coalesce(). Previously, register_coalesce() would modify
    mov vgrf1:f vgrf2:f
    cmp null vgrf3:d vgrf1:d
to be
    cmp null vgrf3:d vgrf2:f
and incorrectly use vgrf2's type in the instruction that the mov was coalesced into. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ec8cc65926de3e7391f3bcec8ee26fc8f4d36159 |
|
02-Jan-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Remove force_sechalf stack Only Gen4 color write setup uses the force_sechalf flag, and it only sets it on a single instruction. It also already has to get a pointer to the instruction and manually set the saturate flag, so we may as well just set force_sechalf the same way and avoid the complexity of a stack. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e133c0103d4336c47911e89cc8a17a1c78bfdbb8 |
|
14-Nov-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Assert that IF with cmod is Gen6 only. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e9daead784921e453906853a4a78a2f3135af2e0 |
|
07-Nov-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Try a different pre-scheduling heuristic if the first spills. Since LIFO fails on some shaders in one particular way, and non-LIFO systematically fails in another way on different kinds of shaders, try them both, and pick whichever one successfully register allocates first. Slightly prefer non-LIFO in case we produce extra dependencies in register allocation, since it should start out with fewer stalls than LIFO. This is madness, but I haven't come up with another way to get unigine tropics to not spill while keeping other programs from spilling and retaining the non-unigine performance wins from texture-grf. total instructions in shared programs: 1626728 -> 1626288 (-0.03%) instructions in affected programs: 1015 -> 575 (-43.35%) GAINED: 50 LOST: 0 Improves Unigine Tropics performance by 14.5257% +/- 0.241838% (n=38) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445 Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fbd8303a943d0d491b7c2415eb237a0731c7dec5 |
|
07-Nov-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Do instruction pre-scheduling just before register allocation. Long ago, the HW_REG usages in assign_curb/urb_setup() were scheduling barriers, so we had to run the scheduler before them in order for it to be able to do basically anything. Now that that's fixed, we can delay the scheduling until we go to allocate (which will make the next change less scary). Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f7e15fcf56595aac99644292386a6e6d06dc6ec0 |
|
26-Oct-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965/fs: Gen4-5: Implement alpha test in shader for MRT V2: Add comment explaining what emit_alpha_test() is for; fix spurious temp and bogus whitespace. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ca82ba90dd7ef78be2b95972dc19913c76d5e6a8 |
|
27-Oct-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965/fs: Gen4-5: Setup discard masks for MRT alpha test The same setup is required here as when the user-provided shader explicitly uses KIL or discard. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
34fe051e215107dddbaae71e2edf15f88d839936 |
|
20-Oct-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965: Add a 'has_side_effects' back-end instruction predicate. This patch fixes the three dead code elimination passes and the VEC4/FS instruction scheduling passes so they leave instructions with side effects alone. At some point it might be interesting to have the instruction scheduler calculate the exact memory dependencies between atomic ops, but they're rare enough that it seems unlikely that it will make any practical difference. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e12bbb503f71b60b9f212e82fdd3ed9aaf3ab318 |
|
25-Oct-2013 |
Anuj Phogat <anuj.phogat@gmail.com> |
i965: Add FS backend for builtin gl_SampleID V2: - Update comments - Add compute_sample_id variables in brw_wm_prog_key - Add a special backend instruction to compute sample_id. V3: - Make changes to support simd16 mode. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
65d0452bbc14c69ecd2cffdb38f711cfbaab348e |
|
25-Oct-2013 |
Anuj Phogat <anuj.phogat@gmail.com> |
i965: Add FS backend for builtin gl_SamplePosition V2: - Update comments. - Add compute_pos_offset variable in brw_wm_prog_key. - Add variable uses_pos_offset in brw_wm_prog_data. V3: - Make changes to support simd16 mode. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3c28b2c09f491bfa55dc9e5d7858a8b900c25432 |
|
28-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize saturating SEL.G(E) with imm val <= 0.0f. Only one program's instruction count is changed, but a shader in Tropics is also affected. instructions in affected programs: 326 -> 320 (-1.84%) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ca675b73d3ac2e1b57ec385c2c80b05b6382f6b6 |
|
28-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0. total instructions in shared programs: 1409124 -> 1406971 (-0.15%) instructions in affected programs: 158376 -> 156223 (-1.36%) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a8f76d829bdcdb5f238ba6206f1b768098745022 |
|
28-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Optimize OR with identical sources into a MOV. Helps a lot of Steam games. total instructions in shared programs: 1409360 -> 1409124 (-0.02%) instructions in affected programs: 20842 -> 20606 (-1.13%) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
20d0297ff2d507aab42e59ebfde375d5205642cb |
|
20-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add reads_flag() and writes_flag() to fs_inst. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f768f998e0e5885c36af1efee6ca70fdf90deb96 |
|
22-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add is_null() method to fs_reg. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6032261682388ced64bd33328a5025f561927a38 |
|
16-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965: Merge together opcodes for SHADER_OPCODE_GEN4_SCRATCH_READ/WRITE I'm going to be introducing gen7 variants, and the previous naming was going to get confusing. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
44ec2f1751ec4a9f0ba9035f2343ffe5e16e693c |
|
16-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix broken register spilling debug code. Now that reg spilling generates new vgrfs, we were looping forever if you ever turned it on. Instead, move the debug code into the register allocator right near where we'd be doing spilling anyway, which should more accurately reflect how register spilling occurs in the wild. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
774b787d6b7abe601309cf437b09b592fea0394d |
|
29-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Drop our dead push constants before overflowing to pull constants. The idea of the original order was that you'd dead code eliminate accesses to push constants. But I've never seen a case of that (nor has shader-db), while we frequently see sparse accesses of large constant arrays that would overflow into pull constants. Cuts pull constant use on csgo, serious sam, planeshift, and the cave: total instructions in shared programs: 1695103 -> 1688795 (-0.37%) instructions in affected programs: 92024 -> 85716 (-6.85%) GAINED: 339 LOST: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5e621cb9fef7eada5a3c131d27f5b0b142658758 |
|
11-Sep-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/gen7: Implement code generation for untyped surface read instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cfaaa9bbb7a6ab5819f4fa9e38352b72d6293cff |
|
11-Sep-2013 |
Francisco Jerez <currojerez@riseup.net> |
i965/gen7: Implement code generation for untyped atomic instructions. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
26db3b933f7fbc81d6c2bead2a8b0479a3691424 |
|
20-Oct-2013 |
Francisco Jerez <currojerez@riseup.net> |
glsl: Add new atomic_uint built-in GLSL type. v2: Fix GLSL version in which the type became available. Add contains_atomic() convenience method. Split off atomic counter comparison error checking to a separate patch that will handle all opaque types. Include new ir_variable fields for atomic types. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6bb2cf2107c4461ea9dd100edaf110b839311b90 |
|
08-Oct-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets. The generator code ends up clearer this way than if we had to sniff via the message length. Implemented via the gather4_po message in hardware, which is present in Gen7 and later. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
43b05b8fac68784bc8d61851125bd49783e5ebd0 |
|
20-Oct-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Only emit interpolation setup if there are actual FS inputs. Dead code elimination would get rid of the extra instructions, but skipping this saves iterations through the optimization loop. From shader-db: N Min Max Median Avg Stddev x 14672 3 16 3 3.1334515 0.59904168 + 14672 1 16 3 2.8955153 0.77732963 Difference at 95.0% confidence -0.237936 +/- 0.0158798 -7.59342% +/- 0.506783% (Student's t, pooled s = 0.693935) Embarrassingly, the classic shadow mapping shader: void main() { } used to require three iterations through the optimization loop. With this patch, it only requires one (which makes no progress). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
197f3a33fbce525e8f7799466935304d9e24c0f1 |
|
09-Oct-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Handle printing HW_REGS in dump_instruction(). Scheduling debugging now prints: Instructions before scheduling (reg_alloc 1) 0: linterp vgrf20, hw_reg2, hw_reg3, hw_reg4, 1: linterp vgrf21, hw_reg2, hw_reg3, hw_reg4+16, Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
caf9cef7eee77f736ff76a65f385bf718efd1dc1 |
|
01-Sep-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Remove bogus field prog_data->dispatch_width. Despite the name, this field wasn't being set to the dispatch width at all; it was always 8. The only place it was used was that the constant buffer read length was aligned to it, and as far as I can tell from the docs, there is no need to align this value to the dispatch width; aligning it to a multiple of 8 is sufficient. So I've just replaced it with a hardcoded 8. v2: In gen6_wm_state, use brw->wm.base.push_const_size for consistency with VS and GS state upload. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
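Rounding a length up to a multiple of 8, as this commit describes for the constant buffer read length, can be sketched as below. The helper name `align_up` is an assumption made for illustration; it is not claimed to be the macro the driver uses.

```cpp
#include <cassert>

// Round `value` up to the next multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two (8 in the commit above).
static unsigned align_up(unsigned value, unsigned alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, `align_up(9, 8)` yields 16, while a value that is already a multiple of 8 is returned unchanged.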
|
705a90e30435490c2de84f4f6741cab335fa7608 |
|
03-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965: Move the common binding table offset code to brw_shader.cpp. Now that both vec4 and fs are dynamically assigning offsets, a lot of the code is the same. v2: Avoid passing around the next offset through the class. (Review by Paul) Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4e5306453da6a1c076309e543ec92d999e02f67a |
|
03-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Dynamically set up the WM binding table offsets. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 |
|
02-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965: Make a brw_stage_prog_data for storing the SURF_INDEX information. It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go. v2: Rename size to size_bytes to be more explicit. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a5ec01fb1bd4ad5418eb16cb05e6f6929d1444e8 |
|
20-Sep-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Don't copy prop source mods into instructions that can't take them.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
36fbe66d3a71df76fcb6f915846da4471b3a8442 |
|
10-Oct-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Convert gen7 to using GRFs for texture messages. Looking at Lightsmark's shaders, the way we used MRFs (or in gen7's case, GRFs) was bad in a couple of ways. One was that it prevented compute-to-MRF for the common case of a texcoord that gets used exactly once, but where the texcoord setup all gets emitted before the texture calls (such as when it's a bare fragment shader input, which gets interpolated before processing main()). Another was that it introduced a bunch of dependencies that constrained scheduling, and forced waits for texture operations to be done before they are required. For example, we can now move the compute-to-MRF interpolation for the second texture send down after the first send. The downside is that this generally prevents remove_duplicate_mrf_writes() from doing anything, whereas previously it avoided work for the case of sampling from the same texcoord twice. However, I suspect that most of the win that originally justified that code was in avoiding the WAR stall on the first send, which this patch also avoids, rather than the small cost of the extra instruction. We see instruction count regressions in shaders in unigine, yofrankie, savage2, hon, and gstreamer. Improves GLB2.7 performance by 0.633628% +/- 0.491809% (n=121/125, avg of ~66fps, outliers below 61 dropped). Improves openarena performance by 1.01092% +/- 0.66897% (n=425). No significant difference on Lightsmark (n=44). v2: Squash in the fix for register unspilling for send-from-GRF, fixing a segfault in lightsmark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b6af650a095034eaa2de93bf6cf2985d7fdfce89 |
|
30-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Use per-channel interference for register_coalesce_2(). This will let us coalesce into texture-from-GRF arguments, which would otherwise be prevented due to the live interval for the whole vgrf extending across all the MOVs setting up the channels of the message. v2 (Kenneth Graunke): Rebase for renames. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3093085847db0455a88e45f20e29660b2b7f8515 |
|
05-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Use the new per-channel live ranges for dead code elimination. v2 (Kenneth Graunke): Rebase on s/live_variables/live_intervals/g. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3ea84beb1687f20074efdb1bcc790370bed2fc65 |
|
07-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Invalidate live intervals when compacting; don't fix them. When compacting the list of VGRFs, we patch up the live interval ranges (which are indexed by VGRF number). Unfortunately, once we make per-component data available, this will become too complicated to maintain. Instead, simply invalidate them. This was pulled out of a patch by Eric Anholt. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4b821a97b5fcdc4c530d5455c43196be09830322 |
|
06-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Create a helper function for invalidating live intervals. For now, this simply sets live_intervals_valid = false, but in the future it will do something more sophisticated. Based on a patch by Eric Anholt. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a26e17a36503c5387447cd560c81dbea6f2d89f9 |
|
26-Sep-2013 |
Chia-I Wu <olv@lunarg.com> |
i965: keep SecHalf flag after register coalescing Copy sechalf to the new register, otherwise we would read wrong HW registers. Signed-off-by: Chia-I Wu <olv@lunarg.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b645913ff6c74228d8c05dd236a545ef2e734071 |
|
28-Sep-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Remove the "ARF" register file. The registers in the architecture register file don't share much in common, so there's no point in grouping them together. Use the HW_REG class instead. The vec4 backend already does this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e7dc88026a821a31bf2afeb934dded11c91401a1 |
|
20-Sep-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Fixup for don't dead-code eliminate instructions that write to the accumulator. Accidentally pushed an old version of the patch. v2: Set destination register using brw_null_reg(). Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
92dc16c3e2e2b9e3e71baaccc67bbe727e9d68ab |
|
20-Sep-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Don't dead-code eliminate instructions that write to the accumulator. Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
014cce3dc49f5b0bfd7fbb1940ed661c9fc7bbd7 |
|
19-Sep-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Generate code for ir_binop_carry and ir_binop_borrow. Using the ADDC and SUBB instructions on Gen7. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
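For reference, the values that a carry and a borrow operation are expected to produce for 32-bit unsigned operands can be modelled on the CPU as below. This is only a sketch of the arithmetic; the function names are hypothetical and it says nothing about how ADDC and SUBB deliver their results in hardware.

```cpp
#include <cassert>
#include <cstdint>

// 1 if a + b overflows 32 bits, else 0.
static uint32_t carry_u32(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;  // unsigned arithmetic wraps modulo 2^32
    return sum < a ? 1u : 0u;
}

// 1 if a - b would go below zero, else 0.
static uint32_t borrow_u32(uint32_t a, uint32_t b) {
    return a < b ? 1u : 0u;
}
```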
|
fb455500bfb11cca0f45076a9eaccc0ddd764731 |
|
31-Mar-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: add SHADER_OPCODE_TG4 Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and low-level support for emitting it via generate_tex(). V3: Updated for changes in master. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
848c0e72f36d0e1e460193a2d30b2f631529156f |
|
12-Sep-2013 |
Chia-I Wu <olv@lunarg.com> |
i965: compute DDX in a subspan based only on top row Consider only the top-left and top-right pixels to approximate DDX in a 2x2 subspan, unless the application requests a more accurate approximation via GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the new driconf option disable_derivative_optimization. This results in a less accurate approximation. However, it improves the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0% confidence) on Haswell. No noticeable image quality difference observed. The improvement comes from faster sample_d. It seems, on Haswell, some optimizations are introduced to allow faster sample_d when all pixels in a subspan have the same derivative. I considered SAMPLE_STATE too, which allows one to control the quality of sample_d on Haswell. But it gave much worse image quality without giving better performance comparing to this change. No piglit quick.tests regression on Haswell (tested with v1). v2: better guess for precompile program key Signed-off-by: Chia-I Wu <olv@lunarg.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
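The difference between the two DDX approximations can be sketched on a single 2x2 subspan. The layout (top-left, top-right, bottom-left, bottom-right) and the function names are assumptions made for this illustration only; the per-row variant stands in for the more accurate approximation.

```cpp
#include <array>
#include <cassert>

// One value per pixel of a 2x2 subspan: top-left, top-right, bottom-left, bottom-right.
using Subspan = std::array<float, 4>;

// More accurate variant: each row uses its own left/right difference.
static Subspan ddx_per_row(const Subspan& v) {
    float top = v[1] - v[0];
    float bottom = v[3] - v[2];
    return Subspan{{top, top, bottom, bottom}};
}

// Cheaper variant from the commit: every pixel uses the top-row difference.
static Subspan ddx_top_row(const Subspan& v) {
    float top = v[1] - v[0];
    return Subspan{{top, top, top, top}};
}
```

In the second variant all four pixels of the subspan share one derivative, which is the condition the commit message suspects Haswell's sample_d can exploit.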
|
0e72db9f9729b8fe62213452751fed1cd337a7bc |
|
11-Sep-2013 |
Francisco Jerez <currojerez@riseup.net> |
mesa: Fix misplaced includes of "main/uniforms.h". Several C++ source files include "main/uniforms.h" from an extern "C" block, which is both unnecessary, because "uniforms.h" already checks for a C++ compiler and sets the right linkage, and incorrect, because the header file includes other C++ headers ("glsl_types.h" and "ir_uniform.h") that are supposed to get C++ linkage. Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
875972029eddfd53cb90a8e34e9f27b2afed119f |
|
03-Sep-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: When >64 input components, order them to match prev pipeline stage. Since the SF/SBE stage is only capable of performing arbitrary reorderings of 16 varying slots, we can't arrange the fragment shader inputs in an arbitrary order if there are more than 16 input varying slots in use. We need to make sure that slots 16-31 match the corresponding outputs of the previous pipeline stage. The easiest way to accomplish this is to just make all varying slots match up with the previous pipeline stage. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a4546ec114853235db375b20fb47ddcd6a7f21e7 |
|
03-Sep-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Simplify computation of key.input_slots_valid during precompile. The for loop was rather silly. In addition to checking brw->gen < 6 on each loop iteration, it took pains to exclude bits from fp->Base.InputsRead that don't correspond to fragment shader inputs. But those bits would never have been set in the first place, since the only bits that are ever set in fp->Base.InputsRead are fragment shader inputs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3a83b20dcccf21ec184e35bcfa9bc577379dfd51 |
|
03-Sep-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing. Previously, if a fragment shader accessed gl_FragCoord or gl_FrontFacing, we would assign them their own slots in the fragment shader input attribute array, using up space that could be made available to real varyings. This was not strictly necessary (since these values are not true varyings, and are instead computed from other data available in the FS payload). But we had to do it anyway because the SF/SBE setup code assumed that every 1 bit in the gl_program::InputsRead bitfield corresponded to a genuine varying variable. Now that the SF/SBE code consults brw_wm_prog_data and only sets up the attributes that the fragment shader actually needs, we don't have to do this anymore. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8c69eaba1a8a5e8a82112eb5c51b2f8978dd2c23 |
|
03-Sep-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs. On gen4-5, the FS stage reads varying inputs from URB entries that were output by the SF thread, where each register stores the interpolation setup for two components of a vec4, therefore the FS urb_read_length is twice the number of FS input varyings. On gen6+, varying inputs are directly deposited in the FS payload by the SF/SBE fixed function logic, so urb_read_length is irrelevant. However, in future patches, it will be nice to be able to consult brw_wm_prog_data to determine how many varying inputs the FS expects (rather than inferring it from gl_program::InputsRead). So instead of storing urb_read_length, we simply store num_varying_inputs in brw_wm_prog_data. On gen4-5, we multiply this by 2 to recover the URB read length. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
58f01bd17d5587c21d7f543b8f3769f3405dc420 |
|
03-Sep-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Expose "urb_setup" as part of brw_wm_prog_data. At the moment, for Gen6+, the FS assumes that all varying inputs are delivered to it in the order in which they appear in the gl_program::InputsRead bitfield, and the SF/SBE setup code ensures that they are delivered in this order. When we add support for more than 64 varying components, this will no longer always be possible, because the Gen6+ SF/SBE stage is only capable of performing arbitrary reorderings of 16 varying slots. To allow extra flexibility in the ordering of FS varyings, this patch causes the FS to advertise exactly what ordering it expects. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4b3c0a797f89830fd5ba0943b061abf4fc38337e |
|
02-Sep-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Use brw_stage_state for WM data as well. This gets the VS, GS, and PS all using the same data structure. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a35b32025011eeac01f2e5a476dbf3ac132a61b3 |
|
28-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Detect GRF sources in split_virtual_grfs send-from-GRF code. It is incorrect to assume that src[0] of a SEND-from-GRF opcode is the GRF. For example, FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD uses src[1] for the GRF. To be safe, loop over all the source registers and mark any GRFs. We probably won't ever have more than one, but it's simpler to just check all three rather than attempting to bail early. Not observed to fix anything yet, but likely to. Parallels the bug fix in the previous commit, which actually does fix known failures. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
530842127eabd41a809ee4d7136ff52857a4e685 |
|
24-Apr-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add support for translating ir_triop_fma into MAD. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a4ff1fd388369dbf80d324c84502b28b5f9d3da4 |
|
15-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Shorten sampler loops in precompile key setup. Now that we have the number of samplers available, we don't need to iterate over all 16. This should be particularly helpful for vertex shaders. v2: Use the correct shader program (caught by Paul Berry). This needs to initialize the exact same set of sampler swizzles as the actual key setup, or else we end up doing recompiles due to some being XYZW and others being 0. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9c48ae751ab28f35eb878551d24c071be0ce11b0 |
|
09-Aug-2013 |
Matt Turner <mattst88@gmail.com> |
i965: Don't copy propagate bitcasts with source modifiers. Previously, copy propagation would cause bitcast_f2u(abs(float)) to be performed in a single step, but the application of source modifiers (abs, neg) happens after type conversion, leading to incorrect results. That is, for bitcast_f2u(abs(float)) we would in fact generate code to do abs(bitcast_f2u(float)). For example, whereas bitcast_f2u(abs(float)) might result in a register argument such as (abs)g2.2<0,1,0>UD v2: Set interfered = true and break in register_coalesce instead of returning false. Reviewed-by: Paul Berry <stereoytpe441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
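Why the order of the two operations matters can be shown with a small CPU-side model. It assumes, for illustration only, that applying (abs) to an integer-typed source behaves like a two's-complement absolute value; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as an unsigned 32-bit integer.
static uint32_t bitcast_f2u(float f) {
    static_assert(sizeof(uint32_t) == sizeof(float), "float must be 32 bits");
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Intended result: absolute value taken on the float, then the bitcast.
static uint32_t abs_then_bitcast(float f) {
    return bitcast_f2u(std::fabs(f));
}

// Result after the bad propagation: bitcast first, then abs on the integer
// (modelled as two's-complement negation when the sign bit is set).
static uint32_t bitcast_then_abs(float f) {
    uint32_t u = bitcast_f2u(f);
    return (u & 0x80000000u) ? (0u - u) : u;
}
```

For -1.0f the first function gives 0x3F800000 (the bits of 1.0f), while the second gives 0x40800000 under this model. For non-negative inputs the two agree, which is why the bug only shows up with negative values.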
|
4d95efd14617d4a96a89d8e52d0cf684a5d6c4b1 |
|
05-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Add dump_instruction() support for ARF destinations. CMP instructions use BRW_ARF_NULL as a destination. Prior to this patch, dump_instruction() decoded the destination as "???". Now it decodes BRW_ARF_NULL as "(null)" and other ARFs numerically. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ee7bfab06805bff508c31b3ad3fb13d181f3fbf1 |
|
05-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Remove extraneous newline in dump_instruction() for CMP. This resulted in printouts like: 246: cmp.cmod.f0.0 ???, vgrf152, 0.000000f, (null), With this patch, CMP is properly printed on one line. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2c32c3985ca6232a81d21feb9ac6443145b42d0e |
|
06-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Consider predicated SEL instructions as whole variable writes. The instruction (+f0.0) SEL dst, src0, src1 will write either src0 or src1 to dst, depending on the predicate. Unlike most predicated instructions, it always writes to dst. fs_inst::is_partial_write() is supposed to return false if the whole register is guaranteed to be written. The !inst->predicated check makes sense for most instructions, which might not write the whole register, but SEL is a special case. This caused live interval analysis to ignore the destination of predicated SEL instructions when computing "def" information. Requires the previous commit to avoid regressions. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
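The special case for SEL can be sketched with a reduced instruction model. The `Opcode` and `Inst` types here are hypothetical stand-ins, and only the predicate-related part of the check is modelled; any other conditions the real method considers are left out.

```cpp
#include <cassert>

enum class Opcode { MOV, ADD, SEL };

// Reduced stand-in for an instruction: just the opcode and predication flag.
struct Inst {
    Opcode opcode;
    bool predicated;
};

// A predicated instruction may leave some channels of the destination
// untouched, so it is a partial write. SEL is the exception: the predicate
// only chooses which source is copied, and the destination is always written.
static bool is_partial_write(const Inst& inst) {
    return inst.predicated && inst.opcode != Opcode::SEL;
}
```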
|
53d8cff63b30326eaaafe3019d00354d4775a622 |
|
04-Aug-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Log a performance warning if skipping 16-wide due to pulls. Usually, the driver creates both 8-wide and 16-wide variants of every fragment shader. When 16-wide compilation fails, it logs a performance warning explaining why only an 8-wide program exists. However, when there are pull parameters, the driver won't even bother trying the 16-wide compile (since it would fail). In this case, it failed to emit a performance warning, leaving no explanation for the missing 16-wide program. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
21922cb70d0a2de23f6080c8b9c4324cba5a2fff |
|
06-Jul-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965 Gen4/5: Generalize SF interpolation setup for GLSL1.3 Previously the SF only handled the builtin color varying specially. This patch generalizes that support to cover user-defined varyings, driven by the interpolation mode array set up alongside the VUE map. Based on the following patches from Olivier Galibert: - http://lists.freedesktop.org/archives/mesa-dev/2012-July/024335.html - http://lists.freedesktop.org/archives/mesa-dev/2012-July/024339.html With this patch, all the GLSL 1.3 interpolation tests that do not clip (spec/glsl-1.30/execution/interpolation/*-none.shader_test) pass. V5: Move key.do_flat_shading to brw_sf_compile.has_flat_shading; drop vestigial hunks. V6: Real bools. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
53631be4ebaa4fb13a7f129727c1cdd32fcc6f3d |
|
06-Jul-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move intel_context::gen and gt fields to brw_context. Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
794de2f3873bcedc78300b3ba69656adc755894c |
|
06-Jul-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move intel_context::is_<platform> flags to brw_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b15f1fc3c6b3b9dc4422940c412f80e581c9900d |
|
03-Jul-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move intel_context::perf_debug to brw_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
329779a0b45b63be17627f026533c80b2c8f7991 |
|
03-Jul-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move intel_context::batch to brw_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
426ca34b7a2c3b9edfc0189daece8de3aff80627 |
|
13-Jun-2013 |
Eric Anholt <eric@anholt.net> |
glsl: Remove ir_print_visitor.h includes and usage We have ir->print() to do the old declaration of a visitor and having the IR accept the visitor (yuck!). And now you can call _mesa_print_ir() safely anywhere that you know what an ir_instruction is. A couple of missing printf("\n")s are added in error paths -- when an expression is handed to the visitor, it doesn't print '\n' (since it might be a step in printing a whole expression tree). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0677ea063cd96adefe87c1fb01ef7c66d905535b |
|
30-May-2013 |
Dave Airlie <airlied@gmail.com> |
i965: fix problem with constant out of bounds access (v3) Okay I now understand why Frank would want to run away, this is my attempt at fixing the CVE out of bounds access to constants outside the range. This attempt converts any illegal constants to constant 0 as per the GL spec, and is undefined behaviour. A future patch should add some debug for users to find this out, but this needs to be backported to stable branches. CVE-2013-1872 v2: drop the last hunk which was a separate fix (now in master). hopefully fix the indentations. v3: don't fail piglit, the whole 8/16 dispatch stuff was over my head, and I spent a while figuring it out, but this one is definitely safe, one piglit pass extra on my Ironlake. NOTE: This is a candidate for stable branches. Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
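The idea of aliasing out-of-range constant indices to constant 0 can be sketched as below. The helper name and the exact bound check (`index >= nr_params`) are assumptions for illustration; the commit message itself only says "out of bounds".

```cpp
#include <cassert>

// Map any constant index outside [0, nr_params) to index 0, so an
// out-of-bounds access reads a valid, if arbitrary, constant instead of
// memory outside the parameter array.
static int sanitize_param_index(int index, int nr_params) {
    assert(nr_params > 0);
    return (index < 0 || index >= nr_params) ? 0 : index;
}
```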
|
60f9b722ef80c499a94b4e5ab7304dcd739ea569 |
|
30-May-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
Revert "i965: fix problem with constant out of bounds access (v2)" This reverts commit 98dfd59a0445666060c97b0dccaf0e9f030b547a. The patch was clearly not Piglit tested, as it caused at least 225 tests to start crashing with assertion failures. That was before my desktop tanked and the test run died completely.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
98dfd59a0445666060c97b0dccaf0e9f030b547a |
|
30-May-2013 |
Dave Airlie <airlied@redhat.com> |
i965: fix problem with constant out of bounds access (v2) This is my attempt at fixing this as the CVE is making RH security team care enough to make me look at this. (please upstream, security fixes are more important than whatever else you are doing, if for no other reason than it saves me having to fix stuff I've no real clue about). Since Frank's original fix was denied, here is my attempt to just alias all constants that are out of bounds < 0 or > nr_params to constant 0, hopefully this provides the undefined behaviour idr requires.. CVE-2013-1872 v2: drop the last hunk which was a separate fix (now in master). hopefully fix the indentations. NOTE: This is a candidate for stable branches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e290372542d0475e612e4d10a27b22eae3158ecd |
|
01-May-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Make virtual grf live intervals actually cover their used range. Previously, we would sometimes not consider a write to a register to extend the end of the interval, nor would we consider a read before a write to extend the start. This made for a bunch of complicated logic related to how to treat the results when dead code might be present. Instead, just extend the interval and fix dead code elimination to know how to remove it. Interestingly, this actually results in a tiny bit more optimization: total instructions in shared programs: 1391220 -> 1390799 (-0.03%) instructions in affected programs: 14037 -> 13616 (-3.00%) v2: Fix a theoretical problem with the simd16 workaround if dst == src, where we would revert the bump of the live range. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1f0f26d60c148e360908af34130c4e00dba8f3df |
|
10-Apr-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Add support for bit instructions. Don't bother scalarizing ir_binop_bfm, since its results are identical for all channels. v2: Subtract result of FBH from 31 (unless an error) to convert MSB counts to LSB counts. v3: Use op0->clone() in ir_triop_bfi to prevent (var_ref channel_expressions) from appearing multiple times in the IR. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
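The v2 note about subtracting from 31 can be illustrated with a CPU-side model. It assumes that FBH counts bit positions starting from the most significant bit and returns all ones when no bit is set; `fbh_model` and `find_msb` are hypothetical names, not driver functions.

```cpp
#include <cassert>
#include <cstdint>

// Model of the assumed FBH result for an unsigned source: number of zero
// bits above the highest set bit, or 0xFFFFFFFF if the source is zero.
static uint32_t fbh_model(uint32_t x) {
    if (x == 0)
        return 0xFFFFFFFFu;
    uint32_t count = 0;
    for (uint32_t mask = 0x80000000u; (x & mask) == 0; mask >>= 1)
        ++count;
    return count;
}

// Convert the MSB-relative count into an LSB-relative bit index,
// passing the error value through as -1.
static int32_t find_msb(uint32_t x) {
    uint32_t r = fbh_model(x);
    return r == 0xFFFFFFFFu ? -1 : 31 - static_cast<int32_t>(r);
}
```

The error case has to be excluded from the subtraction, otherwise an input of zero would not map to -1.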
|
ab04f3b2d74af061a0d2ebf3d1a02d8fcf73ff09 |
|
30-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965: Share the register file enum between the two backends. I need this so I can look at vec4 and fs registers' files from the same .cpp file without namespaces. As far as I can tell we never rely on the particular numerical values of the files, though I thought it sounded like a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
63c8155b09bca7917631ec678a0d0db6e7965a1a |
|
29-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965: Make dump_instructions be a virtual method of the visitor. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
61ca2c4f73f84eec29454698188309ab311eb503 |
|
26-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow LRPs with uniform registers. Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62). v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken) Reviewed-by: Matt Turner <mattst88@gmail.com> (v1) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5e46482993dfd30b888d5219f6fecf4b4d1f42de |
|
28-Apr-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move is_math/is_tex/is_control_flow() to backend_instruction. These are entirely based on the opcode, which is available in backend_instruction. It makes sense to only implement them in one place. This changes the VS implementation of is_tex() slightly, which now accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those aren't generated in the VS anyway, it should be fine. This also makes is_control_flow() available in the VS. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
79f786f9367e0071916e1d3c25bfff00d114339c |
|
27-Apr-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965/fs: Don't try to use bogus interpolation modes pre-Gen6. Interpolation modes other than perspective-barycentric-pixel-center (and their associated coefficients in the WM payload) only exist in Gen6 and later. Unfortunately, if a varying was declared as `centroid`, we would blindly read the nonexistent values, and so produce all manner of bad behavior -- texture swimming, snow, etc. Fixes rendering in Counter-Strike Source and Team Fortress 2 on Ironlake. NOTE: This is a candidate for the 9.1 branch. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Tested-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
578987ce1c17d17cfa538eb70d07a751fda55eb1 |
|
16-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965: Avoid recompiles for fragment clamping on non-clamping APIs. Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are due to FBO-rendering size predictions). We currently expose GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm about to send a patch for removing that silly extension in that case. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dcb1b89c65b963ccc0e37cb7ace1e69c42f8cd26 |
|
11-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965: Silence one more compile warning. We don't want to store this thing in the class, and we do need the definition to be at the top of the function and held onto until the end here, so there's not much to do besides (void) reference it. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
248175ab3b90139580f7e9403ac5243d7aeac823 |
|
11-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965: Fix an unused variable warning in the release build. It's used in an assert, but we have this as a member of the class anyway. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
adf251406b5cf5e74f88c91799228dd9cc8dac29 |
|
11-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix some untriggered optimization bugs with uncompressed/sechalf. We have this support for firsthalf/sechalf instructions, which would be called in the !has_compr4 (aka original gen4) 16-wide case. We currently only support 16-wide for gen5+, so we weren't tripping over this, but it would have been a problem if we ever try to enable it. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eaca8a94e27d5ec13fcbe5158212310292270e51 |
|
09-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add basic-block-level dead code elimination. This is a poor substitute for proper global dead code elimination that could replace both our current paths, but it was very easy to write. It particularly helps with Valve's shaders that are translated out of DX assembly, which has been register allocated and thus have a bunch of unrelated uses of the same variable (some of which get copy-propagated from and then left for dead). shader-db results: total instructions in shared programs: 1735753 -> 1731698 (-0.23%) instructions in affected programs: 492620 -> 488565 (-0.82%) v2: Fix comment typo Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
36d0fde603015066fce0ff37fd9be609800243e8 |
|
09-Apr-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Remove incorrect note of writing attr in centroid workaround. This instruction doesn't update its IR destination, it just moves from payload to f0. This caused the dead code elimination pass I'm adding to dead-code-eliminate the first step of interpolation. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2cb7f1e766d28dd238274f74d9568ab4438c4965 |
|
04-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a helper function for checking for partial register updates. These checks were all over, and every time I wrote one I had to try to decide again what the cases were for partial updates. v2: Fix inadvertent reladdr check removal. Reviewed-by: Matt Turner <mattst88@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
32a8e877666f7c3798d736bb6f05ad2f41356ebf |
|
11-Apr-2013 |
Matt Turner <mattst88@gmail.com> |
i965: NULL check prog on shader compilation failure. Also change if (shader) to if (prog) for consistency. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fe97f26c86d65b1b0e026c725c7da348a91093d9 |
|
09-Apr-2013 |
Paul Berry <stereotype441@gmail.com> |
i965: Rename backend_visitor::prog to shader_prog. The next patch is going to change the type of vec4_visitor::vp from struct gl_vertex_program * to struct gl_program *, and rename it. The sensible name to change it to is vec4_visitor::prog. However, prog is already used in backend_visitor (which vec4_visitor derives from). Since backend_visitor::prog is of type struct gl_shader_program *, it makes sense to rename it to shader_prog. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
705c8247fa0eb50587b6c19561eb31e4d3a1b876 |
|
13-Mar-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field. The previous commit removed the last user of this field, so there's no longer any point in setting it. Removing this should eliminate state-dependent recompiles, and make the precompile more reliable. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7183568869a0ce856b2f3d4cd9e1d7bd63ff9092 |
|
13-Mar-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove fixed-function texture projection avoidance optimization. This optimization attempts to avoid extra attribute interpolation instructions for texture coordinates where the W-component is 1.0. Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes state atom (all the brw_vs_constval.c code) needs to run on each draw. It computes the input_size_masks array, then uses that to compute proj_attrib_mask. Differences in proj_attrib_mask can cause state-dependent fragment shader recompiles. We also often fail to guess proj_attrib_mask for the fragment shader precompile, causing us to needlessly compile it twice. Furthermore, this optimization only applies to fixed-function programs; it does not help modern GLSL-based programs at all. Generally, older fixed-function programs run fine on modern hardware anyway. The optimization has existed in some form since the initial commit. When we rewrote the fragment shader backend, we dropped it for a while. Eric readded it in commit eb30820f268608cf451da32de69723036dddbc62 as part of an attempt to cure a ~1% performance regression caused by converting the fixed-function fragment shader generation code from Mesa IR to GLSL IR. However, no performance data was included in the commit message, so it's unclear whether or not it was successful. Time has passed, so I decided to re-measure this. Surprisingly, Eric's OpenArena timedemo actually runs /faster/ after removing this and the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a 1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was no statistically significant difference (n = 37). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
70b27e0e4b5d15e575ea477d63c0f6cb19d645c2 |
|
18-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Use LD messages for pre-gen7 varying-index uniform loads This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on my GM45 forced to load all uniforms through the varying-index path), but we get a whole vec4 at a time to reuse in the next commit. v2: Fix comment about channels in the other message. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ce316f62efa208b1a43fe81831126fc75c5807c5 |
|
20-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Don't double-emit SEND dependency workarounds at control flow. We weren't setting needs_dep[i] in the loops, so we'd continue on to potentially add the same workaround MOVs to the later basic block boundaries, too. We can either set needs_dep[i] to exit through the normal path, or we can just return since we know we're done. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3cf69b228404791cf15231321b6a18b5701be0a6 |
|
18-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Bake regs_written into the IR instead of recomputing it later. For sampler messages, it depends on the target gen, and on gen4 SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we should. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dca5fc14358a8b267b3854c39c976a822885898f |
|
13-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Improve performance of varying-index uniform loads on IVB. Like we have done for the VS and for constant-index uniform loads, we use the sampler engine to get caching in front of the L3 to avoid tickling the IVB L3 bug. This is also a bit of a functional change, as we're now loading a vec4 instead of a single dword, though we're not taking advantage of the other 3 components of the vec4 (yet). With the driver hacked to always take the varying-index path for all uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4). This is a major fix for some blur shaders in compositors from the varying-index uniforms support I introduced in 9.1. v2: Move old offset computation into the pre-gen7 path. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bc0e1591f64b8b3f2693fceaaa8bba9198e26171 |
|
15-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Avoid inappropriate optimization with regs_written > 1. Right now we don't have anything with regs_written() > 1 and !inst->mlen, but that's about to change. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
740350c982bd2735b9eb9063c2b91856b6f1ad31 |
|
14-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965: Make the fragment shader pull constants index by dwords, not vec4s. We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement. Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases. No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8c694dfe6478ce9355c866ae70db45e49e499de3 |
|
13-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Move varying uniform offset computation into the helper func. I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
59e858861caad2649f4c282eb277a7fc6202ab65 |
|
13-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Remove creation of a MOV instruction that's never used. We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
57a502518e79d42b014517bf36b297cc68947389 |
|
28-Mar-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards. "discard" instructions generate HALT instructions which jump to a final HALT near the end of the shader. Previously, fs_generator created this final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it to jump right before the FB write epilogue. This is normally good. However, INTEL_DEBUG=shader_time also has an epilogue section which records the final timestamp. The frontend emits IR for this just before FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering: 1. Shader Time Epilogue 2. Final HALT (where discards jump) 3. Framebuffer Write Epilogue This meant that discarded pixels completely skipped the shader time epilogue, causing no ending timestamp to be written. This obviously led to inaccurate results. This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just before any epilogue sections. This is where the final HALT should be generated, and makes it easy to ensure the correct ordering: 1. Final HALT 2. Shader Time Epilogue 3. Framebuffer Write Epilogue For shaders that don't discard, this opcode compiles away to nothing. The scheduler adds barrier dependencies to make sure that it doesn't get moved above any FS_OPCODE_DISCARD_JUMP instructions. One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to a mere 5.13 Gcycles. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
20d846ce8b46604ced835eb68079a0dbae2e19dc |
|
12-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965: Add names for all instructions to dump_instruction() in FS and VS. I'd previously added the minimum names to understand my dumps, but this makes dumps in general much easier to read. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b8aa9f7d3a146cff9c2c530abf815a1b316374ca |
|
06-Mar-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Generate LOD sampler message from ir_lod. v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
47e795d8612e5fde70740450d02370514ecc79e3 |
|
19-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Include everything but the final FB write in shader_time. Previously, if you just wrote a constant color to the render target, no time got noted at all. This is convenient for doing single-instruction timings, but not so much for actual program analysis. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5c5218ea6163f694a256562df1d73a108396e40d |
|
19-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Switch shader_time writes to using GRFs. This avoids conflicts between shader_time and FB writes, so we can include more of the program under our profiling. This does mean hiding more of the message setup from the optimizer, which doesn't have a way to handle multi-reg sends from GRFs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d2ba1c24b440ee74436335d8e815be9b72b1ba7f |
|
19-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965: Track ARB program state along with GLSL state for shader_time. This will let us do much better printouts for non-GLSL programs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0a0deb92d9e25067ac4b89cbbd8f8f8f3b4d05db |
|
20-Mar-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Rename vp_outputs_written to input_slots_valid. With the introduction of geometry shaders, fragment inputs will no longer come exclusively from the vertex shader; sometimes they come from the geometry shader. So the name "vp_outputs_written" will become a misnomer. This patch renames vp_outputs_written to input_slots_valid, to reflect the true meaning of the bitfield from the fragment shader's point of view: it indicates which of the possible input slots contain valid data that was written by the previous shader stage. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
995bbc22564b22de2ef6aac4e6881fd4c23e3162 |
|
23-Feb-2013 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Avoid unnecessary recompiles due to POS bit of proj_attrib_mask. Previous to this patch, when using fixed function fragment shading, bit VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask was being set differently during precompiles and normal usage. During precompiles it was being set only if the fragment shader reads from window position (which it never does), so it was always being set to 0. During normal usage it was being set if the vertex shader writes to all 4 components of gl_Position (which it usually does), so it was usually being set to 1. As a result, we were almost always doing an extra recompile for the fixed function fragment shader. The recompile was totally unnecessary, though, because brw_wm_prog_key::proj_attrib_mask is only consulted for fs_visitor::emit_general_interpolation(), which isn't used for VARYING_SLOT_POS. This patch avoids the unnecessary recompile by always setting bit VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask to 1. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eed6baf7621fa94e7888f8079b155fc67a08540c |
|
23-Feb-2013 |
Paul Berry <stereotype441@gmail.com> |
Replace gl_frag_attrib enum with gl_varying_slot. This patch makes the following search-and-replace changes: gl_frag_attrib -> gl_varying_slot FRAG_ATTRIB_* -> VARYING_SLOT_* FRAG_BIT_* -> VARYING_BIT_* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
10a131211ef18fa1d368dee394045df945dcbb6e |
|
23-Feb-2013 |
Paul Berry <stereotype441@gmail.com> |
Get rid of _mesa_vert_result_to_frag_attrib(). Now that there is no difference between the enums that represent vertex outputs and fragment inputs, there's no need for a conversion function. But we still need to be able to detect when a given vertex output has no corresponding fragment input. So it is replaced by a new function, _mesa_varying_slot_in_fs(), which tells whether the given varying slot exists as an FS input or not. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
36b252e94724b2512ea941eff2b3a3abeb80be79 |
|
23-Feb-2013 |
Paul Berry <stereotype441@gmail.com> |
Replace gl_vert_result enum with gl_varying_slot. This patch makes the following search-and-replace changes: gl_vert_result -> gl_varying_slot VERT_RESULT_* -> VARYING_SLOT_* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6bec74bfd98e2f9c090c550c18c02f71ea80d04e |
|
24-Feb-2013 |
Paul Berry <stereotype441@gmail.com> |
i965: Change fragment input related bitfields to 64-bit. This patch updates the bitfields brw_context::wm.input_size_masks, tracker::size_masks, and brw_wm_prog_key::proj_attrib_mask, all of which are indexed by gl_frag_attrib, from 32-bit to 64-bit. This paves the way for supporting geometry shaders, and for merging the gl_frag_attrib and gl_vert_result enums. The combination of these two will require at least 55 bits in the bitfields. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
db3a0f13ef13b6d392dfc3b7346351533600d343 |
|
11-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965: Split shader_time entries into separate cachelines. This avoids some snooping overhead between EUs processing separate shaders (so VS versus FS). Improves performance of a minecraft trace with shader_time by 28.9% +/- 18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4). v2: Add a define for the stride with a comment explaining its units and why. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4dc7e6dcbf0d9c360e257c704774c9b083511b47 |
|
07-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Also do the gen4 SEND dependency workaround against other SENDs. We were handling the dependency workaround for the first written reg of a send preceding the one we're fixing up, but didn't consider the other regs. Thus if you had two sampler calls that got allocated to the same set of regs, one might, rarely, overwrite the other. This was occurring in XBMC's GLSL shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567 NOTE: This is a candidate for the stable branches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4c1fdae0a01b3f92ec03b61aac1d3df500d51fc6 |
|
06-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Switch to using sampler LD messages for uniform pull constants. When forcing the compiler to always generate pull constants instead of push constants (in order to have an easy to use testcase), improves performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1323772543083dec23baf5a50222bdfc88ff6c3a |
|
07-Mar-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix broken rendering in large shaders with UBO loads. The lowering process creates a new vgrf on gen7 that should be represented in live interval analysis. As-is, it was getting a conflicting allocation with gl_FragDepth in the dolphin emulator, producing broken rendering. NOTE: This is a candidate for the 9.1 branch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
14cec07177f438717cc6fb9252525e16d6b3d8dd |
|
22-Feb-2013 |
Eric Anholt <eric@anholt.net> |
i965: Make perf_debug() output to GL_ARB_debug_output in a debug context. I tried to ensure that performance in the non-debug case doesn't change (we still just check one condition up front), and I think the impact is small enough in the debug context case to warrant including all of it. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f52ce6a0ca73d1cd89091689efd8ea2e14748723 |
|
24-Jan-2013 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: add a new virtual opcode: SHADER_OPCODE_TXF_MS This is very similar to the TXF opcode, but lowers to `ld2dms` rather than `ld` on Gen7. V4: - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks it actually writes the correct number of registers. Otherwise in nontrivial shaders some of the registers tend to get clobbered, producing bad results. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0a1d145e5f1e6120e70e9b46e069167a0d653579 |
|
02-Dec-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Use the LRP instruction for ir_triop_lrp when possible. v2 [mattst88]: - Add BRW_OPCODE_LRP to list of CSE-able expressions. - Fix op_var[] array size. - Rename arguments to emit_lrp to (x, y, a) to clear confusion. - Add LRP function to brw_fs.cpp/.h. - Corrected comment about LRP instruction arguments in emit_lrp. v3 [mattst88]: - Duplicate MAD code for LRP instead of using a function pointer. - Check for != GRF instead of == IMM in emit_lrp. - Lower LRP on gen < 6. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4eeb9ded9d0165a81aa9d2ac01197e30ddf9d835 |
|
12-Feb-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs/gen7: Allow MATH instructions to have MRF as a destination total instructions in shared programs: 1376297 -> 1375626 (-0.05%) instructions in affected programs: 35977 -> 35306 (-1.87%) Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b9f6795e34ad0b85b1f4f288dc6d1e5fcee30697 |
|
11-Feb-2013 |
Matt Turner <mattst88@gmail.com> |
i965/fs: Remove duplicate scan_inst->mlen check Is already checked 20 lines below. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7b0731d940c758ca9c1e883cdea454d8787255c1 |
|
21-Feb-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix broken math on values loaded from uniform buffers on gen6. In a debug build this led to assertion failures, but on a non-debug build the hardware would just reference the whole vec8 instead of the same channel 8 times. Fixes the new piglit glsl-1.40/uniform-buffer/fs-exp2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57121 Note: This is a candidate for the stable branches Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
aebd3f46e305829ebfcc817cafa8592edc2f80ab |
|
16-Feb-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Delay setup of uniform loads until after pre-regalloc scheduling. This should fix the register allocation explosion on the GLES 3.0 test on gen6. It also gives us an instruction that will fit our CSE handling. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.1 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
de7cb1cff3dbe30bbd691ed56e61c9d37ba5f2da |
|
16-Feb-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a bit more instruction dumping useful for upcoming work. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c37992c54d753e732783f712dea2d483450371dd |
|
06-Feb-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Do a general SEND dependency workaround for the original 965. We'd been ad-hoc inserting instructions in some SEND messages with no knowledge of when it was required (so extra instructions), but not all SENDs (so not often enough). This should do much better than that, though it's still flow-control-ignorant. v2: Use BRW_MAX_MRF instead of magic numbers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58960 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: Candidate for the stable branches.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bf91f0b03942d966cf453201dc52c4aa4049f8fa |
|
06-Feb-2013 |
Eric Anholt <eric@anholt.net> |
i965/fs: Use a helper function for checking for flow control instructions. In 2 of our checks, we were missing BREAK and CONTINUE. NOTE: Candidate for the stable branches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
491364e1f34ddb2c8ea439e871dd42aaa5cc9b28 |
|
11-Dec-2012 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Add GLSL_TYPE_INTERFACE Interfaces are structurally identical to structures from the compiler's point of view. They have some additional restrictions, and generally GPUs use different instructions to access them. Using a different base type should make this a bit easier. This commit also adds the glsl_type::interface_packing fields. For GLSL_TYPE_INTERFACE types, this will track the specified packing mode. It is analogous to gl_uniform_buffer::_Packing. v2: Add several missing GLSL_TYPE_INTERFACE cases in switch-statements. v3: Add information about glsl_type::interface_packing. Move row_major checking in glsl_type::record_key_compare from this patch to the previous patch. Both suggested by Paul Berry. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ecfb404e8d4fcd35524d1c4b3421e24980fe3976 |
|
11-Dec-2012 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Replace most default cases in switches on GLSL type This makes it easier to find switch-statements that need to be updated after a new GLSL_TYPE_* is added because the compiler will generate a warning. Switch-statements that only had a small number of cases (e.g., everything in ir_constant_expression.cpp) were not modified. I may regret that decision when we eventually add support for doubles. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Carl Worth <cworth@cworth.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4a6753926f51accd6f71d9caea18b15a99b8be24 |
|
11-Jan-2013 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add an INTEL_DEBUG=no16 option. Often when debugging, I don't want to see SIMD16 shaders. It makes INTEL_DEBUG=vs/fs output much easier to read, especially when a program dumps many shaders. Plus, I also want to verify that SIMD8 works before even considering SIMD16. v2: Fix the likeliness check (caught by Chris and Eric). Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c0d1f508d6d471cf44329f43d8a79230ed8db0b6 |
|
21-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Reference the core GL uniform storage for non-builtin uniforms. There's no reason to use an external copy if the relayout in the external copy isn't serving us. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f189570ccf60ab665cbe9feeff52685600f8163d |
|
21-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Remove the param_index/param_offset indirection. Now that ParameterValues doesn't change across the visitor, we don't need to go through this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d5efc14635cf25bc130bfa77737913913d9202ce |
|
21-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965: Add asserts to check that we don't realloc ParameterValues. Things are even more restrictive than they used to be, so I've made mistakes in this area. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7171c45d3a6392b947d96c10362ce0459b741669 |
|
01-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Drop an unnecessary _safe on a list walk. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
78ce522932e8c356880c7ca10dace4b6fe6cf313 |
|
01-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a note explaining a detail of register_coalesce_2(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3c560633548f4b0298a372903de32639706f8c40 |
|
05-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Move the failure for gen7 16-wide intdiv to emit_math(). The cube map array code adds another caller of emit_math(), which needs this check. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6e34723ac9fa2d5c34cb2a38118ecf5b856c4992 |
|
28-Nov-2012 |
Chris Forbes <chrisf@ijw.co.nz> |
i965: fs: fix gen6+ math operands in one place v4: Fix various style nits as pointed out by Eric, and expand IMM operands on both Gen6 and Gen7. v5: minor style nits (by anholt) Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
461a29783a28e579a9a5a236e5f47ffb6d18a328 |
|
05-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Set up gen7 UBO loads as sends from GRFs. This gives the instruction scheduler a chance to schedule between the loads, whereas before it was restricted due to the dependencies between the MRFs for setting them up. For one shader in gles3conform, it goes from getting stuck in register allocation for as long as anybody's bothered to leave it running down to 23 seconds, thanks to the LIFO scheduling. Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7a9f940cab8e5a3bbbab3e302de2311b36159d91 |
|
04-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Schedule instructions both before and after register allocation. Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f74560f3fb516971e6a7b03a2382db2f58699f59 |
|
10-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965: Scale shader_time to compensate for resets. Some shaders experience resets more than others, which skews the numbers reported. Attempt to correct for this by linearly scaling according to the number of resets that happen. Note that this will not be accurate if invocations of shaders have varying times and longer invocations are more likely to reset. However, this should at least be better than the previous situation.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
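The linear scaling described in this entry can be illustrated with a minimal sketch. The function name `scale_for_resets` and its parameters are hypothetical and are not taken from the driver; the sketch only assumes that the accumulated cycle count covers the invocations that were successfully written, and that invocations lost to a reset cost about the same on average.

```cpp
#include <cstdint>

// Hypothetical helper (not the driver's code): extrapolate the cycles
// accumulated over `written` measured invocations so that they also cover
// `resets` invocations whose timing was discarded after a timestamp reset.
uint64_t scale_for_resets(uint64_t cycles, uint64_t written, uint64_t resets)
{
    if (written == 0)
        return cycles;  // nothing was measured, so there is nothing to scale

    // Average cost per measured invocation, applied to every invocation.
    return cycles / written * (written + resets);
}
```

As the commit message notes, an estimate of this kind is skewed when longer invocations are more likely to hit a reset, because the measured average then underestimates the discarded ones.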
|
338b5f887d462bbe7ef58a233cd00619e43415f0 |
|
10-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965: Adjust the split between shader_time_end() and shader_time_write(). I'm about to emit other kinds of writes besides time deltas, and it turns out with the frequency of resets, we couldn't really use the old time delta write() function more than once in a shader.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d5016495cc1b50b1673d0d3ab8e6af8249b071d5 |
|
06-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Rewrite discards to use a flag subreg to track discarded pixels. This makes much more sense on gen6+, and will also prove useful for early exit of shaders on discard. v2: fix up a stale comment from before converting gen4-5. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b278f65e1c5295794dcf08d100356e6ded6c1f32 |
|
06-Dec-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add an instruction flag for choosing the flag subregister. We're going to redo discard handling to track discards in the other flag subregister, saving instructions in the discard and allowing predicated jumps out to the end of the shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
71f06344a0d72a6bd27750ceca571fc016b8de85 |
|
27-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965: Add a debug flag for counting cycles spent in each compiled shader. This can be used for two purposes: Using hand-coded shaders to determine per-instruction timings, or figuring out which shader to optimize in a whole application. Note that this doesn't cover the instructions that set up the message to the URB/FB write -- we'd need to convert the MRF usage in these instructions to GRFs so that our offsets/times don't overwrite our shader outputs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) v2: Check the timestamp reset flag in the VS, which is apparently getting set fairly regularly in the range we watch, resulting in negative numbers getting added to our 32-bit counter, and thus large values added to our uint64_t. v3: Rebase on reladdr changes, removing a new safety check that proved impossible to satisfy. Add a comment to the AOP defs from Ken's review, and put them in a slightly more sensible spot. v4: Check timestamp reset in the FS as well.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a64c1eb9b110f29b8abf803a8256306702629bdc |
|
09-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for uniform array access with a variable index. Serious Sam 3 had a shader hitting this path, but it's used rarely so it didn't show a significant performance difference (n=7). It does reduce compile time massively, though -- one shader goes from 14s compile time and 11723 instructions generated to .44s and 499 instructions. Note that some shaders lose 16-wide mode because we don't support 16-wide and pull constants at the moment (generally, things looping over a few-element array where the loop isn't getting unrolled). Given that those shaders are being generated with 15-20% fewer instructions, it probably outweighs the loss of 16-wide.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f22a909a080d603db122ac8517a80bd8f4006fe2 |
|
09-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Restrict optimization that would fail for gen7's SENDs from GRFs v2: Fix SNB math bug in register_coalesce() where I was looking at the instruction to be removed, not the instruction to be copy propagated into.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9156d0cba1090c4bcc3a6c0c7b2ad8921a295be4 |
|
26-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow source mods on gen7+ math. This gen6 restriction was removed in gen7 as the mathbox merge to act more like a normal instruction was finished in the hardware.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d8214e4384aaf0ee412ad9aea80f9fec522c1e4a |
|
07-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add instruction emit for varying-index reads of uniforms. The gen7 send-from-GRF path is sufficiently different from the perspective of IR generation and optimization that I just made it a separate opcode. v2: fix whitespace, rebase on Ken's recent refactor.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
29340d02dc38a9cc352d44412871dc9d4e3f878a |
|
07-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Rename the existing pull constant load opcode. We're going to use another send message for handling loads with a varying per-fragment array index.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b126228f1247fb0fed686ee3ef2c87461f2fc7a7 |
|
30-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965: Include codegen time in the INTEL_DEBUG=perf stall detection. In the VS case, we were missing the entire compile time in the stall detection! Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0f06864ba566eaff5b739a9d0fba5ed7eaadd60b |
|
30-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965: Don't leak the IR annotation into later instructions. After walking our IR instructions (Mesa or GLSL), we don't want to also mark the start of the FB/URB writes or whatever as being that IR. This can end up being misleading when the end of the IR visit got copy propagated out to a later instruction in the URB writes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8d0bb74a11f1905e32f6db23fbf8bb29ff8fa367 |
|
18-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Add fs_reg::is_zero() and is_one(); use for opt_algebraic(). These helper macros save you from writing nasty expressions like: if ((inst->src[1].type == BRW_REGISTER_TYPE_F && inst->src[1].imm.f == 1.0) || ((inst->src[1].type == BRW_REGISTER_TYPE_D || inst->src[1].type == BRW_REGISTER_TYPE_UD) && inst->src[1].imm.u == 1)) { Instead, you simply get to write inst->src[1].is_one(). Simple. Also, this makes the FS backend match the VS backend (which has these). This patch also converts opt_algebraic to use the new helper functions. As a consequence, it will now also optimize integer-typed expressions. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
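A rough sketch of what such helpers might look like follows. `Reg`, `RegFile`, `RegType`, `imm_f` and `imm_d` are hypothetical stand-ins invented for illustration; they are not the real `fs_reg` definition, which carries far more state.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the backend's register description.
enum class RegFile { GRF, IMM };
enum class RegType { F, D, UD };

struct Reg {
    RegFile file;
    RegType type;
    union {
        float f;
        int32_t d;
        uint32_t u;
    } imm;

    // True only for an immediate whose value is zero in its own type.
    bool is_zero() const
    {
        if (file != RegFile::IMM)
            return false;
        switch (type) {
        case RegType::F:  return imm.f == 0.0f;
        case RegType::D:  return imm.d == 0;
        case RegType::UD: return imm.u == 0u;
        }
        return false;
    }

    // True only for an immediate whose value is one in its own type.
    bool is_one() const
    {
        if (file != RegFile::IMM)
            return false;
        switch (type) {
        case RegType::F:  return imm.f == 1.0f;
        case RegType::D:  return imm.d == 1;
        case RegType::UD: return imm.u == 1u;
        }
        return false;
    }
};

// Illustrative constructors for immediates.
inline Reg imm_f(float v)
{
    Reg r{RegFile::IMM, RegType::F, {}};
    r.imm.f = v;
    return r;
}

inline Reg imm_d(int32_t v)
{
    Reg r{RegFile::IMM, RegType::D, {}};
    r.imm.d = v;
    return r;
}
```

With helpers of this shape, an algebraic pass can test `src.is_one()` once and cover float and integer immediates alike, which is consistent with the message's remark that integer-typed expressions become optimizable as a side effect.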
|
154ef07aa74e1d91e16cf9f2492cae33790b0998 |
|
30-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add some minimal backend-IR dumping. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9136723214136a95a3c915d580943c888cd99503 |
|
21-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move struct brw_compile (p) entirely inside fs_generator. The brw_compile structure contains the brw_instruction store and the brw_eu_emit.c state tracking fields. These are only useful for the final assembly generation pass; the earlier compilation stages don't need them. This also means that the code generator for future hardware won't have access to the brw_compile structure, which is extremely desirable because it prevents accidental generation of Gen4-7 code. v2: rzalloc p, as suggested by Eric. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ea681a0d64ecde3a2e729fe3b71d3f3fe4cedff0 |
|
09-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Split final assembly code generation out of fs_visitor. Compiling shaders requires several main steps: 1. Generating FS IR from either GLSL IR or Mesa IR 2. Optimizing the IR 3. Register allocation 4. Generating assembly code This patch splits out step 4 into a separate class named "fs_generator." There are several reasons for doing so: 1. Future hardware has a different instruction encoding. Splitting this out will allow us to replace fs_generator (which relies heavily on the brw_eu_emit.c code and struct brw_instruction) with a new code generator that writes the new format. 2. It reduces the size of the fs_visitor monolith. (Arguably, a lot more should be split out, but that's left for "future work.") 3. Separate namespaces allow us to make helper functions for generating instructions in both classes: ADD() can exist in fs_visitor and create IR, while ADD() in fs_generator() can create brw_instructions. (Patches for this upcoming.) Furthermore, this patch changes the order of operations slightly. Rather than doing steps 1-4 for SIMD8, then 1-4 for SIMD16, we now: - Do steps 1-3 for SIMD8, then repeat 1-3 for SIMD16 - Generate final assembly code for both modes together This is because the frontend work can be done independently, but final assembly generation needs to pack both into a single program store to feed the GPU. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4d09fe938e72b26d814b6b52caee5112cf6f1103 |
|
21-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move uses of brw_compile from do_wm_prog to brw_wm_fs_emit. The brw_compile structure is closely tied to the Gen4-7 hardware encoding. However, do_wm_prog is very generic: it just calls out to get a compiled program and then uploads it. This isn't ultimately where we want it, but it's a step in the right direction: it's now closer to the code generator. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3417b2f2b249d89fc71379bfc0eaa1055de365ba |
|
20-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Pass the brw_context pointer into fs_visitor explicitly. We used to steal it out of the brw_compile struct...but fs_visitor isn't going to have one of those in the future. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1f74002a9817e000d3f5633dd5eb6adfd1d51ba5 |
|
20-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move brw_wm_compile::fp to fs_visitor. Also change it from a brw_fragment_program to a gl_fragment_program, since that seems to be what everything wants anyway. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7b0d30eb8765066b9f3b5f2a50c426ccbac675fa |
|
20-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Remove struct brw_shader * parameter to fs_visitor constructor. We can easily recover it from prog, and this makes it clear that we aren't passing additional information in. v2: Use an if-statement rather than the ?: operator (suggested by Eric). Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a303df86de96a428f82377a8c38db8b7e3223447 |
|
20-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move brw_wm_compile::dispatch_width into fs_visitor. Also, rather than having brw_wm_fs_emit poke at it directly, make it a parameter to the fs_visitor constructor. All other changes generated by search and replace (with occasional whitespace fixup). v2: Make dispatch_width const (as suggested by Paul); fix doxygen mistake (pointed out by Eric); update for rebase. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
47a6a7b51b774091f46aed264b3591fd36c8baed |
|
19-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move brw_wm_lookup_iz() to fs_visitor::setup_payload_gen4(). This necessitates compiling brw_wm_iz.c as C++. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2429c9d347fe1c6e98a248c1039041f6a59fd749 |
|
14-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Move brw_wm_payload_setup() to fs_visitor::setup_payload_gen6() Now that we only have the one backend, there's no real point in keeping this separate. Moving it should allow some future simplifications. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d82b873a501606d62b9f208b6d5cda79c9a6b4b8 |
|
09-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add helper functions for IF and CMP and use them. v2: Rebase on gen6-if fix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
32d6809bb50a08d9b80ed8b3d13cc6b76580a3a9 |
|
09-Nov-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add helper functions for generating ALU ops, like in the VS. This gives us checking of our arguments (no more passing 1 operand to BRW_OPCODE_MUL!), at the cost of a couple of extra parens. v2: Rebase on gen6-if fix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5cea0273414bd5897c318b4d632b08ce8080a2fe |
|
15-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Properly patch special values during VGRF compaction. In addition to registers used by instructions, fs_visitor maintains direct references to certain "special" values used for inputs/outputs. When I added VGRF compaction, I overlooked these, believing that these direct references weren't used once instructions were generated. That was wrong. For example, pixel_x/y are used in virtual_grf_interferes(), which is called by optimization passes and register allocation. This patch treats all of them as used and patches them after compacting. While it's not strictly necessary to patch all of them (as some aren't used after emitting code), it seems safer to simply fix them all. Fixes oglconform's textureswizzle/advanced.shader.targets, piglit's glsl-fs-lots-of-tex, and glean's texCombine on pre-Gen6 hardware. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56790 Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d010e70a07ab4a0b24aad8c9693a7f9c680d6164 |
|
03-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Don't calculate_live_intervals() in opt_algebraic(). There's no point: opt_algebraic() doesn't use any liveness information. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cf26b4569a2effc59d072ffd2b2bf9b055faab43 |
|
31-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Do dead code elimination just after copy propagation. If we put the register coalescing in between the two, then we end up with code sequences involving dead writes that the dead code elimination doesn't know how to remove. In place of making dead code elimination smart (which we should do, too), make it less important for the moment. shader-db results: total instructions in shared programs: 722240 -> 721275 (-0.13%) instructions in affected programs: 50573 -> 49608 (-1.91%) (no shaders regressed). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
05882b0d3b69ac14e9bc93460c77f9dc203c2ff9 |
|
02-Nov-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Compact the virtual GRF arrays. During code generation, we create tons of temporary variables, many of which get immediately killed and are never used. Later optimization and analysis passes, such as compute_live_intervals, loop over all the virtual GRFs. By compacting them, we can save a lot of overhead. Reduces compilation time in L4D2's largest fragment shader from 10.2 seconds to 5.2 seconds (50%). Drops compute_live_variables() from 10-12% of another game's startup time to 8%. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
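The compaction idea can be sketched as a three-pass renumbering. The `Inst` layout and the function `compact_virtual_regs` below are simplified, hypothetical stand-ins and do not reflect the driver's actual IR or implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified IR: each instruction names one destination and
// two source virtual registers by index (-1 means "unused operand").
struct Inst {
    int dst;
    int src[2];
};

// Renumber the virtual registers that are actually referenced so that they
// form the dense range [0, new_count), and return new_count.
int compact_virtual_regs(std::vector<Inst>& insts, int old_count)
{
    std::vector<int> remap(static_cast<std::size_t>(old_count), -1);

    // Pass 1: mark every register that some instruction touches.
    auto mark = [&](int r) { if (r >= 0) remap[r] = 0; };
    for (const Inst& inst : insts) {
        mark(inst.dst);
        mark(inst.src[0]);
        mark(inst.src[1]);
    }

    // Pass 2: hand out new indices to the marked registers, in order.
    int new_count = 0;
    for (int& slot : remap) {
        if (slot == 0)
            slot = new_count++;
    }

    // Pass 3: rewrite every operand through the remap table.
    auto patch = [&](int& r) { if (r >= 0) r = remap[r]; };
    for (Inst& inst : insts) {
        patch(inst.dst);
        patch(inst.src[0]);
        patch(inst.src[1]);
    }

    return new_count;
}
```

Any register index held outside the instruction list would need to go through the same remap table; this sketch covers only operands stored in instructions.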
|
54679fcbcae7a2d41cb439e52e386bd811a291b4 |
|
03-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965: Share the predicate field between FS and VS. Note that BRW_PREDICATE_NONE is 0 and BRW_PREDICATE_NORMAL is 1, so that's a lot like the true/false we had in the FS before. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
24aeeb2fdcde7a0c257db6469c6b0f064d53d3cf |
|
03-Oct-2012 |
Eric Anholt <eric@anholt.net> |
i965: Make the FS and VS share a few visitor/instruction fields. This will let us reuse brw_fs_cfg.cpp from brw_vec4_*. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
41954107c00d68869f0316126908e873662b4c6d |
|
15-Oct-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Fix segfault when using INTEL_DEBUG=perf with non-GLSL. Now that ARB programs and fixed function are routed through the new backend, shader might be NULL. Don't do INTEL_DEBUG=perf support in that case, since it relies on shader->compiled_once. Since INTEL_DEBUG=perf wasn't previously supported, this maintains the status quo. It might be nice to support it someday, however. This could be moved to brw_shader_program instead of brw_shader, but it appears even prog can be NULL in that case. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fb5bf03a2092159166229eacf57c71587f762c57 |
|
21-Sep-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Move constant propagation to the same codebase as copy prop. This means that we don't get constant prop across into the first block after a BRW_OPCODE_IF or a BRW_OPCODE_DO, but we have hope for properly doing it across control flow at some point. More importantly, with the next commit it will help avoid O(n^2) with instruction count runtime for shaders that have many constant moves. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
97615b2d8c7c3cea6fd3a43bcb1739a96e2046c4 |
|
27-Aug-2012 |
Eric Anholt <eric@anholt.net> |
i965: Replace brw_wm_* with dumping code into the fs_visitor. This makes a giant pile of code newly dead. It also fixes TXB on newer chipsets, which has been totally broken (I now have a piglit test for that). It passes the same set of Ian's ARB_fragment_program tests. It also improves high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better optimization and having 8-wide along with 16-wide shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=24355 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9cfc00a84c7a7176a385808e0b92705e79955505 |
|
20-Sep-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a couple more algebraic cases that help some ARB_fp patterns. ARB_fp doesn't go through the GLSL optimizer, and these were things you see frequently thanks to conditionals being lowered to SLT/SGE and MUL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
077d01b673ec255005a1a847faf3be897517f4e7 |
|
01-Feb-2012 |
Eric Anholt <eric@anholt.net> |
i965: Add support for instruction compaction. This reduces program size by using some smaller encodings for common bit patterns in the Gen ISA, with the hope of making programs fit in the instruction cache better. v2: Use larger bitshifts for the uncompressed field setups, in line with the way it's described in the spec. Consistently name a brw_compile "p" like all other code. Add a couple more tests. Consistently call things "compacted" not "compressed" (which is a different feature). Drop the explicit check for not compacting SENDs, which is unjustified and already implied by our lack of support for immediate values. Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
23cd6c43da6fb1ff89b994664df2658a7929402e |
|
11-Sep-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove incorrect comment above opt_algebraic. The comment was cut-and-pasted from propagate_constants(), and had no relation at all to opt_algebraic().
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f144b78dfbb97a70121be6f20d10bad8111267e3 |
|
27-Aug-2012 |
Eric Anholt <eric@anholt.net> |
i965: Make the param pointer arrays for the WM dynamically sized. Saves 26.5MB of wasted memory allocation in the l4d2 demo. v2: Rebase on compare func change, fix comments. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4d9abd96cc177cade79b64544096eb45bf8313a2 |
|
31-Aug-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Don't use brw->fragment_program in calculate_urb_setup(). Reading brw->fragment_program is nonsensical in compiler code: it contains the currently active program (if any), not the one currently being compiled. Attempting to access it may either lead to crashes (null pointer dereference if no program is active) or wrong results. Fixes piglit regressions since 9ef710575b914ddfc8e9a162d98ad554c1c217f7 on pre-Sandybridge hardware. The actual bug was created in commit 7b1fbc688999fd568e65211d79d7678562061594. NOTE: This is a candidate for the 9.0 and 8.0 branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54183 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
85b24b07512c5f3f05c5a3eb9561598ace97526c |
|
26-Aug-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Assume shadow sampler swizzling is <X, X, X, 1>. Our previous assumption, SWIZZLE_XYZW, was completely bogus for depth textures. There are no Y, Z, or W components. DEPTH_TEXTURE_MODE has three options: - GL_LUMINANCE: <X, X, X, 1> - GL_INTENSITY: <X, X, X, X> - GL_ALPHA: <0, 0, 0, X> The default value is GL_LUMINANCE, and most applications don't seem to alter DEPTH_TEXTURE_MODE. Make that our precompile guess. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
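The three DEPTH_TEXTURE_MODE swizzles listed in this entry can be written out as a small lookup. The enum `DepthMode` and the function `expand_depth` are hypothetical names used only for illustration; they are not Mesa identifiers.

```cpp
#include <array>

// Hypothetical mirror of the three DEPTH_TEXTURE_MODE options.
enum class DepthMode { Luminance, Intensity, Alpha };

// Expand the single depth-comparison result X into an RGBA vector according
// to the swizzles quoted in the commit message.
std::array<float, 4> expand_depth(DepthMode mode, float x)
{
    switch (mode) {
    case DepthMode::Luminance: return {x, x, x, 1.0f};        // <X, X, X, 1>
    case DepthMode::Intensity: return {x, x, x, x};           // <X, X, X, X>
    case DepthMode::Alpha:     return {0.0f, 0.0f, 0.0f, x};  // <0, 0, 0, X>
    }
    return {x, x, x, 1.0f};  // fall back to the GL default (luminance)
}
```

Guessing the luminance form at precompile time only pays off when the application leaves the mode at its default; otherwise the precompiled variant goes unused and a recompile with the real swizzle is needed.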
|
f3d0daf7ea7e42ff9ce11e8bd6fba1059a2406e8 |
|
26-Aug-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Index sampler program key data by linker-assigned index. Now that most things are based on the linker-assigned index, it makes sense to convert the arrays in the VS/WM program key as well. It seems silly to leave them indexed by texture unit. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ab17762c70852ca8fc400d7b5c6696d412ff2afe |
|
14-Aug-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Only set proj_attrib_mask for fixed function. brw_wm_prog_key's proj_attrib_mask field is designed to enable an optimization for fixed-function programs, letting us avoid projecting attributes where the divisor is 1.0. However, for shaders, this is not useful, and is pretty much impossible to guess when building the FS precompile key. Turning it off for shaders should allow the precompile to work and not lose much. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Suggested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b6b1fc1261e86e2aa03ae8d2dd587c88a207354f |
|
14-Aug-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Don't set vp_outputs_written in the WM program key on Gen6+. It's only used on pre-Sandybridge hardware. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a3685544e1e88828c4931059686cf3acc199079c |
|
14-Aug-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Don't set iz_lookup in the FS precompile's program key on Gen6+. We already changed the actual program key builder to only set these bits on gen < 6; this patch just brings the precompile state back in line so it doesn't mismatch every time. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
43e3a7533d5537e48cef23588131dd25d938ee4b |
|
14-Aug-2012 |
Eric Anholt <eric@anholt.net> |
i965: Fix the scaling of seconds to ms in perf debug. *headdesk*
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
006c1a3c652803e2ff8d5f7ea55c9cb5d8353279 |
|
07-Aug-2012 |
Eric Anholt <eric@anholt.net> |
i965: Add perf debug for stalls during shader compiles. v2: fix bad comment from before I gave up and decided to just use doubles. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fc3b7c9b56701f23b002543de33a8d8c43f9bdc2 |
|
12-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965: Add performance debug for shader recompiles. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d72ff03e699e78381049e29d89163519e6730dd4 |
|
12-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965: Add INTEL_DEBUG=perf for failure to compile 16-wide shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7426d9d7699452f15f3288e781e1791d8d00a64a |
|
19-Jul-2012 |
Olivier Galibert <galibert@pobox.com> |
i965/fs: Fix the FS inputs setup when some SF outputs aren't used in the FS. If there was an edge flag or a two-side-color pair present, we'd end up mismatched and read values from earlier in the VUE for later FS inputs. v2: Fix regression in gles2conform shaders generating point size. (change by anholt) Signed-off-by: Olivier Galibert <galibert@pobox.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 8.0 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
90de96ff0d6d54ba0f9a337a6a107acf4134682d |
|
21-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for loading uniform buffer variables as pull constants. Variable array indexing isn't finished, because the lowering pass turns it all into conditional moves of constant index accesses so I can't test it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
454dc83f66643e66ea7ee9117368211f0cfe84d7 |
|
21-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Communicate the pull constant block read parameters through fs_regs. I wanted to add the surface index as a variable value for UBO support, and a reg seemed like the obvious way to go. This exposes more of the information to CSE, which we'll probably want to apply to pull constant loads for UBOs eventually (you might access 4 floats in a row, each of which would produce an oword block read of the same block). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cc44aa77490e1360b099eb0b887266f434298b4f |
|
21-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965: Remove unused param conversion code. Ever since ctx->NativeIntegers was set, the conversion flag has been PARAM_NO_CONVERT. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d08fdacd58dfa6b1926e9df4707dd9e8dd5370c5 |
|
20-Jun-2012 |
Paul Berry <stereotype441@gmail.com> |
i965: Avoid unnecessary recompiles for shaders that don't use dFdy(). The i965 back-end needs to compile dFdy() differently for FBOs and window system framebuffers, because Y coordinates are flipped between the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs). This patch avoids unnecessarily recompiling shaders that don't use dFdy(), by only setting render_to_fbo in the wm program key if the shader actually uses dFdy(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
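The key-building rule in this entry can be sketched as follows. `WmProgKey`, `ShaderInfo`, `build_key` and `same_key` are hypothetical, pared-down names; the real program key has many more fields and the driver's lookup logic is not shown here.

```cpp
// Hypothetical, pared-down program key: only the fields needed for the sketch.
struct WmProgKey {
    unsigned program_id;
    bool render_to_fbo;
};

// Hypothetical summary of what the compiler knows about a shader.
struct ShaderInfo {
    unsigned id;
    bool uses_dfdy;
};

WmProgKey build_key(const ShaderInfo& shader, bool drawing_to_fbo)
{
    WmProgKey key{};
    key.program_id = shader.id;
    // Let the FBO state reach the key only when the generated code actually
    // depends on the Y orientation, i.e. when the shader uses dFdy().
    key.render_to_fbo = shader.uses_dfdy && drawing_to_fbo;
    return key;
}

// Two keys that compare equal can share one compiled program.
bool same_key(const WmProgKey& a, const WmProgKey& b)
{
    return a.program_id == b.program_id && a.render_to_fbo == b.render_to_fbo;
}
```

Under this scheme a shader without dFdy() produces the same key for FBO and window-system rendering, so switching render targets does not force a second compile.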
|
a454f8ec6df9334df42249be910cc2d57d913bff |
|
07-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs.h: Refactor tests for instructions modifying a register. There's one instance of a potential behavior change: propagate_constants may now propagate into a part of a vgrf after a different part of it was overwritten by a send that returns multiple registers. I don't think we ever generate IR that meets that condition, but it's something to note if we bisect behavior change to this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fc01376c50c15938f3b78431023ca3281304663d |
|
06-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Replace usage of is_tex() with regs_written() checks. In these places, we care about any sort of send that hits more than one reg, not just textures. We don't yet have anything else returning more than one reg, so there's no change. v2: Use mlen instead of is_tex() for the is-it-a-send check. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a6411520b40d59a8806289c7aaea4a6b26a54443 |
|
06-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Rename virtual_grf_next to virtual_grf_count. "count" is a more useful name, since most of the time we're using it for looping over the variables. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b546aebae922214dced54c75e6f64830aabd5d1c |
|
10-Jul-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Delete previous workaround for textureGrad with shadow samplers. It had many problems: - The shadow comparison was done post-filtering. - It required state-dependent recompiles whenever the comparison function changed. - It didn't even work: many cases hit assertion failures. - I never implemented it for the VS. The new lowering pass which converts textureGrad to textureLod by computing the LOD value works much better. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2343fe9a5d1786413453e6e8e5c7700143d68a26 |
|
05-Jun-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Invalidate live intervals in passes that remove an instruction. Since live intervals are based on ip, removing an instruction trashes the intervals unless we were to go do some surgery. These happen to usually remove a use of a grf, so it's time to recalculate, anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 8.0 release branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fe27916ddf41b9fb60c334c47c1aa81b8dd9005e |
|
04-Jul-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Move class functions from the header to .cpp files. Cuts compile time for brw_fs.h changes from 2.7s to .7s and reduces i965_dri.so size by 70k. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8313f44409ceb733e9f8835926364164237b3111 |
|
21-Jun-2012 |
Paul Berry <stereotype441@gmail.com> |
i965/msaa: Fix centroid interpolation of unlit pixels. From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM: Barycentric Interpolation Mode): "Errata: When Centroid Barycentric mode is required, HW may produce incorrect interpolation results when a 2X2 pixels have unlit pixels." To work around this problem, after doing centroid interpolation, we replace the centroid-interpolated values for unlit pixels with non-centroid-interpolated values (which are interpolated at pixel centers). This produces correct rendering at the expense of a slight increase in shader execution time. I've conditioned the workaround with a runtime flag (brw->needs_unlit_centroid_workaround) in the hopes that we won't need it in future chip generations. Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4} {centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation tests pass now. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d1056541e239dfcee0ad6af2fd2d9fab37dbf025 |
|
18-Jun-2012 |
Paul Berry <stereotype441@gmail.com> |
i965/msaa: Add backend support for centroid interpolation. This patch causes the fragment shader to be configured correctly (and the correct code to be generated) for centroid interpolation. This required two changes: brw_compute_barycentric_interp_modes() needs to determine when centroid barycentric coordinates need to be included in the pixel shader thread payload, and fs_visitor::emit_general_interpolation() needs to interpolate using the correct set of barycentric coordinates. Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4} centroid-edges" on i965. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cf0e7aa9f8bc9c175ebd9b2ab3a8bfec4afc5abf |
|
21-Jun-2012 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Refactor interpolation code to prepare for adding centroid support. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
82d25963a838cfebdeb9b080169979329ee850ea |
|
20-Jun-2012 |
Paul Berry <stereotype441@gmail.com> |
i965: Compute dFdy() correctly for FBOs. On i965, dFdx() and dFdy() are computed by taking advantage of the fact that each consecutive set of 4 pixels dispatched to the fragment shader always constitutes a contiguous 2x2 block of pixels in a fixed arrangement known as a "sub-span". So we calculate dFdx() by taking the difference between the values computed for the left and right halves of the sub-span, and we calculate dFdy() by taking the difference between the values computed for the top and bottom halves of the sub-span. However, there's a subtlety when FBOs are in use: since FBOs use a coordinate system where the origin is at the upper left, and window system framebuffers use a coordinate system where the origin is at the lower left, the computation of dFdy() needs to be negated for FBOs. This patch modifies the fragment shader back-ends to negate the value of dFdy() when an FBO is in use. It also modifies the code that populates the program key (brw_wm_populate_key() and brw_fs_precompile()) so that they always record in the program key whether we are rendering to an FBO or to a window system framebuffer; this ensures that the fragment shader will get recompiled when switching between FBO and non-FBO use. This will result in unnecessary recompiles of fragment shaders that don't use dFdy(). To fix that, we will need to adapt the GLSL and NV_fragment_program front-ends to record whether or not a given shader uses dFdy(). I plan to implement this in a future patch series; I've left FIXME comments in the code as a reminder. Fixes Piglit test "fbo-deriv". NOTE: This is a candidate for stable release branches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
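The sub-span derivative scheme described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `SubSpan`, `ddx`, and `ddy` are hypothetical names, and the row order and the "upper minus lower" sign for window-system framebuffers are assumptions made for the example.

```cpp
// Hypothetical model of one 2x2 sub-span; this is not a Mesa data structure.
// Assumption: the rows are listed in hardware order, upper row first.
struct SubSpan {
    float upper_left, upper_right;
    float lower_left, lower_right;
};

// dFdx: difference between the right and left halves (upper row shown).
inline float ddx(const SubSpan &s) {
    return s.upper_right - s.upper_left;
}

// dFdy: difference between the two rows of the sub-span.
// Assumed convention: "upper minus lower" for window-system framebuffers
// (origin at the lower left); the result is negated for FBOs, whose origin
// is at the upper left.
inline float ddy(const SubSpan &s, bool render_to_fbo) {
    float d = s.upper_left - s.lower_left;
    return render_to_fbo ? -d : d;
}
```

Because `render_to_fbo` changes the generated result, it has to be part of whatever key selects the compiled shader, which is why the commit records it in the program key.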
|
f220f73b9c5aca16ca21ea8bbbbf8718703b12cf |
|
08-May-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Do more register coalescing by using the interference graph. By using the live variables code for determining interference, we can handle coalescing in the presence of control flow, which the other register coalescing path couldn't. Total instructions: 207184 -> 206990 74/1246 programs affected (5.9%) 33993 -> 33799 instructions in affected programs (0.6% reduction) There is a newerth shader that loses out, because of some extra MOVs that now get their dead-code nature obscured by coalescing. This should be fixed by doing better at dead code elimination.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d7787adda8006506545256547d8d590a282487af |
|
08-May-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for copy propagation. We could do more by handling abs/negate and non-GRF sources, but this is a good start. Improves tropics performance 0.30% +/- .17% (n=43). shader-db results: Total instructions: 208032 -> 207184 60/1246 programs affected (4.8%) 23286 -> 22438 instructions in affected programs (3.6% reduction) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a4e9b5a768d2d9e59b6054148afb6a6b94c0e4e6 |
|
11-May-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Add a local common subexpression elimination pass. Total instructions: 18210 -> 17836 49/163 programs affected (30.1%) 12888 -> 12514 instructions in affected programs (2.9% reduction) This reduces Lightsmark's "Scale down filter" shader from 395 instructions to 283, a whopping 28%. It also reduces register pressure significantly: the SIMD8 program now uses 29 registers instead of 101, giving us more than enough room for a SIMD16 program. v2: Add && !inst->conditional_mod to the "skip some instructions" check. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d1029f99884e2ba7f663765274cd6bdb4f82feed |
|
11-May-2012 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Use a const reference in fs_reg::equals instead of a pointer. This lets you omit some ampersands and is more idiomatic C++. Using const also marks the function as not altering either register (which was obvious, but nice to enforce). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4433b0302d0aa9dc61002e8bb4fd1b752b0be338 |
|
20-Apr-2012 |
Brian Paul <brianp@vmware.com> |
intel: use _mesa_is_winsys/user_fbo() helpers Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
34b17ee598e855e1090a455c2dac31ed8104954b |
|
11-Apr-2012 |
Eric Anholt <eric@anholt.net> |
i965: Move the old live interval analysis code next to the new live vars code. I'm about to replace the insides of this using the new analysis.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
32ae8d3b321185a85b73ff703d8fc26bd5f48fa7 |
|
10-Mar-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Try to avoid generating extra MOVs to do saturates. This change (before the previous two) produced a .23% +/- .11% performance improvement in Unigine Tropics at 1024x768 on IVB. Total instructions: 269270 -> 262649 614/2148 programs affected (28.6%) 179386 -> 172765 instructions in affected programs (3.7% reduction) v2: Move some of the logic of finding the instruction that produced the result of an expression tree to a helper.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
43af02ac731dac7d80f7e47feb0c80e4da156769 |
|
27-Feb-2012 |
Yuanhan Liu <yuanhan.liu@linux.intel.com> |
i965: handle gl_PointCoord for Gen4 and Gen5 platforms This patch adds support for the gl_PointCoord GL builtin variable on Gen4 and Gen5 (ILK) platforms. Unlike Gen6+, these have no hardware support for gl_PointCoord, meaning the hardware will not calculate the interpolation coefficients for you. Instead, you have to handle it yourself in the SF shader stage. Unfortunately, gl_PointCoord is an FS rather than a VS builtin variable, so it is not included in the c.vue_map generated in the VS stage, and the current code is not aware of this attribute. To handle it correctly, we need to add it to c.vue_map manually so that the SF shader generates the interpolation coefficients needed by the FS. The SF stage has its own copy of vue_map, so I think it's safe to do this manually. Since handling gl_PointCoord on Gen4 and Gen5 platforms is a bit of a special case, I added a lot of comments and hope I didn't overdo it ;) v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency and also add the _NEW_BUFFERS dirty mask (Eric). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975 Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord NOTE: This is a candidate for stable release branches. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a7f46eadea4555ed377928d4e3f89db4a445312e |
|
09-Feb-2012 |
Eric Anholt <eric@anholt.net> |
i965: Report the failure message when failing to compile the fragment shader. We just abort later, but at least this should result in more informative bug reports. NOTE: This is a candidate for release branches. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a4586d2e2e444d1212d4abfd1ea5bbeff4503feb |
|
27-Jan-2012 |
Eric Anholt <eric@anholt.net> |
intel: Comment typo fix.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
30f86aec01e1e1df4265d10a4618e34e9b8fec95 |
|
06-Jan-2012 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix projector==1.0 optimization pre-gen6. The optimization was supposed to turn an attribute component that was always 1.0 into a mov of 1.0. But because the loop this patch removes was left outside that test, we applied the projection correction to the 1.0 and got some other value, breaking openarena once it was converted to using the new compiler backend. Originally this hunk was separate from the former loop to make the generated instructions slightly better pipelined. We now have automatic instruction scheduling to handle that, and the generated instruction sequence looked the same to me after this change (except for the bugfix).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
069901e2f5a8f4a58047d25335f2526f1acc7234 |
|
19-Dec-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow constant propagation into IF with embedded compare. This saves a couple of instructions on most programs with control flow. More interestingly, 6 shaders from unigine sanctuary now fit into 16-wide without register spilling.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1b05fc7cdd0e5d77b50bc8ee2f2c851da5884d72 |
|
07-Dec-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Factor out texturing related data from brw_wm_prog_key. The idea is to reuse this for the VS and (in the future) GS as well. v2: Include yuvtex data since we're not dropping GL_MESA_ycbycr. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v1] Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
febad1779ae5cb5c85d66c2635baea62da52d2fa |
|
26-Oct-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Rename texturing ops from FS_OPCODE to SHADER_OPCODE, except TXB. We'll be reusing most of these for the VS shortly. The one exception is TXB (texturing with LOD bias), which is explicitly forbidden in the VS. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a73c65c5342bf41fa0dfefe7daa9197ce6a11db4 |
|
18-Oct-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Enable faster workaround-free math on Ivybridge. According to the documentation, Ivybridge's math instruction works in SIMD16 mode for the fragment shader, and no longer forbids align16 mode for the vertex shader. The documentation claims that SIMD16 mode isn't supported for INT DIV, but empirical evidence shows that it works fine. Presumably the note is trying to warn us that the variant that returns both quotient and remainder in (dst, dst + 1) doesn't work in SIMD16 mode since dst + 1 would be sechalf(dst), trashing half your results. Since we don't use that variant, we don't care and can just enable SIMD16 everywhere. The documentation also still claims that source modifiers and conditional modifiers aren't supported, but empirical evidence and study of the simulator both show that they work just fine. Goodbye workarounds. Math just works now. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8fad0f99989866eeb72889a84f12f6f817334ddb |
|
02-Nov-2011 |
Paul Berry <stereotype441@gmail.com> |
i965: Fix constant propagation into 32-bit integer MUL. i965's MUL instruction can't take an immediate value as its first argument. So normally, if constant propagation wants to propagate a constant into the first argument of a MUL instruction, it swaps the order of the two arguments. This doesn't work for 32-bit integer (and unsigned integer) multiplies, because the MUL operation is asymmetric in that case (it multiplies 16 bits of one operand by 32 bits of the other). Fixes piglit tests {vs,fs}-multiply-const-{ivec4,uvec4}. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
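The asymmetry mentioned in the commit above can be illustrated with a small model. The sketch below is not taken from the driver: `mul_16x32`, `RegType`, and `can_swap_mul_operands` are hypothetical names, and which operand contributes only 16 bits is an assumption made for the example (the commit message does not say).

```cpp
#include <cstdint>

// Illustrative model of an asymmetric integer multiply.
// Assumption: only the low 16 bits of src0 take part, while src1 is used
// in full. The result wraps modulo 2^32, as unsigned arithmetic does.
inline uint32_t mul_16x32(uint32_t src0, uint32_t src1) {
    return (src0 & 0xffffu) * src1;
}

// Hypothetical register types for the sketch.
enum class RegType { Float, Int32, UInt32 };

// Constant propagation may swap the MUL operands to move an immediate out
// of the first slot only when the multiply is symmetric; in this model
// that holds for floats but not for 32-bit integer types.
inline bool can_swap_mul_operands(RegType type) {
    return type == RegType::Float;
}
```

With this model, `mul_16x32(0x10001, 3)` and `mul_16x32(3, 0x10001)` give different results, which is the kind of mismatch a blind operand swap would introduce.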
|
9734bd05608c00a1d84851f3d46d5deb52e75d5e |
|
25-Oct-2011 |
Paul Berry <stereotype441@gmail.com> |
i965: Fix flat integral varyings. Previously, the vertex and fragment shader back-ends assumed that all varyings were floats. In GLSL 1.30 this is no longer true--they can also be of integral types provided that they have an interpolation qualifier of "flat". This required two changes in each back-end: assigning the correct type to the register that holds the varying value during shader execution, and assigning the correct type to the register that ties the varying value to the rest of the graphics pipeline (the message register in the case of VS, and the payload register in the case of FS). Fixes piglit tests fs-int-interpolation and fs-uint-interpolation. Acked-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5aa96286e7e1a5380673eb75e8653616b48751fd |
|
22-Oct-2011 |
Paul Berry <stereotype441@gmail.com> |
i965/gen6+: Add support for noperspective interpolation. This required the following changes: - WM setup now makes the appropriate set of barycentric coordinates (perspective vs. noperspective) available to the fragment shader, based on whether the shader requires perspective interpolation, noperspective interpolation, both, or neither. - The fragment shader backend now uses the appropriate set of barycentric coordinates when interpolating, based on the interpolation mode returned by ir_variable::determine_interpolation_mode(). - SF setup now uses gl_fragment_program::InterpQualifier to determine which attributes are to be flat shaded (as opposed to the old logic, which only flat shaded colors). - CLIP setup now ensures that the clipper outputs non-perspective barycentric coordinates when they are needed by the fragment shader. Fixes the remaining piglit tests of interpolation qualifiers that were failing: - interpolation-flat-*-smooth-none - interpolation-flat-other-flat-none - interpolation-noperspective-* - interpolation-smooth-gl_*Color-flat-* Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f8386a29f07c6a41c4afb99fc3ecd9f18e9151e8 |
|
21-Oct-2011 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: use determine_interpolation_mode(). This patch changes how fs_visitor::emit_general_interpolation() decides what kind of interpolation to do. Previously, it used the shade model to determine how to interpolate colors, and used smooth interpolation on everything else. Now it uses ir_variable::determine_interpolation_mode(), so that it respects GLSL 1.30 interpolation qualifiers. Fixes piglit tests interpolation-flat-*-smooth-{distance,fixed,vertex} and interpolation-flat-other-flat-{distance,fixed,vertex}. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e04bdeae82797dbdcf6f544a997a4626fdfd4aee |
|
22-Oct-2011 |
Paul Berry <stereotype441@gmail.com> |
i965/gen6+: Parameterize barycentric interpolation modes. This patch modifies the fragment shader back-end so that instead of using a single delta_x/delta_y register pair to store barycentric coordinates, it uses an array of such register pairs, one for each possible interpolation mode. When setting up the WM, we instruct it to only provide the barycentric coordinates that are actually needed by the fragment shader--that is computed by brw_compute_barycentric_interp_modes(). Currently this function returns just BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only interpolation mode we support. However, that will change in a later patch. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
102bdd26e1acf1ebf75ef85b62df2400239fd480 |
|
21-Oct-2011 |
Paul Berry <stereotype441@gmail.com> |
i965/fs: Fix split_virtual_grfs() when delta_xy not in a virtual register. This patch modifies the special case in fs_visitor::split_virtual_grfs() that prevents splitting from being applied to the delta_x/delta_y register pair (this register pair needs to remain contiguous so that it can be used by the PLN instruction). When gen>=6, this register pair is in a fixed location, not a virtual register, so it was in no danger of being split. And split_virtual_grfs' attempt not to split it was preventing some other unrelated register from being split. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
73b0a28ba8b3e2ab917d4c729f34ddbde52c9e88 |
|
04-Oct-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix comparisons with uint negation. The condmod instruction ends up generating garbage condition codes, because apparently the comparison happens on the accumulator value (33 bits for UD), not the truncated value that would be written. Fixes fs-op-neg-* Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2e5a1a254ed81b1d3efa6064f48183eefac784d0 |
|
07-Oct-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
intel: Convert from GLboolean to 'bool' from stdbool.h. I initially produced the patch using this bash command: for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i 's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i 's/GL_FALSE/false/g' $file; done Then I manually added #include <stdbool.h> to fix compilation errors, and converted a few functions back to GLboolean that were used in core Mesa's function pointer table to avoid "incompatible pointer" warnings. Finally, I cleaned up some whitespace issues introduced by the change. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chad Versace <chad@chad-versace.us> Acked-by: Paul Berry <stereotype441@gmail.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d06cc42c3c85382600176d118d8bf492b4de6a55 |
|
07-Oct-2011 |
Paul Berry <stereotype441@gmail.com> |
i965: Fix computation of abs(-x) in FS When updating a register reference to reflect the fact that we were taking its absolute value, the fragment shader back-end failed to clear the negate flag, resulting in abs(-x) getting computed as -abs(x). I also found (and fixed) a similar problem in brw_eu.h, but I'm not aware of an actual manifestation of that problem. Fixes piglit test glsl-fs-abs-neg-with-intermediate.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
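The bug described in the commit above can be reproduced in a toy model of source modifiers. This is a hedged sketch only: `SrcReg`, `evaluate`, `take_abs_buggy`, and `take_abs_fixed` are hypothetical names, and the evaluation order (absolute value first, then negation) is an assumption chosen to match the behavior the commit message reports.

```cpp
#include <cmath>

// Hypothetical, simplified register reference carrying source modifiers.
struct SrcReg {
    float value;
    bool negate;
    bool abs;
};

// Assumed modifier order: absolute value is applied first, then negation.
inline float evaluate(const SrcReg &r) {
    float v = r.value;
    if (r.abs) v = std::fabs(v);
    if (r.negate) v = -v;
    return v;
}

// Buggy variant: sets abs but keeps a stale negate flag, so taking the
// absolute value of a negated reference evaluates to -abs(x).
inline SrcReg take_abs_buggy(SrcReg r) {
    r.abs = true;
    return r;
}

// Fixed variant: taking the absolute value discards any earlier negation.
inline SrcReg take_abs_fixed(SrcReg r) {
    r.abs = true;
    r.negate = false;
    return r;
}
```

For a reference holding 3.0 with `negate` set, the buggy helper still evaluates to a negative number, while the fixed helper yields 3.0.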
|
de772c402215b956ab3aa0875330fc1bf7cdf95b |
|
21-Aug-2011 |
Ian Romanick <ian.d.romanick@intel.com> |
mesa: Use gl_shader_program::_LinkedShaders instead of FragmentProgram Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4170227407eea7fd8287b17480a37309bf73f4e4 |
|
07-Oct-2011 |
Brian Paul <brianp@vmware.com> |
i965: silence unused var warnings in non-debug builds Reviewed-by: Chad Versace <chad@chad-versace.us>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b9af592dfa8f8d0fe9f29c2d48bf6846cbd5c50f |
|
29-Sep-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Reverse the operands for INT DIV prior to Gen6. Apparently on Gen4 and 5, the denominator comes first. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ff8f272b0d02b41a0ce34ab6af7119b9e06f4961 |
|
29-Sep-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Implement integer quotient and remainder math operations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
30be2cc6c7c3378ee17885b5bf41d7ae53bf6fe0 |
|
26-Aug-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Implement texelFetch() on Ironlake and Sandybridge. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
23eec54bb0f368d9c88894b544b4af8f01cae2ae |
|
07-Sep-2011 |
Brian Paul <brianp@vmware.com> |
i965: add casts to silence int/enum conversion warnings
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
51e7b058750cc480c296d45f773d7a5a662457f5 |
|
06-Sep-2011 |
Brian Paul <brianp@vmware.com> |
mesa: put _mesa_ prefix on vert_result_to_frag_attrib()
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6489a1d5bab75589569658d374257bf23cb67a23 |
|
30-Aug-2011 |
Paul Berry <stereotype441@gmail.com> |
Refactor code that converts between gl_vert_result and gl_frag_attrib. Previously, this conversion was duplicated in several places in the i965 driver. This patch moves it to a common location in mtypes.h, near the declaration of gl_vert_result and gl_frag_attrib. I've also added comments to remind us that we may need to revisit the conversion code when adding elements to gl_vert_result and gl_frag_attrib. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2f0edc60f4bd2ae5999a6afa656e3bb3f181bf0f |
|
26-Aug-2011 |
Chad Versace <chad@chad-versace.us> |
i965: Fix Android build by removing relative includes Replace each occurrence of #include "../glsl/*.h" with #include "glsl/*.h" Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Chad Versace <chad@chad-versace.us>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ecf8963754489abfb5097c130a9bcd4cdb76b6bd |
|
19-Jun-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Implement textureSize (TXS) on Gen5+. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e98ee06776e0ba055e0194836d5813a0bc7e7795 |
|
12-Aug-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Don't double-convert integer/boolean uniforms. When ctx->Const.NativeIntegers is set, Core Mesa loads integer/boolean uniforms directly, rather than loading the floating point equivalent. So, when that's set, we don't need to perform any conversions. Unfortunately, we can't properly support native integers with the old vertex shader backend, so this patch leaves them disabled for now. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7fbe7fe13359d3f349664410ec73d7bd48824ed6 |
|
11-Aug-2011 |
Eric Anholt <eric@anholt.net> |
i965/vs: Run the shader backend at link time and return compile failures. Link failure is something that shouldn't happen, but we sometimes want it during development. The precompile also allows analysis of shader codegen with shader-db.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
65b5cbbcf783f6c668ab5b31a0734680dd396794 |
|
05-Aug-2011 |
Eric Anholt <eric@anholt.net> |
i965: Rename math FS_OPCODE_* to SHADER_OPCODE_*. I want to just use the same enums in the VS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6034b9a5124475d300d0678bd2fb6160865fa972 |
|
03-May-2011 |
Eric Anholt <eric@anholt.net> |
i965: Create a shared enum for hardware and compiler-internal opcodes. This should make gdbing more pleasant, and it might be used in sharing part of the codegen between the VS and FS backends.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c9e81fe14f36933617c862efb15ae09194485eab |
|
15-May-2011 |
Eric Anholt <eric@anholt.net> |
i965: Drop the reg/hw_reg distinction. "reg" was set in only one case, virtual GRFs pre register allocation, and would be unset and have hw_reg set after allocation. Since we never bothered with looking at virtual GRF number after allocation anyway, just use the same storage and avoid confusion.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b76378d46a211521582cfab56dc05031a57502a6 |
|
04-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Eliminate the magic nature of virtual GRF 0. This was a debugging aid at one point -- virtual grf 0 should never be allocated, and it would be used if undefined register access occurred in codegen. However, it made the confusing register allocation code even more confusing by indexing things off of 1 all over.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ee0373b833155804bb8846c6f05f897b9ee5afa6 |
|
26-Jul-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Don't upload unused uniform components. This saves both register space and upload bandwidth for unused values. Note that previously we were relying on the visitor not initially generating references to different sets of uniforms between the 8-wide and 16-wide code generation, and now we're relying on them dead-code eliminating the same stuff, too.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4683529048ee133481b2d8f1cae1685aa1736f9a |
|
04-Aug-2011 |
Bryan Cain <bryancain3@gmail.com> |
Merge branch 'glsl-to-tgsi' Conflicts: src/mesa/state_tracker/st_atom_pixeltransfer.c src/mesa/state_tracker/st_program.c
|
54db6e618e43abbd69b59e0a03e2b6ec83d3120f |
|
30-Jun-2011 |
Bryan Cain <bryancain3@gmail.com> |
r200, r600c, i965: fix build
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f710b8c7501f29f5f8941e757ea1066cbeb03305 |
|
23-Jul-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Allow register coalescing where the source is a uniform. Removes 0.8% of the fragment shader instructions on Unigine Tropics.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a8b86459a1bb74cfdf0d63572a9fe194b2b5b53f |
|
23-Jul-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Optimize a * 1.0 -> a. This appears in our instruction stream as a result of the brw_vs_constval.c handling.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
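An algebraic simplification of the kind named in the commit above might look like the sketch below. The `Opcode`, `Operand`, and `Inst` types and the `simplify_mul_by_one` helper are hypothetical stand-ins rather than the driver's IR, and only the case where the constant sits in the second source is handled.

```cpp
// Hypothetical miniature IR for the sketch; not the i965 backend's types.
enum class Opcode { MOV, MUL };

struct Operand {
    bool is_immediate;  // true if the operand is a float constant
    float imm;          // constant value, meaningful when is_immediate
    int reg;            // register number, meaningful otherwise
};

struct Inst {
    Opcode op;
    Operand src0;
    Operand src1;
};

// Rewrites "MUL dst, a, 1.0" into "MOV dst, a".
// Returns true if the instruction was changed.
inline bool simplify_mul_by_one(Inst &inst) {
    if (inst.op != Opcode::MUL)
        return false;
    if (!inst.src1.is_immediate || inst.src1.imm != 1.0f)
        return false;
    inst.op = Opcode::MOV;
    inst.src1 = Operand{false, 0.0f, -1};  // MOV has one source; mark unused
    return true;
}
```

A pass built on such a helper would typically report whether it made progress, so that later passes such as register coalescing can be re-run on the simplified instruction stream.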
|
6d8d6b41b85a18685351f3023a4cd41266ba9e68 |
|
23-Jul-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: If we see a RCP of a constant, try to constant fold it.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
eb30820f268608cf451da32de69723036dddbc62 |
|
23-Jul-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Port texture projection avoidance optimization from the old backend. This is part of fixing a ~1% performance regression in OpenArena when changing the fixed function fragment shader to using the new backend. Right now this just avoids the LINTERP of the projector, not the math using it.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
44ffb4ae207e48f78fae55925601b8708ed09c1d |
|
29-Jul-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Stop using the exec_list iterator. The old style has gone out of favor in the project, but I kept copy and pasting from existing iterator code.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6430df37736d71dd2bd6f1fe447d39f0b68cb567 |
|
10-Jun-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Add support for TXD with shadow comparisons. Our hardware doesn't have a sample_d_c message, so we have to do a regular sample_d and emit instructions to manually perform the comparison. This requires a state dependent recompile whenever the sampler's compare mode or function change. This adds the per-sampler comparison functions to brw_wm_prog_key, but only sets them when the sampler's compare mode is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ad9481e12813d5f1dec95ce123927e132fa935fb |
|
11-Jun-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Check for compilation failure and bail before optimizing. Prior to this patch, it would attempt to optimize and allocate registers for the program even if it failed to compile. This seems wasteful. More importantly, the "message length > 11" failure seems to choke the instruction scheduler, making it somehow use an undefined value and segmentation fault. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c173541d9769d41a85cc899bc49699a3587df4bf |
|
27-Apr-2011 |
Eric Anholt <eric@anholt.net> |
i965: Use state streaming on programs, and state base address on gen5+. There will be a little bit of thrashing of the program cache BO as the cache warms up, but once the application is in steady state, this reduces relocations on gen5 and later. On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6% +/- 1.3% (n=6). No statistically significant performance difference on nexuiz (n=5).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
23ef4a6063668c187d00a0502207f0c03be5f994 |
|
10-Jun-2011 |
Eugeni Dodonov <eugeni@mandriva.com> |
Fix format not a string literal error with -Werror=format-security A trivial fix for the error "format not a string literal and no format arguments" when compiling with the -Werror=format-security flag. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c331b3123ecda127919458e24848b7c1596525ac |
|
12-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Use the embedded compare in SEL on gen6+. This avoids the extra CMP and the predication on SEL, so in addition to one less instruction, it makes scheduling less constrained. Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces FS instruction count across affected shaders in shader-db by 1.3% without regressing any. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8752764076e5b3f052a57e0134424a37bf2e9164 |
|
17-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Do a FS compile up front at link time to produce link errors. At glLinkShaders time, a fail() call in FS compile in 8-wide (the one that's required to succeed, though we may relax that at some point for pre-Ironlake performance) will now report out as a link error.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d1f70a8a6c6ec7007bad22d3d6013415be2d243a |
|
25-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Split the GLSL IR -> FS LIR visitor to brw_fs_visitor.cpp. We now have: brw_fs.cpp handles calling out to everything and optimization. brw_fs_visitor.cpp handles translating to our LIR. brw_fs_emit.cpp handles emitting from our LIR to native code. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
11dd9e9c0fcf9985b90ff4b63b2833345fece027 |
|
25-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Split the BRW native code emit to brw_fs_emit.cpp This is all separate from the visitor and the optimization passes which feed into it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b7b700aeb0eab2cae26a01d9db42feea969333c7 |
|
26-May-2011 |
Eric Anholt <eric@anholt.net> |
i965: Move a couple of GLSL IR -> BRW helper functions to brw_shader.cpp. These will be used by the VS backend as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
14b86f3c9131c1b26b01e07679cc899df0885b23 |
|
26-May-2011 |
Eric Anholt <eric@anholt.net> |
i965: Move non-FS-specific shader support to brw_shader.cpp. These only existed in brw_fs.cpp because it was the only .cpp file in the area when I wrote them. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
53c89c67f33639afef951e178f93f4e29acc5d53 |
|
27-Apr-2011 |
Eric Anholt <eric@anholt.net> |
i965: Avoid generating MOVs for assignments of expressions. No statistically significant difference measured in 3dbenchmark egypt/pro. It does reduce fragment shader instructions across shader-db by 0.3%.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1791857d7d950d3d2834bbb09b495f51f43ef7c1 |
|
17-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Move the computation of register block count from unit to compile. No net code size change, but unit update is down 0.8% code size pre-gen6. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
615117ce4efd041459f7d4b0c77aa8e248345e66 |
|
23-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Track fixed GRF regs separate from allocated GRF file in scheduling. There's an assumption here that fixed GRFs will never intersect with the allocated GRFs. That's true today, though it might change some day if we decide to register-allocate the regs containing push constants once they're dead. This fixes a regression in 0f7325b89038937bd428f7c89ed9859189a0ab0b in Lightsmark from the texture instructions now containing g0 references instead of having that be implied. Performance is improved 15.2% +/- 3.6% (n=3). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f147599ef4b0d14c25a7e0d3f9f1c9b0229bb6fc |
|
19-May-2011 |
Eric Anholt <eric@anholt.net> |
i965: Remove linear_color for GL_PERSPECTIVE_CORRECTION_HINT. From the GL 2.1 spec: "Required perspective-correct interpolation for all fragment attributes except depth in sections 3.4.1 and 3.5.1, effectively making GL PERSPECTIVE CORRECT HINT a no-op." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fa42de5ad7ebbc0b81ce6ba0553742f0413690a7 |
|
24-May-2011 |
Eric Anholt <eric@anholt.net> |
i965: Fix assertion failures in unused brw_reg setup by deleting it. I was using undefined values to create an unused value. Go me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37366 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9be8524af753791d26fbd65417c5380b4d934296 |
|
21-May-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Fix sampling on Ivybridge after headerless change. Fixes a regression since 90e922267a89fa9bef254bb257405531ceff7356. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
24de02acaca2ed2e5149a6a026b8707cd0d6d27f |
|
21-May-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove "TXD" from justification of sampler message headers. The coordinate offsets set in the m1 header are for textureOffset; they have nothing to do with textureGrad (TXD). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
90e922267a89fa9bef254bb257405531ceff7356 |
|
12-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Don't emit a header on gen5+ sample messages unless required. Improves glbenchmark egypt performance 0.6% +/- 0.4% (n=6). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4bbc7915f16a8b0dcead3f34aa1b4f0328147bea |
|
12-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix GPU hang on texture2d-bias on pre-Ironlake. In the 16-wide rework, I missed that we were setting some things to be SIMD16 mode (corresponding to their setup in emit_texture_gen4()). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b126a0c0cb30b1e2f2df1953fe14d8596d1cf4f7 |
|
02-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for correct GL_CLAMP behavior by clamping coordinates. This removes the stupid strict-conformance fallback code I broke when adding ARB_sampler_objects. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36572 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7592f005608e6c03d53c18d27d9af84bde802014 |
|
11-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Drop the viewport index/rtai clearing in gen6 fb writes. These fields are documented to be in the payload, and though the FB write docs say they *aren't* in the payload, for all other fields the payload and header is structured so that no overwriting is required except for non-default options.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
136eb2bde769713b100351ff96bceb970f068c0a |
|
10-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for "if" statements in 16-wide mode on gen6+. It turns out there's nothing in the hardware preventing this. It appears that it ought to work on pre-gen6 as well, but just produces GPU hangs. Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by 2.6% +/- 0.6% (n=3). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
27b03926618ddcafabb7b61e652fe6458b017b24 |
|
11-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix discard and alpha test in 16-wide. As of gen6, alt-mode (which we use) MOVs of floats are not raw -- they'll modify infs/nans. This broke discard and alpha test in 16-wide, where apparently the upper 8 bits of the pixel enables being set were causing the whole value to get trashed upon being moved. Treating the values as UD instead of float makes sure they get preserved. While I'm here, replace the two 8-wide moves of the halves of the header with a single compressed move. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
51761a1aefd31b7df12edd9467ac630b9cbbbbc9 |
|
11-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Cut an instruction and a temporary from gen6 discard statements. I thought I was thwarted initially when I couldn't do conditional mod on a MOV, and couldn't use two immediate constants in one instruction. But g0 != g0 is also a way to produce a failing comparison. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5dd5be69f099211db027b6e39150cacefcfdf8b6 |
|
09-May-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix compiler warnings about dead code from 963431829055f63ec94d
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2a95568f64a6641a49a2d4855272e9be2ac2db6d |
|
11-May-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Avoid register coalescing away MATH workarounds on Ivybridge. The MATH instruction cannot handle source modifiers, even on Gen7. So, apply this workaround for Sandybridge on Ivybridge as well. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
64ce592679a5b08d66e3cbbf964f9e695e14aee1 |
|
16-Mar-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add support for IF/ELSE/ENDIF control flow on Ivybridge. Ivybridge's IF instruction doesn't support conditional modifiers. It also introduces UIP, which must point to the ENDIF instruction. ELSE and ENDIF remain the same except that JIP moves from dst to src1. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ff6e3c73f6553cd29b915497b5b00e3ef158a27d |
|
29-Apr-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add support for Ivybridge texturing messages. Ivybridge puts the shadow comparator first, then lod/bias, and finally the coordinate---unlike previous generations which always reserved four slots for the coordinate at the beginning. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5936d96d33e767aa99f6afa92f2a6582ff04df23 |
|
16-May-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Move IF stack handling into the EU abstraction layer/brw_compile. This hides the IF stack and back-patching of IF/ELSE instructions from each of the code generators, greatly simplifying the interface. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
37642518b8864ce751754957b08cdb437998f4e7 |
|
29-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for compute-to-mrf in 16-wide mode. This is more painful than instruction scheduling, as we have to compare two MRF writes to see if they coincide, and have to handle partial GRF writes before that (for example, the result of a math instruction written to color). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
445289b5093acb9abaf7e0a89bfa319fcb4a1c31 |
|
29-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Typo fix a comment. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0834607a891f7c2529d1f2cdeca28b6e98899f8b |
|
25-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Enable constant propagation in 16-wide. All that needed fixing was skipping the newly-possible uncompressed/sechalf partial GRF constant writes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3b20f999bb7e9056e83ca09a842a9747d4ac1674 |
|
23-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for 16-wide dispatch with uniforms in use. This is glued in in a bit of an ugly way -- we rely on the uniforms having been set up by 8-wide dispatch, and we just reuse them without the ability to add new uniforms for any reason, since the 8-wide compile is already completed. Today, this all works out because our optimization passes are effectively the same for both and even if they weren't, we don't reduce the set of uniforms pushed after optimization. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b943b9b1a696cf51adfb2a18bcb9cf503fb2737f |
|
23-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a little whitespace between shader dumping debug. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9c57780dc0604f871650c5d23c06d627d964d803 |
|
28-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for compr4 MRF writes. These reduce an emitted (not decoded) instruction per shader on g4x/gen5, but may allow for additional register coalescing as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
42ad2f0b9b6a18f1613f6d915a46b4a4a89c5aa2 |
|
14-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for 16-wide dispatch on gen5. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
662f1b48bd1a02907bb42ecda889a3aa52a5755d |
|
12-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add initial support for 16-wide dispatch on gen6. At this point it doesn't do uniforms, which have to be laid out the same between 8 and 16. Other than that, it supports everything but flow control, which was the thing that forced us to choose 8-wide for general GLSL support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
76b7a0c1af23838cb5100424a2a88d621b881d05 |
|
24-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for discard instructions in 16-wide mode. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
148a32e622c5b95a4dbd9a8776fddf85ef484147 |
|
29-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for math instructions in 16-wide mode. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
54990673a65b72fd222aeafc19f3a384ce597146 |
|
24-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Fix interference calculation of pixel_[xy] in 16-wide. Fixes glsl-fs-ceil in that mode, which produced the code in the comment. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
af20328271425c217630b5114ee172bd8387a91a |
|
23-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Disable some optimization passes under 16-wide for now. These are fixable for 16, but that can wait until after it's basically working. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8575d1836249309048d77d342671aad65c7fa7ff |
|
13-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for 16-wide texturing on gen5+. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
141b0bb2779c80d3cd3fd21d2e9d10efa0433f26 |
|
21-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add support for computing pixel_[xy] in 16-wide. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7c647a2fe98a645723fa5eace7f7f6c5c26f4f8e |
|
14-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965: Move the destination reg setup for 8/16 wide to the emit code. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4847f802c28e595130bda14055cd52c9b1f51cd7 |
|
09-Apr-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Constant-fold immediates in src0 of SEL instructions. This is like what we do for add/mul, but we have to invert the predicate to choose the other source instead. This removes 5 extra moves of constants in nexuiz shaders. No statistically significant performance difference on my Sandybridge laptop (n=5). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
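Why the predicate has to be inverted can be seen in a small model. This is only a sketch: `sel_model` is a hypothetical stand-in for a predicated select, not the driver's instruction representation.

```cpp
// Hypothetical model of a predicated select: picks src0 when the predicate
// is true and src1 otherwise.
float sel_model(bool predicate, float src0, float src1)
{
    return predicate ? src0 : src1;
}

// Swapping the two sources (for example, to move an immediate out of src0)
// keeps the result unchanged only if the predicate is inverted too.
float sel_swapped(bool predicate, float src0, float src1)
{
    return sel_model(!predicate, src1, src0);
}
```

For either predicate value, `sel_swapped(p, a, b)` returns the same value as `sel_model(p, a, b)`.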
|
811c147220d2630b769e505ce4d40ef9108fe034 |
|
09-Apr-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Constant-fold immediates in src0 of CMP instructions. This is like what we do with add/mul, but we also have to flip the conditional test. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
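The "flip" of the conditional test can be sketched as follows, reading the commit as swapping the two sources so that the immediate no longer sits in src0. The `Cond` enum and both functions are hypothetical illustrations; the driver's own `BRW_CONDITIONAL_*` values are not reproduced here.

```cpp
// Hypothetical condition codes, for illustration only.
enum class Cond { EQ, NEQ, L, LE, G, GE };

// Given "a <cond> b", return the condition that makes "b <cond'> a"
// produce the same result after the operands are swapped.
Cond flip_for_swapped_operands(Cond c)
{
    switch (c) {
    case Cond::L:   return Cond::G;
    case Cond::LE:  return Cond::GE;
    case Cond::G:   return Cond::L;
    case Cond::GE:  return Cond::LE;
    case Cond::EQ:
    case Cond::NEQ: return c;   // symmetric tests are unchanged
    }
    return c;
}

// Reference evaluation of a comparison, used to check the flip.
bool evaluate_cmp(Cond c, float a, float b)
{
    switch (c) {
    case Cond::EQ:  return a == b;
    case Cond::NEQ: return a != b;
    case Cond::L:   return a <  b;
    case Cond::LE:  return a <= b;
    case Cond::G:   return a >  b;
    case Cond::GE:  return a >= b;
    }
    return false;
}
```

Note that this is a swap of the comparison's direction, not a logical negation: `L` becomes `G`, not `GE`.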
|
963431829055f63ec94d88c97a5d07d30e49833a |
|
03-Apr-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Remove broken optimization for live intervals in loops. The theory here was to detect a temporary variable used within a loop, and avoid considering it live across the entire loop. However, it was overeager and failed when the first definition of the variable appeared within the loop but was only conditionally defined. Fixes glsl-fs-loop-redundant-condition.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5d7fefb9afbcc6f1d58a92d07c390e6b912c3b00 |
|
03-Apr-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Switch W and 1/W in Sandybridge interpolation setup. Various documentation mentions that "W" is handed to the WM stage, but further digging seems to indicate that they really mean 1/W. The code here is still unclear, but changing this fixes piglit test "fragcoord_w" on Sandybridge as well as a Khronos ES2 conformance test. I also tested 3DMarkMobile ES2.0's taiji and hoverjet demos, as well as Nexuiz, just to be safe. NOTE: This is a candidate for the 7.10 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a99e80d795f7c6aec0e73369a31d1728577b9727 |
|
25-Mar-2011 |
Ian Romanick <ian.d.romanick@intel.com> |
mesa: Fix ugly indentation left from previous commit Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
89d81ab16c05818b290ed735c1343d3abde449bf |
|
25-Jan-2011 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Calculate Mesa state slots in front-end instead of back-end This should be the last bit of infrastructure changes before generating GLSL IR for assembly shaders. This commit leaves some odd code formatting in ir_to_mesa and brw_fs. This was done to minimize whitespace changes / reindentation in some loops. The following commit will restore formatting sanity. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
252eaa765e69a70036ec33a7e1e0ffeac1aab2ff |
|
29-Mar-2011 |
Chris Wilson <chris@chris-wilson.co.uk> |
i965: Avoid name clash of loop counter and member src/mesa/drivers/dri/i965/brw_fs.cpp:565 warning: name lookup of ‘c’ changed Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0c8beb0ab5e72a9d2ecaad51db16a7d5291e120b |
|
27-Mar-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Fix linear gl_Color interpolation on pre-gen6 hardware. Civilization 4's shaders make heavy use of gl_Color and don't use perspective interpolation. This resulted in rivers, units, trees, and so on being rendered almost entirely white. This is a regression compared to the old fragment shader backend. Found by inspection (comparing the old and new FS backend code). References: https://bugs.freedesktop.org/show_bug.cgi?id=32949 NOTE: This is a candidate for the 7.10 branch.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4e994e150f65c854229b4af12eae5519ebd9dda1 |
|
25-Mar-2011 |
Ian Romanick <ian.d.romanick@intel.com> |
i965/fs: Use different name for inner loop counter 'i' is already used for the outer loop. This caused some problems while doing other work in this area. No bug exists here... until you want to use the outer loop counter in the inner loop.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2911fa0cca86f7acbc5423cab4dd328a412253cd |
|
13-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Make compile failure more verbose with INTEL_DEBUG=wm.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4a101f957714dea2bc956d516d34c5b56ecb2c64 |
|
13-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Clean up reg_undef args from long ago lack of fs_inst overloads.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
53d78be3bde68bfb6416fb9c1abfbc24030f390e |
|
13-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Clean up the emit calls by introducing emit() overload helpers. I think the code ends up a lot more legible this way, though we've still got the overloads in the fs_inst as well (even though there's only one caller left currently).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5e9aa9926b9bdf1260ce7350b88908bda337388b |
|
26-Feb-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
mesa: Remove the CompileShader driver hook; it's just a no-op.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2279156fe7ac9718533b8b0de90ae96100486680 |
|
16-Mar-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Rename brw_(IF|CONT)_gen6 functions to gen6_(IF|CONT).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cc48d663f7282411d88c6187ce3d03f21df0acd3 |
|
16-Mar-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Rename BRW_SAMPLER_MESSAGE_..._GEN5 to GEN5_SAMPLER_MESSAGE. We already have lots of GEN6_* defines; this seems more consistent.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a99447314ca1cfce60f2a22285398fb222b2a440 |
|
12-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965: Fix alpha testing when there is no color buffer in the FBO. We were alpha testing against an unwritten value, resulting in garbage. (part of) Bug #35073.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b60651a17ba85af14e9d7b9a1398e065adb58665 |
|
11-Mar-2011 |
Eric Anholt <eric@anholt.net> |
i965: Do our lowering passes before the loop of optimization. The optimization loop won't reinsert noise instructions or quadop vectors, so we were traversing the tree for nothing. Lowering vector indexing was in the loop after do_common_optimization() to avoid the work if it ended up that the index was actually constant, but that has been called already in the core.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
25e31140952c328f70f804e0134664d7ed6248a6 |
|
14-Mar-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Enable texture lookups whose return type is 'float' This enables the new shadow texture functions in GLSL 1.30. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chad.versace@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
403be1111190a3fe63ae03bc0111e0a0b026495b |
|
13-Mar-2011 |
Eric Anholt <eric@anholt.net> |
Revert "i965: Use the fixed function GLSL program instead of the ARB program." This reverts commit 81b34a4e3a7aec9cdf2781757408dc5e9eec79cb. There were regressions in the core change that this depends on.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
81b34a4e3a7aec9cdf2781757408dc5e9eec79cb |
|
12-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Use the fixed function GLSL program instead of the ARB program. This gets one more piece of the pipeline onto the new codegen backend. Once ARB_fragment_program can generate GLSL programs, we can nuke the old backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
186d3bc7a3389b78a851e34d8f970c28b8db1608 |
|
01-Mar-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
Revert "i965/fs: Correctly set up gl_FragCoord.w on Sandybridge." This reverts commit 4a3b28113c3d23ba21bb8b8f5ebab7c567083a6d, as it caused a regression on Ironlake (bug #34646).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
58f7c9c72ee52527610b26ca8a137dd88c082c89 |
|
25-Feb-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Initial plumbing to support TXD. This adds the opcode and the code to convert ir_txd to OPCODE_TXD; it doesn't actually add support yet.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2830b1ae9032666e62460de5aece8db843c51c14 |
|
28-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Complete TXL support on gen5+. Initial plumbing existed to turn the ir_txl into OPCODE_TXL, but it was never handled.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4ddd11aad6a396e98ae30e3e78f6736804eae541 |
|
28-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Complete TXL support on gen4. Initial plumbing existed to turn the ir_txl into OPCODE_TXL, but it was never handled.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e54d62b89677624b5806442cc5053c0ceedd79b0 |
|
28-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Use a properly named constant in TXB handling. The old value, BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE makes it sound like we're doing a non-bias texture lookup. It has the same value as the new constant BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE, so there should be no functional changes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4a3b28113c3d23ba21bb8b8f5ebab7c567083a6d |
|
20-Feb-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Correctly set up gl_FragCoord.w on Sandybridge. pixel_w is the final result; wpos_w is used on gen4 to compute it. NOTE: This is a candidate for the 7.10 branch. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
df2aef0e197f9276f60a8e755260420c90841269 |
|
20-Feb-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Refactor control flow stack handling. We can't safely use fixed size arrays since Gen6+ supports unlimited nesting of control flow. NOTE: This is a candidate for the 7.10 branch. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2c2686b912de19a430aba9f5ea5fa679eabdc5c6 |
|
19-Feb-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Avoid register coalescing away gen6 MATH workarounds. The code that generates MATH instructions attempts to work around the hardware ignoring source modifiers (abs and negate) by emitting moves into temporaries. Unfortunately, this pass coalesced those registers, restoring the original problem. Avoid doing that. Fixes several OpenGL ES2 conformance failures on Sandybridge. NOTE: This is a candidate for the 7.10 branch. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
72cd7e87d35e96fad9643f1cee706a8568fa3fa1 |
|
19-Feb-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
i965/fs: Apply source modifier workarounds to POW as well. Single-operand math already had these workarounds, but POW (the only two operand function) did not. It needs them too - otherwise we can hit assertion failures in brw_eu_emit.c when code is actually generated. NOTE: This is a candidate for the 7.10 branch. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0f7325b89038937bd428f7c89ed9859189a0ab0b |
|
27-Dec-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Emit texel offsets in sampler messages.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d3073f58c17d8675a2ecdd5dfa83e5520c78e1a8 |
|
21-Jan-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
Convert everything from the talloc API to the ralloc API.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e256e4743c3f8f924f0d191759d9428f33f3e329 |
|
19-Jan-2011 |
Kenneth Graunke <kenneth@whitecape.org> |
glsl, i965: Remove unnecessary talloc includes. These are already picked up by ir.h or glsl_types.h.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
382c2d99da3f219a5b82f391a81b534b6b44ebce |
|
19-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a helper function for detecting math opcodes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1991d92207cf629ba4ceead4bfc3f768d7b9e402 |
|
19-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Assign URB/CURB register numbers after instruction scheduling. This fixes a bunch of unnecessary barriers due to the scheduler not knowing what that arbitrary register description refers to when trying to reason about its dependencies. The result is rescheduling in the convolution kernel shader in Lightsmark, which results in avoiding register spilling and increasing the performance of the first scene from 6-7 fps midway through the panning to 11fps. The register spilling was a regression from Mesa 7.9 to Mesa 7.10.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
63879d90ace519749fed228ca0e21b5b56c7e1c0 |
|
19-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add an instruction scheduler. Improves performance of my GLSL demo by 5.1% (+/- 1.4%, n=7). It also reschedules the giant multiply tree at the end of glsl-fs-convolution-1 so that we end up not spilling registers, producing the expected level of performance.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3f2fe31eee1667ef9cad99aaad69e52a09c9effa |
|
19-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Add a helper for detecting texturing opcodes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
568e0083651dd29e5bce94ade8625a64a0e85e88 |
|
18-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Fix a comment typo.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8ce425f3e3e330bda859c439b915c4e59b1a2bf4 |
|
18-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Fix a bug in i965 compute-to-MRF. Fixes piglit glsl-fs-texture2d-branching. I couldn't come up with a testcase that didn't involve dead code, but it's still worthwhile to fix I think.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e4be665bbddcb6ddfd7b9b13f01152a97097b35c |
|
18-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Fix dead pointers to fp->Parameters->ParameterValues[] after realloc. Fixes texrect-many regression with ff_fragment_shader -- as we added refs to the subsequent texcoord scaling parameters, the array got realloced to a new address while our params[] still pointed at the old location.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a6e4614ca1284c5731876bb88732b326bf13aba0 |
|
14-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Replace broken handling of dead code with an assert. This code should never have been triggered, but I often did anyway when I disabled optimization passes during debugging, then spent my time debugging that this code doesn't work.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7c7df146b59bae9dcb3a271bd3c671e273015617 |
|
14-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Add an invalidation of live intervals after register splitting. No effect, since it was called before live intervals were calculated.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c3f000b3926988124a44ce7e8cd6588e46063058 |
|
12-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: Do flat shading when appropriate. We were trying to interpolate, which would end up doing unnecessary math, and doing so on undefined values. Fixes glsl-fs-flat-color.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e880a57a71bbd5152ed26367dcc7051f21c20981 |
|
12-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Clarify when we need to (re-)calculate live intervals. The ad-hoc placement of recalculation somewhere between when they got invalidated and when they were next needed was confusing. This should clarify what's going on here.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ab56e3be9aae54602372427755305c354821e105 |
|
12-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965/fs: When producing ir_unop_abs of an operand, strip negate. We were returning the negative absolute value, instead of the absolute value. Fixes glsl-fs-abs-neg.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4eb7284ef98c24331761cbe683c5bd89058e3ad3 |
|
12-Jan-2011 |
Eric Anholt <eric@anholt.net> |
i965: Tighten up the check for flow control interfering with coalescing. This greatly improves codegen for programs with flow control by allowing coalescing for all instructions at the top level, not just ones that follow the last flow control in the program.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
df4d83dca4618eb7077637865763d3e9ab750d11 |
|
29-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Do lowering of array indexing of a vector in the FS. Fixes a regression in ember since switching to the native FS backend, and the new piglit tests glsl-fs-vec4-indexing-{2,3} for catching this.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
54df8e48bcceacbfa468d5237f2981b26493df29 |
|
28-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix regression in FS comparisons on original gen4 due to gen6 changes. Fixes 26 piglit cases on my GM965.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
74dffb39c3434b590b36833905f2b12a6e3477e9 |
|
28-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Factor out the ir comparison to BRW_CONDITIONAL_* code.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
634a7dce9c1d9e4a8576ff8197c8adaea7e9ddd1 |
|
27-Dec-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Flatten if-statements beyond depth 16 on pre-gen6. Gen4 and Gen5 hardware can have a maximum supported nesting depth of 16. Previously, shaders with control flow nested 17 levels deep would cause a driver assertion or segmentation fault. Gen6 (Sandybridge) hardware no longer has this restriction. Fixes fd.o bug #31967.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4374703a9b2ce0be105ee544c8402a932e3e1f52 |
|
22-Dec-2010 |
Zhenyu Wang <zhenyuw@linux.intel.com> |
i965: explicit tell header present for fb write on sandybridge Determining header presence for fb write by msg length is not right for SIMD16 dispatch, and if there are more output attributes, header presence is not easy to tell from msg length. This explicitly adds a new param for fb write to say whether the header is present or not. Fixes hangs and failures in many cases of the GL conformance test.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
036c817f77f71e7c4b17571ae100a9bc93d8fe5b |
|
13-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix gl_FragCoord.z setup on gen6. Fixes glsl-bug-22603.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c3ca384e7190656afcd9f5143e811843efa2b3cb |
|
09-Dec-2010 |
Vinson Lee <vlee@vmware.com> |
i965: Silence uninitialized variable warning. Fixes this GCC warning. brw_fs.cpp: In function 'brw_reg brw_reg_from_fs_reg(fs_reg*)': brw_fs.cpp:3255: warning: 'brw_reg' may be used uninitialized in this function
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b2167a6c013c057e731b96486e3363c1d1171d60 |
|
08-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix flipped value of the not-embedded-in-if on gen6. Fixes: glean/glsl1-! (not) operator (1, fail) glean/glsl1-! (not) operator (1, pass)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7ca7e9b626389dd6dac683c6664b8478e6d5c3b9 |
|
07-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Work around gen6 ignoring source modifiers on math instructions. With the change of extended math from having the arguments moved into mrfs and handed off through message passing to being directly hooked up to the EU, it looks like the piece for doing source modifiers (negate and abs) was left out. Fixes: fog-modes glean/fp1-ARB_fog_exp test glean/fp1-ARB_fog_exp2 test glean/fp1-Computed fog exp test glean/fp1-Computed fog exp2 test ext_fog_coord-modes
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6848e27e1462e98dd91826a06f96c203c9eeebd0 |
|
07-Dec-2010 |
Ian Romanick <ian.d.romanick@intel.com> |
i965: Correctly emit constants for aggregate types (array, matrix, struct) Previously the code only handled scalars and vectors. This new code is modeled somewhat after similar code in ir_to_mesa. Reviewed-by: Eric Anholt <eric@anholt.net>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
16f8c823898fd71a3545457eacd2dc31ddeb3592 |
|
11-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Move payload reg setup to compile, not lookup time. Payload reg setup on gen6 depends more on the dispatch width as well as the uses_depth, computes_depth, and other flags. That's something we want to decide at compile time, not at cache lookup. As a bonus, the fragment shader program cache lookup should be cheaper now that there's less to compute for the hash key.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
843a6a308e05bd4bf2056e08ec65ac4770097b93 |
|
01-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for gen6 CONTINUE instruction emit. At this point, piglit tests for fragment shader loops are working.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
00e5a743e2ee3981a34b95067a97fa73c0f5d779 |
|
01-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for gen6 BREAK ISA emit. There are now two targets: the hop-to-end-of-block target, and the target for where to resume execution for active channels.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4890e0f09c934e3ffb692b417e5444e43685c876 |
|
01-Dec-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for gen6 DO/WHILE ISA emit. There's no more DO since there's no more mask stack, and WHILE has been shuffled like IF was.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2927b6c21202fd0f9a661665e0093e7193c5df6e |
|
30-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix type of gl_FragData[] dereference for FB write. Fixes glsl-fs-fragdata-1, and hopefully Eve Online where I noticed this bug in the generated shader. Bug #31952.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b6b91fa02911f5dfc5d528d822674ee5557800d9 |
|
19-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Remove duplicate MRF writes in the FS backend. This is quite common for multitexture sampling, and not only cuts down on the second and later set of MOVs, but typically also allows compute-to-MRF on the first set. No statistically significant performance difference in nexuiz (n=3), but it reduces instruction count in one of its shaders and seems like a good idea.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
47b1aac1cf0aefae4df58a60bb7eb26d21e25913 |
|
18-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Improve compute-to-mrf. We were skipping it if the instruction producing the value we were going to compute-to-mrf used its result reg as a source reg. This meant that the typical "write interpolated color to fragment color" or "texture from interpolated texcoord" shader didn't compute-to-MRF. Just don't check for the interference cases until after we've checked if this is the instruction we wanted to compute-to-MRF. Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08% (n=3).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
19631fab35ca4d5ca64d606922f3f20774b27645 |
|
19-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Recognize saturates and turn them into a saturated mov. On pre-gen6, this turns 4 instructions into 1. We could still do better by folding the saturate into the instruction generating the value if nobody else uses it, but that should be a separate pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
602ae2441aaca6a652d3fc78114bb60852132f98 |
|
18-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fold constants into the second arg of BRW_SEL as well. This hits a common case with min/max operations.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f9b420d3bda25ea517b66c5ee2c6bde4fdff3935 |
|
18-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Remove extra \n at the end of every instruction in INTEL_DEBUG=wm.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
61126278a39fbff9a66aff9ecc37893e87950091 |
|
19-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix compute_to_mrf to not move a MRF write up into another live range. Fixes glsl-fs-copy-propagation-texcoords-1.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
63684a9ae7a66f68df1f2c68cd9358e5622122a3 |
|
19-Nov-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
glsl: Combine many instruction lowering passes into one. This should save on the overhead of tree-walking and provide a convenient place to add more instruction lowering in the future. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
11d6f1c69871d0b7edc28f639256460839fccd2d |
|
16-Nov-2010 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Add ir_quadop_vector expression The vector operator collects 2, 3, or 4 scalar components into a vector. Doing this has several advantages. First, it will make ud-chain tracking for components of vectors much easier. Second, a later optimization pass could collect scalars into vectors to allow generation of SWZ instructions (or similar as operands to other instructions on R200 and i915). It also enables an easy way to generate IR for SWZ instructions in the ARB_vertex_program assembler.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fc92e87b9757eda01caf0bb3e2c31b1dbbd73aa0 |
|
11-Nov-2010 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Eliminate assumptions about size of ir_expression::operands This may grow in the near future.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f2616e56de8a48360cae8f269727b58490555f4d |
|
18-Nov-2010 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Add ir_unop_sin_reduced and ir_unop_cos_reduced They operate just like ir_unop_sin and ir_unop_cos except that they expect their inputs to be limited to the range [-pi, pi]. Several GPUs require this limited range for their sine and cosine instructions, so having these as operations (along with a to-be-written lowering pass) helps these architectures. These new operations also match the semantics of the GL_ARB_fragment_program SCS instruction. Having these as operations helps in generating GLSL IR directly from assembly fragment programs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
50b4508319cc5277d51a38065850eaa092afc0d4 |
|
18-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Eliminate dead code more aggressively. If an instruction writes reg but nothing later uses it, then we don't need to bother doing it. Before, we were just killing code that was never read after it was ever written. This removes many interpolation instructions for attributes with only a few components used. Improves nexuiz high-settings performance .46% +/- .12% (n=3) on my Ironlake.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
da35388044db4aa6fc66c08a087d8d703b5a6008 |
|
17-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fail on loops on gen6 for now until we write the EU emit code for it.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d512cbf58f9039575dbbb5ab65dbbf7b742a0854 |
|
17-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Shut up spurious gcc warning about GLSL_TYPE enums.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9935fe705df44bb633039ca74332cc0c126ccc30 |
|
17-Nov-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
glsl: Remove the ir_binop_cross opcode.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3b337f5cd94384d2d5918fb630aa8089e49b1d8d |
|
13-Nov-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix gl_FragCoord inversion when drawing to an FBO. This showed up as cairo-gl gradients being inverted on everyone but Intel, where I'd apparently tweaked the transformation to work around the bug. Fixes piglit fbo-fragcoord.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d11db2a857141c556378fde9f9f5ec08c7f8636f |
|
14-Nov-2010 |
Vinson Lee <vlee@vmware.com> |
i965: Silence uninitialized variable warning. Silences this GCC warning. brw_fs.cpp: In member function 'void fs_visitor::split_virtual_grfs()': brw_fs.cpp:2516: warning: unused variable 'reg'
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9effc1adf1e7ba57fb3b10909762b76c1ae12f61 |
|
12-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: re-enable gen6 IF statements in the fragment shader. IF statements were getting flattened while they were broken. With Zhenyu's last fix for ENDIF's type, everything appears to have lined up to actually work. This regresses two tests: glsl1-! (not) operator (1, fail) glsl1-! (not) operator (1, pass) but fixes tests that couldn't work before because the IFs couldn't be flattened: glsl-fs-discard-01 occlusion-query-discard (and, naturally, this should be a performance improvement for apps that actually use IF statements to avoid executing a bunch of code).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bb1540835056cdea5db6f55b19c0c87358f14cd1 |
|
03-Nov-2010 |
Eric Anholt <eric@anholt.net> |
intel: Annotate debug printout checks with unlikely(). This provides the optimizer with hints about code hotness, which we're quite certain about for debug printouts (or, rather, while we developers often hit the checks for debug printouts, we don't care about performance while doing so).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cbc966b57bdb61f5bc158352a9c8dd57bf31b81e |
|
19-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add bit operation support to the fragment shader backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9e3641bd0d739a87a6998300ca29580cb557f380 |
|
25-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Make FS uniforms be the actual type of the uniform at upload time. This fixes some insanity that would otherwise be required for GLSL 1.30 bit ops or gen6 integer uniform operations in general, at the cost of upload-time pain. Given that we only have that pain because mesa's mangling our integer uniforms to be floats, this is something that should be fixed outside of the shader codegen.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
84eba3ef71dfa822e5ff0463032cdd2e3515b888 |
|
13-Oct-2010 |
Ian Romanick <ian.d.romanick@intel.com> |
Track separate programs for each stage The assumption is that all stages are the same program or that varyings are passed between stages using built-in varyings.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
62452e7d94a6353b59dfe0a8891d0709670dbeac |
|
26-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for discard instructions on gen6. It's a little more painful than before because we don't have the handy mask register any more, and have to make do with cooking up a value out of the flag register.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0e8c834ffa2f6d943a927e1a32a273d2f8600694 |
|
26-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Clear some undefined fields of g0 when using them for gen6 FB writes. This doesn't appear to help any testcases I'm looking at, but it looks like it's required.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
07cd8f46acc34b04308f81de2faf05ba33da264b |
|
22-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for pull constants to the new FS backend. Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ff622d5528c8cca465e29081c0792ca210cdd092 |
|
22-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Move the FS disasm/annotation printout to codegen time. This makes it a lot easier to track down where we failed when some code emit triggers an assert. Plus, less memory allocation for codegen.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1d91f8d9164b38b4c924f43ec4fc5ceb65c96a78 |
|
22-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Be more aggressive in tracking live/dead intervals within loops. Fixes glsl-fs-convolution-2, which was blowing up due to the array access insanity getting at the uniform values within the loop. Each temporary was considered live across the whole loop.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4e7252510976d8d3ff12437ea8842129f24d88f5 |
|
22-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Correct scratch space allocation. One, it was allocating increments of 1kb, but per thread scratch space is a power of two. Two, the new FS wasn't getting total_scratch set at all, so everyone thought they had 1kb and writes beyond 1kb would go stomping on a neighbor thread. With this plus the previous register spilling for the new FS, glsl-fs-convolution-1 passes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
99b2c8570ea6f46c6564681631f0e0750a0641cc |
|
19-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for register spilling. It can be tested with if (0) replaced with if (1) to force spilling for all virtual GRFs. Some simple tests work, but large texturing tests fail.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7a3f113e79f983222ecc95c33655a8c9354fcfad |
|
21-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix gl_FrontFacing emit on pre-gen6. It's amazing this code worked. Basically, we would get lucky in register allocation and the tests using frontfacing would happen to allocate gl_FrontFacing storage and the instructions generating gl_FrontFacing but pointing at another register to the same hardware register. Noticed during register spilling debug, when suddenly they didn't get allocated the same storage.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5ac6c4ecfe77bf7e02ae61981b2c8b1fe73027cd |
|
20-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Split register allocation out of the ever-growing brw_fs.cpp.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ae5698e60467db2a7e3f730788cdcdd3711da101 |
|
19-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Use the new style of IF statement with embedded comparison on gen6. "Everyone else" does it this way, so follow suit. It's fewer instructions, anyway.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
80c9f756b28d15ca097963af35915f5b073f081d |
|
19-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Remove unused variable.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
746e68c50b4ae1566b342fbc965557b6dbcfaa2e |
|
18-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix a weirdness in NOT handling. XOR makes much more sense. Note that the previous code would have failed for not(not(x)), but that gets optimized out.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ea213417f14a8b2734cb2a88d8aa1ac05a70b7d5 |
|
18-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Disable the debug printf I added for FS disasm.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
65d4234c2398aaa48eb5e29e6e7bede40fe2fd36 |
|
18-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add missing "break" statement. Otherwise, it would try to handle arrays as structures, use uninitialized memory, and crash.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
81d0a1fb3f1e5b7bcf43145f8a096691e3a5fdfb |
|
15-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Set the type of the null register to fix gen6 FS comparisons. We often use reg_null as the destination when setting up the flag regs. However, on gen6 there aren't general implicit conversions to destination types from src types, so the comparison to produce the flag regs would be done on the integer result interpreted as a float. Hilarity ensued. Fixes 20 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
20b39c7760487bae73489b9812408e12d1d56dd5 |
|
15-Oct-2010 |
Ian Romanick <ian.d.romanick@intel.com> |
i965: Fix indentation after commit 3322fbaf
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3322fbaf3b5e305ce00c1d08c26965bb98e0cef0 |
|
14-Oct-2010 |
Ian Romanick <ian.d.romanick@intel.com> |
glsl: Slightly change the semantic of _LinkedShaders Previously _LinkedShaders was a compact array of the linked shaders for each shader stage. Now it is arranged such that each slot, indexed by the MESA_SHADER_* defines, refers to a specific shader stage. As a result, some slots will be NULL. This makes things a little more complex in the linker, but it simplifies things in other places. As a side effect _NumLinkedShaders is removed. NOTE: This may be a candidate for the 7.9 branch. If there are other patches that get backported to 7.9 that use _LinkedShader, this patch should be cherry picked also.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4b4284c9c9b472f750663352485290c22f8c3921 |
|
15-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix texturing on pre-gen5. I broke it in 06fd639c519214b6ebcbf29127b6d9ed429f8641 by only testing 2 generations of hardware :(
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f157812bbbcf9caac1f84988e738fc9d1e051056 |
|
14-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Add support for ir_unop_round_even via the RNDE instruction.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a81d423d93f22a948f3aa4bf73dc6b1a3b70192f |
|
14-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Enable the new FS backend on pre-gen6 as well. It is now to the point where we have no regressing piglit tests. It also fixes Yo Frankie! and Humus DynamicBranching, probably due to the piglit bias tests that work that didn't on the Mesa IR backend. As a downside, performance takes about a 5-10% performance hit at the moment (e.g. nexuiz 19.8fps -> 18.8fps), which I plan to resolve by reintroducing 16-wide fragment shaders where possible. It is a win, though, for fragment shaders using flow control.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f541b685aaf404fa7c8142f51d91c2720d82f264 |
|
14-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Use RNDZ for ir_unop_trunc in the new FS. The existing code used RNDD, which rounds down, rather than toward zero.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c4226142f3b5d1c931fcc781be8a3aafdfabf316 |
|
14-Oct-2010 |
Kenneth Graunke <kenneth@whitecape.org> |
i965: Use logical-not when emitting ir_unop_ceil. Fixes piglit test glsl-fs-ceil.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5dd07b442e02696bf0ec5d4e3b4be1674519664a |
|
14-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add peepholing of conditional mod generation from expressions. This cuts usually 2 out of 3 instructions for flag reg generation (if statements, conditional assignment) by producing the conditional mod in the expression representing the boolean value. Fixes glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined (register allocation no longer fails for the conditional generation proliferation)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d5599c0b6a22cd0bbc475ec715824660144d02a0 |
|
14-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add a function for handling the move of boolean values to flag regs. This will be a place to peephole comparisons directly to the flag regs, and for now avoids using MOV with conditional mod on gen6, which is now illegal.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4f88550ba0e1ad07e39903f268975921c0101e85 |
|
14-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add a pass to the FS to split virtual GRFs to float channels. Improves nexuiz performance 0.91% (+/- 0.54%, n=8)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b8613d70da34217b98edb9ac9e0a4c9a6598d0b3 |
|
14-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Update the live interval when coalescing regs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0c6752026c405dc3ab5fe85c6a40ac3f04c685c3 |
|
14-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Set class_sizes[] for the aligned reg pair class. So far, I've only seen this be a valgrind warning and not a real failure.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a57ef244fc55476660f9fb76982130c5c0b25163 |
|
14-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for rescaling GL_TEXTURE_RECTANGLE coords to new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f9995b30756140724f41daf963fa06167912be7f |
|
12-Oct-2010 |
Kristian Høgsberg <krh@bitplanet.net> |
Drop GLcontext typedef and use struct gl_context instead
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
080e7aface81e6a055ac61988ca27a88ad70f879 |
|
12-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix missing "break;" in i2b/f2b, and missing AND of CMP result. Fixes glsl-fs-i2b.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bcec03d527561e2df56bf9ebfa250cef56bb732b |
|
12-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Always use the new FS backend on gen6. It's now much more correct for gen6 than the old backend, with just 2 regressions I've found (one of which is common with pre-gen6 and will be fixed by an array splitting IR pass). This does leave the old Mesa IR backend getting used still when we don't have GLSL IR, but the plan is to get GLSL IR input to the driver for the ARB programs and fixed function by the next release.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0cadd32b6dc80455802c04b479ec8e768f93ffe1 |
|
12-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix gen6 pixel_[xy] setup to avoid mixing int and float src operands. Pre-gen6, you could mix int and float just fine. Now, you get goofy results. Fixes: glsl-arb-fragment-coord-conventions glsl-fs-fragcoord glsl-fs-if-greater glsl-fs-if-greater-equal glsl-fs-if-less glsl-fs-if-less-equal
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
720ed3c906b0f6d5822fe9fa442294c9828e1560 |
|
11-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Expand uniform args to gen6 math to full registers to get hstride == 1. This is a hw requirement in math args. This also is inefficient, as we're calculating the same result 8 times, but then we've been doing that on pre-gen6 as well. If we're doing math on uniforms, though, we'd probably be better served by having some sort of mechanism for precalculating those results into another uniform value to use. Fixes 7 piglit math tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
317dbf4613ebf56ca14ee70c1ad6e620ad7942c2 |
|
11-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't compute-to-MRF in gen6 math instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
25cf241540007088936a6df16c849441087f722c |
|
11-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't consider gen6 math instructions to write to MRFs. This was leftover from the pre-gen6 cleanups. One test regresses where compute-to-MRF now occurs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c6dbf253d284f68b0d0e4a3c145583880855324b |
|
08-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Compute to MRF in the new FS backend. This didn't produce a statistically significant performance difference in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea and is recommended by the HW team.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
06fd639c519214b6ebcbf29127b6d9ed429f8641 |
|
09-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Give the FB write and texture opcodes the info on base MRF, like math.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0cd6cea8a3e9339fc69f9de0da6b40e4f9d5f4fe |
|
08-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Give the math opcodes information on base mrf/mrf len. This is progress towards enabling a compute-to-MRF pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
37758fb1cbb1ddcd106553763c1b1f222f4cfb47 |
|
11-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Move FS backend structures to a header. It's time to start splitting some of this up.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
251fe2785484f7ba0c194c92fe0feff9c78b52ca |
|
10-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Reduce register interference checks for changed FS_OPCODE_DISCARD. While I don't know of any performance changes from this (one extra reg available out of 128), it makes the generated asm a lot cleaner looking.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
90c402204018c78f4a0b8a79515cf8c582092963 |
|
10-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Split FS_OPCODE_DISCARD into two steps. Having the single opcode write then read the reg meant that single instruction opcodes had to consider their source regs to interfere with their dest regs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c52a0b5c7d4b55fb183c8ab68aa3561432287283 |
|
05-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add register coalescing to the new FS backend. Improves performance of my GLSL demo 14.3% (+/- 4%, n=4) by eliminating the moves used in ir_assignment and ir_swizzle handling. Still 16.5% to go to catch up to the Mesa IR backend, presumably because instructions are almost perfectly mis-scheduled now.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
cac04a93974e7ae773b84e000a2b26391ee2f4bb |
|
08-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix new FS gen6 interpolation for sparsely-populated arrays. We'd overwrite the same element twice.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bbb840049e7a92af6e0e8c2c5c21c63caec9e826 |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Normalize cubemap coordinates like is done in the Mesa IR path. Fixes glsl-fs-texturecube-2-*
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4d202da7a4951eb534f77014238e7cdca9f781e9 |
|
07-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Disable emitting if () statements on gen6 until we really fix them.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b380531fd40e0876218b1116502bafea7911bd3d |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't assume that WPOS is always provided on gen6 in the new FS. We sensibly only provide it if the FS asks for it. We could actually skip WPOS unless the FS needed WPOS.zw, but that's something for later. Fixes: glsl-texture2d and probably many others.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1fdc8c007ea66b4c9866bf2c679653a005307fa5 |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for gl_FrontFacing on gen6. Fixes glsl1-gl_FrontFacing var (2) with new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a760b5b509f85991a10400977576afabcedbb3c5 |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Refactor gl_FrontFacing setup out of general variable setup.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
75270f705f319b0ecf297d1bdd328e52a8a956aa |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Gen6's sampler messages are the same as Ironlake. This should fix texturing in the new FS backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fe6efc25ed3c1edf26073c4e6b6a3a45c857c1eb |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't do 1/w multiplication in new FS for gen6 Not needed now that we're doing barycentric.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5eeaf3671e2f913d38187fd1401c4b22a2900d57 |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix botch in the header_present case in the new FS. I only set it on the color_regions == 0 case, missing the important case, causing GPU hangs on pre-gen6.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3c97c00e3810d31c3aa26173eb9fdef91b3e7c87 |
|
06-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add back gen6 headerless FB writes to the new FS backend. It's not that hard to detect when we need the header.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
634abbf7b2e6ea21db30aafc0de9472ee31d4173 |
|
05-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Also do constant propagation for the second operand of CMP. We could do the first operand as well by flipping the comparison, but this covered several CMPs in code I was looking at.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dcd0261affc293b75d231e612091ec7b1076fff6 |
|
05-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Enable the constant propagation code. A debug disable had slipped in.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ea909be58dda7e916cb9ce434ecb78597881ad33 |
|
05-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for gen6 FB writes to the new FS. This uses message headers for now, since we'll need it for MRT. We can cut out the header later.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3bf8774e9c293fcad654d1bd67d4b43247b82f97 |
|
04-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add initial folding of constants into operand immediate slots. We could try to detect this in expression handling and do it proactively there, but it seems like less logic to do it in one optional pass at the end.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e27c88d8e6c9d18bfa793f884d02ce6011c4bdde |
|
04-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add trivial dead code elimination in the new FS backend. The glsl core should be handling most dead code issues for us, but we generate some things in codegen that may not get used, like the 1/w value or pixel deltas. It seems a lot easier this way than trying to work out up front whether we're going to use those values or not.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9faf64bc32cf7c1a06a302fff9f80d7e2e2685d5 |
|
04-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Be more conservative on live interval calculation. This also means that our intervals now highlight dead code.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4fb0c92c6986cf4e88296bab8837320210f1794f |
|
03-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for EXT_texture_swizzle to the new FS backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
64a9fc3fc15603a8e25d0e1146fe5da5a5bde55b |
|
02-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't try to emit code if we failed register allocation.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6397addd6146661689a0e315b06e543ef12d8868 |
|
02-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix off-by-ones in handling the last members of register classes. Luckily, one of them would result in failing out register allocation when the other bugs were encountered. Applies to glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined, which still fails register allocation, but now legitimately.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
afb64311e3484002e06aeac62187b68467610449 |
|
02-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add a sanity check for register allocation sizes.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5ee09413162f4ec83cc7a738e807ffde8c89cca7 |
|
02-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: When producing a single channel swizzle, don't make a temporary. This quickly cuts 8% of the instructions in my glsl demo.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a0799725f52386cef911d3e104c5514a2811290b |
|
02-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Restore the forcing of aligned pairs for delta_xy on chips with PLN. By doing so using the register allocator now, we avoid wasting a register to make the alignment happen.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e9bcc8328968f05a5688a020bfa8165260865a9b |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix up copy'n'pasteo from moving coordinate setup around for gen4.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
bfd9715c3c9d40b3f937638073ff2f0969ebd143 |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add real support for pre-gen5 texture sampling to the new FS. Fixes 36 testcases, including glsl-fs-shadow2d*-bias which fail on the Mesa IR backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
8f63a44636e4fef2f35fe73f24c27db9b04389b1 |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Pre-gen6, map VS outputs (not FS inputs) to URB setup in the new FS. We should fix the SF to actually give us just the data we need, but this fixes regressions in the new FS until then. Fixes: glsl-kwin-blur glsl-routing
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ff5ce9289b5159e7de34706b31be771d3e3cefd6 |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Also increment attribute location when skipping unused slots. Fixes glsl1-texcoord varying.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
354c40a62411262d1223f439fdaf2176ca9adbe9 |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix the gen6 jump size for BREAK/CONT in new FS. Since gen5, jumps are in increments of 64 bits instead of increments of 128-bit instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
efc4a6f7909dbf554ee440210233c4b0f89ac89e |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add gen6 attribute interpolation to new FS backend. Untested, since my hardware is not booting at the moment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1d073cb2d920d1c0b8c6d598055b14048fedc96e |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Split the gen4 and gen5 sampler handling apart. Trying to track the insanity of the different argument layouts for normal/shadow crossed with normal/lod/bias one generation at a time is enough. Fixes: glsl1-texture2D() with bias. (first test passing in this code that doesn't pass without it!)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5f237a1ccb28399fbbceecea694f5d18ebba9938 |
|
01-Oct-2010 |
Eric Anholt <eric@anholt.net> |
i965: Use the lowering pass for texture projection. We should end up with the same code, but anyone else with this issue could share the handling (which I got wrong for shadow comparisons in the driver before).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c6960e4471abe287448b9d0e7e6519d588cdf43c |
|
30-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix new FS handling of builtin uniforms with packed scalars in structs. We were pointing each element at the .x channel of the ParameterValues. Fixes glsl1-linear fog.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6f6542a483ec726538f8a4555bddaeb0be6b2146 |
|
30-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix whole-structure/array assignment in new FS. We need to walk the type tree to get the right register types for structure components. Fixes glsl-fs-statevar-call.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ad1506c5ac61b75e45f24a2e18c91dc8a49a3bb0 |
|
30-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Remove my "safety counter" code from loops. I've screwed this up enough times that I don't think it's worth it. This time, it was that I was doing it once per top-level body instruction instead of just once at the end of the loop body.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b90c7d1713c5a52fd85cb9dacad5828ae2fdbf6c |
|
30-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add live interval analysis and hook it up to the register allocator. Fixes 13 piglit cases that failed at register allocation before.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e1261d3c493ff48348483a0084f3017c7e663dc0 |
|
29-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: First cut at register allocation using graph coloring. The interference is totally bogus (maximal), so this is equivalent to our trivial register assignment before. As in, passes the same set of piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
21148e1c0a3cf9cf25ded006a3d5ce2b12803ea9 |
|
29-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Clean up the virtual GRF handling. Now, virtual GRFs are consecutive integers, rather than offsetting the next one by the size. We need the size information to still be around for real register allocation, anyway.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
0efea25c4b9c6b5505fdbba25b525efb27468de4 |
|
30-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Make new FS discard do its work in a temp, not the null reg! Fixes: glsl-fs-discard-02 (GPU hang) glsl1-discard statement (2)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1747aa6755088398108febb121a80d9572c1533e |
|
29-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for builtin uniforms to the new FS backend. Fixes 8 piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9ac910cfcddf1b6e7c520261371e78fc9bcbddcf |
|
29-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Clean up obsolete FINISHME comment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ff0eb45f47ebf2fcc1af06a8b6b934c79dff1d41 |
|
29-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix array indexing of arrays of matrices. The deleted code was meant to be handling indexing of a matrix, which would have been a noop if it had been correct.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
17f3b8097d01a63917afaaefccd6eea070271652 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't try to emit interpolation for unused varying slots. Fixes: glsl-fs-varying-array glsl-texcoord-array glsl-texcoord-array-2 glsl-vs-varying-array
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5272c6a7a23ba74c696608fc2cb07fbfaf9e822a |
|
03-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Do interpolation for varying matrices and arrays in the FS backend. Fixes: glsl-array-varying-01 glsl-vs-mat-add-1 glsl-vs-mat-div-1 glsl-vs-mat-div-2 glsl-vs-mat-mul-2 glsl-vs-mat-mul-3
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b9a59f0358f6f6afc7fafc1b417fa1b2c4cdaf37 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for ARB_fragment_coord_conventions to the new FS backend. Fixes: glsl-arb-frag-coord-conventions glsl-fs-fragcoord
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
701c5f11c9102047c8962f053843469ada3b3a1a |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for ir_loop counters to the new FS backend. Fixes: glsl1-discard statement in for loop glsl-fs-loop-two-counter-02 glsl-fs-loop-two-counter-04
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
89f6783d1769c61b835b49a5fb4405a3249031f4 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for MRT to the new FS backend. Fixes these tests using gl_FragData or just gl_FragDepth: glsl1-Preprocessor test (extension test 1) glsl1-Preprocessor test (extension test 2) glsl-bug-22603
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
86fd11262cb5697e5c3563e876781b3587788737 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for non-color render target write data to new FS backend. This is the first time these payload bits have made sense to me, outside of brw_wm_pass* structure. Fixes: glsl1-gl_FragDepth writing
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2999a44968a045b5516ff23d70b711b01bd696a5 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Set up sampler numbers in the FS backend. +10 piglits
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9e96c737f8cb6faebf7c7339cfcf14f80ed8e73c |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Subtract instead of adding when computing y delta in new FS backend. Fixes 7 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5f7bd68149e59b6940e891928faa532bce0271f6 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for gl_FrontFacing to the new FS backend. Fixes: glsl1-gl_FrontFacing var (1) glsl1-gl_FrontFacing var (2)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6bf12c8b7366a9db8c88b9cacaa06266b41a73b5 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for struct, array, and matrix uniforms to FS backend. Fixes 16 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
ba481f2046e6427c8bd7fc5f8cb8ef3059a7881a |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for dereferencing structs to the new FS backend. Fixes: glsl1-struct(2)
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
07fc8eed8f0398063d87acf3a7ee392da4184822 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Set the variable type when dereferencing an array. We don't set the type on the array virtual reg as a whole, so here's the right place. Fixes: glsl1-GLSL 1.20 arrays glsl1-temp array with constant indexing, fragment shader glsl1-temp array with swizzled variable indexing
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
719f84d9aba6b016e1069e0461cbfc4211f5a3b5 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix up the FS backend for the variable array indexing pass. We need to re-run channel expressions afterwards as it generates new vector expressions, and we need to successfully support conditional assignment (brw_CMP takes 2 operands, not 1).
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
57edd7c5c116926325e3a86cef618bfd1b5881c1 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix valgrind complaint about base_ir for new FS debugging.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1723fdb3f0004a685351d005ba0f5bfc1c2a852e |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Apply the same set of lowering passes to new FS as to Mesa IR. While much of this we will want to support natively, this should make the task of reaching the Mesa IR backend's quality easier. Fixes: glsl-fs-main-return.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e10508812aed4c41c62ea27ac540c8d079bece07 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Actually track the "if" depth in loop in the new FS backend. Fixes: glsl-fs-if-nested-loop.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fceb78e3cc67d035a69613826f46a18e62235f5c |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix negation in the new FS backend. Fixes: glsl1-Negation glsl1-Negation2
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
94d44c33c0ced34e222517ed9c3b72d3c5e3b9f0 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for dFdx()/dFdy() to the FS backend. Fixes: glsl-fwidth glsl-derivs-swizzle
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
169ff0cc9d189f5a00a2a94313a6ce1503d1d5b9 |
|
28-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Handle all_equal/any_nequal in the new FS. These are generated for scalar operands instead of plain equal/nequal. But for scalars, they're the same anyway. +30 piglits.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
11ba8bafdbb31f40ecbb6478e26496b547d34c68 |
|
27-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix up writemasked assignments in the new FS. Not sure how I managed to get tests to succeed without this. +54 piglits.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
03923ff95ed2c1ee54f0132e87e277b6cf07b7f5 |
|
22-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Warning fix for vector result any_nequal/all_equal change.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
6ef5f212343c0557c4fca272d8236226c1a7c87a |
|
10-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add switch cases for ir_unop_noise, which should have been lowered. Fixes compiler warnings.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e591c4625cae63660c5000fbab366e40fe154ab0 |
|
05-Sep-2010 |
Luca Barbieri <luca@luca-barbieri.com> |
glsl: add several EmitNo* options, and MaxUnrollIterations. This increases the chance that GLSL programs will actually work. Note that continues and returns are not yet lowered, so linking will just fail if not supported. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
32b84ef4ca50998914184fc4600d8e43674a9a22 |
|
05-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Make pixel_xy results UW. There is a restriction on the destination of an operation involving a vector immediate being 128-bit aligned and the destination horizontal stride being equivalent to 2 bytes. Fixes bad pixel_x results from gl_FragCoord, where each pair had the same value.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
5afdfa222fa9ec8c54e7d6957d2680c37a9eb715 |
|
06-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't bother with RNDZ for f2i. The default type conversion for MOV should be fine, and RNDZ actually requires two instructions.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3fb5377ba57aea356a81c521c0cf1975dc290b61 |
|
04-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Align the start of attribute interp coefficients in FS to use PLN.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3dbc9ea0a35653a0484d3b0a65a305626c251789 |
|
03-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Just assert when we flagged a compile error in the FS for now. Dumping back to potentially 16-wide dispatch doesn't really work out at the moment, and hopefully I'll just be able to resolve all the failures so we never have to do this at all.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
42fc60cadcea920e9d67581de133a47effcc8441 |
|
03-Sep-2010 |
Eric Anholt <eric@anholt.net> |
i965: Clean up fs_reg setup by using a helper for constructors.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1fcb5a9858b7513c5130006933edc224b69be82d |
|
29-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for loops to the new FS backend. This includes a handy little safety check to prevent the loop from going "too long", as permitted by the spec. I haven't gone out of my way to test it, though… Fixes 20 more piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
b0a933a4d91c47e697459921073f8afe668bac31 |
|
29-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add "discard" support to the new FS backend. Fixes 3 testcases related to discard.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4ff25c2106fb981334bdc1b032fcf37d8753ba62 |
|
29-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix the new implementation of ir_unop_sign to match brw_wm_emit.c. Like the comparison operations, this suffered from CMP only setting the low bit. Doing the AND instructions would be the same instruction count as the more obvious conditional moves, so do cond moves. Fixes glsl-fs-sign and 6 other cases, like trig functions that use sign() internally.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
40aadafa91ef5b931436d400fedafd720d59deff |
|
29-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for texturing with bias to i965 FS backend. Fixes 5 piglit tests for bias. Note that LOD is a 1.30 feature and not yet supported.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
735af3959f4a4eb5940835c5a4117a020f103414 |
|
28-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add initial support for texturing to the new FS backend. Fixes 11 piglit tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3d4597f9d4c93d285825d5a6505d4ee7ce6e2c3e |
|
29-Aug-2010 |
Cedric Vivier <cedricv@neonux.com> |
i965: Move libdrm/C++ hack introduced in fa2deb3d to intel_context.h. Fixes build on Linux/GCC 4.4 as libdrm includes are also used by other brw_fs_*.cpp files. Bug #29855
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
d20c2766182b632fba296eff7328bf14c802096e |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Don't strip negate/abs flags when assigning uniform locations. Fixes glsl-algebraic-sub-zero-4.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
f0aa2d6118b1af7434b7551227cd72c588568e65 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add missing handling for BRW_OPCODE_SEL. Fixes 4 piglit tests about min, max, and clamp.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
38d01c5b272d28a805e7598bad2f2ef5c8da732a |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Mask out higher bits of the result of BRW_CMP producing a boolean. When it says it sets the LSB, that's not just a hint as to where the result goes. Only the LSB is modified. Fixes 20 piglit cases.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
4229a93cc756b3ade02dcf93d806610f95497ad3 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix the types of immediate integer values. When we're trying to do integer ops, handing a float in doesn't help.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
41e75cde2605e62ab691fd725a8a7259f40f5122 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add translation for RNDD and RNDZ. Fixes: glsl-fs-any. glsl1-integer division with uniform var
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
31c9f468f35637ce3b82e59a43c49c949d59ee9e |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for ir_binop_mod using do_mod_to_fract. Fixes glsl-fs-mod.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
53290900db2f13fd9ab56b8f9780fa309d31780f |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix swapped instructions in ir_unop_abs and ir_unop_neg. Fixes glsl-fs-neg and 5 other tests.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
2776ad2641469d3bdb6f53b99fbd748efd277c51 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add generate() handling for AND, OR, XOR. 10 more piglit tests pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
130368f910a806a12287c7561df7dddd0fc8be40 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for if instructions in the new FS backend. 20 more piglit tests pass.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a0ffee2cd79deb5a437784e25de6512d7f8e6bb8 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: When encountering an unknown opcode in new FS backend, print its name.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
40932c1752b0fa918d764e3367f5ab450033304a |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix the maximum grf counting in the new FS backend. glsl-algebraic-rcp-rsq managed to use 33 registers, and we claimed to only use 32, so the write to g32 would go stomping over the precious g0 of some other thread.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
166b3fa29d4b5af8d4e8c410ed71e4348b65bbd9 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Validate the IR tree after doing our custom optimization passes. This wouldn't catch the last failure fixed in them, because we don't validate assignments well (due to the fact that we've got a pretty glaring inconsistency in how we handle assignment writemasking), but it could catch other failures we may produce.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
55ced3367543994bd21b48326c64edb743001145 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add a bit of support for matrices to the new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
91a037b5e1374fe0574480a579bd36c71b75f9c2 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix destination writemasking in the new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a4d97d3726046fca66f3dbcfbe7b276c5eb80b3b |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add preliminary support for uniforms to the new FS backend. +269 piglits
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3dff682b6595c8771655307ed00bd8844f22238c |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Abort on gl_FragDepth in the new FS backend for now. It hangs the GPU due to FB_WRITE handling being incomplete. There are bigger issues to handle first.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
1a3de23509b8170ee87223dc63e992e195a04de5 |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Fix up and actually enable the NewShader and NewShaderProgram hooks.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
fa2deb3ddc8dc9e3eedf7f3dc1d2d2945a95f79b |
|
27-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Hack in avoidance of c++ reserved keyword in libdrm. I'm also fixing this upstream in libdrm, but this avoids new libdrm dependency for the moment.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
363d0f6774b4c6b825f5b903284da1cd51a91986 |
|
26-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add GLSL IR-level source annotation and comments to new FS debug. This should make debugging way easier, as now we have context for reading large programs.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
7268bd82f60b1c9642a48dcfff6d77b2897222cd |
|
26-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Use the implied move in brw_math() in the new FS.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
e85f8272d0757989aeab650fbf929b382d671492 |
|
17-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add support for in varyings to the new FS codegen. At least some tests, like glsl-vs-sign, now work.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
dcb7c0009bf0a1e0c4fb1aae4b7b07efcc0ed173 |
|
16-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Start building the codegen visitor. This can successfully emit a real program that generates magenta now.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
9763d0a82a1ee605a8794f199d432824fb972b6a |
|
26-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Start building direct GLSL2 IR to 965 assembly codegen. Our channel-expressions and vector-splitting changes now happen into a private copy of the IR that we maintain for ourselves. Uniform assignment still happens by the core, so we continue using Mesa IR generation not just for swrast fallbacks but also for uniform values (since there's no storage for their contents other than shader_program->FragmentProgram->Parameters->ParameterValues). And most importantly, at the moment no actual codegen is hooked up other than emitting our favorite color to the framebuffer.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
c1dfdcb93a8991788032d4906c5bf1a5b48cdc48 |
|
26-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add new pass to split vectors into scalar variables. Combined with the previous pass, this lets other optimization passes do their work thanks to ir_tree_grafting. Still have regression in instruction count with INTEL_NEW_FS, but register count is even better.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
3a8ad33dde2f059b82ebf09f5cffa66c86f2e734 |
|
13-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Add a pass for the FS to reduce vector expressions down to scalar. This is a step towards implementing a GLSL IR backend for the 965 fragment shader. Because it has downsides with the current codegen, it is hidden under the environment variable INTEL_NEW_FS. This results in an increase in instruction count at the moment (1444 -> 1752 for glsl-fs-raytrace, 345 -> 359 on my demo), because dot products are turned into a series of multiplies and adds instead of a custom expansion of MULs and MACs, and by not splitting the variable types up we don't get tree grafting and thus there are extra moves of temporary storage. However, register count drops for the non-GLSL path (64 -> 56 on my demo shader) because the register allocator sees all the sub-operations.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|
a1bebf73dfdaf2cd23286aa74271b87166589901 |
|
11-Aug-2010 |
Eric Anholt <eric@anholt.net> |
i965: Start building 965 FS backend.
/external/mesa3d/src/mesa/drivers/dri/i965/brw_fs.cpp
|