Cross Reference: /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering

Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
3610b1466d573983d80e3019e8e01ebb97d67d9c	02-Apr-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: we can't load local memory directly into an output This fixes piglit tests like tests/spec/glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr.shader_test and related ones. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
6eeb284e4f74a2fe5ae6cba90f97f219935e24df	19-Mar-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: normalize cube coordinates after derivatives have been computed In "manual" derivative mode (always used on nv50 and sometimes on nvc0 but always for cube), the idea is that using the quadop instruction, we set up the "other" quads to have values such that the derivatives work out, and then run the texture instruction as if nothing were strange. It pulls values from the other lanes, and does its magic. However cube coordinates have to be normalized - one of the 3 coords has to be 1, to determine which is the major axis, to say which face is being sampled. We were normalizing the coordinates first, and then adding the derivatives. This is wrong for two reasons: - the coordinates got normalized by a scaling factor but the derivatives didn't - the result of the addition didn't end up normalized To resolve this, we flip the logic around to normalize after the per-lane coordinates are set up. This fixes a bunch of textureGrad cube dEQP tests. NOTE: nv50 cube arrays with explicit derivatives are still broken, to be resolved at a later date. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d2445b00837c9123b59a1ac743c136546f334504	19-Mar-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: force-enable derivatives on TXD ops This matters especially in vertex shaders, where derivatives are disabled by default. This fixes textureGrad in vertex shaders on nv50. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d86933e6f42b9c2f5bb617c66c91795c560a9abd	15-Mar-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50,nvc0: replace resInfoCBSlot by auxCBSlot Having two different variables for the driver constant buffer slot is confusing and really useless. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Pierre Moreau <pierre.morrow@free.fr> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
c1e4a6bfbf015801c6a8b0ae694482421a22c2d9	13-Mar-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50,nvc0: handle SQRT lowering inside the driver First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to find out whether the input is less than 0). Secondly the current approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced instead of inf. Instead we switch to the less accurate rcp(rsq(x)) method - this behaves nicely for all valid inputs. We still don't do this for DSQRT since the RSQ/RCP ops are really inaccurate, and don't even have Newton-Raphson steps right now. Eventually we should have a separate library function for DSQRT that does it more precisely (and perhaps move this lowering to the post-opt phase). This fixes a number of dEQP precision tests that were expecting better behavior for infinite inputs. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
dbca0f3eba632125904ded6298a87fefdde66d76	11-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: manually optimize multiplication expansion logic The conversion of 32-bit integer multiplies into 16-bit ones happens after the regular optimization loop. However it's fairly common to multiply by a small integer, rendering some of the expansion pointless. Firstly, propagate immediates when possible into mul ops, secondly just remove the ops when they are unnecessary. Including the change to generate imad immediates, the effect is: total instructions in shared programs : 6365463 -> 6351898 (-0.21%) total gprs used in shared programs : 728684 -> 728684 (0.00%) total local used in shared programs : 9904 -> 9904 (0.00%) total bytes used in shared programs : 44001576 -> 44036120 (0.08%) local gpr inst bytes helped 0 0 3288 4 hurt 0 0 0 842 It's easy for this to hurt bytes since we end up always generating the 8-byte form, while we can't always get rid of the immediate in question. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
69e8b476d07544d6ef06414a1a78ce5c04761fdb	09-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix texture grad for cubemaps We were ignoring the partial derivatives on the last dim. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
a27548400ea02c39b6602526eb697c673c7d22bb	09-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg units On NV50, we use 16-bit reg units (to make it all work with half-regs). A few places assumed that it was always in 32-bit units. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
f920f8eb026d39c0adb547a90399e76b8351fec6	09-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix cutoff for using r63 vs r127 when replacing zero The only effect here is a space savings - 822 programs in shader-db affected with the following overall change: total bytes used in shared programs : 44154976 -> 44139880 (-0.03%) Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
204f803ce0e47720d072603fec8a2acde6993fed	04-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: replace zeros in movs as well The original change to put zeroes directly into instructions created conditional mov's with the zero immediate. However that can't be emitted, so make sure to replace the zero with r63. Fixes: 52a800a68 (nv50/ir: allow immediate 0 to be loaded anywhere) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
101e315cc167b0b00319aa70f64c49470e2525f8	03-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: don't forget to mark flagsDef on cvt in txb lowering Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
af218217d71152df8562b7f087086197f28080fe	08-Nov-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: only take abs value when computing high result Not reachable from TGSI since it only has UMUL, no IMUL. However it's surprising that setting argument types to s32 will cause sign to get lost. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
4294db90b1804dd213b0b4b3ff4eb46a5c390c76	11-Sep-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: add support for TXQS tgsi opcode Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
641eda0c792e10c2792730b1833353564479a557	10-Sep-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: r63 is only 0 if we are using less than 63 registers It is advantageous to use r63 instead of r127 since r63 can fit into the shorter encoding. However if we've RA'd over 63 registers, we must use r127 as the replacement instead. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
5dcb28c3d26828ed1b0e2bd5a0589c5baab04b85	01-Jul-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: copy joinAt when splitting both before and after The current implementation only moves the joinAt when splitting after the given instruction, not before it. So if you have a BB with foo instr bar joinat and thus with joinAt set, we end up first splitting before instr, at which point the instr's bb is updated to the new bb. Since that bb doesn't have a joinAt set (despite containing one), when splitting after the instr, there is nothing to copy over. Since the joinat will be in the "split" bb irrespective of whether we're splitting before or after the instruction, move it over in either case. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
2e42deb29c878fb4c52aed6d2d54833aacba18ae	06-Jun-2015	Jürgen Rühle <j-r@online.de>	nv50/ir: OP_JOIN is a flow instruction OP_JOIN instructions are assumed to be flow instructions and mercilessly casted to FlowInstruction. This patch fixes an instance where an OP_JOIN is created as a plain instruction. This can cause crashes in the ir printer. [imirkin: add ->fixed = 1] Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
354206f407fffd5f0b553dcbcc46b178d0b22c47	05-Jan-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: change the way float face is returned The old way made it impossible for the optimizer to reason about what was going on. The new way is the same number of instructions (the neg gets folded into the cvt) but enables the optimizer to be cleverer if comparing to a constant (most common case). [The optimizer is presently not sufficiently clever to work this out, but it could relatively easily be made to be. The old way would have required significant complexity to work out.] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
fb1afd1ea5fd25d82c75c5c3a2aba0bcb53b6d47	05-Jan-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix texture offsets in release builds assert's get compiled out in release builds, so they can't be relied upon to perform logic. Reported-by: Pierre Moreau <pierre.morrow@free.fr> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Roy Spliet <rspliet@eclipso.eu> Cc: "10.2 10.3 10.4" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
1065aa92f4e448fbfe47c074f08dded1933a7f1f	05-Jul-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: ignore bias for samplerCubeShadow on nv50 Unfortunately there's no good way to do this on the nv50 shader isa. Dropping the bias seems preferable to doing the compare post-filtering. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
452a4151aa1eebbc12b621a465fc452fdb95a08b	12-Jun-2013	Christoph Bumiller <e0425955@student.tuwien.ac.at>	nv50/ir: fix lowering of predicated instructions (without defs) Note that predicated instructions with defs are still not supported because transformation to SSA doesn't handle them yet. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.2" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d3a5cf052c38087b395871b5b46776e2a7d4a7d7	15-May-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix s32 x s32 -> high s32 multiply logic Retrieving the high 32 bits of a signed multiply is rather annoying. It appears that the simplest way to do this is to compute the absolute value of the arguments, and perform a u32 x u32 -> u64 operation. If the arguments' signs differ, then negate the result. Since there is no u64 support in the cvt instruction, we have the perform the 2's complement negation "by hand". This logic can come into use by the IMUL_HI instruction (very unlikely to be seen), as well as from constant folding of division by a constant. Fixes dolphin's divisions by 255. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
5b8f1a0f7c5b1412577a913d374192a2329fa615	13-May-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix integer mul lowering for u32 x u32 -> high u32 UNION appears to expect that all of its sources are conditionally defined. Otherwise it inserts an unpredicated mov instruction which overwrites the desired result. This fixes tests that use UMUL_HI, and much less directly, unsigned integer division by a constant, which uses this functionality in a peephole pass. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
863573b9cbeb26722fe7bdfbcc4ca6bffdc7dbf6	10-May-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50: fix setting of texture ms info to be per-stage Different textures may be bound to each slot for each stage. So we need to be able to upload ms parameters for each one without stages overwriting each other. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
f3aa999383074d666d6e3f3506e66b0c937904ca	26-Apr-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: change texture offsets to ValueRefs, allow nonconst This allows us to have non-constant offsets for textureGatherOffset and textureGatherOffsets. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
f715a0a39a0f7f19443e7721ae792878ba504eed	31-Mar-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50: add support for PIPE_CAP_SAMPLE_SHADING Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d5faf8e78603a27dbedb2e9e28b58b1b2bc32858	26-Feb-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50: enable texture query lod Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
0e71c65db0df86401f2caf26209ff73e3715443a	07-Feb-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50: enable cube map array texture support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
3bd40073b9803baf62f77ed5ac79979e037d2ed6	12-Jan-2014	Ilia Mirkin <imirkin@alum.mit.edu>	nv50: add support for texelFetch'ing MS textures, ARB_texture_multisample Creates two areas in the AUX constbuf: - Sample offsets for MS textures - Per-texture MS settings When executing a texelFetch with a MS sampler, looks up that texture's settings and adjusts the parameters given to the texfetch instruction. With this change, all the ARB_texture_multisample piglits pass, so turn on PIPE_CAP_TEXTURE_MULTISAMPLE. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
b3f82e1a63e8a58f0e7ac297fc5e94ebe76c3339	17-Apr-2013	Bryan Cain <bryancain3@gmail.com>	nv50/ir: delay calculation of indirect addresses Instead of emitting an SHL 4 io an address register on the TGSI ARL and UARL instructions, emit the shift when the loaded address is actually used. This is necessary because input vertex and attribute indices in geometry shaders on nv50 need to be shifted left by 2 instead of 4. Signed-off-by: Bryan Cain <bryancain3@gmail.com> [calim: various updates to the indirect address logic] Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at> [imirkin: remove OP_MAD change that calim made, add OP_RESTART handling same as OP_EMIT for code flow analysis] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
1386cb94882917e6eabc5b482ab8b443a2f1df51	29-Nov-2013	Ilia Mirkin <imirkin@alum.mit.edu>	nv50: TXF already has integer arguments, don't try to convert from f32 Fixes the texelFetch piglit tests Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
bbe3d6dc29f218e4d790e5ea359d3c6736e94226	09-Sep-2013	Dave Airlie <airlied@gmail.com>	nouveau: fix regression since float comparison instructions (v2) Fix the return type and allow src and dst types for comparison to be separate, this at least fixes the two test cases I've written. v2: drop the u32->s32 change Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Signed-off-by: Dave Airlie <airlied@redhat.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
5eb7ff1175a644ffe3b0f1a75cb235400355f9fb	20-Aug-2013	Johannes Obermayr <johannesobermayr@gmx.de>	Move nv30, nv50 and nvc0 to nouveau. It is planned to ship openSUSE 13.1 with -shared libs. nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau related targets. This change makes it possible to easily build one shared libnouveau.so which is then LIBADDed. Also dlopen will be faster for one library instead of three and build time on -jX will be reduced. Whitespace fixes were requested by 'git am'. Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de> Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Acked-by: Ian Romanick <ian.d.romanick@intel.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp