3610b1466d573983d80e3019e8e01ebb97d67d9c |
|
02-Apr-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: we can't load local memory directly into an output This fixes piglit tests like tests/spec/glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr.shader_test and related ones. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
6eeb284e4f74a2fe5ae6cba90f97f219935e24df |
|
19-Mar-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: normalize cube coordinates after derivatives have been computed
In "manual" derivative mode (always used on nv50 and sometimes on nvc0 but always for cube), the idea is that using the quadop instruction, we set up the "other" quads to have values such that the derivatives work out, and then run the texture instruction as if nothing were strange. It pulls values from the other lanes, and does its magic.
However cube coordinates have to be normalized - one of the 3 coords has to be 1, to determine which is the major axis, to say which face is being sampled. We were normalizing the coordinates first, and then adding the derivatives. This is wrong for two reasons:
- the coordinates got normalized by a scaling factor but the derivatives didn't
- the result of the addition didn't end up normalized
To resolve this, we flip the logic around to normalize *after* the per-lane coordinates are set up. This fixes a bunch of textureGrad cube dEQP tests.
NOTE: nv50 cube arrays with explicit derivatives are still broken, to be resolved at a later date.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
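The ordering problem described in 6eeb284e above can be illustrated with plain vector arithmetic. This is only a sketch: the names `major_axis`, `normalize_cube`, `lane_coord_old` and `lane_coord_new` are hypothetical, and the driver itself builds the per-lane values with quadop instructions rather than with code like this.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;

// Magnitude of the major axis (largest absolute component).
float major_axis(const Vec3 &v) {
    return std::max({std::fabs(v[0]), std::fabs(v[1]), std::fabs(v[2])});
}

// Scale the vector so that its major axis has magnitude 1.
Vec3 normalize_cube(const Vec3 &v) {
    const float m = major_axis(v);
    return {v[0] / m, v[1] / m, v[2] / m};
}

Vec3 add(const Vec3 &a, const Vec3 &b) {
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2]};
}

// Old order: normalize first, then add the (unscaled) derivative.
Vec3 lane_coord_old(const Vec3 &coord, const Vec3 &deriv) {
    return add(normalize_cube(coord), deriv);
}

// New order: set up the per-lane coordinate, then normalize it.
Vec3 lane_coord_new(const Vec3 &coord, const Vec3 &deriv) {
    return normalize_cube(add(coord, deriv));
}
```

With `coord = {4, 2, 1}` and `deriv = {2, 0, 0}`, the old order yields a vector whose major axis is 3 rather than 1, while the new order yields a properly normalized vector.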
d2445b00837c9123b59a1ac743c136546f334504 |
|
19-Mar-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: force-enable derivatives on TXD ops This matters especially in vertex shaders, where derivatives are disabled by default. This fixes textureGrad in vertex shaders on nv50. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
d86933e6f42b9c2f5bb617c66c91795c560a9abd |
|
15-Mar-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50,nvc0: replace resInfoCBSlot by auxCBSlot Having two different variables for the driver constant buffer slot is confusing and really useless. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Pierre Moreau <pierre.morrow@free.fr>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
c1e4a6bfbf015801c6a8b0ae694482421a22c2d9 |
|
13-Mar-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50,nvc0: handle SQRT lowering inside the driver First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to find out whether the input is less than 0). Secondly the current approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced instead of inf. Instead we switch to the less accurate rcp(rsq(x)) method - this behaves nicely for all valid inputs. We still don't do this for DSQRT since the RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson steps right now. Eventually we should have a separate library function for DSQRT that does it more precisely (and perhaps move this lowering to the post-opt phase). This fixes a number of dEQP precision tests that were expecting better behavior for infinite inputs. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
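The difference between the two SQRT lowerings in c1e4a6bf above can be reproduced on the host. In this sketch `rsq` and `rcp` are software stand-ins for the hardware ops, assumed to follow IEEE 754 rules for infinities and zero; their real precision on the GPU is not modeled.

```cpp
#include <cmath>
#include <limits>

// Stand-ins for the hardware RSQ and RCP ops (assumption: IEEE-like special values).
static float rsq(float x) { return 1.0f / std::sqrt(x); }
static float rcp(float x) { return 1.0f / x; }

// Previous approach: x * rsq(x). For x = inf this is inf * 0, which is NaN.
float sqrt_mul_rsq(float x) { return x * rsq(x); }

// Approach adopted by the commit: rcp(rsq(x)). For x = inf this is rcp(0) = inf.
float sqrt_rcp_rsq(float x) { return rcp(rsq(x)); }
```

Calling both with `std::numeric_limits<float>::infinity()` shows the first returning NaN and the second returning infinity, which matches the behavior the commit message describes.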
dbca0f3eba632125904ded6298a87fefdde66d76 |
|
11-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: manually optimize multiplication expansion logic
The conversion of 32-bit integer multiplies into 16-bit ones happens after the regular optimization loop. However it's fairly common to multiply by a small integer, rendering some of the expansion pointless. Firstly, propagate immediates when possible into mul ops, secondly just remove the ops when they are unnecessary.
Including the change to generate imad immediates, the effect is:
total instructions in shared programs : 6365463 -> 6351898 (-0.21%)
total gprs used in shared programs    : 728684 -> 728684 (0.00%)
total local used in shared programs   : 9904 -> 9904 (0.00%)
total bytes used in shared programs   : 44001576 -> 44036120 (0.08%)
           local        gpr       inst      bytes
helped         0          0       3288          4
hurt           0          0          0        842
It's easy for this to hurt bytes since we end up always generating the 8-byte form, while we can't always get rid of the immediate in question.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
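To see why a small multiplier makes part of the expansion in dbca0f3e pointless, consider the arithmetic identity behind splitting a 32-bit multiply into 16-bit pieces. This is a sketch of the identity only; the function names are hypothetical and the actual instruction sequence the driver emits is not shown in the log.

```cpp
#include <cstdint>

// Low 32 bits of a * b, built from 16 x 16 -> 32 partial products.
uint32_t mul32_via_16(uint32_t a, uint32_t b) {
    const uint32_t al = a & 0xffffu, ah = a >> 16;
    const uint32_t bl = b & 0xffffu, bh = b >> 16;
    const uint32_t low = al * bl;             // contributes at bit 0
    const uint32_t cross = ah * bl + al * bh; // contributes at bit 16
    return low + (cross << 16);               // ah * bh only affects bits >= 32
}

// When b fits in 16 bits, bh is zero and the al * bh product drops out.
uint32_t mul32_by_small(uint32_t a, uint16_t b) {
    const uint32_t al = a & 0xffffu, ah = a >> 16;
    return al * b + ((ah * b) << 16);
}
```

Both functions agree with a native 32-bit multiply; the second simply needs one fewer partial product, which is the kind of saving the commit is after.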
69e8b476d07544d6ef06414a1a78ce5c04761fdb |
|
09-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix texture grad for cubemaps We were ignoring the partial derivatives on the last dim. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
a27548400ea02c39b6602526eb697c673c7d22bb |
|
09-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg units On NV50, we use 16-bit reg units (to make it all work with half-regs). A few places assumed that it was always in 32-bit units. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
f920f8eb026d39c0adb547a90399e76b8351fec6 |
|
09-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix cutoff for using r63 vs r127 when replacing zero The only effect here is a space savings - 822 programs in shader-db affected with the following overall change: total bytes used in shared programs : 44154976 -> 44139880 (-0.03%) Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
204f803ce0e47720d072603fec8a2acde6993fed |
|
04-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: replace zeros in movs as well The original change to put zeroes directly into instructions created conditional mov's with the zero immediate. However that can't be emitted, so make sure to replace the zero with r63. Fixes: 52a800a68 (nv50/ir: allow immediate 0 to be loaded anywhere) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
101e315cc167b0b00319aa70f64c49470e2525f8 |
|
03-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: don't forget to mark flagsDef on cvt in txb lowering Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
af218217d71152df8562b7f087086197f28080fe |
|
08-Nov-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: only take abs value when computing high result Not reachable from TGSI since it only has UMUL, no IMUL. However it's surprising that setting argument types to s32 will cause sign to get lost. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
4294db90b1804dd213b0b4b3ff4eb46a5c390c76 |
|
11-Sep-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: add support for TXQS tgsi opcode Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
641eda0c792e10c2792730b1833353564479a557 |
|
10-Sep-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: r63 is only 0 if we are using less than 63 registers It is advantageous to use r63 instead of r127 since r63 can fit into the shorter encoding. However if we've RA'd over 63 registers, we must use r127 as the replacement instead. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
5dcb28c3d26828ed1b0e2bd5a0589c5baab04b85 |
|
01-Jul-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: copy joinAt when splitting both before and after
The current implementation only moves the joinAt when splitting after the given instruction, not before it. So if you have a BB with
  foo
  instr
  bar
  joinat
and thus with joinAt set, we end up first splitting before instr, at which point the instr's bb is updated to the new bb. Since that bb doesn't have a joinAt set (despite containing one), when splitting after the instr, there is nothing to copy over.
Since the joinat will be in the "split" bb irrespective of whether we're splitting before or after the instruction, move it over in either case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
2e42deb29c878fb4c52aed6d2d54833aacba18ae |
|
06-Jun-2015 |
Jürgen Rühle <j-r@online.de> |
nv50/ir: OP_JOIN is a flow instruction OP_JOIN instructions are assumed to be flow instructions and mercilessly cast to FlowInstruction. This patch fixes an instance where an OP_JOIN is created as a plain instruction. This can cause crashes in the ir printer. [imirkin: add ->fixed = 1] Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
354206f407fffd5f0b553dcbcc46b178d0b22c47 |
|
05-Jan-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: change the way float face is returned The old way made it impossible for the optimizer to reason about what was going on. The new way is the same number of instructions (the neg gets folded into the cvt) but enables the optimizer to be cleverer if comparing to a constant (most common case). [The optimizer is presently not sufficiently clever to work this out, but it could relatively easily be made to be. The old way would have required significant complexity to work out.] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
fb1afd1ea5fd25d82c75c5c3a2aba0bcb53b6d47 |
|
05-Jan-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix texture offsets in release builds Asserts get compiled out in release builds, so they can't be relied upon to perform logic. Reported-by: Pierre Moreau <pierre.morrow@free.fr> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Roy Spliet <rspliet@eclipso.eu> Cc: "10.2 10.3 10.4" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
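The class of bug fixed in fb1afd1e above is easy to reproduce in isolation. In this sketch `read_offset`, `offset_buggy` and `offset_fixed` are hypothetical names, not the driver's; the point is only that an expression placed inside `assert()` is not evaluated when `NDEBUG` is defined.

```cpp
#include <cassert>

// Hypothetical helper: stores the offset and reports success.
static bool read_offset(int raw, int &out) {
    out = raw;
    return true;
}

// Buggy pattern: the call lives inside assert(), so a release build
// (NDEBUG defined) never runs it and `off` stays 0.
int offset_buggy(int raw) {
    int off = 0;
    assert(read_offset(raw, off));
    return off;
}

// Fixed pattern: always perform the call, assert only on its result.
int offset_fixed(int raw) {
    int off = 0;
    const bool ok = read_offset(raw, off);
    assert(ok);
    (void)ok; // avoids an unused-variable warning under NDEBUG
    return off;
}
```

`offset_fixed` returns the same value in debug and release builds, whereas `offset_buggy` only does so when assertions are enabled.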
1065aa92f4e448fbfe47c074f08dded1933a7f1f |
|
05-Jul-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: ignore bias for samplerCubeShadow on nv50 Unfortunately there's no good way to do this on the nv50 shader isa. Dropping the bias seems preferable to doing the compare post-filtering. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
452a4151aa1eebbc12b621a465fc452fdb95a08b |
|
12-Jun-2013 |
Christoph Bumiller <e0425955@student.tuwien.ac.at> |
nv50/ir: fix lowering of predicated instructions (without defs) Note that predicated instructions with defs are still not supported because transformation to SSA doesn't handle them yet. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
d3a5cf052c38087b395871b5b46776e2a7d4a7d7 |
|
15-May-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix s32 x s32 -> high s32 multiply logic Retrieving the high 32 bits of a signed multiply is rather annoying. It appears that the simplest way to do this is to compute the absolute value of the arguments, and perform a u32 x u32 -> u64 operation. If the arguments' signs differ, then negate the result. Since there is no u64 support in the cvt instruction, we have to perform the 2's complement negation "by hand". This logic can come into use by the IMUL_HI instruction (very unlikely to be seen), as well as from constant folding of division by a constant. Fixes dolphin's divisions by 255. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
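The approach described in d3a5cf05 above (absolute values, unsigned widening multiply, then a manual two's complement negation when the signs differ) can be sketched on the host as follows. The function name `mul_high_s32` is hypothetical, and the 64-bit product is kept as two 32-bit halves to mirror the lack of a 64-bit negate; the driver's actual instruction sequence may differ.

```cpp
#include <cstdint>

// High 32 bits of the signed 64-bit product a * b.
int32_t mul_high_s32(int32_t a, int32_t b) {
    // Absolute values computed in unsigned arithmetic so INT32_MIN is handled.
    const uint32_t ua = a < 0 ? 0u - static_cast<uint32_t>(a) : static_cast<uint32_t>(a);
    const uint32_t ub = b < 0 ? 0u - static_cast<uint32_t>(b) : static_cast<uint32_t>(b);

    // u32 x u32 -> u64, split into halves.
    const uint64_t p = static_cast<uint64_t>(ua) * ub;
    uint32_t lo = static_cast<uint32_t>(p);
    uint32_t hi = static_cast<uint32_t>(p >> 32);

    if ((a < 0) != (b < 0)) {
        // Negate the 64-bit value "by hand": invert both halves, add 1 to the
        // low half, and carry into the high half when the low half wraps to 0.
        lo = ~lo + 1u;
        hi = ~hi + (lo == 0u ? 1u : 0u);
    }
    return static_cast<int32_t>(hi);
}
```

The result matches `static_cast<int32_t>((int64_t(a) * b) >> 32)` on compilers with arithmetic right shift, including for operands equal to `INT32_MIN`.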
5b8f1a0f7c5b1412577a913d374192a2329fa615 |
|
13-May-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix integer mul lowering for u32 x u32 -> high u32 UNION appears to expect that all of its sources are conditionally defined. Otherwise it inserts an unpredicated mov instruction which overwrites the desired result. This fixes tests that use UMUL_HI, and much less directly, unsigned integer division by a constant, which uses this functionality in a peephole pass. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
863573b9cbeb26722fe7bdfbcc4ca6bffdc7dbf6 |
|
10-May-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50: fix setting of texture ms info to be per-stage Different textures may be bound to each slot for each stage. So we need to be able to upload ms parameters for each one without stages overwriting each other. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
f3aa999383074d666d6e3f3506e66b0c937904ca |
|
26-Apr-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: change texture offsets to ValueRefs, allow nonconst This allows us to have non-constant offsets for textureGatherOffset and textureGatherOffsets. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
f715a0a39a0f7f19443e7721ae792878ba504eed |
|
31-Mar-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50: add support for PIPE_CAP_SAMPLE_SHADING Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
d5faf8e78603a27dbedb2e9e28b58b1b2bc32858 |
|
26-Feb-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50: enable texture query lod Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
0e71c65db0df86401f2caf26209ff73e3715443a |
|
07-Feb-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50: enable cube map array texture support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
3bd40073b9803baf62f77ed5ac79979e037d2ed6 |
|
12-Jan-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50: add support for texelFetch'ing MS textures, ARB_texture_multisample
Creates two areas in the AUX constbuf:
- Sample offsets for MS textures
- Per-texture MS settings
When executing a texelFetch with a MS sampler, looks up that texture's settings and adjusts the parameters given to the texfetch instruction. With this change, all the ARB_texture_multisample piglits pass, so turn on PIPE_CAP_TEXTURE_MULTISAMPLE.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
b3f82e1a63e8a58f0e7ac297fc5e94ebe76c3339 |
|
17-Apr-2013 |
Bryan Cain <bryancain3@gmail.com> |
nv50/ir: delay calculation of indirect addresses Instead of emitting an SHL 4 to an address register on the TGSI ARL and UARL instructions, emit the shift when the loaded address is actually used. This is necessary because input vertex and attribute indices in geometry shaders on nv50 need to be shifted left by 2 instead of 4. Signed-off-by: Bryan Cain <bryancain3@gmail.com> [calim: various updates to the indirect address logic] Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at> [imirkin: remove OP_MAD change that calim made, add OP_RESTART handling same as OP_EMIT for code flow analysis] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
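The idea behind b3f82e1a above, keeping the raw index and choosing the shift only when the address is consumed, might look roughly like this. `AddrUse`, `IndirectAddr` and `scaled_address` are hypothetical names for illustration; only the shift amounts (4 in general, 2 for geometry shader inputs on nv50) come from the commit message.

```cpp
#include <cstdint>

// Hypothetical tag describing how an address value will be consumed.
enum class AddrUse { Generic, GeometryInput };

// The loaded address holds the raw index; no shift has been applied yet.
struct IndirectAddr {
    uint32_t index;
};

// Apply the shift at the point of use instead of at ARL/UARL time.
uint32_t scaled_address(IndirectAddr a, AddrUse use) {
    const unsigned shift = (use == AddrUse::GeometryInput) ? 2u : 4u;
    return a.index << shift;
}
```

Because the shift is deferred, the same loaded index can serve both kinds of access without being recomputed.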
1386cb94882917e6eabc5b482ab8b443a2f1df51 |
|
29-Nov-2013 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50: TXF already has integer arguments, don't try to convert from f32 Fixes the texelFetch piglit tests Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
bbe3d6dc29f218e4d790e5ea359d3c6736e94226 |
|
09-Sep-2013 |
Dave Airlie <airlied@gmail.com> |
nouveau: fix regression since float comparison instructions (v2) Fix the return type and allow src and dst types for comparison to be separate, this at least fixes the two test cases I've written. v2: drop the u32->s32 change Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|
5eb7ff1175a644ffe3b0f1a75cb235400355f9fb |
|
20-Aug-2013 |
Johannes Obermayr <johannesobermayr@gmx.de> |
Move nv30, nv50 and nvc0 to nouveau. It is planned to ship openSUSE 13.1 with -shared libs. nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau related targets. This change makes it possible to easily build one shared libnouveau.so which is then LIBADDed. Also dlopen will be faster for one library instead of three and build time on -jX will be reduced. Whitespace fixes were requested by 'git am'. Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de> Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
|