History log of /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
3610b1466d573983d80e3019e8e01ebb97d67d9c 02-Apr-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: we can't load local memory directly into an output

This fixes piglit tests like

tests/spec/glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr.shader_test

and related ones.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
6eeb284e4f74a2fe5ae6cba90f97f219935e24df 19-Mar-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: normalize cube coordinates after derivatives have been computed

In "manual" derivative mode (always used on nv50 and sometimes on nvc0
but always for cube), the idea is that using the quadop instruction, we
set up the "other" quads to have values such that the derivatives work
out, and then run the texture instruction as if nothing were strange. It
pulls values from the other lanes, and does its magic.

However cube coordinates have to be normalized - one of the 3 coords has
to be 1, to determine which is the major axis, to say which face is
being sampled. We were normalizing the coordinates first, and then
adding the derivatives. This is wrong for two reasons:

- the coordinates got normalized by a scaling factor but the
derivatives didn't
- the result of the addition didn't end up normalized

To resolve this, we flip the logic around to normalize *after* the
per-lane coordinates are set up.

This fixes a bunch of textureGrad cube dEQP tests.

NOTE: nv50 cube arrays with explicit derivatives are still broken, to be
resolved at a later date.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d2445b00837c9123b59a1ac743c136546f334504 19-Mar-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: force-enable derivatives on TXD ops

This matters especially in vertex shaders, where derivatives are
disabled by default. This fixes textureGrad in vertex shaders on nv50.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d86933e6f42b9c2f5bb617c66c91795c560a9abd 15-Mar-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50,nvc0: replace resInfoCBSlot by auxCBSlot

Having two different variables for the driver constant buffer slot
is confusing and really useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
c1e4a6bfbf015801c6a8b0ae694482421a22c2d9 13-Mar-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50,nvc0: handle SQRT lowering inside the driver

First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.

Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).

This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
dbca0f3eba632125904ded6298a87fefdde66d76 11-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: manually optimize multiplication expansion logic

The conversion of 32-bit integer multiplies into 16-bit ones happens
after the regular optimization loop. However it's fairly common to
multiply by a small integer, rendering some of the expansion pointless.

Firstly, propagate immediates when possible into mul ops, secondly just
remove the ops when they are unnecessary.

Including the change to generate imad immediates, the effect is:

total instructions in shared programs : 6365463 -> 6351898 (-0.21%)
total gprs used in shared programs : 728684 -> 728684 (0.00%)
total local used in shared programs : 9904 -> 9904 (0.00%)
total bytes used in shared programs : 44001576 -> 44036120 (0.08%)

local gpr inst bytes
helped 0 0 3288 4
hurt 0 0 0 842

It's easy for this to hurt bytes since we end up always generating the
8-byte form, while we can't always get rid of the immediate in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
69e8b476d07544d6ef06414a1a78ce5c04761fdb 09-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix texture grad for cubemaps

We were ignoring the partial derivatives on the last dim.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
a27548400ea02c39b6602526eb697c673c7d22bb 09-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg units

On NV50, we use 16-bit reg units (to make it all work with half-regs). A
few places assumed that it was always in 32-bit units.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
f920f8eb026d39c0adb547a90399e76b8351fec6 09-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix cutoff for using r63 vs r127 when replacing zero

The only effect here is a space savings - 822 programs in shader-db
affected with the following overall change:

total bytes used in shared programs : 44154976 -> 44139880 (-0.03%)

Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
204f803ce0e47720d072603fec8a2acde6993fed 04-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: replace zeros in movs as well

The original change to put zeroes directly into instructions created
conditional mov's with the zero immediate. However that can't be
emitted, so make sure to replace the zero with r63.

Fixes: 52a800a68 (nv50/ir: allow immediate 0 to be loaded anywhere)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
101e315cc167b0b00319aa70f64c49470e2525f8 03-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: don't forget to mark flagsDef on cvt in txb lowering

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
af218217d71152df8562b7f087086197f28080fe 08-Nov-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: only take abs value when computing high result

Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
4294db90b1804dd213b0b4b3ff4eb46a5c390c76 11-Sep-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: add support for TXQS tgsi opcode

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
641eda0c792e10c2792730b1833353564479a557 10-Sep-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: r63 is only 0 if we are using less than 63 registers

It is advantageous to use r63 instead of r127 since r63 can fit into the
shorter encoding. However if we've RA'd over 63 registers, we must use
r127 as the replacement instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
5dcb28c3d26828ed1b0e2bd5a0589c5baab04b85 01-Jul-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: copy joinAt when splitting both before and after

The current implementation only moves the joinAt when splitting after
the given instruction, not before it. So if you have a BB with

foo
instr
bar
joinat

and thus with joinAt set, we end up first splitting before instr, at
which point the instr's bb is updated to the new bb. Since that bb
doesn't have a joinAt set (despite containing one), when splitting after
the instr, there is nothing to copy over. Since the joinat will be in
the "split" bb irrespective of whether we're splitting before or after
the instruction, move it over in either case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
2e42deb29c878fb4c52aed6d2d54833aacba18ae 06-Jun-2015 Jürgen Rühle <j-r@online.de> nv50/ir: OP_JOIN is a flow instruction

OP_JOIN instructions are assumed to be flow instructions and mercilessly
casted to FlowInstruction.

This patch fixes an instance where an OP_JOIN is created as a plain
instruction. This can cause crashes in the ir printer.

[imirkin: add ->fixed = 1]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
354206f407fffd5f0b553dcbcc46b178d0b22c47 05-Jan-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: change the way float face is returned

The old way made it impossible for the optimizer to reason about what
was going on. The new way is the same number of instructions (the neg
gets folded into the cvt) but enables the optimizer to be cleverer if
comparing to a constant (most common case). [The optimizer is presently
not sufficiently clever to work this out, but it could relatively easily
be made to be. The old way would have required significant complexity to
work out.]

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
fb1afd1ea5fd25d82c75c5c3a2aba0bcb53b6d47 05-Jan-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix texture offsets in release builds

assert's get compiled out in release builds, so they can't be relied
upon to perform logic.

Reported-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Roy Spliet <rspliet@eclipso.eu>
Cc: "10.2 10.3 10.4" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
1065aa92f4e448fbfe47c074f08dded1933a7f1f 05-Jul-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: ignore bias for samplerCubeShadow on nv50

Unfortunately there's no good way to do this on the nv50 shader isa.
Dropping the bias seems preferable to doing the compare post-filtering.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
452a4151aa1eebbc12b621a465fc452fdb95a08b 12-Jun-2013 Christoph Bumiller <e0425955@student.tuwien.ac.at> nv50/ir: fix lowering of predicated instructions (without defs)

Note that predicated instructions with defs are still not supported
because transformation to SSA doesn't handle them yet.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d3a5cf052c38087b395871b5b46776e2a7d4a7d7 15-May-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix s32 x s32 -> high s32 multiply logic

Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have the perform the 2's complement
negation "by hand".

This logic can come into use by the IMUL_HI instruction (very unlikely
to be seen), as well as from constant folding of division by a constant.
Fixes dolphin's divisions by 255.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
5b8f1a0f7c5b1412577a913d374192a2329fa615 13-May-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix integer mul lowering for u32 x u32 -> high u32

UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
863573b9cbeb26722fe7bdfbcc4ca6bffdc7dbf6 10-May-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50: fix setting of texture ms info to be per-stage

Different textures may be bound to each slot for each stage. So we need
to be able to upload ms parameters for each one without stages
overwriting each other.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
f3aa999383074d666d6e3f3506e66b0c937904ca 26-Apr-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: change texture offsets to ValueRefs, allow nonconst

This allows us to have non-constant offsets for textureGatherOffset and
textureGatherOffsets.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
f715a0a39a0f7f19443e7721ae792878ba504eed 31-Mar-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50: add support for PIPE_CAP_SAMPLE_SHADING

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
d5faf8e78603a27dbedb2e9e28b58b1b2bc32858 26-Feb-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50: enable texture query lod

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
0e71c65db0df86401f2caf26209ff73e3715443a 07-Feb-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50: enable cube map array texture support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
3bd40073b9803baf62f77ed5ac79979e037d2ed6 12-Jan-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50: add support for texelFetch'ing MS textures, ARB_texture_multisample

Creates two areas in the AUX constbuf:
- Sample offsets for MS textures
- Per-texture MS settings

When executing a texelFetch with a MS sampler, looks up that texture's
settings and adjusts the parameters given to the texfetch instruction.

With this change, all the ARB_texture_multisample piglits pass, so turn
on PIPE_CAP_TEXTURE_MULTISAMPLE.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
b3f82e1a63e8a58f0e7ac297fc5e94ebe76c3339 17-Apr-2013 Bryan Cain <bryancain3@gmail.com> nv50/ir: delay calculation of indirect addresses

Instead of emitting an SHL 4 io an address register on the TGSI ARL and UARL
instructions, emit the shift when the loaded address is actually used. This
is necessary because input vertex and attribute indices in geometry shaders on
nv50 need to be shifted left by 2 instead of 4.

Signed-off-by: Bryan Cain <bryancain3@gmail.com>
[calim: various updates to the indirect address logic]
Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
[imirkin: remove OP_MAD change that calim made, add OP_RESTART handling
same as OP_EMIT for code flow analysis]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
1386cb94882917e6eabc5b482ab8b443a2f1df51 29-Nov-2013 Ilia Mirkin <imirkin@alum.mit.edu> nv50: TXF already has integer arguments, don't try to convert from f32

Fixes the texelFetch piglit tests

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
bbe3d6dc29f218e4d790e5ea359d3c6736e94226 09-Sep-2013 Dave Airlie <airlied@gmail.com> nouveau: fix regression since float comparison instructions (v2)

Fix the return type and allow src and dst types for comparison
to be separate, this at least fixes the two test cases I've written.

v2: drop the u32->s32 change

Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
5eb7ff1175a644ffe3b0f1a75cb235400355f9fb 20-Aug-2013 Johannes Obermayr <johannesobermayr@gmx.de> Move nv30, nv50 and nvc0 to nouveau.

It is planned to ship openSUSE 13.1 with -shared libs.
nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau
related targets.
This change makes it possible to easily build one shared libnouveau.so which is
then LIBADDed.
Also dlopen will be faster for one library instead of three and build time on
-jX will be reduced.

Whitespace fixes were requested by 'git am'.

Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp