History log of /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
Revision Date Author Comments
19963231a3245358c0e8fdd74c4654761e62b6c8 13-Jan-2017 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: optimize shl + and

Address loading can often end up as shl + shr + shl combinations. The
latter two are equal shifts, which get converted into an and mask.
However, if the preceding shl already shifts by at least as many bits as
the mask clears from the bottom, the and can be removed entirely. This
reduces the instruction count of some large shaders (around 2K
instructions) by as much as 3%.
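
A minimal C++ sketch of the redundancy check, assuming 32-bit values and
shift < 32. The names andIsRedundantAfterShl and shlThenAnd are
hypothetical illustrations, not the nv50_ir code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the nv50_ir code). After x << shift, bits
// 0..shift-1 are known to be zero. An AND with `mask` is redundant when
// every bit the mask clears lies in that known-zero range.
// Assumes shift < 32.
constexpr bool andIsRedundantAfterShl(uint32_t shift, uint32_t mask) {
  const uint32_t maybeNonZero = ~0u << shift;  // bits that can survive the shl
  return (maybeNonZero & ~mask) == 0;          // mask keeps all of them
}

// The pattern before the rewrite. A shr + shl pair by the same amount c
// becomes an and with mask = ~((1 << c) - 1).
constexpr uint32_t shlThenAnd(uint32_t x, uint32_t shift, uint32_t mask) {
  return (x << shift) & mask;
}
```

When andIsRedundantAfterShl(shift, mask) holds, shlThenAnd(x, shift, mask)
equals x << shift for every x, so the and can be dropped.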

total instructions in shared programs : 6495509 -> 6491076 (-0.07%)
total gprs used in shared programs : 954621 -> 954623 (0.00%)

          local    gpr   inst  bytes
helped        0      0   1014   1014
hurt          0      2      0      0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
0404678c5f72162c9898c9c94ca67969106227c8 06-Oct-2016 Karol Herbst <karolherbst@gmail.com> nv50/ir: start LocalCSE with getFirst to merge PHI instructions

total instructions in shared programs : 3499888 -> 3499445 (-0.01%)
total gprs used in shared programs : 453866 -> 453803 (-0.01%)
total local used in shared programs : 21621 -> 21621 (0.00%)
total bytes used in shared programs : 32078952 -> 32074936 (-0.01%)

          local    gpr   inst  bytes
helped        0     39    119    119
hurt          0      0      0      0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
7b7eb7170d16ddb0963900ccf59b39956219373c 20-Oct-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: it appears that OP_DISCARD can't take a join modifier

nvdisasm does not print a .S even though the bit is set.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b7d9677de804375827b3c433027ec2dd32cd1da6 30-Sep-2016 Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> nv50/ir: constant fold OP_SPLIT

Split the source immediate value into new values and move them into the
original defs set by the split. Since we can only have up to 64-bit
immediates, this is largely beneficial for F64 (and, in the future, U64)
operations.
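
Folding the split amounts to taking the two 32-bit halves of the 64-bit
immediate as plain U32 values. A minimal sketch, assuming the low word goes
to the first def; splitImm64 and splitF64 are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical helper: split a 64-bit immediate into (low, high) U32 words.
// Low-word-first ordering is an assumption of this sketch.
std::pair<uint32_t, uint32_t> splitImm64(uint64_t imm) {
  return {static_cast<uint32_t>(imm), static_cast<uint32_t>(imm >> 32)};
}

// F64 immediates are split by their IEEE-754 bit pattern, not by value.
std::pair<uint32_t, uint32_t> splitF64(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);  // reinterpret without UB
  return splitImm64(bits);
}
```

For example, 1.0 has the bit pattern 0x3ff0000000000000, so it splits into
a low word of 0 and a high word of 0x3ff00000.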

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: always use U32, set newi for foldCount tracking]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a6d6eff2e6ea2ccd585fe9bf1e159979cd3047df 12-Oct-2016 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: be more careful about preserving modifiers in SHLADD creation

src2 was being given the wrong modifier, and we were not properly
managing the modifier on the SHL source either.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
87b06cab14c449e442be27650024f044e93c9a7c 07-Oct-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

total instructions in shared programs : 2286901 -> 2284473 (-0.11%)
total gprs used in shared programs : 335256 -> 335273 (0.01%)
total local used in shared programs : 31968 -> 31968 (0.00%)

          local    gpr   inst  bytes
helped        0     41    852    852
hurt          0     44     23     23

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
28ecd3eac24ce41b8a855a50f366f1985d1dc934 07-Oct-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: fix wrong check when optimizing MAD to SHLADD

Checking if MAD is supported is definitely wrong; it is most likely a
typo I introduced a few days ago, and it breaks NV50 because SHLADD is
not supported there.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
f96945c5b5c3a52685e76795f03f75c75fb62fc7 06-Oct-2016 Karol Herbst <karolherbst@gmail.com> nv50/ir: optimize sub(a, 0) to a

Helps some UE4 demo and Divinity OS shaders.

total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
total gprs used in shared programs : 379273 -> 379273 (0.00%)
total local used in shared programs : 9505 -> 9505 (0.00%)
total bytes used in shared programs : 25837792 -> 25837192 (-0.00%)

          local    gpr   inst  bytes
helped        0      0     33     33
hurt          0      0      0      0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
115c79be10bf3712a1e1bc25a563c90388c1bcaa 14-Sep-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2e008be9a9a4c94564c11718e0f6fc029caa0e44 14-Sep-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
e4eb0fca024babcd7bea2b34a7e7605287963ce0 14-Sep-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: optimize IMAD to SHLADD in presence of power of 2

IMAD can be replaced by SHLADD only when src1 is a power of 2.
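
A minimal sketch of the arithmetic behind this and the neighbouring SHLADD
commits, assuming SHLADD(a, b, c) computes (a << b) + c with 32-bit
wraparound. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed reference semantics of SHLADD (32-bit, wrapping). Requires b < 32.
constexpr uint32_t shladd(uint32_t a, uint32_t b, uint32_t c) {
  return (a << b) + c;
}

// Hypothetical helper: if imm is a power of two, return log2(imm), so that
// IMAD(a, imm, c) == SHLADD(a, log2(imm), c). Otherwise keep the IMAD.
std::optional<uint32_t> imadToShladdShift(uint32_t imm) {
  if (imm == 0 || (imm & (imm - 1)) != 0)
    return std::nullopt;  // not a power of two
  uint32_t shift = 0;
  while ((imm >> shift) != 1)
    ++shift;
  return shift;
}
```

With c == 0 the same formula reduces to a plain SHL, and with all three
operands immediate it can be evaluated at compile time.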

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
557a29b51fa3324cfbeecff100a54c7c6a6d87cd 18-Sep-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: optimize SUB(a, b) to MOV(a - b)

This helps shaders in UE4 demos, especially Elemental (+1% perf). The
optimization reduces spilling in one shader, which explains the small
gain.

GF100/GK104:

total instructions in shared programs : 2838551 -> 2838045 (-0.02%)
total gprs used in shared programs : 396706 -> 396684 (-0.01%)
total local used in shared programs : 34432 -> 34416 (-0.05%)

          local    gpr   inst  bytes
helped        1     19    112    112
hurt          0      0      0      0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3f5cf8c488bfc401d1d5503c1ec61874d7c1477d 20-Jul-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: allow to swap sources for OP_SUB

This allows the load-propagation pass to swap the sources in presence
of immediate values.

Maxwell (GM107):

total instructions in shared programs : 1928187 -> 1927634 (-0.03%)
total gprs used in shared programs : 330741 -> 330154 (-0.18%)
total local used in shared programs : 28032 -> 28032 (0.00%)

          local    gpr   inst  bytes
helped        0    271    425    425
hurt          0      0    194    194

Fermi (GF114):

total instructions in shared programs : 2334474 -> 2333829 (-0.03%)
total gprs used in shared programs : 380934 -> 380215 (-0.19%)
total local used in shared programs : 33304 -> 33264 (-0.12%)

          local    gpr   inst  bytes
helped        5    314    521    521
hurt          0      4    195    195

No regressions on GM107 and GF114 with full piglit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
062c6b8e54c14adcc1ec603fad524f38fe058e67 19-Jun-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50: fix alphatest for non-blendable formats

The hardware can only do alphatest when using a blendable format. This
means that the various *16 norm formats didn't work with alphatest. It
appears that Talos Principle uses such formats, as well as alpha tests,
for some internal renders, which made those renders incorrect. This does
not appear to affect the final image, but in a different game it easily
could.

The approach we take is that when alphatests are enabled and an affected
(non-blendable) format is used (which we anticipate is a small minority
of cases), we insert code into the shader to perform the comparison and
discard. Once inserted, that code lives in the shader forever, and the
shader is re-uploaded with a fixed-up compare each time the alphatest
function changes. To avoid re-uploading too often, if we switch back to
a blendable format, the inserted test is (effectively) disabled and the
hw alphatest functionality is used.
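
A minimal C++ model of the compare-and-discard step the inserted shader
code performs. The AlphaFunc enum (modelled on the GL alpha-test
functions) and alphaTestDiscards are hypothetical; the real code patches
the compare op in the shader rather than switching at run time:

```cpp
#include <cassert>

// Hypothetical enum modelled on the GL alpha-test comparison functions.
enum class AlphaFunc { Never, Less, Equal, LEqual, Greater, NotEqual, GEqual, Always };

// Returns true when the fragment should be discarded. Setting the
// function to Always is one way to "effectively" disable the test.
bool alphaTestDiscards(AlphaFunc func, float alpha, float ref) {
  bool pass = true;
  switch (func) {
  case AlphaFunc::Never:    pass = false;        break;
  case AlphaFunc::Less:     pass = alpha <  ref; break;
  case AlphaFunc::Equal:    pass = alpha == ref; break;
  case AlphaFunc::LEqual:   pass = alpha <= ref; break;
  case AlphaFunc::Greater:  pass = alpha >  ref; break;
  case AlphaFunc::NotEqual: pass = alpha != ref; break;
  case AlphaFunc::GEqual:   pass = alpha >= ref; break;
  case AlphaFunc::Always:   pass = true;         break;
  }
  return !pass;
}
```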

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
df2881381ac67c42aa8ec9e0ed28f21a1d253785 26-May-2016 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: handle a load's reg result not being used for locked variants

For a locked load, the first (register) result might be unused while
the second result, the predicate produced by the locking, is used. In
that case the load splitting logic (which is designed for splitting
128-bit loads) doesn't apply. Instead we take the predicate and move it
into the first position (as having a dead result in the first def's
position upsets all sorts of things, including RA). Update the emitters
to deal with this as well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
47b390fe45e5e6f982c60b58985892438959cd8e 17-May-2016 Jan Vesely <jano.vesely@gmail.com> Treewide: Remove Elements() macro

Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
787a53988cc6bb7a0f2b43c216837d683336b33f 21-Apr-2016 Hans de Goede <hdegoede@redhat.com> nouveau: codegen: combineLd/St do not combine indirect loads

combineLd/St would combine, e.g.:

st u32 # g[$r2+0x0] $r2
st u32 # g[$r2+0x4] $r3

into:

st u64 # g[$r2+0x0] $r2d

But this is only valid if $r2 contains an 8-byte aligned address,
which is not guaranteed for compute shaders.

This commit checks that src0 dim 0 is not indirect before combining
loads/stores, as combining indirect loads/stores may break alignment
rules.
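
A simplified sketch of the alignment rule, assuming two equal-sized,
contiguous accesses. The MemAccess struct and canCombine are hypothetical
and much narrower than the real combineLd/St logic:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a memory access.
struct MemAccess {
  bool indirect;    // address has a register base, e.g. g[$r2+0x4]
  uint32_t offset;  // constant byte offset
  uint32_t size;    // access size in bytes
};

// Two accesses may be merged into one wider access only if the merged
// access is known to be naturally aligned.
bool canCombine(const MemAccess &a, const MemAccess &b) {
  if (a.indirect || b.indirect)
    return false;  // base register alignment is unknown
  if (a.size != b.size || a.offset + a.size != b.offset)
    return false;  // sketch only handles equal-sized, contiguous accesses
  const uint32_t merged = a.size + b.size;
  return a.offset % merged == 0;  // merged access aligned to its own size
}
```

For the example above, both stores are indirect through $r2, so the
sketch refuses to merge them into a u64 store.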

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
059308db841886101586aa3ec5ac74b89abf1a20 07-Apr-2016 Samuel Pitoiset <samuel.pitoiset@gmail.com> nv50/ir: do not try to attach JOIN ops to ATOM

This might result in an INVALID_OPCODE dmesg error when a join is
attached to an atomic operation.

Spotted with arb_shader_image_load_store-host-mem-barrier on GK104.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b3e7fb52349848b24f005c07859bc43691bd64bd 13-Mar-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: avoid folding mul + add if the mul has a dnz

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
edf774bb7eae32f00b900a6faa9b5c698affdcaa 28-Jan-2016 Karol Herbst <nouveau@karolherbst.de> nv50/ir: we can't do the add to mad conversion when the mul saturates

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
068e9848ba5937673d66c42a4b44067fd78becaf 24-Jan-2016 Karol Herbst <nouveau@karolherbst.de> nv50/ir: optimize neg(and(set, 1)) to set

Helps shaders in Saints Row IV, BioShock Infinite and Shadow Warrior.

total instructions in shared programs : 1914931 -> 1903900 (-0.58%)
total gprs used in shared programs : 247920 -> 247785 (-0.05%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes used in shared programs : 17558272 -> 17457320 (-0.57%)

          local    gpr   inst  bytes
helped        0    137    719    719
hurt          0     12      0      0

v2: remove this opt for OP_SLCT and check against float for OP_SET
v3: simplified the code

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
7f19e293055d2a9897df803efa310c293280ab8f 30-Jan-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: get rid of memory stores with nop values

This happens especially with exports and varying packing, where the last
bits aren't always filled in. We end up trying to do quad-wide stores,
which require a lot of register moves that carefully preserve the nop
value. Instead, skip storing the nop values.

total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs : 910139 -> 895501 (-1.61%)
total local used in shared programs : 15328 -> 15328 (0.00%)

          local    gpr   inst
helped        0   7442   4693
hurt          0     90   2687

Most of the helped/hurt instruction changes are by one or two ops,
because we can no longer do quad-wide stores in all cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3ca941d60ed38800038cd545842e0ed3a69946da 30-Jan-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix false global CSE on instructions with multiple defs

If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, code often
does something like

a, b = tex()
if () c = a
else c = b

which means that a single phi node will have sources pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.

This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
c3083c70823d8f4bfdabcf38f98dfebeff0a2b2b 03-Jan-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: add support for BUFFER accesses

This largely leaves the existing image logic alone. When image support
is added this will have to be harmonized somehow.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
29d09f8747abea35f4deadced0196725d4ab89cf 27-Jan-2016 Karol Herbst <nouveau@karolherbst.de> nv50/ir: optimize mad/fma with third argument 0 to mul

Very modest effect, but it's clearly the right thing to do.

total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs : 910157 -> 910131 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)

          local    gpr   inst  bytes
helped        0     55     85     85
hurt          0     26     20     20

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3aa681449ed030ba8b9c56f0a6f2b08bd1fb15a6 27-Jan-2016 Karol Herbst <nouveau@karolherbst.de> nv50/ir: run DCE backwards

Reduces calls by up to 50%.
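
Why walking backwards helps: when the last instruction of a dead chain is
removed first, its operands lose their last use before they are visited,
so the whole chain goes in one pass instead of one link per pass. A
minimal sketch over a hypothetical straight-line IR (Insn and
deadCodeElimBackwards are illustrative names; live-out values are assumed
to be kept alive by side-effecting instructions):

```cpp
#include <cassert>
#include <vector>

// Hypothetical straight-line IR: each instruction defines value `def`
// (-1 for none) and reads the values in `srcs`.
struct Insn {
  int def;
  std::vector<int> srcs;
  bool sideEffects;
  bool dead = false;
};

// Backward DCE over one block. Returns the number of instructions removed.
int deadCodeElimBackwards(std::vector<Insn> &insns, int numValues) {
  std::vector<int> uses(numValues, 0);
  for (const Insn &i : insns)
    for (int s : i.srcs) ++uses[s];
  int removed = 0;
  for (auto it = insns.rbegin(); it != insns.rend(); ++it) {
    if (it->sideEffects || (it->def >= 0 && uses[it->def] > 0))
      continue;  // still needed
    it->dead = true;
    ++removed;
    for (int s : it->srcs) --uses[s];  // operands may now become dead too
  }
  return removed;
}
```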

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
978ae28ca279354852a586b202e705db3d596041 27-Jan-2016 Karol Herbst <nouveau@karolherbst.de> nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1))

Shader-db results on GK110:

total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs : 910187 -> 910157 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)

          local    gpr   inst  bytes
helped        0     18    821    821
hurt          0      0      0      0
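
Both forms of the identity in the subject clear the low c bits of a. A
minimal check, assuming unsigned 32-bit values and 0 <= c < 32 (shlShr
and andLowMask are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// shl(shr(a, c), c): shift the low c bits out, then shift back in zeros.
constexpr uint32_t shlShr(uint32_t a, uint32_t c) { return (a >> c) << c; }

// and(a, ~((1 << c) - 1)): mask the low c bits off directly.
constexpr uint32_t andLowMask(uint32_t a, uint32_t c) {
  return a & ~((1u << c) - 1u);
}
```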

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
dc3ac418bf889620c93f50c68ef55b9e9de3afd3 20-Jan-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers

Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm))
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a31819cff8f4560786d731f5f1de6ba814368a2f 18-Jan-2016 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: swap the least-ref'd source into src1 when both const/imm

The whole point of inlining sources is to reduce loads. We can end up in
a situation where one value is used a lot of times, and one value is
used only once per instruction. The once-per-instruction one is the one
that should get inlined, but with the previous algorithm, it was given
no preference.

This flips things around to prefer putting less-referenced values into
src1, which increases the likelihood of their being inlined.

While we're at it, adjust the heuristic to not treat 0 as an immediate,
as well as (effectively) check for situations where LIMMs can't be
loaded. All this yields improvements on nvc0:

total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs : 945082 -> 943417 (-0.18%)
total local used in shared programs : 30372 -> 30288 (-0.28%)
total bytes used in shared programs : 50089256 -> 50047880 (-0.08%)

          local    gpr   inst  bytes
helped       21    822   3332   3332
hurt          0    278    565    565

More importantly, it avoids generating really bad code with SSBOs,
where we end up checking a lot of different values (usually immediates)
against the length.

On nv50 we get comparable results, and even improve packing (bytes went
down more than instructions):

total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs : 728719 -> 725131 (-0.49%)
total local used in shared programs : 3552 -> 3552 (0.00%)
total bytes used in shared programs : 43995688 -> 43932928 (-0.14%)

          local    gpr   inst  bytes
helped        0   1380   3252   3774
hurt          0    287   1710   1365

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
d50e6128b815595f7918d6818e8a9cd20d53efd1 07-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: attempt to do more constant folding on mad -> add conversion

The add might actually have a 0 as an argument, which would convert it
into a mov. Make sure to detect that. Also avoid the hack of putting the
immediate directly into the instruction, instead use a mov to put it
into place and let the later LoadPropagation pass place it if possible.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
724134f68322087ef88bc590febd0011167ae367 29-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: float(s32 & 0xff) = float(u8), not s8

Make the conversion unsigned when the high bits are being ANDed away.
Fixes corruption in dolphin.
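
After x & 0xff the value lies in [0, 255], so the byte must be
zero-extended, not sign-extended. A small illustration with hypothetical
helper names:

```cpp
#include <cassert>
#include <cstdint>

// Correct: treat the masked low byte as u8.
float cvtLowByteU8(int32_t x) {
  return static_cast<float>(static_cast<uint8_t>(x & 0xff));
}

// Wrong: treating it as s8 turns 0x80..0xff into negative values.
// (Narrowing to int8_t wraps on mainstream compilers and is guaranteed
// to do so since C++20.)
float cvtLowByteS8(int32_t x) {
  return static_cast<float>(static_cast<int8_t>(x & 0xff));
}
```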

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
d35695096de2358aef40452b5e3304a02534f7db 10-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: combine sequences of conversions

In some cases shaders want non-default rounding when converting float to
integer. This can be done in one go, so merge the two ops. This comes up
in the packUnorm4x8 & co functions, as well as a few random shaders.
Overall shader-db impact is minimal, helping a handful of witcher2 and
other misc shaders.
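
The idea, sketched on the host side under the default round-to-nearest-even
mode: a float round followed by a (now exact) truncating conversion gives
the same result as one rounding conversion. roundThenCvt and cvtRound are
hypothetical names standing in for the two-op and merged forms:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Two ops: round to an integral float, then convert (truncation is exact).
int32_t roundThenCvt(float x) {
  return static_cast<int32_t>(std::nearbyint(x));
}

// One op: convert with round-to-nearest directly.
int32_t cvtRound(float x) {
  return static_cast<int32_t>(std::lrint(x));
}
```

Both agree for every value in the int32 range, which is what makes the
merge safe.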

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a0b5d5beedb5bc5dcfd4c62c02576fdddf63d1f0 11-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: teach post-ra immediate folding into mad about integers

There will usually be a split before the mad op; peer through it and
pick out the right word of the immediate.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
ab70ea1353ac9859ee51d236482fe92a0493362d 11-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: add short imad support

Support emission of the short imad, but also include it in the various
logic that tries to make it possible to emit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
6aca7fecb7f7b6c67cf0315e781060a8d1d4b704 10-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: can't have predication and immediates

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a27548400ea02c39b6602526eb697c673c7d22bb 09-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg units

On NV50, we use 16-bit reg units (to make it all work with half-regs). A
few places assumed that it was always in 32-bit units.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
0f647bd65bae16c7a2dc7a960c96593ad6ab729c 08-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: check if the target supports the new offset before inlining

Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93300
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
f97f755192210ce3690e67abccefa133d398d373 08-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: fix up mul+add -> mad algebraic opt, enable for integers

For some reason this has been disabled for integers ever since codegen
was merged, despite there being emission code for IMAD. Seems to work.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
0ef5c8ab7405fcc76b23393d4414f46cc9edb1fc 04-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fold shl + mul with immediates

On SM20 this gives:

total instructions in shared programs : 6299222 -> 6294240 (-0.08%)
total gprs used in shared programs : 944139 -> 944068 (-0.01%)
total local used in shared programs : 54116 -> 54116 (0.00%)

          local    gpr   inst  bytes
helped        0    126   2781   2781
hurt          0     55     11     11
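
The folding relies on shifting and multiplying commuting in 32-bit integer
arithmetic. A minimal sketch, assuming the pattern is shl(mul(a, imm), s)
folded to mul(a, imm << s); the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Before: multiply by an immediate, then shift left by an immediate.
constexpr uint32_t shlOfMul(uint32_t a, uint32_t imm, uint32_t s) {
  return (a * imm) << s;
}

// After: one multiply by the pre-shifted immediate (equal modulo 2^32).
constexpr uint32_t foldedMul(uint32_t a, uint32_t imm, uint32_t s) {
  return a * (imm << s);
}
```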

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
abd326e81b06f58797be94bd655ee06b17a34f0c 04-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: propagate indirect loads into instructions

This way $r1 = $r0 + 4; c1[$r1] becomes c1[$r0+4].

On SM35:

total instructions in shared programs : 6206257 -> 6185058 (-0.34%)
total gprs used in shared programs : 911045 -> 910722 (-0.04%)
total local used in shared programs : 39072 -> 39072 (0.00%)

          local    gpr   inst  bytes
helped        0    417   4195   4195
hurt          0    280      0      0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
31fde8fabadcd9240c1e96c8a953b465def9b516 04-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: flip shl(add, imm) into add(shl, imm)

This works when the add also has an immediate. This often happens in
address calculations. These addresses can then be inlined as well.
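
The flip is valid because a left shift by s is multiplication by 2^s
modulo 2^32, which distributes over addition, and i << s is a constant
that folds away. A quick check with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// shl(add(a, i), s)
constexpr uint32_t shlOfAdd(uint32_t a, uint32_t i, uint32_t s) {
  return (a + i) << s;
}

// add(shl(a, s), i << s): the second operand is a folded immediate.
constexpr uint32_t addOfShl(uint32_t a, uint32_t i, uint32_t s) {
  return (a << s) + (i << s);
}
```

The identity holds even when a + i wraps around.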

On code targeted to SM35:

total instructions in shared programs : 6223346 -> 6206257 (-0.27%)
total gprs used in shared programs : 911075 -> 911045 (-0.00%)
total local used in shared programs : 39072 -> 39072 (0.00%)

          local    gpr   inst  bytes
helped        0    119   3664   3664
hurt          0     74     15     15

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a3722b81f534598f25d9d155a6d30bc59a6f4e59 03-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fold fma/mad when all 3 args are immediates

This happens pretty rarely, but might as well do it when it does.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
49692f86a1b77fac4634d2a3f0502ec7451c3435 03-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix DCE to not generate 96-bit loads

When the last component of a 128-bit load gets DCE'd, a 96-bit load is
generated, which no GPU can actually emit. Avoid generating such
instructions by scaling the first load back to 64 bits when splitting.
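
A simplified sketch of the splitting rule, assuming the live components
are contiguous from the start of the load and that only 1-, 2- and
4-component (32-, 64- and 128-bit) loads are encodable. loadWidths is a
hypothetical name:

```cpp
#include <cassert>
#include <vector>

// Returns the widths (in 32-bit components) of the loads to emit.
std::vector<unsigned> loadWidths(unsigned liveComponents) {
  std::vector<unsigned> widths;
  while (liveComponents > 0) {
    // Largest encodable width that fits; never 3 components (96 bits).
    unsigned w = liveComponents >= 4 ? 4 : liveComponents >= 2 ? 2 : 1;
    widths.push_back(w);
    liveComponents -= w;
  }
  return widths;
}
```

Three live components thus become a 64-bit load followed by a 32-bit load.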

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
11fcf46590129abfa2ca2117a320e8a8052761e4 03-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: the mad source might not have a defining instruction

For example if it's $r63 (aka 0), there won't be a definition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
ff61ac48387d3f42ede50a572c11f404f4cd3abb 02-Dec-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: fold postfactor into immediate

SM20-SM50 can't emit a post-factor in the presence of a long immediate.
Make sure to fold it in.
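
A sketch of the folding, assuming a post-factor pf scales the product by
2^pf. Scaling by a power of two is exact barring overflow or underflow,
so it can be moved into the immediate. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// mul(a, imm) with a post-factor: (a * imm) * 2^pf.
float mulWithPostFactor(float a, float imm, int pf) {
  return std::ldexp(a * imm, pf);
}

// Folded form: the new long immediate is imm * 2^pf, with no post-factor.
float foldedImmediate(float imm, int pf) {
  return std::ldexp(imm, pf);
}
```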

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
393d0c336bc766a123e139ae85383663f81e00d1 07-Nov-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: properly set the type of the constant folding result

This removes the hack used for merge, which only covers a fraction of
the cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2f9aaed7499499679d44e47b7a070df237f77683 07-Nov-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: add support for const-folding OP_CVT with F64 source/dest

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
f979d3cfec2b336801fe59ccd264111f403428f5 05-Nov-2015 Hans de Goede <hdegoede@redhat.com> nv50/ir: Add support for 64bit immediates to checkSwapSrc01

Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
428506ece2c7627392d0f02c7f83021caa46bb4f 05-Nov-2015 Hans de Goede <hdegoede@redhat.com> nv50/ir: Add support for merge-s to the ConstantFolding pass

This allows later passes like LoadPropagation to properly deal with
64-bit immediates.

If the new 64-bit load this introduces does not get optimized away, then
split64BitOpPostRA() will split it into 2 instructions again.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
74b86b971f3bf9b0482341b07c1cbc2e520fb1d0 10-Sep-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: don't fold immediate into mad if registers are too high

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3e6adbd761f72b612aba57fd86bb5203aae07133 11-Jan-2015 Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> nv50/ir: Handle OP_CVT when folding constant expressions

[imirkin: handle more type combinations, use macro]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
f5b926183ded75661ab3f786ac1739b1f912c6c5 19-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: undo more shifts still by allowing a pre-SHL to occur

This happens with unpackSnorm lowering. There's yet another
bitfield-extract behind it, but there's too much variation to be worth
cutting through.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
9ebe7dc09479d9a8df2733ef96525a2b5e758f6d 19-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: don't require AND when the high byte is being addressed

unpackUnorm* lowering doesn't AND the high byte/word as it's
unnecessary. Detect that situation as well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
63cb85e567ad1025ee990b38f43c2f1ef811821b 19-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: detect i2f/i2i which operate on specific bytes/words

Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.

This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
51499bb5ff5626b893383545c494c7f808763404 19-Aug-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: detect AND/SHR pairs and convert into EXTBF

Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
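
A sketch of the equivalence, assuming EXTBF(x, offset, width) extracts
width bits starting at bit offset (unsigned, offset < 32). A shr followed
by an and with a mask of the form (1 << width) - 1 is exactly such an
extract. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Assumed reference semantics of an unsigned bitfield extract.
constexpr uint32_t extbf(uint32_t x, uint32_t offset, uint32_t width) {
  return width >= 32 ? (x >> offset) : (x >> offset) & ((1u << width) - 1u);
}

// The shift/and combination some shaders use instead.
constexpr uint32_t shrAnd(uint32_t x, uint32_t shift, uint32_t mask) {
  return (x >> shift) & mask;
}

// The pair maps onto EXTBF only when mask is a run of low ones.
constexpr bool isLowBitMask(uint32_t mask) {
  return mask != 0 && (mask & (mask + 1)) == 0;
}
```

For the high byte (offset 24, width 8) the and is unnecessary, since the
shift alone already isolates it.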

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
24a7d4e437e27c758c2848e887ceaf1d4a55ae50 24-Jul-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: per-patch vars are in a separate address space

There's no need to attempt to avoid overlapping generic i/o with patch
i/o. By the same token, we can't merge patch and non-patch loads/stores.

This fixes at least the

tes-both-input-array-*-index-rd

tessellation variable-indexing tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
da89e75d9c6399c8fb0286460c91a77778c0eec9 30-Apr-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: allow tess eval output loads to be CSE'd

These only happen for gl_TessCoord, which is constant.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
ad62ec8316a926682958e7ab52639992867c3755 26-Jun-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: propagate modifier to right arg when const-folding mad

An immediate has to be the second arg of an ADD operation. However, we
were mistakenly propagating the modifier of the non-folded value to the
folded immediate argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91117
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
36e3eb6a957f8f20ed187ec88a067fc65cb81432 18-Jun-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: can't have a join on a load with an indirect source

Triggers an INVALID_OPCODE warning on GK208. Seems rare enough to not
warrant verification on other chips. Fixes the new piglits:

ubo_array_indexing/fs-nonuniform-control-flow.shader_test
ubo_array_indexing/vs-nonuniform-control-flow.shader_test

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
fa7f9f123b70f313d3c073b52c9c16b4b8df28f8 23-May-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: avoid messing up arg1 of PFETCH

There can be scenarios where the "indirect" arg of a PFETCH becomes
known, and so the code will attempt to propagate it. Use this
opportunity to just fold it into the first argument, and prevent the
load propagation pass from touching PFETCH further.

This fixes gs-input-array-vec4-index-rd.shader_test and
vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a85aba190dfab02ffccf744bad5ad10357394de0 09-May-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg

This covers the pattern where a KILL_IF is used, which triggers a
comparison of -x to 0. This can usually be folded into the comparison whose
result is being compared to 0; however, it may itself have already been
combined with another comparison. That shouldn't affect the logic of
this pass. With this and the & 1.0 change, code like

00000020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e800000
00000028: 001c0000 201fc000 and b32 $r0 $r0 0x3f800000
00000030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0
00000038: 0000003c 19800000 $p0 discard

becomes

00000020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e800000
00000028: 0000003c 19800000 $p0 discard

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
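Why the fold in the "allow OP_SET to merge with OP_SET_AND/etc as well as a neg" commit is safe can be checked with a small numeric model. The C++ below is a hypothetical sketch, not nv50_ir code: it models the three-instruction sequence from the example (0x3e800000 is 0.25f, 0x3f800000 is 1.0f) and the single folded comparison. Because the ANDed value can only be 1.0f or 0.0f, and -0.0f < 0 is false under IEEE-754, the negated test is true exactly when x < c held, including for NaN inputs, where both forms are false.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret a 32-bit pattern as an IEEE-754 single (the f32 view of a b32 GPR).
float fromBits(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// "set b32 $r0 lt f32 x c" yields an all-ones or all-zeros mask;
// "and b32 $r0 $r0 0x3f800000" then turns it into 1.0f or 0.0f.
float setAndOne(float x, float c) {
    uint32_t mask = (x < c) ? 0xffffffffu : 0u;
    return fromBits(mask & 0x3f800000u);
}

// Original predicate: "set $p0 0x1 lt f32 neg $r0 0x0".
bool original(float x, float c) {
    return -setAndOne(x, c) < 0.0f;
}

// Folded predicate: "set $p0 0x1 lt f32 x c".
bool folded(float x, float c) {
    return x < c;
}
```

Comparing `original` and `folded` over a spread of inputs, including NaN and infinities, is a quick way to sanity-check the equivalence.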
d2a474e8d4b03f10aec57c7f7740addad1e1ea9d 04-May-2015 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: optimize set & 1.0 to produce boolean-float sets

This has started to happen more now that the backend is producing
KILL_IF more often.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
c4ac09e30e2520b0ac6d403eb6c77f23e7f24f49 09-May-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: only propagate saturate up if some actual folding took place

The former logic would copy the saturate up to any mul with an immediate
if there was a subsequent mul with a saturate. However, we only want to
do that if we collapsed 2 muls by multiplying their immediates (or were
able to put the immediate in as a post-multiplier).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
01d3b750b3682f3774f1bd01fa07a6b3c8baf28e 03-Apr-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: avoid folding immediates into imad operations

Commit 09ee907266 added logic to fold immediates into mad operations,
but the emission code is only there for fmad. Only allow it on float
types.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
49b86007aa2bb599ada6cdbed7ff56246917f12e 25-Mar-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: take postFactor into account when doing peephole optimizations

Multiply operations can have a post-factor on them, which other ops
don't support. Only perform the peephole optimizations when there is no
post-factor involved.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
5491458843998e8083baf9b62c14895946de1a3f 07-Jul-2014 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: remove merge/split pairs to allow normal propagation to occur

Because the TGSI interface creates merges for each instruction source
and then splits them back out, there are a lot of unnecessary
merge/split pairs which do essentially nothing. The various modifier/etc
propagation doesn't know how to walk through those, so just remove them
when they're unnecessary.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
09ee907266f315300a7856b55e50e74dce8e946f 06-Feb-2015 Roy Spliet <rspliet@eclipso.eu> nv50/ir: Fold IMM into MAD

Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is
a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be
done post-RA because it requires that SDST == SSRC2.

V2: improve readability and add comments to clarify decisions
V3: Remove redundant code... compiler already attempts to put the IMM in
SSRC1

Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
9e94b87b6012450d714edc6d0c46b15a89d5ce61 01-Jan-2015 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fold MAD when one of the multiplicands is const

Fold MAD dst, src0, immed, src2 (or src0/immed swapped) when
- immed = 0 -> MOV dst, src2
- immed = +/- 1 -> ADD dst, src0, src2

These types of MAD patterns were observed in some st/nine shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
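The two folding rules in the "fold MAD when one of the multiplicands is const" commit can be sketched on a toy instruction record. Everything here (`Inst`, `Op`, `foldMadByImmediate`, the `negSrc0` flag) is hypothetical and much simpler than nv50_ir. The sketch assumes the immediate has already been placed in the second multiplicand slot, and it ignores the Inf/NaN cases in which x * 0 is not 0.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified IR; not nv50_ir's Instruction/Value classes.
enum class Op { MAD, ADD, MOV };

struct Inst {
    Op op;
    std::string src0;      // register multiplicand (MOV source after folding)
    float imm;             // immediate multiplicand
    std::string src2;      // addend
    bool negSrc0 = false;  // negation modifier on src0
};

// Fold MAD dst, src0, imm, src2 when imm is 0 or +/-1.
// Returns true if the instruction was rewritten.
bool foldMadByImmediate(Inst &i) {
    if (i.op != Op::MAD)
        return false;
    if (i.imm == 0.0f) {
        // src0 * 0 + src2 -> MOV dst, src2 (assumes x * 0 == 0).
        i.op = Op::MOV;
        i.src0 = i.src2;
        i.src2.clear();
        i.negSrc0 = false;
        return true;
    }
    if (i.imm == 1.0f || i.imm == -1.0f) {
        // src0 * (+/-1) + src2 -> ADD dst, (+/-)src0, src2
        i.op = Op::ADD;
        if (i.imm == -1.0f)
            i.negSrc0 = !i.negSrc0;
        return true;
    }
    return false;  // any other immediate: leave the MAD alone
}
```

Folding by -1 into a negation modifier rather than a separate instruction keeps the result a single ADD.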
6c2b079231f84b09b3f35183930afe522baee168 01-Sep-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: avoid creating instructions that can't be emitted

When constant folding a MAD operation, we first fold the multiply and
generate an ADD. However, we do so without making sure that the immediate
can be handled in the saturate case. If it can't, load the immediate in
a separate instruction.

Reported-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a9b21015f5e3a6a37e53a8b3c755519f7b70479e 08-Jul-2014 Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> nv50/ir: use unordered_set instead of list to keep track of var uses

The set of variable uses does not need to be ordered in any way, and
removing/adding elements is a fairly common operation in various
optimization passes.

This shortens the runtime of the piglit test fp-long-alu from ~4h to ~22s.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
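The data-structure change in the "use unordered_set instead of list" commit can be illustrated with a minimal sketch. The names `Value`, `ValueRef` and `setSource` are hypothetical simplifications, not the nv50_ir classes: each value keeps the set of operand slots that use it, and retargeting a slot erases it from the old value's set and inserts it into the new one. With a `std::list`, each removal first needs a linear search; with `std::unordered_set`, insert and erase are average constant time.

```cpp
#include <cassert>
#include <unordered_set>

struct Value;

// Hypothetical stand-in for one source-operand slot of an instruction.
struct ValueRef {
    Value *value = nullptr;
};

// Hypothetical value: tracks every operand slot that reads it.
// Order is irrelevant, so a hash set gives average O(1) insert/erase,
// whereas erasing from a std::list first needs an O(n) search.
struct Value {
    std::unordered_set<ValueRef *> uses;
};

// Point `ref` at `v` (or at nothing), keeping both use sets up to date.
void setSource(ValueRef &ref, Value *v) {
    if (ref.value)
        ref.value->uses.erase(&ref);
    ref.value = v;
    if (v)
        v->uses.insert(&ref);
}
```

Keying the set on operand slots rather than on instructions keeps two uses of the same value by one instruction distinct.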
4f4e9ba1661528bed8e956a4931ae154e6612824 04-Jun-2014 Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> nvc0/ir: Handle OP_POPCNT when folding constant expressions

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: make sure to only fold 1-arg popcnt in opnd]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
fdc1d96b0ff59e163ed9fe894a1e6d08d4204b94 04-Jun-2014 Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> nvc0/ir: Handle OP_BFIND when folding constant expressions

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
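The POPCNT and BFIND folds above amount to evaluating the operation on a 32-bit immediate at compile time. The sketch below uses hypothetical helper names and assumes GLSL findMSB-style semantics for BFIND: the index of the highest set bit, -1 when there is none, and for negative signed inputs the highest bit that differs from the sign bit. It only handles the single-source POPCNT, matching the bracketed note on that commit.

```cpp
#include <cassert>
#include <cstdint>

// Constant-fold a population count of a 32-bit immediate.
uint32_t foldPopcnt(uint32_t v) {
    uint32_t n = 0;
    for (; v; v &= v - 1)  // clear the lowest set bit each iteration
        ++n;
    return n;
}

// Index of the highest set bit, or -1 when v == 0.
int32_t lastBitIndex(uint32_t v) {
    int32_t idx = -1;
    for (; v; v >>= 1)
        ++idx;
    return idx;
}

// Unsigned BFIND: highest set bit.
int32_t foldBfindU32(uint32_t v) {
    return lastBitIndex(v);
}

// Signed BFIND (assumed findMSB semantics): for negative inputs,
// find the highest bit that differs from the sign bit.
int32_t foldBfindS32(int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    return lastBitIndex(v < 0 ? ~u : u);
}
```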
4674343e8f37f336b68bb04212c928f28af66958 04-Jun-2014 Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> nvc0/ir: Handle reverse subop for OP_EXTBF when folding constant expressions

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3164bfc73418e2e046c7a750eaac8a6d66dfe02d 04-Jun-2014 Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> nv50/ir: clear subop when folding constant expressions

Some operations (e.g. OP_MUL/OP_MAD/OP_EXTBF) might have a subop set.
After folding, make sure that it is cleared.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3b0867f35b5b294eb0d40524a6bc4c8de888a96f 11-Jun-2013 Christoph Bumiller <e0425955@student.tuwien.ac.at> nv50/ir/opt: fix constant folding with saturate modifier

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
d2a3de19c6aa5881228734c73df706483a4aecf9 15-May-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: fix constant folding for OP_MUL subop HIGH

These instructions can come in either through IMUL_HI/UMUL_HI TGSI
opcodes, or from OP_DIV constant folding.

Also make sure that the constant foldings which delete the original
instruction still get counted as having done something.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
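Folding a MUL with subop HIGH means computing the upper 32 bits of the full 64-bit product of the two immediates, for the IMUL_HI and UMUL_HI cases alike. A minimal sketch with hypothetical function names: the signed version relies on an arithmetic right shift of a negative int64_t, which is guaranteed from C++20 and is what mainstream compilers do in earlier modes.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of the 64-bit product of two unsigned 32-bit immediates.
uint32_t foldMulHighU32(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Signed variant: widen to 64 bits so the full product is exact.
// The shifted result always fits in int32_t (|product| <= 2^62).
int32_t foldMulHighS32(int32_t a, int32_t b) {
    int64_t p = static_cast<int64_t>(a) * b;
    return static_cast<int32_t>(p >> 32);
}
```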
7b9475fa652b9df6d599edbea8fa5049fdd995e1 09-May-2014 Ben Skeggs <bskeggs@redhat.com> nvc0: maxwell isa has no per-instruction join modifier

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
68f47cad0d23281309741cc47eeaa26ebbb41bca 10-May-2014 Ilia Mirkin <imirkin@alum.mit.edu> nv50/ir: make sure to reverse cond codes on all the OP_SET variants

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2 10.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
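Wherever the two compared sources of a SET are swapped (for example, to move an immediate into the second slot), the condition code has to be mirrored, not negated, and that applies to each OP_SET variant. The mapping below is the standard operand-swap table, written with hypothetical names and a reduced set of integer condition codes; nv50_ir's own condition codes include more variants, such as unordered float comparisons.

```cpp
#include <cassert>

// Hypothetical, reduced set of ordered integer condition codes.
enum class CondCode { LT, LE, EQ, NE, GE, GT };

// Condition to use after swapping the operands, so that
// (a cc b) == (b swappedCondCode(cc) a). This mirrors the comparison;
// it does not negate it (LT maps to GT, not to GE).
CondCode swappedCondCode(CondCode cc) {
    switch (cc) {
    case CondCode::LT: return CondCode::GT;
    case CondCode::LE: return CondCode::GE;
    case CondCode::GE: return CondCode::LE;
    case CondCode::GT: return CondCode::LT;
    case CondCode::EQ:
    case CondCode::NE: return cc;  // symmetric comparisons
    }
    return cc;
}

// Evaluate a condition code on two integers.
bool evaluate(CondCode cc, int a, int b) {
    switch (cc) {
    case CondCode::LT: return a < b;
    case CondCode::LE: return a <= b;
    case CondCode::EQ: return a == b;
    case CondCode::NE: return a != b;
    case CondCode::GE: return a >= b;
    case CondCode::GT: return a > b;
    }
    return false;
}
```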
46364a53ef30e5c97e1eeb5a879dd99a47415b73 27-Apr-2014 Ilia Mirkin <imirkin@alum.mit.edu> nvc0/ir: do constant folding of extbf/insbf

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
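Constant folding EXTBF and INSBF amounts to evaluating a bitfield extract or insert on immediates. In the sketch below the field offset and width are passed as separate parameters for clarity; how nv50_ir actually encodes them in the instruction's sources is not shown, and the helper names are hypothetical. Fields running past bit 31 are simply truncated here, which may not match the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `width` bits set; handles width >= 32 without UB.
uint32_t lowMask(uint32_t width) {
    return width >= 32 ? 0xffffffffu : (1u << width) - 1u;
}

// Unsigned bitfield extract: `width` bits starting at bit `offset`.
uint32_t extractBitsU(uint32_t v, uint32_t offset, uint32_t width) {
    if (width == 0 || offset >= 32)
        return 0;
    return (v >> offset) & lowMask(width);
}

// Signed bitfield extract: as above, then sign-extend from bit width-1.
int32_t extractBitsS(uint32_t v, uint32_t offset, uint32_t width) {
    uint32_t field = extractBitsU(v, offset, width);
    if (width == 0 || width >= 32)
        return static_cast<int32_t>(field);
    uint32_t sign = 1u << (width - 1);
    return static_cast<int32_t>((field ^ sign) - sign);  // two's-complement sign extension
}

// Bitfield insert: replace `width` bits of `base` at `offset` with the low bits of `ins`.
uint32_t insertBits(uint32_t base, uint32_t ins, uint32_t offset, uint32_t width) {
    if (width == 0 || offset >= 32)
        return base;
    uint32_t mask = lowMask(width) << offset;
    return (base & ~mask) | ((ins << offset) & mask);
}
```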
bbe3d6dc29f218e4d790e5ea359d3c6736e94226 09-Sep-2013 Dave Airlie <airlied@gmail.com> nouveau: fix regression since float comparison instructions (v2)

Fix the return type and allow src and dst types for comparison
to be separate; this at least fixes the two test cases I've written.

v2: drop the u32->s32 change

Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
5eb7ff1175a644ffe3b0f1a75cb235400355f9fb 20-Aug-2013 Johannes Obermayr <johannesobermayr@gmx.de> Move nv30, nv50 and nvc0 to nouveau.

It is planned to ship openSUSE 13.1 with -shared libs.
nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all
nouveau-related targets.
This change makes it possible to easily build one shared libnouveau.so, which is
then LIBADDed.
Also, dlopen will be faster for one library instead of three, and build time
with -jX will be reduced.

Whitespace fixes were requested by 'git am'.

Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp