19963231a3245358c0e8fdd74c4654761e62b6c8 |
|
13-Jan-2017 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: optimize shl + and

Address loading can often end up as shl + shr + shl combinations. The latter two are equal shifts, which get converted into an and mask. However if the previous shl is more than the mask is trying to remove (in terms of low bits), we can just remove the and entirely. This reduces some large shaders by as many as 3% of instructions (out of 2K).

total instructions in shared programs : 6495509 -> 6491076 (-0.07%)
total gprs used in shared programs    : 954621 -> 954623 (0.00%)

          local    gpr   inst  bytes
helped        0      0   1014   1014
hurt          0      2      0      0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
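The redundancy condition described in this commit can be sketched in isolation. This is a minimal standalone illustration under the assumption of 32-bit values, not the code from nv50_ir_peephole.cpp; the helper names `lowBitsClearedMask` and `andIsRedundantAfterShl` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mask that results from folding shr(x, c) followed by shl(x, c):
// it clears the low c bits and keeps everything else.
uint32_t lowBitsClearedMask(unsigned c)
{
    assert(c < 32);
    return ~((1u << c) - 1u);
}

// True when and(shl(a, s), mask) == shl(a, s) for every a, i.e. when every
// bit the mask clears is already zero after the left shift by s.
bool andIsRedundantAfterShl(unsigned s, uint32_t mask)
{
    assert(s < 32);
    uint32_t possiblySet = ~0u << s; // bits that can be non-zero after a << s
    return (possiblySet & ~mask) == 0;
}
```

For example, a left shift by 4 followed by a mask that clears the low 2 bits makes the mask redundant, while a left shift by 1 does not.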
|
0404678c5f72162c9898c9c94ca67969106227c8 |
|
06-Oct-2016 |
Karol Herbst <karolherbst@gmail.com> |
nv50/ir: start LocalCSE with getFirst to merge PHI instructions

total instructions in shared programs : 3499888 -> 3499445 (-0.01%)
total gprs used in shared programs    : 453866 -> 453803 (-0.01%)
total local used in shared programs   : 21621 -> 21621 (0.00%)
total bytes used in shared programs   : 32078952 -> 32074936 (-0.01%)

          local    gpr   inst  bytes
helped        0     39    119    119
hurt          0      0      0      0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
7b7eb7170d16ddb0963900ccf59b39956219373c |
|
20-Oct-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: it appears that OP_DISCARD can't take a join modifier nvdisasm does not print a .S even though the bit is set. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
b7d9677de804375827b3c433027ec2dd32cd1da6 |
|
30-Sep-2016 |
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> |
nv50/ir: constant fold OP_SPLIT Split the source immediate value into new values and move them into the original defs set by the split. Since we can only have up to 64-bit immediates, this is largely beneficial for F64 (and, in the future, U64) operations. Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> [imirkin: always use U32, set newi for foldCount tracking] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
a6d6eff2e6ea2ccd585fe9bf1e159979cd3047df |
|
12-Oct-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: be more careful about preserving modifiers in SHLADD creation src2 was being given the wrong modifier, and we were not properly managing the modifier on the SHL source either. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
87b06cab14c449e442be27650024f044e93c9a7c |
|
07-Oct-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

total instructions in shared programs : 2286901 -> 2284473 (-0.11%)
total gprs used in shared programs    : 335256 -> 335273 (0.01%)
total local used in shared programs   : 31968 -> 31968 (0.00%)

          local    gpr   inst  bytes
helped        0     41    852    852
hurt          0     44     23     23

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
28ecd3eac24ce41b8a855a50f366f1985d1dc934 |
|
07-Oct-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: fix wrong check when optimizing MAD to SHLADD Checking if MAD is supported is definitely wrong, and it's more likely a typo I introduced a few days ago which breaks NV50 because SHLADD is not supported there. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
f96945c5b5c3a52685e76795f03f75c75fb62fc7 |
|
06-Oct-2016 |
Karol Herbst <karolherbst@gmail.com> |
nv50/ir: optimize sub(a, 0) to a

helped some ue4 demos and divinity OS shaders

total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
total gprs used in shared programs    : 379273 -> 379273 (0.00%)
total local used in shared programs   : 9505 -> 9505 (0.00%)
total bytes used in shared programs   : 25837792 -> 25837192 (-0.00%)

          local    gpr   inst  bytes
helped        0      0     33     33
hurt          0      0      0      0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
115c79be10bf3712a1e1bc25a563c90388c1bcaa |
|
14-Sep-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
2e008be9a9a4c94564c11718e0f6fc029caa0e44 |
|
14-Sep-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
e4eb0fca024babcd7bea2b34a7e7605287963ce0 |
|
14-Sep-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: optimize IMAD to SHLADD in presence of power of 2 If and only if src1 is a power of 2, we can replace IMAD by SHLADD. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
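The IMAD to SHLADD rewrite relies on a * 2^n + c being equal to (a << n) + c in 32-bit wrapping arithmetic. The following is a minimal sketch of that equivalence, not the pass itself; `isPow2`, `log2Pow2`, `mad` and `shladd` are hypothetical helper names that only model the arithmetic of the two operations.

```cpp
#include <cassert>
#include <cstdint>

// True if v has exactly one bit set.
bool isPow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

// Shift amount n such that v == 1u << n; v must be a power of two.
unsigned log2Pow2(uint32_t v)
{
    assert(isPow2(v));
    unsigned n = 0;
    while ((v >>= 1) != 0)
        ++n;
    return n;
}

// Arithmetic model of an integer multiply-add: a * b + c (wrapping).
uint32_t mad(uint32_t a, uint32_t b, uint32_t c)
{
    return a * b + c;
}

// Arithmetic model of a shift-left-and-add: (a << s) + c (wrapping).
uint32_t shladd(uint32_t a, unsigned s, uint32_t c)
{
    assert(s < 32);
    return (a << s) + c;
}
```

With b = 8, `mad(a, 8, c)` and `shladd(a, 3, c)` give the same result for every a and c; for a multiplier such as 12 no single shift amount exists, which is why the power-of-two check is required.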
|
557a29b51fa3324cfbeecff100a54c7c6a6d87cd |
|
18-Sep-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: optimize SUB(a, b) to MOV(a - b)

This helps shaders in UE4 demos, especially with Elemental (+1% perf). This optimization reduces spilling usage in one shader which explains the little gain.

GF100/GK104:
total instructions in shared programs : 2838551 -> 2838045 (-0.02%)
total gprs used in shared programs    : 396706 -> 396684 (-0.01%)
total local used in shared programs   : 34432 -> 34416 (-0.05%)

          local    gpr   inst  bytes
helped        1     19    112    112
hurt          0      0      0      0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
3f5cf8c488bfc401d1d5503c1ec61874d7c1477d |
|
20-Jul-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: allow to swap sources for OP_SUB

This allows the load-propagation pass to swap the sources in presence of immediate values.

Maxwell (GM107):
total instructions in shared programs : 1928187 -> 1927634 (-0.03%)
total gprs used in shared programs    : 330741 -> 330154 (-0.18%)
total local used in shared programs   : 28032 -> 28032 (0.00%)

          local    gpr   inst  bytes
helped        0    271    425    425
hurt          0      0    194    194

Fermi (GF114):
total instructions in shared programs : 2334474 -> 2333829 (-0.03%)
total gprs used in shared programs    : 380934 -> 380215 (-0.19%)
total local used in shared programs   : 33304 -> 33264 (-0.12%)

          local    gpr   inst  bytes
helped        5    314    521    521
hurt          0      4    195    195

No regressions on GM107 and GF114 with full piglit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
062c6b8e54c14adcc1ec603fad524f38fe058e67 |
|
19-Jun-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50: fix alphatest for non-blendable formats The hardware can only do alphatest when using a blendable format. This means that the various *16 norm formats didn't work with alphatest. It appears that Talos Principle uses such formats, as well as alpha tests, for some internal renders, which made them be incorrect. However this does not appear to affect the final renders, but in a different game it easily could. The approach we take is that when alphatests are enabled and a suitable format is used (which we anticipate is the vast minority of the time), we insert code into the shader to perform the comparison and discard. Once inserted, that code lives in the shader forever, and we re-upload it each time the function changes with a fixed-up compare. To avoid re-uploading too often, if we switch back to a blendable format, the test is (effectively) disabled and the hw alphatest functionality is used. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
df2881381ac67c42aa8ec9e0ed28f21a1d253785 |
|
26-May-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: handle a load's reg result not being used for locked variants For a load locked, we might not use the first result but the second result is the predicate result of the locking. In that case the load splitting logic doesn't apply (which is designed for splitting 128-bit loads). Instead we take the predicate and move it into the first position (as having a dead result in first def's position upsets all sorts of things including RA). Update the emitters to deal with this as well. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
47b390fe45e5e6f982c60b58985892438959cd8e |
|
17-May-2016 |
Jan Vesely <jano.vesely@gmail.com> |
Treewide: Remove Elements() macro Signed-off-by: Jan Vesely <jano.vesely@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
787a53988cc6bb7a0f2b43c216837d683336b33f |
|
21-Apr-2016 |
Hans de Goede <hdegoede@redhat.com> |
nouveau: codegen: combineLd/St do not combine indirect loads

combineLd/St would combine, i.e.:

    st u32 # g[$r2+0x0] $r2
    st u32 # g[$r2+0x4] $r3

into:

    st u64 # g[$r2+0x0] $r2d

But this is only valid if r2 contains an 8 byte aligned address, which is not guaranteed for compute shaders.

This commit checks for src0 dim 0 not being indirect when combining loads / stores as combining indirect loads / stores may break alignment rules.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
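The alignment argument in this commit can be illustrated with a small check. This is a sketch under stated assumptions, not the combineLd/St implementation: the `Access` struct and `canCombine` function are hypothetical, and the natural-alignment rule on the constant offset is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one memory access.
struct Access {
    bool indirect;   // address depends on a register whose value is unknown
    uint32_t offset; // constant byte offset
    uint32_t size;   // access size in bytes
};

// Decide whether two accesses may be merged into one wider access.
bool canCombine(const Access &lo, const Access &hi)
{
    assert(lo.size > 0 && hi.size > 0);
    // With an indirect base the runtime address may not be aligned to the
    // combined size, so merging is refused outright.
    if (lo.indirect || hi.indirect)
        return false;
    // The two accesses must be adjacent in memory.
    if (lo.offset + lo.size != hi.offset)
        return false;
    // Assumed rule: the wide access must be naturally aligned.
    uint32_t combined = lo.size + hi.size;
    return lo.offset % combined == 0;
}
```

Two direct 4-byte stores at offsets 0 and 4 can be merged, while the same pair behind an indirect base register is left alone.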
|
059308db841886101586aa3ec5ac74b89abf1a20 |
|
07-Apr-2016 |
Samuel Pitoiset <samuel.pitoiset@gmail.com> |
nv50/ir: do not try to attach JOIN ops to ATOM This might result in an INVALID_OPCODE dmesg error in case a join is attached to an atomic operation. Spotted with arb_shader_image_load_store-host-mem-barrier on GK104. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
b3e7fb52349848b24f005c07859bc43691bd64bd |
|
13-Mar-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: avoid folding mul + add if the mul has a dnz Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
edf774bb7eae32f00b900a6faa9b5c698affdcaa |
|
28-Jan-2016 |
Karol Herbst <nouveau@karolherbst.de> |
nv50/ir: we can't do the add to mad conversion when the mul saturates Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
068e9848ba5937673d66c42a4b44067fd78becaf |
|
24-Jan-2016 |
Karol Herbst <nouveau@karolherbst.de> |
nv50/ir: optimize neg(and(set, 1)) to set

helps shaders in saints row IV, bioshock infinite and shadow warrior

total instructions in shared programs : 1914931 -> 1903900 (-0.58%)
total gprs used in shared programs    : 247920 -> 247785 (-0.05%)
total local used in shared programs   : 5673 -> 5673 (0.00%)
total bytes used in shared programs   : 17558272 -> 17457320 (-0.57%)

          local    gpr   inst  bytes
helped        0    137    719    719
hurt          0     12      0      0

v2: remove this opt for OP_SLCT and check against float for OP_SET
v3: simplified the code

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
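Why neg(and(set, 1)) collapses to set can be seen with a short model. The sketch assumes that an integer-typed set yields all ones (0xffffffff) for true and 0 for false; the function names `setLt` and `negAnd1` are hypothetical. It does not apply to float-typed sets, which the v2 note excludes.

```cpp
#include <cstdint>

// Assumed model of an integer set: all ones when the comparison holds.
uint32_t setLt(int32_t a, int32_t b)
{
    return a < b ? 0xffffffffu : 0u;
}

// neg(and(v, 1)): keep the low bit, then negate in two's complement.
uint32_t negAnd1(uint32_t v)
{
    return 0u - (v & 1u);
}
```

For both possible inputs, 0 and 0xffffffff, `negAnd1` returns its argument unchanged, so the and/neg pair adds nothing.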
|
7f19e293055d2a9897df803efa310c293280ab8f |
|
30-Jan-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: get rid of memory stores with nop values

This happens especially with exports and varying packing, where the last bits aren't always filled in. We end up trying to do quad-wide stores, which ends up being a lot of register moves that carefully preserve the nop value. Instead don't do the stores.

total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs    : 910139 -> 895501 (-1.61%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

          local    gpr   inst
helped        0   7442   4693
hurt          0     90   2687

Most of the helped/hurt instruction changes are by one or two ops because we can no longer do quad-wide stores in all cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
3ca941d60ed38800038cd545842e0ed3a69946da |
|
30-Jan-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix false global CSE on instructions with multiple defs

If an instruction has multiple defs, we have to do a lot more checks to make sure that we can move it forward. Among other things, various code likes to do

    a, b = tex()
    if ()
      c = a
    else
      c = b

which means that a single phi node will have results pointing at the same instruction. We obviously can't propagate the tex in this case, but properly accounting for this situation is tricky. Just don't try for instructions with multiple defs.

This fixes about 20 shaders in shader-db, including the dolphin efb2ram shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
c3083c70823d8f4bfdabcf38f98dfebeff0a2b2b |
|
03-Jan-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: add support for BUFFER accesses This largely leaves the existing image logic alone. When image support is added this will have to be harmonized somehow. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
29d09f8747abea35f4deadced0196725d4ab89cf |
|
27-Jan-2016 |
Karol Herbst <nouveau@karolherbst.de> |
nv50/ir: optimize mad/fma with third argument 0 to mul

Very modest effect, but it's clearly the right thing to do.

total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs    : 910157 -> 910131 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

          local    gpr   inst  bytes
helped        0     55     85     85
hurt          0     26     20     20

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
3aa681449ed030ba8b9c56f0a6f2b08bd1fb15a6 |
|
27-Jan-2016 |
Karol Herbst <nouveau@karolherbst.de> |
nv50/ir: run DCE backwards Reduces calls up to 50% Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
978ae28ca279354852a586b202e705db3d596041 |
|
27-Jan-2016 |
Karol Herbst <nouveau@karolherbst.de> |
nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1))

Following shader-db results on GK110:

total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs    : 910187 -> 910157 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

          local    gpr   inst  bytes
helped        0     18    821    821
hurt          0      0      0      0

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
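The identity behind this rewrite is that shifting right and then left by the same amount only clears the low c bits. A minimal sketch for unsigned 32-bit values follows; `shiftPair` and `masked` are hypothetical names for the two forms.

```cpp
#include <cassert>
#include <cstdint>

// Original form: shl(shr(a, c), c).
uint32_t shiftPair(uint32_t a, unsigned c)
{
    assert(c < 32);
    return (a >> c) << c;
}

// Rewritten form: and(a, ~((1 << c) - 1)).
uint32_t masked(uint32_t a, unsigned c)
{
    assert(c < 32);
    return a & ~((1u << c) - 1u);
}
```

For a = 0xdeadbeef and c = 8 both forms give 0xdeadbe00, with one instruction instead of two.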
|
dc3ac418bf889620c93f50c68ef55b9e9de3afd3 |
|
20-Jan-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm)) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
a31819cff8f4560786d731f5f1de6ba814368a2f |
|
18-Jan-2016 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: swap the least-ref'd source into src1 when both const/imm

The whole point of inlining sources is to reduce loads. We can end up in a situation where one value is used a lot of times, and one value is used only once per instruction. The once-per-instruction one is the one that should get inlined, but with the previous algorithm, it was given no preference.

This flips things around to preferring putting less-referenced values into src1 which increases the likelihood of them being inlined.

While we're at it, adjust the heuristic to not treat 0 as an immediate, as well as (effectively) check for situations where LIMMs can't be loaded.

All this yields improvements on nvc0:

total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
total local used in shared programs   : 30372 -> 30288 (-0.28%)
total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)

          local    gpr   inst  bytes
helped       21    822   3332   3332
hurt          0    278    565    565

And more importantly avoids generating really bad code with SSBOs, where we end up checking a lot of different values (usually immediates) against the length.

On nv50 we get comparable results, and even improve packing (bytes went down more than instructions):

total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs    : 728719 -> 725131 (-0.49%)
total local used in shared programs   : 3552 -> 3552 (0.00%)
total bytes used in shared programs   : 43995688 -> 43932928 (-0.14%)

          local    gpr   inst  bytes
helped        0   1380   3252   3774
hurt          0    287   1710   1365

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
d50e6128b815595f7918d6818e8a9cd20d53efd1 |
|
07-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: attempt to do more constant folding on mad -> add conversion The add might actually have a 0 as an argument, which would convert it into a mov. Make sure to detect that. Also avoid the hack of putting the immediate directly into the instruction, instead use a mov to put it into place and let the later LoadPropagation pass place it if possible. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
724134f68322087ef88bc590febd0011167ae367 |
|
29-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: float(s32 & 0xff) = float(u8), not s8 Make sure to make conversion unsigned when we're ANDing the high bits away. Fixes corruption in dolphin. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
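The signedness bug fixed here is easy to reproduce outside the compiler. In this sketch (hypothetical function names, plain C++ casts standing in for the hardware conversion), the value left after `& 0xff` is always in 0..255, so the byte must be read as unsigned.

```cpp
#include <cstdint>

// What the shader asked for: float(s32 & 0xff).
float reference(int32_t v)
{
    return static_cast<float>(v & 0xff);
}

// Correct byte-wise conversion: treat the low byte as u8.
float cvtFromU8(int32_t v)
{
    return static_cast<float>(static_cast<uint8_t>(v));
}

// Incorrect byte-wise conversion: treat the low byte as s8, which turns
// any byte with the top bit set into a negative number.
float cvtFromS8(int32_t v)
{
    return static_cast<float>(static_cast<int8_t>(static_cast<uint8_t>(v)));
}
```

For v = 0x1f0 the reference and the u8 conversion both give 240.0, while the s8 conversion gives -16.0.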
|
d35695096de2358aef40452b5e3304a02534f7db |
|
10-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: combine sequences of conversions In some cases shaders want non-default rounding when converting float to integer. This can be done in one go, so merge the two ops. This comes up in the packUnorm4x8 & co functions, as well as a few random shaders. Overall shader-db impact is minimal, helping a handful of witcher2 and other misc shaders. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
a0b5d5beedb5bc5dcfd4c62c02576fdddf63d1f0 |
|
11-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: teach post-ra immediate folding into mad about integers There will usually be a split before the mad op, peer through that and pick out the right word of the immediate. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
ab70ea1353ac9859ee51d236482fe92a0493362d |
|
11-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: add short imad support Support emission of the short imad, but also include it in the various logic that tries to make it possible to emit. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
6aca7fecb7f7b6c67cf0315e781060a8d1d4b704 |
|
10-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: can't have predication and immediates Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
a27548400ea02c39b6602526eb697c673c7d22bb |
|
09-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg units On NV50, we use 16-bit reg units (to make it all work with half-regs). A few places assumed that it was always in 32-bit units. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
0f647bd65bae16c7a2dc7a960c96593ad6ab729c |
|
08-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: check if the target supports the new offset before inlining Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93300 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
f97f755192210ce3690e67abccefa133d398d373 |
|
08-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: fix up mul+add -> mad algebraic opt, enable for integers For some reason this has been disabled for integers ever since codegen was merged, despite there being emission code for IMAD. Seems to work. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
0ef5c8ab7405fcc76b23393d4414f46cc9edb1fc |
|
04-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fold shl + mul with immediates

On SM20 this gives:

total instructions in shared programs : 6299222 -> 6294240 (-0.08%)
total gprs used in shared programs    : 944139 -> 944068 (-0.01%)
total local used in shared programs   : 54116 -> 54116 (0.00%)

          local    gpr   inst  bytes
helped        0    126   2781   2781
hurt          0     55     11     11

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
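Folding a shift and a multiply that both take immediates amounts to merging the two constants into one multiplier. The sketch below shows the arithmetic for 32-bit wrapping integers; it is an illustration of the algebra only, and `foldShlMulImm` and `shlThenMul` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Single multiplier that replaces a shift by s combined with a multiply by m.
uint32_t foldShlMulImm(unsigned s, uint32_t m)
{
    assert(s < 32);
    return m << s;
}

// Unfolded form: mul(shl(a, s), m).
uint32_t shlThenMul(uint32_t a, unsigned s, uint32_t m)
{
    assert(s < 32);
    return (a << s) * m;
}
```

Because both operations are multiplications modulo 2^32, `shlThenMul(a, s, m)` equals `a * foldShlMulImm(s, m)` for every a, including values that overflow.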
|
abd326e81b06f58797be94bd655ee06b17a34f0c |
|
04-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: propagate indirect loads into instructions

This way $r1 = $r0 + 4; c1[$r1] becomes c1[$r0+4].

On SM35:

total instructions in shared programs : 6206257 -> 6185058 (-0.34%)
total gprs used in shared programs    : 911045 -> 910722 (-0.04%)
total local used in shared programs   : 39072 -> 39072 (0.00%)

          local    gpr   inst  bytes
helped        0    417   4195   4195
hurt          0    280      0      0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
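The $r1 = $r0 + 4; c1[$r1] example can be modelled as folding an add-immediate into the offset of an indirect reference. This is a rough sketch, not the pass's data structures: `AddImm`, `IndirectRef`, `foldAddIntoRef` and the `maxOffset` limit are all hypothetical, with the limit standing in for whatever offset range a target can encode.

```cpp
#include <cstdint>

// Hypothetical "dstReg = srcReg + imm" instruction.
struct AddImm {
    int dstReg;
    int srcReg;
    uint32_t imm;
};

// Hypothetical indirect reference such as c1[$reg + offset].
struct IndirectRef {
    int reg;
    uint32_t offset;
};

// Rewrite ref to address through the add's source register when ref uses the
// add's result and the combined offset still fits. Returns true on success
// and leaves ref untouched otherwise.
bool foldAddIntoRef(IndirectRef &ref, const AddImm &add, uint32_t maxOffset)
{
    if (ref.reg != add.dstReg)
        return false;
    uint64_t newOffset = static_cast<uint64_t>(ref.offset) + add.imm;
    if (newOffset > maxOffset)
        return false;
    ref.reg = add.srcReg;
    ref.offset = static_cast<uint32_t>(newOffset);
    return true;
}
```

With $r1 = $r0 + 4 and a reference c1[$r1], the call rewrites the reference to c1[$r0+4]; if the sum exceeded the assumed limit, the reference would be left as it was.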
|
31fde8fabadcd9240c1e96c8a953b465def9b516 |
|
04-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: flip shl(add, imm) into add(shl, imm)

This works when the add also has an immediate. This often happens in address calculations. These addresses can then be inlined as well.

On code targeted to SM35:

total instructions in shared programs : 6223346 -> 6206257 (-0.27%)
total gprs used in shared programs    : 911075 -> 911045 (-0.00%)
total local used in shared programs   : 39072 -> 39072 (0.00%)

          local    gpr   inst  bytes
helped        0    119   3664   3664
hurt          0     74     15     15

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
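Flipping the two operations is valid because a left shift distributes over addition in wrapping 32-bit arithmetic, and the shifted immediate can be computed at compile time. A minimal sketch with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// Original form: shl(add(a, imm), s).
uint32_t shlOfAdd(uint32_t a, uint32_t imm, unsigned s)
{
    assert(s < 32);
    return (a + imm) << s;
}

// Flipped form: add(shl(a, s), imm << s), where imm << s is a new constant.
uint32_t addOfShl(uint32_t a, uint32_t imm, unsigned s)
{
    assert(s < 32);
    return (a << s) + (imm << s);
}
```

After the flip, the constant term sits on the outer add, which is the shape an address offset needs in order to be inlined.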
|
a3722b81f534598f25d9d155a6d30bc59a6f4e59 |
|
03-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fold fma/mad when all 3 args are immediates This happens pretty rarely, but might as well do it when it does. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
49692f86a1b77fac4634d2a3f0502ec7451c3435 |
|
03-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix DCE to not generate 96-bit loads A situation where there's a 128-bit load where the last component gets DCE'd causes a 96-bit load to be generated, which no GPU can actually emit. Avoid generating such instructions by scaling back to 64-bit on the first load when splitting. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
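The 96-bit problem can be pictured as choosing legal piece sizes for whatever remains of a load after dead components are removed. The sketch below assumes that only 4, 8 and 16 byte loads are encodable and ignores address alignment; `splitLoadSizes` is a hypothetical helper, not the function used in the DCE pass.

```cpp
#include <cassert>
#include <vector>

// Break a load of `bytes` bytes (a multiple of 4, at most 16) into pieces of
// 16, 8 or 4 bytes, largest first, so that no 12-byte piece is ever produced.
std::vector<unsigned> splitLoadSizes(unsigned bytes)
{
    assert(bytes > 0 && bytes <= 16 && bytes % 4 == 0);
    std::vector<unsigned> parts;
    const unsigned legal[] = {16, 8, 4};
    while (bytes > 0) {
        for (unsigned size : legal) {
            if (size <= bytes) {
                parts.push_back(size);
                bytes -= size;
                break;
            }
        }
    }
    return parts;
}
```

A 128-bit load whose last component is dead leaves 12 bytes, which this sketch emits as a 64-bit load followed by a 32-bit load.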
|
11fcf46590129abfa2ca2117a320e8a8052761e4 |
|
03-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: the mad source might not have a defining instruction For example if it's $r63 (aka 0), there won't be a definition. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
ff61ac48387d3f42ede50a572c11f404f4cd3abb |
|
02-Dec-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: fold postfactor into immediate SM20-SM50 can't emit a post-factor in the presence of a long immediate. Make sure to fold it in. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
393d0c336bc766a123e139ae85383663f81e00d1 |
|
07-Nov-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: properly set the type of the constant folding result This removes the hack used for merge, which only covers a fraction of the cases. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
2f9aaed7499499679d44e47b7a070df237f77683 |
|
07-Nov-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: add support for const-folding OP_CVT with F64 source/dest Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
f979d3cfec2b336801fe59ccd264111f403428f5 |
|
05-Nov-2015 |
Hans de Goede <hdegoede@redhat.com> |
nv50/ir: Add support for 64bit immediates to checkSwapSrc01 Now that we support 64 bit immediates in insnCanLoad, we need to swap 64 bit immediate sources too for optimal effect. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
428506ece2c7627392d0f02c7f83021caa46bb4f |
|
05-Nov-2015 |
Hans de Goede <hdegoede@redhat.com> |
nv50/ir: Add support for merge-s to the ConstantFolding pass This allows later passes like LoadPropagation to properly deal with 64 bit immediates. If the new 64 bit load this introduces does not get optimized away then split64BitOpPostRA() will split this into 2 instructions again. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
74b86b971f3bf9b0482341b07c1cbc2e520fb1d0 |
|
10-Sep-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: don't fold immediate into mad if registers are too high Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
3e6adbd761f72b612aba57fd86bb5203aae07133 |
|
11-Jan-2015 |
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> |
nv50/ir: Handle OP_CVT when folding constant expressions [imirkin: handle more type combinations, use macro] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
f5b926183ded75661ab3f786ac1739b1f912c6c5 |
|
19-Aug-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: undo more shifts still by allowing a pre-SHL to occur This happens with unpackSnorm lowering. There's yet another bitfield-extract behind it, but there's too much variation to be worth cutting through. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
9ebe7dc09479d9a8df2733ef96525a2b5e758f6d |
|
19-Aug-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: don't require AND when the high byte is being addressed unpackUnorm* lowering doesn't AND the high byte/word as it's unnecessary. Detect that situation as well. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
63cb85e567ad1025ee990b38f43c2f1ef811821b |
|
19-Aug-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: detect i2f/i2i which operate on specific bytes/words Some Unigine shaders have been observed to unpack bytes out of 32-bit integers and convert them to floats. I2F/I2I can handle this sort of thing directly. Detect the handleable situations. This misses 16-bit word capabilities in nv50, but I haven't seen shaders that would actually make use of that. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
51499bb5ff5626b893383545c494c7f808763404 |
|
19-Aug-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: detect AND/SHR pairs and convert into EXTBF Some shaders appear to extract bits using shift/and combos. Detect (some) of those and convert to EXTBF instead. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
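One shift/and combination that maps onto a bitfield extract is a right shift followed by an and with a mask of the form 2^w - 1. The sketch below models that case only; `extractBits` and `lowMaskWidth` are hypothetical names, and `extractBits` models the arithmetic of an unsigned bitfield extract rather than the EXTBF encoding.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic model of an unsigned bitfield extract: `width` bits of `a`
// starting at bit `offset`.
uint32_t extractBits(uint32_t a, unsigned offset, unsigned width)
{
    assert(offset < 32 && width >= 1 && offset + width <= 32);
    uint32_t mask = width == 32 ? ~0u : (1u << width) - 1u;
    return (a >> offset) & mask;
}

// Returns w if mask == 2^w - 1 for some w >= 1, otherwise 0. Only masks of
// this form turn and(shr(a, s), mask) into a plain bitfield extract.
unsigned lowMaskWidth(uint32_t mask)
{
    if (mask == 0 || (mask & (mask + 1)) != 0)
        return 0;
    unsigned w = 0;
    while (mask != 0) {
        ++w;
        mask >>= 1;
    }
    return w;
}
```

For instance, `(a >> 8) & 0xff` is the 8-bit field at offset 8, whereas a mask such as 0xf0 is not contiguous from bit 0 and would not qualify under this rule.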
|
24a7d4e437e27c758c2848e887ceaf1d4a55ae50 |
|
24-Jul-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: per-patch vars are in a separate address space There's no need to attempt to avoid overlapping generic i/o with patch i/o. By the same token, we can't merge patch and non-patch loads/stores. This fixes at least the tes-both-input-array-*-index-rd tessellation variable-indexing tests. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
da89e75d9c6399c8fb0286460c91a77778c0eec9 |
|
30-Apr-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: allow tess eval output loads to be CSE'd These only happen for gl_TessCoord which are constant. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
ad62ec8316a926682958e7ab52639992867c3755 |
|
26-Jun-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: propagate modifier to right arg when const-folding mad An immediate has to be the second arg of an ADD operation. However we were mistakenly propagating the modifier of the non-folded value to the folded immediate argument. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91117 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
36e3eb6a957f8f20ed187ec88a067fc65cb81432 |
|
18-Jun-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: can't have a join on a load with an indirect source Triggers an INVALID_OPCODE warning on GK208. Seems rare enough to not warrant verification on other chips. Fixes the new piglits: ubo_array_indexing/fs-nonuniform-control-flow.shader_test ubo_array_indexing/vs-nonuniform-control-flow.shader_test Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
fa7f9f123b70f313d3c073b52c9c16b4b8df28f8 |
|
23-May-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: avoid messing up arg1 of PFETCH There can be scenarios where the "indirect" arg of a PFETCH becomes known, and so the code will attempt to propagate it. Use this opportunity to just fold it into the first argument, and prevent the load propagation pass from touching PFETCH further. This fixes gs-input-array-vec4-index-rd.shader_test and vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
a85aba190dfab02ffccf744bad5ad10357394de0 |
|
09-May-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg This covers the pattern where a KILL_IF is used, which triggers a comparison of -x to 0. This can usually be folded into the comparison whose result is being compared to 0, however it may, itself, have already been combined with another comparison. That shouldn't impact the logic of this pass however. With this and the & 1.0 change, code like
    00000020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e800000
    00000028: 001c0000 201fc000 and b32 $r0 $r0 0x3f800000
    00000030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0
    00000038: 0000003c 19800000 $p0 discard
becomes
    00000020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e800000
    00000028: 0000003c 19800000 $p0 discard
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
d2a474e8d4b03f10aec57c7f7740addad1e1ea9d |
|
04-May-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: optimize set & 1.0 to produce boolean-float sets This has started to happen more now that the backend is producing KILL_IF more often. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
c4ac09e30e2520b0ac6d403eb6c77f23e7f24f49 |
|
09-May-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: only propagate saturate up if some actual folding took place The former logic would copy the saturate up to any mul with an immediate if there was a subsequent mul with a saturate. However we only want to do that if we collapsed 2 muls by multiplying their immediates (or were able to put the immediate in as a post-multiplier). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
01d3b750b3682f3774f1bd01fa07a6b3c8baf28e |
|
03-Apr-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: avoid folding immediates into imad operations Commit 09ee907266 added logic to fold immediates into mad operations, but the emission code is only there for fmad. Only allow it on float types. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
49b86007aa2bb599ada6cdbed7ff56246917f12e |
|
25-Mar-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: take postFactor into account when doing peephole optimizations Multiply operations can have a post-factor on them, which other ops don't support. Only perform the peephole optimizations when there is no post-factor involved. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758 Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
5491458843998e8083baf9b62c14895946de1a3f |
|
07-Jul-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: remove merge/split pairs to allow normal propagation to occur Because the TGSI interface creates merges for each instruction source and then splits them back out, there are a lot of unnecessary merge/split pairs which do essentially nothing. The various modifier/etc propagation doesn't know how to walk through those, so just remove them when they're unnecessary. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
09ee907266f315300a7856b55e50e74dce8e946f |
|
06-Feb-2015 |
Roy Spliet <rspliet@eclipso.eu> |
nv50/ir: Fold IMM into MAD Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be done post-RA because it requires that SDST == SSRC2. V2: improve readability and add comments to clarify decisions V3: Remove redundant code... compiler already attempts to put the IMM in SSRC1 Signed-off-by: Roy Spliet <rspliet@eclipso.eu> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
9e94b87b6012450d714edc6d0c46b15a89d5ce61 |
|
01-Jan-2015 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fold MAD when one of the multiplicands is const Fold MAD dst, src0, immed, src2 (or src0/immed swapped) when
- immed = 0 -> MOV dst, src2
- immed = +/- 1 -> ADD dst, src0, src2
These types of MAD patterns were observed in some st/nine shaders. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
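The MAD folding rules in this commit can be illustrated with a small standalone sketch. It does not use the real nv50_ir classes: `SimpleInsn` and `foldMadImmediate` are hypothetical names, operands are plain strings, and the negation for immed = -1 is modeled as a flag on src0, which is an assumption about how the sign is carried.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for an IR instruction (not an nv50_ir type).
struct SimpleInsn {
    std::string op;    // "mov" or "add"
    std::string src0;  // non-immediate multiplicand (empty for "mov")
    std::string src2;  // addend of the original MAD
    bool negSrc0;      // assumed way to express the -1 case
};

// Fold "mad dst, src0, imm, src2" for the immediates named in the commit.
// Returns std::nullopt when none of the rules apply.
std::optional<SimpleInsn> foldMadImmediate(const std::string &src0, float imm,
                                           const std::string &src2)
{
    if (imm == 0.0f)   // src0 * 0 + src2  ->  mov dst, src2
        return SimpleInsn{"mov", "", src2, false};
    if (imm == 1.0f)   // src0 * 1 + src2  ->  add dst, src0, src2
        return SimpleInsn{"add", src0, src2, false};
    if (imm == -1.0f)  // src0 * -1 + src2 ->  add dst, -src0, src2
        return SimpleInsn{"add", src0, src2, true};
    return std::nullopt;  // any other immediate stays a MAD
}
```

For example, `foldMadImmediate("a", 1.0f, "c")` yields an `add` of `a` and `c`, while an immediate of 2.0 is left alone.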
|
6c2b079231f84b09b3f35183930afe522baee168 |
|
01-Sep-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: avoid creating instructions that can't be emitted When constant folding a MAD operation, we first fold the multiply and generate an ADD. However we do so without making sure that the immediate can be handled in the saturate case. If it can't, load the immediate in a separate instruction. Reported-by: Tiziano Bacocco <tizbac2@gmail.com> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
a9b21015f5e3a6a37e53a8b3c755519f7b70479e |
|
08-Jul-2014 |
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> |
nv50/ir: use unordered_set instead of list to keep track of var uses The set of variable uses does not need to be ordered in any way, and removing/adding elements is a fairly common operation in various optimization passes. This shortens runtime of piglit test fp-long-alu to ~22s from ~4h Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
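A minimal sketch of why the container change helps, assuming uses are tracked as raw pointers. `Use` and `dropUse` are hypothetical names, not the nv50_ir ones. Removing a given element from a `std::list` needs a linear search, whereas `std::unordered_set::erase` by key takes constant time on average.

```cpp
#include <cstddef>
#include <list>
#include <unordered_set>

struct Use { int id; };  // hypothetical stand-in for a value reference

// List-based tracking: remove() scans the whole list, O(n) per removal.
void dropUse(std::list<const Use *> &uses, const Use *u)
{
    uses.remove(u);
}

// Set-based tracking: erase() by key is O(1) on average.
// Returns the number of elements removed (0 or 1).
std::size_t dropUse(std::unordered_set<const Use *> &uses, const Use *u)
{
    return uses.erase(u);
}
```

With many uses per value and frequent add/remove traffic, the linear scans add up, which is consistent with the runtime drop reported in the commit message.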
|
4f4e9ba1661528bed8e956a4931ae154e6612824 |
|
04-Jun-2014 |
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> |
nvc0/ir: Handle OP_POPCNT when folding constant expressions Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> [imirkin: make sure to only fold 1-arg popcnt in opnd] Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
fdc1d96b0ff59e163ed9fe894a1e6d08d4204b94 |
|
04-Jun-2014 |
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> |
nvc0/ir: Handle OP_BFIND when folding constant expressions Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
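As a rough illustration of what folding OP_POPCNT and OP_BFIND on a constant involves, here is a standalone sketch. The function names are hypothetical, only the single-operand unsigned forms are covered, and the result for a zero input to the bit-find (all ones) is an assumed convention rather than something stated in these commits.

```cpp
#include <cstdint>

// Population count of a 32-bit constant (single-operand form only).
std::uint32_t foldPopcnt(std::uint32_t v)
{
    std::uint32_t count = 0;
    while (v) {
        v &= v - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Index of the most significant set bit of an unsigned constant.
// Assumption: returns 0xffffffff when no bit is set.
std::uint32_t foldBfindU32(std::uint32_t v)
{
    for (int bit = 31; bit >= 0; --bit) {
        if (v & (1u << bit))
            return static_cast<std::uint32_t>(bit);
    }
    return 0xffffffffu;
}
```

Once the operand is a known immediate, the instruction can be replaced by a move of the computed value.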
|
4674343e8f37f336b68bb04212c928f28af66958 |
|
04-Jun-2014 |
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> |
nvc0/ir: Handle reverse subop for OP_EXTBF when folding constant expressions Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
3164bfc73418e2e046c7a750eaac8a6d66dfe02d |
|
04-Jun-2014 |
Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> |
nv50/ir: clear subop when folding constant expressions Some operations (e.g. OP_MUL/OP_MAD/OP_EXTBF) might have a subop set. After folding, make sure that it is cleared Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
3b0867f35b5b294eb0d40524a6bc4c8de888a96f |
|
11-Jun-2013 |
Christoph Bumiller <e0425955@student.tuwien.ac.at> |
nv50/ir/opt: fix constant folding with saturate modifier Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.2" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
d2a3de19c6aa5881228734c73df706483a4aecf9 |
|
15-May-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: fix constant folding for OP_MUL subop HIGH These instructions can come in either through IMUL_HI/UMUL_HI TGSI opcodes, or from OP_DIV constant folding. Also make sure that the constant foldings which delete the original instruction still get counted as having done something. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
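For reference, the value a "multiply high" of two 32-bit constants should fold to can be sketched as below. The helper names are hypothetical and this is not the code from the commit; it only shows the arithmetic of taking the upper 32 bits of the 64-bit product.

```cpp
#include <cstdint>

// Upper 32 bits of the unsigned 32x32 -> 64-bit product.
std::uint32_t mulHighU32(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}

// Upper 32 bits of the signed 32x32 -> 64-bit product.
// Note: right-shifting a negative value is arithmetic on common compilers
// (and guaranteed since C++20); that behavior is assumed here.
std::int32_t mulHighS32(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int32_t>((static_cast<std::int64_t>(a) * b) >> 32);
}
```

Folding with the ordinary low-half product instead would silently give the wrong constant, which is the kind of error this fix addresses.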
|
7b9475fa652b9df6d599edbea8fa5049fdd995e1 |
|
09-May-2014 |
Ben Skeggs <bskeggs@redhat.com> |
nvc0: maxwell isa has no per-instruction join modifier Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
68f47cad0d23281309741cc47eeaa26ebbb41bca |
|
10-May-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nv50/ir: make sure to reverse cond codes on all the OP_SET variants Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Cc: "10.2 10.1" <mesa-stable@lists.freedesktop.org>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
46364a53ef30e5c97e1eeb5a879dd99a47415b73 |
|
27-Apr-2014 |
Ilia Mirkin <imirkin@alum.mit.edu> |
nvc0/ir: do constant folding of extbf/insbf Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
bbe3d6dc29f218e4d790e5ea359d3c6736e94226 |
|
09-Sep-2013 |
Dave Airlie <airlied@gmail.com> |
nouveau: fix regression since float comparison instructions (v2) Fix the return type and allow src and dst types for comparison to be separate, this at least fixes the two test cases I've written. v2: drop the u32->s32 change Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Signed-off-by: Dave Airlie <airlied@redhat.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|
5eb7ff1175a644ffe3b0f1a75cb235400355f9fb |
|
20-Aug-2013 |
Johannes Obermayr <johannesobermayr@gmx.de> |
Move nv30, nv50 and nvc0 to nouveau. It is planned to ship openSUSE 13.1 with -shared libs. nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau related targets. This change makes it possible to easily build one shared libnouveau.so which is then LIBADDed. Also dlopen will be faster for one library instead of three and build time on -jX will be reduced. Whitespace fixes were requested by 'git am'. Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de> Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Acked-by: Ian Romanick <ian.d.romanick@intel.com>
/external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
|