History log of /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
Revision	Date	Author	Comments
19963231a3245358c0e8fdd74c4654761e62b6c8	13-Jan-2017	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: optimize shl + and Address loading can often end up as shl + shr + shl combinations. The latter two are equal shifts, which get converted into an and mask. However if the previous shl is more than the mask is trying to remove (in terms of low bits), we can just remove the and entirely. This reduces some large shaders by as many as 3% of instructions (out of 2K). total instructions in shared programs : 6495509 -> 6491076 (-0.07%) total gprs used in shared programs : 954621 -> 954623 (0.00%) local gpr inst bytes helped 0 0 1014 1014 hurt 0 2 0 0 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
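The redundancy test described in this commit can be sketched with plain 32-bit arithmetic. This is only an illustration under stated assumptions: `low_bits_cleared_mask` and `and_is_redundant_after_shl` are hypothetical names, not functions from nv50_ir_peephole.cpp, and shift amounts are assumed to be below 32.

```cpp
#include <cassert>
#include <cstdint>

// Mask that clears the low `c` bits: what shr(c) followed by shl(c) becomes.
constexpr uint32_t low_bits_cleared_mask(unsigned c) {
    return c >= 32 ? 0u : ~((uint32_t{1} << c) - 1u);
}

// Hypothetical helper: true when and(shl(a, s), mask) equals shl(a, s) for
// every a, i.e. the mask only clears bits that the shift already zeroed.
constexpr bool and_is_redundant_after_shl(unsigned s, uint32_t mask) {
    // Shifts of 32 or more are outside this sketch, so report "not redundant".
    // Otherwise: the mask must keep every bit that shl(a, s) can leave set.
    return s < 32 && ((~uint32_t{0} << s) & ~mask) == 0;
}

inline void check_shl_and() {
    const uint32_t a = 0xdeadbeefu;
    // shl by 4 already zeroes the two bits the mask would clear.
    assert(and_is_redundant_after_shl(4, low_bits_cleared_mask(2)));
    assert(((a << 4) & low_bits_cleared_mask(2)) == (a << 4));
    // shl by 1 leaves bit 1 live, so the and must stay.
    assert(!and_is_redundant_after_shl(1, low_bits_cleared_mask(2)));
}
```

Calling `check_shl_and()` from a test driver exercises both the removable and the non-removable case.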
0404678c5f72162c9898c9c94ca67969106227c8	06-Oct-2016	Karol Herbst <karolherbst@gmail.com>	nv50/ir: start LocalCSE with getFirst to merge PHI instructions total instructions in shared programs : 3499888 -> 3499445 (-0.01%) total gprs used in shared programs : 453866 -> 453803 (-0.01%) total local used in shared programs : 21621 -> 21621 (0.00%) total bytes used in shared programs : 32078952 -> 32074936 (-0.01%) local gpr inst bytes helped 0 39 119 119 hurt 0 0 0 0 Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
7b7eb7170d16ddb0963900ccf59b39956219373c	20-Oct-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: it appears that OP_DISCARD can't take a join modifier nvdisasm does not print a .S even though the bit is set. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b7d9677de804375827b3c433027ec2dd32cd1da6	30-Sep-2016	Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>	nv50/ir: constant fold OP_SPLIT Split the source immediate value into new values and move them into the original defs set by the split. Since we can only have up to 64-bit immediates, this is largely beneficial for F64 (and, in the future, U64) operations. Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> [imirkin: always use U32, set newi for foldCount tracking] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
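As a rough illustration of folding a split of a 64-bit immediate, the sketch below breaks a value into two U32 words. The function names are hypothetical, the low word is assumed to go to the first def, and `double` is assumed to be IEEE-754 binary64; none of this is taken from the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Split a 64-bit immediate into {low word, high word}.
// Assumption: the first def of the split receives the low 32 bits.
inline std::pair<uint32_t, uint32_t> split_u64(uint64_t imm) {
    return {static_cast<uint32_t>(imm), static_cast<uint32_t>(imm >> 32)};
}

// An F64 immediate is split by its bit pattern, not by its numeric value.
inline std::pair<uint32_t, uint32_t> split_f64(double value) {
    uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return split_u64(bits);
}

inline void check_split() {
    const auto u = split_u64(0x0123456789abcdefULL);
    assert(u.first == 0x89abcdefu && u.second == 0x01234567u);
    // 1.0 has the bit pattern 0x3ff0000000000000.
    const auto f = split_f64(1.0);
    assert(f.first == 0u && f.second == 0x3ff00000u);
}
```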
a6d6eff2e6ea2ccd585fe9bf1e159979cd3047df	12-Oct-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nvc0/ir: be more careful about preserving modifiers in SHLADD creation src2 was being given the wrong modifier, and we were not properly managing the modifier on the SHL source either. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
87b06cab14c449e442be27650024f044e93c9a7c	07-Oct-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c) total instructions in shared programs :2286901 -> 2284473 (-0.11%) total gprs used in shared programs :335256 -> 335273 (0.01%) total local used in shared programs :31968 -> 31968 (0.00%) local gpr inst bytes helped 0 41 852 852 hurt 0 44 23 23 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
28ecd3eac24ce41b8a855a50f366f1985d1dc934	07-Oct-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: fix wrong check when optimizing MAD to SHLADD Checking if MAD is supported is definitely wrong; it is most likely a typo I introduced a few days ago, which breaks NV50 because SHLADD is not supported there. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
f96945c5b5c3a52685e76795f03f75c75fb62fc7	06-Oct-2016	Karol Herbst <karolherbst@gmail.com>	nv50/ir: optimize sub(a, 0) to a helped some ue4 demos and divinity OS shaders total instructions in shared programs : 2818674 -> 2818606 (-0.00%) total gprs used in shared programs : 379273 -> 379273 (0.00%) total local used in shared programs : 9505 -> 9505 (0.00%) total bytes used in shared programs : 25837792 -> 25837192 (-0.00%) local gpr inst bytes helped 0 0 33 33 hurt 0 0 0 0 Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
115c79be10bf3712a1e1bc25a563c90388c1bcaa	14-Sep-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2e008be9a9a4c94564c11718e0f6fc029caa0e44	14-Sep-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
e4eb0fca024babcd7bea2b34a7e7605287963ce0	14-Sep-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: optimize IMAD to SHLADD in presence of power of 2 We can replace IMAD by SHLADD if and only if src1 is a power of 2. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
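The arithmetic behind the IMAD to SHLADD rewrite can be checked with a small sketch. `imad`, `shladd` and `imad_shift_for` are hypothetical stand-ins for the IR operations, modelled here as wrapping 32-bit unsigned arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics assumed for the two ops (32-bit, wrapping).
inline uint32_t imad(uint32_t a, uint32_t b, uint32_t c) { return a * b + c; }
inline uint32_t shladd(uint32_t a, unsigned shift, uint32_t c) { return (a << shift) + c; }

// Returns true and stores log2(b) in *shift when b is a power of 2;
// otherwise the rewrite does not apply and *shift is left untouched.
inline bool imad_shift_for(uint32_t b, unsigned *shift) {
    if (b == 0 || (b & (b - 1)) != 0)
        return false;
    unsigned n = 0;
    while ((b >>= 1) != 0)
        ++n;
    *shift = n;
    return true;
}

inline void check_imad_to_shladd() {
    unsigned shift = 0;
    assert(imad_shift_for(8, &shift) && shift == 3);
    assert(imad(7, 8, 3) == shladd(7, shift, 3));
    // The identity also holds when the product wraps around.
    assert(imad_shift_for(16, &shift) && shift == 4);
    assert(imad(0xffffffffu, 16, 1) == shladd(0xffffffffu, shift, 1));
    // 12 is not a power of 2, so IMAD has to stay.
    assert(!imad_shift_for(12, &shift));
}
```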
557a29b51fa3324cfbeecff100a54c7c6a6d87cd	18-Sep-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: optimize SUB(a, b) to MOV(a - b) This helps shaders in UE4 demos, especially with Elemental (+1% perf). This optimization reduces spilling usage in one shader which explains the little gain. GF100/GK104: total instructions in shared programs :2838551 -> 2838045 (-0.02%) total gprs used in shared programs :396706 -> 396684 (-0.01%) total local used in shared programs :34432 -> 34416 (-0.05%) local gpr inst bytes helped 1 19 112 112 hurt 0 0 0 0 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3f5cf8c488bfc401d1d5503c1ec61874d7c1477d	20-Jul-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: allow to swap sources for OP_SUB This allows the load-propagation pass to swap the sources in presence of immediate values. Maxwell (GM107): total instructions in shared programs :1928187 -> 1927634 (-0.03%) total gprs used in shared programs :330741 -> 330154 (-0.18%) total local used in shared programs :28032 -> 28032 (0.00%) local gpr inst bytes helped 0 271 425 425 hurt 0 0 194 194 Fermi (GF114): total instructions in shared programs :2334474 -> 2333829 (-0.03%) total gprs used in shared programs :380934 -> 380215 (-0.19%) total local used in shared programs :33304 -> 33264 (-0.12%) local gpr inst bytes helped 5 314 521 521 hurt 0 4 195 195 No regressions on GM107 and GF114 with full piglit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
062c6b8e54c14adcc1ec603fad524f38fe058e67	19-Jun-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50: fix alphatest for non-blendable formats The hardware can only do alphatest when using a blendable format. This means that the various *16 norm formats didn't work with alphatest. It appears that Talos Principle uses such formats, as well as alpha tests, for some internal renders, which made those renders incorrect. This does not appear to affect the final renders, but in a different game it easily could. The approach we take is that when alphatests are enabled and a suitable format is used (which we anticipate is a small minority of the time), we insert code into the shader to perform the comparison and discard. Once inserted, that code lives in the shader forever, and we re-upload it each time the function changes with a fixed-up compare. To avoid re-uploading too often, if we switch back to a blendable format, the test is (effectively) disabled and the hw alphatest functionality is used. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
df2881381ac67c42aa8ec9e0ed28f21a1d253785	26-May-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nvc0/ir: handle a load's reg result not being used for locked variants For a load locked, we might not use the first result but the second result is the predicate result of the locking. In that case the load splitting logic doesn't apply (which is designed for splitting 128-bit loads). Instead we take the predicate and move it into the first position (as having a dead result in first def's position upsets all sorts of things including RA). Update the emitters to deal with this as well. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
47b390fe45e5e6f982c60b58985892438959cd8e	17-May-2016	Jan Vesely <jano.vesely@gmail.com>	Treewide: Remove Elements() macro Signed-off-by: Jan Vesely <jano.vesely@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
787a53988cc6bb7a0f2b43c216837d683336b33f	21-Apr-2016	Hans de Goede <hdegoede@redhat.com>	nouveau: codegen: combineLd/St do not combine indirect loads combineLd/St would combine, e.g.: st u32 # g[$r2+0x0] $r2; st u32 # g[$r2+0x4] $r3 into: st u64 # g[$r2+0x0] $r2d. But this is only valid if r2 contains an 8-byte-aligned address, which is not guaranteed for compute shaders. This commit checks for src0 dim 0 not being indirect when combining loads / stores, as combining indirect loads / stores may break alignment rules. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
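The alignment argument can be made concrete with a simplified model. `Store32` and `can_combine_to_u64` are hypothetical and far simpler than the real combineLd/St logic; the point is only that a register-relative address cannot be proven 8-byte aligned at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified description of one 32-bit store.
struct Store32 {
    bool indirect;    // address depends on a register ($r2 in the commit's example)
    uint32_t offset;  // constant byte offset
};

inline bool is_aligned(uint64_t address, uint32_t size) {
    return address % size == 0;
}

// Two adjacent 32-bit stores may become one 64-bit store only when the
// final address is known to be 8-byte aligned.
inline bool can_combine_to_u64(const Store32 &lo, const Store32 &hi) {
    if (lo.indirect || hi.indirect)
        return false;  // register contents are unknown, so alignment is unknown
    if (hi.offset != lo.offset + 4)
        return false;  // the stores are not adjacent
    return is_aligned(lo.offset, 8);
}

inline void check_combine_rules() {
    assert(can_combine_to_u64({false, 0x0}, {false, 0x4}));
    assert(!can_combine_to_u64({true, 0x0}, {true, 0x4}));
    assert(!can_combine_to_u64({false, 0x4}, {false, 0x8}));
    // If $r2 held 0x1004 at run time, g[$r2+0x0] would not be 8-byte aligned.
    assert(!is_aligned(0x1004 + 0x0, 8));
}
```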
059308db841886101586aa3ec5ac74b89abf1a20	07-Apr-2016	Samuel Pitoiset <samuel.pitoiset@gmail.com>	nv50/ir: do not try to attach JOIN ops to ATOM This might result in an INVALID_OPCODE dmesg error in case a join is attached to an atomic operation. Spotted with arb_shader_image_load_store-host-mem-barrier on GK104. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b3e7fb52349848b24f005c07859bc43691bd64bd	13-Mar-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: avoid folding mul + add if the mul has a dnz Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
edf774bb7eae32f00b900a6faa9b5c698affdcaa	28-Jan-2016	Karol Herbst <nouveau@karolherbst.de>	nv50/ir: we can't do the add to mad conversion when the mul saturates Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
068e9848ba5937673d66c42a4b44067fd78becaf	24-Jan-2016	Karol Herbst <nouveau@karolherbst.de>	nv50/ir: optimize neg(and(set, 1)) to set helps shaders in saints row IV, bioshock infinite and shadow warrior total instructions in shared programs : 1914931 -> 1903900 (-0.58%) total gprs used in shared programs : 247920 -> 247785 (-0.05%) total local used in shared programs : 5673 -> 5673 (0.00%) total bytes used in shared programs : 17558272 -> 17457320 (-0.57%) local gpr inst bytes helped 0 137 719 719 hurt 0 12 0 0 v2: remove this opt for OP_SLCT and check against float for OP_SET v3: simplified the code Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
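A small sketch of why neg(and(set, 1)) collapses to set for integer results. It assumes that an integer-typed set yields all ones (0xffffffff) for true and 0 for false; `set_lt` and `neg_and_1` are hypothetical names. The "check against float" note in v2 presumably exists because a float-typed set yields 1.0f instead, where the identity does not hold.

```cpp
#include <cassert>
#include <cstdint>

// Assumed integer SET semantics: all ones for true, zero for false.
inline uint32_t set_lt(int32_t a, int32_t b) {
    return a < b ? 0xffffffffu : 0u;
}

// neg(and(v, 1)) in wrapping 32-bit arithmetic.
inline uint32_t neg_and_1(uint32_t v) {
    return 0u - (v & 1u);
}

inline void check_neg_and_set() {
    // true:  0xffffffff & 1 = 1, and -1 is 0xffffffff again.
    assert(neg_and_1(set_lt(1, 2)) == set_lt(1, 2));
    // false: 0 & 1 = 0, and -0 is 0.
    assert(neg_and_1(set_lt(2, 1)) == set_lt(2, 1));
}
```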
7f19e293055d2a9897df803efa310c293280ab8f	30-Jan-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: get rid of memory stores with nop values This happens especially with exports and varying packing, where the last bits aren't always filled in. We end up trying to do quad-wide stores, which ends up being a lot of register moves that carefully preserve the nop value. Instead don't do the stores. total instructions in shared programs : 6131375 -> 6125267 (-0.10%) total gprs used in shared programs : 910139 -> 895501 (-1.61%) total local used in shared programs : 15328 -> 15328 (0.00%) local gpr inst helped 0 7442 4693 hurt 0 90 2687 Most of the helped/hurt instruction changes are by one or two ops because we can no longer do quad-wide stores in all cases. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3ca941d60ed38800038cd545842e0ed3a69946da	30-Jan-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix false global CSE on instructions with multiple defs If an instruction has multiple defs, we have to do a lot more checks to make sure that we can move it forward. Among other things, various code likes to do: a, b = tex(); if () c = a; else c = b; which means that a single phi node will have results pointing at the same instruction. We obviously can't propagate the tex in this case, but properly accounting for this situation is tricky. Just don't try for instructions with multiple defs. This fixes about 20 shaders in shader-db, including the dolphin efb2ram shader. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
c3083c70823d8f4bfdabcf38f98dfebeff0a2b2b	03-Jan-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: add support for BUFFER accesses This largely leaves the existing image logic alone. When image support is added this will have to be harmonized somehow. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
29d09f8747abea35f4deadced0196725d4ab89cf	27-Jan-2016	Karol Herbst <nouveau@karolherbst.de>	nv50/ir: optimize mad/fma with third argument 0 to mul Very modest effect, but it's clearly the right thing to do. total instructions in shared programs : 6131491 -> 6131398 (-0.00%) total gprs used in shared programs : 910157 -> 910131 (-0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) local gpr inst bytes helped 0 55 85 85 hurt 0 26 20 20 Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
3aa681449ed030ba8b9c56f0a6f2b08bd1fb15a6	27-Jan-2016	Karol Herbst <nouveau@karolherbst.de>	nv50/ir: run DCE backwards Reduces calls by up to 50% Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
978ae28ca279354852a586b202e705db3d596041	27-Jan-2016	Karol Herbst <nouveau@karolherbst.de>	nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1)) Following shader-db results on GK110: total instructions in shared programs : 6141510 -> 6131491 (-0.16%) total gprs used in shared programs : 910187 -> 910157 (-0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) local gpr inst bytes helped 0 18 821 821 hurt 0 0 0 0 Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
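The identity in the subject line can be verified directly for unsigned 32-bit values. The helper names below are hypothetical, and the sketch assumes a logical (unsigned) right shift with a shift count below 32.

```cpp
#include <cassert>
#include <cstdint>

// shl(shr(a, c), c): shift the low c bits out, then shift zeroes back in.
inline uint32_t shl_of_shr(uint32_t a, unsigned c) {
    return (a >> c) << c;
}

// and(a, ~((1 << c) - 1)): clear the low c bits with a single mask.
inline uint32_t and_with_mask(uint32_t a, unsigned c) {
    return a & ~((uint32_t{1} << c) - 1u);
}

inline void check_shl_shr_identity() {
    assert(shl_of_shr(0xdeadbeefu, 8) == 0xdeadbe00u);
    assert(and_with_mask(0xdeadbeefu, 8) == 0xdeadbe00u);
    // Exhaustive over every valid shift count for one sample value.
    for (unsigned c = 0; c < 32; ++c)
        assert(shl_of_shr(0x12345678u, c) == and_with_mask(0x12345678u, c));
}
```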
dc3ac418bf889620c93f50c68ef55b9e9de3afd3	20-Jan-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm)) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a31819cff8f4560786d731f5f1de6ba814368a2f	18-Jan-2016	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: swap the least-ref'd source into src1 when both const/imm The whole point of inlining sources is to reduce loads. We can end up in a situation where one value is used a lot of times, and one value is used only once per instruction. The once-per-instruction one is the one that should get inlined, but with the previous algorithm, it was given no preference. This flips things around to preferring putting less-referenced values into src1 which increases the likelihood of them being inlined. While we're at it, adjust the heuristic to not treat 0 as an immediate, as well as (effectively) check for situations where LIMMs can't be loaded. All this yields improvements on nvc0: total instructions in shared programs : 6261157 -> 6255985 (-0.08%) total gprs used in shared programs : 945082 -> 943417 (-0.18%) total local used in shared programs : 30372 -> 30288 (-0.28%) total bytes used in shared programs : 50089256 -> 50047880 (-0.08%) local gpr inst bytes helped 21 822 3332 3332 hurt 0 278 565 565 And more importantly avoids generating really bad code with SSBOs, where we end up checking a lot of different values (usually immediates) against the length. On nv50 we get comparable results, and even improve packing (bytes went down more than instructions): total instructions in shared programs : 6346564 -> 6341277 (-0.08%) total gprs used in shared programs : 728719 -> 725131 (-0.49%) total local used in shared programs : 3552 -> 3552 (0.00%) total bytes used in shared programs : 43995688 -> 43932928 (-0.14%) local gpr inst bytes helped 0 1380 3252 3774 hurt 0 287 1710 1365 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
d50e6128b815595f7918d6818e8a9cd20d53efd1	07-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: attempt to do more constant folding on mad -> add conversion The add might actually have a 0 as an argument, which would convert it into a mov. Make sure to detect that. Also avoid the hack of putting the immediate directly into the instruction, instead use a mov to put it into place and let the later LoadPropagation pass place it if possible. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
724134f68322087ef88bc590febd0011167ae367	29-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: float(s32 & 0xff) = float(u8), not s8 Make sure to make conversion unsigned when we're ANDing the high bits away. Fixes corruption in dolphin. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
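The signedness issue is easy to reproduce on the CPU. In the sketch below the three hypothetical helpers model float(s32 & 0xff), a conversion from u8, and a conversion from s8. The s8 result assumes the usual two's-complement narrowing, which C++20 guarantees and earlier standards leave implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// What the shader asked for: mask first, then convert the (non-negative) result.
inline float cvt_masked(int32_t x) { return static_cast<float>(x & 0xff); }

// Folding the mask into the conversion is correct with an unsigned byte source...
inline float cvt_from_u8(int32_t x) { return static_cast<float>(static_cast<uint8_t>(x)); }

// ...but a signed byte source sign-extends bit 7 and changes the value.
inline float cvt_from_s8(int32_t x) { return static_cast<float>(static_cast<int8_t>(x)); }

inline void check_masked_conversion() {
    // 0x180 & 0xff == 0x80 == 128
    assert(cvt_masked(0x180) == 128.0f);
    assert(cvt_from_u8(0x180) == cvt_masked(0x180));
    assert(cvt_from_s8(0x180) != cvt_masked(0x180));
    // Values with bit 7 clear agree under all three readings.
    assert(cvt_from_s8(0x17f) == cvt_masked(0x17f));
}
```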
d35695096de2358aef40452b5e3304a02534f7db	10-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: combine sequences of conversions In some cases shaders want non-default rounding when converting float to integer. This can be done in one go, so merge the two ops. This comes up in the packUnorm4x8 & co functions, as well as a few random shaders. Overall shader-db impact is minimal, helping a handful of witcher2 and other misc shaders. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a0b5d5beedb5bc5dcfd4c62c02576fdddf63d1f0	11-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: teach post-ra immediate folding into mad about integers There will usually be a split before the mad op, peer through that and pick out the right word of the immediate. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
ab70ea1353ac9859ee51d236482fe92a0493362d	11-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: add short imad support Support emission of the short imad, but also include it in the various logic that tries to make it possible to emit. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
6aca7fecb7f7b6c67cf0315e781060a8d1d4b704	10-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: can't have predication and immediates Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
a27548400ea02c39b6602526eb697c673c7d22bb	09-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg units On NV50, we use 16-bit reg units (to make it all work with half-regs). A few places assumed that it was always in 32-bit units. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
0f647bd65bae16c7a2dc7a960c96593ad6ab729c	08-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: check if the target supports the new offset before inlining Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93300 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
f97f755192210ce3690e67abccefa133d398d373	08-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nvc0/ir: fix up mul+add -> mad algebraic opt, enable for integers For some reason this has been disabled for integers ever since codegen was merged, despite there being emission code for IMAD. Seems to work. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
0ef5c8ab7405fcc76b23393d4414f46cc9edb1fc	04-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: fold shl + mul with immediates On SM20 this gives: total instructions in shared programs : 6299222 -> 6294240 (-0.08%) total gprs used in shared programs : 944139 -> 944068 (-0.01%) total local used in shared programs : 54116 -> 54116 (0.00%) local gpr inst bytes helped 0 126 2781 2781 hurt 0 55 11 11 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
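One plausible reading of this fold is that a shift by an immediate feeding a multiply by an immediate becomes a single multiply by a combined immediate. The sketch below checks that identity in wrapping 32-bit arithmetic; the function names are hypothetical and the exact patterns the pass matches are not stated in the log.

```cpp
#include <cassert>
#include <cstdint>

// Before folding: mul(shl(a, shift), imm), two instructions.
inline uint32_t mul_of_shl(uint32_t a, unsigned shift, uint32_t imm) {
    return (a << shift) * imm;
}

// After folding: mul(a, imm << shift), one instruction with a combined immediate.
inline uint32_t mul_folded(uint32_t a, unsigned shift, uint32_t imm) {
    return a * (imm << shift);
}

inline void check_shl_mul_fold() {
    assert(mul_of_shl(5, 3, 7) == 280u);
    assert(mul_folded(5, 3, 7) == 280u);
    // Both forms discard the same high bits when the result wraps.
    assert(mul_of_shl(0x80000001u, 4, 3) == mul_folded(0x80000001u, 4, 3));
}
```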
abd326e81b06f58797be94bd655ee06b17a34f0c	04-Dec-2015	Ilia Mirkin <imirkin@alum.mit.edu>	nv50/ir: propagate indirect loads into instructions This way $r1 = $r0 + 4; c1[$r1] becomes c1[$r0+4]. On SM35: total instructions in shared programs : 6206257 -> 6185058 (-0.34%) total gprs used in shared programs : 911045 -> 910722 (-0.04%) total local used in shared programs : 39072 -> 39072 (0.00%) local gpr inst bytes helped 0 417 4195 4195 hurt 0 280 0 0 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> /external/mesa3d/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp