History log of /external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
fc10dc9fdea6ad7d04dfcdb8fd2e2d59ea67f68b 22-Dec-2016 Rob Clark <robdclark@gmail.com> freedreno/ir3: rework location of driver constants

Rework how we lay out driver constants (driver-params, UBO/TFBO buffer
addresses, immediates) for more flexibility. For a5xx+ we need to deal
with the fact that gpu ptrs are 64b instead of 32b, which makes the
fixed offset scheme not work so well. While we are dealing with that
we might also make the layout more dynamic to account for varying # of
UBOs, etc.

Signed-off-by: Rob Clark <robdclark@gmail.com>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
46ff17559b1369f0fe7dca000f51f077ffd1dae5 23-May-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: disable cp for indirect src's

The variable-indexing tests always had a few random fails, which I
usually couldn't reproduce when running tests manually. Somehow
recently this got a lot worse. I ported a couple of the shaders to
GLES to see what blob does, and it also seems to be avoiding to cp
indirect srcs. So I guess indirect w/ instructions other than cat1
(mov) are not totally reliable. Let's just switch that off until
this is better understood.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
f0a1f3de27831d1ccb8a51fc0c99d63f25fd6e2a 01-May-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: cp small negative integers too

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
173871dfb988c3e9fb74a8016d2b024619a5d918 11-Apr-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: lower immeds to const

Helps reduce register pressure and instruction counts for immediates
that would otherwise require a mov into gpr.

total instructions in shared programs: 4455332 -> 4369297 (-1.93%)
total dwords in shared programs: 8807872 -> 8614432 (-2.20%)
total full registers used in shared programs: 263062 -> 250846 (-4.64%)
total half registers used in shader programs: 9845 -> 9845 (0.00%)
total const registers used in shared programs: 1029735 -> 1466993 (42.46%)

half full const instr dwords
helped 0 10415 0 17861 5912
hurt 0 1157 21458 947 33

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
b15c7fc268785cc8c960368d287ec799fe9dc502 11-Apr-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: add ir3_cp_ctx

Needed in next commit.. just split out to reduce noise.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
dfd23abdcce7fb01da842d2fc69d54a04ecdfee2 25-Apr-2016 Rob Clark <robclark@freedesktop.org> freedreno: disallow cat4 immed src

Normally this would never happen (constant-propagation in NIR would
eliminate the instruction), except it does happen for 'undef' which
we turn into immed 0.0 for bookkeeping purposes.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
b35ad6e7016a877def4b3d2d22b3a2265a68e2eb 07-Apr-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: cleanup double cmps.s from frontend

Since we cannot mov into a predicate register, the frontend uses a
'cmps.s p0.x, cond, 0' as a stand-in for mov to p0.x. It does this
since it has no way to know that the source cond instruction (ie.
for a kill, br, etc) will only be used to write the predicate reg.
Detect this, and re-write the instruction writing p0.x to skip the
original cmps.[sfu]. (It is done like this, rather than re-writing
the dest of the first cmps.[sfu] in case the first cmps.[sfu]
actually has other users.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
f9cdbf44054009122fcc16c887fb90ccc33b52c9 05-Apr-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: eliminate unnecessary absneg's

The frontend inserts (abs) and (neg)'s to convert between NIR boolean
(~0/0) and native boolean (1/0). So we'd end up with things like:

cmps.s.ge r1.x, ...
absneg.s r1.x, (neg)r1.x
absneg.s r1.x, (abs)r1.x
sel.b32 r2.x, r0.x, r1.x, r0.y

The (neg) already gets collapsed due to the following (abs). Now by
realizing that r1.x comes from a cmps.s instruction, we can drop the
(abs) as well.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
8e451c2d06d18ee54dc3098b3987af6e0bc59f5e 04-Apr-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: don't cp into phi's

The block defining a phi source might not have been executed. If we
allow copy propagation, we could end up pointing to a src instruction in
the wrong block.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
383b6e87f90e0ac84a200e1677a44b370976c93b 04-Apr-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: we can't store immediate values

Fixes some transform-feedback piglits, like:

bin/ext_transform_feedback-nonflat-integral

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
19739e4fb9024f42a8fc332e6fa94c292bb6bc16 27-Mar-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: remove ir3_instruction::category

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
ddede497b831fb98e3540247b570968532cdacc9 15-Jan-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: workaround bug/feature

Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions. This may be slightly conservative, we may only have
this restriction when the first src is also const.

This fixes, for example, +24/-0 of the variable-indexing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
fad158a0e01f4c28851477e6d1eb5c8fd67e226b 10-Jan-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: array rework

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
cc7ed34df94a389e60221228896332d0ac8ea6c6 10-Jan-2016 Rob Clark <robclark@freedesktop.org> freedreno/ir3: refactor/simplify cp

If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
96d4db683f90f02e72d34ece544de7eedfa873ee 25-Jul-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: track "keeps" in ir

Previously we had a fixed array to track kills, since they don't
generate an SSA value, and then cheated by stuffing them in the
outputs array before sending things through depth/sched/etc. But
store instructions will need similar treatment. So convert this
over to a more general array of instructions that must be kept
and fix up the places that were previously relying on kills being
in the output array.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
6b9f5cd5f7b25e9e03104fe279df74817f69fe87 02-Jul-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: fix indirects tracking

cp would update instr->address but not update the indirects array
resulting in sched getting confused when it had to 'spill' the address
register. Add an ir3_instr_set_address() helper to set instr->address
and also update ir->indirects, and update all places that were writing
instr->address to use helper instead.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
457f7c2a2a93b45396ac66e0d4b3896d2db8fdf3 09-Jun-2015 Rob Clark <robdclark@gmail.com> freedreno/ir3: block reshuffling and loops!

This shuffles things around to allow the shader to have multiple basic
blocks. We drop the entire CFG structure from nir and just preserve the
blocks. At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors. (Dropping jumps to the following instruction, etc.)

One slight complication is that variables (load_var/store_var, ie.
arrays) are not in SSA form, so we have to figure out where to put the
phi's ourself. For this, we use the predecessor set information from
nir_block. (We could perhaps use NIR's dominance frontier information
to help with this?)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
c8fb5f8a011e1db78af3ceaf91c5cb3b1acaee14 25-May-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: move inputs/outputs to shader

These belong in the shader, rather than the block. Mostly a lot of
churn and nothing too interesting. But splitting this out from the
rest of ir3_block reshuffling to cut down the noise in the later
patch.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
d52fb2f5ad828f879286b9068023b82b9897bc17 01-May-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3/ra: use register_allocate

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
adf1659ff5f07d907eca552be3b566e408c8601e 30-Apr-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: use standard list implementation

Use standard list_head double-linked list and related iterators,
helpers, etc, rather than weird combo of instruction array and next
pointers depending on stage. Now block has an instrs_list. In
certain stages where we want to remove and re-add to the blocks list
we just use list_replace() to copy the list to a new list_head.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
87807e5cc50f404a8e3ec8864bf8b7427ab6d687 12-Apr-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: move out helper

We'll also want it in NIR f/e for implementing UBO support.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
d5357c16cc0ccd84c3475778fcc08a025b8c24f7 08-Apr-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3/cp: handle indirect properly

I noticed some cases where we where trying to copy-propagate indirect
src's into places they cannot go, like 2nd src for cat3 (mad, etc).
Expand out valid_flags() to be aware of relativ flag, and fix up a few
related spots.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
f0e9a632a12798bd727799e396cde665bd960665 06-Apr-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3/cp: support to swap mad src's

For a normal MAD (ie. not MADSH), if first source is gpr and second
source is const, we can swap the first two sources to avoid needing a
mov instruction.

This gives back the biggest advantage TGSI f/e had over NIR f/e for
common shaders, since TGSI f/e had this logic in the f/e. Note that
doing this in copy-prop step has the advantage that it will also work
for cases like:

MOV TEMP[b], CONST[x]
MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c]

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
f370e95421f553ace931a02743c96be80fd62dc8 29-Mar-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: handle const/immed/abs/neg in cp

Be smarter about propagating copies from const or immed, or with abs/neg
modifiers. Also, realize that absneg.s and absneg.f are really "fancy"
mov instructions.

This opens up the possibility to remove more copies. It helps the TGSI
frontend a bit, but will be really needed for the NIR f/e which builds
everything up in SSA form (ie. will *always* insert a mov from const or
immediate).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
104713d9f2dced94a427004a25c54b2c7feee166 29-Mar-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: split float/int abs/neg

Even though in the end, they map to the same bits, the backend will need
to be able to differentiate float abs/neg vs integer abs/neg. Rather
than making the backend figure it out based on instruction opcode (which
when combined with mov/absneg instructions, can be awkward), just split
out different flags for each so the frontend can signal it's intentions
more clearly. Also, since (neg) for bitwise op's is actually a bitwise-
not, split it out into bnot flag.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
060d3499202c339a27fbc366335f2122ed4ff7bc 23-Jan-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: relative dst

To simplify RA, assign arrays that are written to first. Since enough
dependency information is in the graph to preserve order of reads and
writes of array, so all SSA names for the array collapse into one, just
assign the entire thing by array-id.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
17754b70d78649f29e25dfe938de91d64dbf5ebf 04-Feb-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: drop deref nodes

The meta-deref instruction doesn't really do what we need for relative
destination. Instead, since each instruction can reference at most a
single address value, track the dependency on the address register via
instr->address. This lets us express the dependency regardless of
whether it is used for dst and/or src.

The foreach_ssa_src{_n} iterator macros now also iterates the address
register so, at least in SSA form, the address register behaves as an
additional virtual src to the instruction. Which is pretty much what
we want, as far as scheduling/etc.

TODO:
For now, the foreach_src{_n} iterators are unchanged. We could wrap
the address in an ir3_register and make the foreach_src_{_n} iterators
behave the same way. But that seems unnecessary at this point, since
we mainly care about the address dependency when in SSA form.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
f8f7548f466509bf881db1826ef6dd23ffe2acdf 02-Feb-2015 Rob Clark <robclark@freedesktop.org> freedreno/ir3: helpful iterator macros

I remembered that we are using c99.. which makes some sugary iterator
macros easier. So introduce iterator macros to iterate all src
registers and all SSA src instructions. The _n variants also return
the src #, since there are a handful of places that need this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
9a9f2a893b5e29a77d66671191653f0b4261f546 25-Oct-2014 Rob Clark <robclark@freedesktop.org> freedreno/ir3: simplify RA

Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers. This lets us simplify RA by
working in terms of neighbor groups.

NOTE: has the slight problem that it can't optimize out mov's for things
like:

MOV OUT[n], IN[m]

To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them). This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
13862812dc910a4ef57cb72cb9fe777ce3c14515 24-Oct-2014 Rob Clark <robclark@freedesktop.org> freedreno/ir3: consider instruction neighbors in cp

Fanin (merge) nodes require it's srcs to be "adjacent" in consecutive
scalar registers. Keep track of instruction neighbors in copy-
propagation step and avoid eliminating mov's which would cause an
instruction to need multiple distinct left and/or right neighbors.

This lets us not fall on our face when we encounter things like:

1: MOV TEMP[2], IN[0].xyzw
2: TEX OUT[0].xy, TEMP[2], SAMP[0], SHADOW2D
3: MOV TEMP[2].xy, IN[0].yxzz
4: TEX OUT[0].zw, TEMP[2], SAMP[0], SHADOW2D
5: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
3dd9a0d6fdea1ff5b1fe903fce206bf1d1515400 01-Oct-2014 Ilia Mirkin <imirkin@alum.mit.edu> freedreno/ir3: avoid fan-in sources referring to same instruction

Since the RA has to be done s.t. each one gets its own (adjacent)
register, it would complicate matters if instructions were allowed to be
repeated. This enables copy-propagation use in situations where
previously that might have happened.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c
db193e5ad06e7a2fbcffb3bb5df85d212eb12291 25-Jul-2014 Rob Clark <robclark@freedesktop.org> freedreno/ir3: split out shader compiler from a3xx

Move the bits we want to share between generations from fd3_program to
ir3_shader. So overall structure is:

fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
|- ...
\- ir3_shader_variant -> ir3

So the ir3_shader becomes the topmost generation neutral object, which
manages the set of variants each of which generates, compiles, and
assembles it's own ir.

There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.

Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
/external/mesa3d/src/gallium/drivers/freedreno/ir3/ir3_cp.c