History log of /external/skia/src/jumper/SkJumper_stages_lowp.cpp
Revision Date Author Comments
529cb2cd75ac19e0d0d97dd15122de4bb9b586b5 21-Feb-2018 Mike Reed <reed@google.com> lowp impl for decal stages

Bug: skia:
Change-Id: If6481d202bf22a95f1dea0c5bf7d84698b63869a
Reviewed-on: https://skia-review.googlesource.com/109241
Commit-Queue: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
ac568a934f8f82bf3a359b757d67eb3a797d3593 25-Jan-2018 Mike Klein <mtklein@chromium.org> 1010102, 101010x, 888x in sw

Same sort of deal as before, now with all three new formats.
While I was at it, I made sure RGBA 8888 and BGRA 8888 both work too.

We don't want the 101010's in lowp, but 888x should be fine.

After looking at the DM images on monitors at work, I decided to
re-enable dither even on 10-bit images.

Looking at the GMs in 888x or 101010x is interesting... I think we must
not be clearing the memory allocated for layers? Seems like we want to
allocate layers as 8888?

Change-Id: I3a85b4f00877792a6425a7e7eb31eacb04ae9218
Reviewed-on: https://skia-review.googlesource.com/101640
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
376fd31ad485c3df35d934c56364ff0c34eacdfa 11-Dec-2017 Mike Klein <mtklein@chromium.org> remove vfpv4 requirement for SkJumper on ARMv7

VFPv4 gives us two interesting features:
- FMA
- f16<->f32 conversions

Even without FMAs, NEON still has non-fused MLA instructions. We don't
really care about the fusedness of those mul-adds, so losing FMA here is
kind of no big deal.

We already maintain portable code to do f16<->f32 conversions, so it's
not much of a maintenance hit to use that instead of the native
instructions. To my knowledge software F16 rendering is not a
performance critical mode of operation for any of our users.
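
For illustration only, a portable f16 -> f32 conversion of the kind mentioned above might look like the sketch below. It is not Skia's code: the name half_to_float_portable is hypothetical, denormal inputs are assumed to be flushed to zero, and infinities and NaNs are assumed not to occur.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not Skia's implementation.
// Converts an IEEE 754 binary16 bit pattern to a float using integer math only.
// Assumptions: denormal halfs are flushed to signed zero; inf/NaN are unsupported.
inline float half_to_float_portable(uint16_t h) {
    uint32_t sign = uint32_t(h & 0x8000u) << 16;  // move the sign from bit 15 to bit 31
    uint32_t em   = h & 0x7fffu;                  // 5-bit exponent + 10-bit mantissa
    uint32_t bits = sign;
    if (em >= 0x0400u) {                          // normal half (exponent field != 0)
        // Shift exponent/mantissa into float position and rebias 15 -> 127.
        bits |= (em << 13) + (uint32_t(127 - 15) << 23);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);             // bit cast without aliasing issues
    return f;
}
```

The whole conversion is a shift, an add and a compare, which is why keeping it in portable code costs little compared with requiring the native instruction.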

This drops our minimum requirement to basically just having NEON.
Devices like the Nexus 7 2012 will now take SkJumper fast paths
instead of portable code. (Though actually, we've only ever
required NEON for _lowp... only the float code also needed vfpv4).

The main file to look at here is actually SkJumper_vectors.h,
where you will see all the substantive changes. The rest just
kind of tears down most of the old complexity, and adds ABI
to put just a little of it back. :)

Change-Id: Ia9237117698729c91e5fa51126baf80748093bf4
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/83521
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
c0ae2c8a6272bec976a6b617408f98856c79862d 10-Nov-2017 Florin Malita <fmalita@chromium.org> AVX2 specialization for lowp gradient lookup

705.32 -> 457.76 gradient_sweep_clamp_3color
609.38 -> 345.34 gradient_radial1_clamp_3color


Change-Id: I0165ac8f004ee095ada4f12b33db0a94ae39fca3
Reviewed-on: https://skia-review.googlesource.com/69902
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
9af41d7caa30a5937dec50130d46a37c02405296 10-Nov-2017 Greg Daniel <egdaniel@google.com> Revert "more powerful map()"

This reverts commit a3dd5ec3a769fb833ce77878cd4e551c15e5074d.

Reason for revert: breaking build on Build-Debian9-Clang-x86_64_Release-Fast
Original change's description:
> more powerful map()
>
> Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608
> Reviewed-on: https://skia-review.googlesource.com/69901
> Reviewed-by: Florin Malita <fmalita@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>

TBR=mtklein@chromium.org,fmalita@chromium.org

Change-Id: Ice989dd6a6b2786f318791dd91f2c06f689cb979
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/70105
Reviewed-by: Greg Daniel <egdaniel@google.com>
Commit-Queue: Greg Daniel <egdaniel@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
a3dd5ec3a769fb833ce77878cd4e551c15e5074d 10-Nov-2017 Mike Klein <mtklein@chromium.org> more powerful map()

Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608
Reviewed-on: https://skia-review.googlesource.com/69901
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
5e9fdefae2ea05e5e939ac7b7e7a16f05cbd7f58 10-Nov-2017 Florin Malita <fmalita@chromium.org> AVX2 gather for lowp

Change-Id: I15f83a72645fed0ed8dca9c9aad66c5db5eb247a
Reviewed-on: https://skia-review.googlesource.com/69920
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
e95a62faa0e615af3971981040fe0f90e8a489f5 01-Nov-2017 Mike Klein <mtklein@chromium.org> add some lowp gradient stages

I was originally going to add these to help test a lowp dither, but
after looking at diffs I don't think lowp dither is a good idea.

Non-dithered lowp gradients look fine to me so far.

I'd have done conics, but they scare me.

Change-Id: I8f5e75aec726983186214845ca38cfa0d54496b3
Reviewed-on: https://skia-review.googlesource.com/66460
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
0b78a6912eba5b50eeb4a13f735957dd57ca6be2 31-Oct-2017 Mike Klein <mtklein@chromium.org> another batch of lowp stages

The 4444 image in all_bitmap_configs now draws slightly differently before
and after serialization. (It's serialized as 8888.) Still looks fine.

Change-Id: I1396cf1550b6769a1734ed25d59bd5b1866dfacd
Reviewed-on: https://skia-review.googlesource.com/65960
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
0cb158969c87b49e9da53a82ab6d18ad3d6d0775 24-Oct-2017 Mike Klein <mtklein@chromium.org> add srcover_bgra_8888

Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really
doing them any good. Probably a good idea to cover both kN32 options
any time we specialize like this?

There's one small diff, so I've lazily guarded this by
SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway.

Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478
Reviewed-on: https://skia-review.googlesource.com/63301
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
be0bd925614bcfdea859416177b527294a6c92b1 20-Oct-2017 Mike Klein <mtklein@chromium.org> more easy lowp shader stages

This fills out a couple more matrix and gather stages.

Deletes a not particularly important unit test that was using a
scale matrix in a weird, non-lowp compatible way.

This will require guards for Blink layout tests.

Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3
Reviewed-on: https://skia-review.googlesource.com/62520
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
955ed3d9b6d4dc5450ce0c6c86c610f831b0c38a 17-Oct-2017 Mike Klein <mtklein@chromium.org> start on lowp shaders

We're going to want to assign types to the stages depending on their
inputs and outputs:
GG: x,y -> x,y
GP: x,y -> r,g,b,a
PP: r,g,b,a -> r,g,b,a

(There are a couple other degenerate cases here, where a stage ignores
its inputs or creates no outputs, but we can always just pretend their
null input or output is one type or the other arbitrarily.)

The GG stages will be pretty much entirely float code, and the GP stages
a mix of float math and byte stuff.

Since we've chosen U16 to match our register size in _lowp land,
we'll unpack each F register across two of those for transport between
stages. This is a notional, free operation in both directions.
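
A minimal sketch of that transport, assuming 8 lanes per register. The lane count and the names F, U16, split and join are illustrative choices here and are not taken from the commit. Because one F is exactly as wide as two U16 registers, splitting and joining is only a reinterpretation of bytes:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

constexpr int N = 8;                    // lane count, assumed for illustration
using F   = std::array<float,    N>;    // one float register: 4*N bytes
using U16 = std::array<uint16_t, N>;    // one lowp register:  2*N bytes

static_assert(sizeof(F) == 2 * sizeof(U16), "an F must fill exactly two U16s");

// Reinterpret the bytes of one F as two U16 halves (no arithmetic, just a bit copy).
inline void split(const F& f, U16* lo, U16* hi) {
    const char* p = reinterpret_cast<const char*>(f.data());
    std::memcpy(lo->data(), p,               sizeof(U16));
    std::memcpy(hi->data(), p + sizeof(U16), sizeof(U16));
}

// Inverse of split(): glue two U16 halves back into one F.
inline F join(const U16& lo, const U16& hi) {
    F f;
    char* p = reinterpret_cast<char*>(f.data());
    std::memcpy(p,               lo.data(), sizeof(U16));
    std::memcpy(p + sizeof(U16), hi.data(), sizeof(U16));
    return f;
}
```

In this model a GG stage would join() its inputs on entry, do float math, and split() the results on exit, while a PP stage would use the four U16 values directly.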

Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715
Reviewed-on: https://skia-review.googlesource.com/60800
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
b4379132d12cd11e52944331487b045f1f32989d 17-Oct-2017 Mike Klein <mtklein@chromium.org> rename (x,y) to (dx,dy)

Today (x,y) are the integer coordinates of the first destination pixel
we're working on. By renaming them (dx,dy), we free up the names (x,y)
for working (i.e. _source_) x and y.

Until now we've generally just been continuing to call those (r,g), but
in the _lowp code that won't be possible (r+g hold x together, b+a y)
but we'll have the ability to just give them proper names x and y.

Change-Id: Id5faa09c4406116df5df7494efc6cb23659e9a2f
Reviewed-on: https://skia-review.googlesource.com/60820
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
f757ae6aa74df9028ec1c9e14fed2b710c32ac23 15-Sep-2017 Mike Klein <mtklein@chromium.org> Retry, Bump stored lowp uniform color to 16-bit storage.

This makes loading into 16-bit channels more natural in _lowp.cpp.
Update a unit test to stop using out-of-range "colors".

Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d
Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow
Reviewed-on: https://skia-review.googlesource.com/47580
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
7e4e99386fe6e7bf131ba8461dcb1990bf15e346 15-Sep-2017 Mike Klein <mtklein@chromium.org> Implement some easy _lowp stages.

- load_565 allows 565-src sprite blits
- scale_565 / lerp_565 allow subpixel text
- luminance_to_alpha is a color filter, and lets us write grey 8

And update CachedDecodingPixelRefTest with a yet more robust color.

Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526
Reviewed-on: https://skia-review.googlesource.com/47021
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
6e80aea309d90ae6618fb5df4eb0fb81d63d8278 16-Sep-2017 Mike Klein <mtklein@google.com> Revert "Bump stored lowp uniform color to 16-bit storage."

This reverts commit d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30.

Reason for revert:

../../../src/core/SkRasterPipeline.cpp:98:34: runtime error: 4.87906e+09 is outside the range of representable values of type 'unsigned short'

Excellent new bot!

Original change's description:
> Bump stored lowp uniform color to 16-bit storage.
>
> This makes loading into 16-bit channels more natural in _lowp.cpp.
>
> Change-Id: I1ed393873654060ef52f4632d670465528006bbd
> Reviewed-on: https://skia-review.googlesource.com/47261
> Reviewed-by: Mike Reed <reed@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>

TBR=mtklein@chromium.org,reed@google.com

Change-Id: Ia65645c1261a7b31588c4ddaf2b1b3b327d265b0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/47540
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30 15-Sep-2017 Mike Klein <mtklein@chromium.org> Bump stored lowp uniform color to 16-bit storage.

This makes loading into 16-bit channels more natural in _lowp.cpp.

Change-Id: I1ed393873654060ef52f4632d670465528006bbd
Reviewed-on: https://skia-review.googlesource.com/47261
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
7c55726156a6b2bda10035188be0077848bd3ede 15-Sep-2017 Mike Klein <mtklein@chromium.org> centralize SI, Ctx, and load_and_inc()

We've got independent definitions of SI, LazyCtx/Ctx, and load_and_inc()
in _stages.cpp and _lowp.cpp. It's a good time to centralize them,
taking _stages.cpp's SI and load_and_inc(), and _lowp's Ctx.

SI and load_and_inc() are uninterestingly different. Using _lowp's
Ctx will let us get its prettier typed stage definitions into
_stages.cpp, but that is not done here.

This is a pure refactor with no generated code changes.

Change-Id: I53260b0fdc71a77bf9e3ed6f3df3a2a4cbd2392b
Reviewed-on: https://skia-review.googlesource.com/47181
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
f419c441938c117427e895b4b7ff22bf470e7184 14-Sep-2017 Mike Klein <mtklein@chromium.org> fix android roll

Guarding loads of 8-15 with defined(__AVX2__) should prevent errors
like these:

external/skia/src/jumper/SkJumper_stages_lowp.cpp:287:46: error:
'memcpy' called with size bigger than buffer
case 12: memcpy(&v, ptr, 12*sizeof(T)); break;

The loads of 8-15 were of course unreachable, given the &(N-1) == &7.

Change-Id: Ifcb5c177c6909e1df55cb564779a4d6610ff7b32
Reviewed-on: https://skia-review.googlesource.com/46521
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
9b2f69b0aacc4819c3b4b9c5c8af8ef8ab35d572 12-Sep-2017 Mike Klein <mtklein@chromium.org> grand unifried lowp stages

I have text_16_AA_FF -> 8888 (forcing RP) faster than head now on my
laptop. I'm feeling confident that we can make this perform well.

After looking at performance a bit more today, it looks like everything
is within what I'd consider comparable in performance, especially on
ARM. On x86-64 it looks like big bulk blits get a little slower and
small mask blits get a little faster.

Quality looks good, and maybe improved for 565.

There are fewer platform-specific differences now in _lowp, and I think
they're few enough now that we could even consider completing the
unification by folding the 8-bit and float code together. Rename
"div255()" to "rebias()", slap on a few coats of paint...

Guarded for Chrome with SK_JUMPER_LEGACY_LOWP.

Change-Id: I36309c07cf736f3cb31952cca66030ad56026318
Reviewed-on: https://skia-review.googlesource.com/45982
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
5910ed347a638ded8cd4c06dbfda086695df1112 04-Aug-2017 Mike Klein <mtklein@chromium.org> 15-bit lowp is dead, long live 8-bit lowp

Change-Id: Icc4b06094aeba3af99b534746f66286d776ef78a
Reviewed-on: https://skia-review.googlesource.com/30920
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
1a2e3e1e77bf7d7da31e8403d88b743f74669c3c 03-Aug-2017 Mike Klein <mtklein@chromium.org> Store float and byte constant colors.

This makes loading them much simpler in 8-bit mode.

Change-Id: I35ff34ebd0b93425c4e39e055bf4ade8cf8561e1
Reviewed-on: https://skia-review.googlesource.com/30621
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
45c16fa82cd2fec010d4cb7763b654a413cabd0c 19-Jul-2017 Mike Klein <mtklein@chromium.org> convert over to 2d-mode

[√] convert all stages to use SkJumper_MemoryCtx / be 2d-compatible
[√] convert compile to 2d also, remove 1d run/compile
[√] convert all call sites
[√] no diffs

Change-Id: I3b806eb8fe0c3ec043359616409f7cd1211a1e43
Reviewed-on: https://skia-review.googlesource.com/24263
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
3b92b6907a6b9f7020a7589ad07cad2472fe4e86 18-Jul-2017 Mike Klein <mtklein@chromium.org> start on raster pipeline 2d mode

- Add run_2d(x,y,w,h) and start_pipeline_2d().
- Add and test a 2d-compatible store_8888_2d stage.
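
As a rough sketch of what a 2d mode implies, assuming a 1-d row runner already exists: the RowFn and MemoryCtx types and the pixel_at helper below are hypothetical stand-ins, and this run_2d is an illustration of the idea rather than the signature added by this change.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a compiled 1-d pipeline: process pixels [x, x+n) on row y.
using RowFn = std::function<void(size_t x, size_t y, size_t n)>;

// Hypothetical 2-d driver: visit the w-by-h rectangle at (x,y) one row at a time.
inline void run_2d(const RowFn& run_row, size_t x, size_t y, size_t w, size_t h) {
    for (size_t dy = y; dy < y + h; dy++) {
        run_row(x, dy, w);
    }
}

// Hypothetical memory context: a base pointer plus a row stride in pixels,
// so a stage can locate (x,y) itself instead of being re-pointed for every row.
struct MemoryCtx {
    unsigned* pixels;
    int       stride;
};

inline unsigned* pixel_at(const MemoryCtx& ctx, size_t x, size_t y) {
    return ctx.pixels + y * size_t(ctx.stride) + x;
}
```

The design point is that the row loop lives in one place and each stage only needs a base pointer and a stride to be 2d-compatible.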

Change-Id: Ib9c225d1b8cb40471ae4333df1d06eec4d506f8a
Reviewed-on: https://skia-review.googlesource.com/24401
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
140635504c40b99debb0e714aca8d90652aa6aff 18-Jul-2017 Mike Klein <mtklein@chromium.org> minor fixes to start_pipeline_lowp

- in _lowp.cpp, JUMPER is always defined, so no need to check.
- the return type of this function has been void for a while.

Change-Id: I5271e8dab784f46c7ffa9cfba6eb55b5e399b537
Reviewed-on: https://skia-review.googlesource.com/24326
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
c91e3877a57ef140b688627b8c0acaafbefc9034 05-Jul-2017 Mike Reed <reed@google.com> add stages for black and white colors

histogram of test skps:

black: 1/7
white: 2/7
other: 4/7

Bug: skia:
Change-Id: I3a092899d31ce87837e66e5c8ea9ec5e0f239361
Reviewed-on: https://skia-review.googlesource.com/21408
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
111f8a9eea6980a70a300e3a8bfd758257310fe2 27-Jun-2017 Mike Klein <mtklein@chromium.org> add bgra as 1st class format

This is a start to eliminating swap_rb as a stage.

I've just hit the main hot spots here. Going to look into
the ~dozen other spots to see how they should work next.

Change-Id: I26fb46a042facf7bd6fff3b47c9fcee86d7142fd
Reviewed-on: https://skia-review.googlesource.com/20982
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
9f2b3d1fbfa16146184a13b778fbe2fa5355c51b 27-Jun-2017 Mike Klein <mtklein@chromium.org> remove unused "swap" stage

Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf
Reviewed-on: https://skia-review.googlesource.com/20988
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
279091ef85eec98969b72805bbf613f1c0660380 27-Jun-2017 Mike Reed <reed@google.com> specialize loaders for dst registers, to avoid move/swap stages

Bug: skia:
Change-Id: I75d82ef2226c5f116b7de2208c4e914739414b6d
Reviewed-on: https://skia-review.googlesource.com/20984
Commit-Queue: Mike Reed <reed@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
d08109f3448c4a725a233e759508e9242e6bba42 27-Jun-2017 Mike Klein <mtklein@chromium.org> try not zeroing registers in start_pipeline

Generally stages take care of state setup themselves, either with
seed_shader, constant_color, a load, etc. I think these zeros may
be unnecessarily cautious.

This can't make anything draw more correctly, but it could make things
- draw wrong
- draw more slowly
- draw more quickly
so it's an interesting thing to try and keep an eye on.

Change-Id: I7e5ea3cd79e55a65e1dbd214601e147ba3815b87
Reviewed-on: https://skia-review.googlesource.com/20976
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
8c3d5156c7ab2bf723c307043841815d670895c5 19-Jun-2017 Mike Klein <mtklein@chromium.org> add _hsw lowp backend

CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Debug-MSAN

Change-Id: Id53279c17589b3434629bb644358ee238af8649f
Reviewed-on: https://skia-review.googlesource.com/20269
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
c4fcbed6b2b2d1e2253e325b292931cb3d05d3fe 26-Jun-2017 Mike Klein <mtklein@chromium.org> somewhat less silly tail loads and stores

No reason to keep going one at a time when we know there are generally
better ways to handle loading a power-of-two number of low lanes.

This strategy scales up too, with quick answers for 8 (one 8 byte load),
12 (one 8 byte, one 4 byte), etc.
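
The strategy can be sketched for a 16-lane register of bytes as follows. The function name load_tail and the choice to zero the unused lanes are assumptions made for this example, not details from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical tail load: copy the `tail` (1..15) valid bytes at ptr into a
// 16-byte register image, leaving the remaining lanes zero.
// Instead of one byte at a time, the tail is decomposed into power-of-two
// chunks, e.g. 12 = 8 + 4 and 5 = 4 + 1.
inline void load_tail(const uint8_t* ptr, size_t tail, uint8_t out[16]) {
    std::memset(out, 0, 16);
    size_t done = 0;
    for (size_t chunk = 8; chunk >= 1; chunk /= 2) {
        if (tail & chunk) {
            std::memcpy(out + done, ptr + done, chunk);  // one copy per set bit of tail
            done += chunk;
        }
    }
}
```

With at most four copies for any tail below 16, this replaces up to fifteen single-lane moves; a store can be handled the same way with source and destination swapped.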

$ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300

Before: 46.946ns
After: 43.341ns

(This happens to be _lowp. Expect similar small speedups elsewhere.)

Change-Id: I08f87769ea3c9f06ad13d2b1d5326e542b9b63a8
Reviewed-on: https://skia-review.googlesource.com/20903
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
4b8d491b2230de6153836113c6864303eeedcd67 26-Jun-2017 Mike Klein <mtklein@chromium.org> lean more on the compiler in lowp stages

This refactors {from,to}_{byte,8888} to lean a bit more on the compiler,
and to share code between the two. The algorithm is not exactly the
same, but it's comparable, and the results of course are identical.

This new algorithm is a lot easier to generalize to AVX2, and parallels
the full-precision {from,to}_{byte,8888} functions in _stages.cpp.

Change-Id: I31ea90d65967bf4ede2497d1e2197cb0e7648bf8
Reviewed-on: https://skia-review.googlesource.com/20828
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
57c9d20102fd53a960293b45e3fed7cb1aac94fd 20-Jun-2017 Mike Klein <mtklein@chromium.org> rephrase lowp constant_color

This doesn't change the generated code (no .S files change),
but it does rephrase what we're trying to do to make it
generalize to AVX2 better:

- load 4 floats
- add 256.0f to each
- splat out the low 2 bytes of each 4 byte lane as r,g,b,a
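
A scalar model of those three steps, assuming each channel is a float in [0,1] and that lowp at this point is the 15-bit fixed-point format referred to elsewhere in this log, with 1.0 stored as 0x8000. The names Lowp4, low16_after_bias and constant_color_model are hypothetical, and the broadcast of each value across all vector lanes is left out.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical result type: one 16-bit value per channel.
struct Lowp4 { uint16_t r, g, b, a; };

// For floats in [256, 257] one unit in the last place is 2^-15, so after adding
// 256.0f the low 16 bits of the bit pattern hold x * 32768, rounded by the add.
inline uint16_t low16_after_bias(float x) {
    float biased = x + 256.0f;
    uint32_t bits;
    std::memcpy(&bits, &biased, sizeof bits);
    return uint16_t(bits & 0xffffu);   // keep the low 2 bytes of the 4-byte lane
}

// Load 4 floats, add 256.0f to each, keep the low 2 bytes of each as r,g,b,a.
inline Lowp4 constant_color_model(const float rgba[4]) {
    return { low16_after_bias(rgba[0]), low16_after_bias(rgba[1]),
             low16_after_bias(rgba[2]), low16_after_bias(rgba[3]) };
}
```

For example, 0.5f becomes 0x4000 and 1.0f becomes 0x8000, with no multiply or float-to-int conversion involved.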

Change-Id: Iadc5bc1f2a268679d1ccadd31cd24949a71e0aa4
Reviewed-on: https://skia-review.googlesource.com/20270
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
5883a11862771c3d5abb3a26c8c4b6bb0570de23 20-Jun-2017 Mike Klein <mtklein@chromium.org> remove defined(JUMPER) guards in _lowp.cpp

JUMPER is always defined in that file;
we never use it as a portable fallback.

Change-Id: Ic7caf726191599d4058adbf80084ede9f80676ee
Reviewed-on: https://skia-review.googlesource.com/20271
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
277f7f240f127772f7aa158baefc503dca05cf08 15-Jun-2017 Mike Klein <mtklein@chromium.org> delete lowp plus

I have figured out how to implement lowp clamp_1/clamp_a, and
implementing clamp_1 would make lowp plus active.

But... the way we have factored blend modes requires us to be able to
lerp between the dst and possibly-out-of-range src values. This is not
possible in lowp. If we try to multiply with values in [0x8001,0xffff],
we'll just get garbage. We'll clamp them back in range, but sadly
clamped garbage is still garbage.
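
To see why out-of-range inputs cannot be rescued afterwards, here is a scalar model of a 15-bit fixed-point multiply with 1.0 stored as 0x8000. It rests on an assumption the commit does not state: that the multiply is a signed rounding multiply-high in the style of x86 pmulhrsw, followed by an absolute value. The name fixed15_mul is hypothetical.

```cpp
#include <cstdint>

// Hypothetical model, not the code removed by this commit.
inline uint16_t fixed15_mul(uint16_t a, uint16_t b) {
    // Reinterpret the 16-bit patterns as signed, as a signed SIMD multiply would.
    int64_t sa = a >= 0x8000 ? int64_t(a) - 0x10000 : int64_t(a);
    int64_t sb = b >= 0x8000 ? int64_t(b) - 0x10000 : int64_t(b);
    int64_t p  = sa * sb;
    // Rounding shift right by 15. The 2^31 bias keeps the value non-negative so the
    // shift is well defined; it only affects bits above the low 16 that are kept.
    uint16_t r = uint16_t((p + 0x4000 + (int64_t(1) << 31)) >> 15);
    // Absolute value of the signed 16-bit result (0x8000 stays 0x8000).
    return r > 0x8000 ? uint16_t(0x10000 - r) : r;
}
```

Under these assumptions 0.5 * 0.5 and 1.0 * 0.5 come out right (0x2000 and 0x4000), but 0xC000 * 0x4000, notionally 1.5 * 0.5, yields 0x2000 rather than 0x6000. Clamping that result to [0, 0x8000] leaves it unchanged and still wrong.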

So the simplest thing to do is keep plus blends in floats. This CL
doesn't even change that... we'd use floats before and after it. It
just removes the lowp plus stage code that is both dead and buggy.

As far as I can tell, no other drawing is currently gated by lowp
missing clamp_1 or clamp_a.

Change-Id: I55b73c840614f1bff9cd610dff90ca5e2b5c73e5
Reviewed-on: https://skia-review.googlesource.com/19909
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
1dda8bbf467802113ab1802f792b5b191108add1 06-Jun-2017 Mike Klein <mtklein@chromium.org> use smarter float -> skfixed15 logic everywhere

This is the same logic from constant_color, covering all the other
places where we convert from float to fixed, e.g. scale_1_float.

This isn't quite ideal yet. We replace mulss+cvttss2si for addss+movd,
which is great, but this leads to a silly sequence of code:

addss %xmm2, %xmm0
movd %xmm0, %r9d
movd %r9d, %xmm0
pshuflw $0x0, %xmm0, %xmm0

Those two movd are pointless...

Again, all diffs due to switching from truncation to rounding.
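
The difference between the two conversions can be sketched in scalar code. Both function names are hypothetical, and the mapping to the instruction pairs above is an interpretation: scale-and-truncate corresponds to mulss + cvttss2si, and bias-and-reinterpret to addss + movd.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical "before": scale by 2^15 and truncate toward zero.
inline uint16_t to_fixed15_trunc(float x) {
    return uint16_t(int(x * 32768.0f));
}

// Hypothetical "after": add 256.0f and read the low 16 bits of the float.
// The float add itself rounds to the nearest multiple of 2^-15.
inline uint16_t to_fixed15_round(float x) {
    float biased = x + 256.0f;
    uint32_t bits;
    std::memcpy(&bits, &biased, sizeof bits);
    return uint16_t(bits & 0xffffu);
}
```

The two agree whenever x * 32768 is an integer, for instance at 0.5f and 1.0f. For an input such as 1.75f / 32768.0f the truncating version gives 1 and the rounding version gives 2, which is the kind of one-unit diff the commit describes.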

Change-Id: Icf6f3b6eb370fe41cea0cebcfda0b8907e055f41
Reviewed-on: https://skia-review.googlesource.com/18846
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
ce4b6c100f6e700b50933f75e3b4429357001028 06-Jun-2017 Mike Klein <mtklein@chromium.org> less naive lowp constant_color

This is as good as we can get without switching away from float inputs.

All diffs due to rounding (from the +256.0f).

Change-Id: I0d314f111d313577ce9078660178be17e865f11e
Reviewed-on: https://skia-review.googlesource.com/18845
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
8e200787b769ba9ee0bd4bd37d27ded1fafc06e6 06-Jun-2017 Mike Klein <mtklein@chromium.org> more easy lowp stages

Change-Id: I8a292bc98135b41ceedb4242451436c3657616fc
Reviewed-on: https://skia-review.googlesource.com/18722
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
9653d3aa84505c30aa5440b5629cdb25525666c3 05-Jun-2017 Mike Klein <mtklein@chromium.org> more lowp blend modes

Change-Id: Id62e989d4278f273c040b159ed4d2fd6a2f209e0
Reviewed-on: https://skia-review.googlesource.com/18627
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
f36031b68aa5b92204187c154fc5bc717db20a3a 05-Jun-2017 Mike Klein <mtklein@chromium.org> lowp: add some big easy stages

srcover_rgba_8888, lerp_u8, lerp_1_float, scale_u8, scale_1_float...
this is enough for _lots_ of drawing.

Change-Id: Ibe42adb8b1da6c66db3085851561dc9070556ee3
Reviewed-on: https://skia-review.googlesource.com/18622
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
727b09c8984b5c972ccde7f8f94d404b221eda6d 05-Jun-2017 Mike Klein <mtklein@chromium.org> lowp: add constant_color, swap, move_dst_src

This is enough for us to do some really simple draws.
Also add some debug tools to help prioritize porting.

Change-Id: I334f8fd2133be1aeec3f3406371a81aa6c184776
Reviewed-on: https://skia-review.googlesource.com/18597
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
5adb01bf0d190b20abe50fac608f396c65993424 05-Jun-2017 Mike Klein <mtklein@chromium.org> lowp: add move_src_dst and srcover

This is enough to run the bench SkRasterPipeline_compile.

$ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300

Before: 300 SkRasterPipeline_compile 48.4858ns
After: 300 SkRasterPipeline_compile 37.5801ns

Change-Id: Icb80348908dfb016826700a44566222c9f7a853c
Reviewed-on: https://skia-review.googlesource.com/18595
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
1f29bf093f01e9c9bf79cbd2ac27da62efc8e3a4 05-Jun-2017 Mike Klein <mtklein@chromium.org> slight streamlining for lowp load_8888 with pshufb

We can use 2 pshufb to replace 4 unpacks when deinterlacing the colors.

Change-Id: I713fbbc94f5cb9eaf14f85323b0ec76dc2246e98
Reviewed-on: https://skia-review.googlesource.com/18531
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
9beafc41afa3c78ffa12649dcde73628c277da9c 05-Jun-2017 Mike Klein <mtklein@chromium.org> have start_pipeline() return limit again

This is spooky.
I don't quite yet understand why, but this makes things much faster.

Performance regressed across the board when we no longer needed the
value and changed it to return void:

https://perf.skia.org/e/?begin=1496176469&keys=6994&xbaroffset=28513

You can see similar regressions following this Chromium bug link.
BUG=chromium:729237

Change-Id: I68371b0456014f909acf819aca52aa4f4f187460
Reviewed-on: https://skia-review.googlesource.com/18580
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp
bba02c203190a19bce7eb60887c4ae64041c8ae8 02-Jun-2017 Mike Klein <mtklein@chromium.org> start on SkJumper lowp mode

Just 3 stages implemented so far:

load_8888
swap_rb
store_8888

That's enough to make the shortest non-trivial pipeline
that you see in the new unit test.

Change-Id: Iabf90866ab452f7183d8c8dec1405ece2db695dc
Reviewed-on: https://skia-review.googlesource.com/18458
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_stages_lowp.cpp