History log of /external/skia/src/opts/SkNx_sse.h
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
9206c76337cd349c769103e06af005edd0583a10 30-Jan-2017 Florin Malita <fmalita@chromium.org> SkRasterPipeline shader adapter

Reland of https://skia-review.googlesource.com/c/7615.

(lifted from https://skia-review.googlesource.com/c/7088/)

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: I797a2f0ae80209c8637875418e08d2fa03249672
Reviewed-on: https://skia-review.googlesource.com/7731
Commit-Queue: Florin Malita <fmalita@google.com>
Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/opts/SkNx_sse.h
1da27ef853ae3e701b7f4aae670c21684396dcce 19-Jan-2017 Matt Sarett <msarett@google.com> Add F16 support to SkPNGImageEncoder

BUG=skia:

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: Ifd221365a7b9f9a4a4fc5382621e0da7189e1148
Reviewed-on: https://skia-review.googlesource.com/6526
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Leon Scroggins <scroggo@google.com>
Commit-Queue: Matt Sarett <msarett@google.com>
/external/skia/src/opts/SkNx_sse.h
5bee0b6de6b3ad1166d067e6b5046b48b8240a29 19-Jan-2017 Matt Sarett <msarett@google.com> Reland "Respect full precision for RGB16 PNGs" (part 2)

This lands all the new xform hooks but no change to src/codec.
So the new decode features are turned off.

I'm relanding this in pieces to try to bisect a
strange MSAN error.

Original CL:
https://skia-review.googlesource.com/c/7085/

BUG=skia:

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD,Build-Ubuntu-Clang-x86_64-Release-Fast


Change-Id: I451a2a29c73ca475e9e7a5ded58d4948d6b8be19
Reviewed-on: https://skia-review.googlesource.com/7277
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Matt Sarett <msarett@google.com>
/external/skia/src/opts/SkNx_sse.h
dfff166db5d5226dc002a22ab3e3097ef971d615 18-Jan-2017 Matt Sarett <msarett@google.com> Revert "Respect full precision for RGB16 PNGs"

This reverts commit 7a090c403da1dad6a2e19f2011158bd894a62d91.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> Respect full precision for RGB16 PNGs
>
> BUG=skia:
>
> CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
>
> Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493
> Reviewed-on: https://skia-review.googlesource.com/7085
> Commit-Queue: Matt Sarett <msarett@google.com>
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Leon Scroggins <scroggo@google.com>
> Reviewed-by: Mike Reed <reed@google.com>
>

TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: Ibd9879bc4f65ca0c2457dd0bfb5eb008d9a8f672
Reviewed-on: https://skia-review.googlesource.com/7183
Commit-Queue: Matt Sarett <msarett@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/opts/SkNx_sse.h
7a090c403da1dad6a2e19f2011158bd894a62d91 17-Jan-2017 Matt Sarett <msarett@google.com> Respect full precision for RGB16 PNGs

BUG=skia:

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493
Reviewed-on: https://skia-review.googlesource.com/7085
Commit-Queue: Matt Sarett <msarett@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Leon Scroggins <scroggo@google.com>
Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/opts/SkNx_sse.h
e71b167dbd4d8da76495ca85db83d1a3b49aaabd 13-Jan-2017 Mike Klein <mtklein@chromium.org> Attempt 3: SkRasterPipelineBlitter: support A8

Now that SkOpts_hsw.cpp no longer hooks in SkRasterPipeline_opts,
it should be safe to try this again.

This reverts commit 86d55b312a2649d80890ccf75f24571ada0265f1.

Change-Id: I2d495600ca9d3a0f49c2e02fbaaae349cefac3a1
Reviewed-on: https://skia-review.googlesource.com/6985
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
379938e47bc9edb6edfd21aabefa01aed71dd135 13-Jan-2017 Matt Sarett <msarett@google.com> Use RasterPipeline to support full precision on 16-bit RGBA pngs

Reland of Original Change:
https://skia-review.googlesource.com/6260

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: I809984dd9af225103bfbe83492a17c19da7c5e40
Reviewed-on: https://skia-review.googlesource.com/6980
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Matt Sarett <msarett@google.com>
/external/skia/src/opts/SkNx_sse.h
25b60833e7c3dd25f2317b3f0e7af07f04b5beba 12-Jan-2017 Matt Sarett <msarett@google.com> Revert "Use RasterPipeline to support full precision on 16-bit RGBA pngs"

This reverts commit bb2339da39ab3ee59121acd911920dafcd4a2f72.

Reason for revert: Breaks MSAN

Original change's description:
> Use RasterPipeline to support full precision on 16-bit RGBA pngs
>
> TODO: Support more precision on 16-bit RGB pngs
>
> BUG=skia:
>
> CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
>
> Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d
> Reviewed-on: https://skia-review.googlesource.com/6260
> Reviewed-by: Mike Reed <reed@google.com>
> Reviewed-by: Leon Scroggins <scroggo@google.com>
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Commit-Queue: Matt Sarett <msarett@google.com>
>

TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com,reviews@skia.org
BUG=skia:
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Change-Id: I47579c20af033a75883e2b35567cb9c690ce54b0
Reviewed-on: https://skia-review.googlesource.com/6975
Commit-Queue: Matt Sarett <msarett@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/opts/SkNx_sse.h
bb2339da39ab3ee59121acd911920dafcd4a2f72 12-Jan-2017 Matt Sarett <msarett@google.com> Use RasterPipeline to support full precision on 16-bit RGBA pngs

TODO: Support more precision on 16-bit RGB pngs

BUG=skia:

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d
Reviewed-on: https://skia-review.googlesource.com/6260
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Leon Scroggins <scroggo@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Matt Sarett <msarett@google.com>
/external/skia/src/opts/SkNx_sse.h
86d55b312a2649d80890ccf75f24571ada0265f1 07-Jan-2017 Mike Klein <mtklein@chromium.org> Revert "Retry "SkRasterPipelineBlitter: support A8"..."

This reverts commit f55ea6a1deb21120944d406124a2984b5009260a.

Reason for revert: crbug.com/679147

Original change's description:
> Retry "SkRasterPipelineBlitter: support A8"...
>
> ...preferring SkA8_Coverage_Blitter over SkRasterPipelineBlitter.
>
> I think we could make this work with SkRasterPipelineBlitter (tell it, draw white in Src mode with this mask), but the existing blitter is pretty hard to beat in efficiency and correctness.
>
> CQ_INCLUDE_TRYBOTS=skia.primary:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
>
>
> Change-Id: I72df9995c63f3334d8111c59711818cb5ed1e63c
> Reviewed-on: https://skia-review.googlesource.com/6627
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
>

TBR=mtklein@chromium.org,brianosman@google.com,reed@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Change-Id: I6a36b4c087a52e54f4d591ded40e6a202fb77068
Reviewed-on: https://skia-review.googlesource.com/6760
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
d05204258017a040c97cbd9d0e3f7993211668ad 06-Jan-2017 Matt Sarett <msarett@google.com> Fix Sk8f::Store4 (for HSW)

This should fix the colorspacexform gm in Gold.
https://gold.skia.org/search?head=true&include=false&limit=50&neg=false&pos=false&query=name%3Dcolorspacexform%26source_type%3Dgm&unt=true

BUG=skia:

Change-Id: I05e2c2c0e7d7095f6935e60ff1bf89858380335f
Reviewed-on: https://skia-review.googlesource.com/6721
Commit-Queue: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
f55ea6a1deb21120944d406124a2984b5009260a 06-Jan-2017 Mike Klein <mtklein@chromium.org> Retry "SkRasterPipelineBlitter: support A8"...

...preferring SkA8_Coverage_Blitter over SkRasterPipelineBlitter.

I think we could make this work with SkRasterPipelineBlitter (tell it, draw white in Src mode with this mask), but the existing blitter is pretty hard to beat in efficiency and correctness.

CQ_INCLUDE_TRYBOTS=skia.primary:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD


Change-Id: I72df9995c63f3334d8111c59711818cb5ed1e63c
Reviewed-on: https://skia-review.googlesource.com/6627
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
4a224f60111643f36d6f1c204142ce34511a32a1 05-Jan-2017 Mike Klein <mtklein@chromium.org> Revert "SkRasterPipelineBlitter: support A8"

This reverts commit f44373c119290b501d4aec7385e16d12c28a1f0f.

Reason for revert: MSAN

Original change's description:
> SkRasterPipelineBlitter: support A8
>
> This adds support for loading and storing A8, then uses it in SkRasterPipelineBlitter.
>
> I think this handles all dst formats now: A8, 565, 8888 (by policy, sRGB only) and F16.
>
> CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
>
>
> Change-Id: Id207f6e6c56b6bfcc301d77dd23e0959bb7afba8
> Reviewed-on: https://skia-review.googlesource.com/6554
> Reviewed-by: Mike Reed <reed@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
>

TBR=mtklein@chromium.org,reed@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Change-Id: I9ead3c3335e1776e9a1639ca0481253821505d67
Reviewed-on: https://skia-review.googlesource.com/6625
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
f44373c119290b501d4aec7385e16d12c28a1f0f 04-Jan-2017 Mike Klein <mtklein@chromium.org> SkRasterPipelineBlitter: support A8

This adds support for loading and storing A8, then uses it in SkRasterPipelineBlitter.

I think this handles all dst formats now: A8, 565, 8888 (by policy, sRGB only) and F16.

CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD


Change-Id: Id207f6e6c56b6bfcc301d77dd23e0959bb7afba8
Reviewed-on: https://skia-review.googlesource.com/6554
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
0d413f0f455c984eacf46c9d5cb0978d7b73357f 14-Dec-2016 Mike Klein <mtklein@chromium.org> Revert "Revert "SkNx basically always is fast now.""

This reverts commit 8ba64d1996ba6c9ecfb12132cdab7d5d99af7456.

Reason for revert: does not appear to have been blocking the roll.

Original change's description:
> Revert "SkNx basically always is fast now."
>
> This reverts commit 21f783829619186442041de6008f7f58f4f6250d.
>
> Reason for revert: roll?
>
> Original change's description:
> > SkNx basically always is fast now.
> >
> > We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON.
> >
> > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
> >
> > Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8
> > Reviewed-on: https://skia-review.googlesource.com/5946
> > Commit-Queue: Mike Klein <mtklein@chromium.org>
> > Reviewed-by: Mike Reed <reed@google.com>
> > Reviewed-by: Florin Malita <fmalita@chromium.org>
> >
>
> TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org
> NOPRESUBMIT=true
> NOTREECHECKS=true
> NOTRY=true
>
> Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954
> Reviewed-on: https://skia-review.googlesource.com/6040
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Mike Klein <mtklein@chromium.org>
>

TBR=mtklein@chromium.org,reviews@skia.org,fmalita@chromium.org,reed@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Change-Id: I230dd4c2abb2d14ffc302be5376b9eaacbbeafcc
Reviewed-on: https://skia-review.googlesource.com/6026
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
8ba64d1996ba6c9ecfb12132cdab7d5d99af7456 14-Dec-2016 Mike Klein <mtklein@chromium.org> Revert "SkNx basically always is fast now."

This reverts commit 21f783829619186442041de6008f7f58f4f6250d.

Reason for revert: roll?

Original change's description:
> SkNx basically always is fast now.
>
> We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON.
>
> CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
>
> Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8
> Reviewed-on: https://skia-review.googlesource.com/5946
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Mike Reed <reed@google.com>
> Reviewed-by: Florin Malita <fmalita@chromium.org>
>

TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954
Reviewed-on: https://skia-review.googlesource.com/6040
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
21f783829619186442041de6008f7f58f4f6250d 13-Dec-2016 Mike Klein <mtklein@chromium.org> SkNx basically always is fast now.

We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON.

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8
Reviewed-on: https://skia-review.googlesource.com/5946
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/opts/SkNx_sse.h
2c8104eb18d15916db8f7b648bd79f7384397af9 29-Nov-2016 Mike Klein <mtklein@chromium.org> SkNx_abi is unused.

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: I082c34a1f484715cd2dca55a8d23101235755e6a
Reviewed-on: https://skia-review.googlesource.com/5233
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
2e35e8a45c9ff04b88d7a98b381ce56e565d8402 18-Nov-2016 Mike Klein <mtklein@chromium.org> mirror tiling

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5064
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Change-Id: I9d48ba73e145701245e5ec4f32c8c360da6baddd
Reviewed-on: https://skia-review.googlesource.com/5064
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
b273fc429fff8af5984e1e3dc1068c71dd369d57 17-Nov-2016 Mike Klein <mtklein@chromium.org> repeat tiling

Does the note in repeat() look right?

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4980
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Change-Id: I347023f312ab4643fc387e486192c3ae3357db8b
Reviewed-on: https://skia-review.googlesource.com/4980
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
06a65e2799eaead18f778792801406aff4aec0d9 17-Nov-2016 Mike Klein <mtklein@chromium.org> Support SkImageShader in SkRasterPipeline blitter

First of many CLs, I'm sure.

This handles 8888 or sRGB sources with an affine matrix, clamp/clamp tiling, and nearest-neighbor sampling only.

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4906
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Change-Id: I99f7508852b3d44b6f52f7a0bee29a793af35c48
Reviewed-on: https://skia-review.googlesource.com/4906
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
a4a4488a4c3f16758f7e2b050168fe8d2f3b2a4d 04-Nov-2016 mtklein <mtklein@chromium.org> skrpb: evaluate color filters for constant shaders once.

The simplest thing to do here is just run shader+color filter pipeline at
construction time to create a new constant color shader (replacing the paint
color).

This reduces a pipeline like:
- constant_color (paint color)
- matrix_4x5
- clamp_a
- load_d_foo, xfermode, lerp, store_foo
to
- constant_color (paint color -> matrix_4x5 -> clamp_a)
- load_d_foo, xfermode, lerp, store_foo

To implement this all, we add a new store_f32 stage that writes SkPM4f, and
finally get around to implementing Sk8f::Store4() (store while reinterlacing).
Sk4f::Store4() already exists for both SSE and NEON.

Next step: reduce simple constant_color -> store pipelines (src mode, full
coverage) into non-pipeline memsets.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2480823002

Review-Url: https://codereview.chromium.org/2480823002
/external/skia/src/opts/SkNx_sse.h
7c78f3a863c620d722f02d00b88de5b3cde298a4 19-Oct-2016 Mike Klein <mtklein@chromium.org> SkNx: use SK_ALWAYS_INLINE thoroughly.

MSVC's not so good at inlining. So tell it where to. It won't hurt the others.

This has nothing directly to do with ODR safety. The anonymous namespaces and 'static' on freestanding functions provide the correctness we need there. But this change can help to mechanically prevent the sort of problems ODR violations can lead to.

I may follow up by extending this strategy further to Sk4px, which is used to implement a lot of the legacy xfermodes.

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3608
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Change-Id: I927334c40910ce43da1fbabdf243c9cd5438bea6
Reviewed-on: https://skia-review.googlesource.com/3608
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
5ec5590bf796ef9d966f7348b3ce81a6e3fe3848 15-Oct-2016 Mike Klein <mtklein@chromium.org> Unbreak -Fast bot.

CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3423

Change-Id: I1875c86b785da4483038c10715af7827b7d71f0b
Reviewed-on: https://skia-review.googlesource.com/3423
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
511ea17b967914ae8342d7861c6695ec8d492bcc 15-Oct-2016 Mike Klein <mtklein@chromium.org> SkNx_abi for passing Sk4f as function arguments, etc.

CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3422

Change-Id: Idc0a192faa7ff843aef023229186580c69baf1f7
Reviewed-on: https://skia-review.googlesource.com/3422
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
1e76464f87ea55a8749eb94b8e6c79638983d65a 14-Oct-2016 Mike Klein <mtklein@chromium.org> Wrap SkNx types in anonymous namespace again.

This should make each compilation unit's SkNx types distinct from each other's as far as C++ cares. This keeps us from violating the One Definition Rule with different implementations for the same function.

Here's an example I like. Sk4i SkNx_cast(Sk4b) has at least 4 different sensible implementations:
- SSE2: punpcklbw xmm, zero; punpcklbw xmm, zero
- SSSE3: load mask; pshufb xmm, mask
- SSE4.1: pmovzxbd
- AVX2: vpmovzxbd

We really want all these to inline, but if for some reason they don't (Debug build, poor inliner) and they're compiled in SkOpts.cpp, SkOpts_ssse3.cpp, SkOpts_sse41.cpp, SkOpts_hsw.cpp... boom!

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3461

Change-Id: I0088ebfd7640c1b0de989738ed43c81b530dc0d9
Reviewed-on: https://skia-review.googlesource.com/3461
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
04adfda9c74481d0b640c0ce18864588babfcdf6 12-Oct-2016 Mike Klein <mtklein@chromium.org> SkRasterPipeline: 8x pipelines, without any 8x code enabled.

Original review here: https://skia-review.googlesource.com/c/2990/
Second attempt here: https://skia-review.googlesource.com/c/3064/

This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out.
That omitted part is the key piece... this just lands the refactoring.

CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242

Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71
Reviewed-on: https://skia-review.googlesource.com/3242
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
42f4b42e8311f168aeeadd939b476c05b329500e 10-Oct-2016 Mike Klein <mtklein@chromium.org> Revert "SkRasterPipeline: 8x pipelines, attempt 2"

This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae.

crbug.com/654213
Looks like Chrome Canary's picking up Haswell code on non-Haswell machines.

Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a
Reviewed-on: https://skia-review.googlesource.com/3108
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
a71e151c6f0be68dc96ad2d169bbc31edca8f946 07-Oct-2016 Mike Klein <mtklein@chromium.org> SkRasterPipeline: 8x pipelines, attempt 2

Original review here: https://skia-review.googlesource.com/c/2990/

Changes since:
- simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers
- fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2)
- now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows.
- all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds.

CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064

Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae
Reviewed-on: https://skia-review.googlesource.com/3064
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
942858774831fea6e3f5f9d6c39b9538cf031bfd 07-Oct-2016 Mike Klein <mtklein@chromium.org> Revert "SkRasterPipeline: 8x pipelines"

This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a.

Reason for revert: lots of failing bots.

TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Change-Id: I653bed3905187f43196504f19424985fa2a765b5
Reviewed-on: https://skia-review.googlesource.com/3063
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
1aebdaee0e2aa4324509fd3ad4c40c21703ae4a2 06-Oct-2016 Mike Klein <mtklein@chromium.org> SkRasterPipeline: 8x pipelines

Bench runtime changes:
sRGB: 7194 -> 3735 = 1.93x faster
F16: 6531 -> 2559 = 2.55x faster

Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end.

Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup.

This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code.

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a
Reviewed-on: https://skia-review.googlesource.com/2990
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
33cbfd75afdd383770bb6253c06ba819a2481a35 06-Oct-2016 Mike Klein <mtklein@chromium.org> Make load4 and store4 part of SkNx properly.

Every type now nominally has Load4() and Store4() methods.
The ones that we use are implemented.

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd
Reviewed-on: https://skia-review.googlesource.com/3046
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkNx_sse.h
c0444615ed76360f680619ad4d1f92cda6181a50 16-Sep-2016 msarett <msarett@google.com> Support Float32 output from SkColorSpaceXform

* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47
Review-Url: https://codereview.chromium.org/2339233003
/external/skia/src/opts/SkNx_sse.h
c71a9b7f53938b4f33f36f48e867b8b72cc1cc61 16-Sep-2016 msarett <msarett@google.com> Revert of Support Float32 output from SkColorSpaceXform (patchset #7 id:140001 of https://codereview.chromium.org/2339233003/ )

Reason for revert:
Hitting an assert

Original issue's description:
> Support Float32 output from SkColorSpaceXform
>
> * Adds Float32 support to SkColorSpaceXform
> * Changes API to allows clients to ask for F32, updates clients to
> new API
> * Adds Sk4f_load4 and Sk4f_store4 to SkNx
> * Make use of new xform in SkGr.cpp
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
> CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47

TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2347473007
/external/skia/src/opts/SkNx_sse.h
43d6651111374b5d1e4ddd9030dcf079b448ec47 16-Sep-2016 msarett <msarett@google.com> Support Float32 output from SkColorSpaceXform

* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2339233003
/external/skia/src/opts/SkNx_sse.h
15ee3deee8aca2bf6e658449f25ee34a8153e6ee 02-Aug-2016 msarett <msarett@google.com> Refactor of SkColorSpaceXformOpts

(1) Performance is better or stays the same.

(2) Code is split into functions (RasterPipeline-ish
design). IMO, it's not really more or less readable.
But I think it's now much easier add capabilities,
apply optimizations, or do more refactors. Or to
actually use RasterPipeline. I help back from trying
any of these to try to keep this CL sane.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2194303002
/external/skia/src/opts/SkNx_sse.h
d05a8752738f84b0115678b3cdad89237173e904 29-Jul-2016 mtklein <mtklein@chromium.org> SkNx: add Sk4u

This lets us get at logical >> in a nicely principled way.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2197683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2197683002
/external/skia/src/opts/SkNx_sse.h
f660b7cfcfbf3062f88e61f8320ea7051da72213 26-Jul-2016 mtklein <mtklein@chromium.org> Add Sk4h_load4 for loading F16.

Should feel very similar to Sk4h_store4:
NEON uses its native instruction, SSE unpacks manually.

Since we'll have our F16s in 4 Sk4h by the time we're done here,
this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2184753002
/external/skia/src/opts/SkNx_sse.h
6bdbf4412bd1a6fe26be1042ccf080174b13021f 19-Jul-2016 msarett <msarett@google.com> Improve naive SkColorXform to half floats

This should give us a good baseline to explore using SkRasterPipeline.

A particular colorxform to half float drops from 425us to 282us on my desktop.

Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us

Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2159993003
/external/skia/src/opts/SkNx_sse.h
036e1831e05ae3a6ec9bcd30cb24f6b1a49a3541 15-Jul-2016 mtklein <mtklein@chromium.org> Add a bench to measure the best way to pack from int to uint16_t with SSE.

I measured relative runtimes on my laptop:

pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x

I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.

The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.

The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

Review-Url: https://codereview.chromium.org/2150343002
/external/skia/src/opts/SkNx_sse.h
58e389b0518b46bbe58ba01c23443cf23c18435c 15-Jul-2016 mtklein <mtklein@chromium.org> Expand _01 half<->float limitation to _finite. Simplify.

It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.

We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.

This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)

I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
/external/skia/src/opts/SkNx_sse.h
64bbad360f30930befafee1bdefe4ad89f130dac 14-Jul-2016 mtklein <mtklein@google.com> Revert of Expand _01 half<->float limitation to _finite. Simplify. (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ )

Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast

Original issue's description:
> Expand _01 half<->float limitation to _finite. Simplify.
>
> It's become clear we need to sometimes deal with values <0 or >1.
> I'm not yet convinced we care about NaN or +-inf.
>
> We had some fairly clever tricks and optimizations here for NEON
> and SSE. I've thrown them out in favor of a single implementation.
> If we find the specializations mattered, we can certainly figure out
> how to extend them to this new range/domain.
>
> This happens to add a vectorized float -> half for ARMv7, which was
> missing from the _01 version. (The SSE strategy was not portable to
> platforms that flush denorm floats to zero.)
>
> I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90

TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2151023003
/external/skia/src/opts/SkNx_sse.h
3296bee70d074bb8094b3229dbe12fa016657e90 14-Jul-2016 mtklein <mtklein@chromium.org> Expand _01 half<->float limitation to _finite. Simplify.

It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.

We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.

This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)

I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2145663003
/external/skia/src/opts/SkNx_sse.h
281b33fdd909ee3f43192cdf950ce00e3df62407 13-Jul-2016 mtklein <mtklein@chromium.org> SkRasterPipeline preliminaries

Re-uploading to see if I can get a CL number < 2^31.
patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001)

Already reviewed at the other crrev link.
TBR=

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2144573004
/external/skia/src/opts/SkNx_sse.h
7d3ff7142360f456be4e21e64c6c014cc919785e 12-Jul-2016 msarett <msarett@google.com> Add Sk4f_RoundToInt

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2134753006
/external/skia/src/opts/SkNx_sse.h
afb8539f62b3e92dcc5dbe0598605f4058a1e03a 11-Jul-2016 msarett <msarett@google.com> Revert of try to speed-up maprect + round2i + contains (patchset #8 id:140001 of https://codereview.chromium.org/2133413002/ )

Reason for revert:
Breaking the roll...
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio

Original issue's description:
> try to speed-up maprect + round2i + contains
>
> We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5

TBR=mtklein@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2136343002
/external/skia/src/opts/SkNx_sse.h
b42b785d1cbc98bd34aceae338060831b974f9c5 11-Jul-2016 reed <reed@google.com> try to speed-up maprect + round2i + contains

We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2133413002
/external/skia/src/opts/SkNx_sse.h
5608e2ed2299496eee3c57e0fe426ae9bd0d07a4 11-Jul-2016 mtklein <mtklein@chromium.org> Clean up hyper-local SkCpu feature test experiment.

This removes the code paths where we make SkCpu::Supports() calls
from within a tight loop. It keeps code paths using SkCpu::Supports()
to choose entire routines from src/opts/.

We can't rely on these hyper-local checks to be hoisted up reliably enough.
It worked pretty well with the first couple platforms we tried (e.g. Clang
on Linux/Mac) but we can't gaurantee it works everywhere.

Further, I'm not able to actually do anything fancy with those tests
outside of x86... I've not found a way to get, say, NEON+F16 conversion
code embedded into ordinary NEON code outside writing then entire function
in external assembly.

This whole idea becomes less important now that we've got a way to chain
separate function calls together efficiently. We can now, e.g., use an
AVX+F16C method to load some pixels, then chain that into an ordinary AVX
method to color filter them.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2138073002
/external/skia/src/opts/SkNx_sse.h
d2809573deb7b99e764f7f71fe34a5b5322df0b2 20-Jun-2016 msarett <msarett@google.com> Support sRGB dsts in opt code

201295.jpg on HP z620 (300x280)

QCMS Xform 0.418 ms
Skia NEW Xform 0.378 ms

Vs QCMS 1.11x

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2078623002
/external/skia/src/opts/SkNx_sse.h
64f061a4e85bea6a48901b1385b7159f85909a34 17-Jun-2016 mtklein <mtklein@chromium.org> port SkColorXform_opts to Sk4f

I have tested that this compiles.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078913003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2078913003
/external/skia/src/opts/SkNx_sse.h
e18fa440e74e9af0324de0a1de9b6ffb0fe3c3d3 09-Jun-2016 mtklein <mtklein@chromium.org> Move immintrin/arm_neon includes to where they are used.

On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:

$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release

Before: 159 wall / 3367 cpu
159 wall / 3368 cpu

After: 137 wall / 2860 cpu
136 wall / 2863 cpu

I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

TBR=reed@google.com
No public API changes.

Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
Review-Url: https://codereview.chromium.org/2045633002
/external/skia/src/opts/SkNx_sse.h
50bcb189f8785a599a3024d8eba4681c2e8ca37a 08-Jun-2016 mtklein <mtklein@google.com> Revert of Move immintrin/arm_neon includes to where they are used. (patchset #2 id:20001 of https://codereview.chromium.org/2045633002/ )

Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.

Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no signficant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a

TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2046213002
/external/skia/src/opts/SkNx_sse.h
12dfaaa53c23f3d03050bde8f64136ac1f44164a 07-Jun-2016 mtklein <mtklein@chromium.org> Move immintrin/arm_neon includes to where they are used.

On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:

$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release

Before: 159 wall / 3367 cpu
159 wall / 3368 cpu

After: 137 wall / 2860 cpu
136 wall / 2863 cpu

I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

TBR=reed@google.com
No public API changes.

Review-Url: https://codereview.chromium.org/2045633002
/external/skia/src/opts/SkNx_sse.h
244a65350e52c9438931ecdc05a4913f29d343bc 19-Apr-2016 mtklein <mtklein@chromium.org> skcpu: sse4.1 floor, f16c f16<->f32

- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:

+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180

If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9

Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663

Review URL: https://codereview.chromium.org/1891513002
/external/skia/src/opts/SkNx_sse.h
2c7f24093a394ccbe54a7db60ba79af14682e7fa 15-Apr-2016 mtklein <mtklein@google.com> Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #11 id:200001 of https://codereview.chromium.org/1891513002/ )

Reason for revert:
this depends on a CL I want to revert

Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
>
> Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663

TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1897433002
/external/skia/src/opts/SkNx_sse.h
3faf74b8364491ca806f523fbb1d8a97be592663 15-Apr-2016 mtklein <mtklein@chromium.org> skcpu: sse4.1 floor, f16c f16<->f32

- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:

+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180

If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9

Review URL: https://codereview.chromium.org/1891513002
/external/skia/src/opts/SkNx_sse.h
834d9e109298ae704043128005f8c1bc622350f4 15-Apr-2016 mtklein <mtklein@google.com> Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #10 id:180001 of https://codereview.chromium.org/1891513002/ )

Reason for revert:
Need to change around my #if guards so that clang-cl is treated like GCC and Clang, rather than MSVC.

Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9

TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1891993002
/external/skia/src/opts/SkNx_sse.h
cbe3c1af987d622ea67ef560d855b41bb14a0ce9 14-Apr-2016 mtklein <mtklein@chromium.org> skcpu: sse4.1 floor, f16c f16<->f32

- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:

+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180

If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1891513002
/external/skia/src/opts/SkNx_sse.h
f8f90e4a85638faa18e7b4133cfe4d1ff5b1b23e 21-Mar-2016 mtklein <mtklein@chromium.org> SkNx refresh

- rearrange a bit
- fewer macros
- hooks for all operators
- add left and right scalar operator overrides
- add +=, &=, <<=, etc.
- add SkNx_split() and SkNx_join()
- simplify the many rsqrt() and invert() options to just what we actually use

This refactoring pointed out that our float <-> int NEON conversions are not specialized, so I've implemented them. It seems nice that this is an error rather than silently falling back to serial code.

It's unclear to me if split/join want to be external, static methods, or non-static methods (SkNx_join(), Sk4f::Join(), x.join()). Time will tell?

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1812233003
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1812233003
/external/skia/src/opts/SkNx_sse.h
fd5a26080d4647cc913f5f2b2dc72cb35abac3ab 01-Mar-2016 herb <herb@google.com> Add swizzle for rgb8888.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1746153002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1746153002
/external/skia/src/opts/SkNx_sse.h
7c249e531900929c2fe2cdde76619fa6d2538c49 21-Feb-2016 mtklein <mtklein@chromium.org> SkNx: kth<...>() -> [...]

Just some syntax cleanup. No real change: kth<...>() was calling [...] already.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1714363002
/external/skia/src/opts/SkNx_sse.h
0cf795fd1135babe0ee0b3585f3ad49a02fe1387 17-Feb-2016 mtklein <mtklein@chromium.org> fast sk4f <-> sk4i SSE methods

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1707673002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1707673002
/external/skia/src/opts/SkNx_sse.h
97120a7ed193b2e081c9503a7c58657e0c6f4920 12-Feb-2016 mtklein <mtklein@google.com> Revert of SkNx refactoring (patchset #4 id:60001 of https://codereview.chromium.org/1690633003/ )

Reason for revert:
Precautionary revert for chromium:586487

Original issue's description:
> SkNx refactoring
>
> - add back Sk4i typedef
> - define SSE casts in terms of Sk4i
> * uint8 <-> float becomes uint8 <-> int <-> float
> * uint16 <-> float becomes uint16 <-> int <-> float
>
> This has the nice side effect of specializing uint8 <-> int
> and uint16 <-> int, which are useful in their own right.
>
> There are many cast specializations now, some of which call each other.
> I have tried to arrange them in some sort of sensible order, subject to
> the constraint that those called must precede those who call.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/c1eb311f4e98934476f1b2ad5d6de772cf140d60

TBR=herb@google.com,mtklein@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=chromium:586487

Review URL: https://codereview.chromium.org/1696903002
/external/skia/src/opts/SkNx_sse.h
c1eb311f4e98934476f1b2ad5d6de772cf140d60 11-Feb-2016 mtklein <mtklein@chromium.org> SkNx refactoring

- add back Sk4i typedef
- define SSE casts in terms of Sk4i
* uint8 <-> float becomes uint8 <-> int <-> float
* uint16 <-> float becomes uint16 <-> int <-> float

This has the nice side effect of specializing uint8 <-> int
and uint16 <-> int, which are useful in their own right.

There are many cast specializations now, some of which call each other.
I have tried to arrange them in some sort of sensible order, subject to
the constraint that those called must precede those who call.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1690633003
/external/skia/src/opts/SkNx_sse.h
e5fe9a42d487c8648101c6f8454575d6de1acafa 10-Feb-2016 mtklein <mtklein@chromium.org> Sk4f: floor() via int32_t roundtrip.

About 25% faster on both x86 and ARMv7.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1682953002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1682953002
/external/skia/src/opts/SkNx_sse.h
126626e52310b39429ad5eaa8e68e0caa24e6202 10-Feb-2016 mtklein <mtklein@chromium.org> Sk4f: add floor()

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614

CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus7-GPU-Tegra3-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-GPU-TegraK1-Arm64-Release-Trybot

Review URL: https://codereview.chromium.org/1685773002
/external/skia/src/opts/SkNx_sse.h
c023210893da7ca407bad8c9f07c8339ee81854c 10-Feb-2016 mtklein <mtklein@google.com> Revert of Sk4f: add floor() (patchset #3 id:40001 of https://codereview.chromium.org/1685773002/ )

Reason for revert:
build break must be this, right?

Original issue's description:
> Sk4f: add floor()
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614

TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1679343004
/external/skia/src/opts/SkNx_sse.h
86c6c4935171a1d2d6a9ffbff37ec6dac1326614 09-Feb-2016 mtklein <mtklein@chromium.org> Sk4f: add floor()

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1685773002
/external/skia/src/opts/SkNx_sse.h
8273ca460fcde18ec884279fb9a41f330e92e0f9 09-Feb-2016 mtklein <mtklein@chromium.org> restore sk4i SSE specialization

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679343003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1679343003
/external/skia/src/opts/SkNx_sse.h
e4c0beed744d09dae4757c1893d8caa64ee09cd2 09-Feb-2016 mtklein <mtklein@chromium.org> sknx refactoring

- trim unused specializations (Sk4i, Sk2d) and apis (SkNx_dup)
- expand apis a little
* v[0] == v.kth<0>()
* SkNx_shuffle can now convert to different-sized vectors, e.g. Sk2f <-> Sk4f
- remove anonymous namespace

I believe it's safe to remove the anonymous namespace right now.
We're worried about violating the One Definition Rule; the anonymous namespace protected us from that.

In Release builds, this is mostly moot, as everything tends to inline completely.
In Debug builds, violating the ODR is at worst an inconvenience, time spent trying to figure out why the bot is broken.

Now that we're building with SSE2/NEON everywhere, very few bots have even a chance about getting confused by two definitions of the same type or function. Where we do compile variants depending on, e.g., SSSE3, we do so in static inline functions. These are not subject to the ODR.

I plan to follow up with a tedious .kth<...>() -> [...] auto-replace.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1683543002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1683543002
/external/skia/src/opts/SkNx_sse.h
629f25a7ef229d97e22530eae043882b20cd8d8c 08-Feb-2016 mtklein <mtklein@chromium.org> fix float <---> uint16_t conversion, with Mike's tests.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679713002

patch from issue 1679713002 at patchset 1 (http://crrev.com/1679713002#ps1)
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1676893002
/external/skia/src/opts/SkNx_sse.h
2d340f2f37a28e82323428725019f12a8538f48e 07-Feb-2016 mtklein <mtklein@chromium.org> could not resist: fast sse float <--> u16

- generalizes the bench to float <--> {u8,u16}
- must remember to implement NEON version at some point

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676853002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1676853002
/external/skia/src/opts/SkNx_sse.h
507ef6d68115ae9e6d884bb36436a1463523d893 31-Jan-2016 mtklein <mtklein@chromium.org> SkNx Load/store: take any pointer.

This means we can remove a lot of explicit casts in code that uses SkNx.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1650653002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1650653002
/external/skia/src/opts/SkNx_sse.h
550e9b0ef1c0fba42e2e902a467af322ad2413da 20-Jan-2016 mtklein <mtklein@chromium.org> SkNx miplevel building

All sizes approximately twice as fast.

Before:
micros bench
1649.35 mipmap_build_512x512 nonrendering
1824.42 mipmap_build_511x512 nonrendering
2100.66 ? mipmap_build_512x511 nonrendering
2375.94 mipmap_build_511x511 nonrendering

After:
micros bench
730.32 ! mipmap_build_512x512 nonrendering
922.12 mipmap_build_511x512 nonrendering
999.07 mipmap_build_512x511 nonrendering
1342.93 ! mipmap_build_511x511 nonrendering

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/3bd5aba2a0e165997f683cf3aa306661e71464f6

Review URL: https://codereview.chromium.org/1606013003
/external/skia/src/opts/SkNx_sse.h
26def8687f9763dcc00ab3022aee4b86700970c1 20-Jan-2016 mtklein <mtklein@google.com> Revert of SkNx miplevel building (patchset #3 id:40001 of https://codereview.chromium.org/1606013003/ )

Reason for revert:
Paranoid revert to see if it helps skia:4823

Original issue's description:
> SkNx miplevel building
>
> All sizes approximately twice as fast.
>
> Before:
> micros bench
> 1649.35 mipmap_build_512x512 nonrendering
> 1824.42 mipmap_build_511x512 nonrendering
> 2100.66 ? mipmap_build_512x511 nonrendering
> 2375.94 mipmap_build_511x511 nonrendering
>
> After:
> micros bench
> 730.32 ! mipmap_build_512x512 nonrendering
> 922.12 mipmap_build_511x512 nonrendering
> 999.07 mipmap_build_512x511 nonrendering
> 1342.93 ! mipmap_build_511x511 nonrendering
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3bd5aba2a0e165997f683cf3aa306661e71464f6

TBR=reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1607323002
/external/skia/src/opts/SkNx_sse.h
3bd5aba2a0e165997f683cf3aa306661e71464f6 19-Jan-2016 mtklein <mtklein@chromium.org> SkNx miplevel building

All sizes approximately twice as fast.

Before:
micros bench
1649.35 mipmap_build_512x512 nonrendering
1824.42 mipmap_build_511x512 nonrendering
2100.66 ? mipmap_build_512x511 nonrendering
2375.94 mipmap_build_511x511 nonrendering

After:
micros bench
730.32 ! mipmap_build_512x512 nonrendering
922.12 mipmap_build_511x512 nonrendering
999.07 mipmap_build_512x511 nonrendering
1342.93 ! mipmap_build_511x511 nonrendering

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1606013003
/external/skia/src/opts/SkNx_sse.h
c33065a93ad0874672c4c66b9711aa0b3ef7b7e7 15-Jan-2016 mtklein <mtklein@chromium.org> add SkNx::abs(), for now only implemented for Sk4f

There's no reason we couldn't implement this for all ints and floats;
just don't want to land unused code.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1590843003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1590843003
/external/skia/src/opts/SkNx_sse.h
fce612ac32a224d874acfce2b1ccf8db1a553307 15-Dec-2015 mtklein <mtklein@chromium.org> Specialize Sk2d for SSE2

Given the autovectorization we've seen, I wouldn't expect big speedups
from this, but it does give us a point of control over what's going on.

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1526923003
/external/skia/src/opts/SkNx_sse.h
6f37b4a4757ea3eb00c76162cc37f8a56c3b8bdb 14-Dec-2015 mtklein <mtklein@chromium.org> Unify some SkNx code

- one base case and one N=1 case instead of two each (or three with doubles)
- use SkNx_cast instead of FromBytes/toBytes
- 4-at-a-time Sk4f::ToBytes becomes a special standalone Sk4f_ToBytes

If I did everything right, this'll be perf- and pixel- neutral.

https://gold.skia.org/search2?issue=1526523003&unt=true&query=source_type%3Dgm&master=false

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1526523003
/external/skia/src/opts/SkNx_sse.h
c2e0ac4fce5bfe5bac9f2cb1ebd3aea3a8900fb1 03-Dec-2015 fmalita <fmalita@chromium.org> Don't use the Sk4f gradient impl without SIMD

Also remove the SK_SUPPORT_LEGACY_LINEAR_GRADIENT_TABLE guard since it is no
longer used in Chromium.

BUG=chromium:563492
R=reed@google.com,mtklein@google.com
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1489233005
/external/skia/src/opts/SkNx_sse.h
9db43ac4ee1a83a4f7b332fe6c00f592b6237349 01-Dec-2015 mtklein <mtklein@chromium.org> Add Sk4f::ToBytes(uint8_t[16], Sk4f, Sk4f, Sk4f, Sk4f)

This is a big speedup for float -> byte. E.g. gradient_linear_clamp_3color:
x86-64 147µs -> 103µs (Broadwell MBP)
arm64 2.03ms -> 648µs (Galaxy S6)
armv7 1.12ms -> 489µs (Galaxy S6, same device!)

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot

Review URL: https://codereview.chromium.org/1483953002
/external/skia/src/opts/SkNx_sse.h
6c221b40680ff933c7b8f2ac3dfa76b5732aee3e 20-Nov-2015 mtklein <mtklein@chromium.org> Add SkNx_cast().

SkNx_cast() can cast between any of our vector types,
provided they have the same number of elements.

Any types should work with the default implementation,
and we can drop in specializations as needed, like the
SSE and NEON Sk4f -> Sk4i I included here as an example.

To make this work, I made some internal name changes:
SkNi<N,T> -> SkNx<N, T>
SkNf<N> -> SkNx<N, float>
User aliases (Sk4f, Sk16b, etc.) stay the same.
We can land this first (it's PS1) if that makes things easier.

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1464623002
/external/skia/src/opts/SkNx_sse.h
084db25d47dbad3ffbd7d15c04b63d344b351f90 11-Nov-2015 mtklein <mtklein@chromium.org> float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX.

Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x
Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x
Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x
Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x
Xfermode_ColorDodge 25ms -> 16.5ms 0.66x
Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x

Ought to be no pixel diffs:
https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false

Incidental stuff:

I made the SkNx(T) constructors implicit to make writing math expressions simpler.
This allows us to write expressions like
Sk4f v;
...
v = v*4;
rather than
Sk4f v;
...
v = v * Sk4f(4);

As written it only works when the constant is on the right-hand side,
so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on
following up with a CL that lets those become `(1 - da)` too.

BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1432903002
/external/skia/src/opts/SkNx_sse.h
6f797092d2054f79374fb98fc1d57ca3554c7db4 09-Nov-2015 mtklein <mtklein@chromium.org> prune unused SkNx features

- remove float -> int conversion, keeping float -> byte
- remove support for doubles

I was thinking of specializing Sk8f for AVX. This will help keep the complexity down.

This may cause minor diffs in radial gradients: toBytes() rounds where castTrunc() truncated. But I don't see any diffs in Gold.
https://gold.skia.org/search2?issue=1411563008&unt=true&query=source_type%3Dgm&master=false

BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1411563008
/external/skia/src/opts/SkNx_sse.h
a508f3c62d726867ad9f722623fbf0ab5d06e440 01-Sep-2015 mtklein <mtklein@chromium.org> Require Sk4f::toBytes() clamps

BUG=skia:4117

CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot

Review URL: https://codereview.chromium.org/1312053004
/external/skia/src/opts/SkNx_sse.h
aba1dc8c6aa5cbc4f38ddfce757832359f200b54 31-Aug-2015 mtklein <mtklein@chromium.org> Move float<->byte conversions into Sk4f.

This lets us avoid conversions to [0.0, 1.0] space and rounding that aren't necessary
for SkColorCubeFilter_opts.h.

Dropping rounding on the way back to bytes means we'll see a bunch of off-by-1 diffs.

Rough perf effect:
SSSE3: 110 -> 93 (~15%)
NEON: 465 -> 375 (~20%)

This is the beginning of the end for SkPMFloat as an entity distinct from Sk4f.
I've kept it for now so I can convert sites one by one and think about how things
that really want to keep PM color order will work.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1319413003
/external/skia/src/opts/SkNx_sse.h
082e329887a8f1efe4e1020f0a0a6ea09961712d 12-Aug-2015 mtklein <mtklein@google.com> Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 id:1 of https://codereview.chromium.org/1286093004/ )

Reason for revert:
Maybe causing test / gold problems?

Original issue's description:
> Refactor to put SkXfermode_opts inside SK_OPTS_NS.
>
> Without this refactor I was getting warnings previously about having code
> inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
> code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
>
> That low-level code was in an anonymous namespace to allow multiple independent
> copies of its methods to be instantiated without the linker getting confused /
> offended about violating the One Definition Rule. This was only happening in
> Debug mode where the methods were not being inlined.
>
> To fix this all, I've force-inlined the methods of the low-level code and
> removed the anonymous namespace.
>
> BUG=skia:4117
>
>
> [1] Here is what those errors looked like:
>
> In file included from ../../../../src/core/SkOpts.cpp:18:0:
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
> class Sk4pxXfermode : public SkProcCoeffXfermode {
> ^
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
> ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
> class SkPMFloatXfermode : public SkProcCoeffXfermode {
> ^
> cc1plus: all warnings being treated as errors
>
> Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f

TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117

Review URL: https://codereview.chromium.org/1284333002
/external/skia/src/opts/SkNx_sse.h
b07bee3121680b53b98b780ac08d14d374dd4c6f 12-Aug-2015 mtklein <mtklein@chromium.org> Refactor to put SkXfermode_opts inside SK_OPTS_NS.

Without this refactor I was getting warnings previously about having code
inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].

That low-level code was in an anonymous namespace to allow multiple independent
copies of its methods to be instantiated without the linker getting confused /
offended about violating the One Definition Rule. This was only happening in
Debug mode where the methods were not being inlined.

To fix this all, I've force-inlined the methods of the low-level code and
removed the anonymous namespace.

BUG=skia:4117

[1] Here is what those errors looked like:

In file included from ../../../../src/core/SkOpts.cpp:18:0:
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
class Sk4pxXfermode : public SkProcCoeffXfermode {
^
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
class SkPMFloatXfermode : public SkProcCoeffXfermode {
^
cc1plus: all warnings being treated as errors

Review URL: https://codereview.chromium.org/1286093004
/external/skia/src/opts/SkNx_sse.h
4be181e304d2b280c6801bd13369cfba236d1a66 14-Jul-2015 mtklein <mtklein@chromium.org> 3-15% speedup to HardLight / Overlay xfermodes.

While investigating my bug (skia:4052) I saw this TODO and figured
it'd make me feel better about an otherwise unsuccessful investigation.

This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly
by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls
to 2 cheap comparisons and 1 expensive div255().

NEON speeds up by a more modest ~3%.

BUG=skia:

Review URL: https://codereview.chromium.org/1230663005
/external/skia/src/opts/SkNx_sse.h
2aab22a58a366df4752c1cf0f004092c6e7be335 26-Jun-2015 mtklein <mtklein@chromium.org> Color dodge and burn with SkPMFloat.

Both 25-35% faster with SSE.
With NEON, Burn measures as a ~10% regression, Dodge a huge 2.9x improvement.

The Burn regression is somewhat artificial: we're drawing random colored rects onto an opaque white dst, so we're heavily biased toward the (d==da) fast path in the serial code. In the vector code there's no short-circuiting and we always pay a fixed cost for ColorBurn regardless of src or dst content.

Dodge's fast paths, in contrast, only trigger when (s==sa) or (d==0), neither of which happens any more than randomly in our benchmark. I don't think (d==0) should happen at all. Similarly, the (s==0) Burn fast path is really only going to happen as often as SkRandom allows.

In practice, the existing Burn benchmark is hitting its fast path 100% of the time. So I actually feel really great that this only dings the benchmark by 10%.

Chrome's still guarded by SK_SUPPORT_LEGACY_XFERMODES, which I'll lift after finishing the last xfermode, SoftLight.

BUG=skia:

Review URL: https://codereview.chromium.org/1214443002
/external/skia/src/opts/SkNx_sse.h
b5e861185a69b3e47d1c1bf622fd3f83e5f13898 25-Jun-2015 mtklein <mtklein@chromium.org> Implement four more xfermodes with Sk4px.

HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.

This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.

The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.

This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1196713004
/external/skia/src/opts/SkNx_sse.h
0cc1f0a8d5ed69c76d75061bc2dee3b1d0ce0605 24-Jun-2015 mtklein <mtklein@google.com> Revert of Implement four more xfermodes with Sk4px. (patchset #16 id:290001 of https://codereview.chromium.org/1196713004/)

Reason for revert:
64-bit ARM build failures.

Original issue's description:
> Implement four more xfermodes with Sk4px.
>
> HardLight, Overlay, Darken, and Lighten are all
> ~2x faster with SSE, ~25% faster with NEON.
>
> This covers all previously-implemented NEON xfermodes.
> 3 previous SSE xfermodes remain. Those need division
> and sqrt, so I'm planning on using SkPMFloat for them.
> It'll help the readability and NEON speed if I move that
> into [0,1] space first.
>
> The main new concept here is c.thenElse(t,e), which behaves like
> (c ? t : e) except, of course, both t and e are evaluated. This allows
> us to emulate conditionals with vectors.
>
> This also removes the concept of SkNb. Instead of a standalone bool
> vector, each SkNi or SkNf will just return their own types for
> comparisons. Turns out to be a lot more manageable this way.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1205703008
/external/skia/src/opts/SkNx_sse.h
b9d4163bebab0f5639f9c5928bb5fc15f472dddc 24-Jun-2015 mtklein <mtklein@chromium.org> Implement four more xfermodes with Sk4px.

HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.

This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.

The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.

This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.

BUG=skia:

Review URL: https://codereview.chromium.org/1196713004
/external/skia/src/opts/SkNx_sse.h
059ac00446404506a46cd303db15239c7aae49d5 22-Jun-2015 mtklein <mtklein@chromium.org> Update some Sk4px APIs.

Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones.

- SkAlpha / SkPMColor constructors become static factories.
- Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious.
- Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed. Drop Round for the same reason as above... we should always round.
- Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts.
- use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative.
- MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do.

BUG=skia:

Review URL: https://codereview.chromium.org/1202013002
/external/skia/src/opts/SkNx_sse.h
aa999cb952753d373c03befa7fce8c4700ef9308 23-May-2015 mtklein <mtklein@chromium.org> Everyone gets a namespace {}.

If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different
SIMD flags, that could cause different definitions of the same method.

Normally that's moot, because all the code inlines, but in Debug it tends not
to. So in Debug, the linker picks one definition for us. That breaks _someone_.

Wrapping everything in a namespace {} keeps the definitions separate.

Tested locally, it fixes this bug.
BUG=skia:3861

This code is not yet enabled in Chrome, so shouldn't affect the roll.
NOTREECHECKS=true

Review URL: https://codereview.chromium.org/1154523004
/external/skia/src/opts/SkNx_sse.h
27e517ae533775889c98c65fa2f07b98357ecbc2 15-May-2015 mtklein <mtklein@chromium.org> add Min to SkNi, specialized for u8 and u16 on SSE and NEON

0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does.

I plan to use this to implement the Difference xfermode,
and it seems generally handy.

BUG=skia:

Review URL: https://codereview.chromium.org/1133933004
/external/skia/src/opts/SkNx_sse.h
6cbf18c70bf99f58b2bb1c49cdf8d41be561fee4 13-May-2015 mtklein <mtklein@chromium.org> Plus xfermode using Sk4px.

Xfermode_Plus runs 4-5x faster.

We expect mixed_xfermodes to have a small diff. This is because kFoldCoverageIntoSrcAlpha was incorrectly set to true.

This implementation handily beats the Sk4f impl, the portable impl, and the existing SSE2 impl. Reading the SkXfermodes_opts_SSE2.cpp file, I'm pretty confident that we'll be able to beat all SSE2 impls.

I believe this impl will beat or match the existing NEON impl too, but that may not be true for more complicated xfermodes. They can take advantage of transposing ARGBARGB... to AAAARRRR.... cheaply and I haven't figured out an abstraction for that yet that doesn't screw SSE.

Adds:
- MapDstSrc() to Sk4px
- saturatedAdd() to SkNi (only implemented as far as it's used).
- div255Narrow()

BUG=skia:

Review URL: https://codereview.chromium.org/1138893002
/external/skia/src/opts/SkNx_sse.h
d2ffd36eb62e99abe2920369d1e040954cc2044f 12-May-2015 mtklein <mtklein@chromium.org> Sk4px

Xfermode_SrcOver:
SSE: 2.08ms -> 2.03ms (~2% faster)
NEON: my N5 is noisy, but there appears to be no perf change

BUG=skia:

Review URL: https://codereview.chromium.org/1132273004
/external/skia/src/opts/SkNx_sse.h
d7c014ff03d44d3ed7a6a2ddca59621a7e98f739 27-Apr-2015 mtklein <mtklein@chromium.org> Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM

This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1109913002
/external/skia/src/opts/SkNx_sse.h
9a22f489e8722dd83c65f33fb886019d9f60e479 27-Apr-2015 mtklein <mtklein@google.com> Revert of Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM (patchset #2 id:20001 of https://codereview.chromium.org/1109913002/)

Reason for revert:
arm64 typos

Original issue's description:
> Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM
>
> This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1105233003
/external/skia/src/opts/SkNx_sse.h
9de16283fdc8cc0d31a84f503578d0ecea4e8297 27-Apr-2015 mtklein <mtklein@chromium.org> Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM

This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after.

BUG=skia:

Review URL: https://codereview.chromium.org/1109913002
/external/skia/src/opts/SkNx_sse.h
1113da72eced20480491bb87ade0ffcff4eb8ea7 27-Apr-2015 mtklein <mtklein@chromium.org> Mike's radial gradient CL with better float -> int.

patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)

This looks quite launchable. radial_gradient3, min of 100 samples:
N5: 985µs -> 946µs
MBP: 395µs -> 279µs

On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table?

BUG=skia:

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot,Build-Ubuntu-GCC-Arm7-Release-Android_NoNeon-Trybot

Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec

Review URL: https://codereview.chromium.org/1109643002
/external/skia/src/opts/SkNx_sse.h
8d3e9dff3f3db3fa77c383e4cd6c47b9898a8fcd 27-Apr-2015 mtklein <mtklein@google.com> Revert of Mike's radial gradient CL with better float -> int. (patchset #7 id:120001 of https://codereview.chromium.org/1109643002/)

Reason for revert:
compile failures.

Original issue's description:
> Mike's radial gradient CL with better float -> int.
>
> patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)
>
> This looks quite launchable. radial_gradient3, min of 100 samples:
> N5: 985µs -> 946µs
> MBP: 395µs -> 279µs
>
> On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table?
>
> BUG=skia:
>
> CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec

TBR=reed@google.com,robertphillips@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1109883003
/external/skia/src/opts/SkNx_sse.h
abf6c5cf95e921fae59efb487480e5b5081cf0ec 27-Apr-2015 mtklein <mtklein@chromium.org> Mike's radial gradient CL with better float -> int.

patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)

This looks quite launchable. radial_gradient3, min of 100 samples:
N5: 985µs -> 946µs
MBP: 395µs -> 279µs

On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table?

BUG=skia:

CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot

Review URL: https://codereview.chromium.org/1109643002
/external/skia/src/opts/SkNx_sse.h
115acee9386e685f9a5938fb2cf13fd5a475012a 14-Apr-2015 mtklein <mtklein@chromium.org> Sk4h and Sk8h for SSE

These will underly the SkPMFloat-like class for uint16_t components.

Sk4h will back a single-pixel version, and Sk8h any larger number than that.

BUG=skia:

Review URL: https://codereview.chromium.org/1088883005
/external/skia/src/opts/SkNx_sse.h
8fe8fffdfa7464c6f7da773b8660a2043f4998e0 14-Apr-2015 mtklein <mtklein@chromium.org> Rename SkNi to SkNb.

As used today, SkNi is used in bool-y contexts. This keeps that, but under a
new name, SkNb. This makes room for a new SkNi that's focused on integer-y
things like loads, stores, arithmetic, etc.

The main reason to split these is that we want different specializations for
each use case: for bools, it's important for us to specialize 32- and 64-bit to
support efficient float- and double- comparisons, but for integer work we're
more likely to be looking at 8- and 16- bit lanes. Keeping these use cases
siloed helps me manage the compexity of the backend NEON and SSE code.

BUG=skia:

Review URL: https://codereview.chromium.org/1083123002
/external/skia/src/opts/SkNx_sse.h
7792dbf7ea089b3bcb81792a3ecda8a6f8b421e7 03-Apr-2015 mtklein <mtklein@chromium.org> Code's more readable when SkPMFloat is an Sk4f.
#floats

BUG=skia:
BUG=skia:3592

Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1061603002
/external/skia/src/opts/SkNx_sse.h
e758579d4f85f4615d62e87847dd3f8b38e3a6e2 03-Apr-2015 mtklein <mtklein@google.com> Revert of Code's more readable when SkPMFloat is an Sk4f. (patchset #3 id:40001 of https://codereview.chromium.org/1061603002/)

Reason for revert:
missed some neon code

Original issue's description:
> Code's more readable when SkPMFloat is an Sk4f.
> #floats
>
> BUG=skia:
> BUG=skia:3592
>
> Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1056143004
/external/skia/src/opts/SkNx_sse.h
6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9 03-Apr-2015 mtklein <mtklein@chromium.org> Code's more readable when SkPMFloat is an Sk4f.
#floats

BUG=skia:
BUG=skia:3592

Review URL: https://codereview.chromium.org/1061603002
/external/skia/src/opts/SkNx_sse.h
a156a8ffbe1342a9c329e66ad1438934ac309d70 03-Apr-2015 mtklein <mtklein@chromium.org> Use switch operator[](int) to kth<int>() so we can use vget_lane.
#floats

BUG=skia:
BUG=skia:3592

Review URL: https://codereview.chromium.org/1059743002
/external/skia/src/opts/SkNx_sse.h
c9adb05b64fa0bfadf9d1a782afcda470da68c9e 30-Mar-2015 mtklein <mtklein@chromium.org> Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T>

The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.

This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h.

This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h.

To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful.

You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel:
- Sk4f, Sk4s, Sk2d: feel awesome
- Sk2f, Sk2s, Sk4d: feel pretty good

No public API changes.
TBR=reed@google.com

BUG=skia:3592

Review URL: https://codereview.chromium.org/1048593002
/external/skia/src/opts/SkNx_sse.h