Cross Reference: /external/skia/src/opts/SkNx

History log of /external/skia/src/opts/SkNx_sse.h
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
9206c76337cd349c769103e06af005edd0583a10	30-Jan-2017	Florin Malita <fmalita@chromium.org>	SkRasterPipeline shader adapter Reland of https://skia-review.googlesource.com/c/7615. (lifted from https://skia-review.googlesource.com/c/7088/) CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I797a2f0ae80209c8637875418e08d2fa03249672 Reviewed-on: https://skia-review.googlesource.com/7731 Commit-Queue: Florin Malita <fmalita@google.com> Reviewed-by: Mike Reed <reed@google.com> /external/skia/src/opts/SkNx_sse.h
1da27ef853ae3e701b7f4aae670c21684396dcce	19-Jan-2017	Matt Sarett <msarett@google.com>	Add F16 support to SkPNGImageEncoder BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ifd221365a7b9f9a4a4fc5382621e0da7189e1148 Reviewed-on: https://skia-review.googlesource.com/6526 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Leon Scroggins <scroggo@google.com> Commit-Queue: Matt Sarett <msarett@google.com> /external/skia/src/opts/SkNx_sse.h
5bee0b6de6b3ad1166d067e6b5046b48b8240a29	19-Jan-2017	Matt Sarett <msarett@google.com>	Reland "Respect full precision for RGB16 PNGs" (part 2) This lands all the new xform hooks but no change to src/codec. So the new decode features are turned off. I'm relanding this in pieces to try to bisect a strange MSAN error. Original CL: https://skia-review.googlesource.com/c/7085/ BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD,Build-Ubuntu-Clang-x86_64-Release-Fast Change-Id: I451a2a29c73ca475e9e7a5ded58d4948d6b8be19 Reviewed-on: https://skia-review.googlesource.com/7277 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Matt Sarett <msarett@google.com> /external/skia/src/opts/SkNx_sse.h
dfff166db5d5226dc002a22ab3e3097ef971d615	18-Jan-2017	Matt Sarett <msarett@google.com>	Revert "Respect full precision for RGB16 PNGs" This reverts commit 7a090c403da1dad6a2e19f2011158bd894a62d91. Reason for revert: <INSERT REASONING HERE> Original change's description: > Respect full precision for RGB16 PNGs > > BUG=skia: > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493 > Reviewed-on: https://skia-review.googlesource.com/7085 > Commit-Queue: Matt Sarett <msarett@google.com> > Reviewed-by: Mike Klein <mtklein@chromium.org> > Reviewed-by: Leon Scroggins <scroggo@google.com> > Reviewed-by: Mike Reed <reed@google.com> > TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ibd9879bc4f65ca0c2457dd0bfb5eb008d9a8f672 Reviewed-on: https://skia-review.googlesource.com/7183 Commit-Queue: Matt Sarett <msarett@google.com> Reviewed-by: Matt Sarett <msarett@google.com> /external/skia/src/opts/SkNx_sse.h
7a090c403da1dad6a2e19f2011158bd894a62d91	17-Jan-2017	Matt Sarett <msarett@google.com>	Respect full precision for RGB16 PNGs BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493 Reviewed-on: https://skia-review.googlesource.com/7085 Commit-Queue: Matt Sarett <msarett@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Leon Scroggins <scroggo@google.com> Reviewed-by: Mike Reed <reed@google.com> /external/skia/src/opts/SkNx_sse.h
e71b167dbd4d8da76495ca85db83d1a3b49aaabd	13-Jan-2017	Mike Klein <mtklein@chromium.org>	Attempt 3: SkRasterPipelineBlitter: support A8 Now that SkOpts_hsw.cpp no longer hooks in SkRasterPipeline_opts, it should be safe to try this again. This reverts commit 86d55b312a2649d80890ccf75f24571ada0265f1. Change-Id: I2d495600ca9d3a0f49c2e02fbaaae349cefac3a1 Reviewed-on: https://skia-review.googlesource.com/6985 Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
379938e47bc9edb6edfd21aabefa01aed71dd135	13-Jan-2017	Matt Sarett <msarett@google.com>	Use RasterPipeline to support full precision on 16-bit RGBA pngs Reland of Original Change: https://skia-review.googlesource.com/6260 CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I809984dd9af225103bfbe83492a17c19da7c5e40 Reviewed-on: https://skia-review.googlesource.com/6980 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Matt Sarett <msarett@google.com> /external/skia/src/opts/SkNx_sse.h
25b60833e7c3dd25f2317b3f0e7af07f04b5beba	12-Jan-2017	Matt Sarett <msarett@google.com>	Revert "Use RasterPipeline to support full precision on 16-bit RGBA pngs" This reverts commit bb2339da39ab3ee59121acd911920dafcd4a2f72. Reason for revert: Breaks MSAN Original change's description: > Use RasterPipeline to support full precision on 16-bit RGBA pngs > > TODO: Support more precision on 16-bit RGB pngs > > BUG=skia: > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d > Reviewed-on: https://skia-review.googlesource.com/6260 > Reviewed-by: Mike Reed <reed@google.com> > Reviewed-by: Leon Scroggins <scroggo@google.com> > Reviewed-by: Mike Klein <mtklein@chromium.org> > Commit-Queue: Matt Sarett <msarett@google.com> > TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com,reviews@skia.org BUG=skia: NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I47579c20af033a75883e2b35567cb9c690ce54b0 Reviewed-on: https://skia-review.googlesource.com/6975 Commit-Queue: Matt Sarett <msarett@google.com> Reviewed-by: Matt Sarett <msarett@google.com> /external/skia/src/opts/SkNx_sse.h
bb2339da39ab3ee59121acd911920dafcd4a2f72	12-Jan-2017	Matt Sarett <msarett@google.com>	Use RasterPipeline to support full precision on 16-bit RGBA pngs TODO: Support more precision on 16-bit RGB pngs BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d Reviewed-on: https://skia-review.googlesource.com/6260 Reviewed-by: Mike Reed <reed@google.com> Reviewed-by: Leon Scroggins <scroggo@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Matt Sarett <msarett@google.com> /external/skia/src/opts/SkNx_sse.h
86d55b312a2649d80890ccf75f24571ada0265f1	07-Jan-2017	Mike Klein <mtklein@chromium.org>	Revert "Retry "SkRasterPipelineBlitter: support A8"..." This reverts commit f55ea6a1deb21120944d406124a2984b5009260a. Reason for revert: crbug.com/679147 Original change's description: > Retry "SkRasterPipelineBlitter: support A8"... > > ...preferring SkA8_Coverage_Blitter over SkRasterPipelineBlitter. > > I think we could make this work with SkRasterPipelineBlitter (tell it, draw white in Src mode with this mask), but the existing blitter is pretty hard to beat in efficiency and correctness. > > CQ_INCLUDE_TRYBOTS=skia.primary:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > > Change-Id: I72df9995c63f3334d8111c59711818cb5ed1e63c > Reviewed-on: https://skia-review.googlesource.com/6627 > Reviewed-by: Mike Klein <mtklein@chromium.org> > Commit-Queue: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,brianosman@google.com,reed@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I6a36b4c087a52e54f4d591ded40e6a202fb77068 Reviewed-on: https://skia-review.googlesource.com/6760 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
d05204258017a040c97cbd9d0e3f7993211668ad	06-Jan-2017	Matt Sarett <msarett@google.com>	Fix Sk8f::Store4 (for HSW) This should fix the colorspacexform gm in Gold. https://gold.skia.org/search?head=true&include=false&limit=50&neg=false&pos=false&query=name%3Dcolorspacexform%26source_type%3Dgm&unt=true BUG=skia: Change-Id: I05e2c2c0e7d7095f6935e60ff1bf89858380335f Reviewed-on: https://skia-review.googlesource.com/6721 Commit-Queue: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
f55ea6a1deb21120944d406124a2984b5009260a	06-Jan-2017	Mike Klein <mtklein@chromium.org>	Retry "SkRasterPipelineBlitter: support A8"... ...preferring SkA8_Coverage_Blitter over SkRasterPipelineBlitter. I think we could make this work with SkRasterPipelineBlitter (tell it, draw white in Src mode with this mask), but the existing blitter is pretty hard to beat in efficiency and correctness. CQ_INCLUDE_TRYBOTS=skia.primary:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I72df9995c63f3334d8111c59711818cb5ed1e63c Reviewed-on: https://skia-review.googlesource.com/6627 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
4a224f60111643f36d6f1c204142ce34511a32a1	05-Jan-2017	Mike Klein <mtklein@chromium.org>	Revert "SkRasterPipelineBlitter: support A8" This reverts commit f44373c119290b501d4aec7385e16d12c28a1f0f. Reason for revert: MSAN Original change's description: > SkRasterPipelineBlitter: support A8 > > This adds support for loading and storing A8, then uses it in SkRasterPipelineBlitter. > > I think this handles all dst formats now: A8, 565, 8888 (by policy, sRGB only) and F16. > > CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > > Change-Id: Id207f6e6c56b6bfcc301d77dd23e0959bb7afba8 > Reviewed-on: https://skia-review.googlesource.com/6554 > Reviewed-by: Mike Reed <reed@google.com> > Commit-Queue: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,reed@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I9ead3c3335e1776e9a1639ca0481253821505d67 Reviewed-on: https://skia-review.googlesource.com/6625 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
f44373c119290b501d4aec7385e16d12c28a1f0f	04-Jan-2017	Mike Klein <mtklein@chromium.org>	SkRasterPipelineBlitter: support A8 This adds support for loading and storing A8, then uses it in SkRasterPipelineBlitter. I think this handles all dst formats now: A8, 565, 8888 (by policy, sRGB only) and F16. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Id207f6e6c56b6bfcc301d77dd23e0959bb7afba8 Reviewed-on: https://skia-review.googlesource.com/6554 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
0d413f0f455c984eacf46c9d5cb0978d7b73357f	14-Dec-2016	Mike Klein <mtklein@chromium.org>	Revert "Revert "SkNx basically always is fast now."" This reverts commit 8ba64d1996ba6c9ecfb12132cdab7d5d99af7456. Reason for revert: does not appear to have been blocking the roll. Original change's description: > Revert "SkNx basically always is fast now." > > This reverts commit 21f783829619186442041de6008f7f58f4f6250d. > > Reason for revert: roll? > > Original change's description: > > SkNx basically always is fast now. > > > > We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON. > > > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > > > Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8 > > Reviewed-on: https://skia-review.googlesource.com/5946 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Mike Reed <reed@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > > > TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org > NOPRESUBMIT=true > NOTREECHECKS=true > NOTRY=true > > Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954 > Reviewed-on: https://skia-review.googlesource.com/6040 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,reviews@skia.org,fmalita@chromium.org,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I230dd4c2abb2d14ffc302be5376b9eaacbbeafcc Reviewed-on: https://skia-review.googlesource.com/6026 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
8ba64d1996ba6c9ecfb12132cdab7d5d99af7456	14-Dec-2016	Mike Klein <mtklein@chromium.org>	Revert "SkNx basically always is fast now." This reverts commit 21f783829619186442041de6008f7f58f4f6250d. Reason for revert: roll? Original change's description: > SkNx basically always is fast now. > > We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON. > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8 > Reviewed-on: https://skia-review.googlesource.com/5946 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Mike Reed <reed@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> > TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954 Reviewed-on: https://skia-review.googlesource.com/6040 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
21f783829619186442041de6008f7f58f4f6250d	13-Dec-2016	Mike Klein <mtklein@chromium.org>	SkNx basically always is fast now. We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8 Reviewed-on: https://skia-review.googlesource.com/5946 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> /external/skia/src/opts/SkNx_sse.h
2c8104eb18d15916db8f7b648bd79f7384397af9	29-Nov-2016	Mike Klein <mtklein@chromium.org>	SkNx_abi is unused. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I082c34a1f484715cd2dca55a8d23101235755e6a Reviewed-on: https://skia-review.googlesource.com/5233 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
2e35e8a45c9ff04b88d7a98b381ce56e565d8402	18-Nov-2016	Mike Klein <mtklein@chromium.org>	mirror tiling GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5064 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I9d48ba73e145701245e5ec4f32c8c360da6baddd Reviewed-on: https://skia-review.googlesource.com/5064 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
b273fc429fff8af5984e1e3dc1068c71dd369d57	17-Nov-2016	Mike Klein <mtklein@chromium.org>	repeat tiling Does the note in repeat() look right? GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4980 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I347023f312ab4643fc387e486192c3ae3357db8b Reviewed-on: https://skia-review.googlesource.com/4980 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
06a65e2799eaead18f778792801406aff4aec0d9	17-Nov-2016	Mike Klein <mtklein@chromium.org>	Support SkImageShader in SkRasterPipeline blitter First of many CLs, I'm sure. This handles 8888 or sRGB sources with an affine matrix, clamp/clamp tiling, and nearest-neighbor sampling only. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4906 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I99f7508852b3d44b6f52f7a0bee29a793af35c48 Reviewed-on: https://skia-review.googlesource.com/4906 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
a4a4488a4c3f16758f7e2b050168fe8d2f3b2a4d	04-Nov-2016	mtklein <mtklein@chromium.org>	skrpb: evaluate color filters for constant shaders once. The simplest thing to do here is just run shader+color filter pipeline at construction time to create a new constant color shader (replacing the paint color). This reduces a pipeline like: - constant_color (paint color) - matrix_4x5 - clamp_a - load_d_foo, xfermode, lerp, store_foo to - constant_color (paint color -> matrix_4x5 -> clamp_a) - load_d_foo, xfermode, lerp, store_foo To implement this all, we add a new store_f32 stage that writes SkPM4f, and finally get around to implementing Sk8f::Store4() (store while reinterlacing). Sk4f::Store4() already exists for both SSE and NEON. Next step: reduce simple constant_color -> store pipelines (src mode, full coverage) into non-pipeline memsets. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2480823002 Review-Url: https://codereview.chromium.org/2480823002 /external/skia/src/opts/SkNx_sse.h
7c78f3a863c620d722f02d00b88de5b3cde298a4	19-Oct-2016	Mike Klein <mtklein@chromium.org>	SkNx: use SK_ALWAYS_INLINE thoroughly. MSVC's not so good at inlining. So tell it where to. It won't hurt the others. This has nothing directly to do with ODR safety. The anonymous namespaces and 'static' on freestanding functions provide the correctness we need there. But this change can help to mechanically prevent the sort of problems ODR violations can lead to. I may follow up by extending this strategy further to Sk4px, which is used to implement a lot of the legacy xfermodes. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3608 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I927334c40910ce43da1fbabdf243c9cd5438bea6 Reviewed-on: https://skia-review.googlesource.com/3608 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
5ec5590bf796ef9d966f7348b3ce81a6e3fe3848	15-Oct-2016	Mike Klein <mtklein@chromium.org>	Unbreak -Fast bot. CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3423 Change-Id: I1875c86b785da4483038c10715af7827b7d71f0b Reviewed-on: https://skia-review.googlesource.com/3423 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
511ea17b967914ae8342d7861c6695ec8d492bcc	15-Oct-2016	Mike Klein <mtklein@chromium.org>	SkNx_abi for passing Sk4f as function arguments, etc. CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3422 Change-Id: Idc0a192faa7ff843aef023229186580c69baf1f7 Reviewed-on: https://skia-review.googlesource.com/3422 Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
1e76464f87ea55a8749eb94b8e6c79638983d65a	14-Oct-2016	Mike Klein <mtklein@chromium.org>	Wrap SkNx types in anonymous namespace again. This should make each compilation unit's SkNx types distinct from each other's as far as C++ cares. This keeps us from violating the One Definition Rule with different implementations for the same function. Here's an example I like. Sk4i SkNx_cast(Sk4b) has at least 4 different sensible implementations: - SSE2: punpcklbw xmm, zero; punpcklbw xmm, zero - SSSE3: load mask; pshufb xmm, mask - SSE4.1: pmovzxbd - AVX2: vpmovzxbd We really want all these to inline, but if for some reason they don't (Debug build, poor inliner) and they're compiled in SkOpts.cpp, SkOpts_ssse3.cpp, SkOpts_sse41.cpp, SkOpts_hsw.cpp... boom! BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3461 Change-Id: I0088ebfd7640c1b0de989738ed43c81b530dc0d9 Reviewed-on: https://skia-review.googlesource.com/3461 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
04adfda9c74481d0b640c0ce18864588babfcdf6	12-Oct-2016	Mike Klein <mtklein@chromium.org>	SkRasterPipeline: 8x pipelines, without any 8x code enabled. Original review here: https://skia-review.googlesource.com/c/2990/ Second attempt here: https://skia-review.googlesource.com/c/3064/ This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out. That omitted part is the key piece... this just lands the refactoring. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242 Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71 Reviewed-on: https://skia-review.googlesource.com/3242 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
42f4b42e8311f168aeeadd939b476c05b329500e	10-Oct-2016	Mike Klein <mtklein@chromium.org>	Revert "SkRasterPipeline: 8x pipelines, attempt 2" This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
a71e151c6f0be68dc96ad2d169bbc31edca8f946	07-Oct-2016	Mike Klein <mtklein@chromium.org>	SkRasterPipeline: 8x pipelines, attempt 2 Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
942858774831fea6e3f5f9d6c39b9538cf031bfd	07-Oct-2016	Mike Klein <mtklein@chromium.org>	Revert "SkRasterPipeline: 8x pipelines" This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
1aebdaee0e2aa4324509fd3ad4c40c21703ae4a2	06-Oct-2016	Mike Klein <mtklein@chromium.org>	SkRasterPipeline: 8x pipelines Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
33cbfd75afdd383770bb6253c06ba819a2481a35	06-Oct-2016	Mike Klein <mtklein@chromium.org>	Make load4 and store4 part of SkNx properly. Every type now nominally has Load4() and Store4() methods. The ones that we use are implemented. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd Reviewed-on: https://skia-review.googlesource.com/3046 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> /external/skia/src/opts/SkNx_sse.h
c0444615ed76360f680619ad4d1f92cda6181a50	16-Sep-2016	msarett <msarett@google.com>	Support Float32 output from SkColorSpaceXform * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 Review-Url: https://codereview.chromium.org/2339233003 /external/skia/src/opts/SkNx_sse.h
c71a9b7f53938b4f33f36f48e867b8b72cc1cc61	16-Sep-2016	msarett <msarett@google.com>	Revert of Support Float32 output from SkColorSpaceXform (patchset #7 id:140001 of https://codereview.chromium.org/2339233003/ ) Reason for revert: Hitting an assert Original issue's description: > Support Float32 output from SkColorSpaceXform > > * Adds Float32 support to SkColorSpaceXform > * Changes API to allows clients to ask for F32, updates clients to > new API > * Adds Sk4f_load4 and Sk4f_store4 to SkNx > * Make use of new xform in SkGr.cpp > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2347473007 /external/skia/src/opts/SkNx_sse.h
43d6651111374b5d1e4ddd9030dcf079b448ec47	16-Sep-2016	msarett <msarett@google.com>	Support Float32 output from SkColorSpaceXform * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2339233003 /external/skia/src/opts/SkNx_sse.h
15ee3deee8aca2bf6e658449f25ee34a8153e6ee	02-Aug-2016	msarett <msarett@google.com>	Refactor of SkColorSpaceXformOpts (1) Performance is better or stays the same. (2) Code is split into functions (RasterPipeline-ish design). IMO, it's not really more or less readable. But I think it's now much easier add capabilities, apply optimizations, or do more refactors. Or to actually use RasterPipeline. I help back from trying any of these to try to keep this CL sane. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2194303002 /external/skia/src/opts/SkNx_sse.h
d05a8752738f84b0115678b3cdad89237173e904	29-Jul-2016	mtklein <mtklein@chromium.org>	SkNx: add Sk4u This lets us get at logical >> in a nicely principled way. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2197683002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2197683002 /external/skia/src/opts/SkNx_sse.h
f660b7cfcfbf3062f88e61f8320ea7051da72213	26-Jul-2016	mtklein <mtklein@chromium.org>	Add Sk4h_load4 for loading F16. Should feel very similar to Sk4h_store4: NEON uses its native instruction, SSE unpacks manually. Since we'll have our F16s in 4 Sk4h by the time we're done here, this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2184753002 /external/skia/src/opts/SkNx_sse.h
6bdbf4412bd1a6fe26be1042ccf080174b13021f	19-Jul-2016	msarett <msarett@google.com>	Improve naive SkColorXform to half floats This should give us a good baseline to explore using SkRasterPipeline. A particular colorxform to half float drops from 425us to 282us on my desktop. Color Xform to Half Float (HP z620) Original 425us Trans16 (not 32) 355us Vector Trans16 378us Trans16 + Keep Halfs in Vector 335us Vector Trans16 + Keep Halfs in Vector 282us Final 282us Color Xform to Half Float (Nexus 5X) Original 556us Final 472us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2159993003 /external/skia/src/opts/SkNx_sse.h
036e1831e05ae3a6ec9bcd30cb24f6b1a49a3541	15-Jul-2016	mtklein <mtklein@chromium.org>	Add a bench to measure the best way to pack from int to uint16_t with SSE. I measured relative runtimes on my laptop: pack_int_uint16_t_ss… 1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think, so I'm going to exercise a little cowardice and leave that option disabled for now. The ssse3 version probably looks a little faster than it will be in practice. We'll usually need to load its mask, which here is hoisted out of the bench loop. The two sse2 variants are close enough in speed that I'm tie breaking them on other concerns: the <<16, >>16 version doesn't need any scratch registers or to load any constants, so it wins. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Review-Url: https://codereview.chromium.org/2150343002 /external/skia/src/opts/SkNx_sse.h
58e389b0518b46bbe58ba01c23443cf23c18435c	15-Jul-2016	mtklein <mtklein@chromium.org>	Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Review-Url: https://codereview.chromium.org/2145663003 /external/skia/src/opts/SkNx_sse.h
64bbad360f30930befafee1bdefe4ad89f130dac	14-Jul-2016	mtklein <mtklein@google.com>	Revert of Expand _01 half<->float limitation to _finite. Simplify. (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ ) Reason for revert: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast Original issue's description: > Expand _01 half<->float limitation to _finite. Simplify. > > It's become clear we need to sometimes deal with values <0 or >1. > I'm not yet convinced we care about NaN or +-inf. > > We had some fairly clever tricks and optimizations here for NEON > and SSE. I've thrown them out in favor of a single implementation. > If we find the specializations mattered, we can certainly figure out > how to extend them to this new range/domain. > > This happens to add a vectorized float -> half for ARMv7, which was > missing from the _01 version. (The SSE strategy was not portable to > platforms that flush denorm floats to zero.) > > I've tested the full float range for FloatToHalf on my desktop and a 5x. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot > > Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 TBR=msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2151023003 /external/skia/src/opts/SkNx_sse.h
3296bee70d074bb8094b3229dbe12fa016657e90	14-Jul-2016	mtklein <mtklein@chromium.org>	Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145663003 /external/skia/src/opts/SkNx_sse.h
281b33fdd909ee3f43192cdf950ce00e3df62407	13-Jul-2016	mtklein <mtklein@chromium.org>	SkRasterPipeline preliminaries Re-uploading to see if I can get a CL number < 2^31. patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001) Already reviewed at the other crrev link. TBR= BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2144573004 /external/skia/src/opts/SkNx_sse.h
7d3ff7142360f456be4e21e64c6c014cc919785e	12-Jul-2016	msarett <msarett@google.com>	Add Sk4f_RoundToInt BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2134753006 /external/skia/src/opts/SkNx_sse.h
afb8539f62b3e92dcc5dbe0598605f4058a1e03a	11-Jul-2016	msarett <msarett@google.com>	Revert of try to speed-up maprect + round2i + contains (patchset #8 id:140001 of https://codereview.chromium.org/2133413002/ ) Reason for revert: Breaking the roll... https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio Original issue's description: > try to speed-up maprect + round2i + contains > > We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5 TBR=mtklein@google.com,reed@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2136343002 /external/skia/src/opts/SkNx_sse.h
b42b785d1cbc98bd34aceae338060831b974f9c5	11-Jul-2016	reed <reed@google.com>	try to speed-up maprect + round2i + contains We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2133413002 /external/skia/src/opts/SkNx_sse.h
5608e2ed2299496eee3c57e0fe426ae9bd0d07a4	11-Jul-2016	mtklein <mtklein@chromium.org>	Clean up hyper-local SkCpu feature test experiment. This removes the code paths where we make SkCpu::Supports() calls from within a tight loop. It keeps code paths using SkCpu::Supports() to choose entire routines from src/opts/. We can't rely on these hyper-local checks to be hoisted up reliably enough. It worked pretty well with the first couple platforms we tried (e.g. Clang on Linux/Mac) but we can't gaurantee it works everywhere. Further, I'm not able to actually do anything fancy with those tests outside of x86... I've not found a way to get, say, NEON+F16 conversion code embedded into ordinary NEON code outside writing then entire function in external assembly. This whole idea becomes less important now that we've got a way to chain separate function calls together efficiently. We can now, e.g., use an AVX+F16C method to load some pixels, then chain that into an ordinary AVX method to color filter them. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2138073002 /external/skia/src/opts/SkNx_sse.h
d2809573deb7b99e764f7f71fe34a5b5322df0b2	20-Jun-2016	msarett <msarett@google.com>	Support sRGB dsts in opt code 201295.jpg on HP z620 (300x280) QCMS Xform 0.418 ms Skia NEW Xform 0.378 ms Vs QCMS 1.11x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078623002 /external/skia/src/opts/SkNx_sse.h
64f061a4e85bea6a48901b1385b7159f85909a34	17-Jun-2016	mtklein <mtklein@chromium.org>	port SkColorXform_opts to Sk4f I have tested that this compiles. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078913003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078913003 /external/skia/src/opts/SkNx_sse.h
e18fa440e74e9af0324de0a1de9b6ffb0fe3c3d3	09-Jun-2016	mtklein <mtklein@chromium.org>	Move immintrin/arm_neon includes to where they are used. On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a Review-Url: https://codereview.chromium.org/2045633002 /external/skia/src/opts/SkNx_sse.h
50bcb189f8785a599a3024d8eba4681c2e8ca37a	08-Jun-2016	mtklein <mtklein@google.com>	Revert of Move immintrin/arm_neon includes to where they are used. (patchset #2 id:20001 of https://codereview.chromium.org/2045633002/ ) Reason for revert: Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways. Original issue's description: > Move immintrin/arm_neon includes to where they are used. > > On my Mac (so, immintrin), this improves compile time, both wall and cpu, > by about 16%. To test I ran this on an SSD with files hot in their caches: > > $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ > ninja -C out/Release -t clean && \ > time ninja -C out/Release > > Before: 159 wall / 3367 cpu > 159 wall / 3368 cpu > > After: 137 wall / 2860 cpu > 136 wall / 2863 cpu > > I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. > That made no signficant difference, so I've kept immintrin for its simplicity. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > TBR=reed@google.com > No public API changes. > > Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2046213002 /external/skia/src/opts/SkNx_sse.h
12dfaaa53c23f3d03050bde8f64136ac1f44164a	07-Jun-2016	mtklein <mtklein@chromium.org>	Move immintrin/arm_neon includes to where they are used. On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Review-Url: https://codereview.chromium.org/2045633002 /external/skia/src/opts/SkNx_sse.h
244a65350e52c9438931ecdc05a4913f29d343bc	19-Apr-2016	mtklein <mtklein@chromium.org>	skcpu: sse4.1 floor, f16c f16<->f32 - floor with roundps is about 4.5x faster when available - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: +0x180 movups (%r15), %xmm1 +0x184 vcvtph2ps (%rbx), %xmm2 +0x189 movaps %xmm1, %xmm3 +0x18c shufps $255, %xmm3, %xmm3 +0x190 movaps %xmm0, %xmm4 +0x193 subps %xmm3, %xmm4 +0x196 mulps %xmm2, %xmm4 +0x199 addps %xmm1, %xmm4 +0x19c vcvtps2ph $0, %xmm4, (%rbx) +0x1a2 addq $16, %r15 +0x1a6 addq $8, %rbx +0x1aa decl %r14d +0x1ad jne +0x180 If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663 Review URL: https://codereview.chromium.org/1891513002 /external/skia/src/opts/SkNx_sse.h
2c7f24093a394ccbe54a7db60ba79af14682e7fa	15-Apr-2016	mtklein <mtklein@google.com>	Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #11 id:200001 of https://codereview.chromium.org/1891513002/ ) Reason for revert: this depends on a CL I want to revert Original issue's description: > skcpu: sse4.1 floor, f16c f16<->f32 > > - floor with roundps is about 4.5x faster when available > - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: > > +0x180 movups (%r15), %xmm1 > +0x184 vcvtph2ps (%rbx), %xmm2 > +0x189 movaps %xmm1, %xmm3 > +0x18c shufps $255, %xmm3, %xmm3 > +0x190 movaps %xmm0, %xmm4 > +0x193 subps %xmm3, %xmm4 > +0x196 mulps %xmm2, %xmm4 > +0x199 addps %xmm1, %xmm4 > +0x19c vcvtps2ph $0, %xmm4, (%rbx) > +0x1a2 addq $16, %r15 > +0x1a6 addq $8, %rbx > +0x1aa decl %r14d > +0x1ad jne +0x180 > > If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 > > Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1897433002 /external/skia/src/opts/SkNx_sse.h
3faf74b8364491ca806f523fbb1d8a97be592663	15-Apr-2016	mtklein <mtklein@chromium.org>	skcpu: sse4.1 floor, f16c f16<->f32 - floor with roundps is about 4.5x faster when available - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: +0x180 movups (%r15), %xmm1 +0x184 vcvtph2ps (%rbx), %xmm2 +0x189 movaps %xmm1, %xmm3 +0x18c shufps $255, %xmm3, %xmm3 +0x190 movaps %xmm0, %xmm4 +0x193 subps %xmm3, %xmm4 +0x196 mulps %xmm2, %xmm4 +0x199 addps %xmm1, %xmm4 +0x19c vcvtps2ph $0, %xmm4, (%rbx) +0x1a2 addq $16, %r15 +0x1a6 addq $8, %rbx +0x1aa decl %r14d +0x1ad jne +0x180 If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 Review URL: https://codereview.chromium.org/1891513002 /external/skia/src/opts/SkNx_sse.h
834d9e109298ae704043128005f8c1bc622350f4	15-Apr-2016	mtklein <mtklein@google.com>	Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #10 id:180001 of https://codereview.chromium.org/1891513002/ ) Reason for revert: Need to change around my #if guards so that clang-cl is treated like GCC and Clang, rather than MSVC. Original issue's description: > skcpu: sse4.1 floor, f16c f16<->f32 > > - floor with roundps is about 4.5x faster when available > - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: > > +0x180 movups (%r15), %xmm1 > +0x184 vcvtph2ps (%rbx), %xmm2 > +0x189 movaps %xmm1, %xmm3 > +0x18c shufps $255, %xmm3, %xmm3 > +0x190 movaps %xmm0, %xmm4 > +0x193 subps %xmm3, %xmm4 > +0x196 mulps %xmm2, %xmm4 > +0x199 addps %xmm1, %xmm4 > +0x19c vcvtps2ph $0, %xmm4, (%rbx) > +0x1a2 addq $16, %r15 > +0x1a6 addq $8, %rbx > +0x1aa decl %r14d > +0x1ad jne +0x180 > > If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1891993002 /external/skia/src/opts/SkNx_sse.h
cbe3c1af987d622ea67ef560d855b41bb14a0ce9	14-Apr-2016	mtklein <mtklein@chromium.org>	skcpu: sse4.1 floor, f16c f16<->f32 - floor with roundps is about 4.5x faster when available - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: +0x180 movups (%r15), %xmm1 +0x184 vcvtph2ps (%rbx), %xmm2 +0x189 movaps %xmm1, %xmm3 +0x18c shufps $255, %xmm3, %xmm3 +0x190 movaps %xmm0, %xmm4 +0x193 subps %xmm3, %xmm4 +0x196 mulps %xmm2, %xmm4 +0x199 addps %xmm1, %xmm4 +0x19c vcvtps2ph $0, %xmm4, (%rbx) +0x1a2 addq $16, %r15 +0x1a6 addq $8, %rbx +0x1aa decl %r14d +0x1ad jne +0x180 If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1891513002 /external/skia/src/opts/SkNx_sse.h
f8f90e4a85638faa18e7b4133cfe4d1ff5b1b23e	21-Mar-2016	mtklein <mtklein@chromium.org>	SkNx refresh - rearrange a bit - fewer macros - hooks for all operators - add left and right scalar operator overrides - add +=, &=, <<=, etc. - add SkNx_split() and SkNx_join() - simplify the many rsqrt() and invert() options to just what we actually use This refactoring pointed out that our float <-> int NEON conversions are not specialized, so I've implemented them. It seems nice that this is an error rather than silently falling back to serial code. It's unclear to me if split/join want to be external, static methods, or non-static methods (SkNx_join(), Sk4f::Join(), x.join()). Time will tell? BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1812233003 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1812233003 /external/skia/src/opts/SkNx_sse.h
fd5a26080d4647cc913f5f2b2dc72cb35abac3ab	01-Mar-2016	herb <herb@google.com>	Add swizzle for rgb8888. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1746153002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1746153002 /external/skia/src/opts/SkNx_sse.h
7c249e531900929c2fe2cdde76619fa6d2538c49	21-Feb-2016	mtklein <mtklein@chromium.org>	SkNx: kth<...>() -> [...] Just some syntax cleanup. No real change: kth<...>() was calling [...] already. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1714363002 /external/skia/src/opts/SkNx_sse.h
0cf795fd1135babe0ee0b3585f3ad49a02fe1387	17-Feb-2016	mtklein <mtklein@chromium.org>	fast sk4f <-> sk4i SSE methods BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1707673002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1707673002 /external/skia/src/opts/SkNx_sse.h
97120a7ed193b2e081c9503a7c58657e0c6f4920	12-Feb-2016	mtklein <mtklein@google.com>	Revert of SkNx refactoring (patchset #4 id:60001 of https://codereview.chromium.org/1690633003/ ) Reason for revert: Precautionary revert for chromium:586487 Original issue's description: > SkNx refactoring > > - add back Sk4i typedef > - define SSE casts in terms of Sk4i > * uint8 <-> float becomes uint8 <-> int <-> float > * uint16 <-> float becomes uint16 <-> int <-> float > > This has the nice side effect of specializing uint8 <-> int > and uint16 <-> int, which are useful in their own right. > > There are many cast specializations now, some of which call each other. > I have tried to arrange them in some sort of sensible order, subject to > the constraint that those called must precede those who call. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/c1eb311f4e98934476f1b2ad5d6de772cf140d60 TBR=herb@google.com,mtklein@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=chromium:586487 Review URL: https://codereview.chromium.org/1696903002 /external/skia/src/opts/SkNx_sse.h
c1eb311f4e98934476f1b2ad5d6de772cf140d60	11-Feb-2016	mtklein <mtklein@chromium.org>	SkNx refactoring - add back Sk4i typedef - define SSE casts in terms of Sk4i * uint8 <-> float becomes uint8 <-> int <-> float * uint16 <-> float becomes uint16 <-> int <-> float This has the nice side effect of specializing uint8 <-> int and uint16 <-> int, which are useful in their own right. There are many cast specializations now, some of which call each other. I have tried to arrange them in some sort of sensible order, subject to the constraint that those called must precede those who call. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1690633003 /external/skia/src/opts/SkNx_sse.h
e5fe9a42d487c8648101c6f8454575d6de1acafa	10-Feb-2016	mtklein <mtklein@chromium.org>	Sk4f: floor() via int32_t roundtrip. About 25% faster on both x86 and ARMv7. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1682953002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1682953002 /external/skia/src/opts/SkNx_sse.h
126626e52310b39429ad5eaa8e68e0caa24e6202	10-Feb-2016	mtklein <mtklein@chromium.org>	Sk4f: add floor() BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus7-GPU-Tegra3-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-GPU-TegraK1-Arm64-Release-Trybot Review URL: https://codereview.chromium.org/1685773002 /external/skia/src/opts/SkNx_sse.h
c023210893da7ca407bad8c9f07c8339ee81854c	10-Feb-2016	mtklein <mtklein@google.com>	Revert of Sk4f: add floor() (patchset #3 id:40001 of https://codereview.chromium.org/1685773002/ ) Reason for revert: build break must be this, right? Original issue's description: > Sk4f: add floor() > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614 TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1679343004 /external/skia/src/opts/SkNx_sse.h
86c6c4935171a1d2d6a9ffbff37ec6dac1326614	09-Feb-2016	mtklein <mtklein@chromium.org>	Sk4f: add floor() BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1685773002 /external/skia/src/opts/SkNx_sse.h
8273ca460fcde18ec884279fb9a41f330e92e0f9	09-Feb-2016	mtklein <mtklein@chromium.org>	restore sk4i SSE specialization BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679343003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1679343003 /external/skia/src/opts/SkNx_sse.h
e4c0beed744d09dae4757c1893d8caa64ee09cd2	09-Feb-2016	mtklein <mtklein@chromium.org>	sknx refactoring - trim unused specializations (Sk4i, Sk2d) and apis (SkNx_dup) - expand apis a little * v[0] == v.kth<0>() * SkNx_shuffle can now convert to different-sized vectors, e.g. Sk2f <-> Sk4f - remove anonymous namespace I believe it's safe to remove the anonymous namespace right now. We're worried about violating the One Definition Rule; the anonymous namespace protected us from that. In Release builds, this is mostly moot, as everything tends to inline completely. In Debug builds, violating the ODR is at worst an inconvenience, time spent trying to figure out why the bot is broken. Now that we're building with SSE2/NEON everywhere, very few bots have even a chance about getting confused by two definitions of the same type or function. Where we do compile variants depending on, e.g., SSSE3, we do so in static inline functions. These are not subject to the ODR. I plan to follow up with a tedious .kth<...>() -> [...] auto-replace. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1683543002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1683543002 /external/skia/src/opts/SkNx_sse.h
629f25a7ef229d97e22530eae043882b20cd8d8c	08-Feb-2016	mtklein <mtklein@chromium.org>	fix float <---> uint16_t conversion, with Mike's tests. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679713002 patch from issue 1679713002 at patchset 1 (http://crrev.com/1679713002#ps1) CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1676893002 /external/skia/src/opts/SkNx_sse.h
2d340f2f37a28e82323428725019f12a8538f48e	07-Feb-2016	mtklein <mtklein@chromium.org>	could not resist: fast sse float <--> u16 - generalizes the bench to float <--> {u8,u16} - must remember to implement NEON version at some point BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676853002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1676853002 /external/skia/src/opts/SkNx_sse.h
507ef6d68115ae9e6d884bb36436a1463523d893	31-Jan-2016	mtklein <mtklein@chromium.org>	SkNx Load/store: take any pointer. This means we can remove a lot of explicit casts in code that uses SkNx. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1650653002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1650653002 /external/skia/src/opts/SkNx_sse.h
550e9b0ef1c0fba42e2e902a467af322ad2413da	20-Jan-2016	mtklein <mtklein@chromium.org>	SkNx miplevel building All sizes approximately twice as fast. Before: micros bench 1649.35 mipmap_build_512x512 nonrendering 1824.42 mipmap_build_511x512 nonrendering 2100.66 ? mipmap_build_512x511 nonrendering 2375.94 mipmap_build_511x511 nonrendering After: micros bench 730.32 ! mipmap_build_512x512 nonrendering 922.12 mipmap_build_511x512 nonrendering 999.07 mipmap_build_512x511 nonrendering 1342.93 ! mipmap_build_511x511 nonrendering BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/3bd5aba2a0e165997f683cf3aa306661e71464f6 Review URL: https://codereview.chromium.org/1606013003 /external/skia/src/opts/SkNx_sse.h
26def8687f9763dcc00ab3022aee4b86700970c1	20-Jan-2016	mtklein <mtklein@google.com>	Revert of SkNx miplevel building (patchset #3 id:40001 of https://codereview.chromium.org/1606013003/ ) Reason for revert: Paranoid revert to see if it helps skia:4823 Original issue's description: > SkNx miplevel building > > All sizes approximately twice as fast. > > Before: > micros bench > 1649.35 mipmap_build_512x512 nonrendering > 1824.42 mipmap_build_511x512 nonrendering > 2100.66 ? mipmap_build_512x511 nonrendering > 2375.94 mipmap_build_511x511 nonrendering > > After: > micros bench > 730.32 ! mipmap_build_512x512 nonrendering > 922.12 mipmap_build_511x512 nonrendering > 999.07 mipmap_build_512x511 nonrendering > 1342.93 ! mipmap_build_511x511 nonrendering > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/3bd5aba2a0e165997f683cf3aa306661e71464f6 TBR=reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1607323002 /external/skia/src/opts/SkNx_sse.h
3bd5aba2a0e165997f683cf3aa306661e71464f6	19-Jan-2016	mtklein <mtklein@chromium.org>	SkNx miplevel building All sizes approximately twice as fast. Before: micros bench 1649.35 mipmap_build_512x512 nonrendering 1824.42 mipmap_build_511x512 nonrendering 2100.66 ? mipmap_build_512x511 nonrendering 2375.94 mipmap_build_511x511 nonrendering After: micros bench 730.32 ! mipmap_build_512x512 nonrendering 922.12 mipmap_build_511x512 nonrendering 999.07 mipmap_build_512x511 nonrendering 1342.93 ! mipmap_build_511x511 nonrendering BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1606013003 /external/skia/src/opts/SkNx_sse.h
c33065a93ad0874672c4c66b9711aa0b3ef7b7e7	15-Jan-2016	mtklein <mtklein@chromium.org>	add SkNx::abs(), for now only implemented for Sk4f There's no reason we couldn't implement this for all ints and floats; just don't want to land unused code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1590843003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1590843003 /external/skia/src/opts/SkNx_sse.h
fce612ac32a224d874acfce2b1ccf8db1a553307	15-Dec-2015	mtklein <mtklein@chromium.org>	Specialize Sk2d for SSE2 Given the autovectorization we've seen, I wouldn't expect big speedups from this, but it does give us a point of control over what's going on. BUG=skia: CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1526923003 /external/skia/src/opts/SkNx_sse.h
6f37b4a4757ea3eb00c76162cc37f8a56c3b8bdb	14-Dec-2015	mtklein <mtklein@chromium.org>	Unify some SkNx code - one base case and one N=1 case instead of two each (or three with doubles) - use SkNx_cast instead of FromBytes/toBytes - 4-at-a-time Sk4f::ToBytes becomes a special standalone Sk4f_ToBytes If I did everything right, this'll be perf- and pixel- neutral. https://gold.skia.org/search2?issue=1526523003&unt=true&query=source_type%3Dgm&master=false BUG=skia: CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1526523003 /external/skia/src/opts/SkNx_sse.h
c2e0ac4fce5bfe5bac9f2cb1ebd3aea3a8900fb1	03-Dec-2015	fmalita <fmalita@chromium.org>	Don't use the Sk4f gradient impl without SIMD Also remove the SK_SUPPORT_LEGACY_LINEAR_GRADIENT_TABLE guard since it is no longer used in Chromium. BUG=chromium:563492 R=reed@google.com,mtklein@google.com CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1489233005 /external/skia/src/opts/SkNx_sse.h
9db43ac4ee1a83a4f7b332fe6c00f592b6237349	01-Dec-2015	mtklein <mtklein@chromium.org>	Add Sk4f::ToBytes(uint8_t[16], Sk4f, Sk4f, Sk4f, Sk4f) This is a big speedup for float -> byte. E.g. gradient_linear_clamp_3color: x86-64 147µs -> 103µs (Broadwell MBP) arm64 2.03ms -> 648µs (Galaxy S6) armv7 1.12ms -> 489µs (Galaxy S6, same device!) BUG=skia: CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot Review URL: https://codereview.chromium.org/1483953002 /external/skia/src/opts/SkNx_sse.h
6c221b40680ff933c7b8f2ac3dfa76b5732aee3e	20-Nov-2015	mtklein <mtklein@chromium.org>	Add SkNx_cast(). SkNx_cast() can cast between any of our vector types, provided they have the same number of elements. Any types should work with the default implementation, and we can drop in specializations as needed, like the SSE and NEON Sk4f -> Sk4i I included here as an example. To make this work, I made some internal name changes: SkNi<N,T> -> SkNx<N, T> SkNf<N> -> SkNx<N, float> User aliases (Sk4f, Sk16b, etc.) stay the same. We can land this first (it's PS1) if that makes things easier. BUG=skia: CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1464623002 /external/skia/src/opts/SkNx_sse.h
084db25d47dbad3ffbd7d15c04b63d344b351f90	11-Nov-2015	mtklein <mtklein@chromium.org>	float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x Ought to be no pixel diffs: https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false Incidental stuff: I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v4; rather than Sk4f v; ... v = v Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1432903002 /external/skia/src/opts/SkNx_sse.h
6f797092d2054f79374fb98fc1d57ca3554c7db4	09-Nov-2015	mtklein <mtklein@chromium.org>	prune unused SkNx features - remove float -> int conversion, keeping float -> byte - remove support for doubles I was thinking of specializing Sk8f for AVX. This will help keep the complexity down. This may cause minor diffs in radial gradients: toBytes() rounds where castTrunc() truncated. But I don't see any diffs in Gold. https://gold.skia.org/search2?issue=1411563008&unt=true&query=source_type%3Dgm&master=false BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1411563008 /external/skia/src/opts/SkNx_sse.h
a508f3c62d726867ad9f722623fbf0ab5d06e440	01-Sep-2015	mtklein <mtklein@chromium.org>	Require Sk4f::toBytes() clamps BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot Review URL: https://codereview.chromium.org/1312053004 /external/skia/src/opts/SkNx_sse.h
aba1dc8c6aa5cbc4f38ddfce757832359f200b54	31-Aug-2015	mtklein <mtklein@chromium.org>	Move float<->byte conversions into Sk4f. This lets us avoid conversions to [0.0, 1.0] space and rounding that aren't necessary for SkColorCubeFilter_opts.h. Dropping rounding on the way back to bytes means we'll see a bunch of off-by-1 diffs. Rough perf effect: SSSE3: 110 -> 93 (~15%) NEON: 465 -> 375 (~20%) This is the beginning of the end for SkPMFloat as an entity distinct from Sk4f. I've kept it for now so I can convert sites one by one and think about how things that really want to keep PM color order will work. BUG=skia:4117 Review URL: https://codereview.chromium.org/1319413003 /external/skia/src/opts/SkNx_sse.h
082e329887a8f1efe4e1020f0a0a6ea09961712d	12-Aug-2015	mtklein <mtklein@google.com>	Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 id:1 of https://codereview.chromium.org/1286093004/ ) Reason for revert: Maybe causing test / gold problems? Original issue's description: > Refactor to put SkXfermode_opts inside SK_OPTS_NS. > > Without this refactor I was getting warnings previously about having code > inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to > code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. > > That low-level code was in an anonymous namespace to allow multiple independent > copies of its methods to be instantiated without the linker getting confused / > offended about violating the One Definition Rule. This was only happening in > Debug mode where the methods were not being inlined. > > To fix this all, I've force-inlined the methods of the low-level code and > removed the anonymous namespace. > > BUG=skia:4117 > > > [1] Here is what those errors looked like: > > In file included from ../../../../src/core/SkOpts.cpp:18:0: > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] > class Sk4pxXfermode : public SkProcCoeffXfermode { > ^ > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] > ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] > class SkPMFloatXfermode : public SkProcCoeffXfermode { > ^ > cc1plus: all warnings being treated as errors > > Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f TBR=djsollen@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1284333002 /external/skia/src/opts/SkNx_sse.h
b07bee3121680b53b98b780ac08d14d374dd4c6f	12-Aug-2015	mtklein <mtklein@chromium.org>	Refactor to put SkXfermode_opts inside SK_OPTS_NS. Without this refactor I was getting warnings previously about having code inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. That low-level code was in an anonymous namespace to allow multiple independent copies of its methods to be instantiated without the linker getting confused / offended about violating the One Definition Rule. This was only happening in Debug mode where the methods were not being inlined. To fix this all, I've force-inlined the methods of the low-level code and removed the anonymous namespace. BUG=skia:4117 [1] Here is what those errors looked like: In file included from ../../../../src/core/SkOpts.cpp:18:0: ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] class Sk4pxXfermode : public SkProcCoeffXfermode { ^ ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] class SkPMFloatXfermode : public SkProcCoeffXfermode { ^ cc1plus: all warnings being treated as errors Review URL: https://codereview.chromium.org/1286093004 /external/skia/src/opts/SkNx_sse.h
4be181e304d2b280c6801bd13369cfba236d1a66	14-Jul-2015	mtklein <mtklein@chromium.org>	3-15% speedup to HardLight / Overlay xfermodes. While investigating my bug (skia:4052) I saw this TODO and figured it'd make me feel better about an otherwise unsuccessful investigation. This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls to 2 cheap comparisons and 1 expensive div255(). NEON speeds up by a more modest ~3%. BUG=skia: Review URL: https://codereview.chromium.org/1230663005 /external/skia/src/opts/SkNx_sse.h
2aab22a58a366df4752c1cf0f004092c6e7be335	26-Jun-2015	mtklein <mtklein@chromium.org>	Color dodge and burn with SkPMFloat. Both 25-35% faster with SSE. With NEON, Burn measures as a ~10% regression, Dodge a huge 2.9x improvement. The Burn regression is somewhat artificial: we're drawing random colored rects onto an opaque white dst, so we're heavily biased toward the (d==da) fast path in the serial code. In the vector code there's no short-circuiting and we always pay a fixed cost for ColorBurn regardless of src or dst content. Dodge's fast paths, in contrast, only trigger when (s==sa) or (d==0), neither of which happens any more than randomly in our benchmark. I don't think (d==0) should happen at all. Similarly, the (s==0) Burn fast path is really only going to happen as often as SkRandom allows. In practice, the existing Burn benchmark is hitting its fast path 100% of the time. So I actually feel really great that this only dings the benchmark by 10%. Chrome's still guarded by SK_SUPPORT_LEGACY_XFERMODES, which I'll lift after finishing the last xfermode, SoftLight. BUG=skia: Review URL: https://codereview.chromium.org/1214443002 /external/skia/src/opts/SkNx_sse.h
b5e861185a69b3e47d1c1bf622fd3f83e5f13898	25-Jun-2015	mtklein <mtklein@chromium.org>	Implement four more xfermodes with Sk4px. HardLight, Overlay, Darken, and Lighten are all ~2x faster with SSE, ~25% faster with NEON. This covers all previously-implemented NEON xfermodes. 3 previous SSE xfermodes remain. Those need division and sqrt, so I'm planning on using SkPMFloat for them. It'll help the readability and NEON speed if I move that into [0,1] space first. The main new concept here is c.thenElse(t,e), which behaves like (c ? t : e) except, of course, both t and e are evaluated. This allows us to emulate conditionals with vectors. This also removes the concept of SkNb. Instead of a standalone bool vector, each SkNi or SkNf will just return their own types for comparisons. Turns out to be a lot more manageable this way. BUG=skia: Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1196713004 /external/skia/src/opts/SkNx_sse.h
0cc1f0a8d5ed69c76d75061bc2dee3b1d0ce0605	24-Jun-2015	mtklein <mtklein@google.com>	Revert of Implement four more xfermodes with Sk4px. (patchset #16 id:290001 of https://codereview.chromium.org/1196713004/) Reason for revert: 64-bit ARM build failures. Original issue's description: > Implement four more xfermodes with Sk4px. > > HardLight, Overlay, Darken, and Lighten are all > ~2x faster with SSE, ~25% faster with NEON. > > This covers all previously-implemented NEON xfermodes. > 3 previous SSE xfermodes remain. Those need division > and sqrt, so I'm planning on using SkPMFloat for them. > It'll help the readability and NEON speed if I move that > into [0,1] space first. > > The main new concept here is c.thenElse(t,e), which behaves like > (c ? t : e) except, of course, both t and e are evaluated. This allows > us to emulate conditionals with vectors. > > This also removes the concept of SkNb. Instead of a standalone bool > vector, each SkNi or SkNf will just return their own types for > comparisons. Turns out to be a lot more manageable this way. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1205703008 /external/skia/src/opts/SkNx_sse.h
b9d4163bebab0f5639f9c5928bb5fc15f472dddc	24-Jun-2015	mtklein <mtklein@chromium.org>	Implement four more xfermodes with Sk4px. HardLight, Overlay, Darken, and Lighten are all ~2x faster with SSE, ~25% faster with NEON. This covers all previously-implemented NEON xfermodes. 3 previous SSE xfermodes remain. Those need division and sqrt, so I'm planning on using SkPMFloat for them. It'll help the readability and NEON speed if I move that into [0,1] space first. The main new concept here is c.thenElse(t,e), which behaves like (c ? t : e) except, of course, both t and e are evaluated. This allows us to emulate conditionals with vectors. This also removes the concept of SkNb. Instead of a standalone bool vector, each SkNi or SkNf will just return their own types for comparisons. Turns out to be a lot more manageable this way. BUG=skia: Review URL: https://codereview.chromium.org/1196713004 /external/skia/src/opts/SkNx_sse.h
059ac00446404506a46cd303db15239c7aae49d5	22-Jun-2015	mtklein <mtklein@chromium.org>	Update some Sk4px APIs. Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones. - SkAlpha / SkPMColor constructors become static factories. - Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious. - Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed. Drop Round for the same reason as above... we should always round. - Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts. - use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative. - MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do. BUG=skia: Review URL: https://codereview.chromium.org/1202013002 /external/skia/src/opts/SkNx_sse.h
aa999cb952753d373c03befa7fce8c4700ef9308	23-May-2015	mtklein <mtklein@chromium.org>	Everyone gets a namespace {}. If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different SIMD flags, that could cause different definitions of the same method. Normally that's moot, because all the code inlines, but in Debug it tends not to. So in Debug, the linker picks one definition for us. That breaks _someone_. Wrapping everything in a namespace {} keeps the definitions separate. Tested locally, it fixes this bug. BUG=skia:3861 This code is not yet enabled in Chrome, so shouldn't affect the roll. NOTREECHECKS=true Review URL: https://codereview.chromium.org/1154523004 /external/skia/src/opts/SkNx_sse.h
27e517ae533775889c98c65fa2f07b98357ecbc2	15-May-2015	mtklein <mtklein@chromium.org>	add Min to SkNi, specialized for u8 and u16 on SSE and NEON 0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does. I plan to use this to implement the Difference xfermode, and it seems generally handy. BUG=skia: Review URL: https://codereview.chromium.org/1133933004 /external/skia/src/opts/SkNx_sse.h
6cbf18c70bf99f58b2bb1c49cdf8d41be561fee4	13-May-2015	mtklein <mtklein@chromium.org>	Plus xfermode using Sk4px. Xfermode_Plus runs 4-5x faster. We expect mixed_xfermodes to have a small diff. This is because kFoldCoverageIntoSrcAlpha was incorrectly set to true. This implementation handily beats the Sk4f impl, the portable impl, and the existing SSE2 impl. Reading the SkXfermodes_opts_SSE2.cpp file, I'm pretty confident that we'll be able to beat all SSE2 impls. I believe this impl will beat or match the existing NEON impl too, but that may not be true for more complicated xfermodes. They can take advantage of transposing ARGBARGB... to AAAARRRR.... cheaply and I haven't figured out an abstraction for that yet that doesn't screw SSE. Adds: - MapDstSrc() to Sk4px - saturatedAdd() to SkNi (only implemented as far as it's used). - div255Narrow() BUG=skia: Review URL: https://codereview.chromium.org/1138893002 /external/skia/src/opts/SkNx_sse.h
d2ffd36eb62e99abe2920369d1e040954cc2044f	12-May-2015	mtklein <mtklein@chromium.org>	Sk4px Xfermode_SrcOver: SSE: 2.08ms -> 2.03ms (~2% faster) NEON: my N5 is noisy, but there appears to be no perf change BUG=skia: Review URL: https://codereview.chromium.org/1132273004 /external/skia/src/opts/SkNx_sse.h
d7c014ff03d44d3ed7a6a2ddca59621a7e98f739	27-Apr-2015	mtklein <mtklein@chromium.org>	Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. BUG=skia: Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297 CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1109913002 /external/skia/src/opts/SkNx_sse.h
9a22f489e8722dd83c65f33fb886019d9f60e479	27-Apr-2015	mtklein <mtklein@google.com>	Revert of Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM (patchset #2 id:20001 of https://codereview.chromium.org/1109913002/) Reason for revert: arm64 typos Original issue's description: > Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM > > This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1105233003 /external/skia/src/opts/SkNx_sse.h
9de16283fdc8cc0d31a84f503578d0ecea4e8297	27-Apr-2015	mtklein <mtklein@chromium.org>	Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. BUG=skia: Review URL: https://codereview.chromium.org/1109913002 /external/skia/src/opts/SkNx_sse.h
1113da72eced20480491bb87ade0ffcff4eb8ea7	27-Apr-2015	mtklein <mtklein@chromium.org>	Mike's radial gradient CL with better float -> int. patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) This looks quite launchable. radial_gradient3, min of 100 samples: N5: 985µs -> 946µs MBP: 395µs -> 279µs On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? BUG=skia: CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot,Build-Ubuntu-GCC-Arm7-Release-Android_NoNeon-Trybot Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec Review URL: https://codereview.chromium.org/1109643002 /external/skia/src/opts/SkNx_sse.h
8d3e9dff3f3db3fa77c383e4cd6c47b9898a8fcd	27-Apr-2015	mtklein <mtklein@google.com>	Revert of Mike's radial gradient CL with better float -> int. (patchset #7 id:120001 of https://codereview.chromium.org/1109643002/) Reason for revert: compile failures. Original issue's description: > Mike's radial gradient CL with better float -> int. > > patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) > > This looks quite launchable. radial_gradient3, min of 100 samples: > N5: 985µs -> 946µs > MBP: 395µs -> 279µs > > On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? > > BUG=skia: > > CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot > > Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec TBR=reed@google.com,robertphillips@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1109883003 /external/skia/src/opts/SkNx_sse.h
abf6c5cf95e921fae59efb487480e5b5081cf0ec	27-Apr-2015	mtklein <mtklein@chromium.org>	Mike's radial gradient CL with better float -> int. patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) This looks quite launchable. radial_gradient3, min of 100 samples: N5: 985µs -> 946µs MBP: 395µs -> 279µs On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? BUG=skia: CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot Review URL: https://codereview.chromium.org/1109643002 /external/skia/src/opts/SkNx_sse.h
115acee9386e685f9a5938fb2cf13fd5a475012a	14-Apr-2015	mtklein <mtklein@chromium.org>	Sk4h and Sk8h for SSE These will underly the SkPMFloat-like class for uint16_t components. Sk4h will back a single-pixel version, and Sk8h any larger number than that. BUG=skia: Review URL: https://codereview.chromium.org/1088883005 /external/skia/src/opts/SkNx_sse.h
8fe8fffdfa7464c6f7da773b8660a2043f4998e0	14-Apr-2015	mtklein <mtklein@chromium.org>	Rename SkNi to SkNb. As used today, SkNi is used in bool-y contexts. This keeps that, but under a new name, SkNb. This makes room for a new SkNi that's focused on integer-y things like loads, stores, arithmetic, etc. The main reason to split these is that we want different specializations for each use case: for bools, it's important for us to specialize 32- and 64-bit to support efficient float- and double- comparisons, but for integer work we're more likely to be looking at 8- and 16- bit lanes. Keeping these use cases siloed helps me manage the compexity of the backend NEON and SSE code. BUG=skia: Review URL: https://codereview.chromium.org/1083123002 /external/skia/src/opts/SkNx_sse.h
7792dbf7ea089b3bcb81792a3ecda8a6f8b421e7	03-Apr-2015	mtklein <mtklein@chromium.org>	Code's more readable when SkPMFloat is an Sk4f. #floats BUG=skia: BUG=skia:3592 Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9 CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1061603002 /external/skia/src/opts/SkNx_sse.h
e758579d4f85f4615d62e87847dd3f8b38e3a6e2	03-Apr-2015	mtklein <mtklein@google.com>	Revert of Code's more readable when SkPMFloat is an Sk4f. (patchset #3 id:40001 of https://codereview.chromium.org/1061603002/) Reason for revert: missed some neon code Original issue's description: > Code's more readable when SkPMFloat is an Sk4f. > #floats > > BUG=skia: > BUG=skia:3592 > > Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1056143004 /external/skia/src/opts/SkNx_sse.h
6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9	03-Apr-2015	mtklein <mtklein@chromium.org>	Code's more readable when SkPMFloat is an Sk4f. #floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1061603002 /external/skia/src/opts/SkNx_sse.h
a156a8ffbe1342a9c329e66ad1438934ac309d70	03-Apr-2015	mtklein <mtklein@chromium.org>	Use switch operator[](int) to kth<int>() so we can use vget_lane. #floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1059743002 /external/skia/src/opts/SkNx_sse.h
c9adb05b64fa0bfadf9d1a782afcda470da68c9e	30-Mar-2015	mtklein <mtklein@chromium.org>	Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T> The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc. This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h. This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h. To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful. You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel: - Sk4f, Sk4s, Sk2d: feel awesome - Sk2f, Sk2s, Sk4d: feel pretty good No public API changes. TBR=reed@google.com BUG=skia:3592 Review URL: https://codereview.chromium.org/1048593002 /external/skia/src/opts/SkNx_sse.h