8729e5bbf7c436fd7c7c13182adbbfb419f566b5 |
|
16-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
Simplify more: remove SkRasterPipeline::compile(). It's easier to work on SkJumper if everything funnels through run(). I don't anticipate huge benefit from compile() without JITing, but it's something we can always put back if we find a need. Change-Id: Id5256fd21495e8195cad1924dbad81856416d913 Reviewed-on: https://skia-review.googlesource.com/8468 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
2e83b13122f93a4f88d1c5471e4b4d71b446e223 |
|
10-Feb-2017 |
Herb Derby <herb@google.com> |
Remove SkTextureCompressor. This is the ultimate state of what it looks like to remove SkTextureCompressor. This end result will result from the following steps. 1) Remove Skia dep on ktx (done) 2) Move format over to ktx (done) 3) Remove all SkTexture compressor code CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I3ad7a6abbea006a3034d95662c652d6db90b86ef Reviewed-on: https://skia-review.googlesource.com/8272 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Robert Phillips <robertphillips@google.com> Reviewed-by: Leon Scroggins <scroggo@google.com>
/external/skia/src/core/SkOpts.cpp
|
efaad3cd53330f063e6feaee8b14ad43ca251184 |
|
20-Jan-2017 |
Mike Klein <mtklein@chromium.org> |
Remove SkColorCubeFilter. It is unused. Change-Id: Iec5fc759e331de24caea1347f9510917260d379b Reviewed-on: https://skia-review.googlesource.com/7363 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
c789b61167dd98efc3c3bfcf9673eef24c2e57f4 |
|
30-Nov-2016 |
Mike Klein <mtklein@chromium.org> |
Bring back SkRasterPipeline::run() for one-off uses. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I308b6d75f2987a667eead9a55760a2ff6aec2984 Reviewed-on: https://skia-review.googlesource.com/5353 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
d2265e537c8015f8115d7b5b7f6de970aa688172 |
|
18-Nov-2016 |
xiangze.zhang <xiangze.zhang@intel.com> |
Port convolve functions to SkOpts This patch moves the C++/SSE2/NEON implementations of convolve functions into the same place and uses SkOpts framework. Also some indentation fix. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2500113004 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2500113004
/external/skia/src/core/SkOpts.cpp
|
e9f74b89c09772dd5abae1c0709c711d7cdb6535 |
|
25-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
SkRasterPipeline::compile(). I'm not yet caching these in the blitter, and speed is essentially unchanged in the bench where I am now building and compiling the pipeline only once. This may not be able to stay a simple std::function after I figure out caching, but for now it's a nice fit. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3911 Change-Id: I9545af589f73baf9f17cb4e6ace9a814c2478fe9 Reviewed-on: https://skia-review.googlesource.com/3911 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
aebfb45104eeb6dab5dbbedda13c2eaa7b7f7868 |
|
25-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
Move SkRasterPipeline further into SkOpts. The portable code now becomes entirely focused on enum+ptr descriptions, leaving the concrete implementation of the pipeline to SkOpts::run_pipeline(). As implemented, the concrete implementation is basically the same, with a little more type safety. Speed is essentially unchanged on my laptop, and that's having run_pipeline() rebuild its concrete state every call. There's room for improvement there if we split this into a compile_pipeline() / run_pipeline() sort of thing, which is my next planned CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3920 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ie4c554f51040426de7c5c144afa5d9d9d8938012 Reviewed-on: https://skia-review.googlesource.com/3920 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
2878e76247fd335dee755cfd4c9cd026804f9655 |
|
20-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
SkRasterPipeline refactor - Give body and tail functions separate types. This frees a register in body functions, especially important for Windows. - Fill out default, SSE4.1, and HSW versions of all functions. This means we don't have to mess around with SkNf_abi... all functions come from the same compilation unit where SkNf is a single consistent type. - Move Stage::next() into SkRasterPipeline_opts.h as a static inline function. - Remove Stage::ctx() entirely... fCtx is literally the same thing. This is a step along the way toward building the entire pipeline in src/opts, removing the need for all the stages to be functions living in SkOpts. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3680 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot Change-Id: I7de78ffebc15b9bad4eda187c9f50369cd7e5e42 Reviewed-on: https://skia-review.googlesource.com/3680 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
96b333a9a1b6a367d6c542118638e3108d8ed23b |
|
12-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
Add SkRasterPipeline support to SkModeColorFilter. The shader leaves its color in r,g,b,a, so to implement this color filter, we move {r,g,b,a} into {dr,dg,db,da}, then load the filter's color in {r,g,b,a}, then apply the xfermode as usual. I've left a note about how we could sometimes cut a stage for some xfermodes. Similarly we really only need to move_src_dst instead of swap_src_dst, but it seemed handy and less error prone to do a full two way swap. As usual, we can always circle back and fine-tune these things if we want. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3243 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I928c0fb25236eb75cf238134c6bebb53af5ddf07 Reviewed-on: https://skia-review.googlesource.com/3243 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/core/SkOpts.cpp
|
04adfda9c74481d0b640c0ce18864588babfcdf6 |
|
12-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
SkRasterPipeline: 8x pipelines, without any 8x code enabled. Original review here: https://skia-review.googlesource.com/c/2990/ Second attempt here: https://skia-review.googlesource.com/c/3064/ This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out. That omitted part is the key piece... this just lands the refactoring. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242 Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71 Reviewed-on: https://skia-review.googlesource.com/3242 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
42f4b42e8311f168aeeadd939b476c05b329500e |
|
10-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
Revert "SkRasterPipeline: 8x pipelines, attempt 2" This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
a71e151c6f0be68dc96ad2d169bbc31edca8f946 |
|
07-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
SkRasterPipeline: 8x pipelines, attempt 2 Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
942858774831fea6e3f5f9d6c39b9538cf031bfd |
|
07-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
Revert "SkRasterPipeline: 8x pipelines" This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
1aebdaee0e2aa4324509fd3ad4c40c21703ae4a2 |
|
06-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
SkRasterPipeline: 8x pipelines Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... I find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
9161ef012fb22d1b76122cb95dbfddb72f02ef02 |
|
04-Oct-2016 |
Mike Klein <mtklein@chromium.org> |
Make all SkRasterPipeline stages stock stages in SkOpts. If we want to support VEX-encoded instructions (AVX, F16C, etc.) without a ridiculous slowdown, we need to make sure we're running either all VEX-encoded instructions or all non-VEX-encoded instructions. That means we cannot mix arbitrary user-defined SkRasterPipeline::Fn (never VEX) with those living in SkOpts (maybe VEX)... it's SkOpts or bust. This ports the existing user-defined SkRasterPipeline::Fn use cases over to use stock stages from SkOpts. I rewrote the unit test to use stock stages, and moved the SkXfermode implementations to SkOpts. The code deleted for SkArithmeticMode_scalar should already be dead. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2940 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I94dbe766b2d65bfec6e544d260f71d721f0f5cb0 Reviewed-on: https://skia-review.googlesource.com/2940 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/core/SkOpts.cpp
|
78d5a3bac5cbde50cd12d8b9ab6dd269324b5272 |
|
30-Sep-2016 |
Mike Klein <mtklein@chromium.org> |
Add an SkOpts target for Haswell+ Intel chips. Haswell brought a whole slew of handy new instructions for us (AVX2, FMA, BMI1+BMI2) and also features F16C, which came one generation earlier on Ivybridge. We work with integers often enough that we really want to target AVX2 instead of AVX, and this means it's pretty practical to ask for all those other goodies along with it. Chrome's GN files and Google3's BUILD file will need an update, before or after this CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2840 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I826daf77b5104664c5d31ddaabee347e287b87a2 Reviewed-on: https://skia-review.googlesource.com/2840 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/core/SkOpts.cpp
|
fa9f241a85c55c32a3fe2ae0324811de998f7a2e |
|
29-Sep-2016 |
Mike Klein <mtklein@chromium.org> |
Add an enum layer of indirection for stock raster pipeline stages. This is handy now, and becomes necessary with fancier backends: - most code can't speak the type of AVX pipeline stages, so indirection's definitely needed there; - if the pipeline is entirely composed of stock stages, these enum values become an abstract recipe that can be JITted. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342 Reviewed-on: https://skia-review.googlesource.com/2782 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
baaf8ad95237d1defdb7d93077d9bf8410d8ad7f |
|
29-Sep-2016 |
Mike Klein <mtklein@chromium.org> |
Start moving SkRasterPipeline stages to SkOpts. This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX. I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now. That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative. For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc Reviewed-on: https://skia-review.googlesource.com/2752 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/core/SkOpts.cpp
|
9441af52aafd59553ab1a2ea52c390400f93e0bb |
|
08-Sep-2016 |
mtklein <mtklein@chromium.org> |
Apple devices do not support CRC32 instructions. Don't believe Clang's lies. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2322033002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2322033002
/external/skia/src/core/SkOpts.cpp
|
da7136750d30f5f8a38808414e0f67f3382edacf |
|
08-Sep-2016 |
mtklein <mtklein@chromium.org> |
Flesh out SkOpts namespaces. This makes it easier to see the baseline CPU feature settings for a build when looking at a stack trace or profile. This fills out all the features we currently might care about. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2318313003 Review-Url: https://codereview.chromium.org/2318313003
/external/skia/src/core/SkOpts.cpp
|
78559a78f9d3e6444f8c0c9443696699703d6531 |
|
22-Aug-2016 |
mtklein <mtklein@chromium.org> |
Use ARMv8 CRC32 instructions for SkOpts::hash(). For large inputs, this runs ~11x faster than Murmur3. My bench drops from 1µs to 88ns. Like x86-64, this runs fastest if we work in 24 byte chunks. 16 byte chunks run at about 0.75x this speed, 8 byte chunks at about 0.4x (which would still be about 5x faster than Murmur3). This'll require plumbing support for opts_crc32 into Chrome first before it can roll. perf.skia.org charts we want to watch: https://perf.skia.org/#5490 Search for compute_hash in these logs to see the difference: baseline: https://luci-milo.appspot.com/swarming/task/30ba22f3dfe30e10/steps/nanobench/0/stdout trybot: https://luci-milo.appspot.com/swarming/task/30bbc406cbf62d10/steps/nanobench/0/stdout BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2260823002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2260823002
/external/skia/src/core/SkOpts.cpp
|
4e97607d9a1cef66fac16f347c5ca813ec4f9515 |
|
08-Aug-2016 |
mtklein <mtklein@chromium.org> |
Use sse4.2 CRC32 instructions to hash when available. About 9x faster than Murmur3 for long inputs. Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot Review-Url: https://codereview.chromium.org/2208903002
/external/skia/src/core/SkOpts.cpp
|
15ee3deee8aca2bf6e658449f25ee34a8153e6ee |
|
02-Aug-2016 |
msarett <msarett@google.com> |
Refactor of SkColorSpaceXformOpts (1) Performance is better or stays the same. (2) Code is split into functions (RasterPipeline-ish design). IMO, it's not really more or less readable. But I think it's now much easier to add capabilities, apply optimizations, or do more refactors. Or to actually use RasterPipeline. I held back from trying any of these to try to keep this CL sane. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2194303002
/external/skia/src/core/SkOpts.cpp
|
50ce1f28ffede3fa3e38d330d4114ee52b387848 |
|
29-Jul-2016 |
msarett <msarett@google.com> |
Add color space xform support to SkJpegCodec (includes F16!) Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b Review-Url: https://codereview.chromium.org/2174493002
/external/skia/src/core/SkOpts.cpp
|
39979d8c6b97889f600a212cfc9b063360f3de2f |
|
29-Jul-2016 |
msarett <msarett@google.com> |
Revert of Add color space xform support to SkJpegCodec (includes F16!) (patchset #9 id:260001 of https://codereview.chromium.org/2174493002/ ) Reason for revert: Breaking MSAN Original issue's description: > Add color space xform support to SkJpegCodec (includes F16!) > > Also changes SkColorXform to support: > RGBA->RGBA > RGBA->BGRA > > Instead of: > RGBA->SkPMColor > > TBR=reed@google.com > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b TBR=mtklein@google.com,reed@google.com,herb@google.com,brianosman@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2195523002
/external/skia/src/core/SkOpts.cpp
|
73d55332e2846dd05e9efdaa2f017bcc3872884b |
|
29-Jul-2016 |
msarett <msarett@google.com> |
Add color space xform support to SkJpegCodec (includes F16!) Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2174493002
/external/skia/src/core/SkOpts.cpp
|
9ce3a543c92a73e6daca420defc042886b3f2019 |
|
15-Jul-2016 |
msarett <msarett@google.com> |
Add capability for SkColorXform to output half floats BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147763002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2147763002
/external/skia/src/core/SkOpts.cpp
|
05c73b7ed5b73a55cbecf7f787a35fcc0e84e983 |
|
13-Jul-2016 |
mtklein <mtklein@chromium.org> |
Remove bulk float <-> half routines. These are dead code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2152583002 Review-Url: https://codereview.chromium.org/2152583002
/external/skia/src/core/SkOpts.cpp
|
0358a6ac00497ecb9fa9412045560b7f33d3a9eb |
|
13-Jul-2016 |
mtklein <mtklein@chromium.org> |
Update SkOpts namespaces. If we make sure all SkOpts functions are static, we can give the namespaces any name we like. This lets us drop the sk_ prefix and give a real indication of the default SIMD instruction set rather than just saying sk_default. Both of these changes help debugger, profiler, and crash report readability. Perhaps more importantly, keeping these functions static helps prevent accidentally linking in unused versions of functions, as you see here with sk_avx::srcover_srgb_srgb(). This requires we update SkBlend_opts tests and benches to call SkOpts functions through SkOpts rather than declaring the methods externally. In practice this drops testing of the SSE2 version on machines with SSE4. If we still really need to test/bench the compile time best SIMD level version of this method against the runtime detected best, we can include SkBlend_opts.h into the tests or benches directly, similar to what we do for the trivial, brute-force, or best non-SIMD versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145833002
/external/skia/src/core/SkOpts.cpp
|
6006f678e78af7b6f67a454cd4bc213048983f9d |
|
11-Jul-2016 |
msarett <msarett@google.com> |
Make all color xforms 'fast' (step 1) This refactors opt code to handle arbitrary src and dst gammas that are specified by tables. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2130013002
/external/skia/src/core/SkOpts.cpp
|
4d1dd6643f2efa34d22d5fc3cf9cb4866252358e |
|
23-Jun-2016 |
herb <herb@google.com> |
Add stub for avx. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2087343002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2087343002
/external/skia/src/core/SkOpts.cpp
|
d2809573deb7b99e764f7f71fe34a5b5322df0b2 |
|
20-Jun-2016 |
msarett <msarett@google.com> |
Support sRGB dsts in opt code 201295.jpg on HP z620 (300x280) QCMS Xform 0.418 ms Skia NEW Xform 0.378 ms Vs QCMS 1.11x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078623002
/external/skia/src/core/SkOpts.cpp
|
dea0340cadb759932e53416a657f5ea75fee8b5f |
|
16-Jun-2016 |
msarett <msarett@google.com> |
Implement fast, correct gamma conversion for color xforms 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.423 ms Vs Old Code 0.56x Vs QCMS 1.17x So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2060823003
/external/skia/src/core/SkOpts.cpp
|
a9e878c836994bce695274b4c28890290139dcdf |
|
08-Jun-2016 |
msarett <msarett@google.com> |
Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.01 ms Skia Opt Xform Only 0.235 ms Decode Time + Xform Speed-up 1.27x Xform Only Speed-up 2.11x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2046013002
/external/skia/src/core/SkOpts.cpp
|
e3fa811657ecf4ab694d026752a81080c6b10611 |
|
01-Jun-2016 |
sdefresne <sdefresne@chromium.org> |
[GN] Add support for disabling opts via SK_BUILD_NO_OPTS define. When targeting iOS and using gyp to generate the build files, it is not possible to select files to build depending on the architecture. Due to that, the skia code was disabling all optimisation when SK_BUILD_FOR_IOS was defined. Since it is possible to select the correct optimised version when using gn, this pessimisation is hurting the build. Introduce a new define, SK_BUILD_NO_OPTS, to disable the optimisation. It will be used by Chromium when building skia for iOS with gyp but not gn. Define SK_BUILD_NO_OPTS alongside SK_BUILD_FOR_IOS for all files that look like build configuration (Xcode projects, gyp configuration files, public.bzl) in order to avoid introducing breakage on those builds. BUG=607933 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2002423002 Review-Url: https://codereview.chromium.org/2002423002
/external/skia/src/core/SkOpts.cpp
|
809ccf37ec836d0df64afd0b13023fd968d505a4 |
|
05-May-2016 |
mtklein <mtklein@chromium.org> |
Remove NEON runtime detection support. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1952953004 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/1952953004
/external/skia/src/core/SkOpts.cpp
|
c5091b5b6c4b8a7aef8c12db9ea2a85e907b01c4 |
|
02-May-2016 |
mtklein <mtklein@chromium.org> |
Add a hook for CPU-optimized sRGB-sRGB srcover. Herb's really starting to get serious about tweaking this, which becomes a lot easier when you've got SkOpts' runtime CPU detection. We should be able to optimize this usefully for SSSE3, SSE4.1, AVX, AVX2, or NEON. (We can of course implement a subset.) This function takes two counts to give us flexibility to write src patterns: nsrc >= ndst -> the usual srcover function nsrc < ndst -> repeat src until it fills dst nsrc << ndst -> possibly preprocess src into registers nsrc == 1 -> equivalent of blitrow_color32, srcover_1, etc. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939783003 Review-Url: https://codereview.chromium.org/1939783003
/external/skia/src/core/SkOpts.cpp
|
4311f016612a814282029daa4bd102053a853d82 |
|
19-Apr-2016 |
mtklein <mtklein@chromium.org> |
Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 Review URL: https://codereview.chromium.org/1890483002
/external/skia/src/core/SkOpts.cpp
|
05db63b5fc262136da9b289418b871180cd1359b |
|
19-Apr-2016 |
mtklein <mtklein@chromium.org> |
Remove static initializer for SkOpts::Init() Static initializers run in a confusing unspecified order, so it's best to have as few of them as possible. Most tools and clients I can find already call SkGraphics::Init(), (or equivalently create an SkAutoGraphics) which calls SkOpts::Init(): - Chrome - Chrome renderer - Android - DM - nanobench - SampleApp - VisualBench - the old debugger Seems like the only thing relying on this static initializer today is the new debugger, fixed here. TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1903503002 Review URL: https://codereview.chromium.org/1903503002
/external/skia/src/core/SkOpts.cpp
|
d9dd4282118fc14eac735e6fa0b3ec53047b457f |
|
18-Apr-2016 |
mtklein <mtklein@chromium.org> |
Modernize and trim down SkOnce. The API and implementation are very much simplified. You may not want to bother reading the diff. As is our trend, SkOnce now uses <atomic> directly. Member initialization means we don't need SK_DECLARE_STATIC_ONCE. SkSpinlock already works this same way. All uses of the old API taking an external bool* and Lock* were pessimal, so I have not carried this sort of API forward. It's simpler, faster, and more space-efficient to always use this single SkOnce class interface. SkOnce weighs 2 bytes: a done bool and an SkSpinlock, also a bool internally. This API refactoring opens up the opportunity to fuse those into a single three-state byte if we'd like. No public API changes. TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1894893002 Review URL: https://codereview.chromium.org/1894893002
/external/skia/src/core/SkOpts.cpp
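The two-byte layout described in this message (a done flag plus a spinlock that is itself a bool) can be sketched roughly as follows. This is an illustration only, not the actual SkOnce source: the class name `OnceSketch` and its members are hypothetical, and the memory orderings are one reasonable choice rather than what Skia uses.

```cpp
#include <atomic>

// Hypothetical sketch of a "run exactly once" helper built from two
// atomic bools: one records completion, one acts as a tiny spinlock.
class OnceSketch {
public:
    template <typename Fn>
    void operator()(Fn&& fn) {
        // Fast path: the work has already been done and published.
        if (done_.load(std::memory_order_acquire)) {
            return;
        }
        // Slow path: spin until we own the lock.
        while (lock_.exchange(true, std::memory_order_acquire)) {
        }
        // Re-check under the lock so only the first caller runs fn().
        if (!done_.load(std::memory_order_relaxed)) {
            fn();
            done_.store(true, std::memory_order_release);
        }
        lock_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> done_{false};
    std::atomic<bool> lock_{false};
};
```

Because both members are initialized in the class body, an instance can be declared as a plain static or member variable without a helper macro, which is the point the message makes about member initialization.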
|
86498fbfcb93a9048bbe1c28cc0df40d8d0c96e9 |
|
15-Apr-2016 |
mtklein <mtklein@google.com> |
Revert of Move CPU feature detection to its own file. (patchset #7 id:120001 of https://codereview.chromium.org/1890483002/ ) Reason for revert: many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places. Original issue's description: > Move CPU feature detection to its own file. > > - Moves CPU feature detection to its own file. > - Cleans up some redundant feature detection scattered around core/ and opts/. > - Can now detect a few new CPU features: > * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 > * FMA -> Intel FMA instructions, added at the same time as AVX2 > * VFP_FP16 -> ARM f16<->f32 instructions, quite common > * NEON_FMA -> ARM FMA instructions, also quite common > * SSE and SSE3... why not? > > This new internal API makes it very cheap to do fine-grained runtime CPU > feature detection. Redundant calls to SkCpu::Supports() should be eliminated > and it's hoistable out of loops. It compiles away entirely when we have the > appropriate instructions available at compile time. > > This means we can call it to guard even a little snippet of 1 or 2 instructions > right where needed and let inlining hoist the check (if any at all) up to > somewhere that doesn't hurt performance. I've explained how I made this work > in the private section of the new header. > > Once this lands and bakes a bit, I'll start following up with CLs to use it more > and to add a bunch of those little 1-2 instruction snippets we've been wanting, > e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps > (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. 
> > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1892643003
/external/skia/src/core/SkOpts.cpp
|
567118fbe69f9d0e245ddc0a6c312b6b9b70233f |
|
14-Apr-2016 |
mtklein <mtklein@chromium.org> |
Graduate matrix map-point procs out of SkOpts. These are implemented generically with Sk4s and don't benefit from anything fancier than vanilla SSE/NEON. This means there's no need to hide this code away in another file or behind a function pointer... it's readable and we have compile-time support for all the instructions it needs. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1872193002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1872193002
/external/skia/src/core/SkOpts.cpp
|
872ea29357439f05b1f6995dd300fc054733e607 |
|
14-Apr-2016 |
mtklein <mtklein@chromium.org> |
Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1890483002
/external/skia/src/core/SkOpts.cpp
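One way a check like this can "compile away entirely" is to fold the features guaranteed by compiler flags into a constant and consult the runtime mask only for the rest. The sketch below is an assumption about the general shape, not the contents of the new header: `supports`, `compile_time_features`, `g_runtime_features`, and the `k...` constants are all hypothetical names.

```cpp
#include <cstdint>

// Hypothetical feature bits.
constexpr uint32_t kSSE2  = 1u << 0;
constexpr uint32_t kSSE41 = 1u << 1;
constexpr uint32_t kAVX2  = 1u << 2;

// Features the compiler was told it may assume (e.g. via -msse4.1, -mavx2).
constexpr uint32_t compile_time_features() {
    return 0u
#if defined(__SSE2__)
           | kSSE2
#endif
#if defined(__SSE4_1__)
           | kSSE41
#endif
#if defined(__AVX2__)
           | kAVX2
#endif
        ;
}

// Assumed to be filled in once at startup by some CPUID-style probe (not shown).
static uint32_t g_runtime_features = 0;

inline bool supports(uint32_t mask) {
    // If every requested bit is known at compile time, this branch is a
    // constant and an optimizer can drop the runtime load entirely.
    if ((compile_time_features() & mask) == mask) {
        return true;
    }
    return ((compile_time_features() | g_runtime_features) & mask) == mask;
}
```

A caller would guard a short intrinsic snippet with something like `if (supports(kAVX2)) { ... }`; since the function is small and inline, repeated checks are cheap to hoist or eliminate.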
|
b4a7dc99b1a01cdd5c0cd5913b630436ca696210 |
|
23-Mar-2016 |
mtklein <mtklein@chromium.org> |
Port S32A_opaque blit row to SkOpts. This should be a pixel-for-pixel (i.e. bug-for-bug) port. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1820313002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1820313002
/external/skia/src/core/SkOpts.cpp
|
3bc2624a4b89c49efd65f5e548ac5f2dd9351431 |
|
17-Feb-2016 |
mtklein <mtklein@chromium.org> |
try plain-old code for sk_memset16/32 now that NEON is compile-time Most of these implementations now just say "always inline". Let's see if we can get away with the simplicity of doing that all the time. These inlined implementations can autovectorize easily. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1639863002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1639863002
/external/skia/src/core/SkOpts.cpp
|
a525cb151bb39fb6362af051f69b6d633f660fd9 |
|
09-Feb-2016 |
mtklein <mtklein@chromium.org> |
skeleton for float <-> half optimized procs Nothing fancy yet, just calls the serial code in a loop. I will try to follow this up with at least some of: - SSE2 version of serial code - NEON version of serial code - NEON version using vcvt.f32.f16/vcvt.f16.f32 - F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph The last two are fastest but need runtime detection. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003 Review URL: https://codereview.chromium.org/1686543003
/external/skia/src/core/SkOpts.cpp
|
81bb79b7b97909eb2b10444c4bd06bc9e1e56db3 |
|
09-Feb-2016 |
mtklein <mtklein@chromium.org> |
Remove SkNx AVX code. It is not really used. Getting in the way of refactoring. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679053002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1679053002
/external/skia/src/core/SkOpts.cpp
|
c5c322d8ecfc05718f9f04360956c4f1f9dc33c1 |
|
08-Feb-2016 |
msarett <msarett@google.com> |
Optimize CMYK->RGBA (BGRA) transform for jpeg decodes Swizzle Bench Runtime Nexus 6P 0.14x Dell Venue 8 0.12x CMYK Jpeg Decode Runtime Nexus 6P 0.81x Dell Venue 8 0.85x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676773003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1676773003
/external/skia/src/core/SkOpts.cpp
|
1e06079b259d1091b735492b2f71d9897c14c608 |
|
03-Feb-2016 |
msarett <msarett@google.com> |
NEON optimizations for GrayAlpha -> RGBA/BGRA Premul/Unpremul PNG Decode Time Nexus 6P (for a test set of GrayAlpha encoded PNGs) Regular Unpremul 0.91x Zero Init Unpremul 0.92x Regular Premul 0.84x Zero Init Premul 0.86x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1663623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1663623002
/external/skia/src/core/SkOpts.cpp
|
2eff71c9b5f984b58961e5a6b4e66774c4385224 |
|
02-Feb-2016 |
msarett <msarett@google.com> |
NEON optimizations for gray -> RGBA (or BGRA) conversions Swizzle Bench Runtime Nexus 6P 0.32x Nexus 9 0.89x PNG Decode Time (for test set of gray encoded PNGs) Nexus 6P 0.88x Nexus 9 0.91x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1656383002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1656383002
/external/skia/src/core/SkOpts.cpp
|
a766ca87bfb1129a8cf4f8a428121caf60726de4 |
|
26-Jan-2016 |
mtklein <mtklein@chromium.org> |
de-proc sk_float_rsqrt This is the first of many little baby steps to have us stop runtime-detecting NEON. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d Review URL: https://codereview.chromium.org/1616013003
/external/skia/src/core/SkOpts.cpp
|
0dfffbeeec3078f6a03e83a4efa4c45fefdd338d |
|
25-Jan-2016 |
msarett <msarett@google.com> |
Revert of AVX 2 SrcOver blits: color32, blitmask. (patchset #24 id:450001 of https://codereview.chromium.org/1532613002/ ) Reason for revert: Bot failures Original issue's description: > AVX 2 SrcOver blits: color32, blitmask. > > As a follow up to the SSE 4.1 CL, this should look pretty familiar. > > I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. > > Perf changes (relative to SSE 4.1): > Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit > Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent > text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit > text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit > > This should be a big throughput win, and a small latency win. > This should all be pixel-exact to the previous SSE 4.1 code. > > > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot > > Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36 TBR=herb@google.com,mtklein@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Review URL: https://codereview.chromium.org/1632713002
/external/skia/src/core/SkOpts.cpp
|
5d2117015eb271e09faf4a7ddd89093c9d618a36 |
|
25-Jan-2016 |
mtklein <mtklein@chromium.org> |
AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughput win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot Review URL: https://codereview.chromium.org/1532613002
/external/skia/src/core/SkOpts.cpp
|
ed814f34c72248c7eecf3b0a5f335c3a7c2e9dd1 |
|
22-Jan-2016 |
mtklein <mtklein@google.com> |
Revert of de-proc sk_float_rsqrt (patchset #3 id:40001 of https://codereview.chromium.org/1616013003/ ) Reason for revert: This is somehow blocking the Google3 roll in ways neither Ben nor I understand. Precautionary revert... will try again Monday. Original issue's description: > de-proc sk_float_rsqrt > > This is the first of many little baby steps to have us stop runtime-detecting NEON. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d TBR=reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1629503002
/external/skia/src/core/SkOpts.cpp
|
f1b8b6ae34e5a1f4b29e423401da39f88f0c117a |
|
22-Jan-2016 |
msarett <msarett@google.com> |
Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzler Swizzle Bench Runtime Nexus 6P xxx_xxxa 0.32x xxx_swaprb_xxxa 0.31x Swizzle Bench Runtime Nexus 9 xxx_xxxa 1.11x xxx_swaprb_xxxa 1.14x (This is a slow down.) Swizzle Bench Runtime Nexus 5 xxx_xxxa 0.12x xxx_swaprb 0.12x RGB PNG Decode Runtime Nexus 6P 0.94x Nexus 9 0.98x I don't know how to explain the fact that the Swizzle Bench was slower on Nexus 9, but the decode times got faster. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1618003002
/external/skia/src/core/SkOpts.cpp
|
efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d |
|
22-Jan-2016 |
mtklein <mtklein@chromium.org> |
de-proc sk_float_rsqrt This is the first of many little baby steps to have us stop runtime-detecting NEON. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1616013003
/external/skia/src/core/SkOpts.cpp
|
8bf7b79cf9776b4edb3f6810e5ab8c80c49d3480 |
|
22-Jan-2016 |
mtklein <mtklein@chromium.org> |
Refactor swizzle names and types. - Plant a flag to say "pretend all the inputs are RGBA". This is how libpng thinks. This is the opposite of what the implementation had been doing, so I've rearranged everything to reflect the new orientation. - Rewrite the names to be less mysterious looking. No more Xs. - Make the src type uniformly const void*, to allow for 888 (RGB) srcs. This should be performance and pixel neutral. (Please revert if it's not.) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1626463002
/external/skia/src/core/SkOpts.cpp
|
3a24f459582f2665f0e66bd35a0d8f46a1c4c72f |
|
13-Jan-2016 |
msarett <msarett@google.com> |
Optimized premultiplying swizzles for NEON Improves decode performance for RGBA encoded PNGs. Swizzle Time on Nexus 9 (with clang): SwapPremul 0.44x Premul 0.44x Decode Time On Nexus 9 (with clang): ZeroInit Decodes 0.85x Regular Decodes 0.86x Swizzle Time on Nexus 6P (with clang) SwapPremul 0.14x Premul 0.14x Decode Time On Nexus 6P (with clang): ZeroInit Decodes 0.93x Regular Decodes 0.95x Notes: ZeroInit means memory is zero initialized, and we do not write to memory for large sections of zero pixels (memory use opt for Android). A profile on Nexus 9 shows that the premultiplication step of PNG decoding is now ~5% of decode time (down from ~20%). BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1577703006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1577703006
/external/skia/src/core/SkOpts.cpp
|
a1bfaad00486883669e20a94cf8f7a8f4e34266a |
|
07-Jan-2016 |
mtklein <mtklein@chromium.org> |
Set up some hooks for premul/swizzle opts. You can call these as SkOpts::premul_xxxa, SkOpts::swaprb_xxxa, etc. For now, I just backed the function pointers with some (untested) portable code, which may autovectorize. We can override with optimized versions in Init_ssse3() (in SkOpts_ssse3.cpp), Init_neon() (SkOpts_neon.cpp), etc. BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1569013002 Review URL: https://codereview.chromium.org/1569013002
/external/skia/src/core/SkOpts.cpp
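The hook pattern this message describes (a function pointer backed by portable code, later overridden by an init routine) might look roughly like the sketch below. It is not the code from this CL: `premul_portable`, `premul_hook`, and `init_hooks` are hypothetical names, and the pixel layout (alpha in the top byte of a `uint32_t`) is an assumption made for the example.

```cpp
#include <cstdint>

// Rounded (c * a) / 255 without a division.
static inline uint8_t mul_div255_round(uint8_t c, uint8_t a) {
    unsigned prod = static_cast<unsigned>(c) * a + 128;
    return static_cast<uint8_t>((prod + (prod >> 8)) >> 8);
}

// Portable fallback: premultiply the three color channels by alpha.
// Assumed layout: alpha in bits 24-31, color channels in the low 24 bits.
static void premul_portable(uint32_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; i++) {
        uint8_t a  = static_cast<uint8_t>(src[i] >> 24);
        uint8_t c2 = static_cast<uint8_t>(src[i] >> 16);
        uint8_t c1 = static_cast<uint8_t>(src[i] >> 8);
        uint8_t c0 = static_cast<uint8_t>(src[i]);
        dst[i] = (static_cast<uint32_t>(a) << 24)
               | (static_cast<uint32_t>(mul_div255_round(c2, a)) << 16)
               | (static_cast<uint32_t>(mul_div255_round(c1, a)) << 8)
               |  static_cast<uint32_t>(mul_div255_round(c0, a));
    }
}

using PremulProc = void (*)(uint32_t*, const uint32_t*, int);

// The hook callers go through; it starts out pointing at portable code.
static PremulProc premul_hook = premul_portable;

// Hypothetical init step: a platform-specific file would pass its tuned
// implementation here once the matching CPU feature has been detected.
static void init_hooks(PremulProc optimized) {
    if (optimized != nullptr) {
        premul_hook = optimized;
    }
}
```

Callers always invoke `premul_hook(dst, src, n)`, so swapping in an SSSE3 or NEON version needs no changes at the call sites.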
|
2d0e37a16eaf1281ace3e4034bf143853c236940 |
|
03-Jan-2016 |
thakis <thakis@chromium.org> |
Fix -Wunused-function warnings in chrome/ios builds. BUG=chromium:573779 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1559783002 Review URL: https://codereview.chromium.org/1559783002
/external/skia/src/core/SkOpts.cpp
|
ee9fe1e6f39a37d522ea700873b0e5c08f0c6674 |
|
17-Nov-2015 |
mtklein <mtklein@chromium.org> |
De-spam SkOpts.cpp - stop printing when we detect sse4.2 and avx2 - that TODO is pretty well done BUG=skia: Review URL: https://codereview.chromium.org/1445343002
/external/skia/src/core/SkOpts.cpp
|
084db25d47dbad3ffbd7d15c04b63d344b351f90 |
|
11-Nov-2015 |
mtklein <mtklein@chromium.org> |
float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x Ought to be no pixel diffs: https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false Incidental stuff: I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1432903002
/external/skia/src/core/SkOpts.cpp
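The right-hand-side limitation mentioned under "Incidental stuff" follows from ordinary C++ overload rules: a member `operator*` can implicitly convert its right operand but not its left. The sketch below uses a hypothetical stand-in type, `Vec4f`, rather than the real SkNx templates.

```cpp
// Hypothetical 4-lane float vector; not the real Sk4f.
struct Vec4f {
    float lanes[4];

    // Implicit (non-explicit) splat constructor: lets a scalar convert to a vector.
    Vec4f(float s) : lanes{s, s, s, s} {}
    Vec4f(float a, float b, float c, float d) : lanes{a, b, c, d} {}

    // Member operator: only the right-hand operand is eligible for
    // implicit conversion, so `v * 4` compiles but `4 * v` does not.
    Vec4f operator*(const Vec4f& o) const {
        return Vec4f(lanes[0] * o.lanes[0], lanes[1] * o.lanes[1],
                     lanes[2] * o.lanes[2], lanes[3] * o.lanes[3]);
    }
};

inline Vec4f quadruple(const Vec4f& v) {
    return v * 4;  // 4 -> float -> Vec4f(4) via the implicit constructor
}
```

Supporting `4 * v` as well would take a non-member `operator*`, which is presumably the kind of follow-up the message alludes to for `(1 - da)`.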
|
953549235ddaaf4e670b44bd69efa1ac1c835be0 |
|
09-Nov-2015 |
mtklein <mtklein@chromium.org> |
sse 4.2 detection While we're detecting instruction sets, let's fill in this hole too. BUG=skia: Review URL: https://codereview.chromium.org/1419553007
/external/skia/src/core/SkOpts.cpp
|
844a0b425741f07cb233332405143931586bbb7d |
|
07-Nov-2015 |
mtklein <mtklein@chromium.org> |
avx and avx2 detection This doesn't do anything yet beyond print out a message in Debug mode, but it's a start. Those messages should match the -SSE4-, -AVX-, or -AVX2- in the Test-...-Debug-Trybots below. The Release ones are just running by accident. So far they look right to me. BUG=skia: Review URL: https://codereview.chromium.org/1428153003
/external/skia/src/core/SkOpts.cpp
|
7de63d50d4cda278c67d8ddb0d07a8ca6a56c74a |
|
28-Oct-2015 |
mtklein <mtklein@chromium.org> |
Android framework builds can't see cpu-features.h Don't try to do NEON detection for Android framework builds. BUG=skia: Review URL: https://codereview.chromium.org/1423953004
/external/skia/src/core/SkOpts.cpp
|
4e8a09d3672702704436112f3aa5611bd79b2690 |
|
10-Sep-2015 |
mtklein <mtklein@chromium.org> |
Port SkMatrix opts to SkOpts. No changes to the code, just moved around. This will have the effect of enabling vectorized code on ARMv7. Should be no effect on ARMv8 or x86, which would have been vectorized already. nanobench --match mappoints changes on Nexus 5 (ARMv7): _affine: 132 -> 95 _scale: 118 -> 47 _trans: 60 -> 37 A teaser: We should next look at the ABCD->BADC shuffle we've noted that we need in _affine. A quick hack showed doing that optimally is another ~35% speedup on x86. Got to figure out how to do it best on ARM though: that same quick hack was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL! (I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1)), but we should probably find out in another CL.) BUG=skia:4117 Review URL: https://codereview.chromium.org/1320673014
/external/skia/src/core/SkOpts.cpp
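For reference, the ABCD->BADC shuffle mentioned in the teaser can be tried in isolation as below. This is a standalone sketch, not Skia code: `swap_pairs` is a hypothetical helper, the SSE path uses the `_mm_shuffle_ps` form quoted in the message, and a scalar fallback is included so the snippet also builds where SSE is unavailable.

```cpp
#if defined(__SSE__) || defined(_M_X64)
    #include <xmmintrin.h>
    #define SKETCH_HAVE_SSE 1
#endif

// Swap adjacent lanes: {A, B, C, D} -> {B, A, D, C}.
inline void swap_pairs(const float in[4], float out[4]) {
#if defined(SKETCH_HAVE_SSE)
    __m128 v = _mm_loadu_ps(in);
    // _MM_SHUFFLE(2,3,0,1) selects source lanes 1,0,3,2 for result lanes 0..3.
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
#else
    // Scalar fallback; temporaries keep it correct even if in == out.
    float a = in[0], b = in[1], c = in[2], d = in[3];
    out[0] = b;
    out[1] = a;
    out[2] = d;
    out[3] = c;
#endif
}
```

On NEON the message suggests `vrev64q_f32(v)` for the same permutation; that path is left out here to keep the sketch short.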
|
4a37d08382a16717cde52c3d2687b021c5413464 |
|
10-Sep-2015 |
mtklein <mtklein@chromium.org> |
Port SkBlitRow::Color32 to SkOpts. This was a pre-SkOpts attempt that we can bring under its wing now. This should be a perf no-op, deo volente. BUG=skia:4117 Review URL: https://codereview.chromium.org/1314863006
/external/skia/src/core/SkOpts.cpp
|
08f9234eaafcda33ebf5e74ec27ca72f4abda4fb |
|
18-Aug-2015 |
mtklein <mtklein@chromium.org> |
Try again to put SkXfermode_opts in SK_OPTS_NS Remember failed attempt https://codereview.chromium.org/1286093004/ ? I think this one is simpler and safer and even technically legal C++. BUG=skia:4117 Review URL: https://codereview.chromium.org/1296183004
/external/skia/src/core/SkOpts.cpp
|
b2a327094f2b9a072fddfd440481b1c0a8b0bc80 |
|
18-Aug-2015 |
mtklein <mtklein@chromium.org> |
Update SkOpts namespaces. portable -> default, and everyone gets an sk_ prefix. BUG=skia:4117 Review URL: https://codereview.chromium.org/1299013003
/external/skia/src/core/SkOpts.cpp
|
2d141ba2df8f7506848aa9369f502944e837cd09 |
|
18-Aug-2015 |
mtklein <mtklein@chromium.org> |
Patches on top of Radu's latest. patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001) BUG=skia: Review URL: https://codereview.chromium.org/1288323004
/external/skia/src/core/SkOpts.cpp
|
948376379362e54c5357005778ba5f53f3dc9d5e |
|
18-Aug-2015 |
mtklein <mtklein@chromium.org> |
Remove SkOpts_sse2.cpp. It's sort of pointless: all our clients that will have SSE2 at runtime have it unconditionally at compile time, so the functions in namespace portable will pick up the SSE2 code. The procs in SkOpts_sse2.o were just duplicate code. A couple of the procs we had in _sse2.cpp can benefit slightly when compiled with SSSE3. I've moved those to _ssse3.cpp. This should lead to small speedups on platforms like Linux and Windows that have a baseline of SSE2. Similarly, I've removed the call to Init_neon() when NEON is available globally... it's a no-op. Renaming namespace portable to something clearer is TBD. BUG=skia:4117 Review URL: https://codereview.chromium.org/1294213002
/external/skia/src/core/SkOpts.cpp
|
082e329887a8f1efe4e1020f0a0a6ea09961712d |
|
12-Aug-2015 |
mtklein <mtklein@google.com> |
Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 id:1 of https://codereview.chromium.org/1286093004/ ) Reason for revert: Maybe causing test / gold problems? Original issue's description: > Refactor to put SkXfermode_opts inside SK_OPTS_NS. > > Without this refactor I was getting warnings previously about having code > inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to > code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. > > That low-level code was in an anonymous namespace to allow multiple independent > copies of its methods to be instantiated without the linker getting confused / > offended about violating the One Definition Rule. This was only happening in > Debug mode where the methods were not being inlined. > > To fix this all, I've force-inlined the methods of the low-level code and > removed the anonymous namespace. > > BUG=skia:4117 > > > [1] Here is what those errors looked like: > > In file included from ../../../../src/core/SkOpts.cpp:18:0: > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] > class Sk4pxXfermode : public SkProcCoeffXfermode { > ^ > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] > ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] > class SkPMFloatXfermode : public SkProcCoeffXfermode { > ^ > cc1plus: all warnings being treated as errors > > Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f TBR=djsollen@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1284333002
/external/skia/src/core/SkOpts.cpp
|
b07bee3121680b53b98b780ac08d14d374dd4c6f |
|
12-Aug-2015 |
mtklein <mtklein@chromium.org> |
Refactor to put SkXfermode_opts inside SK_OPTS_NS. Without this refactor I was getting warnings previously about having code inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. That low-level code was in an anonymous namespace to allow multiple independent copies of its methods to be instantiated without the linker getting confused / offended about violating the One Definition Rule. This was only happening in Debug mode where the methods were not being inlined. To fix this all, I've force-inlined the methods of the low-level code and removed the anonymous namespace. BUG=skia:4117 [1] Here is what those errors looked like: In file included from ../../../../src/core/SkOpts.cpp:18:0: ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] class Sk4pxXfermode : public SkProcCoeffXfermode { ^ ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] class SkPMFloatXfermode : public SkProcCoeffXfermode { ^ cc1plus: all warnings being treated as errors Review URL: https://codereview.chromium.org/1286093004
/external/skia/src/core/SkOpts.cpp
|
4977983510028712528743aa877f6da83781b381 |
|
10-Aug-2015 |
mtklein <mtklein@chromium.org> |
Sk4px blit mask. Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups). I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results. NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range. The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version. Somewhat fantastically, I see no pixel diffs on GMs or SKPs. I will be following up with more CLs for the other procs called by SkBlitMask. BUG=skia: Review URL: https://codereview.chromium.org/1278253003
/external/skia/src/core/SkOpts.cpp
|
b6394746ff546a9c60d68e3be162cb38feffa803 |
|
06-Aug-2015 |
mtklein <mtklein@chromium.org> |
Port SkTextureCompression opts to SkOpts Pretty vanilla translation. I cleaned up who calls whom a little. Used to be utils -> opts -> utils, now it's just utils -> opts. I may follow up with a pass over the NEON code for readability and to clean up dead code. This turns on NEON A8->R11EAC conversion for ARMv8. Unit tests which now hit the NEON code still pass. I can't find any related bench. BUG=skia:4117 Review URL: https://codereview.chromium.org/1273103002
/external/skia/src/core/SkOpts.cpp
|
d029ded92d409a004f2096c78f5a99b524206481 |
|
04-Aug-2015 |
mtklein <mtklein@chromium.org> |
Port morphology to SkOpts. Nothing too fancy. Direction enums become enum classes so they don't get all confused. An alternative is to create one single Direction enum that both blur and morphology opts use. BUG=skia:4117 Review URL: https://codereview.chromium.org/1267343004
/external/skia/src/core/SkOpts.cpp
|
8caa5af92cf91debc1598380cb72c330e8c63efb |
|
04-Aug-2015 |
Mike Klein <mtklein@chromium.org> |
Reorganize to keep similar code together. This organizes memset16, memset32, and rsqrt the same way as the other code. No functional change. BUG=skia:4117 R=djsollen@google.com Review URL: https://codereview.chromium.org/1264423002 .
/external/skia/src/core/SkOpts.cpp
|
dce5ce4276e2825efc6d8c4daa819c965794cd12 |
|
04-Aug-2015 |
mtklein <mtklein@chromium.org> |
Port SkBlurImage opts to SkOpts. +268 -535 lines I also rearranged the code a little bit to encapsulate itself better, mostly replacing static helper functions with lambdas. This also let me merge the SSE2 and SSE4.1 code paths. BUG=skia:4117 Review URL: https://codereview.chromium.org/1264103004
/external/skia/src/core/SkOpts.cpp
|
f96bee384dea60a4e96035878e2671cd49d3b406 |
|
31-Jul-2015 |
mtklein <mtklein@chromium.org> |
disable SkOpts on x86 iOS (simulator) TBR= BUG=skia: Review URL: https://codereview.chromium.org/1266443006
/external/skia/src/core/SkOpts.cpp
|
490b61569d27c9b7ba164fbc4394994d2e7cb022 |
|
31-Jul-2015 |
mtklein <mtklein@chromium.org> |
Port SkXfermode opts to SkOpts.h Renames Sk4pxXfermode.h to SkXfermode_opts.h, and refactors it a tiny bit internally. This moves xfermode optimization from being "compile-time everywhere but NEON" to simply "runtime everywhere". I don't anticipate any effect on perf or correctness. BUG=skia:4117 Review URL: https://codereview.chromium.org/1264543006
/external/skia/src/core/SkOpts.cpp
|
7eb0945af254d376df11475150d184623104cf93 |
|
31-Jul-2015 |
mtklein <mtklein@chromium.org> |
Port SkUtils opts to SkOpts. With this new arrangement, the benefits of inlining sk_memset16/32 have changed. On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower. On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster. On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster. We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization. So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8. BUG=skia:4117 Review URL: https://codereview.chromium.org/1270573002
/external/skia/src/core/SkOpts.cpp
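Note on the SkUtils entry above: the "inline for small N, call the custom proc otherwise" policy described for ARMv7 can be sketched roughly as follows. This is an illustration under stated assumptions, not the Skia source: the threshold of 10 comes from the commit message, while `kInlineThreshold`, `memset16_proc`, and `memset16_out_of_line` are hypothetical names, and the out-of-line routine is a plain loop standing in for a hand-tuned one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Threshold taken from the commit message ("small N<=10").
constexpr std::size_t kInlineThreshold = 10;

using Memset16Fn = void (*)(std::uint16_t*, std::uint16_t, std::size_t);

// Stand-in for a hand-tuned routine (e.g. a NEON version); hypothetical.
inline void memset16_out_of_line(std::uint16_t* dst, std::uint16_t value,
                                 std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
}

// The proc chosen at startup; a real build would pick it per CPU.
inline Memset16Fn& memset16_proc() {
    static Memset16Fn fn = memset16_out_of_line;
    return fn;
}

// Small fills stay inline so the call overhead does not dominate;
// larger fills go through the function pointer.
inline void memset16(std::uint16_t* dst, std::uint16_t value, std::size_t n) {
    assert(dst != nullptr || n == 0);
    if (n <= kInlineThreshold) {
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = value;
        }
        return;
    }
    memset16_proc()(dst, value, n);
}

}  // namespace sketch
```

Under the measurements quoted in the commit message, the x86 build would skip the inline branch entirely and the ARMv8 build would always take it; only the ARMv7 configuration keeps both paths.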
|
f684a78d9ea988883c9b2c7bcc4ea4d5e68bd998 |
|
30-Jul-2015 |
mtklein <mtklein@chromium.org> |
Runtime CPU detection for rsqrt(). This enables the NEON sk_float_rsqrt() code for configurations that have NEON at run-time but not compile-time. These devices will see about a 2x (1.26 -> 2.33) slowdown in sk_float_rsqrt(), but it should be more precise than our portable fallback. (When inlined, the portable fallback and the NEON code are almost identical in speed. The only difference is precision. Going through a function pointer is causing all this slowdown. This is a good example of a place where Skia really benefits from compile-time NEON.) BUG=skia:4117,skia:4114 No public API changes. TBR=reed@google.com Review URL: https://codereview.chromium.org/1264893002
/external/skia/src/core/SkOpts.cpp
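Note on the rsqrt entry above: the slowdown it describes comes from routing every call through a function pointer chosen at run time. A minimal sketch of that dispatch pattern follows. The names `cpu_has_fast_rsqrt`, `rsqrt_specialized`, and `choose_rsqrt` are hypothetical, the feature probe is a stub, and the "specialized" routine is a portable stand-in rather than real NEON code.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Portable fallback: straightforward reciprocal square root.
inline float rsqrt_portable(float x) {
    assert(x > 0.0f);
    return 1.0f / std::sqrt(x);
}

// Stand-in for a CPU-specific routine (NEON in the commit). Computed in
// double here only so that the sketch stays portable.
inline float rsqrt_specialized(float x) {
    assert(x > 0.0f);
    return static_cast<float>(1.0 / std::sqrt(static_cast<double>(x)));
}

// Hypothetical feature probe; a real one would query the CPU or the OS.
inline bool cpu_has_fast_rsqrt() { return false; }

using RsqrtFn = float (*)(float);

// Pick an implementation from the detected feature set.
inline RsqrtFn choose_rsqrt(bool hasFeature) {
    return hasFeature ? rsqrt_specialized : rsqrt_portable;
}

// Detection runs once, on first use. Every call afterwards is an indirect
// call, which the compiler generally cannot inline; that is the overhead
// the commit message measures.
inline float rsqrt(float x) {
    static const RsqrtFn fn = choose_rsqrt(cpu_has_fast_rsqrt());
    return fn(x);
}

}  // namespace sketch
```

When the target feature is known at compile time, the chosen routine can be called directly and inlined, which is why the commit message cites this as a case that benefits from compile-time NEON.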
|
8317a1832f55e175531ee7ae7ccd12a3a15e3c75 |
|
30-Jul-2015 |
mtklein <mtklein@chromium.org> |
Lay groundwork for SkOpts. This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. BUG=skia:4117 Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 Review URL: https://codereview.chromium.org/1255193002
/external/skia/src/core/SkOpts.cpp
|
56b78a7a2aa87998e6a3f3d026e2184f1bccaf0c |
|
27-Jul-2015 |
mtklein <mtklein@chromium.org> |
Revert of Lay groundwork for SkOpts. (patchset #3 id:40001 of https://codereview.chromium.org/1255193002/) Reason for revert: Chromium doesn't call SkGraphics::Init(). This setup won't work. Original issue's description: > Lay groundwork for SkOpts. > > This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. > > BUG=skia:4117 > > Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 TBR=djsollen@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1261743002
/external/skia/src/core/SkOpts.cpp
|
ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 |
|
27-Jul-2015 |
mtklein <mtklein@chromium.org> |
Lay groundwork for SkOpts. This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. BUG=skia:4117 Review URL: https://codereview.chromium.org/1255193002
/external/skia/src/core/SkOpts.cpp
|