History log of /external/skia/src/core/SkOpts.cpp
Revision Date Author Comments
3bc2624a4b89c49efd65f5e548ac5f2dd9351431 17-Feb-2016 mtklein <mtklein@chromium.org> try plain-old code for sk_memset16/32 now that NEON is compile-time

Most of these implementations now just say "always inline".
Let's see if we can get away with the simplicity of doing that all the time.

These inlined implementations can autovectorize easily.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1639863002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1639863002
/external/skia/src/core/SkOpts.cpp
a525cb151bb39fb6362af051f69b6d633f660fd9 09-Feb-2016 mtklein <mtklein@chromium.org> skeleton for float <-> half optimized procs

Nothing fancy yet, just calls the serial code in a loop.

I will try to follow this up with at least some of:
- SSE2 version of serial code
- NEON version of serial code
- NEON version using vcvt.f32.f16/vcvt.f16.f32
- F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph
The last two are fastest but need runtime detection.
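
A minimal sketch of what "calls the serial code in a loop" could look like for the half-to-float direction. This is not Skia's implementation: the names `half_to_float_serial` and `half_to_float_n` are hypothetical, and the only assumption is the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical serial conversion of one IEEE 754 binary16 value to float.
static float half_to_float_serial(uint16_t h) {
    const uint32_t sign = (h >> 15) & 0x1;
    const uint32_t exp  = (h >> 10) & 0x1f;
    const uint32_t mant = h & 0x3ff;

    float magnitude;
    if (exp == 0) {
        // Zero or denormal: the value is mant * 2^-24.
        magnitude = std::ldexp(static_cast<float>(mant), -24);
    } else if (exp == 31) {
        // All-ones exponent: infinity when the mantissa is zero, NaN otherwise.
        magnitude = mant ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
    } else {
        // Normal: rebias the exponent (15 -> 127) and widen the mantissa (10 -> 23 bits).
        const uint32_t bits = ((exp + 112) << 23) | (mant << 13);
        std::memcpy(&magnitude, &bits, sizeof(magnitude));
    }
    return sign ? -magnitude : magnitude;
}

// Hypothetical portable "proc": just loops over the serial code.
static void half_to_float_n(float* dst, const uint16_t* src, size_t n) {
    assert(n == 0 || (dst && src));
    for (size_t i = 0; i < n; i++) {
        dst[i] = half_to_float_serial(src[i]);
    }
}
```

An SSE2, NEON, or F16C version would keep the `half_to_float_n` signature and replace only the loop body, which is what makes a loop-over-serial skeleton a reasonable starting point.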

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003

Review URL: https://codereview.chromium.org/1686543003
/external/skia/src/core/SkOpts.cpp
81bb79b7b97909eb2b10444c4bd06bc9e1e56db3 09-Feb-2016 mtklein <mtklein@chromium.org> Remove SkNx AVX code. It is not really used. Getting in the way of refactoring.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679053002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1679053002
/external/skia/src/core/SkOpts.cpp
c5c322d8ecfc05718f9f04360956c4f1f9dc33c1 08-Feb-2016 msarett <msarett@google.com> Optimize CMYK->RGBA (BGRA) transform for jpeg decodes

Swizzle Bench Runtime
Nexus 6P 0.14x
Dell Venue 8 0.12x

CMYK Jpeg Decode Runtime
Nexus 6P 0.81x
Dell Venue 8 0.85x

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676773003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1676773003
/external/skia/src/core/SkOpts.cpp
1e06079b259d1091b735492b2f71d9897c14c608 03-Feb-2016 msarett <msarett@google.com> NEON optimizations for GrayAlpha -> RGBA/BGRA Premul/Unpremul

PNG Decode Time Nexus 6P (for a test set of GrayAlpha encoded PNGs)
Regular Unpremul 0.91x
Zero Init Unpremul 0.92x
Regular Premul 0.84x
Zero Init Premul 0.86x

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1663623002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1663623002
/external/skia/src/core/SkOpts.cpp
2eff71c9b5f984b58961e5a6b4e66774c4385224 02-Feb-2016 msarett <msarett@google.com> NEON optimizations for gray -> RGBA (or BGRA) conversions

Swizzle Bench Runtime
Nexus 6P 0.32x
Nexus 9 0.89x

PNG Decode Time (for test set of gray encoded PNGs)
Nexus 6P 0.88x
Nexus 9 0.91x

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1656383002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1656383002
/external/skia/src/core/SkOpts.cpp
a766ca87bfb1129a8cf4f8a428121caf60726de4 26-Jan-2016 mtklein <mtklein@chromium.org> de-proc sk_float_rsqrt

This is the first of many little baby steps to have us stop runtime-detecting NEON.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d

Review URL: https://codereview.chromium.org/1616013003
/external/skia/src/core/SkOpts.cpp
0dfffbeeec3078f6a03e83a4efa4c45fefdd338d 25-Jan-2016 msarett <msarett@google.com> Revert of AVX 2 SrcOver blits: color32, blitmask. (patchset #24 id:450001 of https://codereview.chromium.org/1532613002/ )

Reason for revert:
Bot failures

Original issue's description:
> AVX 2 SrcOver blits: color32, blitmask.
>
> As a follow up to the SSE 4.1 CL, this should look pretty familiar.
>
> I've made some organizational changes around how we load, store, pack, and unpack data that I think make things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later.
>
> Perf changes (relative to SSE 4.1):
> Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit
> Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent
> text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit
> text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit
>
> This should be a big throughput win, and a small latency win.
> This should all be pixel-exact to the previous SSE 4.1 code.
>
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36

TBR=herb@google.com,mtklein@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 day ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Review URL: https://codereview.chromium.org/1632713002
/external/skia/src/core/SkOpts.cpp
5d2117015eb271e09faf4a7ddd89093c9d618a36 25-Jan-2016 mtklein <mtklein@chromium.org> AVX 2 SrcOver blits: color32, blitmask.

As a follow up to the SSE 4.1 CL, this should look pretty familiar.

I've made some organizational changes around how we load, store, pack, and unpack data that I think make things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later.

Perf changes (relative to SSE 4.1):
Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit
Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent
text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit
text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit

This should be a big throughput win, and a small latency win.
This should all be pixel-exact to the previous SSE 4.1 code.

GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot

Review URL: https://codereview.chromium.org/1532613002
/external/skia/src/core/SkOpts.cpp
ed814f34c72248c7eecf3b0a5f335c3a7c2e9dd1 22-Jan-2016 mtklein <mtklein@google.com> Revert of de-proc sk_float_rsqrt (patchset #3 id:40001 of https://codereview.chromium.org/1616013003/ )

Reason for revert:
This is somehow blocking the Google3 roll in ways neither Ben nor I understand. Precautionary revert... will try again Monday.

Original issue's description:
> de-proc sk_float_rsqrt
>
> This is the first of many little baby steps to have us stop runtime-detecting NEON.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d

TBR=reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 day ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1629503002
/external/skia/src/core/SkOpts.cpp
f1b8b6ae34e5a1f4b29e423401da39f88f0c117a 22-Jan-2016 msarett <msarett@google.com> Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzler

Swizzle Bench Runtime Nexus 6P
xxx_xxxa 0.32x
xxx_swaprb_xxxa 0.31x

Swizzle Bench Runtime Nexus 9
xxx_xxxa 1.11x
xxx_swaprb_xxxa 1.14x
(This is a slow down.)

Swizzle Bench Runtime Nexus 5
xxx_xxxa 0.12x
xxx_swaprb 0.12x

RGB PNG Decode Runtime
Nexus 6P 0.94x
Nexus 9 0.98x

I don't know how to explain the fact that the Swizzle Bench was
slower on Nexus 9, but the decode times got faster.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1618003002
/external/skia/src/core/SkOpts.cpp
efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d 22-Jan-2016 mtklein <mtklein@chromium.org> de-proc sk_float_rsqrt

This is the first of many little baby steps to have us stop runtime-detecting NEON.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1616013003
/external/skia/src/core/SkOpts.cpp
8bf7b79cf9776b4edb3f6810e5ab8c80c49d3480 22-Jan-2016 mtklein <mtklein@chromium.org> Refactor swizzle names and types.

- Plant a flag to say "pretend all the inputs are RGBA".
This is how libpng thinks.
This is the opposite of what the implementation had been doing,
so I've rearranged everything to reflect the new orientation.

- Rewrite the names to be less mysterious looking. No more Xs.

- Make the src type uniformly const void*, to allow for 888 (RGB) srcs.

This should be performance and pixel neutral. (Please revert if it's not.)

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1626463002
/external/skia/src/core/SkOpts.cpp
3a24f459582f2665f0e66bd35a0d8f46a1c4c72f 13-Jan-2016 msarett <msarett@google.com> Optimized premultiplying swizzles for NEON

Improves decode performance for RGBA encoded PNGs.

Swizzle Time on Nexus 9 (with clang):
SwapPremul 0.44x
Premul 0.44x
Decode Time On Nexus 9 (with clang):
ZeroInit Decodes 0.85x
Regular Decodes 0.86x

Swizzle Time on Nexus 6P (with clang)
SwapPremul 0.14x
Premul 0.14x
Decode Time On Nexus 6P (with clang):
ZeroInit Decodes 0.93x
Regular Decodes 0.95x

Notes:
ZeroInit means memory is zero initialized, and we do not write to
memory for large sections of zero pixels (memory use opt for Android).

A profile on Nexus 9 shows that the premultiplication step of PNG
decoding is now ~5% of decode time (down from ~20%).

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1577703006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1577703006
/external/skia/src/core/SkOpts.cpp
a1bfaad00486883669e20a94cf8f7a8f4e34266a 07-Jan-2016 mtklein <mtklein@chromium.org> Set up some hooks for premul/swizzle opts.

You can call these as SkOpts::premul_xxxa, SkOpts::swaprb_xxxa, etc.

For now, I just backed the function pointers with some (untested) portable
code, which may autovectorize. We can override with optimized versions in
Init_ssse3() (in SkOpts_ssse3.cpp), Init_neon() (SkOpts_neon.cpp), etc.
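
The hook pattern described here can be sketched roughly as follows. Everything in namespace `sketch` is hypothetical and is not the SkOpts API: the hook is a plain function pointer that starts out on portable code, and an init step repoints it when a faster path is available. The "fast" version below is only a stand-in that counts calls and defers to the portable loop, so the sketch runs on any CPU.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Portable fallback: swap the R and B bytes of each 32-bit pixel.
static void swap_rb_portable(uint32_t* dst, const uint32_t* src, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        const uint32_t p = src[i];
        dst[i] = (p & 0xFF00FF00u)            // keep the A and G bytes in place
               | ((p & 0x000000FFu) << 16)    // move the low byte up
               | ((p >> 16) & 0x000000FFu);   // move the third byte down
    }
}

// Stand-in for a SIMD implementation (assumption: a real one would use
// SSSE3 or NEON intrinsics). It only records that it was selected.
static int g_fast_calls = 0;
static void swap_rb_fast(uint32_t* dst, const uint32_t* src, int n) {
    g_fast_calls++;
    swap_rb_portable(dst, src, n);
}

using SwapRBProc = void (*)(uint32_t*, const uint32_t*, int);

// The hook: callers always go through this pointer.
static SwapRBProc swap_rb = swap_rb_portable;

// Hypothetical init step: override the hook when the CPU supports the fast path.
static void init(bool cpu_has_fast_path) {
    if (cpu_has_fast_path) {
        swap_rb = swap_rb_fast;
    }
}

}  // namespace sketch
```

Callers would write `sketch::swap_rb(dst, src, n)` and never name a specific implementation, so adding an optimized version touches only the init function.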

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1569013002

Review URL: https://codereview.chromium.org/1569013002
/external/skia/src/core/SkOpts.cpp
2d0e37a16eaf1281ace3e4034bf143853c236940 03-Jan-2016 thakis <thakis@chromium.org> Fix -Wunused-function warnings in chrome/ios builds.

BUG=chromium:573779
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1559783002

Review URL: https://codereview.chromium.org/1559783002
/external/skia/src/core/SkOpts.cpp
ee9fe1e6f39a37d522ea700873b0e5c08f0c6674 17-Nov-2015 mtklein <mtklein@chromium.org> De-spam SkOpts.cpp

- stop printing when we detect sse4.2 and avx2
- that TODO is pretty well done

BUG=skia:

Review URL: https://codereview.chromium.org/1445343002
/external/skia/src/core/SkOpts.cpp
084db25d47dbad3ffbd7d15c04b63d344b351f90 11-Nov-2015 mtklein <mtklein@chromium.org> float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX.

Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x
Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x
Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x
Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x
Xfermode_ColorDodge 25ms -> 16.5ms 0.66x
Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x

Ought to be no pixel diffs:
https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false

Incidental stuff:

I made the SkNx(T) constructors implicit to make writing math expressions simpler.
This allows us to write expressions like
Sk4f v;
...
v = v*4;
rather than
Sk4f v;
...
v = v * Sk4f(4);

As written it only works when the constant is on the right-hand side,
so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on
following up with a CL that lets those become `(1 - da)` too.
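
The right-hand-side-only behavior follows from ordinary C++ overload rules, which a small stand-in type can illustrate. `Vec4f` below is hypothetical and much simpler than Sk4f; it assumes the operators are member functions, in which case the right operand can be converted through the implicit constructor but the left operand cannot.

```cpp
#include <cassert>

// Hypothetical 4-lane float vector standing in for Sk4f.
struct Vec4f {
    float lane[4];

    // Implicit (non-explicit) broadcast constructor: a scalar converts to a vector.
    Vec4f(float s) : lane{s, s, s, s} {}
    Vec4f(float a, float b, float c, float d) : lane{a, b, c, d} {}

    // Member operator: the left operand must already be a Vec4f,
    // while the right operand may be implicitly converted from a scalar.
    Vec4f operator*(const Vec4f& o) const {
        return Vec4f(lane[0] * o.lane[0], lane[1] * o.lane[1],
                     lane[2] * o.lane[2], lane[3] * o.lane[3]);
    }

    float operator[](int i) const {
        assert(0 <= i && i < 4);
        return lane[i];
    }
};
```

With this type, `v * 4` compiles because `4` converts to `Vec4f` on the right, while `4 * v` does not compile and still has to be spelled `Vec4f(4) * v`.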

BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1432903002
/external/skia/src/core/SkOpts.cpp
953549235ddaaf4e670b44bd69efa1ac1c835be0 09-Nov-2015 mtklein <mtklein@chromium.org> sse 4.2 detection

While we're detecting instruction sets, let's fill in this hole too.

BUG=skia:

Review URL: https://codereview.chromium.org/1419553007
/external/skia/src/core/SkOpts.cpp
844a0b425741f07cb233332405143931586bbb7d 07-Nov-2015 mtklein <mtklein@chromium.org> avx and avx2 detection

This doesn't do anything yet beyond print out a message in Debug mode,
but it's a start. Those messages should match the -SSE4-, -AVX-, or -AVX2- in
the Test-...-Debug-Trybots below. The Release ones are just running by accident.

So far they look right to me.

BUG=skia:

Review URL: https://codereview.chromium.org/1428153003
/external/skia/src/core/SkOpts.cpp
7de63d50d4cda278c67d8ddb0d07a8ca6a56c74a 28-Oct-2015 mtklein <mtklein@chromium.org> Android framework builds can't see cpu-features.h

Don't try to do NEON detection for Android framework builds.

BUG=skia:

Review URL: https://codereview.chromium.org/1423953004
/external/skia/src/core/SkOpts.cpp
4e8a09d3672702704436112f3aa5611bd79b2690 10-Sep-2015 mtklein <mtklein@chromium.org> Port SkMatrix opts to SkOpts.

No changes to the code, just moved around.

This will have the effect of enabling vectorized code on ARMv7.
Should be no effect on ARMv8 or x86, which would have been vectorized already.

nanobench --match mappoints changes on Nexus 5 (ARMv7):

_affine: 132 -> 95
_scale: 118 -> 47
_trans: 60 -> 37

A teaser:
We should next look at the ABCD->BADC shuffle we've noted that we need in _affine. A quick hack showed doing that optimally is another ~35% speedup on x86. Got to figure out how to do it best on ARM though: that same quick hack was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL!

(I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1)), but we should probably find out in another CL.)
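
To see why that immediate yields ABCD->BADC, here is a portable scalar model of the SSE shuffle. `mm_shuffle` and `shuffle_ps_model` are hypothetical helpers that mimic the lane selection of `_MM_SHUFFLE` and `_mm_shuffle_ps` so the sketch runs without intrinsics; they are not Skia code.

```cpp
#include <array>
#include <cassert>

using Float4 = std::array<float, 4>;

// Same encoding as _MM_SHUFFLE(z, y, x, w): four 2-bit lane indices,
// with w selecting result lane 0 and z selecting result lane 3.
constexpr unsigned mm_shuffle(unsigned z, unsigned y, unsigned x, unsigned w) {
    return (z << 6) | (y << 4) | (x << 2) | w;
}

// Scalar model of _mm_shuffle_ps(a, b, imm):
// the two low result lanes pick from a, the two high result lanes pick from b.
inline Float4 shuffle_ps_model(const Float4& a, const Float4& b, unsigned imm) {
    assert(imm < 256);
    return {{ a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3] }};
}

// ABCD -> BADC: swap the two lanes inside each 64-bit half.
inline Float4 badc(const Float4& v) {
    return shuffle_ps_model(v, v, mm_shuffle(2, 3, 0, 1));
}
```

Swapping the lanes inside each 64-bit half is also what `vrev64q_f32` does on NEON, so both candidates describe the same permutation.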

BUG=skia:4117

Review URL: https://codereview.chromium.org/1320673014
/external/skia/src/core/SkOpts.cpp
4a37d08382a16717cde52c3d2687b021c5413464 10-Sep-2015 mtklein <mtklein@chromium.org> Port SkBlitRow::Color32 to SkOpts.

This was a pre-SkOpts attempt that we can bring under its wing now.

This should be a perf no-op, deo volente.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1314863006
/external/skia/src/core/SkOpts.cpp
08f9234eaafcda33ebf5e74ec27ca72f4abda4fb 18-Aug-2015 mtklein <mtklein@chromium.org> Try again to put SkXfermode_opts in SK_OPTS_NS

Remember failed attempt https://codereview.chromium.org/1286093004/ ? I think this one is simpler and safer and even technically legal C++.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1296183004
/external/skia/src/core/SkOpts.cpp
b2a327094f2b9a072fddfd440481b1c0a8b0bc80 18-Aug-2015 mtklein <mtklein@chromium.org> Update SkOpts namespaces.

portable -> default, and everyone gets an sk_ prefix.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1299013003
/external/skia/src/core/SkOpts.cpp
2d141ba2df8f7506848aa9369f502944e837cd09 18-Aug-2015 mtklein <mtklein@chromium.org> Patches on top of Radu's latest.

patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001)

BUG=skia:

Review URL: https://codereview.chromium.org/1288323004
/external/skia/src/core/SkOpts.cpp
948376379362e54c5357005778ba5f53f3dc9d5e 18-Aug-2015 mtklein <mtklein@chromium.org> Remove SkOpts_sse2.cpp.

It's sort of pointless: all our clients that will have SSE2 at runtime have it
unconditionally at compile time, so the functions in namespace portable will
pick up the SSE2 code. The procs in SkOpts_sse2.o were just duplicate code.

A couple of the procs we had in _sse2.cpp can benefit slightly when compiled
with SSSE3. I've moved those to _ssse3.cpp. This should lead to small speedups
on platforms like Linux and Windows that have a baseline of SSE2.

Similarly, I've removed the call to Init_neon() when NEON is available globally... it's a no-op.

Renaming namespace portable to something clearer is TBD.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1294213002
/external/skia/src/core/SkOpts.cpp
082e329887a8f1efe4e1020f0a0a6ea09961712d 12-Aug-2015 mtklein <mtklein@google.com> Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 id:1 of https://codereview.chromium.org/1286093004/ )

Reason for revert:
Maybe causing test / gold problems?

Original issue's description:
> Refactor to put SkXfermode_opts inside SK_OPTS_NS.
>
> Without this refactor I was getting warnings previously about having code
> inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
> code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
>
> That low-level code was in an anonymous namespace to allow multiple independent
> copies of its methods to be instantiated without the linker getting confused /
> offended about violating the One Definition Rule. This was only happening in
> Debug mode where the methods were not being inlined.
>
> To fix this all, I've force-inlined the methods of the low-level code and
> removed the anonymous namespace.
>
> BUG=skia:4117
>
>
> [1] Here is what those errors looked like:
>
> In file included from ../../../../src/core/SkOpts.cpp:18:0:
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
> class Sk4pxXfermode : public SkProcCoeffXfermode {
> ^
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
> ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
> class SkPMFloatXfermode : public SkProcCoeffXfermode {
> ^
> cc1plus: all warnings being treated as errors
>
> Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f

TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117

Review URL: https://codereview.chromium.org/1284333002
/external/skia/src/core/SkOpts.cpp
b07bee3121680b53b98b780ac08d14d374dd4c6f 12-Aug-2015 mtklein <mtklein@chromium.org> Refactor to put SkXfermode_opts inside SK_OPTS_NS.

Without this refactor I was getting warnings previously about having code
inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].

That low-level code was in an anonymous namespace to allow multiple independent
copies of its methods to be instantiated without the linker getting confused /
offended about violating the One Definition Rule. This was only happening in
Debug mode where the methods were not being inlined.

To fix this all, I've force-inlined the methods of the low-level code and
removed the anonymous namespace.

BUG=skia:4117

[1] Here is what those errors looked like:

In file included from ../../../../src/core/SkOpts.cpp:18:0:
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
class Sk4pxXfermode : public SkProcCoeffXfermode {
^
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
class SkPMFloatXfermode : public SkProcCoeffXfermode {
^
cc1plus: all warnings being treated as errors

Review URL: https://codereview.chromium.org/1286093004
/external/skia/src/core/SkOpts.cpp
4977983510028712528743aa877f6da83781b381 10-Aug-2015 mtklein <mtklein@chromium.org> Sk4px blit mask.

Local SKP nanobenching on SSE ranges between 1.05x and 0.87x, weighted much more heavily toward <1.0x ratios (speedups).
I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results.

NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range.

The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version.

Somewhat fantastically, I see no pixel diffs on GMs or SKPs.

I will be following up with more CLs for the other procs called by SkBlitMask.
BUG=skia:

Review URL: https://codereview.chromium.org/1278253003
/external/skia/src/core/SkOpts.cpp
b6394746ff546a9c60d68e3be162cb38feffa803 06-Aug-2015 mtklein <mtklein@chromium.org> Port SkTextureCompression opts to SkOpts

Pretty vanilla translation. I cleaned up who calls whom a little.
Used to be utils -> opts -> utils, now it's just utils -> opts.

I may follow up with a pass over the NEON code for readability
and to clean up dead code.

This turns on NEON A8->R11EAC conversion for ARMv8.
Unit tests which now hit the NEON code still pass.
I can't find any related bench.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1273103002
/external/skia/src/core/SkOpts.cpp
d029ded92d409a004f2096c78f5a99b524206481 04-Aug-2015 mtklein <mtklein@chromium.org> Port morphology to SkOpts.

Nothing too fancy.

Direction enums become enum classes so they don't get all confused. An
alternative is to create one single Direction enum that both blur and
morphology opts use.
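
A small sketch of why scoped enums keep the two Direction types from being confused. The namespaces, the enumerator names, and `morph_stride` are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Two unrelated Direction types with identical enumerator names.
namespace blur_sketch  { enum class Direction { kX, kY }; }
namespace morph_sketch { enum class Direction { kX, kY }; }

// Hypothetical helper: step between neighboring pixels for a morphology pass.
inline int morph_stride(morph_sketch::Direction d, int rowPixels) {
    assert(rowPixels > 0);
    return d == morph_sketch::Direction::kX ? 1 : rowPixels;
}

// morph_stride(blur_sketch::Direction::kX, 100);
//   would not compile: enum class values do not convert implicitly,
//   not to int and not to another enum type.
```

With unscoped enums, the enumerators would convert to `int` and the same-named constants would be easy to mix up. The single shared Direction enum mentioned as an alternative avoids the duplication but couples the blur and morphology code.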

BUG=skia:4117

Review URL: https://codereview.chromium.org/1267343004
/external/skia/src/core/SkOpts.cpp
8caa5af92cf91debc1598380cb72c330e8c63efb 04-Aug-2015 Mike Klein <mtklein@chromium.org> Reorganize to keep similar code together.

This organizes memset16, memset32, and rsqrt the same way as the other code. No functional change.

BUG=skia:4117
R=djsollen@google.com

Review URL: https://codereview.chromium.org/1264423002 .
/external/skia/src/core/SkOpts.cpp
dce5ce4276e2825efc6d8c4daa819c965794cd12 04-Aug-2015 mtklein <mtklein@chromium.org> Port SkBlurImage opts to SkOpts.

+268 -535 lines

I also rearranged the code a little bit to encapsulate itself better,
mostly replacing static helper functions with lambdas. This also
let me merge the SSE2 and SSE4.1 code paths.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1264103004
/external/skia/src/core/SkOpts.cpp
f96bee384dea60a4e96035878e2671cd49d3b406 31-Jul-2015 mtklein <mtklein@chromium.org> disable SkOpts on x86 iOS (simulator)

TBR=

BUG=skia:

Review URL: https://codereview.chromium.org/1266443006
/external/skia/src/core/SkOpts.cpp
490b61569d27c9b7ba164fbc4394994d2e7cb022 31-Jul-2015 mtklein <mtklein@chromium.org> Port SkXfermode opts to SkOpts.h

Renames Sk4pxXfermode.h to SkXfermode_opts.h,
and refactors it a tiny bit internally.

This moves xfermode optimization from being "compile-time everywhere but NEON"
to simply "runtime everywhere". I don't anticipate any effect on perf or
correctness.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1264543006
/external/skia/src/core/SkOpts.cpp
7eb0945af254d376df11475150d184623104cf93 31-Jul-2015 mtklein <mtklein@chromium.org> Port SkUtils opts to SkOpts.

With this new arrangement, the benefits of inlining sk_memset16/32 have changed.

On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower.
On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster.
On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster.

We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization.

So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8.
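
The ARMv7 policy (inline for small N, call the custom proc otherwise) can be sketched as below. The names are hypothetical and the "proc" is only a counting stand-in for a tuned out-of-line routine; the threshold of 10 is taken from the N<=10 figure quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for an out-of-line, platform-tuned routine reached through a function pointer.
static int g_proc_calls = 0;
static void memset32_proc(uint32_t* dst, uint32_t value, int n) {
    g_proc_calls++;
    for (int i = 0; i < n; i++) {
        dst[i] = value;
    }
}
static void (*memset32_hook)(uint32_t*, uint32_t, int) = memset32_proc;

// Hypothetical wrapper: small fills stay inline, larger ones go through the hook.
static inline void memset32_sketch(uint32_t* dst, uint32_t value, int n) {
    assert(n >= 0);
    if (n <= 10) {
        // Short loop: avoids the indirect call that dominates for tiny N.
        for (int i = 0; i < n; i++) {
            dst[i] = value;
        }
        return;
    }
    memset32_hook(dst, value, n);
}
```

Under the same reading, x86 would drop the inline branch and ARMv8 would drop the hook, leaving only the plain loop for the compiler to autovectorize.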

BUG=skia:4117

Review URL: https://codereview.chromium.org/1270573002
/external/skia/src/core/SkOpts.cpp
f684a78d9ea988883c9b2c7bcc4ea4d5e68bd998 30-Jul-2015 mtklein <mtklein@chromium.org> Runtime CPU detection for rsqrt().

This enables the NEON sk_float_rsqrt() code for configurations that have NEON at run-time but not compile-time.

These devices will see about a 2x (1.26 -> 2.33) slowdown in sk_float_rsqrt(), but it should be more precise than our portable fallback.

(When inlined, the portable fallback and the NEON code are almost identical in speed. The only difference is precision. Going through a function pointer is causing all this slowdown. This is a good example of a place where Skia really benefits from compile-time NEON.)

BUG=skia:4117,skia:4114

No public API changes.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1264893002
/external/skia/src/core/SkOpts.cpp
8317a1832f55e175531ee7ae7ccd12a3a15e3c75 30-Jul-2015 mtklein <mtklein@chromium.org> Lay groundwork for SkOpts.

This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.

BUG=skia:4117

Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827

Review URL: https://codereview.chromium.org/1255193002
/external/skia/src/core/SkOpts.cpp
56b78a7a2aa87998e6a3f3d026e2184f1bccaf0c 27-Jul-2015 mtklein <mtklein@chromium.org> Revert of Lay groundwork for SkOpts. (patchset #3 id:40001 of https://codereview.chromium.org/1255193002/)

Reason for revert:
Chromium doesn't call SkGraphics::Init(). This setup won't work.

Original issue's description:
> Lay groundwork for SkOpts.
>
> This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
>
> BUG=skia:4117
>
> Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827

TBR=djsollen@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117

Review URL: https://codereview.chromium.org/1261743002
/external/skia/src/core/SkOpts.cpp
ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 27-Jul-2015 mtklein <mtklein@chromium.org> Lay groundwork for SkOpts.

This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1255193002
/external/skia/src/core/SkOpts.cpp