History log of /external/skia/src/opts/Sk4px_none.h
Revision Date Author Comments
7c249e531900929c2fe2cdde76619fa6d2538c49 21-Feb-2016 mtklein <mtklein@chromium.org> SkNx: kth<...>() -> [...]

Just some syntax cleanup. No real change: kth<...>() was calling [...] already.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1714363002
/external/skia/src/opts/Sk4px_none.h
defa0daa6a0f4e97a3527a522ae602c6771a7c80 08-Jan-2016 mtklein <mtklein@chromium.org> Clean up SkXfermode_opts.h

It seems that MSVC + __vectorcall don't play well together,
so back ourselves out into a situation where we don't need it.

- Inline transfermode functions. This removes the need for SK_VECTORCALL.
- Remove 565 destination specializations.
Blending into 565 is not speed-critical enough to merit the code bloat.
- Removing 565 specializations means a bunch of Sk4px code is now dead.

8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate.

565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate.

[1] the 565->8888 conversion is actually being autovectorized

BUG=skia:4765,skia:4776
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1565223002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

No public API changes.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1565223002
/external/skia/src/opts/Sk4px_none.h
cbf4fba43933302a846872e4c5ce8f1adb8b325e 17-Nov-2015 mtklein <mtklein@chromium.org> div255(x) as ((x+128)*257)>>16 with SSE

_mm_mulhi_epu16 makes the (...*257)>>16 part simple.
This seems to speed up every transfermode that uses div255(),
in the 7-25% range.

It even appears to obviate the need for approxMulDiv255() on SSE.
I'm not sure about NEON yet, so I'll keep approxMulDiv255() for now.
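
The identity in the subject line can be checked with plain scalar arithmetic. What follows is a minimal sketch, not the Skia code: the function names are hypothetical, and it assumes x is the product of two 8-bit values (0 through 255*255). Under that assumption x+128 fits in 16 bits, so the >>16 corresponds to taking the high half of a 16x16-bit multiply, which is what _mm_mulhi_epu16 produces per lane.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the trick; not Skia's implementation.
// Rounded division by 255, valid for x in [0, 255*255].
inline uint32_t div255_mulhi(uint32_t x) {
    assert(x <= 255u * 255u);
    // x + 128 <= 65153 fits in 16 bits, so the high 16 bits of a
    // 16x16-bit product with 257 are exactly this shift.
    return ((x + 128u) * 257u) >> 16;
}

// Reference: round-to-nearest x / 255 using integer division.
// (x / 255 never lands exactly on .5, so there are no ties to break.)
inline uint32_t div255_reference(uint32_t x) {
    return (x + 127u) / 255u;
}

// Returns true if both forms agree over the whole 8-bit x 8-bit product range.
inline bool div255_forms_agree() {
    for (uint32_t x = 0; x <= 255u * 255u; ++x) {
        if (div255_mulhi(x) != div255_reference(x)) return false;
    }
    return true;
}
```

Calling div255_forms_agree() walks all 65026 possible products, which is cheap enough to run as a unit test.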

Should be no pixels change:
https://gold.skia.org/search2?issue=1452903004&unt=true&query=source_type%3Dgm&master=false

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1452903004
/external/skia/src/opts/Sk4px_none.h
082e329887a8f1efe4e1020f0a0a6ea09961712d 12-Aug-2015 mtklein <mtklein@google.com> Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 id:1 of https://codereview.chromium.org/1286093004/ )

Reason for revert:
Maybe causing test / gold problems?

Original issue's description:
> Refactor to put SkXfermode_opts inside SK_OPTS_NS.
>
> Without this refactor I was getting warnings previously about having code
> inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
> code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
>
> That low-level code was in an anonymous namespace to allow multiple independent
> copies of its methods to be instantiated without the linker getting confused /
> offended about violating the One Definition Rule. This was only happening in
> Debug mode where the methods were not being inlined.
>
> To fix this all, I've force-inlined the methods of the low-level code and
> removed the anonymous namespace.
>
> BUG=skia:4117
>
>
> [1] Here is what those errors looked like:
>
> In file included from ../../../../src/core/SkOpts.cpp:18:0:
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
> class Sk4pxXfermode : public SkProcCoeffXfermode {
> ^
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
> ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
> class SkPMFloatXfermode : public SkProcCoeffXfermode {
> ^
> cc1plus: all warnings being treated as errors
>
> Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f

TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117

Review URL: https://codereview.chromium.org/1284333002
/external/skia/src/opts/Sk4px_none.h
b07bee3121680b53b98b780ac08d14d374dd4c6f 12-Aug-2015 mtklein <mtklein@chromium.org> Refactor to put SkXfermode_opts inside SK_OPTS_NS.

Without this refactor I was getting warnings previously about having code
inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].

That low-level code was in an anonymous namespace to allow multiple independent
copies of its methods to be instantiated without the linker getting confused /
offended about violating the One Definition Rule. This was only happening in
Debug mode where the methods were not being inlined.

To fix this all, I've force-inlined the methods of the low-level code and
removed the anonymous namespace.

BUG=skia:4117

[1] Here is what those errors looked like:

In file included from ../../../../src/core/SkOpts.cpp:18:0:
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
class Sk4pxXfermode : public SkProcCoeffXfermode {
^
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
class SkPMFloatXfermode : public SkProcCoeffXfermode {
^
cc1plus: all warnings being treated as errors

Review URL: https://codereview.chromium.org/1286093004
/external/skia/src/opts/Sk4px_none.h
ced1585149d4ac8a68efd80e11dbc23bca6620d4 22-Jul-2015 mtklein <mtklein@chromium.org> 565 support for SIMD xfermodes

This uses the most basic approach possible:
- to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
- to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
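
The two steps above can be sketched in scalar form. This is only an illustration under stated assumptions: the helper names are hypothetical, the 32-bit pixels are assumed to be laid out as 0xAARRGGBB, and the bit-replication used to widen 5 and 6 bits to 8 is one common choice rather than necessarily what the commit does. The vector load or store would then operate on the temporary 4-pixel buffer.

```cpp
#include <cstdint>

// Hypothetical helpers. The 0xAARRGGBB channel order is an assumption.
inline uint32_t expand565(uint16_t c) {
    uint32_t r = (c >> 11) & 0x1F, g = (c >> 5) & 0x3F, b = c & 0x1F;
    r = (r << 3) | (r >> 2);  // 5 -> 8 bits, replicating the top bits
    g = (g << 2) | (g >> 4);  // 6 -> 8 bits
    b = (b << 3) | (b >> 2);
    return 0xFF000000u | (r << 16) | (g << 8) | b;  // 565 is always opaque
}

inline uint16_t pack565(uint32_t c) {
    uint32_t r = (c >> 16) & 0xFF, g = (c >> 8) & 0xFF, b = c & 0xFF;
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Load: convert serially into a stack buffer that vector code would then load.
inline void load4From565(const uint16_t src[4], uint32_t tmp[4]) {
    for (int i = 0; i < 4; ++i) tmp[i] = expand565(src[i]);
}

// Store: vector code writes 32-bit pixels to tmp, then we convert serially.
inline void store4To565(const uint32_t tmp[4], uint16_t dst[4]) {
    for (int i = 0; i < 4; ++i) dst[i] = pack565(tmp[i]);
}
```

With this widening, expanding a 565 value and packing it again returns the original value, so a load followed by a store with no blending in between is lossless.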

Clearly, we can optimize these loads and stores. That's a TODO.

The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.

The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.

This will cause minor GM diffs, but I don't think any layout test changes.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0

Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066

Review URL: https://codereview.chromium.org/1245673002
/external/skia/src/opts/Sk4px_none.h
78f55fc4bf4c1a8f7cb6395e9a30e16e5e4622a8 22-Jul-2015 mtklein <mtklein@google.com> Revert of 565 support for SIMD xfermodes (patchset #4 id:60001 of https://codereview.chromium.org/1245673002/)

Reason for revert:
NEON 565 gold images have gone ugly. This is what I get for writing and testing SSE and just writing NEON.

E.g. colortype_xfermodes, dstreadshuffle, bigbitmaprect, pictures, textbloblooper, aaxfermodes (only Plus)

Original issue's description:
> 565 support for SIMD xfermodes
>
> This uses the most basic approach possible:
> - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
> - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
>
> Clearly, we can optimize these loads and stores. That's a TODO.
>
> The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
>
> The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
>
> This will cause minor GM diffs, but I don't think any layout test changes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
>
> Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066

TBR=msarett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1248893004
/external/skia/src/opts/Sk4px_none.h
860dcaa2ddfdadc050af4f943a84a9d499315066 22-Jul-2015 mtklein <mtklein@chromium.org> 565 support for SIMD xfermodes

This uses the most basic approach possible:
- to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
- to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.

Clearly, we can optimize these loads and stores. That's a TODO.

The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.

The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.

This will cause minor GM diffs, but I don't think any layout test changes.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0

Review URL: https://codereview.chromium.org/1245673002
/external/skia/src/opts/Sk4px_none.h
4be181e304d2b280c6801bd13369cfba236d1a66 14-Jul-2015 mtklein <mtklein@chromium.org> 3-15% speedup to HardLight / Overlay xfermodes.

While investigating my bug (skia:4052) I saw this TODO and figured
it'd make me feel better about an otherwise unsuccessful investigation.

This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly
by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls
to 2 cheap comparisons and 1 expensive div255().

NEON speeds up by a more modest ~3%.

BUG=skia:

Review URL: https://codereview.chromium.org/1230663005
/external/skia/src/opts/Sk4px_none.h
059ac00446404506a46cd303db15239c7aae49d5 22-Jun-2015 mtklein <mtklein@chromium.org> Update some Sk4px APIs.

Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones.

- SkAlpha / SkPMColor constructors become static factories.
- Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious.
- Rename fastMulDiv255Round() to approxMulDiv255() to stress its approximate-ness over its speed. Drop Round for the same reason as above: we should always round.
- Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts.
- use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative.
- MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do.
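
To illustrate the operator*() and div255() points above, here is a small scalar sketch. The types Px16 and Wide16 are hypothetical stand-ins, not the Sk4px API; the only claims are that an 8-bit by 8-bit product always fits in 16 bits and that div255() rounds and narrows back to 8 bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration only; not the Sk4px API.
struct Wide16 {
    std::array<uint16_t, 16> v{};

    // Rounded division by 255, narrowing each lane back to 8 bits.
    std::array<uint8_t, 16> div255() const {
        std::array<uint8_t, 16> out{};
        for (std::size_t i = 0; i < 16; ++i) {
            out[i] = static_cast<uint8_t>((v[i] + 127u) / 255u);
        }
        return out;
    }
};

struct Px16 {
    std::array<uint8_t, 16> v{};

    // 8-bit x 8-bit -> 16-bit. The largest product, 255*255 = 65025,
    // fits in uint16_t, so the widening multiply never overflows.
    Wide16 operator*(const Px16& o) const {
        Wide16 w;
        for (std::size_t i = 0; i < 16; ++i) {
            w.v[i] = static_cast<uint16_t>(v[i] * o.v[i]);
        }
        return w;
    }
};
```

With types shaped like this, a multiply-then-scale reads as (a * b).div255() and there is no way to accidentally truncate the product to 8 bits first.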

BUG=skia:

Review URL: https://codereview.chromium.org/1202013002
/external/skia/src/opts/Sk4px_none.h
aa999cb952753d373c03befa7fce8c4700ef9308 23-May-2015 mtklein <mtklein@chromium.org> Everyone gets a namespace {}.

If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different
SIMD flags, that could cause different definitions of the same method.

Normally that's moot, because all the code inlines, but in Debug it tends not
to. So in Debug, the linker picks one definition for us. That breaks _someone_.

Wrapping everything in a namespace {} keeps the definitions separate.
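
A minimal sketch of the pattern, with hypothetical names: a header whose method body depends on the SIMD flags of the including file. Without the namespace {}, two translation units built with different flags would each emit a non-inlined Lanes4::sum() under the same symbol and the linker would keep only one; inside namespace {}, each translation unit gets its own distinct Lanes4.

```cpp
// Hypothetical header content; Lanes4 is an illustrative name, not Skia code.
namespace {  // internal linkage: every including .cpp gets its own copy

struct Lanes4 {
    int v[4];

    int sum() const {
#if defined(__SSE2__)
        // Stand-in for a body that would use SSE2 intrinsics.
        return (v[0] + v[2]) + (v[1] + v[3]);
#else
        // Stand-in for the portable body.
        return v[0] + v[1] + v[2] + v[3];
#endif
    }
};

}  // namespace
```

The cost of this approach is that each translation unit carries its own copy of any method that is not inlined.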

Tested locally, it fixes this bug.
BUG=skia:3861

This code is not yet enabled in Chrome, so shouldn't affect the roll.
NOTREECHECKS=true

Review URL: https://codereview.chromium.org/1154523004
/external/skia/src/opts/Sk4px_none.h
0135a41e095a433414e21e37b277dab7dcbec373 15-May-2015 mtklein <mtklein@chromium.org> Sk4px: Difference and Exclusion

This will cause minor (off-by-one) diffs due to a little lost precision:
colortype_xfermodes
mixed_xfermodes
xfermodes2
xfermodeimagefilter
xfermodes3
xfermodes

Desktop:
Xfermode_Difference_aa 9.77ms -> 7.32ms 0.75x
Xfermode_Exclusion_aa 8.49ms -> 6.21ms 0.73x
Xfermode_Difference 17ms -> 7.54ms 0.44x
Xfermode_Exclusion 13.5ms -> 5.09ms 0.38x

N7:
Xfermode_Difference_aa 32.2ms -> 27.6ms 0.86x
Xfermode_Difference 43.9ms -> 32ms 0.73x
Xfermode_Exclusion_aa 40.5ms -> 26.7ms 0.66x
Xfermode_Exclusion 71.5ms -> 23.9ms 0.33x

This wraps up the xfermodes implemented in Sk4f.

BUG=skia:

Review URL: https://codereview.chromium.org/1141213002
/external/skia/src/opts/Sk4px_none.h
8a90edc2a58a4f8a4b4da73eb08e943be09538c0 13-May-2015 mtklein <mtklein@chromium.org> Sk4px: alphas() and Load[24]Alphas()

alphas() extracts the 4 alphas from an existing Sk4px as another Sk4px.
LoadNAlphas() constructs an Sk4px from N packed alphas.

In both cases, we end up with 4x repeated alphas aligned with their pixels.

alphas()
A0 R0 G0 B0 A1 R1 G1 B1 A2 R2 G2 B2 A3 R3 G3 B3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3

Load4Alphas()
A0 A1 A2 A3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3

Load2Alphas()
A0 A1
->
A0 A0 A0 A0 A1 A1 A1 A1 0 0 0 0 0 0 0 0
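
The three byte layouts above can be modelled with a scalar sketch over 16-byte arrays. The function names are written in lower camel case to mark them as hypothetical stand-ins, and the alpha-first byte order is taken from the diagrams rather than from any particular platform's pixel layout.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Hypothetical scalar models. Byte order follows the diagrams above,
// with alpha as the first byte of each 4-byte pixel.

// Broadcast each pixel's alpha across that pixel's 4 bytes.
inline Bytes16 alphas(const Bytes16& px) {
    Bytes16 out{};
    for (std::size_t i = 0; i < 16; ++i) out[i] = px[(i / 4) * 4];
    return out;
}

// Expand 4 packed alphas to 4 bytes each.
inline Bytes16 load4Alphas(const uint8_t a[4]) {
    Bytes16 out{};
    for (std::size_t i = 0; i < 16; ++i) out[i] = a[i / 4];
    return out;
}

// Expand 2 packed alphas to 4 bytes each; the upper 8 bytes stay zero.
inline Bytes16 load2Alphas(const uint8_t a[2]) {
    Bytes16 out{};
    for (std::size_t i = 0; i < 8; ++i) out[i] = a[i / 4];
    return out;
}
```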

This is a 5-10% speedup for AA on Intel, and a wash on ARM.
AA is still mostly dominated by the final lerp.

alphas() isn't used yet, but it's similar enough to Load[24]Alphas()
that it was easier to write all at once.

BUG=skia:

Review URL: https://codereview.chromium.org/1138333003
/external/skia/src/opts/Sk4px_none.h
d2ffd36eb62e99abe2920369d1e040954cc2044f 12-May-2015 mtklein <mtklein@chromium.org> Sk4px

Xfermode_SrcOver:
SSE: 2.08ms -> 2.03ms (~2% faster)
NEON: my N5 is noisy, but there appears to be no perf change

BUG=skia:

Review URL: https://codereview.chromium.org/1132273004
/external/skia/src/opts/Sk4px_none.h