History log of /external/skia/src/opts/Sk4px_none.h
Revision	Date	Author	Comments
7c249e531900929c2fe2cdde76619fa6d2538c49	21-Feb-2016	mtklein <mtklein@chromium.org>	SkNx: kth<...>() -> [...] Just some syntax cleanup. No real change: kth<...>() was calling [...] already. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1714363002 /external/skia/src/opts/Sk4px_none.h
defa0daa6a0f4e97a3527a522ae602c6771a7c80	08-Jan-2016	mtklein <mtklein@chromium.org>	Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765,skia:4776 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1565223002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot No public API changes. TBR=reed@google.com Review URL: https://codereview.chromium.org/1565223002 /external/skia/src/opts/Sk4px_none.h
cbf4fba43933302a846872e4c5ce8f1adb8b325e	17-Nov-2015	mtklein <mtklein@chromium.org>	div255(x) as ((x+128)*257)>>16 with SSE. _mm_mulhi_epu16 makes the (...*257)>>16 part simple. This seems to speed up every transfermode that uses div255(), in the 7-25% range. It even appears to obviate the need for approxMulDiv255() on SSE. I'm not sure about NEON yet, so I'll keep approxMulDiv255() for now. Should be no pixel changes: https://gold.skia.org/search2?issue=1452903004&unt=true&query=source_type%3Dgm&master=false BUG=skia: CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1452903004 /external/skia/src/opts/Sk4px_none.h
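The identity behind this change can be checked exhaustively in scalar code. This is a minimal sketch, not Skia's implementation: `mulhi_u16` is a hypothetical stand-in for a single lane of `_mm_mulhi_epu16` (the high 16 bits of an unsigned 16x16-bit product), and `div255`, `div255_exact` and `div255_matches_exact` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one lane of _mm_mulhi_epu16:
// keep the high 16 bits of the 32-bit product of two unsigned 16-bit values.
inline uint16_t mulhi_u16(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// round(x / 255) computed as ((x + 128) * 257) >> 16.
// Valid for x in [0, 255*255]; x + 128 <= 65153 still fits in 16 bits.
inline uint8_t div255(uint16_t x) {
    assert(x <= 255 * 255);
    return static_cast<uint8_t>(mulhi_u16(static_cast<uint16_t>(x + 128), 257));
}

// Reference: exactly rounded division. x / 255 is never exactly halfway
// between two integers, so no tie-breaking rule is needed.
inline uint8_t div255_exact(uint32_t x) {
    return static_cast<uint8_t>((x + 127) / 255);
}

// Compares both versions over the full range of 8-bit x 8-bit products.
inline bool div255_matches_exact() {
    for (uint32_t x = 0; x <= 255u * 255u; ++x) {
        if (div255(static_cast<uint16_t>(x)) != div255_exact(x)) return false;
    }
    return true;
}
```

Because 65536/257 is about 255.004, multiplying by 257 and keeping the high half approximates division by 255. With the +128 bias, the result matches the exactly rounded quotient for every x in [0, 255*255], and the biased input never leaves a 16-bit lane.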
082e329887a8f1efe4e1020f0a0a6ea09961712d	12-Aug-2015	mtklein <mtklein@google.com>	Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 id:1 of https://codereview.chromium.org/1286093004/ ) Reason for revert: Maybe causing test / gold problems? Original issue's description: > Refactor to put SkXfermode_opts inside SK_OPTS_NS. > > Without this refactor I was getting warnings previously about having code > inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to > code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. > > That low-level code was in an anonymous namespace to allow multiple independent > copies of its methods to be instantiated without the linker getting confused / > offended about violating the One Definition Rule. This was only happening in > Debug mode where the methods were not being inlined. > > To fix this all, I've force-inlined the methods of the low-level code and > removed the anonymous namespace. > > BUG=skia:4117 > > > [1] Here is what those errors looked like: > > In file included from ../../../../src/core/SkOpts.cpp:18:0: > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] > class Sk4pxXfermode : public SkProcCoeffXfermode { > ^ > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] > ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] > class SkPMFloatXfermode : public SkProcCoeffXfermode { > ^ > cc1plus: all warnings being treated as errors > > Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f TBR=djsollen@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1284333002 /external/skia/src/opts/Sk4px_none.h
b07bee3121680b53b98b780ac08d14d374dd4c6f	12-Aug-2015	mtklein <mtklein@chromium.org>	Refactor to put SkXfermode_opts inside SK_OPTS_NS. Without this refactor I was getting warnings previously about having code inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. That low-level code was in an anonymous namespace to allow multiple independent copies of its methods to be instantiated without the linker getting confused / offended about violating the One Definition Rule. This was only happening in Debug mode where the methods were not being inlined. To fix this all, I've force-inlined the methods of the low-level code and removed the anonymous namespace. BUG=skia:4117 [1] Here is what those errors looked like: In file included from ../../../../src/core/SkOpts.cpp:18:0: ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] class Sk4pxXfermode : public SkProcCoeffXfermode { ^ ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] class SkPMFloatXfermode : public SkProcCoeffXfermode { ^ cc1plus: all warnings being treated as errors Review URL: https://codereview.chromium.org/1286093004 /external/skia/src/opts/Sk4px_none.h
ced1585149d4ac8a68efd80e11dbc23bca6620d4	22-Jul-2015	mtklein <mtklein@chromium.org>	565 support for SIMD xfermodes This uses the most basic approach possible: - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors. - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially. Clearly, we can optimize these loads and stores. That's a TODO. The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling. The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster. This will cause minor GM diffs, but I don't think any layout test changes. BUG=skia: Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0 Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066 Review URL: https://codereview.chromium.org/1245673002 /external/skia/src/opts/Sk4px_none.h
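The "most basic approach" in this commit can be sketched in scalar C++ to show where the serial work and the stack buffer come from. All of the following are hypothetical assumptions: 8888 pixels are packed as 0xAARRGGBB, 565 channels are widened by replicating their high bits, and `expand565`, `pack565` and `blend4_565` are illustrative names. This log does not say which conversion Skia used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed 8888 packing for this sketch: 0xAARRGGBB, always opaque.
inline uint32_t expand565(uint16_t p) {
    uint32_t r5 = (p >> 11) & 0x1F;
    uint32_t g6 = (p >> 5) & 0x3F;
    uint32_t b5 = p & 0x1F;
    // Replicate high bits into the low bits so 0 -> 0 and the max -> 255.
    uint32_t r8 = (r5 << 3) | (r5 >> 2);
    uint32_t g8 = (g6 << 2) | (g6 >> 4);
    uint32_t b8 = (b5 << 3) | (b5 >> 2);
    return (0xFFu << 24) | (r8 << 16) | (g8 << 8) | b8;
}

// Truncating pack back to 565; alpha and low channel bits are dropped.
inline uint16_t pack565(uint32_t c) {
    uint32_t r8 = (c >> 16) & 0xFF, g8 = (c >> 8) & 0xFF, b8 = c & 0xFF;
    return static_cast<uint16_t>(((r8 >> 3) << 11) | ((g8 >> 2) << 5) | (b8 >> 3));
}

// Widen 4 destination pixels serially into a stack buffer, run a
// 4-pixel 8888 kernel on them, then narrow the results back serially.
template <typename Kernel>
void blend4_565(uint16_t dst[4], const uint32_t src[4], Kernel kernel) {
    std::array<uint32_t, 4> tmp;  // 8888 copies living on the stack
    for (std::size_t i = 0; i < 4; ++i) tmp[i] = expand565(dst[i]);
    kernel(tmp.data(), src);      // e.g. an Sk4px-style xfermode
    for (std::size_t i = 0; i < 4; ++i) dst[i] = pack565(tmp[i]);
}
```

Bit replication maps 0 to 0 and the channel maximum to 255. Truncating back to 565 recovers the original value exactly, so a kernel that leaves pixels unchanged round-trips losslessly.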
78f55fc4bf4c1a8f7cb6395e9a30e16e5e4622a8	22-Jul-2015	mtklein <mtklein@google.com>	Revert of 565 support for SIMD xfermodes (patchset #4 id:60001 of https://codereview.chromium.org/1245673002/) Reason for revert: NEON 565 gold images have gone ugly. This is what I get for writing and testing SSE and just writing NEON. E.g. colortype_xfermodes, dstreadshuffle, bigbitmaprect, pictures, textbloblooper, aaxfermodes (only Plus) Original issue's description: > 565 support for SIMD xfermodes > > This uses the most basic approach possible: > - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors. > - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially. > > Clearly, we can optimize these loads and stores. That's a TODO. > > The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling. > > The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster. > > This will cause minor GM diffs, but I don't think any layout test changes. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0 > > Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066 TBR=msarett@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1248893004 /external/skia/src/opts/Sk4px_none.h
860dcaa2ddfdadc050af4f943a84a9d499315066	22-Jul-2015	mtklein <mtklein@chromium.org>	565 support for SIMD xfermodes This uses the most basic approach possible: - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors. - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially. Clearly, we can optimize these loads and stores. That's a TODO. The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling. The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster. This will cause minor GM diffs, but I don't think any layout test changes. BUG=skia: Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0 Review URL: https://codereview.chromium.org/1245673002 /external/skia/src/opts/Sk4px_none.h
4be181e304d2b280c6801bd13369cfba236d1a66	14-Jul-2015	mtklein <mtklein@chromium.org>	3-15% speedup to HardLight / Overlay xfermodes. While investigating my bug (skia:4052) I saw this TODO and figured it'd make me feel better about an otherwise unsuccessful investigation. This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls to 2 cheap comparisons and 1 expensive div255(). NEON speeds up by a more modest ~3%. BUG=skia: Review URL: https://codereview.chromium.org/1230663005 /external/skia/src/opts/Sk4px_none.h
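This rewrite picks a branch while the values are still unscaled numerators, then divides only once. The following minimal scalar sketch shows that idea and assumes the premultiplied HardLight formula from the W3C Compositing and Blending spec. `hardlight` and `div255` are hypothetical names. A SIMD version would replace the ternary with a per-lane mask select, and this sketch does not reproduce the commit's exact comparison count.

```cpp
#include <cassert>
#include <cstdint>

// Exactly rounded x / 255 for 0 <= x <= 255*255 (hypothetical helper).
inline int div255(int x) { return (x + 127) / 255; }

// One color channel of premultiplied HardLight with 8-bit inputs and output.
// s, d: source/destination color; sa, da: their alphas.
inline uint8_t hardlight(int s, int sa, int d, int da) {
    assert(0 <= s && s <= sa && sa <= 255 && 0 <= d && d <= da && da <= 255);
    int both = s * (255 - da) + d * (255 - sa);    // terms shared by both branches
    int dark = 2 * s * d;                          // multiply half (2s <= sa)
    int lite = sa * da - 2 * (da - d) * (sa - s);  // screen half (2s > sa)
    int pick = (2 * s <= sa) ? dark : lite;        // cheap comparison selects a numerator
    return static_cast<uint8_t>(div255(both + pick));  // single expensive divide
}
```

Every term is a product of two 8-bit values, so the selected numerator is exactly 255 times the unrounded 8-bit result. One rounded division at the end is therefore no less accurate than dividing each branch separately, and it is cheaper.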
059ac00446404506a46cd303db15239c7aae49d5	22-Jun-2015	mtklein <mtklein@chromium.org>	Update some Sk4px APIs. Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones. - SkAlpha / SkPMColor constructors become static factories. - Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious. - Rename fastMulDiv255Round() to approxMulDiv255() to stress its approximate-ness over its speed. Drop Round for the same reason as above... we should always round. - Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts. - Use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative. - MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do. BUG=skia: Review URL: https://codereview.chromium.org/1202013002 /external/skia/src/opts/Sk4px_none.h
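The new name stresses that the fast path is approximate. To show the trade-off, the sketch below compares an exactly rounded a*b/255 with a cheaper shift-based estimate. The estimate (a*b + a) >> 8 is an assumption chosen for illustration, because this log does not give approxMulDiv255()'s formula. `mul_wide` stands in for the 8-bit x 8-bit -> 16-bit operator*(), and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// 8-bit x 8-bit -> 16-bit widening multiply (the role given to operator*()).
inline uint16_t mul_wide(uint8_t a, uint8_t b) {
    return static_cast<uint16_t>(a * b);
}

// Exactly rounded a*b/255.
inline uint8_t mul_div255(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((mul_wide(a, b) + 127) / 255);
}

// Assumed cheap estimate: (a*b + a) >> 8, a shift instead of a divide.
inline uint8_t approx_mul_div255(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((mul_wide(a, b) + a) >> 8);
}

// Largest absolute difference between the two over all 8-bit inputs.
inline int max_approx_error() {
    int worst = 0;
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            int diff = std::abs(int(approx_mul_div255(uint8_t(a), uint8_t(b))) -
                                int(mul_div255(uint8_t(a), uint8_t(b))));
            if (diff > worst) worst = diff;
        }
    }
    return worst;
}
```

The estimate is exact when either input is 0 or 255, so fully transparent and fully opaque values stay stable. For all other inputs it is never more than 1 away from the rounded result.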
aa999cb952753d373c03befa7fce8c4700ef9308	23-May-2015	mtklein <mtklein@chromium.org>	Everyone gets a namespace {}. If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different SIMD flags, that could cause different definitions of the same method. Normally that's moot, because all the code inlines, but in Debug it tends not to. So in Debug, the linker picks one definition for us. That breaks _someone_. Wrapping everything in a namespace {} keeps the definitions separate. Tested locally, it fixes this bug. BUG=skia:3861 This code is not yet enabled in Chrome, so shouldn't affect the roll. NOTREECHECKS=true Review URL: https://codereview.chromium.org/1154523004 /external/skia/src/opts/Sk4px_none.h
0135a41e095a433414e21e37b277dab7dcbec373	15-May-2015	mtklein <mtklein@chromium.org>	Sk4px: Difference and Exclusion. This will cause minor (off-by-one) diffs due to a little lost precision: colortype_xfermodes, mixed_xfermodes, xfermodes2, xfermodeimagefilter, xfermodes3, xfermodes. Desktop: Xfermode_Difference_aa 9.77ms -> 7.32ms (0.75x); Xfermode_Exclusion_aa 8.49ms -> 6.21ms (0.73x); Xfermode_Difference 17ms -> 7.54ms (0.44x); Xfermode_Exclusion 13.5ms -> 5.09ms (0.38x). N7: Xfermode_Difference_aa 32.2ms -> 27.6ms (0.86x); Xfermode_Difference 43.9ms -> 32ms (0.73x); Xfermode_Exclusion_aa 40.5ms -> 26.7ms (0.66x); Xfermode_Exclusion 71.5ms -> 23.9ms (0.33x). This wraps up the xfermodes implemented in Sk4f. BUG=skia: Review URL: https://codereview.chromium.org/1141213002 /external/skia/src/opts/Sk4px_none.h
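For reference, here is what the two modes compute per channel. This minimal scalar sketch assumes the premultiplied forms of the W3C Compositing and Blending definitions, with every product divided by 255:

- Difference: s + d - 2*min(s*da, d*sa)
- Exclusion: s + d - 2*s*d
- Result alpha, as in SrcOver: sa + da - sa*da

The names are hypothetical. The rounding is not meant to reproduce Sk4px's off-by-one behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Exactly rounded x / 255 for non-negative x (hypothetical helper).
inline int div255(int x) { return (x + 127) / 255; }

inline uint8_t clamp255(int v) {
    return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Premultiplied 8-bit channels: s/d are source/destination color, sa/da their alphas.
inline uint8_t difference(int s, int sa, int d, int da) {
    assert(s <= sa && d <= da);
    return clamp255(s + d - 2 * div255(std::min(s * da, d * sa)));
}

inline uint8_t exclusion(int s, int d) {
    return clamp255(s + d - 2 * div255(s * d));
}

// Both modes produce the SrcOver alpha.
inline uint8_t result_alpha(int sa, int da) {
    return clamp255(sa + da - div255(sa * da));
}
```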
8a90edc2a58a4f8a4b4da73eb08e943be09538c0	13-May-2015	mtklein <mtklein@chromium.org>	Sk4px: alphas() and Load[24]Alphas(). alphas() extracts the 4 alphas from an existing Sk4px as another Sk4px. LoadNAlphas() constructs an Sk4px from N packed alphas. In both cases, we end up with 4x repeated alphas aligned with their pixels. alphas(): A0 R0 G0 B0 A1 R1 G1 B1 A2 R2 G2 B2 A3 R3 G3 B3 -> A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3; Load4Alphas(): A0 A1 A2 A3 -> A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3; Load2Alphas(): A0 A1 -> A0 A0 A0 A0 A1 A1 A1 A1 0 0 0 0 0 0 0 0. This is a 5-10% speedup for AA on Intel, and a wash on ARM. AA is still mostly dominated by the final lerp. alphas() isn't used yet, but it's similar enough to Load[24]Alphas() that it was easier to write all at once. BUG=skia: Review URL: https://codereview.chromium.org/1138333003 /external/skia/src/opts/Sk4px_none.h
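The byte movement in these diagrams can be modeled in scalar C++. This sketch only reproduces the data layout the commit shows, with alpha first in each 4-byte pixel and unused lanes zeroed. `Px16`, `alphas` and `load_alphas` are hypothetical names, and the sketch says nothing about how Sk4px performs the shuffles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four pixels as 16 bytes, ordered as in the diagram: A0 R0 G0 B0 A1 R1 G1 B1 ...
using Px16 = std::array<uint8_t, 16>;

// alphas(): copy each pixel's alpha byte into all four of that pixel's lanes.
inline Px16 alphas(const Px16& px) {
    Px16 out{};
    for (std::size_t p = 0; p < 4; ++p)
        for (std::size_t c = 0; c < 4; ++c)
            out[4 * p + c] = px[4 * p];  // alpha is byte 0 of each pixel here
    return out;
}

// LoadNAlphas(): spread N packed alphas over N pixels; the rest stay 0.
template <std::size_t N>
inline Px16 load_alphas(const uint8_t (&a)[N]) {
    static_assert(N <= 4, "at most four pixels fit in 16 bytes");
    Px16 out{};  // zero-initialized, so lanes past pixel N-1 remain 0
    for (std::size_t p = 0; p < N; ++p)
        for (std::size_t c = 0; c < 4; ++c)
            out[4 * p + c] = a[p];
    return out;
}
```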
d2ffd36eb62e99abe2920369d1e040954cc2044f	12-May-2015	mtklein <mtklein@chromium.org>	Sk4px Xfermode_SrcOver: SSE: 2.08ms -> 2.03ms (~2% faster) NEON: my N5 is noisy, but there appears to be no perf change BUG=skia: Review URL: https://codereview.chromium.org/1132273004 /external/skia/src/opts/Sk4px_none.h