History log of /external/skia/src/opts/opts_check_x86.cpp
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
4e5a758f3832594cf1828d367496b5a80bcab8ee 05-Jan-2016 reed <reed@chromium.org> remove shadeSpan16 from shader

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1556003003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1556003003
/external/skia/src/opts/opts_check_x86.cpp
e735f163035c4df79024d3bfee8adbe4e4409e1e 05-Jan-2016 mtklein <mtklein@google.com> Revert of Try using std::call_once (patchset #1 id:1 of https://codereview.chromium.org/1550893002/ )

Reason for revert:
Can't use on XP. :(

Original issue's description:
> Try using std::call_once
>
> Now that we've got std library support, perhaps we should start using it.
> This CL acts as a little canary, and may help fix the linked bug.
>
> I'm not really sure what's going on in the linked bug, but using
> std::call_once over homegrown atomics has to be the right answer...
>
> BUG=chromium:418041
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1550893002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Going to land this ahead of review while the tree is quiet to see how it rolls.
> TBR=herb@google.com
>
> Committed: https://skia.googlesource.com/skia/+/8895b72f789e5dc8bb99cb9727875439005fc919

TBR=herb@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:418041

Review URL: https://codereview.chromium.org/1552333003
/external/skia/src/opts/opts_check_x86.cpp
8895b72f789e5dc8bb99cb9727875439005fc919 29-Dec-2015 mtklein <mtklein@chromium.org> Try using std::call_once

Now that we've got std library support, perhaps we should start using it.
This CL acts as a little canary, and may help fix the linked bug.

I'm not really sure what's going on in the linked bug, but using
std::call_once over homegrown atomics has to be the right answer...

BUG=chromium:418041
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1550893002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Going to land this ahead of review while the tree is quiet to see how it rolls.
TBR=herb@google.com

Review URL: https://codereview.chromium.org/1550893002
/external/skia/src/opts/opts_check_x86.cpp
6c59d80858f453a426df9b07fdf3a8cc01e0b906 09-Sep-2015 mtklein <mtklein@chromium.org> Port uses of SkLazyPtr to SkOncePtr.

This gives SkOncePtr a non-trivial destructor that uses std::default_delete
by default. This is overrideable, as seen in SkColorTable.

SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.

BUG=skia:

No public API changes.
TBR=reed@google.com

Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c

Review URL: https://codereview.chromium.org/1322933005
/external/skia/src/opts/opts_check_x86.cpp
2ac6793efc9b33f6104f9c39810bee5714bdc208 09-Sep-2015 mtklein <mtklein@google.com> Revert of Port uses of SkLazyPtr to SkOncePtr. (patchset #7 id:110001 of https://codereview.chromium.org/1322933005/ )

Reason for revert:
Breaks Chrome roll.

obj/skia/ext/skia_chrome.skia_memory_dump_provider.o
does not have -I include/private on its include path, but transitively includes SkMessageBus.h.

Original issue's description:
> Port uses of SkLazyPtr to SkOncePtr.
>
> This gives SkOncePtr a non-trivial destructor that uses std::default_delete
> by default. This is overrideable, as seen in SkColorTable.
>
> SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.
>
> BUG=skia:
>
> No public API changes.
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c

TBR=herb@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1334523002
/external/skia/src/opts/opts_check_x86.cpp
a1254acdb344174e761f5061c820559dab64a74c 09-Sep-2015 mtklein <mtklein@chromium.org> Port uses of SkLazyPtr to SkOncePtr.

This gives SkOncePtr a non-trivial destructor that uses std::default_delete
by default. This is overrideable, as seen in SkColorTable.

SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.

BUG=skia:

No public API changes.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1322933005
/external/skia/src/opts/opts_check_x86.cpp
96fcdcc219d2a0d3579719b84b28bede76efba64 27-Aug-2015 halcanary <halcanary@google.com> Style Change: NULL->nullptr
DOCS_PREVIEW= https://skia.org/?cl=1316233002

Review URL: https://codereview.chromium.org/1316233002
/external/skia/src/opts/opts_check_x86.cpp
385fe4d4b62d7d1dd76116dd570df3290a2f487b 26-Aug-2015 halcanary <halcanary@google.com> Style Change: SkNEW->new; SkDELETE->delete
DOCS_PREVIEW= https://skia.org/?cl=1316123003

Review URL: https://codereview.chromium.org/1316123003
/external/skia/src/opts/opts_check_x86.cpp
4977983510028712528743aa877f6da83781b381 10-Aug-2015 mtklein <mtklein@chromium.org> Sk4px blit mask.

Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups).
I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results.

NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range.

The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version.

Somewhat fantastically, I see no pixel diffs on GMs or SKPs.

I will be following up with more CLs for the other procs called by SkBlitMask.
BUG=skia:

Review URL: https://codereview.chromium.org/1278253003
/external/skia/src/opts/opts_check_x86.cpp
d029ded92d409a004f2096c78f5a99b524206481 04-Aug-2015 mtklein <mtklein@chromium.org> Port morphology to SkOpts.

Nothing too fancy.

Direction enums become enum classes so they don't get all confused. An
alternative is to create one single Direction enum that both blur and
morphology opts use.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1267343004
/external/skia/src/opts/opts_check_x86.cpp
dce5ce4276e2825efc6d8c4daa819c965794cd12 04-Aug-2015 mtklein <mtklein@chromium.org> Port SkBlurImage opts to SkOpts.

+268 -535 lines

I also rearranged the code a little bit to encapsulate itself better,
mostly replacing static helper functions with lambdas. This also
let me merge the SSE2 and SSE4.1 code paths.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1264103004
/external/skia/src/opts/opts_check_x86.cpp
7eb0945af254d376df11475150d184623104cf93 31-Jul-2015 mtklein <mtklein@chromium.org> Port SkUtils opts to SkOpts.

With this new arrangement, the benefits of inlining sk_memset16/32 have changed.

On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower.
On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster.
On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster.

We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization.

So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1270573002
/external/skia/src/opts/opts_check_x86.cpp
58fd2c8af4fc4debbf3c9f3cf7783971982bd6dc 27-Jul-2015 mtklein <mtklein@chromium.org> Remove sk_memcpy32

It's only implemented on x86, where the exisiting benchmark says memcpy() is
faster for all cases:

Timer overhead: 24ns
curr/maxrss loops min median mean max stddev samples config bench
10/10 MB 1 35.9µs 36.2µs 36.2µs 36.6µs 1% ▁▂▄▅▅▃█▄▄▅ nonrendering sk_memcpy32_100000
10/10 MB 13 2.27µs 2.28µs 2.28µs 2.29µs 0% █▄▃▅▃▁▃▅▁▄ nonrendering sk_memcpy32_10000
11/11 MB 677 91.6ns 95.9ns 94.5ns 99.4ns 3% ▅▅▅▅▅█▁▁▁▁ nonrendering sk_memcpy32_1000
11/11 MB 1171 20ns 20.9ns 21.3ns 23.4ns 6% ▁▁▇▃▃▃█▇▃▃ nonrendering sk_memcpy32_100
11/11 MB 1952 14ns 14ns 14.3ns 15.2ns 3% ▁▁██▁▁▁▁▁▁ nonrendering sk_memcpy32_10
11/11 MB 5 33.6µs 33.7µs 34.1µs 35.2µs 2% ▆▇█▁▁▁▁▁▁▁ nonrendering memcpy32_memcpy_100000
11/11 MB 18 2.12µs 2.22µs 2.24µs 2.39µs 5% ▂█▄▇█▄▇▁▁▁ nonrendering memcpy32_memcpy_10000
11/11 MB 1112 87.3ns 87.3ns 89.1ns 93.7ns 3% ▄██▄▁▁▁▁▁▁ nonrendering memcpy32_memcpy_1000
11/11 MB 2124 12.8ns 13.3ns 13.5ns 14.8ns 6% ▁▁▁█▃▃█▇▃▃ nonrendering memcpy32_memcpy_100
11/11 MB 3077 9ns 9.41ns 9.52ns 10.2ns 4% ▃█▁█▃▃▃▃▃▃ nonrendering memcpy32_memcpy_10

(Why? One fewer thing to port to SkOpts.)

BUG=skia:4117

Review URL: https://codereview.chromium.org/1256763003
/external/skia/src/opts/opts_check_x86.cpp
54f313ccb8eba45954fe0a45092433cbf739b053 20-Jul-2015 mtklein <mtklein@chromium.org> Clean up dead xfermode opts code.

Now that SK_SUPPORT_LEGACY_XFERMODES is unused, tons of code becomes dead.

Nothing is needed in opts/ anymore for x86.
We still do runtime NEON detection, which just duplicates Sk4pxXfermode.

TBR=reed@google.com

BUG=skia:

Review URL: https://codereview.chromium.org/1230023011
/external/skia/src/opts/opts_check_x86.cpp
4525f438a84147989de080e89cf1d97bc46eb546 07-May-2015 mtklein <mtklein@chromium.org> We don't use boxBlurY.

Also noticed nobody sets SK_DISABLE_BLUR_DIVISION_OPTIMIZATION.

BUG=skia:

Review URL: https://codereview.chromium.org/1134513003
/external/skia/src/opts/opts_check_x86.cpp
95cc012ccaea20f372893ae277ea0a8a6339d094 28-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Mips-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/opts/opts_check_x86.cpp
641c3ff7c680ef7935d47d2e68f8301acc79e3de 27-Apr-2015 mtklein <mtklein@google.com> Revert of De-proc Color32 (patchset #5 id:80001 of https://codereview.chromium.org/1104183004/)

Reason for revert:
duh

Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b
>
> Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1102363006
/external/skia/src/opts/opts_check_x86.cpp
d65dc0cedd5b50dd407b6ff8fdc39123f11511cc 27-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/opts/opts_check_x86.cpp
498856ebc6f22d7018071bd6696756d7cd077ab8 27-Apr-2015 mtklein <mtklein@google.com> Revert of De-proc Color32 (patchset #4 id:60001 of https://codereview.chromium.org/1104183004/)

Reason for revert:
MIPS

Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1108163002
/external/skia/src/opts/opts_check_x86.cpp
376e9bc206b69d9190f38dfebb132a8769bbd72b 27-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/opts/opts_check_x86.cpp
70840cbd898df67f603987213164c798415d76bf 20-Mar-2015 henrik.smiding <henrik.smiding@intel.com> Replace SSE optimization of Color32A_D565

Adds an SSE2 version of the Color32A_D565 function, to replace
the existing SSE4 version. Also does some minor cleanup.

Performance improvement in the following Skia benchmarks.
Measured on Atom Silvermont:
Xfermode_SrcOver - x3
luma_colorfilter_large - x4.6
luma_colorfilter_small - x2
tablebench - ~15%
chart_bw - ~10%

Measured on Corei7 Haswell:
luma_colorfilter_large running SSE2 - x2
luma_colorfilter_large running SSE4 - x2.3

Also improves performance in WPS Office application and 2D subtest of 0xbenchmark on Android.

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Review URL: https://codereview.chromium.org/923523002
/external/skia/src/opts/opts_check_x86.cpp
edeccc58606e0421a1ae275e391ee4347c6f52f6 25-Feb-2015 mtklein <mtklein@chromium.org> Clean up ColorRectProc plumbing.

We've always been using the portable ColorRect32, so we don't need the
ColorRectProc plumbing.

Furthermore, ColorRect32 doesn't seem to be very important (we're only using
it in the opaque case, which our row-by-row procs already specialize for).
Remove that too.

If we find we want specialization for really narrow rects again, let's put it in
blitRect() directly. It's pretty unlikely we're going to get platform-specific
speedup for blits to non-contiguous memory.

My local SKP comparison is +- 3%... most neutral I've ever seen.

BUG=skia:

Review URL: https://codereview.chromium.org/959873002
/external/skia/src/opts/opts_check_x86.cpp
3704df347a585fdde78117fcedbee3c289b63e33 25-Feb-2015 henrik.smiding <henrik.smiding@intel.com> Remove SSE2 ColorRect32 code/files

Removes the disabled SSE2 optimization of ColorRect32 and deletes
the two files containing the code.
Measured on both Core Haswell and Atom Silvermont, and only got
some miniscule improvement compared to the default implementation.

Also tried to write a new, ultimate, version of this optimization,
but only got ~5% improvement on ColorRect32-heavy tests.

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Review URL: https://codereview.chromium.org/957433002
/external/skia/src/opts/opts_check_x86.cpp
4e65473069b3a9382292577875366bd86c5777c8 10-Feb-2015 henrik.smiding <henrik.smiding@intel.com> Add SSE optimization of Color32A_D565

Adds an SSE4.1 version of the Color32A_D565 function.

Performance improvement in the following benchmarks:
Xfermode_SrcOver - ~100%
luma_colorfilter_large - ~150%
luma_colorfilter_small - ~60%
tablebench - ~10%
chart_bw - ~10%
(Measured on a Atom Silvermont core)

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Review URL: https://codereview.chromium.org/892623002
/external/skia/src/opts/opts_check_x86.cpp
4bf1ce2709f5f4a4850d9b04b3213be732cbdf89 02-Feb-2015 stephana <stephana@google.com> Revert of Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #1 id:1 of https://codereview.chromium.org/873553003/)

Reason for revert:
Reverted the wrong CL.

Original issue's description:
> Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #16 id:300001 of https://codereview.chromium.org/874863002/)
>
> Reason for revert:
> This causes a bug on the 'hittestpath' GM on MacMini 4,1
>
> See:
>
> https://gold.skia.org/#/triage/hittestpath?head=0
>
> for details.
>
> Original issue's description:
> > SSE4 opaque blend using intrinsics instead of assembly.
> >
> > Since we had such a hard time with the assembly versions of this blit (to the
> > point that we have them completely disabled everywhere), I thought I'd take
> > a shot at writing a version of the blit using intrinsics.
> >
> > The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> > to skip the blend when the 16 src pixels we consider each loop are all opaque
> > or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> > all those alphas.
> >
> > It's worth looking to see if we can backport this type of logic to SSE2 using
> > _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
> >
> > My local performance testing doesn't show this to be an unambiguous win
> > (there are probably microbenchmarks and SKPs where we'd be better off just
> > powering through the blend rather than looking at alphas), but the potential
> > does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.)
> >
> > DM says it draws pixel perfect compare to the old code.
> >
> > Microbenchmarks:
> > bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x
> > bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
> > bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
> > bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x
> > bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x
> > bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
> > bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x
> > bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x
> > bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x
> > bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
> > bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
> > bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x
> > bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x
> > bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
> > bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
> > bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x
> > bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x
> >
> > Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> > the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> > be a decent predictor of real-world impact.
> >
> > BUG=chromium:399842
> >
> > Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785
> >
> > CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot
> >
> > Committed: https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493
>
> TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
> NOPRESUBMIT=true
> NOTREECHECKS=true
> NOTRY=true
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/4988891a1173cd405bf1c1dd3a3668c451f45e4c

TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842

Review URL: https://codereview.chromium.org/894083002
/external/skia/src/opts/opts_check_x86.cpp
4988891a1173cd405bf1c1dd3a3668c451f45e4c 02-Feb-2015 stephana <stephana@google.com> Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #16 id:300001 of https://codereview.chromium.org/874863002/)

Reason for revert:
This causes a bug on the 'hittestpath' GM on MacMini 4,1

See:

https://gold.skia.org/#/triage/hittestpath?head=0

for details.

Original issue's description:
> SSE4 opaque blend using intrinsics instead of assembly.
>
> Since we had such a hard time with the assembly versions of this blit (to the
> point that we have them completely disabled everywhere), I thought I'd take
> a shot at writing a version of the blit using intrinsics.
>
> The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> to skip the blend when the 16 src pixels we consider each loop are all opaque
> or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> all those alphas.
>
> It's worth looking to see if we can backport this type of logic to SSE2 using
> _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
>
> My local performance testing doesn't show this to be an unambiguous win
> (there are probably microbenchmarks and SKPs where we'd be better off just
> powering through the blend rather than looking at alphas), but the potential
> does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.)
>
> DM says it draws pixel perfect compare to the old code.
>
> Microbenchmarks:
> bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x
> bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
> bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
> bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x
> bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x
> bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
> bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x
> bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x
> bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x
> bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
> bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
> bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x
> bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x
> bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
> bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
> bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x
> bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x
>
> Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> be a decent predictor of real-world impact.
>
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785
>
> CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493

TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842

Review URL: https://codereview.chromium.org/873553003
/external/skia/src/opts/opts_check_x86.cpp
6dbfb21a6c88af6d94e8c823c3ad559f1a41b493 27-Jan-2015 mtklein <mtklein@chromium.org> SSE4 opaque blend using intrinsics instead of assembly.

Since we had such a hard time with the assembly versions of this blit (to the
point that we have them completely disabled everywhere), I thought I'd take
a shot at writing a version of the blit using intrinsics.

The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
to skip the blend when the 16 src pixels we consider each loop are all opaque
or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
all those alphas.

It's worth looking to see if we can backport this type of logic to SSE2 using
_mm_movemask_epi8, or up to 32 pixels at a time using AVX.

My local performance testing doesn't show this to be an unambiguous win
(there are probably microbenchmarks and SKPs where we'd be better off just
powering through the blend rather than looking at alphas), but the potential
does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.)

DM says it draws pixel perfect compare to the old code.

Microbenchmarks:
bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x
bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x
bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x
bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x
bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x
bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x
bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x
bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x
bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x
bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x

Running over our ~70 SKP web page captures, this looks like we spend 0.7x
the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
be a decent predictor of real-world impact.

BUG=chromium:399842

Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785

CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot

Review URL: https://codereview.chromium.org/874863002
/external/skia/src/opts/opts_check_x86.cpp
2d80dd264704984bca4c2915f31ef427033b0b9b 26-Jan-2015 bungeman <bungeman@google.com> Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #14 id:260001 of https://codereview.chromium.org/874863002/)

Reason for revert:
This kills Mac 10.6 bots.

FAILED: c++ -MMD -MF obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o.d -DSK_INTERNAL -DSK_GAMMA_SRGB -DSK_GAMMA_APPLY_TO_A8 -DSK_SCALAR_TO_FLOAT_EXCLUDED -DSK_ALLOW_STATIC_GLOBAL_INITIALIZERS=1 -DSK_SUPPORT_GPU=1 -DSK_SUPPORT_OPENCL=0 -DSK_FORCE_DISTANCE_FIELD_TEXT=0 -DSK_BUILD_FOR_MAC -DSK_CRASH_HANDLER -DSK_DEVELOPER=1 -I../../src/core -I../../src/utils -I../../include/c -I../../include/config -I../../include/core -I../../include/pathops -I../../include/pipe -I../../include/utils/mac -I../../include/effects -O0 -gdwarf-2 -mmacosx-version-min=10.6 -arch x86_64 -mssse3 -Wall -Wextra -Winit-self -Wpointer-arith -Wsign-compare -Wno-unused-parameter -Wno-invalid-offsetof -msse4.1 -c ../../src/opts/SkBlitRow_opts_SSE4.cpp -o obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o
../../src/opts/SkBlitRow_opts_SSE4.cpp:15:27: warning: x86intrin.h: No such file or directory
../../src/opts/SkBlitRow_opts_SSE4.cpp: In function 'void S32A_Opaque_BlitRow32_SSE4(SkPMColor*, const SkPMColor*, int, U8CPU)':
../../src/opts/SkBlitRow_opts_SSE4.cpp:40: error: '_mm_testz_si128' was not declared in this scope
../../src/opts/SkBlitRow_opts_SSE4.cpp:45: error: '_mm_testc_si128' was not declared in this scope

Original issue's description:
> SSE4 opaque blend using intrinsics instead of assembly.
>
> Since we had such a hard time with the assembly versions of this blit (to the
> point that we have them completely disabled everywhere), I thought I'd take
> a shot at writing a version of the blit using intrinsics.
>
> The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> to skip the blend when the 16 src pixels we consider each loop are all opaque
> or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> all those alphas.
>
> It's worth looking to see if we can backport this type of logic to SSE2 using
> _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
>
> My local performance testing doesn't show this to be an unambiguous win
> (there are probably microbenchmarks and SKPs where we'd be better off just
> powering through the blend rather than looking at alphas), but the potential
> does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.)
>
> DM says it draws pixel perfect compare to the old code.
>
> Microbenchmarks:
> bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x
> bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
> bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
> bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x
> bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x
> bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
> bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x
> bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x
> bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x
> bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
> bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
> bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x
> bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x
> bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
> bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
> bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x
> bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x
>
> Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> be a decent predictor of real-world impact.
>
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785

TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842

Review URL: https://codereview.chromium.org/874033004
/external/skia/src/opts/opts_check_x86.cpp
04bc91b972417038fecfa87c484771eac2b9b785 26-Jan-2015 mtklein <mtklein@chromium.org> SSE4 opaque blend using intrinsics instead of assembly.

Since we had such a hard time with the assembly versions of this blit (to the
point that we have them completely disabled everywhere), I thought I'd take
a shot at writing a version of the blit using intrinsics.

The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
to skip the blend when the 16 src pixels we consider each loop are all opaque
or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
all those alphas.

It's worth looking to see if we can backport this type of logic to SSE2 using
_mm_movemask_epi8, or up to 32 pixels at a time using AVX.

My local performance testing doesn't show this to be an unambiguous win
(there are probably microbenchmarks and SKPs where we'd be better off just
powering through the blend rather than looking at alphas), but the potential
does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.)

DM says it draws pixel perfect compare to the old code.

Microbenchmarks:
bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x
bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x
bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x
bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x
bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x
bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x
bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x
bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x
bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x
bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x

Running over our ~70 SKP web page captures, this looks like we spend 0.7x
the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
be a decent predictor of real-world impact.

BUG=chromium:399842

Review URL: https://codereview.chromium.org/874863002
/external/skia/src/opts/opts_check_x86.cpp
94443b8718fa194bb2077324eade66bd68f99b54 20-Jan-2015 reed <reed@google.com> remove dead code after HQ change

BUG=skia:

Review URL: https://codereview.chromium.org/845303005
/external/skia/src/opts/opts_check_x86.cpp
a7f11918d92621507f35b228a290f05dcaf0f4b6 13-Jan-2015 reed <reed@google.com> rename blitrow::proc and add (uncalled) hook for colorproc16

BUG=skia:3302

Review URL: https://codereview.chromium.org/847443003
/external/skia/src/opts/opts_check_x86.cpp
72b0c05fc19eb159c0adbf20ea87ded68c827ca3 10-Dec-2014 qiankun.miao <qiankun.miao@intel.com> Add SSSE3 acceleration for S32_D16_filter_DX

With this CL, related nanobench can be improved for 565 config.
bitmap_BGRA_8888_update_scale_bilerp 76.1us -> 46.7us 0.61x
bitmap_BGRA_8888_scale_bilerp 78.7us -> 47us 0.6x
bitmap_BGRA_8888_update_volatile_scale_bilerp 82.7us -> 46.9us 0.57x

BUG=skia:

Review URL: https://codereview.chromium.org/788853002
/external/skia/src/opts/opts_check_x86.cpp
60f3c657cc0235650b630be78105fc47d37385e7 04-Dec-2014 qiankun.miao <qiankun.miao@intel.com> Add SSSE3 acceleration for S32_D16_filter_DXDY

With this CL, related nanobench can be improved for 565 config.
bitmap_BGRA_8888_scale_rotate_bilerp 115us -> 70.5us 0.61x
bitmap_BGRA_8888_update_volatile_scale_rotate_bilerp 115us -> 70.5us 0.61x
bitmap_BGRA_8888_update_scale_rotate_bilerp 112us -> 68us 0.6x

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/45a05780867a06b9f8a8d5240cf6c5d5a2c15a35

Review URL: https://codereview.chromium.org/773753002
/external/skia/src/opts/opts_check_x86.cpp
9da051ad417c13eb562c91ef6286f92dba2ae232 04-Dec-2014 jam <jam@chromium.org> Revert of Add SSSE3 acceleration for S32_D16_filter_DXDY (patchset #3 id:40001 of https://codereview.chromium.org/773753002/)

Reason for revert:
breaks build when not using SSE3, since the two method definitions differ in parameter types (typo)

Original issue's description:
> Add SSSE3 acceleration for S32_D16_filter_DXDY
>
> With this CL, related nanobench can be improved for 565 config.
> bitmap_BGRA_8888_scale_rotate_bilerp 115us -> 70.5us 0.61x
> bitmap_BGRA_8888_update_volatile_scale_rotate_bilerp 115us -> 70.5us 0.61x
> bitmap_BGRA_8888_update_scale_rotate_bilerp 112us -> 68us 0.6x
>
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/45a05780867a06b9f8a8d5240cf6c5d5a2c15a35

TBR=mtklein@google.com,qkmiao@gmail.com,qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/761103003
/external/skia/src/opts/opts_check_x86.cpp
45a05780867a06b9f8a8d5240cf6c5d5a2c15a35 03-Dec-2014 qiankun.miao <qiankun.miao@intel.com> Add SSSE3 acceleration for S32_D16_filter_DXDY

With this CL, related nanobench can be improved for 565 config.
bitmap_BGRA_8888_scale_rotate_bilerp 115us -> 70.5us 0.61x
bitmap_BGRA_8888_update_volatile_scale_rotate_bilerp 115us -> 70.5us 0.61x
bitmap_BGRA_8888_update_scale_rotate_bilerp 112us -> 68us 0.6x

BUG=skia:

Review URL: https://codereview.chromium.org/773753002
/external/skia/src/opts/opts_check_x86.cpp
c09e2af17fab03d3d36c20e5201a560c3e4c233e 13-Oct-2014 mtklein <mtklein@chromium.org> Fix race in supports_simd().

Local statics are not thread safe in Chrome. Use an SkLazyPtr instead.

See https://code.google.com/p/chromium/issues/detail?id=418041

BUG=418041

Review URL: https://codereview.chromium.org/655573002
/external/skia/src/opts/opts_check_x86.cpp
f31507b240a27b7f7f3db9a92db4d4076e69c2ee 04-Sep-2014 qiankun.miao <qiankun.miao@intel.com> Enable highQualityFilter_SSE2

With SSE2, bitmap_BGRA_8888_A_scale_rotate_bicubic gains about 40%
performance improvement on desktop i7-3770.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/b381fa10d8079c58928058bb8a6db32b39f05e51

CQ_EXTRA_TRYBOTS=tryserver.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot
R=mtklein@google.com, humper@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/525283002
/external/skia/src/opts/opts_check_x86.cpp
c09b2c49a30fed981283f97476e885b40e53f094 03-Sep-2014 mtklein <mtklein@google.com> Revert of Enable highQualityFilter_SSE2 (patchset #1 id:1 of https://codereview.chromium.org/525283002/)

Reason for revert:
Color order looks wrong on Macs:

Before: http://chromium-skia-gm.commondatastorage.googleapis.com/gm/bitmap-64bitMD5/filterbitmap_image_mandrill_16.png/12823183142873462143.png

After: http://chromium-skia-gm.commondatastorage.googleapis.com/gm/bitmap-64bitMD5/filterbitmap_image_mandrill_16.png/13683040204546320578.png

Original issue's description:
> Enable highQualityFilter_SSE2
>
> With SSE2, bitmap_BGRA_8888_A_scale_rotate_bicubic gains about 40%
> performance improvement on desktop i7-3770.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/b381fa10d8079c58928058bb8a6db32b39f05e51

R=humper@google.com, qiankun.miao@intel.com
TBR=humper@google.com, qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/539523002
/external/skia/src/opts/opts_check_x86.cpp
b381fa10d8079c58928058bb8a6db32b39f05e51 03-Sep-2014 qiankun.miao <qiankun.miao@intel.com> Enable highQualityFilter_SSE2

With SSE2, bitmap_BGRA_8888_A_scale_rotate_bicubic gains about 40%
performance improvement on desktop i7-3770.

BUG=skia:
R=mtklein@google.com, humper@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/525283002
/external/skia/src/opts/opts_check_x86.cpp
5f7f9d04dc3a2d2c3ef9d8f1703d8e13c2d15c6e 07-Jul-2014 henrik.smiding <henrik.smiding@intel.com> Add SSE4 version of BlurImage optimizations.

Adds an SSE4.1 version of the existing BlurImage optimizations.
Performance of blur_image_filter_* benchmarks show a 10-50%
improvement on Linux/Ubuntu Core i7.

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: https://skia.googlesource.com/skia/+/2830632ce93c97ed7647b13348365ea92e4ea665

R=mtklein@google.com, reed@chromium.org

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/366593004
/external/skia/src/opts/opts_check_x86.cpp
82cb86f613300aa26f858e8f46454b1a8a5e2314 07-Jul-2014 reed <reed@chromium.org> Revert of Add SSE4 version of BlurImage optimizations. (https://codereview.chromium.org/366593004/)

Reason for revert:
breaks linker on chrome

[04:36:09.966000] [503/5965] LIB obj\chrome\installer_util.lib
[04:36:10.466000] FAILED: C:\Users\chrome-bot\buildbot\third_party\depot_tools\python276_bin\python.exe gyp-win-tool link-with-manifests environment.x86 True skia.dll "C:\Users\chrome-bot\buildbot\third_party\depot_tools\python276_bin\python.exe gyp-win-tool link-wrapper environment.x86 False link.exe /nologo /IMPLIB:skia.dll.lib /DLL /OUT:skia.dll @skia.dll.rsp" 2 mt.exe rc.exe "obj\skia\skia.skia.dll.intermediate.manifest" obj\skia\skia.skia.dll.generated.manifest
[04:36:10.466000] skia.opts_check_x86.obj : error LNK2019: unresolved external symbol "bool __cdecl SkBoxBlurGetPlatformProcs_SSE4(void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int))" (?SkBoxBlurGetPlatformProcs_SSE4@@YA_NPAP6AXPBIHPAIHHHHH@Z222@Z) referenced in function "bool __cdecl SkBoxBlurGetPlatformProcs(void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int))" (?SkBoxBlurGetPlatformProcs@@YA_NPAP6AXPBIHPAIHHHHH@Z222@Z)
[04:36:10.466000]
[04:36:10.466000] skia.dll : fatal error LNK1120: 1 unresolved externals

Original issue's description:
> Add SSE4 version of BlurImage optimizations.
>
> Adds an SSE4.1 version of the existing BlurImage optimizations.
> Performance of blur_image_filter_* benchmarks show a 10-50%
> improvement on Linux/Ubuntu Core i7.
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/2830632ce93c97ed7647b13348365ea92e4ea665

R=mtklein@google.com, henrik.smiding@intel.com
TBR=henrik.smiding@intel.com, mtklein@google.com
NOTREECHECKS=true
NOTRY=true

Author: reed@chromium.org

Review URL: https://codereview.chromium.org/375503003
/external/skia/src/opts/opts_check_x86.cpp
2830632ce93c97ed7647b13348365ea92e4ea665 04-Jul-2014 henrik.smiding <henrik.smiding@intel.com> Add SSE4 version of BlurImage optimizations.

Adds an SSE4.1 version of the existing BlurImage optimizations.
Performance of blur_image_filter_* benchmarks show a 10-50%
improvement on Linux/Ubuntu Core i7.

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

R=mtklein@google.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/366593004
/external/skia/src/opts/opts_check_x86.cpp
4f96ab36180489748f4e9bb249d773414ef0d6cb 27-Jun-2014 humper <humper@google.com> Refactor bitmap scaler to make it easier to migrate rest of chrome to use it

Previously, the set of platform-specific function pointers to do fast convolution (e.g., neon, SSE) were passed in a structure to the scaler.

I refactored this so that the scaler fills in these function pointers after it's called, so the caller doesn't have to worry about it.

R=mtklein@google.com
TBR=mtklein
NOTRY=True

Author: humper@google.com

Review URL: https://codereview.chromium.org/354193002
/external/skia/src/opts/opts_check_x86.cpp
3bb195ef0d9691384027d7b61b0b8ef8379aaf5d 27-Jun-2014 henrik.smiding <henrik.smiding@intel.com> Add SSE4 optimization of S32A_Opaque_Blitrow

Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.

Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.

bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/289473009
/external/skia/src/opts/opts_check_x86.cpp
db6346a5b1d76b150920a5dce47c436fee9c6281 18-Jun-2014 mtklein <mtklein@google.com> Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/)

NOTREECHECKS=true
NOTRY=true

Reason for revert:
Valgrind bot's seeing this code use uninitialized memory, and it's somehow blocking our roll into Chrome too:

> ld: warning: could not create compact unwind for
S32A_Opaque_BlitRow32_SSE4_asm:
> stack subq instruction is too different from dwarf stack size
> [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework",
> POSTBUILDS
> FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A &&
> (export
> BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium
> Framework.framework/Versions/A"; export
> DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export
> EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium
> Framework.framework/Versions/A/Chromium Framework"; export
> FULL_PRODUCT_NAME="Chromium Framework.framework"; export
> INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist";
> export
LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium
> Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export
> PRODUCT_NAME="Chromium Framework"; export
> PRODUCT_TYPE=com.apple.product-type.framework; export
>
SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk;
> export
>
SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome;
> export SOURCE_ROOT="${SRCROOT}"; export
> TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export TEMP_DIR="${TMPDIR}"; export
UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium
> Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium
> Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py
> "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1"
> "--branding=Chromium" && ln -fns Versions/Current/Libraries
> "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" &&
> tools/build/mac/verify_order _ChromeMain
> "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf
> 'Chromium Framework.framework') && exit $G) && touch "Chromium
> Framework.framework"
> tools/build/mac/verify_order: unordered symbols in
> /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium
> Framework.framework/Versions/A/Chromium Framework:
> S32A_Opaque_BlitRow32_SSE4_asm
> _S32A_Opaque_BlitRow32_SSE4_asm
> ninja: build stopped: subcommand failed.

Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
>
> Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9

R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/336413007
/external/skia/src/opts/opts_check_x86.cpp
b5c281e1e06af3be804309877de1dac6145686b9 17-Jun-2014 henrik.smiding <henrik.smiding@intel.com> Add SSE4 optimization of S32A_Opaque_Blitrow

Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.

Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.

bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/289473009
/external/skia/src/opts/opts_check_x86.cpp
9ccabf7f94500934b4b501492764e6f41449dd27 16-Jun-2014 mtklein <mtklein@google.com> Revert of Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots. (https://codereview.chromium.org/331193004/)

Reason for revert:
Experiment is over: disabling SSSE3 is a 25-50% perf regression for bitmap scaling on every machine we've got.

Original issue's description:
> Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.
>
> BUG=372232
>
> Committed: https://skia.googlesource.com/skia/+/f1e5a04832e4d350f9ebf5d556c6d3897345f883

R=reed@google.com, mtklein@chromium.org
TBR=mtklein@chromium.org, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=372232

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/332213005
/external/skia/src/opts/opts_check_x86.cpp
f1e5a04832e4d350f9ebf5d556c6d3897345f883 16-Jun-2014 mtklein <mtklein@chromium.org> Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.

BUG=372232
R=reed@google.com, mtklein@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/331193004
/external/skia/src/opts/opts_check_x86.cpp
71804cc3035e7ba6c307a5273abfc74971c88e85 05-Jun-2014 jvanverth <jvanverth@google.com> Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/)

Reason for revert:
Buildbot failures on Mac 10.6 and Mac 10.7.

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=reed@google.com
NOTRY=True

Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

Author: jvanverth@google.com

Review URL: https://codereview.chromium.org/311053009
/external/skia/src/opts/opts_check_x86.cpp
e2527b147679b0c43019fae7d59cc3777d2d097e 05-Jun-2014 henrik.smiding <henrik.smiding@intel.com> Add SSE4 optimization of S32A_Opaque_Blitrow

Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.

Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.

bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/289473009
/external/skia/src/opts/opts_check_x86.cpp
cba73780bbd12fd254229517aec04fcbf0b64b52 29-May-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> replace config() with colorType()

BUG=skia:
R=robertphillips@google.com

Author: reed@google.com

Review URL: https://codereview.chromium.org/303543009

git-svn-id: http://skia.googlecode.com/svn/trunk@14959 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/opts_check_x86.cpp
f0ea77a3630e6d1c01d83aa5430b3780da9e88b6 21-May-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> SSE2 implementation of memcpy32

With SSE2 version memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp
has about 30% performance improvement. Here are the data on desktop
i7-3770.
before:
bitmap_scale_filter_90_90 8888: cmsecs = 2.01
bitmaprect_FF_filter_trans 8888: cmsecs = 3.61
bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57
bitmaprect_FF_filter_identity 8888: cmsecs = 3.53
bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53
bitmap_4444_update 8888: cmsecs = 4.84
bitmap_4444_update_volatile 8888: cmsecs = 4.81
bitmap_4444 8888: cmsecs = 4.81
after:
bitmap_scale_filter_90_90 8888: cmsecs = 1.83
bitmaprect_FF_filter_trans 8888: cmsecs = 2.36
bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36
bitmaprect_FF_filter_identity 8888: cmsecs = 2.60
bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63
bitmap_4444_update 8888: cmsecs = 3.30
bitmap_4444_update_volatile 8888: cmsecs = 3.30
bitmap_4444 8888: cmsecs = 3.29

BUG=skia:
R=mtklein@google.com, reed@google.com, bsalomon@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/285313002

git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/opts_check_x86.cpp
ce4402c2fbae8a2bc73b79dc28e0fb9ea9d82c88 12-May-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Improved x86 SSE build and run-time checks.

Replaces the current build/run-time checks for SSE level in
opts_check_x86.cpp with a simpler and more future-proof version.
Also adds SSE versions 4.1 and 4.2 to the config file.

Author: henrik.smiding@intel.com

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: http://code.google.com/p/skia/source/detail?r=14644

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/272503006

git-svn-id: http://skia.googlecode.com/svn/trunk@14693 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/opts_check_x86.cpp
443c0a6d61536ad826f29da852042fca26be0c40 08-May-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Revert of Improved x86 SSE build and run-time checks. (https://codereview.chromium.org/272503006/)

Reason for revert:
Windows builders breaking. :(

Original issue's description:
> Improved x86 SSE build and run-time checks.
>
> Replaces the current build/run-time checks for SSE level in
> opts_check_x86.cpp with a simpler and more future-proof version.
> Also adds SSE versions 4.1 and 4.2 to the config file.
>
> Author: henrik.smiding@intel.com
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: http://code.google.com/p/skia/source/detail?r=14644

R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, reed@google.com, tomhudson@google.com
NOTREECHECKS=true
NOTRY=true

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/277593004

git-svn-id: http://skia.googlecode.com/svn/trunk@14646 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/opts_check_x86.cpp
2656b1dbb37906ee9bf77e5bc64806ab60956645 08-May-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Improved x86 SSE build and run-time checks.

Replaces the current build/run-time checks for SSE level in
opts_check_x86.cpp with a simpler and more future-proof version.
Also adds SSE versions 4.1 and 4.2 to the config file.

Author: henrik.smiding@intel.com

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/272503006

git-svn-id: http://skia.googlecode.com/svn/trunk@14644 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/opts_check_x86.cpp
8c4953c6f176469ad287c3270ab146e292b23bad 30-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Cleanup of SSE optimization files.

General cleanup of optimization files for x86/SSEx.
Renamed the opts_check_SSE2.cpp file to _x86, since it's not specific
to SSE2. Commented out the ColorRect32 optimization, since it's
disabled anyway, to make it more visible.
Also fixed a lot of indentation, inclusion guards, spelling,
copyright headers, braces, whitespace, and sorting of includes.

Author: henrik.smiding@intel.com

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/264603002

git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/opts_check_x86.cpp