History log of /external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
Revision Date Author Comments
52fe583a7b21ee1cb04e95b05db9946be899b26d 13-Mar-2017 Florin Malita <fmalita@chromium.org> Remove SK_SUPPORT_LEGACY_BROKEN_LERP support

Chromium change landed.

BUG=chromium:696216

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: I3e67392b0fdad8c5a3ad256e4f190123dff6c846
Reviewed-on: https://skia-review.googlesource.com/9551
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Florin Malita <fmalita@chromium.org>
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
4b19b403944dd4ab70507c0dea2aa3d38f145eac 07-Mar-2017 Mike Klein <mtklein@chromium.org> Revert "Fix new IT blocks ARMv8"

This reverts commit 90165c2269bc33ca3d6aaa73d528194daf48da4e.

Reason for revert: Skia and Chrome iOS builds broken.

../../third_party/skia/include/private/SkFixed.h:106:41: error: invalid output constraint '+t' in asm
asm("vcvt.s32.f32 %0, %0, #16": "+t"(x));


Original change's description:
> Fix new IT blocks ARMv8
>
> ARMv8 specifies that an IT block should be followed by only one 16-bit instruction.
> * SkFloatToFix is back to a C implementation that mirrors the assembly code.
>
> * S32A_D565_Opaque_neon switched the usage of the temporary 'ip' register to let
> the compiler choose what is best in the context of the IT block. And replaced
> 'keep_dst' by 'ip' where low register or high register does not matter.
>
> BUG=skia:
>
> CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
>
> Change-Id: I096759841c972e9300c1d0293bc80d3c3ff2747b
> Reviewed-on: https://skia-review.googlesource.com/9340
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
>

TBR=mtklein@chromium.org,amaury.leleyzour@arm.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: Idbcbda88039066153e1c34233d43366ab114fd01
Reviewed-on: https://skia-review.googlesource.com/9332
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
90165c2269bc33ca3d6aaa73d528194daf48da4e 06-Mar-2017 Amaury Le Leyzour <amaury.leleyzour@arm.com> Fix new IT blocks ARMv8

ARMv8 specifies that an IT block should be followed by only one 16-bit instruction.
* SkFloatToFix is back to a C implementation that mirrors the assembly code.

* S32A_D565_Opaque_neon switched the usage of the temporary 'ip' register to let
the compiler choose what is best in the context of the IT block. And replaced
'keep_dst' by 'ip' where low register or high register does not matter.
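
As a rough illustration of what a C-style replacement for `vcvt.s32.f32 %0, %0, #16` could look like: the instruction converts a float to 16.16 fixed point, rounding toward zero and saturating, with NaN mapping to 0. The helper below is a sketch under those assumptions. `float_to_fixed16` is a hypothetical name, and the actual SkFloatToFix code from this change is not shown in this log.

```cpp
#include <cstdint>

// Hypothetical helper, not Skia's actual implementation.
constexpr int32_t float_to_fixed16(float x) {
    if (x != x) return 0;                                     // NaN -> 0
    const double scaled = static_cast<double>(x) * 65536.0;   // 16 fractional bits
    if (scaled >= 2147483647.0) return INT32_MAX;             // saturate high
    if (scaled <= -2147483648.0) return INT32_MIN;            // saturate low
    return static_cast<int32_t>(scaled);                      // truncates toward zero
}

static_assert(float_to_fixed16(1.5f) == 0x18000, "1.5 -> 0x00018000");
static_assert(float_to_fixed16(-0.5f) == -32768, "-0.5 -> -0x8000");
static_assert(float_to_fixed16(1e10f) == INT32_MAX, "large values saturate");
```

Writing the conversion in plain C++ leaves instruction selection to the compiler, which sidesteps hand-written IT blocks entirely.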

BUG=skia:

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: I096759841c972e9300c1d0293bc80d3c3ff2747b
Reviewed-on: https://skia-review.googlesource.com/9340
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
40254c2c2dc28a34f96294d5a1ad94a99b0be8a6 05-Aug-2016 lsalzman <lsalzman@mozilla.com> SkBlendARGB32 and S32[A]_Blend_BlitRow32 are currently formulated as: SkAlphaMulQ(src, src_scale) + SkAlphaMulQ(dst, dst_scale), which boils down to ((src*src_scale)>>8) + ((dst*dst_scale)>>8). In particular, note that the intermediate precision is discarded before the two parts are added together, causing the final result to possibly be inaccurate.

In Firefox, we use SkCanvas::saveLayer in combination with a backdrop that initializes the layer to the background. When this is blended back onto the background using transparency, where the source and destination pixel colors are the same, the resulting color after the blend is not preserved due to the lost precision mentioned above. In cases where this operation is performed repeatedly, this causes noticeable differences in color, as evidenced in this downstream Firefox bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1200684

In the test-case in the downstream report, essentially it does blend(src=0xFF2E3338, dst=0xFF2E3338, scale=217), which gives the result 0xFF2E3237, while we would expect to get back 0xFF2E3338.

This problem goes away if the blend is instead reformulated to effectively do (src*src_scale + dst*dst_scale)>>8, which keeps the intermediate precision during the addition before shifting it off.

This modifies the blending operations thusly. The performance should remain mostly unchanged, or possibly improve slightly, so there should be no real downside to doing this, with the benefit of making the results more accurate. Without this, it is currently unsafe for Firefox to blend a layer back onto itself that was initialized with a copy of its background.
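
The two formulations can be compared per channel with a small sketch. The function names are hypothetical, and the scale factors 218 and 39 are an assumption: the message only says scale=217, and these are the values that reproduce the quoted result when plugged into the old formula. The real code works on packed channel pairs rather than one channel at a time.

```cpp
#include <cstdint>

// Old style: truncate each product, then add.
constexpr uint32_t blend_truncate_each(uint32_t src, uint32_t dst,
                                       uint32_t src_scale, uint32_t dst_scale) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const uint32_t s = (src >> shift) & 0xFFu;
        const uint32_t d = (dst >> shift) & 0xFFu;
        const uint32_t c = ((s * src_scale) >> 8) + ((d * dst_scale) >> 8);
        out |= (c & 0xFFu) << shift;
    }
    return out;
}

// New style: add the full-precision products, then shift once.
constexpr uint32_t blend_sum_then_shift(uint32_t src, uint32_t dst,
                                        uint32_t src_scale, uint32_t dst_scale) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const uint32_t s = (src >> shift) & 0xFFu;
        const uint32_t d = (dst >> shift) & 0xFFu;
        const uint32_t c = (s * src_scale + d * dst_scale) >> 8;
        out |= (c & 0xFFu) << shift;
    }
    return out;
}

// Assumed scales (218, 39); see the note above.
static_assert(blend_truncate_each(0xFF2E3338u, 0xFF2E3338u, 218, 39) == 0xFF2E3237u,
              "precision lost before the add");
static_assert(blend_sum_then_shift(0xFF2E3338u, 0xFF2E3338u, 218, 39) == 0xFF2E3338u,
              "color preserved");
```

For the blue channel, 0x38*218 and 0x38*39 leave remainders of 176 and 136 below the shift; discarding them separately loses a carry that the single shift keeps.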

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2097883002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

[mtklein adds...]
No public API changes.
TBR=reed@google.com

Review-Url: https://codereview.chromium.org/2097883002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
b4a7dc99b1a01cdd5c0cd5913b630436ca696210 23-Mar-2016 mtklein <mtklein@chromium.org> Port S32A_opaque blit row to SkOpts.

This should be a pixel-for-pixel (i.e. bug-for-bug) port.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1820313002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1820313002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
f61088321c0906f62431c029173c1a7a70856ec7 16-Dec-2015 mtklein <mtklein@chromium.org> count is an int, so constrain it to a 32-bit w-register.

This piece of code is already 64-bit only, so we don't need to think about ARMv7.

Hopefully this shuts up the warnings. They were harmless.

If this doesn't work (it's a relatively new modifier, so maybe some compilers barf), an alternative is to cast count to a size_t.
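
For readers unfamiliar with the modifier, here is a minimal sketch of constraining an int operand to a w-register in AArch64 inline assembly. `subtract_eight` is a hypothetical function written for illustration; the real loop in this file is not shown in this log.

```cpp
// Hypothetical example, not the code from this change.
static inline int subtract_eight(int count) {
#if defined(__aarch64__)
    // %w[...] selects the 32-bit w-view of the register holding `count`,
    // matching the width of an int operand.
    asm("sub %w[count], %w[count], #8" : [count] "+r"(count));
#else
    count -= 8;  // portable fallback so the sketch builds on other targets
#endif
    return count;
}
```

Without the `w` modifier the operand prints as a 64-bit x-register, which is the kind of width mismatch compilers tend to warn about.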

BUG=skia:4686
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1527123003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1527123003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
4a37d08382a16717cde52c3d2687b021c5413464 10-Sep-2015 mtklein <mtklein@chromium.org> Port SkBlitRow::Color32 to SkOpts.

This was a pre-SkOpts attempt that we can bring under its wing now.

This should be a perf no-op, deo volente.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1314863006
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
96fcdcc219d2a0d3579719b84b28bede76efba64 27-Aug-2015 halcanary <halcanary@google.com> Style Change: NULL->nullptr
DOCS_PREVIEW= https://skia.org/?cl=1316233002

Review URL: https://codereview.chromium.org/1316233002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
3848427d884b72114854c8eef9662691f23fae7b 07-Aug-2015 mtklein <mtklein@chromium.org> The compiler can generate smulbb perfectly well nowadays.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1273203002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
e683e810a35e4ea91b00104590e15a56cb35ad39 06-Aug-2015 mtklein <mtklein@chromium.org> Purge non-NEON ARM code.

As I begin to wade in here, it's nice to remove as much code as possible.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1277953002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
059ac00446404506a46cd303db15239c7aae49d5 22-Jun-2015 mtklein <mtklein@chromium.org> Update some Sk4px APIs.

Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones.

- SkAlpha / SkPMColor constructors become static factories.
- Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious.
- Rename fastMulDiv255Round() to approxMulDiv255() to stress its approximate-ness over its speed. Drop Round for the same reason as above... we should always round.
- Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts.
- use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative.
- MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do.

BUG=skia:

Review URL: https://codereview.chromium.org/1202013002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
051a51ec32549078a65ade19e42df0b9d9f9a3ff 22-May-2015 mtklein <mtklein@chromium.org> Re-proc SkBlitRow::Color32 for ARM.

This is a spiritual revert of http://crrev.com/1104183004.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004

Review URL: https://codereview.chromium.org/1145283003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
3b8d9f776e476746b38c0dca674e497d7c4b13cb 22-May-2015 mtklein <mtklein@google.com> Revert of Re-proc SkBlitRow::Color32 for ARM. (patchset #3 id:40001 of https://codereview.chromium.org/1145283003/)

Reason for revert:
http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_rel_device_ninja/builds/70016/steps/compile%20%28with%20patch%29/logs/stdio

Original issue's description:
> Re-proc SkBlitRow::Color32 for ARM.
>
> This is a spiritual revert of http://crrev.com/1104183004.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1157633003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
4e13a23d8f720e17660f26657b45b89fe4339004 22-May-2015 mtklein <mtklein@chromium.org> Re-proc SkBlitRow::Color32 for ARM.

This is a spiritual revert of http://crrev.com/1104183004.

BUG=skia:

Review URL: https://codereview.chromium.org/1145283003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
95cc012ccaea20f372893ae277ea0a8a6339d094 28-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Mips-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
641c3ff7c680ef7935d47d2e68f8301acc79e3de 27-Apr-2015 mtklein <mtklein@google.com> Revert of De-proc Color32 (patchset #5 id:80001 of https://codereview.chromium.org/1104183004/)

Reason for revert:
duh

Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b
>
> Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1102363006
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
d65dc0cedd5b50dd407b6ff8fdc39123f11511cc 27-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
498856ebc6f22d7018071bd6696756d7cd077ab8 27-Apr-2015 mtklein <mtklein@google.com> Revert of De-proc Color32 (patchset #4 id:60001 of https://codereview.chromium.org/1104183004/)

Reason for revert:
MIPS

Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1108163002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
376e9bc206b69d9190f38dfebb132a8769bbd72b 27-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
a4a0aeb74808a0860f3e94588d0ceb0da9fed386 21-Apr-2015 mtklein <mtklein@google.com> Revert of Convert Color32 code to perfect blend. (patchset #6 id:100001 of https://codereview.chromium.org/1098913002/)

Reason for revert:
Xfermode_SrcOver not looking encouraging. Up to 50% regressions.

https://perf.skia.org/#3242

Original issue's description:
> Convert Color32 code to perfect blend.
>
> Before we commit to blend_256_round_alt, let's make sure blend_perfect is
> really slower in practice (i.e. regresses on perf.skia.org).
>
> blend_perfect is really the most desirable algorithm if we can afford it. Not
> only is it correct, but it's easy to think about and break into correct pieces:
> for instance, its div255() doesn't require any coordination with the multiply.
>
> This looks like a 30% hit according to microbenches. That said, microbenches
> said my previous change would be a 20-25% perf improvement, but it didn't end
> up showing a significant effect at a high level.
>
> As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt
> (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a
> bunch of overdraw.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/61221e7f87a99765b0e034020e06bb018e2a08c2

TBR=reed@google.com,fmalita@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1083923006
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
61221e7f87a99765b0e034020e06bb018e2a08c2 20-Apr-2015 mtklein <mtklein@chromium.org> Convert Color32 code to perfect blend.

Before we commit to blend_256_round_alt, let's make sure blend_perfect is
really slower in practice (i.e. regresses on perf.skia.org).

blend_perfect is really the most desirable algorithm if we can afford it. Not
only is it correct, but it's easy to think about and break into correct pieces:
for instance, its div255() doesn't require any coordination with the multiply.
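
A scalar sketch of what such a blend might look like, assuming premultiplied src and an exact rounded divide by 255. This is written from the description above and is not copied from tests/BlendTest.cpp; the helper names are assumptions.

```cpp
#include <cstdint>

// Exact round(x / 255) for x in [0, 255*255].
constexpr uint8_t div255_round(uint16_t x) {
    const uint32_t biased = x + 128u;
    return static_cast<uint8_t>((biased + (biased >> 8)) >> 8);
}

// src is assumed premultiplied by srcAlpha, so only dst needs scaling.
constexpr uint8_t blend_perfect_sketch(uint8_t dst, uint8_t src, uint8_t srcAlpha) {
    const uint16_t product = static_cast<uint16_t>(dst * (255 - srcAlpha));
    return static_cast<uint8_t>(src + div255_round(product));
}

static_assert(blend_perfect_sketch(255, 0, 0) == 255, "transparent src keeps dst");
static_assert(blend_perfect_sketch(255, 200, 255) == 200, "opaque src replaces dst");
static_assert(blend_perfect_sketch(128, 64, 128) == 128, "midpoint rounds to nearest");
```

The divide is a standalone function of the 16-bit product, which is the sense in which it needs no coordination with the multiply.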

This looks like a 30% hit according to microbenches. That said, microbenches
said my previous change would be a 20-25% perf improvement, but it didn't end
up showing a significant effect at a high level.

As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt
(exactly what we'd expect), and one off-by-3 in a GM that looks like it has a
bunch of overdraw.

BUG=skia:

Review URL: https://codereview.chromium.org/1098913002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
afe2ffb8ba5e7362a2ee6f4e1540c9ab22df2c1e 17-Apr-2015 mtklein <mtklein@chromium.org> Rework SSE and NEON Color32 algorithms to be more correct and faster.

This algorithm changes the blend math, guarded by SK_LEGACY_COLOR32_MATH. The new math is more correct: it's never off by more than 1, and correct in all the interesting 0x00 and 0xFF edge cases, where the old math was never off by more than 2, and not always correct on the edges.

If you look at tests/BlendTest.cpp, the old code was using the `blend_256_plus1_trunc` algorithm, while the new code uses `blend_256_round_alt`. Neither uses `blend_perfect`, which is about ~35% slower than `blend_256_round_alt`.

This will require an unfathomable number of rebaselines, first to Skia, then to Blink when I remove the guard.

I plan to follow up with some integer SIMD abstractions that can unify these two implementations into a single algorithm. This was originally what I was working on here, but the correctness gains seem to be quite compelling. The only places these two algorithms really differ greatly now is the kernel function, and even there they can really both be expressed abstractly as:
- multiply 8-bits and 8-bits producing 16-bits
- add 16-bits to 16-bits, returning the top 8 bits.
All the constants are the same, except SSE is a little faster to keep 8 16-bit inverse alphas, NEON's a little faster to keep 8 8-bit inverse alphas. I may need to take this small speed win back to unify the two.
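
The two abstract steps can be modeled in scalar form as below. This is only a sketch: the function names are hypothetical, and the real implementations operate on 8 lanes at a time with SSE or NEON. The second helper mirrors what a single lane of NEON's vaddhn computes (add, then keep the high half).

```cpp
#include <cstdint>

// Step 1: 8-bit x 8-bit -> 16-bit multiply (255*255 = 65025 fits in 16 bits).
constexpr uint16_t widening_mul(uint8_t a, uint8_t b) {
    return static_cast<uint16_t>(a * b);
}

// Step 2: 16-bit + 16-bit add (wrapping), returning the top 8 bits of the sum.
constexpr uint8_t add_take_high(uint16_t a, uint16_t b) {
    return static_cast<uint8_t>(static_cast<uint16_t>(a + b) >> 8);
}

static_assert(widening_mul(255, 255) == 65025, "widest product");
static_assert(add_take_high(0x1234, 0x0100) == 0x13, "high byte of the sum");
static_assert(add_take_high(0x12FF, 0x0001) == 0x13, "carry from the low byte counts");
```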

We should expect a ~25% speedup on Intel (mostly from unrolling to 8 pixels) and a ~20% speedup on ARM (mostly from using vaddhn to add `color`, round, and narrow back down to 8-bit all in one instruction).

(I am probably missing several more related bugs here.)
BUG=skia:3738,skia:420,chromium:111470

Review URL: https://codereview.chromium.org/1092433002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
3f55eed73f5af405909c2c10bff179d80526d423 02-Apr-2015 Mike Klein <mtklein@google.com> I suspect S32A_D565_Opaque_neon for Daisy problems.

I don't see any color-order handling logic in the 32-bit code.

BUG=skia:1843

CQ_EXCLUDE_TRYBOTS=client.skia.compile:Build-Win-MSVC-x86-Debug-Trybot,Build-Win-MSVC-x86_64-Debug-Trybot
R=mtklein@google.com

Review URL: https://codereview.chromium.org/1051683003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
b1629c5d9eb6236429cca1502d3bf5fcda8e3406 17-Feb-2015 kui.zheng <kui.zheng@arm.com> Fix skia bug 2845

The fast blur path (DoubleRowBoxBlur_NEON) should not be called
when kernelsize is 1; otherwise, uint16x8_t resultPixels will overflow.

BUG=skia:2845
R=senorblanco@chromium.org

Review URL: https://codereview.chromium.org/587543003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
402448d6818cab9d7b7633a0c18fcf574c915357 29-Jan-2015 mlee <mlee@nvidia.com> skia: blend32_16_row for neon version

This adds a NEON implementation of blend32_16_row
for aarch32 and aarch64.

For performance measurement,
blend32_16_row is called in the following nanobench tests:
- Xfermode_SrcOver
- tablebench
- rotated_rects_bw_alternating_transparent_and_opaque_srcover
- rotated_rects_bw_changing_transparent_srcover
- rotated_rects_bw_same_transparent_srcover
- luma_colorfilter_large
- luma_colorfilter_small
- chart_bw

The following two tests in particular show a perf increase; the others look
similar.
Each test was run twice.

1) Xfermode_SrcOver
<org>
- D/skia ( 2000): 3M 57 17.3µs 17.4µs 17.4µs 17.7µs 1%
█▃▂▃▂▂▂▁▃▂ 565 Xfermode_SrcOver
- D/skia ( 1915): 3M 70 13.5µs 16.9µs 16.7µs 18.8µs 9%
▆█▄▅█▁▅▅▆▄ 565 Xfermode_SrcOver

<new>
- D/skia ( 2000): 3M 8 11.6µs 11.8µs 12.1µs 14.4µs 7%
▃█▁▁▂▁▁▁▂▂ 565 Xfermode_SrcOver
- D/skia ( 2004): 3M 62 10.3µs 12.9µs 13µs 15.2µs 11%
█▅▅▆▁▅▅▅▇▃ 565 Xfermode_SrcOver

2) luma_colorfilter_large
<org>
- D/skia ( 2000): 159M 8 136µs 136µs 136µs 139µs 1%
█▃▁▂▁▁▁▁▁▁ 565 luma_colorfilter_large
- D/skia ( 1915): 158M 2 135µs 177µs 182µs 269µs 22%
▆▃█▁▁▃▃▃▃▃ 565 luma_colorfilter_large

<new>
- D/skia ( 2000): 157M 5 84.2µs 85.3µs 87.5µs 110µs 9%
█▁▂▁▁▁▁▁▁▁ 565 luma_colorfilter_large
- D/skia ( 2004): 159M 6 84.7µs 110µs 112µs 144µs 18%
█▄▇▁▁▄▃▄▄▆ 565 luma_colorfilter_large

Review URL: https://codereview.chromium.org/847363002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
a7f11918d92621507f35b228a290f05dcaf0f4b6 13-Jan-2015 reed <reed@google.com> rename blitrow::proc and add (uncalled) hook for colorproc16

BUG=skia:3302

Review URL: https://codereview.chromium.org/847443003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
fa115bd4543631244f3b9accb3541b28f4222a96 22-Aug-2014 mtklein <mtklein@chromium.org> Disable Neon optimization of bad S32A/D565 blend.

BUG=skia:2797

Committed: https://skia.googlesource.com/skia/+/84cab93186fbe3e87d931fea73cb31b70ff5017b

R=mtklein@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/497823002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
84cab93186fbe3e87d931fea73cb31b70ff5017b 22-Aug-2014 mtklein <mtklein@chromium.org> Disable Neon optimization of bad S32A/D565 blend.

BUG=skia:2797
R=mtklein@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/497823002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
5b2c2c6fd09752641b14766678d62fe50b4e3ef3 22-Aug-2014 reed <reed@google.com> disable neon proc that is triggering asserts

BUG=skia:2845
R=mtklein@google.com

Author: reed@google.com

Review URL: https://codereview.chromium.org/498733002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
ea13afff6e46d8a969611cdd56c996bfb05a27c1 12-Aug-2014 thakis <thakis@chromium.org> Let skia build with clang's integrated assembler.

1. vuzpq is a gcc instruction. Replace it with the equivalent vuzp
(see http://llvm.org/PR20423)

2. .func / .endfunc only have an effect with -gstabs, which we don't
use. As it's unused and clang doesn't support it, remove
.func / .endfunc (also see http://llvm.org/20424)

BUG=chromium:124610
R=mtklein@google.com

Author: thakis@chromium.org

Review URL: https://codereview.chromium.org/461693004
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
0be677d35c7ef1bf8a7a694d1838fa11333d1bec 09-Aug-2014 kevin.petit <kevin.petit@arm.com> Fix S32A_D565_Opaque for RGBA on arm64

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:2813
R=halcanary@google.com, djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/458453002
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
9f2ed6d4d85464bda359d04e646629d96b757362 07-Aug-2014 djsollen <djsollen@google.com> Disable suspect NEON function for 64-bit Android

R=halcanary@google.com, mtklein@google.com, kevin.petit@arm.com

Author: djsollen@google.com

Review URL: https://codereview.chromium.org/451633006
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
0d448303099037d56d8f97ce87d8cae05dd9fdab 26-Jun-2014 kevin.petit <kevin.petit@arm.com> ARM Skia NEON patches - 40 - arm64: S32A_D565_Opaque

Here are some perf results:

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -2.54% | -5.39% |
+-------+------------+------------+
| 2 | -0.66% | -2.08% |
+-------+------------+------------+
| 4 | -11.13% | 0.00% |
+-------+------------+------------+
| 8 | -5.79% | -1.30% |
+-------+------------+------------+
| 16 | 71.60% | 93.27% |
+-------+------------+------------+
| 64 | 30.99% | 57.35% |
+-------+------------+------------+
| 256 | 25.41% | 52.59% |
+-------+------------+------------+
| 1024 | 25.56% | 53.76% |
+-------+------------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=mtklein@google.com, djsollen@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/346843003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
ea6b46b6c06fd9e03b98f01b274733de1eeae89d 06-Jun-2014 kevin.petit <kevin.petit@arm.com> ARM Skia NEON patches - 39 - arm64 565 blitters

This enables all 565 blitters except S32A_D565_Opaque.

Here are some performance results:

S32_D565_Opaque:
================

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -18.37% | -13.04% |
+-------+------------+------------+
| 2 | -9.90% | -13.78% |
+-------+------------+------------+
| 4 | -8.28% | -6.77% |
+-------+------------+------------+
| 8 | 157.63% | 78.15% |
+-------+------------+------------+
| 16 | 72.67% | 44.81% |
+-------+------------+------------+
| 64 | 76.78% | 40.89% |
+-------+------------+------------+
| 256 | 73.85% | 36.05% |
+-------+------------+------------+
| 1024 | 75.73% | 36.70% |
+-------+------------+------------+

S32_D565_Blend:
===============

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -9.99% | -13.79% |
+-------+------------+------------+
| 2 | -9.17% | -6.74% |
+-------+------------+------------+
| 4 | -6.73% | -4.42% |
+-------+------------+------------+
| 8 | 163.31% | 112.82% |
+-------+------------+------------+
| 16 | 55.21% | 44.68% |
+-------+------------+------------+
| 64 | 54.09% | 41.99% |
+-------+------------+------------+
| 256 | 52.63% | 40.64% |
+-------+------------+------------+
| 1024 | 52.46% | 40.45% |
+-------+------------+------------+

S32A_D565_Blend:
================

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -5.88% | -6.06% |
+-------+------------+------------+
| 2 | -4.74% | -0.01% |
+-------+------------+------------+
| 4 | -5.42% | -3.03% |
+-------+------------+------------+
| 8 | 78.78% | 77.96% |
+-------+------------+------------+
| 16 | 98.19% | 79.61% |
+-------+------------+------------+
| 64 | 111.56% | 72.60% |
+-------+------------+------------+
| 256 | 113.80% | 69.96% |
+-------+------------+------------+
| 1024 | 114.42% | 70.85% |
+-------+------------+------------+

S32_D565_Opaque_Dither:
=======================

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.18% | -0.93% |
+-------+------------+------------+
| 2 | -2.43% | -2.04% |
+-------+------------+------------+
| 4 | -1.09% | -1.23% |
+-------+------------+------------+
| 8 | 184.89% | 136.53% |
+-------+------------+------------+
| 16 | 128.64% | 89.11% |
+-------+------------+------------+
| 64 | 132.68% | 100.98% |
+-------+------------+------------+
| 256 | 157.02% | 100.86% |
+-------+------------+------------+
| 1024 | 163.85% | 103.62% |
+-------+------------+------------+

S32_D565_Blend_Dither:
======================

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.87% | 0.01% |
+-------+------------+------------+
| 2 | -2.71% | 2.97% |
+-------+------------+------------+
| 4 | -2.20% | 0.28% |
+-------+------------+------------+
| 8 | 149.76% | 146.80% |
+-------+------------+------------+
| 16 | 85.69% | 95.77% |
+-------+------------+------------+
| 64 | 88.81% | 101.39% |
+-------+------------+------------+
| 256 | 97.32% | 107.22% |
+-------+------------+------------+
| 1024 | 98.08% | 115.71% |
+-------+------------+------------+

S32A_D565_Opaque_Dither:
========================

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -1.86% | 0.02% |
+-------+------------+------------+
| 2 | -0.58% | -1.52% |
+-------+------------+------------+
| 4 | -0.75% | 1.16% |
+-------+------------+------------+
| 8 | 240.74% | 155.16% |
+-------+------------+------------+
| 16 | 181.97% | 132.15% |
+-------+------------+------------+
| 64 | 203.11% | 136.48% |
+-------+------------+------------+
| 256 | 223.45% | 133.05% |
+-------+------------+------------+
| 1024 | 225.96% | 134.05% |
+-------+------------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/317193003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
3a2682a77f996f649de7699c9f7bee046c6d4f17 03-Jun-2014 mtklein <mtklein@chromium.org> SK_CPU_ARM --> SK_CPU_ARM32

That's what it means. It keeps confusing us as named today.

BUG=skia:
R=djsollen@google.com, mtklein@google.com, reed@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/314643004
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
866b95d65dfc01af372bbed206ec067e04c1f533 03-Jun-2014 kevin.petit <kevin.petit@arm.com> ARM Skia NEON patches - 38 - arm64 8888 blitters

Enable NEON on arm64 for most 8888 blitters

This patch enables NEON optimisation for the Color32, S32_Blend,
S32A_Opaque blitters on arm64.

Here are the perf improvements vs the existing code:

Color32:
========

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -2.39% | 23.78% |
+-------+------------+------------+
| 2 | -5.46% | 8.88% |
+-------+------------+------------+
| 4 | -4.74% | 4.89% |
+-------+------------+------------+
| 8 | 67.74% | 107.12% |
+-------+------------+------------+
| 16 | 40.03% | 101.20% |
+-------+------------+------------+
| 64 | 11.09% | 98.40% |
+-------+------------+------------+
| 256 | -2.20% | 74.81% |
+-------+------------+------------+
| 1024 | -4.28% | 78.90% |
+-------+------------+------------+

S32_Blend:
==========

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | 7.84% | -6.75% |
+-------+------------+------------+
| 2 | 28.95% | 39.77% |
+-------+------------+------------+
| 4 | 5.80% | 8.26% |
+-------+------------+------------+
| 8 | 1.35% | 33.80% |
+-------+------------+------------+
| 16 | -2.13% | 41.13% |
+-------+------------+------------+
| 64 | -4.91% | 42.84% |
+-------+------------+------------+
| 256 | -6.53% | 48.72% |
+-------+------------+------------+
| 1024 | -6.65% | 46.66% |
+-------+------------+------------+

S32A_Opaque:
============

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -7.51% | -19.06% |
+-------+------------+------------+
| 2 | -5.02% | -27.70% |
+-------+------------+------------+
| 4 | 15.38% | -21.66% |
+-------+------------+------------+
| 8 | -0.98% | 1.05% |
+-------+------------+------------+
| 16 | -7.35% | 3.34% |
+-------+------------+------------+
| 64 | 50.53% | 94.63% |
+-------+------------+------------+
| 256 | 71.17% | 164.10% |
+-------+------------+------------+
| 1024 | 79.58% | 197.60% |
+-------+------------+------------+

Signed-off-by: Kevin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/302283003
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
b53dd6cb4edb2523c064d52a25a026256a6ef403 30-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Remove the unused SkCachePreload_arm

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/263553008

git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
5376325c7c6a2ba42d2713587bda6c76ea1bd7d7 29-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 36 - Color32

Convert Color32 to intrinsics

This change is performance-neutral for high values of count and is
a big improvement for values smaller than 64.

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com, borenet@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/258173005

git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
95c2e5532b094add82b007bdfcd4c64050b6b366 09-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 22 - S32_D565_Blend

BlitRow565: new NEON version of S32_D565_Blend

This new implementation brings a good speedup in most cases and
gives exact results (removes one mismatch in gm).

Here are the benchmark results (speedup vs. existing S32A_D565_Blend):

+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | -26,7% | -27,5% |
+-------+-----------+------------+
| 2 | 0% | +53% |
+-------+-----------+------------+
| 4 | +38,3% | +26,5% |
+-------+-----------+------------+
| 8 | +10,9% | -4,5% |
+-------+-----------+------------+
| 16 | +18,2% | +1,6% |
+-------+-----------+------------+
| 64 | +22,3% | +8,75% |
+-------+-----------+------------+
| 256 | +12,3% | +11,2% |
+-------+-----------+------------+
| 1024 | +79,2% | +10,9% |
+-------+-----------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/181523002

git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
fe68eb6a4081f60caf665ec632180e6d7c26a169 25-Feb-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 25 - S32A_D565_Opaque_Dither clean/bugfix/speed

BlitRow565: S32A_D565_Opaque_Dither: some improvements

- Supports ARGB and ABGR
- Fewer magic numbers
- Reduced instruction count: 5-25% speedup
- Fixed indentation, removed some commented-out and useless code

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/177963003

git-svn-id: http://skia.googlecode.com/svn/trunk@13577 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
374ea4ee26b9d537c1b9635544105f915766f61b 21-Feb-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 12 - S32_Blend

Blitrow32: S32_Blend fix and small speed improvement

- the results are now identical to those of the C code
- the speed has improved, especially for small values of count

+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | +30% | +18% |
+-------+-----------+------------+
| 2 | 0 | 0 |
+-------+-----------+------------+
| 4 | - <1% | +14% |
+-------+-----------+------------+
| > 4 | -0.5..+5% | -0.5..+4% |
+-------+-----------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:

Committed: http://code.google.com/p/skia/source/detail?r=13532

R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/158973002

git-svn-id: http://skia.googlecode.com/svn/trunk@13543 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
5b92499f8ff96760ba54fdd76f48a8af2088b3f5 21-Feb-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Revert of ARM Skia NEON patches - 12 - S32_Blend (https://codereview.chromium.org/158973002/)

Reason for revert:
Breaking the build.

See http://108.170.219.164:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-Nexus4/builds/2966 (and others).

We are getting warnings that vsrc and vdst may be uninitialized. Please fix and resubmit.

Original issue's description:
> ARM Skia NEON patches - 12 - S32_Blend
>
> Blitrow32: S32_Blend fix and small speed improvement
>
> - the results are now identical to those of the C code
> - the speed has improved, especially for small values of count
>
> +-------+-----------+------------+
> | count | Cortex-A9 | Cortex-A15 |
> +-------+-----------+------------+
> | 1 | +30% | +18% |
> +-------+-----------+------------+
> | 2 | 0 | 0 |
> +-------+-----------+------------+
> | 4 | - <1% | +14% |
> +-------+-----------+------------+
> | > 4 | -0.5..+5% | -0.5..+4% |
> +-------+-----------+------------+
>
> Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
>
> BUG=skia:
>
> Committed: http://code.google.com/p/skia/source/detail?r=13532

R=djsollen@google.com, mtklein@google.com, kevin.petit@arm.com
TBR=djsollen@google.com, kevin.petit@arm.com, mtklein@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Author: scroggo@google.com

Review URL: https://codereview.chromium.org/175433002

git-svn-id: http://skia.googlecode.com/svn/trunk@13534 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
dfff2737f8ad3e945a4dcbe175380d4b2a91a260 21-Feb-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 12 - S32_Blend

Blitrow32: S32_Blend fix and small speed improvement

- the results are now identical to those of the C code
- the speed has improved, especially for small values of count

+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | +30% | +18% |
+-------+-----------+------------+
| 2 | 0 | 0 |
+-------+-----------+------------+
| 4 | - <1% | +14% |
+-------+-----------+------------+
| > 4 | -0.5..+5% | -0.5..+4% |
+-------+-----------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/158973002

git-svn-id: http://skia.googlecode.com/svn/trunk@13532 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
be233d63ca015c2991f4fe0802e4a31a71642062 13-Feb-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 27 - S32A_D565_Blend

BlitRow565: new intrinsics version of S32A_D565_Blend

This new version is basically a rewrite of the existing code with
a few speed and accuracy improvements. There is a switch to enable
pixel-perfect results at the cost of a fairly large drop in
performance (disabled in this patch).

Here are the benchmark results (speedup vs. existing code):

+-------+------------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+------------+------------+
| 1 | +103.6% | +12% |
+-------+------------+------------+
| 2 | +3.6% | +21.6% |
+-------+------------+------------+
| 4 | +0.8% | -0.8% |
+-------+------------+------------+
| 8 | +3.9% | -1% |
+-------+------------+------------+
| 16 | +14.7% | +5.7% |
+-------+------------+------------+
| 64 | +18.1% | +13.2% |
+-------+------------+------------+
| 256 | +16.3% | +27.4% |
+-------+------------+------------+
| 1024 | +78.2% | +17.4% |
+-------+------------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com, halcanary@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/156113005

git-svn-id: http://skia.googlecode.com/svn/trunk@13438 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
bc25dfc798fff225ce65355ecda19d2b85bd0e74 08-Nov-2013 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 31 - Xfermode: xfer16
Xfermode: xfer16

This adds support for 16-bit Xfermodes. It also tunes the gcc test
macros in xfer32() to add compatibility for gcc > 4.

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=
R=djsollen@google.com, mtklein@google.com, reed@google.com

Author: kevin.petit.arm@gmail.com

Review URL: https://codereview.chromium.org/33063002

git-svn-id: http://skia.googlecode.com/svn/trunk@12192 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
4cc26324e3be5258fae9dc102aa6a3af7d1c96ea 26-Sep-2013 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 24 - S32_D565_Blend_Dither slight speedup/bugfix

BlitRow565: S32_D565_Blend_Dither, slight speedup + bugfix

This patch adds a rewrite of S32_D565_Blend_Dither in intrinsics.
The newer version is faster (10-20% depending on the value of count)
and also supports ARGB as well as ABGR. It also adds the missing
assert at the beginning of the function.

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit.arm@gmail.com

Review URL: https://chromiumcodereview.appspot.com/22566002

git-svn-id: http://skia.googlecode.com/svn/trunk@11473 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
0060159457453ca45a47828648c8f29d5695983c 20-Sep-2013 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 21 - new NEON S32_D565_Opaque

BlitRow565: NEON version of S32_D565_Opaque

Here's a new implementation of S32_D565_Opaque in NEON. It
dramatically improves speed compared to S32A_D565_Opaque.

Here are the benchmark results (speedup vs. existing NEON):

+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | +130% | +139% |
+-------+-----------+------------+
| 2 | +65,2% | +51% |
+-------+-----------+------------+
| 4 | -25,5% | +10,2% |
+-------+-----------+------------+
| 8 | +63,8% | +32,1% |
+-------+-----------+------------+
| 16 | +110% | +49,2% |
+-------+-----------+------------+
| 64 | +153% | +123,5% |
+-------+-----------+------------+
| 256 | +151% | +144,7% |
+-------+-----------+------------+
| 1024 | +272% | +157,2% |
+-------+-----------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit.arm@gmail.com

Review URL: https://chromiumcodereview.appspot.com/22351006

git-svn-id: http://skia.googlecode.com/svn/trunk@11415 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
688d362b4545e1beadebb7ba5886813d7038883c 18-Sep-2013 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 23 - S32_D565_Opaque_Dither cleanup/bugfix/speed

BlitRow565: S32_D565_Opaque_Dither: cleaning / bugfix

This patch brings a little code cleanup (spaces/comments) and a small
speed improvement (by using post-increment in the asm) but more
importantly it fixes a bug on Linux. The new code now supports ARGB
as well as ABGR.

I removed the comment as I have confirmed with benchmarks that this
code brings a *massive* (3x-7x) speedup compared to the C code.

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit.arm@gmail.com

Review URL: https://chromiumcodereview.appspot.com/22269003

git-svn-id: http://skia.googlecode.com/svn/trunk@11339 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
a111e492b84c312a6bd5d5d9ef100dca48f4941d 09-Aug-2013 djsollen@google.com <djsollen@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Cleanup the ARM blitrow optimizations

R=mtklein@google.com

Review URL: https://codereview.chromium.org/22229002

git-svn-id: http://skia.googlecode.com/svn/trunk@10652 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
1fdc6774280ffc18dd7e1247e430931aa2f58790 01-Aug-2013 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 14 - S32A_Blend

Blitrow32: S32A_Blend new NEON version

Adding a NEON version of S32A_Blend_BlitRow32. Here are the
benchmark results:

+-------+--------------------------+--------------------------+
| | Speedup vs. C | Speedup vs. ARM asm |
| count +------------+-------------+------------+-------------+
| | Cortex-A9 | Cortex-A15 | Cortex-A9 | Cortex-A15 |
+-------+------------+-------------+------------+-------------+
| 1 | +8,5% | +18,5% | +0,9% | +2,9% |
+-------+------------+-------------+------------+-------------+
| 2 | +65,6% | +94% | +70,3% | +80% |
+-------+------------+-------------+------------+-------------+
| 4 | +42,4% | +87,8% | +56,8% | +84,4% |
+-------+------------+-------------+------------+-------------+
| 8 | +30% | +90% | +49,9% | +82,7% |
+-------+------------+-------------+------------+-------------+
| 16 | +23,1% | +95,4% | +46,6% | +87,6% |
+-------+------------+-------------+------------+-------------+
| 64 | +23,1% | +95,7% | +46,1% | +89,4% |
+-------+------------+-------------+------------+-------------+
| 256 | +35,5% | +122% | +53,6% | +99,2% |
+-------+------------+-------------+------------+-------------+
| 1024 | +61,8% | +101% | +64,2% | +91,2% |
+-------+------------+-------------+------------+-------------+

BUG=
R=djsollen@google.com

Author: kevin.petit.arm@gmail.com

Review URL: https://chromiumcodereview.appspot.com/18614010

git-svn-id: http://skia.googlecode.com/svn/trunk@10480 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
c2050e3a3ecfb8738b36e2add15c526e8e0f21fe 15-Jul-2013 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 01 - Simple fixes

This series contains a few fairly non-controversial fixes.

Misc: remove dead references to neon 4444 functions

Misc: avoid the double _neon_neon suffix in the clamp matrix functions.
MAKENAME already adds the _neon suffix

Misc: a few stupid / obvious fixes

BUG=
R=djsollen@google.com

Author: kevin.petit.arm@gmail.com

Review URL: https://chromiumcodereview.appspot.com/18666004

git-svn-id: http://skia.googlecode.com/svn/trunk@10072 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
0a5699ee482c3b5ef1e857de8a2de06c6a1fa298 11-Jul-2013 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> ARM Skia NEON patches - 13 - S32A_Opaque

Blitrow32: S32A_Opaque code cleaning and speed improvement

- the old way of calculating alpha doesn't seem to be used anymore,
so remove the remaining code
- adding prefetching greatly improves performance in some cases,
at the cost of a small regression in others:

+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1,2 | 0 | 0 |
+-------+-----------+------------+
| 4 | 0 | -3% |
+-------+-----------+------------+
| 8 | 0 | -4% |
+-------+-----------+------------+
| 16 | 0 | -5% |
+-------+-----------+------------+
| 64 | +14% | 0 |
+-------+-----------+------------+
| 256 | +14% | +12% |
+-------+-----------+------------+
| 1024 | +115% | +15% |
+-------+-----------+------------+

BUG=
R=djsollen@google.com

Author: kevin.petit.arm@gmail.com

Review URL: https://chromiumcodereview.appspot.com/18459008

git-svn-id: http://skia.googlecode.com/svn/trunk@10026 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
c2532dd0b89e03ed158229872cb1ee06ae7f10fe 09-Apr-2013 djsollen@google.com <djsollen@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Partial reapply of r5364 minus the non-neon code path.

See https://codereview.appspot.com/6465075 for a more detailed description of the contents of this CL.

Review URL: https://codereview.chromium.org/13060004

git-svn-id: http://skia.googlecode.com/svn/trunk@8579 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
efbe8e9bedda21a3e061ebf3d96431a0f250a654 07-Feb-2013 djsollen@google.com <djsollen@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Fix errors when compiling with -Wall -Werror on Android.

This CL also turns those features on by default on Android

Review URL: https://codereview.appspot.com/7313049

git-svn-id: http://skia.googlecode.com/svn/trunk@7645 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
b78765e63b5de5a7dfe5f9f6813f6df81cae14ae 04-Sep-2012 robertphillips@google.com <robertphillips@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Reverting r5364 (Update ARM and NEON optimizations for S32A_Opaque_BlitRow32)

git-svn-id: http://skia.googlecode.com/svn/trunk@5378 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
dc1a3badc702dd47d903863208315ddda660dbb9 31-Aug-2012 djsollen@google.com <djsollen@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Update ARM and NEON optimizations for S32A_Opaque_BlitRow32.

These patches replace those written by ARM with ones provided by NVidia.

Review URL: https://codereview.appspot.com/6465075

git-svn-id: http://skia.googlecode.com/svn/trunk@5364 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
fbfcd5602128ec010c82cb733c9cdc0a3254f9f3 23-Aug-2012 rmistry@google.com <rmistry@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Result of running tools/sanitize_source_files.py (which was added in https://codereview.appspot.com/6465078/)

This CL is part I of IV (I broke down the 1280 files into 4 CLs).
Review URL: https://codereview.appspot.com/6485054

git-svn-id: http://skia.googlecode.com/svn/trunk@5262 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp
a8dd1ce930811a51cc841f583424d507d95e7e78 09-Aug-2012 digit@google.com <digit@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> arm: dynamic NEON support for SkBlitRow_opts_arm.cpp

This patch moves all NEON-specific code from the source
src/opts/SkBlitRow_opts_arm.cpp into a new file that is
built as part of the 'opts_arm_neon' static library.
Review URL: https://codereview.appspot.com/6449110

git-svn-id: http://skia.googlecode.com/svn/trunk@5016 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkBlitRow_opts_arm_neon.cpp