History log of /external/skia/src/opts/SkPMFloat_neon.h
Revision Date Author Comments
7792dbf7ea089b3bcb81792a3ecda8a6f8b421e7 03-Apr-2015 mtklein <mtklein@chromium.org> Code's more readable when SkPMFloat is an Sk4f.
#floats

BUG=skia:
BUG=skia:3592

Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1061603002
/external/skia/src/opts/SkPMFloat_neon.h
e758579d4f85f4615d62e87847dd3f8b38e3a6e2 03-Apr-2015 mtklein <mtklein@google.com> Revert of Code's more readable when SkPMFloat is an Sk4f. (patchset #3 id:40001 of https://codereview.chromium.org/1061603002/)

Reason for revert:
missed some neon code

Original issue's description:
> Code's more readable when SkPMFloat is an Sk4f.
> #floats
>
> BUG=skia:
> BUG=skia:3592
>
> Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1056143004
/external/skia/src/opts/SkPMFloat_neon.h
6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9 03-Apr-2015 mtklein <mtklein@chromium.org> Code's more readable when SkPMFloat is an Sk4f.
#floats

BUG=skia:
BUG=skia:3592

Review URL: https://codereview.chromium.org/1061603002
/external/skia/src/opts/SkPMFloat_neon.h
3d626834b4b5ee2d6dda34da365dfe40520253aa 03-Apr-2015 mtklein <mtklein@chromium.org> New names for SkPMFloat methods.

BUG=skia:

Review URL: https://codereview.chromium.org/1055123002
/external/skia/src/opts/SkPMFloat_neon.h
07342361a3cda94376230b37d9e863052449653c 02-Apr-2015 mtklein <mtklein@chromium.org> SkPMFloat: fewer internal this->isValid() assertions.

Each of these conversion functions now only asserts its output is valid.
For SkPMColor -> SkPMFloat, we assert isValid().
For SkPMFloat -> SkPMColor, we SkPMColorAssert.

#floats

BUG=skia:
BUG=skia:3592

Review URL: https://codereview.chromium.org/1055093002
/external/skia/src/opts/SkPMFloat_neon.h
0340df5b3698aff1c9540fcdbc3dafd9d5ddb0b0 31-Mar-2015 mtklein <mtklein@chromium.org> back to Sk4f for SkPMColor
#floats

BUG=skia:
BUG=skia:3592

Review URL: https://codereview.chromium.org/1047823002
/external/skia/src/opts/SkPMFloat_neon.h
c9adb05b64fa0bfadf9d1a782afcda470da68c9e 30-Mar-2015 mtklein <mtklein@chromium.org> Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T>

The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.

This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h.

This design leaves us room to grow up, e.g. to SkNf<8, SkScalar> == Sk8s, and to grow down too, to things like SkNi<8, uint16_t> == Sk8h.
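
The "arbitrary power-of-two N" idea can be illustrated with a rough sketch. This is not Skia's code: `VecN` and its methods are invented names, and the approach shown (a generic vector built from two half-width vectors, bottoming out in a scalar `N == 1` case) is just one plausible way to get unspecialized types that work without extra effort. A platform header could then add a faster specialization for a type such as `VecN<4, float>`.

```cpp
#include <cassert>

// Hypothetical stand-in for the SkNf<N,T> idea; all names are illustrative.
// Generic case: an N-wide vector is two N/2-wide vectors.
template <int N, typename T>
class VecN {
    static_assert(N > 1 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    VecN() {}
    explicit VecN(T v) : fLo(v), fHi(v) {}  // splat one value to every lane
    VecN(const VecN<N/2, T>& lo, const VecN<N/2, T>& hi) : fLo(lo), fHi(hi) {}

    static VecN Load(const T* p) {
        return VecN(VecN<N/2, T>::Load(p), VecN<N/2, T>::Load(p + N/2));
    }
    void store(T* p) const { fLo.store(p); fHi.store(p + N/2); }

    VecN operator+(const VecN& o) const { return VecN(fLo + o.fLo, fHi + o.fHi); }
    VecN operator*(const VecN& o) const { return VecN(fLo * o.fLo, fHi * o.fHi); }

private:
    VecN<N/2, T> fLo, fHi;
};

// Base case: a single lane. The recursion above ends here.
template <typename T>
class VecN<1, T> {
public:
    VecN() : fVal() {}
    explicit VecN(T v) : fVal(v) {}

    static VecN Load(const T* p) { assert(p); return VecN(*p); }
    void store(T* p) const { assert(p); *p = fVal; }

    VecN operator+(const VecN& o) const { return VecN(fVal + o.fVal); }
    VecN operator*(const VecN& o) const { return VecN(fVal * o.fVal); }

private:
    T fVal;
};

// Convenience names in the spirit of Sk4f, Sk2d, Sk8s.
typedef VecN<4, float>  Vec4f;
typedef VecN<2, double> Vec2d;
typedef VecN<8, float>  Vec8f;
```

With this shape, `(Vec8f::Load(a) + Vec8f::Load(b)).store(out)` adds eight lanes using only the generic code, and a specialization can replace any width without changing callers.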

To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful.

You shouldn't feel bad about using any of the typedefs Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel:
- Sk4f, Sk4s, Sk2d: feel awesome
- Sk2f, Sk2s, Sk4d: feel pretty good

No public API changes.
TBR=reed@google.com

BUG=skia:3592

Review URL: https://codereview.chromium.org/1048593002
/external/skia/src/opts/SkPMFloat_neon.h
3d4c4a5a9feff961c6ba70443fa40ea1ca0a503e 26-Mar-2015 mtklein <mtklein@chromium.org> SkPMFloat::trunc()

Add and test trunc(), which is what get() used to be before rounding.
Using trunc() is a ~40% speedup on our linear gradient bench.

#neon #floats
BUG=skia:3592
#n5
#n9
CQ_INCLUDE_TRYBOTS=client.skia.android:Test-Android-Nexus5-Adreno330-Arm7-Debug-Trybot;client.skia.android:Test-Android-Nexus9-TegraK1-Arm64-Release-Trybot

Review URL: https://codereview.chromium.org/1032243002
/external/skia/src/opts/SkPMFloat_neon.h
15391ee4acaa092f52742f64968ad8046b74ca81 25-Mar-2015 mtklein <mtklein@chromium.org> Update 4-at-a-time APIs.

There is no reason to require the 4 SkPMFloats (registers) to be adjacent.
The only potential win in loads and stores comes from the SkPMColors being adjacent.
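
A portable sketch of what such a 4-at-a-time API shape might look like, assuming components are stored in [0,255] with the lowest byte first. `Pixel4f`, `clamp_and_pack`, and `clamp_and_pack_4` are hypothetical names, not Skia's: the four float pixels arrive as independent references, and only the four packed output colors are adjacent in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical four-float pixel, components in [0, 255], lowest byte first.
struct Pixel4f { float c[4]; };

// Clamp each component to [0, 255], round to nearest, and pack into 32 bits.
inline uint32_t clamp_and_pack(const Pixel4f& p) {
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++) {
        float v = std::min(std::max(p.c[i], 0.0f), 255.0f);
        packed |= static_cast<uint32_t>(v + 0.5f) << (8 * i);
    }
    return packed;
}

// 4-at-a-time shape: inputs may live anywhere (for example in registers);
// only the four output colors are required to be contiguous.
inline void clamp_and_pack_4(const Pixel4f& a, const Pixel4f& b,
                             const Pixel4f& c, const Pixel4f& d,
                             uint32_t out[4]) {
    assert(out != nullptr);
    out[0] = clamp_and_pack(a);
    out[1] = clamp_and_pack(b);
    out[2] = clamp_and_pack(c);
    out[3] = clamp_and_pack(d);
}
```

A SIMD version would presumably convert and narrow all four pixels together before a single contiguous store, which is where the adjacency of the outputs could pay off.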

Makes no difference to existing bench.

BUG=skia:

Review URL: https://codereview.chromium.org/1035583002
/external/skia/src/opts/SkPMFloat_neon.h
f94fa7112f67af6fc5db19f86d8397307ba17105 18-Mar-2015 mtklein <mtklein@chromium.org> SkPMFloat: avoid loads and stores where possible.

A store/load pair like this is a redundant no-op:
    store simd_register_a, memory_address
    load  memory_address, simd_register_a

Everyone seems to be good at removing those when using SSE, but GCC and Clang
are pretty terrible at this for NEON. We end up issuing both redundant
commands, usually to and from the stack. That's slow. Let's not do that.

This CL unions in the native SIMD register type into SkPMFloat, so that we can
assign to and from it directly, which is generating a lot better NEON code. On
my Nexus 5, the benchmarks improve from 36ns to 23ns.

SSE is just as fast either way, but I paralleled the NEON code for consistency.
It's a little terser. And because it needed the platform headers anyway, I
moved all includes into SkPMFloat.h, again only for consistency.

I'd union in Sk4f too to make its conversion methods a little clearer,
but MSVC won't let me (it has a copy constructor... they're apparently not up
to speed with C++11 unrestricted unions).

BUG=skia:

Review URL: https://codereview.chromium.org/1015083004
/external/skia/src/opts/SkPMFloat_neon.h
91fd7371ec80724ec53aae8f2d5a6753499d8963 06-Mar-2015 mtklein <mtklein@chromium.org> SkPMFloat: we can beat the naive loops when clamping

Clamping 4 at a time is now about 15% faster than 1 at a time with SSSE3.
Clamping 4 at a time is now about 20% faster with SSE2,
and this applies to non-clamping too (we still just clamp there).

In all cases, 4 at a time is never worse than 1 at a time,
and not clamping is never slower than clamping.

Here's all the bench results, with the numbers for portable code as a fun point
of reference:

SSSE3:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     2291   4.66ns  4.66ns  4.66ns  4.68ns  0%      ▆█▁▁▁▇▁▇▁▃  nonrendering  SkPMFloat_get_1x
10M     2040   5.29ns  5.3ns   5.3ns   5.32ns  0%      ▃▆▃▃▁▁▆▃▃█  nonrendering  SkPMFloat_clamp_1x
10M     7175   4.62ns  4.62ns  4.62ns  4.63ns  0%      ▁▄▃████▃▄▇  nonrendering  SkPMFloat_get_4x
10M     5801   4.89ns  4.89ns  4.89ns  4.91ns  0%      █▂▄▃▁▃▄█▁▁  nonrendering  SkPMFloat_clamp_4x

SSE2:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     1601   6.02ns  6.05ns  6.04ns  6.08ns  0%      █▅▄▅▄▂▁▂▂▂  nonrendering  SkPMFloat_get_1x
10M     2918   6.05ns  6.06ns  6.05ns  6.06ns  0%      ▂▇▁▇▇▁▇█▇▂  nonrendering  SkPMFloat_clamp_1x
10M     3569   5.43ns  5.45ns  5.44ns  5.45ns  0%      ▄█▂██▇▁▁▇▇  nonrendering  SkPMFloat_get_4x
10M     4168   5.43ns  5.43ns  5.43ns  5.44ns  0%      █▄▇▁▇▄▁▁▁▁  nonrendering  SkPMFloat_clamp_4x

Portable:
maxrss  loops  min     median  mean    max     stddev  samples     config        bench
10M     500    27.8ns  28.1ns  28ns    28.2ns  0%      ▃█▆▃▇▃▆▁▇▂  nonrendering  SkPMFloat_get_1x
10M     770    40.1ns  40.2ns  40.2ns  40.3ns  0%      ▅▁▃▂▆▄█▂▅▂  nonrendering  SkPMFloat_clamp_1x
10M     1269   28.4ns  28.8ns  29.1ns  32.7ns  4%      ▂▂▂█▂▁▁▂▁▁  nonrendering  SkPMFloat_get_4x
10M     1439   40.2ns  40.4ns  40.4ns  40.5ns  0%      ▆▆▆█▁▆▅█▅▆  nonrendering  SkPMFloat_clamp_4x

SkPMFloat_neon.h is still one big TODO as far as 4-at-a-time APIs go.

BUG=skia:

Review URL: https://codereview.chromium.org/982123002
/external/skia/src/opts/SkPMFloat_neon.h
4e644f5d5020a6ec904734a3f521bfad173cb450 04-Mar-2015 mtklein <mtklein@chromium.org> Update SkPMFloat API a bit.

Instead of set(SkPMColor), add a constructor SkPMFloat(SkPMColor).
Replace setA(), setR(), etc. with a 4 float constructor.

And, promise to stick to SkPMColor order.

BUG=skia:

Review URL: https://codereview.chromium.org/977773002
/external/skia/src/opts/SkPMFloat_neon.h
0aebf5d0d3a2aef38a71885c85303583fdeaad57 03-Mar-2015 mtklein <mtklein@chromium.org> Test and fix SkPMFloat rounding.

SSE rounds for free (that was a happy accident: they also have a truncating version).
NEON does not, nor obviously the portable code, so they add 0.5 before truncating.
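
The "add 0.5 before truncating" trick can be sketched in portable C++ as follows. The function names are hypothetical, and the sketch assumes non-negative inputs in the [0,255] component range, which is what makes truncation after the add equivalent to rounding to nearest.

```cpp
#include <cassert>
#include <cstdint>

// Truncating conversion: a float -> integer cast simply drops the fraction.
inline uint8_t trunc_to_byte(float v) {
    assert(v >= 0.0f && v < 256.0f);
    return static_cast<uint8_t>(v);
}

// Rounding built on truncation: add 0.5 first, then truncate.
// Only valid because v is non-negative; ties (x.5) round up.
inline uint8_t round_to_byte(float v) {
    assert(v >= 0.0f && v < 255.5f);
    return static_cast<uint8_t>(v + 0.5f);
}
```

One caveat: SSE's rounding conversion (`_mm_cvtps_epi32`) follows the current rounding mode, which defaults to round-to-nearest-even, so exact ties such as 126.5 may come out differently than with the add-0.5 approach.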

NOPRESUBMIT=true

BUG=skia:

Review URL: https://codereview.chromium.org/974643002
/external/skia/src/opts/SkPMFloat_neon.h
60d2a32b2dbbaabf4a0c133c8d3ff5ad888b8e5e 03-Mar-2015 mtklein <mtklein@chromium.org> Make SkPMFloats store floats in [0,255] instead of [0,1].

This pushes the cost of the *255 and *1/255 conversions onto only those code
paths that need it. We're not doing it any more efficiently than can be done
with Sk4f.
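
To make the trade-off concrete, here is a minimal sketch, assuming a hypothetical `PMFloat255` type (not Skia's SkPMFloat) that keeps components in [0,255]. Converting from and to a packed 32-bit color needs no multiply at all; only a caller that wants [0,1] values pays for the 1/255 scale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch. Component order follows the bytes of the packed
// 32-bit color, lowest byte first.
struct PMFloat255 {
    float c[4];  // each component in [0, 255]

    // Packed color -> floats: a plain integer-to-float conversion, no scaling.
    explicit PMFloat255(uint32_t packed) {
        for (int i = 0; i < 4; i++) {
            c[i] = static_cast<float>((packed >> (8 * i)) & 0xFF);
        }
    }

    // Floats -> packed color: round to nearest, again with no scaling.
    uint32_t pack() const {
        uint32_t packed = 0;
        for (int i = 0; i < 4; i++) {
            assert(c[i] >= 0.0f && c[i] <= 255.0f);
            packed |= static_cast<uint32_t>(c[i] + 0.5f) << (8 * i);
        }
        return packed;
    }

    // Only code paths that need [0, 1] pay for the multiply.
    void toUnit(float out[4]) const {
        for (int i = 0; i < 4; i++) {
            out[i] = c[i] * (1.0f / 255.0f);
        }
    }
};
```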

In microbenchmark isolation, this is about a 15% speedup.

BUG=skia:
NOPRESUBMIT=true

Review URL: https://codereview.chromium.org/973603002
/external/skia/src/opts/SkPMFloat_neon.h
e76161458afe25d987b97bf1ed40b03cdf2d7060 26-Feb-2015 mtklein <mtklein@chromium.org> Spin off some fixes to land right away.

BUG=skia:

Review URL: https://codereview.chromium.org/960023002
/external/skia/src/opts/SkPMFloat_neon.h
a2f4be76a9d453f1fdfd55b0cec6a683f23ffe0f 23-Feb-2015 mtklein <mtklein@chromium.org> Sketch SkPMFloat

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon-Trybot

http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio

Review URL: https://codereview.chromium.org/936633002
/external/skia/src/opts/SkPMFloat_neon.h
088302756bde25083d6712b18dcd24644d4dcdbb 23-Feb-2015 mtklein <mtklein@google.com> Revert of Sketch SkPMFloat (patchset #15 id:270001 of https://codereview.chromium.org/936633002/)

Reason for revert:
http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio

Original issue's description:
> Sketch SkPMFloat
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f

TBR=reed@google.com,msarrett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/952453004
/external/skia/src/opts/SkPMFloat_neon.h
50d2b3114b3e59dc84811881591bf25b2c1ecb9f 23-Feb-2015 mtklein <mtklein@chromium.org> Sketch SkPMFloat

BUG=skia:

Review URL: https://codereview.chromium.org/936633002
/external/skia/src/opts/SkPMFloat_neon.h