History log of /external/skia/src/opts/SkSwizzler_opts.h
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
e18fa440e74e9af0324de0a1de9b6ffb0fe3c3d3 09-Jun-2016 mtklein <mtklein@chromium.org> Move immintrin/arm_neon includes to where they are used.

On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:

$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release

Before: 159 wall / 3367 cpu
159 wall / 3368 cpu

After: 137 wall / 2860 cpu
136 wall / 2863 cpu

I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

TBR=reed@google.com
No public API changes.

Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
Review-Url: https://codereview.chromium.org/2045633002
/external/skia/src/opts/SkSwizzler_opts.h
50bcb189f8785a599a3024d8eba4681c2e8ca37a 08-Jun-2016 mtklein <mtklein@google.com> Revert of Move immintrin/arm_neon includes to where they are used. (patchset #2 id:20001 of https://codereview.chromium.org/2045633002/ )

Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.

Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no signficant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a

TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2046213002
/external/skia/src/opts/SkSwizzler_opts.h
12dfaaa53c23f3d03050bde8f64136ac1f44164a 07-Jun-2016 mtklein <mtklein@chromium.org> Move immintrin/arm_neon includes to where they are used.

On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:

$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release

Before: 159 wall / 3367 cpu
159 wall / 3368 cpu

After: 137 wall / 2860 cpu
136 wall / 2863 cpu

I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

TBR=reed@google.com
No public API changes.

Review-Url: https://codereview.chromium.org/2045633002
/external/skia/src/opts/SkSwizzler_opts.h
c5c322d8ecfc05718f9f04360956c4f1f9dc33c1 08-Feb-2016 msarett <msarett@google.com> Optimize CMYK->RGBA (BGRA) transform for jpeg decodes

Swizzle Bench Runtime
Nexus 6P 0.14x
Dell Venue 8 0.12x

CMYK Jpeg Decode Runtime
Nexus 6P 0.81x
Dell Venue 8 0.85x

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676773003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1676773003
/external/skia/src/opts/SkSwizzler_opts.h
095742419d0277a4fb0d499a05ff29b7506f1c5e 04-Feb-2016 msarett <msarett@google.com> SSE optimizations for GrayAlpha -> RGBA/BGRA Premul/Unpremul

Swizzle Runtime (Dell Venue 8)
Unpremul 0.17x
Premul 0.20x

PNG Decode Runtime on GrayAlpha Encoded PNGs (Dell Venue 8)
Unpremul Regular 0.91x
Unpremul ZeroInit 0.92x
Premul Regular 0.84x
Premul ZeroInit 0.85x

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1666853002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1666853002
/external/skia/src/opts/SkSwizzler_opts.h
1e06079b259d1091b735492b2f71d9897c14c608 03-Feb-2016 msarett <msarett@google.com> NEON optimizations for GrayAlpha -> RGBA/BGRA Premul/Unpremul

PNG Decode Time Nexus 6P (for a test set of GrayAlpha encoded PNGs)
Regular Unpremul 0.91x
Zero Init Unpremul 0.92x
Regular Premul 0.84x
Zero Init Premul 0.86x

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1663623002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1663623002
/external/skia/src/opts/SkSwizzler_opts.h
0700651128f8c505da65e651f9788589593f07c4 02-Feb-2016 msarett <msarett@google.com> SSSE3 optimizations for gray -> RGBA (or BGRA)

Swizzle Bench Runtime
Dell Venue 8 0.16x
HP z620 0.47x

PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.80x
HP z620 0.96x

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1657393002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1657393002
/external/skia/src/opts/SkSwizzler_opts.h
2eff71c9b5f984b58961e5a6b4e66774c4385224 02-Feb-2016 msarett <msarett@google.com> NEON optimizations for gray -> RGBA (or BGRA) conversions

Swizzle Bench Runtime
Nexus 6P 0.32x
Nexus 9 0.89x

PNG Decode Time (for test set of gray encoded PNGs)
Nexus 6P 0.88x
Nexus 9 0.91x

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1656383002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1656383002
/external/skia/src/opts/SkSwizzler_opts.h
13aa1a5ad97156e35184970fc1ce1aaf3c50c91c 22-Jan-2016 msarett <msarett@google.com> SSSE3 opts for RGB -> RGB(FF) or BGR(FF)

Swizzle Bench Runtime
z620 0.21x
Dell Venue 8 0.26x

RGB PNGs Decode Runtime
z620 0.91x
Dell Venus 8 0.96x

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618603003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1618603003
/external/skia/src/opts/SkSwizzler_opts.h
f1b8b6ae34e5a1f4b29e423401da39f88f0c117a 22-Jan-2016 msarett <msarett@google.com> Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzler

Swizzle Bench Runtime Nexus 6P
xxx_xxxa 0.32x
xxx_swaprb_xxxa 0.31x

Swizzle Bench Runtime Nexus 9
xxx_xxxa 1.11x
xxx_swaprb_xxxa 1.14x
(This is a slow down.)

Swizzle Bench Runtime Nexus 5
xxx_xxxa 0.12x
xxx_swaprb 0.12x

RGB PNG Decode Runtime
Nexus 6P 0.94x
Nexus 9 0.98x

I don't know how to explain the fact that the Swizzle Bench was
slower on Nexus 9, but the decode times got faster.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1618003002
/external/skia/src/opts/SkSwizzler_opts.h
8bf7b79cf9776b4edb3f6810e5ab8c80c49d3480 22-Jan-2016 mtklein <mtklein@chromium.org> Refactor swizzle names and types.

- Plant a flag to say "pretend all the inputs are RGBA".
This is how libpng thinks.
This is the opposite of what the implementation had been doing,
so I've rearranged everything to reflect the new orientation.

- Rewrite the names to be less mysterious looking. No more Xs.

- Make the src type uniformly const void*, to allow for 888 (RGB) srcs.

This should be performance and pixel neutral. (Please revert if it's not.)

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1626463002
/external/skia/src/opts/SkSwizzler_opts.h
53b9d29b973f2828624f097bf110f1c7acc4b593 19-Jan-2016 msarett <msarett@google.com> Add SSSE3 Optimizations for premul and swap

Improves deocde performance for RGBA pngs.

Swizzler Time on z620 (clang):
SwapPremul 0.24x
Premul 0.24x
Swap 0.37x
Decode Time on z620 (clang):
Premul ZeroInit Decodes 0.88x
Unpremul ZeroInit Decodes 0.94x
Premul Regular Decodes 0.91x
Unpremul Regular Decodes 0.98x

Swizzler Time in Dell Venue 8 (gcc):
SwapPremul 0.14x
Premul 0.14x
Swap 0.08x
Decode Time on Dell Venus 8 (gcc):
Premul ZeroInit Decodes 0.79x
Premul Regular Decodes 0.77x

Note:
ZeroInit means memory is zero initialized, and we do not write to
memory for large sections of zero pixels (memory use opt for Android).

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1601883002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1601883002
/external/skia/src/opts/SkSwizzler_opts.h
03108de163354fa574679ad153b58ce57126b2ba 15-Jan-2016 msarett <msarett@google.com> Add NEON swap opts and use opts in SkSwizzler

All RGBA, RGBX, BGRA, BGRX routines in SkSwizzler now use fast
options (with the exception of conversions to 565).

Swizzle Time for swap_rb
0.94x Nexus 9
0.81x Nexus 6P

Unpremul Decode Time for RGBA PNGs***
ZeroInit 0.93x Nexus 9
Regular 0.94x Nexus 9
ZeroInit 0.97x Nexus 6P
ZeroInit 0.95x Nexus 6P

***Two Notes:
The improvements here are actually due to taking advantage of
memcpy() (no need to swap, the bytes are already in the proper
order).
ZeroInit skips writing zeros to zero initialized memory. This
is a memory use opt in Android.

BMP decodes should also benefit from these improvements.

I am relying on Gold to help test all possible cases.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1581933006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1581933006
/external/skia/src/opts/SkSwizzler_opts.h
3a24f459582f2665f0e66bd35a0d8f46a1c4c72f 13-Jan-2016 msarett <msarett@google.com> Optimized premultiplying swizzles for NEON

Improves decode performance for RGBA encoded PNGs.

Swizzle Time on Nexus 9 (with clang):
SwapPremul 0.44x
Premul 0.44x
Decode Time On Nexus 9 (with clang):
ZeroInit Decodes 0.85x
Regular Decodes 0.86x

Swizzle Time on Nexus 6P (with clang)
SwapPremul 0.14x
Premul 0.14x
Decode Time On Nexus 6P (with clang):
ZeroInit Decodes 0.93x
Regular Decodes 0.95x

Notes:
ZeroInit means memory is zero initialized, and we do not write to
memory for large sections of zero pixels (memory use opt for Android).

A profile on Nexus 9 shows that the premultiplication step of PNG
decoding is now ~5% of decode time (down from ~20%).

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1577703006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1577703006
/external/skia/src/opts/SkSwizzler_opts.h