History log of /external/skia/src/core/SkXfermodeF16.cpp
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
d47067392848ba132d4e86ffbeebe2dcacda9534 15-Nov-2016 Mike Reed <reed@google.com> make SkXfermode.h go away

This is step one:
- make SkXfermode useless to public clients
- everything they should need is in SkBlendMode.h

Step two:
- remove SkXfermode.h entirely (since skia core will already be using SkXfermodePriv.h)

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4534

Change-Id: If2cea9f71df92430ed6644edb98dd306c5572cbc
Reviewed-on: https://skia-review.googlesource.com/4534
Commit-Queue: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/core/SkXfermodeF16.cpp
6a01554e9e8687c56e6b6707e0c6a02062a1824e 09-Nov-2016 Mike Reed <reed@google.com> remove use of xfermode* in procs

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4592

Change-Id: I99f35924ff5325dfac527bb573a86d2d0366e0b3
Reviewed-on: https://skia-review.googlesource.com/4592
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Reed <reed@google.com>
/external/skia/src/core/SkXfermodeF16.cpp
8ae991e433d2c0814ea5579613f00173805ff057 22-Aug-2016 mtklein <mtklein@chromium.org> Flush denorm half floats to zero.

I think we convinced ourselves that denorms, while a good chunk of half floats,
cover a rather small fraction of the representable range, which is always
close enough to zero to flush.

This makes both paths of the conversion to or from float considerably simpler.

These functions now work for zero-or-normal half floats (excluding infinite, NaN).
I'm not aware of a term for this class so I've called them "ordinary".

A handful of GMs and SKPs draw differently in --config f16, but all imperceptibly.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2256023002

Review-Url: https://codereview.chromium.org/2256023002
/external/skia/src/core/SkXfermodeF16.cpp
6bdbf4412bd1a6fe26be1042ccf080174b13021f 19-Jul-2016 msarett <msarett@google.com> Improve naive SkColorXform to half floats

This should give us a good baseline to explore using SkRasterPipeline.

A particular colorxform to half float drops from 425us to 282us on my desktop.

Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us

Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2159993003
/external/skia/src/core/SkXfermodeF16.cpp
58e389b0518b46bbe58ba01c23443cf23c18435c 15-Jul-2016 mtklein <mtklein@chromium.org> Expand _01 half<->float limitation to _finite. Simplify.

It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.

We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.

This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)

I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
/external/skia/src/core/SkXfermodeF16.cpp
64bbad360f30930befafee1bdefe4ad89f130dac 14-Jul-2016 mtklein <mtklein@google.com> Revert of Expand _01 half<->float limitation to _finite. Simplify. (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ )

Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast

Original issue's description:
> Expand _01 half<->float limitation to _finite. Simplify.
>
> It's become clear we need to sometimes deal with values <0 or >1.
> I'm not yet convinced we care about NaN or +-inf.
>
> We had some fairly clever tricks and optimizations here for NEON
> and SSE. I've thrown them out in favor of a single implementation.
> If we find the specializations mattered, we can certainly figure out
> how to extend them to this new range/domain.
>
> This happens to add a vectorized float -> half for ARMv7, which was
> missing from the _01 version. (The SSE strategy was not portable to
> platforms that flush denorm floats to zero.)
>
> I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90

TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2151023003
/external/skia/src/core/SkXfermodeF16.cpp
3296bee70d074bb8094b3229dbe12fa016657e90 14-Jul-2016 mtklein <mtklein@chromium.org> Expand _01 half<->float limitation to _finite. Simplify.

It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.

We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.

This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)

I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2145663003
/external/skia/src/core/SkXfermodeF16.cpp
5608e2ed2299496eee3c57e0fe426ae9bd0d07a4 11-Jul-2016 mtklein <mtklein@chromium.org> Clean up hyper-local SkCpu feature test experiment.

This removes the code paths where we make SkCpu::Supports() calls
from within a tight loop. It keeps code paths using SkCpu::Supports()
to choose entire routines from src/opts/.

We can't rely on these hyper-local checks to be hoisted up reliably enough.
It worked pretty well with the first couple platforms we tried (e.g. Clang
on Linux/Mac) but we can't gaurantee it works everywhere.

Further, I'm not able to actually do anything fancy with those tests
outside of x86... I've not found a way to get, say, NEON+F16 conversion
code embedded into ordinary NEON code outside writing then entire function
in external assembly.

This whole idea becomes less important now that we've got a way to chain
separate function calls together efficiently. We can now, e.g., use an
AVX+F16C method to load some pixels, then chain that into an ordinary AVX
method to color filter them.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2138073002
/external/skia/src/core/SkXfermodeF16.cpp
244a65350e52c9438931ecdc05a4913f29d343bc 19-Apr-2016 mtklein <mtklein@chromium.org> skcpu: sse4.1 floor, f16c f16<->f32

- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:

+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180

If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9

Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663

Review URL: https://codereview.chromium.org/1891513002
/external/skia/src/core/SkXfermodeF16.cpp
2c7f24093a394ccbe54a7db60ba79af14682e7fa 15-Apr-2016 mtklein <mtklein@google.com> Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #11 id:200001 of https://codereview.chromium.org/1891513002/ )

Reason for revert:
this depends on a CL I want to revert

Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
>
> Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663

TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1897433002
/external/skia/src/core/SkXfermodeF16.cpp
3faf74b8364491ca806f523fbb1d8a97be592663 15-Apr-2016 mtklein <mtklein@chromium.org> skcpu: sse4.1 floor, f16c f16<->f32

- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:

+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180

If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9

Review URL: https://codereview.chromium.org/1891513002
/external/skia/src/core/SkXfermodeF16.cpp
834d9e109298ae704043128005f8c1bc622350f4 15-Apr-2016 mtklein <mtklein@google.com> Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #10 id:180001 of https://codereview.chromium.org/1891513002/ )

Reason for revert:
Need to change around my #if guards so that clang-cl is treated like GCC and Clang, rather than MSVC.

Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9

TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1891993002
/external/skia/src/core/SkXfermodeF16.cpp
cbe3c1af987d622ea67ef560d855b41bb14a0ce9 14-Apr-2016 mtklein <mtklein@chromium.org> skcpu: sse4.1 floor, f16c f16<->f32

- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:

+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180

If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1891513002
/external/skia/src/core/SkXfermodeF16.cpp
3dc6aac5da2dbeb56cda025f3699cad72e1a4b4e 14-Apr-2016 reed <reed@google.com> remove U16 support, just support F16

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1889753002

Review URL: https://codereview.chromium.org/1889753002
/external/skia/src/core/SkXfermodeF16.cpp