History log of /external/skia/src/core/SkBlitRow_D32.cpp
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
d2ffd36eb62e99abe2920369d1e040954cc2044f 12-May-2015 mtklein <mtklein@chromium.org> Sk4px

Xfermode_SrcOver:
SSE: 2.08ms -> 2.03ms (~2% faster)
NEON: my N5 is noisy, but there appears to be no perf change

BUG=skia:

Review URL: https://codereview.chromium.org/1132273004
/external/skia/src/core/SkBlitRow_D32.cpp
95cc012ccaea20f372893ae277ea0a8a6339d094 28-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Mips-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/core/SkBlitRow_D32.cpp
641c3ff7c680ef7935d47d2e68f8301acc79e3de 27-Apr-2015 mtklein <mtklein@google.com> Revert of De-proc Color32 (patchset #5 id:80001 of https://codereview.chromium.org/1104183004/)

Reason for revert:
duh

Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b
>
> Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1102363006
/external/skia/src/core/SkBlitRow_D32.cpp
d65dc0cedd5b50dd407b6ff8fdc39123f11511cc 27-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/core/SkBlitRow_D32.cpp
498856ebc6f22d7018071bd6696756d7cd077ab8 27-Apr-2015 mtklein <mtklein@google.com> Revert of De-proc Color32 (patchset #4 id:60001 of https://codereview.chromium.org/1104183004/)

Reason for revert:
MIPS

Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1108163002
/external/skia/src/core/SkBlitRow_D32.cpp
376e9bc206b69d9190f38dfebb132a8769bbd72b 27-Apr-2015 mtklein <mtklein@chromium.org> De-proc Color32

Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.

Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.

Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.

BUG=skia:

Review URL: https://codereview.chromium.org/1104183004
/external/skia/src/core/SkBlitRow_D32.cpp
a4a0aeb74808a0860f3e94588d0ceb0da9fed386 21-Apr-2015 mtklein <mtklein@google.com> Revert of Convert Color32 code to perfect blend. (patchset #6 id:100001 of https://codereview.chromium.org/1098913002/)

Reason for revert:
Xfermode_SrcOver not looking encouraging. Up to 50% regressions.

https://perf.skia.org/#3242

Original issue's description:
> Convert Color32 code to perfect blend.
>
> Before we commit to blend_256_round_alt, let's make sure blend_perfect is
> really slower in practice (i.e. regresses on perf.skia.org).
>
> blend_perfect is really the most desirable algorithm if we can afford it. Not
> only is it correct, but it's easy to think about and break into correct pieces:
> for instance, its div255() doesn't require any coordination with the multiply.
>
> This looks like a 30% hit according to microbenches. That said, microbenches
> said my previous change would be a 20-25% perf improvement, but it didn't end
> up showing a significant effect at a high level.
>
> As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt
> (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a
> bunch of overdraw.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/61221e7f87a99765b0e034020e06bb018e2a08c2

TBR=reed@google.com,fmalita@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1083923006
/external/skia/src/core/SkBlitRow_D32.cpp
61221e7f87a99765b0e034020e06bb018e2a08c2 20-Apr-2015 mtklein <mtklein@chromium.org> Convert Color32 code to perfect blend.

Before we commit to blend_256_round_alt, let's make sure blend_perfect is
really slower in practice (i.e. regresses on perf.skia.org).

blend_perfect is really the most desirable algorithm if we can afford it. Not
only is it correct, but it's easy to think about and break into correct pieces:
for instance, its div255() doesn't require any coordination with the multiply.

This looks like a 30% hit according to microbenches. That said, microbenches
said my previous change would be a 20-25% perf improvement, but it didn't end
up showing a significant effect at a high level.

As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt
(exactly what we'd expect), and one off-by-3 in a GM that looks like it has a
bunch of overdraw.

BUG=skia:

Review URL: https://codereview.chromium.org/1098913002
/external/skia/src/core/SkBlitRow_D32.cpp
afe2ffb8ba5e7362a2ee6f4e1540c9ab22df2c1e 17-Apr-2015 mtklein <mtklein@chromium.org> Rework SSE and NEON Color32 algorithms to be more correct and faster.

This algorithm changes the blend math, guarded by SK_LEGACY_COLOR32_MATH. The new math is more correct: it's never off by more than 1, and correct in all the interesting 0x00 and 0xFF edge cases, where the old math was never off by more than 2, and not always correct on the edges.

If you look at tests/BlendTest.cpp, the old code was using the `blend_256_plus1_trunc` algorithm, while the new code uses `blend_256_round_alt`. Neither uses `blend_perfect`, which is about ~35% slower than `blend_256_round_alt`.

This will require an unfathomable number of rebaselines, first to Skia, then to Blink when I remove the guard.

I plan to follow up with some integer SIMD abstractions that can unify these two implementations into a single algorithm. This was originally what I was working on here, but the correctness gains seem to be quite compelling. The only places these two algorithms really differ greatly now is the kernel function, and even there they can really both be expressed abstractly as:
- multiply 8-bits and 8-bits producing 16-bits
- add 16-bits to 16-bits, returning the top 8 bits.
All the constants are the same, except SSE is a little faster to keep 8 16-bit inverse alphas, NEON's a little faster to keep 8 8-bit inverse alphas. I may need to take this small speed win back to unify the two.

We should expect a ~25% speedup on Intel (mostly from unrolling to 8 pixels) and a ~20% speedup on ARM (mostly from using vaddhn to add `color`, round, and narrow back down to 8-bit all into one instruction.

(I am probably missing several more related bugs here.)
BUG=skia:3738,skia:420,chromium:111470

Review URL: https://codereview.chromium.org/1092433002
/external/skia/src/core/SkBlitRow_D32.cpp
edeccc58606e0421a1ae275e391ee4347c6f52f6 25-Feb-2015 mtklein <mtklein@chromium.org> Clean up ColorRectProc plumbing.

We've always been using the portable ColorRect32, so we don't need the
ColorRectProc plumbing.

Furthermore, ColorRect32 doesn't seem to be very important (we're only using
it in the opaque case, which our row-by-row procs already specialize for).
Remove that too.

If we find we want specialization for really narrow rects again, let's put it in
blitRect() directly. It's pretty unlikely we're going to get platform-specific
speedup for blits to non-contiguous memory.

My local SKP comparison is +- 3%... most neutral I've ever seen.

BUG=skia:

Review URL: https://codereview.chromium.org/959873002
/external/skia/src/core/SkBlitRow_D32.cpp
f0ea77a3630e6d1c01d83aa5430b3780da9e88b6 21-May-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> SSE2 implementation of memcpy32

With SSE2 version memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp
has about 30% performance improvement. Here are the data on desktop
i7-3770.
before:
bitmap_scale_filter_90_90 8888: cmsecs = 2.01
bitmaprect_FF_filter_trans 8888: cmsecs = 3.61
bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57
bitmaprect_FF_filter_identity 8888: cmsecs = 3.53
bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53
bitmap_4444_update 8888: cmsecs = 4.84
bitmap_4444_update_volatile 8888: cmsecs = 4.81
bitmap_4444 8888: cmsecs = 4.81
after:
bitmap_scale_filter_90_90 8888: cmsecs = 1.83
bitmaprect_FF_filter_trans 8888: cmsecs = 2.36
bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36
bitmaprect_FF_filter_identity 8888: cmsecs = 2.60
bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63
bitmap_4444_update 8888: cmsecs = 3.30
bitmap_4444_update_volatile 8888: cmsecs = 3.30
bitmap_4444 8888: cmsecs = 3.29

BUG=skia:
R=mtklein@google.com, reed@google.com, bsalomon@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/285313002

git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
e16efc1882ab34a0bb3ae361a2d37f840044cf87 26-Jan-2013 skia.committer@gmail.com <skia.committer@gmail.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Sanitizing source files in Skia_Periodic_House_Keeping

git-svn-id: http://skia.googlecode.com/svn/trunk@7406 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
fa3d892f2e3beba39ccb9957be78aad529521740 30-Jul-2012 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> remove outdated test code for TEST_SRC_ALPHA
Review URL: https://codereview.appspot.com/6457056

git-svn-id: http://skia.googlecode.com/svn/trunk@4840 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
9f63667ff221b72e96fa6a044ccb0dde12af6ebe 15-May-2012 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> special-case filling narrow rects, where we can be faster than the SSE2 asm



git-svn-id: http://skia.googlecode.com/svn/trunk@3932 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
8dd90a926a8660da2bacc7af149f4ac5b2e7c64c 19-Mar-2012 tomhudson@google.com <tomhudson@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> (SSE2) acceleration for rectangular opaque erases.
15% speedup for rectangles < 31 px wide, 5% for larger.

http://codereview.appspot.com/5843050/



git-svn-id: http://skia.googlecode.com/svn/trunk@3423 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
c909a1ecadd422d91ff97d10ce08865290223b14 25-Oct-2011 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> don't blend with zero in colorproc (forgot to return after memcpy check).



git-svn-id: http://skia.googlecode.com/svn/trunk@2527 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
edb606cb999887d54629f361bcbf57c5fede1bb0 18-Oct-2011 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> move LCD blits into opts, so they can have assembly versions



git-svn-id: http://skia.googlecode.com/svn/trunk@2484 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
58af9a64701540c7f8083bc22a42d0bae3a5583c 12-Oct-2011 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> separate SkBlitMask decl into its own header



git-svn-id: http://skia.googlecode.com/svn/trunk@2461 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
f2068adc2c4123075a17bf838171e498584312b3 13-Sep-2011 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> add SK_RESTRICT to mask procs
separate out opaque and non for lcd16 blits



git-svn-id: http://skia.googlecode.com/svn/trunk@2253 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
ec3ed6a5ebf6f2c406d7bcf94b6bc34fcaeb976e 28-Jul-2011 epoger@google.com <epoger@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Automatic update of all copyright notices to reflect new license terms.

I have manually examined all of these diffs and restored a few files that
seem to require manual adjustment.

The following files still need to be modified manually, in a separate CL:

android_sample/SampleApp/AndroidManifest.xml
android_sample/SampleApp/res/layout/layout.xml
android_sample/SampleApp/res/menu/sample.xml
android_sample/SampleApp/res/values/strings.xml
android_sample/SampleApp/src/com/skia/sampleapp/SampleApp.java
android_sample/SampleApp/src/com/skia/sampleapp/SampleView.java
experimental/CiCarbonSampleMain.c
experimental/CocoaDebugger/main.m
experimental/FileReaderApp/main.m
experimental/SimpleCocoaApp/main.m
experimental/iOSSampleApp/Shared/SkAlertPrompt.h
experimental/iOSSampleApp/Shared/SkAlertPrompt.m
experimental/iOSSampleApp/SkiOSSampleApp-Base.xcconfig
experimental/iOSSampleApp/SkiOSSampleApp-Debug.xcconfig
experimental/iOSSampleApp/SkiOSSampleApp-Release.xcconfig
gpu/src/android/GrGLDefaultInterface_android.cpp
gyp/common.gypi
gyp_skia
include/ports/SkHarfBuzzFont.h
include/views/SkOSWindow_wxwidgets.h
make.bat
make.py
src/opts/memset.arm.S
src/opts/memset16_neon.S
src/opts/memset32_neon.S
src/opts/opts_check_arm.cpp
src/ports/SkDebug_brew.cpp
src/ports/SkMemory_brew.cpp
src/ports/SkOSFile_brew.cpp
src/ports/SkXMLParser_empty.cpp
src/utils/ios/SkImageDecoder_iOS.mm
src/utils/ios/SkOSFile_iOS.mm
src/utils/ios/SkStream_NSData.mm
tests/FillPathTest.cpp
Review URL: http://codereview.appspot.com/4816058

git-svn-id: http://skia.googlecode.com/svn/trunk@1982 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
e6ea606fb92cd611b965806cb005f87495b261f2 07-Jul-2011 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> re-enable SSE2 blitmask procs, only excluding if we're black (in which case
the protable version is still faster)



git-svn-id: http://skia.googlecode.com/svn/trunk@1819 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
ee467ee79d449ebe6ae7f7946e613cc70a479c69 09-Mar-2011 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> Correct blitmask procs to recognize that we pass them an SkColor, and if they
want a SkPMColor, they need to call SkPreMultiplyColor()

Add Opaque and Black optimizations for blitmask_d32



git-svn-id: http://skia.googlecode.com/svn/trunk@911 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
981d4798007b91e2e19c13b171583927a56df63b 09-Mar-2011 reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> http://codereview.appspot.com/3980041/

Add blitmask procs (with optional platform acceleration)
patch by yaojie.yan



git-svn-id: http://skia.googlecode.com/svn/trunk@910 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
57f4969724a1dd88c8d9ae35a863e6cf621181d5 23-Feb-2011 djsollen@google.com <djsollen@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81> merge from android tree:
- optional parameters added to descriptorProc and allocPixels
- clip options to image decoders
- check for xfermode in blitter_a8
- UNROLL loops in blitrow

reviewed by reed@google.com



git-svn-id: http://skia.googlecode.com/svn/trunk@841 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
29e5054dd07c97c2195c5f64bf67aaa6b5afa204 16-Dec-2010 senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Fix perf regression in Color32.

The regression was due to the fact that we were calling PlatformColorProc() for
every span (which in turns makes CPUID, a fairly expensive call). Since we draw
a lot of rects, and rects have 1-pixel wide spans for the vertical segments,
that's a lot of CPUID.

Fixed by cacheing the result of PlatformColorProc(), as is done for the other
platform-specific blitters.

Review URL: http://codereview.appspot.com/3669042/



git-svn-id: http://skia.googlecode.com/svn/trunk@636 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
c3856384e4ab9a7ad5902696a5c972ab595b8467 13-Dec-2010 senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> SSE2 optimizations for 32bit Color operation.

[Patch from weiwei.li@intel.com]

SSE2 optimization has been added by Stephen White before, this improves the skia
performance on SSE2-supporting platform. (please refer to below issues)

Issue 171055: More SSE2ification
Issue 157141: More SSE2ification
Issue 150060: minor tweaks to SSE2 code for -fPIC
Issue 144072: SSE2 optimizations for 32bit blending blitters

This CL implements SSE2 optimizations for the 32bit Color operation. Like above
issues, it uses CPUID to detect for SSE2 and changes the platform procs at
runtime as well. The 32bit Color operation is heavily used on Chrome HTML5
canvas operations. Take Microsoft IE test drives Pulsating Bubbles as example
(http://ie.microsoft.com/testdrive/Performance/PulsatingBubbles/Default.xhtml),
if running this cases on Chrome, the overhead of 32bit Color operation is about
40~50%. So this CL will make skia performance more better, and also make Chrome
HTML5 canvas performance more better.

Additional, this CL has passed the skia bench & tests validation, the result is
pretty good. We also apply this CL to the latest chromium, and re-run Microsoft
IE test drives Pulsating Bubbles, the performance is improved by almost 9~10%.


git-svn-id: http://skia.googlecode.com/svn/trunk@633 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
9272761b22746d2d22439c26f5555028f8e824da 04-Nov-2009 senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> SSE2 optimizations for 32bit blending blitters.

This CL implements SSE2 optimizations for 3 of the 32bit blending blitters. It
uses CPUID to detect for SSE2 at runtime. In order to accomodate runtime
detection, it changes the platform procs from static arrays to static
functions.

It also includes an implementation of SkTime for Win32.

http://codereview.appspot.com/144072



git-svn-id: http://skia.googlecode.com/svn/trunk@418 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp
c4cae85752e3e486cf4eac8cd8128f57b6f40563 23-Sep-2009 reed@android.com <reed@android.com@2bbb7eff-a529-9590-31e7-b0007b416f81> add BlitRow procs for 32->32, to allow for neon and other optimizations.
call these new procs in (nearly) all the places we had inlined loops before.
In once instance (blitter_argb32::blitAntiH) we get different results by a
tiny bit. The new code is more accurate, and exactly inline with all of the
other like-minded blits, so I think the change is good going forward.



git-svn-id: http://skia.googlecode.com/svn/trunk@366 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/core/SkBlitRow_D32.cpp