History log of /external/skia/src/opts/SkColor_opts_SSE2.h
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
f31fa24914c683abcc2c860093b142725c43fbe6 12-May-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Make gMask_00FF00FF a constant

This is to optimize SkAlphaMulQ() in PIC mode. With the visibility=default
symbol the constant is not known at compile time (and is not a constant), but
instead is fetched through a double indirection through GOT. The function is
quite hot on one of the chromium benchmarks:
rasterize_and_record_micro.key_silk_cases.

This change replaces the symbol with a compile-time constant. As a bonus the
variable is not exported from the dynamic library, i. e. a cleaner library
interface.

See specific performance improvements on Android here:
http://goo.gl/iMuTDt

R=skyostil@chromium.org, tomhudson@chromium.org, mtklein@google.com, reed@google.com, tomhudson@google.com

Author: pasko@chromium.org

Review URL: https://codereview.chromium.org/270473003

git-svn-id: http://skia.googlecode.com/svn/trunk@14696 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
7bf10152b129e3b0cad76f2abd5136ccbc74a393 25-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Xfermode: SSE2 implementation of overlay_modeproc

With SSE2 optimization, performance of Xfermode_Overlay will improve
about 35% on desktop i7-3770. Here are the data:
before:
Xfermode_Overlay 8888: cmsecs = 44.17 565: cmsecs = 59.27
after:
Xfermode_Overlay 8888: cmsecs = 28.30 565: cmsecs = 35.84

BUG=skia:
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/232783002

git-svn-id: http://skia.googlecode.com/svn/trunk@14370 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
54299654e964ab53189f774e30bce2adebbdc857 14-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Xfermode: SSE2 implementation of a number of simple transfer modes

These modes share some common code and not very complex, so group them
together. This CL yields about 50% performance improvement on desktop
i7-3770. Here are the data:
before:
Xfermode_Screen 8888: cmsecs = 30.25 565: cmsecs = 46.81
Xfermode_Modulate 8888: cmsecs = 22.48 565: cmsecs = 40.06
Xfermode_Plus 8888: cmsecs = 21.04 565: cmsecs = 37.51
Xfermode_Xor 8888: cmsecs = 37.18 565: cmsecs = 52.53
Xfermode_DstATop 8888: cmsecs = 28.97 565: cmsecs = 46.42
Xfermode_SrcATop 8888: cmsecs = 29.74 565: cmsecs = 46.25
Xfermode_DstOut 8888: cmsecs = 5.34 565: cmsecs = 24.53
Xfermode_SrcOut 8888: cmsecs = 12.25 565: cmsecs = 24.39
Xfermode_DstIn 8888: cmsecs = 5.30 565: cmsecs = 24.50
Xfermode_SrcIn 8888: cmsecs = 12.05 565: cmsecs = 25.40
Xfermode_DstOver 8888: cmsecs = 12.45 565: cmsecs = 0.15
Xfermode_SrcOver 8888: cmsecs = 2.68 565: cmsecs = 4.42
after:
Xfermode_Screen 8888: cmsecs = 13.68 565: cmsecs = 21.73
Xfermode_Modulate 8888: cmsecs = 13.25 565: cmsecs = 20.97
Xfermode_Plus 8888: cmsecs = 9.77 565: cmsecs = 16.71
Xfermode_Xor 8888: cmsecs = 17.64 565: cmsecs = 25.62
Xfermode_DstATop 8888: cmsecs = 15.99 565: cmsecs = 23.74
Xfermode_SrcATop 8888: cmsecs = 15.69 565: cmsecs = 23.40
Xfermode_DstOut 8888: cmsecs = 4.77 565: cmsecs = 11.85
Xfermode_SrcOut 8888: cmsecs = 4.98 565: cmsecs = 11.84
Xfermode_DstIn 8888: cmsecs = 4.68 565: cmsecs = 11.72
Xfermode_SrcIn 8888: cmsecs = 4.93 565: cmsecs = 11.79
Xfermode_DstOver 8888: cmsecs = 5.04 565: cmsecs = 0.15
Xfermode_SrcOver 8888: cmsecs = 2.69 565: cmsecs = 4.42

BUG=skia:
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/232793002

git-svn-id: http://skia.googlecode.com/svn/trunk@14176 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
c524e98f1edf06b53e65543f5f28217fa13b7aa9 09-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Xfermode: SSE2 implementation of multiply_modeproc

This patch implements basics for Xfermode SSE optimization. Based on
these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
implementation for other modes will come in future. With this patch
performance of Xfermode_Multiply will improve about 45%. Here are the
data on desktop i7-3770.
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87

BUG=

Committed: http://code.google.com/p/skia/source/detail?r=14006

Committed: http://code.google.com/p/skia/source/detail?r=14050

R=mtklein@google.com, robertphillips@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/202903004

git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
77815fd74df355b8d6eff8a91fd10cc65033a79f 03-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/)

Reason for revert:
It looks like serialization is broken. The serialize and pipe-cross-process tests are failing and turning (at least the Ubuntu12 and Win7) bots red

Original issue's description:
> Xfermode: SSE2 implementation of multiply_modeproc
>
> This patch implements basics for Xfermode SSE optimization. Based on
> these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
> implementation for other modes will come in future. With this patch
> performance of Xfermode_Multiply will improve about 45%. Here are the
> data on desktop i7-3770.
> before:
> Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
> after:
> Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
>
> BUG=
>
> Committed: http://code.google.com/p/skia/source/detail?r=14006
>
> Committed: http://code.google.com/p/skia/source/detail?r=14050

R=mtklein@google.com, qiankun.miao@intel.com
TBR=mtklein@google.com, qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=

Author: robertphillips@google.com

Review URL: https://codereview.chromium.org/224253003

git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
c3118739277beb7678973d47e3d71bb863929bce 03-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Xfermode: SSE2 implementation of multiply_modeproc

This patch implements basics for Xfermode SSE optimization. Based on
these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
implementation for other modes will come in future. With this patch
performance of Xfermode_Multiply will improve about 45%. Here are the
data on desktop i7-3770.
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87

BUG=

Committed: http://code.google.com/p/skia/source/detail?r=14006

R=mtklein@google.com, robertphillips@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/202903004

git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
079d2986002a6eb407ba4699bae58920fdc355a0 01-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/)

Reason for revert:
Breaking builds

Original issue's description:
> Xfermode: SSE2 implementation of multiply_modeproc
>
> This patch implements basics for Xfermode SSE optimization. Based on
> these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
> implementation for other modes will come in future. With this patch
> performance of Xfermode_Multiply will improve about 45%. Here are the
> data on desktop i7-3770.
> before:
> Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
> after:
> Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
>
> BUG=
>
> Committed: http://code.google.com/p/skia/source/detail?r=14006

R=mtklein@google.com, qiankun.miao@intel.com
TBR=mtklein@google.com, qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=

Author: robertphillips@google.com

Review URL: https://codereview.chromium.org/219243009

git-svn-id: http://skia.googlecode.com/svn/trunk@14007 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
25f7455f3a7cf2c440509bead85486079f1e4b31 01-Apr-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> Xfermode: SSE2 implementation of multiply_modeproc

This patch implements basics for Xfermode SSE optimization. Based on
these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
implementation for other modes will come in future. With this patch
performance of Xfermode_Multiply will improve about 45%. Here are the
data on desktop i7-3770.
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87

BUG=
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/202903004

git-svn-id: http://skia.googlecode.com/svn/trunk@14006 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h
475910750cdc7d14da3071d4052ba9ab98383be9 19-Feb-2014 commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> SSE2 implementation of S32A_D565_Opaque

microbenchmark of S32A_D565_Opaque() shows a 3x speedup after SSE optimization with various count on i7-3770.

BUG=
R=mtklein@google.com, reed@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/138163013

git-svn-id: http://skia.googlecode.com/svn/trunk@13495 2bbb7eff-a529-9590-31e7-b0007b416f81
/external/skia/src/opts/SkColor_opts_SSE2.h