History log of /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
Revision	Date	Author	Comments
95cc012ccaea20f372893ae277ea0a8a6339d094	28-Apr-2015	mtklein <mtklein@chromium.org>	De-proc Color32 Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Mips-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1104183004 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
641c3ff7c680ef7935d47d2e68f8301acc79e3de	27-Apr-2015	mtklein <mtklein@google.com>	Revert of De-proc Color32 (patchset #5 id:80001 of https://codereview.chromium.org/1104183004/) Reason for revert: duh Original issue's description: > De-proc Color32 > > Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, > which is no longer needed. > > Seems handy to have SkTypes include the relevant intrinsics when > we know we've got them, but I'm not married to it. > > Locally this looks like a pointlessly small perf win, but I'm mostly > keen to get all the code together. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b > > Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1102363006 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
d65dc0cedd5b50dd407b6ff8fdc39123f11511cc	27-Apr-2015	mtklein <mtklein@chromium.org>	De-proc Color32 Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b Review URL: https://codereview.chromium.org/1104183004 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
498856ebc6f22d7018071bd6696756d7cd077ab8	27-Apr-2015	mtklein <mtklein@google.com>	Revert of De-proc Color32 (patchset #4 id:60001 of https://codereview.chromium.org/1104183004/) Reason for revert: MIPS Original issue's description: > De-proc Color32 > > Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, > which is no longer needed. > > Seems handy to have SkTypes include the relevant intrinsics when > we know we've got them, but I'm not married to it. > > Locally this looks like a pointlessly small perf win, but I'm mostly > keen to get all the code together. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1108163002 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
376e9bc206b69d9190f38dfebb132a8769bbd72b	27-Apr-2015	mtklein <mtklein@chromium.org>	De-proc Color32 Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Review URL: https://codereview.chromium.org/1104183004 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
a4a0aeb74808a0860f3e94588d0ceb0da9fed386	21-Apr-2015	mtklein <mtklein@google.com>	Revert of Convert Color32 code to perfect blend. (patchset #6 id:100001 of https://codereview.chromium.org/1098913002/) Reason for revert: Xfermode_SrcOver not looking encouraging. Up to 50% regressions. https://perf.skia.org/#3242 Original issue's description: > Convert Color32 code to perfect blend. > > Before we commit to blend_256_round_alt, let's make sure blend_perfect is > really slower in practice (i.e. regresses on perf.skia.org). > > blend_perfect is really the most desirable algorithm if we can afford it. Not > only is it correct, but it's easy to think about and break into correct pieces: > for instance, its div255() doesn't require any coordination with the multiply. > > This looks like a 30% hit according to microbenches. That said, microbenches > said my previous change would be a 20-25% perf improvement, but it didn't end > up showing a significant effect at a high level. > > As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt > (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a > bunch of overdraw. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/61221e7f87a99765b0e034020e06bb018e2a08c2 TBR=reed@google.com,fmalita@chromium.org,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1083923006 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
61221e7f87a99765b0e034020e06bb018e2a08c2	20-Apr-2015	mtklein <mtklein@chromium.org>	Convert Color32 code to perfect blend. Before we commit to blend_256_round_alt, let's make sure blend_perfect is really slower in practice (i.e. regresses on perf.skia.org). blend_perfect is really the most desirable algorithm if we can afford it. Not only is it correct, but it's easy to think about and break into correct pieces: for instance, its div255() doesn't require any coordination with the multiply. This looks like a 30% hit according to microbenches. That said, microbenches said my previous change would be a 20-25% perf improvement, but it didn't end up showing a significant effect at a high level. As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a bunch of overdraw. BUG=skia: Review URL: https://codereview.chromium.org/1098913002 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
afe2ffb8ba5e7362a2ee6f4e1540c9ab22df2c1e	17-Apr-2015	mtklein <mtklein@chromium.org>	Rework SSE and NEON Color32 algorithms to be more correct and faster. This algorithm changes the blend math, guarded by SK_LEGACY_COLOR32_MATH. The new math is more correct: it's never off by more than 1, and correct in all the interesting 0x00 and 0xFF edge cases, where the old math was never off by more than 2, and not always correct on the edges. If you look at tests/BlendTest.cpp, the old code was using the `blend_256_plus1_trunc` algorithm, while the new code uses `blend_256_round_alt`. Neither uses `blend_perfect`, which is about 35% slower than `blend_256_round_alt`. This will require an unfathomable number of rebaselines, first to Skia, then to Blink when I remove the guard. I plan to follow up with some integer SIMD abstractions that can unify these two implementations into a single algorithm. This was originally what I was working on here, but the correctness gains seem to be quite compelling. The only place these two algorithms really differ greatly now is the kernel function, and even there they can really both be expressed abstractly as: - multiply 8-bits and 8-bits producing 16-bits - add 16-bits to 16-bits, returning the top 8 bits. All the constants are the same, except SSE is a little faster to keep 8 16-bit inverse alphas, NEON's a little faster to keep 8 8-bit inverse alphas. I may need to take this small speed win back to unify the two. We should expect a ~25% speedup on Intel (mostly from unrolling to 8 pixels) and a ~20% speedup on ARM (mostly from using vaddhn to add `color`, round, and narrow back down to 8-bit all in one instruction). (I am probably missing several more related bugs here.) BUG=skia:3738,skia:420,chromium:111470 Review URL: https://codereview.chromium.org/1092433002 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
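Note on the entry above: the kernel it describes (multiply 8 bits by 8 bits into 16 bits, then add 16 bits and keep the top 8) can be sketched in scalar C++ as follows. This is only an illustration under stated assumptions. The names blend_exact, blend_approx, and max_abs_error are hypothetical, and the rounding constants are chosen to fit the description in the log rather than copied from tests/BlendTest.cpp or the SSE2/NEON sources. Inputs are assumed to be premultiplied, so each color channel is no larger than alpha.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Reference blend: color + dst * (255 - alpha) / 255, rounded to nearest.
// Adding 127 before dividing by 255 rounds to nearest (255 is odd, so no ties).
uint8_t blend_exact(uint8_t dst, uint8_t color, uint8_t alpha) {
    assert(color <= alpha);  // premultiplied input (assumption of this sketch)
    unsigned invA = 255u - alpha;
    unsigned scaled = (dst * invA + 127u) / 255u;
    return static_cast<uint8_t>(color + scaled);
}

// Approximate blend in the "8x8 -> 16, add 16, keep top 8 bits" shape.
// invA is nudged from the 0..255 range toward 0..256 so that >> 8 can
// stand in for a division by 255.
uint8_t blend_approx(uint8_t dst, uint8_t color, uint8_t alpha) {
    assert(color <= alpha);  // premultiplied input (assumption of this sketch)
    unsigned invA = 255u - alpha;
    invA += invA >> 7;                             // 255 becomes 256, 0 stays 0
    unsigned product = dst * invA;                 // 8-bit x 8-bit, fits in 16 bits
    unsigned sum = product + (color << 8) + 128u;  // color in the high byte, plus rounding
    return static_cast<uint8_t>(sum >> 8);         // keep the top 8 of 16 bits
}

// Largest absolute difference between the two blends over all dst/alpha
// pairs, checked with color at both ends of its valid range.
int max_abs_error() {
    int worst = 0;
    for (int a = 0; a <= 255; ++a) {
        for (int d = 0; d <= 255; ++d) {
            const int colors[2] = {0, a};
            for (int c : colors) {
                int e = blend_exact(static_cast<uint8_t>(d), static_cast<uint8_t>(c),
                                    static_cast<uint8_t>(a));
                int p = blend_approx(static_cast<uint8_t>(d), static_cast<uint8_t>(c),
                                     static_cast<uint8_t>(a));
                int diff = std::abs(e - p);
                if (diff > worst) worst = diff;
            }
        }
    }
    return worst;
}
```

Calling max_abs_error() compares the two blends exhaustively. In this sketch the approximation stays within 1 of the reference and is exact at alpha 0x00 and 0xFF, which is the property the commit message claims for the new math.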
70840cbd898df67f603987213164c798415d76bf	20-Mar-2015	henrik.smiding <henrik.smiding@intel.com>	Replace SSE optimization of Color32A_D565 Adds an SSE2 version of the Color32A_D565 function, to replace the existing SSE4 version. Also does some minor cleanup. Performance improvement in the following Skia benchmarks. Measured on Atom Silvermont: Xfermode_SrcOver - x3 luma_colorfilter_large - x4.6 luma_colorfilter_small - x2 tablebench - ~15% chart_bw - ~10% Measured on Corei7 Haswell: luma_colorfilter_large running SSE2 - x2 luma_colorfilter_large running SSE4 - x2.3 Also improves performance in WPS Office application and 2D subtest of 0xbenchmark on Android. Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> Review URL: https://codereview.chromium.org/923523002 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
ad49b3b71508aff4a371d4e487b90971d9799721	02-Feb-2015	qiankun.miao <qiankun.miao@intel.com>	Optimize SSE2 opaque blend Backport optimization from https://codereview.chromium.org/874863002/. Microbenchmarks data compared to previous SSE2 implementation: bitmap_BGRA_8888_A_source_stripes_three 7.52us -> 8.67us 1.15x bitmap_BGRA_8888_A_source_stripes_two 7.48us -> 8.56us 1.15x bitmap_BGRA_8888_update_scale_rotate_bilerp 63.4us -> 64us 1.01x bitmap_BGRA_8888_update_volatile 3.31us -> 3.33us 1.01x bitmap_BGRA_8888_scale 11.1us -> 11.2us 1x bitmap_BGRA_8888_scale_bilerp 35.8us -> 35.9us 1x bitmap_BGRA_8888 3.33us -> 3.33us 1x bitmap_BGRA_8888_A_scale_rotate_bicubic 66.7us -> 66.5us 1x bitmap_BGRA_8888_update_volatile_scale_rotate_bilerp 65.1us -> 64us 0.98x bitmap_BGRA_8888_scale_rotate_bilerp 65.1us -> 64us 0.98x bitmap_BGRA_8888_A_scale_bicubic 30.6us -> 29.9us 0.98x bitmap_BGRA_8888_A_scale_bilerp 42.7us -> 41.4us 0.97x bitmap_BGRA_8888_A_scale_rotate_bilerp 71us -> 67.7us 0.95x bitmap_BGRA_8888_A 7.44us -> 5.7us 0.77x bitmap_BGRA_8888_A_source_opaque 7.46us -> 3.72us 0.5x bitmap_BGRA_8888_A_source_transparent 7.46us -> 1.96us 0.26x BUG=skia: Review URL: https://codereview.chromium.org/886403002 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
2253aa93930cdc5d0615098ce5473065427bcff6	25-Nov-2014	qiankun.miao <qiankun.miao@intel.com>	Add SkBlendARGB32_SSE2() to clean up code Related nanobench results: before: maxrss loops min median mean max stddev samples config bench 10M 2 31.9µs 32.4µs 33.3µs 38.7µs 6% █▄▂▂▂▁▂▁▁▁ 8888 bitmap_BGRA_8888_A_scale_bicubic 10M 13 43.8µs 51.8µs 49.6µs 57.9µs 11% ▁▁▁▁▂▆▇▆▅█ 8888 bitmap_BGRA_8888_A_scale_bilerp 10M 13 23.7µs 24.3µs 26µs 32.7µs 13% ▅█▆▁▁▁▁▂▁▁ 8888 bitmap_Index_8_A 10M 4 1.68µs 1.7µs 4.09µs 25.4µs 183% █▁▁▁▁▁▁▁▁▁ 8888 text_16_AA_88 10M 144 1.76µs 1.77µs 1.78µs 1.81µs 1% █▂▇▂▅▁▁▁▁▁ 8888 text_16_AA_FF 10M 10 4.7µs 5.34µs 5.61µs 8.63µs 21% █▂▂▃▂▁▁▁▁▄ 8888 rotated_rects_aa_alternating_transparent_and_opaque_src 10M 50 4.44µs 4.47µs 4.5µs 4.71µs 2% █▅▃▂▂▂▁▁▁▁ 8888 rotated_rects_aa_changing_opaque_src 10M 51 4.39µs 4.78µs 5.21µs 6.62µs 17% ▁▆▆▇▁▁█▁▂▂ 8888 rotated_rects_aa_same_opaque_src 10M 50 4.47µs 5.79µs 5.43µs 6.14µs 11% ▄▂▁▃▇▇▆▇▇█ 8888 rotated_rects_aa_alternating_transparent_and_opaque_srcover 10M 30 4.35µs 6.06µs 5.84µs 7.63µs 16% ▅▅▅▄▅▅▄█▁▁ 8888 rotated_rects_aa_changing_transparent_srcover 10M 44 4.31µs 4.51µs 4.76µs 6.25µs 13% ▄▂▂▁█▃▁▃▁▁ 8888 rotated_rects_aa_changing_opaque_srcover 10M 46 4.36µs 4.42µs 4.75µs 6.19µs 14% ▆█▃▁▁▁▁▁▁▁ 8888 rotated_rects_aa_same_transparent_srcover 10M 47 4.29µs 4.35µs 4.44µs 5.15µs 6% ▃▂▂▁▁█▁▁▁▁ 8888 rotated_rects_aa_same_opaque_srcover 10M 3 39.1µs 39.2µs 50.7µs 153µs 71% █▁▁▁▁▁▁▁▁▁ 8888 rectori 10M 1 2.3ms 2.31ms 2.35ms 2.74ms 6% ▁▁▁▁▁▁▁▁█▂ 8888 maskcolor 10M 1 2.33ms 2.34ms 2.53ms 3.14ms 11% ▁▁▁▁▁▁▅█▄▄ 8888 maskopaque 10M 11 15µs 15.3µs 15.7µs 18.3µs 7% ▅▃▂▂▁▁▁▁█▁ 8888 rrects_3_stroke_4 10M 46 3.99µs 4.07µs 4.14µs 4.54µs 4% █▅▅▃▂▂▁▁▁▁ 8888 rrects_3 10M 16 15.6µs 15.9µs 16.1µs 17.5µs 4% █▄▃▂▂▂▁▂▁▁ 8888 ovals_3_stroke_4 10M 40 5.09µs 5.18µs 5.23µs 5.67µs 3% █▅▃▂▂▁▃▁▁▁ 8888 ovals_3 10M 231 1.92µs 1.93µs 1.94µs 2µs 1% █▃▂▁▃▁▁▁▁▁ 8888 zeroradroundrect 10M 924 3.88µs 3.93µs 4.11µs 4.95µs 9% ▁█▆▃▁▁▁▁▁▁ 8888 arbroundrect 10M 8 
8.11µs 8.47µs 8.48µs 8.85µs 3% █▅▇▄▄▂▁▄▄▆ 8888 merge_large 10M 14 6.71µs 6.92µs 6.96µs 7.46µs 3% ▃▆▁█▃▃▃▂▂▁ 8888 merge_small 11M 2 225µs 227µs 229µs 233µs 1% ███▃▇▂▃▁▃▂ 8888 displacement_full_large 16M 1 381µs 401µs 401µs 421µs 3% ▅▅▅█▆▄▄▃▃▁ 8888 displacement_alpha_large 19M 1 507µs 508µs 509µs 512µs 0% █▃▂▆▂▂▃▂▃▁ 8888 displacement_zero_large 19M 19 9µs 9.11µs 9.15µs 9.67µs 2% ▄▂▂▂█▂▁▁▁▂ 8888 displacement_full_small 19M 5 54.2µs 54.5µs 54.9µs 58µs 2% █▃▂▂▁▁▃▁▁▁ 8888 blurroundrect_WH[100x100]_cr[90] 20M 1 229µs 230µs 231µs 240µs 2% █▄▃▂▂▁▁▁▁▂ 8888 GM_varied_text_clipped_no_lcd 20M 1 267µs 269µs 270µs 279µs 1% █▄▃▂▂▂▂▂▁▁ 8888 GM_varied_text_ignorable_clip_no_lcd 22M 1 1.95ms 1.97ms 2.03ms 2.46ms 8% ▁▁▁▁▁▁▁▂█▃ 8888 GM_convex_poly_clip after: maxrss loops min median mean max stddev samples config bench 10M 2 31.5µs 32.3µs 32.8µs 37.2µs 5% █▄▃▂▂▂▁▁▁▁ 8888 bitmap_BGRA_8888_A_scale_bicubic 10M 13 43.9µs 44µs 44.1µs 44.9µs 1% █▂▁▁▁▆▁▁▁▂ 8888 bitmap_BGRA_8888_A_scale_bilerp 10M 19 22.7µs 23.3µs 25.6µs 32.4µs 14% ▁▁▁▁▁▅▆▁▅█ 8888 bitmap_Index_8_A 10M 5 1.79µs 1.97µs 3.85µs 21.1µs 158% █▁▁▁▁▁▁▁▁▁ 8888 text_16_AA_88 10M 141 1.83µs 1.83µs 1.85µs 1.93µs 2% ▅▁▁█▁▁▁▁▁▁ 8888 text_16_AA_FF 10M 10 4.65µs 4.92µs 5.06µs 6.56µs 11% █▃▃▂▂▂▁▁▁▁ 8888 rotated_rects_aa_alternating_transparent_and_opaque_src 10M 51 4.35µs 4.48µs 4.83µs 6.68µs 17% ▂▁▁▁▁▁▁▂▆█ 8888 rotated_rects_aa_changing_opaque_src 10M 51 4.38µs 4.79µs 4.85µs 5.84µs 11% ▁█▁▃▃▁▄▁▄▇ 8888 rotated_rects_aa_same_opaque_src 10M 32 5.58µs 6.24µs 6.1µs 6.39µs 5% █▂█▆▁▇▄▅▇▇ 8888 rotated_rects_aa_alternating_transparent_and_opaque_srcover 10M 42 4.28µs 5.59µs 5.11µs 6.01µs 15% ▂▂█▇█▂▁▆▁▇ 8888 rotated_rects_aa_changing_transparent_srcover 10M 48 4.24µs 4.33µs 4.58µs 6.46µs 15% ▁▁▁▁▁█▃▂▁▁ 8888 rotated_rects_aa_changing_opaque_srcover 10M 48 4.28µs 4.3µs 4.4µs 5.12µs 6% ▂▂▁▁▁▁▁▁▁█ 8888 rotated_rects_aa_same_transparent_srcover 10M 46 4.24µs 4.29µs 4.66µs 7.11µs 20% ▁▁▁▁▁▁▁▁▃█ 8888 rotated_rects_aa_same_opaque_srcover 10M 3 39.3µs 39.4µs 
51.4µs 154µs 70% █▁▁▁▁▁▁▁▁▁ 8888 rectori 10M 1 2.32ms 2.43ms 2.53ms 3.14ms 11% ▁▁▁▁▂▄█▃▅▁ 8888 maskcolor 10M 1 2.33ms 2.37ms 2.54ms 3.21ms 12% ▁▁▁▁▁▂█▅▆▁ 8888 maskopaque 10M 10 15.3µs 15.6µs 15.8µs 17.2µs 4% █▅▃▂▂▂▁▁▁▁ 8888 rrects_3_stroke_4 10M 46 4.03µs 4.09µs 4.15µs 4.47µs 4% █▄▆▂▂▂▁▁▁▁ 8888 rrects_3 10M 15 15.9µs 16.2µs 16.3µs 17.8µs 4% █▄▃▂▂▂▁▁▁▁ 8888 ovals_3_stroke_4 10M 40 5.14µs 5.26µs 5.29µs 5.72µs 3% █▅▃▂▂▁▂▂▁▁ 8888 ovals_3 10M 222 1.91µs 1.99µs 2.21µs 2.91µs 19% ▂▁▁▁▁▁▂▇▇█ 8888 zeroradroundrect 10M 462 3.9µs 3.96µs 4.23µs 5.22µs 12% ▆▄█▁▂▁▁▁▁▁ 8888 arbroundrect 10M 8 8.2µs 8.59µs 8.62µs 8.97µs 3% ▆▄█▄▅▃▁▆▄█ 8888 merge_large 10M 14 6.73µs 6.88µs 6.86µs 7.08µs 2% ▄█▁▂▄▂▅▄▂▅ 8888 merge_small 11M 2 221µs 234µs 237µs 263µs 5% ▄▃▃▃▄▃▂▁▇█ 8888 displacement_full_large 16M 1 387µs 416µs 427µs 471µs 7% ▇█▁▃▃▁▃▃▇▆ 8888 displacement_alpha_large 19M 1 512µs 521µs 528µs 594µs 5% █▂▂▂▁▁▂▃▁▁ 8888 displacement_zero_large 19M 18 9.06µs 9.12µs 9.13µs 9.23µs 1% █▃▃▃▄▃▆▁▅▅ 8888 displacement_full_small 19M 5 55.6µs 55.9µs 56.5µs 59.5µs 2% █▃▂▁▁▁▁▁▅▁ 8888 blurroundrect_WH[100x100]_cr[90] 20M 1 229µs 233µs 235µs 254µs 3% █▄▃▂▂▁▁▂▁▁ 8888 GM_varied_text_clipped_no_lcd 20M 1 270µs 271µs 272µs 278µs 1% █▄▃▂▂▂▁▂▁▇ 8888 GM_varied_text_ignorable_clip_no_lcd 22M 1 1.96ms 2ms 2.06ms 2.45ms 7% ▂▂▁▁▁▁▁▃█▄ 8888 GM_convex_poly_clip BUG=skia: Review URL: https://codereview.chromium.org/754733002 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
533a32782f9817bb307484b36323040470575da4	25-Nov-2014	qiankun.miao <qiankun.miao@intel.com>	Cleanup with SkAlphaMulQ_SSE2() Related nanobench results: before: 10M 18 7.03µs 7.31µs 7.38µs 8.46µs 6% ▂▁▂▂▂▃▄▁█▁ 8888 bitmaprect_80_filter_identity 10M 43 6.96µs 6.97µs 6.99µs 7.19µs 1% ▁▂▁▁▁▁▁█▁▁ 8888 bitmaprect_80_nofilter_identity 10M 14 35.7µs 35.8µs 35.9µs 36.3µs 1% ▃▂▁▂▁█▂▁▁▁ 8888 bitmap_BGRA_8888_update_scale_bilerp 10M 16 35.5µs 35.6µs 35.7µs 36.3µs 1% █▅▂▁▁▁▃▂▁▁ 8888 bitmap_BGRA_8888_update_volatile_scale_bilerp 10M 16 35.4µs 35.4µs 35.5µs 36.8µs 1% ▂▁█▁▁▁▁▂▁▁ 8888 bitmap_BGRA_8888_scale_bilerp 10M 25 16.4µs 16.6µs 16.7µs 17.4µs 2% ▂▁▁▂▁▁▁▅▅█ 8888 bitmap_Index_8 10M 15 37.9µs 38µs 38µs 38.4µs 0% ▄▆▂▁▁▁█▂▁▁ 8888 bitmap_RGB_565 10M 33 11.1µs 11.1µs 11.1µs 11.2µs 0% ▆▂█▂▂▂▁▁▂▁ 8888 bitmap_BGRA_8888_scale after: 10M 9 7.04µs 7.06µs 7.1µs 7.32µs 1% █▅▂▁▁▂▁▁▁▁ 8888 bitmaprect_80_filter_identity 10M 18 7.01µs 7.02µs 7.05µs 7.25µs 1% █▂▁▁▁▁▁▁▁▁ 8888 bitmaprect_80_nofilter_identity 10M 5 33.9µs 34µs 34.1µs 34.5µs 1% █▃▂▂▁▁▁▅▃▂ 8888 bitmap_BGRA_8888_update_scale_bilerp 10M 7 35.5µs 35.5µs 35.6µs 36.3µs 1% ▃▂▂▁▂▁▂▁█▂ 8888 bitmap_BGRA_8888_update_volatile_scale_bilerp 10M 7 35.5µs 35.5µs 35.7µs 36.8µs 1% ▂▁▁▁▁▁▁▁▁█ 8888 bitmap_BGRA_8888_scale_bilerp 10M 11 16.4µs 16.4µs 16.4µs 16.6µs 0% █▂▁▁▂▁▁▁▂▁ 8888 bitmap_Index_8 10M 7 37.3µs 37.4µs 38.4µs 47.8µs 9% ▁▁▁▁▁▁▁▁▁█ 8888 bitmap_RGB_565 10M 33 11µs 11µs 11.1µs 11.2µs 1% ▄█▅▃▂▁▁▁▁▁ 8888 bitmap_BGRA_8888_scale BUG=skia: Review URL: https://codereview.chromium.org/755573002 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
52e74c6d848931e12d2ba634abc0aae881cd3be1	24-Nov-2014	qiankun.miao <qiankun.miao@intel.com>	Cleanup of S32_D565_Opaque_SSE2() BUG=skia: Review URL: https://codereview.chromium.org/725693003 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
60e4ad7b29f50ebd7698d2d37580d5c8da5ce600	09-Oct-2014	jmuizelaar <jmuizelaar@mozilla.com>	Improve SkARGB32_A8_BlitMask_SSE2 With clang this: - movzbl -3(%rbx), %edx - pxor %xmm5, %xmm5 - pinsrw $0, %edx, %xmm5 - pinsrw $1, %edx, %xmm5 - movzbl -2(%rbx), %edx - pinsrw $2, %edx, %xmm5 - pinsrw $3, %edx, %xmm5 - movzbl -1(%rbx), %edx - pinsrw $4, %edx, %xmm5 - pinsrw $5, %edx, %xmm5 - movzbl (%rbx), %edx - pinsrw $6, %edx, %xmm5 - pinsrw $7, %edx, %xmm5 becomes: + movd (%rbx), %xmm4 + punpcklbw %xmm9, %xmm4 + punpcklwd %xmm4, %xmm4 And clang already does better codegen than msvc 2013 on this. BUG=skia: Review URL: https://codereview.chromium.org/609823003 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
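Note on the entry above: the shorter instruction sequence (movd, punpcklbw, punpcklwd) corresponds to a small chain of SSE2 intrinsics. The sketch below assumes x86 with SSE2 available. The function name expand_mask4 is hypothetical and the code is not taken from SkARGB32_A8_BlitMask_SSE2. It only shows how four 8-bit mask values can be widened to 16 bits and duplicated into adjacent lanes without per-lane inserts.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Expand four 8-bit values m0..m3 into eight 16-bit lanes:
// m0, m0, m1, m1, m2, m2, m3, m3.
inline __m128i expand_mask4(const uint8_t* mask) {
    int32_t packed;
    std::memcpy(&packed, mask, sizeof(packed));     // 4-byte load, safe for any alignment
    __m128i m = _mm_cvtsi32_si128(packed);          // movd: 4 bytes into the low lane
    m = _mm_unpacklo_epi8(m, _mm_setzero_si128());  // punpcklbw: zero-extend bytes to 16 bits
    return _mm_unpacklo_epi16(m, m);                // punpcklwd: duplicate each 16-bit value
}
```

The result can be written out with _mm_storeu_si128 to inspect the lanes. Whether a compiler emits exactly the three instructions listed in the log depends on the compiler and flags.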
8c4953c6f176469ad287c3270ab146e292b23bad	30-Apr-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Cleanup of SSE optimization files. General cleanup of optimization files for x86/SSEx. Renamed the opts_check_SSE2.cpp file to _x86, since it's not specific to SSE2. Commented out the ColorRect32 optimization, since it's disabled anyway, to make it more visible. Also fixed a lot of indentation, inclusion guards, spelling, copyright headers, braces, whitespace, and sorting of includes. Author: henrik.smiding@intel.com Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/264603002 git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
c524e98f1edf06b53e65543f5f28217fa13b7aa9	09-Apr-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Xfermode: SSE2 implementation of multiply_modeproc This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 Committed: http://code.google.com/p/skia/source/detail?r=14050 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
77815fd74df355b8d6eff8a91fd10cc65033a79f	03-Apr-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/) Reason for revert: It looks like serialization is broken. The serialize and pipe-cross-process tests are failing and turning (at least the Ubuntu12 and Win7) bots red Original issue's description: > Xfermode: SSE2 implementation of multiply_modeproc > > This patch implements basics for Xfermode SSE optimization. Based on > these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 > implementation for other modes will come in future. With this patch > performance of Xfermode_Multiply will improve about 45%. Here are the > data on desktop i7-3770. > before: > Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 > after: > Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 > > BUG= > > Committed: http://code.google.com/p/skia/source/detail?r=14006 > > Committed: http://code.google.com/p/skia/source/detail?r=14050 R=mtklein@google.com, qiankun.miao@intel.com TBR=mtklein@google.com, qiankun.miao@intel.com NOTREECHECKS=true NOTRY=true BUG= Author: robertphillips@google.com Review URL: https://codereview.chromium.org/224253003 git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
c3118739277beb7678973d47e3d71bb863929bce	03-Apr-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Xfermode: SSE2 implementation of multiply_modeproc This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
079d2986002a6eb407ba4699bae58920fdc355a0	01-Apr-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/) Reason for revert: Breaking builds Original issue's description: > Xfermode: SSE2 implementation of multiply_modeproc > > This patch implements basics for Xfermode SSE optimization. Based on > these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 > implementation for other modes will come in future. With this patch > performance of Xfermode_Multiply will improve about 45%. Here are the > data on desktop i7-3770. > before: > Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 > after: > Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 > > BUG= > > Committed: http://code.google.com/p/skia/source/detail?r=14006 R=mtklein@google.com, qiankun.miao@intel.com TBR=mtklein@google.com, qiankun.miao@intel.com NOTREECHECKS=true NOTRY=true BUG= Author: robertphillips@google.com Review URL: https://codereview.chromium.org/219243009 git-svn-id: http://skia.googlecode.com/svn/trunk@14007 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
25f7455f3a7cf2c440509bead85486079f1e4b31	01-Apr-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Xfermode: SSE2 implementation of multiply_modeproc This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14006 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
fe089b383aeae512ee39678a667c81867f730cd0	07-Mar-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2 implementation of S32A_D565_Opaque_Dither Run benchmarks with command line option "--forceDither true --forceBlend 1", almost all the benchmarks exercised S32A_D565_Opaque_Dither can get about 20%-70% performance improvement. Here are the data on i7-3770: before after verts 4314.81 3627.64 15.93% constXTile_MM_filter_trans 1434.22 432.82 69.82% constXTile_CC_filter_trans_scale 1440.17 437.00 69.66% constXTile_RR_filter_trans 1436.96 431.93 69.94% constXTile_MM_trans_scale 1436.33 435.77 69.66% constXTile_CC_trans 1433.12 431.36 69.90% constXTile_RR_trans_scale 1436.13 436.06 69.64% constXTile_MM_filter 1411.55 408.06 71.09% constXTile_CC_filter_scale 1416.68 414.18 70.76% constXTile_RR_filter 1429.46 409.81 71.33% constXTile_MM_scale 1415.00 412.56 70.84% constXTile_CC 1410.32 408.36 71.04% constXTile_RR_scale 1413.26 413.16 70.77% repeatTile_4444_A 1922.01 879.03 54.27% repeatTile_4444_A 1430.68 818.34 42.80% repeatTile_4444_X 1817.43 816.63 55.07% maskshader 5911.09 5895.46 0.26% gradient_create_alpha 4.41 4.41 -0.15% gradient_conical_clamp_3color 35298.71 27574.34 21.88% gradient_conical_clamp_hicolor 35262.15 27538.99 21.90% gradient_conical_clamp 35276.21 27599.80 21.76% gradient_radial2_mirror 20846.74 12969.39 37.79% gradient_radial2_clamp_hicolor 21848.12 13967.57 36.07% gradient_radial2_clamp 21829.95 13978.57 35.97% bitmap_4444_A_scale_rotate_bicubic 105.31 87.13 17.26% bitmap_4444_A_scale_bicubic 73.69 47.76 35.20% bitmap_4444_update_scale_rotate_bilerp 125.65 87.86 30.08% bitmap_4444_update_volatile_scale_rotate_bilerp 125.50 87.65 30.16% bitmap_4444_scale_rotate_bilerp 124.46 87.91 29.37% bitmap_4444_A_scale_rotate_bilerp 105.09 87.27 16.96% bitmap_4444_update_scale_bilerp 106.78 63.28 40.74% bitmap_4444_update_volatile_scale_bilerp 106.66 63.66 40.32% bitmap_4444_scale_bilerp 106.70 63.19 40.78% 
bitmap_4444_A_scale_bilerp 83.05 62.25 25.04% bitmap_a8 98.11 52.76 46.22% bitmap_a8_A 98.24 52.85 46.20% BUG= R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/179443003 git-svn-id: http://skia.googlecode.com/svn/trunk@13699 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
275804782f7b752cc9c25cb556db2a0cfc711dd9	07-Mar-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2 implementation of S32_D565_Opaque_Dither Run benchmarks with command line option "--forceDither true". The result shows that all benchmarks exercised S32_D565_Opaque_Dither benefit from this SSE2 optimization. Here are the data on i7-3770: before after constXTile_MM_filter 900.93 217.75 75.83% constXTile_CC_filter_scale 907.59 225.65 75.14% constXTile_RR_filter 903.33 219.41 75.71% constXTile_MM_scale 902.45 221.46 75.46% constXTile_CC 898.55 218.37 75.70% constXTile_RR_scale 902.69 222.35 75.37% repeatTile_4444_X 938.53 240.49 74.38% gradient_radial2_mirror 16999.49 11540.39 32.11% gradient_radial2_clamp_hicolor 17943.38 12501.71 30.33% gradient_radial2_clamp 17816.36 12492.04 29.88% bitmaprect_FF_filter_trans 47.81 10.98 77.03% bitmaprect_FF_nofilter_trans 47.79 10.91 77.18% bitmaprect_FF_filter_identity 47.74 10.89 77.18% bitmaprect_FF_nofilter_identity 47.83 10.89 77.24% bitmap_4444_update_scale_rotate_bilerp 100.45 76.84 23.50% bitmap_4444_update_volatile_scale_rotate_bilerp 100.80 76.70 23.91% bitmap_4444_scale_rotate_bilerp 100.43 77.18 23.15% bitmap_4444_update_scale_bilerp 79.00 49.03 37.93% bitmap_4444_update_volatile_scale_bilerp 78.90 48.87 38.06% bitmap_4444_scale_bilerp 78.92 48.81 38.16% bitmap_4444_update 42.19 11.53 72.68% bitmap_4444_update_volatile 42.28 11.49 72.82% bitmap_a8 60.37 29.75 50.72% bitmap_4444 42.19 11.52 72.69% BUG= R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/181293002 git-svn-id: http://skia.googlecode.com/svn/trunk@13698 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
39ce33a1facae795eb2f02e35674702de7eb23b5	24-Feb-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2 implementation of S32_D565_Opaque Benchmarks hitting this path can benefit from this patch. Here are the data: before after gradient_radial2_mirror 10885.52 10849.48 0.33% gradient_radial2_clamp_hicolor 11819.69 11644.83 1.48% gradient_radial2_clamp 11816.10 11649.91 1.41% bitmaprect_FF_filter_trans 6.27 4.88 22.17% bitmaprect_FF_nofilter_trans 6.27 4.88 22.17% bitmaprect_FF_filter_identity 6.31 4.86 22.98% bitmaprect_FF_nofilter_identity 6.25 4.86 22.24% bitmap_4444_update 6.26 5.05 19.33% bitmap_4444_update_volatile 6.21 5.06 18.52% bitmap_4444 6.22 5.06 18.65% BUG= R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/172083003 git-svn-id: http://skia.googlecode.com/svn/trunk@13556 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
475910750cdc7d14da3071d4052ba9ab98383be9	19-Feb-2014	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2 implementation of S32A_D565_Opaque A microbenchmark of S32A_D565_Opaque() shows a 3x speedup after SSE optimization with various counts on an i7-3770. BUG= R=mtklein@google.com, reed@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/138163013 git-svn-id: http://skia.googlecode.com/svn/trunk@13495 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
76e0d137892f6a4f3bce278aceb99f9a0d37317c	02-Jul-2013	commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Commented SSE blend functions and cleaned-up variable naming. R=senorblanco@chromium.org, alokp@chromium.org, reed@google.com, bungeman@google.com Author: ernstm@chromium.org Review URL: https://chromiumcodereview.appspot.com/17847010 git-svn-id: http://skia.googlecode.com/svn/trunk@9870 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
fbfcd5602128ec010c82cb733c9cdc0a3254f9f3	23-Aug-2012	rmistry@google.com <rmistry@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	Result of running tools/sanitize_source_files.py (which was added in https://codereview.appspot.com/6465078/) This CL is part I of IV (I broke down the 1280 files into 4 CLs). Review URL: https://codereview.appspot.com/6485054 git-svn-id: http://skia.googlecode.com/svn/trunk@5262 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
27123cd59fb6d6fc240d327efd9fc068c0e3a495	21-Aug-2012	bungeman@google.com <bungeman@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	Force opaque in SkBlendLCD16Opaque_SSE2 to match SkBlendLCD16. https://codereview.appspot.com/6460123/ git-svn-id: http://skia.googlecode.com/svn/trunk@5218 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
7329dc9976e5a61e2382974884d2e075f4f856f1	27-Jul-2012	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	revert 4799-4801 -- red and blue are reversed on windows and linux git-svn-id: http://skia.googlecode.com/svn/trunk@4803 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
e5a196f4403529076dfa335facb2122c5768d8aa	27-Jul-2012	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	use SK_RESTRICT instead of __restrict__ git-svn-id: http://skia.googlecode.com/svn/trunk@4801 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
d2fd13685ba0d7b3b7a2816c12379e0c61c6749b	27-Jul-2012	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	use intptr_t to cast from ptr to int for masking low bits git-svn-id: http://skia.googlecode.com/svn/trunk@4800 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
37fe48489907870067e9cef9cf987b505ae6f02c	27-Jul-2012	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	land http://codereview.appspot.com/6327044/ SSE optimization for 565 pixel format -- by Lei git-svn-id: http://skia.googlecode.com/svn/trunk@4799 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
8cd5ae79c6aaa20188ac6f34318c2f358d87e103	09-Jul-2012	bungeman@google.com <bungeman@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	Fix SkBlendLCD16_SSE2 for non ARGB platforms. http://codereview.appspot.com/6356062/ git-svn-id: http://skia.googlecode.com/svn/trunk@4481 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
83ecdc3ac69c9208493c4c3fc8ea9f84b1350535	06-Jun-2012	caryclark@google.com <caryclark@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	fix warnings on Mac in src/opts Fix these classes of warnings: - unused functions - unused locals - sign mismatch - missing function prototypes - missing newline at end of file - 64 to 32 bit truncation The changes prefer to link in dead code in the debug build with 'if (false)' rather than comment it out, but trivial cases are commented out or sometimes deleted if they appear to be copy/paste errors. Review URL: https://codereview.appspot.com/6303045 git-svn-id: http://skia.googlecode.com/svn/trunk@4184 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
98a5b420aa41b02a4fbf77eaa378e039defc62bb	28-Feb-2012	tomhudson@google.com <tomhudson@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	Improve SSE2 code for Blending BlitRow functions, producing 10% speedup. Courtesy of Evan Nier. http://codereview.appspot.com/5518045/ git-svn-id: http://skia.googlecode.com/svn/trunk@3273 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
d6770e69e05c9dcc12f2a1a2d509c0b174372ee7	14-Feb-2012	tomhudson@google.com <tomhudson@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2 version of blit_lcd16, courtesy of Jin Yang. Yields 25-30% speedup on Windows (32b), 4-7% on Linux (64b, less register pressure), not invoked on Mac (lcd text is 32b instead of 16b). Followup: GDI system settings on Windows can suppress LCD text for small fonts, interfering with our benchmarks. (http://code.google.com/p/skia/issues/detail?id=483) http://codereview.appspot.com/5617058/ git-svn-id: http://skia.googlecode.com/svn/trunk@3189 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
c909a1ecadd422d91ff97d10ce08865290223b14	25-Oct-2011	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	don't blend with zero in colorproc (forgot to return after memcpy check). git-svn-id: http://skia.googlecode.com/svn/trunk@2527 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
edb606cb999887d54629f361bcbf57c5fede1bb0	18-Oct-2011	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	move LCD blits into opts, so they can have assembly versions git-svn-id: http://skia.googlecode.com/svn/trunk@2484 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
ec3ed6a5ebf6f2c406d7bcf94b6bc34fcaeb976e	28-Jul-2011	epoger@google.com <epoger@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	Automatic update of all copyright notices to reflect new license terms. I have manually examined all of these diffs and restored a few files that seem to require manual adjustment. The following files still need to be modified manually, in a separate CL: android_sample/SampleApp/AndroidManifest.xml android_sample/SampleApp/res/layout/layout.xml android_sample/SampleApp/res/menu/sample.xml android_sample/SampleApp/res/values/strings.xml android_sample/SampleApp/src/com/skia/sampleapp/SampleApp.java android_sample/SampleApp/src/com/skia/sampleapp/SampleView.java experimental/CiCarbonSampleMain.c experimental/CocoaDebugger/main.m experimental/FileReaderApp/main.m experimental/SimpleCocoaApp/main.m experimental/iOSSampleApp/Shared/SkAlertPrompt.h experimental/iOSSampleApp/Shared/SkAlertPrompt.m experimental/iOSSampleApp/SkiOSSampleApp-Base.xcconfig experimental/iOSSampleApp/SkiOSSampleApp-Debug.xcconfig experimental/iOSSampleApp/SkiOSSampleApp-Release.xcconfig gpu/src/android/GrGLDefaultInterface_android.cpp gyp/common.gypi gyp_skia include/ports/SkHarfBuzzFont.h include/views/SkOSWindow_wxwidgets.h make.bat make.py src/opts/memset.arm.S src/opts/memset16_neon.S src/opts/memset32_neon.S src/opts/opts_check_arm.cpp src/ports/SkDebug_brew.cpp src/ports/SkMemory_brew.cpp src/ports/SkOSFile_brew.cpp src/ports/SkXMLParser_empty.cpp src/utils/ios/SkImageDecoder_iOS.mm src/utils/ios/SkOSFile_iOS.mm src/utils/ios/SkStream_NSData.mm tests/FillPathTest.cpp Review URL: http://codereview.appspot.com/4816058 git-svn-id: http://skia.googlecode.com/svn/trunk@1982 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
ee467ee79d449ebe6ae7f7946e613cc70a479c69	09-Mar-2011	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	Correct blitmask procs to recognize that we pass them an SkColor, and if they want a SkPMColor, they need to call SkPreMultiplyColor() Add Opaque and Black optimizations for blitmask_d32 git-svn-id: http://skia.googlecode.com/svn/trunk@911 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
981d4798007b91e2e19c13b171583927a56df63b	09-Mar-2011	reed@google.com <reed@google.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	http://codereview.appspot.com/3980041/ Add blitmask procs (with optional platform acceleration) patch by yaojie.yan git-svn-id: http://skia.googlecode.com/svn/trunk@910 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
c3856384e4ab9a7ad5902696a5c972ab595b8467	13-Dec-2010	senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2 optimizations for 32bit Color operation. [Patch from weiwei.li@intel.com] Stephen White added SSE2 optimizations earlier, which improved Skia performance on SSE2-capable platforms (see the issues below). Issue 171055: More SSE2ification Issue 157141: More SSE2ification Issue 150060: minor tweaks to SSE2 code for -fPIC Issue 144072: SSE2 optimizations for 32bit blending blitters This CL implements SSE2 optimizations for the 32bit Color operation. Like the issues above, it uses CPUID to detect SSE2 and changes the platform procs at runtime. The 32bit Color operation is heavily used in Chrome HTML5 canvas operations. Take the Microsoft IE Test Drive demo Pulsating Bubbles as an example (http://ie.microsoft.com/testdrive/Performance/PulsatingBubbles/Default.xhtml): when running this case on Chrome, the overhead of the 32bit Color operation is about 40-50%. So this CL improves Skia performance, and with it Chrome HTML5 canvas performance. Additionally, this CL has passed the Skia bench and tests validation, and the result is pretty good. We also applied this CL to the latest Chromium and re-ran Pulsating Bubbles; performance improved by almost 9-10%. git-svn-id: http://skia.googlecode.com/svn/trunk@633 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
f3f0bd71b81097f6c640e7f60805de7eacbc98c6	10-Dec-2009	senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2-ified S32_alpha_D32_filter_DX (refactoring to come). Also shaved a few cycles off the SSE2 blends. Review URL: http://codereview.appspot.com/171055 git-svn-id: http://skia.googlecode.com/svn/trunk@456 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
dc7de745dd142cdc00ffed7963ebb030a0506f72	30-Nov-2009	senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	More SSE2 optimizations. This CL implements an SSE2 version of S32_bitmap_D32_filter_DX, and uses aligned loads and stores for dst, in all blending. Review URL: http://codereview.appspot.com/157141 git-svn-id: http://skia.googlecode.com/svn/trunk@448 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
4e753558fc8cc2f77cbcd46fba80d8612e836a1e	16-Nov-2009	senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	More SSE2-ification; fix for gcc -msse2. Review URL: http://codereview.appspot.com/154163 git-svn-id: http://skia.googlecode.com/svn/trunk@428 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
f0f4e9abba62a405b7a41e40fcee20b45eb348ee	13-Nov-2009	reed@android.com <reed@android.com@2bbb7eff-a529-9590-31e7-b0007b416f81>	remove const modifiers on function return types (unneeded, and caused an error on some gccs). git-svn-id: http://skia.googlecode.com/svn/trunk@425 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
d03337c736d32bd814eae8a62b1284ea0009b816	09-Nov-2009	senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	Fix for gcc -fPIC build. http://codereview.appspot.com/150060 git-svn-id: http://skia.googlecode.com/svn/trunk@421 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
9272761b22746d2d22439c26f5555028f8e824da	04-Nov-2009	senorblanco@chromium.org <senorblanco@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>	SSE2 optimizations for 32bit blending blitters. This CL implements SSE2 optimizations for 3 of the 32bit blending blitters. It uses CPUID to detect SSE2 at runtime. In order to accommodate runtime detection, it changes the platform procs from static arrays to static functions. It also includes an implementation of SkTime for Win32. http://codereview.appspot.com/144072 git-svn-id: http://skia.googlecode.com/svn/trunk@418 2bbb7eff-a529-9590-31e7-b0007b416f81 /external/skia/src/opts/SkBlitRow_opts_SSE2.cpp
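
Note on r418: the entry above says the platform procs changed from static arrays to static functions so that SSE2 could be detected at runtime. The following is a minimal sketch of that idea, not the actual Skia code. The names `BlitRowProc`, `blit_row_portable`, `blit_row_sse2`, `cpu_has_sse2` and `platform_blit_row_proc` are hypothetical, the row operation is a plain copy rather than a real blend, and the detection uses the GCC/Clang builtin `__builtin_cpu_supports` in place of a hand-written CPUID query.

```cpp
#include <cstdint>

// Hypothetical proc signature, loosely modeled on a 32-bit row blitter.
using BlitRowProc = void (*)(uint32_t* dst, const uint32_t* src, int count);

// Portable fallback: copies pixels one at a time.
static void blit_row_portable(uint32_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}

#if defined(__SSE2__)
#include <emmintrin.h>

// SSE2 variant: moves four 32-bit pixels per iteration using unaligned
// loads and stores, then finishes the remaining pixels with scalar code.
static void blit_row_sse2(uint32_t* dst, const uint32_t* src, int count) {
    int i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i px = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), px);
    }
    for (; i < count; ++i) {
        dst[i] = src[i];
    }
}

// Assumption: GCC/Clang builtin stands in for a raw CPUID query.
// When __SSE2__ is defined the compiler already targets SSE2, so this
// check is illustrative of the runtime-dispatch pattern only.
static bool cpu_has_sse2() {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_cpu_supports("sse2") != 0;
#else
    return true;
#endif
}
#endif  // __SSE2__

// A function instead of a static array: the choice of proc is made when
// the function is called, so it can depend on what the CPU reports.
BlitRowProc platform_blit_row_proc() {
#if defined(__SSE2__)
    if (cpu_has_sse2()) {
        return blit_row_sse2;
    }
#endif
    return blit_row_portable;
}
```

A caller would fetch the proc once, for example `BlitRowProc proc = platform_blit_row_proc();`, and then invoke `proc(dst, src, count)` per row. A static array initialized at compile time could not make that decision per machine, which is presumably why the commit moved to functions.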