883c9bce671fb955574a6c0e46f57f57189bd6c6 |
|
19-Jul-2017 |
Mike Reed <reed@google.com> |
experimental: draw into unpremul raster-only Bug: skia: Change-Id: I3af19f031083c9cc258f73ba6a2f6020bb15f110 Reviewed-on: https://skia-review.googlesource.com/24400 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
968af43f721cd949ba3005f8e5f3c03c6c978a74 |
|
18-Jul-2017 |
Mike Klein <mtklein@chromium.org> |
remove gather_i8, unify memory-touching contexts gather_i8 is now unused, so we can remove it. That in turn makes the ctable field of SkJumper_GatherCtx unused. After removing ctable, SkJumper_GatherCtx and SkJumper_PtrStride look identical, so I've now fused them into SkJumper_MemoryCtx, which will eventually be used by everything loading from, gathering from, or storing to memory. Change-Id: Ia882d2dbd54c9fcf9a8250a1ce83304389dd284a Reviewed-on: https://skia-review.googlesource.com/24085 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
3b92b6907a6b9f7020a7589ad07cad2472fe4e86 |
|
18-Jul-2017 |
Mike Klein <mtklein@chromium.org> |
start on raster pipeline 2d mode - Add run_2d(x,y,w,h) and start_pipeline_2d(). - Add and test a 2d-compatible store_8888_2d stage. Change-Id: Ib9c225d1b8cb40471ae4333df1d06eec4d506f8a Reviewed-on: https://skia-review.googlesource.com/24401 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
140635504c40b99debb0e714aca8d90652aa6aff |
|
18-Jul-2017 |
Mike Klein <mtklein@chromium.org> |
minor fixes to start_pipeline_lowp - in _lowp.cpp, JUMPER is always defined, so no need to check. - the return type of this function has been void for a while. Change-Id: I5271e8dab784f46c7ffa9cfba6eb55b5e399b537 Reviewed-on: https://skia-review.googlesource.com/24326 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
ea23a04a1c5da3c6af271b1991befc45f5df2f5e |
|
27-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
add 32-bit Windows SkJumper backend The most interesting part of this is getting the call to start_pipeline to work. From there it should be just like the other x86 backend. The 32-bit calling conventions are the same across Linux/Mac and Windows, so that's nice. The tricky bit is that Linux and Mac align the stack to 16 bytes, while Windows only to 4. I think this force_align_arg_pointer attribute on start_pipeline does the trick. This needs a guard for layout tests. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86-Debug;master.tryserver.blink:win10_blink_rel,win7_blink_rel;master.tryserver.chromium.win:win_chromium_rel_ng Change-Id: Ia74d22e5a4ce5483c9817b8a8f89dd21885bbd14 Reviewed-on: https://skia-review.googlesource.com/20968 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
c91e3877a57ef140b688627b8c0acaafbefc9034 |
|
05-Jul-2017 |
Mike Reed <reed@google.com> |
add stages for black and white colors histogram of test skps: black: 1/7 white: 2/7 other: 4/7 Bug: skia: Change-Id: I3a092899d31ce87837e66e5c8ea9ec5e0f239361 Reviewed-on: https://skia-review.googlesource.com/21408 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
7aad8cc2ff7a580bd5852d57289ace8c8dced6f5 |
|
05-Jul-2017 |
Mike Reed <reed@google.com> |
optimize for diff matrix types Bug: skia: Change-Id: I671e07c5bbb9e4ced92303c9959143324f7a6bdc Reviewed-on: https://skia-review.googlesource.com/21523 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9026fe13a751582e58e98f9bf735c18b4719d7fe |
|
29-Jun-2017 |
Florin Malita <fmalita@chromium.org> |
2pt conical stage for focal-point-outside case A couple of annoyances here: 1) the prev vector_scale stage is not usable for masking, as NaN values can propagate through => switch to actual masking 2) for the outside case, we must select the min root when the gradient is flipped => split into two templated stages (_min, _max) (I'm not convinced that we need to flip the gradient for RP at all; we can investigate later) Change-Id: I0283812d613a53124f2987d1aea1f26e4533655e Reviewed-on: https://skia-review.googlesource.com/21162 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
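The NaN point in the message above can be illustrated with a small scalar sketch. It is not the stage code: `mask_by_scale` and `mask_by_select` are hypothetical names, and a scalar conditional stands in for whatever vector masking the real stage performs. It only shows why multiplying by zero cannot hide a NaN while a select can.

```python
import math

def mask_by_scale(value, keep):
    """Scale-style masking: multiply by 1.0 (keep) or 0.0 (drop)."""
    return value * (1.0 if keep else 0.0)

def mask_by_select(value, keep):
    """Select-style masking: pick the value or a literal zero."""
    return value if keep else 0.0

degenerate = float("nan")  # stands in for a degenerate gradient result

# NaN * 0.0 is still NaN under IEEE 754, so the scale leaks it through.
scaled = mask_by_scale(degenerate, keep=False)

# A select never touches the NaN arithmetically, so the result is a clean 0.
selected = mask_by_select(degenerate, keep=False)
```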
|
2e409009fb715400a0d64612c57187465c12790c |
|
28-Jun-2017 |
Florin Malita <fmalita@chromium.org> |
2pt conical stage for focal-pt-on-edge case When the focal point is on the edge of the end circle, the quadratic equation devolves to linear. Add a stage to handle this case. As a complication, this case can produce "degenerate" values: 1) t == NaN 2) R(t) < 0 For these, we're supposed to draw transparent black - which means overwriting the color from the gradient stage. To support this, build a 0/1 vector mask in the context, and apply it post-gradient-stage. Change-Id: Ice4e3243abfd8c784bb810f6c310aed7a4ac7dc8 Reviewed-on: https://skia-review.googlesource.com/21111 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
a66ef2d1063c9005611890613a7c27bede6ec5cb |
|
28-Jun-2017 |
Florin Malita <fmalita@chromium.org> |
2ptconical stage Initial impl, for the well-behaved case (focal point inside).
MBP numbers -
Before:
3365.87 ! gradient_conical_clamp_shallow srgb
3590.88 ! gradient_conical_clamp_shallow_dither srgb
3376.91 ! gradient_conical_clamp_3color srgb
3351.64 ! gradient_conical_clamp_hicolor srgb
3379.35 ! gradient_conical_clamp srgb
After:
648.93 ! gradient_conical_clamp_shallow srgb
665.12 ! gradient_conical_clamp_shallow_dither srgb
773.98 ! gradient_conical_clamp_3color srgb
1175.35 ! gradient_conical_clamp_hicolor srgb
619.17 ! gradient_conical_clamp srgb
Change-Id: I07b22a758363e1f340a6041bca53bdef74229eb9 Reviewed-on: https://skia-review.googlesource.com/20906 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
111f8a9eea6980a70a300e3a8bfd758257310fe2 |
|
27-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
add bgra as 1st class format This is a start to eliminating swap_rb as a stage. I've just hit the main hot spots here. Going to look into the ~dozen other spots to see how they should work next. Change-Id: I26fb46a042facf7bd6fff3b47c9fcee86d7142fd Reviewed-on: https://skia-review.googlesource.com/20982 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9f2b3d1fbfa16146184a13b778fbe2fa5355c51b |
|
27-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
remove unused "swap" stage Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf Reviewed-on: https://skia-review.googlesource.com/20988 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
279091ef85eec98969b72805bbf613f1c0660380 |
|
27-Jun-2017 |
Mike Reed <reed@google.com> |
specialize loaders for dst registers, to avoid move/swap stages Bug: skia: Change-Id: I75d82ef2226c5f116b7de2208c4e914739414b6d Reviewed-on: https://skia-review.googlesource.com/20984 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
d08109f3448c4a725a233e759508e9242e6bba42 |
|
27-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
try not zeroing registers in start_pipeline Generally stages take care of state setup themselves, either with seed_shader, constant_color, a load, etc. I think these zeros may be unnecessarily cautious. This can't make anything draw more correctly, but it could make things - draw wrong - draw more slowly - draw more quickly so it's an interesting thing to try and keep an eye on. Change-Id: I7e5ea3cd79e55a65e1dbd214601e147ba3815b87 Reviewed-on: https://skia-review.googlesource.com/20976 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
8c3d5156c7ab2bf723c307043841815d670895c5 |
|
19-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
add _hsw lowp backend CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Debug-MSAN Change-Id: Id53279c17589b3434629bb644358ee238af8649f Reviewed-on: https://skia-review.googlesource.com/20269 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
c4fcbed6b2b2d1e2253e325b292931cb3d05d3fe |
|
26-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
somewhat less silly tail loads and stores No reason to keep going one at a time when we know there are generally better ways to handle loading a power-of-two number of low lanes. This strategy scales up too, with quick answers for 8 (one 8 byte load), 12 (one 8 byte, one 4 byte), etc. $ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300 Before: 46.946ns After: 43.341ns (This happens to be _lowp. Expect similar small speedups elsewhere.) Change-Id: I08f87769ea3c9f06ad13d2b1d5326e542b9b63a8 Reviewed-on: https://skia-review.googlesource.com/20903 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
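The tail strategy described above (8 as one 8-wide access, 12 as an 8 plus a 4) amounts to splitting the tail count by its set bits. The sketch below is only a model of that idea: `tail_chunks` is a hypothetical helper, it counts elements rather than emitting loads, and it does not claim to match the generated code.

```python
def tail_chunks(tail):
    """Split `tail` elements into power-of-two chunks, largest first.

    Returns a list of (offset, size) pairs; each pair models one load or
    store that covers `size` elements starting at `offset`.
    """
    chunks = []
    offset = 0

    # Find the largest power of two that does not exceed the tail.
    size = 1
    while size * 2 <= tail:
        size *= 2

    # Each set bit in `tail` becomes one chunk of that size.
    while size >= 1:
        if tail & size:
            chunks.append((offset, size))
            offset += size
        size //= 2
    return chunks
```

Under this model a tail of 7 needs three accesses (4, 2, 1) instead of seven single-element ones.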
|
4b8d491b2230de6153836113c6864303eeedcd67 |
|
26-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
lean more on the compiler in lowp stages This refactors {from,to}_{byte,8888} to lean a bit more on the compiler, and to share code between the two. The algorithm is not exactly the same, but it's comparable, and the results of course are identical. This new algorithm is a lot easier to generalize to AVX2, and parallels the full-precision {from,to}_{byte,8888} functions in _stages.cpp. Change-Id: I31ea90d65967bf4ede2497d1e2197cb0e7648bf8 Reviewed-on: https://skia-review.googlesource.com/20828 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
51e46d5e1c48072c08637ba8f875ba249eda2376 |
|
23-Jun-2017 |
Mike Reed <reed@google.com> |
use mul_inv instead of div for tiling Bug: skia: Change-Id: I5f88e7923fe204faba8dc5d87454805a4d470d52 Reviewed-on: https://skia-review.googlesource.com/20688 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
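Reading "mul_inv" as "multiply by a precomputed reciprocal" (an interpretation; the message does not spell it out), the change to repeat tiling can be sketched as follows. The function names and the `inv_limit` parameter are hypothetical, and the formula is the usual x - floor(x / limit) * limit form of repeat, not a copy of the stage.

```python
import math

def repeat_div(x, limit):
    """Repeat tiling with a division on every call."""
    return x - math.floor(x / limit) * limit

def repeat_mul_inv(x, limit, inv_limit):
    """Repeat tiling with the division replaced by a multiply.

    `inv_limit` is assumed to be 1.0 / limit, computed once up front.
    """
    return x - math.floor(x * inv_limit) * limit
```

The two forms can differ in the last bit when 1.0 / limit is not exactly representable, so they are not guaranteed to agree for every input.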
|
0cc60b8bbd251efbeb374815395ea3e5fb62e184 |
|
22-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
fix repeat/mirror sampling bleed I think this has been broken since we tried to simplify this in https://skia-review.googlesource.com/16547 The HSW backend does still look a little wrong, but improved, and the others seem fixed. Can you see how this affects your test cases, layout tests, etc? BUG=skia:6783 Change-Id: I17957ac8100331bea5b64d674bf43105048b72f6 Reviewed-on: https://skia-review.googlesource.com/20548 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
277f7f240f127772f7aa158baefc503dca05cf08 |
|
15-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
delete lowp plus I have figured out how to implement lowp clamp_1/clamp_a, and implementing clamp_1 would make lowp plus active. But... the way we have factored blend modes requires us to be able to lerp between the dst and possibly-out-of-range src values. This is not possible in lowp. If we try to multiply with values in [0x8001,0xffff], we'll just get garbage. We'll clamp them back in range, but sadly clamped garbage is still garbage. So the simplest thing to do is keep plus blends in floats. This CL doesn't even change that... we'd use floats before and after it. It just removes the lowp plus stage code that is both dead and buggy. As far as I can tell, no other drawing is currently gated by lowp missing clamp_1 or clamp_a. Change-Id: I55b73c840614f1bff9cd610dff90ca5e2b5c73e5 Reviewed-on: https://skia-review.googlesource.com/19909 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
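The "garbage" claim above can be reproduced in a toy model. Two things here are assumptions that the message does not state: that 1.0 is stored as 0x8000, and that the lowp multiply behaves like a signed, rounding 16-bit multiply followed by an absolute value. `fixed15_mul` is a hypothetical name for that model.

```python
ONE = 0x8000  # 1.0 in the fixed-point layout assumed for this sketch

def _signed16(v):
    """Reinterpret the low 16 bits of v as a signed 16-bit integer."""
    v &= 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

def fixed15_mul(a, b):
    """Model: signed rounding multiply, shift right by 15, then abs."""
    product = (_signed16(a) * _signed16(b) + 0x4000) >> 15
    return abs(_signed16(product)) & 0xFFFF

# In range: 1.0 * 0.5 gives 0.5, as expected.
in_range = fixed15_mul(ONE, 0x4000)

# Out of range: 0xC000 would mean 1.5, but the signed multiply sees -0.5,
# so "1.5 * 0.5" comes back as 0.25 (0x2000) rather than 0.75 (0x6000).
out_of_range = fixed15_mul(0xC000, 0x4000)
```

Clamping `out_of_range` afterwards would leave it at 0x2000, which is in range but still wrong.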
|
f046594d82b84bf96c18f6fd0cb14c16bd3b8708 |
|
13-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
Remove AVX+ special case for load<U8>(). We haven't really needed this since we got constants working. Change-Id: Ie9de8df861959696ed44fc2a64e259cca786bfe5 Reviewed-on: https://skia-review.googlesource.com/19670 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
1dda8bbf467802113ab1802f792b5b191108add1 |
|
06-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
use smarter float -> skfixed15 logic everywhere This is the same logic from constant_color, covering all the other places where we convert from float to fixed, e.g. scale_1_float. This isn't quite ideal yet. We replace mulss+cvttss2si with addss+movd, which is great, but this leads to a silly sequence of code:
    addss %xmm2, %xmm0
    movd %xmm0, %r9d
    movd %r9d, %xmm0
    pshuflw $0x0, %xmm0, %xmm0
Those two movd are pointless... Again, all diffs due to switching from truncation to rounding. Change-Id: Icf6f3b6eb370fe41cea0cebcfda0b8907e055f41 Reviewed-on: https://skia-review.googlesource.com/18846 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
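One way to read the addss + movd pair, together with the +256.0f mentioned in the constant_color entry further down this log, is the following bit trick: for x in [0, 1], the float32 value x + 256.0 has a fixed exponent and a unit in the last place of 2^-15, so the low 16 bits of its bit pattern are x in 1.15 fixed point, rounded by the addition itself. This is a sketch of that reading, not the stage source; `float_to_fixed15` is a hypothetical name and `struct` is used only to emulate float32 in Python.

```python
import struct

def float_to_fixed15(x):
    """Map x in [0, 1] to 1.15 fixed point (0x8000 == 1.0) via float bits."""
    # Packing as "<f" rounds the sum to float32, standing in for addss.
    packed = struct.pack("<f", x + 256.0)
    # Reinterpreting the same bytes as an integer stands in for movd.
    bits = struct.unpack("<I", packed)[0]
    # The low 16 bits of the mantissa hold the fixed-point value.
    return bits & 0xFFFF
```

Because the rounding happens inside the float addition, results can differ by one from a multiply-then-truncate conversion, which matches the "truncation to rounding" note in the message.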
|
ce4b6c100f6e700b50933f75e3b4429357001028 |
|
06-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
less naive lowp constant_color This is as good as we can get without switching away from float inputs. All diffs due to rounding (from the +256.0f). Change-Id: I0d314f111d313577ce9078660178be17e865f11e Reviewed-on: https://skia-review.googlesource.com/18845 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
8e200787b769ba9ee0bd4bd37d27ded1fafc06e6 |
|
06-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
more easy lowp stages Change-Id: I8a292bc98135b41ceedb4242451436c3657616fc Reviewed-on: https://skia-review.googlesource.com/18722 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
adff9dc98200b6ffe1f902ef012c410c2424d220 |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: only omit leaf frame pointers This seems to make Instruments work a lot better. Change-Id: I078f005d32e427b4eb31bc92c731e3444f2faffb Reviewed-on: https://skia-review.googlesource.com/18635 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9653d3aa84505c30aa5440b5629cdb25525666c3 |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
more lowp blend modes Change-Id: Id62e989d4278f273c040b159ed4d2fd6a2f209e0 Reviewed-on: https://skia-review.googlesource.com/18627 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
f36031b68aa5b92204187c154fc5bc717db20a3a |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
lowp: add some big easy stages srcover_rgba_8888, lerp_u8, lerp_1_float, scale_u8, scale_1_float... this is enough for _lots_ of drawing. Change-Id: Ibe42adb8b1da6c66db3085851561dc9070556ee3 Reviewed-on: https://skia-review.googlesource.com/18622 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
727b09c8984b5c972ccde7f8f94d404b221eda6d |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
lowp: add constant_color, swap, move_dst_src This is enough for us to do some really simple draws. Also add some debug tools to help prioritize porting. Change-Id: I334f8fd2133be1aeec3f3406371a81aa6c184776 Reviewed-on: https://skia-review.googlesource.com/18597 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
5adb01bf0d190b20abe50fac608f396c65993424 |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
lowp: add move_src_dst and srcover This is enough to run the bench SkRasterPipeline_compile. $ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300 Before: 300 SkRasterPipeline_compile 48.4858ns After: 300 SkRasterPipeline_compile 37.5801ns Change-Id: Icb80348908dfb016826700a44566222c9f7a853c Reviewed-on: https://skia-review.googlesource.com/18595 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
1f29bf093f01e9c9bf79cbd2ac27da62efc8e3a4 |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
slight streamlining for lowp load_8888 with pshufb We can use 2 pshufb to replace 4 unpacks when deinterlacing the colors. Change-Id: I713fbbc94f5cb9eaf14f85323b0ec76dc2246e98 Reviewed-on: https://skia-review.googlesource.com/18531 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
eeccbf735deedb88313c480dde29bfd8193ee5f3 |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
Real fix for stage perf regression. When we made start_pipeline() return void, the call into the tail!=0 run of the pipeline became eligible to be a tail-call, and Clang made that choice. This had the side effect of not going through vzeroupper on those tails. We now mark start_pipeline() as ineligible for tail calls when targeting AVX+. All paths go through the vzeroupper at the end. BUG=chromium:729237 Change-Id: I2099931284214f24c67b38979b3ad4b4d10e8bba Reviewed-on: https://skia-review.googlesource.com/18591 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9beafc41afa3c78ffa12649dcde73628c277da9c |
|
05-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
have start_pipeline() return limit again This is spooky. I don't quite yet understand why, but this makes things much faster. Performance regressed across the board when we no longer needed the value and changed it to return void: https://perf.skia.org/e/?begin=1496176469&keys=6994&xbaroffset=28513 You can see similar regressions following this Chromium bug link. BUG=chromium:729237 Change-Id: I68371b0456014f909acf819aca52aa4f4f187460 Reviewed-on: https://skia-review.googlesource.com/18580 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
bba02c203190a19bce7eb60887c4ae64041c8ae8 |
|
02-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
start on SkJumper lowp mode Just 3 stages implemented so far: load_8888 swap_rb store_8888 That's enough to make the shortest non-trivial pipeline that you see in the new unit test. Change-Id: Iabf90866ab452f7183d8c8dec1405ece2db695dc Reviewed-on: https://skia-review.googlesource.com/18458 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9b10f8ff0d163d5d076e7028a1a173f9c1f3b714 |
|
01-Jun-2017 |
Mike Klein <mtklein@chromium.org> |
plumb y through to SkJumper There'll still be a little more refactoring after this, but this is the main thing we want to do. This makes y available in a general-purpose register in pipeline stages, just like x. Stages that need y (seed_shader and dither) can just use it rather than pulling it off a context pointer. seed_shader loses its context pointer, and dither's gets simpler. Change-Id: Ic2d1e13b03fb45b73e308b38aafbb3a14c29cf7f Reviewed-on: https://skia-review.googlesource.com/18383 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
823103384cada2e3b84726424b44e6cd80bee5fd |
|
31-May-2017 |
Mike Klein <mtklein@chromium.org> |
reland: We can mask load and store with just AVX Originally reviewed here: https://skia-review.googlesource.com/c/17452/ CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-ShuttleA-GPU-GTX550Ti-x86_64-Release-Valgrind Change-Id: I2e593e897ce93147ec593c2a5de143217274ba2a Reviewed-on: https://skia-review.googlesource.com/18267 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
b5e4842543318e1eac827433855c9a37cdb7f26a |
|
31-May-2017 |
Mike Klein <mtklein@chromium.org> |
clean up now that min_stride == 1 Change-Id: I1c285e818c3119bc5917dee6d7fbe4c0c62ff6d7 Reviewed-on: https://skia-review.googlesource.com/18153 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
e7ba8b05d2a98c43f8d37890d36d6d31538459a1 |
|
25-May-2017 |
Herb Derby <herb@google.com> |
Add tail handling for SSE* to SkJumper. Change-Id: Icb9d385333082de2f99b7a25cfd7251717e3f663 Reviewed-on: https://skia-review.googlesource.com/17580 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
506262665c9b02514dea9a4da3211a00fdf02dbe |
|
25-May-2017 |
Mike Klein <mtklein@chromium.org> |
add srcover_rgba_8888 Change-Id: Iabdc79183ccd2f9cc513d4bdc530fb078b1627ab Reviewed-on: https://skia-review.googlesource.com/17930 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
c998f733e3a3da0674fe32acfcec34b4650e4c2a |
|
25-May-2017 |
Mike Klein <mtklein@chromium.org> |
make sure to_srgb maps 1 to 1 CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android BUG=skia:6678,skia:6683 Change-Id: I217084fa0a11ad661a8751f0c3b1cade5cc52473 Reviewed-on: https://skia-review.googlesource.com/17902 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
6533159f2c83d0af51f79c3d85461580e2346080 |
|
24-May-2017 |
Mike Reed <reed@google.com> |
add stage for gaussian alpha to rgba for shadows speeds up GM:shadow_utils 20% Bug: skia: Change-Id: If52dd5e2c76ace82d06351af1419e0663a3a634f Reviewed-on: https://skia-review.googlesource.com/17844 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
8ca3356290323ed7eb42611b867fff90c239eb39 |
|
23-May-2017 |
Mike Klein <mtklein@chromium.org> |
remove min from repeat and mirror generally I don't think they do anything anymore after the inclusive/exclusive refactoring. Change-Id: I63f2e010a00953b5b6415de002bcb51ec2b73458 Reviewed-on: https://skia-review.googlesource.com/17490 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9f85d68887a61f93656dbfacdd3f978d510c02ee |
|
23-May-2017 |
Mike Klein <mtklein@chromium.org> |
retry tilers against 1 These weren't the Valgrind problem. Change-Id: Ic76df460cf0ce7f7ae8155d549f4e92a7f8de040 Reviewed-on: https://skia-review.googlesource.com/17701 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
74fc593209c28efdf57127219322d944dfdf7c10 |
|
22-May-2017 |
Brian Osman <brianosman@google.com> |
Revert "We can mask load and store with just AVX." This reverts commit 139e463dc6f965fdaed854efcb20c6cafbb6dbdc. Reason for revert: Crashes on Valgrind bots. Original change's description: > We can mask load and store with just AVX. > > Previously we were using AVX2 instructions to generate the masks, > and AVX2 instructions for the mask load and stores themselves. > > AVX came with float mask loads and stores, which will work perfectly > fine. I don't really get what the point of the 32-bit int loads and > stores are in AVX2, beyond maybe syntax sugar? > > Change-Id: I81fa55fb09daea4f5546f8c9ebbc886015edce51 > Reviewed-on: https://skia-review.googlesource.com/17452 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Ravi Mistry <rmistry@google.com> > TBR=mtklein@chromium.org,rmistry@google.com,herb@google.com,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I3a48f006c20475f6334ff94998281f381c696c93 Reviewed-on: https://skia-review.googlesource.com/17524 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Brian Osman <brianosman@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
459889967899be7258edcd8e0567f63bb3ae33df |
|
22-May-2017 |
Brian Osman <brianosman@google.com> |
Revert "add tilers against 1" This reverts commit 8110b849d60455d6fb594f26919f0f38c3ec9925. Reason for revert: Valgrind requires reverting ancestor commit. Original change's description: > add tilers against 1 > > Change-Id: I2482972a43cb89a93cbfb9e708614e0334002e53 > Reviewed-on: https://skia-review.googlesource.com/17483 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,herb@google.com,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: Icbc3a2212f800854ef7b2b17aa99fedad182d53e Reviewed-on: https://skia-review.googlesource.com/17523 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Brian Osman <brianosman@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
8110b849d60455d6fb594f26919f0f38c3ec9925 |
|
22-May-2017 |
Mike Klein <mtklein@chromium.org> |
add tilers against 1 Change-Id: I2482972a43cb89a93cbfb9e708614e0334002e53 Reviewed-on: https://skia-review.googlesource.com/17483 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
139e463dc6f965fdaed854efcb20c6cafbb6dbdc |
|
22-May-2017 |
Mike Klein <mtklein@chromium.org> |
We can mask load and store with just AVX. Previously we were using AVX2 instructions to generate the masks, and AVX2 instructions for the mask load and stores themselves. AVX came with float mask loads and stores, which will work perfectly fine. I don't really get what the point of the 32-bit int loads and stores are in AVX2, beyond maybe syntax sugar? Change-Id: I81fa55fb09daea4f5546f8c9ebbc886015edce51 Reviewed-on: https://skia-review.googlesource.com/17452 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Ravi Mistry <rmistry@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
5d7f2b5301f82334ae0eb70b54bad01a0eb04062 |
|
20-May-2017 |
Mike Klein <mtklein@chromium.org> |
tidy up dither stage Using the float iota was just an expedient to write the stage... this CL adds the U32 iota that dither really wants. Change-Id: I7990b10afd0c5277186b6b8e730245d291bcef0c Reviewed-on: https://skia-review.googlesource.com/17441 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
0264095cfd1aa205b0af218c7e5abbaff8a5db0c |
|
19-May-2017 |
Mike Reed <reed@google.com> |
stage version of vertices This CL, just to limit its size/complexity, only handles colors but not textures. Future CLs will cover everything. Performance is pretty exciting. It's faster than the old code-path, and when we fix a bug in pathutils to preserve opaqueness, it gets a lot faster (8 -> 5) Bug: skia: Change-Id: I4113060e25fe25fe4e6a0ea59bd4fa5e33abc668 Reviewed-on: https://skia-review.googlesource.com/17276 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
f45e3d78a4660b4450a218926d70da4941359efe |
|
15-May-2017 |
Mike Klein <mtklein@chromium.org> |
try Herb's new to_srgb This was 6-8% faster than the previous code on my Trashcan. Change-Id: I70081009e233c83226d6d302f871fb7e86cdc438 Reviewed-on: https://skia-review.googlesource.com/16986 Reviewed-by: Matt Sarett <msarett@google.com> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
7e68bc93ffaa394bd94a6954e2a8318a825067c6 |
|
16-May-2017 |
Mike Klein <mtklein@chromium.org> |
clamp to premul in dither Dither can bump color values above alpha (duh), or below zero (duh), so clamp back to premul after dithering. BUG=skia:6644,skia:6643 Change-Id: Ida107e866380e06130af0d01467117bca929ba44 Reviewed-on: https://skia-review.googlesource.com/17070 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
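The clamp described above can be sketched on a single pixel. This is a scalar illustration under the assumption that dithering leaves alpha alone and only the color channels need pulling back into [0, a]; `clamp_to_premul` is a hypothetical name, not the stage itself.

```python
def clamp_to_premul(r, g, b, a):
    """Clamp each color channel into [0, a] so the pixel stays premultiplied."""
    def clamp(c):
        return min(max(c, 0.0), a)
    return clamp(r), clamp(g), clamp(b), a

# A dithered pixel whose red went above alpha and whose green went below zero.
fixed = clamp_to_premul(0.52, -0.01, 0.25, 0.5)
```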
|
fd35c742bff9c7e39f1b6deeb2171d04ca895f0a |
|
15-May-2017 |
Mike Klein <mtklein@chromium.org> |
fix SkJumper radial gradient precision rcp(rsqrt(x)) doesn't have enough precision when x is a coordinate. (It's fine when x is a color, like in the softlight blend mode.) Adds a GM to test this. It used to look quite ugly. Change-Id: Icec295c2e2f50ae7a5e3e33c62270f632a58f65c Reviewed-on: https://skia-review.googlesource.com/16914 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
4de1304297d7220b223c829bf386f97815db1654 |
|
15-May-2017 |
Herb Derby <herb@google.com> |
Add evenly spaced stops and unify gradient contexts Change-Id: I17ac13b9d1ea6765e2c1a2b53aa6975eab408856 Reviewed-on: https://skia-review.googlesource.com/16713 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9959f723c33d609519c265abb5c4f48cba71fd6f |
|
15-May-2017 |
Mike Reed <reed@google.com> |
composeshader stages needed to add two helper stages for composeshader load_rgba, store_rgba These just read/write the r,g,b,a registers to context memory, making no promise as to how the memory is formatted (e.g. interleaved -vs- planar). Note that we have similar existing stages, but they did not seem to suit: constant_color This guy loads 4 floats from memory, and splats them into registers. I need to load 4 entire registers. load_f32, store_f32 These offset where they read/write based on the 'x' register, plus they guarantee that the memory will be interleaved ala SkPM4f. Bug: skia: Change-Id: Iaa81f950660b837bdb34416ab3e342d56a92239b Reviewed-on: https://skia-review.googlesource.com/16716 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
08aa88d280732386f4c56cbce6ae15f638805a1b |
|
12-May-2017 |
Mike Klein <mtklein@chromium.org> |
fix SkJumperHSL blend modes I took a new, unprincipled approach here, which is to just mimic the legacy code path exactly (e.g. hue_modeproc in SkXfermode.cpp). This fixes how we handle alpha in these blend modes, and ought to be faster by avoiding the unpremul. BUG=skia:6621 Change-Id: I21635290985ff42d9421d2718f7a88cf44a85d7f Reviewed-on: https://skia-review.googlesource.com/16711 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
5c7960be57010bf61db3d4ce879a3194687b5af9 |
|
11-May-2017 |
Mike Klein <mtklein@chromium.org> |
refactor gradient stage names

This is just a name refactor and I'm happy to delay it until we're done with the current wave of gradient CLs. The main ideas:
- we use the "linear_gradient" stages for all gradients, so cut the "linear" and just call them "gradient";
- remind ourselves that the 2-stop stage requires even spacing, i.e. stops at 0 and 1. This name should harmonize with Herb's new general evenly spaced gradient stage, currently "evenly_spaced_linear_gradient", and after it lands and I rebase, "evenly_spaced_gradient";
- remind ourselves which polar coordinate xy_to_polar_unit returns, the angle.

Change-Id: I0fd0c8bd4c1ead7d2d0fff45a199d318b71f34ac Reviewed-on: https://skia-review.googlesource.com/16500 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
df3a371c904c2e3e1d3d9201b7dfc0d080e5f12a |
|
12-May-2017 |
Mike Klein <mtklein@chromium.org> |
Revert "Evenly space gradient stage."

This reverts commit 892501d09bc8608704362235c73a59bb23a386b3.

Reason for revert: https://bugs.chromium.org/p/chromium/issues/detail?id=721682 :(

Original change's description:
> Evenly space gradient stage.
>
> This seems like an experiment at this point because I don't know how to do
> this kind of thing on arm.
>
>
> Numbers from Skylake...
> Before:
> ./out/Release/nanobench --config srgb \
>     --match gradient_linear_clamp_3color gradient_linear_clamp_hicolor -q 19:48:13
> Timer overhead: 36.7ns
> ! -> high variance, ? -> moderate variance
>  micros   bench
>  439.92 ? gradient_linear_clamp_3color srgb
> 2697.60   gradient_linear_clamp_hicolor srgb
>  437.28   gradient_linear_clamp_3color_4f srgb
> 2700.50   gradient_linear_clamp_hicolor_4f srgb
>
>
> After:
>  micros   bench
>  382.35   gradient_linear_clamp_3color srgb
>  593.49   gradient_linear_clamp_hicolor srgb
>  382.36   gradient_linear_clamp_3color_4f srgb
>  565.60   gradient_linear_clamp_hicolor_4f srgb
>
>
> Numbers on my Mac Trashcan are about even; there is no
> speedup or slowdown between master and this change.
>
> Change-Id: I04402452e23c0888512362fd1d6d5436cea61719
> Reviewed-on: https://skia-review.googlesource.com/15960
> Commit-Queue: Herb Derby <herb@google.com>
> Reviewed-by: Mike Klein <mtklein@chromium.org>
>

TBR=mtklein@chromium.org,mtklein@google.com,herb@google.com,fmalita@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true
Change-Id: Ic6a064c66686b6f238ca1417ba1abd9ce25de1b4 Reviewed-on: https://skia-review.googlesource.com/16660 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
892501d09bc8608704362235c73a59bb23a386b3 |
|
11-May-2017 |
herb <herb@google.com> |
Evenly space gradient stage.

This seems like an experiment at this point because I don't know how to do this kind of thing on arm.

Numbers from Skylake...

Before:
./out/Release/nanobench --config srgb \
    --match gradient_linear_clamp_3color gradient_linear_clamp_hicolor -q 19:48:13
Timer overhead: 36.7ns
! -> high variance, ? -> moderate variance
 micros   bench
 439.92 ? gradient_linear_clamp_3color srgb
2697.60   gradient_linear_clamp_hicolor srgb
 437.28   gradient_linear_clamp_3color_4f srgb
2700.50   gradient_linear_clamp_hicolor_4f srgb

After:
 micros   bench
 382.35   gradient_linear_clamp_3color srgb
 593.49   gradient_linear_clamp_hicolor srgb
 382.36   gradient_linear_clamp_3color_4f srgb
 565.60   gradient_linear_clamp_hicolor_4f srgb

Numbers on my Mac Trashcan are about even; there is no speedup or slowdown between master and this change.

Change-Id: I04402452e23c0888512362fd1d6d5436cea61719 Reviewed-on: https://skia-review.googlesource.com/15960 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
fc84dc5f0eb18900e935dd29e1b75e96820781cc |
|
11-May-2017 |
Mike Klein <mtklein@chromium.org> |
proposed: inclusive gradients, exclusive images Change-Id: I5821f823a4c0df54d4388a2f455767f58ae646b8 Reviewed-on: https://skia-review.googlesource.com/16547 Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
6f9f2591c1a6a01f0c0fb06fcd77d1429e3ce260 |
|
10-May-2017 |
bungeman <bungeman@google.com> |
Fix alpha coverage for lerp_565 stage. Like three gray with alpha blends and taking the max alpha. Change-Id: I104c84f784979030744127f8f66905ad9d1bdf0e Reviewed-on: https://skia-review.googlesource.com/15898 Commit-Queue: Ben Wagner <bungeman@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
090fbf86cf90dd326b3b3b59cde0a46b63a594a6 |
|
08-May-2017 |
Herb Derby <herb@google.com> |
Add radial gradient stage. Change-Id: Ie1f9640f5149f21bd8b3b864ff8b176232e1b0a9 Reviewed-on: https://skia-review.googlesource.com/15461 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
bb33833ed25c30007e4ea3cd3de6df728407f94e |
|
04-May-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, finish blend modes I've decided to ignore our existing CPU implementations and start from scratch, mostly referencing the GL ES 3.2 spec and w3 spec. This implementation ought to look a lot like the reference implementation I've written in gm/hsl.cpp, with the addition of handling alpha: unpremul, blend, re-premul with a simple SrcOver alpha. Change-Id: I38cf6be2dc66a6f46d7b18b91847f6933d2fab62 Reviewed-on: https://skia-review.googlesource.com/15316 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
db711c982bfaa805d2de5a253c55a680c30189e0 |
|
03-May-2017 |
Mike Klein <mtklein@chromium.org> |
move dither after the transfer function

There's no single dither rate that we can use in linear space if we're using a non-linear transfer function... if it's too high (like today) we'll dither too much around zero (e.g. 0 -> 5), and if it's too low we won't dither near one.

We were thinking it'd be a good idea to move the dither later in the pipeline anyway. This has to be the right spot!

Now that we're moving dither from operating in linear space to operating in bit-space (after transfer function, aware of bit-depth of destination) we have to start clamping [0,1] instead of [0,a]...

But, I think I've rewritten things to make sure we don't need to clamp at all. The main idea is to make sure 0-dither and 1+dither round to 0 and 1 respectively. We can do this by making the dither span exclusive, switching from [-0.5,+0.5] to (-0.5,+0.5). In practice I'm doing that as [-0.4921875,+0.4921875], a maximum dither of 63/128 of a bit.

Similarly, I don't think it makes sense to fold in the multiply by alpha anymore if we're after the transfer function.

Change-Id: I55857bca80377c639fcdd29acc9b362931dd9d12 Reviewed-on: https://skia-review.googlesource.com/15254 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
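The rounding argument in the dither commit above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: `dither_to_bits` is a hypothetical helper, quantization is modelled as "scale to the destination bit depth, add the dither, round to nearest", and only the 63/128 bound is taken from the commit message.

```python
import math

MAX_DITHER = 63 / 128  # 0.4921875 of one bit, the bound quoted in the commit message

def dither_to_bits(v, dither, bits=8):
    """Quantize v in [0,1] to an integer code, adding the dither in bit-space."""
    scale = (1 << bits) - 1
    # floor(x + 0.5) is round-to-nearest; the dither is in units of one output bit.
    return math.floor(v * scale + dither + 0.5)
```

With the dither limited to plus or minus 63/128, the extremes 0 and 1 still land on codes 0 and 255, so no clamp is needed in this model; a full +0.5 dither at v = 1 would produce the out-of-range code 256.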
581e69865ef5425aa5c9dba173b346dff1ec5652 |
|
03-May-2017 |
Mike Klein <mtklein@chromium.org> |
dither stage I think we can dither generically as a pipeline stage. I'm not married to where the dither happens, or the implementation, which is mostly cribbed from https://en.wikipedia.org/wiki/Ordered_dithering. BUG=skia:3302,skia:6224 Change-Id: If7f6b22a523ca0b34cb03c0aa97b6734c34e0133 Reviewed-on: https://skia-review.googlesource.com/15161 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
7eb86981a954c500fa4a4d8425496a5beb789e5d |
|
03-May-2017 |
Herb Derby <herb@google.com> |
Add sweep gradient to SkRasterPipeline This is a prototype. I have removed the tiling, but maybe I should add it back in. I am thinking about factoring out the common code with linear gradient in its own CL. I think radial gradient will be very close to this. Change-Id: I1dfcb4f944138ee623afdf10b2a8befde797c604 Reviewed-on: https://skia-review.googlesource.com/13766 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
5664e65eb1680a14eeaa6ca79ddf9e734518c822 |
|
01-May-2017 |
Mike Klein <mtklein@chromium.org> |
finish up constants For whatever reason, if I swap the condition in the if_then_else tests from < to >= and swap the then/else values, I can use constants in hsl_to_rgb. Still don't understand why, but I'll take it. I suspect it has something to do with SSE, IEEE, and NaN, but I don't care enough to speculate any more concretely. This does that, removes C() and _f, updates some comments, and adds a guard in build_stages.py to yell if it sees trouble like LCPI40_4... This reminds me to try -ffast-math soon. I think that was mostly held back by constants. Change-Id: I3f8a37a4d4642f77422ce3261b750061e9e604a3 Reviewed-on: https://skia-review.googlesource.com/14942 Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
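The "finish up constants" commit above only suspects that NaN is involved and does not confirm it. As a hedged illustration of why swapping `<` for `>=` (and swapping the then/else values) is not a pure no-op under IEEE 754, here is a scalar sketch; `pick_lt` and `pick_ge` are hypothetical names, and this is not a claim about what Clang actually did with hsl_to_rgb.

```python
import math

def if_then_else(cond, t, e):
    """Scalar stand-in for the vector select used by the pipeline stages."""
    return t if cond else e

def pick_lt(t, limit, a, b):
    """Original form: a when t < limit, else b."""
    return if_then_else(t < limit, a, b)

def pick_ge(t, limit, a, b):
    """Swapped form: b when t >= limit, else a."""
    return if_then_else(t >= limit, b, a)
```

For every ordered value of `t` the two forms agree. For NaN both comparisons are false, so `pick_lt` falls through to `b` while `pick_ge` falls through to `a`; a compiler therefore cannot freely rewrite one form into the other unless it may assume NaN never occurs.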
879a08ac146acb2518363cc9d2d8aa6dce04528d |
|
01-May-2017 |
Mike Klein <mtklein@chromium.org> |
refactor hsl_to_rgb a touch This rewrites the existing logic to expose more of the symmetries, and especially to make them clearly identical subexpressions. I think it's clear that the intent in hue_to_rgb is to wrap the t value back into 0-1... that's t = fract(t). No GM diffs. Change-Id: I9d62d8f80bcb45711ee334f953d3f6410e068ce4 Reviewed-on: https://skia-review.googlesource.com/14940 Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
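The claim in the hsl_to_rgb refactor above, that wrapping t back into 0-1 is just `t = fract(t)`, can be sanity-checked in scalar Python. This is a sketch assuming t stays within one unit of the [0,1] range; `wrap_conditional` is a hypothetical name for the add-or-subtract-one form.

```python
import math

def wrap_conditional(t):
    """Wrap by adding or subtracting 1 once, as in the conditional form."""
    if t < 0.0:
        return t + 1.0
    if t > 1.0:
        return t - 1.0
    return t

def fract(t):
    """Fractional part: t minus the largest integer not greater than t."""
    return t - math.floor(t)
```

The two agree everywhere in that range except at exactly t = 1.0, where the conditional form leaves 1.0 alone and `fract` returns 0.0. Whether that matters depends on the piecewise function applied afterwards.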
b3665f01608dc65374b23f00240582494021a540 |
|
01-May-2017 |
Mike Klein <mtklein@chromium.org> |
fix t / t2 confusion in hsl_to_rgb

I think the original[1] SkRasterPipeline_opts.h version had a typo that I faithfully copied over to hsl_to_rgb.

In Hue2RGB[2], the scalar equivalent of hue_to_rgb, we mutate t to keep it in 0-1 range[3]. In the SkRasterPipeline_opts.h code we introduced a new value t2 instead, and then used it everywhere, but accidentally typed 't' in the t < 1/6 case. The expression "p + (q - p)*6.0f*t" should have been "p + (q - p)*6.0f*t2".

This fixes things by changing t in place, much like Hue2RGB does.

The GM doesn't change anywhere, which is troubling.

[1] https://skia-review.googlesource.com/c/7460/21/src/opts/SkRasterPipeline_opts.h#808
[2] https://skia-review.googlesource.com/c/7460/21/src/effects/SkHighContrastFilter.cpp#26
[3] I think this whole clamp should probably become "t = fract(t)". Will follow up.

Change-Id: I3dcc1a79ffae46830178d931844ee3113f8bdfd1 Reviewed-on: https://skia-review.googlesource.com/14910 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
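The t / t2 typo described in the commit above is easy to reproduce in a scalar sketch. Only the first-case expression, `p + (q - p)*6.0f*t` versus `...*t2`, comes from the commit message; the remaining cases follow the standard HSL hue formula and are an assumption here, as is the `buggy` switch, which exists purely for comparison.

```python
def hue_to_rgb(p, q, t, buggy=False):
    """Scalar hue_to_rgb; buggy=True reproduces the 't' typo in the first case."""
    # Wrap t into 0-1 as a separate value t2, as the commit describes.
    t2 = t + 1.0 if t < 0.0 else (t - 1.0 if t > 1.0 else t)
    if t2 < 1 / 6:
        return p + (q - p) * 6.0 * (t if buggy else t2)
    if t2 < 1 / 2:
        return q
    if t2 < 2 / 3:
        return p + (q - p) * (2 / 3 - t2) * 6.0
    return p
```

The two variants only diverge when t lies outside 0-1 and the wrapped value falls below 1/6, for example t = 1.05: the fixed form returns about 0.3 for p = 0, q = 1, while the typo returns about 6.3.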
fb11acdeefce2fe24e98dc4e4c1d002291604119 |
|
01-May-2017 |
Mike Klein <mtklein@chromium.org> |
getting close on float constants I think I'm now down to just the constants used in comparisons in hsl_to_rgb... so weird. I don't get what's special about this code. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: I398139f00371bdbc45281a49bca56d02ba59cba9 Reviewed-on: https://skia-review.googlesource.com/14909 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
fe560a8cc3839b7c4c0a63bdb286fe1e1f89a5dc |
|
01-May-2017 |
Mike Klein <mtklein@chromium.org> |
some float constants Trying to go slowly to find where problems arise. Weirdly, I think I got everything except hsl_to_rgb. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: I4d85a4c1f40bd87e7cb18fc9b5ce020812dc31db Reviewed-on: https://skia-review.googlesource.com/14905 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
308e62416e9c8191932c6bde1711f336bafda373 |
|
27-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, remove C(int) This finishes off integer constants... they should all be normal now. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: I66ecc6533807fc59bb5ac9d3c5f7ab9e6e1f0d7f Reviewed-on: https://skia-review.googlesource.com/14528 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
0aa742f15a400433bafe6350c20523b4dd062f64 |
|
27-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, replace _i with normal constants So far I only seem to be encountering constant pools with float constants, so integer constants should be easy to make normal. This just removes _i. There might be a couple integer constants generated with C() too... they'll be the next CL. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: Icc82cbc660d1e33bcdb5282072fb86cb5190d901 Reviewed-on: https://skia-review.googlesource.com/14527 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
b4bbc64adee74f99ddfc30ad4e5043f98380bdbb |
|
27-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
clear out C(), _i, and _f constants from SkJumper_vectors.h I think this is just gonna be a bunch of baby steps for now. This gets everything in SkJumper_vectors.h. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: Ic87faa9bf6380a4fc9a577936dad8c3a9c48472e Reviewed-on: https://skia-review.googlesource.com/14441 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
67e617149df4336b77a3ccdf5ea7d557c7a33166 |
|
26-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
prep for more constants

- Add -z to print zero bytes instead of ...
- avx+hsw will create 32-byte constants in .const, so we should disassemble those too, and align to 32 bytes.
- The default _text section on Windows is 16-byte aligned, so we make a new one that's 32-byte aligned.

Change-Id: Icb2a962baa4c3735e98a992f2285eaf5cb1680fd Reviewed-on: https://skia-review.googlesource.com/14364 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
c7be00366bb0171e2d247ea71e291a64e3d10254 |
|
25-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
remove to_2dot2 and from_2dot2 The parametric_{r,g,b} stages are just as good now; under the hood it's all going through approx_powf. Change-Id: If7f3ae1e24fcee2ddb201c1d66ce1dd64820c89a Reviewed-on: https://skia-review.googlesource.com/14320 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
2229b576949a76fddb5bc823accfd70e655dea14 |
|
21-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, maybe we can just use constants

As long as everything is laid out the same way they were originally, I don't think there's any reason we can't just use %rip-relative addressing on x86-64. Basically, we just need to keep all the sections together in order.

Somewhat subtly we cannot just use -D to disassemble all sections. -D will double-disassemble[1] some bytes, which throws off our %rip-relative addressing of constants. You can see this in PS1. So we whitelist sections instead.

[1], from man objdump:
This option also has a subtle effect on the disassembly of instructions in code sections. When option -d is in effect objdump will assume that any symbols present in a code section occur on the boundary between instructions and it will refuse to disassemble across such a boundary. When option -D is in effect however this assumption is suppressed. This means that it is possible for the output of -d and -D to differ if, for example, data is stored in code sections.

Change-Id: Idbcfe08e67113b3f7d75749931c640ff90aa0bf4 Reviewed-on: https://skia-review.googlesource.com/14029 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
795c5b156756796cbb3a584c99b1ab51fb5fe187 |
|
21-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, implement 2.2 stages with approx_powf My main interest is getting rid of weird code, but it's also faster. The new bench drops from 667 to 412. Change-Id: Ibf889601284cf925780320c828394f79937dc705 Reviewed-on: https://skia-review.googlesource.com/14035 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
4e3e9f858cf111d08be08a7274028d8a6e6a8b5d |
|
20-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, lab_to_xyz I don't suppose you know any existing test coverage of this? I can't seem to trigger any... Change-Id: I7244053e2fb665888c9443d2bc52c5a23725ec53 Reviewed-on: https://skia-review.googlesource.com/13970 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
c17dc24fa9994eacc640d3adc6633a02fd96d3fc |
|
20-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, rework callback a bit, use it for color_lookup_table

Looks like the color-space images have this well tested (even without lab_to_xyz) and the diffs look like rounding/FMA.

The old plan to keep loads and stores outside callback was:
1) awkward, with too many pointers and pointers to pointers to track
2) misguided... load and store stages march ahead by x, working at ptr+0, ptr+8, ptr+16, etc. while callback always wants to be working at the same spot in the buffer.

I spent a frustrating day in lldb to understand 2). :/

So now the stage always store4's its pixels to a buffer in the context before the callback, and when the callback returns it load4's them back from a pointer in the context, defaulting to that same buffer.

Instead of passing a void* into the callback, we pass the context itself. This lets us subclass the context and add our own data... C-compatible object-oriented programming.

Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43 Reviewed-on: https://skia-review.googlesource.com/13985 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
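The reworked callback protocol described above (store to a buffer in the context, call back with the context itself, then load from a pointer that defaults to the same buffer) can be modelled in Python. This is a sketch only: the field names `fn`, `rgba`, and `read_from`, and the `GainCtx` subclass, are hypothetical, and a single pixel stands in for a full register of pixels.

```python
class CallbackCtx:
    """Base context: a callback plus a scratch buffer the stage writes into."""
    def __init__(self, fn):
        self.fn = fn
        self.rgba = [0.0] * 4
        self.read_from = self.rgba   # defaults to the same buffer

def callback_stage(ctx, pixel):
    ctx.rgba[:] = pixel              # "store4" into the context buffer
    ctx.fn(ctx)                      # the callback receives the context itself
    return list(ctx.read_from)       # "load4" from wherever the callback points

class GainCtx(CallbackCtx):
    """Subclassed context carrying extra data for its callback."""
    def __init__(self, gain):
        super().__init__(GainCtx.apply)
        self.gain = gain
        self.out = [0.0] * 4

    @staticmethod
    def apply(ctx):
        ctx.out[:] = [v * ctx.gain for v in ctx.rgba]
        ctx.read_from = ctx.out      # redirect the load to our own buffer
```

Because the callback receives the context rather than an opaque pointer, a callback that does nothing leaves the pixels untouched, and one that needs extra state can keep it on a derived context.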
097d0939e319493765074025d2d34c4179662f7b |
|
20-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
more symmetry for from_half/to_half Tweaks to make the parallels between from_half and to_half stand out. We can logically do the `auto denorm = em < ...;` comparisons as either U32 or I32. U32 would read more naturally, but we do I32 because some instruction sets have direct signed comparison but must synthesize an unsigned comparison. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug Change-Id: Ic74fe5b3b850f5bb7fd00fd4435bc32b8628eecd Reviewed-on: https://skia-review.googlesource.com/13963 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
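The `denorm = em < ...` comparison mentioned in the from_half/to_half commit above can be illustrated with a scalar model of a non-hardware half-to-float conversion. This is a sketch under assumptions: denormal halfs are flushed to zero, infinities and NaNs are not handled, and the exact constants are the usual IEEE 754 binary16/binary32 bias and shift values rather than anything quoted from the commit.

```python
import struct

def from_half(h):
    """Convert a 16-bit half (given as an int) to float, flushing denorms to 0."""
    s = h & 0x8000            # sign bit
    em = h ^ s                # exponent and mantissa bits
    # After the sign is stripped, em is at most 0x7fff, so a signed and an
    # unsigned comparison give the same answer here.
    if em < 0x0400:           # exponent field is zero: zero or denormal
        return 0.0
    # Move the sign to bit 31, shift exponent+mantissa into place, rebias 15 -> 127.
    bits = (s << 16) + (em << 13) + ((127 - 15) << 23)
    return struct.unpack('<f', struct.pack('<I', bits))[0]
```

For every normal finite half this matches Python's own binary16 decoding (`struct` format `'e'`), which makes a convenient reference when testing such a routine.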
d0ce148ed4945aa75fb7eeaaffcfd345dd9f85fb |
|
19-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
test and fix f16<->f32 conversion stages This refactors from_half() and to_half() a bit, totally reimplementing the non-hardware cases to be more clearly correct. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug Change-Id: I439463cf90935c5e8fe2369cbcf45e07f3af62c7 Reviewed-on: https://skia-review.googlesource.com/13921 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
da16434928b4c721b462d780d5140f9fc68d1413 |
|
19-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
refactor approx_{log2,pow2,powf}

- Move to SkJumper_vectors.h
- Fold the -127 and +2.774485010.
- approx_powf(F,F) instead of approx_powf(F,float) for consistency.
- A little layout reformatting.

Change-Id: If9cb3d62a097cb6ecf89f157a1dde672c1516371 Reviewed-on: https://skia-review.googlesource.com/13865 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
44375176c06f00682518a03d4983554ca8fb5b6a |
|
18-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, parametric_{r,g,b,a}

I've tried a couple of ideas for approx_powf():
1) accumulate integer powers of x, then 4th roots, then 16th roots
2) continue 1) all the way to 256th roots
3) decompose into pow2 and log2, exploiting IEEE float layout
4) slightly tune constants used in 3)
5) accumulate integer powers of x, then 3+4) with different tuning
6) follow a source online, basically 5 with finesse
7) a new source quoting and improving on the method in 6).

7) seems perfect, enough that maybe we can explore improving its speed at cost of precision. Might be nice to get rid of those divides.

If we allow a small tolerance (2-5) in our tests, we could use the very simple fast forms from 3) (e.g. PS 5). I wish I had some images to look at!

Anything involving roots seems to be subverted by poor rsqrt precision.

This change of course affects the pipelines created by the tests for exponential and full parametric gamma curves. What's less obvious is that it also means SkJumper can now for the first time run the pipeline created by the mixed gamma curves test. This means we now need to relax our tolerance for the table-based channel, just like we did when implementing table_{r,g,b,a}. This took me an embarrassingly long time to figure out. *face palm*

Change-Id: I451ee3c970a0a4a4e285f8aa8f6ef709a654d247 Reviewed-on: https://skia-review.googlesource.com/13656 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
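Idea 3) in the commit above, decomposing powf into pow2 and log2 by exploiting the IEEE float layout, can be sketched in its crudest form. This is explicitly not the tuned approach 7) that the commit settled on: the code below uses no correction terms, assumes positive finite inputs, and the helper names `bits_of` and `float_of` are hypothetical.

```python
import struct

def bits_of(x):
    """Reinterpret a float32 value as its 32-bit integer pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def float_of(b):
    """Reinterpret a 32-bit integer pattern as a float32 value."""
    return struct.unpack('<f', struct.pack('<I', b))[0]

def approx_log2(x):
    # The bit pattern read as an integer is roughly 2^23 * (log2(x) + 127).
    return bits_of(x) * (1.0 / (1 << 23)) - 127.0

def approx_pow2(x):
    # Inverse trick: build the bit pattern whose integer value is 2^23 * (x + 127).
    return float_of(int((x + 127.0) * (1 << 23)))

def approx_powf(x, y):
    """x**y as pow2(y * log2(x)), exact at powers of two and piecewise linear between."""
    return approx_pow2(approx_log2(x) * y)
```

Exact powers of two come out exact (for example `approx_powf(4.0, 0.5)` is 2.0), while other inputs carry a few percent of error, which fits the commit's remark that the simple forms would need a small test tolerance.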
c7d9c0b80886d24e74cd6e5a7118f44c06e0cd9b |
|
17-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, table_{r,g,b,a} In testing, it didn't really seem like we're getting anything out of doing an interpolated lookup, so this just does a single rounded lookup. Change-Id: If85ba68675945b442076519dd7f1bf7540d1628d Reviewed-on: https://skia-review.googlesource.com/13646 Commit-Queue: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
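The choice described in the table_{r,g,b,a} commit above, a single rounded lookup instead of an interpolated one, can be contrasted in a short sketch. The clamp to [0,1], the table contents, and both function names are assumptions for illustration; the commit does not spell out the indexing.

```python
def table_lookup_rounded(table, x):
    """Single lookup at the nearest table entry."""
    x = min(max(x, 0.0), 1.0)                  # assumed clamp to [0,1]
    return table[int(x * (len(table) - 1) + 0.5)]

def table_lookup_lerp(table, x):
    """Interpolated lookup between the two neighbouring entries, for comparison."""
    x = min(max(x, 0.0), 1.0)
    f = x * (len(table) - 1)
    i = min(int(f), len(table) - 2)
    w = f - i
    return table[i] * (1.0 - w) + table[i + 1] * w
```

The rounded form needs one memory access per channel instead of two plus a blend, which is the trade the commit describes as not costing anything noticeable in testing.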
a3735cd34b4f8db4d1c5d1eee1c6e5841c75706b |
|
17-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, u16_be load_tables stages Change-Id: I738da1dac2ef5b74ef34cca14938c0b6cce2fe16 Reviewed-on: https://skia-review.googlesource.com/13596 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
b3821730e1ddd86b356fbf3247d115950c269240 |
|
17-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, load_rgb_u16_be testing: out/dm --src colorImage --colorImages images/colorspace/ --config srgb Change-Id: I3481e1fb4a070cfd5d95361329fd95af5fcfbd1f Reviewed-on: https://skia-review.googlesource.com/13590 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
7fee90cb5eda2345bb8ec9be706aea1a09866005 |
|
07-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
add a callback stage to SkRasterPipeline

This lets us temporarily escape to a piece of code outside SkRasterPipeline. We should be able to use this to replace
- parametric_{r,g,b,a}
- table_{r,g,b,a}
- color_lookup_table
- shader_adapter*

* We want to obsolete shader_adapter for other reasons anyway, but we _could_ replace it with this if we want to.

Change-Id: I42b657b3c19c679796ed1876856cae0c8471307e Reviewed-on: https://skia-review.googlesource.com/12102 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
0a9044950c1caa1b9dc0c2837889850d044d1d34 |
|
12-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, bilinear and bicubic sampling stages

This splits SkImageShaderContext into three parts:
- SkJumper_GatherCtx: always, already done
- SkJumper_SamplerCtx: when bilinear or bicubic
- MiscCtx: other little bits (the matrix, paint color, tiling limits)

Thanks for the snazzy allocator that allows this, Herb!

Both SkJumper and SkRasterPipeline_opts.h should be speaking all the same types now. I've copied the comments about bilinear/bicubic to SkJumper with little typo fixes and clarifications.

Change-Id: I4ba7b7c02feba3f65f5292169a22c060e34933c6 Reviewed-on: https://skia-review.googlesource.com/13269 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
db1cbcb4b5e5da04571e3f8c0008b6675fb93cb3 |
|
12-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, rgb<->hsl I've rearranged while porting, I hope making the logic clearer. Exactly one gm is affected, highcontrastfilter. The most interesting line is this, from hsl_to_rgb: F t2 = if_then_else(t < 0.0_f, t + 1.0_f, I had to write 0.0_f (instead of the usual 0) to force Clang to compare against a zero register instead of a 16-byte zero constant in memory. Register pressure is high in hsl_to_rgb, so something must have kicked in to prefer memory over zeroing a register. No big deal. It makes the code read more symmetrically anyway. Change-Id: I1a5ced72216234587760c6f803fb69315d18fae0 Reviewed-on: https://skia-review.googlesource.com/13242 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
7b4202de0e818024fb5c5d9535ebbacec0e186c1 |
|
10-Apr-2017 |
Herb Derby <herb@google.com> |
Add multi-stop SkJumper stage. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I954d02638a785bec284d2fdf8f46abfccd474e7a Reviewed-on: https://skia-review.googlesource.com/10211 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
40de6dad46874d03cc33a883797f20c665d7aa39 |
|
07-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, byte_tables + byte_tables_rgb Factors out a function F from_byte(U8) too. Change-Id: Ib739ccbd509ddf25d2bfb7751ba6eaf51b16c12f Reviewed-on: https://skia-review.googlesource.com/11791 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
5f055f0fe9a3391b5481d09cbba21b7eeee06103 |
|
07-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, gather_f16 Here we use 64-bit gather instructions for HSW, which I think we haven't done before. Change-Id: I7b22b3cc0b7a151952518bb9afb90624ebdb4a22 Reviewed-on: https://skia-review.googlesource.com/11602 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
7d3d8723319038d16456137ba932f238c1e65dbf |
|
06-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, gather_i8 Change-Id: Iefa8044bac0555c5fff370217a6270b4f3c64300 Reviewed-on: https://skia-review.googlesource.com/11582 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
21bd3e4b11b001748d5533eb3d7ee5682b89aa68 |
|
06-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, more gathers This is all the gathers except index 8 and f16, which aren't conceptually hard but I want to land separately. Change-Id: I525f2496e55451041bd6ea07985858fda7b56a40 Reviewed-on: https://skia-review.googlesource.com/11524 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
dec4ea81ce38614426b9a0f1245fa90d883ddb52 |
|
06-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, gather_8888 Change-Id: I70bd64d114a2460534bcb51d356e13d9bc3b8603 Reviewed-on: https://skia-review.googlesource.com/11491 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
14987ebb97f491f1b5bc776252b5ddbf65b8fca0 |
|
06-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, add load_f32() Change-Id: I71d85ffe29bc11678ff1e696fa4a2c93d0b4fcbe Reviewed-on: https://skia-review.googlesource.com/11446 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
8a823faeba2da8a77740b0cd9eaf1acf473067b1 |
|
05-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, kill off F4 Its alignment (sometimes 4, sometimes 16) has proven to be error-prone. This also means we don't really need LazyCtx::load(). I think I only had it there to make sure we were doing unaligned loads of F4; the better way is to just never declare the data as aligned... The generated code isn't quite as good, but I can live with it. Change-Id: I5d57a580ca12c94ca84a5e8b72a66cf8d0c829eb Reviewed-on: https://skia-review.googlesource.com/11406 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
7125ac603613d156d83a528f27b03e04282a1e37 |
|
05-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, to_2dot2 and from_2dot2 Nothing too tricky here. Change-Id: I2a10548efc75a6fd875fcb242790880d9b9a28fd Reviewed-on: https://skia-review.googlesource.com/11388 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
3146bb948373b03ae118c10732c8bd11a931157d |
|
05-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, load_u16_be and store_u16_be Change-Id: I2d58538ab071b217d8dbbf2d802493d9045eabf2 Reviewed-on: https://skia-review.googlesource.com/11384 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
95f53be0059940da50d4fce10da5c4dcf037b6ae |
|
04-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, split store_f16 into to_half, store4 Pretty much the same deal as the last CL going the other direction: split store_f16 into to_half() and store4(). Platforms that had fused strategies here get a little less optimal, but the code's easier to follow, maintain, and reuse. Also adds widen_cast() to encapsulate the fairly common pattern of expanding one of our logical vector types (e.g. 8-byte U16) up to the width of the physical vector type (e.g. 16-byte __m128i). This operation is deeply understood by Clang, and often is a no-op. I could make bit_cast() do this, but it seems clearer to have two names. Change-Id: I7ba5bb4746acfcaa6d486379f67e07baee3820b2 Reviewed-on: https://skia-review.googlesource.com/11204 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
114e6b33d67537f034b749e77f68d168ef9bfbc6 |
|
04-Apr-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, factor out load4() and from_half() load_f16 gets slightly worse codegen for ARMv7, SSE2, SSE4.1, and AVX from splitting it apart compared to the previous fused versions. But the stage code becomes much simpler. I'm happy to make those trades until someone complains. load4() will be useful on its own to implement a couple other stages. Everything draws the same. I intend to follow up with more of the same sort of refactoring, but this was tricky enough a change I want to do them in small steps. Change-Id: Ib4aa86a58d000f2d7916937cd4f22dc2bd135a49 Reviewed-on: https://skia-review.googlesource.com/11186 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
f809fef8280e3d6ad9d95697cd234560b49962ab |
|
31-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, a couple simple loads and stores Change-Id: I217d7b562f5fa443978044e17469ba757c061209 Reviewed-on: https://skia-review.googlesource.com/10971 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
61b84169eeab16a69e0efc5a7146d235b7a79210 |
|
31-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, caught up on blend modes Change-Id: I39538c90cd4c68691c3956e3f51616b77e4c90d1 Reviewed-on: https://skia-review.googlesource.com/10961 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
66b09abdb184d843ece45a1d343fbe632075b323 |
|
31-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, another batch of blend modes Change-Id: I0c48ec80eee8b7c7e9fb980efa8ed1dad5ad9768 Reviewed-on: https://skia-review.googlesource.com/10924 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
aaca1e44b15f205a9393580b697bfd8331741a17 |
|
31-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, more blend modes Change-Id: I17ce08a7ec62ef8ffe8ae567079d669a87ef9a9c Reviewed-on: https://skia-review.googlesource.com/10921 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
5f80485a53e56e92d0914131b03ae33e49c163a8 |
|
30-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
make _win.S know if it's 64-bit I think this is the root of my Windows / Chrome problems. Even on 32-bit builds, Chrome compiles nacl64.exe in 64-bit mode. So to make things simple, always put _win.S in the sources, and no-op it away when assembling for 32-bit. Change-Id: I19f163491739a6c0cbdedd0ce353f1d2289907ae Reviewed-on: https://skia-review.googlesource.com/10637 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
d7e06aec7e7837ca1dd54680ad37b431e40e4eef |
|
29-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
jumper, revert to generating .S files I went with the unified-in-one-.cpp approach mostly to make it easy to roll out SkJumper. I no longer see any difficulty rolling out the assembly files, and it's possible the unified .cpp approach just makes things harder. Let's see if it's any easier to get Chrome's official build to work with normal assembly files. It's not going to be a problem to roll out. This is a partial revert of https://skia-review.googlesource.com/c/9336. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug Change-Id: Idfdbd2d322452b44bc0adaf6dc299cc7649bc51e Reviewed-on: https://skia-review.googlesource.com/10561 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
894d5611e54cbf62a03ff9ffb48a2302dda9ab86 |
|
07-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
Back to code as data arrays, this time in .text. This technique lets us generate a single source file, use the C++ preprocessor, and avoid the pain of working with assemblers. By using the section attribute or declspec allocate, we can put these data arrays into the .text section, making them ordinary code. This is like the previous solution, except it should actually run. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug Change-Id: Ide7675f6cf32eb4831ff02906acbdc3faaeaa684 Reviewed-on: https://skia-review.googlesource.com/9336 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
94fc0fe016bbfeb097c829df513673d4fcbb38b3 |
|
03-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: store_f32 Change-Id: I4bc6d1a8787c540fd1a29274650d34392e56651c Reviewed-on: https://skia-review.googlesource.com/9223 Reviewed-by: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
767c7e7a0b8a5462df100c2662f0bf99cbad6f03 |
|
02-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: use AVX2 mask loads and stores for U32 SkRasterPipeline_f16: 63 -> 58 (8888+f16 loads, f16 store) SkRasterPipeline_srgb: 96 -> 84 (2x 8888 loads, 8888 store) PS3 has a simpler way to build the mask, in a uint64_t. Timing is still roughly the same. Change-Id: Ie278611dff02281e5a0f3a57185050bbe852bff0 Reviewed-on: https://skia-review.googlesource.com/9165 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
8e8e817cbf6c6e3df744dbaa070564c453b70a62 |
|
02-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: skip null contexts This makes stages that don't use a context pointer look a little cleaner, especially on ARM. No interesting speed difference on x86. What do you think? Change-Id: I445472be2aa8a7c3bc8cba443fa477a3628118ba Reviewed-on: https://skia-review.googlesource.com/9155 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
4e7fc0c5da88e0e4ccc1dff23f4f2ff134130acd |
|
02-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: be more precise by rejecting data sections. This allows %rip addressing as long as it's not going into a data section. This lets us use switch tables, avoiding loops and stack. On HSW, SkRasterPipeline_f16: 90 -> 63 SkRasterPipeline_srgb: 170 -> 97 Change-Id: I3ca2e4ff819b70beea78be75579f9d80c06979e8 Reviewed-on: https://skia-review.googlesource.com/9146 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
c31858bcba3f6d9eb6b57ae03c15b266324a5c23 |
|
01-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: handle the <kStride tail in AVX+ mode. We have plenty of general-purpose registers to spare on x86-64, so the cheapest thing to do is use one to hold the usual 'tail'. Speedups on HSW: SkRasterPipeline_srgb: 292 -> 170 SkRasterPipeline_f16: 122 -> 90 There's plenty more room to improve here, e.g. using mask loads and stores, but this seems to be enough to get things working reasonably. BUG=skia:6289 Change-Id: I8c0ed325391822e9f36636500350205e93942111 Reviewed-on: https://skia-review.googlesource.com/9110 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
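The tail handling described in the entry above can be illustrated with a minimal C++ sketch. This is not the SkJumper code: `kStride = 8`, the `double_stage` stage, the `run` driver, and the convention that `tail == 0` means "a full stride" are all assumptions made for illustration. The sketch passes the tail as an ordinary argument, whereas the commit keeps it in a spare general-purpose register.

```cpp
#include <cstddef>

// Assumed vector width: 8 floats, as in a 256-bit AVX register.
constexpr size_t kStride = 8;

// Hypothetical stage. tail == 0 means "process a full stride";
// otherwise only the first `tail` pixels are valid.
void double_stage(float* px, size_t tail) {
    size_t n = tail ? tail : kStride;
    for (size_t i = 0; i < n; i++) {
        px[i] *= 2.0f;
    }
}

// Hypothetical driver: run full strides first, then one call for the
// leftover pixels (fewer than kStride), if there are any.
void run(float* px, size_t n) {
    size_t x = 0;
    for (; x + kStride <= n; x += kStride) {
        double_stage(px + x, 0);
    }
    if (size_t tail = n - x) {
        double_stage(px + x, tail);
    }
}
```

Calling `run(px, 11)` under these assumptions makes one full-stride call for pixels 0 through 7 and one tail call with `tail == 3` for pixels 8 through 10.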
9c220e00896e8d344e9cfde5934b00fb011d14b8 |
|
02-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: allow the compiler to generate FMAs Today we use mad() to get FMAs where possible. -ffp-contract=fast lets the compiler generate them if it spots an opportunity. It looks like it's found a mix of FMAs and FMSs. I will follow up by seeing if we can relax the use of mad(). Quick experiments say no, but less quick experiments may say otherwise. Change-Id: I5228811cfbf11cccc0d715672a464fd1e1cea3b0 Reviewed-on: https://skia-review.googlesource.com/9136 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
e93d190ee5c2954ec373176826add4f8ee11b9c4 |
|
01-Mar-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: upgrade to Clang 3.9 Mostly I think this will help me handle the AVX tails better. But there are some wins here already, particularly in AVX and ARM code. Change-Id: Ie79b4c2c4ab455277c313f15d360cbf8e4bb7836 Reviewed-on: https://skia-review.googlesource.com/9126 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
11d2df0bdd58d08ab57bc10eea56bc333664c892 |
|
24-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: perspective matrix Change-Id: I2c63e0996e4689950f8f3b82da0fb07941c26044 Reviewed-on: https://skia-review.googlesource.com/8952 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9fe1b22249171087a0f01c67369559f6fd491540 |
|
24-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: tiling modes Slight changes to clamp to make it look more like the other two. Mirror gets a fun new SSE/AVX abs() that requires no constants: abs(v) = v & (0-v) Change-Id: Iab4a61e39a7d28b47d9a10e7283df58b5e5a034e Reviewed-on: https://skia-review.googlesource.com/8950 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
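The constant-free `abs(v) = v & (0-v)` identity from the entry above can be checked with a scalar C++ sketch. The commit applies it to SSE/AVX vectors; this version works on a single `float`, and the helper names `to_bits`, `from_bits`, and `abs_no_constants` are hypothetical. It assumes IEEE 754 single-precision floats.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit integer, and back again.
static uint32_t to_bits(float v) {
    uint32_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}
static float from_bits(uint32_t b) {
    float v;
    std::memcpy(&v, &b, sizeof v);
    return v;
}

// v and (0 - v) have identical exponent and mantissa bits and opposite
// sign bits, so ANDing them keeps the magnitude and clears the sign.
float abs_no_constants(float v) {
    return from_bits(to_bits(v) & to_bits(0.0f - v));
}
```

No sign-mask constant such as `0x7fffffff` has to be loaded, which is the point the commit message makes about needing no constants.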
420e38f586ed21a51c9d216c422b4c4d5ab2dc97 |
|
24-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: a8 Change-Id: I123caaee0bb8e3967c0a1f2acf1d80bcf0f41758 Reviewed-on: https://skia-review.googlesource.com/8944 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
e3d4421e67d328fc0994bbfd25deab0ce3964dc2 |
|
24-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: scales and lerps Change-Id: I6057ba3e9243641fecbc6b78f6f83ee3265ad3d4 Reviewed-on: https://skia-review.googlesource.com/8941 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
3f81f3703a68755c88f5cc4a87728b98f34c4cd4 |
|
23-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: 565 Change-Id: Icbd41e3dde9b39a61ccbe8e7622334ae53e5212a Reviewed-on: https://skia-review.googlesource.com/8922 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
db356b7213bfd3ed636e158b5427be68adf01bed |
|
23-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: fill in AVX f16 stages, turn on AVX As far as I can tell, this draws identically to the SSE4.1 backend. Change-Id: Id650db59a84d779b84d45f42e60321732e28d803 Reviewed-on: https://skia-review.googlesource.com/8913 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
e9c25ce2d2fecd719156212b4ac6f6701c31a4b6 |
|
23-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: reformat .S files Decimal byte encoding makes more horizontal space for comments, which are the only thing you really want to read. No code change here. Change-Id: I674d78c898976063b0d89b747af41c62dc294303 Reviewed-on: https://skia-review.googlesource.com/8899 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
ca0cfb4a7a52ae894ca005475ad9de5ac1329900 |
|
23-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
Add AVX to the SkJumper mix. AVX is a nice little halfway point between SSE4.1 and HSW, in terms of instructions available, performance, and availability. Intel chips have had AVX since ~2011, compared to ~2013 for HSW and ~2007 for SSE4.1. Like HSW it's got 8-wide 256-bit float vectors, but integer (and double) operations are essentially still only 128-bit. It also doesn't have F16 conversion or FMA instructions. It doesn't look like this is going to be a burden to maintain, and only adds a few KB of code size. In exchange, we now run 8x wide on 45% to 70% of x86 machines, depending on the OS. In my brief testing, speed eerily resembles exact geometric progression: SSE4.1: 1x speed (baseline) AVX: ~sqrt(2)x speed HSW: ~2x speed This adds all the basic plumbing for AVX but leaves it disabled. I'll flip it on once I've implemented the f16 TODOs. Change-Id: I1c378dabb8a06386646371bf78ade9e9432b006f Reviewed-on: https://skia-review.googlesource.com/8898 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
d9e82256e8591e7da1e2afde1e4c6c49c48ddc96 |
|
22-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: set_rgb and swap_rb swap_rb is a big limiting factor on Windows and Linux. set_rgb just happened to be nearby and easy. Change-Id: Ic529c7578eeb278476821090127fa8fb1f70c04f Reviewed-on: https://skia-review.googlesource.com/8859 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
2b767361de00fd85cb32dce62c4a95d30b7eaabf |
|
22-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: implement lerp_u8 Going to start filling these in in biggest-bang-for-the-buck order. lerp_u8 (i.e. text drawing) is number 1 right now. Change-Id: If58eaf8ddbb93a6b954c3700fa1a476dca94a809 Reviewed-on: https://skia-review.googlesource.com/8856 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|
9ef63754a7262f57097b39318adf2f7789d23ecf |
|
21-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
Move looping logic into start_pipeline(). This should be a big win on Windows, but I haven't timed there yet. On my Mac, it's a solid 2% speedup. PS1 was insufficiently ambitious, but was this, for posterity: No need to vzeroupper twice on Windows. On Windows start_pipeline() will vzeroupper, so no need to do it in just_return(). Change-Id: I099320b95da85900a60ce96fdb7a216a36db1858 Reviewed-on: https://skia-review.googlesource.com/8821 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
/external/skia/src/jumper/SkJumper_generated_win.S
|
ed1b9022b379108219e3aa07412b764ac08ae382 |
|
21-Feb-2017 |
Mike Klein <mtklein@chromium.org> |
SkJumper: Windows - Compile stages with -DWIN to pick up MS-specific start_pipeline(). - Add SkJumper_generated_win.S with MS-specific assembly. - Add a minimal asm tool to our GN Windows toolchain. The SkRasterPipeline_f16 benchmark runs ~4x faster on my desktop. Change-Id: Ia45afb4ecb6a055e2c0e43f0f54f59e081c23b7f Reviewed-on: https://skia-review.googlesource.com/8778 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
/external/skia/src/jumper/SkJumper_generated_win.S
|