62423294e7d0783a279fe16250d103bbaabdde2d |
|
28-Apr-2016 |
John Muir <muirj@google.com> |
CRAS: Fix compile warnings on the latest clang. BUG=None TEST=Build completes on 64-bit ARM Brillo-based build. Build completes on chrome OS build environment. Change-Id: Icdc835aa7de8ca210f7cc57c4ba57e5507f196ec Reviewed-on: https://chromium-review.googlesource.com/341313 Commit-Ready: John Muir <muirj@google.com> Tested-by: John Muir <muirj@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Dylan Reid <dgreid@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
162278bc78b51ee641e3b95d307c32d370b648ff |
|
24-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
optimize interleave_stereo changing multiply to add.

Instead of a float multiply by 32768, add 15 as an integer to the exponent of the float. This scales a float in the range -1 to 1 to the range -32768 to 32767. Performance improves on aarch64 by 29%.

Was
interleave SIMD size = 65536, elapsed time = 3232 ms
interleave SIMD size = 32768, elapsed time = 1205 ms
interleave SIMD size = 16384, elapsed time = 596 ms
interleave SIMD size = 8192, elapsed time = 298 ms
interleave SIMD size = 4096, elapsed time = 157 ms
interleave SIMD size = 2048, elapsed time = 73 ms
interleave SIMD size = 1024, elapsed time = 37 ms

Now
interleave SIMD size = 65536, elapsed time = 2149 ms
interleave SIMD size = 32768, elapsed time = 925 ms
interleave SIMD size = 16384, elapsed time = 461 ms
interleave SIMD size = 8192, elapsed time = 231 ms
interleave SIMD size = 4096, elapsed time = 133 ms
interleave SIMD size = 2048, elapsed time = 57 ms
interleave SIMD size = 1024, elapsed time = 29 ms

Intel performance
interleave SIMD size = 65536, elapsed time = 1171 ms
interleave SIMD size = 32768, elapsed time = 573 ms
interleave SIMD size = 16384, elapsed time = 243 ms
interleave SIMD size = 8192, elapsed time = 121 ms
interleave SIMD size = 4096, elapsed time = 59 ms
interleave SIMD size = 2048, elapsed time = 30 ms

BUG=None
TEST=local stand alone test
Change-Id: Ib01ac9897e96f30eecad30e85a21dab05a08f8d4
Reviewed-on: https://chromium-review.googlesource.com/334494
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
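The exponent trick described in the commit above can be illustrated in scalar C. This is only a sketch under stated assumptions, not the commit's SIMD code: the function name `scale_by_32768_bits` is hypothetical, and it assumes IEEE-754 single-precision floats whose scaled result stays finite.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: multiply x by 2^15 (32768) by adding 15 to the
 * biased exponent field of an IEEE-754 single-precision float.
 * Assumes x is a normal, finite value whose scaled result does not
 * overflow. Zero does not stay exactly zero (it becomes 2^-112), but
 * that still rounds to 0 in a later float-to-int conversion.
 */
static float scale_by_32768_bits(float x)
{
	uint32_t bits;

	memcpy(&bits, &x, sizeof(bits));   /* reinterpret without aliasing UB */
	bits += (uint32_t)15 << 23;        /* exponent occupies bits 23..30 */
	memcpy(&x, &bits, sizeof(x));
	return x;
}
```

An integer add is typically cheaper than a floating-point multiply, which is consistent with the speedup reported in the commit message.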
08e48570a305dbc48d3a8155a43fd618bfb0af53 |
|
22-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
interleave_stereo - switch round to away to mimic C code

fcvtns rounds ties to even, so a tie such as 0.5 rounds down to 0 while 1.5 rounds up to 2. The C code always rounds ties away from zero, so this CL switches to the fcvtas instruction.

A test compares C, Neon and expected results for edge cases:
test interleave compare 0, 1.000000 32768.000000 32767 32767 32767
test interleave compare 0, -1.000000 -32768.000000 -32768 -32768 -32768
test interleave compare 0, 1.100000 36044.800781 32767 32767 32767
test interleave compare 0, -1.100000 -36044.800781 -32768 -32768 -32768
test interleave compare 0, inf inf 32767 32767 32767
test interleave compare 0, -inf -inf -32768 -32768 -32768
test interleave compare 0, 0.250000 8192.000000 8192 8192 8192
test interleave compare 0, -0.250000 -8192.000000 -8192 -8192 -8192
test interleave compare 0, 0.500000 16384.000000 16384 16384 16384
test interleave compare 0, -0.500000 -16384.000000 -16384 -16384 -16384
test interleave compare 0, 0.000031 1.000000 1 1 1
test interleave compare 0, -0.000031 -1.000000 -1 -1 -1
test interleave compare 0, 0.000031 1.000033 1 1 1
test interleave compare 0, -0.000031 -1.000033 -1 -1 -1
test interleave compare 0, 0.000031 0.999967 1 1 1
test interleave compare 0, -0.000031 -0.999967 -1 -1 -1
test interleave compare 0, 0.000015 0.500000 1 1 0
test interleave compare 0, -0.000015 -0.500000 -1 -1 0
test interleave compare 0, 0.000015 0.500033 1 1 1
test interleave compare 0, -0.000015 -0.500033 -1 -1 1
test interleave compare 0, 0.000015 0.499967 0 0 0
test interleave compare 0, -0.000015 -0.499967 0 0 0
test interleave compare 0, 0.000046 1.500000 2 2 2
test interleave compare 0, -0.000046 -1.500000 -2 -2 -2
test interleave compare 0, 0.000046 1.500033 2 2 2
test interleave compare 0, -0.000046 -1.500033 -2 -2 -2
test interleave compare 0, 0.000046 1.499967 1 1 1
test interleave compare 0, -0.000046 -1.499967 -1 -1 -1
test interleave compare 0, 0.000000 0.000000 0 0 0
test interleave compare 0, -0.000000 -0.000000 0 0 0
test interleave compare 0, nan nan 0 0 0
test interleave compare 0, -nan -nan 0 0 0

BUG=None
TEST=local unittest
Change-Id: If55678d63b5842688d105c817ecedf6137064d31
Reviewed-on: https://chromium-review.googlesource.com/334311
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
/external/adhd/cras/src/dsp/dsp_util.c
|
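The round-away behavior attributed to the C code in the commit above might look like the following scalar sketch. The function name `float_to_s16_round_away` is hypothetical, and the clamping and NaN handling are assumptions chosen to agree with the C column of the test output, not a copy of the actual dsp_util.c source.

```c
#include <stdint.h>

/*
 * Hypothetical reference conversion: scale a float sample to the
 * 16-bit range, round ties away from zero with the add/subtract 0.5
 * idiom, and clamp to [-32768, 32767].
 */
static int16_t float_to_s16_round_away(float x)
{
	float scaled = x * 32768.0f;

	if (scaled != scaled)          /* NaN compares unequal to itself */
		return 0;
	if (scaled >= 32767.0f)        /* also catches +inf */
		return INT16_MAX;
	if (scaled <= -32768.0f)       /* also catches -inf */
		return INT16_MIN;
	/* The cast truncates toward zero, so +/-0.5 rounds ties away. */
	return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

With ties-to-even rounding, a scaled value of 0.5 would instead become 0, which is the difference between fcvtns and fcvtas that the commit addresses.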
2758d361b60e47338d81c99e5002a820590ab15d |
|
22-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
Add clamping large floats to intel float to int.

CVTPS2DQ on large values produces 0x80000000. Adding a minps before the conversion clamps to a large positive number instead, matching the C and Arm versions. Performance is 5.3% slower.

Without clamp
interleave ORIG size = 65536, elapsed time = 10907 ms
interleave SSE3 size = 65536, elapsed time = 1552 ms

With clamping
interleave ORIG size = 65536, elapsed time = 10855 ms
interleave SSE3 size = 65536, elapsed time = 1635 ms

BUG=None
TEST=local stand alone test
Change-Id: Ia7431d2daf97083b87e29560f7ff22642c3cda22
Reviewed-on: https://chromium-review.googlesource.com/334387
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
/external/adhd/cras/src/dsp/dsp_util.c
|
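The clamp-before-convert idea in the commit above can be sketched with SSE2 intrinsics. This is an illustration only: the function name `convert4_clamped_sketch` is hypothetical, the input is assumed to be already scaled to the 16-bit range and free of NaNs, and the scalar fallback (which rounds with the add-0.5 idiom rather than the hardware rounding mode) exists only so the sketch builds on non-x86 targets.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: convert 4 pre-scaled floats to saturated int16. */
static void convert4_clamped_sketch(const float in[4], int16_t out[4])
{
#if defined(__SSE2__)
	__m128 v = _mm_loadu_ps(in);
	__m128i i32, i16;

	/*
	 * Clamp first. Without this, CVTPS2DQ returns 0x80000000 for
	 * large positive inputs, which packs to -32768 instead of 32767.
	 */
	v = _mm_min_ps(v, _mm_set1_ps(32767.0f));
	i32 = _mm_cvtps_epi32(v);            /* CVTPS2DQ */
	i16 = _mm_packs_epi32(i32, i32);     /* signed saturating narrow */
	_mm_storel_epi64((__m128i *)out, i16);
#else
	/* Portable fallback, an assumption of this sketch. */
	for (int k = 0; k < 4; k++) {
		float x = in[k];

		if (x > 32767.0f)
			x = 32767.0f;
		if (x < -32768.0f)
			x = -32768.0f;
		out[k] = (int16_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
	}
#endif
}
```

Large negative inputs need no extra clamp on the SSE2 path, because 0x80000000 is already the most negative 32-bit integer and the saturating pack turns it into -32768.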
a0065c306bda230377cba798ac847123fa02d1a1 |
|
17-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
interleave_stereo use fmul then rounding float to int cast.

The current code rounds by adding or subtracting 0.5. This CL improves performance by multiplying the float by 32k, then using a rounding float to int conversion.

Was fcmgt
interleave NEON size = 4096, elapsed time = 203 ms

Now fmul
interleave NEON size = 4096, elapsed time = 150 ms

BUG=none
TEST=local unittest
Change-Id: I019f37755faa4aa0fb6f7f4fe3f1e6306d282891
Reviewed-on: https://chromium-review.googlesource.com/332813
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
1521112b819f2c94f3af1f36c271e6eb37d227c3 |
|
11-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
64 bit neon interleave_stereo

Port 32 bit neon audio interleave_stereo to 64 bit neon. For 4k samples, performance compared to original C code is 18.2 times faster:
interleave ORIG size = 4096, elapsed time = 4292 ms
interleave NEON size = 4096, elapsed time = 236 ms

Improvement is more for smaller buffers (10x) as memory becomes less of a limiting factor. With -O2, the compiler unrolled the C code and used Neon scalar instructions, producing a reasonably fast but large function. The 32 bit version has the same Neon performance but slower C code:
interleave ORIG size = 4096, elapsed time = 1344 ms

BUG=None
TEST=local tests and try bots pass.
Change-Id: Ic57bcb9274fae1495b38ff251f46ab86cc0a33ae
Reviewed-on: https://chromium-review.googlesource.com/332612
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
1351f846c568aa31365aac0b607eb377cc5087c5 |
|
02-Mar-2016 |
Frank Barchard <fbarchard@chromium.org> |
64 bit neon deinterleave_stereo

Port 32 bit neon audio deinterleave_stereo to 64 bit neon. For 4k samples, performance compared to original C code is 8.06 times faster:
deinterleave ORIG size = 4096, elapsed time = 1741 ms
deinterleave NEON size = 4096, elapsed time = 216 ms

Improvement is more for smaller buffers and less for larger buffers as memory becomes the limiting factor. With -O2, the compiler unrolled the C code and used Neon scalar instructions, producing a reasonably fast but large function. The 32 bit version has the same Neon performance but slower C code:
deinterleave ORIG size = 4096, elapsed time = 4586 ms

BUG=None
TEST=local arm64-generic build and try bots
Change-Id: I3093c7e73089051713930d59a89cea050eb65297
Reviewed-on: https://chromium-review.googlesource.com/329951
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
0cb6c04cd9cd7ee17b1a5a3e3dc5395d8353c0eb |
|
10-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
CRAS: use inline asm for dsp_enable_flush_denormal_to_zero This CL reimplements dsp_enable_flush_denormal_to_zero using inline assembly for ARM. dsp_enable_flush_denormal_to_zero() currently enables denormal flush to zero using macros in <fpu_control.h>. This header is not available on all platforms. No functional change. No change for Intel or other CPUs. BUG=None TEST=local tests and try bots pass. Change-Id: I169c8ca4f1d0919d0f4441db51ff56b75af85374 Reviewed-on: https://chromium-review.googlesource.com/332062 Commit-Ready: Frank Barchard <fbarchard@chromium.org> Tested-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Dylan Reid <dgreid@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
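A flush-to-zero enable routine written with inline assembly, in the spirit of the commit above, might look like the sketch below. The function name `enable_flush_to_zero_sketch` and its return value are hypothetical. The ARM branches set the FZ bit (bit 24) of FPCR or FPSCR; the x86-64 branch is an addition for this sketch only (the commit states there is no change for Intel) and sets the FTZ and DAZ bits of MXCSR.

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: returns 1 if flush-to-zero was enabled, else 0. */
static int enable_flush_to_zero_sketch(void)
{
#if defined(__aarch64__)
	uint64_t fpcr;

	__asm__ __volatile__("mrs %0, fpcr" : "=r"(fpcr));
	fpcr |= (uint64_t)1 << 24;           /* FZ: flush denormals to zero */
	__asm__ __volatile__("msr fpcr, %0" : : "r"(fpcr));
	return 1;
#elif defined(__arm__) && defined(__ARM_FP)
	uint32_t fpscr;

	__asm__ __volatile__("vmrs %0, fpscr" : "=r"(fpscr));
	fpscr |= (uint32_t)1 << 24;          /* FZ bit in FPSCR */
	__asm__ __volatile__("vmsr fpscr, %0" : : "r"(fpscr));
	return 1;
#elif defined(__x86_64__)
	/* 0x8000 = flush-to-zero, 0x0040 = denormals-are-zero */
	_mm_setcsr(_mm_getcsr() | 0x8040);
	return 1;
#else
	return 0;                            /* unsupported target: no-op */
#endif
}
```

Reading and writing the control register directly avoids any dependency on <fpu_control.h>, which is the portability concern the commit message raises.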
173efda318c9f0e46da6ce8c5e40bbf95642105a |
|
09-Mar-2016 |
Frank Barchard <fbarchard@chromium.org> |
Revert "64 bit neon interleave_stereo" This reverts commit 042208bd3157da51623ab6ac0170b7362bbc15ca. Change-Id: Ie6ee76ddafb17af17caa71a64dd2e637baacddc6 Reviewed-on: https://chromium-review.googlesource.com/331980 Commit-Ready: Frank Barchard <fbarchard@chromium.org> Tested-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
042208bd3157da51623ab6ac0170b7362bbc15ca |
|
08-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
64 bit neon interleave_stereo

Port 32 bit neon audio interleave_stereo to 64 bit neon. For 4k samples, performance compared to original C code is 8.09 times faster:
interleave ORIG size = 4096, elapsed time = 429 ms
interleave NEON size = 4096, elapsed time = 53 ms

Improvement is more for smaller buffers (10x) as memory becomes less of a limiting factor. With -O2, the compiler unrolled the C code and used Neon scalar instructions, producing a reasonably fast but large function. The 32 bit version has the same Neon performance but slower C code:
interleave ORIG size = 4096, elapsed time = 1344 ms

BUG=None
TEST=local tests and try bots pass.
Change-Id: I9b963b9492e8f05284d9bd0a8cf137c68373cecc
Reviewed-on: https://chromium-review.googlesource.com/331323
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
b2f9396d26f1d193822d67a2173fd823a60f971e |
|
02-Mar-2016 |
Frank Barchard <fbarchard@google.com> |
Enable denormal flush to zero for aarch64. The denormal intrinsics apply to 64 bit arm, as well as 32 bit. Enable the code when compiled for arm64 target. BUG=None TEST=local arm64-generic build and try bots. Change-Id: I96941ad3e05898754688a7489b3fb8bae9eeb840 Reviewed-on: https://chromium-review.googlesource.com/329952 Commit-Ready: Frank Barchard <fbarchard@chromium.org> Tested-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Chih-Chung Chang <chihchung@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
93bf9cd9fd7a7a03303b812305117a770e8403df |
|
03-Mar-2016 |
Frank Barchard <fbarchard@chromium.org> |
Revert "Optimized Neon deinterleave_stereo"

This reverts commit 27cfcea7bae30dd890539fb439c1b6a41d3a2d27. The reverted change causes an overwrite of up to 7 samples if 'frames' is not a multiple of 8. If 'frames' is a multiple of 8, no overwrite occurs.

Change-Id: I17058ccf47e8b5106e2963fd83db01556524781e
Reviewed-on: https://chromium-review.googlesource.com/330159
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
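One common way to avoid the overwrite described in the revert above is to run the 8-frame body only on whole blocks and finish with a scalar tail. The sketch below shows that loop structure in plain C. The function name `deinterleave_stereo_sketch` and the 1/32768 scale factor are assumptions, and the inner loop merely stands in for a SIMD body; it is not the code that was reverted or the fix that later landed.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: split interleaved 16-bit stereo into two float
 * channels without writing past 'frames' output samples.
 */
static void deinterleave_stereo_sketch(const int16_t *input, float *left,
				       float *right, int frames)
{
	const float scale = 1.0f / 32768.0f;
	int i = 0;

	/* Main loop: whole blocks of 8 frames (stand-in for SIMD code). */
	for (; i + 8 <= frames; i += 8) {
		for (int k = 0; k < 8; k++) {
			left[i + k] = input[2 * (i + k)] * scale;
			right[i + k] = input[2 * (i + k) + 1] * scale;
		}
	}

	/* Tail: the remaining 0..7 frames, one at a time. */
	for (; i < frames; i++) {
		left[i] = input[2 * i] * scale;
		right[i] = input[2 * i + 1] * scale;
	}
}
```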
27cfcea7bae30dd890539fb439c1b6a41d3a2d27 |
|
23-Feb-2016 |
Frank Barchard <fbarchard@google.com> |
Optimized Neon deinterleave_stereo Reorder the conversion in ascending order. Loop using frames parameter directly. Convert tabs to 2 spaces for lint. BUG=None TEST=untested Change-Id: Ife0487c1f87bb1495fc40bc8931f54377391cbe6 Reviewed-on: https://chromium-review.googlesource.com/328825 Reviewed-by: Johann Koenig <johannkoenig@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Tested-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
d64c9f3c9f382891e37470a5f73519f47a0b5b7f |
|
19-Oct-2013 |
Mike Frysinger <vapier@chromium.org> |
dsp_util: simplify optimization logic

This reduces the footprint in the callers a bit. We also stop #warning code from breaking the build.

BUG=chromium:307180
TEST=building for a new arch no longer aborts
TEST=`emerge-daisy adhd` works and doesn't warn
CQ-DEPEND=CL:173762
Change-Id: I94572ebcc7e59b24df51f2b52fa8ed5f8ff6f8ed
Reviewed-on: https://chromium-review.googlesource.com/173761
Reviewed-by: Chih-Chung Chang <chihchung@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Commit-Queue: Mike Frysinger <vapier@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
59904485bcf1793be97a369ec1f7c4d94e8d5b09 |
|
01-Jul-2013 |
Chih-Chung Chang <chihchung@chromium.org> |
CRAS: Add SSE-optimized (de)interleave and EQ functions. This reduces two channels six biquads EQ from 0.20% to 0.09% CPU on link. BUG=none TEST=make check, run eq2_test and compare results. Change-Id: I6fc237c7f0ba84dcccc01d8b069e5ec0707b4966 Reviewed-on: https://gerrit.chromium.org/gerrit/60642 Reviewed-by: Chih-Chung Chang <chihchung@chromium.org> Tested-by: Chih-Chung Chang <chihchung@chromium.org> Commit-Queue: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
496d6a5a3e1229a96c80538b4d56293e2129c8ea |
|
17-Jun-2013 |
Chih-Chung Chang <chihchung@chromium.org> |
CRAS: Improve dsp functions on arm.

- Use inline assembly instead of intrinsics as GCC didn't generate good code for intrinsics.
- Process 8 frames instead of 4 frames per deinterleave loop.
- Replace vuzp/vzip with vld2/vst2 and remove unnecessary vmaxq/vminq (already implied in vqmovn).
- Tweak the evaluation order in eq_process to gain a bit speed on arm.

This reduces EQ time from 0.59% to 0.36% on spring.

BUG=none
TEST=run eq_test and compare with previous result
Change-Id: I697eb5582888ff44d4105b052b39e34b54de1de3
Reviewed-on: https://gerrit.chromium.org/gerrit/59014
Commit-Queue: Chih-Chung Chang <chihchung@chromium.org>
Reviewed-by: Chih-Chung Chang <chihchung@chromium.org>
Tested-by: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
6d3720ae98a5eeffd1b1fbef6bad3910c1f48b6e |
|
05-Jun-2013 |
Chih-Chung Chang <chihchung@chromium.org> |
CRAS: Add neon interleave/deinterleave functions. BUG=chromium:220340 TEST=make test Change-Id: I22b58558c66ba0a3898e299f554a42e4dad00040 Reviewed-on: https://gerrit.chromium.org/gerrit/57585 Reviewed-by: Dylan Reid <dgreid@chromium.org> Tested-by: Chih-Chung Chang <chihchung@chromium.org> Commit-Queue: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
9ec4821af5dd2853247d33a97ea668f1e2ceda9e |
|
05-Jun-2013 |
Chih-Chung Chang <chihchung@chromium.org> |
CRAS: Add DSP EQ module. The equalizer is implemented using a chain of biquads. Each biquad can be lowpass, highpass, peaking, etc. The biquad formula is ported from Chromium browser. BUG=chromium:220340 TEST=make check; run eq_test and plot_fftl.m and verify. Change-Id: I80a3df15155589e006275911ec58cfde7020ccbb Reviewed-on: https://gerrit.chromium.org/gerrit/57584 Reviewed-by: Dylan Reid <dgreid@chromium.org> Commit-Queue: Chih-Chung Chang <chihchung@chromium.org> Tested-by: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
|
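The "chain of biquads" design mentioned in the EQ commit above can be sketched as follows. The struct and function names are hypothetical, and the direct form I difference equation with coefficients normalized so that a0 = 1 is a textbook formulation assumed here, not necessarily the form the CRAS EQ module uses.

```c
/* Hypothetical biquad state; coefficients are normalized (a0 == 1). */
struct biquad_sketch {
	float b0, b1, b2, a1, a2;   /* filter coefficients */
	float x1, x2, y1, y2;       /* previous two inputs and outputs */
};

/* Process one sample: y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2. */
static float biquad_step(struct biquad_sketch *bq, float x)
{
	float y = bq->b0 * x + bq->b1 * bq->x1 + bq->b2 * bq->x2
		  - bq->a1 * bq->y1 - bq->a2 * bq->y2;

	bq->x2 = bq->x1;
	bq->x1 = x;
	bq->y2 = bq->y1;
	bq->y1 = y;
	return y;
}

/* Run each sample through every biquad in the chain, in order. */
static void eq_chain_process_sketch(struct biquad_sketch *chain, int n,
				    float *data, int count)
{
	for (int i = 0; i < count; i++) {
		float s = data[i];

		for (int k = 0; k < n; k++)
			s = biquad_step(&chain[k], s);
		data[i] = s;
	}
}
```

Each stage can carry lowpass, highpass, or peaking coefficients, so an arbitrary EQ curve is built by choosing the coefficients per stage rather than by changing the processing loop.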