History log of /external/adhd/cras/src/dsp/dsp_util.c
Revision Date Author Comments
62423294e7d0783a279fe16250d103bbaabdde2d 28-Apr-2016 John Muir <muirj@google.com> CRAS: Fix compile warnings on the latest clang.

BUG=None
TEST=Build completes on 64-bit ARM Brillo-based build.
Build completes on chrome OS build environment.

Change-Id: Icdc835aa7de8ca210f7cc57c4ba57e5507f196ec
Reviewed-on: https://chromium-review.googlesource.com/341313
Commit-Ready: John Muir <muirj@google.com>
Tested-by: John Muir <muirj@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
162278bc78b51ee641e3b95d307c32d370b648ff 24-Mar-2016 Frank Barchard <fbarchard@google.com> Optimize interleave_stereo by changing multiply to add.

Instead of a float multiply by 32768, add 15 as an integer to
the exponent of the float. This scales a float in the range
-1 to 1 to the range -32768 to 32767.
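
A minimal sketch of the exponent trick described above, assuming IEEE 754 single precision. The helper name scale_by_exponent_add is hypothetical and is not taken from dsp_util.c. Adding 15 to the 8-bit exponent field multiplies a normal, finite float by 2^15 = 32768 without a floating-point multiply.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: scale a float by 32768 by adding 15 to its
 * biased exponent. Assumes IEEE 754 binary32 and a normal, finite
 * input. Zero turns into a tiny value (2^-112) that still rounds to 0
 * when converted to an integer; inf and NaN are not handled here.
 */
static float scale_by_exponent_add(float x)
{
	uint32_t bits;

	memcpy(&bits, &x, sizeof(bits));	/* read the bit pattern safely */
	bits += UINT32_C(15) << 23;		/* exponent field starts at bit 23 */
	memcpy(&x, &bits, sizeof(x));
	return x;
}
```

For example, 0.5f has a biased exponent of 126; after the add it is 141, which is 2^14 = 16384.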

Performance improves on aarch64 by 29%.
Was
interleave SIMD size = 65536, elapsed time = 3232 ms
interleave SIMD size = 32768, elapsed time = 1205 ms
interleave SIMD size = 16384, elapsed time = 596 ms
interleave SIMD size = 8192, elapsed time = 298 ms
interleave SIMD size = 4096, elapsed time = 157 ms
interleave SIMD size = 2048, elapsed time = 73 ms
interleave SIMD size = 1024, elapsed time = 37 ms

Now
interleave SIMD size = 65536, elapsed time = 2149 ms
interleave SIMD size = 32768, elapsed time = 925 ms
interleave SIMD size = 16384, elapsed time = 461 ms
interleave SIMD size = 8192, elapsed time = 231 ms
interleave SIMD size = 4096, elapsed time = 133 ms
interleave SIMD size = 2048, elapsed time = 57 ms
interleave SIMD size = 1024, elapsed time = 29 ms

Intel performance
interleave SIMD size = 65536, elapsed time = 1171 ms
interleave SIMD size = 32768, elapsed time = 573 ms
interleave SIMD size = 16384, elapsed time = 243 ms
interleave SIMD size = 8192, elapsed time = 121 ms
interleave SIMD size = 4096, elapsed time = 59 ms
interleave SIMD size = 2048, elapsed time = 30 ms

BUG=None
TEST=local stand alone test

Change-Id: Ib01ac9897e96f30eecad30e85a21dab05a08f8d4
Reviewed-on: https://chromium-review.googlesource.com/334494
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
08e48570a305dbc48d3a8155a43fd618bfb0af53 22-Mar-2016 Frank Barchard <fbarchard@google.com> interleave_stereo - switch rounding to ties-away to mimic C code

fcvtns rounds ties to the nearest even integer, so 0.5 rounds
to 0 while 1.5 rounds to 2.
The C code always rounds ties away from zero, so this CL
switches to the fcvtas instruction.
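
The two tie-breaking rules can be sketched in plain C. This is an illustration only: the function names are hypothetical, the add-or-subtract-0.5 approach is the one a later entry in this log attributes to the C code, and the sketch is only meant for small magnitudes where the float arithmetic is exact.

```c
/*
 * Round to nearest, ties away from zero: add or subtract 0.5 and let
 * the float-to-int cast truncate toward zero.
 */
static int round_ties_away(float x)
{
	return (int)(x < 0.0f ? x - 0.5f : x + 0.5f);
}

/*
 * Round to nearest, ties to even: start from the ties-away result and,
 * when the input was exactly halfway and the result is odd, step one
 * back toward zero so the result becomes even.
 */
static int round_ties_even(float x)
{
	int r = round_ties_away(x);
	float diff = (float)r - x;

	if ((diff == 0.5f || diff == -0.5f) && (r & 1))
		r -= (r > 0) ? 1 : -1;
	return r;
}
```

The two functions differ only on exact ties: 0.5 gives 1 with ties-away and 0 with ties-to-even, while 1.5 gives 2 with both.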

A test compares C, Neon and expected results for edge cases:
test interleave compare 0, 1.000000 32768.000000 32767 32767 32767
test interleave compare 0, -1.000000 -32768.000000 -32768 -32768 -32768
test interleave compare 0, 1.100000 36044.800781 32767 32767 32767
test interleave compare 0, -1.100000 -36044.800781 -32768 -32768 -32768
test interleave compare 0, inf inf 32767 32767 32767
test interleave compare 0, -inf -inf -32768 -32768 -32768
test interleave compare 0, 0.250000 8192.000000 8192 8192 8192
test interleave compare 0, -0.250000 -8192.000000 -8192 -8192 -8192
test interleave compare 0, 0.500000 16384.000000 16384 16384 16384
test interleave compare 0, -0.500000 -16384.000000 -16384 -16384 -16384
test interleave compare 0, 0.000031 1.000000 1 1 1
test interleave compare 0, -0.000031 -1.000000 -1 -1 -1
test interleave compare 0, 0.000031 1.000033 1 1 1
test interleave compare 0, -0.000031 -1.000033 -1 -1 -1
test interleave compare 0, 0.000031 0.999967 1 1 1
test interleave compare 0, -0.000031 -0.999967 -1 -1 -1
test interleave compare 0, 0.000015 0.500000 1 1 0
test interleave compare 0, -0.000015 -0.500000 -1 -1 0
test interleave compare 0, 0.000015 0.500033 1 1 1
test interleave compare 0, -0.000015 -0.500033 -1 -1 1
test interleave compare 0, 0.000015 0.499967 0 0 0
test interleave compare 0, -0.000015 -0.499967 0 0 0
test interleave compare 0, 0.000046 1.500000 2 2 2
test interleave compare 0, -0.000046 -1.500000 -2 -2 -2
test interleave compare 0, 0.000046 1.500033 2 2 2
test interleave compare 0, -0.000046 -1.500033 -2 -2 -2
test interleave compare 0, 0.000046 1.499967 1 1 1
test interleave compare 0, -0.000046 -1.499967 -1 -1 -1
test interleave compare 0, 0.000000 0.000000 0 0 0
test interleave compare 0, -0.000000 -0.000000 0 0 0
test interleave compare 0, nan nan 0 0 0
test interleave compare 0, -nan -nan 0 0 0

BUG=None
TEST=local unittest

Change-Id: If55678d63b5842688d105c817ecedf6137064d31
Reviewed-on: https://chromium-review.googlesource.com/334311
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
/external/adhd/cras/src/dsp/dsp_util.c
2758d361b60e47338d81c99e5002a820590ab15d 22-Mar-2016 Frank Barchard <fbarchard@google.com> Add clamping of large floats to Intel float-to-int conversion.

CVTPS2DQ on large values produces 0x80000000. Adding a minps
before the conversion clamps to a large positive number
instead, matching the C and Arm versions.
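
A minimal sketch of clamping before conversion, using SSE2 intrinsics when available and a scalar fallback otherwise. The function name, the clamp constant 32767.0f and the four-sample layout are assumptions made for illustration; the commit only says the value is clamped to a large positive number.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper: convert 4 floats in [-1, 1] to int16 samples.
 * NaN inputs are not handled in this sketch.
 */
static void floats_to_s16_x4(const float *in, int16_t *out)
{
#if defined(__SSE2__)
	__m128 v = _mm_mul_ps(_mm_loadu_ps(in), _mm_set1_ps(32768.0f));
	__m128i i32, i16;

	/* Without this min, huge positive values convert to 0x80000000. */
	v = _mm_min_ps(v, _mm_set1_ps(32767.0f));
	i32 = _mm_cvtps_epi32(v);		/* CVTPS2DQ */
	i16 = _mm_packs_epi32(i32, i32);	/* saturating narrow to int16 */
	_mm_storel_epi64((__m128i *)out, i16);	/* store the low 4 samples */
#else
	for (int i = 0; i < 4; i++) {
		float v = in[i] * 32768.0f;

		if (v > 32767.0f)
			v = 32767.0f;
		if (v < -32768.0f)
			v = -32768.0f;
		out[i] = (int16_t)(v < 0.0f ? v - 0.5f : v + 0.5f);
	}
#endif
}
```

Large negative values need no extra clamp on the SSE path: they also convert to 0x80000000, which the saturating pack turns into -32768.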

Performance is 5.3% slower
Without clamp
interleave ORIG size = 65536, elapsed time = 10907 ms
interleave SSE3 size = 65536, elapsed time = 1552 ms

With clamping
interleave ORIG size = 65536, elapsed time = 10855 ms
interleave SSE3 size = 65536, elapsed time = 1635 ms

BUG=None
TEST=local stand alone test

Change-Id: Ia7431d2daf97083b87e29560f7ff22642c3cda22
Reviewed-on: https://chromium-review.googlesource.com/334387
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
/external/adhd/cras/src/dsp/dsp_util.c
a0065c306bda230377cba798ac847123fa02d1a1 17-Mar-2016 Frank Barchard <fbarchard@google.com> interleave_stereo: use fmul, then a rounding float-to-int conversion.

The current code rounds by adding or subtracting 0.5.
This CL improves performance by multiplying the float by 32768,
then using a rounding float-to-int conversion.
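
For reference, a scalar sketch of the add-or-subtract-0.5 rounding that this entry describes as the current code. The names and the signature are hypothetical and the real function in dsp_util.c may differ; the clamping and NaN handling follow the edge cases listed elsewhere in this log.

```c
#include <stdint.h>

/* Hypothetical helper: one float sample in [-1, 1] to int16. */
static int16_t float_to_s16_sketch(float f)
{
	float v = f * 32768.0f;

	v += (v >= 0.0f) ? 0.5f : -0.5f;	/* round ties away from zero */
	if (v > 32767.0f)
		return 32767;			/* clamp, also catches +inf */
	if (v < -32768.0f)
		return -32768;			/* clamp, also catches -inf */
	if (v != v)
		return 0;			/* NaN */
	return (int16_t)v;			/* truncate toward zero */
}

/* Hypothetical reference: planar left/right floats to interleaved int16. */
static void interleave_stereo_ref(const float *left, const float *right,
				  int16_t *out, int frames)
{
	for (int i = 0; i < frames; i++) {
		out[2 * i] = float_to_s16_sketch(left[i]);
		out[2 * i + 1] = float_to_s16_sketch(right[i]);
	}
}
```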

Was fcmgt
interleave NEON size = 4096, elapsed time = 203 ms
Now fmul
interleave NEON size = 4096, elapsed time = 150 ms

BUG=none
TEST=local unittest

Change-Id: I019f37755faa4aa0fb6f7f4fe3f1e6306d282891
Reviewed-on: https://chromium-review.googlesource.com/332813
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
1521112b819f2c94f3af1f36c271e6eb37d227c3 11-Mar-2016 Frank Barchard <fbarchard@google.com> 64 bit neon interleave_stereo

Port 32 bit neon audio interleave_stereo to 64 bit neon.

For 4k samples, performance compared to original C code
is 18.2 times faster:
interleave ORIG size = 4096, elapsed time = 4292 ms
interleave NEON size = 4096, elapsed time = 236 ms

The improvement is greater for smaller buffers (10x), as memory
becomes less of a limiting factor. With -O2, the C code was
unrolled and used Neon scalar instructions, producing a
reasonably fast but large function compared to the 32 bit
version, which has the same Neon performance but slower C code:
interleave ORIG size = 4096, elapsed time = 1344 ms

BUG=None
TEST=local tests and try bots pass.

Change-Id: Ic57bcb9274fae1495b38ff251f46ab86cc0a33ae
Reviewed-on: https://chromium-review.googlesource.com/332612
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
1351f846c568aa31365aac0b607eb377cc5087c5 02-Mar-2016 Frank Barchard <fbarchard@chromium.org> 64 bit neon deinterleave_stereo

Port 32 bit neon audio deinterleave_stereo to 64 bit neon.

For 4k samples, performance compared to original C code
is 8.06 times faster:
deinterleave ORIG size = 4096, elapsed time = 1741 ms
deinterleave NEON size = 4096, elapsed time = 216 ms

The improvement is greater for smaller buffers and smaller for larger
buffers, as memory becomes the limiting factor. With -O2, the C code
was unrolled and used Neon scalar instructions, producing a
reasonably fast but large function compared to the 32 bit
version, which has the same Neon performance but slower C code:
deinterleave ORIG size = 4096, elapsed time = 4586 ms

BUG=None
TEST=local arm64-generic build and try bots

Change-Id: I3093c7e73089051713930d59a89cea050eb65297
Reviewed-on: https://chromium-review.googlesource.com/329951
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
0cb6c04cd9cd7ee17b1a5a3e3dc5395d8353c0eb 10-Mar-2016 Frank Barchard <fbarchard@google.com> CRAS: use inline asm for dsp_enable_flush_denormal_to_zero

This CL reimplements dsp_enable_flush_denormal_to_zero
using inline assembly for ARM.

dsp_enable_flush_denormal_to_zero() currently enables
denormal flush to zero using macros in <fpu_control.h>.
This header is not available on all platforms.

No functional change.
No change for Intel or other CPUs.
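
A minimal sketch of enabling flush-to-zero with inline assembly instead of <fpu_control.h>, assuming GCC or Clang. The function name and return value are hypothetical. The ARM branches set the FZ bit (bit 24) in FPCR or FPSCR; the x86-64 branch is not part of this CL and is included only so the sketch can be exercised on a desktop machine.

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <xmmintrin.h>
#endif

/* Returns 1 if flush-to-zero was enabled, 0 if the platform is not handled. */
static int enable_flush_to_zero_sketch(void)
{
#if defined(__aarch64__)
	uint64_t fpcr;

	__asm__ volatile("mrs %0, fpcr" : "=r"(fpcr));
	fpcr |= UINT64_C(1) << 24;		/* FPCR.FZ */
	__asm__ volatile("msr fpcr, %0" : : "r"(fpcr));
	return 1;
#elif defined(__arm__) && !defined(__SOFTFP__)
	uint32_t fpscr;

	__asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
	fpscr |= UINT32_C(1) << 24;		/* FPSCR.FZ */
	__asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
	return 1;
#elif defined(__x86_64__)
	/* MXCSR: FTZ is bit 15, DAZ is bit 6. */
	_mm_setcsr(_mm_getcsr() | 0x8040);
	return 1;
#else
	return 0;
#endif
}
```

The setting applies to the calling thread's floating-point control register, so each audio thread would need to call it.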

BUG=None
TEST=local tests and try bots pass.

Change-Id: I169c8ca4f1d0919d0f4441db51ff56b75af85374
Reviewed-on: https://chromium-review.googlesource.com/332062
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
173efda318c9f0e46da6ce8c5e40bbf95642105a 09-Mar-2016 Frank Barchard <fbarchard@chromium.org> Revert "64 bit neon interleave_stereo"

This reverts commit 042208bd3157da51623ab6ac0170b7362bbc15ca.

Change-Id: Ie6ee76ddafb17af17caa71a64dd2e637baacddc6
Reviewed-on: https://chromium-review.googlesource.com/331980
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
042208bd3157da51623ab6ac0170b7362bbc15ca 08-Mar-2016 Frank Barchard <fbarchard@google.com> 64 bit neon interleave_stereo

Port 32 bit neon audio interleave_stereo to 64 bit neon.

For 4k samples, performance compared to original C code
is 8.09 times faster:
interleave ORIG size = 4096, elapsed time = 429 ms
interleave NEON size = 4096, elapsed time = 53 ms

The improvement is greater for smaller buffers (10x), as memory
becomes less of a limiting factor. With -O2, the C code was
unrolled and used Neon scalar instructions, producing a
reasonably fast but large function compared to the 32 bit
version, which has the same Neon performance but slower C code:
interleave ORIG size = 4096, elapsed time = 1344 ms

BUG=None
TEST=local tests and try bots pass.

Change-Id: I9b963b9492e8f05284d9bd0a8cf137c68373cecc
Reviewed-on: https://chromium-review.googlesource.com/331323
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
b2f9396d26f1d193822d67a2173fd823a60f971e 02-Mar-2016 Frank Barchard <fbarchard@google.com> Enable denormal flush to zero for aarch64.

The denormal intrinsics apply to 64 bit arm, as well as
32 bit. Enable the code when compiled for arm64 target.

BUG=None
TEST=local arm64-generic build and try bots.

Change-Id: I96941ad3e05898754688a7489b3fb8bae9eeb840
Reviewed-on: https://chromium-review.googlesource.com/329952
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chih-Chung Chang <chihchung@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
93bf9cd9fd7a7a03303b812305117a770e8403df 03-Mar-2016 Frank Barchard <fbarchard@chromium.org> Revert "Optimized Neon deinterleave_stereo"

This reverts commit 27cfcea7bae30dd890539fb439c1b6a41d3a2d27.

The reverted change causes an overwrite of up to 7 samples if 'frames' is not a multiple of 8.
If 'frames' is a multiple of 8, no overwrite occurs.
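
One common way to avoid this kind of overwrite is to run the 8-frame block loop only while a full block remains and finish with a scalar tail. The sketch below assumes that approach; it is not the fix that was actually applied, and the function name and signature are hypothetical. The inner loop stands in for a SIMD block that handles 8 frames at once.

```c
#include <stdint.h>

/* Hypothetical sketch: interleaved int16 stereo to planar floats. */
static void deinterleave_stereo_sketch(const int16_t *in, float *left,
					float *right, int frames)
{
	const float scale = 1.0f / 32768.0f;
	int i = 0;

	/* Block loop: runs only while 8 whole frames remain. */
	for (; i + 8 <= frames; i += 8) {
		for (int k = 0; k < 8; k++) {
			left[i + k] = in[2 * (i + k)] * scale;
			right[i + k] = in[2 * (i + k) + 1] * scale;
		}
	}

	/* Scalar tail: the remaining 0 to 7 frames, one at a time. */
	for (; i < frames; i++) {
		left[i] = in[2 * i] * scale;
		right[i] = in[2 * i + 1] * scale;
	}
}
```

With frames = 11, the block loop handles frames 0 to 7 and the tail handles frames 8 to 10, so nothing past index 10 is written.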

Change-Id: I17058ccf47e8b5106e2963fd83db01556524781e
Reviewed-on: https://chromium-review.googlesource.com/330159
Commit-Ready: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
27cfcea7bae30dd890539fb439c1b6a41d3a2d27 23-Feb-2016 Frank Barchard <fbarchard@google.com> Optimized Neon deinterleave_stereo

Reorder the conversion in ascending order.
Loop using frames parameter directly.
Convert tabs to 2 spaces for lint.

BUG=None
TEST=untested

Change-Id: Ife0487c1f87bb1495fc40bc8931f54377391cbe6
Reviewed-on: https://chromium-review.googlesource.com/328825
Reviewed-by: Johann Koenig <johannkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Tested-by: Frank Barchard <fbarchard@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
d64c9f3c9f382891e37470a5f73519f47a0b5b7f 19-Oct-2013 Mike Frysinger <vapier@chromium.org> dsp_util: simplify optimization logic

This reduces the footprint in the callers a bit.

We also stop #warning directives from breaking the build.

BUG=chromium:307180
TEST=building for a new arch no longer aborts
TEST=`emerge-daisy adhd` works and doesn't warn
CQ-DEPEND=CL:173762

Change-Id: I94572ebcc7e59b24df51f2b52fa8ed5f8ff6f8ed
Reviewed-on: https://chromium-review.googlesource.com/173761
Reviewed-by: Chih-Chung Chang <chihchung@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Commit-Queue: Mike Frysinger <vapier@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
59904485bcf1793be97a369ec1f7c4d94e8d5b09 01-Jul-2013 Chih-Chung Chang <chihchung@chromium.org> CRAS: Add SSE-optimized (de)interleave and EQ functions.

This reduces a two-channel, six-biquad EQ from 0.20% to 0.09% CPU on link.

BUG=none
TEST=make check, run eq2_test and compare results.

Change-Id: I6fc237c7f0ba84dcccc01d8b069e5ec0707b4966
Reviewed-on: https://gerrit.chromium.org/gerrit/60642
Reviewed-by: Chih-Chung Chang <chihchung@chromium.org>
Tested-by: Chih-Chung Chang <chihchung@chromium.org>
Commit-Queue: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
496d6a5a3e1229a96c80538b4d56293e2129c8ea 17-Jun-2013 Chih-Chung Chang <chihchung@chromium.org> CRAS: Improve dsp functions on arm.

- Use inline assembly instead of intrinsics as GCC didn't generate good
code for intrinsics.
- Process 8 frames instead of 4 frames per deinterleave loop.
- Replace vuzp/vzip with vld2/vst2 and remove unnecessary vmaxq/vminq
(already implied in vqmovn).
- Tweak the evaluation order in eq_process to gain a bit of speed on arm.
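
The point about vqmovn can be illustrated in scalar C: a saturating narrow already clamps to the int16 range, so clamping beforehand changes nothing. The helper names below are hypothetical and only model the behaviour for a single lane.

```c
#include <stdint.h>

/* Scalar model of a saturating narrow from int32 to int16. */
static int16_t saturating_narrow_s32(int32_t v)
{
	if (v > INT16_MAX)
		return INT16_MAX;
	if (v < INT16_MIN)
		return INT16_MIN;
	return (int16_t)v;
}

/* Explicit clamp, standing in for the removed vmaxq/vminq pair. */
static int32_t clamp_to_s16_range(int32_t v)
{
	if (v > INT16_MAX)
		v = INT16_MAX;
	if (v < INT16_MIN)
		v = INT16_MIN;
	return v;
}
```

For every input, saturating_narrow_s32(clamp_to_s16_range(v)) equals saturating_narrow_s32(v), which is why the explicit clamp is redundant.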

This reduces EQ time from 0.59% to 0.36% on spring.

BUG=none
TEST=run eq_test and compare with previous result
Change-Id: I697eb5582888ff44d4105b052b39e34b54de1de3
Reviewed-on: https://gerrit.chromium.org/gerrit/59014
Commit-Queue: Chih-Chung Chang <chihchung@chromium.org>
Reviewed-by: Chih-Chung Chang <chihchung@chromium.org>
Tested-by: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
6d3720ae98a5eeffd1b1fbef6bad3910c1f48b6e 05-Jun-2013 Chih-Chung Chang <chihchung@chromium.org> CRAS: Add neon interleave/deinterleave functions.

BUG=chromium:220340
TEST=make test
Change-Id: I22b58558c66ba0a3898e299f554a42e4dad00040
Reviewed-on: https://gerrit.chromium.org/gerrit/57585
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Tested-by: Chih-Chung Chang <chihchung@chromium.org>
Commit-Queue: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c
9ec4821af5dd2853247d33a97ea668f1e2ceda9e 05-Jun-2013 Chih-Chung Chang <chihchung@chromium.org> CRAS: Add DSP EQ module.

The equalizer is implemented using a chain of biquads. Each biquad can
be lowpass, highpass, peaking, etc. The biquad formula is ported from
Chromium browser.
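
A minimal sketch of an equalizer built as a chain of biquads, using the standard direct form I difference equation with coefficients normalized so that a0 = 1. The struct layout and function names are hypothetical and are not the ones used in the CRAS EQ module; computing lowpass, highpass or peaking coefficients is left out.

```c
#include <stddef.h>

struct biquad_sketch {
	float b0, b1, b2, a1, a2;	/* coefficients, a0 normalized to 1 */
	float x1, x2, y1, y2;		/* previous two inputs and outputs */
};

/* y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
static float biquad_process_sketch(struct biquad_sketch *bq, float x)
{
	float y = bq->b0 * x + bq->b1 * bq->x1 + bq->b2 * bq->x2
		- bq->a1 * bq->y1 - bq->a2 * bq->y2;

	bq->x2 = bq->x1;
	bq->x1 = x;
	bq->y2 = bq->y1;
	bq->y1 = y;
	return y;
}

/* Run each sample through every biquad in the chain, in order. */
static void eq_process_sketch(struct biquad_sketch *chain, size_t n_biquads,
			      float *data, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		float s = data[i];

		for (size_t k = 0; k < n_biquads; k++)
			s = biquad_process_sketch(&chain[k], s);
		data[i] = s;
	}
}
```

As a usage example, a chain of a pass-through biquad (b0 = 1) followed by a two-tap average (b0 = b1 = 0.5) turns the input 1, 1, 0, 0 into 0.5, 1, 0.5, 0.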

BUG=chromium:220340
TEST=make check; run eq_test and plot_fftl.m and verify.
Change-Id: I80a3df15155589e006275911ec58cfde7020ccbb
Reviewed-on: https://gerrit.chromium.org/gerrit/57584
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Commit-Queue: Chih-Chung Chang <chihchung@chromium.org>
Tested-by: Chih-Chung Chang <chihchung@chromium.org>
/external/adhd/cras/src/dsp/dsp_util.c