6ca5afb8c26991cf4f13a8bcca870ec2a9522bf7 |
|
24-Mar-2014 |
Mathias Krause <minipli@googlemail.com> |
crypto: x86/sha1 - re-enable the AVX variant Commit 7c1da8d0d0 "crypto: sha - SHA1 transform x86_64 AVX2" accidentally disabled the AVX variant by making the avx_usable() test not only fail in case the CPU doesn't support AVX or OSXSAVE but also if it doesn't support AVX2. Fix that regression by splitting up the AVX/AVX2 test into two functions. Also test for the BMI1 extension in the avx2_usable() test as the AVX2 implementation not only makes use of BMI2 but also BMI1 instructions. Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
7c1da8d0d046174a4188b5729d7579abf3d29427 |
|
20-Mar-2014 |
chandramouli narayanan <mouli@linux.intel.com> |
crypto: sha - SHA1 transform x86_64 AVX2 This git patch adds x86_64 AVX2 optimization of SHA1 transform to crypto support. The patch has been tested with 3.14.0-rc1 kernel. On a Haswell desktop, with turbo disabled and all cpus running at maximum frequency, tcrypt shows AVX2 performance improvement from 3% for 256 bytes update to 16% for 1024 bytes update over AVX implementation. This patch adds sha1_avx2_transform(), the glue, build and configuration changes needed for AVX2 optimization of SHA1 transform to crypto support. sha1-ssse3 is one module which adds the necessary optimization support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function. With better optimization support, transform function is overridden as the case may be. In the case of AVX2, due to performance reasons across datablock sizes, the AVX or AVX2 transform function is used at run-time as it suits best. The Makefile change therefore appends the necessary objects to the linkage. Due to this, the patch merely appends AVX2 transform to the existing build mix and Kconfig support and leaves the configuration build support as is. Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com> Reviewed-by: Marek Vasut <marex@denx.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
65df57743924c3d13e1fa1bcf5bf70fe874fcdfd |
|
24-May-2012 |
Mathias Krause <minipli@googlemail.com> |
crypto: sha1 - use Kbuild supplied flags for AVX test Commit ea4d26ae ("raid5: add AVX optimized RAID5 checksumming") introduced x86/ arch wide defines for AFLAGS and CFLAGS indicating AVX support in binutils based on the same test we have in x86/crypto/ right now. To minimize duplication drop our implementation in favour to the one in x86/. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
66be895158886a6cd816aa1eaa18965a5c522d8f |
|
04-Aug-2011 |
Mathias Krause <minipli@googlemail.com> |
crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX). Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data blocks on a Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25% faster. Since this implementation uses SSE/YMM registers it cannot safely be used in every situation, e.g. while an IRQ interrupts a kernel thread. The implementation falls back to the generic SHA1 variant, if using the SSE/YMM registers is not possible. With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Saving and restoring SSE/YMM state might make the actual throughput fluctuate when there are FPU intensive userland applications running. For example, meassuring the performance using iperf2 directly on the machine under test gives wobbling numbers because iperf2 uses the FPU for each packet to check if the reporting interval has expired (in the above test I got min/max/avg: 402/484/464 MBit/s). Using this algorithm on a IPsec gateway gives much more reasonable and stable numbers, albeit not as high as in the directly connected case. Here is the result from an RFC 2544 test run with a EXFO Packet Blazer FTB-8510: frame size sha1-generic sha1-ssse3 delta 64 byte 37.5 MBit/s 37.5 MBit/s 0.0% 128 byte 56.3 MBit/s 62.5 MBit/s +11.0% 256 byte 87.5 MBit/s 100.0 MBit/s +14.3% 512 byte 131.3 MBit/s 150.0 MBit/s +14.2% 1024 byte 162.5 MBit/s 193.8 MBit/s +19.3% 1280 byte 175.0 MBit/s 212.5 MBit/s +21.4% 1420 byte 175.0 MBit/s 218.7 MBit/s +25.0% 1518 byte 150.0 MBit/s 181.2 MBit/s +20.8% The throughput for the largest frame size is lower than for the previous size because the IP packets need to be fragmented in this case to make there way through the IPsec tunnel. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|