/*
 * ARM NEON optimizations for libjpeg-turbo
 *
 * Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies).
 * All rights reserved.
 * Contact: Alexander Bokovoy <alexander.bokovoy@nokia.com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty. In no event will the authors be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute it
 * freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must not
 *    claim that you wrote the original software. If you use this software
 *    in a product, an acknowledgment in the product documentation would be
 *    appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must not be
 *    misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source distribution.
 */
/* Copyright (c) 2011, NVIDIA CORPORATION. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of the NVIDIA CORPORATION nor the names of its
 *    contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */



#if defined(__linux__) && defined(__ELF__)
.section .note.GNU-stack,"",%progbits /* mark stack as non-executable */
#endif

.text
.fpu neon
.arch armv7a
.object_arch armv7a
.arm


#define RESPECT_STRICT_ALIGNMENT 1

/*****************************************************************************/

/* Supplementary macro for setting function attributes */
.macro asm_function fname
    .func \fname
    .global \fname
#ifdef __ELF__
    .hidden \fname
    .type \fname, %function
#endif
\fname:
.endm

/* Transpose a block of 4x4 coefficients in four 64-bit registers */
.macro transpose_4x4 x0, x1, x2, x3
    vtrn.16         \x0, \x1
    vtrn.16         \x2, \x3
    vtrn.32         \x0, \x2
    vtrn.32         \x1, \x3
.endm

/*****************************************************************************/

/*
 * jsimd_idct_ifast_neon
 *
 * This function contains a fast, less accurate integer implementation of
 * the inverse DCT (Discrete Cosine Transform). It uses the same calculations
 * and produces exactly the same output as IJG's original 'jpeg_idct_ifast'
 * function from jidctfst.c.
 *
 * TODO: better instruction scheduling is needed.
 */

#define XFIX_1_082392200 d0[0]
#define XFIX_1_414213562 d0[1]
#define XFIX_1_847759065 d0[2]
#define XFIX_2_613125930 d0[3]

/*
 * Each constant stores only the fractional part of its multiplier, scaled so
 * that vqdmulh.s16 (which returns (2 * a * b) >> 16) yields a * fraction.
 * The integer part (1x, or 2x for XFIX_2_613125930) is added back separately
 * in idct_helper.
 */
.balign 16
jsimd_idct_ifast_neon_consts:
    .short (277 * 128 - 256 * 128) /* XFIX_1_082392200 */
    .short (362 * 128 - 256 * 128) /* XFIX_1_414213562 */
    .short (473 * 128 - 256 * 128) /* XFIX_1_847759065 */
    .short (669 * 128 - 512 * 128) /* XFIX_2_613125930 */

/* 1-D IDCT helper macro */

.macro idct_helper x0, x1, x2, x3, x4, x5, x6, x7, \
                   t10, t11, t12, t13, t14

    vsub.s16        \t10, \x0, \x4
    vadd.s16        \x4, \x0, \x4
    vswp.s16        \t10, \x0
    vsub.s16        \t11, \x2, \x6
    vadd.s16        \x6, \x2, \x6
    vswp.s16        \t11, \x2
    vsub.s16        \t10, \x3, \x5
    vadd.s16        \x5, \x3, \x5
    vswp.s16        \t10, \x3
    vsub.s16        \t11, \x1, \x7
    vadd.s16        \x7, \x1, \x7
    vswp.s16        \t11, \x1

    vqdmulh.s16     \t13, \x2, d0[1]
    vadd.s16        \t12, \x3, \x3
    vadd.s16        \x2, \x2, \t13
    vqdmulh.s16     \t13, \x3, d0[3]
    vsub.s16        \t10, \x1, \x3
    vadd.s16        \t12, \t12, \t13
    vqdmulh.s16     \t13, \t10, d0[2]
    vsub.s16        \t11, \x7, \x5
    vadd.s16        \t10, \t10, \t13
    vqdmulh.s16     \t13, \t11, d0[1]
    vadd.s16        \t11, \t11, \t13

    vqdmulh.s16     \t13, \x1, d0[0]
    vsub.s16        \x2, \x6, \x2
    vsub.s16        \t14, \x0, \x2
    vadd.s16        \x2, \x0, \x2
    vadd.s16        \x0, \x4, \x6
    vsub.s16        \x4, \x4, \x6
    vadd.s16        \x1, \x1, \t13
    vadd.s16        \t13, \x7, \x5
    vsub.s16        \t12, \t13, \t12
    vsub.s16        \t12, \t12, \t10
    vadd.s16        \t11, \t12, \t11
    vsub.s16        \t10, \x1, \t10
    vadd.s16        \t10, \t10, \t11

    vsub.s16        \x7, \x0, \t13
    vadd.s16        \x0, \x0, \t13
    vadd.s16        \x6, \t14, \t12
    vsub.s16        \x1, \t14, \t12
    vsub.s16        \x5, \x2, \t11
    vadd.s16        \x2, \x2, \t11
    vsub.s16        \x3, \x4, \t10
    vadd.s16        \x4, \x4, \t10
.endm

asm_function jsimd_idct_ifast_neon

    DCT_TABLE       .req r0
    COEF_BLOCK      .req r1
    OUTPUT_BUF      .req r2
    OUTPUT_COL      .req r3
    TMP             .req ip

    vpush           {d8-d15}

    /* Load constants */
    adr             TMP, jsimd_idct_ifast_neon_consts
    vld1.16         {d0}, [TMP, :64]

    /* Load all COEF_BLOCK into NEON registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
     *   1 | d6      | d7
     *   2 | d8      | d9
     *   3 | d10     | d11
     *   4 | d12     | d13
     *   5 | d14     | d15
     *   6 | d16     | d17
     *   7 | d18     | d19
     */
    vld1.16         {d4, d5, d6, d7}, [COEF_BLOCK]!
    vld1.16         {d8, d9, d10, d11}, [COEF_BLOCK]!
    vld1.16         {d12, d13, d14, d15}, [COEF_BLOCK]!
    vld1.16         {d16, d17, d18, d19}, [COEF_BLOCK]!
    /* Dequantize */
    vld1.16         {d20, d21, d22, d23}, [DCT_TABLE]!
    vmul.s16        q2, q2, q10
    vld1.16         {d24, d25, d26, d27}, [DCT_TABLE]!
    vmul.s16        q3, q3, q11
    vmul.s16        q4, q4, q12
    vld1.16         {d28, d29, d30, d31}, [DCT_TABLE]!
    vmul.s16        q5, q5, q13
    vmul.s16        q6, q6, q14
    vld1.16         {d20, d21, d22, d23}, [DCT_TABLE]!
    vmul.s16        q7, q7, q15
    vmul.s16        q8, q8, q10
    vmul.s16        q9, q9, q11

    /* Pass 1 : process columns from input, store into work array.*/
    idct_helper     q2, q3, q4, q5, q6, q7, q8, q9, q10, q11, q12, q13, q14
    /* Transpose */
    vtrn.16         q2, q3
    vtrn.16         q4, q5
    vtrn.32         q2, q4
    vtrn.32         q3, q5

    vtrn.16         q6, q7
    vtrn.16         q8, q9
    vtrn.32         q6, q8
    vtrn.32         q7, q9

    vswp            d12, d5
    vswp            d14, d7
    vswp            d16, d9
    vswp            d18, d11

    /* Pass 2 */
    idct_helper     q2, q3, q4, q5, q6, q7, q8, q9, q10, q11, q12, q13, q14
    /* Transpose */
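    /*
     * Explanatory sketch of the transpose that follows (a reading of the
     * instructions, not wording from the original authors): each
     * vtrn.16/vtrn.32 group transposes two 4x4 quadrants at once, one in the
     * low and one in the high d-register halves of four q registers. The
     * vswp instructions then exchange the top-right quadrant
     * (d5, d7, d9, d11) with the bottom-left one (d12, d14, d16, d18),
     * which completes the 8x8 transpose.
     */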

    vtrn.16         q2, q3
    vtrn.16         q4, q5
    vtrn.32         q2, q4
    vtrn.32         q3, q5

    vtrn.16         q6, q7
    vtrn.16         q8, q9
    vtrn.32         q6, q8
    vtrn.32         q7, q9

    vswp            d12, d5
    vswp            d14, d7
    vswp            d16, d9
    vswp            d18, d11

    /* Descale and range limit */
    vmov.s16        q15, #(0x80 << 5)
    vqadd.s16       q2, q2, q15
    vqadd.s16       q3, q3, q15
    vqadd.s16       q4, q4, q15
    vqadd.s16       q5, q5, q15
    vqadd.s16       q6, q6, q15
    vqadd.s16       q7, q7, q15
    vqadd.s16       q8, q8, q15
    vqadd.s16       q9, q9, q15
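    /*
     * Explanatory sketch (C-like pseudocode, per 16-bit lane x; the helper
     * names sat_s16 and clamp_u8 are hypothetical): the vqadd.s16 group
     * above and the vqshrun.s16 group below together compute roughly
     *     out = clamp_u8(sat_s16(x + (0x80 << 5)) >> 5)
     * The added constant is the +128 sample offset pre-scaled by the same
     * 5 bits that the shift removes; vqshrun.s16 shifts without rounding
     * and saturates the result to the unsigned 8-bit range [0, 255].
     */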
    vqshrun.s16     d4, q2, #5
    vqshrun.s16     d6, q3, #5
    vqshrun.s16     d8, q4, #5
    vqshrun.s16     d10, q5, #5
    vqshrun.s16     d12, q6, #5
    vqshrun.s16     d14, q7, #5
    vqshrun.s16     d16, q8, #5
    vqshrun.s16     d18, q9, #5

    /* Store results to the output buffer */
    .irp            x, d4, d6, d8, d10, d12, d14, d16, d18
    ldr             TMP, [OUTPUT_BUF], #4
    add             TMP, TMP, OUTPUT_COL
    vst1.8          {\x}, [TMP]!
    .endr

    vpop            {d8-d15}
    bx              lr

    .unreq          DCT_TABLE
    .unreq          COEF_BLOCK
    .unreq          OUTPUT_BUF
    .unreq          OUTPUT_COL
    .unreq          TMP
.endfunc

.purgem idct_helper

/*****************************************************************************/

/*
 * jsimd_idct_4x4_neon
 *
 * This function contains inverse-DCT code for getting reduced-size
 * 4x4 pixels output from an 8x8 DCT block. It uses the same calculations
 * and produces exactly the same output as IJG's original 'jpeg_idct_4x4'
 * function from jpeg-6b (jidctred.c).
 *
 * NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
 *       requires far fewer arithmetic operations and hence should be faster.
 *       The primary purpose of this particular NEON optimized function is
 *       bit-exact compatibility with jpeg-6b.
 *
 * TODO: slightly better instruction scheduling can be achieved by expanding
 *       the idct_helper/transpose_4x4 macros and reordering instructions,
 *       but readability will suffer somewhat.
 */

#define CONST_BITS  13

#define FIX_0_211164243  (1730)  /* FIX(0.211164243) */
#define FIX_0_509795579  (4176)  /* FIX(0.509795579) */
#define FIX_0_601344887  (4926)  /* FIX(0.601344887) */
#define FIX_0_720959822  (5906)  /* FIX(0.720959822) */
#define FIX_0_765366865  (6270)  /* FIX(0.765366865) */
#define FIX_0_850430095  (6967)  /* FIX(0.850430095) */
#define FIX_0_899976223  (7373)  /* FIX(0.899976223) */
#define FIX_1_061594337  (8697)  /* FIX(1.061594337) */
#define FIX_1_272758580  (10426) /* FIX(1.272758580) */
#define FIX_1_451774981  (11893) /* FIX(1.451774981) */
#define FIX_1_847759065  (15137) /* FIX(1.847759065) */
#define FIX_2_172734803  (17799) /* FIX(2.172734803) */
#define FIX_2_562915447  (20995) /* FIX(2.562915447) */
#define FIX_3_624509785  (29692) /* FIX(3.624509785) */

.balign 16
jsimd_idct_4x4_neon_consts:
    .short     FIX_1_847759065     /* d0[0] */
    .short     -FIX_0_765366865    /* d0[1] */
    .short     -FIX_0_211164243    /* d0[2] */
    .short     FIX_1_451774981     /* d0[3] */
    .short     -FIX_2_172734803    /* d1[0] */
    .short     FIX_1_061594337     /* d1[1] */
    .short     -FIX_0_509795579    /* d1[2] */
    .short     -FIX_0_601344887    /* d1[3] */
    .short     FIX_0_899976223     /* d2[0] */
    .short     FIX_2_562915447     /* d2[1] */
    .short     1 << (CONST_BITS+1) /* d2[2] */
    .short     0                   /* d2[3] */

.macro idct_helper x4, x6, x8, x10, x12, x14, x16, shift, y26, y27, y28, y29
    vmull.s16       q14, \x4, d2[2]
    vmlal.s16       q14, \x8, d0[0]
    vmlal.s16       q14, \x14, d0[1]

    vmull.s16       q13, \x16, d1[2]
    vmlal.s16       q13, \x12, d1[3]
    vmlal.s16       q13, \x10, d2[0]
    vmlal.s16       q13, \x6, d2[1]

    vmull.s16       q15, \x4, d2[2]
    vmlsl.s16       q15, \x8, d0[0]
    vmlsl.s16       q15, \x14, d0[1]

    vmull.s16       q12, \x16, d0[2]
    vmlal.s16       q12, \x12, d0[3]
    vmlal.s16       q12, \x10, d1[0]
    vmlal.s16       q12, \x6, d1[1]

    vadd.s32        q10, q14, q13
    vsub.s32        q14, q14, q13

.if \shift > 16
    vrshr.s32       q10, q10, #\shift
    vrshr.s32       q14, q14, #\shift
    vmovn.s32       \y26, q10
    vmovn.s32       \y29, q14
.else
    vrshrn.s32      \y26, q10, #\shift
    vrshrn.s32      \y29, q14, #\shift
.endif

    vadd.s32        q10, q15, q12
    vsub.s32        q15, q15, q12

.if \shift > 16
    vrshr.s32       q10, q10, #\shift
    vrshr.s32       q15, q15, #\shift
    vmovn.s32       \y27, q10
    vmovn.s32       \y28, q15
.else
    vrshrn.s32      \y27, q10, #\shift
    vrshrn.s32      \y28, q15, #\shift
.endif

.endm

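/*
 * Explanatory sketch of the two code paths in idct_helper above (a reading
 * of the instructions, not wording from the original authors): the
 * narrowing shift vrshrn.s32 only encodes immediates from 1 to 16, so for
 * larger shifts the macro does a rounding shift in 32 bits (vrshr.s32) and
 * then narrows with vmovn.s32. In C-like pseudocode, both paths compute,
 * per 32-bit lane acc:
 *     y = (int16_t)((acc + (1 << (shift - 1))) >> shift)
 */
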
asm_function jsimd_idct_4x4_neon

    DCT_TABLE       .req r0
    COEF_BLOCK      .req r1
    OUTPUT_BUF      .req r2
    OUTPUT_COL      .req r3
    TMP1            .req r0
    TMP2            .req r1
    TMP3            .req r2
    TMP4            .req ip

    vpush           {d8-d15}

    /* Load constants (d3 is just used for padding) */
    adr             TMP4, jsimd_idct_4x4_neon_consts
    vld1.16         {d0, d1, d2, d3}, [TMP4, :128]

    /* Load all COEF_BLOCK into NEON registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
     *   1 | d6      | d7
     *   2 | d8      | d9
     *   3 | d10     | d11
     *   4 | -       | -
     *   5 | d12     | d13
     *   6 | d14     | d15
     *   7 | d16     | d17
     */
    vld1.16         {d4, d5, d6, d7}, [COEF_BLOCK]!
    vld1.16         {d8, d9, d10, d11}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d12, d13, d14, d15}, [COEF_BLOCK]!
    vld1.16         {d16, d17}, [COEF_BLOCK]!
    /* dequantize */
    vld1.16         {d18, d19, d20, d21}, [DCT_TABLE]!
    vmul.s16        q2, q2, q9
    vld1.16         {d22, d23, d24, d25}, [DCT_TABLE]!
    vmul.s16        q3, q3, q10
    vmul.s16        q4, q4, q11
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d26, d27, d28, d29}, [DCT_TABLE]!
    vmul.s16        q5, q5, q12
    vmul.s16        q6, q6, q13
    vld1.16         {d30, d31}, [DCT_TABLE]!
    vmul.s16        q7, q7, q14
    vmul.s16        q8, q8, q15


    /* Pass 1 */
    idct_helper     d4, d6, d8, d10, d12, d14, d16, 12, d4, d6, d8, d10
    transpose_4x4   d4, d6, d8, d10
    idct_helper     d5, d7, d9, d11, d13, d15, d17, 12, d5, d7, d9, d11
    transpose_4x4   d5, d7, d9, d11

    /* Pass 2 */
    idct_helper     d4, d6, d8, d10, d7, d9, d11, 19, d26, d27, d28, d29
    transpose_4x4   d26, d27, d28, d29

    /* Range limit */
    vmov.u16        q15, #0x80
    vadd.s16        q13, q13, q15
    vadd.s16        q14, q14, q15
    vqmovun.s16     d26, q13
    vqmovun.s16     d27, q14

    /* Store results to the output buffer */
    ldmia           OUTPUT_BUF, {TMP1, TMP2, TMP3, TMP4}
    add             TMP1, TMP1, OUTPUT_COL
    add             TMP2, TMP2, OUTPUT_COL
    add             TMP3, TMP3, OUTPUT_COL
    add             TMP4, TMP4, OUTPUT_COL

#if defined(__ARMEL__) && !RESPECT_STRICT_ALIGNMENT
    /* We can use far fewer instructions on little-endian systems if the
     * OS kernel is not configured to trap unaligned memory accesses
     */
    vst1.32         {d26[0]}, [TMP1]!
    vst1.32         {d27[0]}, [TMP3]!
    vst1.32         {d26[1]}, [TMP2]!
    vst1.32         {d27[1]}, [TMP4]!
#else
    vst1.8          {d26[0]}, [TMP1]!
    vst1.8          {d27[0]}, [TMP3]!
    vst1.8          {d26[1]}, [TMP1]!
    vst1.8          {d27[1]}, [TMP3]!
    vst1.8          {d26[2]}, [TMP1]!
    vst1.8          {d27[2]}, [TMP3]!
    vst1.8          {d26[3]}, [TMP1]!
    vst1.8          {d27[3]}, [TMP3]!

    vst1.8          {d26[4]}, [TMP2]!
    vst1.8          {d27[4]}, [TMP4]!
    vst1.8          {d26[5]}, [TMP2]!
    vst1.8          {d27[5]}, [TMP4]!
    vst1.8          {d26[6]}, [TMP2]!
    vst1.8          {d27[6]}, [TMP4]!
    vst1.8          {d26[7]}, [TMP2]!
    vst1.8          {d27[7]}, [TMP4]!
#endif

    vpop            {d8-d15}
    bx              lr

    .unreq          DCT_TABLE
    .unreq          COEF_BLOCK
    .unreq          OUTPUT_BUF
    .unreq          OUTPUT_COL
    .unreq          TMP1
    .unreq          TMP2
    .unreq          TMP3
    .unreq          TMP4
.endfunc

.purgem idct_helper

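/*
 * Reference sketch (not part of the original source): a scalar C model of
 * the "Range limit" step used above, where vadd.s16 adds 0x80 to each IDCT
 * output and vqmovun.s16 saturates the sum to an unsigned byte. The name
 * range_limit_sample is hypothetical. The sketch uses plain int arithmetic,
 * so it only matches the NEON code for inputs where the 16-bit addition
 * does not wrap, which is assumed to hold for real IDCT output.
 */

```c
#include <stdint.h>

/* Recentre a signed IDCT output around 128 and clamp it to 0..255. */
static uint8_t range_limit_sample(int16_t idct_out)
{
    int v = (int)idct_out + 128;   /* vadd.s16 with the 0x80 constant */

    if (v < 0)                     /* vqmovun.s16 saturates below at 0 */
        return 0;
    if (v > 255)                   /* ...and above at 255 */
        return 255;
    return (uint8_t)v;
}
```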
/*****************************************************************************/

/*
 * jsimd_idct_2x2_neon
 *
 * This function contains inverse-DCT code for getting reduced-size
 * 2x2 pixels output from an 8x8 DCT block. It uses the same calculations
 * and produces exactly the same output as IJG's original 'jpeg_idct_2x2'
 * function from jpeg-6b (jidctred.c).
 *
 * NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
 *       requires far fewer arithmetic operations and hence should be faster.
 *       The primary purpose of this particular NEON optimized function is
 *       bit-exact compatibility with jpeg-6b.
 */

.balign 8
jsimd_idct_2x2_neon_consts:
    .short     -FIX_0_720959822    /* d0[0] */
    .short     FIX_0_850430095     /* d0[1] */
    .short     -FIX_1_272758580    /* d0[2] */
    .short     FIX_3_624509785     /* d0[3] */

.macro idct_helper x4, x6, x10, x12, x16, shift, y26, y27
    vshll.s16       q14, \x4,  #15
    vmull.s16       q13, \x6,  d0[3]
    vmlal.s16       q13, \x10, d0[2]
    vmlal.s16       q13, \x12, d0[1]
    vmlal.s16       q13, \x16, d0[0]

    vadd.s32        q10, q14, q13
    vsub.s32        q14, q14, q13

.if \shift > 16
    vrshr.s32       q10,  q10, #\shift
    vrshr.s32       q14,  q14, #\shift
    vmovn.s32       \y26, q10
    vmovn.s32       \y27, q14
.else
    vrshrn.s32      \y26, q10, #\shift
    vrshrn.s32      \y27, q14, #\shift
.endif

.endm

asm_function jsimd_idct_2x2_neon

    DCT_TABLE       .req r0
    COEF_BLOCK      .req r1
    OUTPUT_BUF      .req r2
    OUTPUT_COL      .req r3
    TMP1            .req r0
    TMP2            .req ip

    vpush           {d8-d15}

    /* Load constants */
    adr             TMP2, jsimd_idct_2x2_neon_consts
    vld1.16         {d0}, [TMP2, :64]

    /* Load all COEF_BLOCK into NEON registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
     *   1 | d6      | d7
     *   2 | -       | -
     *   3 | d10     | d11
     *   4 | -       | -
     *   5 | d12     | d13
     *   6 | -       | -
     *   7 | d16     | d17
     */

    vld1.16         {d4, d5, d6, d7}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d10, d11}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d12, d13}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d16, d17}, [COEF_BLOCK]!
    /* Dequantize */
    vld1.16         {d18, d19, d20, d21}, [DCT_TABLE]!
    vmul.s16        q2, q2, q9
    vmul.s16        q3, q3, q10
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d24, d25}, [DCT_TABLE]!
    vmul.s16        q5, q5, q12
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d26, d27}, [DCT_TABLE]!
    vmul.s16        q6, q6, q13
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d30, d31}, [DCT_TABLE]!
    vmul.s16        q8, q8, q15

    /* Pass 1 */
    vmull.s16       q13, d6,  d0[3]
    vmlal.s16       q13, d10, d0[2]
    vmlal.s16       q13, d12, d0[1]
    vmlal.s16       q13, d16, d0[0]
    vmull.s16       q12, d7,  d0[3]
    vmlal.s16       q12, d11, d0[2]
    vmlal.s16       q12, d13, d0[1]
    vmlal.s16       q12, d17, d0[0]
    vshll.s16       q14, d4,  #15
    vshll.s16       q15, d5,  #15
    vadd.s32        q10, q14, q13
    vsub.s32        q14, q14, q13
    vrshrn.s32      d4,  q10, #13
    vrshrn.s32      d6,  q14, #13
    vadd.s32        q10, q15, q12
    vsub.s32        q14, q15, q12
    vrshrn.s32      d5,  q10, #13
    vrshrn.s32      d7,  q14, #13
    vtrn.16         q2,  q3
    vtrn.32         q3,  q5

    /* Pass 2 */
    idct_helper     d4, d6, d10, d7, d11, 20, d26, d27

    /* Range limit */
    vmov.u16        q15, #0x80
    vadd.s16        q13, q13, q15
    vqmovun.s16     d26, q13
    vqmovun.s16     d27, q13

    /* Store results to the output buffer */
    ldmia           OUTPUT_BUF, {TMP1, TMP2}
    add             TMP1, TMP1, OUTPUT_COL
    add             TMP2, TMP2, OUTPUT_COL

    vst1.8          {d26[0]}, [TMP1]!
    vst1.8          {d27[4]}, [TMP1]!
    vst1.8          {d26[1]}, [TMP2]!
    vst1.8          {d27[5]}, [TMP2]!

    vpop            {d8-d15}
    bx              lr

    .unreq          DCT_TABLE
    .unreq          COEF_BLOCK
    .unreq          OUTPUT_BUF
    .unreq          OUTPUT_COL
    .unreq          TMP1
    .unreq          TMP2
.endfunc

.purgem idct_helper

/*****************************************************************************/

/*
 * jsimd_ycc_rgba8888_convert_neon
 * jsimd_ycc_rgb565_convert_neon
 * Colorspace conversion YCbCr -> RGB
 */


.macro do_load size
    .if \size == 8
        vld1.8  {d4}, [U]!
        vld1.8  {d5}, [V]!
        vld1.8  {d0}, [Y]!
        pld     [Y, #64]
        pld     [U, #64]
        pld     [V, #64]
    .elseif \size == 4
        vld1.8  {d4[0]}, [U]!
        vld1.8  {d4[1]}, [U]!
        vld1.8  {d4[2]}, [U]!
        vld1.8  {d4[3]}, [U]!
        vld1.8  {d5[0]}, [V]!
        vld1.8  {d5[1]}, [V]!
        vld1.8  {d5[2]}, [V]!
        vld1.8  {d5[3]}, [V]!
        vld1.8  {d0[0]}, [Y]!
        vld1.8  {d0[1]}, [Y]!
        vld1.8  {d0[2]}, [Y]!
        vld1.8  {d0[3]}, [Y]!
    .elseif \size == 2
        vld1.8  {d4[4]}, [U]!
        vld1.8  {d4[5]}, [U]!
        vld1.8  {d5[4]}, [V]!
        vld1.8  {d5[5]}, [V]!
        vld1.8  {d0[4]}, [Y]!
        vld1.8  {d0[5]}, [Y]!
    .elseif \size == 1
        vld1.8  {d4[6]}, [U]!
        vld1.8  {d5[6]}, [V]!
        vld1.8  {d0[6]}, [Y]!
    .else
        .error unsupported macroblock size
    .endif
.endm

.macro do_store bpp, size
    .if \bpp == 16
        /* if 16 bits, pack into RGB565 format */
        vmov    d27, d10        /* insert red channel */
        vsri.u8 d27, d11, #5    /* shift and insert the green channel */
        vsli.u8 d26, d11, #3
        vsri.u8 d26, d12, #3    /* shift and insert the blue channel */

        .if \size == 8
            vst2.8  {d26, d27}, [RGB]!
        .elseif \size == 4
            vst2.8  {d26[0], d27[0]}, [RGB]!
            vst2.8  {d26[1], d27[1]}, [RGB]!
            vst2.8  {d26[2], d27[2]}, [RGB]!
            vst2.8  {d26[3], d27[3]}, [RGB]!
        .elseif \size == 2
            vst2.8  {d26[4], d27[4]}, [RGB]!
            vst2.8  {d26[5], d27[5]}, [RGB]!
        .elseif \size == 1
            vst2.8  {d26[6], d27[6]}, [RGB]!
        .else
            .error unsupported macroblock size
        .endif
    .elseif \bpp == 24
        .if \size == 8
            vst3.8  {d10, d11, d12}, [RGB]!
        .elseif \size == 4
            vst3.8  {d10[0], d11[0], d12[0]}, [RGB]!
            vst3.8  {d10[1], d11[1], d12[1]}, [RGB]!
            vst3.8  {d10[2], d11[2], d12[2]}, [RGB]!
            vst3.8  {d10[3], d11[3], d12[3]}, [RGB]!
        .elseif \size == 2
            vst3.8  {d10[4], d11[4], d12[4]}, [RGB]!
            vst3.8  {d10[5], d11[5], d12[5]}, [RGB]!
        .elseif \size == 1
            vst3.8  {d10[6], d11[6], d12[6]}, [RGB]!
        .else
            .error unsupported macroblock size
        .endif
    .elseif \bpp == 32
        .if \size == 8
            vst4.8  {d10, d11, d12, d13}, [RGB]!
        .elseif \size == 4
            vst4.8  {d10[0], d11[0], d12[0], d13[0]}, [RGB]!
            vst4.8  {d10[1], d11[1], d12[1], d13[1]}, [RGB]!
            vst4.8  {d10[2], d11[2], d12[2], d13[2]}, [RGB]!
            vst4.8  {d10[3], d11[3], d12[3], d13[3]}, [RGB]!
        .elseif \size == 2
            vst4.8  {d10[4], d11[4], d12[4], d13[4]}, [RGB]!
            vst4.8  {d10[5], d11[5], d12[5], d13[5]}, [RGB]!
        .elseif \size == 1
            vst4.8  {d10[6], d11[6], d12[6], d13[6]}, [RGB]!
        .else
            .error unsupported macroblock size
        .endif
    .else
        .error unsupported bpp
    .endif
.endm

.macro generate_jsimd_ycc_rgb_convert_neon colorid, bpp, r_offs, g_offs, b_offs

.macro do_yuv_to_rgb
    vaddw.u8        q3, q1, d4     /* q3 = u - 128 */
    vaddw.u8        q4, q1, d5     /* q4 = v - 128 */
    vmull.s16       q10, d6, d1[1] /* multiply by -11277 */
    vmlal.s16       q10, d8, d1[2] /* multiply by -23401 */
    vmull.s16       q11, d7, d1[1] /* multiply by -11277 */
    vmlal.s16       q11, d9, d1[2] /* multiply by -23401 */
    vmull.s16       q12, d8, d1[0] /* multiply by 22971 */
    vmull.s16       q13, d9, d1[0] /* multiply by 22971 */
    vmull.s16       q14, d6, d1[3] /* multiply by 29033 */
    vmull.s16       q15, d7, d1[3] /* multiply by 29033 */
    vrshrn.s32      d20, q10, #15
    vrshrn.s32      d21, q11, #15
    vrshrn.s32      d24, q12, #14
    vrshrn.s32      d25, q13, #14
    vrshrn.s32      d28, q14, #14
    vrshrn.s32      d29, q15, #14
    vaddw.u8        q10, q10, d0
    vaddw.u8        q12, q12, d0
    vaddw.u8        q14, q14, d0
    vqmovun.s16     d1\g_offs, q10
    vqmovun.s16     d1\r_offs, q12
    vqmovun.s16     d1\b_offs, q14
.endm

/* Apple gas crashes on adrl, work around that by using adr.
 * But this requires a copy of these constants for each function.
7876a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe */ 7886a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe 7896a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe.balign 16 7906a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhejsimd_ycc_\colorid\()_neon_consts: 7916a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe .short 0, 0, 0, 0 7926a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe .short 22971, -11277, -23401, 29033 7936a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe .short -128, -128, -128, -128 7946a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe .short -128, -128, -128, -128 7956a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe 7966a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadheasm_function jsimd_ycc_\colorid\()_convert_neon 7976a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe OUTPUT_WIDTH .req r0 7986a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe INPUT_BUF .req r1 7996a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe INPUT_ROW .req r2 8006a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe OUTPUT_BUF .req r3 8016a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe NUM_ROWS .req r4 8026a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe 8036a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe INPUT_BUF0 .req r5 8046a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe INPUT_BUF1 .req r6 8056a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe INPUT_BUF2 .req INPUT_BUF 8066a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe 8076a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe RGB .req r7 8086a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe Y .req r8 8096a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe U .req r9 8106a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe V .req r10 8116a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe N .req ip 8126a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe 
    /* Load constants to d1, d2, d3 (d0 is just used for padding) */
    adr             ip, jsimd_ycc_\colorid\()_neon_consts
    vld1.16         {d0, d1, d2, d3}, [ip, :128]

    /* Save ARM registers and handle input arguments */
    push            {r4, r5, r6, r7, r8, r9, r10, lr}
    ldr             NUM_ROWS, [sp, #(4 * 8)]
    ldr             INPUT_BUF0, [INPUT_BUF]
    ldr             INPUT_BUF1, [INPUT_BUF, #4]
    ldr             INPUT_BUF2, [INPUT_BUF, #8]
    .unreq          INPUT_BUF

    /* Save NEON registers */
    vpush           {d8-d15}

    /* Initially set d10, d11, d12, d13 to 0xFF */
    vmov.u8         q5, #255
    vmov.u8         q6, #255

    /* Outer loop over scanlines */
    cmp             NUM_ROWS, #1
    blt             9f
0:
    ldr             Y, [INPUT_BUF0, INPUT_ROW, lsl #2]
    ldr             U, [INPUT_BUF1, INPUT_ROW, lsl #2]
    mov             N, OUTPUT_WIDTH
    ldr             V, [INPUT_BUF2, INPUT_ROW, lsl #2]
    add             INPUT_ROW, INPUT_ROW, #1
    ldr             RGB, [OUTPUT_BUF], #4

    /* Inner loop over pixels */
    subs            N, N, #8
    blt             2f
1:
    do_load         8
    do_yuv_to_rgb
    do_store        \bpp, 8
    subs            N, N, #8
    bge             1b
    tst             N, #7
    beq             8f
2:
    tst             N, #4
    beq             3f
    do_load         4
3:
    tst             N, #2
    beq             4f
    do_load         2
4:
    tst             N, #1
    beq             5f
    do_load         1
5:
    do_yuv_to_rgb
    tst             N, #4
    beq             6f
    do_store        \bpp, 4
6:
    tst             N, #2
    beq             7f
    do_store        \bpp, 2
7:
    tst             N, #1
    beq             8f
    do_store        \bpp, 1
8:
    subs            NUM_ROWS, NUM_ROWS, #1
    bgt             0b
9:
    /* Restore all registers and return */
    vpop            {d8-d15}
    pop             {r4, r5, r6, r7, r8, r9, r10, pc}

    .unreq          OUTPUT_WIDTH
    .unreq          INPUT_ROW
    .unreq          OUTPUT_BUF
    .unreq          NUM_ROWS
    .unreq          INPUT_BUF0
    .unreq          INPUT_BUF1
    .unreq          INPUT_BUF2
    .unreq          RGB
    .unreq          Y
    .unreq          U
    .unreq          V
    .unreq          N
.endfunc

.purgem do_yuv_to_rgb

.endm

/*--------------------------------- id ----- bpp R  G  B */
generate_jsimd_ycc_rgb_convert_neon rgba8888, 32, 0, 1, 2
generate_jsimd_ycc_rgb_convert_neon rgb565,   16, 0, 1, 2


.purgem do_load
.purgem do_store

/*****************************************************************************/
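
/*
 * Reference note for the jsimd_ycc_*_convert_neon functions generated above.
 * The per-pixel arithmetic of do_yuv_to_rgb can be sketched in scalar C as
 * shown below. This is only a sketch and is not part of the build. The names
 * rshr_round, clamp_u8 and ycc_to_rgb_ref are hypothetical, and the sketch
 * assumes that >> on a negative int32_t is an arithmetic shift, which is
 * what vrshrn.s32 performs before narrowing.

```c
#include <stdint.h>

// Rounding arithmetic shift right by n bits (the vrshrn.s32 step).
static int32_t rshr_round(int32_t x, int n)
{
    return (x + (1 << (n - 1))) >> n;
}

// Saturate a signed value to 0..255 (the vqmovun.s16 step).
static uint8_t clamp_u8(int32_t x)
{
    return (uint8_t)(x < 0 ? 0 : (x > 255 ? 255 : x));
}

// Convert one YCbCr pixel using the constants from the *_neon_consts table.
static void ycc_to_rgb_ref(uint8_t y, uint8_t cb, uint8_t cr,
                           uint8_t *r, uint8_t *g, uint8_t *b)
{
    int32_t u = (int32_t)cb - 128;   // vaddw.u8 q3, q1, d4
    int32_t v = (int32_t)cr - 128;   // vaddw.u8 q4, q1, d5

    *r = clamp_u8(y + rshr_round(22971 * v, 14));
    *g = clamp_u8(y + rshr_round(-11277 * u - 23401 * v, 15));
    *b = clamp_u8(y + rshr_round(29033 * u, 14));
}
```

 * The red and blue coefficients are scaled by 2^14 (22971 / 2^14 is about
 * 1.402, 29033 / 2^14 is about 1.772) because the same values scaled by 2^15
 * would not fit in a signed 16-bit lane. The two green coefficients are
 * below 1 in magnitude, so they are scaled by 2^15 (about -0.34414 and
 * -0.71414), hence the different shift counts.
 */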