/*
 * ARM NEON optimizations for libjpeg-turbo
 *
 * Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies).
 * All rights reserved.
 * Contact: Alexander Bokovoy <alexander.bokovoy@nokia.com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the authors be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute it
 * freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must not
 *    claim that you wrote the original software. If you use this software
 *    in a product, an acknowledgment in the product documentation would be
 *    appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must not be
 *    misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source distribution.
 */
/* Copyright (c) 2011,  NVIDIA CORPORATION. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of the NVIDIA CORPORATION nor the names of its
 *    contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */



#if defined(__linux__) && defined(__ELF__)
.section .note.GNU-stack,"",%progbits /* mark stack as non-executable */
#endif

.text
.fpu neon
.arch armv7a
.object_arch armv7a
.arm


#define RESPECT_STRICT_ALIGNMENT 1

/*****************************************************************************/

/* Supplementary macro for setting function attributes */
.macro asm_function fname
    .func \fname
    .global \fname
#ifdef __ELF__
    .hidden \fname
    .type \fname, %function
#endif
\fname:
.endm

/* Transpose a block of 4x4 coefficients in four 64-bit registers */
.macro transpose_4x4 x0, x1, x2, x3
    vtrn.16 \x0, \x1
    vtrn.16 \x2, \x3
    vtrn.32 \x0, \x2
    vtrn.32 \x1, \x3
.endm
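
/*
 * Illustration only (not part of the assembled code): a minimal C sketch of
 * what transpose_4x4 computes, assuming each 64-bit register holds one row of
 * four 16-bit coefficients. The names swap16, vtrn16_ref, vtrn32_ref and
 * transpose_4x4_ref are hypothetical stand-ins introduced for this sketch.
 *
```c
#include <stdint.h>

static void swap16(int16_t *a, int16_t *b)
{
    int16_t t = *a;
    *a = *b;
    *b = t;
}

// vtrn.16 on two 64-bit registers: each pair of adjacent 16-bit lanes across
// the two registers is treated as a 2x2 matrix and transposed.
static void vtrn16_ref(int16_t x[4], int16_t y[4])
{
    swap16(&x[1], &y[0]);
    swap16(&x[3], &y[2]);
}

// vtrn.32 on two 64-bit registers: the upper 32 bits of x trade places with
// the lower 32 bits of y, which is two 16-bit lanes at a time.
static void vtrn32_ref(int16_t x[4], int16_t y[4])
{
    swap16(&x[2], &y[0]);
    swap16(&x[3], &y[1]);
}

// Same instruction order as the transpose_4x4 macro.
static void transpose_4x4_ref(int16_t m[4][4])
{
    vtrn16_ref(m[0], m[1]);
    vtrn16_ref(m[2], m[3]);
    vtrn32_ref(m[0], m[2]);
    vtrn32_ref(m[1], m[3]);
}
```
 *
 * After transpose_4x4_ref(m), m[i][j] holds the value that was in m[j][i].
 */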

/*****************************************************************************/

/*
 * jsimd_idct_ifast_neon
 *
 * This function contains a fast, less accurate integer implementation of
 * the inverse DCT (Discrete Cosine Transform). It uses the same calculations
 * and produces exactly the same output as IJG's original 'jpeg_idct_ifast'
 * function from jidctfst.c.
 *
 * TODO: better instruction scheduling is needed.
 */

#define XFIX_1_082392200 d0[0]
#define XFIX_1_414213562 d0[1]
#define XFIX_1_847759065 d0[2]
#define XFIX_2_613125930 d0[3]

.balign 16
jsimd_idct_ifast_neon_consts:
    .short (277 * 128 - 256 * 128) /* XFIX_1_082392200 */
    .short (362 * 128 - 256 * 128) /* XFIX_1_414213562 */
    .short (473 * 128 - 256 * 128) /* XFIX_1_847759065 */
    .short (669 * 128 - 512 * 128) /* XFIX_2_613125930 */
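
/*
 * Illustration only (not part of the assembled code): a minimal C sketch of
 * how the constants above combine with vqdmulh.s16. A full multiplier such as
 * 277 * 128 = 35456 does not fit in a signed 16-bit lane, so the table keeps
 * only the part above 1.0 (or above 2.0 for the last entry) and the code adds
 * the operand back with vadd.s16. The *_ref function names are hypothetical
 * and exist only in this sketch.
 *
```c
#include <stdint.h>

// One lane of vqdmulh.s16: (2 * a * b) >> 16 with saturation.
static int16_t vqdmulh_s16_ref(int16_t a, int16_t b)
{
    // The only input pair whose doubled product overflows is (-32768, -32768).
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;
    int32_t p = (int32_t)a * (int32_t)b;
    // (2 * p) >> 16 equals floor(p / 32768); spelled out with division so the
    // sketch does not depend on how the compiler shifts negative values.
    int32_t q = p / 32768;
    if (p % 32768 < 0)
        q -= 1;
    return (int16_t)q;
}

// x * 277/256 (about 1.082): operand plus the fractional product.
static int16_t mul_1_082392200_ref(int16_t x)
{
    int16_t c = 277 * 128 - 256 * 128;
    return (int16_t)(x + vqdmulh_s16_ref(x, c));
}

// x * 669/256 (about 2.613): the assembly first forms x + x, then adds the
// fractional product.
static int16_t mul_2_613125930_ref(int16_t x)
{
    int16_t c = 669 * 128 - 512 * 128;
    return (int16_t)(x + x + vqdmulh_s16_ref(x, c));
}
```
 *
 * For example, mul_1_082392200_ref(256) gives 277 and
 * mul_2_613125930_ref(256) gives 669.
 */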

/* 1-D IDCT helper macro */

.macro idct_helper  x0, x1, x2, x3, x4, x5, x6, x7, \
                    t10, t11, t12, t13, t14

    vsub.s16        \t10, \x0, \x4
    vadd.s16        \x4,  \x0, \x4
    vswp.s16        \t10, \x0
    vsub.s16        \t11, \x2, \x6
    vadd.s16        \x6,  \x2, \x6
    vswp.s16        \t11, \x2
    vsub.s16        \t10, \x3, \x5
    vadd.s16        \x5,  \x3, \x5
    vswp.s16        \t10, \x3
    vsub.s16        \t11, \x1, \x7
    vadd.s16        \x7,  \x1, \x7
    vswp.s16        \t11, \x1

    vqdmulh.s16     \t13, \x2,  d0[1]
    vadd.s16        \t12, \x3,  \x3
    vadd.s16        \x2,  \x2,  \t13
    vqdmulh.s16     \t13, \x3,  d0[3]
    vsub.s16        \t10,  \x1, \x3
    vadd.s16        \t12, \t12, \t13
    vqdmulh.s16     \t13, \t10, d0[2]
    vsub.s16        \t11, \x7,  \x5
    vadd.s16        \t10, \t10, \t13
    vqdmulh.s16     \t13, \t11, d0[1]
    vadd.s16        \t11, \t11, \t13

    vqdmulh.s16     \t13, \x1,  d0[0]
    vsub.s16        \x2,  \x6,  \x2
    vsub.s16        \t14, \x0,  \x2
    vadd.s16        \x2,  \x0,  \x2
    vadd.s16        \x0,  \x4,  \x6
    vsub.s16        \x4,  \x4,  \x6
    vadd.s16        \x1,  \x1,  \t13
    vadd.s16        \t13, \x7,  \x5
    vsub.s16        \t12, \t13, \t12
    vsub.s16        \t12, \t12, \t10
    vadd.s16        \t11, \t12, \t11
    vsub.s16        \t10, \x1,  \t10
    vadd.s16        \t10, \t10, \t11

    vsub.s16        \x7,  \x0,  \t13
    vadd.s16        \x0,  \x0,  \t13
    vadd.s16        \x6,  \t14, \t12
    vsub.s16        \x1,  \t14, \t12
    vsub.s16        \x5,  \x2,  \t11
    vadd.s16        \x2,  \x2,  \t11
    vsub.s16        \x3,  \x4,  \t10
    vadd.s16        \x4,  \x4,  \t10
.endm

asm_function jsimd_idct_ifast_neon

    DCT_TABLE       .req r0
    COEF_BLOCK      .req r1
    OUTPUT_BUF      .req r2
    OUTPUT_COL      .req r3
    TMP             .req ip

    vpush           {d8-d15}

    /* Load constants */
    adr             TMP, jsimd_idct_ifast_neon_consts
    vld1.16         {d0}, [TMP, :64]

    /* Load all COEF_BLOCK into NEON registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
     *   1 | d6      | d7
     *   2 | d8      | d9
     *   3 | d10     | d11
     *   4 | d12     | d13
     *   5 | d14     | d15
     *   6 | d16     | d17
     *   7 | d18     | d19
     */
    vld1.16         {d4, d5, d6, d7}, [COEF_BLOCK]!
    vld1.16         {d8, d9, d10, d11}, [COEF_BLOCK]!
    vld1.16         {d12, d13, d14, d15}, [COEF_BLOCK]!
    vld1.16         {d16, d17, d18, d19}, [COEF_BLOCK]!
    /* Dequantize */
    vld1.16         {d20, d21, d22, d23}, [DCT_TABLE]!
    vmul.s16        q2, q2, q10
    vld1.16         {d24, d25, d26, d27}, [DCT_TABLE]!
    vmul.s16        q3, q3, q11
    vmul.s16        q4, q4, q12
    vld1.16         {d28, d29, d30, d31}, [DCT_TABLE]!
    vmul.s16        q5, q5, q13
    vmul.s16        q6, q6, q14
    vld1.16         {d20, d21, d22, d23}, [DCT_TABLE]!
    vmul.s16        q7, q7, q15
    vmul.s16        q8, q8, q10
    vmul.s16        q9, q9, q11

    /* Pass 1: process columns from input, store into work array. */
    idct_helper     q2, q3, q4, q5, q6, q7, q8, q9, q10, q11, q12, q13, q14
    /* Transpose */
    vtrn.16 q2, q3
    vtrn.16 q4, q5
    vtrn.32 q2, q4
    vtrn.32 q3, q5

    vtrn.16 q6, q7
    vtrn.16 q8, q9
    vtrn.32 q6, q8
    vtrn.32 q7, q9

    vswp            d12, d5
    vswp            d14, d7
    vswp            d16, d9
    vswp            d18, d11

    /* Pass 2 */
    idct_helper     q2, q3, q4, q5, q6, q7, q8, q9, q10, q11, q12, q13, q14
    /* Transpose */

    vtrn.16 q2, q3
    vtrn.16 q4, q5
    vtrn.32 q2, q4
    vtrn.32 q3, q5

    vtrn.16 q6, q7
    vtrn.16 q8, q9
    vtrn.32 q6, q8
    vtrn.32 q7, q9

    vswp            d12, d5
    vswp            d14, d7
    vswp            d16, d9
    vswp            d18, d11

    /* Descale and range limit */
    vmov.s16        q15, #(0x80 << 5)
    vqadd.s16       q2, q2, q15
    vqadd.s16       q3, q3, q15
    vqadd.s16       q4, q4, q15
    vqadd.s16       q5, q5, q15
    vqadd.s16       q6, q6, q15
    vqadd.s16       q7, q7, q15
    vqadd.s16       q8, q8, q15
    vqadd.s16       q9, q9, q15
    vqshrun.s16     d4, q2, #5
    vqshrun.s16     d6, q3, #5
    vqshrun.s16     d8, q4, #5
    vqshrun.s16     d10, q5, #5
    vqshrun.s16     d12, q6, #5
    vqshrun.s16     d14, q7, #5
    vqshrun.s16     d16, q8, #5
    vqshrun.s16     d18, q9, #5

    /* Store results to the output buffer */
    .irp            x, d4, d6, d8, d10, d12, d14, d16, d18
    ldr             TMP, [OUTPUT_BUF], #4
    add             TMP, TMP, OUTPUT_COL
    vst1.8          {\x}, [TMP]!
    .endr

    vpop            {d8-d15}
    bx              lr

    .unreq          DCT_TABLE
    .unreq          COEF_BLOCK
    .unreq          OUTPUT_BUF
    .unreq          OUTPUT_COL
    .unreq          TMP
.endfunc

.purgem idct_helper
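
/*
 * Illustration only (not part of the assembled code): a minimal C sketch of
 * the "Descale and range limit" step in jsimd_idct_ifast_neon above, for a
 * single 16-bit lane. The bias 0x80 << 5 is the sample midpoint 128 scaled by
 * the same 5 bits that vqshrun.s16 later shifts out, so the result is
 * recentred and clamped to 0..255 in one pass. The function name
 * descale_and_range_limit_ref is hypothetical.
 *
```c
#include <stdint.h>

static uint8_t descale_and_range_limit_ref(int16_t x)
{
    // vqadd.s16 with the bias: the sum saturates at the signed 16-bit maximum.
    // The bias is positive, so no lower bound check is needed.
    int32_t v = (int32_t)x + (0x80 << 5);
    if (v > INT16_MAX)
        v = INT16_MAX;

    // vqshrun.s16 #5: truncating arithmetic shift right by 5, written as a
    // floor division so the sketch does not rely on shifting negative values.
    int32_t q = v / 32;
    if (v % 32 < 0)
        q -= 1;

    // Saturate to an unsigned 8-bit sample.
    if (q < 0)
        q = 0;
    if (q > 255)
        q = 255;
    return (uint8_t)q;
}
```
 *
 * An input of 0 maps to 128, and inputs far outside the representable range
 * clamp to 0 or 255.
 */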

/*****************************************************************************/

/*
 * jsimd_idct_4x4_neon
 *
 * This function contains inverse-DCT code for getting reduced-size
 * 4x4 pixels output from an 8x8 DCT block. It uses the same calculations
 * and produces exactly the same output as IJG's original 'jpeg_idct_4x4'
 * function from jpeg-6b (jidctred.c).
 *
 * NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
 *       requires far fewer arithmetic operations and hence should be faster.
 *       The primary purpose of this particular NEON optimized function is
 *       bit-exact compatibility with jpeg-6b.
 *
 * TODO: slightly better instruction scheduling can be achieved by expanding
 *       the idct_helper/transpose_4x4 macros and reordering instructions,
 *       but readability will suffer somewhat.
 */

#define CONST_BITS  13

#define FIX_0_211164243  (1730)  /* FIX(0.211164243) */
#define FIX_0_509795579  (4176)  /* FIX(0.509795579) */
#define FIX_0_601344887  (4926)  /* FIX(0.601344887) */
#define FIX_0_720959822  (5906)  /* FIX(0.720959822) */
#define FIX_0_765366865  (6270)  /* FIX(0.765366865) */
#define FIX_0_850430095  (6967)  /* FIX(0.850430095) */
#define FIX_0_899976223  (7373)  /* FIX(0.899976223) */
#define FIX_1_061594337  (8697)  /* FIX(1.061594337) */
#define FIX_1_272758580  (10426) /* FIX(1.272758580) */
#define FIX_1_451774981  (11893) /* FIX(1.451774981) */
#define FIX_1_847759065  (15137) /* FIX(1.847759065) */
#define FIX_2_172734803  (17799) /* FIX(2.172734803) */
#define FIX_2_562915447  (20995) /* FIX(2.562915447) */
#define FIX_3_624509785  (29692) /* FIX(3.624509785) */
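
/*
 * Illustration only (not part of the assembled code): a minimal C sketch of
 * how the FIX_* values above relate to CONST_BITS, assuming FIX(x) means x
 * scaled by 2^CONST_BITS and rounded to the nearest integer. The helper name
 * fix_ref and the macro CONST_BITS_REF are hypothetical and exist only in
 * this sketch.
 *
```c
#include <stdint.h>

#define CONST_BITS_REF 13

// Scale a positive real constant by 2^13 and round to the nearest integer.
static int32_t fix_ref(double x)
{
    return (int32_t)(x * (double)(1 << CONST_BITS_REF) + 0.5);
}
```
 *
 * For example, fix_ref(0.211164243) gives 1730 and fix_ref(1.847759065)
 * gives 15137, matching the table above.
 */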

.balign 16
jsimd_idct_4x4_neon_consts:
    .short     FIX_1_847759065     /* d0[0] */
    .short     -FIX_0_765366865    /* d0[1] */
    .short     -FIX_0_211164243    /* d0[2] */
    .short     FIX_1_451774981     /* d0[3] */
    .short     -FIX_2_172734803    /* d1[0] */
    .short     FIX_1_061594337     /* d1[1] */
    .short     -FIX_0_509795579    /* d1[2] */
    .short     -FIX_0_601344887    /* d1[3] */
    .short     FIX_0_899976223     /* d2[0] */
    .short     FIX_2_562915447     /* d2[1] */
    .short     1 << (CONST_BITS+1) /* d2[2] */
    .short     0                   /* d2[3] */

.macro idct_helper x4, x6, x8, x10, x12, x14, x16, shift, y26, y27, y28, y29
    vmull.s16       q14, \x4,  d2[2]
    vmlal.s16       q14, \x8,  d0[0]
    vmlal.s16       q14, \x14, d0[1]

    vmull.s16       q13, \x16, d1[2]
    vmlal.s16       q13, \x12, d1[3]
    vmlal.s16       q13, \x10, d2[0]
    vmlal.s16       q13, \x6,  d2[1]

    vmull.s16       q15, \x4,  d2[2]
    vmlsl.s16       q15, \x8,  d0[0]
    vmlsl.s16       q15, \x14, d0[1]

    vmull.s16       q12, \x16, d0[2]
    vmlal.s16       q12, \x12, d0[3]
    vmlal.s16       q12, \x10, d1[0]
    vmlal.s16       q12, \x6,  d1[1]

    vadd.s32        q10, q14, q13
    vsub.s32        q14, q14, q13

.if \shift > 16
    vrshr.s32       q10,  q10, #\shift
    vrshr.s32       q14,  q14, #\shift
    vmovn.s32       \y26, q10
    vmovn.s32       \y29, q14
.else
    vrshrn.s32      \y26, q10, #\shift
    vrshrn.s32      \y29, q14, #\shift
.endif

    vadd.s32        q10, q15, q12
3706a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    vsub.s32        q15, q15, q12
3716a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe
3726a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe.if \shift > 16
3736a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    vrshr.s32       q10,  q10, #\shift
3746a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    vrshr.s32       q15,  q15, #\shift
3756a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    vmovn.s32       \y27, q10
3766a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    vmovn.s32       \y28, q15
3776a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe.else
3786a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    vrshrn.s32      \y27, q10, #\shift
3796a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    vrshrn.s32      \y28, q15, #\shift
3806a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe.endif
3816a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe
3826a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe.endm
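/*
 * Reference sketch, not part of the original source: the descale step at
 * the end of idct_helper above is a rounding right shift followed by a
 * truncating narrow to 16 bits. VRSHRN.I32 can only encode shift amounts
 * of 1..16, which is presumably why the macro switches to VRSHR + VMOVN
 * when \shift > 16. The helper name descale_to_s16 is hypothetical, and
 * the code assumes a compiler that shifts negative values arithmetically
 * (as GCC and Clang do).
 *
```c
#include <assert.h>
#include <stdint.h>

// Rounding arithmetic right shift followed by truncation to 16 bits,
// mirroring vrshrn.s32 (or vrshr.s32 + vmovn.s32 for shifts above 16).
static int16_t descale_to_s16(int32_t x, int shift)
{
    assert(shift >= 1 && shift <= 31);
    // Widen first so that adding the rounding constant cannot overflow.
    int64_t rounded = ((int64_t)x + ((int64_t)1 << (shift - 1))) >> shift;
    // The narrowing keeps the low 16 bits without saturation.
    return (int16_t)(uint16_t)rounded;
}
```
 *
 * With shift = 13, an input of 4096 (exactly half a unit) rounds up to 1,
 * while 4095 rounds down to 0.
 */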

asm_function jsimd_idct_4x4_neon

    DCT_TABLE       .req r0
    COEF_BLOCK      .req r1
    OUTPUT_BUF      .req r2
    OUTPUT_COL      .req r3
    TMP1            .req r0
    TMP2            .req r1
    TMP3            .req r2
    TMP4            .req ip

    vpush           {d8-d15}

    /* Load constants (d3 is just used for padding) */
    adr             TMP4, jsimd_idct_4x4_neon_consts
    vld1.16         {d0, d1, d2, d3}, [TMP4, :128]

    /* Load COEF_BLOCK into NEON registers with the following allocation
     * (row 4 is not needed by the 4x4 IDCT and is skipped):
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
     *   1 | d6      | d7
     *   2 | d8      | d9
     *   3 | d10     | d11
     *   4 | -       | -
     *   5 | d12     | d13
     *   6 | d14     | d15
     *   7 | d16     | d17
     */
    vld1.16         {d4, d5, d6, d7}, [COEF_BLOCK]!
    vld1.16         {d8, d9, d10, d11}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d12, d13, d14, d15}, [COEF_BLOCK]!
    vld1.16         {d16, d17}, [COEF_BLOCK]!
    /* Dequantize */
    vld1.16         {d18, d19, d20, d21}, [DCT_TABLE]!
    vmul.s16        q2, q2, q9
    vld1.16         {d22, d23, d24, d25}, [DCT_TABLE]!
    vmul.s16        q3, q3, q10
    vmul.s16        q4, q4, q11
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d26, d27, d28, d29}, [DCT_TABLE]!
    vmul.s16        q5, q5, q12
    vmul.s16        q6, q6, q13
    vld1.16         {d30, d31}, [DCT_TABLE]!
    vmul.s16        q7, q7, q14
    vmul.s16        q8, q8, q15

    /* Pass 1 */
    idct_helper     d4, d6, d8, d10, d12, d14, d16, 12, d4, d6, d8, d10
    transpose_4x4   d4, d6, d8, d10
    idct_helper     d5, d7, d9, d11, d13, d15, d17, 12, d5, d7, d9, d11
    transpose_4x4   d5, d7, d9, d11

    /* Pass 2 */
    idct_helper     d4, d6, d8, d10, d7, d9, d11, 19, d26, d27, d28, d29
    transpose_4x4   d26, d27, d28, d29

    /* Range limit */
    vmov.u16        q15, #0x80
    vadd.s16        q13, q13, q15
    vadd.s16        q14, q14, q15
    vqmovun.s16     d26, q13
    vqmovun.s16     d27, q14

    /* Store results to the output buffer */
    ldmia           OUTPUT_BUF, {TMP1, TMP2, TMP3, TMP4}
    add             TMP1, TMP1, OUTPUT_COL
    add             TMP2, TMP2, OUTPUT_COL
    add             TMP3, TMP3, OUTPUT_COL
    add             TMP4, TMP4, OUTPUT_COL

#if defined(__ARMEL__) && !RESPECT_STRICT_ALIGNMENT
    /* We can use far fewer instructions on little-endian systems if the
     * OS kernel is not configured to trap unaligned memory accesses.
     */
    vst1.32         {d26[0]}, [TMP1]!
    vst1.32         {d27[0]}, [TMP3]!
    vst1.32         {d26[1]}, [TMP2]!
    vst1.32         {d27[1]}, [TMP4]!
#else
    vst1.8          {d26[0]}, [TMP1]!
    vst1.8          {d27[0]}, [TMP3]!
    vst1.8          {d26[1]}, [TMP1]!
    vst1.8          {d27[1]}, [TMP3]!
    vst1.8          {d26[2]}, [TMP1]!
    vst1.8          {d27[2]}, [TMP3]!
    vst1.8          {d26[3]}, [TMP1]!
    vst1.8          {d27[3]}, [TMP3]!

    vst1.8          {d26[4]}, [TMP2]!
    vst1.8          {d27[4]}, [TMP4]!
    vst1.8          {d26[5]}, [TMP2]!
    vst1.8          {d27[5]}, [TMP4]!
    vst1.8          {d26[6]}, [TMP2]!
    vst1.8          {d27[6]}, [TMP4]!
    vst1.8          {d26[7]}, [TMP2]!
    vst1.8          {d27[7]}, [TMP4]!
#endif

    vpop            {d8-d15}
    bx              lr

    .unreq          DCT_TABLE
    .unreq          COEF_BLOCK
    .unreq          OUTPUT_BUF
    .unreq          OUTPUT_COL
    .unreq          TMP1
    .unreq          TMP2
    .unreq          TMP3
    .unreq          TMP4
.endfunc

.purgem idct_helper
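/*
 * Reference sketch, not part of the original source: the "Range limit"
 * step of jsimd_idct_4x4_neon above adds the 0x80 level shift to each
 * 16-bit sample and then saturates it to an unsigned byte with VQMOVUN.
 * The function name range_limit is hypothetical. The sketch assumes
 * sample + 128 still fits in 16 bits, so the wrap-around of VADD.S16 does
 * not come into play.
 *
```c
#include <assert.h>
#include <stdint.h>

// Add the level shift, then clamp to 0..255 as vqmovun.s16 does.
static uint8_t range_limit(int16_t sample)
{
    // Precondition: the 16-bit addition in the NEON code must not wrap.
    assert(sample <= INT16_MAX - 128);
    int32_t v = (int32_t)sample + 128;
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}
```
 *
 * A sample of 0 maps to mid-grey (128); anything at or below -128 clamps
 * to 0, and anything at or above 127 clamps to 255.
 */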

/*****************************************************************************/

/*
 * jsimd_idct_2x2_neon
 *
 * This function contains inverse-DCT code for producing reduced-size
 * 2x2 pixel output from an 8x8 DCT block. It uses the same calculations
 * and produces exactly the same output as IJG's original 'jpeg_idct_2x2'
 * function from jpeg-6b (jidctred.c).
 *
 * NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
 *       requires far fewer arithmetic operations and hence should be faster.
 *       The primary purpose of this particular NEON-optimized function is
 *       bit-exact compatibility with jpeg-6b.
 */

.balign 8
jsimd_idct_2x2_neon_consts:
    .short     -FIX_0_720959822    /* d0[0] */
    .short     FIX_0_850430095     /* d0[1] */
    .short     -FIX_1_272758580    /* d0[2] */
    .short     FIX_3_624509785     /* d0[3] */

.macro idct_helper x4, x6, x10, x12, x16, shift, y26, y27
    vshll.s16  q14,  \x4,  #15
    vmull.s16  q13,  \x6,  d0[3]
    vmlal.s16  q13,  \x10, d0[2]
    vmlal.s16  q13,  \x12, d0[1]
    vmlal.s16  q13,  \x16, d0[0]

    vadd.s32   q10,  q14,  q13
    vsub.s32   q14,  q14,  q13

.if \shift > 16
    vrshr.s32  q10,  q10,  #\shift
    vrshr.s32  q14,  q14,  #\shift
    vmovn.s32  \y26, q10
    vmovn.s32  \y27, q14
.else
    vrshrn.s32 \y26, q10,  #\shift
    vrshrn.s32 \y27, q14,  #\shift
.endif

.endm

asm_function jsimd_idct_2x2_neon

    DCT_TABLE       .req r0
    COEF_BLOCK      .req r1
    OUTPUT_BUF      .req r2
    OUTPUT_COL      .req r3
    TMP1            .req r0
    TMP2            .req ip

    vpush           {d8-d15}

    /* Load constants */
    adr             TMP2, jsimd_idct_2x2_neon_consts
    vld1.16         {d0}, [TMP2, :64]

    /* Load COEF_BLOCK into NEON registers with the following allocation
     * (rows 2, 4 and 6 are not needed by the 2x2 IDCT and are skipped):
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
     *   1 | d6      | d7
     *   2 | -       | -
     *   3 | d10     | d11
     *   4 | -       | -
     *   5 | d12     | d13
     *   6 | -       | -
     *   7 | d16     | d17
     */

    vld1.16         {d4, d5, d6, d7}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d10, d11}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d12, d13}, [COEF_BLOCK]!
    add             COEF_BLOCK, COEF_BLOCK, #16
    vld1.16         {d16, d17}, [COEF_BLOCK]!
    /* Dequantize */
    vld1.16         {d18, d19, d20, d21}, [DCT_TABLE]!
    vmul.s16        q2, q2, q9
    vmul.s16        q3, q3, q10
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d24, d25}, [DCT_TABLE]!
    vmul.s16        q5, q5, q12
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d26, d27}, [DCT_TABLE]!
    vmul.s16        q6, q6, q13
    add             DCT_TABLE, DCT_TABLE, #16
    vld1.16         {d30, d31}, [DCT_TABLE]!
    vmul.s16        q8, q8, q15

    /* Pass 1 */
    vmull.s16       q13, d6,  d0[3]
    vmlal.s16       q13, d10, d0[2]
    vmlal.s16       q13, d12, d0[1]
    vmlal.s16       q13, d16, d0[0]
    vmull.s16       q12, d7,  d0[3]
    vmlal.s16       q12, d11, d0[2]
    vmlal.s16       q12, d13, d0[1]
    vmlal.s16       q12, d17, d0[0]
    vshll.s16       q14, d4,  #15
    vshll.s16       q15, d5,  #15
    vadd.s32        q10, q14, q13
    vsub.s32        q14, q14, q13
    vrshrn.s32      d4,  q10, #13
    vrshrn.s32      d6,  q14, #13
    vadd.s32        q10, q15, q12
    vsub.s32        q14, q15, q12
    vrshrn.s32      d5,  q10, #13
    vrshrn.s32      d7,  q14, #13
    vtrn.16         q2,  q3
    vtrn.32         q3,  q5

    /* Pass 2 */
    idct_helper     d4, d6, d10, d7, d11, 20, d26, d27

    /* Range limit */
    vmov.u16        q15, #0x80
    vadd.s16        q13, q13, q15
    vqmovun.s16     d26, q13
    vqmovun.s16     d27, q13

    /* Store results to the output buffer */
    ldmia           OUTPUT_BUF, {TMP1, TMP2}
    add             TMP1, TMP1, OUTPUT_COL
    add             TMP2, TMP2, OUTPUT_COL

    vst1.8          {d26[0]}, [TMP1]!
    vst1.8          {d27[4]}, [TMP1]!
    vst1.8          {d26[1]}, [TMP2]!
    vst1.8          {d27[5]}, [TMP2]!

    vpop            {d8-d15}
    bx              lr

    .unreq          DCT_TABLE
    .unreq          COEF_BLOCK
    .unreq          OUTPUT_BUF
    .unreq          OUTPUT_COL
    .unreq          TMP1
    .unreq          TMP2
.endfunc

.purgem idct_helper
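/*
 * Reference sketch, not part of the original source: one column of pass 1
 * of jsimd_idct_2x2_neon above, written in scalar C. CONST_BITS = 13 is an
 * assumption taken from jpeg-6b; it is consistent with the #15 and #13
 * shift amounts used in the NEON code. The names idct_2x2_column_pass1 and
 * descale are hypothetical. Inputs are the already dequantized rows 0, 1,
 * 3, 5 and 7, and the code assumes a compiler that shifts negative values
 * arithmetically (as GCC and Clang do).
 *
```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 13  // assumed jpeg-6b value
#define FIX(x) ((int32_t)((x) * (1 << CONST_BITS) + 0.5))

// Rounding right shift with narrowing, as performed by vrshrn.s32.
static int16_t descale(int64_t x, int shift)
{
    assert(shift >= 1 && shift <= 16);
    return (int16_t)((x + ((int64_t)1 << (shift - 1))) >> shift);
}

static void idct_2x2_column_pass1(int16_t row0, int16_t row1, int16_t row3,
                                  int16_t row5, int16_t row7,
                                  int16_t *out0, int16_t *out1)
{
    // Even part: only the DC row contributes (vshll.s16 #15).
    int64_t even = (int64_t)row0 * (1 << (CONST_BITS + 2));
    // Odd part: d0[0] and d0[2] hold negated constants in the NEON table,
    // so the multiply-accumulate there becomes a subtraction here.
    int64_t odd = (int64_t)row1 * FIX(3.624509785)
                - (int64_t)row3 * FIX(1.272758580)
                + (int64_t)row5 * FIX(0.850430095)
                - (int64_t)row7 * FIX(0.720959822);
    *out0 = descale(even + odd, 13);  // vrshrn.s32 #13
    *out1 = descale(even - odd, 13);
}
```
 *
 * A column holding only a DC value of 8 yields 32 in both outputs, since
 * (8 << 15) descaled by 13 bits is a multiplication by 4.
 */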

/*****************************************************************************/

/*
 * jsimd_ycc_rgba8888_convert_neon
 * jsimd_ycc_rgb565_convert_neon
 * Colorspace conversion YCbCr -> RGB
 */

.macro do_load size
    .if \size == 8
        vld1.8  {d4}, [U]!
        vld1.8  {d5}, [V]!
        vld1.8  {d0}, [Y]!
        pld     [Y, #64]
        pld     [U, #64]
        pld     [V, #64]
    .elseif \size == 4
        vld1.8  {d4[0]}, [U]!
        vld1.8  {d4[1]}, [U]!
        vld1.8  {d4[2]}, [U]!
        vld1.8  {d4[3]}, [U]!
        vld1.8  {d5[0]}, [V]!
        vld1.8  {d5[1]}, [V]!
        vld1.8  {d5[2]}, [V]!
        vld1.8  {d5[3]}, [V]!
        vld1.8  {d0[0]}, [Y]!
        vld1.8  {d0[1]}, [Y]!
        vld1.8  {d0[2]}, [Y]!
        vld1.8  {d0[3]}, [Y]!
    .elseif \size == 2
        vld1.8  {d4[4]}, [U]!
        vld1.8  {d4[5]}, [U]!
        vld1.8  {d5[4]}, [V]!
        vld1.8  {d5[5]}, [V]!
        vld1.8  {d0[4]}, [Y]!
        vld1.8  {d0[5]}, [Y]!
    .elseif \size == 1
        vld1.8  {d4[6]}, [U]!
        vld1.8  {d5[6]}, [V]!
        vld1.8  {d0[6]}, [Y]!
    .else
        .error "unsupported macroblock size"
    .endif
.endm
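/*
 * Reference sketch, not part of the original source: the lane layout used
 * by do_load above for partial groups (4 samples in lanes 0..3, 2 samples
 * in lanes 4..5, 1 sample in lane 6) lets a tail of up to 7 pixels be
 * gathered into a single 8-lane register. The code that invokes do_load
 * is outside this section, so the bit-test sequence below is an
 * assumption about how a caller would use it, and the name load_tail is
 * hypothetical.
 *
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Gather n (< 8) trailing samples into an 8-lane buffer using the same
// lane assignment as do_load: 4 -> lanes 0..3, 2 -> lanes 4..5, 1 -> lane 6.
static void load_tail(uint8_t lanes[8], const uint8_t *src, unsigned n)
{
    assert(n < 8);
    if (n & 4) {
        memcpy(&lanes[0], src, 4);
        src += 4;
    }
    if (n & 2) {
        memcpy(&lanes[4], src, 2);
        src += 2;
    }
    if (n & 1) {
        lanes[6] = *src;
    }
}
```
 *
 * Note that the samples are not packed from lane 0 upward: a tail of 3
 * pixels occupies lanes 4, 5 and 6. This is harmless as long as the
 * matching store step reads back the same lanes.
 */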

.macro do_store bpp, size
    .if \bpp == 16
        /* 16 bpp: pack into RGB565 (d27 = high byte, d26 = low byte) */
        vmov      d27, d10          /* insert the red channel */
        vsri.u8   d27, d11, #5      /* insert the top 3 green bits */
        vsli.u8   d26, d11, #3      /* insert the next 3 green bits */
        vsri.u8   d26, d12, #3      /* insert the top 5 blue bits */

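/*
 * Reference sketch, not part of the original source: the RGB565 packing
 * done by the vmov/vsri/vsli sequence above, written in scalar C for one
 * pixel. d27 ends up as the high byte (RRRRRGGG) and d26 as the low byte
 * (GGGBBBBB); vst2.8 {d26, d27} then interleaves them low byte first. The
 * function name store_rgb565 is hypothetical.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Pack one 8-bit R, G, B triple into two bytes of RGB565, low byte first.
static void store_rgb565(uint8_t *dst, uint8_t r, uint8_t g, uint8_t b)
{
    assert(dst != NULL);
    // High byte (d27): red bits 7..3, then green bits 7..5.
    uint8_t hi = (uint8_t)((r & 0xF8) | (g >> 5));
    // Low byte (d26): green bits 4..2, then blue bits 7..3.
    uint8_t lo = (uint8_t)(((g << 3) & 0xE0) | (b >> 3));
    // vst2.8 {d26, d27} writes the d26 byte before the d27 byte.
    dst[0] = lo;
    dst[1] = hi;
}
```
 *
 * Pure red (255, 0, 0) is stored as the bytes 0x00, 0xF8, which read back
 * as the little-endian 16-bit value 0xF800.
 */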
        .if \size == 8
            vst2.8  {d26, d27}, [RGB]!
        .elseif \size == 4
            vst2.8  {d26[0], d27[0]}, [RGB]!
            vst2.8  {d26[1], d27[1]}, [RGB]!
            vst2.8  {d26[2], d27[2]}, [RGB]!
            vst2.8  {d26[3], d27[3]}, [RGB]!
        .elseif \size == 2
            vst2.8  {d26[4], d27[4]}, [RGB]!
            vst2.8  {d26[5], d27[5]}, [RGB]!
        .elseif \size == 1
            vst2.8  {d26[6], d27[6]}, [RGB]!
        .else
            .error "unsupported macroblock size"
        .endif
    .elseif \bpp == 24
        .if \size == 8
            vst3.8  {d10, d11, d12}, [RGB]!
        .elseif \size == 4
            vst3.8  {d10[0], d11[0], d12[0]}, [RGB]!
            vst3.8  {d10[1], d11[1], d12[1]}, [RGB]!
            vst3.8  {d10[2], d11[2], d12[2]}, [RGB]!
            vst3.8  {d10[3], d11[3], d12[3]}, [RGB]!
        .elseif \size == 2
7306a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst3.8  {d10[4], d11[4], d12[4]}, [RGB]!
7316a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst3.8  {d10[5], d11[5], d12[5]}, [RGB]!
7326a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .elseif \size == 1
7336a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst3.8  {d10[6], d11[6], d12[6]}, [RGB]!
7346a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .else
7356a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            .error unsupported macroblock size
7366a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .endif
7376a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    .elseif \bpp == 32
7386a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .if \size == 8
7396a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10, d11, d12, d13}, [RGB]!
7406a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .elseif \size == 4
7416a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10[0], d11[0], d12[0], d13[0]}, [RGB]!
7426a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10[1], d11[1], d12[1], d13[1]}, [RGB]!
7436a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10[2], d11[2], d12[2], d13[2]}, [RGB]!
7446a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10[3], d11[3], d12[3], d13[3]}, [RGB]!
7456a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .elseif \size == 2
7466a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10[4], d11[4], d12[4], d13[4]}, [RGB]!
7476a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10[5], d11[5], d12[5], d13[5]}, [RGB]!
7486a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .elseif \size == 1
7496a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            vst4.8  {d10[6], d11[6], d12[6], d13[6]}, [RGB]!
7506a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .else
7516a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe            .error unsupported macroblock size
7526a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .endif
7536a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    .else
7546a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe        .error unsupported bpp
7556a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe    .endif
7566a3be8dfbb7c258e7fbbd11f1078bf11c9be89bdPrajakta Gudadhe.endm
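
/*
 * Reference for the 16 bpp branch of do_store (a sketch in C-like notation,
 * not code that is assembled). That branch reads d10, d11 and d12 directly,
 * so it assumes they hold R, G and B, which matches the offsets 0, 1, 2
 * passed by the instantiations in this file. The names lo, hi and pixel are
 * used only in this comment:
 *
 *     lo    = ((G << 3) & 0xE0) | (B >> 3);    // d26: G[4:2] and B[7:3]
 *     hi    = (R & 0xF8) | (G >> 5);           // d27: R[7:3] and G[7:5]
 *     pixel = (hi << 8) | lo;                  // same value as
 *             // ((R >> 3) << 11) | ((G >> 2) << 5) | (B >> 3)
 *
 * Worked example: R = 255, G = 128, B = 64 gives hi = 0xFC, lo = 0x08,
 * so pixel = 0xFC08. vst2.8 interleaves d26 and d27, which writes each
 * pixel low byte first.
 */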

.macro generate_jsimd_ycc_rgb_convert_neon colorid, bpp, r_offs, g_offs, b_offs

.macro do_yuv_to_rgb
    vaddw.u8        q3, q1, d4     /* q3 = u - 128 */
    vaddw.u8        q4, q1, d5     /* q4 = v - 128 */
    vmull.s16       q10, d6, d1[1] /* multiply by -11277 */
    vmlal.s16       q10, d8, d1[2] /* multiply by -23401 */
    vmull.s16       q11, d7, d1[1] /* multiply by -11277 */
    vmlal.s16       q11, d9, d1[2] /* multiply by -23401 */
    vmull.s16       q12, d8, d1[0] /* multiply by 22971 */
    vmull.s16       q13, d9, d1[0] /* multiply by 22971 */
    vmull.s16       q14, d6, d1[3] /* multiply by 29033 */
    vmull.s16       q15, d7, d1[3] /* multiply by 29033 */
    vrshrn.s32      d20, q10, #15
    vrshrn.s32      d21, q11, #15
    vrshrn.s32      d24, q12, #14
    vrshrn.s32      d25, q13, #14
    vrshrn.s32      d28, q14, #14
    vrshrn.s32      d29, q15, #14
    vaddw.u8        q10, q10, d0
    vaddw.u8        q12, q12, d0
    vaddw.u8        q14, q14, d0
    vqmovun.s16     d1\g_offs, q10
    vqmovun.s16     d1\r_offs, q12
    vqmovun.s16     d1\b_offs, q14
.endm
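
/*
 * Reference for do_yuv_to_rgb, per pixel (a sketch in C-like notation, not
 * code that is assembled; rshift and clamp are hypothetical helper names
 * used only in this comment):
 *
 *     u = U - 128;  v = V - 128;
 *     G = clamp(Y + rshift(-11277 * u - 23401 * v, 15));
 *     R = clamp(Y + rshift( 22971 * v, 14));
 *     B = clamp(Y + rshift( 29033 * u, 14));
 *
 * Here rshift(x, n) = (x + (1 << (n - 1))) >> n is the rounding arithmetic
 * shift performed by vrshrn, and clamp saturates to 0..255 as vqmovun does.
 * The constants are approximately 1.402 * 2^14, 0.34414 * 2^15,
 * 0.71414 * 2^15 and 1.772 * 2^14.
 *
 * Worked example: Y = 128, U = 128, V = 255 gives u = 0 and v = 127, so
 * R = clamp(128 + 178) = 255, G = 128 - 91 = 37 and B = 128.
 */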

/* Apple gas crashes on adrl, so work around that by using adr.
 * This requires a copy of these constants for each function.
 */

.balign 16
jsimd_ycc_\colorid\()_neon_consts:
    .short          0,      0,     0,      0
    .short          22971, -11277, -23401, 29033
    .short          -128,  -128,   -128,   -128
    .short          -128,  -128,   -128,   -128

asm_function jsimd_ycc_\colorid\()_convert_neon
    OUTPUT_WIDTH    .req r0
    INPUT_BUF       .req r1
    INPUT_ROW       .req r2
    OUTPUT_BUF      .req r3
    NUM_ROWS        .req r4

    INPUT_BUF0      .req r5
    INPUT_BUF1      .req r6
    INPUT_BUF2      .req INPUT_BUF

    RGB             .req r7
    Y               .req r8
    U               .req r9
    V               .req r10
    N               .req ip

    /* Load constants to d1, d2, d3 (d0 is just used for padding) */
    adr             ip, jsimd_ycc_\colorid\()_neon_consts
    vld1.16         {d0, d1, d2, d3}, [ip, :128]

    /* Save ARM registers and handle input arguments */
    push            {r4, r5, r6, r7, r8, r9, r10, lr}
    ldr             NUM_ROWS, [sp, #(4 * 8)]
    ldr             INPUT_BUF0, [INPUT_BUF]
    ldr             INPUT_BUF1, [INPUT_BUF, #4]
    ldr             INPUT_BUF2, [INPUT_BUF, #8]
    .unreq          INPUT_BUF

    /* Save NEON registers */
    vpush           {d8-d15}

    /* Initially set d10, d11, d12, d13 to 0xFF */
    vmov.u8         q5, #255
    vmov.u8         q6, #255

    /* Outer loop over scanlines */
    cmp             NUM_ROWS, #1
    blt             9f
0:
    ldr             Y, [INPUT_BUF0, INPUT_ROW, lsl #2]
    ldr             U, [INPUT_BUF1, INPUT_ROW, lsl #2]
    mov             N, OUTPUT_WIDTH
    ldr             V, [INPUT_BUF2, INPUT_ROW, lsl #2]
    add             INPUT_ROW, INPUT_ROW, #1
    ldr             RGB, [OUTPUT_BUF], #4

    /* Inner loop over pixels */
    subs            N, N, #8
    blt             2f
1:
    do_load         8
    do_yuv_to_rgb
    do_store        \bpp, 8
    subs            N, N, #8
    bge             1b
    tst             N, #7
    beq             8f
2:
    tst             N, #4
    beq             3f
    do_load         4
3:
    tst             N, #2
    beq             4f
    do_load         2
4:
    tst             N, #1
    beq             5f
    do_load         1
5:
    do_yuv_to_rgb
    tst             N, #4
    beq             6f
    do_store        \bpp, 4
6:
    tst             N, #2
    beq             7f
    do_store        \bpp, 2
7:
    tst             N, #1
    beq             8f
    do_store        \bpp, 1
8:
    subs            NUM_ROWS, NUM_ROWS, #1
    bgt             0b
9:
    /* Restore all registers and return */
    vpop            {d8-d15}
    pop             {r4, r5, r6, r7, r8, r9, r10, pc}

    .unreq          OUTPUT_WIDTH
    .unreq          INPUT_ROW
    .unreq          OUTPUT_BUF
    .unreq          NUM_ROWS
    .unreq          INPUT_BUF0
    .unreq          INPUT_BUF1
    .unreq          INPUT_BUF2
    .unreq          RGB
    .unreq          Y
    .unreq          U
    .unreq          V
    .unreq          N
.endfunc

.purgem do_yuv_to_rgb

.endm
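
/*
 * Reference for the row tail in the generated function (a sketch in C-like
 * notation, not code that is assembled). When the 8-pixel loop exits, N is
 * negative, but its low three bits still equal the number of leftover
 * pixels (OUTPUT_WIDTH & 7), so the tail is driven by bit tests. The lane
 * numbers are those used by do_store; do_load is assumed to fill the same
 * lanes for sizes 4 and 2, as it visibly does for size 1:
 *
 *     if (N & 4) load 4 pixels;      // lanes 0..3
 *     if (N & 2) load 2 pixels;      // lanes 4..5
 *     if (N & 1) load 1 pixel;       // lane 6
 *     convert all 8 lanes once;      // unused lanes hold stale data
 *     if (N & 4) store lanes 0..3;
 *     if (N & 2) store lanes 4..5;
 *     if (N & 1) store lane 6;
 *
 * Worked example: OUTPUT_WIDTH = 13 runs the 8-pixel loop once and leaves
 * N = -3, whose low three bits are 101b, so the tail handles 4 + 1 pixels.
 */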

/*--------------------------------- id ----- bpp R  G  B */
generate_jsimd_ycc_rgb_convert_neon rgba8888, 32, 0, 1, 2
generate_jsimd_ycc_rgb_convert_neon rgb565,  16, 0, 1, 2


.purgem do_load
.purgem do_store

/*****************************************************************************/