/*
 * jidctfst.c
 *
 * Copyright (C) 1994-1998, Thomas G. Lane.
 * This file is part of the Independent JPEG Group's software.
 * For conditions of distribution and use, see the accompanying README file.
 *
 * This file contains a fast, not so accurate integer implementation of the
 * inverse DCT (Discrete Cosine Transform). In the IJG code, this routine
 * must also perform dequantization of the input coefficients.
 *
 * A 2-D IDCT can be done by 1-D IDCT on each column followed by 1-D IDCT
 * on each row (or vice versa, but it's more convenient to emit a row at
 * a time). Direct algorithms are also available, but they are much more
 * complex and seem not to be any faster when reduced to code.
 *
 * This implementation is based on Arai, Agui, and Nakajima's algorithm for
 * scaled DCT. Their original paper (Trans. IEICE E-71(11):1095) is in
 * Japanese, but the algorithm is described in the Pennebaker & Mitchell
 * JPEG textbook (see REFERENCES section in file README). The following code
 * is based directly on figure 4-8 in P&M.
 * While an 8-point DCT cannot be done in less than 11 multiplies, it is
 * possible to arrange the computation so that many of the multiplies are
 * simple scalings of the final outputs. These multiplies can then be
 * folded into the multiplications or divisions by the JPEG quantization
 * table entries. The AA&N method leaves only 5 multiplies and 29 adds
 * to be done in the DCT itself.
 * The primary disadvantage of this method is that with fixed-point math,
 * accuracy is lost due to imprecise representation of the scaled
 * quantization values. The smaller the quantization table entry, the less
 * precise the scaled value, so this implementation does worse with high-
 * quality-setting files than with low-quality ones.
 */

#define JPEG_INTERNALS
#include "jinclude.h"
#include "jpeglib.h"
#include "jdct.h"		/* Private declarations for DCT subsystem */

#ifdef DCT_IFAST_SUPPORTED


/*
 * This module is specialized to the case DCTSIZE = 8.
 */

#if DCTSIZE != 8
  Sorry, this code only copes with 8x8 DCTs. /* deliberate syntax err */
#endif


/* Scaling decisions are generally the same as in the LL&M algorithm;
 * see jidctint.c for more details. However, we choose to descale
 * (right shift) multiplication products as soon as they are formed,
 * rather than carrying additional fractional bits into subsequent additions.
 * This compromises accuracy slightly, but it lets us save a few shifts.
 * More importantly, 16-bit arithmetic is then adequate (for 8-bit samples)
 * everywhere except in the multiplications proper; this saves a good deal
 * of work on 16-bit-int machines.
 *
 * The dequantized coefficients are not integers because the AA&N scaling
 * factors have been incorporated. We represent them scaled up by PASS1_BITS,
 * so that the first and second IDCT rounds have the same input scaling.
 * For 8-bit JSAMPLEs, we choose IFAST_SCALE_BITS = PASS1_BITS so as to
 * avoid a descaling shift; this compromises accuracy rather drastically
 * for small quantization table entries, but it saves a lot of shifts.
 * For 12-bit JSAMPLEs, there's no hope of using 16x16 multiplies anyway,
 * so we use a much larger scaling factor to preserve accuracy.
 *
 * A final compromise is to represent the multiplicative constants to only
 * 8 fractional bits, rather than 13. This saves some shifting work on some
 * machines, and may also reduce the cost of multiplication (since there
 * are fewer one-bits in the constants).
 */

#if BITS_IN_JSAMPLE == 8
#define CONST_BITS  8
#define PASS1_BITS  2
#else
#define CONST_BITS  8
#define PASS1_BITS  1		/* lose a little precision to avoid overflow */
#endif

/* Some C compilers fail to reduce "FIX(constant)" at compile time, thus
 * causing a lot of useless floating-point operations at run time.
 * To get around this we use the following pre-calculated constants.
 * If you change CONST_BITS you may want to add appropriate values.
 * (With a reasonable C compiler, you can just rely on the FIX() macro...)
 */

#if CONST_BITS == 8
#define FIX_1_082392200  ((INT32)  277)		/* FIX(1.082392200) */
#define FIX_1_414213562  ((INT32)  362)		/* FIX(1.414213562) */
#define FIX_1_847759065  ((INT32)  473)		/* FIX(1.847759065) */
#define FIX_2_613125930  ((INT32)  669)		/* FIX(2.613125930) */
#else
#define FIX_1_082392200  FIX(1.082392200)
#define FIX_1_414213562  FIX(1.414213562)
#define FIX_1_847759065  FIX(1.847759065)
#define FIX_2_613125930  FIX(2.613125930)
#endif


/* We can gain a little more speed, with a further compromise in accuracy,
 * by omitting the addition in a descaling shift. This yields an incorrectly
 * rounded result half the time...
 */

#ifndef USE_ACCURATE_ROUNDING
#undef DESCALE
#define DESCALE(x,n)  RIGHT_SHIFT(x, n)
#endif


/* Multiply a DCTELEM variable by an INT32 constant, and immediately
 * descale to yield a DCTELEM result.
 */

#define MULTIPLY(var,const)  ((DCTELEM) DESCALE((var) * (const), CONST_BITS))


/* Dequantize a coefficient by multiplying it by the multiplier-table
 * entry; produce a DCTELEM result. For 8-bit data a 16x16->16
 * multiplication will do. For 12-bit data, the multiplier table is
 * declared INT32, so a 32-bit multiply will be used.
 */

#if BITS_IN_JSAMPLE == 8
#define DEQUANTIZE(coef,quantval)  (((IFAST_MULT_TYPE) (coef)) * (quantval))
#else
#define DEQUANTIZE(coef,quantval)  \
	DESCALE((coef)*(quantval), IFAST_SCALE_BITS-PASS1_BITS)
#endif


/* Like DESCALE, but applies to a DCTELEM and produces an int.
 * We assume that int right shift is unsigned if INT32 right shift is.
 */

#ifdef RIGHT_SHIFT_IS_UNSIGNED
#define ISHIFT_TEMPS	DCTELEM ishift_temp;
#if BITS_IN_JSAMPLE == 8
#define DCTELEMBITS  16		/* DCTELEM may be 16 or 32 bits */
#else
#define DCTELEMBITS  32		/* DCTELEM must be 32 bits */
#endif
#define IRIGHT_SHIFT(x,shft)  \
    ((ishift_temp = (x)) < 0 ? \
     (ishift_temp >> (shft)) | ((~((DCTELEM) 0)) << (DCTELEMBITS-(shft))) : \
     (ishift_temp >> (shft)))
#else
#define ISHIFT_TEMPS
#define IRIGHT_SHIFT(x,shft)	((x) >> (shft))
#endif

#ifdef USE_ACCURATE_ROUNDING
#define IDESCALE(x,n)  ((int) IRIGHT_SHIFT((x) + (1 << ((n)-1)), n))
#else
#define IDESCALE(x,n)  ((int) IRIGHT_SHIFT(x, n))
#endif


/*
 * Perform dequantization and inverse DCT on one block of coefficients.
 */

GLOBAL(void)
jpeg_idct_ifast (j_decompress_ptr cinfo, jpeg_component_info * compptr,
		 JCOEFPTR coef_block,
		 JSAMPARRAY output_buf, JDIMENSION output_col)
{
  DCTELEM tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7;
  DCTELEM tmp10, tmp11, tmp12, tmp13;
  DCTELEM z5, z10, z11, z12, z13;
  JCOEFPTR inptr;
  IFAST_MULT_TYPE * quantptr;
  int * wsptr;
  JSAMPROW outptr;
  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
  int ctr;
  int workspace[DCTSIZE2];	/* buffers data between passes */
  SHIFT_TEMPS			/* for DESCALE */
  ISHIFT_TEMPS			/* for IDESCALE */

  /* Pass 1: process columns from input, store into work array. */

  inptr = coef_block;
  quantptr = (IFAST_MULT_TYPE *) compptr->dct_table;
  wsptr = workspace;
  for (ctr = DCTSIZE; ctr > 0; ctr--) {
    /* Due to quantization, we will usually find that many of the input
     * coefficients are zero, especially the AC terms. We can exploit this
     * by short-circuiting the IDCT calculation for any column in which all
     * the AC terms are zero. In that case each output is equal to the
     * DC coefficient (with scale factor as needed).
     * With typical images and quantization tables, half or more of the
     * column DCT calculations can be simplified this way.
     */

    if (inptr[DCTSIZE*1] == 0 && inptr[DCTSIZE*2] == 0 &&
	inptr[DCTSIZE*3] == 0 && inptr[DCTSIZE*4] == 0 &&
	inptr[DCTSIZE*5] == 0 && inptr[DCTSIZE*6] == 0 &&
	inptr[DCTSIZE*7] == 0) {
      /* AC terms all zero */
      int dcval = (int) DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);

      wsptr[DCTSIZE*0] = dcval;
      wsptr[DCTSIZE*1] = dcval;
      wsptr[DCTSIZE*2] = dcval;
      wsptr[DCTSIZE*3] = dcval;
      wsptr[DCTSIZE*4] = dcval;
      wsptr[DCTSIZE*5] = dcval;
      wsptr[DCTSIZE*6] = dcval;
      wsptr[DCTSIZE*7] = dcval;

      inptr++;			/* advance pointers to next column */
      quantptr++;
      wsptr++;
      continue;
    }

    /* Even part */

    tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
    tmp1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
    tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
    tmp3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

    tmp10 = tmp0 + tmp2;	/* phase 3 */
    tmp11 = tmp0 - tmp2;

    tmp13 = tmp1 + tmp3;	/* phases 5-3 */
    tmp12 = MULTIPLY(tmp1 - tmp3, FIX_1_414213562) - tmp13; /* 2*c4 */

    tmp0 = tmp10 + tmp13;	/* phase 2 */
    tmp3 = tmp10 - tmp13;
    tmp1 = tmp11 + tmp12;
    tmp2 = tmp11 - tmp12;

    /* Odd part */

    tmp4 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
    tmp5 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
    tmp6 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
    tmp7 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

    z13 = tmp6 + tmp5;		/* phase 6 */
    z10 = tmp6 - tmp5;
    z11 = tmp4 + tmp7;
    z12 = tmp4 - tmp7;

    tmp7 = z11 + z13;		/* phase 5 */
    tmp11 = MULTIPLY(z11 - z13, FIX_1_414213562); /* 2*c4 */

    z5 = MULTIPLY(z10 + z12, FIX_1_847759065); /* 2*c2 */
    tmp10 = MULTIPLY(z12, FIX_1_082392200) - z5; /* 2*(c2-c6) */
    tmp12 = MULTIPLY(z10, - FIX_2_613125930) + z5; /* -2*(c2+c6) */

    tmp6 = tmp12 - tmp7;	/* phase 2 */
    tmp5 = tmp11 - tmp6;
    tmp4 = tmp10 + tmp5;

    wsptr[DCTSIZE*0] = (int) (tmp0 + tmp7);
    wsptr[DCTSIZE*7] = (int) (tmp0 - tmp7);
    wsptr[DCTSIZE*1] = (int) (tmp1 + tmp6);
    wsptr[DCTSIZE*6] = (int) (tmp1 - tmp6);
    wsptr[DCTSIZE*2] = (int) (tmp2 + tmp5);
    wsptr[DCTSIZE*5] = (int) (tmp2 - tmp5);
    wsptr[DCTSIZE*4] = (int) (tmp3 + tmp4);
    wsptr[DCTSIZE*3] = (int) (tmp3 - tmp4);

    inptr++;			/* advance pointers to next column */
    quantptr++;
    wsptr++;
  }

  /* Pass 2: process rows from work array, store into output array. */
  /* Note that we must descale the results by a factor of 8 == 2**3, */
  /* and also undo the PASS1_BITS scaling. */

  wsptr = workspace;
  for (ctr = 0; ctr < DCTSIZE; ctr++) {
    outptr = output_buf[ctr] + output_col;
    /* Rows of zeroes can be exploited in the same way as we did with columns.
     * However, the column calculation has created many nonzero AC terms, so
     * the simplification applies less often (typically 5% to 10% of the time).
     * On machines with very fast multiplication, it's possible that the
     * test takes more time than it's worth. In that case this section
     * may be commented out.
     */

#ifndef NO_ZERO_ROW_TEST
    if (wsptr[1] == 0 && wsptr[2] == 0 && wsptr[3] == 0 && wsptr[4] == 0 &&
	wsptr[5] == 0 && wsptr[6] == 0 && wsptr[7] == 0) {
      /* AC terms all zero */
      JSAMPLE dcval = range_limit[IDESCALE(wsptr[0], PASS1_BITS+3)
				  & RANGE_MASK];

      outptr[0] = dcval;
      outptr[1] = dcval;
      outptr[2] = dcval;
      outptr[3] = dcval;
      outptr[4] = dcval;
      outptr[5] = dcval;
      outptr[6] = dcval;
      outptr[7] = dcval;

      wsptr += DCTSIZE;		/* advance pointer to next row */
      continue;
    }
#endif

    /* Even part */

    tmp10 = ((DCTELEM) wsptr[0] + (DCTELEM) wsptr[4]);
    tmp11 = ((DCTELEM) wsptr[0] - (DCTELEM) wsptr[4]);

    tmp13 = ((DCTELEM) wsptr[2] + (DCTELEM) wsptr[6]);
    tmp12 = MULTIPLY((DCTELEM) wsptr[2] - (DCTELEM) wsptr[6], FIX_1_414213562)
	    - tmp13;

    tmp0 = tmp10 + tmp13;
    tmp3 = tmp10 - tmp13;
    tmp1 = tmp11 + tmp12;
    tmp2 = tmp11 - tmp12;

    /* Odd part */

    z13 = (DCTELEM) wsptr[5] + (DCTELEM) wsptr[3];
    z10 = (DCTELEM) wsptr[5] - (DCTELEM) wsptr[3];
    z11 = (DCTELEM) wsptr[1] + (DCTELEM) wsptr[7];
    z12 = (DCTELEM) wsptr[1] - (DCTELEM) wsptr[7];

    tmp7 = z11 + z13;		/* phase 5 */
    tmp11 = MULTIPLY(z11 - z13, FIX_1_414213562); /* 2*c4 */

    z5 = MULTIPLY(z10 + z12, FIX_1_847759065); /* 2*c2 */
    tmp10 = MULTIPLY(z12, FIX_1_082392200) - z5; /* 2*(c2-c6) */
    tmp12 = MULTIPLY(z10, - FIX_2_613125930) + z5; /* -2*(c2+c6) */

    tmp6 = tmp12 - tmp7;	/* phase 2 */
    tmp5 = tmp11 - tmp6;
    tmp4 = tmp10 + tmp5;

    /* Final output stage: scale down by a factor of 8 and range-limit */

    outptr[0] = range_limit[IDESCALE(tmp0 + tmp7, PASS1_BITS+3)
			    & RANGE_MASK];
    outptr[7] = range_limit[IDESCALE(tmp0 - tmp7, PASS1_BITS+3)
			    & RANGE_MASK];
    outptr[1] = range_limit[IDESCALE(tmp1 + tmp6, PASS1_BITS+3)
			    & RANGE_MASK];
    outptr[6] = range_limit[IDESCALE(tmp1 - tmp6, PASS1_BITS+3)
			    & RANGE_MASK];
    outptr[2] = range_limit[IDESCALE(tmp2 + tmp5, PASS1_BITS+3)
			    & RANGE_MASK];
    outptr[5] = range_limit[IDESCALE(tmp2 - tmp5, PASS1_BITS+3)
			    & RANGE_MASK];
    outptr[4] = range_limit[IDESCALE(tmp3 + tmp4, PASS1_BITS+3)
			    & RANGE_MASK];
    outptr[3] = range_limit[IDESCALE(tmp3 - tmp4, PASS1_BITS+3)
			    & RANGE_MASK];

    wsptr += DCTSIZE;		/* advance pointer to next row */
  }
}

#endif /* DCT_IFAST_SUPPORTED */