1/* stb_image - v2.08 - public domain image loader - http://nothings.org/stb_image.h
2                                     no warranty implied; use at your own risk
3
4   Do this:
5      #define STB_IMAGE_IMPLEMENTATION
6   before you include this file in *one* C or C++ file to create the implementation.
7
8   // i.e. it should look like this:
9   #include ...
10   #include ...
11   #include ...
12   #define STB_IMAGE_IMPLEMENTATION
13   #include "stb_image.h"
14
15   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
16   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
17
18
19   QUICK NOTES:
20      Primarily of interest to game developers and other people who can
21          avoid problematic images and only need the trivial interface
22
23      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24      PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25
26      TGA (not sure what subset, if a subset)
27      BMP non-1bpp, non-RLE
28      PSD (composited view only, no extra channels, 8/16 bit-per-channel)
29
30      GIF (*comp always reports as 4-channel)
31      HDR (radiance rgbE format)
32      PIC (Softimage PIC)
33      PNM (PPM and PGM binary only)
34
35      Animated GIF still needs a proper API, but here's one way to do it:
36          http://gist.github.com/urraka/685d9a6340b26b830d49
37
38      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
39      - decode from arbitrary I/O callbacks
40      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
41
42   Full documentation under "DOCUMENTATION" below.
43
44
45   Revision 2.00 release notes:
46
47      - Progressive JPEG is now supported.
48
49      - PPM and PGM binary formats are now supported, thanks to Ken Miller.
50
51      - x86 platforms now make use of SSE2 SIMD instructions for
52        JPEG decoding, and ARM platforms can use NEON SIMD if requested.
53        This work was done by Fabian "ryg" Giesen. SSE2 is used by
54        default, but NEON must be enabled explicitly; see docs.
55
56        With other JPEG optimizations included in this version, we see
57        2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
58        on a JPEG on an ARM machine, relative to previous versions of this
59        library. The same results will not obtain for all JPGs and for all
60        x86/ARM machines. (Note that progressive JPEGs are significantly
61        slower to decode than regular JPEGs.) This doesn't mean that this
62        is the fastest JPEG decoder in the land; rather, it brings it
63        closer to parity with standard libraries. If you want the fastest
64        decode, look elsewhere. (See "Philosophy" section of docs below.)
65
66        See final bullet items below for more info on SIMD.
67
68      - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
69        the memory allocator. Unlike other STBI libraries, these macros don't
70        support a context parameter, so if you need to pass a context in to
71        the allocator, you'll have to store it in a global or a thread-local
72        variable.
73
74      - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
75        STBI_NO_LINEAR.
76            STBI_NO_HDR:     suppress implementation of .hdr reader format
77            STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API
78
79      - You can suppress implementation of any of the decoders to reduce
80        your code footprint by #defining one or more of the following
81        symbols before creating the implementation.
82
83            STBI_NO_JPEG
84            STBI_NO_PNG
85            STBI_NO_BMP
86            STBI_NO_PSD
87            STBI_NO_TGA
88            STBI_NO_GIF
89            STBI_NO_HDR
90            STBI_NO_PIC
91            STBI_NO_PNM   (.ppm and .pgm)
92
93      - You can request *only* certain decoders and suppress all other ones
94        (this will be more forward-compatible, as addition of new decoders
95        doesn't require you to disable them explicitly):
96
97            STBI_ONLY_JPEG
98            STBI_ONLY_PNG
99            STBI_ONLY_BMP
100            STBI_ONLY_PSD
101            STBI_ONLY_TGA
102            STBI_ONLY_GIF
103            STBI_ONLY_HDR
104            STBI_ONLY_PIC
105            STBI_ONLY_PNM   (.ppm and .pgm)
106
107         Note that you can define multiples of these, and you will get all
108         of them ("only x" and "only y" is interpreted to mean "only x&y").
109
110       - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
111         want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
112
113      - Compilation of all SIMD code can be suppressed with
114            #define STBI_NO_SIMD
115        It should not be necessary to disable SIMD unless you have issues
116        compiling (e.g. using an x86 compiler which doesn't support SSE
117        intrinsics or that doesn't support the method used to detect
118        SSE2 support at run-time), and even those can be reported as
119        bugs so I can refine the built-in compile-time checking to be
120        smarter.
121
122      - The old STBI_SIMD system which allowed installing a user-defined
123        IDCT etc. has been removed. If you need this, don't upgrade. My
124        assumption is that almost nobody was doing this, and those who
125        were will find the built-in SIMD more satisfactory anyway.
126
127      - RGB values computed for JPEG images are slightly different from
128        previous versions of stb_image. (This is due to using less
129        integer precision in SIMD.) The C code has been adjusted so
130        that the same RGB values will be computed regardless of whether
131        SIMD support is available, so your app should always produce
132        consistent results. But these results are slightly different from
133        previous versions. (Specifically, about 3% of available YCbCr values
134        will compute different RGB results from pre-1.49 versions by +-1;
135        most of the deviating values are one smaller in the G channel.)
136
137      - If you must produce consistent results with previous versions of
138        stb_image, #define STBI_JPEG_OLD and you will get the same results
139        you used to; however, you will not get the SIMD speedups for
140        the YCbCr-to-RGB conversion step (although you should still see
141        significant JPEG speedup from the other changes).
142
143        Please note that STBI_JPEG_OLD is a temporary feature; it will be
144        removed in future versions of the library. It is only intended for
145        near-term back-compatibility use.
146
147
148   Latest revision history:
149      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
150      2.07  (2015-09-13) partial animated GIF support
151                         limited 16-bit PSD support
152                         minor bugs, code cleanup, and compiler warnings
153      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
154      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
155      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
156      2.03  (2015-04-12) additional corruption checking
157                         stbi_set_flip_vertically_on_load
158                         fix NEON support; fix mingw support
159      2.02  (2015-01-19) fix incorrect assert, fix warning
160      2.01  (2015-01-17) fix various warnings
161      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
162      2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
163                         progressive JPEG
164                         PGM/PPM support
165                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
166                         STBI_NO_*, STBI_ONLY_*
167                         GIF bugfix
168      1.48  (2014-12-14) fix incorrectly-named assert()
169      1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
170                         optimize PNG
171                         fix bug in interlaced PNG with user-specified channel count
172
173   See end of file for full revision history.
174
175
176 ============================    Contributors    =========================
177
178 Image formats                                Bug fixes & warning fixes
179    Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
180    Nicolas Schulz (hdr, psd)                    Christpher Lloyd
181    Jonathan Dummer (tga)                        Dave Moore
182    Jean-Marc Lienher (gif)                      Won Chun
183    Tom Seddon (pic)                             the Horde3D community
184    Thatcher Ulrich (psd)                        Janez Zemva
185    Ken Miller (pgm, ppm)                        Jonathan Blow
186    urraka@github (animated gif)                 Laurent Gomila
187                                                 Aruelien Pocheville
188                                                 Ryamond Barbiero
189                                                 David Woo
190 Extensions, features                            Martin Golini
191    Jetro Lauha (stbi_info)                      Roy Eltham
192    Martin "SpartanJ" Golini (stbi_info)         Luke Graham
193    James "moose2000" Brown (iPhone PNG)         Thomas Ruf
194    Ben "Disch" Wenger (io callbacks)            John Bartholomew
195    Omar Cornut (1/2/4-bit PNG)                  Ken Hamada
196    Nicolas Guillemot (vertical flip)            Cort Stratton
197    Richard Mitton (16-bit PSD)                  Blazej Dariusz Roszkowski
198                                                 Thibault Reuille
199                                                 Paul Du Bois
200                                                 Guillaume George
201                                                 Jerry Jansson
202                                                 Hayaki Saito
203                                                 Johan Duparc
204                                                 Ronny Chevalier
205 Optimizations & bugfixes                        Michal Cichon
206    Fabian "ryg" Giesen                          Tero Hanninen
207    Arseny Kapoulkine                            Sergio Gonzalez
208                                                 Cass Everitt
209                                                 Engin Manap
210  If your name should be here but                Martins Mozeiko
211  isn't, let Sean know.                          Joseph Thomson
212                                                 Phil Jordan
213                                                 Nathan Reed
214                                                 Michaelangel007@github
215                                                 Nick Verigakis
216
217LICENSE
218
219This software is in the public domain. Where that dedication is not
220recognized, you are granted a perpetual, irrevocable license to copy,
221distribute, and modify this file as you see fit.
222
223*/
224
225#ifndef STBI_INCLUDE_STB_IMAGE_H
226#define STBI_INCLUDE_STB_IMAGE_H
227
228// DOCUMENTATION
229//
230// Limitations:
231//    - no 16-bit-per-channel PNG
232//    - no 12-bit-per-channel JPEG
233//    - no JPEGs with arithmetic coding
234//    - no 1-bit BMP
235//    - GIF always returns *comp=4
236//
237// Basic usage (see HDR discussion below for HDR usage):
238//    int x,y,n;
239//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
240//    // ... process data if not NULL ...
241//    // ... x = width, y = height, n = # 8-bit components per pixel ...
242//    // ... replace '0' with '1'..'4' to force that many components per pixel
243//    // ... but 'n' will always be the number that it would have been if you said 0
244//    stbi_image_free(data)
245//
246// Standard parameters:
247//    int *x       -- outputs image width in pixels
248//    int *y       -- outputs image height in pixels
249//    int *comp    -- outputs # of image components in image file
250//    int req_comp -- if non-zero, # of image components requested in result
251//
252// The return value from an image loader is an 'unsigned char *' which points
253// to the pixel data, or NULL on an allocation failure or if the image is
254// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
255// with each pixel consisting of N interleaved 8-bit components; the first
256// pixel pointed to is top-left-most in the image. There is no padding between
257// image scanlines or between pixels, regardless of format. The number of
258// components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
259// If req_comp is non-zero, *comp has the number of components that _would_
260// have been output otherwise. E.g. if you set req_comp to 4, you will always
261// get RGBA output, but you can check *comp to see if it's trivially opaque
262// because e.g. there were only 3 channels in the source image.
263//
264// An output image with N components has the following components interleaved
265// in this order in each pixel:
266//
267//     N=#comp     components
268//       1           grey
269//       2           grey, alpha
270//       3           red, green, blue
271//       4           red, green, blue, alpha
272//
273// If image loading fails for any reason, the return value will be NULL,
274// and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
275// can be queried for an extremely brief, end-user unfriendly explanation
276// of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
277// compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
278// more user-friendly ones.
279//
280// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
281//
282// ===========================================================================
283//
284// Philosophy
285//
286// stb libraries are designed with the following priorities:
287//
288//    1. easy to use
289//    2. easy to maintain
290//    3. good performance
291//
292// Sometimes I let "good performance" creep up in priority over "easy to maintain",
293// and for best performance I may provide less-easy-to-use APIs that give higher
294// performance, in addition to the easy to use ones. Nevertheless, it's important
295// to keep in mind that from the standpoint of you, a client of this library,
296// all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
297//
298// Some secondary priorities arise directly from the first two, some of which
299// make more explicit reasons why performance can't be emphasized.
300//
301//    - Portable ("ease of use")
302//    - Small footprint ("easy to maintain")
303//    - No dependencies ("ease of use")
304//
305// ===========================================================================
306//
307// I/O callbacks
308//
309// I/O callbacks allow you to read from arbitrary sources, like packaged
310// files or some other source. Data read from callbacks are processed
311// through a small internal buffer (currently 128 bytes) to try to reduce
312// overhead.
313//
314// The three functions you must define are "read" (reads some bytes of data),
315// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
316//
317// ===========================================================================
318//
319// SIMD support
320//
321// The JPEG decoder will try to automatically use SIMD kernels on x86 when
322// supported by the compiler. For ARM Neon support, you must explicitly
323// request it.
324//
325// (The old do-it-yourself SIMD API is no longer supported in the current
326// code.)
327//
328// On x86, SSE2 will automatically be used when available based on a run-time
329// test; if not, the generic C versions are used as a fall-back. On ARM targets,
330// the typical path is to have separate builds for NEON and non-NEON devices
331// (at least this is true for iOS and Android). Therefore, the NEON support is
332// toggled by a build flag: define STBI_NEON to get NEON loops.
333//
334// The output of the JPEG decoder is slightly different from versions where
335// SIMD support was introduced (that is, for versions before 1.49). The
336// difference is only +-1 in the 8-bit RGB channels, and only on a small
337// fraction of pixels. You can force the pre-1.49 behavior by defining
338// STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
339// and hence cost some performance.
340//
341// If for some reason you do not want to use any of SIMD code, or if
342// you have issues compiling it, you can disable it entirely by
343// defining STBI_NO_SIMD.
344//
345// ===========================================================================
346//
347// HDR image support   (disable by defining STBI_NO_HDR)
348//
349// stb_image now supports loading HDR images in general, and currently
350// the Radiance .HDR file format, although the support is provided
351// generically. You can still load any file through the existing interface;
352// if you attempt to load an HDR file, it will be automatically remapped to
353// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
354// both of these constants can be reconfigured through this interface:
355//
356//     stbi_hdr_to_ldr_gamma(2.2f);
357//     stbi_hdr_to_ldr_scale(1.0f);
358//
359// (note, do not use _inverse_ constants; stbi_image will invert them
360// appropriately).
361//
362// Additionally, there is a new, parallel interface for loading files as
363// (linear) floats to preserve the full dynamic range:
364//
365//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
366//
367// If you load LDR images through this interface, those images will
368// be promoted to floating point values, run through the inverse of
369// constants corresponding to the above:
370//
371//     stbi_ldr_to_hdr_scale(1.0f);
372//     stbi_ldr_to_hdr_gamma(2.2f);
373//
374// Finally, given a filename (or an open file or memory block--see header
375// file for details) containing image data, you can query for the "most
376// appropriate" interface to use (that is, whether the image is HDR or
377// not), using:
378//
379//     stbi_is_hdr(char *filename);
380//
381// ===========================================================================
382//
383// iPhone PNG support:
384//
385// By default we convert iphone-formatted PNGs back to RGB, even though
386// they are internally encoded differently. You can disable this conversion
387// by by calling stbi_convert_iphone_png_to_rgb(0), in which case
388// you will always just get the native iphone "format" through (which
389// is BGR stored in RGB).
390//
391// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
392// pixel to remove any premultiplied alpha *only* if the image file explicitly
393// says there's premultiplied data (currently only happens in iPhone images,
394// and only if iPhone convert-to-rgb processing is on).
395//
396
397
398#ifndef STBI_NO_STDIO
399#include <stdio.h>
400#endif // STBI_NO_STDIO
401
402#define STBI_VERSION 1
403
404enum
405{
406   STBI_default = 0, // only used for req_comp
407
408   STBI_grey       = 1,
409   STBI_grey_alpha = 2,
410   STBI_rgb        = 3,
411   STBI_rgb_alpha  = 4
412};
413
414typedef unsigned char stbi_uc;
415
416#ifdef __cplusplus
417extern "C" {
418#endif
419
420#ifdef STB_IMAGE_STATIC
421#define STBIDEF static
422#else
423#define STBIDEF extern
424#endif
425
426//////////////////////////////////////////////////////////////////////////////
427//
428// PRIMARY API - works on images of any type
429//
430
431//
432// load image by filename, open file, or memory buffer
433//
434
435typedef struct
436{
437   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
438   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
439   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
440} stbi_io_callbacks;
441
442STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
443STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
444STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);
445
446#ifndef STBI_NO_STDIO
447STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
448// for stbi_load_from_file, file pointer is left pointing immediately after image
449#endif
450
451#ifndef STBI_NO_LINEAR
452   STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
453   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
454   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
455
456   #ifndef STBI_NO_STDIO
457   STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
458   #endif
459#endif
460
461#ifndef STBI_NO_HDR
462   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
463   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
464#endif
465
466#ifndef STBI_NO_LINEAR
467   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
468   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
469#endif // STBI_NO_HDR
470
471// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
472STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
473STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
474#ifndef STBI_NO_STDIO
475STBIDEF int      stbi_is_hdr          (char const *filename);
476STBIDEF int      stbi_is_hdr_from_file(FILE *f);
477#endif // STBI_NO_STDIO
478
479
480// get a VERY brief reason for failure
481// NOT THREADSAFE
482STBIDEF const char *stbi_failure_reason  (void);
483
484// free the loaded image -- this is just free()
485STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
486
487// get image dimensions & components without fully decoding
488STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
489STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
490
491#ifndef STBI_NO_STDIO
492STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
493STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);
494
495#endif
496
497
498
499// for image formats that explicitly notate that they have premultiplied alpha,
500// we just return the colors as stored in the file. set this flag to force
501// unpremultiplication. results are undefined if the unpremultiply overflow.
502STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
503
504// indicate whether we should process iphone images back to canonical format,
505// or just pass them through "as-is"
506STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
507
508// flip the image vertically, so the first pixel in the output array is the bottom left
509STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
510
511// ZLIB client - used by PNG, available for other purposes
512
513STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
514STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
515STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
516STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
517
518STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
519STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
520
521
522#ifdef __cplusplus
523}
524#endif
525
526//
527//
528////   end header file   /////////////////////////////////////////////////////
529#endif // STBI_INCLUDE_STB_IMAGE_H
530
531#ifdef STB_IMAGE_IMPLEMENTATION
532
533#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
534  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
535  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
536  || defined(STBI_ONLY_ZLIB)
537   #ifndef STBI_ONLY_JPEG
538   #define STBI_NO_JPEG
539   #endif
540   #ifndef STBI_ONLY_PNG
541   #define STBI_NO_PNG
542   #endif
543   #ifndef STBI_ONLY_BMP
544   #define STBI_NO_BMP
545   #endif
546   #ifndef STBI_ONLY_PSD
547   #define STBI_NO_PSD
548   #endif
549   #ifndef STBI_ONLY_TGA
550   #define STBI_NO_TGA
551   #endif
552   #ifndef STBI_ONLY_GIF
553   #define STBI_NO_GIF
554   #endif
555   #ifndef STBI_ONLY_HDR
556   #define STBI_NO_HDR
557   #endif
558   #ifndef STBI_ONLY_PIC
559   #define STBI_NO_PIC
560   #endif
561   #ifndef STBI_ONLY_PNM
562   #define STBI_NO_PNM
563   #endif
564#endif
565
566#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
567#define STBI_NO_ZLIB
568#endif
569
570
571#include <stdarg.h>
572#include <stddef.h> // ptrdiff_t on osx
573#include <stdlib.h>
574#include <string.h>
575
576#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
577#include <math.h>  // ldexp
578#endif
579
580#ifndef STBI_NO_STDIO
581#include <stdio.h>
582#endif
583
584#ifndef STBI_ASSERT
585#include <assert.h>
586#define STBI_ASSERT(x) assert(x)
587#endif
588
589
590#ifndef _MSC_VER
591   #ifdef __cplusplus
592   #define stbi_inline inline
593   #else
594   #define stbi_inline
595   #endif
596#else
597   #define stbi_inline __forceinline
598#endif
599
600
601#ifdef _MSC_VER
602typedef unsigned short stbi__uint16;
603typedef   signed short stbi__int16;
604typedef unsigned int   stbi__uint32;
605typedef   signed int   stbi__int32;
606#else
607#include <stdint.h>
608typedef uint16_t stbi__uint16;
609typedef int16_t  stbi__int16;
610typedef uint32_t stbi__uint32;
611typedef int32_t  stbi__int32;
612#endif
613
614// should produce compiler error if size is wrong
615typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
616
617#ifdef _MSC_VER
618#define STBI_NOTUSED(v)  (void)(v)
619#else
620#define STBI_NOTUSED(v)  (void)sizeof(v)
621#endif
622
623#ifdef _MSC_VER
624#define STBI_HAS_LROTL
625#endif
626
627#ifdef STBI_HAS_LROTL
628   #define stbi_lrot(x,y)  _lrotl(x,y)
629#else
630   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
631#endif
632
633#if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
634// ok
635#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
636// ok
637#else
638#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
639#endif
640
641#ifndef STBI_MALLOC
642#define STBI_MALLOC(sz)    malloc(sz)
643#define STBI_REALLOC(p,sz) realloc(p,sz)
644#define STBI_FREE(p)       free(p)
645#endif
646
647// x86/x64 detection
648#if defined(__x86_64__) || defined(_M_X64)
649#define STBI__X64_TARGET
650#elif defined(__i386) || defined(_M_IX86)
651#define STBI__X86_TARGET
652#endif
653
654#if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
655// NOTE: not clear do we actually need this for the 64-bit path?
656// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
657// (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
658// this is just broken and gcc are jerks for not fixing it properly
659// http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
660#define STBI_NO_SIMD
661#endif
662
663#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
664// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
665//
666// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
667// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
668// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
669// simultaneously enabling "-mstackrealign".
670//
671// See https://github.com/nothings/stb/issues/81 for more information.
672//
673// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
674// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
675#define STBI_NO_SIMD
676#endif
677
678#if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET)
679#define STBI_SSE2
680#include <emmintrin.h>
681
682#ifdef _MSC_VER
683
684#if _MSC_VER >= 1400  // not VC6
685#include <intrin.h> // __cpuid
686static int stbi__cpuid3(void)
687{
688   int info[4];
689   __cpuid(info,1);
690   return info[3];
691}
692#else
693static int stbi__cpuid3(void)
694{
695   int res;
696   __asm {
697      mov  eax,1
698      cpuid
699      mov  res,edx
700   }
701   return res;
702}
703#endif
704
705#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
706
707static int stbi__sse2_available()
708{
709   int info3 = stbi__cpuid3();
710   return ((info3 >> 26) & 1) != 0;
711}
712#else // assume GCC-style if not VC++
713#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
714
715static int stbi__sse2_available()
716{
717#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
718   // GCC 4.8+ has a nice way to do this
719   return __builtin_cpu_supports("sse2");
720#else
721   // portable way to do this, preferably without using GCC inline ASM?
722   // just bail for now.
723   return 0;
724#endif
725}
726#endif
727#endif
728
729// ARM NEON
730#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
731#undef STBI_NEON
732#endif
733
734#ifdef STBI_NEON
735#include <arm_neon.h>
736// assume GCC or Clang on ARM targets
737#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
738#endif
739
740#ifndef STBI_SIMD_ALIGN
741#define STBI_SIMD_ALIGN(type, name) type name
742#endif
743
744///////////////////////////////////////////////
745//
746//  stbi__context struct and start_xxx functions
747
748// stbi__context structure is our basic context used by all images, so it
749// contains all the IO context, plus some basic image information
750typedef struct
751{
752   stbi__uint32 img_x, img_y;
753   int img_n, img_out_n;
754
755   stbi_io_callbacks io;
756   void *io_user_data;
757
758   int read_from_callbacks;
759   int buflen;
760   stbi_uc buffer_start[128];
761
762   stbi_uc *img_buffer, *img_buffer_end;
763   stbi_uc *img_buffer_original, *img_buffer_original_end;
764} stbi__context;
765
766
767static void stbi__refill_buffer(stbi__context *s);
768
769// initialize a memory-decode context
770static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
771{
772   s->io.read = NULL;
773   s->read_from_callbacks = 0;
774   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
775   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
776}
777
778// initialize a callback-based context
779static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
780{
781   s->io = *c;
782   s->io_user_data = user;
783   s->buflen = sizeof(s->buffer_start);
784   s->read_from_callbacks = 1;
785   s->img_buffer_original = s->buffer_start;
786   stbi__refill_buffer(s);
787   s->img_buffer_original_end = s->img_buffer_end;
788}
789
790#ifndef STBI_NO_STDIO
791
792static int stbi__stdio_read(void *user, char *data, int size)
793{
794   return (int) fread(data,1,size,(FILE*) user);
795}
796
797static void stbi__stdio_skip(void *user, int n)
798{
799   fseek((FILE*) user, n, SEEK_CUR);
800}
801
802static int stbi__stdio_eof(void *user)
803{
804   return feof((FILE*) user);
805}
806
807static stbi_io_callbacks stbi__stdio_callbacks =
808{
809   stbi__stdio_read,
810   stbi__stdio_skip,
811   stbi__stdio_eof,
812};
813
814static void stbi__start_file(stbi__context *s, FILE *f)
815{
816   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
817}
818
819//static void stop_file(stbi__context *s) { }
820
821#endif // !STBI_NO_STDIO
822
823static void stbi__rewind(stbi__context *s)
824{
825   // conceptually rewind SHOULD rewind to the beginning of the stream,
826   // but we just rewind to the beginning of the initial buffer, because
827   // we only use it after doing 'test', which only ever looks at at most 92 bytes
828   s->img_buffer = s->img_buffer_original;
829   s->img_buffer_end = s->img_buffer_original_end;
830}
831
832#ifndef STBI_NO_JPEG
833static int      stbi__jpeg_test(stbi__context *s);
834static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
835static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
836#endif
837
838#ifndef STBI_NO_PNG
839static int      stbi__png_test(stbi__context *s);
840static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
841static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
842#endif
843
844#ifndef STBI_NO_BMP
845static int      stbi__bmp_test(stbi__context *s);
846static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
847static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
848#endif
849
850#ifndef STBI_NO_TGA
851static int      stbi__tga_test(stbi__context *s);
852static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
853static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
854#endif
855
856#ifndef STBI_NO_PSD
857static int      stbi__psd_test(stbi__context *s);
858static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
859static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
860#endif
861
862#ifndef STBI_NO_HDR
863static int      stbi__hdr_test(stbi__context *s);
864static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
865static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
866#endif
867
868#ifndef STBI_NO_PIC
869static int      stbi__pic_test(stbi__context *s);
870static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
871static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
872#endif
873
874#ifndef STBI_NO_GIF
875static int      stbi__gif_test(stbi__context *s);
876static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
877static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
878#endif
879
880#ifndef STBI_NO_PNM
881static int      stbi__pnm_test(stbi__context *s);
882static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
883static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
884#endif
885
886// this is not threadsafe
887static const char *stbi__g_failure_reason;
888
889STBIDEF const char *stbi_failure_reason(void)
890{
891   return stbi__g_failure_reason;
892}
893
894static int stbi__err(const char *str)
895{
896   stbi__g_failure_reason = str;
897   return 0;
898}
899
900static void *stbi__malloc(size_t size)
901{
902    return STBI_MALLOC(size);
903}
904
905// stbi__err - error
906// stbi__errpf - error returning pointer to float
907// stbi__errpuc - error returning pointer to unsigned char
908
909#ifdef STBI_NO_FAILURE_STRINGS
910   #define stbi__err(x,y)  0
911#elif defined(STBI_FAILURE_USERMSG)
912   #define stbi__err(x,y)  stbi__err(y)
913#else
914   #define stbi__err(x,y)  stbi__err(x)
915#endif
916
917#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
918#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
919
920STBIDEF void stbi_image_free(void *retval_from_stbi_load)
921{
922   STBI_FREE(retval_from_stbi_load);
923}
924
925#ifndef STBI_NO_LINEAR
926static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
927#endif
928
929#ifndef STBI_NO_HDR
930static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
931#endif
932
933static int stbi__vertically_flip_on_load = 0;
934
935STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
936{
937    stbi__vertically_flip_on_load = flag_true_if_should_flip;
938}
939
940static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
941{
942   #ifndef STBI_NO_JPEG
943   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
944   #endif
945   #ifndef STBI_NO_PNG
946   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
947   #endif
948   #ifndef STBI_NO_BMP
949   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
950   #endif
951   #ifndef STBI_NO_GIF
952   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
953   #endif
954   #ifndef STBI_NO_PSD
955   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
956   #endif
957   #ifndef STBI_NO_PIC
958   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
959   #endif
960   #ifndef STBI_NO_PNM
961   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
962   #endif
963
964   #ifndef STBI_NO_HDR
965   if (stbi__hdr_test(s)) {
966      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
967      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
968   }
969   #endif
970
971   #ifndef STBI_NO_TGA
972   // test tga last because it's a crappy test!
973   if (stbi__tga_test(s))
974      return stbi__tga_load(s,x,y,comp,req_comp);
975   #endif
976
977   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
978}
979
980static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
981{
982   unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
983
984   if (stbi__vertically_flip_on_load && result != NULL) {
985      int w = *x, h = *y;
986      int depth = req_comp ? req_comp : *comp;
987      int row,col,z;
988      stbi_uc temp;
989
990      // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
991      for (row = 0; row < (h>>1); row++) {
992         for (col = 0; col < w; col++) {
993            for (z = 0; z < depth; z++) {
994               temp = result[(row * w + col) * depth + z];
995               result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
996               result[((h - row - 1) * w + col) * depth + z] = temp;
997            }
998         }
999      }
1000   }
1001
1002   return result;
1003}
1004
1005#ifndef STBI_NO_HDR
1006static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1007{
1008   if (stbi__vertically_flip_on_load && result != NULL) {
1009      int w = *x, h = *y;
1010      int depth = req_comp ? req_comp : *comp;
1011      int row,col,z;
1012      float temp;
1013
1014      // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1015      for (row = 0; row < (h>>1); row++) {
1016         for (col = 0; col < w; col++) {
1017            for (z = 0; z < depth; z++) {
1018               temp = result[(row * w + col) * depth + z];
1019               result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1020               result[((h - row - 1) * w + col) * depth + z] = temp;
1021            }
1022         }
1023      }
1024   }
1025}
1026#endif
1027
1028#ifndef STBI_NO_STDIO
1029
1030static FILE *stbi__fopen(char const *filename, char const *mode)
1031{
1032   FILE *f;
1033#if defined(_MSC_VER) && _MSC_VER >= 1400
1034   if (0 != fopen_s(&f, filename, mode))
1035      f=0;
1036#else
1037   f = fopen(filename, mode);
1038#endif
1039   return f;
1040}
1041
1042
1043STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1044{
1045   FILE *f = stbi__fopen(filename, "rb");
1046   unsigned char *result;
1047   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1048   result = stbi_load_from_file(f,x,y,comp,req_comp);
1049   fclose(f);
1050   return result;
1051}
1052
1053STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1054{
1055   unsigned char *result;
1056   stbi__context s;
1057   stbi__start_file(&s,f);
1058   result = stbi__load_flip(&s,x,y,comp,req_comp);
1059   if (result) {
1060      // need to 'unget' all the characters in the IO buffer
1061      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1062   }
1063   return result;
1064}
1065#endif //!STBI_NO_STDIO
1066
1067STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1068{
1069   stbi__context s;
1070   stbi__start_mem(&s,buffer,len);
1071   return stbi__load_flip(&s,x,y,comp,req_comp);
1072}
1073
1074STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1075{
1076   stbi__context s;
1077   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1078   return stbi__load_flip(&s,x,y,comp,req_comp);
1079}
1080
1081#ifndef STBI_NO_LINEAR
1082static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1083{
1084   unsigned char *data;
1085   #ifndef STBI_NO_HDR
1086   if (stbi__hdr_test(s)) {
1087      float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
1088      if (hdr_data)
1089         stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1090      return hdr_data;
1091   }
1092   #endif
1093   data = stbi__load_flip(s, x, y, comp, req_comp);
1094   if (data)
1095      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1096   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1097}
1098
1099STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1100{
1101   stbi__context s;
1102   stbi__start_mem(&s,buffer,len);
1103   return stbi__loadf_main(&s,x,y,comp,req_comp);
1104}
1105
1106STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1107{
1108   stbi__context s;
1109   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1110   return stbi__loadf_main(&s,x,y,comp,req_comp);
1111}
1112
1113#ifndef STBI_NO_STDIO
1114STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1115{
1116   float *result;
1117   FILE *f = stbi__fopen(filename, "rb");
1118   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1119   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1120   fclose(f);
1121   return result;
1122}
1123
1124STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1125{
1126   stbi__context s;
1127   stbi__start_file(&s,f);
1128   return stbi__loadf_main(&s,x,y,comp,req_comp);
1129}
1130#endif // !STBI_NO_STDIO
1131
1132#endif // !STBI_NO_LINEAR
1133
1134// these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1135// defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1136// reports false!
1137
1138STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1139{
1140   #ifndef STBI_NO_HDR
1141   stbi__context s;
1142   stbi__start_mem(&s,buffer,len);
1143   return stbi__hdr_test(&s);
1144   #else
1145   STBI_NOTUSED(buffer);
1146   STBI_NOTUSED(len);
1147   return 0;
1148   #endif
1149}
1150
1151#ifndef STBI_NO_STDIO
1152STBIDEF int      stbi_is_hdr          (char const *filename)
1153{
1154   FILE *f = stbi__fopen(filename, "rb");
1155   int result=0;
1156   if (f) {
1157      result = stbi_is_hdr_from_file(f);
1158      fclose(f);
1159   }
1160   return result;
1161}
1162
1163STBIDEF int      stbi_is_hdr_from_file(FILE *f)
1164{
1165   #ifndef STBI_NO_HDR
1166   stbi__context s;
1167   stbi__start_file(&s,f);
1168   return stbi__hdr_test(&s);
1169   #else
1170   STBI_NOTUSED(f);
1171   return 0;
1172   #endif
1173}
1174#endif // !STBI_NO_STDIO
1175
1176STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1177{
1178   #ifndef STBI_NO_HDR
1179   stbi__context s;
1180   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1181   return stbi__hdr_test(&s);
1182   #else
1183   STBI_NOTUSED(clbk);
1184   STBI_NOTUSED(user);
1185   return 0;
1186   #endif
1187}
1188
1189static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1190static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1191
1192#ifndef STBI_NO_LINEAR
1193STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1194STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1195#endif
1196
1197STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1198STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1199
1200
1201//////////////////////////////////////////////////////////////////////////////
1202//
1203// Common code used by all image loaders
1204//
1205
1206enum
1207{
1208   STBI__SCAN_load=0,
1209   STBI__SCAN_type,
1210   STBI__SCAN_header
1211};
1212
1213static void stbi__refill_buffer(stbi__context *s)
1214{
1215   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1216   if (n == 0) {
1217      // at end of file, treat same as if from memory, but need to handle case
1218      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1219      s->read_from_callbacks = 0;
1220      s->img_buffer = s->buffer_start;
1221      s->img_buffer_end = s->buffer_start+1;
1222      *s->img_buffer = 0;
1223   } else {
1224      s->img_buffer = s->buffer_start;
1225      s->img_buffer_end = s->buffer_start + n;
1226   }
1227}
1228
1229stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1230{
1231   if (s->img_buffer < s->img_buffer_end)
1232      return *s->img_buffer++;
1233   if (s->read_from_callbacks) {
1234      stbi__refill_buffer(s);
1235      return *s->img_buffer++;
1236   }
1237   return 0;
1238}
1239
1240stbi_inline static int stbi__at_eof(stbi__context *s)
1241{
1242   if (s->io.read) {
1243      if (!(s->io.eof)(s->io_user_data)) return 0;
1244      // if feof() is true, check if buffer = end
1245      // special case: we've only got the special 0 character at the end
1246      if (s->read_from_callbacks == 0) return 1;
1247   }
1248
1249   return s->img_buffer >= s->img_buffer_end;
1250}
1251
1252static void stbi__skip(stbi__context *s, int n)
1253{
1254   if (n < 0) {
1255      s->img_buffer = s->img_buffer_end;
1256      return;
1257   }
1258   if (s->io.read) {
1259      int blen = (int) (s->img_buffer_end - s->img_buffer);
1260      if (blen < n) {
1261         s->img_buffer = s->img_buffer_end;
1262         (s->io.skip)(s->io_user_data, n - blen);
1263         return;
1264      }
1265   }
1266   s->img_buffer += n;
1267}
1268
1269static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1270{
1271   if (s->io.read) {
1272      int blen = (int) (s->img_buffer_end - s->img_buffer);
1273      if (blen < n) {
1274         int res, count;
1275
1276         memcpy(buffer, s->img_buffer, blen);
1277
1278         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1279         res = (count == (n-blen));
1280         s->img_buffer = s->img_buffer_end;
1281         return res;
1282      }
1283   }
1284
1285   if (s->img_buffer+n <= s->img_buffer_end) {
1286      memcpy(buffer, s->img_buffer, n);
1287      s->img_buffer += n;
1288      return 1;
1289   } else
1290      return 0;
1291}
1292
1293static int stbi__get16be(stbi__context *s)
1294{
1295   int z = stbi__get8(s);
1296   return (z << 8) + stbi__get8(s);
1297}
1298
1299static stbi__uint32 stbi__get32be(stbi__context *s)
1300{
1301   stbi__uint32 z = stbi__get16be(s);
1302   return (z << 16) + stbi__get16be(s);
1303}
1304
1305#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1306// nothing
1307#else
1308static int stbi__get16le(stbi__context *s)
1309{
1310   int z = stbi__get8(s);
1311   return z + (stbi__get8(s) << 8);
1312}
1313#endif
1314
1315#ifndef STBI_NO_BMP
1316static stbi__uint32 stbi__get32le(stbi__context *s)
1317{
1318   stbi__uint32 z = stbi__get16le(s);
1319   return z + (stbi__get16le(s) << 16);
1320}
1321#endif
1322
1323#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1324
1325
1326//////////////////////////////////////////////////////////////////////////////
1327//
1328//  generic converter from built-in img_n to req_comp
1329//    individual types do this automatically as much as possible (e.g. jpeg
1330//    does all cases internally since it needs to colorspace convert anyway,
1331//    and it never has alpha, so very few cases ). png can automatically
1332//    interleave an alpha=255 channel, but falls back to this for other cases
1333//
1334//  assume data buffer is malloced, so malloc a new one and free that one
1335//  only failure mode is malloc failing
1336
1337static stbi_uc stbi__compute_y(int r, int g, int b)
1338{
1339   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1340}
1341
1342static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1343{
1344   int i,j;
1345   unsigned char *good;
1346
1347   if (req_comp == img_n) return data;
1348   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1349
1350   good = (unsigned char *) stbi__malloc(req_comp * x * y);
1351   if (good == NULL) {
1352      STBI_FREE(data);
1353      return stbi__errpuc("outofmem", "Out of memory");
1354   }
1355
1356   for (j=0; j < (int) y; ++j) {
1357      unsigned char *src  = data + j * x * img_n   ;
1358      unsigned char *dest = good + j * x * req_comp;
1359
1360      #define COMBO(a,b)  ((a)*8+(b))
1361      #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1362      // convert source image with img_n components to one with req_comp components;
1363      // avoid switch per pixel, so use switch per scanline and massive macros
1364      switch (COMBO(img_n, req_comp)) {
1365         CASE(1,2) dest[0]=src[0], dest[1]=255; break;
1366         CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1367         CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
1368         CASE(2,1) dest[0]=src[0]; break;
1369         CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1370         CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
1371         CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
1372         CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1373         CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
1374         CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1375         CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
1376         CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
1377         default: STBI_ASSERT(0);
1378      }
1379      #undef CASE
1380   }
1381
1382   STBI_FREE(data);
1383   return good;
1384}
1385
1386#ifndef STBI_NO_LINEAR
1387static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1388{
1389   int i,k,n;
1390   float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
1391   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1392   // compute number of non-alpha components
1393   if (comp & 1) n = comp; else n = comp-1;
1394   for (i=0; i < x*y; ++i) {
1395      for (k=0; k < n; ++k) {
1396         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1397      }
1398      if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1399   }
1400   STBI_FREE(data);
1401   return output;
1402}
1403#endif
1404
1405#ifndef STBI_NO_HDR
1406#define stbi__float2int(x)   ((int) (x))
1407static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1408{
1409   int i,k,n;
1410   stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
1411   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1412   // compute number of non-alpha components
1413   if (comp & 1) n = comp; else n = comp-1;
1414   for (i=0; i < x*y; ++i) {
1415      for (k=0; k < n; ++k) {
1416         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1417         if (z < 0) z = 0;
1418         if (z > 255) z = 255;
1419         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1420      }
1421      if (k < comp) {
1422         float z = data[i*comp+k] * 255 + 0.5f;
1423         if (z < 0) z = 0;
1424         if (z > 255) z = 255;
1425         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1426      }
1427   }
1428   STBI_FREE(data);
1429   return output;
1430}
1431#endif
1432
1433//////////////////////////////////////////////////////////////////////////////
1434//
1435//  "baseline" JPEG/JFIF decoder
1436//
1437//    simple implementation
1438//      - doesn't support delayed output of y-dimension
1439//      - simple interface (only one output format: 8-bit interleaved RGB)
1440//      - doesn't try to recover corrupt jpegs
1441//      - doesn't allow partial loading, loading multiple at once
1442//      - still fast on x86 (copying globals into locals doesn't help x86)
1443//      - allocates lots of intermediate memory (full size of all components)
1444//        - non-interleaved case requires this anyway
1445//        - allows good upsampling (see next)
1446//    high-quality
1447//      - upsampled channels are bilinearly interpolated, even across blocks
1448//      - quality integer IDCT derived from IJG's 'slow'
1449//    performance
1450//      - fast huffman; reasonable integer IDCT
1451//      - some SIMD kernels for common paths on targets with SSE2/NEON
1452//      - uses a lot of intermediate memory, could cache poorly
1453
1454#ifndef STBI_NO_JPEG
1455
1456// huffman decoding acceleration
1457#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1458
1459typedef struct
1460{
1461   stbi_uc  fast[1 << FAST_BITS];
1462   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1463   stbi__uint16 code[256];
1464   stbi_uc  values[256];
1465   stbi_uc  size[257];
1466   unsigned int maxcode[18];
1467   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1468} stbi__huffman;
1469
1470typedef struct
1471{
1472   stbi__context *s;
1473   stbi__huffman huff_dc[4];
1474   stbi__huffman huff_ac[4];
1475   stbi_uc dequant[4][64];
1476   stbi__int16 fast_ac[4][1 << FAST_BITS];
1477
1478// sizes for components, interleaved MCUs
1479   int img_h_max, img_v_max;
1480   int img_mcu_x, img_mcu_y;
1481   int img_mcu_w, img_mcu_h;
1482
1483// definition of jpeg image component
1484   struct
1485   {
1486      int id;
1487      int h,v;
1488      int tq;
1489      int hd,ha;
1490      int dc_pred;
1491
1492      int x,y,w2,h2;
1493      stbi_uc *data;
1494      void *raw_data, *raw_coeff;
1495      stbi_uc *linebuf;
1496      short   *coeff;   // progressive only
1497      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1498   } img_comp[4];
1499
1500   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1501   int            code_bits;   // number of valid bits
1502   unsigned char  marker;      // marker seen while filling entropy buffer
1503   int            nomore;      // flag if we saw a marker so must stop
1504
1505   int            progressive;
1506   int            spec_start;
1507   int            spec_end;
1508   int            succ_high;
1509   int            succ_low;
1510   int            eob_run;
1511
1512   int scan_n, order[4];
1513   int restart_interval, todo;
1514
1515// kernels
1516   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1517   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1518   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1519} stbi__jpeg;
1520
1521static int stbi__build_huffman(stbi__huffman *h, int *count)
1522{
1523   int i,j,k=0,code;
1524   // build size list for each symbol (from JPEG spec)
1525   for (i=0; i < 16; ++i)
1526      for (j=0; j < count[i]; ++j)
1527         h->size[k++] = (stbi_uc) (i+1);
1528   h->size[k] = 0;
1529
1530   // compute actual symbols (from jpeg spec)
1531   code = 0;
1532   k = 0;
1533   for(j=1; j <= 16; ++j) {
1534      // compute delta to add to code to compute symbol id
1535      h->delta[j] = k - code;
1536      if (h->size[k] == j) {
1537         while (h->size[k] == j)
1538            h->code[k++] = (stbi__uint16) (code++);
1539         if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
1540      }
1541      // compute largest code + 1 for this size, preshifted as needed later
1542      h->maxcode[j] = code << (16-j);
1543      code <<= 1;
1544   }
1545   h->maxcode[j] = 0xffffffff;
1546
1547   // build non-spec acceleration table; 255 is flag for not-accelerated
1548   memset(h->fast, 255, 1 << FAST_BITS);
1549   for (i=0; i < k; ++i) {
1550      int s = h->size[i];
1551      if (s <= FAST_BITS) {
1552         int c = h->code[i] << (FAST_BITS-s);
1553         int m = 1 << (FAST_BITS-s);
1554         for (j=0; j < m; ++j) {
1555            h->fast[c+j] = (stbi_uc) i;
1556         }
1557      }
1558   }
1559   return 1;
1560}
1561
1562// build a table that decodes both magnitude and value of small ACs in
1563// one go.
1564static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1565{
1566   int i;
1567   for (i=0; i < (1 << FAST_BITS); ++i) {
1568      stbi_uc fast = h->fast[i];
1569      fast_ac[i] = 0;
1570      if (fast < 255) {
1571         int rs = h->values[fast];
1572         int run = (rs >> 4) & 15;
1573         int magbits = rs & 15;
1574         int len = h->size[fast];
1575
1576         if (magbits && len + magbits <= FAST_BITS) {
1577            // magnitude code followed by receive_extend code
1578            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1579            int m = 1 << (magbits - 1);
1580            if (k < m) k += (-1 << magbits) + 1;
1581            // if the result is small enough, we can fit it in fast_ac table
1582            if (k >= -128 && k <= 127)
1583               fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1584         }
1585      }
1586   }
1587}
1588
1589static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1590{
1591   do {
1592      int b = j->nomore ? 0 : stbi__get8(j->s);
1593      if (b == 0xff) {
1594         int c = stbi__get8(j->s);
1595         if (c != 0) {
1596            j->marker = (unsigned char) c;
1597            j->nomore = 1;
1598            return;
1599         }
1600      }
1601      j->code_buffer |= b << (24 - j->code_bits);
1602      j->code_bits += 8;
1603   } while (j->code_bits <= 24);
1604}
1605
1606// (1 << n) - 1
1607static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
1608
1609// decode a jpeg huffman value from the bitstream
1610stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
1611{
1612   unsigned int temp;
1613   int c,k;
1614
1615   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1616
1617   // look at the top FAST_BITS and determine what symbol ID it is,
1618   // if the code is <= FAST_BITS
1619   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1620   k = h->fast[c];
1621   if (k < 255) {
1622      int s = h->size[k];
1623      if (s > j->code_bits)
1624         return -1;
1625      j->code_buffer <<= s;
1626      j->code_bits -= s;
1627      return h->values[k];
1628   }
1629
1630   // naive test is to shift the code_buffer down so k bits are
1631   // valid, then test against maxcode. To speed this up, we've
1632   // preshifted maxcode left so that it has (16-k) 0s at the
1633   // end; in other words, regardless of the number of bits, it
1634   // wants to be compared against something shifted to have 16;
1635   // that way we don't need to shift inside the loop.
1636   temp = j->code_buffer >> 16;
1637   for (k=FAST_BITS+1 ; ; ++k)
1638      if (temp < h->maxcode[k])
1639         break;
1640   if (k == 17) {
1641      // error! code not found
1642      j->code_bits -= 16;
1643      return -1;
1644   }
1645
1646   if (k > j->code_bits)
1647      return -1;
1648
1649   // convert the huffman code to the symbol id
1650   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1651   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1652
1653   // convert the id to a symbol
1654   j->code_bits -= k;
1655   j->code_buffer <<= k;
1656   return h->values[c];
1657}
1658
1659// bias[n] = (-1<<n) + 1
1660static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
1661
1662// combined JPEG 'receive' and JPEG 'extend', since baseline
1663// always extends everything it receives.
1664stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
1665{
1666   unsigned int k;
1667   int sgn;
1668   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1669
1670   sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1671   k = stbi_lrot(j->code_buffer, n);
1672   STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
1673   j->code_buffer = k & ~stbi__bmask[n];
1674   k &= stbi__bmask[n];
1675   j->code_bits -= n;
1676   return k + (stbi__jbias[n] & ~sgn);
1677}
1678
1679// get some unsigned bits
1680stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1681{
1682   unsigned int k;
1683   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1684   k = stbi_lrot(j->code_buffer, n);
1685   j->code_buffer = k & ~stbi__bmask[n];
1686   k &= stbi__bmask[n];
1687   j->code_bits -= n;
1688   return k;
1689}
1690
1691stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1692{
1693   unsigned int k;
1694   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1695   k = j->code_buffer;
1696   j->code_buffer <<= 1;
1697   --j->code_bits;
1698   return k & 0x80000000;
1699}
1700
1701// given a value that's at position X in the zigzag stream,
1702// where does it appear in the 8x8 matrix coded as row-major?
1703static stbi_uc stbi__jpeg_dezigzag[64+15] =
1704{
1705    0,  1,  8, 16,  9,  2,  3, 10,
1706   17, 24, 32, 25, 18, 11,  4,  5,
1707   12, 19, 26, 33, 40, 48, 41, 34,
1708   27, 20, 13,  6,  7, 14, 21, 28,
1709   35, 42, 49, 56, 57, 50, 43, 36,
1710   29, 22, 15, 23, 30, 37, 44, 51,
1711   58, 59, 52, 45, 38, 31, 39, 46,
1712   53, 60, 61, 54, 47, 55, 62, 63,
1713   // let corrupt input sample past end
1714   63, 63, 63, 63, 63, 63, 63, 63,
1715   63, 63, 63, 63, 63, 63, 63
1716};
1717
1718// decode one 64-entry block--
1719static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1720{
1721   int diff,dc,k;
1722   int t;
1723
1724   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1725   t = stbi__jpeg_huff_decode(j, hdc);
1726   if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1727
1728   // 0 all the ac values now so we can do it 32-bits at a time
1729   memset(data,0,64*sizeof(data[0]));
1730
1731   diff = t ? stbi__extend_receive(j, t) : 0;
1732   dc = j->img_comp[b].dc_pred + diff;
1733   j->img_comp[b].dc_pred = dc;
1734   data[0] = (short) (dc * dequant[0]);
1735
1736   // decode AC components, see JPEG spec
1737   k = 1;
1738   do {
1739      unsigned int zig;
1740      int c,r,s;
1741      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1742      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1743      r = fac[c];
1744      if (r) { // fast-AC path
1745         k += (r >> 4) & 15; // run
1746         s = r & 15; // combined length
1747         j->code_buffer <<= s;
1748         j->code_bits -= s;
1749         // decode into unzigzag'd location
1750         zig = stbi__jpeg_dezigzag[k++];
1751         data[zig] = (short) ((r >> 8) * dequant[zig]);
1752      } else {
1753         int rs = stbi__jpeg_huff_decode(j, hac);
1754         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1755         s = rs & 15;
1756         r = rs >> 4;
1757         if (s == 0) {
1758            if (rs != 0xf0) break; // end block
1759            k += 16;
1760         } else {
1761            k += r;
1762            // decode into unzigzag'd location
1763            zig = stbi__jpeg_dezigzag[k++];
1764            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
1765         }
1766      }
1767   } while (k < 64);
1768   return 1;
1769}
1770
1771static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
1772{
1773   int diff,dc;
1774   int t;
1775   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1776
1777   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1778
1779   if (j->succ_high == 0) {
1780      // first scan for DC coefficient, must be first
1781      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
1782      t = stbi__jpeg_huff_decode(j, hdc);
1783      diff = t ? stbi__extend_receive(j, t) : 0;
1784
1785      dc = j->img_comp[b].dc_pred + diff;
1786      j->img_comp[b].dc_pred = dc;
1787      data[0] = (short) (dc << j->succ_low);
1788   } else {
1789      // refinement scan for DC coefficient
1790      if (stbi__jpeg_get_bit(j))
1791         data[0] += (short) (1 << j->succ_low);
1792   }
1793   return 1;
1794}
1795
1796// @OPTIMIZE: store non-zigzagged during the decode passes,
1797// and only de-zigzag when dequantizing
1798static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
1799{
1800   int k;
1801   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1802
1803   if (j->succ_high == 0) {
1804      int shift = j->succ_low;
1805
1806      if (j->eob_run) {
1807         --j->eob_run;
1808         return 1;
1809      }
1810
1811      k = j->spec_start;
1812      do {
1813         unsigned int zig;
1814         int c,r,s;
1815         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1816         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1817         r = fac[c];
1818         if (r) { // fast-AC path
1819            k += (r >> 4) & 15; // run
1820            s = r & 15; // combined length
1821            j->code_buffer <<= s;
1822            j->code_bits -= s;
1823            zig = stbi__jpeg_dezigzag[k++];
1824            data[zig] = (short) ((r >> 8) << shift);
1825         } else {
1826            int rs = stbi__jpeg_huff_decode(j, hac);
1827            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1828            s = rs & 15;
1829            r = rs >> 4;
1830            if (s == 0) {
1831               if (r < 15) {
1832                  j->eob_run = (1 << r);
1833                  if (r)
1834                     j->eob_run += stbi__jpeg_get_bits(j, r);
1835                  --j->eob_run;
1836                  break;
1837               }
1838               k += 16;
1839            } else {
1840               k += r;
1841               zig = stbi__jpeg_dezigzag[k++];
1842               data[zig] = (short) (stbi__extend_receive(j,s) << shift);
1843            }
1844         }
1845      } while (k <= j->spec_end);
1846   } else {
1847      // refinement scan for these AC coefficients
1848
1849      short bit = (short) (1 << j->succ_low);
1850
1851      if (j->eob_run) {
1852         --j->eob_run;
1853         for (k = j->spec_start; k <= j->spec_end; ++k) {
1854            short *p = &data[stbi__jpeg_dezigzag[k]];
1855            if (*p != 0)
1856               if (stbi__jpeg_get_bit(j))
1857                  if ((*p & bit)==0) {
1858                     if (*p > 0)
1859                        *p += bit;
1860                     else
1861                        *p -= bit;
1862                  }
1863         }
1864      } else {
1865         k = j->spec_start;
1866         do {
1867            int r,s;
1868            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
1869            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1870            s = rs & 15;
1871            r = rs >> 4;
1872            if (s == 0) {
1873               if (r < 15) {
1874                  j->eob_run = (1 << r) - 1;
1875                  if (r)
1876                     j->eob_run += stbi__jpeg_get_bits(j, r);
1877                  r = 64; // force end of block
1878               } else {
1879                  // r=15 s=0 should write 16 0s, so we just do
1880                  // a run of 15 0s and then write s (which is 0),
1881                  // so we don't have to do anything special here
1882               }
1883            } else {
1884               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
1885               // sign bit
1886               if (stbi__jpeg_get_bit(j))
1887                  s = bit;
1888               else
1889                  s = -bit;
1890            }
1891
1892            // advance by r
1893            while (k <= j->spec_end) {
1894               short *p = &data[stbi__jpeg_dezigzag[k++]];
1895               if (*p != 0) {
1896                  if (stbi__jpeg_get_bit(j))
1897                     if ((*p & bit)==0) {
1898                        if (*p > 0)
1899                           *p += bit;
1900                        else
1901                           *p -= bit;
1902                     }
1903               } else {
1904                  if (r == 0) {
1905                     *p = (short) s;
1906                     break;
1907                  }
1908                  --r;
1909               }
1910            }
1911         } while (k <= j->spec_end);
1912      }
1913   }
1914   return 1;
1915}
1916
1917// take a -128..127 value and stbi__clamp it and convert to 0..255
1918stbi_inline static stbi_uc stbi__clamp(int x)
1919{
1920   // trick to use a single test to catch both cases
1921   if ((unsigned int) x > 255) {
1922      if (x < 0) return 0;
1923      if (x > 255) return 255;
1924   }
1925   return (stbi_uc) x;
1926}
1927
1928#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
1929#define stbi__fsh(x)  ((x) << 12)
1930
1931// derived from jidctint -- DCT_ISLOW
1932#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
1933   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
1934   p2 = s2;                                    \
1935   p3 = s6;                                    \
1936   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
1937   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
1938   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
1939   p2 = s0;                                    \
1940   p3 = s4;                                    \
1941   t0 = stbi__fsh(p2+p3);                      \
1942   t1 = stbi__fsh(p2-p3);                      \
1943   x0 = t0+t3;                                 \
1944   x3 = t0-t3;                                 \
1945   x1 = t1+t2;                                 \
1946   x2 = t1-t2;                                 \
1947   t0 = s7;                                    \
1948   t1 = s5;                                    \
1949   t2 = s3;                                    \
1950   t3 = s1;                                    \
1951   p3 = t0+t2;                                 \
1952   p4 = t1+t3;                                 \
1953   p1 = t0+t3;                                 \
1954   p2 = t1+t2;                                 \
1955   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
1956   t0 = t0*stbi__f2f( 0.298631336f);           \
1957   t1 = t1*stbi__f2f( 2.053119869f);           \
1958   t2 = t2*stbi__f2f( 3.072711026f);           \
1959   t3 = t3*stbi__f2f( 1.501321110f);           \
1960   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
1961   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
1962   p3 = p3*stbi__f2f(-1.961570560f);           \
1963   p4 = p4*stbi__f2f(-0.390180644f);           \
1964   t3 += p1+p4;                                \
1965   t2 += p2+p3;                                \
1966   t1 += p2+p4;                                \
1967   t0 += p1+p3;
1968
1969static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
1970{
1971   int i,val[64],*v=val;
1972   stbi_uc *o;
1973   short *d = data;
1974
1975   // columns
1976   for (i=0; i < 8; ++i,++d, ++v) {
1977      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
1978      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
1979           && d[40]==0 && d[48]==0 && d[56]==0) {
1980         //    no shortcut                 0     seconds
1981         //    (1|2|3|4|5|6|7)==0          0     seconds
1982         //    all separate               -0.047 seconds
1983         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
1984         int dcterm = d[0] << 2;
1985         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
1986      } else {
1987         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
1988         // constants scaled things up by 1<<12; let's bring them back
1989         // down, but keep 2 extra bits of precision
1990         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
1991         v[ 0] = (x0+t3) >> 10;
1992         v[56] = (x0-t3) >> 10;
1993         v[ 8] = (x1+t2) >> 10;
1994         v[48] = (x1-t2) >> 10;
1995         v[16] = (x2+t1) >> 10;
1996         v[40] = (x2-t1) >> 10;
1997         v[24] = (x3+t0) >> 10;
1998         v[32] = (x3-t0) >> 10;
1999      }
2000   }
2001
2002   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2003      // no fast case since the first 1D IDCT spread components out
2004      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2005      // constants scaled things up by 1<<12, plus we had 1<<2 from first
2006      // loop, plus horizontal and vertical each scale by sqrt(8) so together
2007      // we've got an extra 1<<3, so 1<<17 total we need to remove.
2008      // so we want to round that, which means adding 0.5 * 1<<17,
2009      // aka 65536. Also, we'll end up with -128 to 127 that we want
2010      // to encode as 0..255 by adding 128, so we'll add that before the shift
2011      x0 += 65536 + (128<<17);
2012      x1 += 65536 + (128<<17);
2013      x2 += 65536 + (128<<17);
2014      x3 += 65536 + (128<<17);
2015      // tried computing the shifts into temps, or'ing the temps to see
2016      // if any were out of range, but that was slower
2017      o[0] = stbi__clamp((x0+t3) >> 17);
2018      o[7] = stbi__clamp((x0-t3) >> 17);
2019      o[1] = stbi__clamp((x1+t2) >> 17);
2020      o[6] = stbi__clamp((x1-t2) >> 17);
2021      o[2] = stbi__clamp((x2+t1) >> 17);
2022      o[5] = stbi__clamp((x2-t1) >> 17);
2023      o[3] = stbi__clamp((x3+t0) >> 17);
2024      o[4] = stbi__clamp((x3-t0) >> 17);
2025   }
2026}
2027
2028#ifdef STBI_SSE2
2029// sse2 integer IDCT. not the fastest possible implementation but it
2030// produces bit-identical results to the generic C version so it's
2031// fully "transparent".
2032static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2033{
2034   // This is constructed to match our regular (generic) integer IDCT exactly.
2035   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2036   __m128i tmp;
2037
2038   // dot product constant: even elems=x, odd elems=y
2039   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2040
2041   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
2042   // out(1) = c1[even]*x + c1[odd]*y
2043   #define dct_rot(out0,out1, x,y,c0,c1) \
2044      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2045      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2046      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2047      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2048      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2049      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2050
2051   // out = in << 12  (in 16-bit, out 32-bit)
2052   #define dct_widen(out, in) \
2053      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2054      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2055
2056   // wide add
2057   #define dct_wadd(out, a, b) \
2058      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2059      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2060
2061   // wide sub
2062   #define dct_wsub(out, a, b) \
2063      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2064      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2065
2066   // butterfly a/b, add bias, then shift by "s" and pack
2067   #define dct_bfly32o(out0, out1, a,b,bias,s) \
2068      { \
2069         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2070         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2071         dct_wadd(sum, abiased, b); \
2072         dct_wsub(dif, abiased, b); \
2073         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2074         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2075      }
2076
2077   // 8-bit interleave step (for transposes)
2078   #define dct_interleave8(a, b) \
2079      tmp = a; \
2080      a = _mm_unpacklo_epi8(a, b); \
2081      b = _mm_unpackhi_epi8(tmp, b)
2082
2083   // 16-bit interleave step (for transposes)
2084   #define dct_interleave16(a, b) \
2085      tmp = a; \
2086      a = _mm_unpacklo_epi16(a, b); \
2087      b = _mm_unpackhi_epi16(tmp, b)
2088
2089   #define dct_pass(bias,shift) \
2090      { \
2091         /* even part */ \
2092         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2093         __m128i sum04 = _mm_add_epi16(row0, row4); \
2094         __m128i dif04 = _mm_sub_epi16(row0, row4); \
2095         dct_widen(t0e, sum04); \
2096         dct_widen(t1e, dif04); \
2097         dct_wadd(x0, t0e, t3e); \
2098         dct_wsub(x3, t0e, t3e); \
2099         dct_wadd(x1, t1e, t2e); \
2100         dct_wsub(x2, t1e, t2e); \
2101         /* odd part */ \
2102         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2103         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2104         __m128i sum17 = _mm_add_epi16(row1, row7); \
2105         __m128i sum35 = _mm_add_epi16(row3, row5); \
2106         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2107         dct_wadd(x4, y0o, y4o); \
2108         dct_wadd(x5, y1o, y5o); \
2109         dct_wadd(x6, y2o, y5o); \
2110         dct_wadd(x7, y3o, y4o); \
2111         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2112         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2113         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2114         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2115      }
2116
2117   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2118   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2119   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2120   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2121   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2122   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2123   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2124   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2125
2126   // rounding biases in column/row passes, see stbi__idct_block for explanation.
2127   __m128i bias_0 = _mm_set1_epi32(512);
2128   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2129
2130   // load
2131   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2132   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2133   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2134   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2135   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2136   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2137   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2138   row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2139
2140   // column pass
2141   dct_pass(bias_0, 10);
2142
2143   {
2144      // 16bit 8x8 transpose pass 1
2145      dct_interleave16(row0, row4);
2146      dct_interleave16(row1, row5);
2147      dct_interleave16(row2, row6);
2148      dct_interleave16(row3, row7);
2149
2150      // transpose pass 2
2151      dct_interleave16(row0, row2);
2152      dct_interleave16(row1, row3);
2153      dct_interleave16(row4, row6);
2154      dct_interleave16(row5, row7);
2155
2156      // transpose pass 3
2157      dct_interleave16(row0, row1);
2158      dct_interleave16(row2, row3);
2159      dct_interleave16(row4, row5);
2160      dct_interleave16(row6, row7);
2161   }
2162
2163   // row pass
2164   dct_pass(bias_1, 17);
2165
2166   {
2167      // pack
2168      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2169      __m128i p1 = _mm_packus_epi16(row2, row3);
2170      __m128i p2 = _mm_packus_epi16(row4, row5);
2171      __m128i p3 = _mm_packus_epi16(row6, row7);
2172
2173      // 8bit 8x8 transpose pass 1
2174      dct_interleave8(p0, p2); // a0e0a1e1...
2175      dct_interleave8(p1, p3); // c0g0c1g1...
2176
2177      // transpose pass 2
2178      dct_interleave8(p0, p1); // a0c0e0g0...
2179      dct_interleave8(p2, p3); // b0d0f0h0...
2180
2181      // transpose pass 3
2182      dct_interleave8(p0, p2); // a0b0c0d0...
2183      dct_interleave8(p1, p3); // a4b4c4d4...
2184
2185      // store
2186      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2187      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2188      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2189      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2190      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2191      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2192      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2193      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2194   }
2195
2196#undef dct_const
2197#undef dct_rot
2198#undef dct_widen
2199#undef dct_wadd
2200#undef dct_wsub
2201#undef dct_bfly32o
2202#undef dct_interleave8
2203#undef dct_interleave16
2204#undef dct_pass
2205}
2206
2207#endif // STBI_SSE2
2208
2209#ifdef STBI_NEON
2210
2211// NEON integer IDCT. should produce bit-identical
2212// results to the generic C version.
2213static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2214{
2215   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2216
2217   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2218   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2219   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2220   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2221   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2222   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2223   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2224   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2225   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2226   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2227   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2228   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2229
2230#define dct_long_mul(out, inq, coeff) \
2231   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2232   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2233
2234#define dct_long_mac(out, acc, inq, coeff) \
2235   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2236   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2237
2238#define dct_widen(out, inq) \
2239   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2240   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2241
2242// wide add
2243#define dct_wadd(out, a, b) \
2244   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2245   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2246
2247// wide sub
2248#define dct_wsub(out, a, b) \
2249   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2250   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2251
2252// butterfly a/b, then shift using "shiftop" by "s" and pack
2253#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2254   { \
2255      dct_wadd(sum, a, b); \
2256      dct_wsub(dif, a, b); \
2257      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2258      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2259   }
2260
2261#define dct_pass(shiftop, shift) \
2262   { \
2263      /* even part */ \
2264      int16x8_t sum26 = vaddq_s16(row2, row6); \
2265      dct_long_mul(p1e, sum26, rot0_0); \
2266      dct_long_mac(t2e, p1e, row6, rot0_1); \
2267      dct_long_mac(t3e, p1e, row2, rot0_2); \
2268      int16x8_t sum04 = vaddq_s16(row0, row4); \
2269      int16x8_t dif04 = vsubq_s16(row0, row4); \
2270      dct_widen(t0e, sum04); \
2271      dct_widen(t1e, dif04); \
2272      dct_wadd(x0, t0e, t3e); \
2273      dct_wsub(x3, t0e, t3e); \
2274      dct_wadd(x1, t1e, t2e); \
2275      dct_wsub(x2, t1e, t2e); \
2276      /* odd part */ \
2277      int16x8_t sum15 = vaddq_s16(row1, row5); \
2278      int16x8_t sum17 = vaddq_s16(row1, row7); \
2279      int16x8_t sum35 = vaddq_s16(row3, row5); \
2280      int16x8_t sum37 = vaddq_s16(row3, row7); \
2281      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2282      dct_long_mul(p5o, sumodd, rot1_0); \
2283      dct_long_mac(p1o, p5o, sum17, rot1_1); \
2284      dct_long_mac(p2o, p5o, sum35, rot1_2); \
2285      dct_long_mul(p3o, sum37, rot2_0); \
2286      dct_long_mul(p4o, sum15, rot2_1); \
2287      dct_wadd(sump13o, p1o, p3o); \
2288      dct_wadd(sump24o, p2o, p4o); \
2289      dct_wadd(sump23o, p2o, p3o); \
2290      dct_wadd(sump14o, p1o, p4o); \
2291      dct_long_mac(x4, sump13o, row7, rot3_0); \
2292      dct_long_mac(x5, sump24o, row5, rot3_1); \
2293      dct_long_mac(x6, sump23o, row3, rot3_2); \
2294      dct_long_mac(x7, sump14o, row1, rot3_3); \
2295      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2296      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2297      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2298      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2299   }
2300
2301   // load
2302   row0 = vld1q_s16(data + 0*8);
2303   row1 = vld1q_s16(data + 1*8);
2304   row2 = vld1q_s16(data + 2*8);
2305   row3 = vld1q_s16(data + 3*8);
2306   row4 = vld1q_s16(data + 4*8);
2307   row5 = vld1q_s16(data + 5*8);
2308   row6 = vld1q_s16(data + 6*8);
2309   row7 = vld1q_s16(data + 7*8);
2310
2311   // add DC bias
2312   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2313
2314   // column pass
2315   dct_pass(vrshrn_n_s32, 10);
2316
2317   // 16bit 8x8 transpose
2318   {
2319// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2320// whether compilers actually get this is another story, sadly.
2321#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2322#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2323#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2324
2325      // pass 1
2326      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2327      dct_trn16(row2, row3);
2328      dct_trn16(row4, row5);
2329      dct_trn16(row6, row7);
2330
2331      // pass 2
2332      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2333      dct_trn32(row1, row3);
2334      dct_trn32(row4, row6);
2335      dct_trn32(row5, row7);
2336
2337      // pass 3
2338      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2339      dct_trn64(row1, row5);
2340      dct_trn64(row2, row6);
2341      dct_trn64(row3, row7);
2342
2343#undef dct_trn16
2344#undef dct_trn32
2345#undef dct_trn64
2346   }
2347
2348   // row pass
2349   // vrshrn_n_s32 only supports shifts up to 16, we need
2350   // 17. so do a non-rounding shift of 16 first then follow
2351   // up with a rounding shift by 1.
2352   dct_pass(vshrn_n_s32, 16);
2353
2354   {
2355      // pack and round
2356      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2357      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2358      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2359      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2360      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2361      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2362      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2363      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2364
2365      // again, these can translate into one instruction, but often don't.
2366#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2367#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2368#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2369
2370      // sadly can't use interleaved stores here since we only write
2371      // 8 bytes to each scan line!
2372
2373      // 8x8 8-bit transpose pass 1
2374      dct_trn8_8(p0, p1);
2375      dct_trn8_8(p2, p3);
2376      dct_trn8_8(p4, p5);
2377      dct_trn8_8(p6, p7);
2378
2379      // pass 2
2380      dct_trn8_16(p0, p2);
2381      dct_trn8_16(p1, p3);
2382      dct_trn8_16(p4, p6);
2383      dct_trn8_16(p5, p7);
2384
2385      // pass 3
2386      dct_trn8_32(p0, p4);
2387      dct_trn8_32(p1, p5);
2388      dct_trn8_32(p2, p6);
2389      dct_trn8_32(p3, p7);
2390
2391      // store
2392      vst1_u8(out, p0); out += out_stride;
2393      vst1_u8(out, p1); out += out_stride;
2394      vst1_u8(out, p2); out += out_stride;
2395      vst1_u8(out, p3); out += out_stride;
2396      vst1_u8(out, p4); out += out_stride;
2397      vst1_u8(out, p5); out += out_stride;
2398      vst1_u8(out, p6); out += out_stride;
2399      vst1_u8(out, p7);
2400
2401#undef dct_trn8_8
2402#undef dct_trn8_16
2403#undef dct_trn8_32
2404   }
2405
2406#undef dct_long_mul
2407#undef dct_long_mac
2408#undef dct_widen
2409#undef dct_wadd
2410#undef dct_wsub
2411#undef dct_bfly32o
2412#undef dct_pass
2413}
2414
2415#endif // STBI_NEON
2416
2417#define STBI__MARKER_none  0xff
2418// if there's a pending marker from the entropy stream, return that
2419// otherwise, fetch from the stream and get a marker. if there's no
2420// marker, return 0xff, which is never a valid marker value
2421static stbi_uc stbi__get_marker(stbi__jpeg *j)
2422{
2423   stbi_uc x;
2424   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2425   x = stbi__get8(j->s);
2426   if (x != 0xff) return STBI__MARKER_none;
2427   while (x == 0xff)
2428      x = stbi__get8(j->s);
2429   return x;
2430}
2431
2432// in each scan, we'll have scan_n components, and the order
2433// of the components is specified by order[]
2434#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
2435
2436// after a restart interval, stbi__jpeg_reset the entropy decoder and
2437// the dc prediction
2438static void stbi__jpeg_reset(stbi__jpeg *j)
2439{
2440   j->code_bits = 0;
2441   j->code_buffer = 0;
2442   j->nomore = 0;
2443   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2444   j->marker = STBI__MARKER_none;
2445   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2446   j->eob_run = 0;
2447   // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2448   // since we don't even allow 1<<30 pixels
2449}
2450
2451static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2452{
2453   stbi__jpeg_reset(z);
2454   if (!z->progressive) {
2455      if (z->scan_n == 1) {
2456         int i,j;
2457         STBI_SIMD_ALIGN(short, data[64]);
2458         int n = z->order[0];
2459         // non-interleaved data, we just need to process one block at a time,
2460         // in trivial scanline order
2461         // number of blocks to do just depends on how many actual "pixels" this
2462         // component has, independent of interleaved MCU blocking and such
2463         int w = (z->img_comp[n].x+7) >> 3;
2464         int h = (z->img_comp[n].y+7) >> 3;
2465         for (j=0; j < h; ++j) {
2466            for (i=0; i < w; ++i) {
2467               int ha = z->img_comp[n].ha;
2468               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2469               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2470               // every data block is an MCU, so countdown the restart interval
2471               if (--z->todo <= 0) {
2472                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2473                  // if it's NOT a restart, then just bail, so we get corrupt data
2474                  // rather than no data
2475                  if (!STBI__RESTART(z->marker)) return 1;
2476                  stbi__jpeg_reset(z);
2477               }
2478            }
2479         }
2480         return 1;
2481      } else { // interleaved
2482         int i,j,k,x,y;
2483         STBI_SIMD_ALIGN(short, data[64]);
2484         for (j=0; j < z->img_mcu_y; ++j) {
2485            for (i=0; i < z->img_mcu_x; ++i) {
2486               // scan an interleaved mcu... process scan_n components in order
2487               for (k=0; k < z->scan_n; ++k) {
2488                  int n = z->order[k];
2489                  // scan out an mcu's worth of this component; that's just determined
2490                  // by the basic H and V specified for the component
2491                  for (y=0; y < z->img_comp[n].v; ++y) {
2492                     for (x=0; x < z->img_comp[n].h; ++x) {
2493                        int x2 = (i*z->img_comp[n].h + x)*8;
2494                        int y2 = (j*z->img_comp[n].v + y)*8;
2495                        int ha = z->img_comp[n].ha;
2496                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2497                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2498                     }
2499                  }
2500               }
2501               // after all interleaved components, that's an interleaved MCU,
2502               // so now count down the restart interval
2503               if (--z->todo <= 0) {
2504                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2505                  if (!STBI__RESTART(z->marker)) return 1;
2506                  stbi__jpeg_reset(z);
2507               }
2508            }
2509         }
2510         return 1;
2511      }
2512   } else {
2513      if (z->scan_n == 1) {
2514         int i,j;
2515         int n = z->order[0];
2516         // non-interleaved data, we just need to process one block at a time,
2517         // in trivial scanline order
2518         // number of blocks to do just depends on how many actual "pixels" this
2519         // component has, independent of interleaved MCU blocking and such
2520         int w = (z->img_comp[n].x+7) >> 3;
2521         int h = (z->img_comp[n].y+7) >> 3;
2522         for (j=0; j < h; ++j) {
2523            for (i=0; i < w; ++i) {
2524               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2525               if (z->spec_start == 0) {
2526                  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2527                     return 0;
2528               } else {
2529                  int ha = z->img_comp[n].ha;
2530                  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2531                     return 0;
2532               }
2533               // every data block is an MCU, so countdown the restart interval
2534               if (--z->todo <= 0) {
2535                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2536                  if (!STBI__RESTART(z->marker)) return 1;
2537                  stbi__jpeg_reset(z);
2538               }
2539            }
2540         }
2541         return 1;
2542      } else { // interleaved
2543         int i,j,k,x,y;
2544         for (j=0; j < z->img_mcu_y; ++j) {
2545            for (i=0; i < z->img_mcu_x; ++i) {
2546               // scan an interleaved mcu... process scan_n components in order
2547               for (k=0; k < z->scan_n; ++k) {
2548                  int n = z->order[k];
2549                  // scan out an mcu's worth of this component; that's just determined
2550                  // by the basic H and V specified for the component
2551                  for (y=0; y < z->img_comp[n].v; ++y) {
2552                     for (x=0; x < z->img_comp[n].h; ++x) {
2553                        int x2 = (i*z->img_comp[n].h + x);
2554                        int y2 = (j*z->img_comp[n].v + y);
2555                        short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2556                        if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2557                           return 0;
2558                     }
2559                  }
2560               }
2561               // after all interleaved components, that's an interleaved MCU,
2562               // so now count down the restart interval
2563               if (--z->todo <= 0) {
2564                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2565                  if (!STBI__RESTART(z->marker)) return 1;
2566                  stbi__jpeg_reset(z);
2567               }
2568            }
2569         }
2570         return 1;
2571      }
2572   }
2573}
2574
2575static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2576{
2577   int i;
2578   for (i=0; i < 64; ++i)
2579      data[i] *= dequant[i];
2580}
2581
2582static void stbi__jpeg_finish(stbi__jpeg *z)
2583{
2584   if (z->progressive) {
2585      // dequantize and idct the data
2586      int i,j,n;
2587      for (n=0; n < z->s->img_n; ++n) {
2588         int w = (z->img_comp[n].x+7) >> 3;
2589         int h = (z->img_comp[n].y+7) >> 3;
2590         for (j=0; j < h; ++j) {
2591            for (i=0; i < w; ++i) {
2592               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2593               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2594               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2595            }
2596         }
2597      }
2598   }
2599}
2600
2601static int stbi__process_marker(stbi__jpeg *z, int m)
2602{
2603   int L;
2604   switch (m) {
2605      case STBI__MARKER_none: // no marker found
2606         return stbi__err("expected marker","Corrupt JPEG");
2607
2608      case 0xDD: // DRI - specify restart interval
2609         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2610         z->restart_interval = stbi__get16be(z->s);
2611         return 1;
2612
2613      case 0xDB: // DQT - define quantization table
2614         L = stbi__get16be(z->s)-2;
2615         while (L > 0) {
2616            int q = stbi__get8(z->s);
2617            int p = q >> 4;
2618            int t = q & 15,i;
2619            if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2620            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2621            for (i=0; i < 64; ++i)
2622               z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2623            L -= 65;
2624         }
2625         return L==0;
2626
2627      case 0xC4: // DHT - define huffman table
2628         L = stbi__get16be(z->s)-2;
2629         while (L > 0) {
2630            stbi_uc *v;
2631            int sizes[16],i,n=0;
2632            int q = stbi__get8(z->s);
2633            int tc = q >> 4;
2634            int th = q & 15;
2635            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2636            for (i=0; i < 16; ++i) {
2637               sizes[i] = stbi__get8(z->s);
2638               n += sizes[i];
2639            }
2640            L -= 17;
2641            if (tc == 0) {
2642               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2643               v = z->huff_dc[th].values;
2644            } else {
2645               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2646               v = z->huff_ac[th].values;
2647            }
2648            for (i=0; i < n; ++i)
2649               v[i] = stbi__get8(z->s);
2650            if (tc != 0)
2651               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2652            L -= n;
2653         }
2654         return L==0;
2655   }
2656   // check for comment block or APP blocks
2657   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2658      stbi__skip(z->s, stbi__get16be(z->s)-2);
2659      return 1;
2660   }
2661   return 0;
2662}
2663
2664// after we see SOS
2665static int stbi__process_scan_header(stbi__jpeg *z)
2666{
2667   int i;
2668   int Ls = stbi__get16be(z->s);
2669   z->scan_n = stbi__get8(z->s);
2670   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2671   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2672   for (i=0; i < z->scan_n; ++i) {
2673      int id = stbi__get8(z->s), which;
2674      int q = stbi__get8(z->s);
2675      for (which = 0; which < z->s->img_n; ++which)
2676         if (z->img_comp[which].id == id)
2677            break;
2678      if (which == z->s->img_n) return 0; // no match
2679      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2680      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2681      z->order[i] = which;
2682   }
2683
2684   {
2685      int aa;
2686      z->spec_start = stbi__get8(z->s);
2687      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
2688      aa = stbi__get8(z->s);
2689      z->succ_high = (aa >> 4);
2690      z->succ_low  = (aa & 15);
2691      if (z->progressive) {
2692         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2693            return stbi__err("bad SOS", "Corrupt JPEG");
2694      } else {
2695         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2696         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2697         z->spec_end = 63;
2698      }
2699   }
2700
2701   return 1;
2702}
2703
2704static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2705{
2706   stbi__context *s = z->s;
2707   int Lf,p,i,q, h_max=1,v_max=1,c;
2708   Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2709   p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2710   s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2711   s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2712   c = stbi__get8(s);
2713   if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
2714   s->img_n = c;
2715   for (i=0; i < c; ++i) {
2716      z->img_comp[i].data = NULL;
2717      z->img_comp[i].linebuf = NULL;
2718   }
2719
2720   if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
2721
2722   for (i=0; i < s->img_n; ++i) {
2723      z->img_comp[i].id = stbi__get8(s);
2724      if (z->img_comp[i].id != i+1)   // JFIF requires
2725         if (z->img_comp[i].id != i)  // some version of jpegtran outputs non-JFIF-compliant files!
2726            return stbi__err("bad component ID","Corrupt JPEG");
2727      q = stbi__get8(s);
2728      z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
2729      z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
2730      z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
2731   }
2732
2733   if (scan != STBI__SCAN_load) return 1;
2734
2735   if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2736
2737   for (i=0; i < s->img_n; ++i) {
2738      if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
2739      if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
2740   }
2741
2742   // compute interleaved mcu info
2743   z->img_h_max = h_max;
2744   z->img_v_max = v_max;
2745   z->img_mcu_w = h_max * 8;
2746   z->img_mcu_h = v_max * 8;
2747   z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
2748   z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
2749
2750   for (i=0; i < s->img_n; ++i) {
2751      // number of effective pixels (e.g. for non-interleaved MCU)
2752      z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
2753      z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
2754      // to simplify generation, we'll allocate enough memory to decode
2755      // the bogus oversized data from using interleaved MCUs and their
2756      // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
2757      // discard the extra data until colorspace conversion
2758      z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
2759      z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
2760      z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2761
2762      if (z->img_comp[i].raw_data == NULL) {
2763         for(--i; i >= 0; --i) {
2764            STBI_FREE(z->img_comp[i].raw_data);
2765            z->img_comp[i].raw_data = NULL;
2766         }
2767         return stbi__err("outofmem", "Out of memory");
2768      }
2769      // align blocks for idct using mmx/sse
2770      z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
2771      z->img_comp[i].linebuf = NULL;
2772      if (z->progressive) {
2773         z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2774         z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2775         z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
2776         z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2777      } else {
2778         z->img_comp[i].coeff = 0;
2779         z->img_comp[i].raw_coeff = 0;
2780      }
2781   }
2782
2783   return 1;
2784}
2785
2786// use comparisons since in some cases we handle more than one case (e.g. SOF)
2787#define stbi__DNL(x)         ((x) == 0xdc)
2788#define stbi__SOI(x)         ((x) == 0xd8)
2789#define stbi__EOI(x)         ((x) == 0xd9)
2790#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
2791#define stbi__SOS(x)         ((x) == 0xda)
2792
2793#define stbi__SOF_progressive(x)   ((x) == 0xc2)
2794
2795static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
2796{
2797   int m;
2798   z->marker = STBI__MARKER_none; // initialize cached marker to empty
2799   m = stbi__get_marker(z);
2800   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2801   if (scan == STBI__SCAN_type) return 1;
2802   m = stbi__get_marker(z);
2803   while (!stbi__SOF(m)) {
2804      if (!stbi__process_marker(z,m)) return 0;
2805      m = stbi__get_marker(z);
2806      while (m == STBI__MARKER_none) {
2807         // some files have extra padding after their blocks, so ok, we'll scan
2808         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
2809         m = stbi__get_marker(z);
2810      }
2811   }
2812   z->progressive = stbi__SOF_progressive(m);
2813   if (!stbi__process_frame_header(z, scan)) return 0;
2814   return 1;
2815}
2816
2817// decode image to YCbCr format
2818static int stbi__decode_jpeg_image(stbi__jpeg *j)
2819{
2820   int m;
2821   for (m = 0; m < 4; m++) {
2822      j->img_comp[m].raw_data = NULL;
2823      j->img_comp[m].raw_coeff = NULL;
2824   }
2825   j->restart_interval = 0;
2826   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
2827   m = stbi__get_marker(j);
2828   while (!stbi__EOI(m)) {
2829      if (stbi__SOS(m)) {
2830         if (!stbi__process_scan_header(j)) return 0;
2831         if (!stbi__parse_entropy_coded_data(j)) return 0;
2832         if (j->marker == STBI__MARKER_none ) {
2833            // handle 0s at the end of image data from IP Kamera 9060
2834            while (!stbi__at_eof(j->s)) {
2835               int x = stbi__get8(j->s);
2836               if (x == 255) {
2837                  j->marker = stbi__get8(j->s);
2838                  break;
2839               } else if (x != 0) {
2840                  return stbi__err("junk before marker", "Corrupt JPEG");
2841               }
2842            }
2843            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
2844         }
2845      } else {
2846         if (!stbi__process_marker(j, m)) return 0;
2847      }
2848      m = stbi__get_marker(j);
2849   }
2850   if (j->progressive)
2851      stbi__jpeg_finish(j);
2852   return 1;
2853}
2854
2855// static jfif-centered resampling (across block boundaries)
2856
2857typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
2858                                    int w, int hs);
2859
2860#define stbi__div4(x) ((stbi_uc) ((x) >> 2))
2861
2862static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2863{
2864   STBI_NOTUSED(out);
2865   STBI_NOTUSED(in_far);
2866   STBI_NOTUSED(w);
2867   STBI_NOTUSED(hs);
2868   return in_near;
2869}
2870
2871static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2872{
2873   // need to generate two samples vertically for every one in input
2874   int i;
2875   STBI_NOTUSED(hs);
2876   for (i=0; i < w; ++i)
2877      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
2878   return out;
2879}
2880
2881static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2882{
2883   // need to generate two samples horizontally for every one in input
2884   int i;
2885   stbi_uc *input = in_near;
2886
2887   if (w == 1) {
2888      // if only one sample, can't do any interpolation
2889      out[0] = out[1] = input[0];
2890      return out;
2891   }
2892
2893   out[0] = input[0];
2894   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
2895   for (i=1; i < w-1; ++i) {
2896      int n = 3*input[i]+2;
2897      out[i*2+0] = stbi__div4(n+input[i-1]);
2898      out[i*2+1] = stbi__div4(n+input[i+1]);
2899   }
2900   out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
2901   out[i*2+1] = input[w-1];
2902
2903   STBI_NOTUSED(in_far);
2904   STBI_NOTUSED(hs);
2905
2906   return out;
2907}
2908
2909#define stbi__div16(x) ((stbi_uc) ((x) >> 4))
2910
2911static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2912{
2913   // need to generate 2x2 samples for every one in input
2914   int i,t0,t1;
2915   if (w == 1) {
2916      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2917      return out;
2918   }
2919
2920   t1 = 3*in_near[0] + in_far[0];
2921   out[0] = stbi__div4(t1+2);
2922   for (i=1; i < w; ++i) {
2923      t0 = t1;
2924      t1 = 3*in_near[i]+in_far[i];
2925      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
2926      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
2927   }
2928   out[w*2-1] = stbi__div4(t1+2);
2929
2930   STBI_NOTUSED(hs);
2931
2932   return out;
2933}
2934
2935#if defined(STBI_SSE2) || defined(STBI_NEON)
2936static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2937{
2938   // need to generate 2x2 samples for every one in input
2939   int i=0,t0,t1;
2940
2941   if (w == 1) {
2942      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2943      return out;
2944   }
2945
2946   t1 = 3*in_near[0] + in_far[0];
2947   // process groups of 8 pixels for as long as we can.
2948   // note we can't handle the last pixel in a row in this loop
2949   // because we need to handle the filter boundary conditions.
2950   for (; i < ((w-1) & ~7); i += 8) {
2951#if defined(STBI_SSE2)
2952      // load and perform the vertical filtering pass
2953      // this uses 3*x + y = 4*x + (y - x)
2954      __m128i zero  = _mm_setzero_si128();
2955      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
2956      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
2957      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
2958      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
2959      __m128i diff  = _mm_sub_epi16(farw, nearw);
2960      __m128i nears = _mm_slli_epi16(nearw, 2);
2961      __m128i curr  = _mm_add_epi16(nears, diff); // current row
2962
2963      // horizontal filter works the same based on shifted vers of current
2964      // row. "prev" is current row shifted right by 1 pixel; we need to
2965      // insert the previous pixel value (from t1).
2966      // "next" is current row shifted left by 1 pixel, with first pixel
2967      // of next block of 8 pixels added in.
2968      __m128i prv0 = _mm_slli_si128(curr, 2);
2969      __m128i nxt0 = _mm_srli_si128(curr, 2);
2970      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
2971      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
2972
2973      // horizontal filter, polyphase implementation since it's convenient:
2974      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
2975      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
2976      // note the shared term.
2977      __m128i bias  = _mm_set1_epi16(8);
2978      __m128i curs = _mm_slli_epi16(curr, 2);
2979      __m128i prvd = _mm_sub_epi16(prev, curr);
2980      __m128i nxtd = _mm_sub_epi16(next, curr);
2981      __m128i curb = _mm_add_epi16(curs, bias);
2982      __m128i even = _mm_add_epi16(prvd, curb);
2983      __m128i odd  = _mm_add_epi16(nxtd, curb);
2984
2985      // interleave even and odd pixels, then undo scaling.
2986      __m128i int0 = _mm_unpacklo_epi16(even, odd);
2987      __m128i int1 = _mm_unpackhi_epi16(even, odd);
2988      __m128i de0  = _mm_srli_epi16(int0, 4);
2989      __m128i de1  = _mm_srli_epi16(int1, 4);
2990
2991      // pack and write output
2992      __m128i outv = _mm_packus_epi16(de0, de1);
2993      _mm_storeu_si128((__m128i *) (out + i*2), outv);
2994#elif defined(STBI_NEON)
2995      // load and perform the vertical filtering pass
2996      // this uses 3*x + y = 4*x + (y - x)
2997      uint8x8_t farb  = vld1_u8(in_far + i);
2998      uint8x8_t nearb = vld1_u8(in_near + i);
2999      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3000      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3001      int16x8_t curr  = vaddq_s16(nears, diff); // current row
3002
3003      // horizontal filter works the same based on shifted vers of current
3004      // row. "prev" is current row shifted right by 1 pixel; we need to
3005      // insert the previous pixel value (from t1).
3006      // "next" is current row shifted left by 1 pixel, with first pixel
3007      // of next block of 8 pixels added in.
3008      int16x8_t prv0 = vextq_s16(curr, curr, 7);
3009      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3010      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3011      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3012
3013      // horizontal filter, polyphase implementation since it's convenient:
3014      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3015      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3016      // note the shared term.
3017      int16x8_t curs = vshlq_n_s16(curr, 2);
3018      int16x8_t prvd = vsubq_s16(prev, curr);
3019      int16x8_t nxtd = vsubq_s16(next, curr);
3020      int16x8_t even = vaddq_s16(curs, prvd);
3021      int16x8_t odd  = vaddq_s16(curs, nxtd);
3022
3023      // undo scaling and round, then store with even/odd phases interleaved
3024      uint8x8x2_t o;
3025      o.val[0] = vqrshrun_n_s16(even, 4);
3026      o.val[1] = vqrshrun_n_s16(odd,  4);
3027      vst2_u8(out + i*2, o);
3028#endif
3029
3030      // "previous" value for next iter
3031      t1 = 3*in_near[i+7] + in_far[i+7];
3032   }
3033
3034   t0 = t1;
3035   t1 = 3*in_near[i] + in_far[i];
3036   out[i*2] = stbi__div16(3*t1 + t0 + 8);
3037
3038   for (++i; i < w; ++i) {
3039      t0 = t1;
3040      t1 = 3*in_near[i]+in_far[i];
3041      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3042      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3043   }
3044   out[w*2-1] = stbi__div4(t1+2);
3045
3046   STBI_NOTUSED(hs);
3047
3048   return out;
3049}
3050#endif
3051
3052static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3053{
3054   // resample with nearest-neighbor
3055   int i,j;
3056   STBI_NOTUSED(in_far);
3057   for (i=0; i < w; ++i)
3058      for (j=0; j < hs; ++j)
3059         out[i*hs+j] = in_near[i];
3060   return out;
3061}
3062
3063#ifdef STBI_JPEG_OLD
3064// this is the same YCbCr-to-RGB calculation that stb_image has used
3065// historically before the algorithm changes in 1.49
3066#define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
3067static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3068{
3069   int i;
3070   for (i=0; i < count; ++i) {
3071      int y_fixed = (y[i] << 16) + 32768; // rounding
3072      int r,g,b;
3073      int cr = pcr[i] - 128;
3074      int cb = pcb[i] - 128;
3075      r = y_fixed + cr*float2fixed(1.40200f);
3076      g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
3077      b = y_fixed                            + cb*float2fixed(1.77200f);
3078      r >>= 16;
3079      g >>= 16;
3080      b >>= 16;
3081      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3082      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3083      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3084      out[0] = (stbi_uc)r;
3085      out[1] = (stbi_uc)g;
3086      out[2] = (stbi_uc)b;
3087      out[3] = 255;
3088      out += step;
3089   }
3090}
3091#else
3092// this is a reduced-precision calculation of YCbCr-to-RGB introduced
3093// to make sure the code produces the same results in both SIMD and scalar
3094#define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
3095static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3096{
3097   int i;
3098   for (i=0; i < count; ++i) {
3099      int y_fixed = (y[i] << 20) + (1<<19); // rounding
3100      int r,g,b;
3101      int cr = pcr[i] - 128;
3102      int cb = pcb[i] - 128;
3103      r = y_fixed +  cr* float2fixed(1.40200f);
3104      g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3105      b = y_fixed                               +   cb* float2fixed(1.77200f);
3106      r >>= 20;
3107      g >>= 20;
3108      b >>= 20;
3109      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3110      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3111      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3112      out[0] = (stbi_uc)r;
3113      out[1] = (stbi_uc)g;
3114      out[2] = (stbi_uc)b;
3115      out[3] = 255;
3116      out += step;
3117   }
3118}
3119#endif
3120
3121#if defined(STBI_SSE2) || defined(STBI_NEON)
3122static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3123{
3124   int i = 0;
3125
3126#ifdef STBI_SSE2
3127   // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3128   // it's useful in practice (you wouldn't use it for textures, for example).
3129   // so just accelerate step == 4 case.
3130   if (step == 4) {
3131      // this is a fairly straightforward implementation and not super-optimized.
3132      __m128i signflip  = _mm_set1_epi8(-0x80);
3133      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
3134      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3135      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3136      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
3137      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3138      __m128i xw = _mm_set1_epi16(255); // alpha channel
3139
3140      for (; i+7 < count; i += 8) {
3141         // load
3142         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3143         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3144         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3145         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3146         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3147
3148         // unpack to short (and left-shift cr, cb by 8)
3149         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
3150         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3151         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3152
3153         // color transform
3154         __m128i yws = _mm_srli_epi16(yw, 4);
3155         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3156         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3157         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3158         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3159         __m128i rws = _mm_add_epi16(cr0, yws);
3160         __m128i gwt = _mm_add_epi16(cb0, yws);
3161         __m128i bws = _mm_add_epi16(yws, cb1);
3162         __m128i gws = _mm_add_epi16(gwt, cr1);
3163
3164         // descale
3165         __m128i rw = _mm_srai_epi16(rws, 4);
3166         __m128i bw = _mm_srai_epi16(bws, 4);
3167         __m128i gw = _mm_srai_epi16(gws, 4);
3168
3169         // back to byte, set up for transpose
3170         __m128i brb = _mm_packus_epi16(rw, bw);
3171         __m128i gxb = _mm_packus_epi16(gw, xw);
3172
3173         // transpose to interleave channels
3174         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3175         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3176         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3177         __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3178
3179         // store
3180         _mm_storeu_si128((__m128i *) (out + 0), o0);
3181         _mm_storeu_si128((__m128i *) (out + 16), o1);
3182         out += 32;
3183      }
3184   }
3185#endif
3186
3187#ifdef STBI_NEON
3188   // in this version, step=3 support would be easy to add. but is there demand?
3189   if (step == 4) {
3190      // this is a fairly straightforward implementation and not super-optimized.
3191      uint8x8_t signflip = vdup_n_u8(0x80);
3192      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
3193      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
3194      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
3195      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
3196
3197      for (; i+7 < count; i += 8) {
3198         // load
3199         uint8x8_t y_bytes  = vld1_u8(y + i);
3200         uint8x8_t cr_bytes = vld1_u8(pcr + i);
3201         uint8x8_t cb_bytes = vld1_u8(pcb + i);
3202         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3203         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3204
3205         // expand to s16
3206         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3207         int16x8_t crw = vshll_n_s8(cr_biased, 7);
3208         int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3209
3210         // color transform
3211         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3212         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3213         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3214         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3215         int16x8_t rws = vaddq_s16(yws, cr0);
3216         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3217         int16x8_t bws = vaddq_s16(yws, cb1);
3218
3219         // undo scaling, round, convert to byte
3220         uint8x8x4_t o;
3221         o.val[0] = vqrshrun_n_s16(rws, 4);
3222         o.val[1] = vqrshrun_n_s16(gws, 4);
3223         o.val[2] = vqrshrun_n_s16(bws, 4);
3224         o.val[3] = vdup_n_u8(255);
3225
3226         // store, interleaving r/g/b/a
3227         vst4_u8(out, o);
3228         out += 8*4;
3229      }
3230   }
3231#endif
3232
3233   for (; i < count; ++i) {
3234      int y_fixed = (y[i] << 20) + (1<<19); // rounding
3235      int r,g,b;
3236      int cr = pcr[i] - 128;
3237      int cb = pcb[i] - 128;
3238      r = y_fixed + cr* float2fixed(1.40200f);
3239      g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3240      b = y_fixed                             +   cb* float2fixed(1.77200f);
3241      r >>= 20;
3242      g >>= 20;
3243      b >>= 20;
3244      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3245      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3246      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3247      out[0] = (stbi_uc)r;
3248      out[1] = (stbi_uc)g;
3249      out[2] = (stbi_uc)b;
3250      out[3] = 255;
3251      out += step;
3252   }
3253}
3254#endif
3255
3256// set up the kernels
3257static void stbi__setup_jpeg(stbi__jpeg *j)
3258{
3259   j->idct_block_kernel = stbi__idct_block;
3260   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3261   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3262
3263#ifdef STBI_SSE2
3264   if (stbi__sse2_available()) {
3265      j->idct_block_kernel = stbi__idct_simd;
3266      #ifndef STBI_JPEG_OLD
3267      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3268      #endif
3269      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3270   }
3271#endif
3272
3273#ifdef STBI_NEON
3274   j->idct_block_kernel = stbi__idct_simd;
3275   #ifndef STBI_JPEG_OLD
3276   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3277   #endif
3278   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3279#endif
3280}
3281
3282// clean up the temporary component buffers
3283static void stbi__cleanup_jpeg(stbi__jpeg *j)
3284{
3285   int i;
3286   for (i=0; i < j->s->img_n; ++i) {
3287      if (j->img_comp[i].raw_data) {
3288         STBI_FREE(j->img_comp[i].raw_data);
3289         j->img_comp[i].raw_data = NULL;
3290         j->img_comp[i].data = NULL;
3291      }
3292      if (j->img_comp[i].raw_coeff) {
3293         STBI_FREE(j->img_comp[i].raw_coeff);
3294         j->img_comp[i].raw_coeff = 0;
3295         j->img_comp[i].coeff = 0;
3296      }
3297      if (j->img_comp[i].linebuf) {
3298         STBI_FREE(j->img_comp[i].linebuf);
3299         j->img_comp[i].linebuf = NULL;
3300      }
3301   }
3302}
3303
3304typedef struct
3305{
3306   resample_row_func resample;
3307   stbi_uc *line0,*line1;
3308   int hs,vs;   // expansion factor in each axis
3309   int w_lores; // horizontal pixels pre-expansion
3310   int ystep;   // how far through vertical expansion we are
3311   int ypos;    // which pre-expansion row we're on
3312} stbi__resample;
3313
3314static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3315{
3316   int n, decode_n;
3317   z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3318
3319   // validate req_comp
3320   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3321
3322   // load a jpeg image from whichever source, but leave in YCbCr format
3323   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3324
3325   // determine actual number of components to generate
3326   n = req_comp ? req_comp : z->s->img_n;
3327
3328   if (z->s->img_n == 3 && n < 3)
3329      decode_n = 1;
3330   else
3331      decode_n = z->s->img_n;
3332
3333   // resample and color-convert
3334   {
3335      int k;
3336      unsigned int i,j;
3337      stbi_uc *output;
3338      stbi_uc *coutput[4];
3339
3340      stbi__resample res_comp[4];
3341
3342      for (k=0; k < decode_n; ++k) {
3343         stbi__resample *r = &res_comp[k];
3344
3345         // allocate line buffer big enough for upsampling off the edges
3346         // with upsample factor of 4
3347         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3348         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3349
3350         r->hs      = z->img_h_max / z->img_comp[k].h;
3351         r->vs      = z->img_v_max / z->img_comp[k].v;
3352         r->ystep   = r->vs >> 1;
3353         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3354         r->ypos    = 0;
3355         r->line0   = r->line1 = z->img_comp[k].data;
3356
3357         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3358         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3359         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3360         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3361         else                               r->resample = stbi__resample_row_generic;
3362      }
3363
3364      // can't error after this so, this is safe
3365      output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
3366      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3367
3368      // now go ahead and resample
3369      for (j=0; j < z->s->img_y; ++j) {
3370         stbi_uc *out = output + n * z->s->img_x * j;
3371         for (k=0; k < decode_n; ++k) {
3372            stbi__resample *r = &res_comp[k];
3373            int y_bot = r->ystep >= (r->vs >> 1);
3374            coutput[k] = r->resample(z->img_comp[k].linebuf,
3375                                     y_bot ? r->line1 : r->line0,
3376                                     y_bot ? r->line0 : r->line1,
3377                                     r->w_lores, r->hs);
3378            if (++r->ystep >= r->vs) {
3379               r->ystep = 0;
3380               r->line0 = r->line1;
3381               if (++r->ypos < z->img_comp[k].y)
3382                  r->line1 += z->img_comp[k].w2;
3383            }
3384         }
3385         if (n >= 3) {
3386            stbi_uc *y = coutput[0];
3387            if (z->s->img_n == 3) {
3388               z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3389            } else
3390               for (i=0; i < z->s->img_x; ++i) {
3391                  out[0] = out[1] = out[2] = y[i];
3392                  out[3] = 255; // not used if n==3
3393                  out += n;
3394               }
3395         } else {
3396            stbi_uc *y = coutput[0];
3397            if (n == 1)
3398               for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
3399            else
3400               for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
3401         }
3402      }
3403      stbi__cleanup_jpeg(z);
3404      *out_x = z->s->img_x;
3405      *out_y = z->s->img_y;
3406      if (comp) *comp  = z->s->img_n; // report original components, not output
3407      return output;
3408   }
3409}
3410
3411static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
3412{
3413   stbi__jpeg j;
3414   j.s = s;
3415   stbi__setup_jpeg(&j);
3416   return load_jpeg_image(&j, x,y,comp,req_comp);
3417}
3418
3419static int stbi__jpeg_test(stbi__context *s)
3420{
3421   int r;
3422   stbi__jpeg j;
3423   j.s = s;
3424   stbi__setup_jpeg(&j);
3425   r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
3426   stbi__rewind(s);
3427   return r;
3428}
3429
3430static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
3431{
3432   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
3433      stbi__rewind( j->s );
3434      return 0;
3435   }
3436   if (x) *x = j->s->img_x;
3437   if (y) *y = j->s->img_y;
3438   if (comp) *comp = j->s->img_n;
3439   return 1;
3440}
3441
3442static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
3443{
3444   stbi__jpeg j;
3445   j.s = s;
3446   return stbi__jpeg_info_raw(&j, x, y, comp);
3447}
3448#endif
3449
3450// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
3451//    simple implementation
3452//      - all input must be provided in an upfront buffer
3453//      - all output is written to a single output buffer (can malloc/realloc)
3454//    performance
3455//      - fast huffman
3456
3457#ifndef STBI_NO_ZLIB
3458
3459// fast-way is faster to check than jpeg huffman, but slow way is slower
3460#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
3461#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
3462
3463// zlib-style huffman encoding
3464// (jpegs packs from left, zlib from right, so can't share code)
3465typedef struct
3466{
3467   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
3468   stbi__uint16 firstcode[16];
3469   int maxcode[17];
3470   stbi__uint16 firstsymbol[16];
3471   stbi_uc  size[288];
3472   stbi__uint16 value[288];
3473} stbi__zhuffman;
3474
3475stbi_inline static int stbi__bitreverse16(int n)
3476{
3477  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
3478  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
3479  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
3480  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
3481  return n;
3482}
3483
3484stbi_inline static int stbi__bit_reverse(int v, int bits)
3485{
3486   STBI_ASSERT(bits <= 16);
3487   // to bit reverse n bits, reverse 16 and shift
3488   // e.g. 11 bits, bit reverse and shift away 5
3489   return stbi__bitreverse16(v) >> (16-bits);
3490}
3491
3492static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
3493{
3494   int i,k=0;
3495   int code, next_code[16], sizes[17];
3496
3497   // DEFLATE spec for generating codes
3498   memset(sizes, 0, sizeof(sizes));
3499   memset(z->fast, 0, sizeof(z->fast));
3500   for (i=0; i < num; ++i)
3501      ++sizes[sizelist[i]];
3502   sizes[0] = 0;
3503   for (i=1; i < 16; ++i)
3504      if (sizes[i] > (1 << i))
3505         return stbi__err("bad sizes", "Corrupt PNG");
3506   code = 0;
3507   for (i=1; i < 16; ++i) {
3508      next_code[i] = code;
3509      z->firstcode[i] = (stbi__uint16) code;
3510      z->firstsymbol[i] = (stbi__uint16) k;
3511      code = (code + sizes[i]);
3512      if (sizes[i])
3513         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
3514      z->maxcode[i] = code << (16-i); // preshift for inner loop
3515      code <<= 1;
3516      k += sizes[i];
3517   }
3518   z->maxcode[16] = 0x10000; // sentinel
3519   for (i=0; i < num; ++i) {
3520      int s = sizelist[i];
3521      if (s) {
3522         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
3523         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
3524         z->size [c] = (stbi_uc     ) s;
3525         z->value[c] = (stbi__uint16) i;
3526         if (s <= STBI__ZFAST_BITS) {
3527            int j = stbi__bit_reverse(next_code[s],s);
3528            while (j < (1 << STBI__ZFAST_BITS)) {
3529               z->fast[j] = fastv;
3530               j += (1 << s);
3531            }
3532         }
3533         ++next_code[s];
3534      }
3535   }
3536   return 1;
3537}
3538
3539// zlib-from-memory implementation for PNG reading
3540//    because PNG allows splitting the zlib stream arbitrarily,
3541//    and it's annoying structurally to have PNG call ZLIB call PNG,
3542//    we require PNG read all the IDATs and combine them into a single
3543//    memory buffer
3544
3545typedef struct
3546{
3547   stbi_uc *zbuffer, *zbuffer_end;
3548   int num_bits;
3549   stbi__uint32 code_buffer;
3550
3551   char *zout;
3552   char *zout_start;
3553   char *zout_end;
3554   int   z_expandable;
3555
3556   stbi__zhuffman z_length, z_distance;
3557} stbi__zbuf;
3558
3559stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
3560{
3561   if (z->zbuffer >= z->zbuffer_end) return 0;
3562   return *z->zbuffer++;
3563}
3564
3565static void stbi__fill_bits(stbi__zbuf *z)
3566{
3567   do {
3568      STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
3569      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
3570      z->num_bits += 8;
3571   } while (z->num_bits <= 24);
3572}
3573
3574stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
3575{
3576   unsigned int k;
3577   if (z->num_bits < n) stbi__fill_bits(z);
3578   k = z->code_buffer & ((1 << n) - 1);
3579   z->code_buffer >>= n;
3580   z->num_bits -= n;
3581   return k;
3582}
3583
3584static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
3585{
3586   int b,s,k;
3587   // not resolved by fast table, so compute it the slow way
3588   // use jpeg approach, which requires MSbits at top
3589   k = stbi__bit_reverse(a->code_buffer, 16);
3590   for (s=STBI__ZFAST_BITS+1; ; ++s)
3591      if (k < z->maxcode[s])
3592         break;
3593   if (s == 16) return -1; // invalid code!
3594   // code size is s, so:
3595   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
3596   STBI_ASSERT(z->size[b] == s);
3597   a->code_buffer >>= s;
3598   a->num_bits -= s;
3599   return z->value[b];
3600}
3601
3602stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
3603{
3604   int b,s;
3605   if (a->num_bits < 16) stbi__fill_bits(a);
3606   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
3607   if (b) {
3608      s = b >> 9;
3609      a->code_buffer >>= s;
3610      a->num_bits -= s;
3611      return b & 511;
3612   }
3613   return stbi__zhuffman_decode_slowpath(a, z);
3614}
3615
3616static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
3617{
3618   char *q;
3619   int cur, limit;
3620   z->zout = zout;
3621   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
3622   cur   = (int) (z->zout     - z->zout_start);
3623   limit = (int) (z->zout_end - z->zout_start);
3624   while (cur + n > limit)
3625      limit *= 2;
3626   q = (char *) STBI_REALLOC(z->zout_start, limit);
3627   if (q == NULL) return stbi__err("outofmem", "Out of memory");
3628   z->zout_start = q;
3629   z->zout       = q + cur;
3630   z->zout_end   = q + limit;
3631   return 1;
3632}
3633
3634static int stbi__zlength_base[31] = {
3635   3,4,5,6,7,8,9,10,11,13,
3636   15,17,19,23,27,31,35,43,51,59,
3637   67,83,99,115,131,163,195,227,258,0,0 };
3638
3639static int stbi__zlength_extra[31]=
3640{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
3641
3642static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
3643257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
3644
3645static int stbi__zdist_extra[32] =
3646{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
3647
3648static int stbi__parse_huffman_block(stbi__zbuf *a)
3649{
3650   char *zout = a->zout;
3651   for(;;) {
3652      int z = stbi__zhuffman_decode(a, &a->z_length);
3653      if (z < 256) {
3654         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
3655         if (zout >= a->zout_end) {
3656            if (!stbi__zexpand(a, zout, 1)) return 0;
3657            zout = a->zout;
3658         }
3659         *zout++ = (char) z;
3660      } else {
3661         stbi_uc *p;
3662         int len,dist;
3663         if (z == 256) {
3664            a->zout = zout;
3665            return 1;
3666         }
3667         z -= 257;
3668         len = stbi__zlength_base[z];
3669         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
3670         z = stbi__zhuffman_decode(a, &a->z_distance);
3671         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
3672         dist = stbi__zdist_base[z];
3673         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
3674         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
3675         if (zout + len > a->zout_end) {
3676            if (!stbi__zexpand(a, zout, len)) return 0;
3677            zout = a->zout;
3678         }
3679         p = (stbi_uc *) (zout - dist);
3680         if (dist == 1) { // run of one byte; common in images.
3681            stbi_uc v = *p;
3682            if (len) { do *zout++ = v; while (--len); }
3683         } else {
3684            if (len) { do *zout++ = *p++; while (--len); }
3685         }
3686      }
3687   }
3688}
3689
3690static int stbi__compute_huffman_codes(stbi__zbuf *a)
3691{
3692   static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
3693   stbi__zhuffman z_codelength;
3694   stbi_uc lencodes[286+32+137];//padding for maximum single op
3695   stbi_uc codelength_sizes[19];
3696   int i,n;
3697
3698   int hlit  = stbi__zreceive(a,5) + 257;
3699   int hdist = stbi__zreceive(a,5) + 1;
3700   int hclen = stbi__zreceive(a,4) + 4;
3701
3702   memset(codelength_sizes, 0, sizeof(codelength_sizes));
3703   for (i=0; i < hclen; ++i) {
3704      int s = stbi__zreceive(a,3);
3705      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
3706   }
3707   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
3708
3709   n = 0;
3710   while (n < hlit + hdist) {
3711      int c = stbi__zhuffman_decode(a, &z_codelength);
3712      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
3713      if (c < 16)
3714         lencodes[n++] = (stbi_uc) c;
3715      else if (c == 16) {
3716         c = stbi__zreceive(a,2)+3;
3717         memset(lencodes+n, lencodes[n-1], c);
3718         n += c;
3719      } else if (c == 17) {
3720         c = stbi__zreceive(a,3)+3;
3721         memset(lencodes+n, 0, c);
3722         n += c;
3723      } else {
3724         STBI_ASSERT(c == 18);
3725         c = stbi__zreceive(a,7)+11;
3726         memset(lencodes+n, 0, c);
3727         n += c;
3728      }
3729   }
3730   if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
3731   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
3732   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
3733   return 1;
3734}
3735
3736static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
3737{
3738   stbi_uc header[4];
3739   int len,nlen,k;
3740   if (a->num_bits & 7)
3741      stbi__zreceive(a, a->num_bits & 7); // discard
3742   // drain the bit-packed data into header
3743   k = 0;
3744   while (a->num_bits > 0) {
3745      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
3746      a->code_buffer >>= 8;
3747      a->num_bits -= 8;
3748   }
3749   STBI_ASSERT(a->num_bits == 0);
3750   // now fill header the normal way
3751   while (k < 4)
3752      header[k++] = stbi__zget8(a);
3753   len  = header[1] * 256 + header[0];
3754   nlen = header[3] * 256 + header[2];
3755   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
3756   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
3757   if (a->zout + len > a->zout_end)
3758      if (!stbi__zexpand(a, a->zout, len)) return 0;
3759   memcpy(a->zout, a->zbuffer, len);
3760   a->zbuffer += len;
3761   a->zout += len;
3762   return 1;
3763}
3764
3765static int stbi__parse_zlib_header(stbi__zbuf *a)
3766{
3767   int cmf   = stbi__zget8(a);
3768   int cm    = cmf & 15;
3769   /* int cinfo = cmf >> 4; */
3770   int flg   = stbi__zget8(a);
3771   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
3772   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
3773   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
3774   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
3775   return 1;
3776}
3777
3778// @TODO: should statically initialize these for optimal thread safety
3779static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
3780static void stbi__init_zdefaults(void)
3781{
3782   int i;   // use <= to match clearly with spec
3783   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
3784   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
3785   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
3786   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
3787
3788   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
3789}
3790
3791static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
3792{
3793   int final, type;
3794   if (parse_header)
3795      if (!stbi__parse_zlib_header(a)) return 0;
3796   a->num_bits = 0;
3797   a->code_buffer = 0;
3798   do {
3799      final = stbi__zreceive(a,1);
3800      type = stbi__zreceive(a,2);
3801      if (type == 0) {
3802         if (!stbi__parse_uncomperssed_block(a)) return 0;
3803      } else if (type == 3) {
3804         return 0;
3805      } else {
3806         if (type == 1) {
3807            // use fixed code lengths
3808            if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
3809            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
3810            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
3811         } else {
3812            if (!stbi__compute_huffman_codes(a)) return 0;
3813         }
3814         if (!stbi__parse_huffman_block(a)) return 0;
3815      }
3816   } while (!final);
3817   return 1;
3818}
3819
3820static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
3821{
3822   a->zout_start = obuf;
3823   a->zout       = obuf;
3824   a->zout_end   = obuf + olen;
3825   a->z_expandable = exp;
3826
3827   return stbi__parse_zlib(a, parse_header);
3828}
3829
3830STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
3831{
3832   stbi__zbuf a;
3833   char *p = (char *) stbi__malloc(initial_size);
3834   if (p == NULL) return NULL;
3835   a.zbuffer = (stbi_uc *) buffer;
3836   a.zbuffer_end = (stbi_uc *) buffer + len;
3837   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
3838      if (outlen) *outlen = (int) (a.zout - a.zout_start);
3839      return a.zout_start;
3840   } else {
3841      STBI_FREE(a.zout_start);
3842      return NULL;
3843   }
3844}
3845
3846STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
3847{
3848   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
3849}
3850
3851STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
3852{
3853   stbi__zbuf a;
3854   char *p = (char *) stbi__malloc(initial_size);
3855   if (p == NULL) return NULL;
3856   a.zbuffer = (stbi_uc *) buffer;
3857   a.zbuffer_end = (stbi_uc *) buffer + len;
3858   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
3859      if (outlen) *outlen = (int) (a.zout - a.zout_start);
3860      return a.zout_start;
3861   } else {
3862      STBI_FREE(a.zout_start);
3863      return NULL;
3864   }
3865}
3866
3867STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
3868{
3869   stbi__zbuf a;
3870   a.zbuffer = (stbi_uc *) ibuffer;
3871   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3872   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
3873      return (int) (a.zout - a.zout_start);
3874   else
3875      return -1;
3876}
3877
3878STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
3879{
3880   stbi__zbuf a;
3881   char *p = (char *) stbi__malloc(16384);
3882   if (p == NULL) return NULL;
3883   a.zbuffer = (stbi_uc *) buffer;
3884   a.zbuffer_end = (stbi_uc *) buffer+len;
3885   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
3886      if (outlen) *outlen = (int) (a.zout - a.zout_start);
3887      return a.zout_start;
3888   } else {
3889      STBI_FREE(a.zout_start);
3890      return NULL;
3891   }
3892}
3893
3894STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
3895{
3896   stbi__zbuf a;
3897   a.zbuffer = (stbi_uc *) ibuffer;
3898   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3899   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
3900      return (int) (a.zout - a.zout_start);
3901   else
3902      return -1;
3903}
3904#endif
3905
3906// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
3907//    simple implementation
3908//      - only 8-bit samples
3909//      - no CRC checking
3910//      - allocates lots of intermediate memory
3911//        - avoids problem of streaming data between subsystems
3912//        - avoids explicit window management
3913//    performance
3914//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
3915
3916#ifndef STBI_NO_PNG
3917typedef struct
3918{
3919   stbi__uint32 length;
3920   stbi__uint32 type;
3921} stbi__pngchunk;
3922
3923static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
3924{
3925   stbi__pngchunk c;
3926   c.length = stbi__get32be(s);
3927   c.type   = stbi__get32be(s);
3928   return c;
3929}
3930
3931static int stbi__check_png_header(stbi__context *s)
3932{
3933   static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
3934   int i;
3935   for (i=0; i < 8; ++i)
3936      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
3937   return 1;
3938}
3939
3940typedef struct
3941{
3942   stbi__context *s;
3943   stbi_uc *idata, *expanded, *out;
3944} stbi__png;
3945
3946
3947enum {
3948   STBI__F_none=0,
3949   STBI__F_sub=1,
3950   STBI__F_up=2,
3951   STBI__F_avg=3,
3952   STBI__F_paeth=4,
3953   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
3954   STBI__F_avg_first,
3955   STBI__F_paeth_first
3956};
3957
3958static stbi_uc first_row_filter[5] =
3959{
3960   STBI__F_none,
3961   STBI__F_sub,
3962   STBI__F_none,
3963   STBI__F_avg_first,
3964   STBI__F_paeth_first
3965};
3966
3967static int stbi__paeth(int a, int b, int c)
3968{
3969   int p = a + b - c;
3970   int pa = abs(p-a);
3971   int pb = abs(p-b);
3972   int pc = abs(p-c);
3973   if (pa <= pb && pa <= pc) return a;
3974   if (pb <= pc) return b;
3975   return c;
3976}
3977
3978static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
3979
3980// create the png data from post-deflated data
3981static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
3982{
3983   stbi__context *s = a->s;
3984   stbi__uint32 i,j,stride = x*out_n;
3985   stbi__uint32 img_len, img_width_bytes;
3986   int k;
3987   int img_n = s->img_n; // copy it into a local for later
3988
3989   STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
3990   a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into
3991   if (!a->out) return stbi__err("outofmem", "Out of memory");
3992
3993   img_width_bytes = (((img_n * x * depth) + 7) >> 3);
3994   img_len = (img_width_bytes + 1) * y;
3995   if (s->img_x == x && s->img_y == y) {
3996      if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
3997   } else { // interlaced:
3998      if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
3999   }
4000
4001   for (j=0; j < y; ++j) {
4002      stbi_uc *cur = a->out + stride*j;
4003      stbi_uc *prior = cur - stride;
4004      int filter = *raw++;
4005      int filter_bytes = img_n;
4006      int width = x;
4007      if (filter > 4)
4008         return stbi__err("invalid filter","Corrupt PNG");
4009
4010      if (depth < 8) {
4011         STBI_ASSERT(img_width_bytes <= x);
4012         cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4013         filter_bytes = 1;
4014         width = img_width_bytes;
4015      }
4016
4017      // if first row, use special filter that doesn't sample previous row
4018      if (j == 0) filter = first_row_filter[filter];
4019
4020      // handle first byte explicitly
4021      for (k=0; k < filter_bytes; ++k) {
4022         switch (filter) {
4023            case STBI__F_none       : cur[k] = raw[k]; break;
4024            case STBI__F_sub        : cur[k] = raw[k]; break;
4025            case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4026            case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4027            case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4028            case STBI__F_avg_first  : cur[k] = raw[k]; break;
4029            case STBI__F_paeth_first: cur[k] = raw[k]; break;
4030         }
4031      }
4032
4033      if (depth == 8) {
4034         if (img_n != out_n)
4035            cur[img_n] = 255; // first pixel
4036         raw += img_n;
4037         cur += out_n;
4038         prior += out_n;
4039      } else {
4040         raw += 1;
4041         cur += 1;
4042         prior += 1;
4043      }
4044
4045      // this is a little gross, so that we don't switch per-pixel or per-component
4046      if (depth < 8 || img_n == out_n) {
4047         int nk = (width - 1)*img_n;
4048         #define CASE(f) \
4049             case f:     \
4050                for (k=0; k < nk; ++k)
4051         switch (filter) {
4052            // "none" filter turns into a memcpy here; make that explicit.
4053            case STBI__F_none:         memcpy(cur, raw, nk); break;
4054            CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
4055            CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4056            CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
4057            CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
4058            CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
4059            CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
4060         }
4061         #undef CASE
4062         raw += nk;
4063      } else {
4064         STBI_ASSERT(img_n+1 == out_n);
4065         #define CASE(f) \
4066             case f:     \
4067                for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
4068                   for (k=0; k < img_n; ++k)
4069         switch (filter) {
4070            CASE(STBI__F_none)         cur[k] = raw[k]; break;
4071            CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
4072            CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4073            CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
4074            CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
4075            CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
4076            CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
4077         }
4078         #undef CASE
4079      }
4080   }
4081
4082   // we make a separate pass to expand bits to pixels; for performance,
4083   // this could run two scanlines behind the above code, so it won't
4084   // intefere with filtering but will still be in the cache.
4085   if (depth < 8) {
4086      for (j=0; j < y; ++j) {
4087         stbi_uc *cur = a->out + stride*j;
4088         stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
4089         // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4090         // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4091         stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4092
4093         // note that the final byte might overshoot and write more data than desired.
4094         // we can allocate enough data that this never writes out of memory, but it
4095         // could also overwrite the next scanline. can it overwrite non-empty data
4096         // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4097         // so we need to explicitly clamp the final ones
4098
4099         if (depth == 4) {
4100            for (k=x*img_n; k >= 2; k-=2, ++in) {
4101               *cur++ = scale * ((*in >> 4)       );
4102               *cur++ = scale * ((*in     ) & 0x0f);
4103            }
4104            if (k > 0) *cur++ = scale * ((*in >> 4)       );
4105         } else if (depth == 2) {
4106            for (k=x*img_n; k >= 4; k-=4, ++in) {
4107               *cur++ = scale * ((*in >> 6)       );
4108               *cur++ = scale * ((*in >> 4) & 0x03);
4109               *cur++ = scale * ((*in >> 2) & 0x03);
4110               *cur++ = scale * ((*in     ) & 0x03);
4111            }
4112            if (k > 0) *cur++ = scale * ((*in >> 6)       );
4113            if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4114            if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4115         } else if (depth == 1) {
4116            for (k=x*img_n; k >= 8; k-=8, ++in) {
4117               *cur++ = scale * ((*in >> 7)       );
4118               *cur++ = scale * ((*in >> 6) & 0x01);
4119               *cur++ = scale * ((*in >> 5) & 0x01);
4120               *cur++ = scale * ((*in >> 4) & 0x01);
4121               *cur++ = scale * ((*in >> 3) & 0x01);
4122               *cur++ = scale * ((*in >> 2) & 0x01);
4123               *cur++ = scale * ((*in >> 1) & 0x01);
4124               *cur++ = scale * ((*in     ) & 0x01);
4125            }
4126            if (k > 0) *cur++ = scale * ((*in >> 7)       );
4127            if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4128            if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4129            if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4130            if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4131            if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4132            if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4133         }
4134         if (img_n != out_n) {
4135            int q;
4136            // insert alpha = 255
4137            cur = a->out + stride*j;
4138            if (img_n == 1) {
4139               for (q=x-1; q >= 0; --q) {
4140                  cur[q*2+1] = 255;
4141                  cur[q*2+0] = cur[q];
4142               }
4143            } else {
4144               STBI_ASSERT(img_n == 3);
4145               for (q=x-1; q >= 0; --q) {
4146                  cur[q*4+3] = 255;
4147                  cur[q*4+2] = cur[q*3+2];
4148                  cur[q*4+1] = cur[q*3+1];
4149                  cur[q*4+0] = cur[q*3+0];
4150               }
4151            }
4152         }
4153      }
4154   }
4155
4156   return 1;
4157}
4158
4159static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4160{
4161   stbi_uc *final;
4162   int p;
4163   if (!interlaced)
4164      return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4165
4166   // de-interlacing
4167   final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
4168   for (p=0; p < 7; ++p) {
4169      int xorig[] = { 0,4,0,2,0,1,0 };
4170      int yorig[] = { 0,0,4,0,2,0,1 };
4171      int xspc[]  = { 8,8,4,4,2,2,1 };
4172      int yspc[]  = { 8,8,8,4,4,2,2 };
4173      int i,j,x,y;
4174      // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4175      x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4176      y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4177      if (x && y) {
4178         stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4179         if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4180            STBI_FREE(final);
4181            return 0;
4182         }
4183         for (j=0; j < y; ++j) {
4184            for (i=0; i < x; ++i) {
4185               int out_y = j*yspc[p]+yorig[p];
4186               int out_x = i*xspc[p]+xorig[p];
4187               memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
4188                      a->out + (j*x+i)*out_n, out_n);
4189            }
4190         }
4191         STBI_FREE(a->out);
4192         image_data += img_len;
4193         image_data_len -= img_len;
4194      }
4195   }
4196   a->out = final;
4197
4198   return 1;
4199}
4200
4201static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4202{
4203   stbi__context *s = z->s;
4204   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4205   stbi_uc *p = z->out;
4206
4207   // compute color-based transparency, assuming we've
4208   // already got 255 as the alpha value in the output
4209   STBI_ASSERT(out_n == 2 || out_n == 4);
4210
4211   if (out_n == 2) {
4212      for (i=0; i < pixel_count; ++i) {
4213         p[1] = (p[0] == tc[0] ? 0 : 255);
4214         p += 2;
4215      }
4216   } else {
4217      for (i=0; i < pixel_count; ++i) {
4218         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4219            p[3] = 0;
4220         p += 4;
4221      }
4222   }
4223   return 1;
4224}
4225
4226static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4227{
4228   stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4229   stbi_uc *p, *temp_out, *orig = a->out;
4230
4231   p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
4232   if (p == NULL) return stbi__err("outofmem", "Out of memory");
4233
4234   // between here and free(out) below, exitting would leak
4235   temp_out = p;
4236
4237   if (pal_img_n == 3) {
4238      for (i=0; i < pixel_count; ++i) {
4239         int n = orig[i]*4;
4240         p[0] = palette[n  ];
4241         p[1] = palette[n+1];
4242         p[2] = palette[n+2];
4243         p += 3;
4244      }
4245   } else {
4246      for (i=0; i < pixel_count; ++i) {
4247         int n = orig[i]*4;
4248         p[0] = palette[n  ];
4249         p[1] = palette[n+1];
4250         p[2] = palette[n+2];
4251         p[3] = palette[n+3];
4252         p += 4;
4253      }
4254   }
4255   STBI_FREE(a->out);
4256   a->out = temp_out;
4257
4258   STBI_NOTUSED(len);
4259
4260   return 1;
4261}
4262
4263static int stbi__unpremultiply_on_load = 0;
4264static int stbi__de_iphone_flag = 0;
4265
4266STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4267{
4268   stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4269}
4270
4271STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4272{
4273   stbi__de_iphone_flag = flag_true_if_should_convert;
4274}
4275
4276static void stbi__de_iphone(stbi__png *z)
4277{
4278   stbi__context *s = z->s;
4279   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4280   stbi_uc *p = z->out;
4281
4282   if (s->img_out_n == 3) {  // convert bgr to rgb
4283      for (i=0; i < pixel_count; ++i) {
4284         stbi_uc t = p[0];
4285         p[0] = p[2];
4286         p[2] = t;
4287         p += 3;
4288      }
4289   } else {
4290      STBI_ASSERT(s->img_out_n == 4);
4291      if (stbi__unpremultiply_on_load) {
4292         // convert bgr to rgb and unpremultiply
4293         for (i=0; i < pixel_count; ++i) {
4294            stbi_uc a = p[3];
4295            stbi_uc t = p[0];
4296            if (a) {
4297               p[0] = p[2] * 255 / a;
4298               p[1] = p[1] * 255 / a;
4299               p[2] =  t   * 255 / a;
4300            } else {
4301               p[0] = p[2];
4302               p[2] = t;
4303            }
4304            p += 4;
4305         }
4306      } else {
4307         // convert bgr to rgb
4308         for (i=0; i < pixel_count; ++i) {
4309            stbi_uc t = p[0];
4310            p[0] = p[2];
4311            p[2] = t;
4312            p += 4;
4313         }
4314      }
4315   }
4316}
4317
4318#define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
4319
4320static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4321{
4322   stbi_uc palette[1024], pal_img_n=0;
4323   stbi_uc has_trans=0, tc[3];
4324   stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4325   int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
4326   stbi__context *s = z->s;
4327
4328   z->expanded = NULL;
4329   z->idata = NULL;
4330   z->out = NULL;
4331
4332   if (!stbi__check_png_header(s)) return 0;
4333
4334   if (scan == STBI__SCAN_type) return 1;
4335
4336   for (;;) {
4337      stbi__pngchunk c = stbi__get_chunk_header(s);
4338      switch (c.type) {
4339         case STBI__PNG_TYPE('C','g','B','I'):
4340            is_iphone = 1;
4341            stbi__skip(s, c.length);
4342            break;
4343         case STBI__PNG_TYPE('I','H','D','R'): {
4344            int comp,filter;
4345            if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4346            first = 0;
4347            if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4348            s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4349            s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4350            depth = stbi__get8(s);  if (depth != 1 && depth != 2 && depth != 4 && depth != 8)  return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
4351            color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
4352            if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4353            comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
4354            filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
4355            interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4356            if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4357            if (!pal_img_n) {
4358               s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4359               if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4360               if (scan == STBI__SCAN_header) return 1;
4361            } else {
4362               // if paletted, then pal_n is our final components, and
4363               // img_n is # components to decompress/filter.
4364               s->img_n = 1;
4365               if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4366               // if SCAN_header, have to scan to see if we have a tRNS
4367            }
4368            break;
4369         }
4370
4371         case STBI__PNG_TYPE('P','L','T','E'):  {
4372            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4373            if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4374            pal_len = c.length / 3;
4375            if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4376            for (i=0; i < pal_len; ++i) {
4377               palette[i*4+0] = stbi__get8(s);
4378               palette[i*4+1] = stbi__get8(s);
4379               palette[i*4+2] = stbi__get8(s);
4380               palette[i*4+3] = 255;
4381            }
4382            break;
4383         }
4384
4385         case STBI__PNG_TYPE('t','R','N','S'): {
4386            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4387            if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4388            if (pal_img_n) {
4389               if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4390               if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4391               if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4392               pal_img_n = 4;
4393               for (i=0; i < c.length; ++i)
4394                  palette[i*4+3] = stbi__get8(s);
4395            } else {
4396               if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4397               if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4398               has_trans = 1;
4399               for (k=0; k < s->img_n; ++k)
4400                  tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
4401            }
4402            break;
4403         }
4404
4405         case STBI__PNG_TYPE('I','D','A','T'): {
4406            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4407            if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4408            if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4409            if ((int)(ioff + c.length) < (int)ioff) return 0;
4410            if (ioff + c.length > idata_limit) {
4411               stbi_uc *p;
4412               if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4413               while (ioff + c.length > idata_limit)
4414                  idata_limit *= 2;
4415               p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4416               z->idata = p;
4417            }
4418            if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4419            ioff += c.length;
4420            break;
4421         }
4422
4423         case STBI__PNG_TYPE('I','E','N','D'): {
4424            stbi__uint32 raw_len, bpl;
4425            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4426            if (scan != STBI__SCAN_load) return 1;
4427            if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4428            // initial guess for decoded data size to avoid unnecessary reallocs
4429            bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
4430            raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4431            z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4432            if (z->expanded == NULL) return 0; // zlib should set error
4433            STBI_FREE(z->idata); z->idata = NULL;
4434            if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4435               s->img_out_n = s->img_n+1;
4436            else
4437               s->img_out_n = s->img_n;
4438            if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
4439            if (has_trans)
4440               if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4441            if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4442               stbi__de_iphone(z);
4443            if (pal_img_n) {
4444               // pal_img_n == 3 or 4
4445               s->img_n = pal_img_n; // record the actual colors we had
4446               s->img_out_n = pal_img_n;
4447               if (req_comp >= 3) s->img_out_n = req_comp;
4448               if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4449                  return 0;
4450            }
4451            STBI_FREE(z->expanded); z->expanded = NULL;
4452            return 1;
4453         }
4454
4455         default:
4456            // if critical, fail
4457            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4458            if ((c.type & (1 << 29)) == 0) {
4459               #ifndef STBI_NO_FAILURE_STRINGS
4460               // not threadsafe
4461               static char invalid_chunk[] = "XXXX PNG chunk not known";
4462               invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4463               invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4464               invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
4465               invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
4466               #endif
4467               return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4468            }
4469            stbi__skip(s, c.length);
4470            break;
4471      }
4472      // end of PNG chunk, read and skip CRC
4473      stbi__get32be(s);
4474   }
4475}
4476
4477static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
4478{
4479   unsigned char *result=NULL;
4480   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4481   if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4482      result = p->out;
4483      p->out = NULL;
4484      if (req_comp && req_comp != p->s->img_out_n) {
4485         result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4486         p->s->img_out_n = req_comp;
4487         if (result == NULL) return result;
4488      }
4489      *x = p->s->img_x;
4490      *y = p->s->img_y;
4491      if (n) *n = p->s->img_out_n;
4492   }
4493   STBI_FREE(p->out);      p->out      = NULL;
4494   STBI_FREE(p->expanded); p->expanded = NULL;
4495   STBI_FREE(p->idata);    p->idata    = NULL;
4496
4497   return result;
4498}
4499
4500static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4501{
4502   stbi__png p;
4503   p.s = s;
4504   return stbi__do_png(&p, x,y,comp,req_comp);
4505}
4506
4507static int stbi__png_test(stbi__context *s)
4508{
4509   int r;
4510   r = stbi__check_png_header(s);
4511   stbi__rewind(s);
4512   return r;
4513}
4514
4515static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4516{
4517   if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4518      stbi__rewind( p->s );
4519      return 0;
4520   }
4521   if (x) *x = p->s->img_x;
4522   if (y) *y = p->s->img_y;
4523   if (comp) *comp = p->s->img_n;
4524   return 1;
4525}
4526
4527static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4528{
4529   stbi__png p;
4530   p.s = s;
4531   return stbi__png_info_raw(&p, x, y, comp);
4532}
4533#endif
4534
4535// Microsoft/Windows BMP image
4536
4537#ifndef STBI_NO_BMP
4538static int stbi__bmp_test_raw(stbi__context *s)
4539{
4540   int r;
4541   int sz;
4542   if (stbi__get8(s) != 'B') return 0;
4543   if (stbi__get8(s) != 'M') return 0;
4544   stbi__get32le(s); // discard filesize
4545   stbi__get16le(s); // discard reserved
4546   stbi__get16le(s); // discard reserved
4547   stbi__get32le(s); // discard data offset
4548   sz = stbi__get32le(s);
4549   r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4550   return r;
4551}
4552
4553static int stbi__bmp_test(stbi__context *s)
4554{
4555   int r = stbi__bmp_test_raw(s);
4556   stbi__rewind(s);
4557   return r;
4558}
4559
4560
4561// returns 0..31 for the highest set bit
4562static int stbi__high_bit(unsigned int z)
4563{
4564   int n=0;
4565   if (z == 0) return -1;
4566   if (z >= 0x10000) n += 16, z >>= 16;
4567   if (z >= 0x00100) n +=  8, z >>=  8;
4568   if (z >= 0x00010) n +=  4, z >>=  4;
4569   if (z >= 0x00004) n +=  2, z >>=  2;
4570   if (z >= 0x00002) n +=  1, z >>=  1;
4571   return n;
4572}
4573
4574static int stbi__bitcount(unsigned int a)
4575{
4576   a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
4577   a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
4578   a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4579   a = (a + (a >> 8)); // max 16 per 8 bits
4580   a = (a + (a >> 16)); // max 32 per 8 bits
4581   return a & 0xff;
4582}
4583
4584static int stbi__shiftsigned(int v, int shift, int bits)
4585{
4586   int result;
4587   int z=0;
4588
4589   if (shift < 0) v <<= -shift;
4590   else v >>= shift;
4591   result = v;
4592
4593   z = bits;
4594   while (z < 8) {
4595      result += v >> z;
4596      z += bits;
4597   }
4598   return result;
4599}
4600
4601static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4602{
4603   stbi_uc *out;
4604   unsigned int mr=0,mg=0,mb=0,ma=0, all_a=255;
4605   stbi_uc pal[256][4];
4606   int psize=0,i,j,compress=0,width;
4607   int bpp, flip_vertically, pad, target, offset, hsz;
4608   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
4609   stbi__get32le(s); // discard filesize
4610   stbi__get16le(s); // discard reserved
4611   stbi__get16le(s); // discard reserved
4612   offset = stbi__get32le(s);
4613   hsz = stbi__get32le(s);
4614   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
4615   if (hsz == 12) {
4616      s->img_x = stbi__get16le(s);
4617      s->img_y = stbi__get16le(s);
4618   } else {
4619      s->img_x = stbi__get32le(s);
4620      s->img_y = stbi__get32le(s);
4621   }
4622   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
4623   bpp = stbi__get16le(s);
4624   if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
4625   flip_vertically = ((int) s->img_y) > 0;
4626   s->img_y = abs((int) s->img_y);
4627   if (hsz == 12) {
4628      if (bpp < 24)
4629         psize = (offset - 14 - 24) / 3;
4630   } else {
4631      compress = stbi__get32le(s);
4632      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
4633      stbi__get32le(s); // discard sizeof
4634      stbi__get32le(s); // discard hres
4635      stbi__get32le(s); // discard vres
4636      stbi__get32le(s); // discard colorsused
4637      stbi__get32le(s); // discard max important
4638      if (hsz == 40 || hsz == 56) {
4639         if (hsz == 56) {
4640            stbi__get32le(s);
4641            stbi__get32le(s);
4642            stbi__get32le(s);
4643            stbi__get32le(s);
4644         }
4645         if (bpp == 16 || bpp == 32) {
4646            mr = mg = mb = 0;
4647            if (compress == 0) {
4648               if (bpp == 32) {
4649                  mr = 0xffu << 16;
4650                  mg = 0xffu <<  8;
4651                  mb = 0xffu <<  0;
4652                  ma = 0xffu << 24;
4653                  all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
4654               } else {
4655                  mr = 31u << 10;
4656                  mg = 31u <<  5;
4657                  mb = 31u <<  0;
4658               }
4659            } else if (compress == 3) {
4660               mr = stbi__get32le(s);
4661               mg = stbi__get32le(s);
4662               mb = stbi__get32le(s);
4663               // not documented, but generated by photoshop and handled by mspaint
4664               if (mr == mg && mg == mb) {
4665                  // ?!?!?
4666                  return stbi__errpuc("bad BMP", "bad BMP");
4667               }
4668            } else
4669               return stbi__errpuc("bad BMP", "bad BMP");
4670         }
4671      } else {
4672         STBI_ASSERT(hsz == 108 || hsz == 124);
4673         mr = stbi__get32le(s);
4674         mg = stbi__get32le(s);
4675         mb = stbi__get32le(s);
4676         ma = stbi__get32le(s);
4677         stbi__get32le(s); // discard color space
4678         for (i=0; i < 12; ++i)
4679            stbi__get32le(s); // discard color space parameters
4680         if (hsz == 124) {
4681            stbi__get32le(s); // discard rendering intent
4682            stbi__get32le(s); // discard offset of profile data
4683            stbi__get32le(s); // discard size of profile data
4684            stbi__get32le(s); // discard reserved
4685         }
4686      }
4687      if (bpp < 16)
4688         psize = (offset - 14 - hsz) >> 2;
4689   }
4690   s->img_n = ma ? 4 : 3;
4691   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
4692      target = req_comp;
4693   else
4694      target = s->img_n; // if they want monochrome, we'll post-convert
4695   out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
4696   if (!out) return stbi__errpuc("outofmem", "Out of memory");
4697   if (bpp < 16) {
4698      int z=0;
4699      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
4700      for (i=0; i < psize; ++i) {
4701         pal[i][2] = stbi__get8(s);
4702         pal[i][1] = stbi__get8(s);
4703         pal[i][0] = stbi__get8(s);
4704         if (hsz != 12) stbi__get8(s);
4705         pal[i][3] = 255;
4706      }
4707      stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
4708      if (bpp == 4) width = (s->img_x + 1) >> 1;
4709      else if (bpp == 8) width = s->img_x;
4710      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
4711      pad = (-width)&3;
4712      for (j=0; j < (int) s->img_y; ++j) {
4713         for (i=0; i < (int) s->img_x; i += 2) {
4714            int v=stbi__get8(s),v2=0;
4715            if (bpp == 4) {
4716               v2 = v & 15;
4717               v >>= 4;
4718            }
4719            out[z++] = pal[v][0];
4720            out[z++] = pal[v][1];
4721            out[z++] = pal[v][2];
4722            if (target == 4) out[z++] = 255;
4723            if (i+1 == (int) s->img_x) break;
4724            v = (bpp == 8) ? stbi__get8(s) : v2;
4725            out[z++] = pal[v][0];
4726            out[z++] = pal[v][1];
4727            out[z++] = pal[v][2];
4728            if (target == 4) out[z++] = 255;
4729         }
4730         stbi__skip(s, pad);
4731      }
4732   } else {
4733      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
4734      int z = 0;
4735      int easy=0;
4736      stbi__skip(s, offset - 14 - hsz);
4737      if (bpp == 24) width = 3 * s->img_x;
4738      else if (bpp == 16) width = 2*s->img_x;
4739      else /* bpp = 32 and pad = 0 */ width=0;
4740      pad = (-width) & 3;
4741      if (bpp == 24) {
4742         easy = 1;
4743      } else if (bpp == 32) {
4744         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
4745            easy = 2;
4746      }
4747      if (!easy) {
4748         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
4749         // right shift amt to put high bit in position #7
4750         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
4751         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
4752         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
4753         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
4754      }
4755      for (j=0; j < (int) s->img_y; ++j) {
4756         if (easy) {
4757            for (i=0; i < (int) s->img_x; ++i) {
4758               unsigned char a;
4759               out[z+2] = stbi__get8(s);
4760               out[z+1] = stbi__get8(s);
4761               out[z+0] = stbi__get8(s);
4762               z += 3;
4763               a = (easy == 2 ? stbi__get8(s) : 255);
4764               all_a |= a;
4765               if (target == 4) out[z++] = a;
4766            }
4767         } else {
4768            for (i=0; i < (int) s->img_x; ++i) {
4769               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
4770               int a;
4771               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
4772               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
4773               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
4774               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
4775               all_a |= a;
4776               if (target == 4) out[z++] = STBI__BYTECAST(a);
4777            }
4778         }
4779         stbi__skip(s, pad);
4780      }
4781   }
4782
4783   // if alpha channel is all 0s, replace with all 255s
4784   if (target == 4 && all_a == 0)
4785      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
4786         out[i] = 255;
4787
4788   if (flip_vertically) {
4789      stbi_uc t;
4790      for (j=0; j < (int) s->img_y>>1; ++j) {
4791         stbi_uc *p1 = out +      j     *s->img_x*target;
4792         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
4793         for (i=0; i < (int) s->img_x*target; ++i) {
4794            t = p1[i], p1[i] = p2[i], p2[i] = t;
4795         }
4796      }
4797   }
4798
4799   if (req_comp && req_comp != target) {
4800      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
4801      if (out == NULL) return out; // stbi__convert_format frees input on failure
4802   }
4803
4804   *x = s->img_x;
4805   *y = s->img_y;
4806   if (comp) *comp = s->img_n;
4807   return out;
4808}
4809#endif
4810
4811// Targa Truevision - TGA
4812// by Jonathan Dummer
4813#ifndef STBI_NO_TGA
4814static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
4815{
4816    int tga_w, tga_h, tga_comp;
4817    int sz;
4818    stbi__get8(s);                   // discard Offset
4819    sz = stbi__get8(s);              // color type
4820    if( sz > 1 ) {
4821        stbi__rewind(s);
4822        return 0;      // only RGB or indexed allowed
4823    }
4824    sz = stbi__get8(s);              // image type
4825    // only RGB or grey allowed, +/- RLE
4826    if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) return 0;
4827    stbi__skip(s,9);
4828    tga_w = stbi__get16le(s);
4829    if( tga_w < 1 ) {
4830        stbi__rewind(s);
4831        return 0;   // test width
4832    }
4833    tga_h = stbi__get16le(s);
4834    if( tga_h < 1 ) {
4835        stbi__rewind(s);
4836        return 0;   // test height
4837    }
4838    sz = stbi__get8(s);               // bits per pixel
4839    // only RGB or RGBA or grey allowed
4840    if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
4841        stbi__rewind(s);
4842        return 0;
4843    }
4844    tga_comp = sz;
4845    if (x) *x = tga_w;
4846    if (y) *y = tga_h;
4847    if (comp) *comp = tga_comp / 8;
4848    return 1;                   // seems to have passed everything
4849}
4850
4851static int stbi__tga_test(stbi__context *s)
4852{
4853   int res;
4854   int sz;
4855   stbi__get8(s);      //   discard Offset
4856   sz = stbi__get8(s);   //   color type
4857   if ( sz > 1 ) return 0;   //   only RGB or indexed allowed
4858   sz = stbi__get8(s);   //   image type
4859   if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) return 0;   //   only RGB or grey allowed, +/- RLE
4860   stbi__get16be(s);      //   discard palette start
4861   stbi__get16be(s);      //   discard palette length
4862   stbi__get8(s);         //   discard bits per palette color entry
4863   stbi__get16be(s);      //   discard x origin
4864   stbi__get16be(s);      //   discard y origin
4865   if ( stbi__get16be(s) < 1 ) return 0;      //   test width
4866   if ( stbi__get16be(s) < 1 ) return 0;      //   test height
4867   sz = stbi__get8(s);   //   bits per pixel
4868   if ( (sz != 8) && (sz != 16) && (sz != 24) && (sz != 32) )
4869      res = 0;
4870   else
4871      res = 1;
4872   stbi__rewind(s);
4873   return res;
4874}
4875
4876static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4877{
4878   //   read in the TGA header stuff
4879   int tga_offset = stbi__get8(s);
4880   int tga_indexed = stbi__get8(s);
4881   int tga_image_type = stbi__get8(s);
4882   int tga_is_RLE = 0;
4883   int tga_palette_start = stbi__get16le(s);
4884   int tga_palette_len = stbi__get16le(s);
4885   int tga_palette_bits = stbi__get8(s);
4886   int tga_x_origin = stbi__get16le(s);
4887   int tga_y_origin = stbi__get16le(s);
4888   int tga_width = stbi__get16le(s);
4889   int tga_height = stbi__get16le(s);
4890   int tga_bits_per_pixel = stbi__get8(s);
4891   int tga_comp = tga_bits_per_pixel / 8;
4892   int tga_inverted = stbi__get8(s);
4893   //   image data
4894   unsigned char *tga_data;
4895   unsigned char *tga_palette = NULL;
4896   int i, j;
4897   unsigned char raw_data[4];
4898   int RLE_count = 0;
4899   int RLE_repeating = 0;
4900   int read_next_pixel = 1;
4901
4902   //   do a tiny bit of precessing
4903   if ( tga_image_type >= 8 )
4904   {
4905      tga_image_type -= 8;
4906      tga_is_RLE = 1;
4907   }
4908   /* int tga_alpha_bits = tga_inverted & 15; */
4909   tga_inverted = 1 - ((tga_inverted >> 5) & 1);
4910
4911   //   error check
4912   if ( //(tga_indexed) ||
4913      (tga_width < 1) || (tga_height < 1) ||
4914      (tga_image_type < 1) || (tga_image_type > 3) ||
4915      ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
4916      (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
4917      )
4918   {
4919      return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
4920   }
4921
4922   //   If I'm paletted, then I'll use the number of bits from the palette
4923   if ( tga_indexed )
4924   {
4925      tga_comp = tga_palette_bits / 8;
4926   }
4927
4928   //   tga info
4929   *x = tga_width;
4930   *y = tga_height;
4931   if (comp) *comp = tga_comp;
4932
4933   tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
4934   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
4935
4936   // skip to the data's starting position (offset usually = 0)
4937   stbi__skip(s, tga_offset );
4938
4939   if ( !tga_indexed && !tga_is_RLE) {
4940      for (i=0; i < tga_height; ++i) {
4941         int row = tga_inverted ? tga_height -i - 1 : i;
4942         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
4943         stbi__getn(s, tga_row, tga_width * tga_comp);
4944      }
4945   } else  {
4946      //   do I need to load a palette?
4947      if ( tga_indexed)
4948      {
4949         //   any data to skip? (offset usually = 0)
4950         stbi__skip(s, tga_palette_start );
4951         //   load the palette
4952         tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
4953         if (!tga_palette) {
4954            STBI_FREE(tga_data);
4955            return stbi__errpuc("outofmem", "Out of memory");
4956         }
4957         if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
4958            STBI_FREE(tga_data);
4959            STBI_FREE(tga_palette);
4960            return stbi__errpuc("bad palette", "Corrupt TGA");
4961         }
4962      }
4963      //   load the data
4964      for (i=0; i < tga_width * tga_height; ++i)
4965      {
4966         //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
4967         if ( tga_is_RLE )
4968         {
4969            if ( RLE_count == 0 )
4970            {
4971               //   yep, get the next byte as a RLE command
4972               int RLE_cmd = stbi__get8(s);
4973               RLE_count = 1 + (RLE_cmd & 127);
4974               RLE_repeating = RLE_cmd >> 7;
4975               read_next_pixel = 1;
4976            } else if ( !RLE_repeating )
4977            {
4978               read_next_pixel = 1;
4979            }
4980         } else
4981         {
4982            read_next_pixel = 1;
4983         }
4984         //   OK, if I need to read a pixel, do it now
4985         if ( read_next_pixel )
4986         {
4987            //   load however much data we did have
4988            if ( tga_indexed )
4989            {
4990               //   read in 1 byte, then perform the lookup
4991               int pal_idx = stbi__get8(s);
4992               if ( pal_idx >= tga_palette_len )
4993               {
4994                  //   invalid index
4995                  pal_idx = 0;
4996               }
4997               pal_idx *= tga_bits_per_pixel / 8;
4998               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
4999               {
5000                  raw_data[j] = tga_palette[pal_idx+j];
5001               }
5002            } else
5003            {
5004               //   read in the data raw
5005               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
5006               {
5007                  raw_data[j] = stbi__get8(s);
5008               }
5009            }
5010            //   clear the reading flag for the next pixel
5011            read_next_pixel = 0;
5012         } // end of reading a pixel
5013
5014         // copy data
5015         for (j = 0; j < tga_comp; ++j)
5016           tga_data[i*tga_comp+j] = raw_data[j];
5017
5018         //   in case we're in RLE mode, keep counting down
5019         --RLE_count;
5020      }
5021      //   do I need to invert the image?
5022      if ( tga_inverted )
5023      {
5024         for (j = 0; j*2 < tga_height; ++j)
5025         {
5026            int index1 = j * tga_width * tga_comp;
5027            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5028            for (i = tga_width * tga_comp; i > 0; --i)
5029            {
5030               unsigned char temp = tga_data[index1];
5031               tga_data[index1] = tga_data[index2];
5032               tga_data[index2] = temp;
5033               ++index1;
5034               ++index2;
5035            }
5036         }
5037      }
5038      //   clear my palette, if I had one
5039      if ( tga_palette != NULL )
5040      {
5041         STBI_FREE( tga_palette );
5042      }
5043   }
5044
5045   // swap RGB
5046   if (tga_comp >= 3)
5047   {
5048      unsigned char* tga_pixel = tga_data;
5049      for (i=0; i < tga_width * tga_height; ++i)
5050      {
5051         unsigned char temp = tga_pixel[0];
5052         tga_pixel[0] = tga_pixel[2];
5053         tga_pixel[2] = temp;
5054         tga_pixel += tga_comp;
5055      }
5056   }
5057
5058   // convert to target component count
5059   if (req_comp && req_comp != tga_comp)
5060      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5061
5062   //   the things I do to get rid of an error message, and yet keep
5063   //   Microsoft's C compilers happy... [8^(
5064   tga_palette_start = tga_palette_len = tga_palette_bits =
5065         tga_x_origin = tga_y_origin = 0;
5066   //   OK, done
5067   return tga_data;
5068}
5069#endif
5070
5071// *************************************************************************************************
5072// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5073
5074#ifndef STBI_NO_PSD
5075static int stbi__psd_test(stbi__context *s)
5076{
5077   int r = (stbi__get32be(s) == 0x38425053);
5078   stbi__rewind(s);
5079   return r;
5080}
5081
5082static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5083{
5084   int   pixelCount;
5085   int channelCount, compression;
5086   int channel, i, count, len;
5087   int bitdepth;
5088   int w,h;
5089   stbi_uc *out;
5090
5091   // Check identifier
5092   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
5093      return stbi__errpuc("not PSD", "Corrupt PSD image");
5094
5095   // Check file type version.
5096   if (stbi__get16be(s) != 1)
5097      return stbi__errpuc("wrong version", "Unsupported version of PSD image");
5098
5099   // Skip 6 reserved bytes.
5100   stbi__skip(s, 6 );
5101
5102   // Read the number of channels (R, G, B, A, etc).
5103   channelCount = stbi__get16be(s);
5104   if (channelCount < 0 || channelCount > 16)
5105      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
5106
5107   // Read the rows and columns of the image.
5108   h = stbi__get32be(s);
5109   w = stbi__get32be(s);
5110
5111   // Make sure the depth is 8 bits.
5112   bitdepth = stbi__get16be(s);
5113   if (bitdepth != 8 && bitdepth != 16)
5114      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
5115
5116   // Make sure the color mode is RGB.
5117   // Valid options are:
5118   //   0: Bitmap
5119   //   1: Grayscale
5120   //   2: Indexed color
5121   //   3: RGB color
5122   //   4: CMYK color
5123   //   7: Multichannel
5124   //   8: Duotone
5125   //   9: Lab color
5126   if (stbi__get16be(s) != 3)
5127      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
5128
5129   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
5130   stbi__skip(s,stbi__get32be(s) );
5131
5132   // Skip the image resources.  (resolution, pen tool paths, etc)
5133   stbi__skip(s, stbi__get32be(s) );
5134
5135   // Skip the reserved data.
5136   stbi__skip(s, stbi__get32be(s) );
5137
5138   // Find out if the data is compressed.
5139   // Known values:
5140   //   0: no compression
5141   //   1: RLE compressed
5142   compression = stbi__get16be(s);
5143   if (compression > 1)
5144      return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5145
5146   // Create the destination image.
5147   out = (stbi_uc *) stbi__malloc(4 * w*h);
5148   if (!out) return stbi__errpuc("outofmem", "Out of memory");
5149   pixelCount = w*h;
5150
5151   // Initialize the data to zero.
5152   //memset( out, 0, pixelCount * 4 );
5153
5154   // Finally, the image data.
5155   if (compression) {
5156      // RLE as used by .PSD and .TIFF
5157      // Loop until you get the number of unpacked bytes you are expecting:
5158      //     Read the next source byte into n.
5159      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
5160      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
5161      //     Else if n is 128, noop.
5162      // Endloop
5163
5164      // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data,
5165      // which we're going to just skip.
5166      stbi__skip(s, h * channelCount * 2 );
5167
5168      // Read the RLE data by channel.
5169      for (channel = 0; channel < 4; channel++) {
5170         stbi_uc *p;
5171
5172         p = out+channel;
5173         if (channel >= channelCount) {
5174            // Fill this channel with default data.
5175            for (i = 0; i < pixelCount; i++, p += 4)
5176               *p = (channel == 3 ? 255 : 0);
5177         } else {
5178            // Read the RLE data.
5179            count = 0;
5180            while (count < pixelCount) {
5181               len = stbi__get8(s);
5182               if (len == 128) {
5183                  // No-op.
5184               } else if (len < 128) {
5185                  // Copy next len+1 bytes literally.
5186                  len++;
5187                  count += len;
5188                  while (len) {
5189                     *p = stbi__get8(s);
5190                     p += 4;
5191                     len--;
5192                  }
5193               } else if (len > 128) {
5194                  stbi_uc   val;
5195                  // Next -len+1 bytes in the dest are replicated from next source byte.
5196                  // (Interpret len as a negative 8-bit int.)
5197                  len ^= 0x0FF;
5198                  len += 2;
5199                  val = stbi__get8(s);
5200                  count += len;
5201                  while (len) {
5202                     *p = val;
5203                     p += 4;
5204                     len--;
5205                  }
5206               }
5207            }
5208         }
5209      }
5210
5211   } else {
5212      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
5213      // where each channel consists of an 8-bit value for each pixel in the image.
5214
5215      // Read the data by channel.
5216      for (channel = 0; channel < 4; channel++) {
5217         stbi_uc *p;
5218
5219         p = out + channel;
5220         if (channel >= channelCount) {
5221            // Fill this channel with default data.
5222            stbi_uc val = channel == 3 ? 255 : 0;
5223            for (i = 0; i < pixelCount; i++, p += 4)
5224               *p = val;
5225         } else {
5226            // Read the data.
5227            if (bitdepth == 16) {
5228               for (i = 0; i < pixelCount; i++, p += 4)
5229                  *p = (stbi_uc) (stbi__get16be(s) >> 8);
5230            } else {
5231               for (i = 0; i < pixelCount; i++, p += 4)
5232                  *p = stbi__get8(s);
5233            }
5234         }
5235      }
5236   }
5237
5238   if (req_comp && req_comp != 4) {
5239      out = stbi__convert_format(out, 4, req_comp, w, h);
5240      if (out == NULL) return out; // stbi__convert_format frees input on failure
5241   }
5242
5243   if (comp) *comp = 4;
5244   *y = h;
5245   *x = w;
5246
5247   return out;
5248}
5249#endif
5250
5251// *************************************************************************************************
5252// Softimage PIC loader
5253// by Tom Seddon
5254//
5255// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
5256// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
5257
5258#ifndef STBI_NO_PIC
5259static int stbi__pic_is4(stbi__context *s,const char *str)
5260{
5261   int i;
5262   for (i=0; i<4; ++i)
5263      if (stbi__get8(s) != (stbi_uc)str[i])
5264         return 0;
5265
5266   return 1;
5267}
5268
5269static int stbi__pic_test_core(stbi__context *s)
5270{
5271   int i;
5272
5273   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
5274      return 0;
5275
5276   for(i=0;i<84;++i)
5277      stbi__get8(s);
5278
5279   if (!stbi__pic_is4(s,"PICT"))
5280      return 0;
5281
5282   return 1;
5283}
5284
5285typedef struct
5286{
5287   stbi_uc size,type,channel;
5288} stbi__pic_packet;
5289
5290static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
5291{
5292   int mask=0x80, i;
5293
5294   for (i=0; i<4; ++i, mask>>=1) {
5295      if (channel & mask) {
5296         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
5297         dest[i]=stbi__get8(s);
5298      }
5299   }
5300
5301   return dest;
5302}
5303
5304static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
5305{
5306   int mask=0x80,i;
5307
5308   for (i=0;i<4; ++i, mask>>=1)
5309      if (channel&mask)
5310         dest[i]=src[i];
5311}
5312
5313static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
5314{
5315   int act_comp=0,num_packets=0,y,chained;
5316   stbi__pic_packet packets[10];
5317
5318   // this will (should...) cater for even some bizarre stuff like having data
5319    // for the same channel in multiple packets.
5320   do {
5321      stbi__pic_packet *packet;
5322
5323      if (num_packets==sizeof(packets)/sizeof(packets[0]))
5324         return stbi__errpuc("bad format","too many packets");
5325
5326      packet = &packets[num_packets++];
5327
5328      chained = stbi__get8(s);
5329      packet->size    = stbi__get8(s);
5330      packet->type    = stbi__get8(s);
5331      packet->channel = stbi__get8(s);
5332
5333      act_comp |= packet->channel;
5334
5335      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
5336      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
5337   } while (chained);
5338
5339   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
5340
5341   for(y=0; y<height; ++y) {
5342      int packet_idx;
5343
5344      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
5345         stbi__pic_packet *packet = &packets[packet_idx];
5346         stbi_uc *dest = result+y*width*4;
5347
5348         switch (packet->type) {
5349            default:
5350               return stbi__errpuc("bad format","packet has bad compression type");
5351
5352            case 0: {//uncompressed
5353               int x;
5354
5355               for(x=0;x<width;++x, dest+=4)
5356                  if (!stbi__readval(s,packet->channel,dest))
5357                     return 0;
5358               break;
5359            }
5360
5361            case 1://Pure RLE
5362               {
5363                  int left=width, i;
5364
5365                  while (left>0) {
5366                     stbi_uc count,value[4];
5367
5368                     count=stbi__get8(s);
5369                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
5370
5371                     if (count > left)
5372                        count = (stbi_uc) left;
5373
5374                     if (!stbi__readval(s,packet->channel,value))  return 0;
5375
5376                     for(i=0; i<count; ++i,dest+=4)
5377                        stbi__copyval(packet->channel,dest,value);
5378                     left -= count;
5379                  }
5380               }
5381               break;
5382
5383            case 2: {//Mixed RLE
5384               int left=width;
5385               while (left>0) {
5386                  int count = stbi__get8(s), i;
5387                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
5388
5389                  if (count >= 128) { // Repeated
5390                     stbi_uc value[4];
5391
5392                     if (count==128)
5393                        count = stbi__get16be(s);
5394                     else
5395                        count -= 127;
5396                     if (count > left)
5397                        return stbi__errpuc("bad file","scanline overrun");
5398
5399                     if (!stbi__readval(s,packet->channel,value))
5400                        return 0;
5401
5402                     for(i=0;i<count;++i, dest += 4)
5403                        stbi__copyval(packet->channel,dest,value);
5404                  } else { // Raw
5405                     ++count;
5406                     if (count>left) return stbi__errpuc("bad file","scanline overrun");
5407
5408                     for(i=0;i<count;++i, dest+=4)
5409                        if (!stbi__readval(s,packet->channel,dest))
5410                           return 0;
5411                  }
5412                  left-=count;
5413               }
5414               break;
5415            }
5416         }
5417      }
5418   }
5419
5420   return result;
5421}
5422
5423static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
5424{
5425   stbi_uc *result;
5426   int i, x,y;
5427
5428   for (i=0; i<92; ++i)
5429      stbi__get8(s);
5430
5431   x = stbi__get16be(s);
5432   y = stbi__get16be(s);
5433   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
5434   if ((1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");
5435
5436   stbi__get32be(s); //skip `ratio'
5437   stbi__get16be(s); //skip `fields'
5438   stbi__get16be(s); //skip `pad'
5439
5440   // intermediate buffer is RGBA
5441   result = (stbi_uc *) stbi__malloc(x*y*4);
5442   memset(result, 0xff, x*y*4);
5443
5444   if (!stbi__pic_load_core(s,x,y,comp, result)) {
5445      STBI_FREE(result);
5446      result=0;
5447   }
5448   *px = x;
5449   *py = y;
5450   if (req_comp == 0) req_comp = *comp;
5451   result=stbi__convert_format(result,4,req_comp,x,y);
5452
5453   return result;
5454}
5455
5456static int stbi__pic_test(stbi__context *s)
5457{
5458   int r = stbi__pic_test_core(s);
5459   stbi__rewind(s);
5460   return r;
5461}
5462#endif
5463
5464// *************************************************************************************************
5465// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
5466
5467#ifndef STBI_NO_GIF
5468typedef struct
5469{
5470   stbi__int16 prefix;
5471   stbi_uc first;
5472   stbi_uc suffix;
5473} stbi__gif_lzw;
5474
5475typedef struct
5476{
5477   int w,h;
5478   stbi_uc *out, *old_out;             // output buffer (always 4 components)
5479   int flags, bgindex, ratio, transparent, eflags, delay;
5480   stbi_uc  pal[256][4];
5481   stbi_uc lpal[256][4];
5482   stbi__gif_lzw codes[4096];
5483   stbi_uc *color_table;
5484   int parse, step;
5485   int lflags;
5486   int start_x, start_y;
5487   int max_x, max_y;
5488   int cur_x, cur_y;
5489   int line_size;
5490} stbi__gif;
5491
5492static int stbi__gif_test_raw(stbi__context *s)
5493{
5494   int sz;
5495   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
5496   sz = stbi__get8(s);
5497   if (sz != '9' && sz != '7') return 0;
5498   if (stbi__get8(s) != 'a') return 0;
5499   return 1;
5500}
5501
5502static int stbi__gif_test(stbi__context *s)
5503{
5504   int r = stbi__gif_test_raw(s);
5505   stbi__rewind(s);
5506   return r;
5507}
5508
5509static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
5510{
5511   int i;
5512   for (i=0; i < num_entries; ++i) {
5513      pal[i][2] = stbi__get8(s);
5514      pal[i][1] = stbi__get8(s);
5515      pal[i][0] = stbi__get8(s);
5516      pal[i][3] = transp == i ? 0 : 255;
5517   }
5518}
5519
5520static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
5521{
5522   stbi_uc version;
5523   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
5524      return stbi__err("not GIF", "Corrupt GIF");
5525
5526   version = stbi__get8(s);
5527   if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
5528   if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
5529
5530   stbi__g_failure_reason = "";
5531   g->w = stbi__get16le(s);
5532   g->h = stbi__get16le(s);
5533   g->flags = stbi__get8(s);
5534   g->bgindex = stbi__get8(s);
5535   g->ratio = stbi__get8(s);
5536   g->transparent = -1;
5537
5538   if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
5539
5540   if (is_info) return 1;
5541
5542   if (g->flags & 0x80)
5543      stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
5544
5545   return 1;
5546}
5547
5548static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
5549{
5550   stbi__gif g;
5551   if (!stbi__gif_header(s, &g, comp, 1)) {
5552      stbi__rewind( s );
5553      return 0;
5554   }
5555   if (x) *x = g.w;
5556   if (y) *y = g.h;
5557   return 1;
5558}
5559
5560static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
5561{
5562   stbi_uc *p, *c;
5563
5564   // recurse to decode the prefixes, since the linked-list is backwards,
5565   // and working backwards through an interleaved image would be nasty
5566   if (g->codes[code].prefix >= 0)
5567      stbi__out_gif_code(g, g->codes[code].prefix);
5568
5569   if (g->cur_y >= g->max_y) return;
5570
5571   p = &g->out[g->cur_x + g->cur_y];
5572   c = &g->color_table[g->codes[code].suffix * 4];
5573
5574   if (c[3] >= 128) {
5575      p[0] = c[2];
5576      p[1] = c[1];
5577      p[2] = c[0];
5578      p[3] = c[3];
5579   }
5580   g->cur_x += 4;
5581
5582   if (g->cur_x >= g->max_x) {
5583      g->cur_x = g->start_x;
5584      g->cur_y += g->step;
5585
5586      while (g->cur_y >= g->max_y && g->parse > 0) {
5587         g->step = (1 << g->parse) * g->line_size;
5588         g->cur_y = g->start_y + (g->step >> 1);
5589         --g->parse;
5590      }
5591   }
5592}
5593
5594static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
5595{
5596   stbi_uc lzw_cs;
5597   stbi__int32 len, init_code;
5598   stbi__uint32 first;
5599   stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
5600   stbi__gif_lzw *p;
5601
5602   lzw_cs = stbi__get8(s);
5603   if (lzw_cs > 12) return NULL;
5604   clear = 1 << lzw_cs;
5605   first = 1;
5606   codesize = lzw_cs + 1;
5607   codemask = (1 << codesize) - 1;
5608   bits = 0;
5609   valid_bits = 0;
5610   for (init_code = 0; init_code < clear; init_code++) {
5611      g->codes[init_code].prefix = -1;
5612      g->codes[init_code].first = (stbi_uc) init_code;
5613      g->codes[init_code].suffix = (stbi_uc) init_code;
5614   }
5615
5616   // support no starting clear code
5617   avail = clear+2;
5618   oldcode = -1;
5619
5620   len = 0;
5621   for(;;) {
5622      if (valid_bits < codesize) {
5623         if (len == 0) {
5624            len = stbi__get8(s); // start new block
5625            if (len == 0)
5626               return g->out;
5627         }
5628         --len;
5629         bits |= (stbi__int32) stbi__get8(s) << valid_bits;
5630         valid_bits += 8;
5631      } else {
5632         stbi__int32 code = bits & codemask;
5633         bits >>= codesize;
5634         valid_bits -= codesize;
5635         // @OPTIMIZE: is there some way we can accelerate the non-clear path?
5636         if (code == clear) {  // clear code
5637            codesize = lzw_cs + 1;
5638            codemask = (1 << codesize) - 1;
5639            avail = clear + 2;
5640            oldcode = -1;
5641            first = 0;
5642         } else if (code == clear + 1) { // end of stream code
5643            stbi__skip(s, len);
5644            while ((len = stbi__get8(s)) > 0)
5645               stbi__skip(s,len);
5646            return g->out;
5647         } else if (code <= avail) {
5648            if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
5649
5650            if (oldcode >= 0) {
5651               p = &g->codes[avail++];
5652               if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
5653               p->prefix = (stbi__int16) oldcode;
5654               p->first = g->codes[oldcode].first;
5655               p->suffix = (code == avail) ? p->first : g->codes[code].first;
5656            } else if (code == avail)
5657               return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5658
5659            stbi__out_gif_code(g, (stbi__uint16) code);
5660
5661            if ((avail & codemask) == 0 && avail <= 0x0FFF) {
5662               codesize++;
5663               codemask = (1 << codesize) - 1;
5664            }
5665
5666            oldcode = code;
5667         } else {
5668            return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5669         }
5670      }
5671   }
5672}
5673
5674static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
5675{
5676   int x, y;
5677   stbi_uc *c = g->pal[g->bgindex];
5678   for (y = y0; y < y1; y += 4 * g->w) {
5679      for (x = x0; x < x1; x += 4) {
5680         stbi_uc *p  = &g->out[y + x];
5681         p[0] = c[2];
5682         p[1] = c[1];
5683         p[2] = c[0];
5684         p[3] = 0;
5685      }
5686   }
5687}
5688
5689// this function is designed to support animated gifs, although stb_image doesn't support it
5690static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
5691{
5692   int i;
5693   stbi_uc *prev_out = 0;
5694
5695   if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
5696      return 0; // stbi__g_failure_reason set by stbi__gif_header
5697
5698   prev_out = g->out;
5699   g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
5700   if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
5701
5702   switch ((g->eflags & 0x1C) >> 2) {
5703      case 0: // unspecified (also always used on 1st frame)
5704         stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
5705         break;
5706      case 1: // do not dispose
5707         if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5708         g->old_out = prev_out;
5709         break;
5710      case 2: // dispose to background
5711         if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5712         stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
5713         break;
5714      case 3: // dispose to previous
5715         if (g->old_out) {
5716            for (i = g->start_y; i < g->max_y; i += 4 * g->w)
5717               memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
5718         }
5719         break;
5720   }
5721
5722   for (;;) {
5723      switch (stbi__get8(s)) {
5724         case 0x2C: /* Image Descriptor */
5725         {
5726            int prev_trans = -1;
5727            stbi__int32 x, y, w, h;
5728            stbi_uc *o;
5729
5730            x = stbi__get16le(s);
5731            y = stbi__get16le(s);
5732            w = stbi__get16le(s);
5733            h = stbi__get16le(s);
5734            if (((x + w) > (g->w)) || ((y + h) > (g->h)))
5735               return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
5736
5737            g->line_size = g->w * 4;
5738            g->start_x = x * 4;
5739            g->start_y = y * g->line_size;
5740            g->max_x   = g->start_x + w * 4;
5741            g->max_y   = g->start_y + h * g->line_size;
5742            g->cur_x   = g->start_x;
5743            g->cur_y   = g->start_y;
5744
5745            g->lflags = stbi__get8(s);
5746
5747            if (g->lflags & 0x40) {
5748               g->step = 8 * g->line_size; // first interlaced spacing
5749               g->parse = 3;
5750            } else {
5751               g->step = g->line_size;
5752               g->parse = 0;
5753            }
5754
5755            if (g->lflags & 0x80) {
5756               stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
5757               g->color_table = (stbi_uc *) g->lpal;
5758            } else if (g->flags & 0x80) {
5759               if (g->transparent >= 0 && (g->eflags & 0x01)) {
5760                  prev_trans = g->pal[g->transparent][3];
5761                  g->pal[g->transparent][3] = 0;
5762               }
5763               g->color_table = (stbi_uc *) g->pal;
5764            } else
5765               return stbi__errpuc("missing color table", "Corrupt GIF");
5766
5767            o = stbi__process_gif_raster(s, g);
5768            if (o == NULL) return NULL;
5769
5770            if (prev_trans != -1)
5771               g->pal[g->transparent][3] = (stbi_uc) prev_trans;
5772
5773            return o;
5774         }
5775
5776         case 0x21: // Comment Extension.
5777         {
5778            int len;
5779            if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
5780               len = stbi__get8(s);
5781               if (len == 4) {
5782                  g->eflags = stbi__get8(s);
5783                  g->delay = stbi__get16le(s);
5784                  g->transparent = stbi__get8(s);
5785               } else {
5786                  stbi__skip(s, len);
5787                  break;
5788               }
5789            }
5790            while ((len = stbi__get8(s)) != 0)
5791               stbi__skip(s, len);
5792            break;
5793         }
5794
5795         case 0x3B: // gif stream termination code
5796            return (stbi_uc *) s; // using '1' causes warning on some compilers
5797
5798         default:
5799            return stbi__errpuc("unknown code", "Corrupt GIF");
5800      }
5801   }
5802
5803   STBI_NOTUSED(req_comp);
5804}
5805
5806static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5807{
5808   stbi_uc *u = 0;
5809   stbi__gif g;
5810   memset(&g, 0, sizeof(g));
5811
5812   u = stbi__gif_load_next(s, &g, comp, req_comp);
5813   if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
5814   if (u) {
5815      *x = g.w;
5816      *y = g.h;
5817      if (req_comp && req_comp != 4)
5818         u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
5819   }
5820   else if (g.out)
5821      STBI_FREE(g.out);
5822
5823   return u;
5824}
5825
5826static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
5827{
5828   return stbi__gif_info_raw(s,x,y,comp);
5829}
5830#endif
5831
5832// *************************************************************************************************
5833// Radiance RGBE HDR loader
5834// originally by Nicolas Schulz
5835#ifndef STBI_NO_HDR
5836static int stbi__hdr_test_core(stbi__context *s)
5837{
5838   const char *signature = "#?RADIANCE\n";
5839   int i;
5840   for (i=0; signature[i]; ++i)
5841      if (stbi__get8(s) != signature[i])
5842         return 0;
5843   return 1;
5844}
5845
5846static int stbi__hdr_test(stbi__context* s)
5847{
5848   int r = stbi__hdr_test_core(s);
5849   stbi__rewind(s);
5850   return r;
5851}
5852
5853#define STBI__HDR_BUFLEN  1024
5854static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
5855{
5856   int len=0;
5857   char c = '\0';
5858
5859   c = (char) stbi__get8(z);
5860
5861   while (!stbi__at_eof(z) && c != '\n') {
5862      buffer[len++] = c;
5863      if (len == STBI__HDR_BUFLEN-1) {
5864         // flush to end of line
5865         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
5866            ;
5867         break;
5868      }
5869      c = (char) stbi__get8(z);
5870   }
5871
5872   buffer[len] = 0;
5873   return buffer;
5874}
5875
5876static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
5877{
5878   if ( input[3] != 0 ) {
5879      float f1;
5880      // Exponent
5881      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
5882      if (req_comp <= 2)
5883         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
5884      else {
5885         output[0] = input[0] * f1;
5886         output[1] = input[1] * f1;
5887         output[2] = input[2] * f1;
5888      }
5889      if (req_comp == 2) output[1] = 1;
5890      if (req_comp == 4) output[3] = 1;
5891   } else {
5892      switch (req_comp) {
5893         case 4: output[3] = 1; /* fallthrough */
5894         case 3: output[0] = output[1] = output[2] = 0;
5895                 break;
5896         case 2: output[1] = 1; /* fallthrough */
5897         case 1: output[0] = 0;
5898                 break;
5899      }
5900   }
5901}
5902
5903static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5904{
5905   char buffer[STBI__HDR_BUFLEN];
5906   char *token;
5907   int valid = 0;
5908   int width, height;
5909   stbi_uc *scanline;
5910   float *hdr_data;
5911   int len;
5912   unsigned char count, value;
5913   int i, j, k, c1,c2, z;
5914
5915
5916   // Check identifier
5917   if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
5918      return stbi__errpf("not HDR", "Corrupt HDR image");
5919
5920   // Parse header
5921   for(;;) {
5922      token = stbi__hdr_gettoken(s,buffer);
5923      if (token[0] == 0) break;
5924      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
5925   }
5926
5927   if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
5928
5929   // Parse width and height
5930   // can't use sscanf() if we're not using stdio!
5931   token = stbi__hdr_gettoken(s,buffer);
5932   if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5933   token += 3;
5934   height = (int) strtol(token, &token, 10);
5935   while (*token == ' ') ++token;
5936   if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5937   token += 3;
5938   width = (int) strtol(token, NULL, 10);
5939
5940   *x = width;
5941   *y = height;
5942
5943   if (comp) *comp = 3;
5944   if (req_comp == 0) req_comp = 3;
5945
5946   // Read data
5947   hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
5948
5949   // Load image data
5950   // image data is stored as some number of sca
5951   if ( width < 8 || width >= 32768) {
5952      // Read flat data
5953      for (j=0; j < height; ++j) {
5954         for (i=0; i < width; ++i) {
5955            stbi_uc rgbe[4];
5956           main_decode_loop:
5957            stbi__getn(s, rgbe, 4);
5958            stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
5959         }
5960      }
5961   } else {
5962      // Read RLE-encoded data
5963      scanline = NULL;
5964
5965      for (j = 0; j < height; ++j) {
5966         c1 = stbi__get8(s);
5967         c2 = stbi__get8(s);
5968         len = stbi__get8(s);
5969         if (c1 != 2 || c2 != 2 || (len & 0x80)) {
5970            // not run-length encoded, so we have to actually use THIS data as a decoded
5971            // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
5972            stbi_uc rgbe[4];
5973            rgbe[0] = (stbi_uc) c1;
5974            rgbe[1] = (stbi_uc) c2;
5975            rgbe[2] = (stbi_uc) len;
5976            rgbe[3] = (stbi_uc) stbi__get8(s);
5977            stbi__hdr_convert(hdr_data, rgbe, req_comp);
5978            i = 1;
5979            j = 0;
5980            STBI_FREE(scanline);
5981            goto main_decode_loop; // yes, this makes no sense
5982         }
5983         len <<= 8;
5984         len |= stbi__get8(s);
5985         if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
5986         if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
5987
5988         for (k = 0; k < 4; ++k) {
5989            i = 0;
5990            while (i < width) {
5991               count = stbi__get8(s);
5992               if (count > 128) {
5993                  // Run
5994                  value = stbi__get8(s);
5995                  count -= 128;
5996                  for (z = 0; z < count; ++z)
5997                     scanline[i++ * 4 + k] = value;
5998               } else {
5999                  // Dump
6000                  for (z = 0; z < count; ++z)
6001                     scanline[i++ * 4 + k] = stbi__get8(s);
6002               }
6003            }
6004         }
6005         for (i=0; i < width; ++i)
6006            stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
6007      }
6008      STBI_FREE(scanline);
6009   }
6010
6011   return hdr_data;
6012}
6013
6014static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
6015{
6016   char buffer[STBI__HDR_BUFLEN];
6017   char *token;
6018   int valid = 0;
6019
6020   if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
6021       stbi__rewind( s );
6022       return 0;
6023   }
6024
6025   for(;;) {
6026      token = stbi__hdr_gettoken(s,buffer);
6027      if (token[0] == 0) break;
6028      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6029   }
6030
6031   if (!valid) {
6032       stbi__rewind( s );
6033       return 0;
6034   }
6035   token = stbi__hdr_gettoken(s,buffer);
6036   if (strncmp(token, "-Y ", 3)) {
6037       stbi__rewind( s );
6038       return 0;
6039   }
6040   token += 3;
6041   *y = (int) strtol(token, &token, 10);
6042   while (*token == ' ') ++token;
6043   if (strncmp(token, "+X ", 3)) {
6044       stbi__rewind( s );
6045       return 0;
6046   }
6047   token += 3;
6048   *x = (int) strtol(token, NULL, 10);
6049   *comp = 3;
6050   return 1;
6051}
6052#endif // STBI_NO_HDR
6053
6054#ifndef STBI_NO_BMP
6055static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
6056{
6057   int hsz;
6058   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
6059       stbi__rewind( s );
6060       return 0;
6061   }
6062   stbi__skip(s,12);
6063   hsz = stbi__get32le(s);
6064   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
6065       stbi__rewind( s );
6066       return 0;
6067   }
6068   if (hsz == 12) {
6069      *x = stbi__get16le(s);
6070      *y = stbi__get16le(s);
6071   } else {
6072      *x = stbi__get32le(s);
6073      *y = stbi__get32le(s);
6074   }
6075   if (stbi__get16le(s) != 1) {
6076       stbi__rewind( s );
6077       return 0;
6078   }
6079   *comp = stbi__get16le(s) / 8;
6080   return 1;
6081}
6082#endif
6083
6084#ifndef STBI_NO_PSD
6085static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
6086{
6087   int channelCount;
6088   if (stbi__get32be(s) != 0x38425053) {
6089       stbi__rewind( s );
6090       return 0;
6091   }
6092   if (stbi__get16be(s) != 1) {
6093       stbi__rewind( s );
6094       return 0;
6095   }
6096   stbi__skip(s, 6);
6097   channelCount = stbi__get16be(s);
6098   if (channelCount < 0 || channelCount > 16) {
6099       stbi__rewind( s );
6100       return 0;
6101   }
6102   *y = stbi__get32be(s);
6103   *x = stbi__get32be(s);
6104   if (stbi__get16be(s) != 8) {
6105       stbi__rewind( s );
6106       return 0;
6107   }
6108   if (stbi__get16be(s) != 3) {
6109       stbi__rewind( s );
6110       return 0;
6111   }
6112   *comp = 4;
6113   return 1;
6114}
6115#endif
6116
6117#ifndef STBI_NO_PIC
6118static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
6119{
6120   int act_comp=0,num_packets=0,chained;
6121   stbi__pic_packet packets[10];
6122
6123   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
6124      stbi__rewind(s);
6125      return 0;
6126   }
6127
6128   stbi__skip(s, 88);
6129
6130   *x = stbi__get16be(s);
6131   *y = stbi__get16be(s);
6132   if (stbi__at_eof(s)) {
6133      stbi__rewind( s);
6134      return 0;
6135   }
6136   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
6137      stbi__rewind( s );
6138      return 0;
6139   }
6140
6141   stbi__skip(s, 8);
6142
6143   do {
6144      stbi__pic_packet *packet;
6145
6146      if (num_packets==sizeof(packets)/sizeof(packets[0]))
6147         return 0;
6148
6149      packet = &packets[num_packets++];
6150      chained = stbi__get8(s);
6151      packet->size    = stbi__get8(s);
6152      packet->type    = stbi__get8(s);
6153      packet->channel = stbi__get8(s);
6154      act_comp |= packet->channel;
6155
6156      if (stbi__at_eof(s)) {
6157          stbi__rewind( s );
6158          return 0;
6159      }
6160      if (packet->size != 8) {
6161          stbi__rewind( s );
6162          return 0;
6163      }
6164   } while (chained);
6165
6166   *comp = (act_comp & 0x10 ? 4 : 3);
6167
6168   return 1;
6169}
6170#endif
6171
6172// *************************************************************************************************
6173// Portable Gray Map and Portable Pixel Map loader
6174// by Ken Miller
6175//
6176// PGM: http://netpbm.sourceforge.net/doc/pgm.html
6177// PPM: http://netpbm.sourceforge.net/doc/ppm.html
6178//
6179// Known limitations:
6180//    Does not support comments in the header section
6181//    Does not support ASCII image data (formats P2 and P3)
6182//    Does not support 16-bit-per-channel
6183
6184#ifndef STBI_NO_PNM
6185
6186static int      stbi__pnm_test(stbi__context *s)
6187{
6188   char p, t;
6189   p = (char) stbi__get8(s);
6190   t = (char) stbi__get8(s);
6191   if (p != 'P' || (t != '5' && t != '6')) {
6192       stbi__rewind( s );
6193       return 0;
6194   }
6195   return 1;
6196}
6197
6198static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
6199{
6200   stbi_uc *out;
6201   if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
6202      return 0;
6203   *x = s->img_x;
6204   *y = s->img_y;
6205   *comp = s->img_n;
6206
6207   out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
6208   if (!out) return stbi__errpuc("outofmem", "Out of memory");
6209   stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
6210
6211   if (req_comp && req_comp != s->img_n) {
6212      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
6213      if (out == NULL) return out; // stbi__convert_format frees input on failure
6214   }
6215   return out;
6216}
6217
6218static int      stbi__pnm_isspace(char c)
6219{
6220   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
6221}
6222
6223static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
6224{
6225   while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
6226      *c = (char) stbi__get8(s);
6227}
6228
6229static int      stbi__pnm_isdigit(char c)
6230{
6231   return c >= '0' && c <= '9';
6232}
6233
6234static int      stbi__pnm_getinteger(stbi__context *s, char *c)
6235{
6236   int value = 0;
6237
6238   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
6239      value = value*10 + (*c - '0');
6240      *c = (char) stbi__get8(s);
6241   }
6242
6243   return value;
6244}
6245
6246static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
6247{
6248   int maxv;
6249   char c, p, t;
6250
6251   stbi__rewind( s );
6252
6253   // Get identifier
6254   p = (char) stbi__get8(s);
6255   t = (char) stbi__get8(s);
6256   if (p != 'P' || (t != '5' && t != '6')) {
6257       stbi__rewind( s );
6258       return 0;
6259   }
6260
6261   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
6262
6263   c = (char) stbi__get8(s);
6264   stbi__pnm_skip_whitespace(s, &c);
6265
6266   *x = stbi__pnm_getinteger(s, &c); // read width
6267   stbi__pnm_skip_whitespace(s, &c);
6268
6269   *y = stbi__pnm_getinteger(s, &c); // read height
6270   stbi__pnm_skip_whitespace(s, &c);
6271
6272   maxv = stbi__pnm_getinteger(s, &c);  // read max value
6273
6274   if (maxv > 255)
6275      return stbi__err("max value > 255", "PPM image not 8-bit");
6276   else
6277      return 1;
6278}
6279#endif
6280
6281static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
6282{
6283   #ifndef STBI_NO_JPEG
6284   if (stbi__jpeg_info(s, x, y, comp)) return 1;
6285   #endif
6286
6287   #ifndef STBI_NO_PNG
6288   if (stbi__png_info(s, x, y, comp))  return 1;
6289   #endif
6290
6291   #ifndef STBI_NO_GIF
6292   if (stbi__gif_info(s, x, y, comp))  return 1;
6293   #endif
6294
6295   #ifndef STBI_NO_BMP
6296   if (stbi__bmp_info(s, x, y, comp))  return 1;
6297   #endif
6298
6299   #ifndef STBI_NO_PSD
6300   if (stbi__psd_info(s, x, y, comp))  return 1;
6301   #endif
6302
6303   #ifndef STBI_NO_PIC
6304   if (stbi__pic_info(s, x, y, comp))  return 1;
6305   #endif
6306
6307   #ifndef STBI_NO_PNM
6308   if (stbi__pnm_info(s, x, y, comp))  return 1;
6309   #endif
6310
6311   #ifndef STBI_NO_HDR
6312   if (stbi__hdr_info(s, x, y, comp))  return 1;
6313   #endif
6314
6315   // test tga last because it's a crappy test!
6316   #ifndef STBI_NO_TGA
6317   if (stbi__tga_info(s, x, y, comp))
6318       return 1;
6319   #endif
6320   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
6321}
6322
6323#ifndef STBI_NO_STDIO
6324STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
6325{
6326    FILE *f = stbi__fopen(filename, "rb");
6327    int result;
6328    if (!f) return stbi__err("can't fopen", "Unable to open file");
6329    result = stbi_info_from_file(f, x, y, comp);
6330    fclose(f);
6331    return result;
6332}
6333
6334STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
6335{
6336   int r;
6337   stbi__context s;
6338   long pos = ftell(f);
6339   stbi__start_file(&s, f);
6340   r = stbi__info_main(&s,x,y,comp);
6341   fseek(f,pos,SEEK_SET);
6342   return r;
6343}
6344#endif // !STBI_NO_STDIO
6345
6346STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
6347{
6348   stbi__context s;
6349   stbi__start_mem(&s,buffer,len);
6350   return stbi__info_main(&s,x,y,comp);
6351}
6352
6353STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
6354{
6355   stbi__context s;
6356   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
6357   return stbi__info_main(&s,x,y,comp);
6358}
6359
6360#endif // STB_IMAGE_IMPLEMENTATION
6361
6362/*
6363   revision history:
6364      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
6365      2.07  (2015-09-13) fix compiler warnings
6366                         partial animated GIF support
6367                         limited 16-bit PSD support
6368                         #ifdef unused functions
6369                         bug with < 92 byte PIC,PNM,HDR,TGA
6370      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
6371      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
6372      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
6373      2.03  (2015-04-12) extra corruption checking (mmozeiko)
6374                         stbi_set_flip_vertically_on_load (nguillemot)
6375                         fix NEON support; fix mingw support
6376      2.02  (2015-01-19) fix incorrect assert, fix warning
6377      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
6378      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
6379      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
6380                         progressive JPEG (stb)
6381                         PGM/PPM support (Ken Miller)
6382                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
6383                         GIF bugfix -- seemingly never worked
6384                         STBI_NO_*, STBI_ONLY_*
6385      1.48  (2014-12-14) fix incorrectly-named assert()
6386      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
6387                         optimize PNG (ryg)
6388                         fix bug in interlaced PNG with user-specified channel count (stb)
6389      1.46  (2014-08-26)
6390              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
6391      1.45  (2014-08-16)
6392              fix MSVC-ARM internal compiler error by wrapping malloc
6393      1.44  (2014-08-07)
6394              various warning fixes from Ronny Chevalier
6395      1.43  (2014-07-15)
6396              fix MSVC-only compiler problem in code changed in 1.42
6397      1.42  (2014-07-09)
6398              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
6399              fixes to stbi__cleanup_jpeg path
6400              added STBI_ASSERT to avoid requiring assert.h
6401      1.41  (2014-06-25)
6402              fix search&replace from 1.36 that messed up comments/error messages
6403      1.40  (2014-06-22)
6404              fix gcc struct-initialization warning
6405      1.39  (2014-06-15)
6406              fix to TGA optimization when req_comp != number of components in TGA;
6407              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
6408              add support for BMP version 5 (more ignored fields)
6409      1.38  (2014-06-06)
6410              suppress MSVC warnings on integer casts truncating values
6411              fix accidental rename of 'skip' field of I/O
6412      1.37  (2014-06-04)
6413              remove duplicate typedef
6414      1.36  (2014-06-03)
6415              convert to header file single-file library
6416              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
6417      1.35  (2014-05-27)
6418              various warnings
6419              fix broken STBI_SIMD path
6420              fix bug where stbi_load_from_file no longer left file pointer in correct place
6421              fix broken non-easy path for 32-bit BMP (possibly never used)
6422              TGA optimization by Arseny Kapoulkine
6423      1.34  (unknown)
6424              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
6425      1.33  (2011-07-14)
6426              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
6427      1.32  (2011-07-13)
6428              support for "info" function for all supported filetypes (SpartanJ)
6429      1.31  (2011-06-20)
6430              a few more leak fixes, bug in PNG handling (SpartanJ)
6431      1.30  (2011-06-11)
6432              added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
6433              removed deprecated format-specific test/load functions
6434              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
6435              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
6436              fix inefficiency in decoding 32-bit BMP (David Woo)
6437      1.29  (2010-08-16)
6438              various warning fixes from Aurelien Pocheville
6439      1.28  (2010-08-01)
6440              fix bug in GIF palette transparency (SpartanJ)
6441      1.27  (2010-08-01)
6442              cast-to-stbi_uc to fix warnings
6443      1.26  (2010-07-24)
6444              fix bug in file buffering for PNG reported by SpartanJ
6445      1.25  (2010-07-17)
6446              refix trans_data warning (Won Chun)
6447      1.24  (2010-07-12)
6448              perf improvements reading from files on platforms with lock-heavy fgetc()
6449              minor perf improvements for jpeg
6450              deprecated type-specific functions so we'll get feedback if they're needed
6451              attempt to fix trans_data warning (Won Chun)
6452      1.23    fixed bug in iPhone support
6453      1.22  (2010-07-10)
6454              removed image *writing* support
6455              stbi_info support from Jetro Lauha
6456              GIF support from Jean-Marc Lienher
6457              iPhone PNG-extensions from James Brown
6458              warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
6459      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
6460      1.20    added support for Softimage PIC, by Tom Seddon
6461      1.19    bug in interlaced PNG corruption check (found by ryg)
6462      1.18  (2008-08-02)
6463              fix a threading bug (local mutable static)
6464      1.17    support interlaced PNG
6465      1.16    major bugfix - stbi__convert_format converted one too many pixels
6466      1.15    initialize some fields for thread safety
6467      1.14    fix threadsafe conversion bug
6468              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
6469      1.13    threadsafe
6470      1.12    const qualifiers in the API
6471      1.11    Support installable IDCT, colorspace conversion routines
6472      1.10    Fixes for 64-bit (don't use "unsigned long")
6473              optimized upsampling by Fabian "ryg" Giesen
6474      1.09    Fix format-conversion for PSD code (bad global variables!)
6475      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
6476      1.07    attempt to fix C++ warning/errors again
6477      1.06    attempt to fix C++ warning/errors again
6478      1.05    fix TGA loading to return correct *comp and use good luminance calc
6479      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
6480      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
6481      1.02    support for (subset of) HDR files, float interface for preferred access to them
6482      1.01    fix bug: possible bug in handling right-side up bmps... not sure
6483              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
6484      1.00    interface to zlib that skips zlib header
6485      0.99    correct handling of alpha in palette
6486      0.98    TGA loader by lonesock; dynamically add loaders (untested)
6487      0.97    jpeg errors on too large a file; also catch another malloc failure
6488      0.96    fix detection of invalid v value - particleman@mollyrocket forum
6489      0.95    during header scan, seek to markers in case of padding
6490      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
6491      0.93    handle jpegtran output; verbose errors
6492      0.92    read 4,8,16,24,32-bit BMP files of several formats
6493      0.91    output 24-bit Windows 3.0 BMP files
6494      0.90    fix a few more warnings; bump version number to approach 1.0
6495      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
6496      0.60    fix compiling as c++
6497      0.59    fix warnings: merge Dave Moore's -Wall fixes
6498      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
6499      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
6500      0.56    fix bug: zlib uncompressed mode len vs. nlen
6501      0.55    fix bug: restart_interval not initialized to 0
6502      0.54    allow NULL for 'int *comp'
6503      0.53    fix bug in png 3->4; speedup png decoding
6504      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
6505      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
6506              on 'test' only check type, not whether we support this variant
6507      0.50  (2006-11-19)
6508              first released version
6509*/
6510