% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Introduction and Description} \label{vorbis:spec:intro}

\subsection{Overview}

This document provides a high-level description of the Vorbis codec's
construction. A bit-by-bit specification appears beginning in
\xref{vorbis:spec:codec}.
The later sections assume a high-level
understanding of the Vorbis decode process, which is
provided here.

\subsubsection{Application}
Vorbis is a general purpose perceptual audio CODEC intended to allow
maximum encoder flexibility, thus allowing it to scale competitively
over an exceptionally wide range of bitrates. At the high
quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
it is in the same league as MPEG-2 and MPC.
Similarly, the 1.0
encoder can encode high-quality CD and DAT rate stereo at below 48kbps
without resampling to a lower rate. Vorbis is also intended for
lower and higher sample rates (from 8kHz telephony to 192kHz digital
masters) and a range of channel representations (monaural,
polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
discrete channels).


\subsubsection{Classification}
Vorbis I is a forward-adaptive monolithic transform CODEC based on the
Modified Discrete Cosine Transform. The codec is structured to allow
addition of a hybrid wavelet filterbank in Vorbis II to offer better
transient response and reproduction using a transform better suited to
localized time events.


\subsubsection{Assumptions}

The Vorbis CODEC design assumes a complex, psychoacoustically-aware
encoder and simple, low-complexity decoder.
Vorbis decode is
computationally simpler than mp3, although it does require more
working memory as Vorbis has no static probability model; the vector
codebooks used in the first stage of decoding from the bitstream are
packed in their entirety into the Vorbis bitstream headers. In
packed form, these codebooks occupy only a few kilobytes; the extent
to which they are pre-decoded into a cache is the dominant factor in
decoder memory usage.


Vorbis provides none of its own framing, synchronization or protection
against errors; it is solely a method of accepting input audio,
dividing it into individual frames and compressing these frames into
raw, unformatted `packets'. The decoder then accepts these raw
packets in sequence, decodes them, synthesizes audio frames from
them, and reassembles the frames into a facsimile of the original
audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no
minimum size, maximum size, or fixed/expected size.
Packets
are designed so that they may be truncated (or padded) and remain
decodable; this is not to be considered an error condition and is used
extensively in bitrate management in peeling. Both the transport
mechanism and decoder must allow that a packet may be any size, or
end before or after packet decode expects.

Vorbis packets are thus intended to be used with a transport mechanism
that provides free-form framing, sync, positioning and error correction
in accordance with these design assumptions, such as Ogg (for file
transport) or RTP (for network multicast). For purposes of a few
examples in this document, we will assume that Vorbis is to be
embedded in an Ogg stream specifically, although this is by no means a
requirement or fundamental assumption in the Vorbis design.

The specification for embedding Vorbis into
an Ogg transport stream is in \xref{vorbis:over:ogg}.



\subsubsection{Codec Setup and Probability Model}

Vorbis' heritage is as a research CODEC and its current design
reflects a desire to allow multiple decades of continuous encoder
improvement before running out of room within the codec specification.
For these reasons, configurable aspects of codec setup intentionally
lean toward the extreme of forward adaptive.

The single most controversial design decision in Vorbis (and the most
unusual for a Vorbis developer to keep in mind) is that the entire
probability model of the codec, the Huffman and VQ codebooks, is
packed into the bitstream header along with extensive CODEC setup
parameters (often several hundred fields). Unlike with MPEG audio
layers, this makes it impossible to embed a simple frame type
flag in each audio packet, or to begin decode at any frame in the stream
without having previously fetched the codec setup header.


\begin{note}
Vorbis \emph{can} initiate decode at any arbitrary packet within a
bitstream so long as the codec has been initialized/setup with the
setup headers.
\end{note}

Thus, Vorbis headers are both required for decode to begin and
relatively large as bitstream headers go. The header size is
unbounded, although for streaming a rule-of-thumb of 4kB or less is
recommended (and Xiph.Org's Vorbis encoder follows this suggestion).

Our own design work indicates the primary liability of the
required header is in mindshare; it is an unusual design and thus
causes some amount of complaint among engineers as this runs against
current design trends (and also points out limitations in some
existing software/interface designs, such as Windows' ACM codec
framework). However, we find that it does not fundamentally limit
Vorbis' suitable application space.


\subsubsection{Format Specification}
The Vorbis format is well-defined by its decode specification; any
encoder that produces packets that are correctly decoded by the
reference Vorbis decoder described below may be considered a proper
Vorbis encoder. A decoder must faithfully and completely implement
the specification defined below (except where noted) to be considered
a proper Vorbis decoder.

\subsubsection{Hardware Profile}
Although Vorbis decode is computationally simple, it may still run
into specific limitations of an embedded design. For this reason,
embedded designs are allowed to deviate in limited ways from the
`full' decode specification yet still be certified compliant. These
optional omissions are labelled in the spec where relevant.


\subsection{Decoder Configuration}

Decoder setup consists of configuration of multiple, self-contained
component abstractions that perform specific functions in the decode
pipeline. Each different component instance of a specific type is
semantically interchangeable; decoder configuration consists both of
internal component configuration and of the arrangement of specific
instances into a decode pipeline. Componentry arrangement is roughly
as follows:

\begin{center}
\includegraphics[width=\textwidth]{components}
\captionof{figure}{decoder pipeline configuration}
\end{center}

\subsubsection{Global Config}
Global codec configuration consists of a few audio related fields
(sample rate, channels), Vorbis version (always `0' in Vorbis I),
bitrate hints, and the lists of component instances.
All other
configuration is in the context of specific components.

\subsubsection{Mode}

Each Vorbis frame is coded according to a master `mode'. A bitstream
may use one or many modes.

The mode mechanism is used to encode a frame according to one of
multiple possible methods with the intention of choosing a method best
suited to that frame. Switching between modes is, for example, how the
frame size is changed from frame to frame. The mode number of a frame serves as a
top level configuration switch for all other specific aspects of frame
decode.

A `mode' configuration consists of a frame size setting, window type
(always 0, the Vorbis window, in Vorbis I), transform type (always
type 0, the MDCT, in Vorbis I) and a mapping number. The mapping
number specifies which mapping configuration instance to use for
low-level packet decode and synthesis.


\subsubsection{Mapping}

A mapping contains a channel coupling description and a list of
`submaps' that bundle sets of channel vectors together for grouped
encoding and decoding. These submaps are not references to external
components; the submap list is internal and specific to a mapping.

A `submap' is a configuration/grouping that applies to a subset of
floor and residue vectors within a mapping. The submap functions as a
last layer of indirection such that specific special floor or residue
settings can be applied not only to all the vectors in a given mode,
but also specific vectors in a specific mode. Each submap specifies
the proper floor and residue instance number to use for decoding that
submap's spectral floor and spectral residue vectors.
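
The layers of indirection described so far can be summarized as a
chain of lookups. The following is only an illustrative sketch; the
symbols $m$, $\mathrm{map}$, $\mathrm{mux}$, $\mathrm{floor}$ and
$\mathrm{residue}$ are names assumed here for exposition and are not
defined by this section. For a frame coded with mode number $m$ and a
channel $c$:

\[
M = \mathrm{map}[m], \qquad
s = \mathrm{mux}_M[c], \qquad
(f, r) = (\mathrm{floor}_M[s],\ \mathrm{residue}_M[s])
\]

Here $M$ is the mapping instance selected by the mode, $s$ is the
submap to which channel $c$ belongs within that mapping, and $f$ and
$r$ are the floor and residue instance numbers used to decode that
channel's vectors.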

As an example:

Assume a Vorbis stream that contains six channels in the standard 5.1
format. The sixth channel, as is normal in 5.1, is bass only.
Therefore it would be wasteful to encode a full-spectrum version of it
as with the other channels. The submapping mechanism can be used to
apply a full range floor and residue encoding to channels 0 through 4,
and a bass-only representation to the bass channel, thus saving space.
In this example, channels 0-4 belong to submap 0 (which indicates use
of a full-range floor) and channel 5 belongs to submap 1, which uses a
bass-only representation.


\subsubsection{Floor}

Vorbis encodes a spectral `floor' vector for each PCM channel. This
vector is a low-resolution representation of the audio spectrum for
the given channel in the current frame, generally used akin to a
whitening filter.
It is named a `floor' because the Xiph.Org
reference encoder has historically used it as a unit-baseline for
spectral resolution.

A floor encoding may be of two types. Floor 0 uses a packed LSP
representation on a dB amplitude scale and Bark frequency scale.
Floor 1 represents the curve as a piecewise linear interpolated
representation on a dB amplitude scale and linear frequency scale.
The two floors are semantically interchangeable in
encoding/decoding. However, floor type 1 provides more stable
inter-frame behavior, and so is the preferred choice in all
coupled-stereo and high bitrate modes. Floor 1 is also considerably
less expensive to decode than floor 0.

Floor 0 is not to be considered deprecated, but it is of limited
modern use. No known Vorbis encoder past Xiph.Org's own beta 4 makes
use of floor 0.

The values coded/decoded by a floor are both compactly formatted and
make use of entropy coding to save space.
For this reason, a floor
configuration generally refers to multiple codebooks in the codebook
component list. Entropy coding is thus provided as an abstraction,
and each floor instance may choose from any and all available
codebooks when coding/decoding.


\subsubsection{Residue}
The spectral residue is the fine structure of the audio spectrum
once the floor curve has been subtracted out. In simplest terms, it
is coded in the bitstream using cascaded (multi-pass) vector
quantization according to one of three specific packing/coding
algorithms numbered 0 through 2. The packing algorithm details are
configured by residue instance. As with the floor components, the
final VQ/entropy encoding is provided by external codebook instances
and each residue instance may choose from any and all available
codebooks.
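
To make the relationship between floor and residue concrete, the
following sketch assumes that the floor curve of a channel has been
converted to a linear amplitude value $F[k] > 0$ for each spectral
line $k$, that $R[k]$ is the corresponding residue value, and that
$X[k]$ is the audio spectrum; these symbols are introduced here for
illustration only. Conceptually, removing the floor is a per-line
division, and the decoder restores the spectrum with a per-line
multiplication:

\[
R[k] = \frac{X[k]}{F[k]}, \qquad X[k] = F[k]\,R[k]
\]

On a dB scale the same relationship is additive, which is the sense in
which the floor is `subtracted out':

\[
20\log_{10}|X[k]| = 20\log_{10}F[k] + 20\log_{10}|R[k]|
\]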

\subsubsection{Codebooks}

Codebooks are a self-contained abstraction that performs entropy
decoding and, optionally, uses the entropy-decoded integer value as an
offset into an index of output value vectors, returning the indicated
vector of values.

The entropy coding in a Vorbis I codebook is provided by a standard
Huffman binary tree representation. This tree is tightly packed using
one of several methods, depending on whether codeword lengths are
ordered or unordered, or the tree is sparse.

The codebook vector index is similarly packed according to index
characteristic. Most commonly, the vector index is encoded as a
single list of possible values that are then permuted into
a list of n-dimensional rows (lattice VQ).
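
As a sketch of how a single list of values can be permuted into
n-dimensional rows, assume a list of $v$ scalar values
$q_0,\ldots,q_{v-1}$ and output vectors of dimension $d$. These
symbols and the digit ordering are assumptions made here for
illustration, any scaling or offset applied to the values is ignored,
and the bit-exact unpacking rules are those of the codebook
specification rather than this sketch. Treating the entry number $e$
as a $d$-digit number in base $v$, element $i$ of the output vector is

\[
x_i = q_{\lfloor e / v^{i} \rfloor \bmod v}, \qquad i = 0,\ldots,d-1
\]

so that $v$ stored values address up to $v^{d}$ distinct vectors. For
example, with $v = 3$ and $d = 2$, entry $e = 5$ yields
$(x_0, x_1) = (q_2, q_1)$, since $5 \bmod 3 = 2$ and
$\lfloor 5/3 \rfloor \bmod 3 = 1$.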



\subsection{High-level Decode Process}

\subsubsection{Decode Setup}

Before decoding can begin, a decoder must initialize using the
bitstream headers matching the stream to be decoded. Vorbis uses
three header packets; all are required, in-order, by this
specification. Once set up, decode may begin at any audio packet
belonging to the Vorbis stream. In Vorbis I, all packets after the
three initial headers are audio packets.

The header packets are, in order, the identification
header, the comment header, and the setup header.

\paragraph{Identification Header}
The identification header identifies the bitstream as Vorbis and gives
the Vorbis version and the simple audio characteristics of the stream, such as
sample rate and number of channels.

\paragraph{Comment Header}
The comment header includes user text comments (``tags'') and a vendor
string for the application/library that produced the bitstream. The
encoding and proper use of the comment header are described in \xref{vorbis:spec:comment}.

\paragraph{Setup Header}
The setup header includes extensive CODEC setup information as well as
the complete VQ and Huffman codebooks needed for decode.


\subsubsection{Decode Procedure}

The decoding and synthesis procedure for all audio packets is
fundamentally the same.
\begin{enumerate}
\item decode packet type flag
\item decode mode number
\item decode window shape (long windows only)
\item decode floor
\item decode residue into residue vectors
\item inverse channel coupling of residue vectors
\item generate floor curve from decoded floor data
\item compute dot product of floor and residue, producing audio spectrum vector
\item inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
\item overlap/add left-hand output of transform with right-hand output of previous frame
\item store right-hand data from transform of current frame for future lapping
\item if not first frame, return results of overlap/add as audio result of current frame
\end{enumerate}

Note that clever rearrangement of the synthesis arithmetic is
possible; as an example, one can take advantage of symmetries in the
MDCT to store the right-hand transform data of a partial MDCT for a
50\% inter-frame buffer space
savings, and then complete the transform
later before overlap/add with the next frame. This optimization
produces entirely equivalent output and is naturally perfectly legal.
The decoder must be \emph{entirely mathematically equivalent} to the
specification; it need not be a literal semantic implementation.

\paragraph{Packet type decode}

Vorbis I uses four packet types. The first three packet types mark each
of the three Vorbis headers described above. The fourth packet type
marks an audio packet. All other packet types are reserved; packets
marked with a reserved type should be ignored.

Following the three header packets, all packets in a Vorbis I stream
are audio. The first step of audio packet decode is to read and
verify the packet type; \emph{a non-audio packet when audio is expected
indicates stream corruption or a non-compliant stream. The decoder
must ignore the packet and not attempt decoding it to
audio}.



\paragraph{Mode decode}
Vorbis allows an encoder to set up multiple, numbered packet `modes',
as described earlier, all of which may be used in a given Vorbis
stream. The mode is encoded as an integer used as a direct offset into
the mode instance index.


\paragraph{Window shape decode (long windows only)} \label{vorbis:spec:window}

Vorbis frames may be one of two PCM sample sizes specified during
codec setup. In Vorbis I, legal frame sizes are powers of two from 64
to 8192 samples. Aside from coupling, Vorbis handles channels as
independent vectors and these frame sizes are in samples per channel.

Vorbis uses an overlapping transform, namely the MDCT, to blend one
frame into the next, avoiding most inter-frame block boundary
artifacts.
The MDCT output of one frame is windowed according to MDCT
requirements, overlapped 50\% with the output of the previous frame and
added. The window shape assures seamless reconstruction.

This is easy to visualize in the case of equal-sized windows:

\begin{center}
\includegraphics[width=\textwidth]{window1}
\captionof{figure}{overlap of two equal-sized windows}
\end{center}

It is slightly more complex in the case of overlapping unequal-sized
windows:

\begin{center}
\includegraphics[width=\textwidth]{window2}
\captionof{figure}{overlap of a long and a short window}
\end{center}

In the unequal-sized window case, the window shape of the long window
must be modified for seamless lapping as above.
It is possible to
correctly infer the window shape to be applied to the current window from
knowing the sizes of the current, previous and next window. It is
legal for a decoder to use this method. However, in the case of a long
window (short windows require no modification), Vorbis also codes two
flag bits to specify pre- and post-window shape. Although not
strictly necessary for function, this minor redundancy allows a packet
to be fully decoded to the point of lapping entirely independently of
any other packet, allowing easier abstraction of decode layers as well
as allowing a greater level of easy parallelism in encode and
decode.

A description of valid window functions for use with an inverse MDCT
can be found in \cite{Sporer/Brandenburg/Edler}. Vorbis windows
all use the slope function
\[ y = \sin\left(\frac{\pi}{2} \, \sin^2\left(\frac{x+0.5}{n} \, \pi\right)\right) .
\]



\paragraph{floor decode}
Each floor is encoded/decoded in channel order; however, each floor
belongs to a `submap' that specifies which floor configuration to
use. All floors are decoded before residue decode begins.


\paragraph{residue decode}

Although the number of residue vectors equals the number of channels,
channel coupling may mean that the raw residue vectors extracted
during decode do not map directly to specific channels. When channel
coupling is in use, some vectors will correspond to coupled magnitude
or angle. The coupling relationships are described in the codec setup
and may differ from frame to frame, due to different mode numbers.

Vorbis codes residue vectors in groups by submap; the coding is done
in submap order from submap 0 through $n-1$.
This differs from floors,
which are coded using a configuration provided by submap number, but
are coded individually in channel order.



\paragraph{inverse channel coupling}

A detailed discussion of stereo in the Vorbis codec can be found in
the document \href{stereo.html}{Stereo Channel Coupling in the
Vorbis CODEC}. Vorbis is not limited to only stereo coupling, but
the stereo document also gives a good overview of the generic coupling
mechanism.

Vorbis coupling applies to pairs of residue vectors at a time;
decoupling is done in-place a pair at a time in the order and using
the vectors specified in the current mapping configuration. The
decoupling operation is the same for all pairs, converting square
polar representation (where one vector is magnitude and the second
angle) back to Cartesian representation.
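
The square polar to Cartesian conversion for one pair can be sketched
as follows. This Python fragment is illustrative only; it follows the
per-element rule as stated in the bit-level decode section of this
specification, which remains authoritative. The function name
\texttt{decouple\_pair} is hypothetical, and the two vectors are
assumed to be equal-length lists of integers. The order in which
pairs are processed is left to the caller.

\begin{Verbatim}
def decouple_pair(magnitude, angle):
    """Undo square polar coupling of one vector pair, in place."""
    for i in range(len(magnitude)):
        m = magnitude[i]
        a = angle[i]
        if m > 0:
            if a > 0:
                new_m, new_a = m, m - a
            else:
                new_m, new_a = m + a, m
        else:
            if a > 0:
                new_m, new_a = m, m + a
            else:
                new_m, new_a = m - a, m
        # Overwrite both vectors; no extra working buffer is needed.
        magnitude[i] = new_m
        angle[i] = new_a
\end{Verbatim}

Because each element depends only on the same index in the two input
vectors, the operation needs no storage beyond the pair itself.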

After decoupling each pair of vectors on the coupling list, in order,
the resulting residue vectors represent the fine spectral detail
of each output channel.



\paragraph{generate floor curve}

The decoder may choose to generate the floor curve at any appropriate
time. It is reasonable to generate the output curve when the floor
data is decoded from the raw packet, or it can be generated after
inverse coupling and applied to the spectral residue directly,
combining generation and the dot product into one step and eliminating
some working space.

Both floor 0 and floor 1 generate a linear-range, linear-domain output
vector to be multiplied (dot product) by the linear-range,
linear-domain spectral residue.



\paragraph{compute floor/residue dot product}

This step is straightforward; for each output channel, the decoder
multiplies the floor curve and residue vectors element by element,
producing the finished audio spectrum of each channel.

% TODO/FIXME: The following two paragraphs have identical twins
% in section 4 (under "dot product")
One point is worth mentioning about this dot product; a common mistake
in a fixed-point implementation might be to assume that a 32-bit
fixed-point representation for floor and residue and direct
multiplication of the vectors is sufficient for acceptable spectral
depth in all cases because it happens to mostly work with the current
Xiph.Org reference encoder.

However, floor vector values can span \~{}140dB (\~{}24 bits unsigned), and
the audio spectrum vector should represent a minimum of 120dB (\~{}21
bits with sign), even when output is to a 16 bit PCM device. For the
residue vector to represent full scale if the floor is nailed to
$-140$dB, it must be able to span 0 to $+140$dB. For the residue vector
to reach full scale if the floor is nailed at 0dB, it must be able to
represent $-140$dB to $+0$dB. Thus, in order to handle full range
dynamics, a residue vector may span $-140$dB to $+140$dB entirely within
spec. A 280dB range is approximately 48 bits with sign; thus the
residue vector must be able to represent a 48 bit range and the dot
product must be able to handle an effective 48 bit times 24 bit
multiplication. This range may be achieved using large (64 bit or
larger) integers, or implementing a movable binary point
representation.
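
As a worked check of the bit widths quoted above, each binary digit of
amplitude range corresponds to $20\log_{10}2 \approx 6.02$dB, so the
three ranges convert as
\[
\frac{140}{6.02} \approx 23.3 , \qquad
\frac{120}{6.02} \approx 19.9 , \qquad
\frac{280}{6.02} \approx 46.5 .
\]
Rounding up gives about 24 bits unsigned for the floor, about 20 bits
plus a sign bit (21) for the audio spectrum, and about 47 bits plus a
sign bit (48) for the residue. These are approximations intended only
to show where the figures come from.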



\paragraph{inverse monolithic transform (MDCT)}

The audio spectrum is converted back into time domain PCM audio via an
inverse Modified Discrete Cosine Transform (MDCT). A detailed
description of the MDCT is available in \cite{Sporer/Brandenburg/Edler}.

Note that the PCM produced directly from the MDCT is not yet finished
audio; it must be lapped with surrounding frames using an appropriate
window (such as the Vorbis window) before the MDCT can be considered
orthogonal.



\paragraph{overlap/add data}
Windowed MDCT output is overlapped and added with the right-hand data
of the previous window such that the 3/4 point of the previous window
is aligned with the 1/4 point of the current window (as illustrated in
the window overlap diagram).
At this point, the audio data between the
center of the previous frame and the center of the current frame is
finished and ready to be returned.


\paragraph{cache right hand data}
The decoder must cache the right-hand portion of the current frame to
be lapped with the left-hand portion of the next frame.



\paragraph{return finished audio data}

The overlapped portion produced from overlapping the previous and
current frame data is finished data to be returned by the decoder.
This data spans from the center of the previous window to the center
of the current window. In the case of same-sized windows, the amount
of data to return is one-half block consisting of and only of the
overlapped portions. When overlapping a short and long window, much of
the returned range is not actually overlap. This does not damage
transform orthogonality.
Pay attention, however, to returning the
correct data range; the amount of data to be returned is:

\begin{Verbatim}[commandchars=\\\{\}]
window_blocksize(previous_window)/4+window_blocksize(current_window)/4
\end{Verbatim}

from the center of the previous window to the center of the current
window.

Data is not returned from the first frame; it must be used to `prime'
the decode engine. The encoder accounts for this priming when
calculating PCM offsets; after the first frame, the proper PCM output
offset is `0' (as no data has been returned yet).
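
The lapping arithmetic above can be illustrated with a short Python
sketch. It is not a decoder: it only evaluates the Vorbis slope
function from the window shape discussion for the equal-sized window
case, and the returned sample count given above. The function names
are hypothetical, and block sizes are assumed to be legal powers of
two.

\begin{Verbatim}
import math


def vorbis_window(n):
    """Vorbis window of n samples, equal-sized neighbours assumed."""
    return [
        math.sin(0.5 * math.pi * math.sin((x + 0.5) / n * math.pi) ** 2)
        for x in range(n)
    ]


def returned_samples(previous_blocksize, current_blocksize):
    """Samples finished by lapping the previous and current frames."""
    return previous_blocksize // 4 + current_blocksize // 4


def is_power_complementary(window, tolerance=1e-12):
    """Check w[i]^2 + w[i + n/2]^2 == 1 across the overlapped halves."""
    half = len(window) // 2
    return all(
        abs(window[i] ** 2 + window[i + half] ** 2 - 1.0) < tolerance
        for i in range(half)
    )
\end{Verbatim}

For two 2048-sample windows the count is 1024 samples, one-half block;
for a 2048-sample window followed by a 256-sample window it is 576
samples. The power-complementary check is one way to see why the
window shape permits seamless reconstruction when equal-sized frames
are overlapped by 50\%.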