% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Introduction and Description} \label{vorbis:spec:intro}

\subsection{Overview}

This document provides a high-level description of the Vorbis codec's
construction.  A bit-by-bit specification appears beginning in
\xref{vorbis:spec:codec}.
The later sections assume a high-level
understanding of the Vorbis decode process, which is
provided here.

\subsubsection{Application}
Vorbis is a general-purpose perceptual audio CODEC intended to allow
maximum encoder flexibility, thus allowing it to scale competitively
over an exceptionally wide range of bitrates.  At the high
quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
it is in the same league as MPEG-2 and MPC.  Similarly, the 1.0
encoder can encode high-quality CD and DAT rate stereo at below 48kbps
without resampling to a lower rate.  Vorbis is also intended for
lower and higher sample rates (from 8kHz telephony to 192kHz digital
masters) and a range of channel representations (monaural,
polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
discrete channels).


\subsubsection{Classification}
Vorbis I is a forward-adaptive monolithic transform CODEC based on the
Modified Discrete Cosine Transform.  The codec is structured to allow
addition of a hybrid wavelet filterbank in Vorbis II to offer better
transient response and reproduction using a transform better suited to
localized time events.


\subsubsection{Assumptions}

The Vorbis CODEC design assumes a complex, psychoacoustically-aware
encoder and a simple, low-complexity decoder. Vorbis decode is
computationally simpler than mp3, although it does require more
working memory as Vorbis has no static probability model; the vector
codebooks used in the first stage of decoding from the bitstream are
packed in their entirety into the Vorbis bitstream headers. In
packed form, these codebooks occupy only a few kilobytes; the extent
to which they are pre-decoded into a cache is the dominant factor in
decoder memory usage.


Vorbis provides none of its own framing, synchronization or protection
against errors; it is solely a method of accepting input audio,
dividing it into individual frames and compressing these frames into
raw, unformatted 'packets'. The decoder then accepts these raw
packets in sequence, decodes them, synthesizes audio frames from
them, and reassembles the frames into a facsimile of the original
audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no
minimum size, maximum size, or fixed/expected size.  Packets
are designed so that they may be truncated (or padded) and remain
decodable; this is not to be considered an error condition and is used
extensively in bitrate management in peeling.  Both the transport
mechanism and decoder must allow that a packet may be any size, or
end before or after packet decode expects.
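
As a minimal sketch of what tolerating a short packet can look like,
the following Python reader returns an end-of-packet marker instead of
raising an error when a read runs past the end of the packet.  The class
name \texttt{BitReader} and its interface are hypothetical and not part
of this specification; the LSb-first bit order follows the bitpacking
convention of the Vorbis I specification.

```python
class BitReader:
    """Reads unsigned integers LSb-first from one raw packet (hypothetical helper)."""

    def __init__(self, packet: bytes):
        self.packet = packet
        self.pos = 0  # absolute bit position within the packet

    def read(self, nbits: int):
        """Return an nbits-wide unsigned integer, or None at end of packet."""
        total_bits = len(self.packet) * 8
        if self.pos + nbits > total_bits:
            self.pos = total_bits  # remain at end-of-packet from now on
            return None
        value = 0
        for i in range(nbits):
            byte = self.packet[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value


reader = BitReader(bytes([0b10110100]))
first = reader.read(3)   # low three bits of the byte
second = reader.read(5)  # remaining five bits
third = reader.read(1)   # packet exhausted: end-of-packet, not an exception
```

A caller would treat the \texttt{None} result as the normal
end-of-packet condition and finish the frame with whatever data was
decoded so far.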

Vorbis packets are thus intended to be used with a transport mechanism
that provides free-form framing, sync, positioning and error correction
in accordance with these design assumptions, such as Ogg (for file
transport) or RTP (for network multicast).  For purposes of a few
examples in this document, we will assume that Vorbis is to be
embedded in an Ogg stream specifically, although this is by no means a
requirement or fundamental assumption in the Vorbis design.

The specification for embedding Vorbis into
an Ogg transport stream is in \xref{vorbis:over:ogg}.



\subsubsection{Codec Setup and Probability Model}

Vorbis' heritage is as a research CODEC and its current design
reflects a desire to allow multiple decades of continuous encoder
improvement before running out of room within the codec specification.
For these reasons, configurable aspects of codec setup intentionally
lean toward the extreme of forward adaptive.

The single most controversial design decision in Vorbis (and the most
unusual for a Vorbis developer to keep in mind) is that the entire
probability model of the codec, the Huffman and VQ codebooks, is
packed into the bitstream header along with extensive CODEC setup
parameters (often several hundred fields).  Unlike with MPEG audio
layers, this makes it impossible to embed a simple frame type
flag in each audio packet, or to begin decode at any frame in the stream
without having previously fetched the codec setup header.


\begin{note}
Vorbis \emph{can} initiate decode at any arbitrary packet within a
bitstream so long as the codec has been initialized/setup with the
setup headers.
\end{note}

Thus, Vorbis headers are both required for decode to begin and
relatively large as bitstream headers go.  The header size is
unbounded, although for streaming a rule-of-thumb of 4kB or less is
recommended (and Xiph.Org's Vorbis encoder follows this suggestion).

Our own design work indicates the primary liability of the
required header is in mindshare; it is an unusual design and thus
causes some amount of complaint among engineers as this runs against
current design trends (and also points out limitations in some
existing software/interface designs, such as Windows' ACM codec
framework).  However, we find that it does not fundamentally limit
Vorbis' suitable application space.


\subsubsection{Format Specification}
The Vorbis format is well-defined by its decode specification; any
encoder that produces packets that are correctly decoded by the
reference Vorbis decoder described below may be considered a proper
Vorbis encoder.  A decoder must faithfully and completely implement
the specification defined below (except where noted) to be considered
a proper Vorbis decoder.

\subsubsection{Hardware Profile}
Although Vorbis decode is computationally simple, it may still run
into specific limitations of an embedded design.  For this reason,
embedded designs are allowed to deviate in limited ways from the
`full' decode specification yet still be certified compliant.  These
optional omissions are labelled in the spec where relevant.


\subsection{Decoder Configuration}

Decoder setup consists of configuration of multiple, self-contained
component abstractions that perform specific functions in the decode
pipeline.  Each different component instance of a specific type is
semantically interchangeable; decoder configuration consists both of
internal component configuration, as well as arrangement of specific
instances into a decode pipeline.  Componentry arrangement is roughly
as follows:

\begin{center}
\includegraphics[width=\textwidth]{components}
\captionof{figure}{decoder pipeline configuration}
\end{center}

\subsubsection{Global Config}
Global codec configuration consists of a few audio-related fields
(sample rate, channels), Vorbis version (always '0' in Vorbis I),
bitrate hints, and the lists of component instances.  All other
configuration is in the context of specific components.

\subsubsection{Mode}

Each Vorbis frame is coded according to a master 'mode'.  A bitstream
may use one or many modes.

The mode mechanism is used to encode a frame according to one of
multiple possible methods with the intention of choosing a method best
suited to that frame.  Switching between modes is, for example, how the
frame size is changed from frame to frame. The mode number of a frame serves as a
top-level configuration switch for all other specific aspects of frame
decode.

A 'mode' configuration consists of a frame size setting, window type
(always 0, the Vorbis window, in Vorbis I), transform type (always
type 0, the MDCT, in Vorbis I) and a mapping number.  The mapping
number specifies which mapping configuration instance to use for
low-level packet decode and synthesis.


\subsubsection{Mapping}

A mapping contains a channel coupling description and a list of
'submaps' that bundle sets of channel vectors together for grouped
encoding and decoding. These submaps are not references to external
components; the submap list is internal and specific to a mapping.

A 'submap' is a configuration/grouping that applies to a subset of
floor and residue vectors within a mapping.  The submap functions as a
last layer of indirection such that specific special floor or residue
settings can be applied not only to all the vectors in a given mode,
but also to specific vectors in a specific mode.  Each submap specifies
the proper floor and residue instance number to use for decoding that
submap's spectral floor and spectral residue vectors.

As an example:

Assume a Vorbis stream that contains six channels in the standard 5.1
format.  The sixth channel, as is normal in 5.1, is bass only.
Therefore it would be wasteful to encode a full-spectrum version of it
as with the other channels.  The submapping mechanism can be used to
apply a full range floor and residue encoding to channels 0 through 4,
and a bass-only representation to the bass channel, thus saving space.
In this example, channels 0-4 belong to submap 0 (which indicates use
of a full-range floor) and channel 5 belongs to submap 1, which uses a
bass-only representation.
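
The 5.1 example above might be represented as follows.  This is an
illustrative sketch only: the class and field names (\texttt{Submap},
\texttt{Mapping}, \texttt{channel\_submap}) and the floor and residue
instance numbers are hypothetical, and the channel coupling description
is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submap:
    floor: int    # index into the stream's list of floor instances
    residue: int  # index into the stream's list of residue instances


@dataclass
class Mapping:
    submaps: List[Submap]       # submap list, internal to this mapping
    channel_submap: List[int]   # for each channel, the submap it belongs to


# Assumed instance numbering: floor/residue 0 are full-range, 1 are bass-only.
mapping = Mapping(
    submaps=[Submap(floor=0, residue=0), Submap(floor=1, residue=1)],
    channel_submap=[0, 0, 0, 0, 0, 1],  # channels 0-4 full-range, channel 5 bass
)


def floor_and_residue_for(mapping: Mapping, channel: int):
    """Resolve the floor and residue instance numbers used by one channel."""
    submap = mapping.submaps[mapping.channel_submap[channel]]
    return submap.floor, submap.residue
```

Looking up channel 5 yields the bass-only pair, while channels 0 through
4 all resolve to the same full-range pair.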


\subsubsection{Floor}

Vorbis encodes a spectral 'floor' vector for each PCM channel.  This
vector is a low-resolution representation of the audio spectrum for
the given channel in the current frame, generally used akin to a
whitening filter.  It is named a 'floor' because the Xiph.Org
reference encoder has historically used it as a unit-baseline for
spectral resolution.

A floor encoding may be of two types.  Floor 0 uses a packed LSP
representation on a dB amplitude scale and Bark frequency scale.
Floor 1 represents the curve as a piecewise linear interpolated
representation on a dB amplitude scale and linear frequency scale.
The two floors are semantically interchangeable in
encoding/decoding. However, floor type 1 provides more stable
inter-frame behavior, and so is the preferred choice in all
coupled-stereo and high bitrate modes.  Floor 1 is also considerably
less expensive to decode than floor 0.
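
To illustrate the idea behind floor 1 (a curve that is piecewise linear
in dB over a linear frequency axis), the following sketch interpolates
between a few points and converts the result to linear amplitude.  It is
a floating-point illustration under assumed inputs, not the integer
rendering procedure that the floor 1 decode specification defines; the
function name \texttt{floor\_curve} and the sample points are
hypothetical.

```python
def floor_curve(points, n):
    """Interpolate (bin, level_db) points linearly in dB and return n linear
    amplitudes.  points must be sorted by bin, start at bin 0 and end at bin n."""
    curve = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for x in range(x0, x1):
            level_db = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            curve.append(10.0 ** (level_db / 20.0))  # dB to linear amplitude
    return curve[:n]


# Hypothetical floor: falls from 0 dB to -40 dB over four bins, then stays flat.
curve = floor_curve([(0, 0.0), (4, -40.0), (8, -40.0)], 8)
```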

Floor 0 is not to be considered deprecated, but it is of limited
modern use.  No known Vorbis encoder past Xiph.Org's own beta 4 makes
use of floor 0.

The values coded/decoded by a floor are both compactly formatted and
make use of entropy coding to save space.  For this reason, a floor
configuration generally refers to multiple codebooks in the codebook
component list.  Entropy coding is thus provided as an abstraction,
and each floor instance may choose from any and all available
codebooks when coding/decoding.


\subsubsection{Residue}
The spectral residue is the fine structure of the audio spectrum
once the floor curve has been subtracted out.  In simplest terms, it
is coded in the bitstream using cascaded (multi-pass) vector
quantization according to one of three specific packing/coding
algorithms numbered 0 through 2.  The packing algorithm details are
configured by residue instance.  As with the floor components, the
final VQ/entropy encoding is provided by external codebook instances
and each residue instance may choose from any and all available
codebooks.

\subsubsection{Codebooks}

Codebooks are a self-contained abstraction that performs entropy
decoding and, optionally, uses the entropy-decoded integer value as an
offset into an index of output value vectors, returning the indicated
vector of values.

The entropy coding in a Vorbis I codebook is provided by a standard
Huffman binary tree representation.  This tree is tightly packed using
one of several methods, depending on whether codeword lengths are
ordered or unordered, or the tree is sparse.

The codebook vector index is similarly packed according to index
characteristic.  Most commonly, the vector index is encoded as a
single list of possible values that are then permuted into
a list of n-dimensional rows (lattice VQ).
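
The lattice arrangement can be sketched by treating the entropy-decoded
entry number as a number written in base $k$, where $k$ is the length of
the value list, and using each digit to pick one value.  This is a
simplified illustration: the function name \texttt{lattice\_vector} and
the value list are hypothetical, the Huffman stage is assumed to have
already produced the entry number, and the scaling and offset applied to
the values by the full codebook decode are left out.

```python
def lattice_vector(entry, values, dimensions):
    """Expand one scalar codebook entry into a vector by reading the entry
    as a base-len(values) number, least significant digit first."""
    vector = []
    divisor = 1
    for _ in range(dimensions):
        vector.append(values[(entry // divisor) % len(values)])
        divisor *= len(values)
    return vector


values = [-1.0, 0.0, 1.0]  # the single packed list of possible values
# Three values in two dimensions give 3**2 = 9 distinct output vectors.
vectors = [lattice_vector(entry, values, 2) for entry in range(9)]
```

Only the three scalar values need to be stored in the header; the nine
two-dimensional rows are derived from them.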



\subsection{High-level Decode Process}

\subsubsection{Decode Setup}

Before decoding can begin, a decoder must initialize using the
bitstream headers matching the stream to be decoded.  Vorbis uses
three header packets; all are required, in-order, by this
specification. Once set up, decode may begin at any audio packet
belonging to the Vorbis stream. In Vorbis I, all packets after the
three initial headers are audio packets.

The header packets are, in order, the identification
header, the comment header, and the setup header.

\paragraph{Identification Header}
The identification header identifies the bitstream as Vorbis and gives
the Vorbis version and the simple audio characteristics of the stream
such as sample rate and number of channels.

\paragraph{Comment Header}
The comment header includes user text comments (``tags'') and a vendor
string for the application/library that produced the bitstream.  The
encoding and proper use of the comment header are described in \xref{vorbis:spec:comment}.

\paragraph{Setup Header}
The setup header includes extensive CODEC setup information as well as
the complete VQ and Huffman codebooks needed for decode.


\subsubsection{Decode Procedure}

The decoding and synthesis procedure for all audio packets is
fundamentally the same.
\begin{enumerate}
\item decode packet type flag
\item decode mode number
\item decode window shape (long windows only)
\item decode floor
\item decode residue into residue vectors
\item inverse channel coupling of residue vectors
\item generate floor curve from decoded floor data
\item compute dot product (element-by-element multiplication) of floor and residue, producing audio spectrum vector
\item inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
\item overlap/add left-hand output of transform with right-hand output of previous frame
\item store right-hand data from transform of current frame for future lapping
\item if not first frame, return results of overlap/add as audio result of current frame
\end{enumerate}

Note that clever rearrangement of the synthesis arithmetic is
possible; as an example, one can take advantage of symmetries in the
MDCT to store the right-hand transform data of a partial MDCT for a
50\% inter-frame buffer space savings, and then complete the transform
later before overlap/add with the next frame.  This optimization
produces entirely equivalent output and is naturally perfectly legal.
The decoder must be \emph{entirely mathematically equivalent} to the
specification; it need not be a literal semantic implementation.
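
The spectrum and lapping steps of the list above can be sketched as
follows.  This is a simplified illustration under stated assumptions: all
frames have the same size, the inverse MDCT and windowing are taken as
already applied to the frame handed to the lapper, and the names
\texttt{spectrum} and \texttt{Lapper} are hypothetical.

```python
def spectrum(floor_curve, residue):
    """Multiply floor and residue element by element (the 'dot product' step)."""
    return [f * r for f, r in zip(floor_curve, residue)]


class Lapper:
    """Overlap/add of consecutive, equally sized, already windowed frames."""

    def __init__(self):
        self.saved = None  # right-hand half of the previous frame

    def push(self, frame):
        half = len(frame) // 2
        left, right = frame[:half], frame[half:]
        out = None  # the first frame produces no audio
        if self.saved is not None:
            out = [a + b for a, b in zip(self.saved, left)]
        self.saved = right  # kept for lapping with the next frame
        return out


lapper = Lapper()
first_out = lapper.push([1, 2, 3, 4])       # nothing returned yet
second_out = lapper.push([10, 20, 30, 40])  # saved [3, 4] plus new [10, 20]
```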
3168e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3178e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{Packet type decode}
3188e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3198e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsVorbis I uses four packet types. The first three packet types mark each
3208e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsof the three Vorbis headers described above. The fourth packet type
3218e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsmarks an audio packet. All other packet types are reserved; packets
3228e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsmarked with a reserved type should be ignored.
3238e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3248e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsFollowing the three header packets, all packets in a Vorbis I stream
3258e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsare audio.  The first step of audio packet decode is to read and
3268e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsverify the packet type; \emph{a non-audio packet when audio is expected
3278e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsindicates stream corruption or a non-compliant stream. The decoder
3288e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsmust ignore the packet and not attempt decoding it to
3298e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsaudio}.
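
The packet type check can be sketched as follows.  This is a minimal
illustration, not the reference implementation.  It assumes the type
values given in the later header and audio decode sections of this
specification (1, 3 and 5 for the identification, comment and setup
headers, and a cleared first bit for audio), and that the first bit of
a packet is the least significant bit of its first byte.  The names
\texttt{classify\_packet} and \texttt{HEADER\_TYPES}, and the handling
of a zero-length packet, are hypothetical.

```python
# Assumed header type values; any other odd value is treated as reserved.
HEADER_TYPES = {1: "identification", 3: "comment", 5: "setup"}


def classify_packet(packet: bytes) -> str:
    """Return a label for the packet based on its leading type field."""
    if not packet:
        # Hypothetical choice: report an empty packet separately.
        return "empty"
    first = packet[0]
    if first & 1 == 0:
        # The first bit read (LSb of the first byte) is 0: audio packet.
        return "audio"
    return HEADER_TYPES.get(first, "reserved")


def audio_payload_or_none(packet: bytes):
    """Hand the packet to audio decode only if it really is audio."""
    return packet if classify_packet(packet) == "audio" else None
```

A caller in the audio phase of the stream would simply skip any packet
for which \texttt{audio\_payload\_or\_none} returns \texttt{None}.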
3308e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3318e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3328e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3338e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3348e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{Mode decode}
Vorbis allows an encoder to set up multiple, numbered packet `modes',
3368e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsas described earlier, all of which may be used in a given Vorbis
3378e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsstream. The mode is encoded as an integer used as a direct offset into
3388e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsthe mode instance index.
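
As a sketch of the lookup, the following assumes that the mode number
occupies \texttt{ilog(mode\_count - 1)} bits, as stated in the later
audio packet decode section, and that the mode configurations are held
in a plain list.  The helper names \texttt{mode\_bits} and
\texttt{select\_mode} are hypothetical, and the bit reader itself is
left out.

```python
def ilog(x: int) -> int:
    """Position of the highest set bit; 0 for zero or negative input."""
    return x.bit_length() if x > 0 else 0


def mode_bits(mode_count: int) -> int:
    """Width in bits of the mode number field for a given mode count."""
    return ilog(mode_count - 1)


def select_mode(modes, mode_number: int):
    """Use the decoded integer as a direct index into the mode list."""
    if not 0 <= mode_number < len(modes):
        raise ValueError("mode number out of range: stream is undecodable")
    return modes[mode_number]
```

With a single configured mode the field is zero bits wide, so the
packet carries no mode number at all and mode 0 is implied.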
3398e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3408e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3418e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{Window shape decode (long windows only)} \label{vorbis:spec:window}
3428e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3438e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsVorbis frames may be one of two PCM sample sizes specified during
3448e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelscodec setup.  In Vorbis I, legal frame sizes are powers of two from 64
3458e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsto 8192 samples.  Aside from coupling, Vorbis handles channels as
3468e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsindependent vectors and these frame sizes are in samples per channel.
3478e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3488e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsVorbis uses an overlapping transform, namely the MDCT, to blend one
3498e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsframe into the next, avoiding most inter-frame block boundary
3508e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsartifacts.  The MDCT output of one frame is windowed according to MDCT
3518e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsrequirements, overlapped 50\% with the output of the previous frame and
3528e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsadded.  The window shape assures seamless reconstruction.
3538e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
This is easy to visualize in the case of equal-sized windows:
3558e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3568e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\begin{center}
3578e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\includegraphics[width=\textwidth]{window1}
3588e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\captionof{figure}{overlap of two equal-sized windows}
3598e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\end{center}
3608e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
The case of overlapping unequal-sized windows is slightly more
complex:
3638e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3648e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\begin{center}
3658e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\includegraphics[width=\textwidth]{window2}
3668e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\captionof{figure}{overlap of a long and a short window}
3678e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\end{center}
3688e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3698e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsIn the unequal-sized window case, the window shape of the long window
must be modified for seamless lapping as above.  It is possible to
correctly infer the window shape to be applied to the current window
from the sizes of the current, previous and next windows.  It is
legal for a decoder to use this method.  However, in the case of a long
3748e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelswindow (short windows require no modification), Vorbis also codes two
3758e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsflag bits to specify pre- and post- window shape.  Although not
3768e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsstrictly necessary for function, this minor redundancy allows a packet
3778e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsto be fully decoded to the point of lapping entirely independently of
3788e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsany other packet, allowing easier abstraction of decode layers as well
3798e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsas allowing a greater level of easy parallelism in encode and
3808e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsdecode.
3818e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3828e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsA description of valid window functions for use with an inverse MDCT
3838e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelscan be found in \cite{Sporer/Brandenburg/Edler}.  Vorbis windows
3848e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsall use the slope function
\[ y = \sin\left(\frac{\pi}{2} \, \sin^2\left(\frac{(x+0.5)\,\pi}{n}\right)\right) . \]
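
The following sketch builds a window from this slope function and
checks numerically that overlapping halves lap seamlessly (their
squares sum to one).  It is an illustration under stated assumptions
rather than the normative procedure.  The start and end indices of the
shortened slope on a long window follow the rules given in the later
decode section as the editor understands them, and the function name
\texttt{vorbis\_window} and its parameters are hypothetical.

```python
import math


def slope(theta: float) -> float:
    """The Vorbis slope evaluated at angle theta."""
    return math.sin(0.5 * math.pi * math.sin(theta) ** 2)


def vorbis_window(n, prev_n=None, next_n=None):
    """Window for an n-sample block given the neighbouring block sizes.

    A neighbour size of None means "same size as this block".  Only a
    long block next to a short one gets a shortened slope; short blocks
    are never modified.
    """
    left_n = min(n, n if prev_n is None else prev_n) // 2
    right_n = min(n, n if next_n is None else next_n) // 2
    left_start = n // 4 - left_n // 2
    left_end = left_start + left_n
    right_start = 3 * n // 4 - right_n // 2
    right_end = right_start + right_n

    w = [0.0] * n  # samples outside the slopes and plateau stay zero
    for i in range(left_start, left_end):
        w[i] = slope((i - left_start + 0.5) / left_n * math.pi / 2)
    for i in range(left_end, right_start):
        w[i] = 1.0
    for i in range(right_start, right_end):
        w[i] = slope((i - right_start + 0.5) / right_n * math.pi / 2
                     + math.pi / 2)
    return w
```

For equal-sized blocks the result reduces to the formula above with
$x$ running from $0$ to $n-1$.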
3868e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3878e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3888e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3898e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{floor decode}
Each floor is encoded/decoded in channel order; however, each floor
belongs to a `submap' that specifies which floor configuration to
use.  All floors are decoded before residue decode begins.
3938e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3948e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3958e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{residue decode}
3968e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
3978e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsAlthough the number of residue vectors equals the number of channels,
3988e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelschannel coupling may mean that the raw residue vectors extracted
3998e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsduring decode do not map directly to specific channels.  When channel
4008e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelscoupling is in use, some vectors will correspond to coupled magnitude
4018e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsor angle.  The coupling relationships are described in the codec setup
4028e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsand may differ from frame to frame, due to different mode numbers.
4038e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
Vorbis codes residue vectors in groups by submap; the coding is done
in submap order from submap 0 through $n-1$.  This differs from floors,
which are coded using a configuration provided by submap number but
are coded individually in channel order.
4088e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4098e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4108e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4118e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{inverse channel coupling}
4128e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4138e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsA detailed discussion of stereo in the Vorbis codec can be found in
4148e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsthe document \href{stereo.html}{Stereo Channel Coupling in the
4158e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsVorbis CODEC}.  Vorbis is not limited to only stereo coupling, but
4168e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsthe stereo document also gives a good overview of the generic coupling
4178e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsmechanism.
4188e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4198e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsVorbis coupling applies to pairs of residue vectors at a time;
4208e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsdecoupling is done in-place a pair at a time in the order and using
4218e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsthe vectors specified in the current mapping configuration.  The
4228e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsdecoupling operation is the same for all pairs, converting square
4238e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelspolar representation (where one vector is magnitude and the second
4248e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsangle) back to Cartesian representation.
4258e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
After decoupling each pair of vectors on the coupling list, in order,
the resulting residue vectors represent the fine spectral detail
of each output channel.
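
The per-element conversion can be sketched as below.  The four-way
rule is taken from the later audio decode section of this
specification as the editor understands it; the function names
\texttt{decouple\_pair} and \texttt{decouple} are hypothetical, and
the vectors are modelled as plain Python lists.

```python
def decouple_pair(m, a):
    """Map one (magnitude, angle) element pair back to Cartesian form.

    Returns the new (magnitude, angle) slot values, i.e. the values
    that replace the two inputs in their respective vectors.
    """
    if m > 0:
        if a > 0:
            return m, m - a
        return m + a, m
    if a > 0:
        return m, m + a
    return m - a, m


def decouple(magnitude, angle):
    """Decouple two equal-length residue vectors in place."""
    for i, (m, a) in enumerate(zip(magnitude, angle)):
        magnitude[i], angle[i] = decouple_pair(m, a)
```

A decoder would call \texttt{decouple} once per entry of the coupling
list, in the order given by the current mapping.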
4298e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4308e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4318e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4328e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{generate floor curve}
4338e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4348e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsThe decoder may choose to generate the floor curve at any appropriate
4358e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelstime.  It is reasonable to generate the output curve when the floor
4368e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsdata is decoded from the raw packet, or it can be generated after
4378e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsinverse coupling and applied to the spectral residue directly,
4388e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelscombining generation and the dot product into one step and eliminating
4398e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelssome working space.
4408e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4418e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsBoth floor 0 and floor 1 generate a linear-range, linear-domain output
4428e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsvector to be multiplied (dot product) by the linear-range,
4438e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelslinear-domain spectral residue.
4448e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4458e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4468e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4478e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{compute floor/residue dot product}
4488e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4498e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsThis step is straightforward; for each output channel, the decoder
4508e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsmultiplies the floor curve and residue vectors element by element,
4518e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsproducing the finished audio spectrum of each channel.
4528e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4538e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels% TODO/FIXME: The following two paragraphs have identical twins
4548e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels%   in section 4 (under "dot product")
4558e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsOne point is worth mentioning about this dot product; a common mistake
4568e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsin a fixed point implementation might be to assume that a 32 bit
4578e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsfixed-point representation for floor and residue and direct
4588e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsmultiplication of the vectors is sufficient for acceptable spectral
4598e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsdepth in all cases because it happens to mostly work with the current
4608e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsXiph.Org reference encoder.
4618e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4628e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsHowever, floor vector values can span \~{}140dB (\~{}24 bits unsigned), and
4638e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsthe audio spectrum vector should represent a minimum of 120dB (\~{}21
4648e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsbits with sign), even when output is to a 16 bit PCM device.  For the
4658e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsresidue vector to represent full scale if the floor is nailed to
4668e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels$-140$dB, it must be able to span 0 to $+140$dB.  For the residue vector
4678e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsto reach full scale if the floor is nailed at 0dB, it must be able to
4688e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsrepresent $-140$dB to $+0$dB.  Thus, in order to handle full range
4698e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsdynamics, a residue vector may span $-140$dB to $+140$dB entirely within
4708e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsspec.  A 280dB range is approximately 48 bits with sign; thus the
4718e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsresidue vector must be able to represent a 48 bit range and the dot
4728e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsproduct must be able to handle an effective 48 bit times 24 bit
4738e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsmultiplication.  This range may be achieved using large (64 bit or
4748e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelslarger) integers, or implementing a movable binary point
4758e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsrepresentation.
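
The bit counts quoted above follow from roughly 6.02dB of dynamic
range per bit.  The short sketch below reproduces them; the helper
name \texttt{bits\_for\_range} is hypothetical and the calculation is
only a rounding-up of the dB figure, not a statement about any
particular implementation.

```python
import math

# One bit doubles the representable amplitude: 20*log10(2) dB.
DB_PER_BIT = 20 * math.log10(2)


def bits_for_range(db: float, signed: bool = False) -> int:
    """Whole bits needed to span `db` decibels, plus an optional sign bit."""
    bits = math.ceil(db / DB_PER_BIT)
    return bits + 1 if signed else bits
```

Under this estimate, \texttt{bits\_for\_range(140)} gives the 24
unsigned bits of the floor and \texttt{bits\_for\_range(280,
signed=True)} gives the 48 bits required of the residue.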
4768e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4778e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4788e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4798e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{inverse monolithic transform (MDCT)}
4808e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4818e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsThe audio spectrum is converted back into time domain PCM audio via an
4828e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsinverse Modified Discrete Cosine Transform (MDCT).  A detailed
4838e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsdescription of the MDCT is available in \cite{Sporer/Brandenburg/Edler}.
4848e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4858e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsNote that the PCM produced directly from the MDCT is not yet finished
4868e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsaudio; it must be lapped with surrounding frames using an appropriate
4878e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelswindow (such as the Vorbis window) before the MDCT can be considered
4888e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsorthogonal.
4898e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4908e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4918e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
4928e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{overlap/add data}
4938e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsWindowed MDCT output is overlapped and added with the right hand data
4948e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsof the previous window such that the 3/4 point of the previous window
4958e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsis aligned with the 1/4 point of the current window (as illustrated in
4968e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsthe window overlap diagram). At this point, the audio data between the
4978e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelscenter of the previous frame and the center of the current frame is
4988e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsnow finished and ready to be returned.
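
A minimal sketch of the alignment follows, assuming both inputs have
already been windowed and that a long window next to a short one is
zero outside its shortened slope.  The function name
\texttt{overlap\_add} and the list-based representation are
hypothetical.

```python
def overlap_add(prev_right, cur_left):
    """Lap the previous frame's right half with the current left half.

    prev_right: windowed samples from the centre to the end of the
                previous frame (length prev_n / 2).
    cur_left:   windowed samples from the start to the centre of the
                current frame (length cur_n / 2).
    Returns the finished samples between the two frame centres.
    """
    prev_n = 2 * len(prev_right)
    cur_n = 2 * len(cur_left)
    out_len = prev_n // 4 + cur_n // 4
    # Offset of cur_left[0] relative to the previous frame's centre,
    # chosen so that the previous 3/4 point meets the current 1/4 point.
    start = prev_n // 4 - cur_n // 4

    out = [0.0] * out_len
    for i in range(out_len):
        if i < len(prev_right):
            out[i] += prev_right[i]
        j = i - start
        if 0 <= j < len(cur_left):
            out[i] += cur_left[j]
    return out
```

Samples that fall outside the returned range are dropped here, which
is harmless only because a correctly shaped window is zero there.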
4998e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5008e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5018e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{cache right hand data}
5028e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsThe decoder must cache the right hand portion of the current frame to
5038e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsbe lapped with the left hand portion of the next frame.
5048e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5058e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5068e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5078e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\paragraph{return finished audio data}
5088e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5098e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsThe overlapped portion produced from overlapping the previous and
5108e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelscurrent frame data is finished data to be returned by the decoder.
5118e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas EckelsThis data spans from the center of the previous window to the center
5128e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsof the current window.  In the case of same-sized windows, the amount
of data to return is one half-block, consisting solely of the
overlapped portions.  When overlapping a short and a long window, much of
the returned range is not actually overlap.  This does not damage
transform orthogonality.  Take care, however, to return the
correct data range; the amount of data to be returned is:
5188e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5198e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\begin{Verbatim}[commandchars=\\\{\}]
5208e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelswindow_blocksize(previous_window)/4+window_blocksize(current_window)/4
5218e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels\end{Verbatim}
5228e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
5238e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelsfrom the center of the previous window to the center of the current
5248e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckelswindow.
5258e01cdce135d5d816f92d7bb83f9a930aa1b45aeLucas Eckels
Data is not returned from the first frame; it must be used to `prime'
the decode engine.  The encoder accounts for this priming when
calculating PCM offsets; after the first frame, the proper PCM output
offset is `0' (as no data has been returned yet).
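
The return-size rule and the priming behaviour can be combined into a
small bookkeeping sketch.  It tracks only sample counts, not audio
data, and the function name \texttt{pcm\_output\_schedule} is
hypothetical.

```python
def pcm_output_schedule(blocksizes):
    """For each frame, give (samples returned, PCM offset afterwards).

    blocksizes: per-frame window sizes in samples per channel, in
                decode order.
    """
    schedule = []
    offset = 0
    prev = None
    for n in blocksizes:
        if prev is None:
            # The first frame only primes the lapping buffer.
            returned = 0
        else:
            returned = prev // 4 + n // 4
        offset += returned
        schedule.append((returned, offset))
        prev = n
    return schedule
```

For the block size sequence 2048, 2048, 256, 256, 2048 this yields 0,
1024, 576, 128 and 576 returned samples, so the PCM offset is still 0
after the first frame and 2304 after the last.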
530