% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents.  Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
 \item
  A meta-headerless Ogg file encapsulates the Vorbis I packets.

 \item
  The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

 \item
  The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}

This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file.  At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream.  A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).
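
The chaining and multiplexing restrictions above can be checked
mechanically by walking the Ogg page headers.  The following Python
sketch is illustrative only: it assumes the 27-byte page header layout
described in the Ogg framing specification, it does not verify page
checksums, and the names \literal{iter\_pages} and
\literal{count\_links} are hypothetical helpers invented for this
example.

\begin{verbatim}
import struct

# Ogg page header: capture pattern, version, flags, granule position,
# serial number, sequence number, checksum, number of segments.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")   # 27 bytes, little-endian
FLAG_BOS = 0x02   # 'beginning of stream'
FLAG_EOS = 0x04   # 'end of stream'


def iter_pages(data):
    """Yield (flags, granule, serial) per page; checksums are ignored."""
    pos = 0
    while pos < len(data):
        (magic, version, flags, granule,
         serial, _seq, _crc, nsegs) = PAGE_HEADER.unpack_from(data, pos)
        if magic != b"OggS" or version != 0:
            raise ValueError("lost page sync at byte %d" % pos)
        lacing = data[pos + 27:pos + 27 + nsegs]
        yield flags, granule, serial
        # Skip header, segment table and page body.
        pos += 27 + nsegs + sum(lacing)


def count_links(data):
    """Return the number of chained links; raise if a link is multiplexed."""
    links = 0
    current = None            # serial number of the link being read
    for flags, _granule, serial in iter_pages(data):
        if current is None:
            if not flags & FLAG_BOS:
                raise ValueError("link does not start with a BOS page")
            current = serial
            links += 1
        elif flags & FLAG_BOS or serial != current:
            raise ValueError("more than one logical stream in a link")
        if flags & FLAG_EOS:
            current = None    # the next page must open a new link
    return links
\end{verbatim}

A stream that passes this check still has to be identified as Vorbis
from its first packet; the sketch only tests the page-level structure.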

\subsubsection{MIME type}

The MIME type of an Ogg file depends on the context.  Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio
\literal{audio/ogg}.  Vorbis data encapsulated in Ogg may appear
in any of those types.  RTP-encapsulated Vorbis should use
\literal{audio/vorbis} together with \literal{audio/vorbis-config}.

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
  The first Vorbis packet (the identification header), which
  uniquely identifies a stream as Vorbis audio, is placed alone in the
  first page of the logical Ogg stream.  This results in a first Ogg
  page of exactly 58 bytes at the very beginning of the logical stream.

\item
  This first page is marked 'beginning of stream' in the page flags.

\item
  The second and third Vorbis packets (comment and setup
  headers) may span one or more pages beginning on the second page of
  the logical stream.  However many pages they span, the third header
  packet finishes the page on which it ends.  The next (first audio) packet
  must begin on a fresh page.

\item
  The granule position of these first pages containing only headers is zero.

\item
  The first audio packet of the logical stream begins a fresh Ogg page.

\item
  Packets are placed into Ogg pages in order until the end of stream.

\item
  The last page is marked 'end of stream' in the page flags.

\item
  Vorbis packets may span page boundaries.

\item
  The granule position of pages containing Vorbis audio is in units
  of PCM audio samples (per channel; a stereo stream's granule position
  does not increment at twice the speed of a mono stream).

\item
  The granule position of a page represents the end PCM sample
  position of the last packet \emph{completed} on that
  page.  The 'last PCM sample' is the last complete sample returned by
  decode, not an internal sample awaiting lapping with a
  subsequent block.  A page that is entirely spanned by a single
  packet (that completes on a subsequent page) has no granule
  position, and the granule position is set to '-1'.

  Note that the last decoded (fully lapped) PCM sample from a packet
  is not necessarily the middle sample from that block. If, e.g., the
  current Vorbis packet encodes a "long block" and the next Vorbis
  packet encodes a "short block", the last decodable sample from the
  current packet will be at position (3*long\_block\_length/4) -
  (short\_block\_length/4).

\item
    The granule (PCM) position of the first page need not indicate
    that the stream started at position zero.  Although the granule
    position belongs to the last completed packet on the page and a
    valid granule position must be non-negative, by
    inference it may indicate that the PCM position of the beginning
    of audio is positive or negative.

  \begin{itemize}
    \item
        A positive starting value simply indicates that this stream begins at
        some positive time offset, potentially within a larger
        program. This is a common case when connecting to the middle
        of a broadcast stream.

    \item
        A negative value indicates that
        output samples preceding time zero should be discarded during
        decoding; this technique is used to allow sample-granularity
        editing of the stream start time of already-encoded Vorbis
        streams.  The number of samples to be discarded must not exceed
        the overlap-add span of the first two audio packets.

  \end{itemize}

    In both of these cases, in which the initial audio PCM starting
    offset is nonzero, the second finished audio packet must flush the
    page on which it appears and the third packet must begin a fresh page.
    This ensures that the decoder can always perform PCM position
    adjustments before needing to return any PCM data from synthesis,
    resulting in correct positioning information without any additional
    seeking logic.

  \begin{note}
    Failure to do so should, at worst, cause a
    decoder implementation to return incorrect positioning information
    for seeking operations at the very beginning of the stream.
  \end{note}

\item
  A granule position on the final page in a stream that indicates
  less audio data than the final packet would normally return is used to
  end the stream at a point other than an even frame boundary.  The difference
  between the actual available data returned and the declared amount
  indicates how many trailing samples to discard from the decoding
  process.

\end{itemize}
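
Two of the figures quoted in the list above can be checked by hand.
The following is a worked sketch rather than normative text.  It
assumes the 27-byte page header and one-byte-per-segment lacing table
of the Ogg framing specification and the 30-byte fixed layout of the
identification header; the block sizes 2048 and 256 are illustrative
values chosen only for this example.

\begin{displaymath}
  \underbrace{27}_{\textrm{page header}} +
  \underbrace{1}_{\textrm{segment table}} +
  \underbrace{30}_{\textrm{identification header}} = 58 \textrm{ bytes}
\end{displaymath}

\begin{displaymath}
  \frac{3 \cdot 2048}{4} - \frac{256}{4} = 1536 - 64 = 1472
\end{displaymath}

The first line accounts for the size of the first page.  The second
evaluates the last decodable sample position for a long block of 2048
samples followed by a short block of 256 samples.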
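
The start-offset and end-trimming rules in the list above reduce to
simple arithmetic on granule positions.  The following Python sketch
is illustrative only: the function names, and the idea of passing in
the number of samples that decode returned for a page, are assumptions
made for this example and are not part of any Vorbis or Ogg API.

\begin{verbatim}
def stream_start_position(first_granule, decoded_samples):
    """PCM position of the first decoded sample of the stream.

    first_granule   -- granule position of the first page on which
                       an audio packet completes
    decoded_samples -- samples returned by decode for the packets
                       completed on that page
    """
    return first_granule - decoded_samples


def leading_discard(start_position):
    """Samples preceding time zero that must be dropped (0 if none)."""
    return max(0, -start_position)


def trailing_discard(previous_granule, final_granule, decoded_samples):
    """Samples to drop from the end of the final page's decoded output.

    The declared amount is the granule advance across the final page;
    anything decode returns beyond it is discarded.
    """
    declared = final_granule - previous_granule
    return max(0, decoded_samples - declared)
\end{verbatim}

For example, a first audio page with granule position 1000 whose
packets decode to 1100 samples gives a start position of $-100$, so
the first 100 decoded samples are discarded.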