page.title=Implementing graphics
@jd:body

<!--
    Copyright 2014 The Android Open Source Project

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-->

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>
    <ol id="auto-toc">
    </ol>
  </div>
</div>

<p>Follow the instructions here to implement the Android graphics HAL.</p>

<h2 id=requirements>Requirements</h2>

<p>The following list and sections describe what you need to provide to support
graphics in your product:</p>

<ul>
  <li> OpenGL ES 1.x Driver
  <li> OpenGL ES 2.0 Driver
  <li> OpenGL ES 3.0 Driver (optional)
  <li> EGL Driver
  <li> Gralloc HAL implementation
  <li> Hardware Composer HAL implementation
  <li> Framebuffer HAL implementation
</ul>

<h2 id=implementation>Implementation</h2>

<h3 id=opengl_and_egl_drivers>OpenGL and EGL drivers</h3>

<p>You must provide drivers for OpenGL ES 1.x, OpenGL ES 2.0, and EGL. Here are
some key considerations:</p>

<ul>
  <li> The GL driver needs to be robust and conformant to the OpenGL ES
standards.
  <li> Do not limit the number of GL contexts. Because Android allows apps in
the background and tries to keep GL contexts alive, you should not limit the
number of contexts in your driver.
  <li> It is not uncommon to have 20-30 active GL contexts at once, so you
should also be careful with the amount of memory allocated for each context.
  <li> Support the YV12 image format and any other YUV image formats that come
from other components in the system, such as media codecs or the camera.
  <li> Support the mandatory extensions: <code>GL_OES_texture_external</code>,
<code>EGL_ANDROID_image_native_buffer</code>, and
<code>EGL_ANDROID_recordable</code>. The
<code>EGL_ANDROID_framebuffer_target</code> extension is also required for
Hardware Composer 1.1 and higher.
  <li> We highly recommend also supporting
<code>EGL_ANDROID_blob_cache</code>, <code>EGL_KHR_fence_sync</code>,
<code>EGL_KHR_wait_sync</code>, and <code>EGL_ANDROID_native_fence_sync</code>.
</ul>

<p>Note that the OpenGL API exposed to app developers is different from the
OpenGL interface you are implementing. Apps do not have access to the GL driver
layer and must go through the interface provided by the APIs.</p>

<h3 id=pre-rotation>Pre-rotation</h3>

<p>Many hardware overlays do not support rotation, and even when they do, it
costs processing power. The solution is to pre-transform the buffer before it
reaches SurfaceFlinger. A query hint in <code>ANativeWindow</code>
(<code>NATIVE_WINDOW_TRANSFORM_HINT</code>) represents the most likely
transform that SurfaceFlinger will apply to the buffer. Your GL driver can use
this hint to pre-transform the buffer so that when the buffer arrives at
SurfaceFlinger, it is already correctly transformed.</p>

<p>For example, you may receive a hint to rotate 90 degrees. You must generate
a matrix and apply it to the buffer to prevent it from running off the edge of
the display. To save power, this should be done in pre-rotation. See the
<code>ANativeWindow</code> interface defined in
<code>system/core/include/system/window.h</code> for more details.</p>

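<p>As an illustration, here is a minimal sketch (the helper name is
hypothetical) of reading the transform hint through the
<code>ANativeWindow</code> query interface declared in
<code>window.h</code>:</p>

<pre class=prettyprint>
#include &lt;system/window.h&gt;

/* Hypothetical helper: returns the expected transform for this window. */
static int get_transform_hint(ANativeWindow *window) {
    int hint = 0;
    /* query() is part of the ANativeWindow interface in window.h. */
    if (window-&gt;query(window, NATIVE_WINDOW_TRANSFORM_HINT, &amp;hint) != 0) {
        return 0; /* treat failure as "no transform" */
    }
    /* hint is a NATIVE_WINDOW_TRANSFORM_* value, such as
     * NATIVE_WINDOW_TRANSFORM_ROT_90. */
    return hint;
}
</pre>
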
<h3 id=gralloc_hal>Gralloc HAL</h3>

<p>The graphics memory allocator is needed to allocate memory that is requested
by image producers. You can find the interface definition of the HAL at:
<code>hardware/libhardware/include/hardware/gralloc.h</code></p>

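<p>For illustration, this sketch (with arbitrary buffer size and usage flags)
shows how a client opens the gralloc module and allocates a buffer through the
standard entry points declared in <code>hardware/gralloc.h</code>:</p>

<pre class=prettyprint>
#include &lt;hardware/gralloc.h&gt;
#include &lt;hardware/hardware.h&gt;

int allocate_example(buffer_handle_t *handle, int *stride) {
    const hw_module_t *module;
    alloc_device_t *alloc_dev;

    /* Load the device's gralloc module and open the alloc device. */
    if (hw_get_module(GRALLOC_HARDWARE_MODULE_ID, &amp;module) != 0)
        return -1;
    if (gralloc_open(module, &amp;alloc_dev) != 0)
        return -1;

    /* Request a 1280x720 RGBA buffer usable as a GPU texture; the size
     * and usage flags here are arbitrary examples. */
    return alloc_dev-&gt;alloc(alloc_dev, 1280, 720,
                            HAL_PIXEL_FORMAT_RGBA_8888,
                            GRALLOC_USAGE_HW_TEXTURE,
                            handle, stride);
}
</pre>
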
<h3 id=protected_buffers>Protected buffers</h3>

<p>The gralloc usage flag <code>GRALLOC_USAGE_PROTECTED</code> allows the
graphics buffer to be displayed only through a hardware-protected path. These
overlay planes are the only way to display DRM content; DRM-protected buffers
cannot be accessed by SurfaceFlinger or the OpenGL ES driver.</p>

<p>DRM-protected video can be presented only on an overlay plane. Video players
that support protected content must be implemented with SurfaceView. Software
running on unprotected hardware cannot read or write the buffer, and
hardware-protected paths must appear on the Hardware Composer overlay. For
instance, protected videos will disappear from the display if Hardware Composer
switches to OpenGL ES composition.</p>

<p>See the <a href="{@docRoot}devices/drm.html">DRM</a> page for a description
of protected content.</p>

<h3 id=hardware_composer_hal>Hardware Composer HAL</h3>

<p>The Hardware Composer HAL is used by SurfaceFlinger to composite surfaces to
the screen. The Hardware Composer abstracts objects like overlays and 2D
blitters and helps offload some work that would normally be done with
OpenGL.</p>

<p>We recommend you start using version 1.3 of the Hardware Composer HAL, as it
provides support for the newest features (explicit synchronization, external
displays, and more). Because the physical display hardware behind the Hardware
Composer abstraction layer can vary from device to device, it is difficult to
define recommended features. But here is some guidance:</p>

<ul>
  <li> The Hardware Composer should support at least four overlays (status bar,
system bar, application, and wallpaper/background).
  <li> Layers can be bigger than the screen, so the Hardware Composer should be
able to handle layers that are larger than the display (for example, a
wallpaper).
  <li> Pre-multiplied per-pixel alpha blending and per-plane alpha blending
should be supported at the same time.
  <li> The Hardware Composer should be able to consume the same buffers that
the GPU, camera, video decoder, and Skia are producing, so supporting some of
the following properties is helpful:
  <ul>
    <li> RGBA packing order
    <li> YUV formats
    <li> Tiling, swizzling, and stride properties
  </ul>
  <li> A hardware path for protected video playback must be present if you want
to support protected content.
</ul>

<p>The general recommendation when implementing your Hardware Composer is to
implement a non-operational Hardware Composer first. Once you have the
structure done, implement a simple algorithm to delegate composition to the
Hardware Composer. For example, just delegate the first three or four surfaces
to the overlay hardware of the Hardware Composer.</p>

<p>After that, focus on optimization, such as intelligently selecting the
surfaces to send to the overlay hardware so that you maximize the load taken
off of the GPU. Another optimization is to detect whether the screen is
updating. If not, delegate composition to OpenGL instead of the Hardware
Composer to save power. When the screen updates again, continue to offload
composition to the Hardware Composer.</p>

<p>Devices must report the display mode (or resolution). Android uses the first
mode reported by the device. To support televisions, have the TV device report
the mode selected for it by the manufacturer to Hardware Composer. See
<code>hwcomposer.h</code> for more details.</p>

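<p>As a sketch of mode reporting, an HWC 1.x implementation might expose a
single 1080p/60Hz config through <code>getDisplayConfigs</code> and
<code>getDisplayAttributes</code> (the values below are placeholders, not
recommendations):</p>

<pre class=prettyprint>
#include &lt;hardware/hwcomposer.h&gt;

static int hwc_get_display_configs(struct hwc_composer_device_1 *dev,
                                   int disp, uint32_t *configs,
                                   size_t *num_configs) {
    /* Report one config with id 0; Android uses the first one. */
    if (*num_configs &gt; 0) {
        configs[0] = 0;
        *num_configs = 1;
    }
    return 0;
}

static int hwc_get_display_attributes(struct hwc_composer_device_1 *dev,
                                      int disp, uint32_t config,
                                      const uint32_t *attributes,
                                      int32_t *values) {
    size_t i;
    for (i = 0; attributes[i] != HWC_DISPLAY_NO_ATTRIBUTE; i++) {
        switch (attributes[i]) {
        case HWC_DISPLAY_VSYNC_PERIOD:
            values[i] = 16666667; /* ~60Hz, in nanoseconds */
            break;
        case HWC_DISPLAY_WIDTH:
            values[i] = 1920;     /* placeholder panel width */
            break;
        case HWC_DISPLAY_HEIGHT:
            values[i] = 1080;     /* placeholder panel height */
            break;
        default:
            values[i] = 0;
            break;
        }
    }
    return 0;
}
</pre>
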
<p>Prepare for common use cases, such as:</p>

<ul>
  <li> Full-screen games in portrait and landscape mode
  <li> Full-screen video with closed captioning and playback control
  <li> The home screen (compositing the status bar, system bar, application
window, and live wallpapers)
  <li> Protected video playback
  <li> Multiple display support
</ul>

<p>These use cases should address regular, predictable uses rather than edge
cases that are rarely encountered. Otherwise, any optimization will have little
benefit. Implementations must balance two competing goals: animation smoothness
and interaction latency.</p>

<p>Further, to make best use of Android graphics, you must develop a robust
clocking strategy. Performance matters little if clocks have been turned down
to make every operation slow. You need a clocking strategy that puts the clocks
at high speed when needed, such as to make animations seamless, and then slows
the clocks whenever the increased speed is no longer needed.</p>

<p>Use the <code>adb shell dumpsys SurfaceFlinger</code> command to see
precisely what SurfaceFlinger is doing. See the <a
href="{@docRoot}devices/graphics/architecture.html#hwcomposer">Hardware
Composer</a> section of the Architecture page for example output and a
description of relevant fields.</p>

<p>You can find the HAL for the Hardware Composer and additional documentation
in:</p>

<ul>
  <li> <code>hardware/libhardware/include/hardware/hwcomposer.h</code>
  <li> <code>hardware/libhardware/include/hardware/hwcomposer_defs.h</code>
</ul>

<p>A stub implementation is available in the
<code>hardware/libhardware/modules/hwcomposer</code> directory.</p>

<h3 id=vsync>VSYNC</h3>

<p>VSYNC synchronizes certain events to the refresh cycle of the display.
Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
always composites on a VSYNC boundary. This eliminates stutters and improves
the visual performance of graphics. The Hardware Composer has a function
pointer:</p>

<pre class=prettyprint>
int (*waitForVsync)(int64_t *timestamp);
</pre>

<p>This points to a function you must implement for VSYNC. This function blocks
until a VSYNC occurs and returns the timestamp of the actual VSYNC. A message
must be sent every time VSYNC occurs. A client can receive a VSYNC timestamp
once, at specified intervals, or continuously (an interval of 1). You must
implement VSYNC with no more than 1 ms of lag (0.5 ms or less is recommended),
and the timestamps returned must be extremely accurate.</p>

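<p>For reference, in HWC 1.1 and later the VSYNC event is typically delivered
to SurfaceFlinger through the <code>vsync</code> callback in
<code>hwc_procs_t</code>. The sketch below assumes a hypothetical
<code>wait_for_hw_vsync_irq()</code> that blocks on the display controller's
VSYNC interrupt and returns its timestamp:</p>

<pre class=prettyprint>
#include &lt;hardware/hwcomposer.h&gt;

/* Hypothetical driver call: blocks on the display controller's VSYNC
 * interrupt and returns the timestamp of the interrupt. */
extern int64_t wait_for_hw_vsync_irq(void);

/* Callback table registered by SurfaceFlinger via registerProcs();
 * saved by the HAL during initialization. */
static const hwc_procs_t *g_procs;

static void *vsync_thread(void *arg) {
    for (;;) {
        int64_t timestamp = wait_for_hw_vsync_irq();
        /* Report each hardware VSYNC for the primary display. */
        g_procs-&gt;vsync(g_procs, HWC_DISPLAY_PRIMARY, timestamp);
    }
    return NULL;
}
</pre>
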
<h4 id=explicit_synchronization>Explicit synchronization</h4>

<p>Explicit synchronization is required and provides a mechanism for Gralloc
buffers to be acquired and released in a synchronized way. Explicit
synchronization allows producers and consumers of graphics buffers to signal
when they are done with a buffer. This allows the Android system to
asynchronously queue buffers to be read or written with the certainty that
another consumer or producer does not currently need them. See the <a
href="#synchronization_framework">Synchronization framework</a> section for an
overview of this mechanism.</p>

<p>The benefits of explicit synchronization include less behavior variation
between devices, better debugging support, and improved testing metrics. For
instance, the sync framework output readily identifies problem areas and root
causes. And centralized SurfaceFlinger presentation timestamps show when events
occur in the normal flow of the system.</p>

<p>This communication is facilitated by the use of synchronization fences,
which are now required when requesting a buffer for consuming or producing. The
synchronization framework consists of three main building blocks:
sync_timeline, sync_pt, and sync_fence.</p>

<h5 id=sync_timeline>sync_timeline</h5>

<p>A sync_timeline is a monotonically increasing timeline that should be
implemented for each driver instance, such as a GL context, display controller,
or 2D blitter. It is essentially a counter of jobs submitted to the kernel for
a particular piece of hardware. It provides guarantees about the order of
operations and allows hardware-specific implementations.</p>

<p>Note that a CPU-only reference implementation of sync_timeline, called
sw_sync (which stands for software sync), is provided. If possible, use sw_sync
instead of writing your own sync_timeline to save resources and avoid
complexity. If you’re not employing a hardware resource, sw_sync should be
sufficient.</p>

<p>If you must implement a sync_timeline, use the sw_sync driver as a starting
point. Follow these guidelines:</p>

<ul>
  <li> Provide useful names for all drivers, timelines, and fences. This
simplifies debugging.
  <li> Implement the <code>timeline_value_str</code> and
<code>pt_value_str</code> operators in your timelines, as they make debugging
output much more readable.
  <li> If you want your userspace libraries (such as the GL library) to have
access to the private data of your timelines, implement the
<code>fill_driver_data</code> operator. This lets you get information about the
immutable sync_fence and sync_pts so you might build command lines based upon
them.
</ul>

<p>When implementing a sync_timeline, <strong>don’t</strong>:</p>

<ul>
  <li> Base it on any real view of time, such as when a wall clock or other
piece of work might finish. It is better to create an abstract timeline that
you can control.
  <li> Allow userspace to explicitly create or signal a fence. This can result
in one piece of the user pipeline creating a denial-of-service attack that
halts all functionality. This is because the userspace cannot make promises on
behalf of the kernel.
  <li> Access sync_timeline, sync_pt, or sync_fence elements explicitly, as the
API should provide all required functions.
</ul>

<h5 id=sync_pt>sync_pt</h5>

<p>A sync_pt is a single value or point on a sync_timeline. A point has three
states: active, signaled, and error. Points start in the active state and
transition to the signaled or error state. For instance, when a buffer is no
longer needed by an image consumer, the sync_pt is signaled so that image
producers know it is okay to write into the buffer again.</p>

<h5 id=sync_fence>sync_fence</h5>

<p>A sync_fence is a collection of sync_pts that often have different
sync_timeline parents (such as for the display controller and GPU). These are
the main primitives over which drivers and userspace communicate their
dependencies. A fence is a promise, given by the kernel when it accepts queued
work, that the work will complete in a finite amount of time.</p>

<p>This allows multiple consumers or producers to signal that they are using a
buffer and allows this information to be communicated with one function
parameter. Fences are backed by a file descriptor and can be passed from
kernel-space to user-space. For instance, a fence can contain two sync_pts that
signify when two separate image consumers are done reading a buffer. When the
fence is signaled, the image producers know both consumers are done
consuming.</p>

<p>Fences, like sync_pts, start active and then change state based upon the
state of their points. If all sync_pts become signaled, the sync_fence becomes
signaled. If one sync_pt falls into an error state, the entire sync_fence has
an error state.</p>

<p>Membership in a sync_fence is immutable once the fence is created. And since
a sync_pt can be in only one fence, it is included as a copy. Even if two
points have the same value, there will be two copies of the sync_pt in the
fence.</p>

<p>To get more than one point in a fence, a merge operation is conducted. In
the merge, the points from two distinct fences are added to a third fence. If
one of those points was signaled in the originating fence and the other was
not, the third fence will also not be in a signaled state.</p>

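<p>The sketch below shows the userspace view of these primitives using the
sw_sync and libsync helpers from <code>platform/system/core/libsync</code>
(test/demo usage only; production drivers signal fences from the kernel, and
the exact header location varies by release):</p>

<pre class=prettyprint>
#include &lt;sync/sync.h&gt;  /* sync_wait(); sw_sync helpers from libsync */
#include &lt;unistd.h&gt;

int sw_sync_demo(void) {
    /* A software sync_timeline, usable entirely from the CPU. */
    int timeline = sw_sync_timeline_create();

    /* A fence containing one sync_pt that signals when the timeline
     * reaches the value 1. */
    int fence = sw_sync_fence_create(timeline, "demo_fence", 1);

    /* The producer signals by incrementing the timeline. */
    sw_sync_timeline_inc(timeline, 1);

    /* Waiting now succeeds immediately because the fence signaled. */
    int err = sync_wait(fence, 1000 /* ms timeout */);

    close(fence);
    close(timeline);
    return err;
}
</pre>
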
<p>To implement explicit synchronization, you need to provide the
following:</p>

<ul>
  <li> A kernel-space driver that implements a synchronization timeline for a
particular piece of hardware. Drivers that need to be fence-aware are generally
anything that accesses or communicates with the Hardware Composer. Here are the
key files (found in the android-3.4 kernel branch):
  <ul>
    <li> Core implementation:
    <ul>
      <li> <code>kernel/common/include/linux/sync.h</code>
      <li> <code>kernel/common/drivers/base/sync.c</code>
    </ul>
    <li> sw_sync:
    <ul>
      <li> <code>kernel/common/include/linux/sw_sync.h</code>
      <li> <code>kernel/common/drivers/base/sw_sync.c</code>
    </ul>
    <li> Documentation: <code>kernel/common/Documentation/sync.txt</code>
  </ul>
  Finally, the <code>platform/system/core/libsync</code> directory includes a
library to communicate with the kernel-space.
  <li> A Hardware Composer HAL module (version 1.3 or later) that supports the
new synchronization functionality. You will need to provide the appropriate
synchronization fences as parameters to the set() and prepare() functions in
the HAL.
  <li> Two GL-specific extensions related to fences,
<code>EGL_ANDROID_native_fence_sync</code> and
<code>EGL_ANDROID_wait_sync</code>, along with incorporating fence support into
your graphics drivers.
</ul>

<p>For example, to use the API supporting the synchronization function, you
might develop a display driver that has a display buffer function. Before the
synchronization framework existed, this function would receive dma-bufs, put
those buffers on the display, and block while the buffer is visible, like
so:</p>

<pre class=prettyprint>
/*
 * assumes buf is ready to be displayed.  returns when buffer is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buf);
</pre>

<p>With the synchronization framework, the API call is slightly more complex.
While putting a buffer on display, you associate it with a fence that says when
the buffer will be ready. So you queue up the work, which you will initiate
once the fence clears.</p>

<p>In this manner, you are not blocking anything. You immediately return your
own fence, which is a guarantee of when the buffer will be off of the display.
As you queue up buffers, the kernel will list the dependencies. With the
synchronization framework:</p>

<pre class=prettyprint>
/*
 * will display buf when fence is signaled.  returns immediately with a fence
 * that will signal when buf is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf,
                                  struct sync_fence *fence);
</pre>

<h4 id=sync_integration>Sync integration</h4>

<h5 id=integration_conventions>Integration conventions</h5>

<p>This section explains how to integrate the low-level sync framework with
different parts of the Android framework and the drivers that need to
communicate with one another.</p>

<p>The Android HAL interfaces for graphics follow consistent conventions: when
file descriptors are passed across a HAL interface, ownership of the file
descriptor is always transferred. This means:</p>

<ul>
  <li> If you receive a fence file descriptor from the sync framework, you must
close it.
  <li> If you return a fence file descriptor to the sync framework, the
framework will close it.
  <li> If you want to continue using a fence file descriptor, you must
duplicate the descriptor, as shown in the sketch below.
</ul>

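<p>A minimal sketch of the duplication rule (the function name is
hypothetical): a driver that hands a fence back across the HAL boundary, where
the recipient takes ownership, keeps a private <code>dup()</code> for its own
continued use:</p>

<pre class=prettyprint>
#include &lt;unistd.h&gt;

/* Returns fence_fd to the caller (who now owns and will close it) while
 * storing a private duplicate in *kept_fd for continued use. */
int return_fence_keep_copy(int fence_fd, int *kept_fd) {
    *kept_fd = dup(fence_fd);
    return fence_fd;
}
</pre>
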
<p>Every time a fence is passed through BufferQueue - such as for a window that
passes a fence to BufferQueue saying when its new contents will be ready - the
fence object is renamed. Since kernel fence support allows fences to have
strings for names, the sync framework uses the window name and the buffer index
being queued to name the fence, for example: <code>SurfaceView:0</code></p>

<p>This is helpful in debugging to identify the source of a deadlock. Those
names appear in the output of <code>/d/sync</code> and in bug reports when
taken.</p>

<h5 id=anativewindow_integration>ANativeWindow integration</h5>

<p>ANativeWindow is fence-aware: <code>dequeueBuffer</code>,
<code>queueBuffer</code>, and <code>cancelBuffer</code> have fence
parameters.</p>

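<p>For illustration, a sketch of the fence-aware dequeue/queue cycle (a fence
fd of -1 means "no fence; the buffer is ready immediately"):</p>

<pre class=prettyprint>
#include &lt;system/window.h&gt;

int draw_one_frame(ANativeWindow *win) {
    ANativeWindowBuffer *buf;
    int acquire_fence = -1;

    /* dequeueBuffer returns a buffer plus a fence that must signal
     * before we may write into the buffer. */
    if (win-&gt;dequeueBuffer(win, &amp;buf, &amp;acquire_fence) != 0)
        return -1;

    /* ... wait on acquire_fence (or pass it to the GPU), render ... */

    /* queueBuffer hands the buffer to the consumer along with a fence
     * that signals when rendering is complete; -1 here claims the
     * contents are already ready. */
    return win-&gt;queueBuffer(win, buf, -1);
}
</pre>
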
<h5 id=opengl_es_integration>OpenGL ES integration</h5>

<p>OpenGL ES sync integration relies upon these two EGL extensions:</p>

<ul>
  <li> <code>EGL_ANDROID_native_fence_sync</code> - provides a way to either
wrap or create native Android fence file descriptors in EGLSyncKHR objects.
  <li> <code>EGL_ANDROID_wait_sync</code> - allows stalls on the GPU rather
than the CPU, making the GPU wait for an EGLSyncKHR. This is essentially the
same as the <code>EGL_KHR_wait_sync</code> extension. See the
<code>EGL_KHR_wait_sync</code> specification for details.
</ul>

<p>These extensions can be used independently and are controlled by a compile
flag in libgui. To use them, first implement the
<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
kernel support. Next, add ANativeWindow support for fences to your driver and
then turn on support in libgui to make use of the
<code>EGL_ANDROID_native_fence_sync</code> extension.</p>

<p>Then, as a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
extension in your driver and turn it on separately. The
<code>EGL_ANDROID_native_fence_sync</code> extension defines a distinct native
fence EGLSync object type, so extensions that apply to existing EGLSync object
types don’t necessarily apply to <code>EGL_ANDROID_native_fence</code>
objects. This avoids unwanted interactions.</p>

<p>The EGL_ANDROID_native_fence_sync extension employs a corresponding native
fence file descriptor attribute that can be set only at creation time and
cannot be directly queried onward from an existing sync object. This attribute
can be set to one of two modes:</p>

<ul>
  <li> A valid fence file descriptor - wraps an existing native Android fence
file descriptor in an EGLSyncKHR object.
  <li> -1 - creates a native Android fence file descriptor from an EGLSyncKHR
object.
</ul>

<p>The DupNativeFenceFD function call is used to extract the native Android
fence file descriptor from the EGLSyncKHR object. This has the same result as
querying the attribute that was set but adheres to the convention that the
recipient closes the fence (hence the duplicate operation). Finally, destroying
the EGLSync object should close the internal fence attribute.</p>

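<p>A sketch of the second mode - creating a new native fence from GL work and
duping out its file descriptor - assuming the extension entry points have
already been resolved with <code>eglGetProcAddress</code>:</p>

<pre class=prettyprint>
#include &lt;EGL/egl.h&gt;
#include &lt;EGL/eglext.h&gt;
#include &lt;GLES2/gl2.h&gt;

/* Extension entry points, resolved elsewhere via eglGetProcAddress(). */
static PFNEGLCREATESYNCKHRPROC pEglCreateSyncKHR;
static PFNEGLDESTROYSYNCKHRPROC pEglDestroySyncKHR;
static PFNEGLDUPNATIVEFENCEFDANDROIDPROC pEglDupNativeFenceFDANDROID;

int create_native_fence(EGLDisplay dpy) {
    /* No attributes: the driver creates a new native Android fence that
     * signals when the preceding GL commands complete. */
    EGLSyncKHR sync =
        pEglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);

    /* A flush is required before the fd is guaranteed to signal. */
    glFlush();

    /* Dup out the fd; by convention, the recipient owns and closes it. */
    int fence_fd = pEglDupNativeFenceFDANDROID(dpy, sync);
    pEglDestroySyncKHR(dpy, sync); /* fence_fd remains valid on its own */
    return fence_fd;
}
</pre>
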
<h5 id=hardware_composer_integration>Hardware Composer integration</h5>

<p>Hardware Composer handles three types of sync fences:</p>

<ul>
  <li> <em>Acquire fence</em> - one per layer, this is set before calling
HWC::set. It signals when Hardware Composer may read the buffer.
  <li> <em>Release fence</em> - one per layer, this is filled in by the driver
in HWC::set. It signals when Hardware Composer is done reading the buffer so
the framework can start using that buffer again for that particular layer.
  <li> <em>Retire fence</em> - one per frame, this is filled in by the driver
each time HWC::set is called. It covers all of the layers for the set operation
and signals to the framework when all of the effects of this set operation have
completed. The retire fence signals when the next set operation takes place on
the screen.
</ul>

<p>The retire fence can be used to determine how long each frame appears on the
screen. This is useful in identifying the location and source of delays, such
as a stuttering animation.</p>

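<p>The sketch below shows where these fences live in the HWC 1.x data
structures; the fence values are placeholders (a real driver supplies file
descriptors from its sync timelines):</p>

<pre class=prettyprint>
#include &lt;hardware/hwcomposer.h&gt;

static int hwc_set(struct hwc_composer_device_1 *dev, size_t num_displays,
                   hwc_display_contents_1_t **displays) {
    hwc_display_contents_1_t *list = displays[0];
    size_t i;

    for (i = 0; i &lt; list-&gt;numHwLayers; i++) {
        hwc_layer_1_t *layer = &amp;list-&gt;hwLayers[i];
        /* Consume layer-&gt;acquireFenceFd (wait on it or pass it to
         * hardware), then close it: ownership transferred to us. */

        /* Fill in a fence that signals when we are done reading; -1
         * means "already done", a real driver usually supplies an fd. */
        layer-&gt;releaseFenceFd = -1;
    }

    /* One fence for the whole frame: signals when this set() has fully
     * taken effect on screen. */
    list-&gt;retireFenceFd = -1;
    return 0;
}
</pre>
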
<h4 id=vsync_offset>VSYNC Offset</h4>

<p>Application and SurfaceFlinger render loops should be synchronized to the
hardware VSYNC. On a VSYNC event, the display begins showing frame N while
SurfaceFlinger begins compositing windows for frame N+1. The app handles
pending input and generates frame N+2.</p>

<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
apps and SurfaceFlinger and the drifting of displays in and out of phase with
each other. This, however, does assume application and SurfaceFlinger per-frame
times don’t vary widely. Nevertheless, the latency is at least two frames.</p>

<p>To remedy this, you may employ VSYNC offsets to reduce the input-to-display
latency by making the application and composition signals relative to hardware
VSYNC. This is possible because application plus composition usually takes less
than 33 ms.</p>

<p>The result of VSYNC offset is three signals with the same period but offset
phase:</p>

<ul>
  <li> <em>HW_VSYNC_0</em> - Display begins showing next frame
  <li> <em>VSYNC</em> - App reads input and generates next frame
  <li> <em>SF VSYNC</em> - SurfaceFlinger begins compositing for next frame
</ul>

<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
frame, while the application processes the input and renders the frame, all
within a single frame of time.</p>

<p>Please note, VSYNC offsets reduce the time available for app and composition
and therefore provide a greater chance for error.</p>

<h5 id=dispsync>DispSync</h5>

<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
display and uses that model to execute periodic callbacks at specific phase
offsets from the hardware VSYNC events.</p>

<p>DispSync is essentially a software phase-locked loop (PLL) that generates
the VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even
when not offset from hardware VSYNC.</p>

<img src="images/dispsync.png" alt="DispSync flow">

<p class="img-caption"><strong>Figure 4.</strong> DispSync flow</p>

<p>DispSync has these qualities:</p>

<ul>
  <li> <em>Reference</em> - HW_VSYNC_0
  <li> <em>Output</em> - VSYNC and SF VSYNC
  <li> <em>Feedback</em> - Retire fence signal timestamps from Hardware
Composer
</ul>

<h5 id=vsync_retire_offset>VSYNC/Retire Offset</h5>

<p>The signal timestamp of retire fences must match HW VSYNC, even on devices
that don’t use the offset phase. Otherwise, errors appear more severe than they
actually are.</p>

<p>“Smart” panels often have a delta: the retire fence marks the end of direct
memory access (DMA) to display memory, but the actual display switch and HW
VSYNC occur some time later.</p>

<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device’s
BoardConfig.mk make file. It is based upon the display controller and panel
characteristics and is the time from the retire fence timestamp to the HW
VSYNC signal, measured in nanoseconds.</p>

<h5 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC Offsets</h5>

<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> values are set conservatively based
on high-load use cases, such as partial GPU composition during window
transition or Chrome scrolling through a webpage containing animations. These
offsets allow for long application render times and long GPU composition
times.</p>

<p>More than a millisecond or two of latency is noticeable. We recommend
integrating thorough automated error testing to minimize latency without
significantly increasing error counts.</p>

<p>Note these offsets are also set in the device’s BoardConfig.mk make file.
The default if not set is zero offset. Both settings are offsets in nanoseconds
after HW_VSYNC_0, and either can be negative.</p>

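<p>For illustration, a device might set these in its BoardConfig.mk as follows
(the values are placeholders, not recommendations):</p>

<pre class=prettyprint>
# Offsets in nanoseconds after HW_VSYNC_0; illustrative values only.
VSYNC_EVENT_PHASE_OFFSET_NS := 1000000
SF_VSYNC_EVENT_PHASE_OFFSET_NS := 1000000

# Time from retire fence timestamp to HW VSYNC, in nanoseconds.
PRESENT_TIME_OFFSET_FROM_VSYNC_NS := 0
</pre>
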
<h3 id=virtual_displays>Virtual displays</h3>

<p>Android added support for virtual displays to Hardware Composer in version
1.3. This support was implemented in the Android platform and can be used by
Miracast.</p>

<p>Virtual display composition is similar to physical display composition:
Input layers are described in prepare(), SurfaceFlinger conducts GPU
composition, and the layers and GPU framebuffer are provided to Hardware
Composer in set().</p>

<p>Instead of the output going to the screen, it is sent to a gralloc buffer.
Hardware Composer writes output to a buffer and provides the completion fence.
The buffer is sent to an arbitrary consumer: video encoder, GPU, CPU, etc.
Virtual displays can use 2D/blitter or overlays if the display pipeline can
write to memory.</p>

<h4 id=modes>Modes</h4>

<p>Each frame is in one of three modes after prepare():</p>

<ul>
  <li> <em>GLES</em> - All layers are composited by the GPU. The GPU writes
directly to the output buffer while Hardware Composer does nothing. This is
equivalent to virtual display composition with Hardware Composer versions
earlier than 1.3.
  <li> <em>MIXED</em> - The GPU composites some layers to the framebuffer, and
Hardware Composer composites the framebuffer and the remaining layers. The GPU
writes to a scratch buffer (the framebuffer); Hardware Composer reads the
scratch buffer and writes to the output buffer. The buffers may have different
formats, e.g. RGBA and YCbCr.
  <li> <em>HWC</em> - All layers are composited by Hardware Composer, which
writes directly to the output buffer.
</ul>

<h4 id=output_format>Output format</h4>

<p><em>MIXED and HWC modes</em>: If the consumer needs CPU access, the consumer
chooses the format. Otherwise, the format is IMPLEMENTATION_DEFINED, and
gralloc can choose the best format based on the usage flags. For example,
gralloc might choose a YCbCr format if the consumer is a video encoder and
Hardware Composer can write that format efficiently.</p>

<p><em>GLES mode</em>: The EGL driver chooses the output buffer format in
dequeueBuffer(), typically RGBA8888. The consumer must be able to accept this
format.</p>

<h4 id=egl_requirement>EGL requirement</h4>

<p>Hardware Composer 1.3 virtual displays require that eglSwapBuffers() does
not dequeue the next buffer immediately. Instead, it should defer dequeueing
the buffer until rendering begins. Otherwise, EGL always owns the “next” output
buffer, and SurfaceFlinger can’t get the output buffer for Hardware Composer in
MIXED/HWC mode.</p>

<p>If Hardware Composer always sends all virtual display layers to the GPU, all
frames will be in GLES mode. Although it is not recommended, you may use this
method if you need to support Hardware Composer 1.3 for some other reason but
can’t conduct virtual display composition.</p>

<h2 id=testing>Testing</h2>

<p>For benchmarking, we suggest following this flow by phase:</p>

<ul>
  <li> <em>Specification</em> - When initially specifying the device, such as
when using immature drivers, you should use predefined (fixed) clocks and
workloads to measure the frames per second rendered. This gives a clear view of
what the hardware is capable of doing.
  <li> <em>Development</em> - In the development phase as drivers mature, you
should use a fixed set of user actions to measure the number of visible
stutters (janks) in animations.
  <li> <em>Production</em> - Once the device is ready for production and you
want to compare against competitors, you should increase the workload until
stutters increase. Determine if the current clock settings can keep up with the
load. This can help you identify where you might be able to slow the clocks and
reduce power use.
</ul>

<p>For the specification phase, Android offers the Flatland tool to help derive
device capabilities. It can be found at:
<code>platform/frameworks/native/cmds/flatland/</code></p>

<p>Flatland relies upon fixed clocks and shows the throughput that can be
achieved with composition-based workloads. It uses gralloc buffers to simulate
multiple window scenarios, filling in the windows with GL and then measuring
the compositing. Note that Flatland uses the synchronization framework to
measure time, so you must support the synchronization framework to readily use
Flatland.</p>