.. pnacl-c-cpp-language-support.rst revision 116680a4aac90f2aa7413d9095a592090648e557

============================
PNaCl C/C++ Language Support
============================

.. contents::
   :local:
   :backlinks: none
   :depth: 3

Source language support
=======================

The currently supported languages are C and C++. The PNaCl toolchain is
based on a recent version of Clang, which fully supports C++11 and most
of C11. A detailed status of the language support is available on the
`Clang C++ status page <http://clang.llvm.org/cxx_status.html>`_.

For information on using languages other than C/C++, see the :ref:`FAQ
section on other languages <other_languages>`.

The PNaCl toolchain's standard libraries are currently ``libc++`` for
C++ and ``newlib`` for C. ``libstdc++`` is also supported but its use is
discouraged; see :ref:`building_cpp_libraries` for more details.

Versions
--------

Version information can be obtained as follows:

* Clang/LLVM: run ``pnacl-clang -v``.
* ``newlib``: use the ``_NEWLIB_VERSION`` macro.
* ``libc++``: use the ``_LIBCPP_VERSION`` macro.
* ``libstdc++``: use the ``_GLIBCXX_VERSION`` macro.

Preprocessor definitions
------------------------

When compiling C/C++ code, the PNaCl toolchain defines the ``__pnacl__``
macro. In addition, ``__native_client__`` is defined for compatibility
with other NaCl toolchains.
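
The macros above can be tested with ordinary preprocessor conditionals.
The following is a minimal sketch rather than an official recipe; the
helper names ``describe_toolchain`` and ``using_libcxx`` are hypothetical
and only serve the illustration.

.. naclcode::

  #include <iostream>
  #include <string>

  // Hypothetical helper: names the toolchain detected at compile time.
  std::string describe_toolchain() {
  #if defined(__pnacl__)
    return "PNaCl";
  #elif defined(__native_client__)
    return "NaCl (non-portable toolchain)";
  #else
    return "not Native Client";
  #endif
  }

  // Hypothetical helper: true when the C++ standard library is libc++.
  // _LIBCPP_VERSION is only visible after a libc++ header was included.
  bool using_libcxx() {
  #ifdef _LIBCPP_VERSION
    return true;
  #else
    return false;
  #endif
  }

  // Prints a one-line summary of what was detected.
  void print_toolchain_summary() {
    std::cout << "toolchain: " << describe_toolchain()
              << ", libc++: " << (using_libcxx() ? "yes" : "no") << '\n';
  }

Checking ``__pnacl__`` first matters because a PNaCl build defines both
macros.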

.. _memory_model_and_atomics:

Memory Model and Atomics
========================

Memory Model for Concurrent Operations
--------------------------------------

The memory model offered by PNaCl relies on the same coding guidelines
as the C11/C++11 one: concurrent accesses must always occur through
atomic primitives (offered by `atomic intrinsics
<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always
occur with the same size for the same memory location. Visibility of
stores is provided on a happens-before basis that relates memory
locations to each other as the C11/C++11 standards do.

Non-atomic memory accesses may be reordered, separated, elided or fused
according to C and C++'s memory model before the pexe is created as well
as after its creation. Accessing an atomic memory location through
non-atomic primitives is :ref:`Undefined Behavior <undefined_behavior>`.

As in C11/C++11, some atomic accesses may be implemented with locks on
certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
``1``, signifying that all types are sometimes lock-free. The
``is_lock_free`` methods and ``atomic_is_lock_free`` report whether the
current platform's implementation is lock-free, as determined at
translation time. These macros, methods and functions are in the C11
header ``<stdatomic.h>`` and the C++11 header ``<atomic>``.
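
As a small illustration of the difference between the compile-time macro
and the run-time query, the sketch below wraps both for ``int``. The
helper names are hypothetical; only ``ATOMIC_INT_LOCK_FREE`` and
``std::atomic<int>::is_lock_free`` come from the C++11 standard.

.. naclcode::

  #include <atomic>

  // Compile-time hint: 0 = never, 1 = sometimes, 2 = always lock-free.
  // According to the text above, PNaCl always reports 1.
  int int_lock_free_hint() {
    return ATOMIC_INT_LOCK_FREE;
  }

  // Run-time answer for the platform the code actually executes on.
  bool int_atomics_are_lock_free() {
    std::atomic<int> probe(0);
    return probe.is_lock_free();
  }

Portable code should rely on the run-time query, since a hint of ``1``
does not say which way a given platform will go.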

The PNaCl toolchain supports concurrent memory accesses through legacy
GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
primitives and the underlying `GCCMM
<http://gcc.gnu.org/wiki/Atomic/GCCMM>`_ ``__atomic_*``
primitives. ``volatile`` memory accesses can also be used, though these
are discouraged. See `Volatile Memory Accesses`_.
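
For comparison, here is the same atomic increment written with both
builtin families. This is only a sketch; the wrapper function names are
hypothetical, while the builtins themselves are the GCC/Clang ones
mentioned above.

.. naclcode::

  // Legacy __sync builtin: atomically adds 1 and returns the old value.
  int sync_increment(int *counter) {
    return __sync_fetch_and_add(counter, 1);
  }

  // GCCMM __atomic builtin: same operation, explicit memory ordering.
  int atomic_increment(int *counter) {
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
  }

  // Atomic load of the current value.
  int atomic_read(int *counter) {
    return __atomic_load_n(counter, __ATOMIC_SEQ_CST);
  }

New code would normally use ``<atomic>`` or ``<stdatomic.h>`` instead;
the builtins are mostly useful when porting existing code.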

PNaCl supports concurrency and parallelism with some restrictions:

* Threading is explicitly supported and has no restrictions over what
  prevalent implementations offer. See `Threading`_.

* ``volatile`` and atomic operations are address-free (operations on the
  same memory location via two different addresses work atomically), as
  intended by the C11/C++11 standards. This is critical in supporting
  synchronous "external modifications" such as mapping underlying memory
  at multiple locations.

* Inter-process communication through shared memory is currently not
  supported. See `Future Directions`_.

* Signal handling isn't supported, so PNaCl promotes all primitives to
  cross-thread (instead of single-thread). This may change at a later
  date. Note that using atomic operations which aren't lock-free may
  lead to deadlocks when handling asynchronous signals.
  See `Future Directions`_.

* Direct interaction with device memory isn't supported, and there is no
  intent to support it. The embedding sandbox's runtime can offer APIs
  to indirectly access devices.

Setting up the above mechanisms requires assistance from the embedding
sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up
can be done through regular C/C++ code.

Atomic Memory Ordering Constraints
----------------------------------

Atomics follow the same ordering constraints as in regular C11/C++11,
but all accesses are promoted to sequential consistency (the strongest
memory ordering) at pexe creation time. We plan to support more of the
C11/C++11 memory orderings in the future.

Some additional restrictions, following the C11/C++11 standards:

- Atomic accesses must at least be naturally aligned.
- Some accesses may not actually be atomic on certain platforms,
  requiring an implementation that uses global locks.
- An atomic memory location must always be accessed with atomic
  primitives, and these primitives must always be of the same bit size
  for that location.
- Not all memory orderings are valid for all atomic operations.

Volatile Memory Accesses
------------------------

The C11/C++11 standards mandate that ``volatile`` accesses execute in
program order (but are not fences, so other memory operations can
reorder around them), are not necessarily atomic, and can’t be
elided. They can be separated into smaller width accesses.

Before any optimizations occur, the PNaCl toolchain transforms
``volatile`` loads and stores into sequentially consistent ``volatile``
atomic loads and stores, and applies regular compiler optimizations
along the above guidelines. This orders ``volatile`` accesses according
to the atomic rules, and means that fences (including
``__sync_synchronize``) act in a better-defined manner. Regular memory
accesses still do not have ordering guarantees with ``volatile`` and
atomic accesses, though the internal representation of
``__sync_synchronize`` attempts to prevent reordering of memory accesses
to objects which may escape.

Relaxed ordering could be used instead, but for the first release it is
more conservative to apply sequential consistency. Future releases may
change what happens at compile-time, but already-released pexes will
continue using sequential consistency.

The PNaCl toolchain also requires that ``volatile`` accesses be at least
naturally aligned, and tries to guarantee this alignment.

The above guarantees ease the support of legacy (i.e. non-C11/C++11)
code, and combined with builtin fences these programs can do meaningful
cross-thread communication without changing code. They also better
reflect the original code's intent and guarantee better portability.

.. _language_support_threading:

Threading
=========

Threading is explicitly supported through C11/C++11's threading
libraries as well as POSIX threads.

Communication between threads should use atomic primitives as described
in `Memory Model and Atomics`_.
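
A minimal sketch of such communication, using only C++11 ``<thread>``
and ``<atomic>``: a producer thread writes a value and publishes it
through an atomic flag. The function name ``publish_and_read`` is
hypothetical. The release/acquire orderings are what the source code
requests; as described earlier, PNaCl currently promotes them to
sequential consistency, which is stronger and therefore still correct.

.. naclcode::

  #include <atomic>
  #include <thread>

  // Hypothetical helper: hands `value` from a producer thread to the
  // calling thread and returns what the caller observed.
  int publish_and_read(int value) {
    int payload = 0;                 // plain data, guarded by the flag
    std::atomic<bool> ready(false);  // synchronizes the two threads

    std::thread producer([&] {
      payload = value;                               // 1. write the data
      ready.store(true, std::memory_order_release);  // 2. publish it
    });

    // Wait for the flag. The acquire load pairs with the release store,
    // so the write to payload happens-before the read below.
    while (!ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    int observed = payload;

    producer.join();
    return observed;
  }

Only the flag needs to be atomic here: ``payload`` is never accessed
concurrently because the flag orders the write before the read.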

``setjmp`` and ``longjmp``
==========================

PNaCl and NaCl support ``setjmp`` and ``longjmp`` without any
restrictions beyond C's.
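
As a reminder of the usual pattern, the sketch below uses ``setjmp`` as a
recovery point and ``longjmp`` to bail out of a nested call. All function
and variable names are hypothetical.

.. naclcode::

  #include <csetjmp>

  // Recovery point shared by the two functions below.
  static std::jmp_buf recovery_point;

  // Jumps back to the recovery point when given a negative input.
  static int checked_double(int x) {
    if (x < 0) std::longjmp(recovery_point, 1);
    return 2 * x;
  }

  // Returns 2 * x, or -1 if checked_double bailed out through longjmp.
  int double_or_fail(int x) {
    if (setjmp(recovery_point) != 0) {
      return -1;  // reached a second time, via longjmp
    }
    return checked_double(x);
  }

In C++ code, make sure no object with a non-trivial destructor is
skipped by the jump, since that is one of the restrictions inherited
from the language standards.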

.. _exception_handling:

C++ Exception Handling
======================

PNaCl currently supports C++ exception handling through ``setjmp()`` and
``longjmp()``, which can be enabled with the ``--pnacl-exceptions=sjlj``
linker flag. Exceptions are disabled by default so that faster and
smaller code is generated; when they are disabled, ``throw`` statements
are replaced with calls to ``abort()``. The usual ``-fno-exceptions``
flag is also supported. PNaCl will support full zero-cost exception
handling in the future.

NaCl supports full zero-cost C++ exception handling.
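
The sketch below shows ordinary ``try``/``catch`` code of the kind that
needs ``--pnacl-exceptions=sjlj`` at link time under PNaCl; without the
flag, the ``throw`` would end in ``abort()`` as described above. The
function names are hypothetical.

.. naclcode::

  #include <stdexcept>
  #include <string>

  // Hypothetical parser: throws unless `text` is a single decimal digit.
  int parse_digit(const std::string &text) {
    if (text.size() != 1 || text[0] < '0' || text[0] > '9') {
      throw std::invalid_argument("expected a single digit");
    }
    return text[0] - '0';
  }

  // Returns the parsed digit, or `fallback` if parsing throws.
  int parse_digit_or(const std::string &text, int fallback) {
    try {
      return parse_digit(text);
    } catch (const std::invalid_argument &) {
      return fallback;
    }
  }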

Inline Assembly
===============

Inline assembly isn't supported by PNaCl because it isn't portable. The
one current exception is the common compiler barrier idiom
``asm("":::"memory")``, which gets transformed to a sequentially
consistent memory barrier (equivalent to ``__sync_synchronize()``). In
PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
memory accesses, though in practice the implementation attempts to also
prevent reordering of memory accesses to objects which may escape.
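
A minimal sketch of the barrier idiom in legacy-style code follows. The
names ``compiler_barrier`` and ``publish`` are hypothetical. The accesses
are ``volatile`` because, as noted above, those and atomic accesses are
the only ones the barrier is guaranteed to order.

.. naclcode::

  // Hypothetical wrapper around the one inline assembly idiom that
  // PNaCl accepts.
  static inline void compiler_barrier() {
    asm("" ::: "memory");
  }

  // Stores the data first, then raises the flag, with the barrier
  // between the two volatile stores.
  void publish(volatile int *data, volatile int *flag, int value) {
    *data = value;
    compiler_barrier();
    *flag = 1;
  }

New code is usually better served by C11/C++11 atomics, which express
the same intent without inline assembly.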

PNaCl supports :ref:`Portable SIMD Vectors <portable_simd_vectors>`,
which are traditionally expressed through target-specific intrinsics or
inline assembly.

NaCl supports a fairly wide subset of inline assembly through GCC's
inline assembly syntax, with the restriction that the sandboxing model
for the target architecture has to be respected.

.. _portable_simd_vectors:

Portable SIMD Vectors
=====================

SIMD vectors aren't part of the C/C++ standards and are traditionally
very hardware-specific. Portable Native Client offers a portable version
of SIMD vector datatypes and operations which map well to modern
architectures and offer performance which matches or approaches
hardware-specific uses.

SIMD vector support was added to Portable Native Client for version 37
of Chrome and more features, including performance enhancements, are
expected to be added in subsequent releases.

Hand-Coding Vector Extensions
-----------------------------

The initial vector support in Portable Native Client adds `LLVM vectors
<http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors>`_
and `GCC vectors
<http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these
are well supported by different hardware platforms and don't require any
new compiler intrinsics.

Vector types can be used through the ``vector_size`` attribute:

.. naclcode::

  #define VECTOR_BYTES 16
  typedef int v4s __attribute__((vector_size(VECTOR_BYTES)));
  v4s a = {1,2,3,4};
  v4s b = {5,6,7,8};
  v4s c, d;
  c = a + b;  /* c = {6,8,10,12} */
  d = b >> a; /* d = {2,1,0,0} */

The result of a vector comparison is a vector of masks, one per element
and as wide as the compared elements. Each mask has all bits set to
``1`` where the comparison is true and all bits set to ``0`` where it is
false:

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));
  v4s snip(v4s in) {
    v4s limit = {32,64,128,256};
    v4s mask = in > limit;
    v4s ret = in & mask;
    return ret;
  }
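
Because each mask is all ones or all zeros, it can also select between
two vectors element by element. The following sketch builds an
element-wise maximum this way; ``vmax`` is a hypothetical helper name,
not a PNaCl builtin, and the select idiom is a common technique rather
than something this document prescribes.

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));

  // Element-wise maximum built from a comparison mask.
  v4s vmax(v4s a, v4s b) {
    v4s mask = a > b;                 // all ones where a[i] > b[i]
    return (a & mask) | (b & ~mask);  // keep a[i] there, b[i] elsewhere
  }

For example, ``vmax`` applied to ``{1,20,3,40}`` and ``{10,2,30,4}``
yields ``{10,20,30,40}``.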

Vector datatypes are currently expected to be 128 bits wide with one of
the following element types:

============  ============  ================
Type          Num Elements  Vector Bit Width
============  ============  ================
``uint8_t``   16            128
``int8_t``    16            128
``uint16_t``  8             128
``int16_t``   8             128
``uint32_t``  4             128
``int32_t``   4             128
``float``     4             128
============  ============  ================

64-bit integers and double-precision floating point will be supported in
a future release, as will 256-bit and 512-bit vectors.

The following operators are supported on vectors:

+----------------------------------------------+
| unary ``+``, ``-``                           |
+----------------------------------------------+
| ``++``, ``--``                               |
+----------------------------------------------+
| ``+``, ``-``, ``*``, ``/``, ``%``            |
+----------------------------------------------+
| ``&``, ``|``, ``^``, ``~``                   |
+----------------------------------------------+
| ``>>``, ``<<``                               |
+----------------------------------------------+
| ``!``, ``&&``, ``||``                        |
+----------------------------------------------+
| ``==``, ``!=``, ``>``, ``<``, ``>=``, ``<=`` |
+----------------------------------------------+
| ``=``                                        |
+----------------------------------------------+

C-style casts can be used to convert one vector type to another without
modifying the underlying bits. ``__builtin_convertvector`` can be used
to convert from one type to another provided both types have the same
number of elements, truncating when converting from floating-point to
integer.

.. naclcode::

  typedef unsigned v4u __attribute__((vector_size(16)));
  typedef float v4f __attribute__((vector_size(16)));
  v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30};
  v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23}  */
  v4u c = __builtin_convertvector(b, v4u);
  /* c = {0,2,3,?}: 6.02214e+23 does not fit in an unsigned int, so the
     value of the last element is undefined. */

It is also possible to use array-style indexing into vectors to extract
individual elements using ``[]``.

.. naclcode::

  #include <cstddef>
  #include <iostream>

  typedef unsigned v4u __attribute__((vector_size(16)));

  // Print every element of the vector v, separated by spaces.
  template<typename T>
  void print(const T v) {
    for (size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i)
      std::cout << v[i] << ' ';
    std::cout << std::endl;
  }

Vector shuffle (often called permutation or swizzle) operations are
supported through ``__builtin_shufflevector``. The builtin has two
vector arguments of the same element type, ``vec1`` and ``vec2``,
followed by a list of constant integers that specify the element indices
of the two vectors that should be extracted and returned in a new
vector. These element indices are numbered sequentially starting with
the first vector, continuing into the second vector. Thus, if ``vec1``
is a 4-element vector, index ``5`` would refer to the second element of
``vec2``. An index of ``-1`` can be used to indicate that the
corresponding element in the returned vector is a don’t care and can be
optimized by the backend.

The result of ``__builtin_shufflevector`` is a vector with the same
element type as ``vec1`` / ``vec2`` but that has an element count equal
to the number of indices specified.

.. naclcode::

  // identity operation - return 4-element vector v1.
  __builtin_shufflevector(v1, v1, 0, 1, 2, 3)

  // "Splat" element 0 of v1 into a 4-element result.
  __builtin_shufflevector(v1, v1, 0, 0, 0, 0)

  // Reverse 4-element vector v1.
  __builtin_shufflevector(v1, v1, 3, 2, 1, 0)

  // Concatenate every other element of 4-element vectors v1 and v2.
  __builtin_shufflevector(v1, v2, 0, 2, 4, 6)

  // Concatenate every other element of 8-element vectors v1 and v2.
  __builtin_shufflevector(v1, v2, 0, 2, 4, 6, 8, 10, 12, 14)

  // Shuffle v1 with some elements being undefined
  __builtin_shufflevector(v1, v1, 3, -1, 1, -1)

One common use of ``__builtin_shufflevector`` is to perform
vector-scalar operations:

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));
  v4s shift_right_by(v4s shift_me, int shift_amount) {
    v4s tmp = {shift_amount};
    return shift_me >> __builtin_shufflevector(tmp, tmp, 0, 0, 0, 0);
  }

Auto-Vectorization
------------------

Auto-vectorization is currently not enabled for Portable Native Client,
but will be in a future release.

Undefined Behavior
==================

The C and C++ languages expose some undefined behavior which is
discussed in :ref:`PNaCl Undefined Behavior <undefined_behavior>`.

Floating-Point
==============

PNaCl exposes 32-bit and 64-bit floating point operations which are
mostly IEEE-754 compliant. There are a few caveats:

* Some :ref:`floating-point behavior is currently left as undefined
  <undefined_behavior_fp>`.
* The default rounding mode is round-to-nearest and other rounding modes
  are currently not usable, which isn't IEEE-754 compliant. PNaCl could
  support switching modes (the 4 modes exposed by C99 ``FLT_ROUNDS``
  macros).
* Signaling ``NaN`` values never fault.
* Fast-math optimizations are currently supported before *pexe* creation
  time. A *pexe* loses all fast-math information when it is
  created. Fast-math translation could be enabled at a later date,
  potentially at a per-function granularity. This wouldn't affect
  already-existing *pexe* files; it would be an opt-in feature.

  * Fused multiply-add operations have higher precision and often
    execute faster; PNaCl currently disallows them in the *pexe* because
    they aren't supported on all platforms and can't realistically be
    emulated. PNaCl could (but currently doesn't) only generate them in
    the backend if fast-math were specified and the hardware supports
    the operation.
  * Transcendentals aren't exposed by PNaCl's ABI; they are part of the
    math library that is included in the *pexe*. PNaCl could, but
    currently doesn't, use hardware support if fast-math were provided
    in the *pexe*.
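
To make the rounding-mode caveat concrete, the sketch below reads the
standard ``FLT_ROUNDS`` macro and maps its value to a name. Both helper
functions are hypothetical; the numeric values are the ones defined by
the C standard.

.. naclcode::

  #include <cfloat>

  // Maps a FLT_ROUNDS value to a readable name.
  const char *rounding_mode_name(int flt_rounds) {
    switch (flt_rounds) {
      case 0:  return "toward zero";
      case 1:  return "to nearest";
      case 2:  return "toward positive infinity";
      case 3:  return "toward negative infinity";
      default: return "indeterminable";
    }
  }

  // True when the rounding mode is round-to-nearest, which is the only
  // mode the text above describes as usable under PNaCl.
  bool rounds_to_nearest() {
    return FLT_ROUNDS == 1;
  }

Code that depends on another rounding mode can use such a check to fail
early instead of silently producing differently rounded results.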

Computed ``goto``
=================

PNaCl supports computed ``goto``, a non-standard GCC extension to C used
by some interpreters, by lowering them to ``switch`` statements. The
resulting use of ``switch`` might not be as fast as the original
indirect branches. If you are compiling a program that has a
compile-time option for using computed ``goto``, it's possible that the
program will run faster with the option turned off (e.g., if the program
does extra work to take advantage of computed ``goto``).

NaCl supports computed ``goto`` without any transformation.
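
For readers unfamiliar with the extension, here is a minimal sketch of
the interpreter dispatch pattern it enables. The opcode set and the
``run`` function are hypothetical; ``&&label`` and ``goto *`` are the
GCC "labels as values" syntax, which Clang also accepts.

.. naclcode::

  // Opcodes of a hypothetical one-register interpreter.
  enum Op { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

  // Runs `program` (which must end with OP_HALT) on an accumulator
  // that starts at `start`, and returns the final accumulator value.
  int run(const unsigned char *program, int start) {
    // Label addresses, indexed by opcode.
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = start;
    const unsigned char *pc = program;

    goto *dispatch[*pc++];  // jump to the handler of the first opcode

  do_inc:
    acc += 1;
    goto *dispatch[*pc++];

  do_double:
    acc *= 2;
    goto *dispatch[*pc++];

  do_halt:
    return acc;
  }

Under PNaCl each ``goto *`` above is lowered to a ``switch``, so a plain
``switch`` inside a loop may perform just as well.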

Future Directions
=================

Inter-Process Communication
---------------------------

Inter-process communication through shared memory is currently not
supported by PNaCl/NaCl. When implemented, it may be limited to
operations which are lock-free on the current platform (``is_lock_free``
methods). It will rely on the address-free property discussed in `Memory
Model for Concurrent Operations`_.

POSIX-style Signal Handling
---------------------------

POSIX-style signal handling really consists of two different features:

* **Hardware exception handling** (synchronous signals): The ability
  to catch hardware exceptions (such as memory access faults and
  division by zero) using a signal handler.

  PNaCl currently doesn't support hardware exception handling.

  NaCl supports hardware exception handling via the
  ``<nacl/nacl_exception.h>`` interface.

* **Asynchronous interruption of threads** (asynchronous signals): The
  ability to asynchronously interrupt the execution of a thread,
  forcing the thread to run a signal handler.

  A similar feature is **thread suspension**: The ability to
  asynchronously suspend and resume a thread and inspect or modify its
  execution state (such as register state).

  Neither PNaCl nor NaCl currently supports asynchronous interruption
  or suspension of threads.

If PNaCl were to support either of these, the interaction of
``volatile`` and atomics with same-thread signal handling would need
to be carefully detailed.
464