.. pnacl-c-cpp-language-support.rst revision 116680a4aac90f2aa7413d9095a592090648e557


============================
PNaCl C/C++ Language Support
============================

.. contents::
  :local:
  :backlinks: none
  :depth: 3

Source language support
=======================

The currently supported languages are C and C++. The PNaCl toolchain is
based on recent Clang, which fully supports C++11 and most of C11. A
detailed status of the language support is available `here
<http://clang.llvm.org/cxx_status.html>`_.

For information on using languages other than C/C++, see the :ref:`FAQ
section on other languages <other_languages>`.

As for the standard libraries, the PNaCl toolchain is currently based on
the ``libc++`` C++ standard library and the ``newlib`` standard C
library. ``libstdc++`` is also supported but its use is discouraged; see
:ref:`building_cpp_libraries` for more details.

Versions
--------

Version information can be obtained as follows:

* Clang/LLVM: run ``pnacl-clang -v``.
* ``newlib``: use the ``_NEWLIB_VERSION`` macro.
* ``libc++``: use the ``_LIBCPP_VERSION`` macro.
* ``libstdc++``: use the ``_GLIBCXX_VERSION`` macro.

Preprocessor definitions
------------------------

When compiling C/C++ code, the PNaCl toolchain defines the ``__pnacl__``
macro. In addition, ``__native_client__`` is defined for compatibility
with other NaCl toolchains.

.. _memory_model_and_atomics:

Memory Model and Atomics
========================

Memory Model for Concurrent Operations
--------------------------------------

The memory model offered by PNaCl relies on the same coding guidelines
as the C11/C++11 one: concurrent accesses must always occur through
atomic primitives (offered by `atomic intrinsics
<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always
occur with the same size for the same memory location. Visibility of
stores is provided on a happens-before basis that relates memory
locations to each other as the C11/C++11 standards do.
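
A minimal C++11 sketch of these rules follows; the function names are
hypothetical. In ``count_with_threads`` the shared counter is only ever
accessed through ``std::atomic<int>``, always at the same width. In
``publish_and_read`` a release store and an acquire load on the
``ready`` flag establish the happens-before relationship that makes the
plain store to ``payload`` visible to the reading thread. PNaCl promotes
both orderings to sequential consistency at pexe creation time (as
described under Atomic Memory Ordering Constraints below), which only
strengthens these guarantees.

.. naclcode::

  #include <atomic>
  #include <thread>
  #include <vector>

  // Every concurrent access to `counter` goes through std::atomic<int>,
  // always with the same (int-sized) width.
  int count_with_threads(int num_threads, int increments_per_thread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> workers;
    for (int t = 0; t != num_threads; ++t)
      workers.emplace_back([&counter, increments_per_thread] {
        for (int i = 0; i != increments_per_thread; ++i)
          counter.fetch_add(1);  // atomic read-modify-write
      });
    for (std::thread &w : workers)
      w.join();
    return counter.load();
  }

  // Message passing: the release store to `ready` makes the earlier plain
  // store to `payload` visible to the thread whose acquire load sees true.
  int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready(false);  // atomic flag guarding `payload`
    std::thread producer([&payload, &ready, value] {
      payload = value;                               // plain store
      ready.store(true, std::memory_order_release);  // publish
    });
    while (!ready.load(std::memory_order_acquire))   // wait for publication
      std::this_thread::yield();
    int seen = payload;  // safe: the store happens-before this read
    producer.join();
    return seen;
  }

Because every access to ``counter`` is atomic, ``count_with_threads(4,
1000)`` returns ``4000``; with a plain ``int`` the increments would race,
which is undefined behavior.
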

Non-atomic memory accesses may be reordered, separated, elided or fused
according to the C and C++ memory models, both before and after the
pexe is created. Accessing an atomic memory location through non-atomic
primitives is :ref:`Undefined Behavior <undefined_behavior>`.

As in C11/C++11, some atomic accesses may be implemented with locks on
certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
``1``, signifying that all types are sometimes lock-free. The
``is_lock_free`` methods and the ``atomic_is_lock_free`` function return
the answer for the current platform, as determined at translation time.
These macros, methods and functions are in the C11 header
``<stdatomic.h>`` and the C++11 header ``<atomic>``.

The PNaCl toolchain supports concurrent memory accesses through legacy
GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
primitives and the underlying `GCCMM
<http://gcc.gnu.org/wiki/Atomic/GCCMM>`_ ``__atomic_*``
primitives. ``volatile`` memory accesses can also be used, though these
are discouraged. See `Volatile Memory Accesses`_.

PNaCl supports concurrency and parallelism with some restrictions:

* Threading is explicitly supported and has no restrictions beyond those
  of prevalent implementations. See `Threading`_.

* ``volatile`` and atomic operations are address-free (operations on the
  same memory location via two different addresses work atomically), as
  intended by the C11/C++11 standards. This is critical in supporting
  synchronous "external modifications" such as mapping underlying memory
  at multiple locations.

* Inter-process communication through shared memory is currently not
  supported. See `Future Directions`_.

* Signal handling isn't supported; PNaCl therefore promotes all
  primitives to cross-thread (instead of single-thread) semantics. This
  may change at a later date.
  Note that using atomic operations which aren't lock-free may lead to
  deadlocks when handling asynchronous signals. See `Future Directions`_.

* Direct interaction with device memory isn't supported, and there is no
  intent to support it. The embedding sandbox's runtime can offer APIs
  to indirectly access devices.

Setting up the above mechanisms requires assistance from the embedding
sandbox's runtime (e.g. NaCl's Pepper APIs), but once they are set up
they can be used through regular C/C++ code.

Atomic Memory Ordering Constraints
----------------------------------

Atomics follow the same ordering constraints as in regular C11/C++11,
but all accesses are promoted to sequential consistency (the strongest
memory ordering) at pexe creation time. We plan to support more of the
C11/C++11 memory orderings in the future.

Some additional restrictions, following the C11/C++11 standards:

- Atomic accesses must be at least naturally aligned.
- Some accesses may not actually be atomic on certain platforms,
  requiring an implementation that uses global locks.
- An atomic memory location must always be accessed with atomic
  primitives, and these primitives must always be of the same bit size
  for that location.
- Not all memory orderings are valid for all atomic operations.

Volatile Memory Accesses
------------------------

The C11/C++11 standards mandate that ``volatile`` accesses execute in
program order (but are not fences, so other memory operations can
reorder around them), are not necessarily atomic, and can't be
elided. They can be separated into smaller-width accesses.

Before any optimizations occur, the PNaCl toolchain transforms
``volatile`` loads and stores into sequentially consistent ``volatile``
atomic loads and stores, and applies regular compiler optimizations
along the above guidelines.
This orders ``volatile`` accesses according to the
atomic rules, and means that fences (including ``__sync_synchronize``)
act in a better-defined manner. Regular memory accesses still do not
have ordering guarantees with ``volatile`` and atomic accesses, though
the internal representation of ``__sync_synchronize`` attempts to
prevent reordering of memory accesses to objects which may escape.

Relaxed ordering could be used instead, but for the first release it is
more conservative to apply sequential consistency. Future releases may
change what happens at compile-time, but already-released pexes will
continue using sequential consistency.

The PNaCl toolchain also requires that ``volatile`` accesses be at least
naturally aligned, and tries to guarantee this alignment.

The above guarantees ease support for legacy (i.e. non-C11/C++11) code:
combined with builtin fences, they let such programs perform meaningful
cross-thread communication without code changes. They also better
reflect the original code's intent and improve portability.

.. _language_support_threading:

Threading
=========

Threading is explicitly supported through C11/C++11's threading
libraries as well as POSIX threads.

Communication between threads should use atomic primitives as described
in `Memory Model and Atomics`_.

``setjmp`` and ``longjmp``
==========================

PNaCl and NaCl support ``setjmp`` and ``longjmp`` without any
restrictions beyond C's.

.. _exception_handling:

C++ Exception Handling
======================

PNaCl currently supports C++ exception handling through ``setjmp()`` and
``longjmp()``, which can be enabled with the ``--pnacl-exceptions=sjlj``
linker flag. Exceptions are disabled by default so that faster and
smaller code is generated, and ``throw`` statements are replaced with
calls to ``abort()``.
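
For illustration, the ordinary C++ below uses an exception to report bad
input; the function names are hypothetical. Under PNaCl, the ``catch``
in ``parse_or`` can only take effect if the program is linked with
``--pnacl-exceptions=sjlj``; with the default setting, the ``throw`` in
``parse_digits`` becomes a call to ``abort()``, as described above.

.. naclcode::

  #include <stdexcept>
  #include <string>

  // Parses a non-negative decimal integer (no overflow check), throwing
  // std::invalid_argument on empty or non-digit input.
  int parse_digits(const std::string &s) {
    if (s.empty())
      throw std::invalid_argument("empty string");
    int value = 0;
    for (char c : s) {
      if (c < '0' || c > '9')
        throw std::invalid_argument("not a digit");
      value = value * 10 + (c - '0');
    }
    return value;
  }

  // Returns the parsed value, or `fallback` if parsing threw.
  // Under PNaCl this handler is reached only when linking with
  // --pnacl-exceptions=sjlj; by default the throw calls abort() instead.
  int parse_or(const std::string &s, int fallback) {
    try {
      return parse_digits(s);
    } catch (const std::invalid_argument &) {
      return fallback;
    }
  }
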

The usual ``-fno-exceptions`` flag is also supported. PNaCl will support
full zero-cost exception handling in the future.

NaCl supports full zero-cost C++ exception handling.

Inline Assembly
===============

Inline assembly isn't supported by PNaCl because it isn't portable. The
one current exception is the common compiler barrier idiom
``asm("":::"memory")``, which gets transformed to a sequentially
consistent memory barrier (equivalent to ``__sync_synchronize()``). In
PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
memory accesses, though in practice the implementation attempts to also
prevent reordering of memory accesses to objects which may escape.

Instead of the target-specific intrinsics or inline assembly
traditionally used for SIMD code, PNaCl supports :ref:`Portable SIMD
Vectors <portable_simd_vectors>`.

NaCl supports a fairly wide subset of inline assembly through GCC's
inline assembly syntax, with the restriction that the sandboxing model
for the target architecture has to be respected.

.. _portable_simd_vectors:

Portable SIMD Vectors
=====================

SIMD vectors aren't part of the C/C++ standards and are traditionally
very hardware-specific. Portable Native Client offers a portable version
of SIMD vector datatypes and operations which map well to modern
architectures and offer performance that matches or approaches that of
hardware-specific code.

SIMD vector support was added to Portable Native Client for version 37
of Chrome, and more features, including performance enhancements, are
expected to be added in subsequent releases.

Hand-Coding Vector Extensions
-----------------------------

The initial vector support in Portable Native Client adds `LLVM vectors
<http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors>`_
and `GCC vectors
<http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these
are well supported by different hardware platforms and don't require any
new compiler intrinsics.

Vector types can be used through the ``vector_size`` attribute:

.. naclcode::

  #define VECTOR_BYTES 16
  typedef int v4s __attribute__((vector_size(VECTOR_BYTES)));

  void add_and_shift(void) {
    v4s a = {1,2,3,4};
    v4s b = {5,6,7,8};
    v4s c = a + b;  /* c = {6,8,10,12} */
    v4s d = b >> a; /* d = {2,1,0,0} */
  }

The result of a vector comparison is a bitmask vector whose elements are
as wide as the compared elements, each set to all ``0`` bits (false) or
all ``1`` bits (true):

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));
  v4s snip(v4s in) {
    v4s limit = {32,64,128,256};
    v4s mask = in > limit;  /* all-ones where in exceeds limit, else 0 */
    v4s ret = in & mask;    /* keep those elements, zero the others */
    return ret;
  }

Vector datatypes are currently expected to be 128 bits wide with one of
the following element types:

============ ============ ================
Type         Num Elements Vector Bit Width
============ ============ ================
``uint8_t``  16           128
``int8_t``   16           128
``uint16_t`` 8            128
``int16_t``  8            128
``uint32_t`` 4            128
``int32_t``  4            128
``float``    4            128
============ ============ ================

64-bit integers and double-precision floating point will be supported in
a future release, as will 256-bit and 512-bit vectors.
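
A sketch of ``typedef`` declarations for the seven supported types
follows; the type names (``v16u8`` and so on) are hypothetical and are
not provided by any PNaCl header. The ``static_assert`` checks confirm
that each type is 128 bits wide, and the hypothetical ``lane_count``
helper recovers the element counts listed in the table.

.. naclcode::

  #include <cstddef>
  #include <cstdint>

  // Hypothetical names: v<elements><signedness><element bits>.
  typedef std::uint8_t  v16u8 __attribute__((vector_size(16)));
  typedef std::int8_t   v16s8 __attribute__((vector_size(16)));
  typedef std::uint16_t v8u16 __attribute__((vector_size(16)));
  typedef std::int16_t  v8s16 __attribute__((vector_size(16)));
  typedef std::uint32_t v4u32 __attribute__((vector_size(16)));
  typedef std::int32_t  v4s32 __attribute__((vector_size(16)));
  typedef float         v4f32 __attribute__((vector_size(16)));

  // Every supported vector type is 128 bits (16 bytes) wide.
  static_assert(sizeof(v16u8) == 16, "v16u8 must be 128 bits");
  static_assert(sizeof(v16s8) == 16, "v16s8 must be 128 bits");
  static_assert(sizeof(v8u16) == 16, "v8u16 must be 128 bits");
  static_assert(sizeof(v8s16) == 16, "v8s16 must be 128 bits");
  static_assert(sizeof(v4u32) == 16, "v4u32 must be 128 bits");
  static_assert(sizeof(v4s32) == 16, "v4s32 must be 128 bits");
  static_assert(sizeof(v4f32) == 16, "v4f32 must be 128 bits");

  // Number of elements = vector width / element width.
  template <typename Vec, typename Elem>
  constexpr std::size_t lane_count() {
    return sizeof(Vec) / sizeof(Elem);
  }
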

The following operators are supported on vectors:

+----------------------------------------------+
| unary ``+``, ``-``                           |
+----------------------------------------------+
| ``++``, ``--``                               |
+----------------------------------------------+
| ``+``, ``-``, ``*``, ``/``, ``%``            |
+----------------------------------------------+
| ``&``, ``|``, ``^``, ``~``                   |
+----------------------------------------------+
| ``>>``, ``<<``                               |
+----------------------------------------------+
| ``!``, ``&&``, ``||``                        |
+----------------------------------------------+
| ``==``, ``!=``, ``>``, ``<``, ``>=``, ``<=`` |
+----------------------------------------------+
| ``=``                                        |
+----------------------------------------------+

C-style casts can be used to convert one vector type to another without
modifying the underlying bits. ``__builtin_convertvector`` can be used
to convert from one type to another provided both types have the same
number of elements, truncating when converting from floating-point to
integer. A floating-point value that is out of range for the target
integer type, such as the last element below, has no defined result.

.. naclcode::

  typedef unsigned v4u __attribute__((vector_size(16)));
  typedef float v4f __attribute__((vector_size(16)));
  v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30};
  v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23} */
  /* c = {0,2,3,?}: 6.02214e+23 is out of range for unsigned. */
  v4u c = __builtin_convertvector(b, v4u);

It is also possible to use array-style indexing into vectors to extract
individual elements using ``[]``:

.. naclcode::

  #include <cstddef>
  #include <iostream>

  typedef unsigned v4u __attribute__((vector_size(16)));

  // Print every element of a vector, extracting each one with [].
  template<typename T>
  void print(const T v) {
    for (std::size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i)
      std::cout << v[i] << ' ';
    std::cout << std::endl;
  }

Vector shuffle operations (often called permutations or swizzles) are
supported through ``__builtin_shufflevector``.
The builtin takes two
vector arguments, ``vec1`` and ``vec2``, of the same element type,
followed by a list of constant integers that specify the element indices
of the two vectors that should be extracted and returned in a new
vector. These element indices are numbered sequentially starting with
the first vector, continuing into the second vector. Thus, if ``vec1``
is a 4-element vector, index ``5`` would refer to the second element of
``vec2``. An index of ``-1`` indicates that the corresponding element in
the returned vector is a "don't care" value that the backend is free to
optimize.

The result of ``__builtin_shufflevector`` is a vector with the same
element type as ``vec1`` / ``vec2`` but that has an element count equal
to the number of indices specified.

.. naclcode::

  // Identity operation: return 4-element vector v1.
  __builtin_shufflevector(v1, v1, 0, 1, 2, 3);

  // "Splat" element 0 of v1 into a 4-element result.
  __builtin_shufflevector(v1, v1, 0, 0, 0, 0);

  // Reverse 4-element vector v1.
  __builtin_shufflevector(v1, v1, 3, 2, 1, 0);

  // Concatenate every other element of 4-element vectors v1 and v2.
  __builtin_shufflevector(v1, v2, 0, 2, 4, 6);

  // Concatenate every other element of 8-element vectors v1 and v2.
  __builtin_shufflevector(v1, v2, 0, 2, 4, 6, 8, 10, 12, 14);

  // Shuffle v1 with some elements being undefined.
  __builtin_shufflevector(v1, v1, 3, -1, 1, -1);

One common use of ``__builtin_shufflevector`` is to perform
vector-scalar operations:

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));
  v4s shift_right_by(v4s shift_me, int shift_amount) {
    /* Only element 0 is set explicitly; the others are zero. */
    v4s tmp = {shift_amount};
    /* Splat element 0 so that every element shifts by shift_amount. */
    return shift_me >> __builtin_shufflevector(tmp, tmp, 0, 0, 0, 0);
  }

Auto-Vectorization
------------------

Auto-vectorization is currently not enabled for Portable Native Client,
but will be in a future release.
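
Until then, loops have to be vectorized by hand with the extensions
described above. The sketch below, whose function names are
hypothetical, shows a scalar element-wise addition next to a
hand-vectorized version that processes four ``float`` values per
iteration and finishes the remaining elements with scalar code.
``memcpy`` moves data in and out of the vectors so that the arrays need
not be 16-byte aligned.

.. naclcode::

  #include <cstddef>
  #include <cstring>

  // Four floats in one 128-bit vector.
  typedef float v4f __attribute__((vector_size(16)));

  // Scalar loop: element-wise out[i] = a[i] + b[i].
  void add_scalar(float *out, const float *a, const float *b,
                  std::size_t n) {
    for (std::size_t i = 0; i != n; ++i)
      out[i] = a[i] + b[i];
  }

  // Hand-vectorized version: four elements per iteration, scalar tail.
  void add_vector(float *out, const float *a, const float *b,
                  std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
      v4f va, vb;
      // memcpy avoids assuming the arrays are 16-byte aligned.
      std::memcpy(&va, a + i, sizeof va);
      std::memcpy(&vb, b + i, sizeof vb);
      v4f sum = va + vb;
      std::memcpy(out + i, &sum, sizeof sum);
    }
    for (; i != n; ++i)  // remaining 0 to 3 elements
      out[i] = a[i] + b[i];
  }
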

Undefined Behavior
==================

The C and C++ languages expose some undefined behavior which is
discussed in :ref:`PNaCl Undefined Behavior <undefined_behavior>`.

Floating-Point
==============

PNaCl exposes 32-bit and 64-bit floating-point operations which are
mostly IEEE-754 compliant. There are a few caveats:

* Some :ref:`floating-point behavior is currently left as undefined
  <undefined_behavior_fp>`.
* The default rounding mode is round-to-nearest and other rounding modes
  are currently not usable, which isn't IEEE-754 compliant. PNaCl could
  support switching modes (the four modes exposed by C99's
  ``FLT_ROUNDS`` macro).
* Signaling NaNs never fault.
* Fast-math optimizations are currently supported before *pexe* creation
  time. A *pexe* loses all fast-math information when it is
  created. Fast-math translation could be enabled at a later date,
  potentially at a per-function granularity. This wouldn't affect
  already-existing *pexes*; it would be an opt-in feature.

  * Fused multiply-add operations have higher precision and often
    execute faster; PNaCl currently disallows them in the *pexe* because
    they aren't supported on all platforms and can't realistically be
    emulated. PNaCl could (but currently doesn't) generate them in the
    backend only when fast-math is specified and the hardware supports
    the operation.
  * Transcendentals aren't exposed by PNaCl's ABI; they are part of the
    math library that is included in the *pexe*. PNaCl could, but
    currently doesn't, use hardware support if fast-math were provided
    in the *pexe*.

Computed ``goto``
=================

PNaCl supports computed ``goto``, a non-standard GCC extension to C used
by some interpreters, by lowering it to ``switch`` statements. The
resulting use of ``switch`` might not be as fast as the original
indirect branches.
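
To make the difference concrete, here is a minimal sketch of a toy
bytecode interpreter written both ways; the opcodes and function names
are hypothetical, and there is no error checking. ``run_computed_goto``
dispatches with GCC's labels-as-values extension (``&&label`` and
``goto *ptr``), while ``run_switch`` uses an ordinary ``switch`` loop,
which is conceptually the kind of code PNaCl lowers computed ``goto``
to (the exact form of PNaCl's lowering is an assumption here).

.. naclcode::

  // Hypothetical opcodes: push the next byte, add the top two values, stop.
  enum Op : unsigned char { PUSH, ADD, HALT };

  // Dispatch through a table of label addresses (computed goto).
  int run_computed_goto(const unsigned char *code) {
    static void *const dispatch[] = {&&do_push, &&do_add, &&do_halt};
    int stack[16];
    int sp = 0;
    const unsigned char *pc = code;
    goto *dispatch[*pc++];
  do_push:
    stack[sp++] = *pc++;  // operand byte follows the opcode
    goto *dispatch[*pc++];
  do_add:
    --sp;
    stack[sp - 1] += stack[sp];
    goto *dispatch[*pc++];
  do_halt:
    return stack[sp - 1];
  }

  // The same interpreter written as a switch-based dispatch loop.
  int run_switch(const unsigned char *code) {
    int stack[16];
    int sp = 0;
    const unsigned char *pc = code;
    for (;;) {
      switch (*pc++) {
        case PUSH: stack[sp++] = *pc++; break;
        case ADD: --sp; stack[sp - 1] += stack[sp]; break;
        case HALT: return stack[sp - 1];
        default: return -1;  // unknown opcode
      }
    }
  }

Both functions return ``9`` for the program ``PUSH 2, PUSH 3, ADD,
PUSH 4, ADD, HALT``.
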

If you are compiling a program that has a compile-time option for using
computed ``goto``, it's possible that the program will run faster with
the option turned off (e.g., if the program does extra work to take
advantage of computed ``goto``).

NaCl supports computed ``goto`` without any transformation.

Future Directions
=================

Inter-Process Communication
---------------------------

Inter-process communication through shared memory is currently not
supported by PNaCl/NaCl. When implemented, it may be limited to
operations which are lock-free on the current platform (``is_lock_free``
methods). It will rely on the address-free property discussed in
`Memory Model for Concurrent Operations`_.

POSIX-style Signal Handling
---------------------------

POSIX-style signal handling really consists of two different features:

* **Hardware exception handling** (synchronous signals): The ability
  to catch hardware exceptions (such as memory access faults and
  division by zero) using a signal handler.

  PNaCl currently doesn't support hardware exception handling.

  NaCl supports hardware exception handling via the
  ``<nacl/nacl_exception.h>`` interface.

* **Asynchronous interruption of threads** (asynchronous signals): The
  ability to asynchronously interrupt the execution of a thread,
  forcing the thread to run a signal handler.

  A similar feature is **thread suspension**: The ability to
  asynchronously suspend and resume a thread and inspect or modify its
  execution state (such as register state).

  Neither PNaCl nor NaCl currently supports asynchronous interruption
  or suspension of threads.

If PNaCl were to support either of these, the interaction of
``volatile`` and atomics with same-thread signal handling would need
to be carefully detailed.