3590c452ea8485d063874138eec92411297a9abb |
|
09-Feb-2018 |
Mingsheng Hong <hongm@google.com> |
Enabled XLA for TF C API. Summary of changes: 1. Set MarkForCompilationPassFlags::tf_xla_cpu_global_jit default to true in the C_API unit test env when XLA-execute is intended. Together with setting the session config config.graph_options.optimizer_options.global_jit_level to > 0, this turns on XLA for the entire graph (eligible nodes only, with _Arg and _RetVal nodes excluded). We decided against defaulting MarkForCompilationPassFlags::tf_xla_cpu_global_jit to true, due to performance concerns with the single-threaded nature of the XLA CPU backend (see https://www.tensorflow.org/performance/xla/jit#turning_on_jit_compilation). 2. In FindCompilationCandidates() during MarkForCompilationPass, skip compiling any '_Arg'-typed nodes. This is necessary to avoid hitting an "Invalid argument number" error during MarkForCompilationPass. 3. Extended C API based build rules to link in XLA libraries, and added unit test "CAPI.Session_Min_XLA_CPU". Also added some misc improvements and debugging aids. PiperOrigin-RevId: 185193314
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
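The session-config knob described in the entry above can be sketched as follows. This is a minimal illustration against the TF 1.x Python API, not code from the commit; the graph built here is a hypothetical stand-in for whatever the session actually runs.

```python
import tensorflow as tf

# Hypothetical graph so the JIT pass has something to cluster.
x = tf.placeholder(tf.float32, shape=[2, 2])
y = tf.matmul(x, x) + 1.0

# global_jit_level > 0 turns on whole-graph clustering; on CPU it only
# takes effect together with the tf_xla_cpu_global_jit flag described above.
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)

with tf.Session(config=config) as sess:
    sess.run(y, feed_dict={x: [[1.0, 2.0], [3.0, 4.0]]})
```

As the entry notes, _Arg and _RetVal nodes (the compiler's stand-ins for feeds and fetches) stay outside the resulting XLA clusters.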
e78ec6b40d9af515044bde7e184a9a85b0aa0a41 |
|
19-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Initial checkin for outside_compilation. Adds a new attribute for encapsulating XLA subgraphs that will in the future be used to mark some Ops in the subgraph as 'outside_compilation' meaning they will be run as interpreted TensorFlow via a callout from a compiled XLA subgraph. This is the first of a sequence of checkins. It adds new types of edges entering and leaving the subgraphs, suitable for send/recv between a compiled XLA subgraph and the 'host', i.e., uncompiled TensorFlow. For now no code sets the new 'outside_compilation' attributes, and the Ops to perform the send/recv are not present in the codebase; these will follow in subsequent checkins. PiperOrigin-RevId: 179591853
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
47b674c938a38c6d88f27244a12ce3944c2f0464 |
|
13-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Remove a source of nondeterminism in HLO clustering. Record the HLO clusters with std::set instead of std::unordered_set to ensure that the algorithm to assign each cluster a sequence number during a set traversal is deterministic. PiperOrigin-RevId: 178830794
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
0472116d163eeb77d51cabdc5fc67be917048870 |
|
01-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[TF:XLA] Make tf_cnn_benchmarks run on CPU with XLA. Adds _cpu_jit to the tf_cnn_benchmarks_xla BUILD rule and fixes an issue in the XLA bridge triggered by XLA CPU compilation of whole graphs. In particular, modifies mark_for_compilation_pass.cc to skip _Retval nodes when looking for compilation candidates in the top level function. _Retval nodes are introduced in the input subgraph as a replacement for fetches. Including _Retval nodes in XLA clusters confuses the encapsulate subgraph pass, which expects a graph with no pre-existing _Retval nodes. PiperOrigin-RevId: 177518178
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
205ff0f7592c60ab09fc705f2c5501d8547e83be |
|
15-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[TF:XLA] Added tf_xla_cpu_global_jit flag to the TF_XLA_FLAGS environment variable to enable global JIT compilation for CPU via SessionOptions. By default, global JIT compilation for CPU via SessionOptions is disabled. When TF_XLA_FLAGS=--tf_xla_cpu_global_jit is set, the value of the enable_jit_by_default variable in mark_for_compilation_pass.cc is ignored, allowing XLA to use JIT compilation for the whole graph according to the SessionOptions setting. Unless tf_xla_cpu_global_jit is explicitly set via TF_XLA_FLAGS, this code change should have no effect on Tensorflow or XLA execution. RELNOTES: n/a PiperOrigin-RevId: 175754729
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
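The flag introduced in the entry above is passed through the TF_XLA_FLAGS environment variable; a usage sketch (the script name is hypothetical):

```shell
# Opt in to global CPU JIT clustering. This only has an effect when the
# session's global_jit_level is also set > 0 in SessionOptions.
TF_XLA_FLAGS=--tf_xla_cpu_global_jit python train_model.py
```

Leaving the flag unset preserves the default behavior, where CPU-side whole-graph JIT stays off regardless of SessionOptions.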
825a9f8d9a4cc3cce7cee2fb08dcc058b5a8e2a8 |
|
06-Oct-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Make registration of an XlaDevice for autoclustering optional. PiperOrigin-RevId: 171281666
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
a81d10e2e753039e675d256762b6a3337342b7cd |
|
28-Sep-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
When constructing the error message, check for a nonexistent node before trying to get the name of that node. PiperOrigin-RevId: 170349499
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
8bcc4151d4ea266f5f4183f7eaa51c7874ad15a1 |
|
21-Sep-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
If a cycle is detected, mention in the error message what the cycle is. PiperOrigin-RevId: 169575965
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
19a55725af8102d72d4e081c5139f0e4bd5a4bb7 |
|
18-Aug-2017 |
Rohan Jain <rohanj@google.com> |
Allowing functions to run across devices. This change expands the ProcessFunctionLibraryRuntime library to Instantiate and Run functions on different devices. When a FunctionLibraryRuntime encounters a function with a target that is another device, it delegates Instantiate() and Run() calls to the ProcessFunctionLibraryRuntime. This change also moves the table_ containing all function instantiations to the PFLR instead of the FunctionLibraryRuntime. PiperOrigin-RevId: 165651194
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
935ff49201edd7a6297b313fb9545d1299b9a28d |
|
17-Aug-2017 |
Rohan Jain <rohanj@google.com> |
Automated g4 rollback of changelist 165521057 PiperOrigin-RevId: 165604864
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
37de1372ff43b144750c789b088f3166bcb6a27a |
|
17-Aug-2017 |
Rohan Jain <rohanj@google.com> |
Allowing functions to run across devices. This change expands the ProcessFunctionLibraryRuntime library to Instantiate and Run functions on different devices. When a FunctionLibraryRuntime encounters a function with a target that is another device, it delegates Instantiate() and Run() calls to the ProcessFunctionLibraryRuntime. This change also moves the table_ containing all function instantiations to the PFLR instead of the FunctionLibraryRuntime. PiperOrigin-RevId: 165521057
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
2f1ff0e90dc3ba80f6bbc3f9850e8028875dcbbf |
|
25-Jul-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Register a no-op kernel for ControlTrigger, but forbid the JIT marking pass from compiling ControlTrigger nodes. CL in preparation for compiling dynamic RNN gradients via XLA. PiperOrigin-RevId: 163073212
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
a8087b4aae40ef5c97d8b27d40795950996f86d5 |
|
27-Jun-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Reject operators with resource outputs on CPU and GPU devices. We were checking for resource inputs but not resource outputs, which led to accidental fusion of some TensorArray ops on CPU and GPU. PiperOrigin-RevId: 160294302
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
6ada43366663210beb0159b8c1a67b26ebfe6cb7 |
|
23-Jun-2017 |
Geoffrey Irving <geoffreyi@google.com> |
Prepare to not include node_def.proto.h in node_def_util.h The goal is to make kernels mostly independent of proto headers, which will let us lock down our .so imports. This CL makes a bunch of .cc files either include node_def.proto.h themselves or not need the definition of NodeDef; a second CL will make node_def_util.h not include node_def.proto.h. RELNOTES: n/a PiperOrigin-RevId: 159982117
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
0f2db739163809782049b2c956355506c88c77e5 |
|
02-Jun-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Split union-find implementation in mark_for_compilation_pass.cc into a separate library, make it more generic. PiperOrigin-RevId: 157850985
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
4e131d27354bc9be90e291f3ec4538c0e3bf06eb |
|
22-May-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Many algorithms need to enumerate the set of nodes within a graph, while excluding the special Sink and Source nodes. The checks for skipping Source and Sink are duplicated in dozens of loops. This CL adds a new Graph::op_nodes() method, which returns an enumerable range of all operation nodes, excluding Sink and Source. This allows many for loops to be simplified. This simplification is being done mainly for readability / reliability. There may be a tiny performance difference owing to this change (as well as making the Graph::nodes() and Graph::op_nodes() methods inlineable), but the measured difference is not reliably large enough to be significant. The changes to graph.h and graph.cc are quite minimal. I updated all of the uses of Graph::nodes() that I could reliably determine were unaffected by the change. Most uses immediately checked node->IsOp(). Some compared node->type_string() against literal strings, none of which were "_SINK" or "_SOURCE", and so using op_nodes() was more appropriate than nodes(). In some cases, it was not obvious whether an existing use of Graph::nodes() wanted to enumerate Sink / Source, so I left those uses unaffected. PiperOrigin-RevId: 156782112
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
73882f257ffb1bc9e1a828571c085d080b1d9266 |
|
17-May-2017 |
Geoffrey Irving <geoffreyi@google.com> |
Automated g4 rollback of changelist 156251356 PiperOrigin-RevId: 156315860
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
43db5c623f748b6f9704e9e9be5a5a11fa2a4c1a |
|
17-May-2017 |
Geoffrey Irving <geoffreyi@google.com> |
Automated g4 rollback of changelist 156244933 PiperOrigin-RevId: 156251356
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
749e5cc18381f7a5ec174673f76e20aead8529c6 |
|
17-May-2017 |
Geoffrey Irving <geoffreyi@google.com> |
Reduce direct references to NodeDef in favor of Node and AttrSlice This is one step towards replacing in-memory use of NodeDef with a customized NodeInfo class. There are still quite a few Node::def() references, but far fewer than before. Those remaining require more work, either because they are part of kernel registration (which is a bunch of functions), copy and modify the NodeDef, etc. Follow-on CLs will remove more. RELNOTES: n/a PiperOrigin-RevId: 156244933
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
1d0b8c007b8bc7f77dd63c74f02d87185071f038 |
|
09-May-2017 |
Peter Hawkins <phawkins@google.com> |
Remove unnecessary copies of value parameters. PiperOrigin-RevId: 155511618
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
535928864e296ca051fd7ceedbba915fb0e81bbe |
|
03-Apr-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Increase kMaxRecursionDepth. Change: 152042191
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
b05a83916f21becf59eff4e9db1d375eeb0fe904 |
|
16-Mar-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[TF:XLA] Don't compile functions that are marked "noinline". The underlying function mechanism uses LocalExecutor to call the function, which interacts poorly with the LocalExecutor used by tf2xla to translate the TF graph into XLA. Change: 150268961
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
91d2cc5cb4cb2d2463e3ed7ea323fc627c4a2098 |
|
23-Feb-2017 |
Eugene Brevdo <ebrevdo@google.com> |
Avoid merging adjacent XLA compilations from different scopes/functions This is part 2 of the bugfix. It implements the xla chain breaking mechanism based on different coloring of the graph (as represented by different XlaScope strings). In part 1, we modified both experimental_jit_scope and Defun to mark their ops as having different XlaScopes, so this is the final change that actually enables the fusion breaking. Also fixed a bug where xla_enabled was not True for rnn_cell_tests (the XLA benchmarks in that test were not actually being run with xla enabled). Change: 148286731
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
542c3cbf711c4b89310fa4046c48150d29564008 |
|
22-Feb-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Add support for resource variables to the Tensorflow/XLA bridge. Change: 148176223
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
6640d3f3de88a3f3ade8ec6e5e4540e545024f87 |
|
16-Feb-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Refactor XlaOpRegistry, moving metadata about how to compile operators on a device into a struct. No functional changes. Change: 147741833
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
a8c325e57c1077f1e8df540a20bd8b36d3d1f968 |
|
15-Feb-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Split XlaOpRegistry out of xla_compilation_device.{cc,h} into a separate xla_op_registry.{cc,h}. Move XlaExpression out of xla_context.{cc,h} into xla_compilation_device.{cc,h}, since it is used to wrap computation handles on the XLA compilation device. Change just moves code around, there are no functional changes. Change: 147632770
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
96007205c42d591ef5cef2d7e8245b780f44f0d7 |
|
07-Feb-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Disable the XLA CPU jit by default when the JIT is requested via the OptimizerOptions. The XLA CPU JIT is not optimized yet, and should not be enabled by default since it is usually slower than the standard Tensorflow CPU kernels. Change: 146811646
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
c44cde12a4533571257fb30fb2e5ea1b7c6dbf7f |
|
19-Jan-2017 |
Peter Hawkins <phawkins@google.com> |
[TF:XLA] Add support for compiling computations with no return values. Remove check for _Send and _Recv nodes in mark_for_compilation_pass.cc. Change: 144984665
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
c8384ed2900201f55f219e52e2fd57e2d4d48e70 |
|
13-Jan-2017 |
Peter Hawkins <phawkins@google.com> |
Add a unit test for XlaCompiler. Add support for marking Xla computations as stateful. Add a store for xla::ChannelHandles in XlaCompiler. Don't mark _Send/_Recv for XLA computation. Change: 144382814
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|
1e67c90e2caceeff82d09793d1ef5fa0300d219b |
|
09-Jan-2017 |
Peter Hawkins <phawkins@google.com> |
Initial open-source release of XLA: Accelerated Linear Algebra. XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators. XLA is still experimental; we are releasing it early to get the community involved. Change: 143990941
/external/tensorflow/tensorflow/compiler/jit/mark_for_compilation_pass.cc
|