f83bb6db5bd17df215994bde7adacef50ece0192
16-Feb-2018  Sanjoy Das <sanjoy@google.com>

[XLA:CPU] Minor cleanup to simple_orc_jit

SimpleResolver became unused after an LLVM upstream merge, and we never needed the name mangling logic in what is now FindCompiledSymbol.

PiperOrigin-RevId: 186039307
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

ffa63e57bdd703ae051ae849af5b5a272fca2223
25-Jan-2018  Sanjoy Das <sanjoy@google.com>

[TF:XLA] Replace most of HloProfilePrinter by a protocol buffer

This change replaces the meat of HloProfilePrinter with a protobuf HloProfilePrinterData. The original plan was to serialize HloProfilePrinter into C++ source code and put that in a .cc file along with the string for the xla::ProgramShape. However, since we now directly serialize xla::ProgramShape into a .o file, for consistency I think we should do the same thing for HloProfilePrinter (instead of adding yet another output file to tfcompile).

The change itself is fairly simple, it is large mostly due to the mass renaming I had to do.

PiperOrigin-RevId: 183158192
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

d4bfabc0cf744b890319d4612c2704e74fbc4eac
10-Jan-2018  Sanjoy Das <sanjoy@google.com>

[XLA] Clean up our handling of ExecutionProfile and add a test case

ExecutionProfile::compute_cycle_count never worked for CPU and GPU with Hlo profiling disabled, as far as I can tell.

PiperOrigin-RevId: 181517824
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

fc2526a8c1cf0bc2a93c8cc819ff7209eb4628c9
16-Dec-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Merged commit includes the following changes:

179277894 by gunan:
    Run buildifier on build file.

179275101 by meheff:
    Replace DeviceMemoryBase with ShapedBuffer in XLA interfaces. Executable, TransferManager, and AllocationTracker now use ShapedBuffer to hold device memory addresses holding XLA data. Most of the change is straight-forward with the exception of AllocationTracker which was mostly rewritten (and simplified) and some refactoring in the CPU executable. Also, have ShapedBuffer hold on-host and on-device Shapes which are the shapes of the representation of the data on the host and device, respectively. This is necessary because with cl/178624364 the on-host and on-device shape may no longer be equal.

179265385 by A. Unique TensorFlower:
    Return error rather than CHECK fail in Executable::ExecuteOnStreamWrapper

179264551 by dandelion:
    Internal fixes.

PiperOrigin-RevId: 179277894
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

4b636957604faa3361a799dd9d8749a6b85afff7
22-Nov-2017  Sanjoy Das <sanjoy@google.com>

Place HloProfilePrinter and HloProfileIndexMap in Executable

This refactoring will later allow XlaCompiledCpuFunction to pull out the HloProfilePrinter from Executable and use that to display the hlo execution profile. A de/serialized HloProfilePrinter will let AOT compiled binaries display their Hlo execution profile.

PiperOrigin-RevId: 176689528
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

cf245240ca90e6b552415f720342ae1acd326590
22-Nov-2017  Sanjoy Das <sanjoy@google.com>

[XLA:CPU] Add a basic implementation for ExecuteAsyncOnStream

PiperOrigin-RevId: 176680801
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

d3f1adc0394c4954328ba03f3bcb6ee378b97068
20-Nov-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Reserve vector capacity when the final size is known

PiperOrigin-RevId: 176414557
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

0a7be5a2f58fe5470fa7526c9de1404cb16fe3dc
31-Oct-2017  Sanjoy Das <sanjoy@google.com>

Rename (Add|Get)ProfileResult to something more specific; NFC

PiperOrigin-RevId: 174084570
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

8c748bdb7cbf435925675d6b7a3d75ecbefa3351
27-Sep-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Add more `const`s to xla::Executable. No functional change.

PiperOrigin-RevId: 170252047
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

06deeea373c93ea36547648481c5daf4dc56126f
27-Sep-2017  Mark Heffernan <meheff@google.com>

For tuple-shaped data, change ShapedBuffer (an abstraction holding on-device data of a given shape) to also hold an array of pointers representing the tuple structure in the device memory.

Previously ShapedBuffer only held array-shaped data at the leaves of the tuple shape. Construction of these array-of-pointers is handled by TransferManager, which has to construct array-of-pointers anyway to transfer literals to the device. This change makes ShapedBuffer match the native representation of tuple-shaped data passed into XLA computations. This is the first step to migrating XLA interfaces away from using naked device memory pointers (DeviceMemoryBase) to using more expressive ShapedBuffers instead. This change enables tuple-shaped parameters in computations run through the LocalClient interface.

Also, change LocalClient interfaces to return ScopedShapedBuffers, as these are generally easier to deal with ownership-wise than ShapedBuffers. They are analogous to std::unique_ptr, while ShapedBuffers are analogous to bare pointers.

This change includes a couple other cleanups found along the way:
* move cpu/gpu/interpreter transfer managers into their respective directories under xla/service.
* Make the generic transfer manager take a pointer size. Previously it would just use sizeof(void*) which might not be exactly what is needed.

PiperOrigin-RevId: 170133015
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

5ead76420dee762a5f710fda6893075f1292d5d3
19-Aug-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Reduce XLA compile time by ~7% for a convolutional image model:

* Added CompactPointerSet<T>, which is optimized for set size <= 1.
* Changed expensive CHECKs to DCHECKS in buffer_assignment.cc
* Reserve space in DFS state array before starting DFS.
* Use unsigned arithmetic in DFS state maintenance.
* HloInstruction:
  - Moved frequently used fields to start for better cache locality.
  - Use InlinedVector instead of vector for operand array.
  - Use InlinedVector instead of vector for DFS stack.
* Pre-compute "is array" and "is tuple" for LogicalBuffer.
* PointsToSet:
  - Combine two ShapeTrees into one.
  - Use CompactPointerSet instead of std::set to hold sources.
  - Use CompactPointerSet instead of std::set to hold flattened buffers.
* ShapeTree: use unique_ptr instead of optional for shape storage (reduces size and destruction overhead).
* Add proper const qualifiers to some FlatSet iterator methods.

Co-author=jeff
PiperOrigin-RevId: 165759117
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

b882d686ff00f73425a846c47e29a7c336435f25
01-Aug-2017  Bjarke Hammersholt Roune <broune@google.com>

Allow cost estimates to differ per backend and include the estimates into the HLO profile.

Add a summary table for what categories have the most opportunity for optimization left in them.

PiperOrigin-RevId: 163780413
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

34cbf161d7b1191ad5c1b3bc02fc52d338e8b175
27-Jul-2017  Jiri Simsa <jsimsa@google.com>

Update Dataset API documentation.

PiperOrigin-RevId: 163349457
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

b760c0cade03186a8f194390f6cba46fb363bfca
07-Jul-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Update xla compiler after upstream Orc API change r307350: https://reviews.llvm.org/rL307350

PiperOrigin-RevId: 161195744
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

05412bd367198ec491ca034b4bc634784c03125c
07-Jun-2017  Mark Heffernan <meheff@google.com>

[XLA] Simplify Shape traversal visitors.

Simplify shape traversal visitors in ShapeUtil and ShapeTree. Add a non-Status form because most uses of the traversal methods do not use it, and remove is_leaf parameter from ShapeTree.ForEach* as it is not frequently used.

PiperOrigin-RevId: 158201574
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

95719e869c61c78a4b0ac0407e1fb04e60daca35
23-May-2017  A. Unique TensorFlower <gardener@tensorflow.org>

[XLA] Teach Executable to do its own profiling (patch 1/4).

Presently, ExecuteOnStreamWrapper is a method on xla::Service, where it doesn't really conceptually belong -- note that it doesn't use anything from the containing Service object, but it does have an Executable object as its first parameter that it could easily be a method on instead. The only reason that it needs to be on Service is that it needs to access a Backend object in order to call backend->compiler()->shape_size_function(), and simply moving that into Executable would introduce a dependency cycle.

Thus, this patch (the first part of a sequence to address this) teaches Executable and its derivatives to compute shape_size_function. In the CPU cases, this is simply a static function. However, in the GPU case, we need to pass in the shape_size_function to the constructor, since it depends on a pointer size computed in the GpuCompiler.

PiperOrigin-RevId: 156807318
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

e84588a3e9e67d91d6ae32e64469f890d217c5dd
18-May-2017  Eli Bendersky <eliben@google.com>

[XLA] Attach an HloModuleConfig to HloModule, obviating the need to pass them around as a pair.

This cuts through a bunch of critical XLA APIs, but it's time... The background for this change is to make flags/options more easily pipe-able from the TF/XLA boundary deep into the XLA compiler and other components.

The situation after this CL is still not perfect; there are a number of places with chicken-egg scenarios when a module has to be constructed before a config (to register the result shape), but the situation is strictly better than before. Future CLs will clean things up even more.

PiperOrigin-RevId: 156469639
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

738143e6cd7f9eba0a0e77b44c6cc5ae4e1781ad
07-Mar-2017  Peter Hawkins <phawkins@google.com>

[TF:XLA] Remove support for client-allocated result buffers.

This code path is unused; Tensorflow ended up settling on having XLA allocate result buffers using Tensorflow's allocator. Remove it to reduce the proliferation of ExecuteXYZ() methods.

Change: 149423775
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

112a534b50c0a23dec95382941ac0556f2866b29
03-Mar-2017  A. Unique TensorFlower <gardener@tensorflow.org>

[XLA:GPU] Cache GPU substreams across executions

Change: 149063035
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

af2c7253bb1f9d135ad9b0c6a271741205ab57fd
02-Mar-2017  David Majnemer <majnemer@google.com>

[XLA] Add support for profiling multiple computations

While we are here, add support for getting the cost analysis for call HLOs.

Change: 148952748
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

8ff1c465c87fc3967c9d480646fac6d6205f856c
07-Feb-2017  A. Unique TensorFlower <gardener@tensorflow.org>

[TF:XLA] Change buffer assignment to combine temp buffers into one allocation.

This lays the groundwork for future CLs to reduce overall memory usage, but doesn't accomplish that goal yet. I.e. this is step 1.

The main change is in the semantics of BufferAllocation. Previously we'd only assign non-interferring (i.e. disjoint in liveness) LogicalBuffers to a single BufferAllocation. This meant that each BufferAllocation represented a unique address range in the working memory of the compiled program. Now we allow assignment of LogicalBuffers that overlap in liveness to the same BufferAllocation, by ensuring they occupy disjoint address ranges within the allocation. Bookkeeping of each address range is accomplished by associating each LogicalBuffer with an offset and size.

We take advantage of these new semantics to combine all temp buffers into a single BufferAllocation, by laying them end-to-end in a postprocessing step - see BufferAssigner::CombineTempAllocations. This is the same logic that TempBufferOffsets used on the GPU side; that class has been removed.

Entry parameters (inputs) and maybe_live_out (outputs) are unchanged, and may still occupy multiple BufferAllocations. The rest of the CL deals with the consequences of these changes.

Change: 146800348
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

1e67c90e2caceeff82d09793d1ef5fa0300d219b
09-Jan-2017  Peter Hawkins <phawkins@google.com>

Initial open-source release of XLA: Accelerated Linear Algebra.

XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators. XLA is still experimental; we are releasing it early to get the community involved.

Change: 143990941
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc