f83bb6db5bd17df215994bde7adacef50ece0192
16-Feb-2018  Sanjoy Das <sanjoy@google.com>

[XLA:CPU] Minor cleanup to simple_orc_jit

SimpleResolver became unused after an LLVM upstream merge, and we never needed the name mangling logic in what is now FindCompiledSymbol.

PiperOrigin-RevId: 186039307
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

ffa63e57bdd703ae051ae849af5b5a272fca2223
25-Jan-2018  Sanjoy Das <sanjoy@google.com>

[TF:XLA] Replace most of HloProfilePrinter by a protocol buffer

This change replaces the meat of HloProfilePrinter with a protobuf HloProfilePrinterData. The original plan was to serialize HloProfilePrinter into C++ source code and put that in a .cc file along with the string for the xla::ProgramShape. However, since we now directly serialize xla::ProgramShape into a .o file, for consistency I think we should do the same thing for HloProfilePrinter (instead of adding yet another output file to tfcompile).

The change itself is fairly simple, it is large mostly due to the mass renaming I had to do.

PiperOrigin-RevId: 183158192
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

d4bfabc0cf744b890319d4612c2704e74fbc4eac
10-Jan-2018  Sanjoy Das <sanjoy@google.com>

[XLA] Clean up our handling of ExecutionProfile and add a test case

ExecutionProfile::compute_cycle_count never worked for CPU and GPU with Hlo profiling disabled, as far as I can tell.

PiperOrigin-RevId: 181517824
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

fc2526a8c1cf0bc2a93c8cc819ff7209eb4628c9
16-Dec-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Merged commit includes the following changes:

179277894 by gunan:
    Run buildifier on build file.

179275101 by meheff:
    Replace DeviceMemoryBase with ShapedBuffer in XLA interfaces. Executable, TransferManager, and AllocationTracker now use ShapedBuffer to hold device memory addresses holding XLA data. Most of the change is straight-forward with the exception of AllocationTracker which was mostly rewritten (and simplified) and some refactoring in the CPU executable. Also, have ShapedBuffer hold on-host and on-device Shapes which are the shapes of the representation of the data on the host and device, respectively. This is necessary because with cl/178624364 the on-host and on-device shape may no longer be equal.

179265385 by A. Unique TensorFlower:
    Return error rather than CHECK fail in Executable::ExecuteOnStreamWrapper

179264551 by dandelion:
    Internal fixes.

PiperOrigin-RevId: 179277894
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

4b636957604faa3361a799dd9d8749a6b85afff7
22-Nov-2017  Sanjoy Das <sanjoy@google.com>

Place HloProfilePrinter and HloProfileIndexMap in Executable

This refactoring will later allow XlaCompiledCpuFunction to pull out the HloProfilePrinter from Executable and use that to display the hlo execution profile. A de/serialized HloProfilePrinter will let AOT compiled binaries display their Hlo execution profile.

PiperOrigin-RevId: 176689528
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

cf245240ca90e6b552415f720342ae1acd326590
22-Nov-2017  Sanjoy Das <sanjoy@google.com>

[XLA:CPU] Add a basic implementation for ExecuteAsyncOnStream

PiperOrigin-RevId: 176680801
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

d3f1adc0394c4954328ba03f3bcb6ee378b97068
20-Nov-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Reserve vector capacity when the final size is known

PiperOrigin-RevId: 176414557
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

0a7be5a2f58fe5470fa7526c9de1404cb16fe3dc
31-Oct-2017  Sanjoy Das <sanjoy@google.com>

Rename (Add|Get)ProfileResult to something more specific; NFC

PiperOrigin-RevId: 174084570
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

8c748bdb7cbf435925675d6b7a3d75ecbefa3351
27-Sep-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Add more `const`s to xla::Executable. No functional change.

PiperOrigin-RevId: 170252047
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

06deeea373c93ea36547648481c5daf4dc56126f
27-Sep-2017  Mark Heffernan <meheff@google.com>

For tuple-shaped data, change ShapedBuffer (an abstraction holding on-device data of a given shape) to also hold an array of pointers representing the tuple structure in the device memory.

Previously ShapedBuffer only held array-shaped data at the leaves of the tuple shape. Construction of these array-of-pointers is handled by TransferManager, which has to construct array-of-pointers anyway to transfer literals to the device. This change makes ShapedBuffer match the native representation of tuple-shaped data passed into XLA computations. This is the first step to migrating XLA interfaces away from using naked device memory pointers (DeviceMemoryBase) to using more expressive ShapedBuffers instead. This change enables tuple-shaped parameters in computations run through the LocalClient interface.

Also, change LocalClient interfaces to return ScopedShapedBuffers, as these are generally easier to deal with ownership-wise than ShapedBuffers. They are analogous to std::unique_ptr, while ShapedBuffers are analogous to bare pointers.

This change includes a couple other cleanups found along the way:
* move cpu/gpu/interpreter transfer managers into their respective directories under xla/service.
* Make the generic transfer manager take a pointer size. Previously it would just use sizeof(void*) which might not be exactly what is needed.

PiperOrigin-RevId: 170133015
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

5ead76420dee762a5f710fda6893075f1292d5d3
19-Aug-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Reduce XLA compile time by ~7% for a convolutional image model:

* Added CompactPointerSet<T>, which is optimized for set size <= 1.
* Changed expensive CHECKs to DCHECKS in buffer_assignment.cc
* Reserve space in DFS state array before starting DFS.
* Use unsigned arithmetic in DFS state maintenance.
* HloInstruction:
  - Moved frequently used fields to start for better cache locality.
  - Use InlinedVector instead of vector for operand array.
  - Use InlinedVector instead of vector for DFS stack.
* Pre-compute "is array" and "is tuple" for LogicalBuffer.
* PointsToSet:
  - Combine two ShapeTrees into one.
  - Use CompactPointerSet instead of std::set to hold sources.
  - Use CompactPointerSet instead of std::set to hold flattened buffers.
* ShapeTree: use unique_ptr instead of optional for shape storage (reduces size and destruction overhead).
* Add proper const qualifiers to some FlatSet iterator methods.

Co-author=jeff
PiperOrigin-RevId: 165759117
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

b882d686ff00f73425a846c47e29a7c336435f25
01-Aug-2017  Bjarke Hammersholt Roune <broune@google.com>

Allow cost estimates to differ per backend and include the estimates into the HLO profile.

Add a summary table for what categories have the most opportunity for optimization left in them.

PiperOrigin-RevId: 163780413
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

34cbf161d7b1191ad5c1b3bc02fc52d338e8b175
27-Jul-2017  Jiri Simsa <jsimsa@google.com>

Update Dataset API documentation.

PiperOrigin-RevId: 163349457
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

b760c0cade03186a8f194390f6cba46fb363bfca
07-Jul-2017  A. Unique TensorFlower <gardener@tensorflow.org>

Update xla compiler after upstream Orc API change r307350: https://reviews.llvm.org/rL307350

PiperOrigin-RevId: 161195744
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

05412bd367198ec491ca034b4bc634784c03125c
07-Jun-2017  Mark Heffernan <meheff@google.com>

[XLA] Simplify Shape traversal visitors.

Simplify shape traversal visitors in ShapeUtil and ShapeTree. Add a non-Status form because most uses of the traversal methods do not use it, and remove is_leaf parameter from ShapeTree.ForEach* as it is not frequently used.

PiperOrigin-RevId: 158201574
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

95719e869c61c78a4b0ac0407e1fb04e60daca35
23-May-2017  A. Unique TensorFlower <gardener@tensorflow.org>

[XLA] Teach Executable to do its own profiling (patch 1/4).

Presently, ExecuteOnStreamWrapper is a method on xla::Service, where it doesn't really conceptually belong -- note that it doesn't use anything from the containing Service object, but it does have an Executable object as its first parameter that it could easily be a method on instead. The only reason that it needs to be on Service is that it needs to access a Backend object in order to call backend->compiler()->shape_size_function(), and simply moving that into Executable would introduce a dependency cycle.

Thus, this patch (the first part of a sequence to address this) teaches Executable and its derivatives to compute shape_size_function. In the CPU cases, this is simply a static function. However, in the GPU case, we need to pass in the shape_size_function to the constructor, since it depends on a pointer size computed in the GpuCompiler.

PiperOrigin-RevId: 156807318
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

e84588a3e9e67d91d6ae32e64469f890d217c5dd
18-May-2017  Eli Bendersky <eliben@google.com>

[XLA] Attach an HloModuleConfig to HloModule, obviating the need to pass them around as a pair.

This cuts through a bunch of critical XLA APIs, but it's time... The background for this change is to make flags/options more easily pipe-able from the TF/XLA boundary deep into the XLA compiler and other components.

The situation after this CL is still not perfect; there are a number of places with chicken-egg scenarios when a module has to be constructed before a config (to register the result shape), but the situation is strictly better than before. Future CLs will clean things up even more.

PiperOrigin-RevId: 156469639
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

738143e6cd7f9eba0a0e77b44c6cc5ae4e1781ad
07-Mar-2017  Peter Hawkins <phawkins@google.com>

[TF:XLA] Remove support for client-allocated result buffers.

This code path is unused; Tensorflow ended up settling on having XLA allocate result buffers using Tensorflow's allocator. Remove it to reduce the proliferation of ExecuteXYZ() methods.

Change: 149423775
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

112a534b50c0a23dec95382941ac0556f2866b29
03-Mar-2017  A. Unique TensorFlower <gardener@tensorflow.org>

[XLA:GPU] Cache GPU substreams across executions

Change: 149063035
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

af2c7253bb1f9d135ad9b0c6a271741205ab57fd
02-Mar-2017  David Majnemer <majnemer@google.com>

[XLA] Add support for profiling multiple computations

While we are here, add support for getting the cost analysis for call HLOs.

Change: 148952748
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

8ff1c465c87fc3967c9d480646fac6d6205f856c
07-Feb-2017  A. Unique TensorFlower <gardener@tensorflow.org>

[TF:XLA] Change buffer assignment to combine temp buffers into one allocation.

This lays the groundwork for future CLs to reduce overall memory usage, but doesn't accomplish that goal yet. I.e. this is step 1.

The main change is in the semantics of BufferAllocation. Previously we'd only assign non-interferring (i.e. disjoint in liveness) LogicalBuffers to a single BufferAllocation. This meant that each BufferAllocation represented a unique address range in the working memory of the compiled program. Now we allow assignment of LogicalBuffers that overlap in liveness to the same BufferAllocation, by ensuring they occupy disjoint address ranges within the allocation. Bookkeeping of each address range is accomplished by associating each LogicalBuffer with an offset and size.

We take advantage of these new semantics to combine all temp buffers into a single BufferAllocation, by laying them end-to-end in a postprocessing step - see BufferAssigner::CombineTempAllocations. This is the same logic that TempBufferOffsets used on the GPU side; that class has been removed.

Entry parameters (inputs) and maybe_live_out (outputs) are unchanged, and may still occupy multiple BufferAllocations. The rest of the CL deals with the consequences of these changes.

Change: 146800348
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc

1e67c90e2caceeff82d09793d1ef5fa0300d219b
09-Jan-2017  Peter Hawkins <phawkins@google.com>

Initial open-source release of XLA: Accelerated Linear Algebra.

XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators. XLA is still experimental; we are releasing it early to get the community involved.

Change: 143990941
/external/tensorflow/tensorflow/compiler/xla/service/cpu/cpu_executable.cc