History log of /external/tensorflow/tensorflow/compiler/xla/service/gpu/hlo_schedule_test.cc
Revision Date Author Comments
2ad85aa4db2d0d6ea71e53c0fed3c7081847c55c 12-Sep-2017 Mark Heffernan <meheff@google.com> Use xla/tests:xla_internal_test_main for all tests under tf/compiler/xla
and remove any main() definitions in tests. This enables use of flags
in all tests.

PiperOrigin-RevId: 168424796
/external/tensorflow/tensorflow/compiler/xla/service/gpu/hlo_schedule_test.cc
9641b8edab3113f5bb83b5491de747dc9a43fe01 09-Jun-2017 Eli Bendersky <eliben@google.com> [XLA] Switch HloTestBase-based tests to use new debug options flag.

PiperOrigin-RevId: 158522608
/external/tensorflow/tensorflow/compiler/xla/service/gpu/hlo_schedule_test.cc
eb10a4c494d95e7c17ddc44ef35197d08f2f6b33 01-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Preallocate vector storage when the ultimate vector size is known in advance

PiperOrigin-RevId: 157724431
/external/tensorflow/tensorflow/compiler/xla/service/gpu/hlo_schedule_test.cc
379560be32c3910593e94aa6e91277fc3df3fc98 02-Mar-2017 A. Unique TensorFlower <gardener@tensorflow.org> [TF:XLA] Reduce sequential memory usage via better ordering and simulated heap.

The choice of instruction ordering, and the minimization of fragmentation once
we've chosen an order, are two large inter-related factors wrt overall memory
usage. The approach in this CL uses heuristics to do better on both, but
neither problem is completely solved.

To pick a better ordering (the larger factor), the approach is to try the
original list-scheduler based ordering, and to also try a DFS based ordering.
We pick the ordering that yields a smaller minimum memory, computed with the
simulated heap, ignoring fragmentation. Note that this is the absolute minimum
memory for a given ordering.
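The ordering choice above can be illustrated with a minimal sketch. This is not XLA's actual code; the names `PeakLiveBytes` and `PickOrdering` are hypothetical, and buffer lifetimes are modeled as simple size deltas (+bytes when a buffer is defined, -bytes when its last use retires):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// The minimum memory for an ordering, ignoring fragmentation, is the
// peak of the running sum of live bytes over the event sequence.
int64_t PeakLiveBytes(const std::vector<int64_t>& deltas) {
  int64_t live = 0, peak = 0;
  for (int64_t d : deltas) {
    live += d;
    peak = std::max(peak, live);
  }
  return peak;
}

// Pick the ordering whose simulated minimum memory is smaller:
// returns 0 for the first (list-scheduler) ordering, 1 for the
// second (DFS) ordering.
int PickOrdering(const std::vector<int64_t>& list_sched,
                 const std::vector<int64_t>& dfs) {
  return PeakLiveBytes(dfs) < PeakLiveBytes(list_sched) ? 1 : 0;
}
```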

To minimize fragmentation, the approach is to run a heap simulation on temporary
buffers. We still try to re-use existing allocations when possible, but instead
of creating new allocations for temp buffers, we collect all the leftovers and
use a heap to pack them. The heap algorithm that gave the best results is "lazy
best-fit"; a variant of traditional best-fit that sometimes delays offset
assignment until Free is called, in the hopes of yielding larger free chunks.
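For reference, here is a minimal sketch of the classic best-fit placement that "lazy best-fit" builds on. This is an illustrative assumption, not XLA's implementation; `Chunk` and `BestFitAlloc` are hypothetical names, and the lazy refinement (deferring offset assignment until Free when nothing fits) is omitted:

```cpp
#include <cstdint>
#include <vector>

// A free chunk in the arena: [offset, offset + size).
struct Chunk {
  int64_t offset;
  int64_t size;
};

// Classic best-fit: place the request in the smallest free chunk that
// can hold it. Returns the assigned offset, or -1 if no chunk fits.
int64_t BestFitAlloc(std::vector<Chunk>& free_list, int64_t size) {
  int best = -1;
  for (int i = 0; i < static_cast<int>(free_list.size()); ++i) {
    if (free_list[i].size >= size &&
        (best == -1 || free_list[i].size < free_list[best].size)) {
      best = i;
    }
  }
  if (best == -1) return -1;
  int64_t offset = free_list[best].offset;
  free_list[best].offset += size;
  free_list[best].size -= size;
  if (free_list[best].size == 0) {
    free_list.erase(free_list.begin() + best);
  }
  return offset;
}
```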

Here are some measurements of the temp buffer sizes for GNMT encoder training (a
stacked LSTM). Lower is better. I've tried various combinations of instruction
ordering and heap simulation, to show the joint impact of these two factors.

List-scheduler order, no heap simulation 33.33GiB
List-scheduler order, with heap simulation 25.09GiB
Minimized DFS order, no heap simulation 16.59GiB
Arbitrary DFS order, no heap simulation 15.05GiB (old)
Arbitrary DFS order, with heap simulation 12.57GiB
Minimized DFS order, with heap simulation 11.71GiB (new)

Note that the original list scheduler order is much worse than DFS on stacked
LSTMs, but (not shown here) is much better than DFS on convolutions like
Inception. Also note that heap simulation packs things tighter for all
instruction orders in this example, but to varying degrees.
Change: 149049028
/external/tensorflow/tensorflow/compiler/xla/service/gpu/hlo_schedule_test.cc
c4824086310a284bd41c49e22f11274a224f68ea 28-Jan-2017 A. Unique TensorFlower <gardener@tensorflow.org> [TF:XLA:GPU] Fix HloSchedule to account for stream predecessors transitively.

For a given op X, the previous code only considered stream predecessors to run
before X. The new code considers the transitive closure of stream predecessors
to run before X. This is a tighter (and more correct) restriction.

This should lower memory usage, but the full effect won't be seen until sim-heap
based buffer allocation is implemented.
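The transitive-closure fix described above can be sketched as a plain DFS over predecessor edges. This is only an illustration under assumed names (`PredGraph`, `CollectPredecessors`, integer op ids), not the HloSchedule code itself:

```cpp
#include <map>
#include <set>
#include <vector>

// Direct stream-predecessor edges: op -> its immediate predecessors.
using PredGraph = std::map<int, std::vector<int>>;

// Collect the transitive closure of predecessors of `op` via DFS, so
// that every op reachable through predecessor edges is forced to run
// before `op`, not just its direct predecessors.
void CollectPredecessors(const PredGraph& preds, int op,
                         std::set<int>& closure) {
  auto it = preds.find(op);
  if (it == preds.end()) return;
  for (int p : it->second) {
    if (closure.insert(p).second) {  // recurse on first visit only
      CollectPredecessors(preds, p, closure);
    }
  }
}
```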
Change: 145858339
/external/tensorflow/tensorflow/compiler/xla/service/gpu/hlo_schedule_test.cc
1e67c90e2caceeff82d09793d1ef5fa0300d219b 09-Jan-2017 Peter Hawkins <phawkins@google.com> Initial open-source release of XLA: Accelerated Linear Algebra.

XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators.

XLA is still experimental; we are releasing it early to get the community involved.
Change: 143990941
/external/tensorflow/tensorflow/compiler/xla/service/gpu/hlo_schedule_test.cc