ed2d6181ce9df646c8b7150fac754a796ccc2f88 |
|
20-Feb-2018 |
Patrick Nguyen <drpng@google.com> |
Merge commit for internal changes Conflicts: RELEASE.md configure.py tensorflow/contrib/cmake/external/zlib.cmake tensorflow/contrib/cmake/python_modules.txt tensorflow/contrib/cmake/tests/cuda/compatibility_test.c tensorflow/contrib/cmake/tests/cuda/compatibility_test.cc tensorflow/contrib/data/python/ops/dataset_ops.py tensorflow/contrib/gan/python/eval/python/summaries_test.py tensorflow/contrib/layers/python/layers/layers.py tensorflow/contrib/layers/python/layers/layers_test.py tensorflow/contrib/tpu/profiler/pip_package/setup.py tensorflow/core/public/version.h tensorflow/docs_src/install/install_c.md tensorflow/docs_src/install/install_go.md tensorflow/docs_src/install/install_java.md tensorflow/docs_src/install/install_linux.md tensorflow/docs_src/install/install_mac.md tensorflow/docs_src/install/install_sources.md tensorflow/examples/image_retraining/retrain.py tensorflow/python/framework/test_util.py tensorflow/python/keras/_impl/keras/layers/lstm_test.py tensorflow/python/layers/utils.py tensorflow/python/ops/bitwise_ops_test.py tensorflow/python/ops/distributions/beta.py tensorflow/python/ops/image_ops_test.py tensorflow/python/ops/losses/losses_impl.py tensorflow/tools/pip_package/setup.py
|
ba019dc689d6393d8dba04ca57e8b01b374db14f |
|
17-Feb-2018 |
Sanjoy Das <sanjoy@google.com> |
[XLA] Add some plumbing, documentation, verification and shape inference for Gather Pretty much everything other than HLO verification and shape inference will fail for Gather with Unimplemented. Note that this CL is intentionally incomplete -- I figured it would be nicer to get some of the boiler-platey stuff out of the way early. Let me know if you want me to send in a larger but more complete CL instead. PiperOrigin-RevId: 186055521
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
13df417665f216bfb527440f1fd8f04958000ec5 |
|
16-Feb-2018 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[TF:XLA] Adds HostCompute HLO - a pseudo-op to represent host-side computation. PiperOrigin-RevId: 186047964
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
96c2a846609d3a68f9a88c60c4c68a243f74ee44 |
|
16-Feb-2018 |
Bjarke Hammersholt Roune <broune@google.com> |
Add TODOs. PiperOrigin-RevId: 186032527
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
8dfaa05d2824290b33eb922a5269f0772f53478e |
|
16-Feb-2018 |
David Majnemer <majnemer@google.com> |
[XLA] Factor out the code which adds operands to a fusion node This makes it easier for Hlo passes to do interesting rewrites with new, additional parameters which were not operands to the original fusion node. PiperOrigin-RevId: 186024182
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
c2a3935e2bd27d0befa7db5f9c050cfec057e5bb |
|
11-Feb-2018 |
Loo Rong Jie <loorongjie@gmail.com> |
[MSVC] Use explicit func pointer to static method instead of lambda func
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
5476489053f0523b8aebab05bc39a02c089300e0 |
|
06-Feb-2018 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Sink layout sensitivity from CSE into HloInstruction::Identical, and make it the default. PiperOrigin-RevId: 184598903
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
2b932c28d60ec4f248d950cd2ef69a7eb98bb66d |
|
03-Feb-2018 |
Justin Lebar <jlebar@google.com> |
[XLA] Minor cleanups related to multi-output fusion. - Add some comments about preexisting invariants, and add some CHECKs. - In the LoopEmitter constructor, materialize the given ArraySlice<IrArray> to a vector, so we don't rely on the given ArraySlice having any particular lifetime. - Add the invariant that the LoopEmitter constructor which takes a list of IrArrays is only for multi-output fusion. Previously it said: If you only pass one array, then treat it as regular fusion. But this results in an LLVM type mismatch, because the given target_element_generator should be passing a struct with one element. PiperOrigin-RevId: 184365310
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
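The note above about materializing the ArraySlice is a lifetime question that is easy to see in miniature. The sketch below uses hypothetical stand-in types (ArraySliceView, IrArray, and a trimmed-down LoopEmitter), not the real XLA classes, to show why copying a caller's slice into an owned vector in the constructor removes any dependence on the caller's storage.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for xla::llvm_ir::IrArray and gtl::ArraySlice.
struct IrArray { int buffer_id; };
struct ArraySliceView { const IrArray* data; std::size_t size; };

class LoopEmitter {
 public:
  // Copy the caller's slice into an owned vector. The caller's backing
  // storage may go away after the constructor returns; the emitter must
  // not hold on to the view itself.
  explicit LoopEmitter(ArraySliceView target_arrays)
      : target_arrays_(target_arrays.data,
                       target_arrays.data + target_arrays.size) {}

  std::size_t num_targets() const { return target_arrays_.size(); }

 private:
  std::vector<IrArray> target_arrays_;  // owned copy, safe lifetime
};

int main() {
  IrArray locals[] = {{0}, {1}};     // temporary storage owned by the caller
  LoopEmitter emitter({locals, 2});  // copies, does not alias
  return emitter.num_targets() == 2 ? 0 : 1;
}
```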
|
3be90a490c31d5a8fad70713e059bbb3e723e664 |
|
02-Feb-2018 |
Justin Lebar <jlebar@google.com> |
Internal change PiperOrigin-RevId: 184239740
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
a58524fa602829459aa7eb0335a33afe1f28382a |
|
19-Jan-2018 |
Chris Leary <leary@google.com> |
[XLA] Simplify trivial pad/reduce-window combos into broadcasts. PiperOrigin-RevId: 182585236
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
7eba57baec4442640f11059caecfc10898966e00 |
|
11-Jan-2018 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Make HLO CSE faster. Passing around copies of std::functions incurs heap allocations and deallocations, which, unfortunately, matters in this case. Minimize the amount of copies. PiperOrigin-RevId: 181625079
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
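A minimal illustration of the cost being avoided in the entry above: passing a std::function by value copies the callable on every call (and each copy may heap-allocate), while passing it by const reference only invokes it. The function names below are hypothetical, not the actual CSE code.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical hash-callback type; the real CSE pass hashes HloInstruction*.
using HashFn = std::function<uint64_t(int)>;

// Taking the std::function by value copies it on every call, and each copy
// may heap-allocate if the callable's captures don't fit the small buffer.
uint64_t CombineByValue(HashFn fn, int x) { return fn(x) * 31; }

// Taking it by const reference avoids the copy entirely; the callable is
// only invoked, never duplicated.
uint64_t CombineByRef(const HashFn& fn, int x) { return fn(x) * 31; }

int main() {
  uint64_t seed = 42;
  HashFn fn = [seed](int x) { return seed + static_cast<uint64_t>(x); };
  // Both calls compute the same value; only the copying behavior differs.
  return static_cast<int>(CombineByValue(fn, 1) - CombineByRef(fn, 1));
}
```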
|
4f877d4e54bb2427882f4a800607a1cf0531b293 |
|
05-Jan-2018 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Make the HLO device placer more stable as far as created partitions go. Also remove the multi-module input capability for the device placer. PiperOrigin-RevId: 180871703
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
5bf26acd87d3d44183fc28cb9576cda10c0255ca |
|
02-Jan-2018 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Automated g4 rollback of changelist 180000981 PiperOrigin-RevId: 180581912
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
c0c2775ce3de682f7913d1aeaf50bbc4d1521934 |
|
23-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Automated g4 rollback of changelist 179983419 PiperOrigin-RevId: 180000981
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
7d1072dd3374a0aa22637a0fd4a17a4ddd064110 |
|
23-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Adds FFT for XLA: CPU via Eigen, GPU via cuFFT. GPU support includes plan reuse with new scratch allocator per execution in fft_thunk. PiperOrigin-RevId: 179983419
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
210076afd0eb5a4c4f7f54bb079b75f92b087b3f |
|
23-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Output of a slice op can alias its operand. PiperOrigin-RevId: 179969317
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
b2aa6950db67ab980012c05d496401200ad60320 |
|
22-Dec-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Print out missing extra-info for many instructions in the HLO graph dumper. Now we use the same functionality as HloInstruction::ToString() to print instructions' extra info. This fills in a lot of previously-missing info, like reduce-windows' windows, and dots' dot-dimension-numbers. PiperOrigin-RevId: 179892469
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
d2355fcee9f47cc2e8225f8ff54f7c12fa8045f0 |
|
18-Dec-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Fix comments on HloInstruction::epsilon() and HloInstruction::feature_index(). These functions can be called for kBatchNorm{Training,Inference,Grad}, not just kBatchNormTraining. No functional change. PiperOrigin-RevId: 179363059
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
f806269602219d5095265d036f294cc9a6260971 |
|
15-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Remove '%' when printing the hlo text in short parsable mode. PiperOrigin-RevId: 179138523
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
d57ab2c4a7cd13e47f942aaff495912fdc96f84a |
|
15-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Allow omitting operands shapes and program shapes. PiperOrigin-RevId: 179132435
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
b03f0e408710c5a92b87d748360b03c6cb60760d |
|
15-Dec-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Express the default options to ToString using an overload, rather than a default param. No functional change. The motivation for this is that GDB ignores default params, but resolves overloads just fine. PiperOrigin-RevId: 179125588
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
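The overload-instead-of-default-argument pattern described above fits in a few lines. PrintOptions and ToString below are simplified stand-ins, not the real HloInstruction API.

```cpp
#include <string>

// Hypothetical options struct standing in for the real print options.
struct PrintOptions { bool include_metadata = true; };

// Instead of `std::string ToString(PrintOptions opts = PrintOptions());`,
// provide two overloads. Debuggers like GDB cannot synthesize default
// arguments when a function is called interactively, but they resolve the
// zero-argument overload just fine.
std::string ToString(const PrintOptions& opts) {
  return opts.include_metadata ? "op{...metadata...}" : "op";
}
std::string ToString() { return ToString(PrintOptions()); }

int main() { return ToString().empty() ? 1 : 0; }
```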
|
0ea2d74f883914109eb154bcf2a7d61ae0557f2d |
|
15-Dec-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Remove the notion of a "parameter name" separate from the instruction's name. Also set the instruction's name in the HLO parser, so that after parsing, the instructions have the names they're given in the input string. PiperOrigin-RevId: 179119003
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
a99b32fb149d028cd31fe638f81c6ca56c6e3b57 |
|
14-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Gather the bool parameters into one thing to control the text format. PiperOrigin-RevId: 179079727
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
2adbc217b3eeed329d077050e0f1f7d88edd86d7 |
|
12-Dec-2017 |
Sanjoy Das <sanjoy@google.com> |
[XLA:CPU] Teach the CPU layout assignment about dot dimension numbers There is no great need for this yet, but I noticed that the test cases were broken (they were constructing dots with unset dimension numbers), and one thing led to another. PiperOrigin-RevId: 178713597
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
fe8406149feec453250905965a14285465cd2063 |
|
07-Dec-2017 |
Shanqing Cai <cais@google.com> |
Merge changes from github. PiperOrigin-RevId: 178185697
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
22767d59b3c6958ed690814ff77e29ee1d458b18 |
|
06-Dec-2017 |
Bjarke Hammersholt Roune <broune@google.com> |
Allow CrossReplicaSum to take multiple operands internally. PiperOrigin-RevId: 178043362
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
6eec9c2ea33f3b86012cb0ea2aeb9e49e65bc716 |
|
01-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Hlo parser: support rng and reduce-precision. Also simplify the lexer by regarding several things as identifiers. PiperOrigin-RevId: 177548483
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
4146ff1259c0b4ada8afbbad11a7b37d8373d1b9 |
|
30-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Adds Dot with DotDimensionNumbers proto for specifying arbitrary contracting and batch dimensions. PiperOrigin-RevId: 177481231
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
b1d8c59e9b014b527fb2fbef9ce9afc14dbc4938 |
|
22-Nov-2017 |
Yifei Feng <yifeif@google.com> |
Merge changes from github. PiperOrigin-RevId: 176695926
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
e70c00950d295c519fd9c7f8b12e13a3c5aaf710 |
|
22-Nov-2017 |
Yifei Feng <yifeif@google.com> |
Automated g4 rollback of changelist 176615107 PiperOrigin-RevId: 176622438
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
c6d603f02e1a98f871912cda6716cdcbed6b439e |
|
22-Nov-2017 |
Yifei Feng <yifeif@google.com> |
Merge changes from github. PiperOrigin-RevId: 176615107
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
ef3ee202659a2a49afcd9898451bf9b1256a2757 |
|
22-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Add BitcastConvert HLO op to enable bitwise operations on floating point types. PiperOrigin-RevId: 176610007
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
8547044d4dacaa0d6001578634a44b488dd23601 |
|
18-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Add conditional HloInstruction and handle conditional in DFS visitors. PiperOrigin-RevId: 176175297
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
55ee41a98d50e200eda314ebf08f092000477f6e |
|
17-Nov-2017 |
Mark Heffernan <meheff@google.com> |
When constructing fusion computations from a proto, do not uniquify the names. The names are already unique and uniquifying them again will mutate them resulting in inconsistent names between the proto and the constructed HLO. PiperOrigin-RevId: 176035108
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
f9e3e8d8731daf338b6dc743aef84c35740ca037 |
|
14-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Hlo parser: support fusion. Also, - Add a HloInstruction::CreateFusion interface that creates a fusion instruction with given fusion computation. Add a HloComputation::SetFusionInstruction interface to help do that. - Change how we print fusion kind. Before this change we print fusion kind together with the opcode, e.g., fusion:kLoop, which is not easy to parse. Now we append fusion kind as an attribute. - Print fusion computation the same way as other computations, instead of nested in an instruction. PiperOrigin-RevId: 175621768
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
9c6eebabc71906d240338adc89fa838bd5635aa0 |
|
11-Nov-2017 |
HyoukJoong Lee <hyouklee@google.com> |
Add comment on HloPtrComparator. PiperOrigin-RevId: 175370054
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
61aebf140e12e2ad834dc94a83f23fc574c79340 |
|
11-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Hlo parser: support metadata. Also give metadata its own format. PiperOrigin-RevId: 175356154
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
8614ef614245cfcfdd09bda0d633d5aa4f6e856e |
|
10-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Extend the Array class with more functionality. PiperOrigin-RevId: 175277161
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
8392a4b8e9d6d7ccbfde15dcdda0477c2791b6dc |
|
10-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Hlo parser: support padding. Also, give PaddingConfig its own ToString format. PiperOrigin-RevId: 175239832
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
51895becce83ef4dc8bac263377d158fc50e4d53 |
|
09-Nov-2017 |
HyoukJoong Lee <hyouklee@google.com> |
Change for asynchronous Send and Recv by splitting Send into {Send, SendDone} and Recv into {Recv, RecvDone}. See operation_semantics.md for the updated semantics. PiperOrigin-RevId: 175216012
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
456929281592f14d50443cfbdaa2f6b36167a134 |
|
03-Nov-2017 |
Mark Heffernan <meheff@google.com> |
Rollback copy insertion change because it results in a DCHECK with an internal model. END_PUBLIC BEGIN_PUBLIC Automated g4 rollback of changelist 174423881 PiperOrigin-RevId: 174505237
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
96c415ad77c20e1cf2da5e61f85e24fd6c36eb28 |
|
03-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Use maps with a deterministic iteration order for HloInstruction*. Convert a bunch of std::maps with HloInstruction* and const HloInstruction* keys to use a comparator that is based on the unique_id of the instruction rather than the pointer value. PiperOrigin-RevId: 174474868
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
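The deterministic-ordering idea is just a comparator keyed on a stable id rather than on a pointer value. Below is a self-contained sketch with a hypothetical Instr type; per the entries above, the real code keys on the instruction's unique_id (see the HloPtrComparator mentioned in a nearby entry).

```cpp
#include <map>
#include <string>

// Hypothetical instruction with a module-unique id, standing in for
// xla::HloInstruction.
struct Instr {
  int unique_id;
  std::string name;
};

// Comparator keyed on unique_id rather than pointer value, so iteration
// order is the same from run to run regardless of where the heap happened
// to place each object.
struct InstrPtrLess {
  bool operator()(const Instr* a, const Instr* b) const {
    return a->unique_id < b->unique_id;
  }
};

int main() {
  Instr add{1, "add"}, mul{2, "mul"};
  std::map<const Instr*, int, InstrPtrLess> uses;
  uses[&mul] = 3;
  uses[&add] = 5;
  // Iteration visits `add` (id 1) before `mul` (id 2), deterministically.
  return uses.begin()->first->unique_id == 1 ? 0 : 1;
}
```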
|
7bb2d57b0b051d1cf8dd74d3276bf5a452774172 |
|
03-Nov-2017 |
Mark Heffernan <meheff@google.com> |
Rewrite CopyInsertion to use module-scoped HloAliasAnalysis. The net effect (number of copies inserted) is roughly similar to the existing implementation, but the new implementation is much more general. The new implementation can handle entry argument buffer reuse with minimal modification, for example. Some unnecessary copies are still added due to deficiencies in buffer assignment (b/62548313), but these can be removed when buffer assignment also uses HloAliasAnalysis. Also address a few issues uncovered with this CL: (1) For inplace dynamic slice in llvm backends, truncate (do not wrap) the slice. This matches the behavior of the non-inplace variant. (2) Disable SelectBetweenPredTuples test on GPU. The test introduces top-level buffer ambiguity which is not tolerated by the gpu backend. (3) When deserializing HLO from a proto, do not uniquify instruction names in fused computations. (4) In dataflow analysis, don't deallocate deleted HloValues during propagation. (5) In dataflow analysis, fix issue with live_out_of_computation property. PiperOrigin-RevId: 174423881
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
274e9ed51ea6cc09a0b5fc1cee4756ac0e9aa525 |
|
03-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[TF:XLA] Add a const HLO visitor. Use it in the HLO cost analysis pass. PiperOrigin-RevId: 174411043
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
53a4fcbdbad571e659203733f6a07ba82651d40b |
|
02-Nov-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Fixed HloComputation/HloInstruction clone to allow deep cloning, so that cloned instructions and computations no longer hold live links to their original parent modules and computations. PiperOrigin-RevId: 174271432
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
3b0414872f08cfabbf71a495ad661a7c892c76d8 |
|
02-Nov-2017 |
Chris Leary <leary@google.com> |
[XLA] Allow full dumps of constant values via boolean parameter. PiperOrigin-RevId: 174257660
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
efcbf6e34e4519172d38be76c08c2d99792fd7be |
|
30-Oct-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Supported in this CL: * Attaching sharding descriptors to HLO ops * Partitioning the HLO graph into per-device computations based on those sharding descriptors. * All operator support for device placement and ops replicated on all devices. * Elementwise op support for tiled shardings. * 2D Convolution support for tiled shardings (no stride or dilation support). PiperOrigin-RevId: 173946036
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
5dd569cf026bae92330a194c8f2895d0f48149d9 |
|
14-Oct-2017 |
Mark Heffernan <meheff@google.com> |
Make the HLO proto representation (hlo.proto) full fidelity. Hlo modules can be serialized to HLO protos and deserialized without any information loss. As part of this change, a bug is fixed in NameUniquer. Previously, passing names with numeric suffixes could result in name collisions. PiperOrigin-RevId: 172161360
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
403e51018b3c47cd5989d6b50776e235221fade4 |
|
10-Oct-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Factor out repeated LatestNonGteAncestorAndIndex helper. PiperOrigin-RevId: 171620470
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
76db7553ab2998116a62d6c242aa39373a362993 |
|
29-Sep-2017 |
Chris Leary <leary@google.com> |
[XLA] Make it possible to inline calls to side-effecting computations. PiperOrigin-RevId: 170515496
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
c7d4e4bf9cdc9aa29de6e6c3d97e4a1c4f2f25d9 |
|
29-Sep-2017 |
Sanjoy Das <sanjoy@google.com> |
Automated g4 rollback of changelist 170435356 PiperOrigin-RevId: 170507630
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
9b1b5d85b9ce3c812dc772da1f3f5d09581e5b49 |
|
29-Sep-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Make HloComputation::instructions() return a view of HloInstruction*s. Currently it returns a view of unique_ptr<HloInstruction>s. But the fact that these are unique_ptrs is an implementation detail, and it's ugly to leak it everywhere. PiperOrigin-RevId: 170445375
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
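Returning a view of raw pointers over an owning container of unique_ptrs can be done with a small iterator adaptor. This is an illustrative sketch with placeholder types, not the actual HloComputation::instructions() implementation.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for xla::HloInstruction.
struct Instr { int id; };

// Minimal iterator adaptor that walks a vector<unique_ptr<T>> but yields T*,
// hiding the unique_ptr ownership detail from callers.
class InstrPtrIterator {
 public:
  using Inner = std::vector<std::unique_ptr<Instr>>::const_iterator;
  explicit InstrPtrIterator(Inner it) : it_(it) {}
  Instr* operator*() const { return it_->get(); }
  InstrPtrIterator& operator++() { ++it_; return *this; }
  bool operator!=(const InstrPtrIterator& o) const { return it_ != o.it_; }
 private:
  Inner it_;
};

// A lightweight view over the owning container; this is the kind of object a
// method like instructions() could return instead of exposing unique_ptrs.
class InstrView {
 public:
  explicit InstrView(const std::vector<std::unique_ptr<Instr>>& v) : v_(v) {}
  InstrPtrIterator begin() const { return InstrPtrIterator(v_.begin()); }
  InstrPtrIterator end() const { return InstrPtrIterator(v_.end()); }
 private:
  const std::vector<std::unique_ptr<Instr>>& v_;
};

int main() {
  std::vector<std::unique_ptr<Instr>> owned;
  owned.push_back(std::unique_ptr<Instr>(new Instr{0}));
  owned.push_back(std::unique_ptr<Instr>(new Instr{1}));
  int sum = 0;
  for (Instr* i : InstrView(owned)) sum += i->id;  // callers never see unique_ptr
  return sum == 1 ? 0 : 1;
}
```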
|
872917e78f7628c00f93162c70d74e8b659e0123 |
|
29-Sep-2017 |
Sanjoy Das <sanjoy@google.com> |
Automated g4 rollback of changelist 170430143 PiperOrigin-RevId: 170435356
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
bda0dde93049505b113aa78f3291f47546fd9265 |
|
29-Sep-2017 |
Sanjoy Das <sanjoy@google.com> |
Avoid creating fusions that reuse their inputs. We generally avoid creating such fusions, but it looks like we missed the case where elementwise operations implicitly broadcast their inputs. PiperOrigin-RevId: 170430143
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
f972d800ca3accc9af0ad5b9dcabbc5d9b125ab5 |
|
28-Sep-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Replace HloComputation::ReplaceUsesOfInstruction with HloInstruction::ReplaceAllUsesWith. RAUW used to be *almost* synonymous with RUOI, except RAUW didn't update the computation's root. This was a dangerous footgun -- if you accidentally called RAUW when you wanted RUOI (which you almost always did), your code would work perfectly, except when the relevant node happened to be the root of a computation. This change simplifies our APIs so there's just one Right Way To Do It, by making RAUW update the computation. PiperOrigin-RevId: 170290230
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
5cac28c41af785532e90101787cf85545cdac410 |
|
27-Sep-2017 |
Justin Lebar <jlebar@google.com> |
[XLA] Add HloEvaluator::EvaluateWithSubstitutions(). This evaluates an HLO, using a given map of literals to determine the values of some of its operands. PiperOrigin-RevId: 170215954
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
85c4a379985b46930ece49edc4347af628ee2928 |
|
24-Sep-2017 |
Peter Hawkins <phawkins@google.com> |
[XLA] Adds an API to attach a device assignment to HLO operators. PiperOrigin-RevId: 169841868
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
c82a933f449e637ee83244d2c40162e24cdde0e1 |
|
15-Sep-2017 |
Sanjoy Das <sanjoy@google.com> |
Lower vector-matrix dot to LLVM IR if the RHS of the dot can be made column major. The naive dot lowering to LLVM IR (already present in XLA today) is cache efficient if the dot has LHS of shape [1,K]{1,0} and RHS of shape [K x N]{0,1}. This change teaches the layout assignment pass to exploit this property by converting a constant RHS matrix to a column major layout when possible. Couple of related things I had to touch in this change: - In LayoutAssignmentTest.TupleLayout we used to generate a kCopy to satisfy the conflicting constraints between the result and the constant shapes, but with this change we change the layout of the constants themselves. So the EXPECT_FALSE is now an EXPECT_TRUE. - The extra instruction layout constraints added at the end of CpuLayoutAssignment::AddBackendConstraints seemed redundant. The layout assignment pass already tries to make all unconstrained buffers have the default row-major layout. Moreover, they were blocking this optimization in some cases by introducing conflicting constraints. - The changes to literal_util.h have to be made to deal with the Literal::Relayout calls we now get on literals of various types. PiperOrigin-RevId: 168761204
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
27ce2a4ac956941ba8a0b9aaaa77acc0aa861fef |
|
07-Sep-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Rip CheckFusionNode() out of instruction, and move it into the HLO verifier instead. CheckFusionNode() is linear in the size of the fusion node, and was called once per Fuse(), leading to run-time quadratic in the fusion node's size. PiperOrigin-RevId: 167812735
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
24ddee8659c3fd7f8d2db02efef7d96de53cdbae |
|
04-Sep-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Expose component parts of HloInstruction's string representation. PiperOrigin-RevId: 167516835
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
8b20ddf3e0eedb52a7ae0f10a55658e64efc4d1a |
|
31-Aug-2017 |
David Majnemer <majnemer@google.com> |
[XLA] Sanity check the list of called computations for fusion nodes. called_computations for a fusion node should only include the fusion computation that it calls. PiperOrigin-RevId: 167149669
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
e565d1f1fced69789feb10f1ea1241157ec95f93 |
|
30-Aug-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Refactor parent-fusion-instruction pointer into HloComputation, not HloInstruction. Presently, each instruction inside a fusion computation contains a pointer to the fusion instruction that contains the computation, which is redundant since this is common across the entire computation. This leads to lots of places where this pointer must be set when adding an instruction to the fusion computation (and bugs such as b/65177535 when one is missed), as well as code to check that it's set correctly. In addition, this is simply unnecessary data bloat. Moreover, the computation itself does not contain a pointer to the fusion instruction that references it, which leads to odd circumlocutions in the HloComputation code that retrieve the fusion instruction from the computation's root instruction. Thus, this CL moves this pointer into the HloComputation class (replacing the is_fusion_computation_ bool value), and refactor the uses as necessary. PiperOrigin-RevId: 167039280
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
797ca0d482457185f35d46cbce4c430f55b8b66a |
|
30-Aug-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Teach HloComputation::Reparent to properly handle reparenting into fusion computations. This also moves HloInstruction::CheckFusionInstruction() out of "private", and adds calls to it in the reduce-precision-insertion test to confirm that the reduce-precision-insertion pass maintains valid fusion computations. (These checks then fail without the fix to HloComputation::Reparent.) PiperOrigin-RevId: 167031741
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
5ead76420dee762a5f710fda6893075f1292d5d3 |
|
19-Aug-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Reduce XLA compile time by ~7% for a convolutional image model: * Added CompactPointerSet<T>, which is optimized for set size <= 1. * Changed expensive CHECKs to DCHECKS in buffer_assignment.cc * Reserve space in DFS state array before starting DFS. * Use unsigned arithmetic in DFS state maintenance. * HloInstruction: - Moved frequently used fields to start for better cache locality. - Use InlinedVector instead of vector for operand array. - Use InlinedVector instead of vector for DFS stack. * Pre-compute "is array" and "is tuple" for LogicalBuffer. * PointsToSet: - Combine two ShapeTrees into one. - Use CompactPointerSet instead of std::set to hold sources. - Use CompactPointerSet instead of std::set to hold flattened buffers. * ShapeTree: use unique_ptr instead of optional for shape storage (reduces size and destruction overhead). * Add proper const qualifiers to some FlatSet iterator methods. Co-author=jeff PiperOrigin-RevId: 165759117
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
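One of the items above, CompactPointerSet<T>, is built around the observation that most of these sets hold at most one element. The toy class below (TinyPointerSet, a made-up name) shows the size<=1 fast path with inline storage and a spill container; the real class is more complete, so treat this only as an illustration of the idea.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Keep a single pointer inline and only allocate a backing vector when a
// second distinct pointer shows up; sets of size 0 or 1 never allocate.
template <typename T>
class TinyPointerSet {
 public:
  bool insert(T* p) {
    if (overflow_.empty()) {
      if (single_ == nullptr) { single_ = p; return true; }
      if (single_ == p) return false;
      overflow_.push_back(single_);  // second element: spill to the slow path
    }
    if (std::find(overflow_.begin(), overflow_.end(), p) != overflow_.end())
      return false;
    overflow_.push_back(p);
    return true;
  }
  std::size_t size() const {
    return overflow_.empty() ? (single_ ? 1 : 0) : overflow_.size();
  }

 private:
  T* single_ = nullptr;        // inline storage for the common case
  std::vector<T*> overflow_;   // only used for sets with more than one element
};

int main() {
  int a = 0, b = 0;
  TinyPointerSet<int> s;
  s.insert(&a);
  s.insert(&a);  // duplicate, ignored
  s.insert(&b);
  return s.size() == 2 ? 0 : 1;
}
```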
|
7359fec792e4efec1670a12332bb524a5608b215 |
|
18-Aug-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Implement BatchNorm inference by expanding it into smaller ops. 1. Add batch norm inference support in batchnorm_rewriter. 2. Connect XLA's batch norm inference to TF's FusedBatchNorm. RELNOTES: n/a PiperOrigin-RevId: 165655351
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
0dfe9d308e4a738252a53f52654fc3bbdf74d809 |
|
17-Aug-2017 |
Chris Leary <leary@google.com> |
[XLA] Some judicious inlining to speed up large compiles by a second or so. PiperOrigin-RevId: 165599564
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
9fc99811d70d4671b6f2bdabf8754ddf2d24e427 |
|
16-Aug-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Add a function that returns whether an hlo is elementwise binary. PiperOrigin-RevId: 165470975
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
822603aed3f20159f06284af5ce35efa81b95ed6 |
|
11-Aug-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Merging sibling fusion instruction using multi_output_fusion PiperOrigin-RevId: 164920220
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
fcbb00ff21f55ffede44793002d2f9d4f67c2306 |
|
10-Aug-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Clean up fused_instructions method in HloInstruction PiperOrigin-RevId: 164879220
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
b4001ea6934ce0b7de02cc52ffbde778dcd62dca |
|
09-Aug-2017 |
HyoukJoong Lee <hyouklee@google.com> |
Consider the nested computations when checking if an instruction is removable from a computation. This is to prevent DCE from removing a while instruction that includes a send/recv instruction. PiperOrigin-RevId: 164722478
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
7ac31c1a3c66a02f158f2819f730a1e7438a2327 |
|
03-Aug-2017 |
Jeffrey A. Dean <jeff@google.com> |
Assign unique ids at the HloModule level to each HloInstruction object. Use these when doing DFS over a graph in order to store the visited bits using an array of two-bit values (in the dfs_hlo_visitor.{h,cc} module), rather than a significantly larger and more expensive hash table to store this state. Ids are initially -1 and are assigned when unique names are assigned to the HloInstruction objects. Speeds up compilation of a convolutional image model by ~5.3% PiperOrigin-RevId: 164050902
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
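With dense per-module ids, the DFS visited-state table can be a packed array of two-bit entries instead of a hash table keyed by pointer. The sketch below shows that packing; the enum values and class name are illustrative, not the actual dfs_hlo_visitor types.

```cpp
#include <cstdint>
#include <vector>

// Typical DFS visit states; two bits are enough to distinguish them.
enum VisitState : uint8_t { kNotVisited = 0, kVisiting = 1, kVisited = 2 };

class DfsVisitStates {
 public:
  explicit DfsVisitStates(int num_nodes)
      : bits_((num_nodes + 3) / 4, 0) {}  // four two-bit states per byte

  VisitState Get(int id) const {
    return static_cast<VisitState>((bits_[id / 4] >> (2 * (id % 4))) & 0x3);
  }
  void Set(int id, VisitState s) {
    uint8_t& byte = bits_[id / 4];
    int shift = 2 * (id % 4);
    byte = static_cast<uint8_t>((byte & ~(0x3 << shift)) | (s << shift));
  }

 private:
  std::vector<uint8_t> bits_;  // ~16x smaller than a hash set of pointers
};

int main() {
  DfsVisitStates states(10);
  states.Set(7, kVisiting);
  states.Set(7, kVisited);
  return states.Get(7) == kVisited && states.Get(3) == kNotVisited ? 0 : 1;
}
```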
|
efb7fb8e58bbf7a04ed80a2affed516cdef15e0b |
|
31-Jul-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Use XLA_VLOG_LINES() in literal_test_util to avoid truncation of large tensors. PiperOrigin-RevId: 163745522
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
78a9b95436f45438abf3e818307f707e9ae92343 |
|
26-Jul-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Finish normalizing fusion computations into standard computations PiperOrigin-RevId: 163210327
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
50cd64cbb91997a6ef701e4d5ada32f6e55e0d29 |
|
08-Jul-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Rename HloInstruction::slice_stride(int64) to HloInstruction::slice_strides(int64). This is so that the function matches the naming convention of the other slice accessors. PiperOrigin-RevId: 161272516
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
7d0f6385f8e7637e155ef9c340c19aded365a6ff |
|
07-Jul-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[BatchNorm] Skeleton code to implement BatchNormGrad. This CL sets up all the boilerplate code needed to implement BatchNormGrad. None of the backends has been implemented yet. RELNOTES: n/a PiperOrigin-RevId: 161161713
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
14c5bd7e654ce50f8d1dfbbd87499a0b2cb52b64 |
|
06-Jul-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Allow "::" in HloInstruction names again. FullyQualifiedName() used "::" as a separator, so it was important that "::" should not be used in HloInstruction names. Now that FullyQualifiedName() no longer exist, we can remove this restriction. PiperOrigin-RevId: 161070812
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
94d52acdc0087d5829f220c4d46eea67e0d30305 |
|
05-Jul-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Eliminate HloInstruction::FullyQualifiedName(). Now that HloInstruction names are unique within an HloModule, we can replace all uses of FullyQualifiedName() with simply name(). PiperOrigin-RevId: 160961583
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
a7032f21d72dd051f09b94733ed890dcd7ceaac8 |
|
04-Jul-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Clarify rules for names of HloInstructions and HloOpcodes. - The name of an HloInstruction may not contain "::", as this is used as a separator in fully qualified names. - The name of an HloOpcode may not contain ':', as this is used as a separator in extended opcode strings. PiperOrigin-RevId: 160894413
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
50b999a8336d19400ab75aea66fe46eca2f5fe0b |
|
28-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Merge changes from github. PiperOrigin-RevId: 160344052
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
1464b9930de871fd11870941963253670f737c23 |
|
27-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Add more tests for BatchNormTraining. RELNOTES: n/a PiperOrigin-RevId: 160307959
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
1fa73c53ab95693f070ce70e6be0c644d83c163a |
|
26-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Automated g4 rollback of changelist 160182040 PiperOrigin-RevId: 160190881
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
f3c89936e97c99dead1ca3310246691c1b221adf |
|
26-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Merge changes from github. END_PUBLIC Note: this CL will break builds. cl/159887762 to follow to fix all the breakages. --- Commit 2336cdf7f authored by Maxwell Paul Brickner<mbrickn@users.noreply.github.com> Committed by gunan<gunan@google.com>: Updated link to use HTTPS (#10998) Howdy! I just updated a link to use https instead of http. Thanks! --- Commit ad0892df1 authored by Luke Iwanski<luke@codeplay.com> Committed by Luke Iwanski<luke@codeplay.com>: [OpenCL] Fixes run_metadata_test for SYCL This test is designed to test CUDA specific behavior --- Commit 6b37a0725 authored by Todd Wang<toddwang@gmail.com> Committed by GitHub<noreply@github.com>: Update comments --- Commit 1699d904a authored by John Lawson<john@codeplay.com> Committed by Luke Iwanski<luke@codeplay.com>: [OpenCL] Fixes CUDA specific test run on SYCL (#56) The testBadParentValuesOnGPU should only be run on CUDA devices, as the test checks for particular CUDA behaviour. We don't actually provide a SYCL kernel for GatherTree and so it's not a problem that the tests don't target SYCL. --- Commit 3c1946230 authored by myPrecious<Moriadry@users.noreply.github.com> Committed by Shanqing Cai<cais@google.com>: Java API to get the size of specified input list of operations. (#10865) * Java API to get the size of specified input list of operations * remove unnecessary explain to avoid bring a new term to users. --- Commit e911c7480 authored by Luke Iwanski<luke@codeplay.com> Committed by Luke Iwanski<luke@codeplay.com>: [OpenCL] REGISTER -> REGISTER6 --- Commit fbf6c4cec authored by superryanguo<superryanguo@gmail.com> Committed by superryanguo<superryanguo@gmail.com>: Simplify the Quickstart section with the weblink is better --- Commit 72e2918cc authored by Taehoon Lee<taehoonlee@snu.ac.kr> Committed by Taehoon Lee<taehoonlee@snu.ac.kr>: Fix typos --- Commit 90c4406b7 authored by Rishabh Patel<patelrishabh@users.noreply.github.com> Committed by GitHub<noreply@github.com>: Correct the learning rate as per the code snippet --- Commit 03da61134 authored by Todd Wang<toddwang@gmail.com> Committed by GitHub<noreply@github.com>: Update ir_array.cc --- Commit 2df6cd3ac authored by Todd Wang<toddwang@gmail.com> Committed by GitHub<noreply@github.com>: Another try --- Commit af0cbace1 authored by Luke Iwanski<luke@codeplay.com> Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>: [OpenCL] Transpose to go through Eigen (#10321) --- Commit fc7361081 authored by Luke Iwanski<luke@codeplay.com> Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>: [OpenCL] Registers RGBToHSV and HSVToRGB (#91) (#10848) * [OpenCL] Added RGBToHSV and HSVToRGB * Aligning '\' --- Commit 832894ef8 authored by Luke Iwanski<luke@codeplay.com> Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>: [OpenCL] Registers AdjustContrastv2 (#10949) * [OpenCL] Registers AdjustContrastv2 (#93) * [OpenCL] Extended adjust_contrast_op_benchmark_test for OpenCL (#96) * [OpenCL] Extended adjust_contrast_op_benchmark_test for OpenCL * simplified to #ifndef * Changed to "#if GOOGLE_CUDA" * Update adjust_contrast_op_benchmark_test.cc * Added comments --- Commit cb4c2f8d1 authored by Yifei Feng<yifeif@google.com> Committed by Yifei Feng<yifeif@google.com>: Make TransferBufferToInFeed not virual so it compiles. --- Commit e89f04d80 authored by Yifei Feng<yifeif@google.com> Committed by Yifei Feng<yifeif@google.com>: Fix calling Literal member functions. 
--- Commit 15a8df724 authored by Yifei Feng<yifeif@google.com> Committed by Yifei Feng<yifeif@google.com>: Fix mac build clone from meheff's change: [XLA] Change return type of DeviceAssignment::Deserialize to fix build breakage on mac. The mac build had the following error: error: incomplete type 'xla::DeviceAssignment' used in type trait expression This was due to a static method returning a StatusOr<DeviceAssignment> inside of the definition of DeviceAssignment. --- Commit a54d43fa4 authored by Yifei Feng<yifeif@google.com> Committed by Yifei Feng<yifeif@google.com>: Replace LiteralUtil to Literal in compiler/plugin/executor --- Commit 88a6bb80c authored by Guenther Schmuelling<guschmue@microsoft.com> Committed by Guenther Schmuelling<guschmue@microsoft.com>: expand inline for debug builds to limit number of symbols --- Commit 62fb49d31 authored by Yifei Feng<yifeif@google.com> Committed by Yifei Feng<yifeif@google.com>: Fix visibility error for contrib/remote_fused_graph/pylib/BUILD. --- Commit 4c75252f2 authored by Mark Neumann<markn@allenai.org> Committed by Mark Neumann<markn@allenai.org>: fix initial test values to avoid numerical instability --- Commit b58d98353 authored by sj6077<epik03sj@gmail.com> Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>: Fixes of AutoParallel bug (#10368) * Fix the bug that auto_parallel could replicate variable snapshot name * Use NodeName in grappler:utils instead of substr, convert variables->variable_def of grappler item * remove variable_def from grappler item, exclude snapshot nodes from dont_replicate_nodes in auto_parallel --- Commit a286b7db8 authored by Yifei Feng<yifeif@google.com> Committed by Yifei Feng<yifeif@google.com>: Make debug_test slice integer. --- Commit 97fcfdfa6 authored by Toby Boyd<tobyboyd@google.com> Committed by GitHub<noreply@github.com>: Fixed path to seq2seq.py and minor formatting --- Commit 63c1befb8 authored by Anish Shah<shah.anish07@gmail.com> Committed by Anish Shah<shah.anish07@gmail.com>: Improve docs for tf.nn.depthwise_conv2d_native --- Commit 8d42202b2 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Yong Tang<yong.tang.github@outlook.com>: Fix mismatched delete in mkl_tfconv_op.cc This fix fixes mismatched new[]-delete in mkl_tfconv_op.cc (the file went through clang-format so there are some additional changes) Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 26301bd55 authored by Danny Goodman<goodman.danny@gmail.com> Committed by Danny Goodman<goodman.danny@gmail.com>: fix error format --- Commit b3f33ad46 authored by Yao Zhang<yaozhang@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make changes to prepare for the fused option of batch norm to be set to None (None means using fused batch norm if possible). PiperOrigin-RevId: 159649743 --- Commit a4a469832 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add tests for select ops and while loops that produce tuples that contain predicates. PiperOrigin-RevId: 159645900 --- Commit 980d3f2be authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Use C API to implement Operation.name property This name property is used in many existing tests including those that already run with C API enabled (math_ops_test, framework_ops_test, session_test, session_partial_run_test, math_ops_test_gpu, etc). 
PiperOrigin-RevId: 159645767 --- Commit 26239c706 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Previously we didn't have an implementation of BatchNormInference and BatchNormTraining, which gives a linker error if anyone ever tries to call that. A dummy implementation is friendlier than a linker error. PiperOrigin-RevId: 159645612 --- Commit f671c5caa authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 159570549 PiperOrigin-RevId: 160182040
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
b1c56cc5d971c74062d140a1c5ce98afaa085402 |
|
22-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Make HloModule clonable. This CL makes HloModule clonable, which is necessary when we want to run the same compilation twice with the same input. PiperOrigin-RevId: 159874256
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
3b41352a3177c2fe8a1329e8981b285bb6aacf8b |
|
19-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA:CPU] Thread-parallel CPU backend (work in progress).
*) Partitions HLO instructions along outer dimensions, based on a simple cost model.
*) Emits loop nests with dynamic outer loop bounds (for partitions), leaves inner loop bounds static (for optimizations).
*) Dispatches parallel tasks on a thread pool for execution.
Simple element-wise fusion benchmark: CPU: Intel Sandybridge with HyperThreading (16 cores) dL1:32KB dL2:256KB dL3:20MB
Benchmark                Time(ns)   CPU(ns)  Iterations
----------------------------------------------------------
BM_ParallelFusion/T1     16821490  16740939         100  237.791MB/s
BM_ParallelFusion/T2      9175467  17826232         100  435.945MB/s
BM_ParallelFusion/T4      5106019  18875761         100  783.389MB/s
BM_ParallelFusion/T8      2833598  19624622         233  1.379GB/s
BM_ParallelFusion/T16     1995259  26541594         344  1.958GB/s
Performance on some select model benchmarks (more work is needed here, but wanted to get this CL in and iterate). Benchmark runs with 16 threads and wall time reported in seconds.
InceptionResnetV2.inception_resnet_v2_200x200x20x1000_inference_xla_cpu  wall_time(old): 7.97818803787  wall_time(new): 4.328297019
InceptionV3.inception_v3_200x200x20x1000_inference_xla_cpu  wall_time(old): 2.96792650223  wall_time(new): 1.21296644211
InceptionResnetV2.inception_resnet_v2_200x200x20x1000_training_xla_cpu  wall_time(old): 42.0342495441  wall_time(new): 17.9182584286
InceptionV3.inception_v3_200x200x20x1000_training_xla_cpu  wall_time(old): 6.99778497219  wall_time(new): 3.95318603516
BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward  wall_time(old): 11.869822979  wall_time(new): 7.89778208733
BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward_backward  wall_time(old): 38.1911079884  wall_time(new): 29.8181960583
PiperOrigin-RevId: 159474444
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
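The partitioning strategy described above (split the outer dimension, run each partition as an independent task) can be illustrated without any LLVM IR. The sketch below uses plain std::thread for clarity; the real backend emits loop nests with dynamic outer bounds and dispatches onto a thread pool, so this is only a shape-of-the-idea example.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Split the outer dimension of an elementwise computation into contiguous
// ranges and run each range on its own thread.
void AddInParallel(const std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>* out, int num_partitions) {
  std::vector<std::thread> workers;
  std::size_t n = a.size();
  std::size_t chunk = (n + num_partitions - 1) / num_partitions;
  for (int p = 0; p < num_partitions; ++p) {
    std::size_t begin = p * chunk;
    std::size_t end = std::min(n, begin + chunk);
    if (begin >= end) break;
    // Each worker owns a disjoint [begin, end) range, so no synchronization
    // is needed for the writes.
    workers.emplace_back([=, &a, &b] {
      for (std::size_t i = begin; i < end; ++i) (*out)[i] = a[i] + b[i];
    });
  }
  for (std::thread& t : workers) t.join();
}

int main() {
  std::vector<float> a(1000, 1.0f), b(1000, 2.0f), out(1000, 0.0f);
  AddInParallel(a, b, &out, 4);
  return out[999] == 3.0f ? 0 : 1;
}
```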
|
9d2a432ce74eab4c439fe8c60389e4da9d6c92b2 |
|
17-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Add plumbing for a ReducePrecision operation. This CL is the first part of a series that adds a ReducePrecision operation for experimenting with the effects of reduced-precision storage of intermediate values. ReducePrecision is a Unary operation parameterized on floating-point exponent and mantissa bit sizes, and rounds the input data as if it were converted to a floating-point value with the given bit sizes and then converted back to "normal" F32 data. Using arbitrary parameterized values to describe the lower-precision value type, rather than hardcoding this as a reduction to IEEE f16, allows us to do more flexible experiments -- e.g., "Is this training error due to the reduced mantissa precision, or due to the reduced exponent range?" or "Is this a smooth degradation with reduced precision or is there a sudden drop at some value?" -- which may suggest software mitigations for the effects. This version of the CL adds the kReducePrecision instruction opcode, and the overall plumbing to support the operation. To allow testing, it includes an exceptionally simple implementation of the actual operation that returns "unimplemented" except for the exponent and mantissa bit sizes where it is a complete no-op. PiperOrigin-RevId: 159295615
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
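The core of a reduce-precision rounding step can be sketched as bit manipulation on an f32. The function below only truncates the mantissa to the requested number of bits; the actual HLO op also models the reduced exponent range and uses proper rounding rather than truncation, so treat this as a simplified illustration.

```cpp
#include <cstdint>
#include <cstring>

// Keep only `mantissa_bits` of the f32 mantissa by zeroing the low bits.
float ReduceMantissaPrecision(float value, int mantissa_bits) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // type-pun without UB
  int dropped = 23 - mantissa_bits;          // f32 has 23 explicit mantissa bits
  if (dropped > 0) bits &= ~((1u << dropped) - 1u);
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

int main() {
  // With 0 mantissa bits kept, 1.75 truncates down to 1.0 (only the implicit
  // leading bit of the mantissa survives).
  return ReduceMantissaPrecision(1.75f, 0) == 1.0f ? 0 : 1;
}
```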
|
a7c36173cabcc1289a836e8143accb5f0914b19a |
|
14-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Use a non-recursive DFS in HloInstruction::Accept to avoid stack overflow on deep graphs Even with this fix, we don't finish compiling the exact test case from b/38494745 in a reasonable amount of time (we spend a lot of time inside HloInstruction::FusionReusesParamElements::ComputeInternal, for instance), so I've used a smaller graph depth for now to avoid timing out the test. PiperOrigin-RevId: 159026595
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
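An explicit-stack post-order traversal is the standard way to make a visitor like Accept safe on very deep graphs. The sketch below uses a placeholder Node type rather than HloInstruction/DfsHloVisitor, but it shows the bookkeeping (a stack of node/next-operand pairs plus a visited table) that replaces recursion.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Node {
  int id;
  std::vector<Node*> operands;
};

// Iterative post-order DFS over a DAG of operand edges; arbitrarily deep
// graphs cannot overflow the call stack because the stack is heap-allocated.
template <typename Visit>
void PostOrderDfs(Node* root, std::vector<bool>* visited, Visit visit) {
  // Each frame remembers how many operands have already been pushed.
  std::vector<std::pair<Node*, std::size_t>> stack = {{root, 0}};
  while (!stack.empty()) {
    Node* node = stack.back().first;
    std::size_t& next_operand = stack.back().second;
    if (next_operand < node->operands.size()) {
      Node* operand = node->operands[next_operand++];
      if (!(*visited)[operand->id]) stack.push_back({operand, 0});
    } else {
      if (!(*visited)[node->id]) visit(node);  // operands done: visit the node
      (*visited)[node->id] = true;
      stack.pop_back();
    }
  }
}

int main() {
  Node a{0, {}}, b{1, {&a}}, c{2, {&a, &b}};
  std::vector<bool> visited(3, false);
  int count = 0;
  PostOrderDfs(&c, &visited, [&](Node*) { ++count; });
  return count == 3 ? 0 : 1;  // each node visited exactly once
}
```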
|
b6039c875290cdd5c9a62e01393b75b928827504 |
|
14-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
We believe a fused version of batch_norm_op can speed the algorithm up. This PR implements a new op, fused_batch_norm_op, in tf-xla and HLO. This is the CPU implementation for batch norm training. This CL is big, but a lot of the code is boilerplate. PiperOrigin-RevId: 158930166
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
94085bee74557f34fd7ad3bef969eecf6c8c4f4e |
|
07-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Replace std::function object with regular function. The function is called recursively, and the std::function object had only existed to allow recursion from within a lambda expression. A regular function should be cheaper than a polymorphic function wrapper. PiperOrigin-RevId: 158292415
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
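The before/after of this change is small enough to show directly: a lambda that needs to recurse has to be named through a std::function and pay for a type-erased call at every level, whereas a plain function can call itself with an ordinary (and inlinable) call. The toy functions below are illustrative, not the code that was changed.

```cpp
#include <functional>

// The pattern being removed: recursing from inside a lambda requires naming
// the callable, which usually means wrapping it in a std::function.
int CountDownWithStdFunction(int n) {
  std::function<int(int)> go = [&go](int k) {
    return k <= 0 ? 0 : 1 + go(k - 1);  // each level goes through type erasure
  };
  return go(n);
}

// The replacement: a regular function simply calls itself directly.
int CountDownPlain(int k) { return k <= 0 ? 0 : 1 + CountDownPlain(k - 1); }

int main() {
  return CountDownWithStdFunction(5) == CountDownPlain(5) ? 0 : 1;
}
```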
|
02ac85399d4fb35d5055ecf426632b9446a70041 |
|
01-Jun-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Introduce new class Literal to replace protobuf Literal. This renames the existing Literal message to LiteralProto and introduces a new C++ class named Literal to replace it. The LiteralProto is only used at RPC boundaries, or when protobuf-specific functionality is required. The Literal class offers a 'ToProto' function to generate a new LiteralProto message when necessary. Currently, all the static functions in class LiteralUtil, just forward to their counterparts in class Literal. This will change in a future CL. Class Literal implements all the buffers as std::vectors. The only exception is preds(), which given the std::vector<bool> representation, makes it unusable for the semantics we require (it's not possible to get the address of the underlying vector, for instance). The CL adds a BoolVector class to work around that issue. In future CLs, the std::vector representation may be changed to something more efficient, if needed. PiperOrigin-RevId: 157739125
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
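The BoolVector workaround mentioned above exists because std::vector<bool> is a packed specialization with no addressable contiguous bool storage. A minimal stand-in (BoolBuffer below, an illustrative name rather than the actual xla::Literal internals) shows the idea: back predicate data with a byte-sized element type so a real pointer to the buffer can be handed out.

```cpp
#include <cstddef>
#include <vector>

class BoolBuffer {
 public:
  explicit BoolBuffer(std::size_t n, bool value = false)
      : data_(n, value ? 1 : 0) {}

  bool get(std::size_t i) const { return data_[i] != 0; }
  void set(std::size_t i, bool value) { data_[i] = value ? 1 : 0; }

  // Unlike std::vector<bool>, this exposes a real, addressable buffer.
  const unsigned char* data() const { return data_.data(); }
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<unsigned char> data_;  // one byte per predicate element
};

int main() {
  BoolBuffer preds(4);
  preds.set(2, true);
  return preds.data()[2] == 1 && !preds.get(0) ? 0 : 1;
}
```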
|
1f9529b8dc1868a980297b7843d2fbae97062179 |
|
27-May-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Allow use of the tuple instruction as the fusion root. PiperOrigin-RevId: 157274264
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
3e767e9db0e0a00a509354ec18462841ea4d40f2 |
|
26-May-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Add debug protos that serialize HLO graph information. Also add flags to dump this data in JSON format, for each backend. This is useful for upcoming debugging tools. PiperOrigin-RevId: 157178357
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
09f3fb939c9b395a9bc747cf81d15b2dc2804c3e |
|
08-May-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Merged commit includes the following changes: 155425029 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change. -- 155424167 by A. Unique TensorFlower <gardener@tensorflow.org>: Internal change. -- PiperOrigin-RevId: 155425029
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
965d620104d375c5fd2b18881f353eb41d9a63a2 |
|
04-May-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Internal change. Change: 155009390
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
74770749840e1c823a50b743a50637afc3529e3c |
|
29-Apr-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Make ReshapeMover account for broadcast operands, add VLOGging for debug. Change: 154637127
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
7be3d9ee95e59f28126e3b9370fba738253fdc3b |
|
27-Apr-2017 |
Mark Heffernan <meheff@google.com> |
[XLA] Various HLO naming fixes. This change includes a number of fixes to HLO instruction names, especially for fusion instructions. Specific changes: (1) Remove HloInstruction::set_name and HloComputation::set_name. These methods were a bit dangerous, as they made it easy to create non-unique HLO names. Replace them with UniquifyName, which renames the object to a unique name based on its current name. (2) Change the name of fusion computations to "fused_computation". Previously they were named after the root. (3) Change the naming of fusion parameters. They are now named after the unfused instructions whose values they represent. Also, previously superfluous ".1", ".2", etc. could be added to the parameter names; this change fixes that. (4) Change the naming of instructions in fusion computations to be identical to the instructions they were cloned from. Previously all fused instructions would end up having a .clone suffix. (5) If HloInstruction::Clone() is called with an empty suffix, then don't add a "." to the name. Change: 154454938
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
fc197e6c77e336700a22e04df2b1f20e0fc72fd5 |
|
24-Apr-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA:HLO] Reduce copy and get-tuple-element instructions added by CopyInsertion. CopyInsertion adds copies to while init operand elements to maintain correctness for in-place while loops; e.g. if an element is updated in the loop, it must be copied before entering the loop to avoid corrupting the state of other users of the same buffer. However these copies are unnecessary if the element is read-only in the while body. That is the general idea behind this CL: to remove copies of read-only elements. But there are some details. E.g. if any of these read-only elements are entry parameters, they still must be copied (at least once). The problem here is that entry parameter buffers are managed by the caller, and cannot (currently) share the same allocation with other buffers. We add an optimization such that if the same entry parameter is used by multiple while loops in a read-only fashion, it is only copied once. Also, the way the original code was adding the copies was sub-optimal. We'd end up with this type of accordion pattern: tuple -> (gte, gte, gte) -> tuple. This CL also removes many of the extra gte+tuple ops. Change: 154082222
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
c0088ae3d2541d8e00fc238377dd802a811624f3 |
|
20-Apr-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Fix the parameter instruction printing issue. Append the parameter number to the fusion parameter name, and use the parameter name rather than the instruction name in creating the new parameter. Show the parameter number when printing out parameter instructions. Change: 153752424
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
817533db9b17b5456b85ba9187df7262c2c9c453 |
|
20-Apr-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Fix incorrect comments Change: 153737501
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
33fd4134234170745f989e2cdd73c8ca8709d926 |
|
17-Apr-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Represent fusion instructions as an HloComputation. Use an HloComputation to represent the HloInstructions inside a fusion instruction. All the interfaces are kept the same except for the parent field of the fusion instruction. It now points to the newly created HloComputation rather than the enclosing computation for the fusion instruction. Change: 153390245
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
f7e1a723aa40e29ac7a887e481e3f183e1b38ff8 |
|
15-Apr-2017 |
David Majnemer <majnemer@google.com> |
Internal-only changes. Change: 153238377
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
6c42f0a1a80226692a9f37ff50e0e1356951e86c |
|
28-Mar-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Add HLO verifier that checks HLO instruction's parent computation. Change: 151494158
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
060e002e70e1abf04144a107fde939bda4051ac5 |
|
24-Mar-2017 |
Mark Heffernan <meheff@google.com> |
[XLA] Rematerialize subcomputations. Extend HLO rematerialization to rematerialize subcomputations in addition to the entry computations. Outer nesting levels of computations are rematerialized before inner nesting levels because inner subcomputations may be while bodies where rematerialization is more expensive. Also fix a latent bug in call_graph dealing with fusion instructions, and extend HloInstruction::Clone to accept a string suffix (e.g., "remat") for the clone name. Change: 151179956
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
1258b206590d9460f87f0aaab0c9f9ccba3b1bfe |
|
16-Mar-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Refactor convolution dimension numbers and windows dumping code and remove duplicate code in hlo_graph_dumper Change: 150324515
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
fc112a6b53d782eacb46eb357a8720d6b5a5d3cc |
|
11-Mar-2017 |
Mark Heffernan <meheff@google.com> |
[XLA] Replace uses of std::set with std::vector. std::set is slow and the iteration order is unstable. A couple of other opportunistic changes include consolidating all called computations of an instruction in a single vector. This facilitates fast access to all called computations. Also, replace AddControlSuccessor/Predecessor with Add/RemoveControlDependencyTo, which is less error-prone as you can't create a half-connected control edge. Change: 149810889
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
00d0347ccebc3e29ffe541703b5a2f929b89da36 |
|
10-Mar-2017 |
Brennan Saeta <saeta@google.com> |
[TF:XLA] Add debug metadata to HLO ops. In order to support end-to-end debugging and performance profiling tooling for the TensorFlow::XLA toolchain, this change adds a DebugMetadata proto to the HloInstruction class, and pipes it through the tf2xla stack. Change: 149703349
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
379560be32c3910593e94aa6e91277fc3df3fc98 |
|
02-Mar-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[TF:XLA] Reduce sequential memory usage via better ordering and simulated heap. The choice of instruction ordering, and the minimization of fragmentation once we've chosen an order, are two large inter-related factors wrt overall memory usage. The approach in this CL uses heuristics to do better on both, but neither problem is completely solved. To pick a better ordering (the larger factor), the approach is to try the original list-scheduler based ordering, and to also try a DFS based ordering. We pick the ordering that yields a smaller minimum memory, computed with the simulated heap, ignoring fragmentation. Note that this is the absolute minimum memory for a given ordering. To minimize fragmentation, the approach is to run a heap simulation on temporary buffers. We still try to re-use existing allocations when possible, but instead of creating new allocations for temp buffers, we collect all the leftovers and use a heap to pack them. The heap algorithm that gave the best results is "lazy best-fit", a variant of traditional best-fit that sometimes delays offset assignment until Free is called, in the hope of yielding larger free chunks. Here are some measurements of the temp buffer sizes for GNMT encoder training (a stacked LSTM); lower is better. I've tried various combinations of instruction ordering and heap simulation, to show the joint impact of these two factors:
List-scheduler order, no heap simulation    33.33GiB
List-scheduler order, with heap simulation  25.09GiB
Minimized DFS order, no heap simulation     16.59GiB
Arbitrary DFS order, no heap simulation     15.05GiB (old)
Arbitrary DFS order, with heap simulation   12.57GiB
Minimized DFS order, with heap simulation   11.71GiB (new)
Note that the original list scheduler order is much worse than DFS on stacked LSTMs, but (not shown here) is much better than DFS on convolutions like Inception. Also note that heap simulation packs things tighter for all instruction orders in this example, but to varying degrees. Change: 149049028
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
efc8f98d45df835bac2373e19f1da57e3a1ea2d0 |
|
28-Feb-2017 |
Jacques Pienaar <jpienaar@google.com> |
[XLA] Add basic outfeed support. Change: 148699787
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
ad12d7233da6f6b034ad409e0bd1ca0fd201e68d |
|
14-Feb-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Change BufferLiveness to allow operand buffer sharing for fused DynamicUpdateSlice instructions. Change: 147506969
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
fd3e50d2a060cde47c3aac75f200946e3d916b16 |
|
13-Feb-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Change PointsToAnalysis to (optionally) include loop fusion instructions. Change: 147341541
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
6c4077255fa4e6ae4e3e45122035f891ae803246 |
|
28-Jan-2017 |
Bjarke Hammersholt Roune <broune@google.com> |
Add sum-across-opcodes report for HLO profiling. Change: 145863928
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
9113e98115ecbeb1404edb7d14d2cf443f2484bf |
|
27-Jan-2017 |
Tayo Oguntebi <tayo@google.com> |
Addition of Outfeed HLO op. Change: 145772331
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
239493a6825f33c96d64b6a36be6616fbb41e42b |
|
25-Jan-2017 |
Mark Heffernan <meheff@google.com> |
Break out HloOrdering classes into separate files. Add CreateMemoryMinimizingSequence which constructs a sequence of the instructions in an HLO module that heuristically minimizes the total size of live buffers containing HLO outputs. Change: 145599747
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
863bab34202f650282dfe00aaa082a4796fdd839 |
|
24-Jan-2017 |
Bjarke Hammersholt Roune <broune@google.com> |
Improvements to HLO text format printing. Change: 145374835
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
4fe280c59a71e85b73e9947063147743adf2ff2b |
|
21-Jan-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Added optional string argument to infeed HLO op. Change: 145188452
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
2abeb0f1b68a73ce54e9c90459e37de581117d45 |
|
19-Jan-2017 |
HyoukJoong Lee <hyouklee@google.com> |
Add control successors to HloInstruction. Add Send/Recv cases for the ConstantVisitor in UserComputation. Change: 145001882
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
6149fc54f4567cb00619fe085a691e3513343faf |
|
19-Jan-2017 |
Mark Heffernan <meheff@google.com> |
Don't remove send, recv, or trace instructions in DCE pass. Also, opportunistically change HloComputation::Remove*, HloComputation::Replace*, and HloInstruction::Replace* to return Status. An error Status is returned if one of those instruction types (send, etc) is removed. Change: 144978731
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
0a876b717288eb2bd5ef0d39d6f91969e491b0c2 |
|
18-Jan-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Internal-only change. Change: 144835065
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
d8723bc9ce98053380aae77c91ff63ad6dbb4916 |
|
17-Jan-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
[XLA] Privatize `HloInstruction::CreateNary`, NFC This method is only used in other `HloInstruction` creator methods. Change: 144737654
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|
1e67c90e2caceeff82d09793d1ef5fa0300d219b |
|
09-Jan-2017 |
Peter Hawkins <phawkins@google.com> |
Initial open-source release of XLA: Accelerated Linear Algebra. XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators. XLA is still experimental; we are releasing it early to get the community involved. Change: 143990941
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
|