History log of /external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
Revision Date Author Comments
ed2d6181ce9df646c8b7150fac754a796ccc2f88 20-Feb-2018 Patrick Nguyen <drpng@google.com> Merge commit for internal changes

Conflicts:
RELEASE.md
configure.py
tensorflow/contrib/cmake/external/zlib.cmake
tensorflow/contrib/cmake/python_modules.txt
tensorflow/contrib/cmake/tests/cuda/compatibility_test.c
tensorflow/contrib/cmake/tests/cuda/compatibility_test.cc
tensorflow/contrib/data/python/ops/dataset_ops.py
tensorflow/contrib/gan/python/eval/python/summaries_test.py
tensorflow/contrib/layers/python/layers/layers.py
tensorflow/contrib/layers/python/layers/layers_test.py
tensorflow/contrib/tpu/profiler/pip_package/setup.py
tensorflow/core/public/version.h
tensorflow/docs_src/install/install_c.md
tensorflow/docs_src/install/install_go.md
tensorflow/docs_src/install/install_java.md
tensorflow/docs_src/install/install_linux.md
tensorflow/docs_src/install/install_mac.md
tensorflow/docs_src/install/install_sources.md
tensorflow/examples/image_retraining/retrain.py
tensorflow/python/framework/test_util.py
tensorflow/python/keras/_impl/keras/layers/lstm_test.py
tensorflow/python/layers/utils.py
tensorflow/python/ops/bitwise_ops_test.py
tensorflow/python/ops/distributions/beta.py
tensorflow/python/ops/image_ops_test.py
tensorflow/python/ops/losses/losses_impl.py
tensorflow/tools/pip_package/setup.py
ba019dc689d6393d8dba04ca57e8b01b374db14f 17-Feb-2018 Sanjoy Das <sanjoy@google.com> [XLA] Add some plumbing, documentation, verification and shape inference for Gather

Pretty much everything other than HLO verification and shape inference will fail
for Gather with Unimplemented.

Note that this CL is intentionally incomplete -- I figured it would be nicer to
get some of the boiler-platey stuff out of the way early. Let me know if you
want me to send in a larger but more complete CL instead.

PiperOrigin-RevId: 186055521
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
13df417665f216bfb527440f1fd8f04958000ec5 16-Feb-2018 A. Unique TensorFlower <gardener@tensorflow.org> [TF:XLA] Adds HostCompute HLO - a pseudo-op to represent host-side computation.

PiperOrigin-RevId: 186047964
96c2a846609d3a68f9a88c60c4c68a243f74ee44 16-Feb-2018 Bjarke Hammersholt Roune <broune@google.com> Add TODOs.

PiperOrigin-RevId: 186032527
8dfaa05d2824290b33eb922a5269f0772f53478e 16-Feb-2018 David Majnemer <majnemer@google.com> [XLA] Factor out the code which adds operands to a fusion node

This makes it easier for Hlo passes to do interesting rewrites with new,
additional parameters which were not operands to the original fusion node.

PiperOrigin-RevId: 186024182
c2a3935e2bd27d0befa7db5f9c050cfec057e5bb 11-Feb-2018 Loo Rong Jie <loorongjie@gmail.com> [MSVC] Use explicit func pointer to static method instead of lambda func
5476489053f0523b8aebab05bc39a02c089300e0 06-Feb-2018 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Sink layout sensitivity from CSE into HloInstruction::Identical, and make it the default.

PiperOrigin-RevId: 184598903
2b932c28d60ec4f248d950cd2ef69a7eb98bb66d 03-Feb-2018 Justin Lebar <jlebar@google.com> [XLA] Minor cleanups related to multi-output fusion.

- Add some comments about preexisting invariants, and add some CHECKs.

- In the LoopEmitter constructor, materialize the given
ArraySlice<IrArray> to a vector, so we don't rely on the given
ArraySlice having any particular lifetime.

- Add the invariant that the LoopEmitter constructor which takes a
list of IrArrays is only for multi-output fusion. Previously it said:
If you only pass one array, then treat it as regular fusion. But this
results in an LLVM type mismatch, because the given
target_element_generator should be passing a struct with one element.

PiperOrigin-RevId: 184365310
3be90a490c31d5a8fad70713e059bbb3e723e664 02-Feb-2018 Justin Lebar <jlebar@google.com> Internal change

PiperOrigin-RevId: 184239740
a58524fa602829459aa7eb0335a33afe1f28382a 19-Jan-2018 Chris Leary <leary@google.com> [XLA] Simplify trivial pad/reduce-window combos into broadcasts.

PiperOrigin-RevId: 182585236
7eba57baec4442640f11059caecfc10898966e00 11-Jan-2018 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Make HLO CSE faster.

Passing around copies of std::functions incurs heap allocations and deallocations, which, unfortunately, matters in this case. Minimize the amount of copies.

PiperOrigin-RevId: 181625079
4f877d4e54bb2427882f4a800607a1cf0531b293 05-Jan-2018 A. Unique TensorFlower <gardener@tensorflow.org> Make HLO device placer more stable as far as created partitions goes.
Also remove the multi-module input capability for the device placer.

PiperOrigin-RevId: 180871703
5bf26acd87d3d44183fc28cb9576cda10c0255ca 02-Jan-2018 A. Unique TensorFlower <gardener@tensorflow.org> Automated g4 rollback of changelist 180000981

PiperOrigin-RevId: 180581912
c0c2775ce3de682f7913d1aeaf50bbc4d1521934 23-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> Automated g4 rollback of changelist 179983419

PiperOrigin-RevId: 180000981
7d1072dd3374a0aa22637a0fd4a17a4ddd064110 23-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> Adds FFT for XLA: CPU via Eigen, GPU via cuFFT.

GPU support includes plan reuse with new scratch allocator per execution in fft_thunk.

PiperOrigin-RevId: 179983419
210076afd0eb5a4c4f7f54bb079b75f92b087b3f 23-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> Output of a slice op can alias its operand.

PiperOrigin-RevId: 179969317
b2aa6950db67ab980012c05d496401200ad60320 22-Dec-2017 Justin Lebar <jlebar@google.com> [XLA] Print out missing extra-info for many instructions in the HLO graph dumper.

Now we use the same functionality as HloInstruction::ToString() to print
instructions' extra info. This fills in a lot of previously-missing
info, like reduce-windows' windows, and dots' dot-dimension-numbers.

PiperOrigin-RevId: 179892469
d2355fcee9f47cc2e8225f8ff54f7c12fa8045f0 18-Dec-2017 Justin Lebar <jlebar@google.com> [XLA] Fix comments on HloInstruction::epsilon() and HloInstruction::feature_index().

These functions can be called for kBatchNorm{Training,Inference,Grad},
not just kBatchNormTraining.

No functional change.

PiperOrigin-RevId: 179363059
f806269602219d5095265d036f294cc9a6260971 15-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Remove '%' when printing the hlo text in short parsable mode.

PiperOrigin-RevId: 179138523
d57ab2c4a7cd13e47f942aaff495912fdc96f84a 15-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Allow omitting operands shapes and program shapes.

PiperOrigin-RevId: 179132435
b03f0e408710c5a92b87d748360b03c6cb60760d 15-Dec-2017 Justin Lebar <jlebar@google.com> [XLA] Express the default options to ToString using an overload, rather than a default param.

No functional change.

The motivation for this is that GDB ignores default params, but resolves
overloads just fine.

PiperOrigin-RevId: 179125588
0ea2d74f883914109eb154bcf2a7d61ae0557f2d 15-Dec-2017 Justin Lebar <jlebar@google.com> [XLA] Remove the notion of a "parameter name" separate from the instruction's name.

Also set the instruction's name in the HLO parser, so that after
parsing, the instructions have the names they're given in the input
string.

PiperOrigin-RevId: 179119003
a99b32fb149d028cd31fe638f81c6ca56c6e3b57 14-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Gather the bool parameters into one thing to control the text format.

PiperOrigin-RevId: 179079727
2adbc217b3eeed329d077050e0f1f7d88edd86d7 12-Dec-2017 Sanjoy Das <sanjoy@google.com> [XLA:CPU] Teach the CPU layout assignment about dot dimension numbers

There is no great need for this yet, but I noticed that the test cases were
broken (they were constructing dots with unset dimension numbers), and one thing
led to another.

PiperOrigin-RevId: 178713597
fe8406149feec453250905965a14285465cd2063 07-Dec-2017 Shanqing Cai <cais@google.com> Merge changes from github.

PiperOrigin-RevId: 178185697
22767d59b3c6958ed690814ff77e29ee1d458b18 06-Dec-2017 Bjarke Hammersholt Roune <broune@google.com> Allow CrossReplicaSum to take multiple operands internally.

PiperOrigin-RevId: 178043362
6eec9c2ea33f3b86012cb0ea2aeb9e49e65bc716 01-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Hlo parser: support rng and reduce-precision. Also simplify the lexer by regarding several things as identifiers.

PiperOrigin-RevId: 177548483
4146ff1259c0b4ada8afbbad11a7b37d8373d1b9 30-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Adds Dot with DotDimensionNumbers proto for specifying arbitrary contracting and batch dimensions.

PiperOrigin-RevId: 177481231
b1d8c59e9b014b527fb2fbef9ce9afc14dbc4938 22-Nov-2017 Yifei Feng <yifeif@google.com> Merge changes from github.

PiperOrigin-RevId: 176695926
e70c00950d295c519fd9c7f8b12e13a3c5aaf710 22-Nov-2017 Yifei Feng <yifeif@google.com> Automated g4 rollback of changelist 176615107

PiperOrigin-RevId: 176622438
c6d603f02e1a98f871912cda6716cdcbed6b439e 22-Nov-2017 Yifei Feng <yifeif@google.com> Merge changes from github.

PiperOrigin-RevId: 176615107
ef3ee202659a2a49afcd9898451bf9b1256a2757 22-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Add BitcastConvert HLO op to enable bitwise operations on
floating point types.

PiperOrigin-RevId: 176610007
8547044d4dacaa0d6001578634a44b488dd23601 18-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Add conditional HloInstruction and handle conditional in DFS visitors.

PiperOrigin-RevId: 176175297
55ee41a98d50e200eda314ebf08f092000477f6e 17-Nov-2017 Mark Heffernan <meheff@google.com> When constructing fusion computations from a proto, do not uniquify the names. The names are already unique and uniquifying them again will mutate them resulting in inconsistent names between the proto and the constructed HLO.

PiperOrigin-RevId: 176035108
f9e3e8d8731daf338b6dc743aef84c35740ca037 14-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> Hlo parser: support fusion.

Also,
- Add a HloInstruction::CreateFusion interface that creates a fusion instruction with given fusion computation. Add a HloComputation::SetFusionInstruction interface to help do that.
- Change how we print fusion kind. Before this change we print fusion kind together with the opcode, e.g., fusion:kLoop, which is not easy to parse. Now we append fusion kind as an attribute.
- Print fusion computation the same way as other computations, instead of nested in an instruction.

PiperOrigin-RevId: 175621768
9c6eebabc71906d240338adc89fa838bd5635aa0 11-Nov-2017 HyoukJoong Lee <hyouklee@google.com> Add comment on HloPtrComparator.

PiperOrigin-RevId: 175370054
61aebf140e12e2ad834dc94a83f23fc574c79340 11-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> Hlo parser: support metadata.

Also give metadata its own format.

PiperOrigin-RevId: 175356154
8614ef614245cfcfdd09bda0d633d5aa4f6e856e 10-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> Extend the Array class with more functionality.

PiperOrigin-RevId: 175277161
8392a4b8e9d6d7ccbfde15dcdda0477c2791b6dc 10-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> Hlo parser: support padding.

Also, give PaddingConfig its own ToString format.

PiperOrigin-RevId: 175239832
51895becce83ef4dc8bac263377d158fc50e4d53 09-Nov-2017 HyoukJoong Lee <hyouklee@google.com> Change for asynchronous Send and Recv by splitting Send into {Send, SendDone}
and Recv into {Recv, RecvDone}. See operation_semantics.md for the updated
semantics.

PiperOrigin-RevId: 175216012
456929281592f14d50443cfbdaa2f6b36167a134 03-Nov-2017 Mark Heffernan <meheff@google.com> Rollback copy insertion change because it results in a DCHECK with an internal model.
END_PUBLIC

BEGIN_PUBLIC
Automated g4 rollback of changelist 174423881

PiperOrigin-RevId: 174505237
96c415ad77c20e1cf2da5e61f85e24fd6c36eb28 03-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Use maps with a deterministic iteration order for HloInstruction*.

Convert a bunch of std::maps with HloInstruction* and const HloInstruction* keys to use a comparator that is based on the unique_id of the instruction rather than the pointer value.

PiperOrigin-RevId: 174474868
7bb2d57b0b051d1cf8dd74d3276bf5a452774172 03-Nov-2017 Mark Heffernan <meheff@google.com> Rewrite CopyInsertion to use module-scoped HloAliasAnalysis. The net effect (number of copies inserted) is roughly similar to the existing implementation, but the new implementation is much more general. The new implementation can handle entry argument buffer reuse with minimal modification, for example.

Some unnecessary copies are still added due to deficiencies in buffer assignment (b/62548313), but these can be removed when buffer assignment also uses HloAliasAnalysis.

Also address a few issues uncovered with this cl:

(1) For inplace dynamic slice in llvm backends, truncate rather than wrap the slice. This matches the behavior of the non-inplace variant.

(2) Disable SelectBetweenPredTuples test on GPU. The test introduces top-level buffer ambiguity which is not tolerated by the gpu backend.

(3) When deserializing HLO from a proto, do not uniquify instruction names in fused computations.

(4) In dataflow analysis, don't deallocate deleted HloValues during propagation.

(5) In dataflow analysis, fix issue with live_out_of_computation property.

PiperOrigin-RevId: 174423881
274e9ed51ea6cc09a0b5fc1cee4756ac0e9aa525 03-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> [TF:XLA] Add a const HLO visitor.

Use it in the HLO cost analysis pass.

PiperOrigin-RevId: 174411043
53a4fcbdbad571e659203733f6a07ba82651d40b 02-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> Fixed HloComputation/HloInstruction clone to allow deep clones, and to prevent the cloned instructions and computations from retaining live links to their original parent modules and computations.

PiperOrigin-RevId: 174271432
3b0414872f08cfabbf71a495ad661a7c892c76d8 02-Nov-2017 Chris Leary <leary@google.com> [XLA] Allow full dumps of constant values via boolean parameter.

PiperOrigin-RevId: 174257660
efcbf6e34e4519172d38be76c08c2d99792fd7be 30-Oct-2017 A. Unique TensorFlower <gardener@tensorflow.org> Supported in this CL:
* Attaching sharding descriptors to HLO ops
* Partitioning the HLO graph into per-device computations based on those sharding descriptors.
* All operator support for device placement and ops replicated on all devices.
* Elementwise op support for tiled shardings.
* 2D Convolution support for tiled shardings (no stride or dilation support).

PiperOrigin-RevId: 173946036
5dd569cf026bae92330a194c8f2895d0f48149d9 14-Oct-2017 Mark Heffernan <meheff@google.com> Make the HLO proto representation (hlo.proto) full fidelity. Hlo modules can be serialized to HLO protos and deserialized without any information loss.

As part of this change, a bug is fixed in NameUniquer. Previously, passing names with numeric suffixes could result in name collisions.

PiperOrigin-RevId: 172161360
403e51018b3c47cd5989d6b50776e235221fade4 10-Oct-2017 Justin Lebar <jlebar@google.com> [XLA] Factor out repeated LatestNonGteAncestorAndIndex helper.

PiperOrigin-RevId: 171620470
76db7553ab2998116a62d6c242aa39373a362993 29-Sep-2017 Chris Leary <leary@google.com> [XLA] Make it possible to inline calls to side-effecting computations.

PiperOrigin-RevId: 170515496
c7d4e4bf9cdc9aa29de6e6c3d97e4a1c4f2f25d9 29-Sep-2017 Sanjoy Das <sanjoy@google.com> Automated g4 rollback of changelist 170435356

PiperOrigin-RevId: 170507630
9b1b5d85b9ce3c812dc772da1f3f5d09581e5b49 29-Sep-2017 Justin Lebar <jlebar@google.com> [XLA] Make HloComputation::instructions() return a view of HloInstruction*s.

Currently it returns a view of unique_ptr<HloInstruction>s. But the
fact that these are unique_ptrs is an implementation detail, and it's
ugly to leak it everywhere.

PiperOrigin-RevId: 170445375
872917e78f7628c00f93162c70d74e8b659e0123 29-Sep-2017 Sanjoy Das <sanjoy@google.com> Automated g4 rollback of changelist 170430143

PiperOrigin-RevId: 170435356
bda0dde93049505b113aa78f3291f47546fd9265 29-Sep-2017 Sanjoy Das <sanjoy@google.com> Avoid creating fusions that reuse their inputs.

We generally avoid creating such fusions, but it looks like we missed the case
where elementwise operations implicitly broadcast their inputs.

PiperOrigin-RevId: 170430143
f972d800ca3accc9af0ad5b9dcabbc5d9b125ab5 28-Sep-2017 Justin Lebar <jlebar@google.com> [XLA] Replace HloComputation::ReplaceUsesOfInstruction with HloInstruction::ReplaceAllUsesWith.

RAUW used to be *almost* synonymous with RUOI, except RAUW didn't update
the computation's root. This was a dangerous footgun -- if you
accidentally called RAUW when you wanted RUOI (which you almost always
did), your code would work perfectly, except when the relevant node
happened to be the root of a computation.

This change simplifies our APIs so there's just one Right Way To Do It,
by making RAUW update the computation.

PiperOrigin-RevId: 170290230
5cac28c41af785532e90101787cf85545cdac410 27-Sep-2017 Justin Lebar <jlebar@google.com> [XLA] Add HloEvaluator::EvaluateWithSubstitutions().

This evaluates an HLO, using a given map of literals to determine the
values of some of its operands.

PiperOrigin-RevId: 170215954
85c4a379985b46930ece49edc4347af628ee2928 24-Sep-2017 Peter Hawkins <phawkins@google.com> [XLA] Adds an API to attach a device assignment to HLO operators.

PiperOrigin-RevId: 169841868
c82a933f449e637ee83244d2c40162e24cdde0e1 15-Sep-2017 Sanjoy Das <sanjoy@google.com> Lower vector-matrix dot to LLVM IR if the RHS of the dot can be made
column major.

The naive dot lowering to LLVM IR (already present in XLA today) is
cache efficient if the dot has LHS of shape [1,K]{1,0} and RHS of
shape [K x N]{0,1}. This change teaches the layout assignment pass to
exploit this property by converting a constant RHS matrix to a column
major layout when possible.

Couple of related things I had to touch in this change:

- In LayoutAssignmentTest.TupleLayout we used to generate a kCopy to satisfy
the conflicting constraints between the result and the constant shapes, but
with this change we change the layout of the constants themselves. So the
EXPECT_FALSE is now an EXPECT_TRUE.

- The extra instruction layout constraints added at the end of
CpuLayoutAssignment::AddBackendConstraints seemed redundant. The layout
assignment pass already tries to make all unconstrained buffers have the
default row-major layout. Moreover, they were blocking this optimization in
some cases by introducing conflicting constraints.

- The changes to literal_util.h have to be made to deal with the
Literal::Relayout calls we now get on literals of various types.

PiperOrigin-RevId: 168761204
27ce2a4ac956941ba8a0b9aaaa77acc0aa861fef 07-Sep-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Rip CheckFusionNode() out of instruction, and move it into the HLO verifier instead.

CheckFusionNode() is linear in the size of the fusion node, and was called once per Fuse(), leading to run-time quadratic in the fusion node's size.

PiperOrigin-RevId: 167812735
24ddee8659c3fd7f8d2db02efef7d96de53cdbae 04-Sep-2017 A. Unique TensorFlower <gardener@tensorflow.org> Expose component parts of HloInstruction's string representation.

PiperOrigin-RevId: 167516835
8b20ddf3e0eedb52a7ae0f10a55658e64efc4d1a 31-Aug-2017 David Majnemer <majnemer@google.com> [XLA] Sanity check the list of called computations for fusion nodes

called_computations for a fusion node should only include the fusion
computation that it calls.

PiperOrigin-RevId: 167149669
e565d1f1fced69789feb10f1ea1241157ec95f93 30-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Refactor parent-fusion-instruction pointer into HloComputation, not HloInstruction.

Presently, each instruction inside a fusion computation contains a pointer to the fusion instruction that contains the computation, which is redundant since this is common across the entire computation. This leads to lots of places where this pointer must be set when adding an instruction to the fusion computation (and bugs such as b/65177535 when one is missed), as well as code to check that it's set correctly. In addition, this is simply unnecessary data bloat.

Moreover, the computation itself does not contain a pointer to the fusion instruction that references it, which leads to odd circumlocutions in the HloComputation code that retrieve the fusion instruction from the computation's root instruction.

Thus, this CL moves this pointer into the HloComputation class (replacing the is_fusion_computation_ bool value), and refactor the uses as necessary.

PiperOrigin-RevId: 167039280
797ca0d482457185f35d46cbce4c430f55b8b66a 30-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Teach HloComputation::Reparent to properly handle reparenting into fusion computations.

This also moves HloInstruction::CheckFusionInstruction() out of "private", and adds calls to it in the reduce-precision-insertion test to confirm that the reduce-precision-insertion pass maintains valid fusion computations. (These checks then fail without the fix to HloComputation::Reparent.)

PiperOrigin-RevId: 167031741
5ead76420dee762a5f710fda6893075f1292d5d3 19-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> Reduce XLA compile time by ~7% for a convolutional image model:

* Added CompactPointerSet<T>, which is optimized for set size <= 1.
* Changed expensive CHECKs to DCHECKS in buffer_assignment.cc
* Reserve space in DFS state array before starting DFS.
* Use unsigned arithmetic in DFS state maintenance.
* HloInstruction:
- Moved frequently used fields to start for better cache locality.
- Use InlinedVector instead of vector for operand array.
- Use InlinedVector instead of vector for DFS stack.
* Pre-compute "is array" and "is tuple" for LogicalBuffer.
* PointsToSet:
- Combine two ShapeTrees into one.
- Use CompactPointerSet instead of std::set to hold sources.
- Use CompactPointerSet instead of std::set to hold flattened buffers.
* ShapeTree: use unique_ptr instead of optional for shape storage
(reduces size and destruction overhead).
* Add proper const qualifiers to some FlatSet iterator methods.

Co-author=jeff
PiperOrigin-RevId: 165759117
7359fec792e4efec1670a12332bb524a5608b215 18-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> Implement Batchnorm Inference by expanding them into smaller ops.

1. Add batch norm inference support in batchnorm_rewriter
2. Connect xla's batchnorm inference to tf's FusedBatchNorm

RELNOTES: n/a
PiperOrigin-RevId: 165655351
0dfe9d308e4a738252a53f52654fc3bbdf74d809 17-Aug-2017 Chris Leary <leary@google.com> [XLA] Some judicious inlining to speed up large compiles by a second or so.

PiperOrigin-RevId: 165599564
9fc99811d70d4671b6f2bdabf8754ddf2d24e427 16-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> Add a function that returns whether an hlo is elementwise binary.

PiperOrigin-RevId: 165470975
822603aed3f20159f06284af5ce35efa81b95ed6 11-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merging sibling fusion instructions using multi_output_fusion

PiperOrigin-RevId: 164920220
fcbb00ff21f55ffede44793002d2f9d4f67c2306 10-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> Clean up fused_instructions method in HloInstruction

PiperOrigin-RevId: 164879220
b4001ea6934ce0b7de02cc52ffbde778dcd62dca 09-Aug-2017 HyoukJoong Lee <hyouklee@google.com> Consider the nested computations when checking if an instruction is
removable from a computation. This is to prevent DCE from removing a
while instruction that includes a send/recv instruction.

PiperOrigin-RevId: 164722478
7ac31c1a3c66a02f158f2819f730a1e7438a2327 03-Aug-2017 Jeffrey A. Dean <jeff@google.com> Assign unique ids at the HloModule level to each HloInstruction object.
Use these when doing DFS over a graph in order to store the visited bits
using an array of two-bit values (in the dfs_hlo_visitor.{h,cc} module),
rather than a significantly larger and more expensive hash table to
store this state.

Ids are initially -1 and are assigned when unique names are assigned to
the HloInstruction objects.

Speeds up compilation of a convolutional image model by ~5.3%

PiperOrigin-RevId: 164050902
efb7fb8e58bbf7a04ed80a2affed516cdef15e0b 31-Jul-2017 A. Unique TensorFlower <gardener@tensorflow.org> Use XLA_VLOG_LINES() in literal_test_util to avoid truncation of large tensors.

PiperOrigin-RevId: 163745522
78a9b95436f45438abf3e818307f707e9ae92343 26-Jul-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Finish normalizing fusion computations into standard computations

PiperOrigin-RevId: 163210327
50cd64cbb91997a6ef701e4d5ada32f6e55e0d29 08-Jul-2017 A. Unique TensorFlower <gardener@tensorflow.org> Rename HloInstruction::slice_stride(int64) to
HloInstruction::slice_strides(int64).

This is so that the function matches the naming
convention of the other slice accessors.

PiperOrigin-RevId: 161272516
7d0f6385f8e7637e155ef9c340c19aded365a6ff 07-Jul-2017 A. Unique TensorFlower <gardener@tensorflow.org> [BatchNorm] Skeleton code to implement BatchNormGrad

This CL sets up all the boilerplate code needed to implement BatchNormGrad. None of the
backends has been implemented yet.

RELNOTES: n/a
PiperOrigin-RevId: 161161713
14c5bd7e654ce50f8d1dfbbd87499a0b2cb52b64 06-Jul-2017 A. Unique TensorFlower <gardener@tensorflow.org> Allow "::" in HloInstruction names again.

FullyQualifiedName() used "::" as a separator, so it was important that "::" should not be used in HloInstruction names. Now that FullyQualifiedName() no longer exists, we can remove this restriction.

PiperOrigin-RevId: 161070812
94d52acdc0087d5829f220c4d46eea67e0d30305 05-Jul-2017 A. Unique TensorFlower <gardener@tensorflow.org> Eliminate HloInstruction::FullyQualifiedName().

Now that HloInstruction names are unique within an HloModule, we can replace
all uses of FullyQualifiedName() with simply name().

PiperOrigin-RevId: 160961583
a7032f21d72dd051f09b94733ed890dcd7ceaac8 04-Jul-2017 A. Unique TensorFlower <gardener@tensorflow.org> Clarify rules for names of HloInstructions and HloOpcodes.

- The name of an HloInstruction may not contain "::", as this is used as a
separator in fully qualified names.
- The name of an HloOpcode may not contain ':', as this is used as a separator
in extended opcode strings.

PiperOrigin-RevId: 160894413
50b999a8336d19400ab75aea66fe46eca2f5fe0b 28-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merge changes from github.

PiperOrigin-RevId: 160344052
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
1464b9930de871fd11870941963253670f737c23 27-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Add more tests for BatchNormTraining.
RELNOTES: n/a

PiperOrigin-RevId: 160307959
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
1fa73c53ab95693f070ce70e6be0c644d83c163a 26-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Automated g4 rollback of changelist 160182040

PiperOrigin-RevId: 160190881
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
f3c89936e97c99dead1ca3310246691c1b221adf 26-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merge changes from github.
END_PUBLIC

Note: this CL will break builds. cl/159887762 to follow to fix all the breakages.

---
Commit 2336cdf7f authored by Maxwell Paul Brickner<mbrickn@users.noreply.github.com>
Committed by gunan<gunan@google.com>:
Updated link to use HTTPS (#10998)

Howdy!

I just updated a link to use https instead of http.

Thanks!
---
Commit ad0892df1 authored by Luke Iwanski<luke@codeplay.com>
Committed by Luke Iwanski<luke@codeplay.com>:
[OpenCL] Fixes run_metadata_test for SYCL

This test is designed to test CUDA specific behavior

---
Commit 6b37a0725 authored by Todd Wang<toddwang@gmail.com>
Committed by GitHub<noreply@github.com>:
Update comments
---
Commit 1699d904a authored by John Lawson<john@codeplay.com>
Committed by Luke Iwanski<luke@codeplay.com>:
[OpenCL] Fixes CUDA specific test run on SYCL (#56)

The testBadParentValuesOnGPU should only be run on CUDA devices, as the
test checks for particular CUDA behaviour. We don't actually provide a
SYCL kernel for GatherTree and so it's not a problem that the tests
don't target SYCL.
---
Commit 3c1946230 authored by myPrecious<Moriadry@users.noreply.github.com>
Committed by Shanqing Cai<cais@google.com>:
Java API to get the size of specified input list of operations. (#10865)

* Java API to get the size of specified input list of operations

* remove unnecessary explanation to avoid bringing a new term to users.

---
Commit e911c7480 authored by Luke Iwanski<luke@codeplay.com>
Committed by Luke Iwanski<luke@codeplay.com>:
[OpenCL] REGISTER -> REGISTER6

---
Commit fbf6c4cec authored by superryanguo<superryanguo@gmail.com>
Committed by superryanguo<superryanguo@gmail.com>:
Simplify the Quickstart section; the weblink is better

---
Commit 72e2918cc authored by Taehoon Lee<taehoonlee@snu.ac.kr>
Committed by Taehoon Lee<taehoonlee@snu.ac.kr>:
Fix typos

---
Commit 90c4406b7 authored by Rishabh Patel<patelrishabh@users.noreply.github.com>
Committed by GitHub<noreply@github.com>:
Correct the learning rate as per the code snippet
---
Commit 03da61134 authored by Todd Wang<toddwang@gmail.com>
Committed by GitHub<noreply@github.com>:
Update ir_array.cc
---
Commit 2df6cd3ac authored by Todd Wang<toddwang@gmail.com>
Committed by GitHub<noreply@github.com>:
Another try
---
Commit af0cbace1 authored by Luke Iwanski<luke@codeplay.com>
Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>:
[OpenCL] Transpose to go through Eigen (#10321)

---
Commit fc7361081 authored by Luke Iwanski<luke@codeplay.com>
Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>:
[OpenCL] Registers RGBToHSV and HSVToRGB (#91) (#10848)

* [OpenCL] Added RGBToHSV and HSVToRGB

* Aligning '\'
---
Commit 832894ef8 authored by Luke Iwanski<luke@codeplay.com>
Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>:
[OpenCL] Registers AdjustContrastv2 (#10949)

* [OpenCL] Registers AdjustContrastv2 (#93)

* [OpenCL] Extended adjust_contrast_op_benchmark_test for OpenCL (#96)

* [OpenCL] Extended adjust_contrast_op_benchmark_test for OpenCL

* simplified to #ifndef

* Changed to "#if GOOGLE_CUDA"

* Update adjust_contrast_op_benchmark_test.cc

* Added comments

---
Commit cb4c2f8d1 authored by Yifei Feng<yifeif@google.com>
Committed by Yifei Feng<yifeif@google.com>:
Make TransferBufferToInFeed not virtual so it compiles.

---
Commit e89f04d80 authored by Yifei Feng<yifeif@google.com>
Committed by Yifei Feng<yifeif@google.com>:
Fix calling Literal member functions.

---
Commit 15a8df724 authored by Yifei Feng<yifeif@google.com>
Committed by Yifei Feng<yifeif@google.com>:
Fix mac build
clone from meheff's change:
[XLA] Change return type of DeviceAssignment::Deserialize to fix build
breakage on mac.
The mac build had the following error:

error: incomplete type 'xla::DeviceAssignment' used in type trait
expression

This was due to a static method returning a StatusOr<DeviceAssignment>
inside of the definition of DeviceAssignment.

---
Commit a54d43fa4 authored by Yifei Feng<yifeif@google.com>
Committed by Yifei Feng<yifeif@google.com>:
Replace LiteralUtil to Literal in compiler/plugin/executor

---
Commit 88a6bb80c authored by Guenther Schmuelling<guschmue@microsoft.com>
Committed by Guenther Schmuelling<guschmue@microsoft.com>:
expand inline for debug builds to limit number of symbols

---
Commit 62fb49d31 authored by Yifei Feng<yifeif@google.com>
Committed by Yifei Feng<yifeif@google.com>:
Fix visibility error for contrib/remote_fused_graph/pylib/BUILD.

---
Commit 4c75252f2 authored by Mark Neumann<markn@allenai.org>
Committed by Mark Neumann<markn@allenai.org>:
fix initial test values to avoid numerical instability

---
Commit b58d98353 authored by sj6077<epik03sj@gmail.com>
Committed by Benoit Steiner<benoitsteiner@users.noreply.github.com>:
Fixes of AutoParallel bug (#10368)

* Fix the bug that auto_parallel could replicate variable snapshot name

* Use NodeName in grappler:utils instead of substr, convert variables->variable_def of grappler item

* remove variable_def from grappler item, exclude snapshot nodes from dont_replicate_nodes in auto_parallel

---
Commit a286b7db8 authored by Yifei Feng<yifeif@google.com>
Committed by Yifei Feng<yifeif@google.com>:
Make debug_test slice integer.

---
Commit 97fcfdfa6 authored by Toby Boyd<tobyboyd@google.com>
Committed by GitHub<noreply@github.com>:
Fixed path to seq2seq.py and minor formatting
---
Commit 63c1befb8 authored by Anish Shah<shah.anish07@gmail.com>
Committed by Anish Shah<shah.anish07@gmail.com>:
Improve docs for tf.nn.depthwise_conv2d_native

---
Commit 8d42202b2 authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Yong Tang<yong.tang.github@outlook.com>:
Fix mismatched delete in mkl_tfconv_op.cc

This fix fixes mismatched new[]-delete in mkl_tfconv_op.cc

(the file went through clang-format so there are some additional
changes)

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

---
Commit 26301bd55 authored by Danny Goodman<goodman.danny@gmail.com>
Committed by Danny Goodman<goodman.danny@gmail.com>:
fix error format

---
Commit b3f33ad46 authored by Yao Zhang<yaozhang@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Make changes to prepare for the fused option of batch norm to be set to None (None means using fused batch norm if possible).

PiperOrigin-RevId: 159649743

---
Commit a4a469832 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[XLA] Add tests for select ops and while loops that produce tuples that contain predicates.

PiperOrigin-RevId: 159645900

---
Commit 980d3f2be authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Use C API to implement Operation.name property

This name property is used in many existing tests including those that
already run with C API enabled (math_ops_test, framework_ops_test,
session_test, session_partial_run_test, math_ops_test_gpu, etc).

PiperOrigin-RevId: 159645767

---
Commit 26239c706 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Previously we didn't have an implementation of BatchNormInference and BatchNormTraining, which gave a linker error if anyone ever tried to call them. A dummy implementation is friendlier than a linker error.

PiperOrigin-RevId: 159645612

---
Commit f671c5caa authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
BEGIN_PUBLIC
Automated g4 rollback of changelist 159570549

PiperOrigin-RevId: 160182040
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
b1c56cc5d971c74062d140a1c5ce98afaa085402 22-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Make HloModule clonable

This CL makes HloModule clonable, which is necessary when we want to run the same compilation twice with the same input.

PiperOrigin-RevId: 159874256
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
3b41352a3177c2fe8a1329e8981b285bb6aacf8b 19-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA:CPU] Thread-parallel CPU backend (work in progress).
*) Partitions HLO instructions along outer dimensions, based on simple cost model.
*) Emits loop nests with dynamic outer loop bounds (for partitions), leaves inner loop bounds static (for optimizations).
*) Dispatches parallel tasks on thread pool for execution.
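The partitioning idea above can be illustrated with a toy sketch (this is not the XLA code; the function name and the element-wise body are made up for illustration): the outer dimension is split into contiguous shards with dynamic bounds, and each shard runs on its own thread while the inner loop body stays static.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Toy sketch of outer-dimension partitioning (illustrative only): split an
// element-wise op into contiguous shards and run each shard on its own
// thread. The per-element body is fixed, so it remains optimizable.
void ParallelElementwise(const std::vector<float>& in, std::vector<float>& out,
                         int num_threads) {
  const size_t n = in.size();
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    // Dynamic outer bounds: shard t covers [begin, end).
    size_t begin = n * t / num_threads;
    size_t end = n * (t + 1) / num_threads;
    workers.emplace_back([&in, &out, begin, end] {
      for (size_t i = begin; i < end; ++i) out[i] = in[i] * 2.0f + 1.0f;
    });
  }
  for (auto& w : workers) w.join();
}
```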

Simple element-wise fusion benchmark:

CPU: Intel Sandybridge with HyperThreading (16 cores) dL1:32KB dL2:256KB dL3:20MB
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------
BM_ParallelFusion/T1 16821490 16740939 100 237.791MB/s
BM_ParallelFusion/T2 9175467 17826232 100 435.945MB/s
BM_ParallelFusion/T4 5106019 18875761 100 783.389MB/s
BM_ParallelFusion/T8 2833598 19624622 233 1.379GB/s
BM_ParallelFusion/T16 1995259 26541594 344 1.958GB/s

Performance on some select model benchmarks (more work is needed here, but we wanted to get this CL in and iterate).
Benchmark runs with 16 threads and wall time reported in seconds.

InceptionResnetV2.inception_resnet_v2_200x200x20x1000_inference_xla_cpu
wall_time(old): 7.97818803787
wall_time(new): 4.328297019

InceptionV3.inception_v3_200x200x20x1000_inference_xla_cpu
wall_time(old): 2.96792650223
wall_time(new): 1.21296644211

InceptionResnetV2.inception_resnet_v2_200x200x20x1000_training_xla_cpu
wall_time(old): 42.0342495441
wall_time(new): 17.9182584286

InceptionV3.inception_v3_200x200x20x1000_training_xla_cpu
wall_time(old): 6.99778497219
wall_time(new): 3.95318603516

BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward
wall_time(old): 11.869822979
wall_time(new): 7.89778208733

BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward_backward
wall_time(old): 38.1911079884
wall_time(new): 29.8181960583

PiperOrigin-RevId: 159474444
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
9d2a432ce74eab4c439fe8c60389e4da9d6c92b2 17-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Add plumbing for a ReducePrecision operation.

This CL is the first part of a series that adds a ReducePrecision operation for experimenting with the effects of reduced-precision storage of intermediate values. ReducePrecision is a Unary operation parameterized on floating-point exponent and mantissa bit sizes, and rounds the input data as if it were converted to a floating-point value with the given bit sizes and then converted back to "normal" F32 data.

Using arbitrary parameterized values to describe the lower-precision value type, rather than hardcoding this as a reduction to IEEE f16, allows us to do more flexible experiments -- e.g., "Is this training error due to the reduced mantissa precision, or due to the reduced exponent range?" or "Is this a smooth degradation with reduced precision or is there a sudden drop at some value?" -- which may suggest software mitigations for the effects.

This version of the CL adds the kReducePrecision instruction opcode, and the overall plumbing to support the operation. To allow testing, it includes an exceptionally simple implementation of the actual operation that returns "unimplemented" except for the exponent and mantissa bit sizes where it is a complete no-op.
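The "round as if stored at lower precision, then widen back" behavior can be sketched for the mantissa half of the parameterization (this is an assumed, simplified model, not XLA's implementation; exponent-range clamping and NaN/Inf/denormal edge cases are omitted):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative sketch: round an F32 value as if it kept only `mantissa_bits`
// fraction bits, then return it as a normal F32. F32 stores 23 fraction
// bits; we round to nearest by adding half of the last kept ulp, letting a
// mantissa carry propagate into the exponent, then truncate the rest.
float ReduceMantissaPrecision(float value, int mantissa_bits) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  const int dropped = 23 - mantissa_bits;
  if (dropped > 0) {
    bits += (1u << (dropped - 1));      // Round to nearest.
    bits &= ~((1u << dropped) - 1u);    // Truncate dropped fraction bits.
  }
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

With mantissa_bits == 23 the function is a complete no-op, matching the "no-op at full bit sizes" behavior described above.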

PiperOrigin-RevId: 159295615
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
a7c36173cabcc1289a836e8143accb5f0914b19a 14-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Use a non-recursive DFS in HloInstruction::Accept to avoid stack
overflow on deep graphs

Even with this fix, we don't finish compiling the exact test case from
b/38494745 in a reasonable amount of time (we spend a lot of time
inside HloInstruction::FusionReusesParamElements::ComputeInternal, for
instance), so I've used a smaller graph depth for now to avoid timing
out the test.
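The non-recursive traversal can be sketched as a post-order DFS with an explicit stack (a minimal sketch with a hypothetical `Node` type standing in for HloInstruction; the real Accept visitor is richer): depth is then bounded by heap size rather than the call stack.

```cpp
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical node type for illustration; operands point at producers.
struct Node {
  int id;
  std::vector<Node*> operands;
};

// Post-order visit without recursion: each stack frame records whether the
// node's operands have already been expanded. Operands are emitted before
// their users, and shared nodes are visited exactly once.
std::vector<int> PostOrder(Node* root) {
  std::vector<int> order;
  std::unordered_set<Node*> visited;
  std::vector<std::pair<Node*, bool>> stack{{root, false}};
  while (!stack.empty()) {
    auto [node, expanded] = stack.back();
    stack.pop_back();
    if (visited.count(node)) continue;
    if (expanded) {
      visited.insert(node);
      order.push_back(node->id);
    } else {
      stack.push_back({node, true});  // Revisit after operands finish.
      for (Node* op : node->operands) stack.push_back({op, false});
    }
  }
  return order;
}
```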

PiperOrigin-RevId: 159026595
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
b6039c875290cdd5c9a62e01393b75b928827504 14-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> We believe a fused version of batch_norm_op can speed the algorithm up. This pr implements a new op: fused_batch_norm_op in tf-xla and HLO.

This is the CPU implementation for batch norm training. This CL is big, but
a lot of the code is boilerplate.

PiperOrigin-RevId: 158930166
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
94085bee74557f34fd7ad3bef969eecf6c8c4f4e 07-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Replace std::function object with regular function.

The function is called recursively, and the std::function object had only existed to allow recursion from within a lambda expression. A regular function should be cheaper than a polymorphic function wrapper.
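The pattern being removed can be shown in miniature (an illustrative example, not the actual XLA code): a lambda cannot name itself, so recursive lambdas are typically wrapped in a std::function, paying for a type-erased call on every recursion; an ordinary function recurses directly.

```cpp
#include <functional>

// The old shape: a std::function wrapper exists only so the lambda can
// refer to itself, at the cost of a polymorphic call per recursion.
int SumToViaLambda(int n) {
  std::function<int(int)> go = [&go](int k) {
    return k == 0 ? 0 : k + go(k - 1);
  };
  return go(n);
}

// The cheaper replacement: a regular function recursing on itself.
int SumTo(int n) { return n == 0 ? 0 : n + SumTo(n - 1); }
```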

PiperOrigin-RevId: 158292415
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
02ac85399d4fb35d5055ecf426632b9446a70041 01-Jun-2017 A. Unique TensorFlower <gardener@tensorflow.org> Introduce new class Literal to replace protobuf Literal.

This renames the existing Literal message to LiteralProto and introduces a new
C++ class named Literal to replace it.

The LiteralProto is only used at RPC boundaries, or when protobuf-specific
functionality is required. The Literal class offers a 'ToProto' function to
generate a new LiteralProto message when necessary.

Currently, all the static functions in class LiteralUtil, just forward to their
counterparts in class Literal. This will change in a future CL.

Class Literal implements all the buffers as std::vectors. The only exception
is preds(): given the std::vector<bool> representation, it is unusable for
the semantics we require (it's not possible to get the address of the
underlying storage, for instance).

The CL adds a BoolVector class to work around that issue.
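The underlying problem is that std::vector<bool> is a packed specialization, so no bool* into its storage exists. A minimal sketch of the workaround idea (the shape of this class is assumed for illustration, not the actual XLA BoolVector) stores one byte per value instead:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative byte-backed bool buffer: unlike std::vector<bool>, a real
// pointer to contiguous storage can be handed out.
class BoolVector {
 public:
  explicit BoolVector(size_t n, bool value = false)
      : data_(n, value ? 1 : 0) {}
  void Set(size_t i, bool v) { data_[i] = v ? 1 : 0; }
  bool Get(size_t i) const { return data_[i] != 0; }
  const uint8_t* data() const { return data_.data(); }  // Addressable.
  size_t size() const { return data_.size(); }

 private:
  std::vector<uint8_t> data_;  // One byte per element, never bit-packed.
};
```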

In future CLs, the std::vector representation may be changed to something more
efficient, if needed.

PiperOrigin-RevId: 157739125
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
1f9529b8dc1868a980297b7843d2fbae97062179 27-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Allow use of the tuple instruction as the fusion root.

PiperOrigin-RevId: 157274264
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
3e767e9db0e0a00a509354ec18462841ea4d40f2 26-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> Add debug protos that serialize HLO graph information.

Also add flags to dump this data in JSON format, for each backend.
This is useful for upcoming debugging tools.

PiperOrigin-RevId: 157178357
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
09f3fb939c9b395a9bc747cf81d15b2dc2804c3e 08-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merged commit includes the following changes:
155425029 by A. Unique TensorFlower <gardener@tensorflow.org>:

Internal change.

--
155424167 by A. Unique TensorFlower <gardener@tensorflow.org>:

Internal change.

--

PiperOrigin-RevId: 155425029
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
965d620104d375c5fd2b18881f353eb41d9a63a2 04-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> Internal change.
Change: 155009390
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
74770749840e1c823a50b743a50637afc3529e3c 29-Apr-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Make ReshapeMover account for broadcast operands, add VLOGging for debug.
Change: 154637127
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
7be3d9ee95e59f28126e3b9370fba738253fdc3b 27-Apr-2017 Mark Heffernan <meheff@google.com> [XLA] Various HLO naming fixes.
This change includes a number of fixes to HLO instructions, especially fusion instructions. Specific changes:

(1) Remove HloInstruction::set_name and HloComputation::set_name. These methods were a bit dangerous, as they made it easy to create non-unique HLO names. They are replaced with UniquifyName, which renames the object to a unique name based on its current name.

(2) Change the name of the fusion computations to "fused_computation". Previously it was named after the root.

(3) Change naming of fusion parameters. They are now named after the unfused-instructions whose values they represent. Also, previously superfluous ".1", ".2", etc, could be added to the parameter names. This change fixes that.

(4) Change naming of instructions in fusion computations to be identical to the instructions they were cloned from. Previously all fused instructions would end up having a .clone suffix.

(5) If HloInstruction::Clone() is called with an empty suffix, then don't add a "." to the name.
Change: 154454938
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
fc197e6c77e336700a22e04df2b1f20e0fc72fd5 24-Apr-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA:HLO] Reduce copy and get-tuple-element instructions added by CopyInsertion.

CopyInsertion adds copies to while init operand elements to maintain correctness
for in-place while loops; e.g. if an element is updated in the loop, it must be
copied before entering the loop to avoid corrupting the state of other users of
the same buffer. However these copies are unnecessary if the element is
read-only in the while body. That is the general idea behind this CL; to remove
copies of read-only elements.

But there are some details. E.g. if any of these read-only elements are entry
parameters, they still must be copied (at least once). The problem here is that
entry parameter buffers are managed by the caller, and cannot (currently) share
the same allocation with other buffers. We add an optimization such that if the
same entry parameter is used by multiple while loops in a read-only fashion, it
is only copied once.

Also, the way the original code was adding the copies was sub-optimal. We'd end
up with this type of accordion pattern:
tuple -> (gte, gte, gte) -> tuple
This CL also removes many of the extra gte+tuple ops.
Change: 154082222
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
c0088ae3d2541d8e00fc238377dd802a811624f3 20-Apr-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Fix the parameter instruction printing issue

Append the parameter number to the fusion parameter name, and use the parameter
name rather than the instruction name in creating the new parameter.

Show the parameter number when printing out parameter instructions.
Change: 153752424
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
817533db9b17b5456b85ba9187df7262c2c9c453 20-Apr-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Fix incorrect comments
Change: 153737501
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
33fd4134234170745f989e2cdd73c8ca8709d926 17-Apr-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Represent fusion instructions as a HloComputation

Using a HloComputation to represent the HloInstructions inside a fusion
instruction.

All the interfaces are kept the same except for the parent field of the fusion
instruction. It now points to the newly created HloComputation rather than the
enclosing computation for the fusion instruction.
Change: 153390245
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
f7e1a723aa40e29ac7a887e481e3f183e1b38ff8 15-Apr-2017 David Majnemer <majnemer@google.com> Internal-only changes.
Change: 153238377
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
6c42f0a1a80226692a9f37ff50e0e1356951e86c 28-Mar-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Add HLO verifier that checks HLO instruction's parent computation.
Change: 151494158
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
060e002e70e1abf04144a107fde939bda4051ac5 24-Mar-2017 Mark Heffernan <meheff@google.com> [XLA] Rematerialize subcomputations.
Extend HLO rematerialization to rematerialize subcomputations in addition to the entry computations. Outer nesting levels of computations are rematerialized before inner nesting levels because inner subcomputations may be while bodies where rematerialization is more expensive.

Also fix a latent bug in call_graph dealing with fusion instructions, and extend HloInstruction::Clone to accept a string suffix (e.g., "remat") for the clone name.
Change: 151179956
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
1258b206590d9460f87f0aaab0c9f9ccba3b1bfe 16-Mar-2017 A. Unique TensorFlower <gardener@tensorflow.org> Refactor convolution dimension numbers and windows dumping code
and remove duplicate code in hlo_graph_dumper
Change: 150324515
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
fc112a6b53d782eacb46eb357a8720d6b5a5d3cc 11-Mar-2017 Mark Heffernan <meheff@google.com> [XLA] Replace uses of std::set with std::vector.

std::set is slow and its iteration order is unstable. A couple of other opportunistic changes are included: all called computations of an instruction are consolidated in a single vector, which facilitates fast access to them. Also, AddControlSuccessor/Predecessor are replaced with Add/RemoveControlDependencyTo, which is less error-prone, as you can't create a half-connected control edge.
Change: 149810889
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
00d0347ccebc3e29ffe541703b5a2f929b89da36 10-Mar-2017 Brennan Saeta <saeta@google.com> [TF:XLA] Add debug metadata to HLO ops.

In order to support end-to-end debugging and performance profiling tooling for
the TensorFlow::XLA toolchain, this change adds a DebugMetadata proto to the
HloInstruction class, and pipes it through the tf2xla stack.
Change: 149703349
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
379560be32c3910593e94aa6e91277fc3df3fc98 02-Mar-2017 A. Unique TensorFlower <gardener@tensorflow.org> [TF:XLA] Reduce sequential memory usage via better ordering and simulated heap.

The choice of instruction ordering, and the minimization of fragmentation once
we've chosen an order, are two large inter-related factors wrt overall memory
usage. The approach in this CL uses heuristics to do better on both, but
neither problem is completely solved.

To pick a better ordering (the larger factor), the approach is to try the
original list-scheduler-based ordering, and to also try a DFS-based ordering.
We pick the ordering that yields a smaller minimum memory, computed with the
simulated heap, ignoring fragmentation. Note that this is the absolute minimum
memory for a given ordering.

To minimize fragmentation, the approach is to run a heap simulation on temporary
buffers. We still try to re-use existing allocations when possible, but instead
of creating new allocations for temp buffers, we collect all the leftovers and
use a heap to pack them. The heap algorithm that gave the best results is "lazy
best-fit"; a variant of traditional best-fit that sometimes delays offset
assignment until Free is called, in the hopes of yielding larger free chunks.

Here's some measurements of the temp buffer sizes for GNMT encoder training (a
stacked LSTM). Lower is better. I've tried various combinations of instruction
ordering and heap simulation, to show the joint impact of these two factors.

List-scheduler order, no heap simulation 33.33GiB
List-scheduler order, with heap simulation 25.09GiB
Minimized DFS order, no heap simulation 16.59GiB
Arbitrary DFS order, no heap simulation 15.05GiB (old)
Arbitrary DFS order, with heap simulation 12.57GiB
Minimized DFS order, with heap simulation 11.71GiB (new)

Note that the original list scheduler order is much worse than DFS on stacked
LSTMs, but (not shown here) is much better than DFS on convolutions like
Inception. Also note that heap simulation packs things tighter for all
instruction orders in this example, but to varying degrees.
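The "absolute minimum memory for a given ordering, ignoring fragmentation" quantity above can be sketched as a tiny heap simulation (illustrative only, not XLA's HeapSimulator; the Event shape is assumed): replay the Alloc/Free trace an ordering produces and take the peak sum of simultaneously live buffer sizes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One allocation or deallocation event from a simulated trace.
struct Event {
  bool alloc;   // true = Alloc, false = Free
  size_t size;  // buffer size in bytes
};

// Peak of the running live-byte count: the fragmentation-free lower bound
// on memory for the ordering that produced this trace.
size_t PeakLiveBytes(const std::vector<Event>& trace) {
  size_t live = 0, peak = 0;
  for (const Event& e : trace) {
    if (e.alloc) {
      live += e.size;
      peak = std::max(peak, live);
    } else {
      live -= e.size;
    }
  }
  return peak;
}
```

Comparing this peak across candidate orderings is enough to pick between the list-scheduler and DFS orders; packing the surviving temp buffers with best-fit is then a separate, second step.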
Change: 149049028
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
efc8f98d45df835bac2373e19f1da57e3a1ea2d0 28-Feb-2017 Jacques Pienaar <jpienaar@google.com> [XLA] Add basic outfeed support.
Change: 148699787
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
ad12d7233da6f6b034ad409e0bd1ca0fd201e68d 14-Feb-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Change BufferLiveness to allow operand buffer sharing for fused DynamicUpdateSlice instructions.
Change: 147506969
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
fd3e50d2a060cde47c3aac75f200946e3d916b16 13-Feb-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Change PointsToAnalysis to (optionally) include loop fusion instructions.
Change: 147341541
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
6c4077255fa4e6ae4e3e45122035f891ae803246 28-Jan-2017 Bjarke Hammersholt Roune <broune@google.com> Add sum-across-opcodes report for HLO profiling.
Change: 145863928
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
9113e98115ecbeb1404edb7d14d2cf443f2484bf 27-Jan-2017 Tayo Oguntebi <tayo@google.com> Addition of Outfeed HLO op.
Change: 145772331
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
239493a6825f33c96d64b6a36be6616fbb41e42b 25-Jan-2017 Mark Heffernan <meheff@google.com> Break out HloOrdering classes into separate files.

Add CreateMemoryMinimizingSequence which constructs a sequence of the
instructions in an HLO module that heuristically minimizes the
total size of live buffers containing HLO outputs.
Change: 145599747
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
863bab34202f650282dfe00aaa082a4796fdd839 24-Jan-2017 Bjarke Hammersholt Roune <broune@google.com> Improvements to HLO text format printing.
Change: 145374835
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
4fe280c59a71e85b73e9947063147743adf2ff2b 21-Jan-2017 A. Unique TensorFlower <gardener@tensorflow.org> Added optional string argument to infeed HLO op.
Change: 145188452
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
2abeb0f1b68a73ce54e9c90459e37de581117d45 19-Jan-2017 HyoukJoong Lee <hyouklee@google.com> Add control successors to HloInstruction. Add Send/Recv cases for the
ConstantVisitor in UserComputation.
Change: 145001882
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
6149fc54f4567cb00619fe085a691e3513343faf 19-Jan-2017 Mark Heffernan <meheff@google.com> Don't remove send, recv, or trace instructions in DCE pass. Also,
opportunistically change HloComputation::Remove*,
HloComputation::Replace*, and HloInstruction::Replace* to return Status.
An error Status is returned if one of those instruction types (send, etc)
is removed.
Change: 144978731
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
0a876b717288eb2bd5ef0d39d6f91969e491b0c2 18-Jan-2017 A. Unique TensorFlower <gardener@tensorflow.org> Internal-only change.
Change: 144835065
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
d8723bc9ce98053380aae77c91ff63ad6dbb4916 17-Jan-2017 A. Unique TensorFlower <gardener@tensorflow.org> [XLA] Privatize `HloInstruction::CreateNary`, NFC

This method is only used in other `HloInstruction` creator methods.
Change: 144737654
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h
1e67c90e2caceeff82d09793d1ef5fa0300d219b 09-Jan-2017 Peter Hawkins <phawkins@google.com> Initial open-source release of XLA: Accelerated Linear Algebra.

XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators.

XLA is still experimental; we are releasing it early to get the community involved.
Change: 143990941
/external/tensorflow/tensorflow/compiler/xla/service/hlo_instruction.h