History log of /external/tensorflow/tensorflow/core/kernels/cwise_ops_gpu_common.cu.h
Revision Date Author Comments
6b1208c647079e822f19a4c869af52bf3a06d532 08-Feb-2018 Yong Tang <yong.tang.github@outlook.com> Add uint32 and uint64 kernel support for `Invert` (#15154)

* Add uint32 and uint64 kernel support for `Invert`

This fix adds uint32 and uint64 kernel support for `Invert`.

In bitwise_ops.cc, uint32 and uint64 have been registered
for `Invert` like other bitwise ops
`BitwiseAnd`/`BitwiseOr`/`BitwiseXor`/`LeftShift`/`RightShift`.
However, no uint32 and uint64 kernels were available for `Invert` yet.

This fix adds uint32 and uint64 kernels for `Invert`,
and adds additional test cases to cover the changes.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add test cases for uint32 and uint64 support with `Invert`

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add missing uint32 and uint64 in GPU for invert

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add DEFINE_UNARY8 for invert

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2fe28ca92fdf4ec209a68877c9c5e76348809a8b 11-Apr-2017 RJ Ryan <rjryan@google.com> Add a tf.neg GPU kernel for complex64/complex128.

Also, enable GPU tests for ops that are already supported on GPU.
Change: 152831861
e17f4a645c33b2f86c2a4ac9c88947783a18096a 07-Mar-2017 Yao Zhang <yaozhang@google.com> Add an approximate_equal op for floating point tensor comparison.
Change: 149374684
93f15d4cde2f08057819f1194e5a4771f0d391ff 25-Sep-2016 RJ Ryan <rjryan@google.com> Enable complex addition, subtraction, multiplication, and division GPU support.
Change: 134186092
999b794c137d12d73adbf41dcbe9383a0cd94769 21-Sep-2016 Martin Wicke <wicke@google.com> Merge changes from github.
Change: 133874452
f21642013092b53186491064335053a9e02ce010 14-Sep-2016 RJ Ryan <rjryan@google.com> Add complex64/complex128 GPU support for the following ops:
* ComplexAbs
* Shape
* ShapeN
* Sign
* Slice
* ZerosLike

Also, add a "special" int32 Reverse GPU kernel to avoid unnecessary CPU/GPU transfers.
Change: 133160947
c8b59c046895fa5b6d79f73e0b5817330fcfbfc1 02-Jun-2016 A. Unique TensorFlower <nobody@tensorflow.org> Update copyright for 3p/tf/core.
Change: 123900938
892ca4ddc12852a7b4633fd08f163941356cb4e6 23-May-2016 Derek Murray <mrry@google.com> Merge changes from github.
Change: 123026122
1ed0e764c8ca84d45823f2fd172dc8d40f89e3e2 04-May-2016 Geoffrey Irving <geoffreyi@google.com> Catch integer division by zero on CPU to avoid SIGFPE

We let it through on GPU since the behavior is bizarre but harmless.

On the CPU, we have to turn off packet math in Eigen and use a special binary
functor that sets an error bit on division by zero. Ideally we'd be able to
use packet math too; all it would take is a nice way of checking whether a packet
contains a zero.

Fixes #2163.
Change: 121429857
ce9f3044615af06b841407bdfe4aa5e8894ebaa8 29-Apr-2016 Martin Wicke <wicke@google.com> Make comparison operators work for bool tensors.
Change: 121073548
31fd4868711f393bec74200231c2936bf3df079a 15-Apr-2016 A. Unique TensorFlower <nobody@tensorflow.org> fp16-enable all the componentwise ops.

This also includes updating parts of the Python test framework to handle fp16.
fp16 is too inaccurate to do numerical gradients unless a lot of care is taken,
so for fp16, we compare fp16 theoretical gradients to fp32 numerical ones.
This means that the gradient check doesn't also implicitly test the function
itself, so we will need to rely on the numpy tests for those.
Change: 119948035
9ede62fccd761203c17250afe5c132fb82aa689d 02-Feb-2016 Eugene Brevdo <ebrevdo@gmail.com> Select op can now broadcast when the condition is a vector.
In this case, instead of operating element-wise it operates
row-wise on arbitrary rank tensors.

This alternate calculation happens when the cond Tensor
is a vector whose size matches the row count of the 'then'
and 'else' tensors.

GPU-enabled + tests. Will be used for intelligent updates of RNN states during
dynamic computation with variable sequence lengths in a minibatch.
Change: 113659369
3ede5506acf6a026f09eda33277d46e34ac7ed10 26-Jan-2016 Josh Levenberg <josh11b@tensorflow.org> Global search & replace to move to the new location for
tensorflow/core/ files and build targets.
Change: 113075177
795f35da2d458cbae477ac2fe2bff80c1427a771 01-Dec-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: upstream changes to git

Change:
Clean up documentation for ReverseSequence
Change:
Updated several tensorflow operations to use 32bit indices on GPU.
Change:
Add attribute batch_dim to ReverseSequenceOp.
Change:
Fix error in convert_to_records.py. As reported in
https://github.com/tensorflow/tensorflow/issues/370
by AlexUnderMicrocontRoll.
Change:
Update TensorBoard README.
Change:
Fixes to boolean flags reported in
https://github.com/tensorflow/tensorflow/issues/379. Supports:

--bool_flag=True --> True
--bool_flag=False --> False
--bool_flag=gibberish --> False
--bool_flag --> True
--nobool_flag --> False

Fixes #379
Change:
Update generated Op docs.
Change:
Enable local development of TensorBoard using gulp
Also make tf-tensorboard a regular component rather than special case

This is mostly effected by creating tfserve.js, which is a small server
with clever routing to load from bower_components/ and components/ using
the paths that work within google3.

Workflow: `gulp serve`
Change:
Add a full working code example to the tensorboard and summaries tutorial
Change:
Fix seq2seq_test when running on GPU.

The "proj_w" and "proj_b" variables were being created before the
`test_session()`'s device function took effect, which pushed the
placement algorithm into making an incorrect decision.
Change:
Add a sentence in TensorBoard README on how to serialize summary data to logs and provide link to the how-to tutorial on the TensorFlow website.
Change:
Add error-catching code if string_input_producer is supplied a null input.
Before this change, it would die with an opaque shape error from inside
the queue. This change catches (most) python null lists being
passed directly in, and at runtime detects null tensors.

Adds two tests for this to input_test.py
Change:
Speed up for models that use the same variable multiple times in the case
where variables must be copied across devices:
- Have Variables wrap the Variable op in an Identity op when converted to Tensor.
This avoids multiple copies across devices if a variable is used multiple times
in a computation.
- Add Variable.mutable() to return the non-wrapped Variable op for use when
assigning new values.
- Add an as_ref parameter to convert_to_tensor() to allow code to specify
if they plan to assign a new value to the result of the conversion. Make Variable
return the result of Variable.mutable() when as_ref is True.
- Make all ops that assign values to variables pass as_ref=True when converting
their arguments.
Change:
Change to reduce critical section times in gpu_event_mgr.h:
(1) Call stream->ThenRecordEvent outside the EventMgr critical section
(2) Do memory deallocation outside the critical section

Speeds up one configuration of ptb_word_lm from 2924 words per
second (wps) to 3278 wps on my desktop machine with a Titan X.
Change:
Remove some colons that break the open source build

::tensorflow::StringPiece breaks for @raingo, see
https://github.com/tensorflow/tensorflow/issues/358.
tensorflow::StringPiece (without the leading colons)
seems to fix the problem.
Change:
Added a check that the inputs to Operation are a list, and make a defensive copy of the input. This is for cases where the input list is changed, such as in _add_input.
Change:
Use standard names for TensorFlow dtypes in the tutorial.
Change:
Add tests for tensor inputs.
Change:
Fix build after declaring more types for ops
Change:
Switch to 32 bit indexing to speedup convolutions and concatenations.
Change:
Add convert_image op to convert between types for images (similar to OpenCV's cvtScale).
Change:
Make cast work between numeric types (bool, uint8, int16, int32, int64, float, double).
Change:

Pad input data for an odd number of paddings so that cuDNN can be used anyway.
+ Fix total padding computation when padding==VALID.
+ This CL makes the Googlenet benchmark run 5x faster.

Change:
Support IndexedSlices in ConcatGrad
Change:
* sampled softmax op uses one embedding lookup for positive and negative samples
* float64 support for sampled softmax
Change:
Move RNN code out of models.rnn (without breaking existing code). The API may still undergo minor changes until full documentation is added.
Change:
Changed to use per-step stacks for the accumulators used in while-loop gradient computation. This addresses the problem caused by using concat without sufficient static shape information. It should also improve performance, since it avoids those expensive concats.
Change:
Update generated Op docs.
Change:
Improve error messages when the optimizer finds no variables to minimize or
when none of the variables has gradients.
Change:
Say that -1 isn't just for flattening in reshape docs

Also add scalar reshape (reshape(t, [])) as an example.

This fixes https://github.com/tensorflow/tensorflow/issues/281.
Change:
This is a test.

Base CL: 109118714
9c3043ff3bf31a6a81810b4ce9e87ef936f1f529 20-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Improve performance of Alexnet

Changes:

* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command

Base CL: 108349164
56313def004795f75ef8281a0294c958d28f1e06 16-Nov-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: Doc and linter fixes, some additional tests and
error handling, updates to website.

Changes:
- Removes redundant reshape from image models by @mrry
- Default TensorBoard to localhost by @danmane
- Reformatting of tensorflow/core by @josh11b
- Make tutorials backwards compatible to 0.5.0 by @girving
- Improve print documentation (md files not updated).
- Add proper scrolling to sitemap by @martinwicke

Base CL: 107956254
f41959ccb2d9d4c722fe8fc3351401d53bcf4900 07-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Initial commit of TensorFlow library.
TensorFlow is an open source software library for numerical computation
using data flow graphs.

Base CL: 107276108