History log of /external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
Revision Date Author Comments
70062d11bf11d6579bfdbc87c3350a0074a12ae8 13-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> Rename Stream::BlockHostUntilDoneWithStatus to BlockHostUntilDone.

PiperOrigin-RevId: 178951330
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
383a3226a9ad08ac507a3fbd6c220c5c1e15a540 12-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> Use BlockHostUntilDoneWithStatus in various places.

PiperOrigin-RevId: 178723711
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
eb0808fb95567c1f5b7ce48d29f47edfd988aff8 23-Aug-2017 Benoit Steiner <bsteiner@google.com> Converted LOG(FATAL) into regular errors to prevent the process from crashing
on error.

PiperOrigin-RevId: 166257105
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
e85d3df92deb9d717befdf173966a2913ac2aea0 29-Jun-2017 Geoffrey Irving <geoffreyi@google.com> Prepare to remove a bunch of proto.h includes from tensorflow/core headers

The goal is to make kernels mostly independent of proto headers, which will let
us lock down our .so imports. This CL does not remove any actual headers, but
changes a bunch of files so that header removal is possible in a followup CL.
It also marks the headers that will be removed with

// TODO(b/62899350): Remove

RELNOTES: n/a
PiperOrigin-RevId: 160552878
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
eae0b5f53d0c7a13308e616135a75f6228bddb1a 27-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> Remove unused using-declarations

PiperOrigin-RevId: 157276276
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
53cb26d05a5c2080d8022124178b1cc43a30ffe5 19-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merge changes from github.
END_PUBLIC

---
Commit c2b8927f2 authored by Dandelion Mané <dandelion@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
Fix another d3v4 regression in the graph visualizer.

PiperOrigin-RevId: 156343038

---
Commit 170f0b350 authored by Peter Hawkins <phawkins@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
[TF:XLA] Add XLA implementation of ResourceStridedSliceAssign.

PiperOrigin-RevId: 156341053

---
Commit 1390dd68f authored by Vijay Vasudevan <vrv@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
When an Op type is not registered, log the hostname of the machine it
is running on in the error message, since the message may be routed
back from a failure on a remote binary and it is otherwise hard to
tell which machine it came from.

Ideally, we'd somehow log the name of the binary running instead, but
we don't have a function to get that right now.

PiperOrigin-RevId: 156337679

---
Commit 9ca8a151b authored by A. Unique TensorFlower <gardener@tensorflow.org>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
Internal change.

PiperOrigin-RevId: 156335942

---
Commit 40255434c authored by Martin Wicke <wicke@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
Deprecate contrib/learn/dataframe. To be removed June 15.

PiperOrigin-RevId: 156333930

---
Commit 7f71b7fbe authored by A. Unique TensorFlower <gardener@tensorflow.org>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
BEGIN_PUBLIC
Automated g4 rollback of changelist 156123287

PiperOrigin-RevId: 156503903
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
6714c150df0a764b29acf8d23981162dd2f0a9a1 20-Jul-2016 Shanqing Cai <cais@google.com> Automated rollback of change 127562075
Change: 127906463
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
12efe48d210477bf9d9fa1a3f5e0f0ab4a24de77 18-Jul-2016 Shanqing Cai <cais@google.com> Automated rollback of change 127562075
Change: 127709092
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
e5ea34a104f55e9d698e50982de90d99ce99550f 15-Jul-2016 Shanqing Cai <cais@google.com> tfdb: Debug nodes inserter

EXPERIMENTAL: Insert special debug ops (e.g., DebugIdentity) into the graph for debugging. Currently, debug ops must take exactly one input and have the string attribute "tensor_name" to indicate which tensor they watch.

For example, before the node insertion, the graph may look like:

A:0 -----------1----------> B
     |
     ---------2-----------> C

wherein output slot 0 of node A feeds as input to node B through
edge 1 and to node C through edge 2.

After the node insertion, assuming both B and C have non-Ref input, the graph becomes:

A:0 ---3---> Copy -----------4----------> B
              |
              ---------5--------> C
              |
              ---------6--------> X

If a node (e.g., B) has Ref input, the graph becomes:

     ----------------4---------------> B
     |
A:0 ---3-----> Copy -----------5----------> C
                |
                -----------6--------> X

In other words, Ref inputs are fed directly to downstream nodes rather than through the deep-copy node.

The Copy node is the inserted deep-copy node; it copies the input tensor on-device (e.g., a CPU-to-CPU or GPU-to-GPU deep copy), which reduces the likelihood of racy updates during debug tensor-watching. X is the newly created debug node, which transforms the input (the copy of the watched tensor) into a debug signal.

DebugIdentity is the simplest debugging paradigm, in which the debug signal (i.e., X:0) equals the tensor itself. More sophisticated debug ops can be used to transform the tensor into other useful debug signals. An example is the added DebugNanCounter op.

If the nodes (A, B and C) are located on GPU and the edges from A to B or C are HOST_MEMORY, the CopyHost op will be used instead of the Copy op.

A reserved string attribute "debug_url" is created for the debug ops to make it possible to send debug signals to files or RPC calls in the future.

Other points worth noting:
* The debug ops have control-edge connections to the original destination node, in order to ensure that the debug signals are deterministically generated before the destination node executes.
* More than one debug op can be added to watch a tensor.
* A new field called "DebugTensorWatch" is added to RunOptions to support debug node insertion.
* A new method GPUUtil::CopyGPUTensorToSameGPU has been added to make GPU-to-GPU deep-copy of tensors possible.
* The two test files (debug_gateway_test.cc and debug_gateway_gpu_test.cc) have been consolidated to the former, by using the GOOGLE_CUDA macro.
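
Below is a minimal Python sketch of the insertion rewrite described above, run on a toy graph. It is an illustration only, not the actual tfdb C++ implementation, and the function and node names are hypothetical.

    # Toy model of the debug-watch rewrite: non-Ref consumers are rewired to
    # read the deep copy, Ref consumers keep the original edge, and a control
    # edge from the debug node to each consumer guarantees the debug signal
    # is generated before the consumer executes.
    def insert_debug_watch(consumers, watched_output, is_ref_input):
        copy_node = watched_output.replace(':', '_') + '/Copy'
        debug_node = watched_output.replace(':', '_') + '/DebugIdentity'
        data_edges = [(watched_output, copy_node), (copy_node, debug_node)]
        control_edges = []
        for node in consumers:
            src = watched_output if is_ref_input[node] else copy_node
            data_edges.append((src, node))
            control_edges.append((debug_node, node))
        return data_edges, control_edges

    # Matches the diagrams above: B has a Ref input, C does not.
    data, ctrl = insert_debug_watch(['B', 'C'], 'A:0', {'B': True, 'C': False})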
Change: 127562075
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
c8b59c046895fa5b6d79f73e0b5817330fcfbfc1 02-Jun-2016 A. Unique TensorFlower <nobody@tensorflow.org> Update copyright for 3p/tf/core.
Change: 123900938
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
a5923d823e088a9723e445cce9248d5fc59f1b30 09-Mar-2016 A. Unique TensorFlower <nobody@tensorflow.org> Allow StreamExecutor commands to return status types other than the TensorFlow status type.
Change: 116793254
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
ec1403e7dc2b919531e527d36d28659f60621c9e 02-Mar-2016 A. Unique TensorFlower <nobody@tensorflow.org> Add optional comprehensive logging of memory allocation/deallocation events. When enabled, the following events are recorded:

The start of a step, with the numerical step_id and a textual handle describing the step.

A Tensor allocation, including the step_id, the name of the OpKernel, the data type, shape, allocation size, allocation_id, data pointer location, and allocator used (the allocation_id is local to an allocator).

A Tensor deallocation, including the allocation_id and allocator used.

A raw memory allocation, including the step_id, the name of the component (e.g. Eigen), the number of bytes, data pointer location, allocation_id and allocator used.

A raw memory deallocation, including the step_id, the name of the component (e.g. Eigen), allocation_id and allocator used.

For now many Tensor allocations show 'unknown' for the kernel and step_id. These mostly come from Tensors allocated by the system from protocol buffers, and Tensors allocated by Ops using the Tensor constructor directly instead of calling OpKernelContext::allocate_temp. The latter can in principle be cleaned up one by one as necessary. The former would require some plumbing to associate an allocation with the appropriate step_id.

With this CL memory logging is enabled by raising the VLOG level to 1. Once there is an ability to set process-wide options programmatically it would make sense to update the machinery to do that. Currently recorded events are logged as INFO, and they can all be retrieved by filtering the log for lines including __LOG_MEMORY__.

Some example lines are as follows:

I0301 13:38:55.797563 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown (from Proto)" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 4 allocator_name: "cuda_host" allocation_id: 2 has_single_reference: true ptr: 8717861408 } } }
I0301 13:38:55.802245 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 256 allocator_name: "gpu_bfc" allocation_id: 1 has_single_reference: true ptr: 47378989056 } } }
I0301 13:38:55.802347 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 2 allocator_name: "cuda_host" }

[...]

I0301 13:38:55.806454 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogStep { step_id: 1 handle: "->/init;0" }
I0301 13:38:55.806659 81220 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "random_normal/shape" tensor { dtype: DT_INT32 shape { dim { size: 4 } } allocation_description { requested_bytes: 16 allocated_bytes: 16 allocator_name: "cuda_host" allocation_id: 1 ptr: 8717860896 } } }

[...]

I0301 13:38:56.362898 81218 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 1 kernel_name: "conv1/truncated_normal" tensor { dtype: DT_FLOAT shape { dim { size: 11 } dim { size: 11 } dim { size: 3 } dim { size: 96 } } allocation_description { requested_bytes: 139392 allocated_bytes: 139520 allocator_name: "gpu_bfc" allocation_id: 36 has_single_reference: true ptr: 47379030016 } } }
I0301 13:38:56.362894 81217 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 24 allocator_name: "gpu_bfc" }
I0301 13:38:56.362903 81213 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "conv5/truncated_normal/mul" tensor { dtype: DT_FLOAT shape { dim { size: 3 } dim { size: 3 } dim { size: 1024 } dim { size: 1024 } } allocation_description { requested_bytes: 37748736 allocated_bytes: 37748736 allocator_name: "gpu_bfc" allocation_id: 34 ptr: 48512711168 } } }

[...]

I0229 16:39:57.482980 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawAllocation { step_id: 13 operation: "xentropy/EigenAllocator" num_bytes: 64 ptr: 47386857472 allocation_id: 625 allocator_name: "gpu_bfc" }
I0229 16:39:57.483147 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" deferred: true }
I0229 16:39:57.483197 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" }
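
As a minimal illustration of the retrieval step mentioned above (a sketch, not part of TensorFlow), the recorded events can be pulled out of a log by filtering for lines containing __LOG_MEMORY__:

    # Filter a log (here read from stdin) down to the memory events; the text
    # after the marker is the event type followed by a text-format proto.
    import sys

    MARKER = '__LOG_MEMORY__'

    def memory_events(lines):
        for line in lines:
            if MARKER in line:
                event = line.split(MARKER, 1)[1].strip()
                yield event.split(' ', 1)[0], event  # (kind, full event text)

    if __name__ == '__main__':
        for kind, event in memory_events(sys.stdin):
            print(kind, event, sep='\t')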
Change: 116065112
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
241698b6ba6cd9b13d606a9e4603baa4f33891f2 06-Feb-2016 Xiaoqiang Zheng <zhengxq@google.com> Move CPU/GPU memory copies into their own streams.
Change: 114000504
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
545dee2e9897e424b641f092eec9ffd4a277f9d1 03-Feb-2016 Xiaoqiang Zheng <zhengxq@google.com> Put device-to-device GPU memory copies on a different stream.
Change: 113784244
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
d821f6aeb66a93501673ac5314685bd7d58151f8 02-Feb-2016 Xiaoqiang Zheng <zhengxq@google.com> Disable tensor tracking when only one GPU stream is used.
Change: 113579306
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
8a59748c087a2fee535c0d5067dbabb01920e812 29-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Use cc_binary rather than cc_library to reduce size of native library in APK from 5.5mb to 3.2mb (compressed).
Change: 113369407
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f8fa35b8a1910772d6d6ba7b621f905358640c2c 26-Jan-2016 Josh Levenberg <josh11b@tensorflow.org> Global search & replace to move to the new location for
tensorflow/core/ files and build targets.
Change: 113080048
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
4ba51b33357d68f882a920fb4f87bfe67bb034a0 22-Jan-2016 Josh Levenberg <josh11b@tensorflow.org> Disentangle the GPU code from the CPU code. This means a few things:
* The "core_cpu_internal" build target no longer includes files from the
common_runtime/gpu/ directory.
* tensorflow/core internal targets instead can get access to those headers via
the "gpu_runtime" target.
* The class "CopyTensor" is introduced. It lives in common_runtime/
but supports registration of copy functions so the "gpu_runtime"
target can add a GPU->GPU copy ability if it is linked in.
This registration should make it easier to add more device types
in the future.
* The "core_cpu" and "core_cpu_internal" build targets no longer
reference GPUUtil::CopyViaDMA; rendezvous_mgr uses CopyTensor
instead.

Also the "copy_tensor" build target was not needed.
Change: 112821119
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
c8eaac926c929e07ac8db69f67803a2223ff2d93 20-Jan-2016 Josh Levenberg <josh11b@tensorflow.org> Many tensorflow/core build clean ups.
Change: 112523833
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f592f23775e2a6ac75496829db5005d3bb70a3d2 19-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Replacing reference 'names' variable with 'example_names' variable.
Change: 112481326
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
c7258dfbf2f97714596676972bf76880c4db6253 15-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Move GPU specific code out of generic DMA codepaths.

The stream() and gpu_device_info() calls are not necessary for simple DMAs to/from a device. Moving this code into the GPU-GPU case allows the other DMA codepaths to be used for non-StreamExecutor devices which provide copy methods via their DeviceContext.
Change: 112204599
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
861f8f01334d20e998eae9a759c8f5a1e07721ca 14-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Move responsibility for constructing/destructing complex objects from
a helper class in tensor.cc into the allocator class. This allows
experimental devices more control over their memory allocation.
Change: 112111713
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
8e388a5546f1466aa0d2afa00e5a015997a23a2b 14-Jan-2016 Vijay Vasudevan <vrv@google.com> TensorFlow: Get rid of legacy command line flags use in TensorFlow.
Change: 112105282
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f9d3e9d03c69bfac77a2fe1ad80f7c5aa517e0f0 06-Dec-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: upstream latest changes to git.

Change 109537918
TensorFlow pip setup: wheel >= 0.26 for python3 pip install
Change 109505848
Fix distortion default value to 1.0 in fixed_unigram_candidate_sampler. This means we default to the actual provided unigram distribution, instead of to the uniform (as it is currently).
Change 109470494
Bugfix in gradients calculation when the ys rely on each other.
Change 109467619
Fix CIFAR-10 model to train on all the training data instead of just 80% of it. Fixes #396.
Change 109467557
Replaced checkpoint file with binary GraphDef.
Change 109467433
Updates to C++ tutorial section.
Change 109465269
TensorFlow: update documentation for tutorials to not assume use of bazel
(when possible).
Change 109462916
A tutorial for image recognition to coincide with the release of the latest Inception image classification model.
Change 109462342
Clear control dependencies in variable_scope.get_variable() when creating
ops for the initializer.

Add tests of various error conditions.
Change 109461981
Various performance improvements in low-level node execution code paths.

Speeds up ptb_word_lm on my desktop with a Titan X from
3638 words per second to 3751 words per second (3.1% speedup).

Changes include:

o Avoided many strcmp operations per node execution and extra touches
of cache lines in executor.cc, by making all the various IsMerge,
IsSwitch, IsSend, etc. operations instead be based on an internal enum
value that is pre-computed at Node construction time, rather than doing
string comparisons against node->type_string(). We were doing about
6 such comparisons per executed node.

o Removed mutex_lock in executor.cc in ExecutorState::Process. The
lock was not needed and the comment about the iterations array being
potentially resized is not true (the iterations arrays are created
with a fixed size). Checked with yuanbyu to confirm this.

o Added new two-argument port::Tracing::ScopedAnnotation constructor
that takes two StringPiece arguments, and only concatenates them
lazily if tracing is enabled. Also changed the code in
platform/tracing.{h,cc} so that the ScopedAnnotation constructor and
the TraceMe constructor can be inlined.

o In BaseGPUDevice::Compute, used the two-argument ScopedAnnotation
constructor to avoid doing StrCat(opkernel->name(), ":",
op_kernel->type_string()) on every node execution on a GPU.

o Introduced a new TensorReference class that just holds a reference to an
underlying TensorBuffer, and requires an explicit Unref().

o Changed the EventMgr interface to take a vector of TensorReference objects
for EventMgr::ThenDeleteTensors, rather than a vector of Tensor objects.

o Used TensorReference in a few places in gpu_util.cc

o Minor: switched to using InlinedVectors in a few places to get better
cache locality.
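
As a minimal sketch of the pre-computed node-class idea from the first item above (the actual change is in executor.cc; the Python names here are hypothetical):

    # Classify each node once at construction so per-execution checks are a
    # cheap enum comparison rather than repeated string comparisons.
    import enum

    class NodeClass(enum.Enum):
        OTHER = 0
        MERGE = 1
        SWITCH = 2
        SEND = 3

    _CLASS_FOR_TYPE = {'Merge': NodeClass.MERGE,
                       'Switch': NodeClass.SWITCH,
                       '_Send': NodeClass.SEND}

    class Node:
        def __init__(self, type_string):
            self.type_string = type_string
            self.node_class = _CLASS_FOR_TYPE.get(type_string, NodeClass.OTHER)

    def is_merge(node):
        return node.node_class is NodeClass.MERGE  # no string compare per execution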
Change 109456692
Updated the label_image example to use the latest Inception model
Change 109456545
Provides classify_image which performs image recognition on a 1000 object label set.

$ ./classify_image
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)

Change 109455002
TensorFlow: make the helper libraries for various models available
in the pip package so that when users type:

python translate.py ...

the absolute import works.

This change is supposed to help make our tutorials run without the
*need* to use bazel.
Change 109450041
TensorFlow: remove cifar and convolutional binary copies from pip install.
Adds embedding and some other models to the list.
Change 109448520
Move the description of a failing invariant from a comment into the dcheck-fail message text.
Change 109447577
TensorBoard has release tagging (tensorboard/TAG)
Also track TensorBoard changes (tensorboard/CHANGES)
Change 109444161
Added ParseSingleSequenceExample + python wrappers + unit tests.
Change 109440864
Update all the TensorFlow Dockerfiles, and simplify GPU containers.

This change updates all four of our Dockerfiles to match the targets discussed
in https://github.com/tensorflow/tensorflow/issues/149. The most notable
change here is moving the GPU images to use the NVidia containers which
include cudnn and other build-time dependencies, dramatically simplifying both
the build and run steps.

A description of which tags exist and get pushed where will be in a follow-up.
Change 109432591
Some pylint and pydoc changes in saver.
Change 109430127
Remove unused hydrogen components
Change 109419354
The RNN api, although moved into python/ops/, remains undocumented.

It may still change at any time.

Base CL: 109538006
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
9c3043ff3bf31a6a81810b4ce9e87ef936f1f529 20-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Improve performance of Alexnet

Changes:

* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command

Base CL: 108349164
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
56313def004795f75ef8281a0294c958d28f1e06 16-Nov-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: Doc and linter fixes, some additional tests and
error handling, updates to website.

Changes:
- Removes redundant reshape from image models by @mrry
- Default TensorBoard to localhost by @danmane
- Reformatting of tensorflow/core by @josh11b
- Make tutorials backwards compatible to 0.5.0 by @girving
- Improve print documentation (md files not updated).
- Add proper scrolling to sitemap by @martinwicke

Base CL: 107956254
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f41959ccb2d9d4c722fe8fc3351401d53bcf4900 07-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Initial commit of TensorFlow library.
TensorFlow is an open source software library for numerical computation
using data flow graphs.

Base CL: 107276108
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc