History log of /external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
Revision Date Author Comments
70062d11bf11d6579bfdbc87c3350a0074a12ae8 13-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> Rename Stream::BlockHostUntilDoneWithStatus to BlockHostUntilDone.

PiperOrigin-RevId: 178951330
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
383a3226a9ad08ac507a3fbd6c220c5c1e15a540 12-Dec-2017 A. Unique TensorFlower <gardener@tensorflow.org> Use BlockHostUntilDoneWithStatus in various places.

PiperOrigin-RevId: 178723711
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
eb0808fb95567c1f5b7ce48d29f47edfd988aff8 23-Aug-2017 Benoit Steiner <bsteiner@google.com> Converted LOG(FATAL) into regular errors to prevent the process from crashing
on error.

PiperOrigin-RevId: 166257105
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
e85d3df92deb9d717befdf173966a2913ac2aea0 29-Jun-2017 Geoffrey Irving <geoffreyi@google.com> Prepare to remove a bunch of proto.h includes from tensorflow/core headers

The goal is to make kernels mostly independent of proto headers, which will let
us lock down our .so imports. This CL does not remove any actual headers, but
changes a bunch of files so that header removal is possible in a followup CL.
It also marks the headers that will be removed with

// TODO(b/62899350): Remove

RELNOTES: n/a
PiperOrigin-RevId: 160552878
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
eae0b5f53d0c7a13308e616135a75f6228bddb1a 27-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> Remove unused using-declarations

PiperOrigin-RevId: 157276276
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
53cb26d05a5c2080d8022124178b1cc43a30ffe5 19-May-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merge changes from github.
END_PUBLIC

---
Commit c2b8927f2 authored by Dandelion Mané <dandelion@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
Fix another d3v4 regression in the graph visualizer.

PiperOrigin-RevId: 156343038

---
Commit 170f0b350 authored by Peter Hawkins <phawkins@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
[TF:XLA] Add XLA implementation of ResourceStridedSliceAssign.

PiperOrigin-RevId: 156341053

---
Commit 1390dd68f authored by Vijay Vasudevan <vrv@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
When an Op type is not registered, log the hostname of the machine it
is running on in the error message, since the message may be routed
back from a failure on a remote binary and it is otherwise hard to
tell which machine it came from.

Ideally, we'd somehow log the name of the binary running instead, but
we don't have a function to get that right now.

PiperOrigin-RevId: 156337679

---
Commit 9ca8a151b authored by A. Unique TensorFlower <gardener@tensorflow.org>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
Internal change.

PiperOrigin-RevId: 156335942

---
Commit 40255434c authored by Martin Wicke <wicke@google.com>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
Deprecate contrib/learn/dataframe. To be removed June 15.

PiperOrigin-RevId: 156333930

---
Commit 7f71b7fbe authored by A. Unique TensorFlower <gardener@tensorflow.org>
Committed by TensorFlower Gardener <gardener@tensorflow.org>:
BEGIN_PUBLIC
Automated g4 rollback of changelist 156123287

PiperOrigin-RevId: 156503903
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
6714c150df0a764b29acf8d23981162dd2f0a9a1 20-Jul-2016 Shanqing Cai <cais@google.com> Automated rollback of change 127562075
Change: 127906463
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
12efe48d210477bf9d9fa1a3f5e0f0ab4a24de77 18-Jul-2016 Shanqing Cai <cais@google.com> Automated rollback of change 127562075
Change: 127709092
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
e5ea34a104f55e9d698e50982de90d99ce99550f 15-Jul-2016 Shanqing Cai <cais@google.com> tfdb: Debug nodes inserter

EXPERIMENTAL: Insert special debug ops (e.g., DebugIdentity) into the graph for debugging. Currently, debug ops must take exactly one input and have the string attribute "tensor_name" to indicate which tensor they watch.

For example, before the node insertion, the graph may look like:

A:0 -----------1----------> B
     |
     ---------2-----------> C

wherein output slot 0 of node A feeds as input to node B through
edge 1 and to node C through edge 2.

After the node insertion, assuming both B and C have non-Ref input, the graph becomes:

A:0 ---3---> Copy -----------4----------> B
              |
              ---------5--------> C
              |
              ---------6--------> X

If a node (e.g., B) has Ref input, the graph becomes:

     ----------------4---------------> B
     |
A:0 ---3-----> Copy -----------5----------> C
                |
                -----------6--------> X

In other words, Ref inputs are fed directly to downstream nodes rather than through the deep-copy node.

The Copy node is the inserted deep-copy node; it copies the input tensor on-device (e.g., a CPU-to-CPU or GPU-to-GPU deep copy), which reduces the likelihood of racy updates during debug tensor-watching. X is the newly created debug node, which transforms the input (the copy of the watched tensor) into a debug signal.

DebugIdentity is the simplest debugging paradigm, in which the debug signal (i.e., X:0) equals the tensor itself. More sophisticated debug ops can be used to transform the tensor into other useful debug signals. An example is the added DebugNanCounter op.

If the nodes (A, B and C) are located on GPU and the edges from A to B or C are HOST_MEMORY, the CopyHost op will be used instead of the Copy op.

A reserved string attribute "debug_url" is created for the debug ops to make it possible to send debug signals to files or RPC calls in the future.

Other points worth noting:
* The debug ops have control-edge connections to the original destination node, in order to ensure that the debug signals are deterministically generated before the destination node executes.
* More than one debug op can be added to watch a tensor.
* A new field called "DebugTensorWatch" is added to RunOptions to support debug node insertion.
* A new method GPUUtil::CopyGPUTensorToSameGPU has been added to make GPU-to-GPU deep-copy of tensors possible.
* The two test files (debug_gateway_test.cc and debug_gateway_gpu_test.cc) have been consolidated to the former, by using the GOOGLE_CUDA macro.
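
Below is a minimal Python sketch of the insertion rewrite described above, run on a toy graph. It is an illustration only, not the actual tfdb C++ implementation, and the function and node names are hypothetical.

    # Toy model of the debug-watch rewrite: non-Ref consumers are rewired to
    # read the deep copy, Ref consumers keep the original edge, and a control
    # edge from the debug node to each consumer guarantees the debug signal
    # is generated before the consumer executes.
    def insert_debug_watch(consumers, watched_output, is_ref_input):
        copy_node = watched_output.replace(':', '_') + '/Copy'
        debug_node = watched_output.replace(':', '_') + '/DebugIdentity'
        data_edges = [(watched_output, copy_node), (copy_node, debug_node)]
        control_edges = []
        for node in consumers:
            src = watched_output if is_ref_input[node] else copy_node
            data_edges.append((src, node))
            control_edges.append((debug_node, node))
        return data_edges, control_edges

    # Matches the diagrams above: B has a Ref input, C does not.
    data, ctrl = insert_debug_watch(['B', 'C'], 'A:0', {'B': True, 'C': False})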
Change: 127562075
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
c8b59c046895fa5b6d79f73e0b5817330fcfbfc1 02-Jun-2016 A. Unique TensorFlower <nobody@tensorflow.org> Update copyright for 3p/tf/core.
Change: 123900938
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
a5923d823e088a9723e445cce9248d5fc59f1b30 09-Mar-2016 A. Unique TensorFlower <nobody@tensorflow.org> Allow StreamExecutor commands to return status types other than the TensorFlow status type.
Change: 116793254
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
ec1403e7dc2b919531e527d36d28659f60621c9e 02-Mar-2016 A. Unique TensorFlower <nobody@tensorflow.org> Add optional comprehensive logging of memory allocation/deallocation events. When enabled, the following events are recorded:

The start of a step, with the numerical step_id and a textual handle describing the step.

A Tensor allocation, including the step_id, the name of the OpKernel, the data type, shape, allocation size, allocation_id, data pointer location, and allocator used (the allocation_id is local to an allocator).

A Tensor deallocation, including the allocation_id and allocator used.

A raw memory allocation, including the step_id, the name of the component (e.g. Eigen), the number of bytes, data pointer location, allocation_id and allocator used.

A raw memory deallocation, including the step_id, the name of the component (e.g. Eigen), allocation_id and allocator used.

For now many Tensor allocations show 'unknown' for the kernel and step_id. These mostly come from Tensors allocated by the system from protocol buffers, and Tensors allocated by Ops using the Tensor constructor directly instead of calling OpKernelContext::allocate_temp. The latter can in principle be cleaned up one by one as necessary. The former would require some plumbing to associate an allocation with the appropriate step_id.

With this CL memory logging is enabled by raising the VLOG level to 1. Once there is an ability to set process-wide options programmatically it would make sense to update the machinery to do that. Currently recorded events are logged as INFO, and they can all be retrieved by filtering the log for lines including __LOG_MEMORY__.

Some example lines are as follows:

I0301 13:38:55.797563 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown (from Proto)" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 4 allocator_name: "cuda_host" allocation_id: 2 has_single_reference: true ptr: 8717861408 } } }
I0301 13:38:55.802245 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 256 allocator_name: "gpu_bfc" allocation_id: 1 has_single_reference: true ptr: 47378989056 } } }
I0301 13:38:55.802347 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 2 allocator_name: "cuda_host" }

[...]

I0301 13:38:55.806454 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogStep { step_id: 1 handle: "->/init;0" }
I0301 13:38:55.806659 81220 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "random_normal/shape" tensor { dtype: DT_INT32 shape { dim { size: 4 } } allocation_description { requested_bytes: 16 allocated_bytes: 16 allocator_name: "cuda_host" allocation_id: 1 ptr: 8717860896 } } }

[...]

I0301 13:38:56.362898 81218 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 1 kernel_name: "conv1/truncated_normal" tensor { dtype: DT_FLOAT shape { dim { size: 11 } dim { size: 11 } dim { size: 3 } dim { size: 96 } } allocation_description { requested_bytes: 139392 allocated_bytes: 139520 allocator_name: "gpu_bfc" allocation_id: 36 has_single_reference: true ptr: 47379030016 } } }
I0301 13:38:56.362894 81217 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 24 allocator_name: "gpu_bfc" }
I0301 13:38:56.362903 81213 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "conv5/truncated_normal/mul" tensor { dtype: DT_FLOAT shape { dim { size: 3 } dim { size: 3 } dim { size: 1024 } dim { size: 1024 } } allocation_description { requested_bytes: 37748736 allocated_bytes: 37748736 allocator_name: "gpu_bfc" allocation_id: 34 ptr: 48512711168 } } }

[...]

I0229 16:39:57.482980 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawAllocation { step_id: 13 operation: "xentropy/EigenAllocator" num_bytes: 64 ptr: 47386857472 allocation_id: 625 allocator_name: "gpu_bfc" }
I0229 16:39:57.483147 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" deferred: true }
I0229 16:39:57.483197 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" }
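
As a minimal illustration of the retrieval step mentioned above (a sketch, not part of TensorFlow), the recorded events can be pulled out of a log by filtering for lines containing __LOG_MEMORY__:

    # Filter a log (here read from stdin) down to the memory events; the text
    # after the marker is the event type followed by a text-format proto.
    import sys

    MARKER = '__LOG_MEMORY__'

    def memory_events(lines):
        for line in lines:
            if MARKER in line:
                event = line.split(MARKER, 1)[1].strip()
                yield event.split(' ', 1)[0], event  # (kind, full event text)

    if __name__ == '__main__':
        for kind, event in memory_events(sys.stdin):
            print(kind, event, sep='\t')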
Change: 116065112
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
241698b6ba6cd9b13d606a9e4603baa4f33891f2 06-Feb-2016 Xiaoqiang Zheng <zhengxq@google.com> Move CPU/GPU memory copies into their own streams.
Change: 114000504
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
545dee2e9897e424b641f092eec9ffd4a277f9d1 03-Feb-2016 Xiaoqiang Zheng <zhengxq@google.com> Put device-to-device GPU memory copies on a different stream.
Change: 113784244
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
d821f6aeb66a93501673ac5314685bd7d58151f8 02-Feb-2016 Xiaoqiang Zheng <zhengxq@google.com> Disable tensor tracking when only one GPU stream is used.
Change: 113579306
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
8a59748c087a2fee535c0d5067dbabb01920e812 29-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Use cc_binary rather than cc_library to reduce size of native library in APK from 5.5mb to 3.2mb (compressed).
Change: 113369407
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f8fa35b8a1910772d6d6ba7b621f905358640c2c 26-Jan-2016 Josh Levenberg <josh11b@tensorflow.org> Global search & replace to move to the new location for
tensorflow/core/ files and build targets.
Change: 113080048
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
4ba51b33357d68f882a920fb4f87bfe67bb034a0 22-Jan-2016 Josh Levenberg <josh11b@tensorflow.org> Disentangle the GPU code from the CPU code. This means a few things:
* The "core_cpu_internal" build target no longer includes files from the
common_runtime/gpu/ directory.
* tensorflow/core internal targets instead can get access to those headers via
the "gpu_runtime" target.
* The class "CopyTensor" is introduced. It lives in common_runtime/
but supports registration of copy functions so the "gpu_runtime"
target can add a GPU->GPU copy ability if it is linked in.
This registration should make it easier to add more device types
in the future.
* The "core_cpu" and "core_cpu_internal" build targets no longer
reference GPUUtil::CopyViaDMA; rendezvous_mgr uses CopyTensor
instead.

Also the "copy_tensor" build target was not needed.
Change: 112821119
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
c8eaac926c929e07ac8db69f67803a2223ff2d93 20-Jan-2016 Josh Levenberg <josh11b@tensorflow.org> Many tensorflow/core build clean ups.
Change: 112523833
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f592f23775e2a6ac75496829db5005d3bb70a3d2 19-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Replacing reference 'names' variable with 'example_names' variable.
Change: 112481326
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
c7258dfbf2f97714596676972bf76880c4db6253 15-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Move GPU specific code out of generic DMA codepaths.

The stream() and gpu_device_info() calls are not necessary for simple DMAs to/from a device. Moving this code into the GPU-GPU case allows the other DMA codepaths to be used for non-StreamExecutor devices which provide copy methods via their DeviceContext.
Change: 112204599
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
861f8f01334d20e998eae9a759c8f5a1e07721ca 14-Jan-2016 A. Unique TensorFlower <nobody@tensorflow.org> Move responsibility for constructing/destructing complex objects from
a helper class in tensor.cc into the allocator class. This allows
experimental devices more control over their memory allocation.
Change: 112111713
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
8e388a5546f1466aa0d2afa00e5a015997a23a2b 14-Jan-2016 Vijay Vasudevan <vrv@google.com> TensorFlow: Get rid of legacy command line flags use in TensorFlow.
Change: 112105282
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f9d3e9d03c69bfac77a2fe1ad80f7c5aa517e0f0 06-Dec-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: upstream latest changes to git.

Change 109537918
TensorFlow pip setup: wheel >= 0.26 for python3 pip install
Change 109505848
Fix distortion default value to 1.0 in fixed_unigram_candidate_sampler. This means we default to the actual provided unigram distribution, instead of to the uniform (as it is currently).
Change 109470494
Bugfix in gradients calculation when the ys rely on each other.
Change 109467619
Fix CIFAR-10 model to train on all the training data instead of just 80% of it. Fixes #396.
Change 109467557
Replaced checkpoint file with binary GraphDef.
Change 109467433
Updates to C++ tutorial section.
Change 109465269
TensorFlow: update documentation for tutorials to not assume use of bazel
(when possible).
Change 109462916
A tutorial for image recognition to coincide with the release of the latest Inception image classification model.
Change 109462342
Clear control dependencies in variable_scope.get_variable() when creating
ops for the initializer.

Add tests of various error conditions.
Change 109461981
Various performance improvements in low-level node execution code paths.

Speeds up ptb_word_lm on my desktop with a Titan X from
3638 words per second to 3751 words per second (3.1% speedup).

Changes include:

o Avoided many strcmp operations per node execution and extra touches
of cache lines in executor.cc, by making all the various IsMerge,
IsSwitch, IsSend, etc. operations instead be based on an internal enum
value that is pre-computed at Node construction time, rather than doing
string comparisons against node->type_string(). We were doing about
6 such comparisons per executed node.

o Removed mutex_lock in executor.cc in ExecutorState::Process. The
lock was not needed and the comment about the iterations array being
potentially resized is not true (the iterations arrays are created
with a fixed size). Checked with yuanbyu to confirm this.

o Added new two-argument port::Tracing::ScopedAnnotation constructor
that takes two StringPiece arguments, and only concatenates them
lazily if tracing is enabled. Also changed the code in
platform/tracing.{h,cc} so that the ScopedAnnotation constructor and
the TraceMe constructor can be inlined.

o In BaseGPUDevice::Compute, used the two-argument ScopedAnnotation
constructor to avoid doing StrCat(opkernel->name(), ":",
op_kernel->type_string()) on every node execution on a GPU.

o Introduced a new TensorReference class that just holds a reference to an
underlying TensorBuffer, and requires an explicit Unref().

o Changed the EventMgr interface to take a vector of TensorReference objects
for EventMgr::ThenDeleteTensors, rather than a vector of Tensor objects.

o Used TensorReference in a few places in gpu_util.cc

o Minor: switched to using InlinedVectors in a few places to get better
cache locality.
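
As a minimal sketch of the pre-computed node-class idea from the first item above (the actual change is in executor.cc; the Python names here are hypothetical):

    # Classify each node once at construction so per-execution checks are a
    # cheap enum comparison rather than repeated string comparisons.
    import enum

    class NodeClass(enum.Enum):
        OTHER = 0
        MERGE = 1
        SWITCH = 2
        SEND = 3

    _CLASS_FOR_TYPE = {'Merge': NodeClass.MERGE,
                       'Switch': NodeClass.SWITCH,
                       '_Send': NodeClass.SEND}

    class Node:
        def __init__(self, type_string):
            self.type_string = type_string
            self.node_class = _CLASS_FOR_TYPE.get(type_string, NodeClass.OTHER)

    def is_merge(node):
        return node.node_class is NodeClass.MERGE  # no string compare per execution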
Change 109456692
Updated the label_image example to use the latest Inception model
Change 109456545
Provides classify_image which performs image recognition on a 1000 object label set.

$ ./classify_image
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)

Change 109455002
TensorFlow: make the helper libraries for various models available
in the pip package so that when users type:

python translate.py ...

the absolute import works.

This change is supposed to help make our tutorials run without the
*need* to use bazel.
Change 109450041
TensorFlow: remove cifar and convolutional binary copies from pip install.
Adds embedding and some other models to the list.
Change 109448520
Move the description of a failing invariant from a comment into the dcheck-fail message text.
Change 109447577
TensorBoard has release tagging (tensorboard/TAG)
Also track TensorBoard changes (tensorboard/CHANGES)
Change 109444161
Added ParseSingleSequenceExample + python wrappers + unit tests.
Change 109440864
Update all the TensorFlow Dockerfiles, and simplify GPU containers.

This change updates all four of our Dockerfiles to match the targets discussed
in https://github.com/tensorflow/tensorflow/issues/149. The most notable
change here is moving the GPU images to use the NVidia containers which
include cudnn and other build-time dependencies, dramatically simplifying both
the build and run steps.

A description of which tags exist and get pushed where will be in a follow-up.
Change 109432591
Some pylint and pydoc changes in saver.
Change 109430127
Remove unused hydrogen components
Change 109419354
The RNN api, although moved into python/ops/, remains undocumented.

It may still change at any time.

Base CL: 109538006
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
9c3043ff3bf31a6a81810b4ce9e87ef936f1f529 20-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Improve performance of Alexnet

Changes:

* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command

Base CL: 108349164
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
56313def004795f75ef8281a0294c958d28f1e06 16-Nov-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: Doc and linter fixes, some additional tests and
error handling, updates to website.

Changes:
- Removes redundant reshape from image models by @mrry
- Default TensorBoard to localhost by @danmane
- Reformatting of tensorflow/core by @josh11b
- Make tutorials backwards compatible to 0.5.0 by @girving
- Improve print documentation (md files not updated).
- Add proper scrolling to sitemap by @martinwicke

Base CL: 107956254
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
f41959ccb2d9d4c722fe8fc3351401d53bcf4900 07-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Initial commit of TensorFlow library.
TensorFlow is an open source software library for numerical computation
using data flow graphs.

Base CL: 107276108
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc