70062d11bf11d6579bfdbc87c3350a0074a12ae8 |
|
13-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Rename Stream::BlockHostUntilDoneWithStatus to BlockHostUntilDone. PiperOrigin-RevId: 178951330
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
383a3226a9ad08ac507a3fbd6c220c5c1e15a540 |
|
12-Dec-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Use BlockHostUntilDoneWithStatus in various places. PiperOrigin-RevId: 178723711
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
eb0808fb95567c1f5b7ce48d29f47edfd988aff8 |
|
23-Aug-2017 |
Benoit Steiner <bsteiner@google.com> |
Converted LOG(FATAL) into regular errors to prevent the process from crashing on error. PiperOrigin-RevId: 166257105
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
e85d3df92deb9d717befdf173966a2913ac2aea0 |
|
29-Jun-2017 |
Geoffrey Irving <geoffreyi@google.com> |
Prepare to remove a bunch of proto.h includes from tensorflow/core headers The goal is to make kernels mostly independent of proto headers, which will let us lock down our .so imports. This CL does not remove any actual headers, but changes a bunch of files so that header removal is possible in a followup CL. It also marks the headers that will be removed with // TODO(b/62899350): Remove RELNOTES: n/a PiperOrigin-RevId: 160552878
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
eae0b5f53d0c7a13308e616135a75f6228bddb1a |
|
27-May-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Remove unused using-declarations PiperOrigin-RevId: 157276276
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
53cb26d05a5c2080d8022124178b1cc43a30ffe5 |
|
19-May-2017 |
A. Unique TensorFlower <gardener@tensorflow.org> |
Merge changes from github. END_PUBLIC
--- Commit c2b8927f2 authored by Dandelion Mané <dandelion@google.com> Committed by TensorFlower Gardener <gardener@tensorflow.org>: Fix another d3v4 regression in the graph visualizer. PiperOrigin-RevId: 156343038
--- Commit 170f0b350 authored by Peter Hawkins <phawkins@google.com> Committed by TensorFlower Gardener <gardener@tensorflow.org>: [TF:XLA] Add XLA implementation of ResourceStridedSliceAssign. PiperOrigin-RevId: 156341053
--- Commit 1390dd68f authored by Vijay Vasudevan <vrv@google.com> Committed by TensorFlower Gardener <gardener@tensorflow.org>: When Op Type is not registered, log the hostname of the machine that it is running on in the error message, since the message could be routed back during a failure on a remote binary, and it is hard to tell which machine it came from. Ideally, we'd somehow log the name of the binary running instead, but we don't have a function to get that right now. PiperOrigin-RevId: 156337679
--- Commit 9ca8a151b authored by A. Unique TensorFlower <gardener@tensorflow.org> Committed by TensorFlower Gardener <gardener@tensorflow.org>: Internal change. PiperOrigin-RevId: 156335942
--- Commit 40255434c authored by Martin Wicke <wicke@google.com> Committed by TensorFlower Gardener <gardener@tensorflow.org>: Deprecate contrib/learn/dataframe. To be removed June 15. PiperOrigin-RevId: 156333930
--- Commit 7f71b7fbe authored by A. Unique TensorFlower <gardener@tensorflow.org> Committed by TensorFlower Gardener <gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 156123287 PiperOrigin-RevId: 156503903
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
6714c150df0a764b29acf8d23981162dd2f0a9a1 |
|
20-Jul-2016 |
Shanqing Cai <cais@google.com> |
Automated rollback of change 127562075 Change: 127906463
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
12efe48d210477bf9d9fa1a3f5e0f0ab4a24de77 |
|
18-Jul-2016 |
Shanqing Cai <cais@google.com> |
Automated rollback of change 127562075 Change: 127709092
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
e5ea34a104f55e9d698e50982de90d99ce99550f |
|
15-Jul-2016 |
Shanqing Cai <cais@google.com> |
tfdb: Debug nodes inserter EXPERIMENTAL: Insert special debug ops (e.g., DebugIdentity) into the graph for debugging. Currently, debug ops must take exactly one input and have the string attribute "tensor_name" to indicate which tensor they watch. For example, before the node insertion, the graph may look like:

A:0 -----------1----------> B
    |
    ---------2-----------> C

wherein the output slot 0 of node A feeds as the input to node B through edge 1 and to node C through edge 2. After the node insertion, assuming both B and C have non-Ref input, the graph becomes:

A:0 ---3---> Copy -----------4----------> B
                 |
                 ---------5--------> C
                 |
                 ---------6--------> X

If a node (e.g., B) has Ref input, the graph becomes:

    ----------------4---------------> B
    |
A:0 ---3-----> Copy -----------5----------> C
                   |
                   -----------6--------> X

In other words, we do not feed Refs through deep-copies to downstream nodes. The Copy node is the inserted deep-copy node that copies the input tensor on-device (e.g., CPU-to-CPU or GPU-to-GPU deep copy), which reduces the likelihood of racy updates during debug tensor-watching. X is the newly created debug node that transforms the input (a copy of the watched tensor) into a debug signal. DebugIdentity is the simplest debugging paradigm, in which the debug signal (i.e., X:0) equals the tensor itself. More sophisticated debug ops can be used to transform the tensor into other useful debug signals; an example is the added DebugNanCounter op. If the nodes (A, B, and C) are located on GPU and the edges from A to B or C are HOST_MEMORY, the CopyHost op will be used instead of the Copy op. A reserved string attribute "debug_url" is created for the debug ops to make it possible to send debug signals to files or RPC calls in the future. Other points worth noting:
* The debug ops have control-edge connections to the original destination node, in order to ensure that the debug signals are deterministically generated before the destination node executes.
* More than one debug op can be added to watch a tensor.
* A new field called "DebugTensorWatch" is added to RunOptions to support debug node insertion.
* A new method GPUUtil::CopyGPUTensorToSameGPU has been added to make GPU-to-GPU deep-copy of tensors possible.
* The two test files (debug_gateway_test.cc and debug_gateway_gpu_test.cc) have been consolidated into the former, by using the GOOGLE_CUDA macro. Change: 127562075
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
c8b59c046895fa5b6d79f73e0b5817330fcfbfc1 |
|
02-Jun-2016 |
A. Unique TensorFlower <nobody@tensorflow.org> |
Update copyright for 3p/tf/core. Change: 123900938
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
a5923d823e088a9723e445cce9248d5fc59f1b30 |
|
09-Mar-2016 |
A. Unique TensorFlower <nobody@tensorflow.org> |
Allow StreamExecutor commands to return status types other than the TensorFlow status type. Change: 116793254
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
ec1403e7dc2b919531e527d36d28659f60621c9e |
|
02-Mar-2016 |
A. Unique TensorFlower <nobody@tensorflow.org> |
Add optional comprehensive logging of memory allocation/deallocation events. When enabled, the following events are recorded:
* The start of a step, with the numerical step_id and a textual handle describing the step.
* A Tensor allocation, including the step_id, the name of the OpKernel, the data type, shape, allocation size, allocation_id, data pointer location, and allocator used (the allocation_id is local to an allocator).
* A Tensor deallocation, including the allocation_id and allocator used.
* A raw memory allocation, including the step_id, the name of the component (e.g. Eigen), the number of bytes, data pointer location, allocation_id and allocator used.
* A raw memory deallocation, including the step_id, the name of the component (e.g. Eigen), allocation_id and allocator used.
For now many Tensor allocations show 'unknown' for the kernel and step_id. These mostly come from Tensors allocated by the system from protocol buffers, and Tensors allocated by Ops using the Tensor constructor directly instead of calling OpKernelContext::allocate_temp. The latter can in principle be cleaned up one by one as necessary. The former would require some plumbing to associate an allocation with the appropriate step_id. With this CL memory logging is enabled by raising the VLOG level to 1. Once there is an ability to set process-wide options programmatically it would make sense to update the machinery to do that. Currently recorded events are logged as INFO, and they can all be retrieved by filtering the log for lines including __LOG_MEMORY__.
Some example lines are as follows:
I0301 13:38:55.797563 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown (from Proto)" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 4 allocator_name: "cuda_host" allocation_id: 2 has_single_reference: true ptr: 8717861408 } } }
I0301 13:38:55.802245 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 256 allocator_name: "gpu_bfc" allocation_id: 1 has_single_reference: true ptr: 47378989056 } } }
I0301 13:38:55.802347 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 2 allocator_name: "cuda_host" }
[...]
I0301 13:38:55.806454 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogStep { step_id: 1 handle: "->/init;0" }
I0301 13:38:55.806659 81220 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "random_normal/shape" tensor { dtype: DT_INT32 shape { dim { size: 4 } } allocation_description { requested_bytes: 16 allocated_bytes: 16 allocator_name: "cuda_host" allocation_id: 1 ptr: 8717860896 } } }
[...]
I0301 13:38:56.362898 81218 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 1 kernel_name: "conv1/truncated_normal" tensor { dtype: DT_FLOAT shape { dim { size: 11 } dim { size: 11 } dim { size: 3 } dim { size: 96 } } allocation_description { requested_bytes: 139392 allocated_bytes: 139520 allocator_name: "gpu_bfc" allocation_id: 36 has_single_reference: true ptr: 47379030016 } } }
I0301 13:38:56.362894 81217 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 24 allocator_name: "gpu_bfc" }
I0301 13:38:56.362903 81213 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "conv5/truncated_normal/mul" tensor { dtype: DT_FLOAT shape { dim { size: 3 } dim { size: 3 } dim { size: 1024 } dim { size: 1024 } } allocation_description { requested_bytes: 37748736 allocated_bytes: 37748736 allocator_name: "gpu_bfc" allocation_id: 34 ptr: 48512711168 } } }
[...]
I0229 16:39:57.482980 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawAllocation { step_id: 13 operation: "xentropy/EigenAllocator" num_bytes: 64 ptr: 47386857472 allocation_id: 625 allocator_name: "gpu_bfc" }
I0229 16:39:57.483147 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" deferred: true }
I0229 16:39:57.483197 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" }
Change: 116065112
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
241698b6ba6cd9b13d606a9e4603baa4f33891f2 |
|
06-Feb-2016 |
Xiaoqiang Zheng <zhengxq@google.com> |
Move CPU/GPU memory copies into their own streams. Change: 114000504
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
545dee2e9897e424b641f092eec9ffd4a277f9d1 |
|
03-Feb-2016 |
Xiaoqiang Zheng <zhengxq@google.com> |
Put device-to-device GPU memory copies on a different stream. Change: 113784244
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
d821f6aeb66a93501673ac5314685bd7d58151f8 |
|
02-Feb-2016 |
Xiaoqiang Zheng <zhengxq@google.com> |
Disable tensor tracking when only one GPU stream is used. Change: 113579306
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
8a59748c087a2fee535c0d5067dbabb01920e812 |
|
29-Jan-2016 |
A. Unique TensorFlower <nobody@tensorflow.org> |
Use cc_binary rather than cc_library to reduce size of native library in APK from 5.5mb to 3.2mb (compressed). Change: 113369407
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
f8fa35b8a1910772d6d6ba7b621f905358640c2c |
|
26-Jan-2016 |
Josh Levenberg <josh11b@tensorflow.org> |
Global search & replace to move to the new location for tensorflow/core/ files and build targets. Change: 113080048
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
4ba51b33357d68f882a920fb4f87bfe67bb034a0 |
|
22-Jan-2016 |
Josh Levenberg <josh11b@tensorflow.org> |
Disentangle the GPU code from the CPU code. This means a few things:
* The "core_cpu_internal" build target no longer includes files from the common_runtime/gpu/ directory.
* tensorflow/core internal targets instead can get access to those headers via the "gpu_runtime" target.
* The class "CopyTensor" is introduced. It lives in common_runtime/ but supports registration of copy functions so the "gpu_runtime" target can add a GPU->GPU copy ability if it is linked in. This registration should make it easier to add more device types in the future.
* The "core_cpu" and "core_cpu_internal" build targets no longer reference GPUUtil::CopyViaDMA; rendezvous_mgr uses CopyTensor instead. Also the "copy_tensor" build target was not needed.
Change: 112821119
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
c8eaac926c929e07ac8db69f67803a2223ff2d93 |
|
20-Jan-2016 |
Josh Levenberg <josh11b@tensorflow.org> |
Many tensorflow/core build clean ups. Change: 112523833
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
f592f23775e2a6ac75496829db5005d3bb70a3d2 |
|
19-Jan-2016 |
A. Unique TensorFlower <nobody@tensorflow.org> |
Replacing reference 'names' variable with 'example_names' variable. Change: 112481326
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
c7258dfbf2f97714596676972bf76880c4db6253 |
|
15-Jan-2016 |
A. Unique TensorFlower <nobody@tensorflow.org> |
Move GPU specific code out of generic DMA codepaths. The stream() and gpu_device_info() calls are not necessary for simple DMAs to/from a device. Moving this code into the GPU-GPU case allows the other DMA codepaths to be used for non-StreamExecutor devices which provide copy methods via their DeviceContext. Change: 112204599
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
861f8f01334d20e998eae9a759c8f5a1e07721ca |
|
14-Jan-2016 |
A. Unique TensorFlower <nobody@tensorflow.org> |
Move responsibility for constructing/destructing complex objects from a helper class in tensor.cc into the allocator class. This allows experimental devices more control over their memory allocation. Change: 112111713
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
8e388a5546f1466aa0d2afa00e5a015997a23a2b |
|
14-Jan-2016 |
Vijay Vasudevan <vrv@google.com> |
TensorFlow: Get rid of legacy command line flags use in TensorFlow. Change: 112105282
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
f9d3e9d03c69bfac77a2fe1ad80f7c5aa517e0f0 |
|
06-Dec-2015 |
Vijay Vasudevan <vrv@google.com> |
TensorFlow: upstream latest changes to git.
Change 109537918 TensorFlow pip setup: wheel >= 0.26 for python3 pip install
Change 109505848 Fix distortion default value to 1.0 in fixed_unigram_candidate_sampler. This means we default to the actual provided unigram distribution, instead of to the uniform (as it is currently).
Change 109470494 Bugfix in gradients calculation when the ys rely on each other.
Change 109467619 Fix CIFAR-10 model to train on all the training data instead of just 80% of it. Fixes #396.
Change 109467557 Replaced checkpoint file with binary GraphDef.
Change 109467433 Updates to C++ tutorial section.
Change 109465269 TensorFlow: update documentation for tutorials to not assume use of bazel (when possible).
Change 109462916 A tutorial for image recognition to coincide with the release of the latest Inception image classification model.
Change 109462342 Clear control dependencies in variable_scope.get_variable() when creating ops for the initializer. Add tests of various error conditions.
Change 109461981 Various performance improvements in low-level node execution code paths. Speeds up ptb_word_lm on my desktop with a Titan X from 3638 words per second to 3751 words per second (3.1% speedup). Changes include:
o Avoided many strcmp operations per node execution and extra touches of cache lines in executor.cc, by making all the various IsMerge, IsSwitch, IsSend, etc. operations instead be based on an internal enum value that is pre-computed at Node construction time, rather than doing string comparisons against node->type_string(). We were doing about 6 such comparisons per executed node.
o Removed mutex_lock in executor.cc in ExecutorState::Process. The lock was not needed and the comment about the iterations array being potentially resized is not true (the iterations arrays are created with a fixed size). Checked with yuanbyu to confirm this.
o Added new two-argument port::Tracing::ScopedAnnotation constructor that takes two StringPiece arguments, and only concatenates them lazily if tracing is enabled. Also changed the code in platform/tracing.{h,cc} so that the ScopedAnnotation constructor and the TraceMe constructor can be inlined.
o In BaseGPUDevice::Compute, used the two-argument ScopedAnnotation constructor to avoid doing StrCat(opkernel->name(), ":", op_kernel->type_string()) on every node execution on a GPU.
o Introduced a new TensorReference class that just holds a reference to an underlying TensorBuffer, and requires an explicit Unref().
o Changed the EventMgr interface to take a vector of TensorReference objects for EventMgr::ThenDeleteTensors, rather than a vector of Tensor objects.
o Used TensorReference in a few places in gpu_util.cc
o Minor: switched to using InlinedVectors in a few places to get better cache locality.
Change 109456692 Updated the label_image example to use the latest Inception model
Change 109456545 Provides classify_image which performs image recognition on a 1000 object label set.
$ ./classify_image
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)
Change 109455002 TensorFlow: make the helper libraries for various models available in the pip package so that when users type: python translate.py ... the absolute import works. This change is supposed to help make our tutorials run without the *need* to use bazel.
Change 109450041 TensorFlow: remove cifar and convolutional binary copies from pip install. Adds embedding and some other models to the list.
Change 109448520 Move the description of a failing invariant from a comment into the dcheck-fail message text.
Change 109447577 TensorBoard has release tagging (tensorboard/TAG) Also track TensorBoard changes (tensorboard/CHANGES)
Change 109444161 Added ParseSingleSequenceExample + python wrappers + unit tests.
Change 109440864 Update all the TensorFlow Dockerfiles, and simplify GPU containers. This change updates all four of our Dockerfiles to match the targets discussed in https://github.com/tensorflow/tensorflow/issues/149. The most notable change here is moving the GPU images to use the NVidia containers which include cudnn and other build-time dependencies, dramatically simplifying both the build and run steps. A description of which tags exist and get pushed where will be in a follow-up.
Change 109432591 Some pylint and pydoc changes in saver.
Change 109430127 Remove unused hydrogen components
Change 109419354 The RNN api, although moved into python/ops/, remains undocumented. It may still change at any time.
Base CL: 109538006
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
9c3043ff3bf31a6a81810b4ce9e87ef936f1f529 |
|
20-Nov-2015 |
Manjunath Kudlur <keveman@gmail.com> |
TensorFlow: Improve performance of Alexnet
Changes:
* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command
Base CL: 108349164
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
56313def004795f75ef8281a0294c958d28f1e06 |
|
16-Nov-2015 |
Vijay Vasudevan <vrv@google.com> |
TensorFlow: Doc and linter fixes, some additional tests and error handling, updates to website.
Changes:
- Removes redundant reshape from image models by @mrry
- Default TensorBoard to localhost by @danmane
- Reformatting of tensorflow/core by @josh11b
- Make tutorials backwards compatible to 0.5.0 by @girving
- Improve print documentation (md files not updated).
- Add proper scrolling to sitemap by @martinwicke
Base CL: 107956254
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|
f41959ccb2d9d4c722fe8fc3351401d53bcf4900 |
|
07-Nov-2015 |
Manjunath Kudlur <keveman@gmail.com> |
TensorFlow: Initial commit of TensorFlow library. TensorFlow is an open source software library for numerical computation using data flow graphs. Base CL: 107276108
/external/tensorflow/tensorflow/core/common_runtime/gpu/gpu_util.cc
|