History log of /external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
Revision Date Author Comments
995378c4c9ff156cae7a365cfdc1480a3ee6d0bf 26-Jan-2018 Yong Tang <yong.tang.github@outlook.com> Switch over to max_pool_v2 in Python (#14983)

* Switch over to max_pool_v2 in Python

This fix is a follow-up to 11875 so that MaxPool in Python
uses the v2 version. As 11875 was merged some time ago,
this fix conforms to the deprecation policy.

This fix is related to 11875 and 4746.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update test cases in contrib/specs/python/specs_test due to MaxPool -> MaxPoolV2

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update tensorflow/contrib/receptive_field

Update tensorflow/contrib/receptive_field
due to max_pool's strides and ksize changing from attr to input

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Remove const restriction for strides and ksize

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Register MaxPoolV2 with XLA

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Reformat with clang-format -i --style=Google

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
351c0a533a111636333b4ebeede16485cf679ca9 25-Jan-2018 Yifei Feng <yifeif@google.com> Add C0330 bad-continuation check to pylint.

PiperOrigin-RevId: 183270896
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
71896cc7e5bd3d1b8b5bb615eac7bebf86fa998c 04-Jan-2018 Raghuraman Krishnamoorthi <raghuramank@google.com> Merge changes from github.

PiperOrigin-RevId: 180746153
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
b1d8c59e9b014b527fb2fbef9ce9afc14dbc4938 22-Nov-2017 Yifei Feng <yifeif@google.com> Merge changes from github.

PiperOrigin-RevId: 176695926
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
e70c00950d295c519fd9c7f8b12e13a3c5aaf710 22-Nov-2017 Yifei Feng <yifeif@google.com> Automated g4 rollback of changelist 176615107

PiperOrigin-RevId: 176622438
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
ad7eeec1cc06d7fdba6ee404f03a35fab9cd3e6a 22-Nov-2017 Yifei Feng <yifeif@google.com> Automated g4 rollback of changelist 176615737

PiperOrigin-RevId: 176621645
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
d0a3b2d3983b970b750329088013dc5cb67d96f9 22-Nov-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merged commit includes the following changes:
176617057 by yifeif:

Internal change.

--
176615737 by yifeif:

Fix internal tests.

--

PiperOrigin-RevId: 176617057
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
c6d603f02e1a98f871912cda6716cdcbed6b439e 22-Nov-2017 Yifei Feng <yifeif@google.com> Merge changes from github.

PiperOrigin-RevId: 176615107
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
248d7c26c2a3f8d3f45b3498eff1d639e7cb0077 20-Nov-2017 Yifei Feng <yifeif@google.com> Automated g4 rollback of changelist 176054079

PiperOrigin-RevId: 176418959
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
0e0cee30e74ee374104dcd15b787dac89dd9ed5f 18-Nov-2017 Yao Zhang <yaozhang@google.com> Add the missing use_gpu=True to make the GPU test take effect.

PiperOrigin-RevId: 176183039
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
780c64e3e872269e76efa27b5bb7fe2465c26dfe 17-Nov-2017 Yao Zhang <yaozhang@google.com> Turn off graph optimization in max pooling test because of the inconsistent
handling of NaN and -Inf across different MaxPooling implementations. Split the tests, as their ConfigProto settings could interfere with each other.

PiperOrigin-RevId: 176054079
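The NaN inconsistency mentioned above can be reproduced without TensorFlow. This is a minimal sketch assuming two naive max-reduction policies (hypothetical helpers, not TensorFlow's kernels): because every comparison with NaN is false, a "replace if greater" loop gives a result that depends on where the NaN sits in the window, while a NaN-propagating loop always returns NaN.

```python
import math

def window_max_gt(window):
    # Keep a running max; replace it only when a value is strictly greater.
    best = window[0]
    for v in window[1:]:
        if v > best:
            best = v
    return best

def window_max_nan_propagating(window):
    # Alternative policy: any NaN in the window makes the result NaN.
    if any(math.isnan(v) for v in window):
        return float("nan")
    return max(window)

nan = float("nan")
print(window_max_gt([nan, 1.0]))               # nan  (NaN seen first, never replaced)
print(window_max_gt([1.0, nan]))               # 1.0  (NaN > 1.0 is False, so NaN is skipped)
print(window_max_nan_propagating([1.0, nan]))  # nan
```

Two kernels that each pick one of these policies (or scan windows in a different order) can legitimately disagree on inputs containing NaN, which is why a test comparing them needs the optimizer held fixed.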
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
c14550a38308a7f516e83f5c8e21748ad76bf972 08-Sep-2017 A. Unique TensorFlower <gardener@tensorflow.org> Add an NCHW_VECT_C kernel to MaxPoolOp and MaxPoolOpV2

PiperOrigin-RevId: 168021874
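For reference, NCHW_VECT_C splits the channel dimension into vectors of 4, giving a 5-D layout [N, C/4, H, W, 4]. The sketch below uses hypothetical pure-Python helpers (not TensorFlow functions) to show the resulting shape and where a single NHWC element lands, assuming C is divisible by 4.

```python
def nchw_vect_c_shape(nhwc_shape, vect=4):
    """Shape of an NHWC tensor after conversion to NCHW_VECT_C."""
    n, h, w, c = nhwc_shape
    if c % vect != 0:
        raise ValueError("channel count must be divisible by %d" % vect)
    return (n, c // vect, h, w, vect)

def nhwc_to_nchw_vect_c_index(n, h, w, c, vect=4):
    """Map an NHWC element index to its NCHW_VECT_C index."""
    # Channel c lives in vector group c // vect, at lane c % vect.
    return (n, c // vect, h, w, c % vect)

print(nchw_vect_c_shape((2, 5, 5, 8)))        # (2, 2, 5, 5, 4)
print(nhwc_to_nchw_vect_c_index(1, 3, 4, 6))  # (1, 1, 3, 4, 2)
```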
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
fe8905ed992e29467bcf69053dfce07b77f906d3 01-Sep-2017 A. Unique TensorFlower <gardener@tensorflow.org> Fix issue where some pooling op gradients on the CPU would fail when strides > ksize.

RELNOTES: Enable support for strides > ksize for pooling operations.
PiperOrigin-RevId: 167327007
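When strides exceed ksize, some input positions fall outside every pooling window, so their gradient must be zero. A minimal 1-D sketch with VALID padding (plain Python, not TensorFlow's CPU kernel; function names are hypothetical) that routes each output gradient back to the argmax of its window:

```python
def max_pool_1d(x, ksize, stride):
    """VALID max pooling over a 1-D list; returns (outputs, argmax indices)."""
    out_len = (len(x) - ksize) // stride + 1
    outputs, argmaxes = [], []
    for i in range(out_len):
        start = i * stride
        window = range(start, start + ksize)
        j = max(window, key=lambda k: x[k])  # first index holding the window max
        outputs.append(x[j])
        argmaxes.append(j)
    return outputs, argmaxes

def max_pool_1d_grad(x, ksize, stride, grad_out):
    """Scatter each output gradient to the input position that produced it."""
    _, argmaxes = max_pool_1d(x, ksize, stride)
    grad_in = [0.0] * len(x)  # positions skipped when stride > ksize stay zero
    for j, g in zip(argmaxes, grad_out):
        grad_in[j] += g
    return grad_in

x = [1.0, 5.0, 2.0, 7.0, 3.0, 4.0, 6.0, 0.0]
print(max_pool_1d(x, ksize=2, stride=3))          # ([5.0, 7.0, 6.0], [1, 3, 6])
print(max_pool_1d_grad(x, 2, 3, [1.0, 1.0, 1.0]))
# [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

Indices 2 and 5 are never inside a window here, so a gradient implementation that assumes every input is covered by some window would be wrong for this configuration.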
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
28ce1d163eeffe618a6972c5245be0e660d94e85 15-Aug-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merge changes from github.
END_PUBLIC

---
Commit 9f81374c3 authored by raymondxyang<zihao.yang@microsoft.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Add option to build more Python tests in CMake (#11853)

* Ignore Windows-built project files

* Fix deprecated methods in tf.contrib.python

* Fix regex match for Windows build in contrib.keras

* Fix Regex match for Windows build in session_bundle

* * Fix deprecated methods
* Fix regex match for Windows
* Fix compatibility issue with Python 3.x

* Add missing ops into Windows build for test

* Enabled more testcases for Windows build

* Clean code and fix typo

* Add conditional CMake mode for enabling more unit test cases

* Add Cmake mode for major Contrib packages

* Add supplementary info in README for new CMake option

* * Update tf_tests after testing with TF 1.3
* Clean code and resolve conflicts

* Fix unsafe regex matches and format code

* Update exclude list after testing with latest master branch

* Fix missing module

---
Commit 98f0e1efe authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Dynamic ksize and strides with MaxPool (#11875)

* Dynamic ksize with max_pool

This fix addresses the issue raised in 4746, where ksize
is static (an attr) in max_pool.
This fix changes ksize to an input tensor so that it can be dynamic.

This fixes 4746.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add dynamic ksize to MaxPoolGrad and MaxPoolGradGrad

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add test cases for max_pool_v2

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Fix GPU Jenkins issue.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Enable MaxPoolV2 in GPU

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Hide MaxPoolV2 and other fixes.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

---
Commit 02d6bc185 authored by Bairen Yi<byronyi@users.noreply.github.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
remove useless variable (#12212)

---
Commit ed6b0d905 authored by namrata-ibm<bhavenamrata@gmail.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Adding support for s390x in calculation of cpu_frequency (#12201)

---
Commit 627dfc9dd authored by Taehoon Lee<taehoonlee@snu.ac.kr>
Committed by Taehoon Lee<taehoonlee@snu.ac.kr>:
Fix typos

---
Commit c0f9b0a91 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
In fast-math mode emit a tanh that has a faster min/max.

PiperOrigin-RevId: 164943597

---
Commit 87605f3d6 authored by Kay Zhu<kayzhu@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[TF:XLA] Use HloEvaluator for ComputeConstant, removing the need for a dedicated
compute constant backend.

PiperOrigin-RevId: 164940970

---
Commit 881de45c2 authored by Taehoon Lee<me@taehoonlee.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Add bool type support for GPU kernels (#11927)

* Add bool type support for GPU kernels

* Add bool type test code for GPU kernels

---
Commit eeacdcdb1 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add missing "CPU" suffix in registrations.

PiperOrigin-RevId: 164939527

---
Commit de01be952 authored by namrata-ibm<bhavenamrata@gmail.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Adding support for Big Endian in graph_constructor_test and wav_io (#12179)

---
Commit 26719d29f authored by QingYing Chen<pkudysj@126.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Implement CRF decode (Viterbi decode) for tensor (#12056)

* Implement CRF decoding for tensors

* add test code for tensor version's CRF decoding

* made modifications according to pylint

* add some comments for crf decode

* remove useless code

* add comments at the top comment of crf module and add more comments in crf_test

* capitalize first char of first word in comments

* replace crf_decode test code with a deterministic example

---
Commit f9a81ca2f authored by Pete Warden<pete@petewarden.com>
Committed by gunan<gunan@google.com>:
Create CI build script for Raspberry Pi (#12190)

* Create CI build script for Raspberry Pi

* Moved location of Pi build script

---
Commit e2a163a90 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Merge code from PR #11940 with internal changes from cl/164796436, and update Python tests to also run on GPU.

PiperOrigin-RevId: 164929133

---
Commit 08bbfa187 authored by Taehoon Lee<me@taehoonlee.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Fix typos (#12195)

---
Commit ab96f41fb authored by Luke Iwanski<luke@codeplay.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
[OpenCL] Extends matmul_benchmark.py to cover SYCL (#11697)

* [OpenCL] Extends matmul_benchmark.py to cover SYCL

* Fixed typo

* /gpu:0 -> /device:GPU:0

* Fixes control_flow_ops_py_test

* /gpu: -> /device:GPU:

* Fixes //tensorflow/python/profiler/internal:run_metadata_test

* gpu: -> GPU:

* Fixes tfprof_node

* [OpenCL] Fixes device path to name with many colons (#123)

The device path is constructed from a device name by replacing all
colons with underscores. Some device names contain more than one colon,
for example 'device:SYCL:0' which gives a path 'device_SYCL_0'. The
previous code would not convert this back to the original device name,
but rather to 'device:SYCL_0'.

An alternative fix would be to convert all underscores to colons in the
device name (i.e. remove the restriction inside `replace("_", ":", 1)`),
however I'm not sure if there are any device names which contain
underscores.

* If no GPU device is available, fake one

* gpu: -> device:GPU

* Fixes profiler test

* /gpu:x -> /device:GPU:x

* Fixes debug_io_utils_test.cc test

* Fixes device_name_utils_test.cc
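The device-path bug described in the OpenCL fix above can be illustrated in plain Python. This is a sketch, not the TensorFlow code: the three function names are hypothetical, `path_to_device_name_old` mirrors the `replace("_", ":", 1)` behaviour the commit describes, and `path_to_device_name_all` is the "alternative fix" the commit mentions (not necessarily the one that was merged).

```python
def device_name_to_path(name):
    # Forward direction: every colon becomes an underscore.
    return name.replace(":", "_")

def path_to_device_name_old(path):
    # Old inverse: only the first underscore is turned back into a colon.
    return path.replace("_", ":", 1)

def path_to_device_name_all(path):
    # Alternative mentioned in the commit: convert every underscore.
    # Caveat: this cannot round-trip a device name that itself contains "_".
    return path.replace("_", ":")

name = "device:SYCL:0"
path = device_name_to_path(name)
print(path)                           # device_SYCL_0
print(path_to_device_name_old(path))  # device:SYCL_0  (does not round-trip)
print(path_to_device_name_all(path))  # device:SYCL:0
```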

---
Commit 35e7a3665 authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
Remove unneeded casting of int64 for reverse_sequence (#12192)

This fix removes the unneeded cast of int64 for reverse_sequence:
```
lengths = math_ops.to_int64(lengths)
```
as int32 has already been enabled for reverse_sequence.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
---
Commit 9fba8c185 authored by Anna R<annarev@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add benchmark dashboard link to benchmarks doc. Also add a link and
description for the Benchmarks page to the Community index page.

PiperOrigin-RevId: 164924906

---
Commit bb6f32fa7 authored by Mark Heffernan<meheff@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Make HloAliasAnalysis updatable after changes to the HLO graph.
As part of this change make HloAliasAnalysis a thinner layer which
basically only holds a map from HloValue to HloBuffer and vice versa.

PiperOrigin-RevId: 164923041

---
Commit 9103096c1 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by Thomas Köppe<tkoeppe@google.com>:
Merged commit includes the following changes:
164923041 by meheff:

Make HloAliasAnalysis updatable after changes to the HLO graph.
As part of this change make HloAliasAnalysis a thinner layer which
basically only holds a map from HloValue to HloBuffer and vice versa.

--

PiperOrigin-RevId: 164923041

---
Commit 822603aed authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Merge sibling fusion instructions using multi_output_fusion

PiperOrigin-RevId: 164920220

---
Commit c035aa2a8 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Go: Update generated wrapper functions for TensorFlow ops.

PiperOrigin-RevId: 164917891

---
Commit e1e81d9ba authored by Luke Iwanski<luke@codeplay.com>
Committed by Rasmus Munk Larsen<rmlarsen@google.com>:
[OpenCL] Fixes double memcpy bug (#151) (#12173)

* [OpenCL] Fixes double memcpy bug (#151)

As the debug CopyOp is called on a Tensor without a type, we need to use
the DataType enum to get type information, and use this to pass the type
on to Eigen. This is a workaround for Eigen's need to have a type when
calling memcpy. If the Eigen memcpy can be provided without a type
requirement, then the memcpy in sycl_util is unnecessary.

* Acts on feedback from: #12173/files/32cb12a9001b672425867b5a3110fd98e737a20b#r132496277

---
Commit d9ca2d86d authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Internal change

PiperOrigin-RevId: 164916465

---
Commit b8d13d218 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Remove more parts of DCASGD missed in the first pass. (47949b)

PiperOrigin-RevId: 164914552

---
Commit 73b3d52c7 authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
cmake fix

PiperOrigin-RevId: 164911656

---
Commit 2173b5b0a authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Allow TFE_TensorHandleCopyToDevice to have the same device as src and
destination. It will reuse the same underlying buffer in those cases.

PiperOrigin-RevId: 164909906

---
Commit 13eb3b90e authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Experimental C and Python APIs to invoke TensorFlow kernels on concrete values.

PiperOrigin-RevId: 164902588

---
Commit 7dfabcc01 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Initialize ExecutionOptions in ComputeConstant to default values.

PiperOrigin-RevId: 164894867

---
Commit c8897e9bc authored by Benoit Steiner<bsteiner@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Static required time computation

PiperOrigin-RevId: 164894645

---
Commit 076158f9b authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Enable implicit->explicit conversion by default.

PiperOrigin-RevId: 164890915

---
Commit 58c4a4cb1 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Bugfix: number of input channels is not necessarily in the last dimension, after introduction of data_format param.

PiperOrigin-RevId: 164889729
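The data_format bug above comes down to where the channel dimension sits: last for NHWC, second for NCHW. A hypothetical helper (not a TensorFlow function) showing the lookup the fix implies for 4-D shapes:

```python
def input_channels(shape, data_format="NHWC"):
    """Return the number of input channels of a 4-D shape for a given layout."""
    if data_format == "NHWC":
        return shape[-1]  # [batch, height, width, channels]
    if data_format == "NCHW":
        return shape[1]   # [batch, channels, height, width]
    raise ValueError("unsupported data_format: %r" % (data_format,))

print(input_channels([8, 32, 32, 3], "NHWC"))  # 3
print(input_channels([8, 3, 32, 32], "NCHW"))  # 3
```

Reading `shape[-1]` unconditionally would return 32 for the NCHW example, which is the class of error the commit describes.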

---
Commit 8f9b1af8a authored by Igor Saprykin<isaprykin@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Recover MonitoredSession when the Coordinator is requested to stop with one of the _PREEMPTION_ERRORS.

When SyncReplicasOptimizer is used, a preemption in the Coordinator may result in two cases:
Case 1) the session gets silently marked as complete
Case 2) the session gets stuck

This CL aims to solve and verify solutions for both of these problems. Fix 1 changes the should_stop logic. Fix 2 changes the CoordinatedSession.run() logic.

SyncReplicasOptimizer runs a separate set of threads using a Coordinator instance. Those threads do FIFOQueue.enqueue; the main thread does a blocking FIFOQueue.dequeue.

`sync_token_q` FIFOQueue is on parameter-servers. When one of the PS instances gets preempted, an AbortedError causes the Coordinator to stop via request_stop(ex). That by itself changes the state of MonitoredSession.should_stop() to True (Fix 1).

Results of the blocking Dequeue operation are sent to the chief worker via Recv. What happens next depends on the number of tokens in `sync_token_q`. If there are enough for the next call to Dequeue to return, then the low-level "tf session run() call" returns. The next iteration of the `while not MonitoredSession.should_stop()` loop decides that the training is complete (Case 1).

If there are not enough tokens in `sync_token_q`, then the blocking Dequeue is going to keep waiting for them. This results in the graph execution getting stuck and the whole session getting garbage collected after 10 minutes (Case 2).

We decided to fix that by re-creating a session after it gets garbage collected (Fix 2). An alternative was to try to cancel the pending Dequeue operation, but it's not clear that it is the right thing to do and it is also not easy.

PiperOrigin-RevId: 164888390
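Fix 2 is essentially a recovery loop: when a run fails with a preemption-style error, discard the session and create a new one. The sketch below is a hypothetical, framework-free illustration of that pattern; `run_with_recovery`, `make_session`, and `PreemptedError` are invented names, not the MonitoredSession API, and the `should_stop()` change (Fix 1) is not modeled.

```python
class PreemptedError(Exception):
    """Stand-in for one of the preemption errors (e.g. an AbortedError)."""

def run_with_recovery(make_session, step_fn, num_steps, max_recoveries=3):
    """Run num_steps steps, re-creating the session after a preemption."""
    session = make_session()
    recoveries = 0
    results = []
    step = 0
    while step < num_steps:
        try:
            results.append(step_fn(session, step))
            step += 1
        except PreemptedError:
            if recoveries >= max_recoveries:
                raise
            recoveries += 1
            session = make_session()  # drop the broken session, start fresh
    return results, recoveries

created = []

def make_session():
    created.append(len(created))
    return len(created) - 1  # the "session" is just an id here

def step_fn(session_id, step):
    # Simulate a parameter-server preemption on the first session at step 2.
    if session_id == 0 and step == 2:
        raise PreemptedError("parameter server preempted")
    return (session_id, step)

results, recoveries = run_with_recovery(make_session, step_fn, num_steps=4)
print(results)     # [(0, 0), (0, 1), (1, 2), (1, 3)]
print(recoveries)  # 1
```

The failed step is retried on the new session rather than skipped; whether that is the right policy depends on the training setup, which this sketch does not try to capture.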

---
Commit 46e4de6e5 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Undo loop fusion changes for now as they seem to be altering a few results.
END_PUBLIC
RELNOTES: n/a

BEGIN_PUBLIC
Automated g4 rollback of changelist 164825735

PiperOrigin-RevId: 165340331
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
90d6421c5e0898fb840197d9533c2f8ba1a7c651 11-Jul-2017 Shanqing Cai <cais@google.com> Merge changes from github.
END_PUBLIC

---
Commit d0f53f77f authored by Penghao Cen<scorpiocph@gmail.com>
Committed by Shanqing Cai<cais@google.com>:
Fix minor typo (#11323)

---
Commit 02fcf564e authored by Chris Song<sjhshy@gmail.com>
Committed by Chris Song<sjhshy@gmail.com>:
Fix misspellings.

---
Commit 764c9b6b4 authored by Louis Tiao<ltiao@users.noreply.github.com>
Committed by GitHub<noreply@github.com>:
Fixed typo in docstring
---
Commit f8cd1283e authored by Shanqing Cai<cais@google.com>
Committed by Shanqing Cai<cais@google.com>:
Chaser

---
Commit 01383b946 authored by Shanqing Cai<cais@google.com>
Committed by Shanqing Cai<cais@google.com>:
Adapt TensorFlowTestCase.setUp() to new reset_default_graph() semantics

Avoid calling reset_default_graph() directly to prevent exceptions in
cases where test methods error out from within nested graph contexts,
which can leave _default_graph_stack non-empty in certain Python
versions.

---
Commit 0ffc37890 authored by Amit Patankar<amitpatankar@google.com>
Committed by Amit Patankar<amitpatankar@google.com>:
Removing second declaration of functions.

---
Commit f9c9cacb0 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Refactor ElementalIrEmitter's slice index finding code into
IrArray::Index::SourceIndexOfSlice().

PiperOrigin-RevId: 161140653

---
Commit ba297aec9 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Update ops-related pbtxt files.

PiperOrigin-RevId: 161138258

---
Commit 68d666737 authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fixes a reentrant lock issue with tensors using ndarray memory which uses tensor memory.

PiperOrigin-RevId: 161137788

---
Commit a2ee8bca3 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor.

PiperOrigin-RevId: 161137741

---
Commit 755fa7b50 authored by Mark Daoust<markdaoust@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Block generate_test and docs generation from running in Python 3.

- Doc generation is currently unsupported in python3

- These both end in errors in python 3.5.1+

PiperOrigin-RevId: 161137467

---
Commit 97cbcac45 authored by Peter Hawkins<phawkins@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[TF:XLA] Fix failure in functionalize_control_flow rewrite for Enter nodes that are unused. Make sure we ignore such nodes without producing an error.

PiperOrigin-RevId: 161136545

---
Commit dabcb60bc authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[XLA] Add reasonable error messages to Builder::Build for bad parameter numbers.

PiperOrigin-RevId: 161136262

---
Commit 0cbd249e8 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add complex tensors support to `matrix_determinant`.

PiperOrigin-RevId: 161132422

---
Commit 335f1f14d authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Extend static shape inference for SparseTensors with dense_shapes constructed using slicing.

PiperOrigin-RevId: 161132391

---
Commit 53604916e authored by Jianwei Xie<xiejw@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fixed the missing labels test in TPUEstimator.

PiperOrigin-RevId: 161131282

---
Commit 9f57dc8dd authored by Bruno Rosa<bruno.rosa@eldorado.org.br>
Committed by Bruno Rosa<bruno.rosa@eldorado.org.br>:
Use mcpu instead of march for ppc64le

march is not supported by gcc on ppc64le

---
Commit 7d5c74a9c authored by Skye Wanderman-Milne<skyewm@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Move duplicate detection logic from Graph to FunctionLibraryDefinition

Turns out this is more useful, since there are many function libraries
that don't belong to a graph. This will be used in a future
change. Note that this maintains the current behavior of Graph.

In addition, updates FunctionDefsEqual() to handle unset attr entries
(I ran into this when using this in said future change).

PiperOrigin-RevId: 161126628

---
Commit 2caec3af1 authored by Shanqing Cai<cais@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Disable more timeseries py tests failing in OSS PIP GPU builds

PiperOrigin-RevId: 161124799

---
Commit 0b5cce367 authored by Eugene Brevdo<ebrevdo@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Get TopK op working on GPU again. Extend using cub's radix sort.

1. Undo rollback of Andreas Kirsch's initial implementation.
2. Use cub segmented radix sort instead of Andreas' heap-based impl
for large k and small num_cols (thresholds of k=100, n=1000
determined empirically).
3. Use cub segmented radix sort if k == num_cols (this case is always faster).
4. Added benchmarks.

Benchmarks show that the GPU implementation is up to about 5x slower for small k but
can be about 7x faster for large num_cols and k.

Benchmarks:

Benchmark: m_128_n_10_k_5_use_gpu_False wall_time: 0.000166 s Throughput: 0.0077 GB/s
Benchmark: m_128_n_10_k_5_use_gpu_True wall_time: 0.000796 s Throughput: 0.00161 GB/s
Benchmark: m_128_n_10_k_9_use_gpu_False wall_time: 0.00017 s Throughput: 0.00751 GB/s
Benchmark: m_128_n_10_k_9_use_gpu_True wall_time: 0.000796 s Throughput: 0.00161 GB/s
Benchmark: m_128_n_10_k_10_use_gpu_False wall_time: 0.00017 s Throughput: 0.00753 GB/s
Benchmark: m_128_n_10_k_10_use_gpu_True wall_time: 0.000775 s Throughput: 0.00165 GB/s
Benchmark: m_128_n_100_k_1_use_gpu_False wall_time: 0.000155 s Throughput: 0.0826 GB/s
Benchmark: m_128_n_100_k_1_use_gpu_True wall_time: 0.000796 s Throughput: 0.0161 GB/s
Benchmark: m_128_n_100_k_50_use_gpu_False wall_time: 0.000247 s Throughput: 0.0519 GB/s
Benchmark: m_128_n_100_k_50_use_gpu_True wall_time: 0.0008 s Throughput: 0.016 GB/s
Benchmark: m_128_n_100_k_99_use_gpu_False wall_time: 0.000261 s Throughput: 0.049 GB/s
Benchmark: m_128_n_100_k_99_use_gpu_True wall_time: 0.000794 s Throughput: 0.0161 GB/s
Benchmark: m_128_n_100_k_100_use_gpu_False wall_time: 0.000239 s Throughput: 0.0536 GB/s
Benchmark: m_128_n_100_k_100_use_gpu_True wall_time: 0.000777 s Throughput: 0.0165 GB/s
Benchmark: m_128_n_1000_k_1_use_gpu_False wall_time: 0.000324 s Throughput: 0.395 GB/s
Benchmark: m_128_n_1000_k_1_use_gpu_True wall_time: 0.000916 s Throughput: 0.14 GB/s
Benchmark: m_128_n_1000_k_10_use_gpu_False wall_time: 0.00042 s Throughput: 0.305 GB/s
Benchmark: m_128_n_1000_k_10_use_gpu_True wall_time: 0.000902 s Throughput: 0.142 GB/s
Benchmark: m_128_n_1000_k_500_use_gpu_False wall_time: 0.0011 s Throughput: 0.116 GB/s
Benchmark: m_128_n_1000_k_500_use_gpu_True wall_time: 0.00097 s Throughput: 0.132 GB/s
Benchmark: m_128_n_1000_k_990_use_gpu_False wall_time: 0.00133 s Throughput: 0.0962 GB/s
Benchmark: m_128_n_1000_k_990_use_gpu_True wall_time: 0.000993 s Throughput: 0.129 GB/s
Benchmark: m_128_n_1000_k_1000_use_gpu_False wall_time: 0.00102 s Throughput: 0.126 GB/s
Benchmark: m_128_n_1000_k_1000_use_gpu_True wall_time: 0.000964 s Throughput: 0.133 GB/s
Benchmark: m_128_n_10000_k_10_use_gpu_False wall_time: 0.002 s Throughput: 0.64 GB/s
Benchmark: m_128_n_10000_k_10_use_gpu_True wall_time: 0.00288 s Throughput: 0.445 GB/s
Benchmark: m_128_n_10000_k_100_use_gpu_False wall_time: 0.00233 s Throughput: 0.549 GB/s
Benchmark: m_128_n_10000_k_100_use_gpu_True wall_time: 0.00325 s Throughput: 0.394 GB/s
Benchmark: m_128_n_10000_k_5000_use_gpu_False wall_time: 0.0127 s Throughput: 0.101 GB/s
Benchmark: m_128_n_10000_k_5000_use_gpu_True wall_time: 0.00381 s Throughput: 0.336 GB/s
Benchmark: m_128_n_10000_k_9900_use_gpu_False wall_time: 0.015 s Throughput: 0.0853 GB/s
Benchmark: m_128_n_10000_k_9900_use_gpu_True wall_time: 0.00438 s Throughput: 0.292 GB/s
Benchmark: m_128_n_10000_k_10000_use_gpu_False wall_time: 0.0104 s Throughput: 0.123 GB/s
Benchmark: m_128_n_10000_k_10000_use_gpu_True wall_time: 0.00427 s Throughput: 0.3 GB/s
Benchmark: m_128_n_100000_k_100_use_gpu_False wall_time: 0.0148 s Throughput: 0.865 GB/s
Benchmark: m_128_n_100000_k_100_use_gpu_True wall_time: 0.0262 s Throughput: 0.488 GB/s
Benchmark: m_128_n_100000_k_1000_use_gpu_False wall_time: 0.0201 s Throughput: 0.636 GB/s
Benchmark: m_128_n_100000_k_1000_use_gpu_True wall_time: 0.0263 s Throughput: 0.486 GB/s
Benchmark: m_128_n_100000_k_50000_use_gpu_False wall_time: 0.214 s Throughput: 0.0599 GB/s
Benchmark: m_128_n_100000_k_50000_use_gpu_True wall_time: 0.0322 s Throughput: 0.398 GB/s
Benchmark: m_128_n_100000_k_99000_use_gpu_False wall_time: 0.262 s Throughput: 0.0489 GB/s
Benchmark: m_128_n_100000_k_99000_use_gpu_True wall_time: 0.0377 s Throughput: 0.34 GB/s
Benchmark: m_128_n_100000_k_100000_use_gpu_False wall_time: 0.118 s Throughput: 0.108 GB/s
Benchmark: m_128_n_100000_k_100000_use_gpu_True wall_time: 0.0365 s Throughput: 0.351 GB/s
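The two strategies compared above differ in how they find the k largest values of a row: a bounded min-heap scans the row once in O(n log k), while the sort-based path sorts the whole row and keeps a prefix. A minimal pure-Python sketch of both (hypothetical helpers, not the CUDA/cub kernels), with ties broken toward the lower index so the two agree exactly:

```python
import heapq

def top_k_heap(row, k):
    """Top-k via a size-k min-heap: one pass over the row, O(n log k)."""
    if k <= 0:
        return [], []
    heap = []  # (value, -index): for equal values, the lower index ranks higher
    for i, v in enumerate(row):
        item = (v, -i)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current smallest
    ordered = sorted(heap, reverse=True)  # largest first
    return [v for v, _ in ordered], [-neg_i for _, neg_i in ordered]

def top_k_sort(row, k):
    """Top-k via a full stable sort of the row, O(n log n)."""
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    return [row[i] for i in order], order

row = [3.0, 9.0, 1.0, 9.0, 4.0]
print(top_k_heap(row, 3))  # ([9.0, 9.0, 4.0], [1, 3, 4])
print(top_k_sort(row, 3))  # ([9.0, 9.0, 4.0], [1, 3, 4])
```

When k == num_cols the heap holds the entire row and does strictly more work than a plain sort, which is consistent with the commit always using the sort in that case; the sketch does not attempt to reproduce the empirical k=100 / n=1000 thresholds.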

END_PUBLIC

BEGIN_PUBLIC
Automated g4 rollback of changelist 157169178

PiperOrigin-RevId: 161476569
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
ef2f8891ad409e41f4f9b8e9cfd86b519adb6da6 08-Apr-2017 A. Unique TensorFlower <gardener@tensorflow.org> Fix race due to unsafe buffer forwarding in maxpooling second order gradients added in #6664.
Re-enable previously flaky tests.
Clean up a few minor things in maxpooling_op_gpu.cu.cc
Change: 152550050
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
d0697156736ff137a8d8f6bcd934aa935bf89001 07-Apr-2017 Rohan Jain <rohanj@google.com> Merge changes from github.
Change: 152508170
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
f0d1d69e098458c95e07e37b93bb8ee38b2b7294 07-Apr-2017 Gunhan Gulsoy <gunan@google.com> Disable failing test cases in pooling_ops_test.
Change: 152447322
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
ccbc8991db3943ef984405881a1c917c530f902f 05-Apr-2017 A. Unique TensorFlower <gardener@tensorflow.org> Merge changes from github.
Change: 152200430
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
61f30222eba5e3f1f51dedb3c5493f5f8eb331c8 21-Mar-2017 A. Unique TensorFlower <gardener@tensorflow.org> Add support for the NCHW data_format for 3d operations (convolution, pooling).
This brings NCHW support for 3d in sync with the corresponding 2d ops.
Change: 150811076
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
bed8383c27a0a7225e6fc7ff59a2cd6388fb4d09 23-Dec-2016 Jonathan Hseu <jhseu@google.com> Merge changes from github.
Change: 142805270
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
5866e065bc95c1d7de8a27413b368016941889a6 15-Dec-2016 Justine Tunney <jart@google.com> Remove hourglass imports from kernel_tests
Change: 142080137
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
7c4333c3f4aac04140739cbd527247a7e07de014 30-Nov-2016 Benoit Steiner <bsteiner@google.com> Only run the tests for the NCHW layout on CUDA GPUs, since this is the only type of GPUs for which they're implemented.
Change: 140537838
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
c3a30a230f47a8ca8f5dd1dd79c63229ce1349b8 08-Sep-2016 A. Unique TensorFlower <gardener@tensorflow.org> Switch nn_ops shape fns to delegate to C++, for all that have a C++
implementation (fractional pool ones don't yet).

Change BiasAdd functions to require only rank 3, not 4, for NHWC. This matches
the behavior of GetBiasValueDims in bias_op.cc.

Removed unused functions common_shapes.bias_add_shape and
common_shapes.bias_add_grad_shape.
Change: 132597521
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
b74a74393b0d4ce118cc051d91f7b5996bf3dd98 07-Sep-2016 A. Unique TensorFlower <gardener@tensorflow.org> Delegate to C++ shape inference function for some conv and pooling functions.

Change several C++ shape inference functions to not return an error if an input
dimension is unknown; this more closely matches the python functions.
Change: 132459740
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
40a85bcf69bf250736aeea70cf0c18e0fcd72daa 28-Jun-2016 Gunhan Gulsoy <gunan@google.com> Make sure tests check if the machine has a GPU, rather than just checking if
CUDA is linked in.
Also fix a typo in tf.test
Change: 126104437
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
5a84537852ef9b164bad165c8450bde67d30df05 07-Jun-2016 Benoit Steiner <benoit.steiner.goog@gmail.com> Enable fp16 for most of the pooling ops (MaxPool, AvgPool, associated
gradients, some variants etc.).
Change: 124197406
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
8abe7db540825441948fcc656f0d108f1a83361c 03-Jun-2016 A. Unique TensorFlower <nobody@tensorflow.org> Enable fp16 for most of the pooling ops (MaxPool, AvgPool, associated
gradients, some variants etc.).
Change: 123967787
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
9bedadceab3e126684494e6e6a8103ccab9d90c7 03-Jun-2016 Benoit Steiner <benoit.steiner.goog@gmail.com> Enable fp16 for most of the pooling ops (MaxPool, AvgPool, associated
gradients, some variants etc.).
Change: 123967117
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
0cf9ed3a719c0782695154d5a0bca260001cec15 02-Jun-2016 A. Unique TensorFlower <nobody@tensorflow.org> Update copyright for 3p/tf/python.
Change: 123900456
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
8bf6ef1337359993a8be057c0dc90da8f5a6e4fa 05-May-2016 A. Unique TensorFlower <nobody@tensorflow.org> Merge changes from github.
Change: 121586635
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
6a187ccddaebb741ea77fc3201c6e36625f0aadb 04-May-2016 A. Unique TensorFlower <nobody@tensorflow.org> Add support for 3d convolutions and pooling. CPU kernels use Eigen, GPU kernels use CuDNN.
Change: 121484787
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
5c9bc51857bc0c330d3ab976871ee3509647d1e7 19-Apr-2016 Illia Polosukhin <ilblackdragon@gmail.com> Merge changes from github.
Change: 120185825
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
0663ed111abce98a22d8311a122313df11acde68 22-Mar-2016 Geoffrey Irving <geoffreyi@google.com> Shrink pooling_ops_test and unshard

pooling_ops_test is only slow because it uses unnecessarily large sizes. Speed
it up by dividing all the depth values coming out of Inception by 30.

After this speedup, there is no need to shard.
Change: 117867788
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
01a6f5e504d9299395888a786e52c589c16af529 26-Feb-2016 Xiaoqiang Zheng <zhengxq@google.com> Multiple layout support for pooling operations.
Change: 115611259
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
7760ce56fc3ab4ab8cdc408e29d8ad8b539c417e 11-Feb-2016 Josh Levenberg <josh11b@tensorflow.org> Get rid of some import cruft.
Change: 114374558
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
18297126c0bc13c7045841f5a7a99f3da68176f4 10-Feb-2016 Geoffrey Irving <geoffreyi@google.com> Fix tf.test for PEP-8 and document

tf.test now has appropriate snake case function names (get_temp_dir and
is_built_with_cuda) and has normal toplevel module documentation.

Also fix a bug in make_all.
Change: 114351269
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
795f35da2d458cbae477ac2fe2bff80c1427a771 01-Dec-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: upstream changes to git

Change:
Clean up documentation for ReverseSequence
Change:
Updated several tensorflow operations to use 32bit indices on GPU.
Change:
Add attribute batch_dim to ReverseSequenceOp.
Change:
Fix error in convert_to_records.py. As reported in
https://github.com/tensorflow/tensorflow/issues/370
by AlexUnderMicrocontRoll.
Change:
Update TensorBoard README.
Change:
Fixes to boolean flags reported in
https://github.com/tensorflow/tensorflow/issues/379. Supports:

--bool_flag=True --> True
--bool_flag=False --> False
--bool_flag=gibberish --> False
--bool_flag --> True
--nobool_flag --> False

Fixes #379
Change:
Update generated Op docs.
Change:
Enable local development of TensorBoard using gulp
Also make tf-tensorboard a regular component rather than a special case

This is mostly effected by creating tfserve.js, which is a small server
with clever routing to load from bower_components/ and components/ using
the paths that work within google3.

Workflow: `gulp serve`
Change:
Add a full working code example to the tensorboard and summaries tutorial
Change:
Fix seq2seq_test when running on GPU.

The "proj_w" and "proj_b" variables were being created before the
`test_session()`'s device function took effect, which pushed the
placement algorithm into making an incorrect decision.
Change:
Add a sentence in TensorBoard README on how to serialize summary data to logs and provide link to the how-to tutorial on the TensorFlow website.
Change:
Add error-catching code if string_input_producer is supplied a null input.
Before this change, it would die with an opaque shape error from inside
the queue. This change catches (most) Python null lists being
passed directly in, and at runtime detects null tensors.

Adds two tests for this to input_test.py
Change:
Speed up for models that use the same variable multiple times in the case
where variables must be copied across devices:
- Have Variables wrap the Variable op in an Identity op when converted to Tensor.
This avoids multiple copies across devices if a variable is used multiple times
in a computation.
- Add Variable.mutable() to return the non-wrapped Variable op for use when
assigning new values.
- Add an as_ref parameter to convert_to_tensor() to allow code to specify
whether it plans to assign a new value to the result of the conversion. Make Variable
return the result of Variable.mutable() when as_ref is True.
- Make all ops that assign values to variables pass as_ref=True when converting
their arguments.
Change:
Change to reduce critical section times in gpu_event_mgr.h:
(1) Call stream->ThenRecordEvent outside the EventMgr critical section
(2) Do memory deallocation outside the critical section

Speeds up one configuration of ptb_word_lm from 2924 words per
second (wps) to 3278 wps on my desktop machine with a Titan X.
Change:
Remove some colons that break the open source build

::tensorflow::StringPiece breaks for @raingo, see
https://github.com/tensorflow/tensorflow/issues/358.
tensorflow::StringPiece (without the leading colons)
seems to fix the problem.
Change:
Added a check that the inputs to Operation are a list, and make a defensive copy of the input. This is for cases where the input list is changed, such as in _add_input.
Change:
Use standard names for TensorFlow dtypes in the tutorial.
Change:
Add tests for tensor inputs.
Change:
Fix build after declaring more types for ops
Change:
Switch to 32 bit indexing to speedup convolutions and concatenations.
Change:
Add convert_image op to convert between types for images (similar to OpenCV's cvtScale).
Change:
Make cast work between numeric types (bool, uint8, int16, int32, int64, float, double).
Change:

Pad input data for an odd number of paddings, so that cuDNN can still be used.
+ Fix total padding computation when padding==VALID.
+ This CL makes the Googlenet benchmark run 5x faster.

Change:
Support IndexedSlices in ConcatGrad
Change:
* sampled softmax op uses one embedding lookup for positive and negative samples
* float64 support for sampled softmax
Change:
Move RNN code out of models.rnn (without breaking existing code). The API may still undergo minor changes until full documentation is added.
Change:
Changed to use per-step stacks for the accumulators used in while-loop gradient computation. This addresses the problem caused by using concat without sufficient static shape information. It should also improve performance, since it avoids those expensive concats.
Change:
Update generated Op docs.
Change:
Improve error messages when the optimizer finds no variables to minimize or
when none of the variables has gradients.
Change:
Say that -1 isn't just for flattening in reshape docs

Also add scalar reshape (reshape(t, [])) as an example.

This fixes https://github.com/tensorflow/tensorflow/issues/281.
Change:
This is a test.

Base CL: 109118714
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
9c3043ff3bf31a6a81810b4ce9e87ef936f1f529 20-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Improve performance of Alexnet

Changes:

* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command

Base CL: 108349164
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
f2102f4e2c1c87f1d1bf9ab856a2849c54478760 12-Nov-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: upstream changes from the afternoon.

Changes:

- futurize --stage2 changes for Python 3 compatibility by @girving.

- Small updates to documentation by @vrv, schuster and others

- Account for failure of std::thread::hardware_concurrency by @ebrevdo.

- More changes for backwards-compatibility tests by Josh

- Updates to python op doc generation by Josh

- Added support for using the best-fit allocator via ConfigProto by @vrv.

- Rename LocalSession to DirectSession, since local was a bad name for
it.

- Enable tf.nn.moments() to work with tensors of unknown shape by @mrry.
GITHUB_ISSUE: 139

- Changes for Android build by Andrew.

Base CL: 107645181
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
61d3a958d6d83cb6037490d933b47621cc4009cc 09-Nov-2015 Vijay Vasudevan <vrv@google.com> TensorFlow: Initial steps towards python3 support, some documentation
bug fixes -- reindents to 2 for some of the files to match our internal
requirements.

Thanks to Martin Andrews for the basic_usage.md suggested fix via
Gerrit.

Base CL: 107394029
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py
f41959ccb2d9d4c722fe8fc3351401d53bcf4900 07-Nov-2015 Manjunath Kudlur <keveman@gmail.com> TensorFlow: Initial commit of TensorFlow library.
TensorFlow is an open source software library for numerical computation
using data flow graphs.

Base CL: 107276108
/external/tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py