History log of /external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
Revision Date Author Comments
b06281ba47595d07f58766d6477f896e39eb8a5e 18-Feb-2017 A. Unique TensorFlower <gardener@tensorflow.org> Split out most of k8s_tensorflow into a library and add a way to pass any
environment variables. Add a benchmark_util library that uses an environment
variable to decide on a storage location.
Change: 147890534
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
cfe5658ff910aefd0b8c693e004782b8f6d66c75 10-Feb-2017 A. Unique TensorFlower <gardener@tensorflow.org> Set correct name prefix in cluster spec generated in k8s_tensorflow.py. Also, add a job label.
Change: 147184814
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
9a88bff7589801727a1649a330fbb70321b04998 10-Feb-2017 A. Unique TensorFlower <gardener@tensorflow.org> A few changes for k8s_tensorflow.py.
Change: 147108270
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
65b010308c2ab3f365b5b9b40dd56591b179b996 15-Sep-2016 Shanqing Cai <cais@google.com> Update & fix OSS distributed TF tests: mnist_replica

1) Replace the old and breaking docker-in-docker local test with a single-instance, multi-process test, built upon GitHub PR https://github.com/tensorflow/tensorflow/pull/3935

This simplifies the local test and makes it less susceptible to future changes in docker-in-docker support by docker.

2) Add an --existing_servers flag to mnist_replica.py and the associated bash scripts, so that we can distinguish
a) the case in which we want to create in-process servers and supervisors (as in the new local_test.sh), and
b) the case in which GRPC TF servers are already created and we just want to connect to the workers (as in remote_test.sh).

3) Rename some flags in the bash scripts to improve consistency with mnist_replica.py.

4) Related doc changes in README.md.
Change: 133209130
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
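
The two cases that the --existing_servers flag in change 133209130 distinguishes can be illustrated with a minimal sketch. This is not the code from mnist_replica.py: treating the flag as a boolean switch, and the helper names parse_args and describe_mode, are assumptions made for illustration only.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical subset of the test flags."""
    parser = argparse.ArgumentParser()
    # Assumption: the flag is a simple on/off switch.
    parser.add_argument(
        "--existing_servers",
        action="store_true",
        help="Connect to GRPC servers that are already running.",
    )
    return parser.parse_args(argv)


def describe_mode(args):
    """Return which of the two cases from the commit message applies."""
    if args.existing_servers:
        # Case b): servers already exist, so only connect to the workers.
        return "connect"
    # Case a): create in-process servers and supervisors.
    return "create"
```

Under these assumptions, calling describe_mode(parse_args(["--existing_servers"])) selects the remote-style path, while an empty argument list selects the local, in-process path.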
c87a7ca3113c95aaf52de7a094773feb2dba2fa1 13-Jul-2016 Vijay Vasudevan <vrv@google.com> s/Tensorflow/TensorFlow. A losing battle :)
Change: 127324936
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
29510860560c66ebd117ba7544ededd49ca2d9cf 28-Jun-2016 Shanqing Cai <cais@google.com> Add wide&deep census model to dist_test

Test distributed training of a wide&deep (census) model on a local k8s cluster.

With local_test.sh, the "--model-name CENSUS_WIDENDEEP" flag can be used to run this test, e.g.,
local_test.sh --model-name CENSUS_WIDENDEEP

The k8s tf workers launched from within the docker-in-docker container will have shared storage at "/shared", which is mounted using k8s hostPath.

The syntax to run the existing MNIST distributed test is not affected.
Change: 126030379
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
0a0e72b13e0bbe0d935b7fcef07a95949d7fd2e5 02-Jun-2016 A. Unique TensorFlower <nobody@tensorflow.org> Update copyright for 3p/tf/tools.
Change: 123889091
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
7c86e9c90662d7e82a32e5887aab489747e94510 12-Apr-2016 A. Unique TensorFlower <nobody@tensorflow.org> Add SyncReplicasOptimizer test in dist_test

Usage example: ./remote_test.sh --num-workers 3 --sync-replicas

Also changed:
1) In local and remote tests, let different workers contact separate GRPC
sessions.
2) In local and remote tests, add the ability to specify the number of
workers. Previously it was hard-coded at 2.
Usage example:
./remote_test.sh --num-workers 2 --sync-replicas
3) Use a device setter in mnist_replica.py
Change: 119599547
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
9742a2ed4e1e2015334d53dc2824c82812ad21e8 18-Mar-2016 A. Unique TensorFlower <nobody@tensorflow.org> Test for distributed (grpc) runtime in OSS TensorFlow

See README.md for detailed descriptions of the usage of the tools and tests in this changeset.

Three modes of testing are supported:
1) Launch a local Kubernetes (k8s) cluster and run the test suites on it
(See local_test.sh)
2) Launch a remote k8s cluster on Google Container Engine (GKE) and run the test suite on it
(See remote_test.sh)
3) Run the test suite on an existing k8s TensorFlow cluster
(Also see remote_test.sh)

Taking the remote test as an example, the following steps are performed:
1) Build a Docker image with gcloud and Kubernetes tools, and the latest TensorFlow pip installed (see Dockerfile)
2) Launch a Docker container based on that image (see test_distributed.sh)
3) From within the image, authenticate the gcloud user (with credentials files mapped from outside the container), configure the k8s cluster and launch a new k8s container cluster for TensorFlow workers
4) Generate a k8s (yaml) config file and use this yaml file to create a TensorFlow worker cluster consisting of a certain number of parameter servers (ps) and workers. The workers are exposed as external services with public IPs (see dist_test.sh)
5) Run a simple softmax MNIST model on multiple workers, with the model weights and biases located on the ps nodes. Train the models in parallel and observe the final validation cross entropy (see dist_mnist_test.sh)
Change: 117543657
/external/tensorflow/tensorflow/tools/dist_test/scripts/k8s_tensorflow.py
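
Step 4 of change 117543657 describes a cluster made of a configurable number of parameter servers and workers. A minimal sketch of how such a cluster layout might be described in Python follows. It is not the code in k8s_tensorflow.py: the function name build_cluster_spec, the "tf" name prefix, the host naming scheme, and port 2222 are all assumptions for illustration. The result is a plain dict mapping job names to lists of addresses.

```python
def build_cluster_spec(num_workers, num_ps, port=2222, name_prefix="tf"):
    """Return a dict mapping job names to lists of "host:port" addresses.

    All host names here are hypothetical; a real deployment would use the
    service names assigned by the k8s config.
    """
    if num_workers < 1 or num_ps < 1:
        raise ValueError("Need at least one worker and one parameter server")
    return {
        # Parameter servers hold the model weights and biases.
        "ps": ["%s-ps%d:%d" % (name_prefix, i, port) for i in range(num_ps)],
        # Workers run the training steps in parallel.
        "worker": [
            "%s-worker%d:%d" % (name_prefix, i, port)
            for i in range(num_workers)
        ],
    }
```

For example, build_cluster_spec(2, 1) describes two workers and one parameter server under the assumed naming scheme.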