# K-FAC: Kronecker-Factored Approximate Curvature

**K-FAC in TensorFlow** is a TensorFlow implementation of [K-FAC][kfac-paper],
an approximate second-order optimization method. When applied to feedforward
and convolutional neural networks, K-FAC can converge `>3.5x` faster, in
`>14x` fewer iterations, than SGD with Momentum.

[kfac-paper]: https://arxiv.org/abs/1503.05671

## What is K-FAC?

K-FAC, short for "Kronecker-factored Approximate Curvature", is an approximation
to the [Natural Gradient][natural_gradient] algorithm designed specifically for
neural networks. It maintains a block-diagonal approximation to the [Fisher
Information matrix][fisher_information], whose inverse preconditions the
gradient.

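To make that concrete, here is a minimal NumPy sketch (purely illustrative, not
code from this package; the batch size, layer sizes, and damping value are
arbitrary) of how the Fisher block for one fully-connected layer factors into
two small second-moment matrices, so that preconditioning the layer's gradient
reduces to two small linear solves instead of inverting one huge matrix:

```python
import numpy as np

# Toy shapes; all values here are illustrative only.
batch, in_dim, out_dim = 64, 5, 3
a = np.random.randn(batch, in_dim)   # inputs to the layer
g = np.random.randn(batch, out_dim)  # gradients w.r.t. the layer's pre-activations
dW = a.T @ g / batch                 # gradient for the weight matrix (in_dim x out_dim)

# K-FAC approximates this layer's Fisher block by a Kronecker product of two
# small second-moment matrices (plus damping for invertibility).
damping = 1e-2
A = a.T @ a / batch + damping * np.eye(in_dim)   # input second moments
G = g.T @ g / batch + damping * np.eye(out_dim)  # pre-activation-gradient second moments

# Thanks to the Kronecker structure, applying the inverse Fisher block to the
# gradient is just two small solves rather than inverting an
# (in_dim*out_dim) x (in_dim*out_dim) matrix.
preconditioned_dW = np.linalg.solve(A, dW) @ np.linalg.inv(G)
```

The optimizer described below maintains statistics like these for every
registered layer; they are what the `cov_update_op` and `inv_update_op` shown
later keep up to date.
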
K-FAC can be used in place of SGD, Adam, and other `Optimizer` implementations.
Experimentally, K-FAC converges `>3.5x` faster than well-tuned SGD.

Unlike most optimizers, K-FAC exploits structure in the model itself (e.g. "What
are the weights for layer i?"). As such, you must add some additional code while
constructing your model to use K-FAC.

[natural_gradient]: http://www.mitpressjournals.org/doi/abs/10.1162/089976698300017746
[fisher_information]: https://en.wikipedia.org/wiki/Fisher_information#Matrix_form

## Why should I use K-FAC?

K-FAC can take advantage of the curvature of the optimization problem, resulting
in **faster training**. For an 8-layer Autoencoder, K-FAC converges to the same
loss as SGD with Momentum in 3.8x fewer seconds and 14.7x fewer updates. See how
training loss changes as a function of number of epochs, steps, and seconds:

![autoencoder](g3doc/autoencoder.png)

## Is K-FAC for me?

If you have a feedforward or convolutional model for classification that is
converging too slowly, K-FAC is for you. K-FAC can be used in your model if:

*   Your model defines a posterior distribution.
*   Your model uses only fully-connected or convolutional layers (residual
    connections OK).
*   You are training on CPU or GPU.
*   You can modify model code to register layers with K-FAC.

## How do I use K-FAC?

Using K-FAC requires three steps:

1.  Registering layer inputs, weights, and pre-activations with a
    `LayerCollection`.
1.  Minimizing the loss with a `KfacOptimizer`.
1.  Keeping K-FAC's preconditioner updated.

```python
import tensorflow as tf
# LayerCollection and KfacOptimizer ship with this package; the import path
# below is one way to reach them and may vary with your TensorFlow version.
from tensorflow.contrib.kfac.python.ops.layer_collection import LayerCollection
from tensorflow.contrib.kfac.python.ops.optimizer import KfacOptimizer

# Build model.
w = tf.get_variable("w", ...)
b = tf.get_variable("b", ...)
logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(
  tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Register layers: each layer's parameters, inputs, and outputs, plus the
# model's predictive distribution over the labels.
layer_collection = LayerCollection()
layer_collection.register_fully_connected((w, b), x, logits)
layer_collection.register_categorical_predictive_distribution(logits)

# Construct training ops.
optimizer = KfacOptimizer(..., layer_collection=layer_collection)
train_op = optimizer.minimize(loss)

# Minimize loss.
with tf.Session() as sess:
  ...
  sess.run([train_op, optimizer.cov_update_op, optimizer.inv_update_op])
```
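
In practice the two preconditioner updates need not run on every step. One
common pattern, sketched below assuming `train_op` and `optimizer` come from
the snippet above (the step count and update interval are illustrative, not
values this package prescribes), is to refresh the covariance statistics every
step and recompute the more expensive inverse factors only periodically:

```python
num_steps = 1000          # illustrative; use your own training-loop length
inv_update_interval = 10  # illustrative; tune for your model

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for step in range(num_steps):
    # Apply a training update and refresh the covariance statistics every step.
    sess.run([train_op, optimizer.cov_update_op])
    if step % inv_update_interval == 0:
      # Periodically recompute the inverse Kronecker factors.
      sess.run(optimizer.inv_update_op)
```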

See [`examples/`](https://www.tensorflow.org/code/tensorflow/contrib/kfac/examples/) for runnable, end-to-end illustrations.

## Authors

- Alok Aggarwal
- Daniel Duckworth
- James Martens
- Matthew Johnson
- Olga Wichrowska
- Roger Grosse