# TensorFlow 2.0 + Keras Crash Course

Today I came across a great TensorFlow 2.0 + Keras crash course for deep learning researchers, written by François Chollet, the creator of Keras (a neural network library).

The code is also provided in a Colab notebook.

Are you a deep learning researcher? Are you wondering whether all this TensorFlow 2.0 stuff is relevant to you? This notebook is a crash course on everything you need to know to use TensorFlow 2.0 for deep learning research.

Below is a copy-paste of tf.keras for Researchers: Crash Course.ipynb. Have fun!

!pip install tf-nightly-gpu-2.0-preview

import tensorflow as tf

tf.__version__

1) The first class you need to know is `Layer`. A Layer encapsulates a state (weights) and some computation (defined in the `call` method).

from tensorflow.keras.layers import Layer

class Linear(Layer):
    """y = w.x + b"""

    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

# Instantiate our layer.
linear_layer = Linear(4, 2)

# The layer can be treated as a function.
# Here we call it on some data.
y = linear_layer(tf.ones((2, 2)))
assert y.shape == (2, 4)

# Weights are automatically tracked under the `weights` property.
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

2) The `add_weight` method gives you a shortcut for creating weights.
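
The code for this step seems to have been lost in the copy-paste, so here is a minimal sketch of what using `add_weight` could look like. It assumes the same `y = w.x + b` layer as in step 1; the class name `LinearWithAddWeight` and the `zeros` bias initializer are my own choices for illustration, not taken from the notebook.

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer


class LinearWithAddWeight(Layer):
    """y = w.x + b, with weights created through `add_weight` (illustrative)."""

    def __init__(self, units=32, input_dim=32):
        super().__init__()
        # `add_weight` creates the variable and registers it on the layer,
        # replacing the explicit initializer + tf.Variable pair from step 1.
        self.w = self.add_weight(shape=(input_dim, units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(units,),
                                 initializer='zeros',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


layer = LinearWithAddWeight(units=4, input_dim=2)
y = layer(tf.ones((2, 2)))
assert y.shape == (2, 4)
```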

3) It’s good practice to create weights in a separate `build` method, called lazily with the shape of the first inputs seen by your layer. Here, this pattern prevents us from having to specify `input_dim` in the constructor:

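class Linear(Layer):
    """y = w.x + b"""

    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the input shape is known.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b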

# Instantiate our lazy layer.
linear_layer = Linear(4)

# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))

4) You can automatically retrieve the gradients of the weights of a layer by calling it inside a `GradientTape`. Using these gradients, you can update the weights of the layer, either manually, or using an optimizer object. Of course, you can modify the gradients before using them, if you need to.

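# Prepare a dataset (assumed to be MNIST, which matches the 60000 x 784 reshape).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)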

# Instantiate our linear layer (defined above) with 10 units.
linear_layer = Linear(10)

# Instantiate a logistic loss function that expects integer targets.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Iterate over the batches of the dataset.
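for step, (x, y) in enumerate(dataset):

    # Open a GradientTape to record the forward pass.
    with tf.GradientTape() as tape:

        # Forward pass.
        logits = linear_layer(x)

        # Loss value for this batch.
        loss = loss_fn(y, logits)

    # Get gradients of weights wrt the loss.
    gradients = tape.gradient(loss, linear_layer.trainable_weights)

    # Update the weights of our linear layer.
    optimizer.apply_gradients(zip(gradients, linear_layer.trainable_weights))

    # Logging.
    if step % 100 == 0:
        print(step, float(loss))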
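
Step 4 mentions that you can modify the gradients before using them. As one possible illustration (my own, not from the notebook), the sketch below clips the gradients by their global norm before handing them to the optimizer. The `Dense` layer, the random inputs, and the labels are toy stand-ins so the snippet runs on its own; the clipping threshold of 1.0 is an arbitrary choice.

```python
import tensorflow as tf

# Toy stand-ins for the layer and data (hypothetical values).
layer = tf.keras.layers.Dense(3)
x = tf.random.normal((8, 4))
y = tf.constant([0, 1, 2, 0, 1, 2, 0, 1])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

with tf.GradientTape() as tape:
    loss = loss_fn(y, layer(x))
grads = tape.gradient(loss, layer.trainable_weights)

# Modify the gradients before applying them: rescale them so that their
# combined (global) norm is at most 1.0.
clipped, norm_before = tf.clip_by_global_norm(grads, clip_norm=1.0)
optimizer.apply_gradients(zip(clipped, layer.trainable_weights))
```

Any other transformation (adding noise, masking some variables, and so on) would slot into the same place, between `tape.gradient` and `apply_gradients`.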

5) Weights created by layers can be either trainable or non-trainable. They’re exposed in `trainable_weights` and `non_trainable_weights`. Here’s a layer with a non-trainable weight:

class ComputeSum(Layer):
    """Returns the sum of the inputs."""

    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        # Create a non-trainable weight.
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),
                                 trainable=False)

    def call(self, inputs):
        # Accumulate the column-wise sum of each batch into the running total.
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total

my_sum = ComputeSum(2)
x = tf.ones((2, 2))

y = my_sum(x)
print(y.numpy()) # [2. 2.]

y = my_sum(x)
print(y.numpy()) # [4. 4.]

assert my_sum.weights == [my_sum.total]
assert my_sum.non_trainable_weights == [my_sum.total]
assert my_sum.trainable_weights == []

6) Layers can be recursively nested to create bigger computation blocks. Each layer will track the weights of its sublayers (both trainable and non-trainable).

# Let’s reuse the Linear class
# with a `build` method that we defined above.

class MLP(Layer):
    """Simple stack of Linear layers."""

    def __init__(self):
        super(MLP, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)

mlp = MLP()

# The first call to the `mlp` object will create the weights.
y = mlp(tf.ones(shape=(3, 64)))

# Weights are recursively tracked.
assert len(mlp.weights) == 6

7) Layers can create losses during the forward pass. This is especially useful for regularization losses. The losses created by sublayers are recursively tracked by the parent layers.

class ActivityRegularization(Layer):
    """Layer that creates an activity sparsity regularization loss."""

    def __init__(self, rate=1e-2):
        super(ActivityRegularization, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # We use `add_loss` to create a regularization loss
        # that depends on the inputs.
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs

# Let’s use the loss layer in a MLP block.

class SparseMLP(Layer):
    """Stack of Linear layers with a sparsity regularization loss."""

    def __init__(self):
        super(SparseMLP, self).__init__()
        self.linear_1 = Linear(32)
        self.regularization = ActivityRegularization(1e-2)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.regularization(x)
        return self.linear_3(x)

mlp = SparseMLP()
y = mlp(tf.ones((10, 10)))

print(mlp.losses) # List containing one float32 scalar

8) These losses are cleared by the top-level layer at the start of each forward pass, so they don’t accumulate: `layer.losses` always contains only the losses created during the last forward pass. When writing a training loop, you would typically sum these losses and add them to your main loss before computing the gradients.

# Losses correspond to the *last* forward pass.
mlp = SparseMLP()
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1 # No accumulation.

# Let’s demonstrate how to use these losses in a training loop.

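# Prepare a dataset (assumed to be MNIST, which matches the 60000 x 784 reshape).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)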

# A new MLP.
mlp = SparseMLP()

# Loss and optimizer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

for step, (x, y) in enumerate(dataset):

    with tf.GradientTape() as tape:

        # Forward pass.
        logits = mlp(x)

        # External loss value for this batch.
        loss = loss_fn(y, logits)

        # Add the losses created during the forward pass.
        loss += sum(mlp.losses)

    # Get gradients of weights wrt the loss.
    gradients = tape.gradient(loss, mlp.trainable_weights)

    # Update the weights of our MLP.
    optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))

    # Logging.
    if step % 100 == 0:
        print(step, float(loss))

9) Running eagerly is great for debugging, but you will get better performance by compiling your computation into static graphs. Static graphs are a researcher’s best friends. You can compile any function by wrapping it in a `tf.function` decorator.

# Prepare our layer, loss, and optimizer.
mlp = MLP()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Create a training step function.

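@tf.function  # Make it fast.
def train_on_batch(x, y):
    with tf.GradientTape() as tape:
        logits = mlp(x)
        loss = loss_fn(y, logits)
    gradients = tape.gradient(loss, mlp.trainable_weights)
    optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))
    return loss

# Prepare a dataset (assumed to be MNIST, which matches the 60000 x 784 reshape).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

for step, (x, y) in enumerate(dataset):
    loss = train_on_batch(x, y)
    if step % 100 == 0:
        print(step, float(loss))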
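
To make the "compiling into a static graph" part of step 9 a bit more concrete, here is a small sketch (not from the original notebook) of how `tf.function` traces a Python function. The Python body runs while the graph is being built, and later calls with the same input dtype and shape reuse that graph. The function `square` and the counter `trace_count` are hypothetical names used only for this illustration.

```python
import tensorflow as tf

trace_count = 0  # Counts how often the Python body actually runs.


@tf.function
def square(x):
    global trace_count
    trace_count += 1  # Python side effect: executes only while tracing.
    return x * x


a = square(tf.constant(2.0))
b = square(tf.constant(3.0))  # Same dtype and shape, so the graph is reused.

assert float(a) == 4.0 and float(b) == 9.0
assert trace_count == 1
```

This is also why Python-side logic such as `print` or list appends inside a `tf.function` may not run on every call.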

10) Some layers, in particular the `BatchNormalization` layer and the `Dropout` layer, have different behaviors during training and inference. For such layers, it is standard practice to expose a `training` (boolean) argument in the `call` method.

By exposing this argument in `call`, you enable the built-in training and evaluation loops (e.g. `fit`) to correctly use the layer in training and inference.

class Dropout(Layer):

    def __init__(self, rate):
        super(Dropout, self).__init__()
        self.rate = rate

    @tf.function
    def call(self, inputs, training=None):
        # Note that the tf.function decorator enables us
        # to use imperative control flow like this `if`,
        # while defining a static graph!
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs

class MLPWithDropout(Layer):

    def __init__(self):
        super(MLPWithDropout, self).__init__()
        self.linear_1 = Linear(32)
        self.dropout = Dropout(0.5)
        self.linear_3 = Linear(10)

    def call(self, inputs, training=None):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.dropout(x, training=training)
        return self.linear_3(x)

mlp = MLPWithDropout()
y_train = mlp(tf.ones((2, 2)), training=True)
y_test = mlp(tf.ones((2, 2)), training=False)
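
To see what "different behaviors during training and inference" means in numbers, here is a short sketch using the built-in `tf.keras.layers.Dropout` layer rather than the custom one above (my own addition; the variable names are illustrative). With an input of all ones and a rate of 0.5, inference should return the input unchanged, while training should return only zeros (dropped units) and twos (kept units, rescaled by 1 / (1 - rate)).

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones((4, 8))

# Inference mode: the layer acts as the identity.
y_infer = dropout(x, training=False)

# Training mode: each unit is zeroed with probability 0.5, and the
# surviving units are scaled by 1 / (1 - 0.5) = 2.
y_train_mode = dropout(x, training=True)

assert bool(tf.reduce_all(tf.equal(y_infer, x)))
is_zero_or_two = tf.logical_or(tf.equal(y_train_mode, 0.0),
                               tf.equal(y_train_mode, 2.0))
assert bool(tf.reduce_all(is_zero_or_two))
```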

11) You have many built-in layers available, from `Dense` to `Conv2D` to `LSTM` to fancier ones like `Conv2DTranspose` or `ConvLSTM2D`. Be smart about reusing built-in functionality.
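
As a rough sketch of what reusing built-in functionality could look like (my own example, not part of the notebook), the block below rebuilds something like the `MLPWithDropout` layer from step 10 using only the stock `Dense` and `Dropout` layers. The class name `BuiltinMLP` is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers


class BuiltinMLP(layers.Layer):
    """Illustrative block built only from stock Keras layers."""

    def __init__(self):
        super().__init__()
        self.dense_1 = layers.Dense(32, activation='relu')
        self.dropout = layers.Dropout(0.5)
        self.dense_2 = layers.Dense(10)

    def call(self, inputs, training=None):
        x = self.dense_1(inputs)
        x = self.dropout(x, training=training)
        return self.dense_2(x)


block = BuiltinMLP()
y = block(tf.ones((3, 64)), training=False)
assert y.shape == (3, 10)

# Two Dense layers, each with a kernel and a bias.
assert len(block.trainable_weights) == 4
```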

12) To build deep learning models, you don’t have to use object-oriented programming all the time. All layers we’ve seen so far can also be composed functionally, like this (we call it the "Functional API"):

# We use an `Input` object to describe the shape and dtype of the inputs.
# This is the deep learning equivalent of *declaring a type*.
# The shape argument is per-sample; it does not include the batch size.
# The functional API focuses on defining per-sample transformations.
# The model we create will automatically batch the per-sample transformations,
# so that it can be called on batches of data.
inputs = tf.keras.Input(shape=(16,))

# We call layers on these "type" objects
# and they return updated types (new shapes/dtypes).
x = Linear(32)(inputs) # We are reusing the Linear layer we defined earlier.
x = Dropout(0.5)(x) # We are reusing the Dropout layer we defined earlier.
outputs = Linear(10)(x)

# A functional `Model` can be defined by specifying inputs and outputs.
# A model is itself a layer like any other.
model = tf.keras.Model(inputs, outputs)

# A functional model already has weights, before being called on any data.
# That’s because we defined its input shape in advance (in `Input`).
assert len(model.weights) == 4

# Let’s call our model on some data, for fun.
y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)

# You can pass a `training` argument in `__call__`
# (it will get passed down to the Dropout layer).
y = model(tf.ones((2, 16)), training=True)

The Functional API tends to be more concise than subclassing, and provides a few other advantages (generally the same advantages that functional, typed languages provide over untyped OO development). However, it can only be used to define DAGs of layers; recursive networks should be defined as `Layer` subclasses instead.

In your research workflows, you may often find yourself mix-and-matching OO models and Functional models.
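
Here is a minimal sketch of what such mixing could look like (my own example, with a hypothetical `ResidualBlock` class): a subclassed layer is used as one node inside a Functional model, next to built-in layers.

```python
import tensorflow as tf
from tensorflow.keras import layers


class ResidualBlock(layers.Layer):
    """Illustrative subclassed (OO) block: x + Dense(relu(Dense(x)))."""

    def __init__(self, units):
        super().__init__()
        self.hidden = layers.Dense(units, activation='relu')
        self.project = layers.Dense(units)

    def call(self, inputs):
        return inputs + self.project(self.hidden(inputs))


# Functional model that uses the subclassed block as one of its nodes.
inputs = tf.keras.Input(shape=(16,))
x = layers.Dense(32)(inputs)
x = ResidualBlock(32)(x)
outputs = layers.Dense(10)(x)
model = tf.keras.Model(inputs, outputs)

y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)
```

The subclassed block keeps its internals free-form, while the outer model still gets a declared input shape.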

### That’s all you need to reimplement most deep learning research papers in TensorFlow 2.0 and Keras!

Now for a quick research example: hypernetworks.

Let’s put these concepts into practice with a simple end-to-end example.

A hypernetwork is a deep neural network whose weights are generated by another network (usually smaller).

Let’s implement a really trivial hypernetwork: we’ll take the `Linear` layer we defined earlier, and we’ll use it to generate the weights of… another `Linear` layer.

input_dim = 784
classes = 10

# The model we’ll actually use (the hypernetwork).
outer_model = Linear(classes)

# It doesn’t need to create its own weights, so let’s mark it as already built.
# That way, calling `outer_model` won’t create new variables.
outer_model.built = True

# The model that generates the weights of the model above.
inner_model = Linear(input_dim * classes + classes)

# Loss and optimizer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

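# Prepare a dataset (assumed to be MNIST, which matches the 60000 x 784 reshape).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))

# We’ll use a batch size of 1 for this experiment.
dataset = dataset.shuffle(buffer_size=1024).batch(1)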

losses = [] # Keep track of the losses over time.
for step, (x, y) in enumerate(dataset):
    with tf.GradientTape() as tape:

        # Predict weights for the outer model.
        weights_pred = inner_model(x)

        # Reshape them to the expected shapes for w and b for the outer model.
        w_pred = tf.reshape(weights_pred[:, :-classes], (input_dim, classes))
        b_pred = tf.reshape(weights_pred[:, -classes:], (classes,))

        # Set the weight predictions as the weight variables on the outer model.
        outer_model.w = w_pred
        outer_model.b = b_pred

        # Inference on the outer model.
        preds = outer_model(x)
        loss = loss_fn(y, preds)

    # Train only inner model.
    grads = tape.gradient(loss, inner_model.trainable_weights)
    optimizer.apply_gradients(zip(grads, inner_model.trainable_weights))

    # Logging.
    losses.append(float(loss))
    if step % 100 == 0:
        print(step, sum(losses) / len(losses))

    # Stop after 1000 steps.
    if step >= 1000:
        break
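
The reshape step in the loop above is easy to get wrong, so here is a small standalone sketch of the bookkeeping (not part of the original notebook). It uses a dummy tensor named `flat` in place of the real `inner_model` output: the inner layer emits `input_dim * classes + classes` values per sample, which are split into a kernel and a bias. Because the reshape needs exactly that many values, this setup only works with one sample per batch.

```python
import tensorflow as tf

input_dim = 784
classes = 10

# Dummy stand-in for `inner_model(x)` with a batch of one sample.
flat = tf.range(input_dim * classes + classes, dtype=tf.float32)[tf.newaxis, :]

# Everything except the last `classes` entries becomes the kernel...
w = tf.reshape(flat[:, :-classes], (input_dim, classes))
# ...and the last `classes` entries become the bias.
b = tf.reshape(flat[:, -classes:], (classes,))

assert flat.shape == (1, 7850)
assert w.shape == (784, 10)
assert b.shape == (10,)
```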