Define by Run Comes to TensorFlow (TensorFlow Eager) - のんびりしているエンジニアの日記

Hello, everyone.
How are you doing? I spent this three-day weekend eating Jiro ramen and came away satisfied.

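Starting with Chainer and followed by PyTorch and others, there are frameworks
that compute neural networks in a define-by-run style.
Define by run is especially handy for writing RNN-style networks, because the computation graph is built while the code runs and ordinary Python loops can handle sequences of varying length.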
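To make the idea concrete, here is a minimal sketch of define by run in plain Python. It is not TensorFlow code: the `Var` class and the `unrolled_product` function are hypothetical names made up for this illustration. Each arithmetic operation records its inputs as it executes, so the graph that gets differentiated is simply whatever the Python loop happened to run.

```python
class Var:
    """A scalar that records how it was computed as operations run (hypothetical)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Push gradients back through the graph recorded during the forward run."""
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk the nodes in reverse topological order so each gradient is
        # complete before it is passed on to the node's parents.
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad


def unrolled_product(w, steps):
    """Multiply by `w` once per step; the graph depth is decided at run time."""
    h = Var(1.0)
    for _ in range(steps):  # an ordinary Python loop, much like an RNN over a sequence
        h = h * w
    return h


w = Var(2.0)
out = unrolled_product(w, steps=3)  # computes w ** 3
out.backward()                      # d(w ** 3)/dw = 3 * w ** 2
print(out.value, w.grad)
```

Changing `steps` changes the recorded graph without any separate graph-definition step, which is the property that makes this style convenient for recurrent models.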

Then, at the end of October, TensorFlow also released an experimental
interface for define by run. That interface is TensorFlow Eager.

Note: if you are not familiar with define by run, see the following article.
s0sem0y.hatenablog.com

TensorFlow Eager

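TensorFlow Eager is introduced in the following official article.
Note, however, that the feature is still in preview.
I expect it to work fine for ordinary use, but to be safe, try it in an environment where nothing is lost if something goes wrong.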

research.googleblog.com

In a nutshell, it is TensorFlow's interface for define by run.
It is said to offer the following features.

Fast debugging
Dynamic models written with plain Python
Support for most TensorFlow operations

Installation

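The feature is not included in the TensorFlow release distributed via pip, so a separate installation is required.
In 1.4.0 the module can be imported, but the implementation behind it is missing.
To try it out, run the following command.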

pip install tf-nightly

Examples

The official examples are published at the following link. For now, I will use MNIST.
github.com

Introducing the MNIST Sample

This section walks through the official MNIST sample. MNIST is the easiest example for getting a quick grasp of the implementation.

Setup

With TensorFlow Eager, you first need to run the following code.
Running it enables eager execution. Think of it as a kind of magic incantation.

  import tensorflow.contrib.eager as tfe  # the eager API lived under contrib at the time
  tfe.enable_eager_execution()

Next, prepare the model, the optimizer, and the datasets.
The neural network to run is defined inside MNISTModel.
How it is implemented is covered in the next section.

  # Load the datasets
  (train_ds, test_ds) = load_data(FLAGS.data_dir)
  train_ds = train_ds.shuffle(60000).batch(FLAGS.batch_size)

  # Create the model and optimizer
  model = MNISTModel(data_format)
  optimizer = tf.train.MomentumOptimizer(FLAGS.lr, FLAGS.momentum)

Model

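This is the model code. It closely resembles Chainer's Chain and PyTorch's Module.
Each module is defined in __init__.
The modules that compute convolutions and fully connected layers are created there and stored as instance variables.
The modules that hold trainable parameters belong in __init__ (the sample also registers the parameter-free Dropout and MaxPooling2D layers there).

The computation itself is defined in the call method, which specifies how the input is transformed step by step.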

class MNISTModel(tfe.Network):
  def __init__(self, data_format):
    super(MNISTModel, self).__init__(name='')
    if data_format == 'channels_first':
      self._input_shape = [-1, 1, 28, 28]
    else:
      assert data_format == 'channels_last'
      self._input_shape = [-1, 28, 28, 1]
    self.conv1 = self.track_layer(
        tf.layers.Conv2D(32, 5, data_format=data_format, activation=tf.nn.relu))
    self.conv2 = self.track_layer(
        tf.layers.Conv2D(64, 5, data_format=data_format, activation=tf.nn.relu))
    self.fc1 = self.track_layer(tf.layers.Dense(1024, activation=tf.nn.relu))
    self.fc2 = self.track_layer(tf.layers.Dense(10))
    self.dropout = self.track_layer(tf.layers.Dropout(0.5))
    self.max_pool2d = self.track_layer(
        tf.layers.MaxPooling2D(
            (2, 2), (2, 2), padding='SAME', data_format=data_format))

  def call(self, inputs, training):
    x = tf.reshape(inputs, self._input_shape)
    x = self.conv1(x)
    x = self.max_pool2d(x)
    x = self.conv2(x)
    x = self.max_pool2d(x)
    x = tf.layers.flatten(x)
    x = self.fc1(x)
    if training:
      x = self.dropout(x)
    x = self.fc2(x)
    return x
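
The split between __init__ and call can be sketched in plain Python as follows. This is only an illustration of the pattern, not how tfe.Network is implemented: `TinyDense`, `TinyNetwork`, and `TwoLayerModel` are hypothetical names, the layers work on single floats, and the halving in training mode is just a stand-in for a training-only step such as dropout.

```python
class TinyDense:
    """Hypothetical stand-in for a layer that owns trainable parameters."""

    def __init__(self, weight, bias):
        self.variables = [weight, bias]

    def __call__(self, x):
        weight, bias = self.variables
        return weight * x + bias


class TinyNetwork:
    """Hypothetical base class that mimics the __init__/call split."""

    def __init__(self):
        self._layers = []

    def track_layer(self, layer):
        # Remember the layer so its parameters can be collected later.
        self._layers.append(layer)
        return layer

    @property
    def variables(self):
        return [v for layer in self._layers for v in layer.variables]

    def __call__(self, *args, **kwargs):
        # Calling the model forwards to the user-defined call method.
        return self.call(*args, **kwargs)


class TwoLayerModel(TinyNetwork):
    def __init__(self):
        super().__init__()
        # Layers with parameters are created once and tracked here.
        self.fc1 = self.track_layer(TinyDense(2.0, 1.0))
        self.fc2 = self.track_layer(TinyDense(3.0, 0.0))

    def call(self, x, training):
        # The forward computation is ordinary Python, including the branch.
        x = self.fc1(x)
        if training:
            x = x * 0.5  # placeholder for a training-only step; not real dropout
        return self.fc2(x)


model = TwoLayerModel()
print(model(1.0, training=False), model(1.0, training=True), model.variables)
```

Because every tracked layer exposes its parameters, `model.variables` can be handed to an optimizer or a saver in one go, which is roughly the role `model.variables` plays in the sample's main loop.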

Main Loop

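This is the main part of the sample.
In the following code, each pass through the loop trains for one epoch, evaluates on the test set, and then saves the model as it stands after that epoch.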

  with tf.device(device):
    for epoch in range(1, 11):
      with tfe.restore_variables_on_create(
          tf.train.latest_checkpoint(FLAGS.checkpoint_dir)):
        global_step = tf.train.get_or_create_global_step()
        start = time.time()
        with summary_writer.as_default():
          train_one_epoch(model, optimizer, train_ds, FLAGS.log_interval)
        end = time.time()
        print('\nTrain time for epoch #%d (global step %d): %f' % (
            epoch, global_step.numpy(), end - start))
      with test_summary_writer.as_default():
        test(model, test_ds)
      all_variables = (
          model.variables
          + optimizer.variables()
          + [global_step])
      tfe.Saver(all_variables).save(
          checkpoint_prefix, global_step=global_step)

Training

The following code performs the training.
It uses the model defined in the Model section. The overall flow is as follows.

Fetch the dataset batch by batch through an iterator (tfe.Iterator(dataset)).
Compute the loss (model_loss).
Minimize it with the optimizer (optimizer.minimize).

The defined model can be run by calling it directly, as in model(images, training=True).

def train_one_epoch(model, optimizer, dataset, log_interval=None):
  """Trains model on `dataset` using `optimizer`."""

  tf.train.get_or_create_global_step()

  def model_loss(labels, images):
    prediction = model(images, training=True)
    loss_value = loss(prediction, labels)
    tf.contrib.summary.scalar('loss', loss_value)
    tf.contrib.summary.scalar('accuracy',
                              compute_accuracy(prediction, labels))
    return loss_value

  for (batch, (images, labels)) in enumerate(tfe.Iterator(dataset)):
    with tf.contrib.summary.record_summaries_every_n_global_steps(10):
      batch_model_loss = functools.partial(model_loss, labels, images)
      optimizer.minimize(
          batch_model_loss, global_step=tf.train.get_global_step())
      if log_interval and batch % log_interval == 0:
        print('Batch #%d\tLoss: %.6f' % (batch, batch_model_loss()))
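
The pattern of handing the optimizer a zero-argument loss function, built per batch with functools.partial, can be sketched without TensorFlow. Everything here is an assumption made for illustration: `SketchSGD` is a hypothetical optimizer that estimates gradients by finite differences (TensorFlow uses automatic differentiation instead), and the toy data follows y = 2x.

```python
import functools


class SketchSGD:
    """Hypothetical optimizer whose minimize() takes a zero-argument loss callable."""

    def __init__(self, params, lr, eps=1e-6):
        self.params = params  # dict of name -> float, updated in place
        self.lr = lr
        self.eps = eps

    def minimize(self, loss_fn):
        grads = {}
        for name in self.params:
            original = self.params[name]
            # Central finite difference: re-run the loss with the parameter nudged.
            self.params[name] = original + self.eps
            upper = loss_fn()
            self.params[name] = original - self.eps
            lower = loss_fn()
            self.params[name] = original
            grads[name] = (upper - lower) / (2 * self.eps)
        for name, grad in grads.items():
            self.params[name] -= self.lr * grad


params = {"w": 0.0}


def model_loss(labels, inputs):
    """Mean squared error of the one-parameter model w * x."""
    errors = [(params["w"] * x - y) ** 2 for x, y in zip(inputs, labels)]
    return sum(errors) / len(errors)


# Toy batches of (inputs, labels) generated from y = 2x.
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
optimizer = SketchSGD(params, lr=0.05)

for epoch in range(50):
    for inputs, labels in batches:
        # Bind this batch's data so the optimizer can call the loss with no arguments.
        batch_model_loss = functools.partial(model_loss, labels, inputs)
        optimizer.minimize(batch_model_loss)

print(params["w"])
```

The optimizer has to re-run the forward computation itself in order to see how the loss depends on the parameters, which is why it receives a callable rather than an already computed loss value.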

Evaluation

The evaluation part is as follows. It runs the model and computes the loss,
just like the code seen during training, so nothing here should stand out as new.

def test(model, dataset):
  """Perform an evaluation of `model` on the examples from `dataset`."""
  avg_loss = tfe.metrics.Mean('loss')
  accuracy = tfe.metrics.Accuracy('accuracy')

  for (images, labels) in tfe.Iterator(dataset):
    predictions = model(images, training=False)
    avg_loss(loss(predictions, labels))
    accuracy(tf.argmax(predictions, axis=1, output_type=tf.int64),
             tf.argmax(labels, axis=1, output_type=tf.int64))
  print('Test set: Average loss: %.4f, Accuracy: %4f%%\n' %
        (avg_loss.result(), 100 * accuracy.result()))
  with tf.contrib.summary.always_record_summaries():
    tf.contrib.summary.scalar('loss', avg_loss.result())
    tf.contrib.summary.scalar('accuracy', accuracy.result())
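
The metric objects are called once per batch and queried at the end with result(). A rough plain-Python sketch of that accumulate-then-report pattern is shown below. `RunningMean`, `RunningAccuracy`, and the tiny hand-written batches are hypothetical and say nothing about how tfe.metrics is actually implemented.

```python
def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=lambda i: values[i])


class RunningMean:
    """Hypothetical metric that averages the values passed to it."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def __call__(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count


class RunningAccuracy:
    """Hypothetical metric that counts matching class indices."""

    def __init__(self):
        self.correct = 0
        self.seen = 0

    def __call__(self, predicted, expected):
        for p, e in zip(predicted, expected):
            self.correct += int(p == e)
            self.seen += 1

    def result(self):
        return self.correct / self.seen


# Each batch: (model outputs, one-hot labels, precomputed batch loss); made-up values.
batches = [
    ([[0.1, 0.9], [0.8, 0.2]], [[0, 1], [1, 0]], 0.25),
    ([[0.6, 0.4], [0.3, 0.7]], [[0, 1], [0, 1]], 0.75),
]

avg_loss = RunningMean()
accuracy = RunningAccuracy()
for outputs, one_hot_labels, batch_loss in batches:
    avg_loss(batch_loss)
    accuracy([argmax(row) for row in outputs],
             [argmax(row) for row in one_hot_labels])

print(avg_loss.result(), accuracy.result())
```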

Standard Output

Running the MNIST sample produces the following output.

$ python mnist.py
/Users/Tereka/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
2017-11-05 23:14:38.631714: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
Using device /cpu:0, and data format channels_last.
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Batch #0	Loss: 2.295565
Batch #10	Loss: 2.279653
Batch #20	Loss: 2.263355
Batch #30	Loss: 2.227737
Batch #40	Loss: 2.176997
Batch #50	Loss: 2.159155
Batch #60	Loss: 1.995388
Batch #70	Loss: 1.832986
Batch #80	Loss: 1.613709
Batch #90	Loss: 1.197411

If a TypeError occurs, remove the keyword argument named in the error message. In my case, it occurred at the following point.

TypeError: create_summary_file_writer() got an unexpected keyword argument 'flush_secs'
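
If editing the sample by hand is inconvenient, one possible workaround is to drop keyword arguments that the installed function does not declare. The sketch below uses the standard inspect module. `call_with_supported_kwargs` and `create_writer_stub` are hypothetical names, and the stub is only a stand-in for an older function that lacks `flush_secs`, not the real TensorFlow API.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call `func`, dropping keyword arguments its signature does not declare."""
    parameters = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values())
    if accepts_any:
        # The function takes **kwargs, so nothing needs to be filtered.
        return func(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in parameters}
    return func(*args, **supported)


def create_writer_stub(logdir, max_queue=10):
    """Hypothetical stand-in for an older function without `flush_secs`."""
    return {"logdir": logdir, "max_queue": max_queue}


# `flush_secs` is silently ignored because the stub does not accept it.
writer = call_with_supported_kwargs(create_writer_stub, "/tmp/logs", flush_secs=10)
print(writer)
```

Silently ignoring arguments can hide real mistakes, so this is best kept as a stopgap while the nightly build and the sample are out of sync.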

Closing Thoughts

This post walked through the official MNIST sample.
TensorFlow Eager looks like it will make writing RNNs much easier. The framework wars also look set to heat up.