Walking through the TensorFlow tutorials and running a machine learning experiment - のんびりしているエンジニアの日記

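Today I tried out TensorFlow, which is generating buzz all over the place.

Hello everyone.
How are you doing? Being bad at mornings is a real problem, isn't it.
TensorFlow was getting so much attention that I couldn't help writing about it.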

What is TensorFlow?

http://tensorflow.org/
http://download.tensorflow.org/paper/whitepaper2015.pdf
(If you want to know the library in detail, see the PDF above.)

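TensorFlow is a numerical computation library developed by Google that is built around data flow graphs.
Each node in the graph represents a numerical operation, and each edge represents an array of data flowing between nodes.
A simple API lets you run the computation on CPUs or GPUs, on desktops, servers, and so on.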
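
To make the node-and-edge picture concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented; the Node class and the add and scale helpers are names I made up for illustration. The point is only that the graph is described first and values are fed in later.

```python
# Toy dataflow graph: nodes are operations, edges carry arrays between them.
# Illustrative only; Node, add and scale are hypothetical names.

class Node(object):
    def __init__(self, op=None, inputs=()):
        self.op = op            # function applied to the input values
        self.inputs = inputs    # upstream nodes (the incoming edges)

    def evaluate(self, feed):
        # A node without an op is an input; its value comes from the feed.
        if self.op is None:
            return feed[self]
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(*args)

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def scale(a, b):
    return [p * q for p, q in zip(a, b)]

# Describe the graph y = x * w + b (elementwise) without computing anything yet.
x = Node()
w = Node()
b = Node()
y = Node(add, (Node(scale, (x, w)), b))

# Only now are concrete arrays fed in and the graph evaluated.
result = y.evaluate({x: [1.0, 2.0], w: [3.0, 4.0], b: [0.5, 0.5]})
print(result)
```

The dictionary passed to evaluate plays a role loosely similar to feed_dict in the TensorFlow code later in this post.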

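It was developed by researchers and engineers on Google's Brain Team. It was built for machine learning
and deep learning research, but it can be applied to many other domains as well. (From About TensorFlow)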

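Some news articles:
Google open-sources its AI library TensorFlow, releasing the deep learning technology behind voice search, photo recognition, and translation with commercial use allowed - Engadget Japanese
Google releases TensorFlow, a machine learning library with deep learning support, as open source - Publickey
Google releases its machine learning system TensorFlow as open source - ITmedia News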

Install

The steps differ depending on your platform.
They are listed at http://tensorflow.org/get_started/os_setup.md.

Mac

pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl

Ubuntu

CPU Version

pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

GPU Version

pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

Working through the Easy Example with commentary

Getting the data

I'll work through the Deep MNIST for Experts tutorial.

First, we need to get the data.
Let's run the Load MNIST Data step and fetch mnist.
If you look inside the dataset, you'll see that the data is stored as numpy arrays.

>>> import input_data
>>> mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
>>> type(mnist.train.images)
<type 'numpy.ndarray'>
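
The one_hot=True argument means each label comes back as a 10-element vector with a 1 at the digit's index, rather than as a plain integer. Here is a minimal sketch of that encoding. The one_hot helper is a hypothetical name of mine, not something provided by input_data.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

encoded = one_hot(3)
print(encoded)
```

This is why the label placeholder later in the post has shape [None, 10]: one column per digit class.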

Building the model

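Now let's build the model.
TensorFlow does this with a structure called a Computation Graph.

The inputs are defined with a structure called a Placeholder. A placeholder holds no value of its own;
it is a slot (boilerplate, if you like) that is filled in when actual data is supplied at run time. For now, just write it down.
All you need to specify is the input dimension and the output dimension.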

import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

Declaring variables

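Here we define two variables: W for the weights and b for the bias.
They are defined as follows, using tensorflow.Variable.
tf.zeros, used here, builds a matrix (or vector) whose initial values are all 0. It appears to be the counterpart of numpy.zeros.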

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

Prediction and loss function

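When building a model, the prediction function and the loss function are the key definitions.
The line below applies the softmax function to a 10-dimensional output vector,
just as in an ordinary neural network.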

y = tf.nn.softmax(tf.matmul(x,W) + b)
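
For reference, here is a plain-Python sketch of what softmax computes for a single example. Subtracting the maximum before exponentiating is a common numerical-stability trick that I added; I'm not claiming it mirrors TensorFlow's internals.

```python
import math

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)                              # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)
```

Larger scores get larger probabilities, and equal scores get equal probabilities, so softmax([0.0, 0.0]) is [0.5, 0.5].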

Next, define the loss function. This is the cross-entropy between the true labels y_ and the predictions y.

cross_entropy = -tf.reduce_sum(y_*tf.log(y))
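
The same sum written out in plain Python may help. This is a sketch under my own assumptions: the function name and sample numbers are made up, and skipping zero-label terms is a guard I added to avoid log(0), which the one-line TensorFlow expression does not do.

```python
import math

def cross_entropy(labels, predictions):
    """Sum of -t * log(p) over every class of every example in the batch."""
    total = 0.0
    for y_true, y_pred in zip(labels, predictions):
        for t, p in zip(y_true, y_pred):
            if t > 0.0:              # zero-label terms contribute nothing
                total -= t * math.log(p)
    return total

labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]        # one-hot, two examples
predictions = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]   # softmax outputs
loss = cross_entropy(labels, predictions)
print(loss)
```

Because this is a sum rather than a mean, the loss value grows with the batch size.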

Training step

Defining the training step is quite simple: it takes just the following one line of code.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

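The training step can be run as follows. The variables have to be initialized before the first step.

sess.run(tf.initialize_all_variables())  # initialize W and b before training
for i in range(1000):
  batch = mnist.train.next_batch(50)  # next mini-batch of 50 (images, labels)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})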
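
To see what one gradient descent step does, here is a hand-written sketch for softmax regression on a single toy example. TensorFlow derives the gradients automatically; the sgd_step function, the 2x2 shapes, and the learning rate of 0.5 are all assumptions of mine for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(W, b, x, y_true, lr):
    """Update W and b in place; return the loss measured before the update."""
    n_in, n_out = len(W), len(b)
    logits = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j]
              for j in range(n_out)]
    y = softmax(logits)
    # For softmax + cross-entropy, d(loss)/d(logit_j) = y_j - y_true_j.
    delta = [y[j] - y_true[j] for j in range(n_out)]
    for i in range(n_in):
        for j in range(n_out):
            W[i][j] -= lr * x[i] * delta[j]
    for j in range(n_out):
        b[j] -= lr * delta[j]
    return -sum(t * math.log(p) for t, p in zip(y_true, y) if t > 0.0)

W = [[0.0, 0.0], [0.0, 0.0]]    # zero-initialized, like tf.zeros
b = [0.0, 0.0]
losses = [sgd_step(W, b, [1.0, 0.0], [1.0, 0.0], lr=0.5) for _ in range(100)]
print(losses[0], losses[-1])
```

With zero weights the first prediction is uniform, so the first loss is log(2); it then shrinks as the parameters move against the gradient.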

Evaluation

The accuracy can be computed with the following expressions.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
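
In plain Python the same computation looks roughly like this: take the index of the largest entry in each prediction and each label, compare them, and average the matches. The argmax and accuracy helpers and the sample data are hypothetical.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the labeled class."""
    correct = [1.0 if argmax(p) == argmax(t) else 0.0
               for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)

predictions = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
labels = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
acc = accuracy(predictions, labels)
print(acc)
```

Here two of the three rows match, so the result is two thirds.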

Source Code

import input_data
import tensorflow as tf

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
sess = tf.InteractiveSession()

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

sess.run(tf.initialize_all_variables())
y = tf.nn.softmax(tf.matmul(x,W) + b)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})

Trying the Hard Example (Build a Multilayer Convolutional Network)

The basic building blocks of a Convolutional Neural Network are convolution and pooling.
It is a very widely used neural network architecture in image processing.

Model structure

Weight initialization

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
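
As I understand it, tf.truncated_normal draws from a normal distribution and re-draws any value that falls more than two standard deviations from the mean, so the initial weights are small random values with no extreme outliers. A plain-Python sketch of that idea follows; the function below is my own stand-in, not the library's implementation.

```python
import random

def truncated_normal(n, stddev=0.1, mean=0.0, seed=None):
    """Draw n samples, rejecting any more than 2 stddev from the mean."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2.0 * stddev:
            values.append(v)
    return values

weights = truncated_normal(100, stddev=0.1, seed=0)
print(min(weights), max(weights))
```

The small positive bias of 0.1 in bias_variable is there because the layers use ReLU, where a slightly positive start helps avoid units that never activate.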

Defining convolution and pooling

Both are easy to write with the built-in conv2d and max_pool.

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
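
To show what these two operations do, here is a naive single-channel sketch in plain Python. It assumes a square kernel with an odd side length, stride 1 with zero padding for the convolution (my reading of padding='SAME'), and an input with even height and width for the pooling. The real ops also handle batches and multiple channels, which this ignores.

```python
def conv2d_same(image, kernel):
    """Stride-1 convolution with zero padding; output has the input's size."""
    rows, cols = len(image), len(image[0])
    k = len(kernel)              # square kernel, odd side length assumed
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < rows and 0 <= cc < cols:   # outside is zero
                        total += kernel[i][j] * image[rr][cc]
            out[r][c] = total
    return out

def max_pool_2x2(image):
    """Keep the maximum of each non-overlapping 2x2 block."""
    rows, cols = len(image), len(image[0])
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

ones = [[1.0] * 4 for _ in range(4)]
box = [[1.0] * 3 for _ in range(3)]
convolved = conv2d_same(ones, box)
pooled = max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8],
                       [9, 10, 11, 12], [13, 14, 15, 16]])
print(convolved[0][0], convolved[1][1], pooled)
```

The convolution keeps the 4x4 size, while the pooling halves each side to 2x2.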

First Convolutional Layer

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

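Feeding data into this layer takes a little preparation on the input side.
We reshape x into a 4-D tensor: the second and third dimensions are the image height and width, and the last is the number of channels.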

x_image = tf.reshape(x, [-1,28,28,1])
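
For a single image, the reshape amounts to cutting the flat 784-element vector into 28 rows of 28 pixels, each with one channel. A sketch in plain Python, assuming row-major order; reshape_to_image is a made-up helper name.

```python
def reshape_to_image(flat, height=28, width=28):
    """Turn a flat list into height x width x 1 nested lists (row-major)."""
    return [[[flat[r * width + c]] for c in range(width)]
            for r in range(height)]

flat = list(range(784))          # stand-in for one flattened MNIST image
image = reshape_to_image(flat)
print(len(image), len(image[0]), len(image[0][0]))
```

The -1 in the TensorFlow call leaves the batch dimension to be inferred, so the same line works for any number of images.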

The actual convolution followed by pooling looks like this.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Second Convolutional Layer

What the code below shows is that you can stack layers by defining a weight and a bias,
computing the output, and chaining it onto the previous layer.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Densely Connected Layer

We connect the output of the convolutional layers to a fully connected layer.

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

The thing to note here is tf.reshape(h_pool2, [-1, 7*7*64]). This reshape flattens the pooled feature maps
into one vector per image so that they can be fed into the fully connected layer.
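
Where does 7*7*64 come from? With padding='SAME', my understanding is that the output size is the input size divided by the stride, rounded up. Tracing the 28x28 input through the two conv/pool pairs gives the following; same_output_size is a hypothetical helper.

```python
import math

def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return int(math.ceil(float(size) / stride))

size = 28
size = same_output_size(size, 1)   # conv1, stride 1: still 28
size = same_output_size(size, 2)   # pool1, stride 2: 14
size = same_output_size(size, 1)   # conv2, stride 1: still 14
size = same_output_size(size, 2)   # pool2, stride 2: 7
flat_length = size * size * 64     # 64 feature maps from the second conv layer
print(size, flat_length)
```

So each image reaches the fully connected layer as a vector of 3136 values.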

Dropout

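This is the usual way to apply dropout. It is applied during training and turned off at test time, which is why keep_prob is a placeholder: we can feed a different value in each case.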

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
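
A plain-Python sketch of the idea: each value is kept with probability keep_prob and scaled up by 1/keep_prob, otherwise set to zero, so the expected sum stays the same. The dropout function below is my own illustration of that behavior, not the library code.

```python
import random

def dropout(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]

rng = random.Random(0)
activations = [1.0] * 8
train_out = dropout(activations, 0.5, rng)   # training: about half are zeroed
test_out = dropout(activations, 1.0, rng)    # testing: nothing is dropped
print(train_out)
print(test_out)
```

Feeding keep_prob as 1.0 makes the layer a no-op, which is how dropout gets disabled during evaluation.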

Readout Layer

This is essentially unchanged from the Easy Example.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

Training and Evaluation

Finally, we train and evaluate the model.

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print "step %d, training accuracy %g"%(i, train_accuracy)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

Source Code

import input_data
import tensorflow as tf

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

if __name__ == '__main__':
	mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
	x = tf.placeholder("float", shape=[None, 784])
	y_ = tf.placeholder("float", shape=[None, 10])
	sess = tf.InteractiveSession()
	
	W_conv1 = weight_variable([5, 5, 1, 32])
	b_conv1 = bias_variable([32])

	x_image = tf.reshape(x, [-1,28,28,1])

	h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
	h_pool1 = max_pool_2x2(h_conv1)

	W_conv2 = weight_variable([5, 5, 32, 64])
	b_conv2 = bias_variable([64])

	h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
	h_pool2 = max_pool_2x2(h_conv2)

	W_fc1 = weight_variable([7 * 7 * 64, 1024])
	b_fc1 = bias_variable([1024])

	h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
	h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

	keep_prob = tf.placeholder("float")
	h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
	W_fc2 = weight_variable([1024, 10])
	b_fc2 = bias_variable([10])

	y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

	cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
	train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
	correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
	accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
	sess.run(tf.initialize_all_variables())
	for i in range(20000):
	  batch = mnist.train.next_batch(50)
	  if i%100 == 0:
	    train_accuracy = accuracy.eval(feed_dict={
	        x:batch[0], y_: batch[1], keep_prob: 1.0})
	    print "step %d, training accuracy %g"%(i, train_accuracy)
	  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

	print "test accuracy %g"%accuracy.eval(feed_dict={
	    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

Output

The Convolutional Neural Network reaches a test accuracy of about 99.3% on MNIST.

step 0, training accuracy 0.14
step 100, training accuracy 0.86
step 200, training accuracy 0.9
step 300, training accuracy 0.86
step 400, training accuracy 0.98
step 500, training accuracy 0.88
#(snip)
step 19400, training accuracy 1
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 0.9929

Thoughts

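It looks difficult at first glance, but you can write expressions quite freely, so it may compare better against other libraries than I expected.
It seems you can implement a variety of things with surprisingly little effort.
The tutorials are carefully written, so if you follow them closely, I think it is one of the more approachable libraries among the recent ones.
To pin down how it differs from the others, I'll need to read more of the documentation.
I'd love for someone to publish a comparison of the various libraries soon.
It comes with quite a wide range of features, so my impression is that you can do most things with it.
It also ships with an excellent visualization feature (TensorBoard) that I haven't seen in other libraries.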