A walkthrough of the TensorFlow tutorials and a machine learning experiment - のんびりしているエンジニアの日記

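Today I tried out TensorFlow, which has been creating a buzz in many places.

Hello, everyone.
I hope you are doing well. Not being a morning person can be quite a problem, can't it?
TensorFlow was getting so much attention that I ended up writing about it.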

What is TensorFlow?

http://tensorflow.org/
http://download.tensorflow.org/paper/whitepaper2015.pdf
(If you want to know the library in detail, see the PDF above.)

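TensorFlow is a numerical computation library developed by Google that uses data flow graphs.
Each node in the graph represents a numerical operation, and each edge represents an array of data.
It lets you run computations on CPUs and GPUs, on desktops, servers, and so on, through a simple API.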
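
To make the "nodes are operations, edges are data" idea concrete, here is a minimal sketch of a data flow graph in plain Python. The `Node` class and the `constant`, `add`, and `scale` helpers are hypothetical names for illustration only and are not TensorFlow's API.

```python
# A toy data flow graph: nodes are operations, edges carry values.
# All names here are hypothetical; this is not how TensorFlow is implemented.

class Node(object):
    def __init__(self, op, inputs=()):
        self.op = op          # function that computes this node's value
        self.inputs = inputs  # upstream nodes whose outputs feed this node

    def evaluate(self):
        # Evaluate the upstream nodes first, then apply this node's operation.
        args = [node.evaluate() for node in self.inputs]
        return self.op(*args)


def constant(value):
    return Node(lambda: value)


def add(a, b):
    return Node(lambda x, y: [p + q for p, q in zip(x, y)], (a, b))


def scale(a, factor):
    return Node(lambda x: [factor * p for p in x], (a,))


# Building the graph computes nothing yet.
a = constant([1.0, 2.0])
b = constant([3.0, 4.0])
result = scale(add(a, b), 2.0)

# Values flow along the edges only when the graph is evaluated.
output = result.evaluate()
```

The point of the sketch is the separation between describing the computation and running it, which is the same split you see later between defining the model and calling `run` or `eval`.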

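It was developed by researchers and engineers on the Google Brain Team. It was built for machine learning
and deep learning research, but it can be applied to a wide variety of domains. (From About TensorFlow)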

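Related news articles:
Google open-sources its artificial intelligence library TensorFlow, releasing the deep learning technology behind voice search, photo recognition, and translation for commercial use - Engadget Japanese
Google releases TensorFlow, a machine learning library with deep learning support, as open source - Publickey
Google releases its machine learning system TensorFlow as open source - ITmedia News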

Install

The installation steps depend on your platform.
They are described at http://tensorflow.org/get_started/os_setup.md.

Mac

pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl

Ubuntu

CPU Version

pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

GPU Version

pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

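Working through the Easy Example, with explanations

Getting the data

We will work through the Deep MNIST for Experts tutorial.

First, we need to get the data.
Let's run the Load MNIST Data step and fetch MNIST.
If you inspect the dataset, you will see that it is stored as numpy arrays.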

>>> import input_data
>>> mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
>>> type(mnist.train.images)
<type 'numpy.ndarray'>

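Building the model

We now build the model.
TensorFlow does this with a structure called a computation graph.

The inputs are defined with placeholders. A placeholder is a stand-in (a bit of boilerplate, if you like)
that receives its value only when the computation is actually run. For now, just write them down.
Defining the input and output dimensions is all you need.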

import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
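
The label placeholder `y_` has 10 columns because `read_data_sets(..., one_hot=True)` returns each digit label as a one-hot vector. As a rough illustration of that encoding, here is a plain Python sketch; the `one_hot` helper is a hypothetical name, not part of `input_data`.

```python
def one_hot(label, num_classes=10):
    # Hypothetical helper: a vector of zeros with a single 1.0 at the label index.
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


encoded = one_hot(3)  # the digit 3 becomes a 10-element vector
```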

Declaring variables

Here we define the variables W (the weights) and b (the bias).
They are defined as follows, using tf.Variable.
tf.zeros, used here, builds a matrix (or vector) whose initial values are all 0, much like numpy.zeros.

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

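Prediction and loss function

When building a model, defining the prediction function and the loss function is important.
The following applies the softmax function to a 10-dimensional output vector,
as is common in ordinary neural networks.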

y = tf.nn.softmax(tf.matmul(x,W) + b)
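
For reference, softmax turns a vector of scores into probabilities that sum to 1. The following is a minimal plain Python sketch of the formula, not TensorFlow's implementation; the max subtraction is a common stability trick that I am adding here as an assumption.

```python
import math


def softmax(logits):
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
```

The largest score gets the largest probability, but the smaller scores keep a nonzero share.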

Next, we define the loss function.

cross_entropy = -tf.reduce_sum(y_*tf.log(y))
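
With one-hot labels, this cross-entropy reduces to the negative log of the probability assigned to the correct class, and because `reduce_sum` is called without an axis, it is summed over every example in the mini-batch. Here is a small worked sketch for a single example in plain Python. Skipping the zero-label terms is my own simplification to avoid `log(0)`; the TensorFlow expression above does not do this.

```python
import math


def cross_entropy(y_true, y_pred):
    # -sum(y_true * log(y_pred)); with one-hot labels only the true class contributes.
    # Terms with a zero label are skipped so that log(0) is never evaluated.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.7, 0.2])  # equals -log(0.7)
```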

Training step

Defining the training step is quite easy: it takes just one line of code.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
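
Conceptually, each gradient descent step moves W and b a small distance against the gradient of the loss. The sketch below works this out by hand for softmax regression on one toy example. The data and function names are hypothetical, and it only illustrates the update rule, not what `GradientDescentOptimizer` does internally.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, W, b):
    # y = softmax(x W + b) for a single example x.
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    return softmax(logits)


def loss(x, y_true, W, b):
    y = predict(x, W, b)
    return -sum(t * math.log(p) for t, p in zip(y_true, y) if t > 0)


def sgd_step(x, y_true, W, b, learning_rate=0.01):
    # For softmax with cross-entropy, d(loss)/d(logit_j) = y_j - y_true_j.
    y = predict(x, W, b)
    delta = [p - t for p, t in zip(y, y_true)]
    for i in range(len(x)):
        for j in range(len(b)):
            W[i][j] -= learning_rate * x[i] * delta[j]
    for j in range(len(b)):
        b[j] -= learning_rate * delta[j]


# Hypothetical toy data: 2 input features, 3 classes, zero-initialized parameters.
x = [1.0, 2.0]
y_true = [0.0, 0.0, 1.0]
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]

loss_before = loss(x, y_true, W, b)
for _ in range(100):
    sgd_step(x, y_true, W, b)
loss_after = loss(x, y_true, W, b)
```

With zero-initialized parameters the model starts from a uniform prediction, so the initial loss is log(3), and repeated steps reduce it.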

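The training step can be run as follows.

# Variables must be initialized before the first run (as in the full source below).
sess.run(tf.initialize_all_variables())
for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})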

Evaluation

The accuracy can be computed with the following expressions.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
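
The accuracy computation compares the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. Here is a plain Python sketch of the same logic with made-up predictions; the helper names are hypothetical.

```python
def argmax(values):
    # Index of the largest element (the first one in case of ties).
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    correct = [1.0 if argmax(p) == argmax(t) else 0.0
               for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)


# Made-up data: four examples, three classes. The second prediction is wrong.
predictions = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
acc = accuracy(predictions, labels)
```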

Source Code

import input_data
import tensorflow as tf

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
sess = tf.InteractiveSession()

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

sess.run(tf.initialize_all_variables())
y = tf.nn.softmax(tf.matmul(x,W) + b)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})

Trying the Hard Example (Build a Multilayer Convolutional Network)

The basic building blocks of a Convolutional Neural Network are convolution and pooling.
It is a very widely used neural network architecture in image processing.

Model structure

Weight initialization

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
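
As I understand it, `tf.truncated_normal` draws from a normal distribution and re-draws any value farther than two standard deviations from the mean, while the bias starts at a small positive constant (commonly explained as a way to keep ReLU units from starting out inactive). Here is a rough plain Python sketch of that sampling idea; the function name and the re-draw loop are my own illustration, not the library's code.

```python
import random


def truncated_normal(n, stddev=0.1, mean=0.0, rng=random):
    # Hypothetical helper: sample n values, rejecting outliers beyond 2 * stddev.
    values = []
    while len(values) < n:
        v = rng.gauss(mean, stddev)
        # Re-draw anything farther than two standard deviations from the mean.
        if abs(v - mean) <= 2.0 * stddev:
            values.append(v)
    return values


random.seed(0)
weights = truncated_normal(1000)
biases = [0.1] * 32  # same idea as tf.constant(0.1, shape=[32])
```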

Defining convolution and pooling

These can be implemented with the provided conv2d and max_pool functions.

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
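
To show what 2x2 max pooling with stride 2 does, here is a plain Python sketch on a single-channel 4x4 image. It assumes even height and width and ignores the batch and channel dimensions that `tf.nn.max_pool` handles; `pool_2x2` is a hypothetical name.

```python
def pool_2x2(image):
    # image: list of rows; height and width are assumed to be even here.
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            # Keep the largest value in each non-overlapping 2x2 block.
            row.append(max(image[r][c], image[r][c + 1],
                           image[r + 1][c], image[r + 1][c + 1]))
        pooled.append(row)
    return pooled


image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 8, 1, 0],
         [7, 6, 3, 2]]
pooled = pool_2x2(image)
```

Each pooling step halves the height and width, so a 4x4 input becomes 2x2.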

First Convolutional Layer

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

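To feed data into this layer, the input needs some preparation: we reshape it.
The second and third dimensions are the image height and width, and the last one is the number of channels.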

x_image = tf.reshape(x, [-1,28,28,1])

The actual convolution followed by pooling is done as follows.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Second Convolutional Layer

As the code below suggests, you can stack layers by defining the weights and bias,
computing the layer, and connecting it to the previous layer's output.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Densely Connected Layer

We connect the output of the convolutional layers to a fully connected layer.

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

The thing to note here is tf.reshape(h_pool2, [-1, 7*7*64]). This reshape flattens the feature maps
so that they can be fed into the fully connected layer.
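
The 7*7*64 comes from tracking the image size through the layers: with padding='SAME' the output size is the input size divided by the stride, rounded up. A short sketch of that arithmetic follows; `same_padding_output` is a hypothetical helper.

```python
def same_padding_output(size, stride):
    # With padding='SAME', the output size is ceil(size / stride).
    return (size + stride - 1) // stride


size = 28                            # MNIST images are 28x28
size = same_padding_output(size, 1)  # conv1, stride 1 -> 28
size = same_padding_output(size, 2)  # pool1, stride 2 -> 14
size = same_padding_output(size, 1)  # conv2, stride 1 -> 14
size = same_padding_output(size, 2)  # pool2, stride 2 -> 7
flat_features = size * size * 64     # 64 output channels from conv2
```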

DropOut

This is the usual way to apply dropout. It is applied during training but must be turned off during testing.

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
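
Making `keep_prob` a placeholder is what lets the same graph use dropout during training (0.5) and disable it during testing (1.0). As a rough illustration, here is a plain Python sketch of dropout that scales the surviving elements by 1 / keep_prob, which is how I understand `tf.nn.dropout` to behave. The function below is my own sketch, not the library's code.

```python
import random


def dropout(values, keep_prob, rng=random):
    # Keep each element with probability keep_prob and scale the survivors by
    # 1 / keep_prob so that the expected sum is unchanged.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


random.seed(0)
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, keep_prob=0.5)  # some elements zeroed, the rest doubled
test_out = dropout(activations, keep_prob=1.0)   # nothing is dropped
```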

Readout Layer

This is essentially unchanged from the Easy Example.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

Training and Evaluate

Finally, we train and evaluate the model.

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print "step %d, training accuracy %g"%(i, train_accuracy)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

Source Code

import input_data
import tensorflow as tf

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

if __name__ == '__main__':
  mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
  x = tf.placeholder("float", shape=[None, 784])
  y_ = tf.placeholder("float", shape=[None, 10])
  sess = tf.InteractiveSession()

  W_conv1 = weight_variable([5, 5, 1, 32])
  b_conv1 = bias_variable([32])

  x_image = tf.reshape(x, [-1,28,28,1])

  h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
  h_pool1 = max_pool_2x2(h_conv1)

  W_conv2 = weight_variable([5, 5, 32, 64])
  b_conv2 = bias_variable([64])

  h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
  h_pool2 = max_pool_2x2(h_conv2)

  W_fc1 = weight_variable([7 * 7 * 64, 1024])
  b_fc1 = bias_variable([1024])

  h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
  h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

  keep_prob = tf.placeholder("float")
  h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
  W_fc2 = weight_variable([1024, 10])
  b_fc2 = bias_variable([10])

  y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

  cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
  train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
  correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
  sess.run(tf.initialize_all_variables())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x:batch[0], y_: batch[1], keep_prob: 1.0})
      print "step %d, training accuracy %g"%(i, train_accuracy)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

  print "test accuracy %g"%accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

Output

The Convolutional Neural Network reaches a test accuracy of about 99% on MNIST.

step 0, training accuracy 0.14
step 100, training accuracy 0.86
step 200, training accuracy 0.9
step 300, training accuracy 0.86
step 400, training accuracy 0.98
step 500, training accuracy 0.88
# (omitted)
step 19400, training accuracy 1
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 0.9929