Building Convolutional Neural Networks with Tensorflow, Step by Step

Date: 2017-11-10 20:55:11


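0. Introduction

In the past, I mainly wrote about "traditional" machine learning topics such as naive Bayes classification, logistic regression, and the Perceptron algorithm. Over the past year I have been studying deep learning techniques, so I would like to share how to build and train a convolutional neural network from scratch with Tensorflow. That knowledge can then serve as a building block for more interesting deep learning applications.

To follow along, you need to have Tensorflow installed (see the installation instructions), and you should have a basic understanding of Python programming and of the theory behind convolutional neural networks. Once Tensorflow is installed, you can run smaller neural networks without a GPU, but deeper networks need the computing power of a GPU.

The internet has many websites and courses that explain how convolutional neural networks work, and some of them are quite good, with clear illustrations. I will not repeat those explanations here, so before reading on, make sure you understand how a convolutional neural network works. For example: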

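What is a convolutional layer, and what is the filter of a convolutional layer?

What is an activation layer (ReLU, the most widely used one, sigmoid, or tanh)?

What is a pooling layer (max pooling / average pooling), and what is dropout?

How does stochastic gradient descent work?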

The contents of this article are as follows:

1. Tensorflow basics

1.1 Constants and variables

1.2 Graphs and sessions in Tensorflow

1.3 Placeholders and feed_dicts

2. Neural networks in Tensorflow

2.1 Introduction

2.2 Loading the data

2.3 Creating a simple one-layer neural network

2.4 The many faces of Tensorflow

2.5 Creating the LeNet5 CNN

2.6 How the parameters affect the output size of a layer

2.7 Adjusting the LeNet5 architecture

2.8 Impact of learning rate and optimizer

3. Deep neural networks in Tensorflow

3.1 AlexNet

3.2 VGG Net-16

3.3 AlexNet performance

Conclusion

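1. Tensorflow basics

This section is a short introduction for readers who have never used Tensorflow before. If you want to start building neural networks right away, or are already familiar with Tensorflow, you can skip to section 2. If you want to learn more about Tensorflow, you can also look at this code repository, or read lecture notes 1 and 2 of Stanford's CS20SI course.

1.1 Constants and variables

The most basic units in Tensorflow are constants, variables, and placeholders.

The difference between tf.constant() and tf.Variable() is straightforward: a constant has a fixed value that cannot be changed once it is set, whereas the value of a variable can be changed after it is set, but its data type and shape cannot.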

import numpy as np
import tensorflow as tf

# We can create constants and variables of different types.
# However, the different types do not mix well together.
a = tf.constant(2, tf.int16)
b = tf.constant(4, tf.float32)
c = tf.constant(8, tf.float32)

# The second positional argument of tf.Variable() is `trainable`, not the dtype,
# so the dtype has to be passed by keyword.
d = tf.Variable(2, dtype=tf.int16)
e = tf.Variable(4, dtype=tf.float32)
f = tf.Variable(8, dtype=tf.float32)

# We can perform computations on variables of the same type: e + f
# but the following cannot be done: d + e

# Everything in Tensorflow is a tensor; these can have different dimensions:
# 0D, 1D, 2D, 3D, 4D, or nD tensors.
g = tf.constant(np.zeros(shape=(2,2), dtype=np.float32))  # does work
h = tf.zeros([11], tf.int16)
i = tf.ones([2,2], tf.float32)
j = tf.zeros([1000,4,3], tf.float64)
k = tf.Variable(tf.zeros([2,2], tf.float32))
l = tf.Variable(tf.zeros([5,6,5], tf.float32))

Besides tf.zeros() and tf.ones(), which create a tensor filled with zeros or ones, there is also tf.random_normal(), which creates a tensor of values drawn at random from a normal distribution (by default with mean 0.0 and standard deviation 1.0).

There is also tf.truncated_normal(), which creates a tensor of values drawn from a truncated normal distribution: values that lie more than two standard deviations from the mean are discarded and drawn again.
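To make the truncation rule concrete, here is a minimal pure-Python sketch of the re-drawing idea. It only illustrates the rule and is not how Tensorflow implements tf.truncated_normal(); the helper name truncated_normal_samples and its parameters are made up for this example.

```python
import random

def truncated_normal_samples(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n values from a normal distribution, re-drawing every value
    that lies more than two standard deviations away from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples

# Hypothetical usage: 1000 initial weights with stddev 0.1.
samples = truncated_normal_samples(1000, stddev=0.1, seed=0)
print(min(samples), max(samples))
```

Every value printed lies within [-0.2, 0.2], which is why this kind of initializer avoids extreme starting weights.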

With this knowledge, we can create the weight matrices and bias vectors of a neural network.

weights = tf.Variable(tf.truncated_normal([256 * 256, 10]))

biases = tf.Variable(tf.zeros([10]))

print(weights.get_shape().as_list())

print(biases.get_shape().as_list())

>>>[65536, 10]

>>>[10]

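1.2 Graphs and sessions in Tensorflow

In Tensorflow, all variables and the operations on them are stored in a graph. Once you have built a graph that contains all the computational steps of your model, you can run it inside a session. The session distributes the computations across the available CPUs and GPUs.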

graph = tf.Graph()
with graph.as_default():
    a = tf.Variable(8, dtype=tf.float32)
    b = tf.Variable(tf.zeros([2,2], tf.float32))

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    # Only variables that belong to this graph can be evaluated in this session.
    print(a)                # prints the Variable object itself, not its value
    print(session.run(a))
    print(session.run(b))

>>>
>>> 8.0
>>> [[ 0. 0.]
>>> [ 0. 0.]]

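1.3 Placeholders and feed_dicts

We have seen the various ways to create constants and variables. Tensorflow also has placeholders, which do not need an initial value and only serve to reserve the necessary memory. During a session, these placeholders can be filled with (external) data through a feed_dict.

Below is an example of the usage of a placeholder.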

list_of_points1_ = [[1,2], [3,4], [5,6], [7,8]]
list_of_points2_ = [[15,16], [13,14], [11,12], [9,10]]
list_of_points1 = np.array([np.array(elem).reshape(1,2) for elem in list_of_points1_])
list_of_points2 = np.array([np.array(elem).reshape(1,2) for elem in list_of_points2_])

graph = tf.Graph()
with graph.as_default():
    # We use tf.placeholder() to create a tensor whose value is filled in later (during session.run()).
    # This is done by 'feeding' the data into the placeholder.
    # Below, two placeholders of shape [1,2] are used to calculate the Euclidean distance.
    point1 = tf.placeholder(tf.float32, shape=(1, 2))
    point2 = tf.placeholder(tf.float32, shape=(1, 2))

    def calculate_eucledian_distance(point1, point2):
        difference = tf.subtract(point1, point2)
        power2 = tf.pow(difference, tf.constant(2.0, shape=(1,2)))
        add = tf.reduce_sum(power2)
        eucledian_distance = tf.sqrt(add)
        return eucledian_distance

    dist = calculate_eucledian_distance(point1, point2)

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for ii in range(len(list_of_points1)):
        point1_ = list_of_points1[ii]
        point2_ = list_of_points2[ii]
        # The feed_dict maps each placeholder to the data it should hold for this run.
        feed_dict = {point1 : point1_, point2 : point2_}
        distance = session.run([dist], feed_dict=feed_dict)
        print("the distance between {} and {} -> {}".format(point1_, point2_, distance))

>>> the distance between [[1 2]] and [[15 16]] -> [19.79899]

>>> the distance between [[3 4]] and [[13 14]] -> [14.142136]

>>> the distance between [[5 6]] and [[11 12]] -> [8.485281]

>>> the distance between [[7 8]] and [[ 9 10]] -> [2.8284271]
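As a sanity check, the same distances can be computed without Tensorflow. The sketch below uses only the standard library; the helper name euclidean_distance is introduced here purely for illustration.

```python
import math

points1 = [[1, 2], [3, 4], [5, 6], [7, 8]]
points2 = [[15, 16], [13, 14], [11, 12], [9, 10]]

def euclidean_distance(p, q):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

distances = [euclidean_distance(p, q) for p, q in zip(points1, points2)]
for p, q, d in zip(points1, points2, distances):
    print("the distance between {} and {} -> {:.6f}".format(p, q, d))
```

The values agree with the Tensorflow output above up to float32 rounding.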

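2. Neural networks in Tensorflow

2.1 Introduction

A graph containing a neural network (as shown in the figure above) should contain the following steps:

1. The input datasets: the training dataset and labels, the test dataset and labels (and the validation dataset and labels). The test and validation datasets can be placed in a tf.constant(). The training dataset is placed in a tf.placeholder() so that it can be fed in batches during training (stochastic gradient descent).

2. The neural network **model** with all of its layers. This can be a simple fully connected neural network consisting of a single layer, or a more complicated network consisting of 5, 9, or 16 layers.

3. The weight matrices and **bias vectors**, defined and initialized with the proper shapes (one weight matrix and one bias vector per layer).

4. The loss value: the model outputs a vector of logits (the estimated training labels), and the loss is calculated by comparing the logits with the actual labels (softmax with cross-entropy). The loss value indicates how close the estimated training labels are to the actual training labels, and it is used to update the weights.

5. An optimizer, which uses the calculated loss value to update the weights and biases through backpropagation.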
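Step 4 can be illustrated for a single example in plain Python. This is only a sketch of what "softmax with cross-entropy" computes; the function names softmax and cross_entropy and the sample logits are assumptions chosen for this illustration, not Tensorflow's implementation.

```python
import math

def softmax(logits):
    # Subtract the maximum first for numerical stability.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, one_hot_label):
    # Only the entry of the true class contributes to the sum.
    return -sum(y * math.log(p) for p, y in zip(probabilities, one_hot_label) if y > 0)

logits = [2.0, 1.0, 0.1]   # hypothetical model output for one image
label = [1.0, 0.0, 0.0]    # one-hot encoded true class
probs = softmax(logits)
loss = cross_entropy(probs, label)
print(probs, loss)
```

The loss shrinks toward zero as the probability assigned to the true class approaches one, which is what the optimizer in step 5 tries to achieve.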

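2.2 Loading the data

Let's load the datasets that will be used to train and test the neural networks. For this we will download the MNIST and CIFAR-10 datasets. The MNIST dataset contains 60,000 training images of handwritten digits, each of size 28 x 28 x 1 (grayscale). The CIFAR-10 dataset contains 60,000 color images (3 channels) of size 32 x 32 x 3, showing 10 different objects (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). Since both datasets contain 10 different classes, both have 10 labels.

First, let's define some methods that are convenient for loading and reformatting the data.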

def randomize(dataset, labels):
    # Shuffle the images and their labels with the same random permutation.
    permutation = np.random.permutation(labels.shape[0])
    shuffled_dataset = dataset[permutation, :, :]
    shuffled_labels = labels[permutation]
    return shuffled_dataset, shuffled_labels

def one_hot_encode(np_array):
    # Turn an array of class indices into one-hot rows of length 10.
    return (np.arange(10) == np_array[:,None]).astype(np.float32)

def reformat_data(dataset, labels, image_width, image_height, image_depth):
    np_dataset_ = np.array([np.array(image_data).reshape(image_width, image_height, image_depth) for image_data in dataset])
    np_labels_ = one_hot_encode(np.array(labels, dtype=np.float32))
    np_dataset, np_labels = randomize(np_dataset_, np_labels_)
    return np_dataset, np_labels

def flatten_tf_array(array):
    # Flatten a 4D tensor [batch, width, height, depth] into a 2D tensor [batch, width*height*depth].
    shape = array.get_shape().as_list()
    return tf.reshape(array, [shape[0], shape[1] * shape[2] * shape[3]])

def accuracy(predictions, labels):
    # Percentage of rows whose predicted class matches the labeled class.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])

These methods one-hot encode the labels, load the data into a randomly shuffled array, and flatten a matrix (a fully connected network needs a flat matrix as its input).
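For readers who find the NumPy one-liners terse, here is a plain-Python sketch of what the one-hot encoding and the accuracy computation do. The names one_hot_encode_list, argmax, and accuracy_list are hypothetical stand-ins written for this illustration; they operate on ordinary lists instead of arrays.

```python
def one_hot_encode_list(labels, num_labels=10):
    # Each label becomes a row with a single 1.0 at the label's index.
    return [[1.0 if k == label else 0.0 for k in range(num_labels)] for label in labels]

def argmax(values):
    # Index of the largest entry.
    return max(range(len(values)), key=lambda k: values[k])

def accuracy_list(predictions, labels):
    # Percentage of rows where the predicted class equals the true class.
    correct = sum(1 for p, y in zip(predictions, labels) if argmax(p) == argmax(y))
    return 100.0 * correct / len(predictions)

encoded = one_hot_encode_list([3, 0, 9])
example_accuracy = accuracy_list([[0.1, 0.9], [0.8, 0.2]], [[0.0, 1.0], [0.0, 1.0]])
print(encoded[0], example_accuracy)
```

In the second call only the first prediction picks the labeled class, so half of the rows count as correct.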

After defining these functions, we can load the MNIST and CIFAR-10 datasets like this:

import pickle
from mnist import MNIST  # provided by the python-mnist package

mnist_folder = './data/mnist/'
mnist_image_width = 28
mnist_image_height = 28
mnist_image_depth = 1
mnist_num_labels = 10

mndata = MNIST(mnist_folder)
mnist_train_dataset_, mnist_train_labels_ = mndata.load_training()
mnist_test_dataset_, mnist_test_labels_ = mndata.load_testing()

mnist_train_dataset, mnist_train_labels = reformat_data(mnist_train_dataset_, mnist_train_labels_, mnist_image_width, mnist_image_height, mnist_image_depth)
mnist_test_dataset, mnist_test_labels = reformat_data(mnist_test_dataset_, mnist_test_labels_, mnist_image_width, mnist_image_height, mnist_image_depth)

print("There are {} images, each of size {}".format(len(mnist_train_dataset), len(mnist_train_dataset[0])))
print("Meaning each image has the size of 28*28*1 = {}".format(mnist_image_width*mnist_image_height*1))
print("The training set contains the following {} labels: {}".format(len(np.unique(mnist_train_labels_)), np.unique(mnist_train_labels_)))
print('Training set shape', mnist_train_dataset.shape, mnist_train_labels.shape)
print('Test set shape', mnist_test_dataset.shape, mnist_test_labels.shape)

train_dataset_mnist, train_labels_mnist = mnist_train_dataset, mnist_train_labels
test_dataset_mnist, test_labels_mnist = mnist_test_dataset, mnist_test_labels

######################################################################################

cifar10_folder = './data/cifar10/'
train_datasets = ['data_batch_1', 'data_batch_2', 'data_batch_3', 'data_batch_4', 'data_batch_5', ]
test_dataset = ['test_batch']
c10_image_height = 32
c10_image_width = 32
c10_image_depth = 3
c10_num_labels = 10

# Note: each CIFAR-10 row stores the red, green, and blue planes one after another,
# so this plain reshape to (32, 32, 3) does not reproduce the original pixel layout.
with open(cifar10_folder + test_dataset[0], 'rb') as f0:
    c10_test_dict = pickle.load(f0, encoding='bytes')

c10_test_dataset, c10_test_labels = c10_test_dict[b'data'], c10_test_dict[b'labels']
test_dataset_cifar10, test_labels_cifar10 = reformat_data(c10_test_dataset, c10_test_labels, c10_image_width, c10_image_height, c10_image_depth)

# The training set is spread over five batch files that are concatenated below.
c10_train_dataset, c10_train_labels = [], []
for train_dataset in train_datasets:
    with open(cifar10_folder + train_dataset, 'rb') as f0:
        c10_train_dict = pickle.load(f0, encoding='bytes')
        c10_train_dataset_, c10_train_labels_ = c10_train_dict[b'data'], c10_train_dict[b'labels']
        c10_train_dataset.append(c10_train_dataset_)
        c10_train_labels += c10_train_labels_

c10_train_dataset = np.concatenate(c10_train_dataset, axis=0)
train_dataset_cifar10, train_labels_cifar10 = reformat_data(c10_train_dataset, c10_train_labels, c10_image_width, c10_image_height, c10_image_depth)
del c10_train_dataset
del c10_train_labels

print("The training set contains the following labels: {}".format(np.unique(c10_train_dict[b'labels'])))
print('Training set shape', train_dataset_cifar10.shape, train_labels_cifar10.shape)
print('Test set shape', test_dataset_cifar10.shape, test_labels_cifar10.shape)

You can download the MNIST dataset from Yann LeCun's website. After downloading and unzipping the files, you can load the data with the python-mnist tool. The CIFAR-10 dataset can be downloaded from its own download page.

2.3 Creating a simple one-layer neural network

The simplest form of a neural network is a one-layer linear fully connected neural network (FCNN). Mathematically, it consists of a single matrix multiplication.
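Written out for MNIST, and assuming a batch of N flattened images, that multiplication looks as follows. The symbols X, W, and b are introduced here for illustration; in the code they correspond to the flattened dataset, the weight matrix, and the bias vector.

```latex
\mathrm{logits} = X W + b,
\qquad X \in \mathbb{R}^{N \times 784},\;
W \in \mathbb{R}^{784 \times 10},\;
b \in \mathbb{R}^{10},
\qquad 784 = 28 \cdot 28 \cdot 1
```

Each row of the result holds 10 scores, one per digit class.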

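It is best to start with such a simple NN in Tensorflow and then move on to more complicated neural networks. When we look at those more complicated networks, only the model (step 2) and the weights (step 3) of the graph change; the other steps stay the same.

We can build a one-layer FCNN as follows: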

image_width = mnist_image_width
image_height = mnist_image_height
image_depth = mnist_image_depth
num_labels = mnist_num_labels

# the dataset
train_dataset = mnist_train_dataset
train_labels = mnist_train_labels
test_dataset = mnist_test_dataset
test_labels = mnist_test_labels

# number of iterations and learning rate
num_steps = 10001
display_step = 1000
learning_rate = 0.5

graph = tf.Graph()
with graph.as_default():
    # 1) First we put the input data in a Tensorflow friendly form.
    # No batches are fed in this example (session.run() below gets no feed_dict),
    # so the whole training set is stored in constants instead of placeholders.
    tf_train_dataset = tf.constant(train_dataset, tf.float32)
    tf_train_labels = tf.constant(train_labels, tf.float32)
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    # 2) Then, the weight matrices and bias vectors are initialized.
    # By default, tf.truncated_normal() is used for the weight matrix and tf.zeros() for the bias vector.
    weights = tf.Variable(tf.truncated_normal([image_width * image_height * image_depth, num_labels]))
    bias = tf.Variable(tf.zeros([num_labels]))

    # 3) Define the model:
    # a one-layer FCNN simply consists of a matrix multiplication.
    def model(data, weights, bias):
        return tf.matmul(flatten_tf_array(data), weights) + bias

    logits = model(tf_train_dataset, weights, bias)

    # 4) Calculate the loss, which will be used in the optimization of the weights.
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    # 5) Choose an optimizer. Many are available.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # 6) The predicted values for the images in the train dataset and test dataset are assigned
    # to the variables train_prediction and test_prediction.
    # This is only necessary if you want to know the accuracy by comparing them with the actual values.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, weights, bias))

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % display_step == 0):
            train_accuracy = accuracy(predictions, train_labels[:, :])
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            message = "step {:04d} : loss is {:06.2f}, accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)

>>> Initialized

>>> step 0000 : loss is 2349.55, accuracy on training set 10.43 %, accuracy on test set 34.12 %

>>> step 0100 : loss is 3612.48, accuracy on training set 89.26 %, accuracy on test set 90.15 %

>>> step 0200 : loss is 2634.40, accuracy on training set 91.10 %, accuracy on test set 91.26 %

>>> step 0300 : loss is 2109.42, accuracy on training set 91.62 %, accuracy on test set 91.56 %

>>> step 0400 : loss is 2093.56, accuracy on training set 91.85 %, accuracy on test set 91.67 %

>>> step 0500 : loss is 2325.58, accuracy on training set 91.83 %, accuracy on test set 91.67 %

>>> step 0600 : loss is 22140.44, accuracy on training set 68.39 %, accuracy on test set 75.06 %

>>> step 0700 : loss is 5920.29, accuracy on training set 83.73 %, accuracy on test set 87.76 %

>>> step 0800 : loss is 9137.66, accuracy on training set 79.72 %, accuracy on test set 83.33 %

>>> step 0900 : loss is 15949.15, accuracy on training set 69.33 %, accuracy on test set 77.05 %

>>> step 1000 : loss is 1758.80, accuracy on training set 92.45 %, accuracy on test set 91.79 %

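In the graph, we load the data, define the weight matrices and the model, calculate the loss value from the logit vector, and pass it to the optimizer, which updates the weights for 'num_steps' iterations.

In the fully connected NN above, we used the gradient descent optimizer to optimize the weights. However, Tensorflow offers many different optimizers. The most commonly used ones are GradientDescentOptimizer, AdamOptimizer, and AdagradOptimizer, so I would recommend trying these if you are building a CNN.

Sebastian Ruder has a nice blog post that explains the differences between the optimizers; read it if you want to know more about them.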

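2.4 The many faces of Tensorflow

Tensorflow contains many layers, meaning that the same operation can be done at different levels of abstraction. To give a simple example, the operation

logits = tf.matmul(tf_train_dataset, weights) + biases

can also be written as

logits = tf.nn.xw_plus_b(tf_train_dataset, weights, biases)

This is most visible in the layers API, a layer with a high level of abstraction that makes it easy to create neural networks consisting of many different layers. For example, the conv_2d() and fully_connected() functions create convolutional and fully connected layers. With these functions, the number of layers, the filter sizes and depths, the type of activation function, and so on can be specified as parameters. The weight and bias matrices are then created automatically, together with the activation functions and dropout regularization layers.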

For example, with the layers API, the following lines:

import tensorflow as tf

w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth], stddev=0.1))
b1 = tf.Variable(tf.zeros([filter_depth]))
layer1_conv = tf.nn.conv2d(data, w1, [1, 1, 1, 1], padding='SAME')
layer1_relu = tf.nn.relu(layer1_conv + b1)
# the pooling layer takes the activated output as its input
layer1_pool = tf.nn.max_pool(layer1_relu, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

can be replaced with

from tflearn.layers.conv import conv_2d, max_pool_2d

layer1_conv = conv_2d(data, filter_depth, filter_size, activation='relu')
layer1_pool = max_pool_2d(layer1_conv, 2, strides=2)

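As you can see, we do not need to define the weights, biases, or activation functions. This keeps the code clean and tidy, especially when you are building a neural network with many layers.

However, if you are just getting started with Tensorflow, it is not the best way to learn how to build different kinds of neural networks, because tflearn does all the work for you.

Therefore we will not use the layers API in this article, but I do recommend using it once you fully understand how to build a neural network in Tensorflow.

2.5 Creating the LeNet5 CNN

Let's start building neural networks with more layers, for example the LeNet5 convolutional neural network.

The LeNet5 CNN architecture was proposed by Yann LeCun in 1998 (see the paper). It is one of the earliest CNNs and was designed specifically to classify handwritten digits. Although it works well on the MNIST dataset, which consists of 28 x 28 grayscale images, its performance drops considerably on datasets with more images, higher resolution, and more classes. For those larger datasets, deeper ConvNets (such as AlexNet, VGGNet, or ResNet) perform better.

But since the LeNet5 architecture consists of only 5 layers, it is a good starting point for learning how to build CNNs.

The LeNet5 architecture is shown in the figure below:

We can see that it consists of 5 layers: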

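Layer 1: a convolutional layer with a sigmoid activation function, followed by an average pooling layer.

Layer 2: a convolutional layer with a sigmoid activation function, followed by an average pooling layer.

Layer 3: a fully connected network (sigmoid activation)

Layer 4: a fully connected network (sigmoid activation)

Layer 5: the output layer

This means that we need to create 5 weight matrices and 5 bias vectors, and that our model will consist of 12 lines of code (5 layers + 2 pooling layers + 4 activation functions + 1 flatten layer).

Since this is quite a bit of code, it is best to define it in a separate function outside of the graph.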

LENET5_BATCH_SIZE = 32
LENET5_PATCH_SIZE = 5
LENET5_PATCH_DEPTH_1 = 6
LENET5_PATCH_DEPTH_2 = 16
LENET5_NUM_HIDDEN_1 = 120
LENET5_NUM_HIDDEN_2 = 84

def variables_lenet5(patch_size = LENET5_PATCH_SIZE, patch_depth1 = LENET5_PATCH_DEPTH_1,
                     patch_depth2 = LENET5_PATCH_DEPTH_2,
                     num_hidden1 = LENET5_NUM_HIDDEN_1, num_hidden2 = LENET5_NUM_HIDDEN_2,
                     image_depth = 1, num_labels = 10):
    # one weight matrix and one bias vector per layer
    w1 = tf.Variable(tf.truncated_normal([patch_size, patch_size, image_depth, patch_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([patch_depth1]))
    w2 = tf.Variable(tf.truncated_normal([patch_size, patch_size, patch_depth1, patch_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[patch_depth2]))
    # the second pooling layer outputs 5 x 5 feature maps for a 28 x 28 input image
    w3 = tf.Variable(tf.truncated_normal([5*5*patch_depth2, num_hidden1], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))
    w4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    w5 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables

def model_lenet5(data, variables):
    # layer 1: convolution, sigmoid activation, average pooling
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.sigmoid(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # layer 2: convolution, sigmoid activation, average pooling
    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='VALID')
    layer2_actv = tf.sigmoid(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # layers 3 and 4: fully connected with sigmoid activation
    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.sigmoid(layer3_fccd)
    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.sigmoid(layer4_fccd)

    # layer 5: output layer
    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits
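The input size of the first fully connected weight matrix (5 x 5 x 16 = 400) follows from how each layer changes the image size. The sketch below traces a 28 x 28 MNIST image through the layers of model_lenet5 in plain Python. It assumes Tensorflow's documented size rules ('SAME' gives the input size divided by the stride, rounded up; 'VALID' gives the input size minus the filter size plus one, divided by the stride, rounded up); the helper name layer_output_size is made up for this example.

```python
import math

def layer_output_size(input_size, filter_size, stride, padding):
    # Output width (or height) of a convolution or pooling layer.
    if padding == 'SAME':
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

size = 28
size = layer_output_size(size, 5, 1, 'SAME')    # layer1_conv
size = layer_output_size(size, 2, 2, 'SAME')    # layer1_pool
size = layer_output_size(size, 5, 1, 'VALID')   # layer2_conv
size = layer_output_size(size, 2, 2, 'SAME')    # layer2_pool

flat_size = size * size * 16                    # 16 = LENET5_PATCH_DEPTH_2
print(size, flat_size)
```

The image goes from 28 to 28, 14, 10, and finally 5 pixels per side, so the flattened layer has 400 entries.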

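Because the variables and the model are defined separately, we only need to adjust the graph slightly so that it uses these weights and this model instead of the previous fully connected NN: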

# parameters determining the model size
image_width = mnist_image_width
image_height = mnist_image_height
image_depth = mnist_image_depth
num_labels = mnist_num_labels
batch_size = LENET5_BATCH_SIZE

# the datasets
train_dataset = mnist_train_dataset
train_labels = mnist_train_labels
test_dataset = mnist_test_dataset
test_labels = mnist_test_labels

# number of iterations and learning rate
num_steps = 10001
display_step = 1000
learning_rate = 0.1  # the run logged below reports learning_rate 0.1

graph = tf.Graph()
with graph.as_default():
    # 1) First we put the input data in a Tensorflow friendly form.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_width, image_height, image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    tf_test_dataset = tf.constant(test_dataset, tf.float32)

    # 2) Then, the weight matrices and bias vectors are initialized.
    variables = variables_lenet5(image_depth = image_depth, num_labels = num_labels)

    # 3) The model used to calculate the logits (predicted labels).
    model = model_lenet5
    logits = model(tf_train_dataset, variables)

    # 4) Then we compute the softmax cross entropy between the logits and the (actual) labels.
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    # 5) The optimizer is used to calculate the gradients of the loss function.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Predictions for the training and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(model(tf_test_dataset, variables))

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized with learning_rate', learning_rate)
    for step in range(num_steps):
        # Since we are using stochastic gradient descent, we select small batches from the training dataset
        # and train the convolutional neural network with one batch at a time.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)

        if step % display_step == 0:
            train_accuracy = accuracy(predictions, batch_labels)
            test_accuracy = accuracy(test_prediction.eval(), test_labels)
            message = "step {:04d} : loss is {:06.2f}, accuracy on training set {:02.2f} %, accuracy on test set {:02.2f} %".format(step, l, train_accuracy, test_accuracy)
            print(message)

>>> Initialized with learning_rate 0.1

>>> step 0000 : loss is 002.49, accuracy on training set 3.12 %, accuracy on test set 10.09 %

>>> step 1000 : loss is 002.29, accuracy on training set 21.88 %, accuracy on test set 9.58 %

>>> step 2000 : loss is 000.73, accuracy on training set 75.00 %, accuracy on test set 78.20 %

>>> step 3000 : loss is 000.41, accuracy on training set 81.25 %, accuracy on test set 86.87 %

>>> step 4000 : loss is 000.26, accuracy on training set 93.75 %, accuracy on test set 90.49 %

>>> step 5000 : loss is 000.28, accuracy on training set 87.50 %, accuracy on test set 92.79 %

>>> step 6000 : loss is 000.23, accuracy on training set 96.88 %, accuracy on test set 93.64 %

>>> step 7000 : loss is 000.18, accuracy on training set 90.62 %, accuracy on test set 95.14 %

>>> step 8000 : loss is 000.14, accuracy on training set 96.88 %, accuracy on test set 95.80 %

>>> step 9000 : loss is 000.35, accuracy on training set 90.62 %, accuracy on test set 96.33 %

>>> step 10000 : loss is 000.12, accuracy on training set 93.75 %, accuracy on test set 96.76 %

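We can see that the LeNet5 architecture performs better on the MNIST dataset than the simple fully connected NN.

2.6 How the parameters affect the output size of a layer

Generally speaking, a neural network with more layers tends to perform better. We can add more layers, change the activation functions and pooling layers, and change the learning rate to see how each step affects performance. Since the input of layer i is the output of layer i-1, we need to know how the different parameters affect the output size of layer i-1.

To understand this, let's have a look at the conv2d() function.

It has four parameters:

The input image, a 4D tensor with dimensions [batch size, image_width, image_height, image_depth]

The weight matrix, a 4D tensor with dimensions [filter_size, filter_size, image_depth, filter_depth]

The number of strides in each dimension.

The padding (= 'SAME' / 'VALID')

These four parameters determine the size of the output image.

The first two parameters are the 4D tensor containing the batch of input images and the 4D tensor containing the weights of the convolutional filter.

The third parameter is the stride of the convolution, i.e. how many positions the convolutional filter should skip in each of the four dimensions. The first of these four dimensions is the image number within the batch of images; since we do not want to skip any image, it is always 1. The last dimension is the image depth (the number of color channels; 1 for grayscale and 3 for RGB); since we do not want to skip any color channel, it is always 1 as well. The second and third dimensions are the strides in the X and Y directions (image width and height). If we want to apply a stride, these are the dimensions in which the filter should skip positions. So for a stride of 1 we set the stride parameter to [1, 1, 1, 1], for a stride of 2 we set it to [1, 2, 2, 1], and so on.

The last parameter indicates whether Tensorflow should zero-pad the image so that the output size does not change for a stride of 1. With padding = 'SAME' the image is zero-padded (and for a stride of 1 the output size does not change); with padding = 'VALID' it is not.

Below we can see two examples of a convolutional filter (of size 5 x 5) scanning through an image (of size 28 x 28).

On the left, the padding parameter is set to 'SAME': the image is zero-padded and the last 4 rows / columns are included in the output image.

On the right, the padding parameter is set to 'VALID': the image is not zero-padded and the last 4 rows / columns are not included in the output image.

We can see that without zero-padding the last four cells are not included, because the convolutional filter has already reached the end of the (non-zero-padded) image. This means that for an input size of 28 x 28 the output size becomes 24 x 24. With padding = 'SAME', the output size is 28 x 28.

This becomes clearer if we write down the positions of the filter on the image while it scans through it (for simplicity, only in the X direction). With a stride of 1, the X positions are 0-5, 1-6, 2-7, and so on. With a stride of 2, the X positions are 0-5, 2-7, 4-9, and so on.

If we do this for an image size of 28 x 28, a filter size of 5 x 5, and strides 1 to 4, we get the following table:

As you can see, for a stride of 1 the zero-padded output image size is 28 x 28; without zero-padding it becomes 24 x 24. For a filter with a stride of 2 these numbers are 14 x 14 and 12 x 12, and for a stride of 3 they are 10 x 10 and 8 x 8, and so on.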

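For an arbitrary stride S, filter size K, image size W, and padding size P, the output size O will be

$$O = 1 + \frac{W - K + 2P}{S}$$

If padding = 'SAME' in Tensorflow, the padding is chosen so that the output size depends only on the stride S: it equals W / S, rounded up.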
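The output sizes quoted above can be reproduced by counting filter positions. The sketch below is plain Python, not Tensorflow code; the function names valid_start_positions and same_output_size are made up for this illustration, and the 'SAME' case assumes Tensorflow's documented rule of dividing the input size by the stride and rounding up.

```python
def valid_start_positions(image_size, filter_size, stride):
    # Left edges at which the whole filter still fits inside the unpadded image.
    return list(range(0, image_size - filter_size + 1, stride))

def same_output_size(image_size, stride):
    # Ceiling division: with 'SAME' padding only the stride matters.
    return -(-image_size // stride)

output_sizes = {}
for stride in range(1, 5):
    same = same_output_size(28, stride)
    valid = len(valid_start_positions(28, 5, stride))
    output_sizes[stride] = (same, valid)
    print("stride {}: SAME -> {} x {}, VALID -> {} x {}".format(stride, same, same, valid, valid))
```

For strides 1 to 3 this gives the sizes mentioned in the text; the loop also covers a stride of 4.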

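2.7 Adjusting the LeNet5 architecture

In the original paper, the LeNet5 architecture used sigmoid activation functions and average pooling. Nowadays, however, it is much more common to use the ReLU activation function. So let's modify the LeNet5 CNN a little and see whether we can improve its accuracy. We will call this the LeNet5-like architecture: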

LENET5_LIKE_BATCH_SIZE = 32
LENET5_LIKE_FILTER_SIZE = 5
LENET5_LIKE_FILTER_DEPTH = 16
LENET5_LIKE_NUM_HIDDEN = 120

def variables_lenet5_like(filter_size = LENET5_LIKE_FILTER_SIZE,
                          filter_depth = LENET5_LIKE_FILTER_DEPTH,
                          num_hidden = LENET5_LIKE_NUM_HIDDEN,
                          image_width = 28, image_depth = 1, num_labels = 10):
    w1 = tf.Variable(tf.truncated_normal([filter_size, filter_size, image_depth, filter_depth], stddev=0.1))
    b1 = tf.Variable(tf.zeros([filter_depth]))
    w2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, filter_depth, filter_depth], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[filter_depth]))
    # both convolutions use 'SAME' padding, so only the two 2x2 poolings shrink the image (by a factor of 4)
    w3 = tf.Variable(tf.truncated_normal([(image_width // 4)*(image_width // 4)*filter_depth, num_hidden], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [num_hidden]))
    w4 = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [num_hidden]))
    w5 = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [num_labels]))
    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5
    }
    return variables

def model_lenet5_like(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.relu(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.avg_pool(layer1_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    layer2_conv = tf.nn.conv2d(layer1_pool, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_actv = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.avg_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    flat_layer = flatten_tf_array(layer2_pool)
    layer3_fccd = tf.matmul(flat_layer, variables['w3']) + variables['b3']
    layer3_actv = tf.nn.relu(layer3_fccd)
    #layer3_drop = tf.nn.dropout(layer3_actv, 0.5)

    layer4_fccd = tf.matmul(layer3_actv, variables['w4']) + variables['b4']
    layer4_actv = tf.nn.relu(layer4_fccd)
    #layer4_drop = tf.nn.dropout(layer4_actv, 0.5)

    logits = tf.matmul(layer4_actv, variables['w5']) + variables['b5']
    return logits

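The main difference is that we use the ReLU activation function instead of the sigmoid activation function.

Besides the activation function, we can also change the optimizer to see how different optimizers affect the accuracy.

2.8 Impact of learning rate and optimizer

Let's see how these CNNs perform on the MNIST and CIFAR-10 datasets.

In the figures above, the accuracy on the test set is shown as a function of the number of iterations: on the left for the one-layer fully connected NN, in the middle for the LeNet5 NN, and on the right for the LeNet5-like NN.

As we can see, the LeNet5 CNN performs very well on the MNIST dataset. That is not a big surprise, since it was designed specifically to classify handwritten digits. The MNIST dataset is small and not very challenging, so even a fully connected network performs quite well on it.

On the CIFAR-10 dataset, however, the performance of the LeNet5 NN drops significantly, to an accuracy of around 40%.

To increase the accuracy, we can change the optimizer, or fine-tune the neural network by applying regularization or learning rate decay.

As we can see, the AdagradOptimizer, AdamOptimizer, and RMSPropOptimizer perform better than the GradientDescentOptimizer. These are adaptive optimizers, which generally perform better than the GradientDescentOptimizer but require more computational power.

With L2 regularization or exponential rate decay we could probably gain a bit more accuracy, but for much better results we need to go deeper.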
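To give an idea of what exponential rate decay does, here is a small plain-Python sketch of the decay schedule. It follows the formula documented for tf.train.exponential_decay (the initial rate multiplied by decay_rate raised to the power step / decay_steps); the function below is a stand-alone illustration, and the numbers used (0.1, 1000, 0.96) are assumptions, not values from the experiments above.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    # With staircase=True the rate drops in discrete jumps every decay_steps steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_rate * decay_rate ** exponent

rate_start = exponential_decay(0.1, 0, 1000, 0.96)
rate_1000 = exponential_decay(0.1, 1000, 1000, 0.96)
rate_1500 = exponential_decay(0.1, 1500, 1000, 0.96)
rate_1500_stair = exponential_decay(0.1, 1500, 1000, 0.96, staircase=True)
print(rate_start, rate_1000, rate_1500, rate_1500_stair)
```

The learning rate starts high, so the optimizer makes fast progress early on, and shrinks gradually, so later updates become more careful.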

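3. Deep neural networks in Tensorflow

So far we have seen the LeNet5 CNN architecture. LeNet5 contains two convolutional layers followed by fully connected layers, and can therefore be called a shallow neural network. At that time (1998), GPUs were not used for computation and CPUs were much less powerful, so two convolutional layers were already quite innovative.

Later on, many other types of convolutional neural networks were designed.

For example, the very famous AlexNet architecture by Alex Krizhevsky (2012), the 7-layer ZF Net (2013), and the 16-layer VGGNet (2014).

In 2014, Google introduced GoogLeNet, a 22-layer CNN built from inception modules, and in 2015 Microsoft Research Asia built ResNet, a CNN with 152 layers.

Now, with what we have learned so far, let's see how to create the AlexNet and VGGNet16 architectures in Tensorflow.

3.1 AlexNet

Although LeNet5 was one of the first ConvNets, it is considered a shallow neural network. It works well on the MNIST dataset, which consists of 28 x 28 grayscale images, but its performance drops when we try to classify larger images with higher resolution and more classes.

The first deep CNN came out in 2012. It is called AlexNet and was created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Compared with more recent architectures AlexNet is fairly simple, but it was very successful at the time. It won the ImageNet competition with an incredible top-5 test error rate of 15.4% (the runner-up had an error of 26.2%) and started a revolution in deep learning and AI worldwide.

It consists of 5 convolutional layers, 3 max pooling layers, 3 fully connected layers, and 2 dropout layers. The overall architecture looks as follows:

Layer 0: input image of size 224 x 224 x 3

Layer 1: a convolutional layer with 96 filters (filter_depth_1 = 96) of size 11 x 11 (filter_size_1 = 11) and a stride of 4. It has a ReLU activation function and is followed by max pooling and local response normalization layers.

Layer 2: a convolutional layer with 256 filters (filter_depth_2 = 256) of size 5 x 5 (filter_size_2 = 5) and a stride of 1. It has a ReLU activation function and is again followed by max pooling and local response normalization layers.

Layer 3: a convolutional layer with 384 filters (filter_depth_3 = 384) of size 3 x 3 (filter_size_3 = 3) and a stride of 1. It has a ReLU activation function.

Layer 4: same as layer 3.

Layer 5: a convolutional layer with 256 filters (filter_depth_4 = 256) of size 3 x 3 (filter_size_4 = 3) and a stride of 1. It has a ReLU activation function.

Layers 6-8: the convolutional layers are followed by fully connected layers with 4096 neurons each. In the original paper they classify a dataset with 1000 classes, but we will use the oxford17 dataset, which has 17 different classes (of flowers).

Note that this CNN (or any other deep CNN) cannot be used on the MNIST or CIFAR-10 datasets, because the images in those datasets are too small. As we have seen before, a pooling layer (or a convolutional layer with a stride of 2) reduces the image size by a factor of 2. AlexNet has 3 max pooling layers and one convolutional layer with a stride of 4. This means that the original image size is reduced by a factor of 2^5. The images in the MNIST dataset would simply be reduced to a size smaller than one pixel.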
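The arithmetic behind this can be checked in a few lines of plain Python. The dictionary keys and variable names below are labels chosen for this sketch; the reduction factors are the ones named in the text (one stride-4 convolution and three pooling layers).

```python
# Size-reducing layers of AlexNet and the factor by which each shrinks the image.
reduction_factors = {
    'conv layer 1 (stride 4)': 4,
    'max pool 1': 2,
    'max pool 2': 2,
    'max pool 3': 2,
}

total_reduction = 1
for factor in reduction_factors.values():
    total_reduction *= factor

final_sizes = {}
for name, width in [('oxflower17', 224), ('CIFAR-10', 32), ('MNIST', 28)]:
    # Integer division, as used when sizing the first fully connected weight matrix.
    final_sizes[name] = width // total_reduction
    print("{}: {} -> {}".format(name, width, final_sizes[name]))
```

A 224 x 224 image still leaves a 7 x 7 feature map, whereas the integer division gives 0 for a 28 x 28 MNIST image, so the first fully connected weight matrix would be empty.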

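Therefore we need to load a dataset with larger images, preferably 224 x 224 x 3 (as in the original paper). The 17-category flower dataset, also known as the oxflower17 dataset, is ideal because it contains images of exactly this size: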

ox17_image_width = 224

ox17_image_height = 224

ox17_image_depth = 3

ox17_num_labels = 17

import tflearn.datasets.oxflower17 as oxflower17

train_dataset_, train_labels_ = oxflower17.load_data(one_hot=True)

train_dataset_ox17, train_labels_ox17 = train_dataset_[:1000,:,:,:], train_labels_[:1000,:]

test_dataset_ox17, test_labels_ox17 = train_dataset_[1000:,:,:,:], train_labels_[1000:,:]

print('Training set', train_dataset_ox17.shape, train_labels_ox17.shape)

print('Test set', test_dataset_ox17.shape, test_labels_ox17.shape)

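Let's try to create the weight matrices and the different layers of AlexNet. As we have seen before, we need as many weight matrices and bias vectors as there are layers, and the size of each weight matrix should correspond to the filter size of the layer it belongs to.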

ALEX_PATCH_DEPTH_1, ALEX_PATCH_DEPTH_2, ALEX_PATCH_DEPTH_3, ALEX_PATCH_DEPTH_4 = 96, 256, 384, 256
ALEX_PATCH_SIZE_1, ALEX_PATCH_SIZE_2, ALEX_PATCH_SIZE_3, ALEX_PATCH_SIZE_4 = 11, 5, 3, 3
ALEX_NUM_HIDDEN_1, ALEX_NUM_HIDDEN_2 = 4096, 4096

def variables_alexnet(patch_size1 = ALEX_PATCH_SIZE_1, patch_size2 = ALEX_PATCH_SIZE_2,
                      patch_size3 = ALEX_PATCH_SIZE_3, patch_size4 = ALEX_PATCH_SIZE_4,
                      patch_depth1 = ALEX_PATCH_DEPTH_1, patch_depth2 = ALEX_PATCH_DEPTH_2,
                      patch_depth3 = ALEX_PATCH_DEPTH_3, patch_depth4 = ALEX_PATCH_DEPTH_4,
                      num_hidden1 = ALEX_NUM_HIDDEN_1, num_hidden2 = ALEX_NUM_HIDDEN_2,
                      image_width = 224, image_height = 224, image_depth = 3, num_labels = 17):
    w1 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, image_depth, patch_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([patch_depth1]))
    w2 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth1, patch_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[patch_depth2]))
    w3 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth2, patch_depth3], stddev=0.1))
    b3 = tf.Variable(tf.zeros([patch_depth3]))
    w4 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth3], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape=[patch_depth3]))
    # Note: the text describes 256 filters (patch_depth4) for layer 5;
    # this code keeps patch_depth3 (384) for layer 5 and for the input of w6.
    w5 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth3], stddev=0.1))
    b5 = tf.Variable(tf.zeros([patch_depth3]))

    # three poolings (factor 2 each) and one stride-4 convolution (factor 2*2)
    # reduce the image size by a factor of 2**5 in total
    pool_reductions = 3
    conv_reductions = 2
    no_reductions = pool_reductions + conv_reductions
    w6 = tf.Variable(tf.truncated_normal([(image_width // 2**no_reductions)*(image_height // 2**no_reductions)*patch_depth3, num_hidden1], stddev=0.1))
    b6 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))
    w7 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b7 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    w8 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b8 = tf.Variable(tf.constant(1.0, shape = [num_labels]))

    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5, 'w6': w6, 'w7': w7, 'w8': w8,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5, 'b6': b6, 'b7': b7, 'b8': b8
    }
    return variables

def model_alexnet(data, variables):
    # Layer 1: 11x11 convolution with stride 4, then ReLU, max pooling and normalization.
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 4, 4, 1], padding='SAME')
    layer1_relu = tf.nn.relu(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.max_pool(layer1_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer1_norm = tf.nn.local_response_normalization(layer1_pool)

    # Layer 2: 5x5 convolution, then ReLU, max pooling and normalization.
    layer2_conv = tf.nn.conv2d(layer1_norm, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_relu = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.max_pool(layer2_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer2_norm = tf.nn.local_response_normalization(layer2_pool)

    # Layers 3 to 5: 3x3 convolutions; only the last one is followed by pooling.
    layer3_conv = tf.nn.conv2d(layer2_norm, variables['w3'], [1, 1, 1, 1], padding='SAME')
    layer3_relu = tf.nn.relu(layer3_conv + variables['b3'])

    layer4_conv = tf.nn.conv2d(layer3_relu, variables['w4'], [1, 1, 1, 1], padding='SAME')
    layer4_relu = tf.nn.relu(layer4_conv + variables['b4'])

    layer5_conv = tf.nn.conv2d(layer4_relu, variables['w5'], [1, 1, 1, 1], padding='SAME')
    layer5_relu = tf.nn.relu(layer5_conv + variables['b5'])
    # Pool the output of layer 5 (the original pooled layer4_relu, which skipped layer 5).
    layer5_pool = tf.nn.max_pool(layer5_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer5_norm = tf.nn.local_response_normalization(layer5_pool)

    # Layers 6 to 8: fully connected layers, with dropout after the first two.
    flat_layer = flatten_tf_array(layer5_norm)
    layer6_fccd = tf.matmul(flat_layer, variables['w6']) + variables['b6']
    layer6_tanh = tf.tanh(layer6_fccd)
    layer6_drop = tf.nn.dropout(layer6_tanh, 0.5)

    layer7_fccd = tf.matmul(layer6_drop, variables['w7']) + variables['b7']
    layer7_tanh = tf.tanh(layer7_fccd)
    layer7_drop = tf.nn.dropout(layer7_tanh, 0.5)

    logits = tf.matmul(layer7_drop, variables['w8']) + variables['b8']
    return logits
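The first dimension of w6 has to match the flattened output of the last pooling layer. As a quick sanity check, the spatial size can be traced through the strided layers in plain Python. This is only a sketch: the helper names same_out and alexnet_flat_size are made up for illustration, and it assumes the 'SAME' padding rule, where the output length is ceil(input / stride).

```python
import math

def same_out(size, stride):
    """Output length of a 'SAME'-padded convolution or pooling layer."""
    return math.ceil(size / stride)

def alexnet_flat_size(image_size=224, final_depth=384):
    """Trace one spatial dimension through the strided layers of model_alexnet."""
    size = same_out(image_size, 4)  # layer1_conv, stride 4
    size = same_out(size, 2)        # layer1_pool, stride 2
    size = same_out(size, 2)        # layer2_pool, stride 2
    # The convolutions of layers 2 to 5 have stride 1 and leave the size unchanged.
    size = same_out(size, 2)        # layer5_pool, stride 2
    return size, size * size * final_depth

size, flat = alexnet_flat_size()
print(size, flat)  # 7 18816
```

For 224x224 images this agrees with the (image_width // 2**no_reductions) expression used for w6. For input sizes that are not a multiple of 32 the two can differ, because 'SAME' padding rounds up at every layer while the integer division rounds down once; a 227x227 input, for example, ends at 8x8 rather than 7x7.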

Now we can modify the CNN model so that it uses the weights and layers of the AlexNet model to classify images.

3.2 VGG Net-16

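VGG Net was created in 2014 by Karen Simonyan and Andrew Zisserman of the University of Oxford. It contains many more layers (16 to 19), but each layer is simpler in design: all convolutional layers use 3×3 filters with a stride of 1, and all max pooling layers have a stride of 2.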

So it is a deeper CNN, but a simpler one.

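It comes in different configurations, with either 16 or 19 layers. The difference between the two is whether 3 or 4 convolutional layers are used after the 2nd, 3rd and 4th max pooling layers (see below).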

The 16-layer configuration (configuration D) seems to give the better results, so let's try to create it in Tensorflow.
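To make the difference between the two configurations concrete, the number of weight layers can be counted from the number of convolutional layers per block. The block layout below follows configurations D and E as described above; the names VGG_CONV_BLOCKS and count_weight_layers are only illustrative.

```python
# Number of convolutional layers in each block; every block ends with a max pooling layer.
VGG_CONV_BLOCKS = {
    "D (16 layers)": [2, 2, 3, 3, 3],
    "E (19 layers)": [2, 2, 4, 4, 4],
}
NUM_FC_LAYERS = 3  # fully connected layers at the end of the network

def count_weight_layers(conv_blocks, num_fc=NUM_FC_LAYERS):
    """Layers with weights: all convolutional layers plus the fully connected ones."""
    return sum(conv_blocks) + num_fc

for name, blocks in VGG_CONV_BLOCKS.items():
    print(name, count_weight_layers(blocks))
```

Pooling layers have no weights, which is why they are not counted.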

#The VGGNET Neural Network

VGG16_PATCH_SIZE_1, VGG16_PATCH_SIZE_2, VGG16_PATCH_SIZE_3, VGG16_PATCH_SIZE_4 = 3, 3, 3, 3

VGG16_PATCH_DEPTH_1, VGG16_PATCH_DEPTH_2, VGG16_PATCH_DEPTH_3, VGG16_PATCH_DEPTH_4 = 64, 128, 256, 512

VGG16_NUM_HIDDEN_1, VGG16_NUM_HIDDEN_2 = 4096, 1000

def variables_vggnet16(patch_size1 = VGG16_PATCH_SIZE_1, patch_size2 = VGG16_PATCH_SIZE_2,
                       patch_size3 = VGG16_PATCH_SIZE_3, patch_size4 = VGG16_PATCH_SIZE_4,
                       patch_depth1 = VGG16_PATCH_DEPTH_1, patch_depth2 = VGG16_PATCH_DEPTH_2,
                       patch_depth3 = VGG16_PATCH_DEPTH_3, patch_depth4 = VGG16_PATCH_DEPTH_4,
                       num_hidden1 = VGG16_NUM_HIDDEN_1, num_hidden2 = VGG16_NUM_HIDDEN_2,
                       image_width = 224, image_height = 224, image_depth = 3, num_labels = 17):
    # Block 1: two convolutional layers with patch_depth1 output channels.
    w1 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, image_depth, patch_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([patch_depth1]))
    w2 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, patch_depth1, patch_depth1], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[patch_depth1]))

    # Block 2: two convolutional layers with patch_depth2 output channels.
    w3 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth1, patch_depth2], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [patch_depth2]))
    w4 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth2, patch_depth2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [patch_depth2]))

    # Block 3: three convolutional layers with patch_depth3 output channels.
    w5 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth2, patch_depth3], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [patch_depth3]))
    w6 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth3, patch_depth3], stddev=0.1))
    b6 = tf.Variable(tf.constant(1.0, shape = [patch_depth3]))
    w7 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth3, patch_depth3], stddev=0.1))
    b7 = tf.Variable(tf.constant(1.0, shape=[patch_depth3]))

    # Block 4: three convolutional layers with patch_depth4 output channels.
    w8 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth4], stddev=0.1))
    b8 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w9 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b9 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w10 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b10 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))

    # Block 5: three more convolutional layers with patch_depth4 output channels.
    w11 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b11 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w12 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b12 = tf.Variable(tf.constant(1.0, shape=[patch_depth4]))
    w13 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b13 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))

    # Each of the five stride-2 max pooling layers halves the spatial dimensions.
    no_pooling_layers = 5

    # Fully connected layers; the first one takes the flattened output of the last pooling layer.
    w14 = tf.Variable(tf.truncated_normal([(image_width // (2**no_pooling_layers)) * (image_height // (2**no_pooling_layers)) * patch_depth4, num_hidden1], stddev=0.1))
    b14 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w15 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b15 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))

    w16 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b16 = tf.Variable(tf.constant(1.0, shape = [num_labels]))

    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5, 'w6': w6, 'w7': w7, 'w8': w8, 'w9': w9, 'w10': w10,
        'w11': w11, 'w12': w12, 'w13': w13, 'w14': w14, 'w15': w15, 'w16': w16,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5, 'b6': b6, 'b7': b7, 'b8': b8, 'b9': b9, 'b10': b10,
        'b11': b11, 'b12': b12, 'b13': b13, 'b14': b14, 'b15': b15, 'b16': b16
    }
    return variables

def model_vggnet16(data, variables):
    # Block 1: two 3x3 convolutions followed by max pooling.
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.relu(layer1_conv + variables['b1'])
    layer2_conv = tf.nn.conv2d(layer1_actv, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_actv = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.max_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # Block 2: two 3x3 convolutions followed by max pooling.
    layer3_conv = tf.nn.conv2d(layer2_pool, variables['w3'], [1, 1, 1, 1], padding='SAME')
    layer3_actv = tf.nn.relu(layer3_conv + variables['b3'])
    layer4_conv = tf.nn.conv2d(layer3_actv, variables['w4'], [1, 1, 1, 1], padding='SAME')
    layer4_actv = tf.nn.relu(layer4_conv + variables['b4'])
    # Pool the activation of layer 4 (the original referenced layer4_pool before it was defined).
    layer4_pool = tf.nn.max_pool(layer4_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # Block 3: three 3x3 convolutions followed by max pooling.
    layer5_conv = tf.nn.conv2d(layer4_pool, variables['w5'], [1, 1, 1, 1], padding='SAME')
    layer5_actv = tf.nn.relu(layer5_conv + variables['b5'])
    layer6_conv = tf.nn.conv2d(layer5_actv, variables['w6'], [1, 1, 1, 1], padding='SAME')
    layer6_actv = tf.nn.relu(layer6_conv + variables['b6'])
    layer7_conv = tf.nn.conv2d(layer6_actv, variables['w7'], [1, 1, 1, 1], padding='SAME')
    layer7_actv = tf.nn.relu(layer7_conv + variables['b7'])
    layer7_pool = tf.nn.max_pool(layer7_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # Block 4: three 3x3 convolutions followed by max pooling.
    layer8_conv = tf.nn.conv2d(layer7_pool, variables['w8'], [1, 1, 1, 1], padding='SAME')
    layer8_actv = tf.nn.relu(layer8_conv + variables['b8'])
    layer9_conv = tf.nn.conv2d(layer8_actv, variables['w9'], [1, 1, 1, 1], padding='SAME')
    layer9_actv = tf.nn.relu(layer9_conv + variables['b9'])
    layer10_conv = tf.nn.conv2d(layer9_actv, variables['w10'], [1, 1, 1, 1], padding='SAME')
    layer10_actv = tf.nn.relu(layer10_conv + variables['b10'])
    layer10_pool = tf.nn.max_pool(layer10_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # Block 5: three 3x3 convolutions followed by max pooling.
    layer11_conv = tf.nn.conv2d(layer10_pool, variables['w11'], [1, 1, 1, 1], padding='SAME')
    layer11_actv = tf.nn.relu(layer11_conv + variables['b11'])
    layer12_conv = tf.nn.conv2d(layer11_actv, variables['w12'], [1, 1, 1, 1], padding='SAME')
    layer12_actv = tf.nn.relu(layer12_conv + variables['b12'])
    layer13_conv = tf.nn.conv2d(layer12_actv, variables['w13'], [1, 1, 1, 1], padding='SAME')
    layer13_actv = tf.nn.relu(layer13_conv + variables['b13'])
    layer13_pool = tf.nn.max_pool(layer13_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

    # Fully connected layers, with dropout after the first two.
    flat_layer = flatten_tf_array(layer13_pool)
    layer14_fccd = tf.matmul(flat_layer, variables['w14']) + variables['b14']
    layer14_actv = tf.nn.relu(layer14_fccd)
    layer14_drop = tf.nn.dropout(layer14_actv, 0.5)

    layer15_fccd = tf.matmul(layer14_drop, variables['w15']) + variables['b15']
    layer15_actv = tf.nn.relu(layer15_fccd)
    layer15_drop = tf.nn.dropout(layer15_actv, 0.5)

    logits = tf.matmul(layer15_drop, variables['w16']) + variables['b16']
    return logits
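To get a feeling for the size of this network, the number of trainable parameters per layer can be estimated in plain Python from the shapes used in variables_vggnet16. This is a rough sketch, assuming the default arguments (224x224x3 input, 17 labels, hidden sizes 4096 and 1000); the function name vgg16_parameter_counts is made up for illustration.

```python
def vgg16_parameter_counts(image_size=224, image_depth=3, num_labels=17,
                           num_hidden1=4096, num_hidden2=1000):
    """Return the flattened feature size and a list of per-layer parameter counts."""
    # (output depth, number of 3x3 convolutions) per block; each block ends with a pooling layer.
    conv_blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
    counts = []
    in_depth, size = image_depth, image_size
    for out_depth, num_convs in conv_blocks:
        for _ in range(num_convs):
            counts.append(3 * 3 * in_depth * out_depth + out_depth)  # weights + biases
            in_depth = out_depth
        size = -(-size // 2)  # stride-2 'SAME' pooling: ceil(size / 2)
    flat = size * size * in_depth
    for n_in, n_out in [(flat, num_hidden1), (num_hidden1, num_hidden2), (num_hidden2, num_labels)]:
        counts.append(n_in * n_out + n_out)
    return flat, counts

flat, counts = vgg16_parameter_counts()
print("flattened size:", flat)
print("total parameters:", sum(counts))
print("share of first fully connected layer:", counts[13] / sum(counts))
```

With these defaults the first fully connected layer (w14 and b14) holds well over half of all parameters, which is one reason why a network like this is slow to train without a GPU.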

3.3 AlexNet Performance

For comparison, let's look at the performance of the LeNet5 CNN on the oxflower17 dataset, which contains larger images:

4. Closing Remarks

The code is available in my GitHub repository, so feel free to use it on your own dataset.

There is much more to explore in the world of deep learning: recurrent neural networks, region-based CNNs, GANs, reinforcement learning, and so on. In future blog posts I will build these types of neural networks, and use what we have already learned to build more interesting applications.

Original article title: "Building Convolutional Neural Networks with Tensorflow". Author: Ahmet Taspinar. Translator: 夏天. Reviewer: 主題曲.