\usepackage[justification=centering]{caption} % Used for captions
\captionsetup[figure]{font=small} % Makes captions small
\newcommand\tab[1][0.5cm]{\hspace*{#1}} % Defines a new command to use 'tab' in text
\usepackage[comma, numbers]{natbib} % Used for the bibliography
\usepackage{amsmath} % Math package
% Enable that parameters of \cref{}, \ref{}, \cite{}, ... are linked so that a reader can click on the number and jump to the target in the document
\usepackage{hyperref}
% Enable \cref{...} and \Cref{...} instead of \ref: type of reference included in the link
architectures, as this method is currently the most widely used for image
classification.
\textbf{A couple of papers that may be useful (if needed):}
\begin{itemize}
  \item LeNet: \url{http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf}
  \item AlexNet: \url{http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks}
  \item General comparison of LeNet and AlexNet: ``On the Performance of GoogLeNet and AlexNet Applied to Sketches'', Pedro Ballester and Ricardo Matsumura Araujo
  \item Deep NN architecture: \url{https://www-sciencedirect-com.ezproxy.lib.monash.edu.au/science/article/pii/S0925231216315533}
\end{itemize}
\subsection{Classical Machine Learning Methods}
The following paragraphs will give only brief descriptions of the different
\subsection{Neural Network Architectures}
\todo{Did we only do the three in the end? (Alexnet?)}
We implemented the LeNet architecture, then improved on it to produce a fairly standard convolutional neural network (CNN) that was deeper, extracted more features, and condensed the image information further. We then implemented a more fully convolutional network (FCN), which contained only one dense layer for the final binary classification step. The FCN added an extra convolutional layer, meaning that before classifying each image, the network abstracted the data further than the other two architectures.
\begin{itemize}
  \item LeNet
  \item CNN
  \item FCN
\end{itemize}
\paragraph{Convolutional Neural Networks}
\section{Method} \label{sec:method}
\tab
In order to effectively utilize the aforementioned modelling and classification techniques, a key consideration is the data they are acting on.
A dataset containing Waldo and non-Waldo images was obtained from an Open Database\footnote{``The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use [a] Database while maintaining [the] same freedom for others''\cite{openData}} hosted on the predictive modelling and analytics competition framework, Kaggle.
The distinction between images containing Waldo and those that do not was provided by the separation of the images into different sub-directories.
It was therefore necessary to preprocess these images before they could be utilised by the proposed machine learning algorithms.
\subsection{Image Processing}
\tab
The Waldo image database consists of images of size 64$\times$64, 128$\times$128, and 256$\times$256 pixels obtained by dividing complete Where's Waldo? puzzles.
Within each set of images, those containing Waldo are located in a folder called `waldo', and those not containing Waldo in a folder called `not\_waldo'.
Since Where's Waldo? puzzles are usually densely populated and contain fine details, the 64$\times$64 pixel set of images was selected to train and evaluate the machine learning models.
These images provide the added benefit of being the most numerous of the three size groups.
\\
\par
Each of the 64$\times$64 pixel images was inserted into a NumPy\footnote{NumPy is a popular Python programming library for scientific computing} array of images, and a binary value was inserted into a separate list at the same index.
These binary values form the labels for each image (Waldo or not Waldo).
Colour normalisation was performed on each image so that artefacts in an image's colour profile correspond to meaningful features of the image (rather than to the photographic method).
\\
\par
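The colour normalisation step can be sketched as follows. This is a minimal illustration only: the report does not specify the exact normalisation scheme, so per-channel standardisation and the function `normalise_colour` are assumptions.

```python
import numpy as np

def normalise_colour(images):
    """Standardise each colour channel to zero mean and unit variance.

    images: array of shape (n, height, width, channels).
    """
    images = images.astype(np.float64)
    mean = images.mean(axis=(0, 1, 2))   # per-channel mean over all images
    std = images.std(axis=(0, 1, 2))     # per-channel standard deviation
    return (images - mean) / np.where(std == 0, 1, std)

batch = np.random.rand(8, 64, 64, 3) * 255   # stand-in for eight 64x64 RGB images
normed = normalise_colour(batch)
```

After this step each channel is comparable across images regardless of how the source puzzle was scanned or photographed.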
Each original puzzle is broken down into many images but contains only one Waldo. Even though Waldo might span multiple 64$\times$64 pixel squares, this means that the non-Waldo images far outnumber the Waldo images.
To combat the bias introduced by the skewed data, all Waldo images were artificially augmented by performing random rotations and reflections, and by introducing random noise, to produce new images.
In this way, each original Waldo image was used to produce an additional ten variations, which were inserted into the image array.
This provided more variation in the true positives of the data set and assisted in the development of more robust methods by exposing each technique to variations of the image during the training phase.
\\
\par
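The augmentation step just described can be sketched like this. The specific transformations (90-degree rotations, horizontal flips, Gaussian noise of this magnitude) are illustrative assumptions, as the report does not pin down the exact parameters.

```python
import numpy as np

def augment(image, n_variations=10, rng=None):
    """Produce randomly rotated, reflected, noisy copies of one image."""
    rng = rng if rng is not None else np.random.default_rng(0)
    variations = []
    for _ in range(n_variations):
        out = np.rot90(image, k=int(rng.integers(0, 4)))  # random 90-degree rotation
        if rng.random() < 0.5:
            out = np.fliplr(out)                          # random horizontal reflection
        noise = rng.normal(0.0, 0.02, size=out.shape)     # small additive Gaussian noise
        variations.append(np.clip(out + noise, 0.0, 1.0))
    return variations

waldo = np.full((64, 64, 3), 0.5)   # stand-in for one 64x64 RGB Waldo image
extra = augment(waldo)
print(len(extra))  # 10 augmented variations per original image
```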
Despite the additional data, there were still over ten times as many non-Waldo images as Waldo images.
Therefore, it was necessary to cull the non-Waldo data so that there was an even split of Waldo and non-Waldo images, improving the representation of true positives in the image data set.
187187+ \\
% Kelvin Start
\subsection{Benchmarking}\label{benchmarking}
In order to benchmark the Neural Networks, the performance of these
algorithms is evaluated against other Machine Learning algorithms. We
use Support Vector Machines, K-Nearest Neighbours (\(K=5\)), Gaussian
Naive Bayes, and Random Forest classifiers, as provided in Scikit-Learn.
\subsection{Performance Metrics}\label{performance-metrics}
To evaluate the performance of the models, we record the time taken by
each model to train on the training data, as well as statistics about
the predictions the models make on the test data. These prediction
statistics include:
\begin{itemize}
\item
  \textbf{Accuracy:}
  \[a = \dfrac{|correct\ predictions|}{|predictions|} = \dfrac{tp + tn}{tp + tn + fp + fn}\]
\item
  \textbf{Precision:}
  \[p = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|predicted\ as\ Waldo|} = \dfrac{tp}{tp + fp}\]
\item
  \textbf{Recall:}
  \[r = \dfrac{|Waldo\ predicted\ as\ Waldo|}{|actually\ Waldo|} = \dfrac{tp}{tp + fn}\]
\item
  \textbf{F1 Measure:} \[f1 = \dfrac{2pr}{p + r}\] where \(tp\) is the
  number of true positives, \(tn\) is the number of true negatives,
  \(fp\) is the number of false positives, and \(fn\) is the number of
  false negatives.
\end{itemize}
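For example, with hypothetical counts \(tp = 30\), \(tn = 30\), \(fp = 10\), and \(fn = 30\) (100 test images in total):
\[a = \dfrac{30 + 30}{100} = 0.6, \qquad p = \dfrac{30}{30 + 10} = 0.75, \qquad r = \dfrac{30}{30 + 30} = 0.5, \qquad f1 = \dfrac{2 \times 0.75 \times 0.5}{0.75 + 0.5} = 0.6\]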
Accuracy is a common performance metric used in Machine Learning;
however, in classification problems where the training data is heavily
biased toward one category, a model will sometimes learn to optimize its
accuracy by classifying all instances as one category. That is, the
classifier will classify all images that do not contain Waldo as not
containing Waldo, but will also classify all images containing Waldo as
not containing Waldo. Thus we use other metrics to measure performance
as well.
\emph{Precision} returns the percentage of images classified as Waldo
that actually contain Waldo. \emph{Recall} returns the percentage of Waldo
images that were actually predicted as Waldo. In the case of a classifier that
classifies all things as not Waldo, the recall would be 0. \emph{F1-Measure}
returns a combination of precision and recall that heavily penalises
classifiers that perform poorly in either precision or recall.
% Kelvin End
\section{Results} \label{sec:results}
\section{Conclusion} \label{sec:conclusion}
\clearpage % Ensures that the references are on a separate page
\pagebreak
% References
\section{References}
\renewcommand{\refname}{}
\bibliographystyle{alpha}
\bibliography{references}

\end{document}
mini_proj/waldo_model.py
'''
Model definition for a standard convolutional network structure
'''
def CNN():
    ## List of model layers
    inputs = Input((3, 64, 64))

    m_pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool1)
    m_pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool2)

    drop3 = Dropout(0.2)(dense)
    classif = Dense(2, activation='sigmoid')(drop3) # Final layer to classify

    ## Define the model start and end
    model = Model(inputs=inputs, outputs=classif)
    # Optimizer recommended Adadelta values (lr=0.01)
    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy', f1])

    return model

'''
Model definition for a more fully convolutional network structure (only one dense layer, for the final classification step)
'''
def FCN():
    ## List of model layers
    inputs = Input((3, 64, 64))

    conv1 = Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=(64, 64, 3))(inputs)
    m_pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool1)
    m_pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Conv2D(32, (3, 3), activation='relu', padding='same')(m_pool2)
    drop2 = Dropout(0.2)(conv3) # Drop some portion of features to prevent overfitting
    m_pool3 = MaxPooling2D(pool_size=(2, 2))(drop2)

    conv4 = Conv2D(64, (2, 2), activation='relu', padding='same')(m_pool3)

    flat = Flatten()(conv4) # Makes data 1D
    drop3 = Dropout(0.2)(flat)
    classif = Dense(2, activation='sigmoid')(drop3) # Final layer to classify

    ## Define the model start and end
    model = Model(inputs=inputs, outputs=classif)
    # Optimizer recommended Adadelta values (lr=0.01)
    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy', f1])

    return model

'''
Model definition for the network structure of LeNet
Note: LeNet was designed to classify into 10 classes, but we are only performing binary classification
'''
def LeNet():
    ## List of model layers
    inputs = Input((3, 64, 64))

    conv1 = Conv2D(6, (5, 5), activation='relu', padding='valid', input_shape=(64, 64, 3))(inputs)
    m_pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(16, (5, 5), activation='relu', padding='valid')(m_pool1)
    m_pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    flat = Flatten()(m_pool2) # Makes data 1D

    dense1 = Dense(120, activation='relu')(flat) # Fully connected layer
    dense2 = Dense(84, activation='relu')(dense1) # Fully connected layer
    drop3 = Dropout(0.2)(dense2)
    classif = Dense(2, activation='sigmoid')(drop3) # Final layer to classify

    ## Define the model start and end
    model = Model(inputs=inputs, outputs=classif)
    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy', f1])

    return model

'''
AlexNet architecture (stub, not yet implemented)
'''
def AlexNet():
    inputs = Input(shape=(3, 64, 64))
    ## TODO: add the AlexNet convolutional and dense layers
    raise NotImplementedError('AlexNet is not implemented yet')

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
lbl_test = to_categorical(lbl_test)

## Define model
#model = CNN()
model = FCN()
#model = LeNet()
# svm_iclf = ImageClassifier(svm.SVC)
# tree_iclf = ImageClassifier(tree.DecisionTreeClassifier)
# naive_bayes_iclf = ImageClassifier(naive_bayes.GaussianNB)
# ensemble_iclf = ImageClassifier(ensemble.RandomForestClassifier)

## Define training parameters
epochs = 25 # an epoch is one forward pass and back propagation of all training data
batch_size = 150 # batch size - number of training examples used in one forward/backward pass
# (higher batch size uses more memory, smaller batch size takes more time)
#lrate = 0.01 # Learning rate of the model - controls magnitude of weight changes in training the NN
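As a sanity check on these parameters: the number of gradient updates per epoch is the number of batches, i.e. ceil(n_samples / batch_size). The training-set size of 6000 below is a made-up illustration, not the actual dataset size.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # One forward/backward pass (one weight update) is performed per batch.
    return math.ceil(n_samples / batch_size)

print(updates_per_epoch(6000, 150))  # 40 batches per epoch for a 6000-image training set
```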