···
 pages={2825--2830},
 year={2011}
}
+@misc{kaggle,
+  title = {Kaggle: The Home of Data Science \& Machine Learning},
+  howpublished = {\url{https://www.kaggle.com/}},
+  note = {Accessed: 2018-05-25}
+}
mini_proj/report/waldo.tex (+41 −27)
···
Almost every child around the world knows about ``Where's Waldo?'', also
known as ``Where's Wally?'' in some countries. This famous puzzle book has
spread its way across the world and is published in more than 25 different
-languages. The idea behind the books is to find the character ``Waldo'',
+languages. The idea behind the books is to find the character Waldo,
shown in \Cref{fig:waldo}, in the different pictures in the book. This is,
however, not as easy as it sounds. Every picture in the book is full of tiny
details and Waldo is only one out of many. The puzzle is made even harder by
···
\includegraphics[scale=0.35]{waldo.png}
\centering
\caption{
-    A headshot of the character ``Waldo'', or ``Wally''. Pictures of Waldo
+    A headshot of the character Waldo, or Wally. Pictures of Waldo are
copyrighted by Martin Handford and are used under the fair-use policy.
}
\label{fig:waldo}
···
setting that is humanly tangible. In this report we will try to identify
Waldo in the puzzle images using different classification methods. Every
image will be split into different segments and every segment will have to
-be classified as either being ``Waldo'' or ``not Waldo''. We will compare
+be classified as either being Waldo or not Waldo. We will compare
various classification methods, from more classical machine learning, like
naive Bayes classifiers, to the current state of the art, neural networks.
In \Cref{sec:background} we will introduce the different
···
of randomness and the mean of these trees is used which avoids this problem.

\subsection{Neural Network Architectures}
-\tab There are many well established architectures for Neural Networks depending on the task being performed.
-In this paper, the focus is placed on convolution neural networks, which have been proven to effectively classify images \cite{NIPS2012_4824}.
-One of the pioneering works in the field, the LeNet \cite{726791}architecture, will be implemented to compare against two rudimentary networks with more depth.
-These networks have been constructed to improve on the LeNet architecture by extracting more features, condensing image information, and allowing for more parameters in the network.
-The difference between the two network use of convolutional and dense layers.
-The convolutional neural network contains dense layers in the final stages of the network.
-The Fully Convolutional Network (FCN) contains only one dense layer for the final binary classification step.
-The FCN instead consists of an extra convolutional layer, resulting in an increased ability for the network to abstract the input data relative to the other two configurations.
-\\
-\todo{Insert image of LeNet from slides if time}
+
+There are many well-established architectures for neural networks, depending
+on the task being performed. In this paper, the focus is placed on
+convolutional neural networks, which have been proven to effectively classify
+images \cite{NIPS2012_4824}. One of the pioneering works in the field, the
+LeNet~\cite{726791} architecture, will be implemented to compare against two
+rudimentary networks with more depth. These networks have been constructed
+to improve on the LeNet architecture by extracting more features, condensing
+image information, and allowing for more parameters in the network. The
+difference between the two networks lies in their use of convolutional and
+dense layers. The convolutional neural network contains dense layers in the
+final stages of the network. The Fully Convolutional Network (FCN) contains
+only one dense layer, for the final binary classification step. The FCN
+instead consists of an extra convolutional layer, resulting in an increased
+ability for the network to abstract the input data relative to the other two
+configurations. \\
+
+\begin{figure}[H]
+    \includegraphics[scale=0.50]{LeNet}
+    \centering
+    \captionsetup{width=0.90\textwidth}
+    \caption{Representation of the LeNet neural network model architecture,
+    including convolutional layers and pooling (subsampling)
+    layers~\cite{726791}}
+    \label{fig:LeNet}
+\end{figure}
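The structural difference between the two endings can be made concrete by counting parameters. The sketch below is illustrative only: the layer sizes (an 8×8×32 final feature map, a 128-unit dense layer, a 3×3 convolution) are assumptions, not the report's actual models.

```python
# Illustrative parameter counts for a dense-ended CNN versus an FCN-style
# ending. All layer sizes here are hypothetical.

def conv_params(in_ch, out_ch, k):
    """Parameters in a k x k convolutional layer (weights + biases)."""
    return in_ch * out_ch * k * k + out_ch

def dense_params(in_units, out_units):
    """Parameters in a fully connected layer (weights + biases)."""
    return in_units * out_units + out_units

# Assumed final feature map before the classifier: 8 x 8 spatial, 32 channels.
feat = 8 * 8 * 32  # 2048 values after flattening

# CNN-style ending: flatten, a large dense layer, then 2 output classes.
cnn_tail = dense_params(feat, 128) + dense_params(128, 2)

# FCN-style ending: one extra 3x3 convolution, then a single small dense
# layer for the binary decision.
fcn_tail = conv_params(32, 32, 3) + dense_params(feat, 2)

print(cnn_tail, fcn_tail)  # 262530 13346
```

The dense tail dominates the parameter count, which is why moving capacity into an extra convolutional layer changes how the network abstracts its input rather than simply how many parameters it has.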

\section{Method} \label{sec:method}
···
agreement intended to allow users to freely share, modify, and use [a]
Database while maintaining [the] same freedom for
others"\cite{openData}} hosted on the predictive modeling and analytics
-competition framework, Kaggle. The distinction between images containing
-Waldo, and those that do not, was provided by the separation of the images
-in different sub-directories. It was therefore necessary to preprocess these
-images before they could be utilized by the proposed machine learning
-algorithms.
+competition framework, Kaggle~\cite{kaggle}. The distinction between images
+containing Waldo, and those that do not, was provided by the separation of
+the images in different sub-directories. It was therefore necessary to
+preprocess these images before they could be utilized by the proposed
+machine learning algorithms.

\subsection{Image Processing} \label{imageProcessing}
···
containing the most individual images of the three size groups. \\

Each of the 64$\times$64 pixel images was inserted into a
-Numpy~\cite{numpy} array of images, and a binary value was inserted into a
+NumPy~\cite{numpy} array of images, and a binary value was inserted into a
separate list at the same index. These binary values form the labels for
-each image (``Waldo'' or ``not Waldo''). Color normalization was performed
+each image (Waldo or not Waldo). Color normalization was performed
on each image so that artifacts in an image's color profile correspond to
meaningful features of the image (rather than the photographic method).\\

Each original puzzle is broken down into many images, and only contains one
Waldo. Although Waldo might span multiple 64$\times$64 pixel squares, this
-means that the ``non-Waldo'' data far outnumbers the ``Waldo'' data. To
+means that the non-Waldo data far outnumbers the Waldo data. To
combat the bias introduced by the skewed data, all Waldo images were
artificially augmented by performing random rotations, reflections, and
introducing random noise in the image to produce new images. In this way,
···
robust methods by exposing each technique to variations of the image during
the training phase. \\
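The augmentation step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the report's actual code: the `augment` function, the noise scale, and the flip probability are all hypothetical choices.

```python
import numpy as np

# Hypothetical sketch of augmenting a Waldo image: a random 90-degree
# rotation, an optional horizontal reflection, and mild additive noise.
rng = np.random.default_rng(0)

def augment(image):
    """Return a randomly perturbed copy of a 64x64x3 uint8 image."""
    out = np.rot90(image, k=rng.integers(4))   # random rotation (0/90/180/270)
    if rng.random() < 0.5:
        out = np.fliplr(out)                   # random horizontal reflection
    noise = rng.normal(0.0, 5.0, out.shape)    # small Gaussian pixel noise
    return np.clip(out + noise, 0, 255).astype(np.uint8)

# A stand-in image; in practice this would be a 64x64 crop of a puzzle.
waldo = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
extra = [augment(waldo) for _ in range(10)]    # ten new training samples
```

Because the crops are square, rotations and reflections preserve the 64×64 shape, so the augmented samples can be appended directly to the training array.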

-Despite the additional data, there were still ten times more ``non-Waldo''
+Despite the additional data, there were still ten times more non-Waldo
images than Waldo images. Therefore, it was necessary to cull the
-``non-Waldo'' data, so that there was an even split of ``Waldo'' and
-``non-Waldo'' images, improving the representation of true positives in the
+non-Waldo data, so that there was an even split of Waldo and
+non-Waldo images, improving the representation of true positives in the
image data set. Following preprocessing, the images (and associated labels)
were divided into a training and a test set with a 3:1 split. \\
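The culling and split described above can be sketched as follows. The random-subsampling strategy and the stand-in data are assumptions for illustration; the report does not specify how the non-Waldo images were culled.

```python
import random

# Hypothetical sketch: balance the classes by subsampling negatives,
# then perform the 3:1 train/test split described above.
random.seed(0)
waldo     = [("waldo", i) for i in range(100)]       # stand-ins for images
non_waldo = [("non_waldo", i) for i in range(1000)]  # ~10x more negatives

# Cull negatives to an even split with the positives.
non_waldo = random.sample(non_waldo, len(waldo))

data = waldo + non_waldo
random.shuffle(data)

# 3:1 split: three quarters for training, one quarter for testing.
cut = (3 * len(data)) // 4
train, test = data[:cut], data[cut:]
print(len(train), len(test))  # 150 50
```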

···
To evaluate the performance of the models, we record the time taken by each
model to train on the training data, and the accuracy with which the model
makes predictions. We calculate accuracy as
-\(a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\)
+\[a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\]
where \(tp\) is the number of true positives, \(tn\) is the number of true
negatives, \(fp\) is the number of false positives, and \(fn\) is the number
of false negatives.
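The accuracy formula above translates directly to code; the confusion counts below are example values, not the report's results.

```python
# Accuracy as defined above: correct predictions over all predictions.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(40, 40, 10, 10))  # 0.8
```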
···
network and traditional machine learning technique}
\label{tab:results}
\end{table}

We can see from the results that Deep Neural Networks outperform our
benchmark classification models, although the time required to train these
networks is significantly greater.
···
by the rest of the results, this comes down to a model's ability to learn
the hidden relationships between the pixels. This is made more apparent by
the performance of the Neural Networks.

\section{Conclusion} \label{sec:conclusion}

Images from the ``Where's Waldo?'' puzzle books are ideal images to test