Group work for a Monash Research Methods course

Merge branch 'master' of github.com:Dekker1/ResearchMethods

+49 -27
+3
mini_proj/benchmark_results.csv
···
  tree,0.25446152687072754,0.7087378640776699
  naive_bayes,0.12949371337890625,0.8252427184466019
  forest,0.2792677879333496,0.9514563106796117
+ lenet,58.12968325614929,0.8980582524271845
+ cnn,113.81168508529663,0.9563106796116505
+ fcn,117.69003772735596,0.9466019417475728
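The rows added above appear to record, per model, a training time in seconds and a test-set accuracy (the column meanings are inferred from the report's evaluation section; the CSV itself has no header row). A minimal sketch of reading the file back, under that assumption:

```python
import csv

# Rows of benchmark_results.csv: model,train_time_s,accuracy
# (column meanings are an assumption; the file has no header row)
rows = """tree,0.25446152687072754,0.7087378640776699
naive_bayes,0.12949371337890625,0.8252427184466019
forest,0.2792677879333496,0.9514563106796117
lenet,58.12968325614929,0.8980582524271845
cnn,113.81168508529663,0.9563106796116505
fcn,117.69003772735596,0.9466019417475728"""

results = [(model, float(t), float(acc))
           for model, t, acc in csv.reader(rows.splitlines())]

# Pick the model with the highest accuracy
best = max(results, key=lambda r: r[2])
print(best[0])  # cnn
```

In a real run the rows would come from `open("mini_proj/benchmark_results.csv")` rather than an inline string.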
mini_proj/report/LeNet.jpg

This is a binary file and will not be displayed.

+5
mini_proj/report/references.bib
···
  pages={2825--2830},
  year={2011}
  }
+ @misc{kaggle,
+   title = {Kaggle: The Home of Data Science \& Machine Learning},
+   howpublished = {\url{https://www.kaggle.com/}},
+   note = {Accessed: 2018-05-25}
+ }
+41 -27
mini_proj/report/waldo.tex
···
  Almost every child around the world knows about ``Where's Waldo?'', also
  known as ``Where's Wally?'' in some countries. This famous puzzle book has
  spread its way across the world and is published in more than 25 different
- languages. The idea behind the books is to find the character ``Waldo'',
+ languages. The idea behind the books is to find the character Waldo,
  shown in \Cref{fig:waldo}, in the different pictures in the book. This is,
  however, not as easy as it sounds. Every picture in the book is full of tiny
  details and Waldo is only one out of many. The puzzle is made even harder by
···
  \includegraphics[scale=0.35]{waldo.png}
  \centering
  \caption{
-   A headshot of the character ``Waldo'', or ``Wally''. Pictures of Waldo
+   A headshot of the character Waldo, or Wally. Pictures of Waldo
    copyrighted by Martin Handford and are used under the fair-use policy.
  }
  \label{fig:waldo}
···
  setting that is humanly tangible. In this report we will try to identify
  Waldo in the puzzle images using different classification methods. Every
  image will be split into different segments and every segment will have to
- be classified as either being ``Waldo'' or ``not Waldo''. We will compare
+ be classified as either being Waldo or not Waldo. We will compare
  various classification methods from more classical machine
  learning, like naive Bayes classifiers, to the current state of the art,
  Neural Networks. In \Cref{sec:background} we will introduce the different
···
  of randomness and the mean of these trees is used, which avoids this problem.

  \subsection{Neural Network Architectures}
- \tab There are many well established architectures for Neural Networks depending on the task being performed.
- In this paper, the focus is placed on convolution neural networks, which have been proven to effectively classify images \cite{NIPS2012_4824}.
- One of the pioneering works in the field, the LeNet \cite{726791}architecture, will be implemented to compare against two rudimentary networks with more depth.
- These networks have been constructed to improve on the LeNet architecture by extracting more features, condensing image information, and allowing for more parameters in the network.
- The difference between the two network use of convolutional and dense layers.
- The convolutional neural network contains dense layers in the final stages of the network.
- The Fully Convolutional Network (FCN) contains only one dense layer for the final binary classification step.
- The FCN instead consists of an extra convolutional layer, resulting in an increased ability for the network to abstract the input data relative to the other two configurations.
- \\
- \todo{Insert image of LeNet from slides if time}
+
+ There are many well established architectures for Neural Networks depending
+ on the task being performed. In this paper, the focus is placed on
+ convolutional neural networks, which have been proven to effectively
+ classify images \cite{NIPS2012_4824}. One of the pioneering works in the
+ field, the LeNet \cite{726791} architecture, will be implemented to compare
+ against two rudimentary networks with more depth. These networks have been
+ constructed to improve on the LeNet architecture by extracting more
+ features, condensing image information, and allowing for more parameters in
+ the network. The difference between the two networks is their use of
+ convolutional and dense layers. The convolutional neural network contains
+ dense layers in the final stages of the network. The Fully Convolutional
+ Network (FCN) contains only one dense layer for the final binary
+ classification step. The FCN instead consists of an extra convolutional
+ layer, resulting in an increased ability for the network to abstract the
+ input data relative to the other two configurations. \\
+
+ \begin{figure}[H]
+   \includegraphics[scale=0.50]{LeNet}
+   \centering
+   \captionsetup{width=0.90\textwidth}
+   \caption{Representation of the LeNet Neural Network model architecture,
+     including convolutional layers and pooling (subsampling)
+     layers~\cite{726791}}
+   \label{fig:LeNet}
+ \end{figure}

  \section{Method} \label{sec:method}

···
  agreement intended to allow users to freely share, modify, and use [a]
  Database while maintaining [the] same freedom for
  others"\cite{openData}} hosted on the predictive modeling and analytics
- competition framework, Kaggle. The distinction between images containing
- Waldo, and those that do not, was provided by the separation of the images
- in different sub-directories. It was therefore necessary to preprocess these
- images before they could be utilized by the proposed machine learning
- algorithms.
+ competition framework, Kaggle~\cite{kaggle}. The distinction between images
+ containing Waldo, and those that do not, was provided by the separation of
+ the images in different sub-directories. It was therefore necessary to
+ preprocess these images before they could be utilized by the proposed
+ machine learning algorithms.

  \subsection{Image Processing} \label{imageProcessing}

···
  containing the most individual images of the three size groups. \\

  Each of the 64$\times$64 pixel images was inserted into a
- Numpy~\cite{numpy} array of images, and a binary value was inserted into a
+ NumPy~\cite{numpy} array of images, and a binary value was inserted into a
  separate list at the same index. These binary values form the labels for
- each image (``Waldo'' or ``not Waldo''). Color normalization was performed
+ each image (Waldo or not Waldo). Color normalization was performed
  on each image so that artifacts in an image's color profile correspond to
  meaningful features of the image (rather than the photographic method). \\

  Each original puzzle is broken down into many images, and only contains one
  Waldo. Although Waldo might span multiple 64$\times$64 pixel squares, this
- means that the ``non-Waldo'' data far outnumbers the ``Waldo'' data. To
+ means that the non-Waldo data far outnumbers the Waldo data. To
  combat the bias introduced by the skewed data, all Waldo images were
  artificially augmented by performing random rotations, reflections, and
  introducing random noise in the image to produce new images. In this way,
···
  robust methods by exposing each technique to variations of the image during
  the training phase. \\

- Despite the additional data, there were still ten times more ``non-Waldo''
+ Despite the additional data, there were still ten times more non-Waldo
  images than Waldo images. Therefore, it was necessary to cull the
- ``non-Waldo'' data, so that there was an even split of ``Waldo'' and
- ``non-Waldo'' images, improving the representation of true positives in the
+ non-Waldo data, so that there was an even split of Waldo and
+ non-Waldo images, improving the representation of true positives in the
  image data set. Following preprocessing, the images (and associated labels)
  were divided into a training and a test set with a 3:1 split. \\

···
  To evaluate the performance of the models, we record the time taken by
  each model to train on the training data, and the accuracy with which
  the model makes predictions. We calculate accuracy as
- \(a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\)
+ \[a = \frac{|correct\ predictions|}{|predictions|} = \frac{tp + tn}{tp + tn + fp + fn}\]
  where \(tp\) is the number of true positives, \(tn\) is the number of true
  negatives, \(fp\) is the number of false positives, and \(fn\) is the number
  of false negatives.
···
  network and traditional machine learning technique}
  \label{tab:results}
  \end{table}

  We can see from the results that Deep Neural Networks outperform our
  benchmark classification models, although the time required to train these
  networks is significantly greater.
···
  by the rest of the results, this comes down to a model's ability to learn
  the hidden relationships between the pixels. This is made more apparent by
  the performance of the Neural Networks.

  \section{Conclusion} \label{sec:conclusion}

  Images from the ``Where's Waldo?'' puzzle books are ideal images to test
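The accuracy metric in the evaluation hunk above, \(a = \frac{tp + tn}{tp + tn + fp + fn}\), reduces to a one-liner; a minimal sketch (the confusion counts used below are hypothetical, chosen only to illustrate the formula):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # a = (tp + tn) / (tp + tn + fp + fn), as defined in the report
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts for a balanced Waldo / non-Waldo test set
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```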