It’s time again for another episode of “Murderous Deer Machine Learning Mayhem”. Last time, I used a neural network to detect raccoons, since they frequented my backyard more often in the early days. In that post, my preliminary results revealed an accuracy (F1 measure) of approximately 72%. In this post, I’ll talk about some of the experiments I performed to try and increase the overall effectiveness of the neural network.
My first observation I thought was fairly obvious – color information should lead to better performance. This is an example where my gut feeling and a quick and dirty exploration of the data set seemed to point in a promising direction. Notice the word seemed – that’s because I ultimately went down a bad path. This is a perfect example of why you should dig into the data much more – and have a better grounding in the domain – before investing too much time into feature engineering. Past-self was warning current-self about this in previous blog posts. Bad Craig for not listening, bad! But before I reveal why this didn’t work, let me describe how I adapted the architecture of the neural net to see in color.
Background – one type of computer image ultimately boils down to a combination of three different color components – Red, Green and Blue (RGB). By mixing these components together with varying intensities of each, you can produce a wide variety of colors. For example, if each component can take on an intensity value between 0 and 255, you can produce 16,777,216 different colors. Cool!
My idea for the neural network was simple – break down each pixel into its component values, and feed all of that to the neural network. A picture will probably describe this much better. My original network looked like this, where I took the average of the RGB intensities and stored it in a single value (creating a grayscale image):
I modified the network (by modify, I mean built in an additional option to the program) to subdivide each pixel into its R, G and B component, and fed that to the neural network like so:
This means that I now had 3,600 x 3 = 10,800 inputs to the neural network. A bit of a combinatorial explosion in the number of network connections, but heck, memory and computing power is cheap these days.
I ran quite a number of experiments on the raccoon data with more control on how testing and training data sets were built, both to re-affirm my original results (which hold, ~70% accuracy is right on the button), and to try out a few different options, which included color information. I’ll explain the different options in a moment, but first, here are some results. Note, these results are after training the network over 1,500 iterations to convergence, and then performing a 10-fold cross validation.
Precision (also known as the positive predictive value) is a measurement of how well the machine learning algorithm is doing when it labels something as a positive example. You basically look at all of the things that the classifier labels as a raccoon, and see how many of them are actually raccoons. High precision is good – it means the algorithm is accurately identifying positive examples.
In the case of the Baseline, you can see that it does fairly well – roughly 70% of the things it labelled a raccoon were actually raccoons. This is good. Notice however, that the Hyperbolic did better (more on that in a bit). Notice too that the precision with Color information was significantly lower.
Recall (also known as sensitivity) measures how well the algorithm is remembering the actual examples it was given (I always think of the movie Total Recall – the original 90’s version with the Governator). Put another way, recall asks: of all of things that were actually raccoons, how many did it properly identify? As an example, if there are 10 raccoons in the data set, and the algorithm correctly identifies 6 raccoons, then the recall is 60% (total recall would be identifying all 10 raccoons).
As we can see, the Baseline recall is roughly 70%, with the Hyperbolic performing equally. Again, it took a hit with Color, but this time, the Hyperbolic Color does better. Notice the variance between the different options (the black lines that indicate a range). Both Color and Hyperbolic Color have quite significant variances. This means that those models are fairly fragile.
The F1 measure is a way of combining both the Precision and Recall of a model into a single measurement (also known as a harmonic mean). Why? Well, sometimes it’s nice to be able to judge performance based on a single feature, rather than comparing the features separately. In a previous post, I used the F1 measure and accuracy to mean the same thing.
As you can see here, the Hyperbolic wins the race for overall performance. Let’s talk about what is happening here.
In a previous post, I talked about how neural networks work. One of the key features is the activation function for the network. When a signal is passed from one end of the network to the other, it must pass through several layers in the network. The signal must be a certain “strength” before it is passed on. The thing that measures the strength and “decides” whether to pass on a new signal is the activation function.
The Hyperbolic Tangent is an activation function with a slightly different profile than the one I was using in the Baseline model. The Baseline model uses the Sigmoid as an activation function. Here is a plot of how the two functions stack up with some values between -10 and 10:
As you can see, the Hyperbolic Tangent maps values between -1 and 1, and is triggered slightly differently than the Sigmoid.
Long story short, recall for both the Hyperbolic Tangent and Sigmoid models were basically identical. The difference was with precision. Basically, the Hyperbolic Tangent resulted in a model that was more precise than the Sigmoid model, which is why the F1 measure is slightly better (because it is a combination of both precision and recall). Note that both the Baseline and Hyperbolic models were produced using grayscale images. So, what is going on with color?
Yup. That’s right. Performance tanked with color information (okay, it’s not abysmal, but it is considerably worse than my original run with grayscale images). At first I thought I had an error somewhere in my code. But looking at the data more, as well as false positives and false negatives, I started to hypothesize that the way I was using color information was probably hurting me very badly. The reason – I think – is due to the fact that I’m losing shape information when I separate out each color band.
To explain a bit more, in both the Baseline and Hyperbolic (without Color) models, for each pixel in each image, I take an average of all the different color bands. This does two things:
Point 2 I think is the important part. To demonstrate this a bit more clearly, I looked at the following false negative (what was a raccoon, but what the algorithm thought was not):
The left-hand image is the color example of the false negative. The next three images are the R, G, and B components, while the right-most image is the grayscale image. Notice how difficult it is to make out shapes when you treat each color band separately (especially the blue component). The raccoon tends to blend in with the background. Aside from the white on its face and front, it is very difficult to make it out. On the grayscale image it is much easier to separate the actual shape of the raccoon out of the background.
The long and the short of it is while I thought I was giving the neural network more data to work with, in actual fact, I was making it harder for the network to distinguish shapes. I need to do some reading to find a new strategy for dealing with color information. Feel free to reply in the comments section if you have an idea for an approach!
With some preliminary analysis out of the way, and with a functioning neural network, I think it is finally time to perform deer detection. I now have over 5,500 images of deer to train with. It will take some time to process that many images in order to generate a training set. Half of those images are taken during the night, and half during the day. Many of them are of the deer laying down, like so:
Notice that pesky post in the way of the deer! Plus, eventually those deer are going to lose their antlers (ha-ha – one less weapon with which to murder me). All in all, there are going to be some unique challenges to identifying deer.
In this post, I talked about the performance of the neural network with respect to identifying raccoons, and talked about why color information didn’t work out as I originally planned. Tune in next time when I’ll look at identifying deer and some of the challenges that brings.