This is it! This is the post where I talk about the Machine Learning component in my Deer Detector. Have no idea what I’m talking about? Check out my other blog posts here and here for some background information about the project.
Interested in how some machine learning algorithms work? Check out one of my Deer Detection Diversion posts on the subject here.
Ready? Let’s go!
First off, some good news – I managed to capture some pictures of the deer in my backyard. In the event of my mysterious death (and my empty garden), I have some interesting pictures as evidence that a deer may be involved:
As I discovered, they like our blackberry bushes, and they like to eat apples from our apple tree too:
There are actually quite a few images of the deer at night. It appears around 3 – 4 am and hangs around for about 20 minutes before moving on. At one point, a pair of them would hang out in our backyard, plotting our murder. But luckily, we’re down to one.
While murderous deer scare me just as much as EMPs, a recent trip to the movies alerted me to a much bigger threat. It appears that deer-built weapons of mass destruction are far less deadly than raccoon-built ones. Ominously, I have plenty of raccoons that frequent the backyard. They too seem to enjoy snacking on our blackberry bushes:
As it turns out, I have some good pictures of raccoons in the backyard. This is good – I can test out my Machine Learning strategy on raccoons first to see if it is going to work before turning it on the deer.
I decided to use a neural network for the brains of my Machine Learning program, largely because of the type of task I’m attempting.
See, one of the problems of Machine Learning is to define features that describe the data you are attempting to process. The Machine Learning Squirrel detector (and sentry water gun) project used color histograms, the size of a blob of color, and the texture of the color as features that the Machine Learning component used to discriminate squirrels from other objects in the yard.
While I could go that route and attempt to engineer features that define raccoons, I think a simpler start is to use a neural network to learn its own set of features from the raw pixel data – at least until I know more about what features are useful for the task at hand. That’s the beauty of a neural network – the hidden layers in the network learn features on their own – yay!
First, I needed to determine what data I actually wanted to send to the neural network. Sending the entire image is probably not going to work too well. For one thing, my smallest images are 640×480. That’s 307,200 pixels for the neural network to look at! For another, the network won’t know what to focus on.
A safer start is to focus on portions of the image that could fit a raccoon. The idea is that the raccoon would take up the majority of the window, giving the neural network something distinctive to focus on. After looking at a lot of raccoon images, a fixed size of 60×60 pixels looked about right.
My eventual goal is to take a picture and scan across and down it using a 60×60 window, feeding the contents of each window to the neural network. This way, the neural network will eventually see the entire image, one window at a time. Here’s an example of how I intend to scan across the image:
The detection process will be a simple winner-take-all approach – if the neural network detects a raccoon in any one of the windows scanned across the picture, then it will return a positive result. I will even make the program draw a box around the suspected target, and label it with what it thinks it is.
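Here’s a minimal Java sketch of the planned scan and the winner-take-all rule. It assumes the image is already a grayscale `double[][]` of intensities and that the classifier is passed in as a `Predicate<double[]>`; both are illustrative choices, not the actual project code. The stride is also an assumption: a stride equal to the window size gives non-overlapping tiles (80 windows on a 640×480 image), while a stride of 20 overlaps them and produces 660 windows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Sketch: slide a square window across and down an image, winner-take-all. */
final class WindowScanner {

    /** Top-left corner and side length of one square window. */
    record Window(int x, int y, int size) {}

    /** Every window position, left to right, then top to bottom. */
    static List<Window> windows(int width, int height, int size, int stride) {
        List<Window> result = new ArrayList<>();
        for (int y = 0; y + size <= height; y += stride) {
            for (int x = 0; x + size <= width; x += stride) {
                result.add(new Window(x, y, size));
            }
        }
        return result;
    }

    /** Copies the pixels under a window into a flat, row-major array. */
    static double[] crop(double[][] image, Window w) {
        double[] out = new double[w.size() * w.size()];
        for (int row = 0; row < w.size(); row++) {
            for (int col = 0; col < w.size(); col++) {
                out[row * w.size() + col] = image[w.y() + row][w.x() + col];
            }
        }
        return out;
    }

    /** Positive as soon as any single window is classified as a raccoon. */
    static Optional<Window> detect(double[][] image, int size, int stride,
                                   Predicate<double[]> isRaccoon) {
        for (Window w : windows(image[0].length, image.length, size, stride)) {
            if (isRaccoon.test(crop(image, w))) {
                return Optional.of(w); // first hit wins; its box can be drawn later
            }
        }
        return Optional.empty(); // no window fired
    }
}
```

Returning the winning `Window` rather than a bare boolean keeps the coordinates around for drawing the box and label later.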
However, that’s a problem for my future self. First, I needed to define an actual neural network and see if any of this was even possible.
Alright, now for some fun stuff. The neural network architecture I chose has 3 layers. The first layer is the input layer, consisting of the pixels from the 60×60 window. This means it will have 3,600 nodes, where each node represents a single pixel. The second layer, the hidden layer, will contain 10 nodes (these are the feature selectors). The third and final layer, the output layer, will have 1 node, which will have a value of 1 if there is a raccoon in the window, or 0 if not. Here’s what the architecture looks like:
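To make the 3,600-10-1 shape concrete, here is a minimal Java sketch of a forward pass through such a network. The sigmoid activations, per-unit biases, and small random starting weights are assumptions for illustration, not necessarily what the code on GitHub does. It also shows how many numbers the network has to learn: 36,021 weights and biases.

```java
import java.util.Random;

/** Sketch of a 3,600-10-1 feedforward network (forward pass only). */
final class TinyNet {
    static final int INPUTS = 60 * 60; // one node per pixel in the 60x60 window
    static final int HIDDEN = 10;      // the feature selectors

    final double[][] w1 = new double[HIDDEN][INPUTS]; // input -> hidden weights
    final double[] b1 = new double[HIDDEN];           // hidden biases
    final double[] w2 = new double[HIDDEN];           // hidden -> output weights
    double b2;                                        // output bias

    TinyNet(long seed) {
        Random rng = new Random(seed);
        // Small random weights (an assumed initialization) so hidden units differ.
        for (double[] row : w1) {
            for (int i = 0; i < INPUTS; i++) row[i] = rng.nextGaussian() * 0.01;
        }
        for (int j = 0; j < HIDDEN; j++) w2[j] = rng.nextGaussian() * 0.01;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns a value in (0, 1); above 0.5 would be read as "raccoon". */
    double predict(double[] pixels) {
        double out = b2;
        for (int j = 0; j < HIDDEN; j++) {
            double z = b1[j];
            for (int i = 0; i < INPUTS; i++) z += w1[j][i] * pixels[i];
            out += w2[j] * sigmoid(z); // hidden activation feeds the single output
        }
        return sigmoid(out);
    }

    /** 3,600*10 weights + 10 biases + 10*1 weights + 1 bias. */
    static int parameterCount() {
        return INPUTS * HIDDEN + HIDDEN + HIDDEN + 1;
    }
}
```

Nearly all of those parameters (36,000) sit between the input and hidden layers, which is one reason the input size matters so much.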
Next came a tedious task: assembling the actual training data. I found 96 images of raccoons from my Kinect and Raspberry Pi NOIR surveillance camera, and used the GIMP to cut out 60×60 chunks to feed to the neural network. Since 96 images really isn’t that many, I used ImageMagick to generate mirror images of them all, giving me 192 raccoon images. Here are some examples:
Just for fun, here’s a Bash one-liner that generates mirror images for all the files in a directory (insert evil-mirror-universe raccoon-with-a-goatee joke here):
for file in * ; do convert "$file" -flop "mirror-$file"; done
I also selected 170 random backgrounds of the backyard, and cut out 60×60 chunks that did not have any raccoons in them. These became negative cases for the neural network.
Next, I converted the color images to grayscale (what I’ll loosely call black and white). Why? Grayscale pictures only have a single “band” of color information, also known as the pixel intensity: one byte that says how bright a pixel should be (a value between 0 and 255). With one value per pixel instead of three (red, green, and blue), a 60×60 window needs 3,600 inputs rather than 10,800, so it requires a lot less processing and, for a first shot, is a good idea to try.
I won’t go into too many details here about how I built the actual neural network – that belongs in its own future post! I will say that I first validated the neural network approach using tools such as Weka and R. Once I saw that the network was actually working reasonably well, I decided to code my own version in Java for more control and better portability.
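One common way to do the conversion is a weighted sum of the red, green, and blue values. This Java sketch assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114); GIMP and ImageMagick may use a different formula by default, so treat the weights as an assumption. It also scales each intensity to [0, 1], a common (assumed) choice for network inputs.

```java
import java.awt.image.BufferedImage;

/** Sketch: RGB image -> one intensity per pixel, flattened for the network. */
final class Grayscale {

    /** Converts a packed 0xRRGGBB pixel (alpha ignored) to an intensity in 0..255. */
    static int intensity(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /** Flattens the image row by row into inputs scaled to [0, 1]. */
    static double[] toInputs(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        double[] inputs = new double[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                inputs[y * w + x] = intensity(img.getRGB(x, y)) / 255.0;
            }
        }
        return inputs;
    }
}
```

A 60×60 crop comes out as exactly the 3,600 inputs the network expects.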
First, let me say that these results are really preliminary. Given that I only had about 360 samples in total (192 raccoon and 170 background crops), I decided to use a 10-fold cross-validation technique to tell me how well the algorithm was doing.
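For comparison, here’s a Java sketch of two ways to generate train/test splits: repeated random 80/20 sub-sampling and textbook k-fold splitting. The method names and seeds are illustrative. With 362 samples (192 raccoon crops plus 170 backgrounds), each random round trains on 290 and tests on 72, while 10-fold tests on 36 or 37 samples per fold and uses every sample for testing exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: index splits for evaluating a classifier on n samples. */
final class Splits {
    record Split(List<Integer> train, List<Integer> test) {}

    /** Repeated random sub-sampling: reshuffle every round, take the first share for training. */
    static List<Split> repeatedRandom(int n, int rounds, double trainFraction, long seed) {
        Random rng = new Random(seed);
        List<Split> splits = new ArrayList<>();
        for (int r = 0; r < rounds; r++) {
            List<Integer> idx = new ArrayList<>();
            for (int i = 0; i < n; i++) idx.add(i);
            Collections.shuffle(idx, rng);
            int cut = (int) Math.round(n * trainFraction);
            splits.add(new Split(new ArrayList<>(idx.subList(0, cut)),
                                 new ArrayList<>(idx.subList(cut, n))));
        }
        return splits;
    }

    /** k-fold: shuffle once, cut into k folds, hold each fold out exactly once. */
    static List<Split> kFold(int n, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        List<Split> splits = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            int from = f * n / k;
            int to = (f + 1) * n / k;
            List<Integer> train = new ArrayList<>(idx.subList(0, from));
            train.addAll(idx.subList(to, n));
            splits.add(new Split(train, new ArrayList<>(idx.subList(from, to))));
        }
        return splits;
    }
}
```

Either way, a fresh model is trained on each `train` list and scored on the matching `test` list.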
What this means is that I generated a training set using 80% of the data and set aside 20% for testing. I built the model with the training data, and then tested it on the test set. I did this 10 times, randomly choosing samples for the 80/20 split each time and building a new model every time. (Strictly speaking, this is repeated random sub-sampling; textbook 10-fold cross-validation instead splits the data into 10 equal folds and holds out each fold exactly once, a 90/10 split.) This trick lets me try out different cross sections of the data set to get a better feel for how the model performs as a whole. Here are the results of my 10-fold cross-validation:
For those of you not versed in precision, recall, and F1 measures, the way to interpret this is that the neural network has an accuracy of roughly 72%.
That’s actually not too bad for a first attempt. Keep in mind, however, that model performance on real data is likely to be worse. But, all things considered, it’s still doing better than random guessing. Take that, talking raccoons from outer space!
Note: one thing to look at is the variance across the folds, represented by the error bars in the figure. In my example, the variance for precision, recall and F1 measure is quite low, meaning that the model is fairly stable. Large variance would mean that the model changes drastically from fold to fold. However, I’m likely over-fitting the model to the data, which is bad. I pretty much expected that, given how few training examples I have and the slight skew toward the positive class (192 raccoon samples versus 170 backgrounds). More training data would help prevent this.
Also note: training 10 models sounds like it would take a lot of time. However, thanks to the JBlas library and vectorized code, it took all of 10 minutes to train 10 models – about 1 minute per model.
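For reference, here’s a short Java sketch of how precision, recall, and F1 are computed from actual and predicted labels (1 for raccoon, 0 for background). Accuracy is included for comparison, since it is a separate measure from the other three. The class and record names are illustrative.

```java
/** Sketch: standard binary classification scores from label arrays. */
final class Metrics {
    record Scores(double precision, double recall, double f1, double accuracy) {}

    static Scores score(int[] actual, int[] predicted) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1 && actual[i] == 1) tp++;
            else if (predicted[i] == 1) fp++; // said raccoon, was background
            else if (actual[i] == 1) fn++;    // said background, was raccoon
            else tn++;
        }
        // Of everything flagged as a raccoon, the fraction that really was one.
        double precision = tp + fp == 0 ? 0 : (double) tp / (tp + fp);
        // Of all the real raccoons, the fraction the classifier found.
        double recall = tp + fn == 0 ? 0 : (double) tp / (tp + fn);
        // Harmonic mean of precision and recall.
        double f1 = precision + recall == 0 ? 0 : 2 * precision * recall / (precision + recall);
        double accuracy = (double) (tp + tn) / actual.length;
        return new Scores(precision, recall, f1, accuracy);
    }
}
```

A classifier that finds one of two raccoons with no false alarms scores precision 1.0, recall 0.5, and F1 of about 0.67.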
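Error bars like these typically summarize how much a metric moves from fold to fold. Assuming they show one standard deviation (an assumption; standard error is another common convention), each bar comes from a mean and a sample standard deviation over the 10 per-fold scores. A minimal Java sketch:

```java
/** Sketch: summarize one metric (e.g. F1) across cross-validation folds. */
final class FoldSummary {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1); small values mean a stable model. */
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }
}
```

Calling `stdDev` on an array of per-fold F1 scores gives the half-height of each bar under that assumption.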
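Vectorization here means pushing a whole batch of examples through the network with a few matrix multiplications (BLAS routines, which JBlas wraps, e.g. via `DoubleMatrix.mmul`) instead of looping over examples one at a time. Assuming sigmoid activations and the 3,600-10-1 architecture, with the $m$ training windows stacked as the rows of $X$, the whole forward pass is:

```latex
\begin{aligned}
X &\in \mathbb{R}^{m \times 3600}, \quad W_1 \in \mathbb{R}^{3600 \times 10}, \quad b_1 \in \mathbb{R}^{10}, \quad W_2 \in \mathbb{R}^{10 \times 1}, \quad b_2 \in \mathbb{R} \\
H &= \sigma\!\left(X W_1 + \mathbf{1}_m b_1^{\top}\right) \in \mathbb{R}^{m \times 10} \\
\hat{y} &= \sigma\!\left(H W_2 + \mathbf{1}_m b_2\right) \in \mathbb{R}^{m \times 1}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \text{ (elementwise)}
\end{aligned}
```

Two matrix products replace $m$ separate passes, which is where an optimized BLAS earns its keep.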
It helps to look at the examples the neural network gets wrong, so that I can start to diagnose what the problems are, and how to make it perform better. I’ve taken the fold with the best performance (a bit of a cheat) and looked at both the false positives and false negatives, since those are what the classifier gets wrong.
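Pulling out the mistakes is a simple filter over the labels. A minimal Java sketch with illustrative names, returning the indices of false positives and false negatives so the corresponding crops can be opened and inspected:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: find misclassified samples (1 = raccoon, 0 = background). */
final class Misclassified {

    /** Called a raccoon, but actually background. */
    static List<Integer> falsePositives(int[] actual, int[] predicted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1 && actual[i] == 0) out.add(i);
        }
        return out;
    }

    /** Called background, but actually a raccoon. */
    static List<Integer> falseNegatives(int[] actual, int[] predicted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 0 && actual[i] == 1) out.add(i);
        }
        return out;
    }
}
```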
False positives are things the classifier thought were raccoons, but actually were not:
These images aren’t raccoons! Two of them are bushes, two are shadows on the ground, and one of them is a cat! Bah! On the other hand, false negatives are things the classifier thought were not raccoons, but actually were:
One thing that might improve accuracy is fairly obvious – color! Here are the same false positives, but with their color information intact:
Note the lack of raccoons! And here are the false negatives:
In a number of cases, it becomes much easier to distinguish between raccoon and bush based solely on color information. Raccoons are usually gray, not green... unless they are wearing some form of camouflaged pants!
Note: color information is probably only somewhat useful moving forward. Shots taken at night will only have two colors associated with them. But at least for the daytime shots, color could prove useful.
This is an example of a problem that I should have thought about before beginning data collection. Some image formats offer compression, resulting in files that are smaller on disk. The trade-off for a smaller file size is that the raw data is compressed when flushed to disk. This wouldn’t normally be a problem, except that some formats (like JPEG) are lossy: you won’t get the same image back after you decompress it, because some of the original pixel data is thrown away during compression.
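Switching to color mostly changes the input layer. Here’s a Java sketch that assumes each pixel contributes separate red, green, and blue inputs scaled to [0, 1], laid out one color plane after another (the layout is an assumption; any fixed order works as long as training and prediction agree). A 60×60 window then becomes 10,800 inputs instead of 3,600, tripling the input-to-hidden weights from 36,000 to 108,000 with 10 hidden nodes.

```java
import java.awt.image.BufferedImage;

/** Sketch: RGB image -> three inputs per pixel, one color plane after another. */
final class ColorInputs {

    static double[] toInputs(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        int n = w * h;
        double[] inputs = new double[3 * n];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int p = y * w + x;
                inputs[p] = ((rgb >> 16) & 0xFF) / 255.0;    // red plane
                inputs[n + p] = ((rgb >> 8) & 0xFF) / 255.0; // green plane
                inputs[2 * n + p] = (rgb & 0xFF) / 255.0;    // blue plane
            }
        }
        return inputs;
    }
}
```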
How does this impact my raccoon detector? Here are two examples of my images:
The picture on the left was taken from a Kinect in the early days of gathering data, using my quick and dirty Java program. I blindly used javax.imageio.ImageIO to write my file to disk, and picked JPEG as the format – for shame! That left me with no control over the compression level used. Notice how blotchy and blocky it is?
The picture on the right was taken with the NOIR camera using my Python script, which has a default compression quality setting of 85. Notice how much sharper it is? /me facepalm
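For the Java side, `javax.imageio` does allow an explicit JPEG quality through `ImageWriteParam`; the one-line `ImageIO.write` call just doesn’t expose it. Here’s a sketch assuming the JDK’s built-in JPEG writer. The 0.85f quality is chosen to roughly mirror the Python script’s setting of 85, though two different encoders won’t necessarily produce identical output. Writing PNG instead avoids lossy compression altogether.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

/** Sketch: save images with an explicit JPEG quality, or losslessly as PNG. */
final class ImageSaving {

    /** quality runs from 0.0f (smallest file) to 1.0f (best quality). */
    static void writeJpeg(BufferedImage img, float quality, OutputStream out) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT); // don't use the default
        param.setCompressionQuality(quality);
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(img, null, null), param);
        } finally {
            writer.dispose();
        }
    }

    /** PNG is lossless: decoding returns exactly the pixels that were written. */
    static void writePng(BufferedImage img, OutputStream out) throws IOException {
        ImageIO.write(img, "png", out);
    }
}
```

Passing a `FileOutputStream` instead of an in-memory stream writes straight to disk.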
Moving forward, I should either:
This is fairly significant, since sharper images may provide more information for the neural network to work with.
This first set of results was produced without a thorough search through the parameter space. There are many different settings on the neural network I can adjust, such as the number of iterations, the learning rate, and the hidden layer size.
One potentially good avenue to explore is the number of hidden units. Changing how many feature selectors the hidden layer has trades bias against variance: fewer hidden units give a simpler, more biased model that is less prone to over-fitting, while more hidden units can capture more detail but need more training examples to avoid over-fitting. Either way, adding more examples should hopefully increase performance.
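A brute-force way to explore those settings is a grid search: try every combination and keep the one with the best cross-validated score. In this Java sketch the `Settings` record, the candidate values, and the scoring function are all hypothetical; in practice the score would come from something like the mean F1 over the cross-validation rounds.

```java
import java.util.function.ToDoubleFunction;

/** Sketch: exhaustive search over three hypothetical training settings. */
final class GridSearch {
    record Settings(int iterations, double learningRate, int hiddenUnits) {}

    static Settings best(int[] iterations, double[] learningRates, int[] hiddenSizes,
                         ToDoubleFunction<Settings> score) {
        Settings best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int it : iterations) {
            for (double lr : learningRates) {
                for (int h : hiddenSizes) {
                    Settings s = new Settings(it, lr, h);
                    double value = score.applyAsDouble(s); // e.g. train + cross-validate
                    if (value > bestScore) {
                        bestScore = value;
                        best = s;
                    }
                }
            }
        }
        return best;
    }
}
```

The cost grows multiplicatively: 2 iteration counts × 3 learning rates × 4 hidden sizes is already 24 rounds of cross-validation, so the ~1 minute per model figure matters here.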
There are many things I intend to explore moving forward:
One interesting point to note is that detecting deer might work better than detecting raccoons, thanks to the pokey bits on the deer (antlers, not horns!). Since antlers have a very distinct shape, the feature selectors in the neural network might pick up on them really well.
Check back soon, when I’ll talk more about how I built the neural network, share some wildlife photos taken in the backyard, and turn my attention to detecting deer!
PS: for those interested in the neural network implementation, it’s available on my GitHub account. You can currently train the neural network using image samples and cross-validate its performance. Very shortly, you will be able to save the model and use it to make predictions!