The limits of deep learning

There are a couple of articles I’ve read recently that have gelled with my own thinking about the limits of deep learning. Deep learning simply refers to multi-layered neural networks, typically trained using back-propagation. These networks are very good at pattern recognition and are behind most recent advances in artificial intelligence. However, despite the amazing things they are capable of, I think it’s important to realize that these networks don’t have any understanding of what they’re looking at or listening to.
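To make “trained using back-propagation” concrete, here is a minimal sketch (purely illustrative, not any production system): a two-layer network learning the XOR function with numpy. The errors at the output are propagated backwards, layer by layer, to update the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 8))  # input -> hidden weights
W2 = rng.normal(size=(8, 1))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(x @ W1)
    return h, sigmoid(h @ W2)

_, out = forward(X)
initial_loss = np.mean((out - y) ** 2)

for _ in range(5000):
    h, out = forward(X)
    # back-propagation: gradient of squared error at the output layer,
    # then propagated back through the hidden layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent weight updates
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h

_, out = forward(X)
final_loss = np.mean((out - y) ** 2)
print(final_loss < initial_loss)  # training has reduced the error
```

The point of the sketch is that the whole procedure is just pattern fitting: nothing in those weight matrices “understands” what XOR means.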

The first article was by Douglas Hofstadter, a professor of cognitive science and author of Gödel, Escher, Bach. I read that book many years ago and remember getting a little lost. However, his recent article, The Shallowness of Google Translate, clearly demonstrates how the deep-learning-powered Google Translate successfully translates the words but often fails to translate the meaning. Hofstadter believes that one day machines may be able to do this, but they’ll need to be filled with ideas, images, memories and experiences, rather than sophisticated word-clustering algorithms.

The second article by Jason Pontin at Wired discusses the Downsides of Deep Learning:

  • They require lots of data to learn
  • They work poorly when confronted with examples outside of their training set
  • It’s difficult to explain why they do what they do
  • They don’t gain any innate knowledge or common sense

Jason argues that for artificial intelligence to progress we need something beyond deep learning. Many others are saying similar things. I recommend watching MIT’s recent lectures on Artificial General Intelligence, which cover this as well.


Cacophony: Using deep learning to identify pests

This is the first of a series of posts I intend to write on organisations that are using artificial intelligence in New Zealand. I am closer to this organisation than most because it was started by my brother, Grant.

Cacophony is a non-profit organisation started by Grant when he observed that the volume of bird song increased after he did some trapping around his section in Akaroa. His original idea was simply to build a device that measured the volume of bird song, as a way of measuring the impact of trapping. On examining trapping technology, he concluded there was an opportunity to significantly improve the effectiveness of trapping by applying modern technology. So he set up Cacophony to develop this technology and make it available as open source. This happened a little before the New Zealand government established its goal of being predator free by 2050. He managed to get some funding and has a small team of engineers working to create what I refer to as an autonomous killing machine. What could possibly go wrong?

Because most of the predators are nocturnal, the team have chosen to use thermal cameras. At the time of writing they have about 12 cameras set up in various locations, recording whenever motion is detected. Grant has been reviewing the video and tagging the predators he can identify. This has created the data set used to train the model to automatically detect predators.

They hired an intern, Matthew Aitchison, to build a classifier over the summer, and he’s made great progress. I’ve spent a bit of time with Matthew discussing what he is doing. Matthew completed Stanford’s CS231n computer vision course, which I’m also working my way through.

He does a reasonable amount of pre-processing: removing the background, splitting the video into 3-second segments and detecting the movement of pixels, so the model can use this information. One of his initial models was a 3-layer convolutional neural network with long short-term memory. This is still a work in progress, and I expect Matthew will shortly write a description of his final model, along with releasing his code and data.
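To give a feel for what that pre-processing might look like, here is a hypothetical numpy sketch on a synthetic “thermal video” array. The frame rate, function names and channel layout are my assumptions for illustration, not Matthew’s actual pipeline.

```python
import numpy as np

FPS = 9  # assumed thermal-camera frame rate, for illustration only

def preprocess(video: np.ndarray, segment_seconds: int = 3):
    """video: (frames, height, width) array of thermal intensities."""
    # 1. Background removal: subtract the per-pixel median over time,
    #    which approximates the static (unheated) scene.
    background = np.median(video, axis=0)
    foreground = video - background

    # 2. Motion: frame-to-frame differences highlight moving pixels.
    motion = np.abs(np.diff(foreground, axis=0, prepend=foreground[:1]))

    # 3. Split into fixed-length 3-second segments for the classifier,
    #    stacking foreground and motion as two input channels.
    seg_len = segment_seconds * FPS
    n_segments = video.shape[0] // seg_len
    return [
        np.stack([foreground[i * seg_len:(i + 1) * seg_len],
                  motion[i * seg_len:(i + 1) * seg_len]], axis=-1)
        for i in range(n_segments)
    ]

# Synthetic example: 6 seconds of 32x32 "video" with a warm moving spot.
video = np.zeros((6 * FPS, 32, 32))
for t in range(video.shape[0]):
    video[t, 10, (t // 2) % 32] = 40.0  # hot pixel drifting across the frame
segments = preprocess(video)
print(len(segments), segments[0].shape)  # 2 segments, each (27, 32, 32, 2)
```

Each segment would then be fed to the classifier, for example a convolutional network over the frames with an LSTM across time, as described above.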

After just a few weeks the results were promising. You can see an example of the model correctly classifying predators below: the thermal image is on the left; on the right is the image with the background removed, a bounding box around the animal, the instantaneous classification at the bottom and the cumulative classification at the top.


A version of this model is now being used to help Grant with his tagging, making the job easier and providing more data, faster.

The next step is to work out how to kill these predators. They’re developing a tracking system; you can see a prototype working below.

From my perspective it feels like they are making fantastic progress, and it won’t be too long before they have a prototype that can start killing predators. If you ask Grant, he thinks New Zealand can be predator free well before the government’s goal of 2050.

One final point, from a New Zealand AI perspective, is how accessible the technologies driving the artificial intelligence renaissance are. Techniques such as deep learning can be learnt from free and low-cost courses such as CS231n. Those who do so not only have a lot of fun, but also open up a world of opportunity.