Convolutional Neural Networks, Part 1: Historical Significance

Convolutional Neural Networks (CNNs) are one of the more popular techniques used in image recognition and computer vision systems today, and they’re historically significant in the field of data science. However, they can be one of the most misunderstood tools in the data science tool chest. Here, I’ll provide sufficient historical context for understanding why CNNs are so important, and outline some of the ways they are being used to change the landscape of applied analytics.

With the advent of autonomous driving, image-related tasks are now quite common in data science workflows. CNNs are algorithms for transforming an image into some output, like a category (e.g., “containing a person” versus “no person pictured”). Intelligent photo storage and indexing tools can use CNNs to determine which photos in your library contain images of you with your dog. Automatic Teller Machines (ATMs) can use CNNs to determine the amount of money a handwritten check is for. Autonomous vehicles can use advanced versions to determine whether a sign down the road is indicating “Stop” or “School Zone”.

Figure 1: Puppy in hat

CNNs are commonly associated with computer vision, with historical roots traced back to the 1980s, when Kunihiko Fukushima[i] proposed a neural network architecture inspired by the feline visual processing system[ii] [iii]. While existing algorithms could recognize geometric patterns in images, they weren’t able to generalize very well, or learn how those patterns might occur in other parts of the image. For example, imagine we wanted to train a network to recognize images of puppies. Our network shouldn’t be misled by a puppy standing off-center in the image, the image flipped upside down, or partly obscured by wearing a hat (Figure 1). We certainly wouldn’t want to miss that. Fukushima’s contributions laid the groundwork for getting around this problem—by creating a mechanism for classifiers to be unaffected by patterns that have been shifted in position.

Facebook’s Yann LeCun helped establish how we use CNNs today—as multiple layers of neurons for processing more complex features at deeper layers of the network. Their properties differ from traditional feed-forward neural networks in some important ways that make them effective at image-based problems. For example, images are frequently best-represented in multiple dimensions: height, width, and depth, where depth corresponds to the number of channels used for each pixel. If an image is encoded with a depth of three, for example, those channels would likely correspond to the red, blue, and green channels. In some workflows, using a traditional fully-connected network could lead to a drastic increase in the number of model parameters—in a 670 x 1040 image, this would be over two million weights in the first layer of the network alone!  This would almost certainly lead to something we in the data sciences refer to as “over-fitting”—building a model that is really good at classifying the images that were used to estimate the weight parameters, such as puppies, but really bad at classifying images it hasn’t already been exposed to, such as a puppy wearing a hat. CNNs minimize the number of parameters by allowing different parts of the network to specialize in high-level features like a texture or a repeating pattern.

The upshot of these architectural properties is that we’re able to take something we know about images—that there tends to be a great deal of redundant information in them—and reflect that knowledge in our network, thus minimizing the number of parameters involved. With many image classification tasks, we’re more concerned with lower-level features such as where the edges of objects begin and end. While traditional image classification approaches rely on manual encoding of features—having an expert define where certain patterns will occur in an image. CNNs let us get at this same information efficiently through what are called “convolution operations,” which simultaneously process information in groups of pixels at the same time. In my next post, we’ll spend more time looking at what happens during these and other operations in the CNN architecture, as well as how easy it is to construct these networks using Intel’s neon™ deep learning framework!

CNNs are an essential tool in the modern-day data science toolbox. While some of the details of their implementation may seem technical on the surface, their origins can be traced back to intuition and inspiration from the neurosciences.  At Intel, we make sure that versatile computer vision tools are available to data scientists of all levels of expertise, and that they run efficiently on Intel® architecture (more on this in my next post). I encourage you to check out the Intel® Nervana™ neon deep learning framework at, or learn more about using CNNs with our tutorial on our YouTube channel: