Have you ever wanted to deploy your deep neural network (DNN) models in a weekend project that is not only fun but also produces a useful gadget to have around the house? Maybe Fido has figured out your motion-triggered camera system and fills up your disk with videos of himself jumping around the room. Or maybe you don’t want to go through 10 hours of CCTV footage trying to figure out what happened to your delivery package. A friend of mine had been asking me for a while to help him update his DIY CCTV camera, which was built on motion-detection software, so I decided to finally help him out. The end result was a smart camera proof of concept, built with the Intel® Movidius™ Neural Compute Stick, that improves upon traditional CCTV systems by intelligently detecting and recording the activity the user is specifically interested in.
To better understand the real-world problem my project would solve, I did some online research on the need for home security cameras and was quite surprised by the results. As more consumers shop online and have their items delivered to their doorstep, the risk of those packages being stolen has increased. Below are some statistics from an online survey [1]:
The best way to define requirements for a product is to first experience the problem(s) it is expected to solve. To do this, I replicated my friend’s DIY CCTV camera and used it for a few days. The camera performed well at detecting motion, recording activity, and streaming live video, but I strongly felt it needed smarter features. For example, despite quite a bit of trial and error tweaking the motion detection configuration, I ended up with recordings of cars driving by, wind blowing through the trees, and of course my two-year-old son learning how to trigger the camera’s motion detection. I needed a system that was not only easy to set up (no hand-tweaking of the motion detection configuration), but also reliable at detecting the subjects, objects, and activity I am specifically interested in.
In an attempt to add ‘smartness’ to the camera, I interviewed a couple of my friends about their perception of a smart security camera. Below is a visual representation of their collective requirements.
In order to set achievable goals for this project, I simplified the requirements into four main tasks. See Figure 3 for a visual representation of these simplified requirements.
Given the fixed set of requirements, I decided to break them down into simpler blocks and find suitable hardware and software components for this project.
Unless I live in the giant’s house from Jack and the Beanstalk (which I don’t), my development laptop won’t fit on or near my door, so I needed something small and light enough to physically hang on the door. I had built the CCTV camera using a Raspberry Pi 3 Model B (RPi) and a PiCamera, so I decided to reuse that hardware. Since I wanted to apply AI (deep learning) to this project, I paired the RPi with an Intel® Movidius™ Neural Compute Stick, which is designed to offload deep neural network inference from an application processor. Below is an illustration of my hardware setup.
Since the entire setup may be powered by a battery pack, I had to make sure none of my hardware components were power guzzlers. Fortunately, the Intel® Movidius™ Neural Compute Stick is a low-power device designed to run off a single USB 2.0 or 3.0 port. I plugged it into one of the four USB 2.0 ports on the RPi, which itself can be powered either through its micro-USB port or through its stacking header. I used an off-the-shelf power bank (a portable phone charger) to supply power via the micro-USB port.
I started with a budget of $120 for the entire project, but a quick dive into my e-dumpster gave me everything I needed, so I ended up building my smart camera at no additional cost. Below is a cost estimate for your reference.
| Component | Cost |
|---|---|
| Intel® Movidius™ Neural Compute Stick | $79 |
| Raspberry Pi Zero W | $10 |
| Raspberry Pi Camera V2 | $25 |
| USB OTG cable | $1 |
| SD card (min. 8 GB) | |
The most difficult part of building neural-network-based products is finding a relevant dataset and training a model on it. Since I was building just a proof of concept, I decided to experiment with some of the freely downloadable pre-trained neural networks. I used “An Analysis of Deep Neural Network Models for Practical Applications” by Alfredo Canziani et al. [2] as a guide to pick a neural network that would meet my requirements; see Figure 5 for a comparison chart.
Since the results in this chart come from tests performed on different hardware, I had to re-run these networks on the Intel® Movidius™ Neural Compute Stick. Rather than running every network, I picked one from each extreme: Inception-v4, the most accurate, and AlexNet, the least complex (i.e., the fastest). I also ran MobileNets, a class of efficient convolutional neural networks (CNNs) designed for mobile and embedded vision applications. Table 1 lists the performance results from my test. There are two takeaways from this test:
| Network | Inference time (ms) | Frames per second |
|---|---|---|
| AlexNet | 91.33099 | 10.9 |
| Inception-v4 | 645.0548 | 1.55 |
| MobileNet (1.0, 224) | 39.26307 | 25.4 |
To ensure that my proof of concept would work in real-world situations, I had to pick a network that not only runs fast on the Intel® Movidius™ Neural Compute Stick, but also has a ‘person’ category and can handle multiple subjects or objects in a single image (or a single camera frame). Fortunately, MobileNet SSD meets all of these requirements, and a pre-trained model is readily available [3]. A quick test of MobileNet SSD on the system yielded the following result:
| Network | Inference time (ms) | Frames per second |
|---|---|---|
| MobileNet SSD | 80.47414 | 12.4 |
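Both tables express the same measurement two ways: if each frame costs one inference, frames per second is roughly 1000 divided by the inference time in milliseconds. The short sketch below checks this against the published numbers. The listed values match the computed ones cut off (not rounded) at the precision shown, so MobileNet’s 1000 / 39.26307 ≈ 25.47 appears as 25.4. The helper names are illustrative, and the figure ignores camera capture and preprocessing time, so it is an upper bound for the full pipeline.

```python
import math

# Inference times in milliseconds, as reported in the two tables above.
INFERENCE_MS = {
    "AlexNet": 91.33099,
    "Inception-v4": 645.0548,
    "MobileNet (1.0, 224)": 39.26307,
    "MobileNet SSD": 80.47414,
}


def fps_from_ms(inference_ms: float) -> float:
    """Frames per second when each frame costs exactly one inference."""
    return 1000.0 / inference_ms


def truncate(value: float, digits: int) -> float:
    """Cut `value` off after `digits` decimal places instead of rounding it."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


if __name__ == "__main__":
    for name, ms in INFERENCE_MS.items():
        print(f"{name:<22} {ms:>10.5f} ms  ->  {fps_from_ms(ms):6.3f} fps")
```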
Thanks to the Intel® Movidius™ Neural Compute SDK’s comprehensive API framework, it was quite easy to develop the app for this project. The basic structure of any app featuring this hardware breaks down into 5 simple steps:
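Here is a minimal sketch of those five steps. It assumes they follow the usual NCSDK 1.x Python pattern: open the device, load the compiled graph, offload an image, read the result, then unload the graph and close the device. The calls used (`EnumerateDevices`, `Device`, `OpenDevice`, `AllocateGraph`, `LoadTensor`, `GetResult`, `DeallocateGraph`, `CloseDevice`) belong to the NCSDK 1.x `mvnc.mvncapi` module, matching the NCSDK 1.12 test setup. `infer_once` is a hypothetical helper, not code from the app zoo. The module is passed in as a parameter so the flow can be exercised without a stick attached.

```python
def infer_once(mvnc, graph_path, tensor):
    """Run one inference on the first Neural Compute Stick found.

    `mvnc` is the NCSDK 1.x module (import mvnc.mvncapi as mvnc); it is a
    parameter here (an illustrative choice) so the flow can be tested without hardware.
    """
    # Step 1: open the first enumerated device.
    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Neural Compute Stick found")
    device = mvnc.Device(devices[0])
    device.OpenDevice()
    try:
        # Step 2: load the compiled graph file onto the stick.
        with open(graph_path, "rb") as f:
            graph = device.AllocateGraph(f.read())
        try:
            # Step 3: offload one preprocessed image (a float16 tensor).
            graph.LoadTensor(tensor, "user object")
            # Step 4: read back the raw inference result.
            output, _userobj = graph.GetResult()
        finally:
            # Step 5a: unload the graph ...
            graph.DeallocateGraph()
    finally:
        # Step 5b: ... and close the device, even if an earlier step failed.
        device.CloseDevice()
    return output
```

On the Pi this would be called as `infer_once(mvnc, "graph", tensor)` after `import mvnc.mvncapi as mvnc`, where `graph` stands for the compiled graph file. For a live camera, reopening the device on every frame would be wasteful. A real loop would keep steps 1, 2, and 5 outside the per-frame loop and repeat only steps 3 and 4.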
The Neural Compute App Zoo [4] is loaded with example apps, so I used an existing app called live-image-classifier as the foundation for this project. Apart from writing a utility script to deserialize the output into a Python dictionary, I only had to update steps 3 and 4 to create a working prototype of the application.
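The utility script that deserializes the output into a Python dictionary could look roughly like the sketch below. It assumes the flat output layout of the NCSDK’s MobileNet SSD graph: element 0 holds the number of detections, and each detection takes seven values starting at index 7 (image id, class id, score, then normalized box corners). It also assumes the PASCAL VOC label order of the Caffe MobileNet SSD model. `deserialize_ssd`, the 0.6 threshold, and the dictionary keys are illustrative choices, not the app zoo’s actual script.

```python
import math

# PASCAL VOC class order used by the Caffe MobileNet SSD model (0 = background).
LABELS = (
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)


def deserialize_ssd(output, threshold=0.6, width=300, height=300):
    """Turn a flat MobileNet SSD result into a dictionary of detections.

    Assumed layout: output[0] is the detection count, and detection i occupies
    output[7 + 7*i : 14 + 7*i] as
    [image_id, class_id, score, x_min, y_min, x_max, y_max],
    with box corners normalised to the 0..1 range.
    """
    count = int(output[0]) if math.isfinite(output[0]) else 0
    count = max(0, min(count, (len(output) - 7) // 7))  # never read past the end
    detections = []
    for i in range(count):
        base = 7 + 7 * i
        _image_id, class_id, score, x1, y1, x2, y2 = (float(v) for v in output[base:base + 7])
        # Skip malformed entries (NaN/inf), background/unknown ids, and weak scores.
        if not all(math.isfinite(v) for v in (class_id, score, x1, y1, x2, y2)):
            continue
        if not 0 < int(class_id) < len(LABELS) or score < threshold:
            continue
        detections.append({
            "label": LABELS[int(class_id)],
            "score": score,
            # Scale normalised corners to pixel coordinates (x1, y1, x2, y2).
            "box": (round(x1 * width), round(y1 * height),
                    round(x2 * width), round(y2 * height)),
        })
    return {"num_detections": count, "detections": detections}
```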
There was a small hitch while migrating the app from my development laptop to the OctoCam’s Raspberry Pi Zero W. The code relied heavily on OpenCV [7] to capture and preprocess frames from the camera, but there was no pre-compiled OpenCV binary or Python wheel for the Raspberry Pi. On a Raspberry Pi 3 Model B, compiling OpenCV from source takes about 4 hours, but on my RPi Zero W it failed after 56 painful hours. I could have tried cross-compiling on a development machine, but I decided on a much more effective approach: PiCamera [5] for capturing camera frames and PIL [6] for preprocessing images and visualizing the output.
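The PiCamera-plus-PIL replacement for the OpenCV capture path can be sketched as follows. It assumes MobileNet SSD’s usual 300×300 input, normalized as (pixel − 127.5) / 127.5 and converted to float16 for the stick. The BGR channel flip is also an assumption, based on the Caffe model having been trained on OpenCV-style BGR images. `capture_frame` and `preprocess` are hypothetical helper names. `camera.capture(..., use_video_port=True)` is a standard picamera call.

```python
import io

import numpy as np
from PIL import Image

# Assumed MobileNet SSD input settings: 300x300 pixels,
# normalised as (pixel - 127.5) / 127.5, i.e. roughly -1..1.
INPUT_SIZE = (300, 300)  # (width, height), the order PIL expects
MEAN = 127.5
SCALE = 1 / 127.5


def capture_frame(camera):
    """Grab one JPEG frame from an open picamera.PiCamera as a PIL image."""
    stream = io.BytesIO()
    # The video port captures faster than the still port, at some cost in quality.
    camera.capture(stream, format="jpeg", use_video_port=True)
    stream.seek(0)
    return Image.open(stream).convert("RGB")


def preprocess(image, bgr=True):
    """Resize and normalise a PIL image into a float16 HxWx3 tensor for the stick."""
    resized = image.convert("RGB").resize(INPUT_SIZE)
    pixels = np.asarray(resized, dtype=np.float32)  # shape (300, 300, 3), RGB order
    if bgr:
        # Assumption: the Caffe model expects OpenCV-style BGR channel order.
        pixels = pixels[:, :, ::-1]
    return ((pixels - MEAN) * SCALE).astype(np.float16)
```

On the Pi, `with PiCamera(resolution=(640, 480)) as camera: tensor = preprocess(capture_frame(camera))` (after `from picamera import PiCamera`) would produce a tensor ready for `LoadTensor`, with no OpenCV build required.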
Below is a video recording of my ‘DIY smart security camera’ in action. Although the MobileNet SSD model can detect twenty different classes, the code is designed to capture images (or record video snippets) only when a person is detected. Since the RPi Zero W has built-in Wi-Fi, I can easily SSH into the device from my development laptop and tweak the trigger mechanism. For example, it only takes a two- or three-line code change in step 4 to start recording only when both a dog and a person are detected.
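One way to keep that trigger small is to express it as a set comparison, so a new rule is just a different set of required classes. The sketch assumes detections are dictionaries with a `label` key holding the model’s VOC class names (‘person’, ‘dog’). `should_record` and the two requirement sets are hypothetical.

```python
# Hypothetical trigger rules; labels are the model's VOC class names.
PERSON_ONLY = frozenset({"person"})
PERSON_AND_DOG = frozenset({"person", "dog"})


def detected_labels(detections):
    """Collect the class names found in one frame."""
    return {d["label"] for d in detections}


def should_record(detections, required=PERSON_ONLY):
    """Start recording only if every required class appears in the frame."""
    return set(required) <= detected_labels(detections)
```

With this structure, moving from the default rule to `should_record(detections, PERSON_AND_DOG)` touches only the line that chooses the rule.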
If you are interested in replicating this project, you can access the source code for both your development machine and Raspberry Pi at https://github.com/movidius/ncappzoo/tree/master/apps/security-cam.
[1] The Package Guard – https://www.thepackageguard.com/wp-content/uploads/2017/05/DataSheat_PackageTheftDeliveryindustrystats-1.pdf
[2] Alfredo Canziani et al., “An Analysis of Deep Neural Network Models for Practical Applications” – https://arxiv.org/pdf/1605.07678.pdf
[3] Caffe* implementation of MobileNet SSD by Chuanqi305 – https://github.com/chuanqi305/MobileNet-SSD
[4] Neural Compute App Zoo – https://developer.movidius.com/examples
[5] RPi camera library – https://picamera.readthedocs.io
[6] Python Imaging Library (PIL) – https://pillow.readthedocs.io
[7] OpenCV library – https://opencv.org/
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.
Hardware: Laptop based on a quad-core Intel Core i5-6600 CPU @ 3.3GHz, 32GB RAM
Software: Ubuntu 16.04 + NCSDK 1.12
Test code: FPS numbers were generated using sample code from https://github.com/movidius/ncappzoo, which is released to the public under the MIT license and periodically goes through IP scans.
Intel, the Intel logo, and Movidius are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.