As promised, here are the details of my Over The Air hack — The Eyes Have It — which won prizes for “Best Hardware Hack” and “Best Use of Other Features” in the overnight hack-a-thon competition. I also took it along to Mobile Monday London’s Demo Night on the following Monday.
The hack was a LEGO Mindstorms robot with an iPhone brain that followed faces in front of it and steered towards them. It performed well on stage, following me as I gestured towards it, just like a small pet.
The hack was composed of two parts: an iPhone app that detected faces in the video feed from the front-facing camera; and the LEGO robot that took instructions from the iPhone and steered accordingly.
Communication
Getting the two parts to communicate was one of the trickiest areas to get right, and caused extra headaches before each presentation. Both LEGO Mindstorms and the iPhone can communicate over Bluetooth, but Apple has restricted Bluetooth communication to companies that will pay the Apple “Made For iPhone” license fee or that use particular hardware. Bluetooth comms is not available through the standard SDK and apparently needs some kind of “secret handshake” to work.
Unfortunately, Mindstorms came out a while before the iPhone and uses a different Bluetooth chip; and LEGO and Apple haven’t managed to do a deal to provide Mindstorms access from the iPhone (perhaps because LEGO has open sourced much of their software?). LEGO has now released an Android app to showcase mobile phone integration, so let's hope Apple can work with them to get some iPhone apps too.
So no Bluetooth, and Apple won’t let you talk over the dock connector either… That left feeding information to the robot through one of its five senses — a touch-sensitive button, an ultrasound distance sensor, a microphone, a light sensor or the motors themselves (which can detect rotation).
A quick search for “LEGO Mindstorms iPhone” brought up the iPhoneRobot, which used the light sensor to pick up different greys on the screen. This was a great start, but it wasn’t quite what I was looking for. For a start, it required the LeJOS firmware on the Mindstorms brick — this is a cut-down Java VM that replaces the built-in LEGO firmware. I’ve left my Mindstorms with the default LEGO firmware as I use it with my 7-year-old son. He’s not quite ready for Java, but can easily understand the LabVIEW-based visual programming that comes with the Mindstorms kit. Secondly, the robot itself wasn’t quite suitable — I wanted a robot that would recognise faces, so I needed the iPhone to be pointing upwards towards people’s heads rather than along the ground.
Motor skills — building the robot
Designing a new robot from scratch takes quite a while, and I only had one night. Unlike normal LEGO with bumps and holes, the Mindstorms kit uses the new-style LEGO Technic, which is mostly holes and connectors. Also, LEGO’s own sample models are pretty complex, as they’re built to look like animals or people as well as to interact.
Fortunately, I found a great site with designs for quick Mindstorms models that are easier to hack to do what you want. Some of the ideas are amazing — especially a Segway that actually balances! I started with the 3-Motor Chassis and added the distance sensor on the front to prevent crashes. I also added the button sensor on the side to help with starting and stopping the program, as the iPhone was in the way of the program buttons on the Mindstorms brick.
Here are some pictures of the final result, holding the iPhone pointing upwards. I’ll try to generate some build instructions later.
Computer vision — the iPhone app
Humans are exceptionally good at seeing faces. Our brains have been trained from birth to detect and analyse faces very quickly. We can tell which way people are looking from far away and even see faces in random patterns.
Computers have a harder time of it, although recent developments have massively improved what is possible. Companies such as Polar Rose have shown demonstrations of both face detection (finding out where any faces are in an image or video) and facial recognition (matching the detected faces against a database of known images) running in real-time on mobile phones. Unfortunately, their code is not available to an overnight hacker, though they’ve recently been bought by Apple so we may see interesting capabilities in future iPhones.
However, Intel launched an open source project back in 1999 called OpenCV (for Computer Vision), and not only is it still going strong, but it works easily on the iPhone, and there’s even a git repository with a ready-built iPhone library. OpenCV is all about providing well-optimised functions for real-time computer vision, so that developers do not have to reinvent the wheel. It includes face detection algorithms, and a guy called Roy has posted some examples of how to get face detection working on an iPhone video feed.
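If you’ve not met OpenCV before, the face detection step itself boils down to very little code. Here’s a minimal sketch using OpenCV’s Python bindings rather than the C API the app calls from Objective-C; the cascade file is the stock frontal-face classifier that ships with OpenCV, and the parameters are just sensible defaults:

```python
import cv2

# Load the stock frontal-face Haar cascade that ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.jpg")                 # stand-in for one video frame
grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Haar detection runs on greyscale

# Returns one (x, y, width, height) rectangle per detected face.
faces = cascade.detectMultiScale(grey, scaleFactor=1.2, minNeighbors=3)
for (x, y, w, h) in faces:
    print("face at", x, y, "size", w, "x", h)
```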
Roy’s sample code was written for iOS 3, and iOS 4 provides much easier methods for accessing the video feed from the device. I updated Roy’s code to use the new AVCaptureVideoDataOutput class that provides direct access to uncompressed video frames from any iPhone camera. This bit took a little while longer than it should have done, as the video feed is provided as landscape (you’re recording a video, right?) whereas its preview feed is oriented the same way as the camera. This was not obvious, and made worse by the fact that face detection algorithms do not work when the image is rotated by 90°… There was a point in the early hours of Saturday morning when I thought there would be no face detection at all!
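The 90° problem is easy to reproduce away from the phone. In Python/OpenCV terms (an illustration of the failure mode, not the app’s code), a sideways frame will typically yield no detections until you rotate it back upright:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# A frame as the capture API hands it over: landscape, so any faces lie on their side.
landscape = cv2.cvtColor(cv2.imread("landscape_frame.jpg"), cv2.COLOR_BGR2GRAY)
print(len(cascade.detectMultiScale(landscape)))   # typically 0: the cascade only matches upright faces

upright = cv2.rotate(landscape, cv2.ROTATE_90_CLOCKWISE)
print(len(cascade.detectMultiScale(upright)))     # now the faces show up
```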
Anyway, following Roy’s recommendations, I scaled down the input image and adjusted the parameters of the OpenCV face detection call. At the moment I transpose the image before sending it to the detection algorithm, but I suspect it would be faster to use a rotated Haar feature set (the bits that the algorithm picks out in each image to match faces). I also didn’t take up Roy’s change to integer arithmetic rather than floating point — it turns out that the iPhone 4 has enough grunt to cope with the standard OpenCV code.
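In rough Python/OpenCV terms, the per-frame processing amounts to the sketch below; the scale factor and detector parameters are placeholders rather than the exact values in the app:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame_bgr, scale=0.25):
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(grey, (0, 0), fx=scale, fy=scale)  # shrink the frame: far fewer pixels to scan
    upright = cv2.transpose(small)                        # transpose so the faces end up the right way up
    # Note: the rectangles come back in the transposed, downscaled coordinate space,
    # so they need scaling up (and x/y swapping) to map back onto the original frame.
    return cascade.detectMultiScale(upright, scaleFactor=1.2,
                                    minNeighbors=2, minSize=(20, 20))
```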
You can get the iPhone code from my github repository and try it for yourself. Note that it’s hardwired to use the front-facing camera at the moment. If you don’t have an iPhone 4, just change the AVCaptureDevice to point to the ID of the other camera and the rest of the code should still work (though possibly a little more slowly…).
Light and dark — the LEGO program
So now I had a robot base and an iPhone app that could see faces. The next step was to connect the two together using the light sensor.
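The link is the same trick the iPhoneRobot used: the app paints a control square on the screen whose shade of grey encodes which way to steer, and the light sensor reads it off the glass. Conceptually the mapping is nothing cleverer than the little function below (a hypothetical, linear illustration of the idea rather than the app’s exact formula):

```python
def steering_grey(face_centre_x, frame_width):
    """Map the detected face's horizontal position to a grey level for the control
    square: 0.0 (black) at one edge, 1.0 (white) at the other, 0.5 straight ahead.
    A hypothetical illustration of the idea, not the app's exact formula."""
    return max(0.0, min(1.0, face_centre_x / float(frame_width)))
```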
LEGO provide a drag-and-drop programming interface for Mindstorms that lets you build up programs from blocks such as “move motor” and “wait for sensor input”, plus control logic like loops and if/else switches. It’s quite capable and makes simple programs relatively easy, but using variables and arithmetic can be a little cumbersome.
The main issue in getting the robot to drive was calibrating the light sensor, especially when each demo was under different lighting. After a fair amount of tweaking (some just minutes before presenting at Mobile Monday London’s demo night), the best results turned out to be when I crammed a small piece of cardboard into the hinge that held the light sensor onto the iPhone…
You can download the “.rbtx” file from my github repository, but for those who don’t have the LEGO software, the algorithm is essentially:
- Calibrate the sensor by pointing it at the black and white squares either side of the control square on the iPhone screen
- The robot prompts for each reading on its display and waits for you to press the button between readings
- The program reads the raw values from the sensor and calculates its own scaling, as the built-in calibration routines turn on the sensor’s own light — that works for reading black lines on a white sheet of paper, but isn’t so good at reading the backlit screen of the iPhone…
- Wait for another button press to start the robot moving — so you can step back and make sure your face is in frame
- Read the raw value of the light sensor, convert it into a value between -90 and +90, then steer by that amount and repeat
- The program checks that the light sensor value is within a reasonable range before steering, otherwise the robot tends to go round in tight circles and you have to run round it like a lunatic trying to get your face in the camera frame!
- When the distance sensor picks up something closer than 6 inches, stop, play a sound and show a beating heart on the display (“I’ve found you!”)
- Start moving again when the button is pressed
To give you an idea of what this looks like in the Mindstorms NXT software, here’s a picture of the program!
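And if you prefer code to pictures, here’s the same loop as Python-flavoured pseudocode. The helper functions stand in for Mindstorms sensor and motor blocks, and the numbers are illustrative rather than the exact values in the .rbtx file:

```python
def calibrate():
    # Point the sensor at the black square, press the button, then repeat for the white square.
    wait_for_button()
    black = read_light_raw()
    wait_for_button()
    white = read_light_raw()
    return black, white

def run():
    black, white = calibrate()
    wait_for_button()                       # time to step back and get your face in frame
    while True:
        if read_distance_cm() < 15:         # roughly 6 inches: someone is right in front
            stop_motors()
            play_sound()
            show_heart()
            wait_for_button()               # press the button to set it off again
            continue
        raw = read_light_raw()
        # Scale the raw reading into a steering angle between -90 and +90 degrees.
        angle = (raw - black) / float(white - black) * 180 - 90
        if -90 <= angle <= 90:              # ignore readings outside the sensible range
            steer(angle)
        drive_forward()
```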
Prizes
Thanks to Monotype for the beautiful poster of Gill Sans Bold Extra Condensed. They were going to give me two, but were happy to swap one for a copy of FontExplorer Pro instead, so I can see my digital fonts presented almost as prettily. Apparently, it’s now available for Windows as well as Mac OS X.