Agents of Change
Raja Bala: Envisioning a Better Future
Harnessing the Power of Computer Vision
Raja Bala is the Principal Scientist of Computer Vision at PARC, a Xerox Company. In his two stints working for Xerox, Raja’s innate sense of curiosity about how things work has led him to become one of our most prolific inventors. It’s also given him unique insight into how computer vision will change the way we live.
What if a selfie were all it took to diagnose disease? What if you could drive to work and back without ever touching the wheel? Or walk into the grocery store, pick up the items you need and leave without waiting in the checkout line, with your total automatically deducted from your checking account?
It’s not science fiction. It’s the world Raja Bala is helping to build. A world where computers can see, interpret and analyse the things around them, and then use that information to make real-time decisions and help solve real-world problems. Join us as we uncover the man behind the technology.
How did you first become interested in computer vision?
Growing up, there was a huge emphasis on math and science in my home. My mom was a science teacher. My dad was an engineer. But while I’ve always had a particular love for mathematics — and some talent for it — I’ve never been a completely “left brain” person. I’m a musician. I love the arts. So when I set out to decide on a career, I wanted to find something that could satisfy all of my interests across the board.
Computer vision wasn’t what I initially settled on. When I first joined Xerox in 1993, it was as a colour imaging scientist. At the time, Xerox was making the transition from black-and-white to colour printing. I led several exciting projects developing colour management solutions for Xerox printers and scanners. It was only in 2011, following an acquisition, that I saw the opportunity and potential of computer vision. I made the transition and the rest is history.
Can you talk about what computer vision is and why it’s relevant in the world today?
Computer vision is the science by which machines analyse, interpret and extract useful information from images and video, then use that information to solve real-world problems. In my team’s work, we take a digital image or video, turn it into a mathematical representation that a computer can process, and then teach the computer to perform a task using this representation, such as detecting whether or not there’s a face in an image.
While computer vision has been around since the ’60s, it was limited in the early years by how few digital images were available and accessible. Image analysis was relegated to specialty uses, such as in the medical field, and never had mass exposure. But with the advent of smartphones, and with companies like Google and Facebook making it easier than ever to access images through large searchable image databases, computer vision’s stature, relevance and penetration into mass consumer markets have exploded. The abundance of image and video data generated by consumers today, combined with the advanced algorithms and computing hardware available to process them, is changing the way we think about the field.
Are there still challenges to overcome in computer vision today, even with these advances?
Right now, there’s a lot of excitement about deep learning and its application to computer vision. Deep learning is a really effective way to extract useful patterns from images. It works by feeding lots of example images into a neural network along with an associated pattern or truth about the images. The network then learns a set of connections and weights that enable it to identify the same type of pattern or truth in new images.
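The training loop described above can be illustrated with a deliberately tiny sketch. It is not a deep network and not the team's method: it trains a single artificial neuron on made-up 2x2 images, and the names `predict` and `train`, the toy data and the learning rate are all assumptions chosen for illustration. The idea is the same, though: show labelled examples, adjust the weights to reduce the error, then apply the learned weights to a new image.

```python
import math


def predict(weights, bias, pixels):
    """Return the estimated probability that the pattern is present."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.5):
    """Fit one neuron by gradient descent on (pixels, label) pairs."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            # Difference between the current guess and the ground truth.
            error = predict(weights, bias, pixels) - label
            # Nudge each weight in the direction that shrinks the error.
            weights = [w - lr * error * p for w, p in zip(weights, pixels)]
            bias -= lr * error
    return weights, bias


# Hypothetical 2x2 images flattened row by row.
# Label 1 means "the left column is bright"; label 0 means it is not.
examples = [
    ([1.0, 0.0, 1.0, 0.0], 1),
    ([0.9, 0.1, 0.8, 0.0], 1),
    ([0.0, 1.0, 0.0, 1.0], 0),
    ([0.1, 0.9, 0.0, 0.8], 0),
    ([0.0, 0.0, 0.0, 0.0], 0),
    ([1.0, 0.2, 0.9, 0.1], 1),
]

weights, bias = train(examples)

# An image the neuron has never seen, with a bright left column.
new_image = [0.8, 0.1, 0.9, 0.2]
print(predict(weights, bias, new_image) > 0.5)
```

A real deep network stacks many layers of such units and needs far more data, which is the limitation discussed next.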
When you have a large dataset to work with, deep learning can be a game-changer. A deep network is capable of understanding extremely complex patterns and relationships in images and is very successful in the tasks that it’s trained to do. One fundamental problem, however, is that deep learning relies on the availability of datasets comprising millions of images and their ground-truth labels in order to be successful. And a lot of applications don’t have access to that many images and labels.
In the medical field, for instance, you may want to use deep learning to help you diagnose a specific disease. That means training the deep network with millions of images of organs that are marked with different levels of severity of the disease. That volume simply doesn’t exist. And even if it did, you’d never be able to afford to have a group of clinical experts sit down and label all those images.
So the question we often have to ask ourselves is, how do we get creative with this? How do we modify deep learning to make intelligent decisions based on limited training?
What we’ve done is revisit some of the first-principles models we worked with before the deep learning era and use them to build prior knowledge and intelligence about the task and environment into a deep network. To teach a deep network to recognise blood vessels in retinal images, for example, we provide hints to the network that it should look for thin curvy structures that branch out like a tree. With these hints, the network not only requires far fewer training images, but actually outperforms today’s best deep learning methods.
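One way to picture such a hint is as an extra input channel computed by a hand-written rule. The sketch below is only an assumption about what a hint could look like, not the method used for the retinal work: `thin_line_response` and `hint_channel` are hypothetical helpers that score a pixel highly when it is brighter than the neighbours on both sides of it in some direction, which is what a thin line looks like locally. It does not model branching.

```python
def thin_line_response(image, row, col):
    """Score how strongly an interior pixel looks like part of a thin bright line."""
    # Each pair holds offsets to two neighbours on opposite sides of the pixel.
    opposite_pairs = [
        ((0, -1), (0, 1)),    # left / right
        ((-1, 0), (1, 0)),    # up / down
        ((-1, -1), (1, 1)),   # one diagonal
        ((-1, 1), (1, -1)),   # the other diagonal
    ]
    centre = image[row][col]
    best = 0.0
    for (dr1, dc1), (dr2, dc2) in opposite_pairs:
        side_a = image[row + dr1][col + dc1]
        side_b = image[row + dr2][col + dc2]
        # A thin line is brighter than BOTH sides in at least one direction.
        best = max(best, centre - max(side_a, side_b))
    return best


def hint_channel(image):
    """Build a same-sized map of thin-line scores (border pixels stay 0)."""
    rows, cols = len(image), len(image[0])
    hint = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hint[r][c] = thin_line_response(image, r, c)
    return hint


# Toy images: a one-pixel-wide vertical line and a broad 3x3 patch.
vessel_like = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]
blob = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

print(hint_channel(vessel_like)[2][2])  # centre of the thin line: 1.0
print(hint_channel(blob)[2][2])         # centre of the broad patch: 0.0
```

A map like this could be fed to a network alongside the raw pixels, so the network starts with a cue about thin structures instead of having to discover it from millions of examples.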
You haven’t always worked for Xerox, correct?
Correct. After 22 years at Xerox, I decided to test myself in a new environment — working for the Samsung smartphone camera imaging group on developing computational imaging techniques for the Galaxy and Note devices.
What did you take away from the experience?
A new appreciation for simplicity. Most Xerox products are used in an office environment, where you can count on at least a little bit of user familiarity with technology. But nearly everyone in the world carries a smartphone in their pocket. Working on a consumer product like that, you need to accommodate every level of tech experience and make something that’s easy to use whether you’re an expert or a beginner. That level of simplicity takes thousands of hours to achieve. An incredible amount of work goes into making sure each click does exactly what it’s supposed to do.
Coming back into a research environment at Xerox, I know what it takes to turn excellent science into an impactful product. It’s one thing to be able to publish a great paper about your research. But if you want your research to result in a product that an end customer can actually use, it needs to be foolproof, simple and as intuitive as possible. You really need to go that extra mile.
Say you’re working on a mobile app for smart document scanning. Computer vision traditionally requires a lot of processing power — something that’s not abundantly available on a mobile device. So if you want your solution to be more than just an academic exercise, you need to be clever about making it not only accurate but fast and energy-efficient. Otherwise people won’t use it.
Which of your projects has had the biggest impact on the world?
My team collaborated with Procter & Gamble to provide the computer vision and machine learning technology that powers the “Olay Skin Advisor.” It’s a mobile platform that captures a selfie of a consumer, analyses her face and then provides skincare product recommendations.
Ideally, you’d always be able to talk to a dermatologist about problems with your skin. But that’s expensive. And because caring for your skin is an ongoing process, most people can’t afford to do that. As for taking care of your skin yourself? Take a walk through any beauty store. There are hundreds of products to choose from. It’s frustrating, confusing and easy to make the wrong choice. Fewer than two thirds of women know which products work best for their skin type.
P&G wanted to solve this problem with a low-cost, personalised beauty care navigator. So we developed an easy-to-use mobile app. We figured, why not take advantage of the high-quality camera consumers are carrying around with them anyway?
To use the app, a consumer begins by taking a selfie. Computer vision then decides whether the picture is good enough for skin analysis, checking for proper lighting, distance, facial expression and the absence of obstructions. If the picture passes all the checks, the app analyses the consumer’s skin, tells her what’s happening with it and suggests products and regimen changes to take care of it.
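A quality gate of this kind can be sketched in a few lines. Everything here is hypothetical: the `Selfie` type, the `failed_checks` function and every threshold are illustrative assumptions, not details of the Olay Skin Advisor. Only lighting and distance are covered, because checks for facial expression and obstructions would need trained models of their own.

```python
from dataclasses import dataclass


@dataclass
class Selfie:
    pixels: list      # grayscale rows, values in 0..255
    face_box: tuple   # (top, left, height, width) from an assumed face detector


def failed_checks(selfie, min_brightness=60, max_brightness=200,
                  min_face_fraction=0.25, max_face_fraction=0.8):
    """Return the names of the quality checks the selfie fails (empty if none)."""
    failures = []

    # Lighting: the average brightness must be neither too dark nor too bright.
    values = [v for row in selfie.pixels for v in row]
    mean_brightness = sum(values) / len(values)
    if not min_brightness <= mean_brightness <= max_brightness:
        failures.append("lighting")

    # Distance: the face should fill a reasonable share of the frame.
    image_area = len(selfie.pixels) * len(selfie.pixels[0])
    _, _, height, width = selfie.face_box
    face_fraction = (height * width) / image_area
    if not min_face_fraction <= face_fraction <= max_face_fraction:
        failures.append("distance")

    return failures


# Toy 10x10 frames: one well lit with a large face, one dark with a tiny face.
good = Selfie(pixels=[[120] * 10 for _ in range(10)], face_box=(2, 2, 6, 6))
bad = Selfie(pixels=[[10] * 10 for _ in range(10)], face_box=(4, 4, 2, 2))

print(failed_checks(good))  # []
print(failed_checks(bad))   # ['lighting', 'distance']
```

Returning the list of failed checks, instead of a single pass or fail flag, lets an app tell the user what to fix before retaking the picture.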
We have more than one million active users, and there have been five million visits to the site since the app launched.
What area of computer vision excites you the most going forward?
Computer vision and the broader domain of AI are a great area to be in today. Only recently has this field matured enough to start making a real, meaningful and pervasive impact in the world, all the way from routine tasks like automatic check deposit with your smartphone camera to big applications like autonomous driving and early disease diagnosis. Applications are plentiful and growing. And there are numerous unsolved scientific and engineering challenges in making these applications accurate and reliable.
But what I’m most excited about personally is continuing our work in building computer vision methods that both draw from prior models of the world and learn from data examples. Models range all the way from our work with retinal blood vessels to general common-sense knowledge about everyday objects, people, and the laws of nature. Thanks to the rich, nurturing environment Xerox provides for innovation, we are on the leading edge of bringing these models of the real world into data-driven machine learning methods to create a form of hybrid learning. I can’t imagine a more exciting place to be.
Agents of Change
We’ve all changed the world. Every one of us. With every breath we take, our presence endlessly ripples outwards.
But few of us have the opportunity to change many lives for the better. And even fewer are challenged to do so every day. That’s the gauntlet thrown daily at Xerox research scientists — to try and effect change.
In return, we give them time and space to dream. And then the resources to turn dreams into reality — whether they’re inventing new materials with incredible functions, or using augmented reality to bolster the memory of Alzheimer’s patients.
We’re proud of our Agents of Change in Xerox research centres across the world. Here are some of their stories.