Teaching machines to see: New smartphone-based system could accelerate development of driverless cars

December 20, 2015
An example of SegNet in action: the separate components of the road scene are all labelled in real time. Credit: Alex Kendall

Two newly-developed systems for driverless cars can identify a user's location and orientation in places where GPS does not function, and identify the various components of a road scene in real time on a regular camera or smartphone, performing the same job as sensors costing tens of thousands of pounds.

The separate but complementary systems have been designed by researchers from the University of Cambridge, and demonstrations are freely available online. Although the systems cannot currently control a driverless car, the ability to make a machine 'see' and accurately identify where it is and what it's looking at is a vital part of developing autonomous vehicles and robotics.

The first system, called SegNet, can take an image of a street scene it hasn't seen before and classify it, sorting objects into 12 different categories—such as roads, street signs, pedestrians, buildings and cyclists—in real time. It can deal with light, shadow and night-time environments, and currently labels more than 90% of pixels correctly. Previous systems using expensive laser- or radar-based sensors have not been able to reach this level of accuracy while operating in real time.
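The published design behind SegNet is an encoder-decoder network: the encoder records where each max-pooling step took its maximum, and the decoder reuses those positions to upsample before scoring every pixel against the 12 classes. The toy PyTorch sketch below only illustrates that structure; the layer sizes, two-stage depth and input resolution are invented for brevity and are not the authors' released model.

# Toy SegNet-style encoder-decoder: pooling indices from the encoder are
# reused by the decoder to upsample, then every pixel is scored against
# 12 classes. Layer sizes here are illustrative, not the real network.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),
                                  nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1),
                                  nn.BatchNorm2d(128), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2, 2, return_indices=True)   # remember positions
        self.unpool = nn.MaxUnpool2d(2, 2)                     # reuse them to upsample
        self.dec2 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1),
                                  nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.dec1 = nn.Conv2d(64, num_classes, 3, padding=1)   # per-pixel class scores

    def forward(self, x):
        x = self.enc1(x)
        x, idx1 = self.pool(x)
        x = self.enc2(x)
        x, idx2 = self.pool(x)
        x = self.unpool(x, idx2)
        x = self.dec2(x)
        x = self.unpool(x, idx1)
        return self.dec1(x)                   # shape (N, 12, H, W)

model = TinySegNet().eval()
frame = torch.rand(1, 3, 360, 480)            # a dummy 360x480 RGB road scene
with torch.no_grad():
    label_map = model(frame).argmax(dim=1)    # one of the 12 labels per pixel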

Users can visit the SegNet website and upload an image or search for any city or town in the world, and the system will label all the components of the road scene. The system has been successfully tested on both city roads and motorways.

For the driverless cars currently in development, radar- and LIDAR-based sensors are expensive - in fact, they often cost more than the car itself. In contrast with expensive sensors, which recognise objects through a mixture of radar and LIDAR (a remote sensing technology), SegNet learns by example—it was 'trained' by an industrious group of Cambridge undergraduate students, who manually labelled every pixel in each of 5000 images, with each image taking about 30 minutes to complete. Once the labelling was finished, the researchers then took two days to 'train' the system before it was put into action.
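That training regime is standard supervised learning on per-pixel labels: every hand-labelled pixel supplies one target class, and the network is fitted with a per-pixel cross-entropy loss. A rough sketch, with random tensors standing in for the 5000 labelled images and a single convolution standing in for the full network, might look like this:

# Supervised training on pixel-wise labels (dummy data and a stand-in model;
# the real system trained on 5000 hand-labelled frames for about two days).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

images = torch.rand(8, 3, 360, 480)               # RGB frames
labels = torch.randint(0, 12, (8, 360, 480))      # one of 12 classes per pixel
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

model = nn.Conv2d(3, 12, 3, padding=1)            # stand-in for the full network
optimiser = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()                 # averaged over every pixel

for epoch in range(2):
    for x, y in loader:
        optimiser.zero_grad()
        loss = criterion(model(x), y)             # (N, 12, H, W) scores vs (N, H, W) labels
        loss.backward()
        optimiser.step()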

"It's remarkably good at recognising things in an image, because it's had so much practice," said Alex Kendall, a PhD student in the Department of Engineering. "However, there are a million knobs that we can turn to fine-tune the system so that it keeps getting better."

SegNet was primarily trained in highway and urban environments, so it still has some learning to do for rural, snowy or desert environments—although it has performed well in initial tests in those settings.

The system is not yet at the point where it can be used to control a car or truck, but it could be used as a warning system, similar to the anti-collision technologies currently available on some passenger cars.

"Vision is our most powerful sense and driverless cars will also need to see," said Professor Roberto Cipolla, who led the research. "But teaching a machine to see is far more difficult than it sounds."

As children, we learn to recognise objects through example—if we're shown a toy car several times, we learn to recognise both that specific car and other similar cars as the same type of object. But with a machine, it's not as simple as showing it a single car and then having it be able to recognise all different types of cars. Machines today learn under supervision: sometimes through thousands of labelled examples.

There are three key technological questions that must be answered to design a driverless car: where am I, what's around me and what do I do next. SegNet addresses the second question, while a separate but complementary system answers the first by using images to determine both precise location and orientation.

The localisation system designed by Kendall and Cipolla runs on a similar architecture to SegNet, and is able to localise a user and determine their orientation from a single colour image in a busy urban scene. The system is far more accurate than GPS and works in places where GPS does not, such as indoors, in tunnels, or in cities where a reliable GPS signal is not available.

It has been tested along a kilometre-long stretch of King's Parade in central Cambridge, and it is able to determine both location and orientation to within a few metres and a few degrees, which is far more accurate than GPS—a vital consideration for driverless cars. Users can try out the system for themselves online.

The localisation system uses the geometry of a scene to learn its precise location, and is able to determine, for example, whether it is looking at the east or west side of a building, even if the two sides appear identical.
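The article does not spell out the network's internals, but treating localisation as regression from a single image means predicting a 3D position and an orientation (conveniently expressed as a unit quaternion) from the same kind of convolutional features. The sketch below is an illustrative guess at that setup; the backbone, layer sizes and the loss weighting beta are assumptions, not the authors' published values.

# Single-image localisation as regression: one head predicts position in
# metres, the other a unit quaternion for orientation. Everything here
# (backbone, sizes, beta) is illustrative, not the published system.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(            # stand-in for a deep backbone
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc_xyz = nn.Linear(32, 3)             # x, y, z in metres
        self.fc_quat = nn.Linear(32, 4)            # orientation quaternion

    def forward(self, x):
        f = self.features(x)
        return self.fc_xyz(f), F.normalize(self.fc_quat(f), dim=1)

def pose_loss(pred_xyz, pred_q, true_xyz, true_q, beta=250.0):
    # beta trades off metres against quaternion distance; its value is a guess.
    return F.mse_loss(pred_xyz, true_xyz) + beta * F.mse_loss(pred_q, true_q)

model = PoseRegressor().eval()
image = torch.rand(1, 3, 224, 224)                 # one colour street-level image
with torch.no_grad():
    position, orientation = model(image)           # location and heading estimate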

"Work in the field of artificial intelligence and robotics has really taken off in the past few years," said Kendall. "But what's cool about our group is that we've developed technology that uses deep learning to determine where you are and what's around you - this is the first time this has been done using deep learning."

"In the short term, we're more likely to see this sort of system on a domestic robot - such as a robotic vacuum cleaner, for instance," said Cipolla. "It will take time before drivers can fully trust an autonomous car, but the more effective and accurate we can make these technologies, the closer we are to the widespread adoption of driverless cars and other types of autonomous robotics."

The researchers are presenting details of the two technologies at the International Conference on Computer Vision in Santiago, Chile.


9 comments


Eikka, Dec 21, 2015
"so it still has some learning to do for rural, snowy or desert environments"

"Some learning" is a gross understatement. In snowy or desert conditions, the difference between a lane or road and the ditch is simply where you drive, because there are no proper lane markers or clear delineation between what is the road and what isn't.

What the road is becomes subject to the purposeful action of the driver, because they have to decide things like how to align into an intersection when there are no lanes and there's a snowdrift blocking part of the road. It's not enough to just label things - perception isn't a one-way street.

To operate in such ambiguity, they need something a bit more sophisticated. And that too is a gross understatement.

What they're attempting is a top-down understanding of the world: "This looks like road, it is road", where instead people go "This looks driveable, it is road". The "looks driveable" is the difficult part that AI can't do.
Eikka, Dec 21, 2015
I've used the example of a small chair or a stool before, where the object is either a stool, a coffee table, or if you flip it upside down it becomes a boot drying rack.

An object, and the perception thereof, is not actually defined by what it looks like but by its context or relation to other things, and therefore it's actually impossible to derive a simple algorithm that says "This is a table, that is a chair".

It takes the context of a purposeful actor to make the table and chair. Otherwise they're just assemblies of materials and shouldn't really have any label. To name a thing means giving it an identity, which implies assumptions about the qualities of the object and of its purpose rather than simply its visual appearance, but since the algorithm has no understanding of these, it can't tell the difference when a chair is a chair and when it is a coffee table or a nightstand.

Same problem here: if you put up a cardboard road sign, will the algorithm take it for real?
antialias_physorg, Dec 21, 2015
"An object, and the perception thereof, is not actually defined by what it looks like but by its context or relation to other things, and therefore it's actually impossible to derive a simple algorithm that says 'This is a table, that is a chair'."

The relation aspect is actually being used in image segmentation these days (e.g. in medical images). While it may not be possible to 100% identify if this blob in the x-ray is the spleen or the liver or the kidney based on its shape and gray value alone, it IS possible to 100% identify the vertebrae - and based on where the blob is in relation to the vertebrae you can then identify it as the correct organ. Similar approaches apply to street image segmentation.

Image segmentation has come a long way from just saying "if it has shape x then it is object y".
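The relational idea described above reduces to something like the following toy rule, where a reliably-found landmark disambiguates an otherwise ambiguous blob; the coordinates and the left/right rule are invented purely for illustration.

# Toy version of relation-based labelling: locate a reliable landmark first
# (here "the spine"), then label an ambiguous blob by where it sits relative
# to that landmark. Coordinates and the rule itself are made up.
def label_blob(blob_xy, spine_xy):
    dx = blob_xy[0] - spine_xy[0]                 # left or right of the landmark
    if dx < 0:
        return "liver candidate (patient's right of the spine)"
    return "spleen candidate (patient's left of the spine)"

print(label_blob(blob_xy=(90, 210), spine_xy=(128, 200)))   # -> liver candidate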
antialias_physorg, Dec 21, 2015

"Same problem here: if you put up a cardboard road sign, will the algorithm take it for real?"

That depends on how many sensors you use to identify it.
A system relying only on visual input might be fooled.
E.g. you can construct images (completely unrecognizable to humans) that will be classified with a high degree of confidence by DNNs:
http://www.evolvi.../fooling

But if one adds sensors that also take in the IR spectrum or laser reflection properties, or simply the GPS coordinates and information provided by the local municipality about the location of street signs (which are known and recorded), and then plugs this into a probabilistic analysis algorithm, the car will not be fooled.

All current approaches for autonomous driving rely on multiple modalities for sensing their environment.
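The probabilistic combination being described here can be sketched in a few lines: each modality reports its own probability that the detection is a real speed-limit sign, and the scores are fused under a (naive) independence assumption. All the numbers are made up.

# Naive fusion of independent per-sensor probabilities for one hypothesis
# ("this really is a speed-limit sign"). All numbers are invented.
def fuse(probabilities):
    p_yes, p_no = 1.0, 1.0
    for p in probabilities:
        p_yes *= p            # evidence the sign is real
        p_no *= (1.0 - p)     # evidence it is not
    return p_yes / (p_yes + p_no)

camera_only = fuse([0.95])                  # vision alone: easily fooled by cardboard
with_context = fuse([0.95, 0.30, 0.05])     # lidar sees a flat card, map lists no sign here
print(round(camera_only, 2), round(with_context, 2))   # 0.95 vs 0.3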
Eikka, Dec 21, 2015
"Similar approaches apply to street image segmentation."

Except street image segmentation is a lot more complex than identifying organs that are almost invariably there in roughly the same places in a living human being, and they have no alternate purposes such as a kid's toy balloon behind a bush being mistaken for a road sign.

The point I was making is that an algorithm that simply labels things is missing half the story, because all the radars, lidars and cameras are converging onto a simple label that says "Road sign, 60 mph limit", and the AI then tries to act accordingly without knowing whether that information is accurate or even true.

Eikka, Dec 21, 2015
"All current approaches for autonomous driving rely on multiple modalities for sensing their environment."

The current approaches all rely mostly on GPS, radar/lidar and dead reckoning from the wheels. They do surprisingly little sensing of their environment, and at a rather coarse resolution, at least in the processing stages, because they're mainly concerned with knowing where they are on an inch-by-inch basis rather than what's around them.

Actually identifying what is around them is less critical because they get information about lanes and routes and speed limits, traffic lights etc. from elsewhere, from the internet and from their human trainers. They don't actually glean that information from their surroundings on their own, rather they're told what everything is.

In this simple approach, the cars don't "need" to observe the environment beyond watching that they're not going to collide with any of the indistinct blobs they see in the 3D radar data.
Eikka, Dec 21, 2015
"(which are known and recorded)"

That's the thing you can't rely on, especially in the context of driving in the winter.
antialias_physorg, Dec 21, 2015

"Except street image segmentation is a lot more complex than identifying organs that are almost invariably there in roughly the same places..."

Medical image segmentation isn't as easy as all that, either, since what you are almost always looking at is a pathological x-ray. Little point in making x-rays of healthy people. There is quite a body of knowledge on how to robustly segment stuff even if it shows quite severe aberrations from the expected norm.

"The point I was making is that an algorithm that simply labels things is missing half the story because all the radars, lidars and cameras..."

The more sensors (modalities) you have, the more robust your decision-making algorithm becomes. This article shows that with very little effort they can add a modality that is already very robust on its own.
As the article states:
"Vision is our most powerful sense and driverless cars will also need to see,"

Note the "also".
antialias_physorg, Dec 21, 2015
To elucidate this further: this algorithm labels stuff based on vision. Other algorithms run in parallel (based on the lidar input, on knowledge of sign-post placement, possibly even on results reported by other vehicles in the vicinity, etc.).

Each algorithm gives the stuff it 'sees' labels. If most algorithms agree on what they see then you can make a classification with high confidence. It is not necessary for all algorithms to always be 100% correct to have a safe system.
(Humans don't act any differently. We try to find clues from our surroundings if vision isn't enough to classify something 100%)

To your 'winter' example: if a sign is snowed over then a human driver has even less chance of knowing what it's supposed to be. He might recall it from memory (so might a vehicle, BTW - so no advantage/disadvantage there). But a vehicle might actively query some municipal database to get a response - something a human could never do fast enough.
