The laser sensors currently used to detect 3D objects in the paths of autonomous cars are bulky, ugly, expensive, energy-inefficient – and highly accurate.
These Light Detection and Ranging (lidar) sensors are affixed to cars’ roofs, where they increase wind drag, a particular disadvantage for electric cars. They can add around $10,000 to a car’s cost. But despite their drawbacks, most experts have considered lidar sensors the only plausible way for self-driving vehicles to safely perceive pedestrians, cars and other hazards on the road.
Now, Cornell researchers have discovered that a simpler method, using two inexpensive cameras, one on each side of the windshield, can detect objects with nearly lidar’s accuracy and at a fraction of the cost. The researchers found that analyzing the captured images from a bird’s-eye view rather than the more traditional frontal view more than tripled their accuracy, making stereo cameras a viable and low-cost alternative to lidar.
“One of the essential problems in self-driving cars is to identify objects around them – obviously that’s crucial for a car to navigate its environment,” said Kilian Weinberger, associate professor of computer science and senior author of the paper, “Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving,” which will be presented at the 2019 Conference on Computer Vision and Pattern Recognition, June 15-21 in Long Beach, California.
“The common belief is that you couldn’t make self-driving cars without LiDARs,” Weinberger said. “We’ve shown, at least in principle, that it’s possible.”
The first author of the paper is Yan Wang, a doctoral student in computer science.
Lidar sensors use lasers to create 3D point maps of their surroundings, measuring objects’ distance by timing how long each laser pulse takes to bounce back at the speed of light. Stereo cameras, which rely on two perspectives to establish depth, as human eyes do, have long seemed a promising alternative. But their accuracy in object detection has been woefully low, and the conventional wisdom was that they were too imprecise.
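The two ranging principles described above can be sketched in a few lines of Python. This is a minimal illustration under textbook assumptions (an ideal time-of-flight sensor and an ideal rectified stereo pair), not the method used in the paper; the function names, parameter names and example numbers are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def lidar_range_m(round_trip_time_s: float) -> float:
    """Time of flight: the pulse travels out and back, so halve the path."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def stereo_depth_m(focal_length_px: float, baseline_m: float,
                   disparity_px: float) -> float:
    """Ideal rectified stereo: depth = focal length * baseline / disparity.

    disparity_px is how far the same point shifts between the left and
    right images; nearer objects shift more.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values, chosen only for illustration.
print(lidar_range_m(2e-7))               # pulse returned after 200 nanoseconds
print(stereo_depth_m(700.0, 0.5, 10.0))  # 10-pixel shift, cameras 0.5 m apart
```

Because depth is inversely proportional to disparity, a small disparity error matters more for distant objects than for nearby ones, which is one reason stereo depth is generally considered harder to get right at long range.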
Then Wang and collaborators took a closer look at the data from stereo cameras. To their surprise, they found that the cameras’ depth information was nearly as precise as lidar’s. The gap in accuracy emerged, they found, when the stereo cameras’ data was being analyzed.
For most self-driving cars, the data captured by cameras or sensors is analyzed using convolutional neural networks – a kind of machine learning that identifies objects in images by applying filters that recognize patterns associated with them. These convolutional neural networks have been shown to be very good at identifying objects in standard color photographs, but they can distort the 3D information if it’s represented from the front. When Wang and colleagues switched the representation from a frontal perspective to a point cloud observed from a bird’s-eye view, the accuracy more than tripled.
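The change of representation can be illustrated with a short Python sketch: a per-pixel depth map is back-projected into 3D points, which are then binned into a top-down grid. It assumes a simple pinhole camera model and is not the authors’ implementation; the function names, the intrinsics (fx, fy, cx, cy), the grid ranges and the tiny depth map are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x right, y down, z forward), in metres


def depth_map_to_points(depth: Sequence[Sequence[float]],
                        fx: float, fy: float,
                        cx: float, cy: float) -> List[Point]:
    """Back-project each pixel (u, v) with depth z into a 3D point."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as "no estimate"
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def birds_eye_grid(points: Sequence[Point],
                   x_range: Tuple[float, float],
                   z_range: Tuple[float, float],
                   cell_m: float) -> List[List[int]]:
    """Count points per ground-plane cell; rows run forward, columns sideways."""
    x_min, x_max = x_range
    z_min, z_max = z_range
    n_cols = int((x_max - x_min) / cell_m)
    n_rows = int((z_max - z_min) / cell_m)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, _y, z in points:  # height is dropped in the top-down view
        if x_min <= x < x_max and z_min <= z < z_max:
            col = min(int((x - x_min) / cell_m), n_cols - 1)
            row = min(int((z - z_min) / cell_m), n_rows - 1)
            grid[row][col] += 1
    return grid


# A hypothetical 2x2 depth map in metres; 0.0 marks a pixel with no estimate.
depth = [[10.0, 10.0],
         [0.0, 20.0]]
points = depth_map_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
grid = birds_eye_grid(points, x_range=(-10.0, 10.0),
                      z_range=(0.0, 40.0), cell_m=10.0)
```

In the depth map, the two pixels in the right-hand column are neighbors even though they lie 10 meters apart in depth, so a filter sliding over the frontal image would mix them. In the top-down grid they land in different rows, keeping objects at different distances apart.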
“When you have camera images, it’s so, so, so tempting to look at the frontal view, because that’s what the camera sees,” Weinberger said. “But there also lies the problem, because if you see objects from the front then the way they’re processed actually deforms them, and you blur objects into the background and deform their shapes.”
Ultimately, Weinberger said, stereo cameras could potentially be used as the primary way of identifying objects in lower-cost cars, or as a backup method in higher-end cars that are also equipped with lidar.
“The self-driving car industry has been reluctant to move away from Lidar, even with the high costs, given its excellent range accuracy – which is essential for safety around the car,” said Mark Campbell, the John A. Mellowes ’60 Professor and S.C. Thomas Sze Director of the Sibley School of Mechanical and Aerospace Engineering and a co-author of the paper. “The dramatic improvement of range detection and accuracy, with the bird’s-eye representation of camera data, has the potential to revolutionize the industry.”
The results have implications beyond self-driving cars, said co-author Bharath Hariharan, assistant professor of computer science.
“There is a tendency in current practice to feed the data as-is to complex machine learning algorithms under the assumption that these algorithms can always extract the relevant information,” Hariharan said. “Our results suggest that this is not necessarily true, and that we should give some thought to how the data is represented.”
Also contributing were Cornell postdoctoral researcher Wei-Lun Chao and Divyansh Garg ’20.
The research was partly supported by grants from the National Science Foundation, the Office of Naval Research and the Bill and Melinda Gates Foundation.