Google AI Creates Algorithm To Help Robots See Clear Objects
By John K. Waters
Google's artificial intelligence (AI) group has joined forces with researchers at Columbia University and computer vision company Synthesis AI to develop a machine learning (ML) algorithm capable of estimating accurate 3D data of transparent objects from RGB-D images.
Dubbed ClearGrasp, the algorithm employs three deep convolutional networks: one to estimate surface normals, one for occlusion boundaries (depth discontinuities), and one that masks transparent objects. The mask is used to remove all pixels belonging to transparent objects, the researchers explained, so that the correct depths can be filled in.
"We then use a global optimization module that starts extending the depth from known surfaces, using the predicted surface normals to guide the shape of the reconstruction, and the predicted occlusion boundaries to maintain the separation between distinct objects," the researchers explained in a blog post.
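The mask-then-fill idea the researchers describe can be illustrated with a minimal sketch. This is not ClearGrasp's code: the function names are hypothetical, and the fill step is a plain neighbor-averaging diffusion used as a stand-in for the global optimization module. Unlike the real method, it ignores the predicted surface normals and occlusion boundaries and only shows how depth can be extended inward from known surfaces once the unreliable pixels are removed.

```python
import numpy as np


def remove_transparent_depth(depth, mask):
    """Return a copy of `depth` with pixels under `mask` set to NaN.

    `mask` is a boolean array marking pixels predicted to belong to
    transparent objects (hypothetical input; ClearGrasp predicts this
    with a neural network).
    """
    cleaned = depth.astype(float).copy()
    cleaned[mask] = np.nan
    return cleaned


def fill_depth_by_diffusion(depth, iterations=500):
    """Fill NaN pixels by repeatedly averaging their 4-connected neighbors.

    Simplified stand-in for a global optimization: it extends depth from
    known surfaces but uses no surface normals or occlusion boundaries.
    """
    unknown = np.isnan(depth)
    filled = depth.copy()
    filled[unknown] = np.nanmean(depth)  # crude initial guess
    for _ in range(iterations):
        padded = np.pad(filled, 1, mode="edge")
        neighbor_mean = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
        ) / 4.0
        # Only unknown pixels are updated; measured depth stays fixed.
        filled[unknown] = neighbor_mean[unknown]
    return filled


# Toy example: a sloped surface with a corrupted 3x3 "transparent" patch.
true_depth = np.tile(np.arange(8, dtype=float), (8, 1))
mask = np.zeros((8, 8), dtype=bool)
mask[3:6, 3:6] = True
sensor_depth = true_depth.copy()
sensor_depth[mask] = 99.0  # bogus readings from the transparent object

cleaned = remove_transparent_depth(sensor_depth, mask)
restored = fill_depth_by_diffusion(cleaned)
```

On this toy slope the diffusion recovers the missing patch because the surface is planar. Real transparent objects have curved geometry, which is why the researchers guide the reconstruction with predicted normals and boundaries.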
ClearGrasp takes a deep learning approach that requires only a single RGB-D image as input. "RGB" (red, green, blue) refers to a system for representing the colors used on a computer display. Red, green, and blue can be combined in various proportions to produce a wide range of colors. An RGB-D image is a combination of an RGB image and its corresponding depth image, which records the distance from the sensor to the scene at each pixel.
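As a rough illustration of that pairing, an RGB-D image can be represented as a four-channel array. The sketch below is an assumption for illustration only: the helper name `make_rgbd` and the channel layout (depth as the fourth channel) are not taken from ClearGrasp, and real sensors and datasets often store the color and depth images as separate files.

```python
import numpy as np


def make_rgbd(rgb, depth):
    """Stack an H x W x 3 color image and an H x W depth map into H x W x 4.

    Hypothetical helper; the layout is one common convention, not a standard.
    """
    if rgb.shape[:2] != depth.shape:
        raise ValueError("RGB and depth images must have the same height and width")
    return np.dstack([rgb.astype(np.float32), depth.astype(np.float32)])


# Toy 2 x 3 image: uniform gray color, with depth in meters.
rgb = np.full((2, 3, 3), 128, dtype=np.uint8)
depth = np.array([[0.5, 0.6, 0.7],
                  [0.8, 0.9, 1.0]])
rgbd = make_rgbd(rgb, depth)
```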
Optical sensors, such as cameras and lidar (Light Detection And Ranging), have become critical components of modern robotics design. Everything from a mobile robot piece-picking deployment in a warehouse to a fleet of self-driving cars needs to be able to "see" obstacles to avoid and/or objects to grasp—even if they're transparent.
Recognizing a window or a plastic bottle is simple for the human eye, but trickier for robot vision systems, which are traditionally taught to recognize objects that reflect light evenly in all directions. The surfaces of transparent objects both reflect and refract light, which tends to confound existing systems.
"Enabling machines to better sense transparent surfaces would not only improve safety," the researchers said, "but could also open up a range of new interactions in unstructured applications — from robots handling kitchenware or sorting plastics for recycling, to navigating indoor environments or generating AR visualizations on glass tabletops."
To train and test ClearGrasp, the researchers constructed a large-scale synthetic dataset of more than 50,000 RGB-D images, along with a real-world test benchmark of 286 RGB-D images of transparent objects and their "ground truth geometries." The researchers have released both the synthetic dataset and the real-world benchmark to the public.
The experiments demonstrated that ClearGrasp is substantially better than monocular depth estimation baselines and is capable of generalizing to real-world images and novel objects, the researchers explained. They also demonstrated that ClearGrasp can be applied out-of-the-box to improve grasping algorithms' performance on transparent objects.
"We hope that our dataset will drive further research on data-driven perception algorithms for transparent objects," they wrote.
Download links and more example images are available on the project website and in a GitHub repository.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at firstname.lastname@example.org.