Seamless grasping and manipulation of known and unknown objects in unseen and changing environments – in other words, the real world – is arguably the Holy Grail of robotics research. To perform grasping and manipulation tasks in the unstructured environments of the real world, a robot must be able to compute grasps for the almost unlimited number of objects it might encounter.
In addition, a robot needs to be able to operate in dynamic environments, whether those are changes in the robot’s workspace, noise and errors in perception, inaccuracies in the robot’s control, or perturbations to the robot itself.
Recent advances in grasp synthesis have been made with the proliferation of vision-based deep learning techniques. However, the primary approach has been to use adapted versions of Convolutional Neural Network (CNN) architectures designed for object recognition.
Ultimately, the real challenge is to develop a faster and more accurate way for robots to grasp objects in cluttered and changing environments, improving their usefulness in both industrial and domestic settings.
The Australian Centre for Robotic Vision's (ACRV) proposed Generative Grasping Convolutional Neural Network (GG-CNN) aimed to overcome limitations of current deep-learning grasping techniques by avoiding discrete sampling of grasp candidates and long computation times.
Their approach uses a one-to-one mapping from a depth image, predicting the quality and pose of a grasp at every pixel. In these trials, they used a Kinova Mico 6-DOF robot (no longer commercialized, see our next-generation robotic arms) fitted with a Kinova KG-2 two-fingered gripper.
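The idea of per-pixel grasp prediction can be illustrated with a minimal NumPy sketch: given quality, angle, and width maps the size of the input depth image, the best grasp is simply read off at the highest-quality pixel. The map names and the argmax selection here are illustrative assumptions, not the authors' exact code.

```python
import numpy as np

def best_grasp(quality, angle, width):
    """Pick the grasp at the pixel with the highest predicted quality.

    quality, angle, width: H x W maps with one prediction per depth-image
    pixel, as a per-pixel (fully convolutional) grasp network would output.
    """
    y, x = np.unravel_index(np.argmax(quality), quality.shape)
    return (int(x), int(y)), float(angle[y, x]), float(width[y, x])

# Toy 4x4 maps standing in for network output.
q = np.zeros((4, 4)); q[2, 1] = 0.9   # best grasp at row 2, col 1
ang = np.full((4, 4), 0.5)            # hypothetical grasp angle (radians)
w = np.full((4, 4), 30.0)             # hypothetical gripper width (pixels)

center, theta, width_px = best_grasp(q, ang, w)
print(center, theta, width_px)        # (1, 2) 0.5 30.0
```

Because every pixel already carries a grasp estimate, there is no separate sampling-and-ranking stage: one forward pass of the network yields all candidates at once.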
"Kinova's robot is robust and easy to use (not forgetting its awesome software), making it the ideal platform to prototype from. The new Gen3 looks like it has super-sized in terms of hardware, packing an even more powerful punch as a research tool."
Doug Morrison, PhD Researcher, Australian Centre for Robotic Vision
Significantly smaller and faster than other convolutional neural networks, the ACRV’s GG-CNN achieved state-of-the-art results in grasping unknown, dynamic objects, including objects in cluttered and changing environments. The final GG-CNN contained 62,420 parameters, compared with the hundreds of thousands or millions of parameters in CNNs used for grasp candidate classification in other works.
Their network’s lightweight, single-pass generative nature allowed closed-loop control at up to 50 Hz, enabling accurate grasping in non-static environments where objects move, and in the presence of robot control inaccuracies.
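The closed-loop idea is that the grasp is re-estimated on every camera frame and the motion command updated, rather than planned once and executed blind. A simplified sketch of one such servoing iteration, with a plain proportional controller and hypothetical helper names (not the ACRV controller):

```python
import numpy as np

def velocity_toward(target, current, gain=0.5):
    """Simple proportional controller: command a velocity toward the target."""
    return gain * (np.asarray(target, dtype=float) - np.asarray(current, dtype=float))

def closed_loop_step(depth_frame, gripper_xy, predict):
    """One servoing iteration (would run at camera rate, e.g. up to 50 Hz):
    re-predict the best grasp from the latest frame, then steer toward it."""
    quality, angle, width = predict(depth_frame)
    y, x = np.unravel_index(np.argmax(quality), quality.shape)
    cmd = velocity_toward((x, y), gripper_xy)
    return cmd, float(angle[y, x])

# Stand-in for the network: a fixed prediction with its peak at (row 1, col 3).
def dummy_predict(frame):
    q = np.zeros((4, 4)); q[1, 3] = 1.0
    return q, np.zeros((4, 4)), np.ones((4, 4))

cmd, theta = closed_loop_step(None, (0.0, 0.0), dummy_predict)
print(cmd, theta)   # [1.5 0.5] 0.0
```

Because each iteration uses a fresh prediction, a moving object simply shifts the quality peak between frames and the commanded velocity follows it, which is what makes grasping in dynamic scenes possible.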
83% grasp success rate on a set of previously unseen objects with adversarial geometry
88% success rate on a set of household objects moved during the grasp attempt
81% accuracy when grasping in dynamic clutter