Robotic grasping in dynamic and unstructured environments

The Australian Centre for Robotic Vision (ACRV), headquartered at Queensland University of Technology (QUT), is a research institution focused on tasks a robot can tackle with vision, such as grasping, perception and visual servoing. It has repeatedly demonstrated that challenging robotics problems have solutions, for example when a team of ACRV researchers won the Amazon Picking Challenge.

Their successful results and their open, modular, easy-to-integrate approach inspired Kinova to adapt the vision-aided object picking application described in this article so that it would work with our KINOVA Gen3 Ultra lightweight robot with integrated depth and color sensor.


Grasping and manipulating objects has proved to be difficult for robots

Seamless grasping and manipulation of known and unknown objects in unseen and changing environments (in other words, the real world) is arguably the Holy Grail of robotics research. Most people don’t think about picking up and moving objects, because human brains have learned to do it over time through repetition and routine. For robots, however, grasping and manipulation remain subtle and elusive. To perform these tasks in the unstructured environments of the real world, a robot must be able to compute grasps for the almost unlimited number of objects it might encounter. It must also be able to operate in dynamic environments and cope with changes in its workspace, noise and errors in perception, inaccuracies in its control, and perturbations to the robot itself.


Develop a faster and more accurate way for robots to grasp objects in unstructured environments

Recent advances in grasp synthesis have been made with the proliferation of vision-based deep learning techniques. However, the primary approach has been to use adapted versions of Convolutional Neural Network (CNN) architectures designed for object recognition.

In most cases, this results in long computation times, because grasp candidates are sampled and ranked individually. As a result, these techniques are rarely used in closed-loop grasping; instead, they rely on precise camera calibration and precise robot control to grasp successfully, even in static environments.

Ultimately, the real challenge is to develop a faster and more accurate way for robots to grasp objects in cluttered and changing environments, improving usefulness in both industrial and domestic settings.

The approach

Real-time, object-independent grasp synthesis method for closed-loop grasping

The research team at the Australian Centre for Robotic Vision focused on a different approach to selecting grasp points for previously unseen objects – namely a real-time, object-independent grasp synthesis method which can be used for closed-loop grasping.

Their proposed Generative Grasping Convolutional Neural Network (GG-CNN) aimed to overcome limitations of current deep-learning grasping techniques by avoiding discrete sampling of grasp candidates and long computation times.

Their approach focused on a one-to-one mapping from a depth image, predicting the quality and pose of a grasp at every pixel. In these trials, they used a Kinova Mico 6-DOF robot (no longer commercialized, see our next-generation robotic arms) fitted with a Kinova KG-2 two-fingered gripper.
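To make the idea of a per-pixel prediction concrete, here is a minimal sketch of how one grasp could be read out of such a mapping. It assumes the network output is available as three same-sized 2D maps (quality, gripper angle and gripper width); the angle and width maps, the `Grasp` class and the `best_grasp` function are illustrative assumptions and are not taken from the ACRV code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Grasp:
    row: int        # pixel row of the grasp centre
    col: int        # pixel column of the grasp centre
    quality: float  # predicted grasp quality at that pixel
    angle: float    # assumed: gripper rotation in radians
    width: float    # assumed: gripper opening in pixels


def best_grasp(
    quality: Sequence[Sequence[float]],
    angle: Sequence[Sequence[float]],
    width: Sequence[Sequence[float]],
) -> Grasp:
    """Return the grasp at the pixel with the highest predicted quality.

    Every pixel already carries a full grasp description, so no separate
    sampling and ranking of candidates is needed: one pass over the
    quality map is enough.
    """
    best = None
    for r, quality_row in enumerate(quality):
        for c, q in enumerate(quality_row):
            if best is None or q > best.quality:
                best = Grasp(r, c, q, angle[r][c], width[r][c])
    if best is None:
        raise ValueError("quality map is empty")
    return best


# Tiny hand-made 2x3 maps standing in for a network output (hypothetical values).
quality_map = [[0.1, 0.2, 0.1], [0.3, 0.9, 0.4]]
angle_map = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0]]
width_map = [[10.0, 10.0, 10.0], [10.0, 25.0, 10.0]]

print(best_grasp(quality_map, angle_map, width_map))
```

In a real system the selected pixel would still have to be converted into a pose in the robot's frame using the depth value and the camera calibration, which this sketch leaves out.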

The results

Between 81% and 88% success rates in various applications

Significantly smaller and faster than other convolutional neural networks, the Australian Centre for Robotic Vision’s GG-CNN achieved state-of-the-art results in grasping unknown, dynamic objects, including objects in cluttered and changing environments. The final GG-CNN contained 62,420 parameters, whereas the CNNs used for grasp candidate classification in other works contain hundreds of thousands or millions of parameters.
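As a rough illustration of how small a network of this size is, the sketch below counts the weights and biases of a short stack of convolutional layers. The layer configuration is an assumption that this article does not state: it follows what we understand the publicly released GG-CNN implementation to use (three convolutions, three transposed convolutions and four single-channel output heads), and it does add up to the 62,420 figure.

```python
# Each entry is (kernel_size, in_channels, out_channels).
# ASSUMPTION: this layer list is not given in the article.
LAYERS = [
    (9, 1, 32),   # convolution on the single-channel depth image
    (5, 32, 16),  # convolution
    (3, 16, 8),   # convolution
    (3, 8, 8),    # transposed convolution
    (5, 8, 16),   # transposed convolution
    (9, 16, 32),  # transposed convolution
    (2, 32, 1),   # output head
    (2, 32, 1),   # output head
    (2, 32, 1),   # output head
    (2, 32, 1),   # output head
]


def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights (kernel * kernel * c_in * c_out) plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


total = sum(conv_params(*layer) for layer in LAYERS)
print(total)  # 62420
```

By comparison, a single fully connected layer with 1,000 inputs and 1,000 outputs would already hold about a million weights.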

The network’s lightweight, single-pass generative design allowed for closed-loop control at up to 50 Hz, enabling accurate grasping in non-static environments where objects move and in the presence of robot control inaccuracies.
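The benefit of closing the loop can be sketched with a simple simulation. The code below is not the ACRV controller: it is a minimal proportional velocity loop, with a hypothetical `observe_target` callback standing in for "run the network on the latest depth image", and the gain, speed limit and tolerance are made-up values. It shows how re-estimating the grasp target on every 50 Hz cycle lets the gripper converge on an object that is moving.

```python
import math

RATE_HZ = 50.0       # control rate quoted in the article
DT = 1.0 / RATE_HZ   # seconds per control cycle


def velocity_command(gripper, target, gain=4.0, max_speed=0.25):
    """Proportional velocity toward the target, clamped to max_speed (m/s).

    gain and max_speed are illustrative values, not taken from the article.
    """
    velocity = [gain * (t - g) for g, t in zip(gripper, target)]
    speed = math.sqrt(sum(v * v for v in velocity))
    if speed > max_speed:
        velocity = [v * max_speed / speed for v in velocity]
    return velocity


def servo(start, observe_target, tolerance=0.005, max_steps=1000):
    """Move toward a target that is re-observed on every control cycle.

    observe_target(t) is a hypothetical stand-in for the perception step;
    it returns the current grasp position estimate at time t (seconds).
    Returns the final gripper position and the number of cycles used.
    """
    position = list(start)
    for step in range(max_steps):
        target = observe_target(step * DT)  # fresh estimate each cycle
        if math.dist(position, target) < tolerance:
            return position, step
        velocity = velocity_command(position, target)
        position = [p + v * DT for p, v in zip(position, velocity)]
    return position, max_steps


def moving_object(t):
    """Simulated object drifting along x at 1 cm/s (made-up motion)."""
    return (0.3 + 0.01 * t, 0.1, 0.05)


final_position, cycles = servo((0.0, 0.0, 0.3), moving_object)
print(f"reached the moving target after {cycles} cycles ({cycles * DT:.2f} s)")
```

An open-loop version would compute the target once at `t = 0` and miss the object by however far it had moved in the meantime; updating the estimate on every cycle is what absorbs both object motion and control error.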