Case Study: Brown University
The thinking behind this project was that in order to collaborate with humans, a manipulator robot should be able to draw or write on a whiteboard or even a post-it note. Long term, the ability to write would enable a robot to put up a sign telling people that a hallway is closed, to produce art using physical media such as a paintbrush or a pen, or to address and mail a letter.
Additionally, the robot could potentially engage in teaching activities using a white board, writing a math equation or drawing a diagram. These skills rely on the ability to produce a policy to draw with a writing utensil.
Until now, work in this area required the robot to have information about the stroke order in advance. To meet this challenge, the students had to find a way to teach the robot to reproduce an image of just-drawn handwritten characters by inferring a plan to replicate the image with a writing utensil.
The research team of computer scientists had to develop an algorithm based on deep neural networks so the robot could analyze images of handwritten words or sketches, deduce the likely series of pen strokes that created them, and reproduce them using stroke patterns similar to human handwriting.
The approach combined two distinct models:
- Drawing action: A “local” model observed a 5×5-pixel region around the current pen-tip location and determined in which direction to move and when to end the stroke.
- Shifting action: A “global” model moved the robot’s writing utensil to the start of the next stroke of the character.
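The interplay of the two models can be pictured as a simple control loop: the local model inches the pen along the current stroke, and whenever it signals a stop, the global model jumps the pen to the next stroke. The sketch below is a minimal illustration of that loop; the model names, interfaces, and pixel conventions are assumptions for demonstration, not the team's actual code.

```python
import numpy as np

def draw_image(target, local_model, global_model, max_steps=1000):
    """Illustrative two-model drawing loop (hypothetical interfaces).

    `local_model(patch_t, patch_c)` returns a one-pixel move (dr, dc)
    or the string "stop"; `global_model(target, canvas)` returns the
    (row, col) where the next stroke starts, or None when done.
    """
    canvas = np.zeros_like(target)
    pos = global_model(target, canvas)            # start of first stroke
    for _ in range(max_steps):
        if pos is None:                           # no strokes left
            break
        r, c = pos
        canvas[r, c] = 1                          # ink the pen-tip pixel
        # 5x5 windows around the pen tip, from target and canvas
        patch_t = target[r - 2:r + 3, c - 2:c + 3]
        patch_c = canvas[r - 2:r + 3, c - 2:c + 3]
        action = local_model(patch_t, patch_c)
        if action == "stop":                      # end current stroke,
            pos = global_model(target, canvas)    # shift to the next one
        else:
            dr, dc = action                       # move one pixel along stroke
            pos = (r + dr, c + dc)
    return canvas
```

With simple stand-ins for the two models (e.g. a global model that returns the first undrawn target pixel), the loop reproduces a binary target image stroke by stroke.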
"I want a robot to be able to do everything a person can do. I’m particularly interested in a robot that can use language. Kinova’s robots have been reliable and easy to use for our research team and reproduced human drawn characters and images very accurately."
Stefanie Tellex, Assistant professor of computer science, Brown University
The Brown University team built a system that works in real time, enabling the robot to view an image, infer a plan to replicate it, and immediately start drawing it. The robot draws each target stroke in one continuous drawing motion and does not rely on handcrafted rules or on predefined paths of characters. Instead, it learns to write from a dataset of demonstrations. When the robot begins to draw, it must collect the following information before making the next movement:
- Already visited regions
- Current location
- Difference image
- Continuously connected target region
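These four pieces of information can be thought of as image channels stacked into a single state array that the models consume at each step. The sketch below shows one plausible way to build such a state; the function name, channel layout, and the flood-fill used for the connected target region are assumptions for illustration.

```python
import numpy as np
from collections import deque

def build_state(target, canvas, pos):
    """Stack the four state inputs as channels (hypothetical layout).

    target, canvas: binary H x W arrays; pos: current pen-tip (row, col).
    """
    h, w = target.shape
    visited = canvas.copy()                      # already visited regions
    location = np.zeros_like(target)
    location[pos] = 1                            # current pen location
    difference = np.clip(target - canvas, 0, 1)  # pixels still to draw
    # connected target region: target pixels 8-connected to the pen tip
    connected = np.zeros_like(target)
    queue, seen = deque([pos]), {pos}
    while queue:
        r, c = queue.popleft()
        if target[r, c]:
            connected[r, c] = 1
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return np.stack([visited, location, difference, connected])
```

The difference image tells the models what remains to be drawn, while the connected region isolates the stroke the pen is currently on.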
To measure performance, the team introduced two metrics: pixel accuracy and stroke accuracy. Pixel accuracy measures how closely the drawn image matches the target image, while stroke accuracy checks whether the model drew each stroke in one continuous action. The network enabled the robot not only to reproduce characters in different languages, but also to replicate any stroke-based drawing immediately after seeing it for the first time. The solution was first tested in simulation, then with different robots.
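One plausible reading of the two metrics is sketched below: pixel accuracy as the fraction of matching pixels between target and drawn image, and stroke accuracy as a check that exactly one stop signal was emitted while drawing a stroke. Both definitions are assumptions for illustration; the team's exact formulations may differ.

```python
import numpy as np

def pixel_accuracy(target, drawn):
    # fraction of pixels where the drawn image matches the target
    return (target == drawn).mean()

def stroke_accuracy(actions):
    # one stroke drawn in one continuous action means exactly one "stop"
    return actions.count("stop") == 1

# example: a near-perfect reproduction of a six-pixel stroke
target = np.zeros((10, 10), dtype=int)
target[4, 2:8] = 1
drawn = target.copy()
drawn[4, 7] = 0               # one missed pixel out of 100
print(pixel_accuracy(target, drawn))   # 0.99
```

A perfect reproduction would score 1.0 on pixel accuracy, while a stroke interrupted mid-way (two stop signals) would fail the stroke-accuracy check.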