Case Study: Brown University

Students in Professor Stefanie Tellex’s graduate course “Topics in Collaborative Robotics” at Brown University had a vision: teaching robots to infer the strokes behind human handwriting and immediately reproduce what they see using human-like strokes.

The key objective was to enable a manipulator robot to reproduce a target image of characters or drawings from its pictorial information alone, while preserving details such as line continuity and stroke order. After training the model, the hope was that the robot would gain a general knowledge of drawing, allowing it to write anything at first sight.


Background

The thinking behind this project was that in order to collaborate with humans, a manipulator robot should be able to draw or write on a whiteboard or even a post-it note. Long term, the ability to write would enable a robot to put up a sign informing people that a hallway is closed, to produce art using physical media such as a paintbrush or a pen, or to address and mail a letter.

Additionally, the robot could potentially engage in teaching activities using a white board, writing a math equation or drawing a diagram. These skills rely on the ability to produce a policy to draw with a writing utensil.


The Challenge

Until now, work in this area required the robot to have information about the stroke order in advance. To meet this challenge, students had to find a way to teach the robot to reproduce an image of just-drawn handwritten characters by inferring a plan to replicate it with a writing utensil.

The research team of computer scientists had to develop an algorithm using deep learning networks so the robot could analyze images of handwritten words or sketches, deduce the likely series of pen strokes that created them, and reproduce them using stroke patterns similar to human handwriting.


The Approach

The approach combined two distinct models, sketched in code after the list below:

  • Drawing action: A “local” model observed a 5×5-pixel region around the current pen-tip location and decided which direction to move in and when to end the stroke.

  • Shifting action: A “global” model moved the robot’s writing utensil to the start of the next stroke of the character.
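As a rough illustration only, the snippet below sketches what such a two-model setup might look like in PyTorch. The class names, layer sizes, and action encoding are assumptions for the sake of the example, not the team’s published architecture.

```python
# Hypothetical sketch of the local/global two-model setup described above.
# Layer sizes, action encoding, and class names are illustrative assumptions.
import torch
import torch.nn as nn

class LocalModel(nn.Module):
    """Predicts the next pen move from a 5x5 patch around the pen tip.

    Output: logits over 8 move directions plus an 'end stroke' action
    (9 classes total, an assumed encoding).
    """
    def __init__(self, in_channels: int = 4, num_actions: int = 9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                          # 5x5 patch -> flat vector
            nn.Linear(in_channels * 5 * 5, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        return self.net(patch)                     # logits over move/end actions

class GlobalModel(nn.Module):
    """Predicts where to place the pen to start the next stroke,
    given a view of the whole canvas."""
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                   # one score per pixel
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        scores = self.conv(image)                  # (B, 1, H, W)
        return scores.flatten(1)                   # logits over start locations

# Tiny smoke test with random inputs.
local, global_ = LocalModel(), GlobalModel()
patch = torch.randn(1, 4, 5, 5)
canvas = torch.randn(1, 4, 32, 32)
print(local(patch).shape, global_(canvas).shape)   # (1, 9) and (1, 1024)
```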

"I want a robot to be able to do everything a person can do. I’m particularly interested in a robot that can use language. Kinova’s robots have been reliable and easy to use for our research team and reproduced human drawn characters and images very accurately."

Stefanie Tellex, Assistant Professor of Computer Science, Brown University

The Solution

The team at Brown University built a system that works in real time, enabling the robot to view an image, infer a plan to replicate it, and immediately start drawing it. The robot draws each target stroke in one continuous motion and does not rely on handcrafted rules or on predefined character paths. Instead, it learns to write from a dataset of demonstrations. Before making each movement, the robot must collect the following information (assembled as sketched after the list):

  • Already visited regions
  • Current location
  • Difference image
  • Continuously connected target region
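One plausible way to feed these four pieces of information to a network is to encode each as an image channel and stack them. The function name and channel layout below are hypothetical, added only to make the list concrete; they are not taken from the team’s code.

```python
# Hypothetical illustration of assembling the robot's observation before each
# move, assuming the four items above are encoded as binary image channels.
import numpy as np

def build_observation(target: np.ndarray,
                      drawn: np.ndarray,
                      visited: np.ndarray,
                      pen_xy: tuple[int, int],
                      connected_target: np.ndarray) -> np.ndarray:
    """Stack the state into a (4, H, W) array.

    target / drawn / visited / connected_target: binary (H, W) images.
    pen_xy: current pen-tip location as (row, col).
    """
    h, w = target.shape
    location = np.zeros((h, w), dtype=np.float32)
    location[pen_xy] = 1.0                       # one-hot current location
    difference = np.clip(target - drawn, 0, 1)   # pixels still left to draw
    return np.stack([visited.astype(np.float32),
                     location,
                     difference.astype(np.float32),
                     connected_target.astype(np.float32)])

# Example on a blank 32x32 canvas with nothing drawn yet.
blank = np.zeros((32, 32), dtype=np.float32)
target = np.zeros((32, 32), dtype=np.float32)
target[10:20, 16] = 1.0                           # a short vertical stroke
obs = build_observation(target, blank, blank.copy(), (10, 16), target)
print(obs.shape)                                  # (4, 32, 32)
```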

To measure performance, they introduced two metrics: pixel accuracy and stroke accuracy. Pixel accuracy measures how similar the target image and the drawn image are, while stroke accuracy checks whether the robot drew each stroke in one continuous action. The network enabled the robot not only to reproduce characters in different languages, but also to replicate any stroke-based drawing immediately after seeing it for the first time. The solution was first tested in simulation, then with different robots.
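The snippet below shows one way these two metrics could be computed, under assumed definitions: pixel accuracy as the fraction of matching pixels between the target and drawn images, and stroke accuracy as the fraction of target strokes drawn without lifting the pen. The exact formulas used by the team may differ.

```python
# Minimal sketch of the two metrics under assumed definitions.
import numpy as np

def pixel_accuracy(target: np.ndarray, drawn: np.ndarray) -> float:
    """Fraction of pixels that agree between two binary images."""
    return float(np.mean(target.astype(bool) == drawn.astype(bool)))

def stroke_accuracy(strokes_continuous: list[bool]) -> float:
    """Fraction of target strokes the robot drew in one continuous action."""
    return sum(strokes_continuous) / len(strokes_continuous)

target = np.zeros((32, 32)); target[10:20, 16] = 1
drawn = target.copy(); drawn[15, 16] = 0            # one missing pixel
print(round(pixel_accuracy(target, drawn), 4))       # 0.999
print(stroke_accuracy([True, True, False, True]))    # 0.75
```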

99% accuracy

The robot achieved a remarkable 99% pixel-wise accuracy when reproducing handwritten Japanese characters.

96% continuity

Line continuity was preserved for 96% of strokes when reproducing Japanese characters in the test.

10 languages

The robot was able to infer and reproduce the word “Hello” in 10 languages, not to mention a sketch of the Mona Lisa.