Team creates click-and-drag interface enabling rapid video object segmentation
Investigators at Disney Research Zurich have developed a method for achieving very accurate object segmentation of video by enabling human editors to work efficiently with state-of-the-art algorithms using a click-and-drag interface.
Segmentation, which identifies objects, backgrounds and other meaningful regions within an image or video, is a necessary step for many editing tasks and for image analysis. People can readily perceive objects and the composition of scenes despite variations in colors, lighting and contours, but automatic image segmentation remains inexact, even after significant advances in recent years.
By keeping a human in the loop, a semi-automatic segmentation system can achieve much higher accuracy than is possible with fully automated systems, said Miquel A. Farré, associate research engineer.
The interface that Farré, Jordi Pont-Tuset and Aljoscha Smolic developed allows human editors to work with computer software to rapidly identify the regions to be segmented in a video image and then propagate those segmentations to subsequent video frames. They will present their method at the International Workshop on Content-Based Multimedia Indexing, June 10-12 in Prague, Czech Republic.
To minimize the amount of human interaction necessary to achieve an accurate result, the researchers represented images using an approach called a segmentation hierarchy. It transforms an image from a matrix of pixels into a tree-like graph whose nodes represent regions of the image, from small, detailed regions at the graph's "leaves" to the whole image at the root.
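A minimal sketch may help picture the tree-like graph described above. It is an illustration only, not the researchers' implementation: the `Region` class, the region names and the toy 2x3 image are all hypothetical, and the tree is built by hand here, whereas the software described in the article builds it automatically.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]  # (row, column)


@dataclass
class Region:
    """One node of a hypothetical segmentation hierarchy."""
    name: str
    children: List["Region"] = field(default_factory=list)
    own_pixels: FrozenSet[Pixel] = frozenset()  # filled in only for leaves

    def pixels(self) -> FrozenSet[Pixel]:
        """Pixels covered by this node: its own plus those of all descendants."""
        covered = set(self.own_pixels)
        for child in self.children:
            covered |= child.pixels()
        return frozenset(covered)

    def leaves(self) -> List["Region"]:
        """The small, detailed regions at the bottom of this subtree."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Toy 2x3 image split by hand into four leaf regions (assumed layout).
sky = Region("sky", own_pixels=frozenset({(0, 0), (0, 1), (0, 2)}))
head = Region("head", own_pixels=frozenset({(1, 0)}))
body = Region("body", own_pixels=frozenset({(1, 1)}))
grass = Region("grass", own_pixels=frozenset({(1, 2)}))

# Intermediate nodes merge leaves; the root covers the whole image.
person = Region("person", children=[head, body])
background = Region("background", children=[sky, grass])
root = Region("image", children=[person, background])
```

In this sketch, moving up the tree trades detail for coverage: `head` covers a single pixel, `person` covers two, and `root` covers all six.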
The software automatically builds these tree-like graphs, and the human editor, using the click-and-drag interface, selects and annotates the combination of nodes that represents the object or other image element of interest. The software tracks these nodes, as well as the nodes that the editor avoided, in subsequent frames of the video, with the editor making corrections and adjustments as necessary. This method optimized the quality of the segmentation, they said, while minimizing the amount of human interaction.
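The selection step can be sketched for a single frame as follows. This is a hedged illustration under stated assumptions, not the published method: the hierarchy, the names `CHILDREN`, `LEAF_PIXELS`, `node_pixels` and `object_mask`, and the rule "union of selected nodes minus rejected nodes" are all hypothetical, and tracking across frames is not shown.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)

# Hypothetical toy hierarchy: each parent lists its children,
# and each leaf lists the pixels it covers.
CHILDREN: Dict[str, List[str]] = {
    "image": ["person", "background"],
    "person": ["head", "body"],
    "background": ["sky", "grass"],
}
LEAF_PIXELS: Dict[str, Set[Pixel]] = {
    "head": {(1, 0)},
    "body": {(1, 1)},
    "sky": {(0, 0), (0, 1), (0, 2)},
    "grass": {(1, 2)},
}


def node_pixels(node: str) -> Set[Pixel]:
    """Return every pixel covered by a node, gathered from its leaves."""
    if node in LEAF_PIXELS:
        return set(LEAF_PIXELS[node])
    covered: Set[Pixel] = set()
    for child in CHILDREN[node]:
        covered |= node_pixels(child)
    return covered


def object_mask(selected: Iterable[str], rejected: Iterable[str]) -> Set[Pixel]:
    """Assumed rule: union of selected nodes, minus any rejected nodes."""
    mask: Set[Pixel] = set()
    for node in selected:
        mask |= node_pixels(node)
    for node in rejected:
        mask -= node_pixels(node)
    return mask


# The editor picks a coarse node plus one leaf, then excludes a sub-region.
mask = object_mask(selected=["person", "grass"], rejected=["head"])
```

Selecting a coarse node such as `person` covers several leaves with one action, which is one way a hierarchy could reduce the number of interactions an editor needs.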
The researchers evaluated their method on 88 sequences from a standard video dataset, comparing its results with segmentations of the same sequences produced by human editors. They found that the quality achieved by the semi-automatic method was similar in most cases to that of manual segmentation while requiring much less time.