Multimodal interaction: Humanizing the human-computer interface

Dec 14, 2011 By Adarsh Sandhu
Kouichi Katsurada. Credit: TUT

In everyday life humans use speech, gestures, facial expressions, and touch to communicate, and over long distances we resort to text messages and other modern technologies. Yet when we interact with computers we rely exclusively on text and touch, in the form of the keyboard, mouse, and touch screen.

Kouichi Katsurada is an associate professor at Toyohashi Tech’s Graduate School of Engineering with a mission to ‘humanize’ the computer interface. His research centers on expanding human-computer communication through a web-based multimodal interaction (MMI) approach that employs speech and gesture alongside the traditional keyboard and mouse.

“Although many MMI systems have been tried, few are widely used,” says Katsurada. “Among the reasons are the complexity of installing and compiling them, and their general inaccessibility to ordinary computer users. To resolve these issues we have designed a web browser-based MMI system that uses only open source software and de facto standards.”

Multimodal interaction. Credit: TUT

This openness means the system can run in any web browser that handles JavaScript, Java applets and Flash, so it can be used not only on a PC but also on mobile devices such as smartphones and tablet computers.

The user can interact with the system by speaking directly with an anthropomorphic agent that employs speech recognition, speech synthesis and facial image synthesis.

For example, a user can recite a telephone number, which the browser records and sends to a session manager on the server housing the MMI system. There the data is processed by the speech recognition software and passed to a scenario interpreter, which uses XISL (eXtensible Interaction Scenario Language) to manage the human-computer dialogue.
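
As a rough illustration of this flow, the sketch below shows how a browser client might forward a recorded utterance to the server-side session manager. The endpoint URL and the reply format are hypothetical, chosen only to make the round trip concrete; the article does not document the project's actual protocol.

    // A minimal sketch, assuming a hypothetical "/mmi/session" endpoint and
    // JSON reply format (neither appears in the article): the browser posts a
    // recorded utterance to the session manager, which runs speech recognition
    // and the XISL scenario interpreter, then returns the next dialogue step.
    function sendUtterance(audioBlob) {
      var request = new XMLHttpRequest();
      request.open("POST", "/mmi/session"); // hypothetical session-manager URL
      request.setRequestHeader("Content-Type", "audio/wav");
      request.onload = function () {
        // e.g. { "speak": "Thank you", "face": "smile" } -- illustrative only;
        // a real client would drive the agent's speech and face synthesis here.
        var step = JSON.parse(request.responseText);
        console.log("Agent says:", step.speak);
        console.log("Facial expression:", step.face);
      };
      request.send(audioBlob);
    }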

“XISL is a multimodal interaction description language based on the XML markup language,” says Katsurada. “Its advantage over other MMI description languages is that it has sufficient modal extensibility to deal with various modes of communication without changing its specifications. Another advantage is that it inherits features from VoiceXML, as well as from SMIL, which is used for authoring interactive audio-video presentations.”
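
To give a feel for the format, the fragment below sketches what a XISL-style description of the telephone-number dialogue might look like. The element and attribute names are approximations for illustration, not the exact XISL schema.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Illustrative XISL-style scenario (element names are approximations,
         not the exact XISL schema): prompt for a telephone number, accept
         speech or keyboard input, and confirm by synthesized speech. -->
    <xisl>
      <dialog id="phone_number">
        <exchange>
          <output type="tts" text="Please say your telephone number."/>
          <input type="speech" slot="phone"/>
          <input type="keyboard" slot="phone"/>
          <output type="tts" text="Your number has been recorded."/>
        </exchange>
      </dialog>
    </xisl>

As the next paragraph notes, real XISL input and output tags take many more parameters than shown here, and that authoring burden is exactly what Katsurada's group aims to ease.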

On the downside, XISL requires authors to specify a large number of parameters for individual input and output tags, making it a cumbersome language to use. “In order to solve this problem, we will provide a GUI prototyping tool that will make it easier to write XISL documents,” says Katsurada.

“Currently, we can use some voice commands and the keyboard with the system, and in the future we will add touch and camera-based input for devices equipped with touch displays and cameras,” says Katsurada. “In other words, our aim is to make interaction with the computer as natural as possible.”


Provided by Toyohashi University of Technology

