Smartphones might soon develop emotional intelligence thanks to team's algorithm for speech-based emotion classification

Dec 04, 2012
Credit: Na Yang, et al. / University of Rochester

If you think having your phone identify the nearest bus stop is cool, wait until it identifies your mood. New research by a team of engineers at the University of Rochester may soon make that possible. At the IEEE Workshop on Spoken Language Technology on Dec. 5, the researchers will describe a new computer program that gauges human feelings through speech, with substantially greater accuracy than existing approaches.

Surprisingly, the program doesn't look at the meaning of the words. "We actually used recordings of actors reading out the date of the month – it really doesn't matter what they say, it's how they're saying it that we're interested in," said Wendi Heinzelman, professor of electrical and computer engineering.

Heinzelman explained that the program analyzes 12 features of speech, such as pitch and volume, to identify one of six emotions from a sound recording. And it achieves 81 percent accuracy – a significant improvement on earlier studies that achieved only about 55 percent accuracy.

The research has already been used to develop a prototype of an app. The app displays either a happy or sad face after it records and analyzes the user's voice. It was built by one of Heinzelman's graduate students, Na Yang, during a summer internship at Microsoft Research. "The research is still in its early days," Heinzelman added, "but it is easy to envision a more complex app that could use this technology for everything from adjusting the colors displayed on your mobile to playing music fitting to how you're feeling after recording your voice."

Heinzelman and her team are collaborating with Rochester psychologists Melissa Sturge-Apple and Patrick Davies, who are currently studying the interactions between teenagers and their parents. "A reliable way of categorizing emotions could be very useful in our research," Sturge-Apple said. "It would mean that a researcher doesn't have to listen to the conversations and manually input the emotion of different people at different stages."

Teaching a computer to understand emotions begins with recognizing how humans do so.

"You might hear someone speak and think 'oh, he sounds angry!' But what is it that makes you think that?" asks Sturge-Apple. She explained that emotion affects the way people speak by altering the volume, pitch and even the harmonics of their speech. "We don't pay attention to these features individually, we have just come to learn what angry sounds like – particularly for people we know," she adds.

But for a computer to categorize emotion it needs to work with measurable quantities. So the researchers established 12 specific features in speech that were measured in each recording at short intervals. The researchers then categorized each of the recordings and used them to teach the computer program what "sad," "happy," "fearful," "disgusted," or "neutral" sound like.
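As a rough illustration of measuring speech features at short intervals, the Python sketch below splits a signal into fixed windows and computes two stand-in features: volume (as RMS energy) and pitch (as the strongest autocorrelation peak). The article does not list all 12 features or say how they were computed, so the function names, the 30-millisecond window, and the 50 to 400 Hz pitch range are assumptions made for illustration only.

```python
import math

def frames(signal, frame_len, hop):
    """Yield successive fixed-length windows ("short intervals") of the signal."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]

def rms_volume(frame):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def pitch_hz(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch as the lag with the largest autocorrelation.

    Hypothetical helper: the search range fmin..fmax is an assumed
    span for speaking voices, not a value taken from the study.
    """
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lo, hi + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

def extract_features(signal, sample_rate, frame_ms=30, hop_ms=30):
    """Return one (volume, pitch) pair per window of the recording."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [(rms_volume(f), pitch_hz(f, sample_rate))
            for f in frames(signal, frame_len, hop)]
```

Calling extract_features on a list of audio samples and a sample rate returns one (volume, pitch) pair per window. A fuller system would compute more features per window and summarize them over the whole recording before classification.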

The system then analyzed new recordings and tried to determine whether the voice in the recording portrayed any of the known emotions. If the program was unable to decide between two or more emotions, it just left that recording unclassified.
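The idea of leaving a recording unclassified when two emotions are too close to call can be sketched in Python as follows. This is not the researchers' algorithm: the article does not name the classifier or the decision rule, so a simple nearest-centroid classifier is swapped in here, and the names train_centroids, classify, and the margin parameter are hypothetical.

```python
import math

def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(vec, centroids, margin=0.2):
    """Return the nearest emotion label, or None when undecided.

    Assumed rule: reject when the runner-up centroid is within a
    relative distance `margin` of the best one. Needs at least two labels.
    """
    ranked = sorted((math.dist(vec, c), lab) for lab, c in centroids.items())
    (d1, best), (d2, _) = ranked[0], ranked[1]
    if d2 - d1 <= margin * d2:
        return None  # too close to call: leave unclassified
    return best
```

Raising margin makes the classifier reject more recordings, trading coverage for confidence in the labels it does assign.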

"We want to be confident that when the computer thinks the recorded speech reflects a particular emotion, it is very likely it is indeed portraying this emotion," Heinzelman explained.

Previous research has shown that emotion classification systems are highly speaker dependent; they work much better if the system is trained by the same voice it will analyze. "This is not ideal for a situation where you want to be able to just run an experiment on a group of people talking and interacting, like the parents and teenagers we work with," Sturge-Apple explained.

Their new results also confirm this finding. If the speech-based emotion classification was used on a voice different from the one that trained the system, the accuracy dropped from 81 percent to about 30 percent. The researchers are now looking at ways of minimizing this effect, for example, by training the system with a voice in the same age group and of the same gender. As Heinzelman said, "there are still challenges to be resolved if we want to use this system in an environment resembling a real-life situation, but we do know that the algorithm we developed is more effective than previous attempts."
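One common way to measure this kind of speaker dependence is to hold out every recording from one speaker, train on the rest, and test on the held-out voice. The Python sketch below shows only that data split. It is an assumed evaluation setup, not a description of the protocol the researchers used, and the function name and record layout are hypothetical.

```python
def leave_one_speaker_out(recordings):
    """recordings: list of (speaker_id, features, label) tuples.

    Yields (held_out_speaker, train, test) so that the test voice
    never appears in the training data.
    """
    speakers = sorted({spk for spk, _, _ in recordings})
    for held_out in speakers:
        train = [r for r in recordings if r[0] != held_out]
        test = [r for r in recordings if r[0] == held_out]
        yield held_out, train, test
```

Comparing accuracy on these splits with accuracy when the same speaker appears in both training and test data gives the kind of gap described above.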


More information: For more information on the project visit www.ece.rochester.edu/projects… /project_bridge.html
