Any child (or spouse) who has been scolded for their tone of voice – such as shouting or being sarcastic – knows that the way you speak to someone can be just as important as the words that you use. Voice artists and actors make great use of this – they are skilled at imparting meaning in the way that they speak, sometimes much more than the words alone would merit.
But just how much information is carried in our tone of voice and conversation patterns and how does that impact our relationships with others? Computational systems can already establish who people are from their voices, so could they also tell us anything about our love life? Amazingly, it seems like it.
New research, just published in the journal PLOS ONE, has analysed the vocal characteristics of 134 couples undergoing therapy. Researchers from the University of Southern California used computers to extract standard speech analysis features from recordings of therapy session participants over two years. The features – including pitch, variation in pitch and intonation – all relate to voice aspects like tone and intensity.
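To make "pitch" and "variation in pitch" concrete, here is a minimal sketch of how such features might be summarised once a pitch track has been extracted from a recording. It is an illustration only, not the researchers' pipeline. The function name `pitch_summary`, the convention that 0.0 marks an unvoiced frame, and the sample values are all assumptions made for this example.

```python
import statistics

def pitch_summary(pitch_hz):
    """Summarise a pitch track given as one value in Hz per audio frame.

    Assumption for this sketch: 0.0 marks frames with no voiced speech.
    """
    voiced = [f for f in pitch_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean_hz": statistics.mean(voiced),      # average pitch
        "stdev_hz": statistics.stdev(voiced),    # variation in pitch
        "range_hz": max(voiced) - min(voiced),   # crude measure of intonation span
    }

# Hypothetical pitch track: two unvoiced frames and four voiced ones.
track = [0.0, 110.0, 120.0, 130.0, 0.0, 140.0]
summary = pitch_summary(track)
print(summary)
```

A real system would compute many more features than these three, over short windows rather than a whole recording.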
A machine-learning algorithm was then trained to learn a relationship between those vocal features and the eventual outcome of therapy. This wasn't as simple as detecting shouting or raised voices – it included the interplay of conversation, who spoke when and for how long, as well as the sound of the voices. It turned out that ignoring what was being said and considering only these patterns of speaking was sufficient to predict whether or not couples would stay together. The approach was purely data-driven, so it didn't relate outcomes to specific voice attributes.
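The "who spoke when and for how long" part can be pictured with a small sketch. Assuming a recording has already been split into speaker-labelled segments, the code below derives a few turn-taking measures of the kind a learning algorithm could take as input. The segment format, the function name `turn_taking_features` and the example timings are hypothetical, and the study's actual features may well differ.

```python
from collections import defaultdict

def turn_taking_features(segments):
    """Compute simple conversation features.

    segments: list of (speaker, start_seconds, end_seconds), sorted by start.
    """
    talk_time = defaultdict(float)
    turns = defaultdict(int)
    gaps = []  # silence between speakers; negative means overlapping speech
    prev_speaker, prev_end = None, None
    for speaker, start, end in segments:
        talk_time[speaker] += end - start
        if speaker != prev_speaker:
            turns[speaker] += 1
            if prev_speaker is not None:
                gaps.append(start - prev_end)
        prev_speaker, prev_end = speaker, end
    total = sum(talk_time.values())
    return {
        "talk_share": {s: t / total for s, t in talk_time.items()},
        "turns": dict(turns),
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
        "interruptions": sum(1 for g in gaps if g < 0),
    }

# Hypothetical session excerpt: A interjects before B has finished.
segments = [("A", 0.0, 4.0), ("B", 5.0, 7.0), ("A", 6.5, 8.5), ("A", 9.0, 11.0)]
features = turn_taking_features(segments)
print(features)
```

In this made-up excerpt, speaker A holds the floor for most of the time and cuts in once, which is the sort of pattern that says nothing about the words used yet may still carry information.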
Interestingly, the full video recordings of the therapy sessions were then given to experts to classify. Unlike the AI, they made their predictions using psychological assessment based on the vocal (and other) attributes – including the words spoken and body language. Surprisingly, their prediction of the eventual outcome (they were correct in 75.6% of the cases) was inferior to predictions made by the AI based only on vocal characteristics (79.3%). Clearly there are elements encoded in the way we speak that not even experts are aware of. But the best results came from combining the automated assessment with the experts' assessment (79.6% correct).
The significance of this lies not so much in involving AI in marriage counselling or in getting couples to speak more nicely to each other (however meritorious that would be) as in revealing how much information about our underlying feelings is encoded in the way we speak – some of it completely unknown to us.
Words written on a page or a screen have lexical meanings derived from their dictionary definitions. These are modified by the context of surrounding words. There can be great complexity in writing. But when words are read aloud, they take on additional meanings that are conveyed by word stress, volume, speaking rate and tone of voice. In a typical conversation there is also meaning in how long each speaker talks for, and how quickly one or the other might interject.
Consider the simple question "Who are you?". Try speaking this with stress on different words: "WHO are you?", "Who ARE you?" and "Who are YOU?". Said aloud, the meaning changes with how we read the sentence, even though the words stay the same.
Computers reading 'leaking senses'?
It is unsurprising that words convey different meanings depending on how they are spoken. It is also unsurprising that computers can interpret some of the meaning behind how we choose to speak (maybe one day they will even be able to understand irony).
But this research takes matters further than just looking at the meaning conveyed by a sentence. It seems to reveal underlying attitudes and thoughts that lie behind the sentences. This is a much deeper level of understanding.
The therapy participants were not reading words like actors. They were just talking naturally – or as naturally as they could in a therapist's office. And yet the analysis revealed information about their mutual feelings that they were "leaking" inadvertently into their speech. This may be one of the first steps in using computers to determine what we are really thinking or feeling. Imagine for a moment conversing with future smartphones – will we "leak" information that they can pick up? How will they respond?
Could they advise us about potential partners by listening to us talking together? Could they detect a propensity towards antisocial behaviour, violence, depression or other conditions? It is not much of a leap to imagine the devices themselves as future therapists – interacting with us in various ways to track the effectiveness of interventions that they are delivering.
Don't worry just yet: we are years away from such a future. But the prospect does raise privacy issues, especially as we interact more deeply with computers at the same time as they are becoming more powerful at analysing the world around them.
When we pause to consider the human senses other than sound (speech), perhaps we also leak information through sight (such as body language and blushing), touch (temperature and movement) or even smell (pheromones). If smart devices can learn so much by listening to how we speak, one wonders how much more they could glean from the other senses.