Speech recognition technology is not a solution for poor readers
About one in five people is considered to be low literate or illiterate, unable to read or write simple statements. Low literacy can be due to reading impairments such as dyslexia or little or no reading practice. For developing countries with low literacy rates, voice recognition has been hailed as a solution by companies such as Google. But is speech technology really the solution?
Falk Huettig and Martin Pickering argue that it is not. In an opinion article in Trends in Cognitive Sciences, the psycholinguists suggest that relying on speech technology might be counterproductive, as literacy has crucial benefits beyond reading. "It is very relevant and timely to look at the advantages of reading on speech, especially as people tend to read less and in different ways than they used to," says Falk Huettig. "Contemporary social media writing and reading habits, for example, are quite different from traditional print media. Information that people used to get from written sources such as novels, newspapers, public notices, or even recipe books they get increasingly from YouTube videos, podcasts or audiobooks."
This is not necessarily a bad thing, as some of the general benefits of reading can also be obtained from listening to audiobooks. Because audiobooks also provide "book language," listening to them confers some similar advantages, such as a larger vocabulary, increased knowledge of the world and a larger short-term ('working') memory, which is important for keeping track of information and multiple entities over several sentences, paragraphs, or often even pages.
But according to Huettig and Pickering, reading itself (the actual physical act of reading) is crucially important for developing the skill of predicting upcoming words, which transfers from reading to understanding spoken language. Reading trains the language prediction system, although even very young children who cannot read yet can predict where a sentence is going. When two-year-olds hear "the boy eats a big cake," they expect something edible (e.g., a cake) after hearing "eats," but before hearing "cake." Predicting upcoming information is useful, as it reduces processing load and frees up limited brain resources. And crucially, skilled readers get much better at predicting.
Children who are among the most avid readers encounter over 4 million words a year, while children who rarely read encounter only about 50,000 words. As a result, good readers get a deeper understanding of the meaning of words and build large networks of words with strong associations between them, which helps them to predict upcoming words. As poor readers have smaller vocabularies and weaker neural representations of words (i.e. the recollection of the sound and meaning of a word), the predictive relationships between words are also weaker (e.g., the prediction that "read the" ... is often followed by "book").
The literate mind
As reading is self-paced, there is a strong incentive to predict upcoming words, as this speeds up reading, which is typically much faster than listening. Skilled readers tend to take in whole words at one glance (fixating multiple letters at the same time) and time their eye movements to optimise the reading process. Printed texts (even given the occasional changes in fonts and word capitalizations) are much more regular than conversational speech, which is full of disfluencies, incomplete word pronunciations and speech errors. This regularity of written texts helps readers to form the predictive relationships between words that then, by extension, can also be used to better predict words when listening to speech.
The very concept of a word is an invention of the literate mind; it is hard to grasp for an illiterate person who only ever hears a stream of speech sounds. For example, when illiterate people or children who haven't learned to read yet are asked to repeat the last word of a spoken sentence, they tend to repeat the whole sentence. In contrast, words clearly stand out in written language, typically being separated by white space. Written forms make words more salient and precise: readers become more aware that words are stable units in language. Storing the written form of words in memory also helps to make spoken word forms more salient, so that they can be accessed faster when predicting upcoming speech. And, again, it is prediction of upcoming language that makes language understanding really fast and proficient.
"Our arguments provide one more reason why more efforts should be undertaken to teach the hundreds of millions of illiterates in developing countries and functional illiterates across the world how to read (or to read better) and why a focus on artificial intelligence voice recognition and voice assistants to overcome literacy-related problems has its dangers," the authors argue.
"Writing is an ancient human technology that we shouldn't give up easily. Teaching how to read and how to read better remains very important even in a modern technological world," concludes Huettig.