How do we succeed in putting our ideas into words, so that another person can understand them? This complex undertaking involves translating an idea into a one-dimensional sequence, a string of words to be read or spoken one after the other. Of course, the person on the receiving end might not get the intended point, which is why the effective expression of one’s ideas is considered an art, or at least a desirable and important skill.
A team of scientists that included physicists and language researchers at the Weizmann Institute of Science recently investigated this process by applying scientific methods to some of our culture’s most successful models for effective transfer of ideas – classic writings that, by common agreement, get their messages across well. They created mathematical tools that allowed them to trace the development of ideas throughout a book. The international team included Prof. Elisha Moses of the Weizmann Institute’s Physics of Complex Systems Department and Prof. Jean-Pierre Eckmann, a frequent visitor from the University of Geneva, as well as postdoctoral fellow Enrique Alvarez Lacalle and research student Beate Dorow from the University of Stuttgart. The paper describing their research was recently published in the Proceedings of the National Academy of Sciences (PNAS).
Because strings of words are one-dimensional, they literally lack depth. Our minds and memories aid us in recreating complex ideas from this string. The narration “encodes” a hierarchical structure. (An obvious hierarchical structure in a text is chapter-paragraph-sentence.) The implication is that our minds decipher the encoded structure, allowing us to comprehend the abstract concept.
To test for an underlying structure in strings of words that are known for their ability to convey ideas, the scientists applied their mathematical tools to a number of books, including writings of Albert Einstein, Mark Twain’s Tom Sawyer, Metamorphosis by Franz Kafka and other classics of different styles and periods. They defined “windows of attention” of around 200 words (about a paragraph) and, within these windows, identified pairs of words that frequently occurred near each other (after eliminating “meaningless” words such as pronouns). From the resulting word lists and the frequencies with which individual words appeared in the text, the scientists used their mathematical analysis to construct a sort of network of “concept vectors” – linked words that convey the principal ideas of the text.
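The counting step described above can be illustrated with a minimal sketch. It is not the researchers' actual method: the stop-word list, the function names, the choice to measure the window in words remaining after stop-word removal, and the normalization by single-word frequencies are all assumptions made here for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list: the article only says that "meaningless"
# words such as pronouns were eliminated, not which words exactly.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "was",
    "i", "you", "he", "she", "it", "we", "they", "his", "her", "its",
}


def content_words(text):
    """Lower-case the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def cooccurrence_counts(words, window=200):
    """Count unordered pairs of distinct words that lie fewer than
    `window` positions apart (an assumed reading of a 'window of attention')."""
    pairs = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1 : i + window]:
            if v != w:
                pairs[tuple(sorted((w, v)))] += 1
    return pairs


def link_strengths(words, pairs):
    """Scale each pair count by the frequencies of its two words, so that
    common words do not dominate (an assumed normalization)."""
    freq = Counter(words)
    return {pair: n / (freq[pair[0]] * freq[pair[1]])
            for pair, n in pairs.items()}


if __name__ == "__main__":
    sample = "The cat chased the mouse and the mouse feared the cat"
    words = content_words(sample)
    pairs = cooccurrence_counts(words, window=3)
    for pair, strength in sorted(link_strengths(words, pairs).items()):
        print(pair, pairs[pair], round(strength, 2))
```

On a full book, the strongest links from a count like this would be the raw material for a word network; how the researchers turned such counts into “concept vectors” is not spelled out in this article and is not attempted here.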
Mathematically, these concept vectors can go in many directions, and reading the text can be thought of as a tour along paths in the resulting network. The multidimensional concept vectors seem to span a “web of ideas.” The scientists’ work suggests this network is based on a tree-like hierarchy that may be a basic underpinning of language. The reader or listener can reconstruct the hierarchical structure of a text, and thus the multidimensional space of ideas, in his or her mind to grasp “the author’s meaning.”
Moses: “Philosophers from Wittgenstein to Chomsky have taught us that language plays a central evolutionary role in shaping the human brain, and that revealing the structure of language is an essential step to comprehending brain structure. Our contribution to research in this basic field is in the creation of mathematical tools that can be used to make the connection between concepts or ideas and the words used to express them, making it possible to trace in a speech or text the path of an idea in an abstract mathematical space. We can understand theoretically how the structure of the wording serves to transmit concepts and reconstruct them in the mind of the reader. A deep question that remains open is if and how the correlations we uncovered serve the aesthetics of the text.”
Prof. Elisha Moses' research is supported by the Clore Center for Biological Physics; the Center for Experimental Physics; and the Rosa and Emilio Segre Research Award.
Source: Weizmann Institute of Science