How languages are built
Parents are often amazed by the speed at which children acquire language in early childhood, becoming fluent around three years of age. Compared with the average adult attempting to acquire a second language, it's quite a remarkable achievement.
A five-year research project led by Professor Ian Roberts from the University of Cambridge aims to work out what it is about how a language is built that guides a child's innate ability to acquire it.
In the late 1950s, the American linguist Noam Chomsky suggested that children are born with an innate ability to acquire language: a blueprint for speaking any language on the planet. According to Chomsky, encoded in the human brain is an innate set of linguistic principles he called the 'universal grammar' that encompasses all of the properties that any language can have. The language the child then actually speaks is simply determined by exposure to the language (or languages, in the case of a multilingual family) they hear as they develop.
But precisely how a universal grammar might underlie the range of languages we have today, not to mention the many past languages that have vanished completely, is a continuing puzzle, as Professor Roberts explained: "If you talk about a universal grammar then you might naturally think there is a universal language, when of course there isn't. Rather, there are thousands of different languages."
"The central notion is that the specification that the child has in the genome, the universal grammar, must be of the most abstract, general, structural properties of language and that different languages manifest these properties in slightly different ways," he added. "The empirical question then is to work out what it is about a language that guides the child's innate ability to acquire it. In other words, to understand how Chomsky's theory could work, we need to work out how languages are built."
One way to investigate the variation between languages is to suppose that there is in fact very little difference, and that each language can be deconstructed to a typological 'footprint' that defines it. This is the hypothesis that Professor Roberts and his team have now set out to investigate over the next five years, with €2.5 million funding from the European Research Council (ERC) Advanced Investigator Grant scheme.
"This starting premise is almost certainly going to prove too simplified," admitted Professor Roberts, "but in the process of homing in on precisely how languages are built, what we hope will emerge is a new perspective on comparative grammar for the languages of the world."
The idea that languages can be categorised into different types is not a new one, but this project will break new ground in syntactic theory (the understanding of how sentences are constructed) by exploring how different languages measure up in terms of a set of five structural properties defined by the team.
Professor Roberts believes that a relatively small number of structural properties are needed to define each language's unique footprint and that this footprint is crucial to learning the language, as he explained: "We think that while the innate universal grammar may determine certain gross features of language, it is encountering this footprint that fine-tunes the acquisition of language in children."
A linguistic duck-billed platypus
The carefully chosen properties under investigation relate to the more abstract, structural features of languages. "These properties are not always immediately apparent from surface data and require a bit of analysis to discover," said Professor Roberts. "If children acquiring language can discover such complex properties spontaneously, this probably reflects their innate abilities since they are doing more than simply reproducing patterns."
One example is the order of words in a sentence. In English, for instance, the word order follows subject-verb-object (as in 'John loves books'). Although this is one of the most common word orders in the world, it's by no means the only one. In fact, all of the logical permutations of subject, verb and object can be found in different languages, but at very different frequencies, the most frequent being subject-object-verb ('John books loves') in languages such as German and Japanese. In languages like Mohawk, words can even be combined to form new verbs ('John bookloves').
Each language will have its own rules for this property, and for each of the other properties being analysed; the task of the team is to identify these patterns. They will look at thousands of languages, from the languages of Europe to the Bantu languages of the sub-Sahara, from Caribbean languages to the Carib languages of the indigenous peoples of Brazil, and from Navajo to Nepalese. Information will be garnered from online grammars, original historical documentation of language structures and, where feasible, native-speaker consultants.
"There are bound to be quirks and anomalies, the linguistic equivalents of the duck-billed platypus: a language that is just weird and doesn't quite fit into the big categories," said Professor Roberts. "But of course these isolated cases are interesting in themselves. We hope to reach a situation as we refine the classificatory system when we can make predictions about what types of languages are out there."
Professor Roberts' hunch is that the classification will turn out to be more complicated than the team initially envisaged: "I suspect that we will need to evolve the properties as we go along until we arrive at the perfect set. That's what we are most interested in doing. It's the first time this has been done systematically or on this scale."
Tantalisingly, when the researchers arrive at a set of properties that categorise the structure of the languages of the world, the results will not only reveal relations among language families but could also tell us something about ancient patterns of human migration.
The main aim of the project, though, is to deepen our fundamental understanding of how languages vary and how the human mind works in acquiring language, explained Professor Roberts: "Our current view is that language is not pre-specified but rather under-specified in the sense that there are certain aspects about the structure of language that universal grammar doesn't say anything about. These gaps appear to be filled in as the child develops by cognitive mechanisms that work with the properties of the language they hear, and it is these properties that we aim to define."