Big data analysis of State of the Union remarks changes view of American history

August 10, 2015, Columbia University
Researchers used computational techniques to map recurring words and their relation to each other across 224 years of State of the Union remarks. Viewed as a network, the words point to common themes and disruptions in political discourse. Credit: Courtesy of the authors

No historical record may capture the nation's changing political consciousness better than the president's State of the Union address, delivered each year except one since 1790.

Now, a computer analysis of this unique archive puts the start of the modern era at America's entry into World War I, challenging histories placing it after Reconstruction, the New Deal or World War II. A team of researchers at Columbia University and University of Paris published their results this week in the Proceedings of the National Academy of Sciences.

Though discussion of industry and finance dominates the record year after year, the study shows that modern political thought, defined by nation building, the regulation of business and the financing of public infrastructure, emerges sharply after WWI.

"We know what constitutes modern political thinking but until now have been unable to say exactly when it originated," said the study's senior author, Peter Bearman, a sociology professor at Columbia and a member of the Data Science Institute. "Overall, our study finds striking continuity throughout the State of the Union address and a few major changes. Surprisingly, we find that key moments of disruption were unrelated to changes in the mode of delivery."

The researchers developed algorithms to analyze the nearly 1.8 million words used by American presidents in their State of the Union addresses, from George Washington's penned remarks in 1790 to Barack Obama's televised speech in 2014. By identifying how often words appeared jointly, and mapping their relation to other clusters of words, researchers were able to infer the dominant social and political discourses of the day and chart their evolution over time.
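As a rough illustration of the joint-appearance counting described above, here is a minimal Python sketch. It is not the authors' algorithm: the function name `cooccurrence_counts`, the stopword list and the three sample passages are all invented for this example, and it simply counts how many passages contain both words of a pair.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_counts(passages, stopwords=frozenset()):
    """Count, for every pair of words, how many passages contain both.

    Hypothetical helper for illustration only; the study's actual
    method is more elaborate than this simple count.
    """
    counts = Counter()
    for text in passages:
        # Unique lowercase words in this passage, minus stopwords.
        words = set(re.findall(r"[a-z]+", text.lower())) - stopwords
        # Sort so each pair is stored under one canonical key.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts


# Invented sample passages, not quotations from any address.
passages = [
    "The treasury reported the amount of expenditures.",
    "Expenditures of the treasury increased.",
    "Peace and democracy require unity.",
]
counts = cooccurrence_counts(passages, stopwords={"the", "of", "and"})
print(counts[("expenditures", "treasury")])
print(counts[("democracy", "peace")])
```

Pairs with high counts can then be treated as edges in a word network, which is the kind of structure the researchers grouped into clusters.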

The researchers place the shift to modern political discourse in 1917, as keywords focused on the economy, public spending, government regulation and nation building emerge (covered by "Domestic Policy" in red and "Foreign Policy" in dark green). Credit: Courtesy of the authors

They were surprised to see 1917 jump out so clearly. As the United States joined Allied forces in the war against Germany, the researchers found a new set of terms recurring in the State of the Union address. On the topic of foreign policy, "democracy," "unity," "peace" and "terror" emerged as keywords, replacing older notions of statecraft and diplomacy. By the 1940s, a cluster of terms centered on the Navy, perhaps signifying an isolationist foreign policy, all but disappears. "Suddenly the U.S. is no longer an island," said Bearman.

The researchers also found a shift in terminology around domestic policy, as a new conversation emerged over the size of government and its role in regulating the economy and providing equal opportunity. Though the underlying focus stays the same, keywords such as "Treasury," "amount" and "expenditures" are replaced by "tax relief," "incentives" and "welfare" as America transitions from a classical political economy to the modern welfare state.

"Though the language and entire discourse of governance changes, the conversation streams remain continuous," said study lead author Alix Rule, a graduate student at Columbia.

One challenge in studying two hundred years of political discourse is that language naturally evolves. Words may stay the same but acquire new meaning; new words may come into use to describe recurring themes. The study uses network-analysis techniques developed by coauthor Jean-Philippe Cointet, a physicist at the University of Paris, to highlight the meaningful changes, and show how some political topics morph into similar topics with common threads while others peter out and die.

The techniques allowed the researchers to capture the meaning of words in relation to other words and in the broader context of evolving topics. The study found that in America's early history the word "Constitution" was most commonly associated with "people." After the Civil War, "Constitution" became most closely linked to "state," soon to be linked to "law" during WWI and WWII, before becoming associated with the word "people" again in the 1970s. What the word "Constitution" means at any given time, the authors argue, depends on the words it is linked to.
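The idea of tracking which word a term is most closely linked to in each period can be sketched in a few lines of Python. This is a simplified stand-in, not the network method used in the study: the function name `top_associate_by_period`, the period labels and the sample sentences are all made up for illustration, and "closest link" here just means the word that most often shares a sentence with the target.

```python
from collections import Counter, defaultdict
import re


def top_associate_by_period(documents, target, stopwords=frozenset()):
    """Return, per period, the word that most often shares a sentence
    with `target`.

    `documents` is an iterable of (period_label, text) pairs.
    Hypothetical helper for illustration only.
    """
    by_period = defaultdict(Counter)
    for period, text in documents:
        # Naive sentence split on terminal punctuation.
        for sentence in re.split(r"[.!?]+", text.lower()):
            words = set(re.findall(r"[a-z]+", sentence))
            if target in words:
                by_period[period].update(words - stopwords - {target})
    # Keep only periods where the target co-occurred with something.
    return {p: c.most_common(1)[0][0] for p, c in by_period.items() if c}


# Invented sample texts and period labels, not taken from any address.
documents = [
    ("early", "The constitution belongs to the people. "
              "The people ratified the constitution. Congress met."),
    ("postwar", "The constitution binds each state. "
                "Every state honors the constitution."),
]
links = top_associate_by_period(
    documents, "constitution", stopwords={"the", "to"}
)
print(links)
```

On real text, ties and rare words would need more careful handling, for example by normalizing counts against how common each word is overall.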

David Blei, a statistician and computer scientist at Columbia's Data Science Institute who was not involved in the research, says the study pushes the boundaries for statistical machine learning of language. "The authors have developed an impressive and ambitious methodology for revealing the flow of thought and sentiment within a sequence of political texts," he said.


More information: Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014,


3 / 5 (2) Aug 10, 2015
Politics: The bane of liberty.
not rated yet Aug 11, 2015
So Wittgenstein and modern linguistic philosophy is wrong--word meaning does not lie in how words are used but the company they keep. Should we not now close philosophy departments and replace them (to take a cue from the researcher's affiliations) with Interdisciplinary Centers for Innovative Theory and Empirics and Laboratoire Interdisciplinaire Sciences Innovations Sociétés?
PS the article's advanced publication pdf is not behind PNAS's paywall
