Essay Genome Project applies algorithmic analysis to writing
Just as the Human Genome Project maps DNA to understand the human body, the Essay Genome Project analyzes the defining characteristics of essays to find similarities between a student's work and that of a well-known classic or contemporary essayist.
An essay is defined as a short piece of writing on a particular subject.
"At times the essay has been considered the 'step-child' to other writing forms," Spotts said. "However, the essay is not a new writing tradition. It's been around for hundreds of years."
With technical assistance from the Office of Digital Humanities, Madden, Spotts, Bulsiewicz and a team of students created a corpus for the Essay Genome Project, a searchable compilation of essays by authors from the past 500 years. Corpus analysis has previously been used for poetry and drama, but the Essay Genome Project is the first to create a corpus for essays.
The essays in the corpus are analyzed by a computer algorithm that identifies how frequently authors use common words and phrases, as well as stylistic, tonal and formal similarities in the writing.
Anyone can submit their own essays or blog posts to the corpus. Within seconds the algorithm will share personalized information about their writing style, including a list of essayists with similar styles.
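The article does not describe the project's actual algorithm, but the general idea of comparing how often writers use common words can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the word list `COMMON_WORDS`, the function names and the use of cosine similarity are all hypothetical choices, not the project's method.

```python
import math
import re
from collections import Counter

# Hypothetical list of common words; the project's real feature set is not described.
COMMON_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "i")


def word_profile(text):
    """Return the relative frequency of each word in COMMON_WORDS within text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in COMMON_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar(submission, corpus):
    """Rank authors by similarity to the submission.

    corpus maps an author name to a sample of that author's text.
    Returns (author, score) pairs, most similar first.
    """
    target = word_profile(submission)
    scores = [
        (author, cosine_similarity(target, word_profile(text)))
        for author, text in corpus.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_similar(my_essay, corpus)` with a dictionary of author samples would return the authors ordered from most to least similar. A real system would presumably use far richer features, such as phrases, tone and form, than this single word-frequency measure.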
Not only does the corpus help writers improve their skills, but the research also examines which essayists have had the greatest influence over time and whether originality exists.
The corpus was designed to compare authors and the evolution of essay subjects through differing time periods and geographical locations. It also traces a writer's literary ancestors and descendants.
"We want students to read many essayists, not only contemporary ones," Madden said. "Students who are well versed develop an appreciation for the tradition of the essay, recognize the ways they've been influenced and make personal connections with past and present essayists."