Lennon or McCartney? Can statistical analysis solve an authorship puzzle?

July 27, 2018, American Statistical Association

Stylometry, the use of statistical techniques to determine authorship, is best known for identifying the Unabomber as Theodore Kaczynski and for revealing that Shakespeare collaborated with Christopher Marlowe on the Henry VI play cycle. In textual analysis, it is not the unusual word choice that betrays the hidden voice but the habitual: the recurring patterns of common words, such as prepositions, that mark the probable identity of one person alone.

It was a mutual Beatles passion—discovered at a conference on Prince Edward Island—that led Mark Glickman, senior lecturer in statistics at Harvard, and Jason Brown, professor of mathematics at Dalhousie University, to wonder whether a stylometric approach could answer the burning question: Lennon or McCartney?

As Glickman explains, for most Lennon-McCartney songs, it is well known and well documented which of the two wrote the song. However, a surprisingly large number of songs (or portions of songs) have disputed authorship. For example, no one knows who wrote the music for "In My Life," a track from the 1965 album Rubber Soul that is ranked 23 on Rolling Stone's The 500 Greatest Songs of All Time. Lennon and McCartney remembered its authorship differently. "So, we wondered whether you could use data analysis techniques to try to figure out what was going on in the song to distinguish whether it was by one or the other," says Glickman.

With help from former Harvard statistics student Ryan Song, Glickman and Brown "decomposed" each Beatles song from 1962 to 1966 into five representations. Each representation consisted of the frequency of occurrence of a set of musical features within each song. "The basic idea behind our approach," says Glickman, "is to convert a song, whose musical content is difficult to quantify in any direct way, into a set of different data structures that are amenable for establishing a signature of a song using a quantitative approach." Glickman continues, "Think of decomposing a color into its constituent components of red, green and blue with different weights attached. We're doing the same thing with Beatles songs, though with more than three components. In total, our method divides songs into a total of 149 constituent components."
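As a rough illustration of what one such representation might look like, the Python sketch below turns a list of feature labels into relative frequencies, pooling any label outside a fixed vocabulary into a single catch-all category. The chord labels, the vocabulary, and the function name are assumptions invented for this example; they are not taken from the authors' data or code.

```python
from collections import Counter


def frequency_representation(labels, vocabulary):
    """Return the relative frequency of each category in `labels`.

    Labels that are not in `vocabulary` are pooled into an "other" category.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(label if label in vocabulary else "other" for label in labels)
    total = sum(counts.values())
    # Fixed category order, so every song maps onto the same components.
    categories = sorted(vocabulary) + ["other"]
    return {category: counts[category] / total for category in categories}


# Hypothetical chord sequence, made up for illustration (not a real song).
chords = ["I", "IV", "V", "I", "vi", "IV", "V", "I", "bVII"]
vocabulary = {"I", "IV", "V", "vi"}

freqs = frequency_representation(chords, vocabulary)
for category, value in freqs.items():
    print(f"{category}: {value:.3f}")
```

Computing such a dictionary for each kind of feature and placing the results side by side would give one long list of components per song, in the spirit of the color analogy above.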

"The first representation simply consists of the frequencies of different commonly played chords, along with aggregations of uncommon chords," says Glickman. "We were able to form 11 chord categories." Then, they characterized melodic notes—notes sung by the lead singer. Third, they recorded the frequencies of occurrence of chord transitions, that is, one chord followed by another chord. Again, certain uncommon chord transitions were aggregated into single categories. Fourth, they recorded the frequencies of consecutive melodic note pairs.

Finally, they decomposed songs into four-note melodic "contours." A contour, says Glickman, is a four-note melodic sequence categorized into a series of "ups," "downs" and "stays the same." In other words, if a four-note melodic passage involves four notes increasing in pitch, then the contour would be ("up," "up," "up") because each consecutive pair of notes involves an increase in pitch. Examining four-note contours, says Glickman, adds extra detail that can help distinguish styles of melodic composition.
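The sequential features described above (consecutive pairs and four-note contours) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the melody is a made-up list of MIDI pitch numbers, the function names are hypothetical, and the use of overlapping four-note windows is a guess, since the article does not say how the authors segmented melodies.

```python
def consecutive_pairs(sequence):
    """Return each element paired with the one that follows it."""
    return list(zip(sequence, sequence[1:]))


def step(first, second):
    """Classify the move between two pitches as "up", "down" or "same"."""
    if second > first:
        return "up"
    if second < first:
        return "down"
    return "same"


def contours(pitches):
    """Return the three-step contour of every overlapping four-note window."""
    windows = (pitches[i:i + 4] for i in range(len(pitches) - 3))
    return [tuple(step(a, b) for a, b in zip(w, w[1:])) for w in windows]


# Hypothetical melody as MIDI pitch numbers (assumption, not a real song).
melody = [60, 62, 64, 65, 65, 64]

print(consecutive_pairs(melody))
print(contours(melody))
```

Because each of the three steps in a contour takes one of three values, there are at most 27 distinct contours, which keeps this representation small enough to count reliably.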

These five representations can serve as signatures of different compositional styles because, as Glickman points out, the two Beatles' songwriting styles differed in a well-known way: Lennon typically wrote melodic lines that didn't vary much in pitch.

"Consider the Lennon song, 'Help!'" says Glickman. "It basically goes, 'When I was younger, so much younger than today,' where the pitch doesn't change very much. It stays at the same note repeatedly, and only changes in short steps. Whereas with Paul McCartney, you take a song like 'Michelle,' and it goes, 'Michelle, ma belle. Sont les mots qui vont très bien ensemble.' In terms of pitch, it's all over the place."

Their approach to inferring unknown or disputed authorship from musical features can be understood in three steps. First, their model posits that the frequency of each of the 149 musical features within a song depends on the song's author. For example, the "tonic" (the root chord of a song) is assumed to occur with one frequency in Lennon songs, but a possibly different frequency in McCartney songs. Second, they use a common tool in probability called Bayes' rule to reverse the conditioning. In other words, starting from the frequencies of the 149 musical features given a song's author, they derive a model for the probability that Lennon or McCartney wrote a song given those frequencies. This model was then trained using 70 Lennon-McCartney songs or song portions whose authorship was known. Third, the trained model was applied to Lennon-McCartney songs and song portions whose authorship was disputed, yielding probability predictions for each.
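The "reversal" in the second step can be illustrated with a toy Python example. It is a minimal sketch, not the authors' model: it assumes that features are counted independently (a naive Bayes simplification), that each author is summarized by fixed feature probabilities, and that both authors are equally likely a priori. The three features and all of the numbers are invented for illustration.

```python
import math


def posterior_mccartney(counts, p_lennon, p_mccartney, prior_mccartney=0.5):
    """Return P(McCartney | feature counts) using Bayes' rule.

    The likelihood of the counts under each author is treated as multinomial;
    the multinomial coefficient is the same for both authors and cancels.
    """
    log_lennon = math.log(1.0 - prior_mccartney)
    log_mccartney = math.log(prior_mccartney)
    for feature, n in counts.items():
        log_lennon += n * math.log(p_lennon[feature])
        log_mccartney += n * math.log(p_mccartney[feature])
    # Subtract the larger log value before exponentiating to avoid underflow.
    top = max(log_lennon, log_mccartney)
    weight_lennon = math.exp(log_lennon - top)
    weight_mccartney = math.exp(log_mccartney - top)
    return weight_mccartney / (weight_lennon + weight_mccartney)


# Hypothetical per-author probabilities of each melodic step (assumptions).
p_lennon = {"same": 0.5, "up": 0.25, "down": 0.25}
p_mccartney = {"same": 0.2, "up": 0.4, "down": 0.4}

# Hypothetical disputed song: mostly repeated notes.
song = {"same": 6, "up": 2, "down": 2}

probability = posterior_mccartney(song, p_lennon, p_mccartney)
print(f"P(McCartney | song) = {probability:.3f}")
```

In this toy setting, a song dominated by repeated notes ends up with a low McCartney probability because the made-up Lennon profile assigns repeated notes a higher frequency. In practice the per-author probabilities would be estimated from the songs of known authorship rather than written in by hand.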

"So, the probability that 'In My Life' was written by McCartney is .018," says Glickman, "which basically means it's pretty convincingly a Lennon song." McCartney misremembers. But "The Word," which Glickman thought was certain to be a Lennon song turned out, according to their model, to be almost certainly by McCartney.

Is there more to this exercise than a fun musical whodunnit? "Yes," says Glickman. "This technology can be extended. We can look at pop history and chart the flow of stylistic influence."

More information: JSM Talk: Assessing Authorship of Beatles Songs from Musical Content: Bayesian Classification Modeling from Bags-of-Words Representations ww2.amstat.org/meetings/jsm/20 … bstractid=329336  
