If you’ve spent any time listening to pop lyrics, and I mean really listening, you may have noticed something strange: repetition. Plenty of repetition.
This hasn’t escaped the attention of data scientists. Colin Morris came up with an algorithm that analyzed the lyrics of 60 years’ worth of pop songs and, well, you’ll see.
In 1977, the great computer scientist Donald Knuth published a paper called The Complexity of Songs, which is basically one long joke about the repetitive lyrics of newfangled music (example quote: “the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced”).
I’m going to try to use data to test the hypothesis behind the joke: that pop lyrics are getting more repetitive. I’ll be analyzing the repetitiveness of a dataset of 15,000 songs that charted on the Billboard Hot 100 between 1958 and 2017.
I know a repetitive song when I hear one, but translating that intuition into a number isn’t easy. One thing we might try is looking at the number of unique words in a song, as a fraction of the total number of words. But this metric would call the following lyric excerpts equally repetitive: