Together with a couple of friends, I’ve created our own personal Awesome Mix Vol. 1. Instead of being a tape with 13 songs, however, ours holds roughly 1,500 songs. Now I’m curious how our musical tastes differ from one another, and also what kind of musical clusters we have created in our playlist.

Let’s get started.

##    Rspotify    spotifyr   tidyverse       knitr  kableExtra    ggthemes 
##        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
## highcharter   htmltools widgetframe     cluster  factoextra        here 
##        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE

First, I’ll have to extract the audio features of each song in the playlist. This is where the spotifyr package helps me out. I have removed user names and IDs for privacy reasons.

# Pull the audio features of every track in the playlist (user name and playlist URI redacted)
#tracks <- get_playlist_audio_features("xxxxxxxxxx", playlist_uris = "xxxxxxxxxxxxxxx")

# Keep relevant information
tracks <- tracks %>%
    select(artist_name, track_name, album_name, album_img, 
           track_popularity, danceability, energy, loudness,
           speechiness, acousticness, instrumentalness, liveness,
           valence, tempo, key, key_mode, duration_ms,
           time_signature, track_preview_url, track_open_spotify_url) 

head(tracks, n = 5) %>%
    kable(format = "html") %>%
    kable_styling(bootstrap_options = c("hover", "striped", "responsive", "condensed"), 
                  full_width = TRUE,
                  position = "left") %>%
    scroll_box(width = "100%")

artist_name	track_name	album_name	album_img	track_popularity	danceability	energy	loudness	speechiness	acousticness	instrumentalness	liveness	valence	tempo	key	key_mode	duration_ms	time_signature	track_preview_url	track_open_spotify_url
Editors	All the Kings	IN DREAM	https://i.scdn.co/image/b45a68abbf289097f42b224e10ae834f2547f594	43	0.440	0.539	-9.039	0.0391	3.84e-01	0.00e+00	0.196	0.108	115.019	D	D minor	293562	4	https://p.scdn.co/mp3-preview/bcbe1c796000b1e94a1a67a68ab0caa3b4bdb395?cid=209f3c299b644b06acd255c0166fe5bb	https://open.spotify.com/track/7vsqpQcPaBWzAFvoopHrCd
alt-J	Tessellate	An Awesome Wave (Deluxe Version)	https://i.scdn.co/image/70b570b709ac08c4d700386ff15030ae88a18678	48	0.681	0.608	-6.471	0.0449	3.64e-01	4.93e-02	0.119	0.418	116.878	D	D major	182667	4	https://p.scdn.co/mp3-preview/106ca0041294360730fcd351c438b35bafdc3196?cid=209f3c299b644b06acd255c0166fe5bb	https://open.spotify.com/track/1QXzQKmQiDOzGHwSXVdHTp
Weezer	Back To The Shack	Back To The Shack	https://i.scdn.co/image/a44dcda7b7b87761c2b42a3d7eb9a457429a9906	9	0.435	0.706	-5.310	0.0428	6.05e-03	7.25e-05	0.119	0.658	171.913	C#	C# major	186613	4	NA	https://open.spotify.com/track/4pHQSaOkLN3BvHPRjVm8ws
The Offspring	Want You Bad	Conspiracy Of One	https://i.scdn.co/image/b82ca2c8074ac5dbb560561b9a14578b4087375f	4	0.487	0.969	-4.293	0.0505	6.59e-05	1.20e-06	0.278	0.626	105.539	E	E major	202600	4	NA	https://open.spotify.com/track/09ZEB3X2oswrIBBuzuVLEt
Imagine Dragons	I Bet My Life	I Bet My Life	https://i.scdn.co/image/3db65a1df5dacd133d229141e3527fdf3481c132	29	0.558	0.649	-8.033	0.0389	2.29e-01	5.23e-04	0.312	0.570	107.894	C#	C# major	192893	4	NA	https://open.spotify.com/track/7q2f7lhHTv7j7EFG0vplwA

Perfect! Almost. I’m missing the Added By column, which does show in our Spotify playlist. Unfortunately, when I simply tried to copy and paste the complete track overview of the playlist, each record gave me the Spotify link to the song (e.g. https://open.spotify.com/track/17g3YBfU8QfYtkgZGI8tTT) rather than the actual data. A quick Google search for “Export Spotify Playlists” got me to a JavaScript app called Exportify. This worked like a charm and provided me with a downloadable .csv file.

# Import the raw Exportify .csv data
mixtape_raw <- read_csv(here("static", "data", "Spotify", "awesome_mixtape_1.csv"))

This .csv file did include the Added By column, which we can add to the tracks dataset after some data transformations.

track_name	artist_name	added_by
All the Kings	Editors	G
Tessellate	alt-J	M
Back To The Shack	Weezer	M
Want You Bad	The Offspring	V
I Bet My Life	Imagine Dragons	M
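
The transformation code itself is not shown in the post. A minimal sketch of what it could look like follows, assuming the Exportify export has columns named `Track Name`, `Artist Name` and `Added By` (the names differ between Exportify versions, so check `names(mixtape_raw)` first). The example rows and the `initial_lookup` table used to anonymise user names are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the Exportify export; the real file has more columns
mixtape_raw <- tibble(
  `Track Name`  = c("All the Kings", "Tessellate"),
  `Artist Name` = c("Editors", "alt-J"),
  `Added By`    = c("spotify:user:user_g", "spotify:user:user_m")
)

# Hypothetical lookup from a user identifier to an anonymous initial
initial_lookup <- c("spotify:user:user_g" = "G", "spotify:user:user_m" = "M")

mixtape_less_raw <- mixtape_raw %>%
  # Rename the columns so they match the names used in `tracks`
  select(track_name  = `Track Name`,
         artist_name = `Artist Name`,
         added_by    = `Added By`) %>%
  # Replace the user identifier with an initial
  mutate(added_by = unname(initial_lookup[added_by]))

mixtape_less_raw
```

Matching the column names up front keeps the later join on `track_name` and `artist_name` simple.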

Cool. Time to join the two dataframes together and start with the analysis!

mixtape <- tracks %>% 
    inner_join(mixtape_less_raw, by = c("track_name", "artist_name")) %>%
    filter(valence > 0)
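
Since `inner_join()` silently drops rows that have no match in the other table, and titles can be spelled slightly differently between two sources, it may be worth checking what gets lost. A small sketch with made-up rows (the second title is deliberately spelled differently in the second table):

```r
library(dplyr)

# Made-up miniature versions of the two tables
tracks_demo <- tibble(
  track_name  = c("Tessellate", "Want You Bad"),
  artist_name = c("alt-J", "The Offspring")
)
added_demo <- tibble(
  track_name  = c("Tessellate", "Want You Bad - Remastered"),
  artist_name = c("alt-J", "The Offspring"),
  added_by    = c("M", "V")
)

# Rows of tracks_demo without a partner; these would vanish in the inner join
unmatched <- anti_join(tracks_demo, added_demo,
                       by = c("track_name", "artist_name"))
unmatched
```

If `unmatched` is empty on the real data, the join lost nothing.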

Awesome Music Analysis

Spotify adds a bunch of music statistics to each song. I’ll be using these statistics to find out how different our music tastes are, and where they differ (if at all; we do share a playlist, after all). I’ll be looking mainly at the following features:

Feature	Description	Values
Danceability	Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.	0 to 1
Energy	Energy represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.	0 to 1
Valence	A measure describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).	0 to 1
Loudness	The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude).	-60 dB to 0 dB
Tempo	The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.	0 to 250 BPM
Popularity	Although not a musical feature, the popularity index provides a way of determining how popular a song is. The exact description is not provided, but I’m sure it has a well thought-out algorithm underneath.	0 to 100

Cue the violins!

Normally I’d use boxplots to see whether the distributions of our musical tastes differ from each other. Except we are dealing with music here, which just screams for violin plots.
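
The plotting code is not included in the post. A minimal sketch of how such violin plots could be drawn with ggplot2 follows, using simulated values in place of the real `mixtape` data. The long data layout and the facet per feature are assumptions, not necessarily how the original figures were made.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the real data: one row per song and feature
demo <- data.frame(
  added_by = rep(c("G", "M", "S", "V"), each = 50, times = 3),
  feature  = rep(c("danceability", "energy", "valence"), each = 200),
  value    = runif(600)
)

p <- ggplot(demo, aes(x = added_by, y = value, fill = added_by)) +
  geom_violin(alpha = 0.7) +
  facet_wrap(~ feature) +             # one panel per audio feature
  labs(x = "Added by", y = "Value") +
  theme(legend.position = "none")
p
```

The width of each violin shows where a person's songs concentrate on that feature.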

We mostly see that G’s violin is thicker at the upper end of Valence: he tends to like happier songs than the others. V, in contrast, tends to add more songs low on the Valence scale, and M and S are somewhere in between. On Danceability and Energy there is not much to say about the differences; we seem fairly identical in those regards.

Well I guess that is clear. We have extremely similar taste when it comes to individual attributes. Popularity, Tempo and Loudness don’t really seem to have any clear distinction among us.

One thing that is missing, however, is the combination of features! The combination of Energy and Valence can give some very diverse results. The interactive plot below shows every song positioned by its value on these two attributes. This plot is heavily inspired by the Sentify app created by RCharlie. He has attached meaning to the value combinations in an arbitrary way.

A track with high energy and high valence will be an active, happy song, while a track with low energy and low valence will be a sadder, more depressing song. Anyway, enjoy looking at the songs and our individual tastes, and at which songs classify as happy or sad!
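
To make the quadrants concrete, here is a small sketch that labels a song by its energy and valence. The 0.5 cut-off and the wording of the labels are assumptions for illustration (loosely following the Sentify layout), not values taken from the plot.

```r
# Label each song by quadrant; the 0.5 threshold is an arbitrary assumption
mood_quadrant <- function(energy, valence, cut = 0.5) {
  ifelse(energy >= cut,
         ifelse(valence >= cut, "happy / energetic", "angry / turbulent"),
         ifelse(valence >= cut, "chill / peaceful", "sad / depressing"))
}

# Energy and valence of the first, fourth and third row of the tracks table
mood_quadrant(energy  = c(0.539, 0.969, 0.706),
              valence = c(0.108, 0.626, 0.658))
```

With these cut-offs the Editors track lands in the energetic but negative quadrant, while the Offspring and Weezer tracks land in the energetic and positive one.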

Most songs seem to be in the upper half of the energy range, which seems reasonable considering our preference for Rock and Punk-Rock styles. However, we do see a slight difference in the energy / valence combination of songs for V. He tends to like more negatively loaded energetic songs from artists like Sum 41, Muse or Shinedown. G especially seems to enjoy the happier, highly energetic songs from artists such as Smash Mouth, The Strokes, The Kinks and The Bloodhound Gang. M and S are in the more neutral part of the spectrum with regard to valence. They prefer songs that are not overly happy or sad.

Finding Awesome Clusters

So into which musical clusters can we divide our Awesome Mixtape #1 playlist? To answer this question I will be using the k-means clustering method. The basic idea behind k-means clustering is to define clusters so that the total within-cluster variation is minimized.
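
As an illustration of that objective, the sketch below runs `kmeans()` on two simulated, well-separated groups and recomputes the total within-cluster sum of squares by hand. The toy data are made up for the example.

```r
set.seed(42)
# Two simulated groups in two dimensions
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2))

fit <- kmeans(toy, centers = 2, nstart = 25)

# Squared distance of every point to the centre of its own cluster
own_centre  <- fit$centers[fit$cluster, ]
wss_by_hand <- sum((toy - own_centre)^2)

# Compare with the quantity kmeans() reports as tot.withinss
all.equal(wss_by_hand, fit$tot.withinss)
```

This sum of squared distances is the "within-cluster variation" that the algorithm tries to make as small as possible for a given k.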

The drawback is that we have to specify the number of clusters we want the data to be divided into. To determine the optimal number k, I employ both the elbow method and the silhouette method, as seen below. Although the silhouette method suggests k = 2 as the optimal number, two clusters would not give me much detail. Therefore I’ll base my choice on the elbow method, where the line starts bending somewhat at k = 5.

# Centering and scaling are necessary for k-means to work properly,
# because the features are measured on very different scales
cluster_df <- mixtape %>% 
    select_if(is.numeric) %>%
    mutate_all(~ as.numeric(scale(.x)))

set.seed(123)

gridExtra::grid.arrange(
    fviz_nbclust(cluster_df, kmeans, method = "wss") + 
        theme(plot.title = element_text(hjust = 0.5),
              plot.subtitle = element_text(hjust = 0.5)) +
        labs(subtitle = "Elbow Method"),
    fviz_nbclust(cluster_df, kmeans, method = "silhouette") +
        theme(plot.title = element_text(hjust = 0.5),
              plot.subtitle = element_text(hjust = 0.5)) +
        labs(subtitle = "Silhouette Method"),
    ncol = 2)
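
For intuition, the elbow panel is essentially a plot of the total within-cluster sum of squares against k. A rough base R sketch of that calculation on simulated data follows (three made-up groups, so the bend should appear around k = 3). It illustrates the idea and is not meant to reproduce the internals of `fviz_nbclust()`.

```r
set.seed(123)
# Three simulated groups in two dimensions
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 4), ncol = 2),
             matrix(rnorm(100, mean = 8), ncol = 2))

# Total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The "elbow" is the k after which adding another cluster stops reducing the sum of squares by much.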

Creating final clusters with k = 5.

final_clusters <- kmeans(cluster_df, 5, nstart = 25)

mixtape_clusters <- mixtape %>% 
    mutate(cluster = final_clusters$cluster) %>%
    select(cluster, artist_name, track_name, danceability,
           energy, valence, loudness, speechiness,
           acousticness, instrumentalness,
           liveness, tempo, duration_ms)

Now that everything is done, we can start looking at the clusters, and see what kind of distinction the algorithm made.

mixtape_clusters %>% 
    group_by(cluster) %>%
    select(-artist_name, -track_name) %>%
    summarize_all(mean) %>%
    mutate_all(round, 3) %>%
    kable(format = "html") %>%
    kable_styling(bootstrap_options = c("hover", "striped", 
                                        "responsive", "condensed"), 
                  full_width = TRUE,
                  position = "left") %>%
    scroll_box(width = "100%")

cluster	danceability	energy	valence	loudness	speechiness	acousticness	instrumentalness	liveness	tempo	duration_ms
1	0.479	0.672	0.335	-7.996	0.050	0.162	0.679	0.181	126.661	317651.7
2	0.527	0.466	0.325	-9.669	0.044	0.422	0.033	0.149	112.241	268738.4
3	0.395	0.836	0.394	-5.084	0.071	0.030	0.022	0.242	146.018	245262.5
4	0.397	0.619	0.340	-7.430	0.049	0.223	0.053	0.196	132.234	256704.8
5	0.578	0.797	0.617	-5.483	0.059	0.082	0.020	0.174	118.088	224196.1

Although the distinction is hard to tell in this way, I see the following patterns:

Cluster 1 - Fun, happy, danceable and energetic songs.
Cluster 2 - Angry, up-tempo songs
Cluster 3 - Instrumental, acoustic songs
Cluster 4 - High tempo and energetic instrumental songs
Cluster 5 - Far more likely to be live performance songs
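
One thing to keep in mind when reading such a list next to the table of means: k-means assigns its cluster numbers arbitrarily, so the same grouping can come back with different numbers on another run, and descriptions written against one run may not line up with a table produced by another. A minimal sketch, on simulated data, of one way to pin the numbering down is to renumber the clusters in order of a chosen centre coordinate. Using the first column as the ordering key is an assumption for illustration.

```r
set.seed(7)
# Three simulated groups in two dimensions
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2),
             matrix(rnorm(100, mean = 10), ncol = 2))

fit <- kmeans(toy, centers = 3, nstart = 25)

# Rank the clusters by the first coordinate of their centre ...
new_id <- rank(fit$centers[, 1])
# ... and relabel every point, so cluster 1 always has the lowest centre
stable_cluster <- unname(new_id[fit$cluster])

# Mean of the first coordinate per relabelled cluster
tapply(toy[, 1], stable_cluster, mean)
```

On the playlist data the ordering key could be any feature of interest, for example mean energy or valence.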

Let’s try to see if this fits with some songs for each cluster!

Cluster 1	Cluster 2
The National - Fake Empire	Bear’s Den - New Jerusalem
The Notwist - Consequence	Ed Sheeran - Little Lady - Mikill Pane
Paul Kalkbrenner - Sky and Sand	Editors - Ocean of Night
Porcupine Tree - Lazarus	Causes - Teach Me How To Dance With You
Dropkick Murphys - 4-15-13	The Whitest Boy Alive - High On The Heels
Mando Diao - Black Saturday	The Head and the Heart - Lost In My Mind
Sum 41 - Exit Song	Lou Reed - Walk on the Wild Side
Explosions In The Sky - Your Hand In Mine	Pearl Jam - Black
Muse - Resistance	Bear’s Den - Sophie
Klangkarussell - Sonnentanz	Kaleo - Save Yourself

Cluster 3	Cluster 4
Bon Jovi - These Days	Lord Huron - Meet Me in the Woods
Mansun - Wide Open Space	Lord Huron - The Night We Met
Editors - Spiders	Seafret - Be There
The Bohicas - Where You At	Tom Grennan - Giving It All
Foals - What Went Down	Biffy Clyro - The Captain
Thirteen Senses - Thru The Glass	System Of A Down - Lonely Day
Wolfmother - Victorious	Donovan - Catch the Wind
Thirty Seconds To Mars - Vox Populi	The National - Heavenfaced
U2 - City Of Blinding Lights	Damien Rice - Amie
Rival Sons - Keep On Swinging	Muse - Dig Down

Cluster 5
The Raconteurs - Steady, As She Goes
The Proclaimers - I’m Gonna Be (500 Miles)
Admiral Freebee - Einstein Brain
Typhoon & New Cool Collective - Bumaye
The White Stripes - You’re Pretty Good Looking
Rage Against The Machine - Killing In the Name
Bob Dylan - Hurricane
Mumford & Sons - Wilder Mind
Genesis - Jesus He Knows Me - 2007 Digital Remaster
The Killers - The Man

Awesome Conclusion

As a first impression, the songs fit the descriptions I made quite well. I do think that I put too much emphasis on energy. For example, cluster 2 does have quite negatively loaded songs, but they aren’t necessarily energetic. Cluster 3 seems to fit the acousticness value quite well. The addition of Xavier Rudd, Dermot Kennedy and Luke Sital-Singh confirms the slow acoustic nature of this cluster. In Cluster 5 we see only one live performed song, yet still one more than in the other clusters. So perhaps with a bigger sample of songs we’d see more live songs in this cluster.

In the end, I feel like the clusters made some nice distinctions. Perhaps in the future I could add more metadata or even sentiment of the songs by analyzing the lyrics of a song with the Genius API.

An Awesome Spotify Playlist Analysis

Stefan Musch
