How Recommendation Algorithms Work: From YouTube and Video Hosting Platforms to Music Services

Recommendation algorithms are everywhere, guiding what you see, buy, and watch online. But how do they work?

February 13, 2024

Despite their widespread use, how these algorithms work remains a mystery to many. And where there are mysteries, there are concerns: what if recommendation algorithms actually place us in an information bubble and don't let us escape from it? Do they deprive us of real choice and leave us only with its illusion? Let's look at how things really stand.

A Look into History

The first "real" recommendation algorithm can be considered Grundy, a computer-based librarian built in the late 1970s. Readers were surveyed about their reading preferences upon joining. Based on similar interests, groups were formed, and recommendations were made for these groups.

Can you already guess how the human factor hindered the correct operation of this algorithm? That's right: just as people want to appear more attractive on a first date than on any other day, library visitors wanted to appear more well-read.

Nevertheless, it was a useful experience and a necessary first step, primarily because it raised two important questions. The search for answers to them shaped the algorithms that exist today.

The first question: how to recommend something without asking the user in detail about their preferences? The second question: how to determine similarity?

Not the Most Successful Experience: Algorithms Based on Memorization

So, we decided not to ask users what they like. Then how do we find out?

One option is to carefully observe what the user does and record it. Let's use the same library example: we don't ask a person if they like Dreiser, we just see which books they actually borrow. For example, active reader A constantly borrows books by Mark Twain, Emily Dickinson, Lovecraft, and Dreiser.

Less active reader B has borrowed a few books by Twain, Emily Dickinson, and Lovecraft. What can the algorithm recommend to them? The algorithm will reason like this: I remember that a person who likes the same things as you also liked Dreiser. And it will suggest reading "The Financier" or "Sister Carrie".

It seems like a great system! But even here, there's a catch. Let's imagine for a moment that reader A, for some reason, decided to read "Fifty Shades of Grey". An algorithm that can only memorize but not analyze will confidently recommend this book to reader B. And reader A will start getting recommendations based on what E.L. James fans like. Embarrassing.
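The memorizing approach can be sketched in a few lines. This is only an illustration: the borrowing records, the reader names, and the function `recommend_by_memory` are hypothetical, and the sketch assumes reader A has also borrowed Dreiser.

```python
# Hypothetical borrowing records; reader names and titles are illustrative.
borrowed = {
    "A": {"Twain", "Dickinson", "Lovecraft", "Dreiser"},
    "B": {"Twain", "Dickinson", "Lovecraft"},
}

def recommend_by_memory(target, borrowed):
    """Suggest everything the most similar reader borrowed that the target has not."""
    mine = borrowed[target]
    # Pick the other reader who shares the most items with the target.
    others = [name for name in borrowed if name != target]
    twin = max(others, key=lambda name: len(borrowed[name] & mine))
    return sorted(borrowed[twin] - mine)

print(recommend_by_memory("B", borrowed))  # ['Dreiser']

# One out-of-character loan by reader A leaks straight into B's suggestions.
borrowed["A"].add("Fifty Shades of Grey")
print(recommend_by_memory("B", borrowed))  # ['Dreiser', 'Fifty Shades of Grey']
```

Nothing in this sketch weighs how typical a loan is: a single record is remembered and passed on as if it were a stable preference.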

That's why algorithms based on memorization quickly gave way to a new generation.

Modern Algorithms

Let's consider a few more interesting ways to find the optimal option for you.

If you often visit online stores selling electronics, you have probably noticed that, along with a product, they often suggest buying something else: a suitable case for a phone, a carrying bag for a laptop, and so on.

This is also a recommendation algorithm, but one that analyzes not you, but the products in the store.

This method is commonly called item-based collaborative filtering. It relies on the fact that certain categories of products are associated with related categories, so the algorithm only needs to work out which specific product from the related category to recommend.

For online stores, this method is often more effective. Imagine for a moment that it was the other way around. A customer buys a screen protector for an iPhone, a frying pan, an electric razor, and a floor scale; not because they all go well together and complement each other, but because they needed exactly that. Now, should we recommend electric razors to all customers buying frying pans?
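One simple way to analyze products rather than people is to count which products end up in the same basket. The sketch below is a minimal illustration, not how any particular store does it: the baskets, the helper names, and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; the product names are illustrative.
baskets = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "bag"},
    {"laptop", "bag", "mouse"},
    {"screen protector", "frying pan", "electric razor", "floor scale"},
    {"frying pan", "spatula"},
    {"frying pan", "spatula"},
]

def count_pairs(baskets):
    """Count how often each ordered pair of products appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs

def also_bought(product, pairs, min_count=2):
    """Return products bought together with `product` at least `min_count` times."""
    hits = [(other, n) for (first, other), n in pairs.items()
            if first == product and n >= min_count]
    # Most frequent companions first; ties broken alphabetically.
    return [other for other, n in sorted(hits, key=lambda h: (-h[1], h[0]))]

pairs = count_pairs(baskets)
print(also_bought("phone", pairs))       # ['case']
print(also_bought("frying pan", pairs))  # ['spatula']
```

The one mixed basket pairs the frying pan with an electric razor only once, so it falls below the threshold and the razor is not suggested to everyone who buys a frying pan.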

Machine learning is also widely used nowadays. This is the name for the field of artificial intelligence that deals with creating systems capable of learning and developing based on the data they receive.

Recommendation algorithms that use machine learning are more complex and usually more accurate. They can track patterns and find connections. Let's return to the example with books: unlike the earlier memorizing algorithms, such an algorithm can recognize that "Fifty Shades" is an exception, not the rule. And if there are many such exceptions, the algorithm will detect a new trend and adjust accordingly.
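The idea of treating a one-off as an exception can be illustrated without a trained model. The sketch below is a stand-in, not a machine learning system: it scores each unseen book by a similarity-weighted vote across several readers, using Jaccard overlap as the similarity. Readers C and D and all function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of books: 0.0 (nothing shared) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def score_candidates(target, borrowed):
    """Score unseen books by the summed similarity of the readers who borrowed them."""
    mine = borrowed[target]
    scores = {}
    for name, books in borrowed.items():
        if name == target:
            continue
        weight = jaccard(mine, books)
        for book in books - mine:
            scores[book] = scores.get(book, 0.0) + weight
    return scores

# Hypothetical records: three similar readers borrowed Dreiser, only one the outlier.
borrowed = {
    "A": {"Twain", "Dickinson", "Lovecraft", "Dreiser", "Fifty Shades of Grey"},
    "B": {"Twain", "Dickinson", "Lovecraft"},
    "C": {"Twain", "Lovecraft", "Dreiser"},
    "D": {"Dickinson", "Lovecraft", "Dreiser"},
}
scores = score_candidates("B", borrowed)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['Dreiser', 'Fifty Shades of Grey']
```

Dreiser is backed by three similar readers and ranks well above the single out-of-character loan. If many readers started borrowing the same "exception", its score would rise on its own, which is the trend-detection behavior described above.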

Machine learning also helps deal with the cold start problem - a situation where we know too little about the user to give them any relevant advice. Of course, at first the recommendations will still be rough, but the more information you feed the algorithm, the more accurately it will work.
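A common way to cope with a cold start, shown here only as an assumption and not as something the services above are confirmed to do, is to fall back on overall popularity until enough history has accumulated. The function `recommend`, the `min_history` threshold, and the track names are hypothetical.

```python
from collections import Counter

def recommend(history, all_histories, min_history=3, top_n=2):
    """Use overall popularity for new users, overlapping users' items otherwise."""
    seen = set(history)
    if len(seen) < min_history:
        # Cold start: nothing personal to go on, so consider everyone's history.
        pool = all_histories
    else:
        # Warm user: consider only users who share at least one item with them.
        pool = [h for h in all_histories if seen & set(h)]
    counts = Counter(item for h in pool for item in h if item not in seen)
    # Most frequent first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical listening histories of existing users.
all_histories = [
    ["jazz1", "jazz2", "jazz3"],
    ["jazz1", "jazz2", "jazz4"],
    ["pop1", "pop2"],
    ["pop1", "pop2"],
    ["pop1", "pop2", "pop3"],
]

print(recommend([], all_histories))                           # ['pop1', 'pop2']
print(recommend(["jazz1", "jazz2", "jazz3"], all_histories))  # ['jazz4']
```

A brand-new user gets whatever is popular overall; once the history is long enough, the suggestions come from users with overlapping taste instead.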

From Theory to Practice

And how do you recommend, for example, music? What should we rely on?

To begin with, you can ask the user to name their favorite genre. This will give machine learning algorithms at least some point of reference.

It may seem like this is enough: just give the person new tracks from their favorite genres, and they will be satisfied. But a problem of granularity arises immediately. If genres are defined broadly, a single genre can include very dissimilar artists and groups. If they are defined too narrowly, the number of genres runs into the thousands, and anything new has to be picked from a very limited set of options within each one.

You can also try to rely not on genres but on artists. This approach has its place: some streaming services, upon registration, ask not for genres, but for favorite artists and groups.

And there's a third, less obvious option: to look not at the "label," but at the track itself. Instead of searching for similarities through genres or authorship, we can do it through the music itself. If a person were doing this, it would be simple: they would listen to tracks and say which ones are similar. But how can an algorithm "hear" a song?

Here, everything is quite simple. Any audio recording is a combination of sound vibrations, and different music is dominated by different frequencies. Some tracks have a lot of bass; others have a lot of higher frequencies, which carry, for example, vocals. All of this is measurable, which means it can be analyzed and compared.
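To make "measurable" concrete, here is a minimal sketch that measures how much energy a signal has in a few frequency bands and compares two signals by those measurements. It uses a plain discrete Fourier transform on short synthetic tones standing in for tracks. The band limits, the sample rate, and the helper names are assumptions for the example; real services work with far richer audio features.

```python
import cmath
import math

def band_energies(samples, sample_rate, bands):
    """Energy per frequency band, computed with a plain discrete Fourier transform."""
    n = len(samples)
    energies = [0.0] * len(bands)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        # k-th DFT coefficient: how strongly this frequency is present.
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        for i, (low, high) in enumerate(bands):
            if low <= freq < high:
                energies[i] += abs(coeff) ** 2
    return energies

def cosine(a, b):
    """Cosine similarity between two non-negative feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def tone(freq, sample_rate=8000, n=256):
    """A pure sine tone, used here as a stand-in for a track."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]

# Hypothetical bands in Hz: low, middle, high.
BANDS = [(0, 500), (500, 2000), (2000, 4000)]

bass_a = band_energies(tone(125), 8000, BANDS)
bass_b = band_energies(tone(250), 8000, BANDS)
high = band_energies(tone(3000), 8000, BANDS)

print(cosine(bass_a, bass_b) > 0.99)  # True: both are dominated by low frequencies
print(cosine(bass_a, high) < 0.01)    # True: energy sits in different bands
```

Two bass-heavy signals come out as nearly identical, while a bass-heavy and a high-pitched one come out as unrelated, without any genre or artist label involved.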

And your interactions with a specific track will be taken into account when selecting or filtering out similar ones. Liked it, disliked it, listened to it all the way through, skipped it, searched for it intentionally - both direct interactions and indirect signs are taken into account.
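How such signals might be combined can be sketched as a weighted sum per track. The weights and names below are purely hypothetical; the only idea taken from the text is that direct signals (like, dislike) and indirect ones (listened to the end, skipped, searched for) all feed into the decision.

```python
# Hypothetical weights: direct signals count more than indirect ones.
SIGNAL_WEIGHTS = {
    "liked": 3.0,
    "searched": 2.0,
    "played_to_end": 1.0,
    "skipped": -1.0,
    "disliked": -3.0,
}

def track_score(events):
    """Sum the weights of all recorded interactions with one track."""
    return sum(SIGNAL_WEIGHTS.get(event, 0.0) for event in events)

def split_by_preference(interactions):
    """Return (boost, filter_out): tracks with positive and negative total scores."""
    boost, filter_out = [], []
    for track, events in interactions.items():
        score = track_score(events)
        if score > 0:
            boost.append(track)
        elif score < 0:
            filter_out.append(track)
    return boost, filter_out

# Hypothetical interaction log for one listener.
interactions = {
    "Track A": ["searched", "played_to_end", "liked"],
    "Track B": ["skipped", "skipped"],
    "Track C": ["played_to_end", "skipped"],
}
print(split_by_preference(interactions))  # (['Track A'], ['Track B'])
```

Tracks similar to those in the first list would be favored, tracks similar to those in the second would be filtered out, and tracks with mixed signals such as Track C stay neutral.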

Still in a Bubble?

Let's return to where we started: are we stuck in an information bubble? No, recommendation services gain nothing from trapping you in one. On the contrary, they try to find a way to show you something new, something you haven't seen or heard yet. Not just anything, though, but something you are likely to enjoy. That is because services primarily compete with each other for the most valuable resource - our time. If they constantly offer the same thing, users will get bored and stop using them; if they often miss the mark with recommendations, users will shrug and go to a competitor.