We all use a variety of online services like Youtube, Netflix, and Amazon. And we get suggestions that match our preferences and amazingly fit our tastes. This is a convenient tool that certainly makes our lives simple when we are out of ideas.

But have you ever wondered how these sophisticated systems recommend items to us, the users? Have you ever heard about content based recommendation? In this article, we will explain thoroughly and discuss recommender system types and how they work, and we will provide deep insight into content based filtering. 

It is unnecessary to discuss the fact that artificial intelligence (AI) and machine learning (ML) have become mainstream. The development of computer systems has led to the evolution of processes that use algorithms to analyze and draw inferences from collected data without following explicit instructions. Many industries, including the energy, medical, and financial sectors, use these systems to perform different tasks and analyze facts to make critical decisions.

Among the myriad uses, the AI and ML power recommender systems. These systems make recommendations for products, services, or content that users might like while shopping online, choosing a movie or song to listen to, or browsing through the news article section.

Customers appreciate when businesses perceive their liking and value recommendations highlighting items, products, and features that match their interests. That is the exact reason why companies use recommendation systems to produce customized suggestions and provide a personalized experience to the buyer, improve customer retention and brand loyalty, and ultimately, grow revenue.

What Is a Recommender System?

Recommender systems are machine learning algorithms that generate relevant recommendations for items that match user preferences and might interest them. These systems are software tools and algorithms that analyze available data to provide suggestions for items that users could find useful. Depending on how they analyze the data sources to develop concepts about the affinity between users and items and to identify well-matched pairs, there are different recommender systems. 

Our daily online trips, from e-commerce to online advertising, are closely related to some recommendation engines. When we search for something in the search engine, the industry-strength recommender systems make suggestions, and some real-world examples are recommendations for books on Amazon, movies on Netflix, and songs on Youtube.

This feature makes recommendation systems a powerful tool for Internet users and service providers. On the one hand, the business’s sales, profits, and revenues are improved; on the other hand, users get customized and accurate recommendations, thus saving time and energy and reducing transaction costs of finding and selecting items online.

In order to make recommendations for a particular user, the recommender system analyzes collected data from different data sources, such as user evaluations and suggestions, user or item descriptions or constraints, social relationships, past purchases, search history, and user activities.

Importance of Using Recommender Systems

On the Internet, there is an immense number of choices. In the abundance of options, there is an inevitable need to filter, prioritize and effectively deliver relevant information to the user. Too many Internet users and items available for those users have led to information overload.

Recommendation engines solve this problem by searching through the large volume of dynamically generated information and providing suggestions for personalized content and services.

The need to use effective and accurate recommendation techniques within a system that will provide relevant user recommendations cannot be over-emphasized. When there are many options to choose from, it’s natural to be confused and indecisive. Recommendation systems proved to be effective in the decision-making process and quality.

Based on the browsing and purchasing history, patterns, and other user activity data, the recommendation system eliminates the options that do not align with the user’s taste and past behavior. The information filtering systems predict whether the particular user would like an item or not based on the user’s profile, which is built on the user’s preferences, interests, or observed behavior about the item.

Types of Recommender Systems

The design of recommendation engines depends on the domain and the particular characteristics of the available data. Moreover, these systems differ in how they analyze these data sources and find patterns in consumer behavior data, which can be collected implicitly (information collected from activities such as web search history, clicks, cart events, search log, and order history) or explicitly (information gathered from customer input, such as reviews and ratings, likes and dislikes, and product comments).

There are three primary types of recommender systems:

  • Content-Based Filtering

uses similarities in item features, services, or content, as well as data accumulated (such as previous actions and explicit feedback) about the user preferences, to make recommendations.

  • Collaborative Filtering 

relies on similar users’ reactions and preferences to filter out items and offer recommendations that the particular user might like.

  • Hybrid Recommender Systems 

Combine two or more recommendation strategies in different ways and benefit from their complementary advantages to make recommendations.

In this article, we will discuss about content-based recommender systems and how they work, and we will define several important terms, as well as assets and liabilities of content based filtering.

What Is Content-Based Filtering and How Does It Work?

Content based filtering recommender systems are unique types of information filtering systems that deliver recommendations for items selected from an extensive collection based on the user’s activity and user preferences. 

The content based recommender system is a domain-dependent algorithm that analyzes the keywords and attributes assigned to items in the database to generate predictions that the user will likely find useful. The user profile is created based on data derived from the user’s actions while connected to the Internet, such as purchases, ratings (likes and dislikes), downloads, items searched for on a website, added to favorites or placed in a cart, and clicks on product links.

The recommendations are based on the user profile by utilizing features extracted from the content of the items the user has evaluated. The filtering system classifies the unseen items into a positive class (relevant to the user) or a negative class (irrelevant to the user). Products or services related to positively rated items are recommended to the user. Based on the item information, represented as attributes, the algorithm analyzes the similarities between items and correlations between contents. That being said, content based filtering is considered one of the most successful recommendation techniques.

For example, suppose a user bought a smartphone from a website and searched for headphones, wireless chargers, and accessories. In addition to keywords such as the smartphone manufacturer and model, the user profile also generates data about previous purchases for phone holders with sleeves for credit cards. Based on the available information, the recommender system may suggest phone holders with attributes such as RFID-blocking fabric that prevents unauthorized credit card scanning. Also, based on previous activity, the recommender system may suggest a phone mount for a car vent, a waterproof phone case, or cell phone armbands. The user might search for different phone holders, but certain features may be something they didn’t expect yet appreciate and find helpful.

Content Based Recommendation Filtering Techniques

In order to find similarities and generate meaningful recommendations, content based filtering uses different models, including Vector Space Model such as Term Frequency Inverse Document Frequency (TF/IDF) or Probabilistic models such as Naïve Bayes Classifier, Decision Trees or Neural Networks. These models analyze the underlying model with statistical analysis or machine learning techniques.  

The most commonly used methods for content based filtering are the vector spacing method (method 1) and classification model (model 2), both of which use different models and algorithms.

Method 1: The Vector Space Method

Suppose a user has watched a thriller movie with Tom Cruise and a comedy movie. The user reviewed the thriller movie as good and the comedy as bad. Based on the information provided by the user, the rating system is made, and from 0 to 9, the crime thriller and detective genres are ranked 9, and the comedy movies are ranked 0. With this information, the next recommendation for this particular user will be for crime thriller genres.

For this ranking system, a user vector is created, which ranks the information provided by the user. And then, an item vector is created where movies are ranked according to their genres. With the vector, each movie is assigned a certain value by multiplying and getting the dot product of the user and item vector, and the value is used for recommendation.  

Method 2: Classification Method

In the classification method, the system makes recommendations based on the decision tree and decides whether the user wants to watch a movie or not.

For example, a movie is considered: The Notebook (romance/ drama). Based on user data, Tom Cruise is not a cast actor, and the genre is not a thriller, nor is it the type of movie the user ever reviewed. With these classifications, the system concludes that this movie shouldn’t be recommended to the particular user.

Important Terms and Definitions

Assigning Attributes

Content based filtering relies on assigning attributes to database objects and thus enables the algorithm to analyze each object and generate the most similar items. These features predominantly depend on the products, services, and content the system recommends.

For the content based recommendation system to generate relevant suggestions, creating an item profile with different features representing each item’s essential qualities is crucial.

Assigning attributes can be a gigantic project that undertakes time and a specific set of skills. Many companies employ subject-matter expert teams to assign attributes to each item manually. Additionally, the attributes must be added and updated constantly to keep the recommendations accurate, which is a daunting task for brands with a high volume of products.  

Attributes are placeholders associated with an individual entity. An attribute describes an entity such as name, size, color, type, condition, etc., and can be either a numeric value or character descriptor (single word). Moreover, the attributes themselves must be precise and error-free. Labeling a phone holder as “black” is easy, but to be recognized by the algorithm, more complex content is required to label each item correctly.

Building a User Profile

Another crucial element for content based recommender systems is the user profile. A user profile is created with data collected from the processes of finding, extracting, integrating, and identifying keyword-based information from various sources (user interactions on the web) to generate a structured profile.

The profiles contain valuable information about the user’s preferences and tastes based on database objects the user has interacted with. The vectors that define the user’s preferences consist of user activities such as purchasing history, browsing history, user ratings, number of clicks on different items, thumbs up or thumbs down on content, what the user has read, watched, or listened to, and their assigned attributes. 

The database is loaded with attributes, and the ones appearing across multiple objects are weighted more heavily than those appearing less often. Consequently, not all attributes are equal to the user, which helps to establish a degree of importance. Besides, user feedback is critical when considering items, which is why many websites ask users to rate products, services, or content explicitly.

Based on attribute weightings, histories, and user preferences, the recommender system creates a unique model for each user. The model consists of attributes the user likes or dislikes, and the suggestions are weighted by importance. Then, the algorithm compares the user model against all database objects and assigns scores based on similarity with the user profile.

The user profiling information helps recommender systems suggest new and personalized items to the user. By using relevant information, the engines solve issues such as the classification and ranking of items in accordance with the individual’s interest. 

For example, let’s say that the user listened to “Shake It Off” by Taylor Swift, “All About That Bass” by Meghan Trainor, and “7 Rings” by Ariana Grande. The recommender system might recognize that this particular user likes female pop artists and dance songs. The user will likely receive recommendations for more dance songs by these and other female pop artists, such as Katy Perry and “California Gurls.”

content based recommendation

The recommender system may also suggest different types of songs by Katy Perry because it appears that the user likes female pop artists. However, if the user didn’t choose to listen to female pop artists or dance songs before, these selections would receive a lower assigned score.

Utility Matrix

A utility matrix is a common tool used to summarize interaction information between the user and the preferred items. Data gathered from the daily interactions of the user is saved in a structured format, and the algorithm analyzes the likes and dislikes of different items the user has interacted with. Each interaction has an assigned value called a ‘degree of preference.’

Namely, a recommender system has two entities – users and items. Let’s say there are m users and n items. The goal of the recommender engine is to build an mxn matrix (called the utility matrix) which consists of the rating (or preference) for each user-item pair. Initially, this user-item interaction matrix is sparse due to the limited number of ratings for user-item pairs, and additionally, the process is not straightforward because multiple users interact with many different items.

Once there is data about the user’s liking, the system can embed the user in an embedding space using the feature vector generated and recommend items according to the user’s choice. During recommendation, the similarity metrics are calculated from item vectors and preferred user vectors from the user’s previous records. Then, the recommender system suggests the top few items.  

For a more detailed explanation of the utility matrix and how content based recommendation works, you can check the video,

Content-Based Filtering: Advantages and Disadvantages

Why Use Content-Based Filtering?

Content based filtering has many advantages, including:

No Data From Other Users Is Required to Start Making Recommendations

Content based recommender engines don’t need data from other users to create recommendations for a particular user. Once the user has searched a specific term, browsed a few items, or purchased products, the content based recommender can begin producing relevant recommendations. This feature is ideal for businesses with few users to sample. Also, it works great for sellers with few user interactions in specific categories or niches.   

Recommendations Are Highly Relevant to the User

The recommender system produces suggestions based on the user’s day-to-day activities, meaning all the preferences and parameters of the recommendations are finely tuned to the user’s choice. Content based recommendations are tailored to the specific user’s interests, including recommendations for niche items. This method relies on matching the characteristics of attributes of a database object to the user profile and recognizing the specific preference and tastes of the user. It is hugely beneficial for businesses with extensive libraries that contain a single product type, and the suggestions need to be based on discrete features.  

Recommendations Are Transparent to the User

Highly relevant recommendations based on the previous activity of the specific user create a sense of transparency and impartiality. Suggestions for items that precisely match the user’s taste strengthen the trust level in the offered recommendations. In comparison, the collaborative filtering method might produce recommendations that the specific user might not completely understand. For example, if a group of people purchased trainers and an umbrella, the collaborative system may recommend an umbrella to other users who brought trainers, although they are not interested in this product (never browsed or purchased it).   

Avoid the “Cold Start” Problem

Although content based filtering needs initial inputs from users to start making relevant recommendations, the system will generate relevant early recommendations based on specific terms and interactions. Additionally, new items will be suggested as soon as they are launched, without waiting for a census (enumeration), since the system will recognize the users needing the new items. Comparatively, collaborative filtering faces an issue when a new website has few new users and lacks user connections.

Challenges of Content-Based Filtering

There is no perfect recommender system that can cover all aspects and requirements of businesses. The few common disadvantages of content based filtering are: 

Lack of Novelty and Diversity

Content based recommendation system generates suggestions based on relevance. If a user likes American Pie and prefers comedy movies, chances are that the specific user will like Dumb and Dumber. However, likely, the user won’t need a recommender system to suggest another comedy movie.

Scalability Is a Challenge 

Each new item, product, service, or content must be assigned attributes, defined, and tagged. The continuous need for attribute assignments is a burdensome task that makes scalability difficult, time-consuming, and expensive in the long term. 

Attributes May Be Incorrect or Inconsistent

Content based recommender works effectively as long as the subject-matter experts tag items effectively and assign the attributes. This technique depends on the item’s metadata, meaning the system needs detailed descriptions of items and a well-organized user profile. Millions of items need attributes assigned; additionally, since the attributes can be subjective, some may be incorrect or inaccurate. 


What is a content based recommendation?  

Content based recommendation is a system that makes suggestions for items based on the user’s activity and preferences. The content based filtering analyzes keywords and attributes assigned to items in the database and generates predictions that the user will likely find helpful. The user profile is created by surveying data derived from the user’s actions while connected to the Internet (purchases, ratings, downloads, search history, items added to favorites, clicks on product links, etc.)  

How does a content based recommendation system work?

The content based recommender system utilizes features extracted from the content of items evaluated by the user. Namely, the filtering system classifies the unseen items into two classes: positive (relevant to the user) and negative (irrelevant to the user). The products and services related to positively rated items are recommended to the user.    

What are the advantages and disadvantages of content based filtering?

Some of the most significant positive traits of content based filtering is that this recommender system generates suggestions without needing data from other users, making the recommendations highly relevant to the user. Additionally, by recommending items and products related to the previous activity, the user gets a sense of transparency and precision. Content based recommendation avoids the “cold start” problem because, with several initial inputs from the user, the system will start producing relevant recommendations. On the downside, content based filtering lacks novelty and diversity, often scalability is a challenge, and based on subjectivity, the attributes may be incorrect or inconsistent.