Topic Modeling for Product Managers

What is Topic Modeling?

Topic modeling is a type of natural language processing (NLP) used to find “topics,” or commonly occurring words or groups of words, within a set of documents. Topic models are critical to product managers because they enable them to sort and analyze the huge amounts of text data with which they have to work. Product managers need topic models for several tasks, such as:

  • Analyzing a large set of reviews to learn what customers are saying about their products.
  • Understanding what product features customers are talking about.
  • Gaining insights into what new features customers desire.
  • Learning about customers’ opinions of their product via social media.

In a way, topic modeling is similar to sentiment modeling: both are NLP algorithms for analyzing large volumes of text. While sentiment models seek to determine if a piece of text is positive or negative, topic models seek to determine what a piece of text is about. Topic models work by searching a set of documents for the most commonly occurring words or phrases in each document and then cross-correlating to determine which words and phrases are most common both within each document and among all the documents. As we are exploring topic modeling for product managers specifically, it’s important to address which type of learning should be applied.
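To make the frequency idea above concrete, here is a minimal Python sketch. It is an illustration only, not how any particular topic modeling library works: the function names, the tiny stop-word list, and the sample reviews are all assumptions made for this example. It counts words within each document, then keeps the words that also appear across several documents.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "is", "a", "and", "but", "to", "of", "i", "it", "my", "on"}


def tokenize(text):
    """Lowercase the text and keep words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def candidate_topics(documents, min_docs=2, top_n=5):
    """Rank words by total count, keeping only words found in at least min_docs documents."""
    total = Counter()     # how often each word occurs across all documents
    doc_freq = Counter()  # how many documents each word occurs in
    for doc in documents:
        counts = Counter(tokenize(doc))
        total.update(counts)
        doc_freq.update(counts.keys())
    shared = {w: c for w, c in total.items() if doc_freq[w] >= min_docs}
    # Sort by count (descending), then alphabetically for a stable order.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Made-up sample reviews, for illustration only.
reviews = [
    "The battery drains fast and the battery gets hot.",
    "Great screen, but battery life is short.",
    "The screen cracked on day one.",
]
print(candidate_topics(reviews))  # prints [('battery', 3), ('screen', 2)]
```

Real topic models go well beyond raw counts, but the sketch shows the two signals described above: how often a word appears within a document and how widely it appears across documents.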

The Difference Between Unsupervised and Supervised Learning

Most topic modeling algorithms are trained using unsupervised learning. “Unsupervised” simply means the algorithm is trained on a large set of unlabeled data. Unsupervised learning offers many benefits:

  • Fast training and computation time.
  • Robust topic selection not limited by pre-defined topics.
  • Greater model accuracy in finding all relevant topics.

However, despite these benefits, unsupervised learning can return inaccurate or irrelevant results when analyzing data about a specific product.

In contrast to unsupervised learning, supervised learning trains the topic model on labeled data. That is, you give the model a set of predetermined topics to detect. Supervised learning is more time-consuming and risks missing key topics outside the ones chosen, but it has its own key benefits:

  • The ability to search for specific topics.
  • The ability to choose how many topics to search for.
  • Greater model accuracy in finding the right topics.
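As a rough illustration of training on labeled data, the Python sketch below fits a small naive Bayes classifier on reviews that have already been tagged with predetermined topics. Naive Bayes is just one possible technique, chosen here for brevity. The class name, the topic labels ("battery", "screen", "pricing"), and the example reviews are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


class TopicClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # word counts per topic
        self.label_counts = Counter(labels)      # number of documents per topic
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.label_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up labeled reviews; the topics are chosen up front by the product manager.
training_texts = [
    "battery drains too fast",
    "battery gets hot while charging",
    "screen cracked after a week",
    "screen is too dim outdoors",
    "checkout price is too high",
    "price went up again this year",
]
training_labels = ["battery", "battery", "screen", "screen", "pricing", "pricing"]

model = TopicClassifier().fit(training_texts, training_labels)
print(model.predict("the battery died fast"))  # prints battery
print(model.predict("price is too high"))      # prints pricing
```

Because the set of topics is fixed by the labels, the model can only ever return one of them, which reflects both the benefit (focused results) and the drawback (topics outside the chosen set are missed) described above.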

The Benefits of Supervised Learning for Product Managers

For product managers, the most effective way to train a topic model is supervised learning. For a task such as writing a research paper, unsupervised learning works well because it discovers the most common topics and the context in which they are used. A product manager, on the other hand, already knows which topics to search for. Rather than general knowledge about the documents’ topics, a product manager needs data specific to their product, and will get the most useful results from a topic model trained on that information. The benefits of using supervised learning for topic modeling for product managers are:

  • Supervised learning provides better results when working with industry and news documents.
  • Supervised learning limits the scope of the topics modeled to only the topics relating to the specific product.
  • Supervised learning leads to topics relevant to the product manager’s use case, such as review topics vs. feature topics.

In summary, topic models provide an NLP algorithm for finding the most common words and phrases across a large set of documents. Topic models can be trained via unsupervised or supervised learning. While both types of learning have their benefits, product managers stand to gain the most by using supervised learning for their topic models.

Ready to get started? Check out our post on 10 Best Practices for Storing Labeled Data to make sure you have a strong, well-organized data set for your supervised topic model.
