14.7 Data
Many of the most popular technology products and platforms are built on personalized or predictive technologies. Amazon suggests other products you might purchase based on your shopping history. Google personalizes your search results based on sites you frequently visit. Spotify recommends music based on your listening habits. These products are built by processing massive amounts of data.
Analyzing a product's user data can help companies predict user behavior and identify points of influence that can nudge users toward desirable behaviors. For instance, you may be able to discern that users who see a sales promotion at a specific point in their interaction are more likely to buy, or you may discover that once a user adds 10 client records, they're 75% more likely to become a multiyear subscriber.
Predictive technology can also help companies outside of their core products. For example, Salesforce tracks visitors to its website and has developed algorithms that predict the likelihood that these visitors will buy its product. This information lets its sales team prioritize the most promising prospective clients from among all site visitors.
In this checkpoint, you will learn more about how personalized and predictive technology works. You'll discover what buzzwords like machine learning, artificial intelligence, and data science mean and consider how they might be useful in developing products.
By the end of this checkpoint, you should be able to do the following:
- Explain these terms: data science, AI, personalization, and machine learning

Data science
Data science is the practice of collecting data from many different sources, organizing it, identifying relevant questions, and communicating findings in a way that improves business decisions. In the product management context, this focuses on tracking and analyzing demographics, psychographics, and user events. The data is used to improve user experience, inform product decisions with quantitative data on how the product is used, and support other teams within your company (such as tech support, operations, marketing, and sales).
Data science methods are drawn from a combination of technical and soft science fields of research. These include computer science, statistics, user experience, psychology, and sociology. Data analysts and data scientists use databases, spreadsheets, programming languages, and business intelligence (BI) tools. The ability to capture and perform complicated analyses on huge amounts of data has dramatically shifted what you can expect to understand about users and products.
Depending on the size and structure of your organization, you may have colleagues solely focused on data science or product analytics. Or, if you end up working in a product role at a smaller company, that analysis may be your responsibility. Regardless, it will be valuable to have a handle on basic data science concepts and data collection methods.
IBM has some of the best videos on data science.
Applications of data science
Now take a look at the different ways that product managers use data science.
Understanding user behavior
All product managers use data to understand how user actions, demographics, and psychographics impact product use and success. Imagine, for instance, that your product has an e-commerce flow with a 20% conversion rate (meaning that 20% of site visitors make a purchase). By looking at sales flow data, you're able to see that new users convert much better than old ones (in other words, if people don't buy initially, they are less likely to buy). You also recognize that people who use coupon codes convert best of all and that men are more likely to purchase the first time they view a product while women usually view a product at least twice before purchasing.
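The segment comparison described above can be sketched in a few lines of Python. This is only an illustration: the field names (`is_new`, `used_coupon`, `purchased`) and every record are invented, and a real analysis would read from your analytics store rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical visit records; every value here is invented for illustration.
visits = [
    {"is_new": True,  "used_coupon": True,  "purchased": True},
    {"is_new": True,  "used_coupon": False, "purchased": True},
    {"is_new": True,  "used_coupon": False, "purchased": False},
    {"is_new": False, "used_coupon": True,  "purchased": True},
    {"is_new": False, "used_coupon": False, "purchased": False},
    {"is_new": False, "used_coupon": False, "purchased": False},
    {"is_new": False, "used_coupon": False, "purchased": False},
    {"is_new": False, "used_coupon": False, "purchased": False},
]


def conversion_rate_by(records, key):
    """Return {segment value: share of visits that ended in a purchase}."""
    totals = defaultdict(int)
    purchases = defaultdict(int)
    for record in records:
        segment = record[key]
        totals[segment] += 1
        if record["purchased"]:
            purchases[segment] += 1
    return {segment: purchases[segment] / totals[segment] for segment in totals}


# Overall conversion rate, then the same metric split by two segments.
overall = sum(r["purchased"] for r in visits) / len(visits)
by_new = conversion_rate_by(visits, "is_new")
by_coupon = conversion_rate_by(visits, "used_coupon")
```

Comparing `by_new` and `by_coupon` against `overall` is the kind of check that surfaces findings like "new users convert better" or "coupon users convert best."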
Data can also help you spot specific factors or goals that indicate user success or failure. Early in Facebook's history, the company famously noted that users who connected with 7 friends in their first 10 days on the site were very likely to stay on Facebook. As a result, Facebook's user experience (UX) is intensely focused on importing, finding, and adding friends, especially for new users. That early finding still shapes Facebook's user experience.
Informing sales and marketing
Acquiring product users can be very difficult and expensive. Marketing teams want to ensure that they spend their advertising budget in the most effective way possible. And a sales team wants to spend its time on the prospective clients most likely to buy or make large contract commitments.
Data science can help these teams identify the best user segments to target. Segmentation is a common sales and marketing term for dividing the market of potential customers into groups (segments) based on characteristics such as age, income, personality traits, and behavior. Data can also identify characteristics or actions that indicate strong interest or potential success as well as those that indicate the opposite. For example, a visitor that goes to a site's Careers page is a far less likely sales target than the visitor that goes to the Enterprise Products page. Capturing and analyzing this type of data offers a fuller picture for identifying valuable sales leads.
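One simple way to act on signals like these is a rule-based lead score. The sketch below is a minimal illustration, not a description of any real sales tool: the page paths, the weights in `PAGE_WEIGHTS`, and the visitor data are all hypothetical.

```python
# Hypothetical page weights: positive values suggest buying interest,
# negative values suggest the visitor came for another reason.
PAGE_WEIGHTS = {
    "/enterprise-products": 5,
    "/pricing": 3,
    "/careers": -4,
}


def lead_score(pages_viewed):
    """Sum the weights of the pages a visitor viewed; unknown pages count as 0."""
    return sum(PAGE_WEIGHTS.get(page, 0) for page in pages_viewed)


# Invented browsing histories for two visitors.
visitors = {
    "visitor_a": ["/", "/enterprise-products", "/pricing"],
    "visitor_b": ["/", "/careers"],
}

# Rank visitors so the sales team sees the most promising leads first.
ranked = sorted(visitors, key=lambda v: lead_score(visitors[v]), reverse=True)
```

In practice, the weights would come from analyzing which behaviors actually preceded past purchases rather than from guesses.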
Making products personalized
Personalization includes simple modifications tailored to the user. For example, a retail site will offer summer fashions instead of sweaters in December if the user is in Australia. Personalization is also used to make predictions based on the user's past behavior. If you frequently type the word potato on your cell phone, the next time you start typing po, your phone could suggest potato; this is called autocompletion. Other users who type po might see possible because they use that word more frequently. In other words, personalization means products adapt to a unique individual's actions and choices.
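The autocompletion example can be sketched as a per-user word counter. This is a deliberately simplified illustration (the `PersonalAutocomplete` class is hypothetical, and real keyboards use far richer language models), but it shows how two users can get different suggestions for the same prefix.

```python
from collections import Counter


class PersonalAutocomplete:
    """Suggest the word this user has typed most often for a given prefix."""

    def __init__(self):
        self.word_counts = Counter()

    def record(self, word):
        """Remember that the user typed this word once more."""
        self.word_counts[word.lower()] += 1

    def suggest(self, prefix):
        """Return the user's most frequent word starting with prefix, or None."""
        prefix = prefix.lower()
        matches = {w: c for w, c in self.word_counts.items() if w.startswith(prefix)}
        if not matches:
            return None
        # Highest count wins; ties are broken alphabetically for determinism.
        return min(matches, key=lambda w: (-matches[w], w))


# Two users with different (invented) typing histories.
alice = PersonalAutocomplete()
for word in ["potato", "potato", "potato", "possible"]:
    alice.record(word)

bob = PersonalAutocomplete()
for word in ["possible", "possible", "potato"]:
    bob.record(word)

alice_suggestion = alice.suggest("po")
bob_suggestion = bob.suggest("po")
```

Here `alice_suggestion` is `"potato"` and `bob_suggestion` is `"possible"`, because each suggestion depends only on that user's own history.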
Creating predictive recommendations
Many products aggregate data across users to make decisions about what an individual will do. For example, say Target has data that indicates shoppers frequently buy paper towels and tissues together. If you add paper towels to your cart, Target's website would next recommend tissues to you. These recommendations are created by analyzing the behaviors of many users and using them to predict what any single user is likely to do.
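A minimal sketch of this idea counts how often items appear together in past orders and recommends the most frequent companion. The orders below are invented, and this co-occurrence count is just one simple approach; it is not a description of how Target or any other retailer actually builds recommendations.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented order history: each order is the set of items bought together.
orders = [
    {"paper towels", "tissues"},
    {"paper towels", "tissues", "soap"},
    {"paper towels", "soap"},
    {"tissues", "shampoo"},
    {"paper towels", "tissues"},
]

# co_counts[a][b] = number of orders that contained both a and b.
co_counts = defaultdict(Counter)
for order in orders:
    for a, b in combinations(sorted(order), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1


def recommend(item, top_n=1):
    """Return up to top_n items most often bought together with item."""
    companions = co_counts.get(item, Counter())
    return [other for other, _ in companions.most_common(top_n)]


suggestions = recommend("paper towels")
```

With this data, `suggestions` is `["tissues"]`: tissues appear alongside paper towels in three orders, more than any other item.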
Storing data
Data scientists depend on products to collect and track all the information needed to do their work. For example, if a data scientist wants to understand the factors that lead users to pay for a subscription to Slack, the following information would be valuable:
- All the site pages users viewed leading to their subscription
- How often the team's Slack account was used
- What add-ons the users plugged into their Slack account
- The roles of the purchaser (like the CTO or VP of product) and other team members using Slack
To analyze this information, the product must first be configured to track and store all of this data. Many analytics products have their own system for tracking and storing user events—actions that users take. For example, Mixpanel can track clicks or page loads according to your specifications for visitors on your website or mobile app. There are also aggregation tools, like Segment, that can bring together insights from various analytics tools—such as Mixpanel, Google Analytics, and your own databases.
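To make "tracking and storing user events" concrete, here is a small sketch of an in-house event log. It is not the Mixpanel or Segment API; the `UserEvent` and `EventLog` names, the event names, and the properties are all hypothetical and only illustrate what an event record typically carries: who did it, what they did, when, and any extra details.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class UserEvent:
    """One action taken by a user, with optional descriptive properties."""
    user_id: str
    name: str
    properties: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventLog:
    """In-memory event store; a real product would write to a database."""

    def __init__(self):
        self._events = []

    def track(self, user_id, name, **properties):
        """Record an event and return it."""
        event = UserEvent(user_id, name, properties)
        self._events.append(event)
        return event

    def count(self, name):
        """Return how many events with this name have been recorded."""
        return sum(1 for e in self._events if e.name == name)

    def to_json_lines(self):
        """Serialize all events, one JSON object per line, for later analysis."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


log = EventLog()
log.track("u1", "page_view", path="/pricing")
log.track("u1", "page_view", path="/add-ons")
log.track("u1", "subscription_started", plan="team")
exported = log.to_json_lines()
```

The important design point is that the event names and properties have to be decided before launch; anything not captured by `track` at the time cannot be reconstructed later.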
As a product manager, it would be your job to think ahead and consider what information your product should track. This could make all the difference after you launch; you can't analyze events that happened if you didn't capture that data. Collaborate with your stakeholders—from marketing, tech support, account management, engineering, etc. Understand the data that your organization needs and ensure that your plans include all that is needed to track it and establish databases to power future analytics.

Artificial intelligence
Artificial intelligence (AI) is technology that can be made to "think" or "understand" in ways that mimic human thinking. It's not expected to be exact or perfect, however; AI makes some pretty spectacular mistakes. Voice assistants like Siri, Alexa, or Google Assistant are great examples of AI in action. Users ask a question, and the software attempts to understand the intent of the request and provide a response (as information or an action).
There are two types of AI: generalized and applied. Generalized AIs respond to open inquiries. The aforementioned Google Assistant, Alexa, and Siri attempt to respond to any question posed to them. Applied AI is more common and focuses only on specific areas or tasks. Applied AI can be found in stock trading or self-driving automobiles. The "intelligence" programmed into or "learned" by the technology is more sophisticated and in-depth but is narrow in scope.
Machine learning
Machine learning (ML) is a special kind of AI. It automates building models that explain and predict data for specific purposes. By building these models, the technology "learns" without developers explicitly programming each rule.
ML systems can learn and adapt to changes over time, which makes them useful when you need accurate decisions derived from a huge amount of data. For example, systems that handle email spam filtering receive tons of data from users who have marked individual messages as spam or not spam. This data is used to automate the filtering process; the system uses past user actions to decide which emails are most likely spam and which aren't.
It is tempting to have humans analyze the data and build a system that uses that analysis to decide which emails are or aren't spam, but there are two problems with that approach. First, it's really hard to analyze many factors at once; there are too many variables to take into account in the initial analysis. And even if analysts could account for all the factors, the math to apply all of them when filtering each incoming email would be incredibly complicated. Second, factors change. The analysis for filtering would have to be updated constantly to account for the shifting tactics of spammers.
Instead of programming a system to filter the incoming messages, ML software is programmed to recognize patterns and apply what it has learned. Imagine a spreadsheet with a row for every email with information from that email that includes the date, subject line, sender, and all the technical bits and fields that users don't see in their email client software. Each email also includes information about whether or not the recipient marked it as spam.
When all this data is fed into a machine learning system, the program figures out which factors in which combinations lead to an email being marked as spam or not. Maybe there's one email address whose emails are often marked as spam. Or there's a series of email subjects with terms like discount, ray ban, and r4y b4n that are frequently marked as spam.
Machine learning systems come up with multiple solutions at the same time—much faster than any individual could capture and process data. The system isn't programmed with the factors to consider or with the method of assembling the solution; it is programmed to identify the factors and build the solutions by itself.
These solutions are called models. Each model can identify the statistical probability that an email will or won't be marked as spam based on its characteristics. The model can check past emails to see which ones weren't marked as spam but that should have been. It can also identify which incoming emails are likely to be spam. Models also update over time; if there's a new email spam trend that's occurring, the model can identify the pattern, update its data for judging if an email is spam, and mark future emails correctly.
How does a machine "learn"? A machine learning system takes training data that includes all the factors that could go into making a decision, along with what the decision should be (for instance, whether or not an email is spam). Given both the input data and the expected conclusion, ML produces a model to predict outcomes without the decision-making process being manually programmed. As a type of applied AI, a model is good for only specific data and only specific decisions (for example, it can filter email spam, but nothing more).
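The training process described above can be sketched with a tiny naive Bayes classifier written in plain Python. This is one simple technique chosen for illustration, not the method any particular email provider uses. The subject lines and labels are invented, and only the subject text is used as input, whereas a real filter would use many more fields.

```python
import math
from collections import Counter

# Invented training data: (subject line, was it marked as spam?).
training_data = [
    ("discount ray ban sunglasses", True),
    ("r4y b4n discount today", True),
    ("huge discount on watches", True),
    ("meeting notes for today", False),
    ("lunch on friday", False),
    ("project notes and schedule", False),
]


def train(examples):
    """Count how often each word appears in spam and non-spam subjects."""
    word_counts = {True: Counter(), False: Counter()}
    label_counts = Counter()
    for subject, is_spam in examples:
        label_counts[is_spam] += 1
        word_counts[is_spam].update(subject.lower().split())
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return {"words": word_counts, "labels": label_counts, "vocab": vocabulary}


def spam_probability(model, subject):
    """Estimate P(spam | subject) with naive Bayes and add-one smoothing."""
    total_examples = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label in (True, False):
        total_words = sum(model["words"][label].values())
        # Start from the share of training emails with this label.
        score = math.log(model["labels"][label] / total_examples)
        for word in subject.lower().split():
            count = model["words"][label][word]
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((count + 1) / (total_words + vocab_size))
        log_scores[label] = score
    # Convert the two log scores into a probability for the spam label.
    top = max(log_scores.values())
    spam = math.exp(log_scores[True] - top)
    ham = math.exp(log_scores[False] - top)
    return spam / (spam + ham)


model = train(training_data)
p_spam_like = spam_probability(model, "ray ban discount")
p_normal = spam_probability(model, "meeting notes")

# Retraining with a newly reported example lets the model pick up a new trend.
updated_model = train(training_data + [("crypto giveaway", True)])
```

Notice that nothing in the code names "discount" or "r4y b4n" as suspicious. The model arrives at those associations from the labeled examples, and retraining on new examples (as with `updated_model`) shifts its predictions without anyone rewriting rules.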
Machine learning has endless applications. Here are a few notable ones:
- Image recognition to decide if a photo has a specific object or not
- Sentiment analysis for detecting if text content is positive or negative
- Speech recognition to convert spoken words into text and infer the speaker's intent
- Banking and finance to make decisions on loans, buying, or selling assets
- E-commerce recommendations about what a purchaser is likely to buy next
- Sales and marketing to predict which users are most likely to become buyers
AI versus ML
While closely related, ML and AI have differences. ML is used to create a self-updating, decision-making machine for a specific task. AI addresses how a system responds to or navigates more complex problems. It's the difference between detecting a spam email (ML) and running self-driving cars (AI). Filtering spam is a single yes-or-no decision that may need adaptive models to change how the decision is made over time. Driving a car involves making many decisions at once, while the basic rules of driving change far more slowly than spammers' tactics do.

The tech of personalization
Given the number of supplies a new parent is likely to purchase, a savvy retailer will tailor its marketing and show targeted products to users it knows are expecting a child. Personalization and recommendation systems combine data known about a user with actions that are likely to provoke the desired outcome for the seller. Companies can see large upticks in their conversion rates and KPIs when they provide users a more targeted experience. These shifts can be subtle enough that users don't realize they are receiving a personalized experience.
Here are some examples of personalization.
- Showing you recommended items to purchase. This is based on your past purchases or the purchases of people who bought identical or similar items to what you're currently considering.
- Offering frequently chosen items. Uber or Lyft will offer you destinations that you have traveled to frequently in the past.
- Showing relevant search results. For instance, searches are smart enough to know if you're searching for lunch options and can even offer locations that are nearest to you.
- Hiding irrelevant results, like updates from people you never engage with on Facebook or news sites that you never click on.
- Customizing the products promoted. Retailers will tailor the price point of promoted items based on previous buying habits of a known user.
To build personalized systems, a company must have data on users to inform the customization it wants to implement. This data can be captured within their system or product. Or, customer data can be purchased from marketing data firms.
This is also an area where product and user experience is more of an art than a science. Companies can only use user data they have, but sometimes they can deduce things about a user that may feel invasive to the user. Such situations have to be handled with care. As mentioned, retailers are always eager to market to expecting families since they typically buy a lot of baby gear. However, it can feel creepy or unethical to act on information users didn't openly share with the company. There are cases of miscarriages where retailers kept sending promotions reminding the parents of how soon the baby would arrive. Or, there is the infamous case in which Target sent congratulations and coupons to the home address of a pregnant high schooler who had not told her parents the news. Later on in this program, you will dive deeper into navigating moral, legal, and ethical considerations in the world of product.
Deciding when to use personalization
Personalization is an extremely powerful tool that can increase conversion rates and improve users' experience. Below are factors to consider when developing personalized products.
- Do your developers have experience with personalization? It can be complicated to implement if people on your team are not familiar with how it works.
- What goal are you trying to achieve with this tech? Can the goal be achieved without personalization?
- Do you have the information needed to create the desired personalization? If not, how can it be captured? Can that data be collected at all?
- Is there a simpler way to do it? There could be more naive ways to segment based on a single dimension. Or, another option is just asking users for the information rather than building a complex model.
- How can you validate this? There's probably an MVP or next best substitute that can be developed to test whether or not building a personalized model is worthwhile.
- Is there an off-the-shelf product you can use? This is a rapidly growing space with many products. Integrating a personalization product is likely easier and cheaper than building it yourself.
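As an illustration of the "simpler way" question above, a single-dimension rule can sometimes deliver much of the value of a personalized model. The sketch below picks a seasonal collection from the user's country alone. The country list, the month-to-season mapping, and the function names are all assumptions made for the example.

```python
# Hypothetical set of Southern Hemisphere country codes; extend as needed.
SOUTHERN_HEMISPHERE = {"AU", "NZ", "AR", "CL", "ZA"}

# Seasons by month (January first) for the Northern Hemisphere.
NORTHERN_SEASONS = [
    "winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "autumn", "autumn", "autumn", "winter",
]

# Seasons are reversed south of the equator.
OPPOSITE_SEASON = {
    "winter": "summer", "summer": "winter",
    "spring": "autumn", "autumn": "spring",
}


def season_for(country_code, month):
    """Return the local season for a month (1-12) using a hemisphere rule."""
    season = NORTHERN_SEASONS[month - 1]
    if country_code in SOUTHERN_HEMISPHERE:
        season = OPPOSITE_SEASON[season]
    return season


def promoted_collection(country_code, month):
    """Choose which collection to promote from the user's country and month."""
    return f"{season_for(country_code, month)} collection"


december_in_australia = promoted_collection("AU", 12)
december_in_us = promoted_collection("US", 12)
```

A rule like this makes a cheap MVP: if showing a "summer collection" to Australian shoppers in December measurably lifts conversion, that is evidence a richer personalized model may be worth building.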
In-depth analysis of the problem to be solved and the data available will guide whether or not personalization and various types of AI would benefit users' product experience.
The video below provides a good overview of predictive analytics.
Practice ✍️
Imagine you work on Slack and want to make it better through personalization or automated recommendations.
- What are some ways you can apply AI/ML to achieve that goal?
- Pick one of the following:
  - What are all the pieces of data you would need to analyze in order to create that experience? Is there a specific way that data should be processed?
  - Create mockups of an updated experience that includes that personalization in your tool of choice.