
Meredith Corporation - Neo4j Graph Data Platform

source link: https://neo4j.com/case-studies/meredith/

Personalized Profiles Drive 600% More Web Traffic

The Challenge

Meredith Corporation is a media conglomerate with $3.2 billion in annual revenue. With over 30 top consumer brands from Parents to People, Real Simple, and Coastal Living, Meredith’s digital presence reaches more than 180 million users a month across dot coms, apps, websites, podcasts, and video.

“Our brands align across several verticals that cater to a wide audience,” said Ben Squire, Senior Data Scientist at Meredith Corporation. “With millions of views and millions of unique visits per month across different topics and lifestyles, our consumers trust us for information on things that affect their daily lives, as well as pique their interest. By understanding and analyzing this content and how it's consumed, we strive to serve the needs of our audiences and advertisers alike.”


By the numbers: Meredith's Identity Graph in Production

  • Graph scale: 30 billion nodes, 67 billion properties, 35 billion relationships
  • Longer touchpoints: From 14 days to 241 days
  • More visits per profile: From 4 to 23.8 visits
  • Platform: Neo4j Enterprise Edition on AWS, with Neo4j Graph Data Science

Meredith seeks to give users just the right content throughout the day, personalized just for them. But that means really knowing your user – a challenge when most people don’t log in.

Crumbling Cookies

Meredith identifies anonymous users through unique cookies that drop on the user’s device. But cookie loss, diverse devices, and ITP 2.3 browsers that block cookies by default increase the difficulty of getting a 360-degree view.

Even when they work, cookies have a short lifespan. That’s a big problem, according to Squire. “Even when you identify the audience that you want to engage or develop models that predict other types of content they will be interested in with high accuracy, if the cookie ID used in the models doesn't appear again, then the money, time and effort that goes into building those models is lost,” said Squire. “Knowing your audience is not good enough; you need to see them again in order to act upon it.”

Connecting Users Across Multiple Data Streams

Meredith’s rich mix of media content naturally generates multiple, disparate streams of data, so Meredith data scientists must blend that data and find ways to identify users across those streams.

For more than two years, Squire and his team had used a variety of data science tools and techniques to analyze user data stored in a relational database management system (RDBMS). “I used to think that we knew this data really well when we looked at it individually from each different data stream,” said Squire.

The Solution

Squire wondered what they could see if they connected their data sources in a graph. The project started with data discovery. “Initially, we just wanted to load data from a relational database and see what it looked like as a graph,” said Squire. “We wanted to see what we could learn.”

“When you combine those data sources and you actually look at the datasets as a whole, it makes you realize that it’s like trying to solve a Rubik’s Cube by only looking at one side of it,” said Squire. “With Neo4j, we’re actually able to combine all of these different datasets. It’s like seeing the Rubik’s Cube in three dimensions and it made it a lot easier to comprehend and understand how to act upon it.”

Once the team connected their data streams, something interesting jumped out right away. Simple pattern matching showed that cookies, which are meant to identify unique users, were sometimes repeated across different data streams. This finding called for further investigation.
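The kind of check described above can be sketched in a few lines of Python. This is only an illustration: the stream names, cookie IDs, and the function name cookies_shared_across_streams are hypothetical, and Meredith ran its pattern matching inside Neo4j rather than in code like this.

```python
from collections import defaultdict


def cookies_shared_across_streams(events):
    """Return {cookie_id: sorted stream names} for cookies seen in 2+ streams.

    `events` is an iterable of (stream_name, cookie_id) pairs.
    """
    streams_by_cookie = defaultdict(set)
    for stream, cookie_id in events:
        streams_by_cookie[cookie_id].add(stream)
    return {
        cookie_id: sorted(streams)
        for cookie_id, streams in streams_by_cookie.items()
        if len(streams) > 1
    }


# Hypothetical sample data: (stream name, cookie ID)
sample_events = [
    ("web", "c1"),
    ("app", "c2"),
    ("video", "c1"),
    ("web", "c3"),
    ("app", "c1"),
]

shared = cookies_shared_across_streams(sample_events)
print(shared)
```

In this toy data only cookie c1 appears in more than one stream, so it is the only one reported.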

Squire imported three months of Meredith’s first-party data into Neo4j, with hundreds of thousands of cookies per month. He then built a proof of concept (POC) for Meredith’s Identity Graph on a laptop.

When Squire ran into questions, he turned to the Neo4j Community. “I could ask a question and someone would send me a response within hours,” said Squire. He recommends leaning on the community to discover best practices. “I spent months working on makeshift ways to achieve things that would have taken a fraction of the time had I just asked the Neo4j Community or reached out to engineers directly.”

Vetting Third-Party Data

In the next phase of the project, the team expanded the graph to incorporate a year’s worth of first-party data. They began adding in third-party data as well.

Third-party vendors offer identity data – for a price. Identity vendors take cookie data and run it against their own proprietary identity graph and return enriched user profiles.

Squire had questions about the reliability of this data. Providers offered little transparency into how results were produced. “It can be difficult to validate and verify the accuracy of these products, especially when a large portion of the traffic that you send is anonymous,” said Squire.

By visualizing data from third-party providers in a graph, Squire was able to identify suspicious patterns, such as the hyper connections in the graph below. This analysis enabled Meredith to retain providers that add value while eliminating those that don’t.

Figure: Suspicious patterns in third-party data
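One way to surface hyper connections without a visualization is to count how many distinct cookies each vendor-supplied identity is linked to and flag the outliers. The sketch below assumes that a "hyper connection" means a single third-party ID tied to an implausibly large number of cookies. The case study does not define the term, so that reading, the threshold, and all names here are assumptions.

```python
from collections import Counter


def hyper_connected(links, max_cookies):
    """Return {vendor_id: cookie_count} for IDs linked to more than max_cookies cookies.

    `links` is an iterable of (vendor_id, cookie_id) pairs; duplicate pairs
    are counted once.
    """
    counts = Counter(vendor_id for vendor_id, _cookie_id in set(links))
    return {vid: n for vid, n in counts.items() if n > max_cookies}


# Hypothetical sample data: vendor ID v1 is tied to five cookies, v2 to two.
sample_links = [
    ("v1", "c1"), ("v1", "c2"), ("v1", "c3"), ("v1", "c4"), ("v1", "c5"),
    ("v2", "c6"), ("v2", "c7"),
    ("v1", "c1"),  # duplicate link, ignored
]

suspicious = hyper_connected(sample_links, max_cookies=3)
print(suspicious)
```

A sensible threshold depends on the data. The value 3 is used here only to make the toy example produce a result.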

Analyzing the Whole Graph with Graph Algorithms

Graph queries offer rapid answers no matter how large your graph is. But if you are running the same query over and over to build up user profiles, there is a better way: run a graph algorithm over the entire graph.

Squire and his team chose the Union Find graph algorithm, which identifies the separate connected subgraphs within the larger graph. Each subgraph contains the data connected to a particular user. The algorithm assigns a unique integer to each subgraph; this integer became the Meredith User Profile ID, or MUP ID.
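To make the idea concrete, here is a minimal Union Find sketch in plain Python. It is not the Neo4j implementation Meredith used: the node labels, the edge list, and the function name assign_profile_ids are hypothetical, and a production graph of this size would not be processed in memory this way.

```python
class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen nodes as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def assign_profile_ids(edges):
    """Map every node to an integer ID shared by all nodes in its component."""
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    id_of_root = {}
    profile_of = {}
    for node in list(uf.parent):
        root = uf.find(node)
        # Hand out 1, 2, 3, ... in the order components are first seen.
        profile_of[node] = id_of_root.setdefault(root, len(id_of_root) + 1)
    return profile_of


# Hypothetical links: cookies A and B were both seen on device 1.
sample_edges = [
    ("cookie:A", "device:1"),
    ("cookie:B", "device:1"),
    ("cookie:C", "device:2"),
]

profiles = assign_profile_ids(sample_edges)
print(profiles)
```

Cookies A and B end up with the same integer because they share a device, while cookie C gets a different one. That shared integer plays the role the case study gives to the MUP ID.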

At Production Scale

In production, the Meredith Identity Graph incorporates more than 20 months of user data, from both first- and third-party sources. The database has more than 4.4 terabytes of data across 30 billion nodes, 67 billion properties, and 35 billion relationships.

The average length of touchpoints has exploded, from 14 days with a cookie to 241 days with user profiles. Average visits have increased from 4 per cookie to 23.8 per profile.

Nearly 350 million profiles that would have been considered unique individuals with different interests and patterns were consolidated into 163 million richer and more accurate profiles. A high-definition view of user interests and preferences fuels stronger models, which leads to more relevant content and more users returning over time. It’s a virtuous circle.
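The figures quoted in this section can be turned into ratios with some quick arithmetic. The sketch below uses only the numbers given above; the profile counts are in millions, "nearly 350 million" is treated as 350 for simplicity, and the variable names are illustrative.

```python
# Figures as quoted in the case study.
cookie_days, profile_days = 14, 241
cookie_visits, profile_visits = 4, 23.8
profiles_before_m, profiles_after_m = 350, 163  # millions; 350 is approximate

touchpoint_ratio = profile_days / cookie_days
visit_ratio = profile_visits / cookie_visits
consolidation_ratio = profiles_before_m / profiles_after_m

print(f"Touchpoint length grew about {touchpoint_ratio:.1f}x")
print(f"Visits per identifier grew about {visit_ratio:.2f}x")
print(f"About {consolidation_ratio:.2f} former profiles per consolidated profile")
```

By these numbers, each consolidated profile stands in for a little more than two of the earlier cookie-based profiles on average.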

Richer Profiles Mean Richer Models and Increased Revenue

Meredith was able to use graph algorithms in Neo4j to transform billions of page views into millions of pseudonymous identifiers with rich browsing profiles, elevating their understanding of customer behavior.

“We basically have increased our understanding of a customer by 20 to 30% by looking at how the data connects over time, rather than just looking at individual cookies themselves,” said Squire. “Instead of ‘advertising in the dark,’ we now better understand our customers, which translates into significant revenue gains and better-served consumers.”
