
Google weighing 'Project Ellmann,' uses Gemini AI to tell life stories

source link: https://www.cnbc.com/2023/12/08/google-weighing-project-ellmann-uses-gemini-ai-to-tell-life-stories.html



[Photo illustration: the Google Gemini AI logo on a smartphone screen, Dec. 7, 2023. Pavlo Gonchar/SOPA Images/LightRocket via Getty Images]

A team at Google has proposed using artificial intelligence technology to create a “bird’s-eye” view of users’ lives using mobile phone data such as photographs and searches.

Dubbed “Project Ellmann,” after biographer and literary critic Richard David Ellmann, the idea would be to use LLMs like Gemini to ingest search results, spot patterns in a user’s photos, create a chatbot and “answer previously impossible questions,” according to a copy of a presentation viewed by CNBC. Ellmann’s aim, it states, is to be “Your Life Story Teller.”

It’s unclear if the company has plans to produce these capabilities within Google Photos, or any other product. Google Photos has more than 1 billion users and 4 trillion photos and videos, according to a company blog post.

Project Ellmann is just one of many ways Google is proposing to create or improve its products with AI technology. On Wednesday, Google launched Gemini, which it calls its "most capable" and most advanced AI model yet, and which in some cases outperformed OpenAI's GPT-4. The company plans to license Gemini to a wide range of customers through Google Cloud for use in their own applications. One of Gemini's standout features is that it's multimodal, meaning it can process and understand information beyond text, including images, video and audio.

A product manager for Google Photos presented Project Ellmann alongside Gemini teams at a recent internal summit, according to documents viewed by CNBC. They wrote that the teams had spent the past few months determining that large language models are the ideal technology to make this bird's-eye approach to one's life story a reality.

Ellmann could pull in context from biographies, previous moments and subsequent photos to describe a user's photos more deeply than "just pixels with labels and metadata," the presentation states. It proposes identifying a series of life moments, such as university years, Bay Area years and years as a parent.

“We can’t answer tough questions or tell good stories without a bird’s-eye view of your life,” one description reads alongside a photo of a small boy playing with a dog in the dirt.

“We trawl through your photos, looking at their tags and locations to identify a meaningful moment,” a presentation slide reads. “When we step back and understand your life in its entirety, your overarching story becomes clear.”

The presentation said large language models could infer moments like a user’s child’s birth. “This LLM can use knowledge from higher in the tree to infer that this is Jack’s birth, and that he’s James and Gemma’s first and only child.” 

“One of the reasons that an LLM is so powerful for this bird’s-eye approach, is that it’s able to take unstructured context from all different elevations across this tree, and use it to improve how it understands other regions of the tree,” a slide reads, alongside an illustration of a user’s various life “moments” and “chapters.”

Presenters gave another example of determining one user had recently been to a class reunion. “It’s exactly 10 years since he graduated and is full of faces not seen in 10 years so it’s probably a reunion,” the team inferred in its presentation.

The team also demonstrated “Ellmann Chat,” with the description: “Imagine opening ChatGPT but it already knows everything about your life. What would you ask it?”

It displayed a sample chat in which a user asks, "Do I have a pet?" The chatbot answers that yes, the user has a dog that wore a red raincoat, then offers the dog's name and the names of the two family members it's most often seen with.

In another example for the chat, a user asked when their siblings last visited; in yet another, a user asked it to list towns similar to where they live because they were thinking of moving. Ellmann offered answers to both.

Other slides showed Ellmann presenting a summary of the user's eating habits: "You seem to enjoy Italian food. There are several photos of pasta dishes, as well as a photo of a pizza." It also said the user seemed to enjoy trying new food, because one of their photos showed a menu with a dish it didn't recognize.

The presentation also stated that the technology could determine, from the user's screenshots, which products the user was considering purchasing, along with their interests, work and travel plans. It further suggested it could identify their favorite websites and apps, citing Google Docs, Reddit and Instagram as examples.

A Google spokesperson told CNBC: “Google Photos has always used AI to help people search their photos and videos, and we’re excited about the potential of LLMs to unlock even more helpful experiences. This was an early internal exploration and, as always, should we decide to roll out new features, we would take the time needed to ensure they were helpful to people, and designed to protect users’ privacy and safety as our top priority.”

