How to use Pinecone and OpenAI to ChatGPTify your website!

[Image: A book split into a thousand parts, glowing, magically, digital art, by DALL-E 2]

Our company handbook, hosted at https://handbook.variant.no, is a good resource we refer to all the time both internally and externally at Variant. Even though it’s packed with valuable information, we recognize that it’s not always easy to get quick answers. So, we’re trying out a little experiment by leveraging the power of Large Language Models (LLMs), specifically GPT-3.5, to answer questions about the content of our handbook:

[Screenshot: asking the handbook a question in Norwegian and getting a GPT-3.5-generated answer]

This example might not make much sense if you don’t speak Norwegian.

Since our handbook is in Norwegian, it’s a bit difficult to demo to non-Norwegian-speaking audiences. But in the example above I ask “Do we get Christmas off?”, and the answer is (paraphrasing) “Yes, we get Christmas off. All days between Christmas Eve and New Year’s Day are counted as holidays”.

We were inspired by Greg Richardson’s implementation of a similar feature, which he did for the Supabase documentation. You can read more about how they did it in this blog post.

In this blog post I’ll take you through how we implemented this for our handbook using the Pinecone and OpenAI APIs.

Indexing the Handbook

To start off, we need to retrieve and index the content of the handbook somehow!

Fortunately for me, we already have an implementation for indexing the content in our handbook, because we do this for our existing handbook search-engine using Algolia. The details here are not important, and might be different from how you would do it for your own website.

In essence, our search indexer runs through all the .mdx files in our handbook and extracts the text content, split into sections, as JSON. For our normal search, all of these index items are then uploaded to Algolia. But in my case, I wanted to store them in a vector database instead.
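Our actual indexer lives alongside the handbook, but a rough, simplified sketch of the idea might look like this. Note that the heading-splitting regex and the slug handling here are illustrative assumptions, not our exact implementation:

import { readFile } from "node:fs/promises";

type IndexItem = { title: string; url: string; content: string };

// Naive sketch: split one .mdx file into one index item per markdown heading.
// A real indexer would parse the MDX properly instead of using a regex.
async function indexMdxFile(path: string): Promise<IndexItem[]> {
  const text = await readFile(path, "utf-8");
  return text
    .split(/^#{1,2} /m)
    .filter((section) => section.trim().length > 0)
    .map((section) => {
      const [heading, ...body] = section.split("\n");
      const slug = heading.trim().toLowerCase().replace(/\s+/g, "-");
      return {
        title: heading.trim(),
        url: `https://handbook.variant.no/#${slug}`,
        content: body.join("\n").trim(),
      };
    });
}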

Why a vector database? If you’ve used ChatGPT, you might have noticed that it can remember what you’ve said within the same conversation, but not what you’ve said in previous conversations. This is because the model is (at the moment, at least) a blank slate for each new conversation, and only knows what it was originally trained on (ostensibly information from the public Internet up to and including 2021). This means that if you want to have a conversation with the model about a narrow topic or domain, and get good and up-to-date answers, you need to provide it with context. And this is where the vector database comes in.

The vector database effectively functions as a long-term memory that we can feed the LLM with. This means we can provide the model with context from our handbook, which it can use to give better answers. A vector database is a better option here than a traditional database because it lets us quickly find texts that relate to each other. Why this is important will become clearer later.

Saving the index

I chose to use Pinecone as my vector database, mainly because it’s a managed service, and I didn’t want to spend too much time on setting up and maintaining a database. But there are other alternatives available as well, as per the OpenAI cookbook for vector databases.

What I want to do in this case, is save the different index items to Pinecone. Each index item is a partial section from our handbook, and looks something like this:

{
  "title": "En variants håndbok",
  "url": "https://handbook.variant.no/#en-variants-håndbok",
  "content": "Om du ikke er en variant men liker det du leser,\n ta en titt på ledige stillinger hos oss. Mer info\nom oss på nettsiden vår .",
  "department": ["Trondheim", "Oslo", "Bergen", "Molde"]
}

The problem, though, is that vectors are essentially just arrays of floating-point numbers. So how do we represent a piece of text as a vector? To do that, we have to convert the content to an “embedding”.

Embeddings, in the context of machine learning, are a way to represent complex data, like words, sentences, or even images, as points in a multi-dimensional space (a vector). The magic of embeddings is that they arrange words (or other data) in this multi-dimensional space so that similar words are close together and dissimilar words are far apart. This allows us to identify relationships between words and sentences with similar semantic meaning just by comparing the distance between vectors. In other words, we can more easily find sentences that relate to each other.
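To make “comparing the distance between vectors” concrete, here is a minimal sketch of cosine similarity, the metric we configure the Pinecone index with later. Pinecone does this comparison for us, so this is purely for illustration:

// Cosine similarity between two embedding vectors: values close to 1 mean the
// texts are semantically similar, values close to 0 mean they are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}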

Luckily for us, the OpenAI API has an endpoint for creating embeddings. So using the Node.js library for the OpenAI API, I can take the content field from the section above and create an embedding for it like this:

import { Configuration, OpenAIApi } from "openai";

const content = index[0].content; // the section content from above

const configuration = new Configuration({
  apiKey: openAIApiKey,
});
const openaiClient = new OpenAIApi(configuration);

// Create an embedding (a 1536-dimensional vector) for the section content
const embeddingResponse = await openaiClient.createEmbedding({
  model: "text-embedding-ada-002",
  input: content,
});

const [{ embedding }] = embeddingResponse.data.data;

This creates a vector of floating-point numbers, which is the embedding I can save to my vector database. Since embeddings created by the text-embedding-ada-002 model have 1536 output dimensions, the Pinecone index must be created with support for exactly 1536 dimensions. For Pinecone this can be done through a simple API call:

curl --location 'https://controller.eu-west4-gcp.pinecone.io/databases' \
  --header 'Api-Key: <your-api-key>' \
  --header 'accept: text/plain' \
  --header 'content-type: application/json' \
  --data '
  {
    "metric": "cosine",
    "pods": 1,
    "replicas": 1,
    "pod_type": "p2.x1",
    "metadata_config": {
      "indexed": ["department"]
    },
    "dimension": 1536,
    "name": "handbook-index"
  }
  '

In this case I’ve also specified that the department field should be indexed, so that I can filter the results based on department later. This also ensures that no other metadata fields are indexed, which they otherwise would be by default, saving memory and making queries faster.
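For illustration, a later query could be restricted to a single office by adding a metadata filter to the query request. This is a hypothetical sketch, not something the handbook bot necessarily does today, and the variable names are placeholders:

// Sketch: only return handbook sections tagged with the Trondheim office.
// This filter is only efficient because "department" is an indexed metadata field.
const filteredQueryRequest = {
  vector: questionEmbedding, // an embedding of the user's question (creating one is shown later)
  topK: 5,
  includeMetadata: true,
  namespace: "handbook-namespace",
  filter: { department: { $in: ["Trondheim"] } },
};

// pineconeIndex is a handle to the index created above (see the upsert example below)
const filteredResponse = await pineconeIndex.query({ queryRequest: filteredQueryRequest });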

As noted earlier, each section in the index is split up into multiple parts. This is good for the queries too, since they are faster and more precise when the vectors represent smaller pieces of text. So instead of saving each entire section as one vector in Pinecone, we save several small parts. Each small part is stored together with the full content of the entire section as metadata, so that I can retrieve the full content if a query hits any part of the section.
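How you split a section into parts is up to you. As a hedged sketch (not our exact implementation), one simple approach is to group paragraphs until a rough size budget is reached:

// Naive chunking: group paragraphs until a rough character limit is reached.
// A real implementation might count tokens instead of characters.
function splitIntoParts(fullContent: string, maxChars = 1000): string[] {
  const parts: string[] = [];
  let current = "";
  for (const paragraph of fullContent.split("\n\n")) {
    if (current.length + paragraph.length > maxChars && current.length > 0) {
      parts.push(current.trim());
      current = "";
    }
    current += paragraph + "\n\n";
  }
  if (current.trim().length > 0) parts.push(current.trim());
  return parts;
}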

So to save each part, I take the embedding created above with the OpenAI API and save it along with metadata to Pinecone using their Node.js library:

import { PineconeClient } from "@pinecone-database/pinecone";

// Initialise the Pinecone client and get a handle to the index created above
const pinecone = new PineconeClient();
await pinecone.init({ apiKey: pineconeApiKey, environment: "eu-west4-gcp" });
const pineconeIndex = pinecone.Index("handbook-index");

const upsertRequest = {
  vectors: [
    {
      id: inputChecksum, // a checksum of the part, used as a stable id
      values: embedding, // the embedding from earlier
      metadata: {
        title,
        content, // the section part the embedding is created from
        fullContent, // the full content of the entire section
        url,
        department,
      },
    },
  ],
  namespace: "handbook-namespace",
};

await pineconeIndex.upsert({ upsertRequest });

Just to summarise what we’ve done so far:

  1. We’ve indexed the entire handbook by splitting it up into multiple sections and each section into multiple parts
  2. Then we created embeddings (vectors) for each section part through the OpenAI API
  3. And lastly we saved these embeddings and metadata from the section in the Pinecone vector database

Now we have a vector database index that can be queried for relevant handbook sections based on the questions being asked. Remember, this serves as the long-term memory for the LLM. The next steps are to retrieve the relevant sections for a given question, and then to ask GPT-3.5 for an answer with the handbook as context.

Retrieving the relevant sections

Since the input for asking questions about the handbook is open to anyone, we have to take extra care that we do not prompt GPT-3.5 with questions that do not comply with OpenAI’s usage policies. To ensure compliance, we can use their free moderation endpoint, which verifies whether a question aligns with their guidelines.

So when the user asks a question, we first check if it complies with OpenAI’s usage policies:

// "openai" is the OpenAIApi client from earlier
const moderationResponse = await openai.createModeration({ input: question });
const [results] = moderationResponse.data.results;

if (results.flagged) {
  throw new Error("Doesn't comply with OpenAI usage policy");
}

If it passes, the next step is to query the vector database for relevant sections in the handbook. However, the question must first be transformed into an embedding.

If you remember from earlier, an embedding is a way to represent text in a multi-dimensional space (a vector). So the idea here is to convert the question to a vector, which we can then query the Pinecone database with. This allows us to find related sections in the handbook just by comparing the distance between the section vectors and the question vector.

So to achieve this, we create an embedding for the question, as we did for the handbook sections:

const embeddingResponse = await openai.createEmbedding({
  model: "text-embedding-ada-002",
  input: question,
});

const [{ embedding }] = embeddingResponse.data.data;

With the question converted, we can now query the vector database for relevant and related handbook sections:

const queryRequest: QueryRequest = {
  vector: embedding, // the question embedding
  topK: 5,
  includeValues: false,
  includeMetadata: true,
  namespace: "handbook-namespace",
};

const queryResponse = await pineconeIndex.query({ queryRequest });

// Deduplicate, since several matching parts can come from the same section
const uniqueFullContents = queryResponse.matches
  .map((m) => m.metadata)
  .map((m) => m.fullContent)
  .reduce(reduceToUniqueValues, []);

The query above will return the top 5 most relevant sections in the handbook, based on the question. In other words, the vectors which were closest to our question-vector in the multi-dimensional space.

And if you remember from earlier, we also store the full content of each section inside the metadata. We make sure to filter out duplicates here: since the sections are split into multiple parts, several matches may come from the same section, and we don’t want to prompt GPT-3.5 with the same section multiple times.
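The reduceToUniqueValues helper used in the query above is not shown in this post; a minimal version of it could look something like this:

// Keep only the first occurrence of each value, preserving order.
// Used as: array.reduce(reduceToUniqueValues, [])
function reduceToUniqueValues<T>(unique: T[], value: T): T[] {
  return unique.includes(value) ? unique : [...unique, value];
}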

Why use an LLM?

Now you might be asking yourself: why do we want to prompt GPT-3.5 for an answer when we’ve already pulled the relevant sections out of the database? Because the LLM can summarise the relevant sections and answer succinctly with regard to your question. The alternative would be to print out all the section contents and let you read through them to find the answer yourself, but I don’t find that very satisfying.

The next step then is to give the LLM enough relevant context to answer the question from a prompt.

The prompt

Much can be said about how to construct a good prompt for GPT-3.5, but I’ll keep it short here. The prompt is constructed by combining the question with the relevant sections from the handbook, and is then sent to GPT-3.5 for completion. It looks like this:

const prompt = `
You are a very enthusiastic Variant representative who
loves to help people! Given the following sections from
the Variant handbook, answer the question using only that
information. If you are unsure and the answer is not
written in the handbook, say "Sorry, I don't know how to
help with that." Please do not write URLs that you cannot
find in the context section.

Context section:
${uniqueFullContents.join("\n---\n")}

Question: """
${question}
"""
`;

As you see, in addition to giving it the relevant sections from the handbook, we also set a tone-of-voice and some preconditions on how to answer the question. And when not to try to answer, for that matter!

Now, finally, we’re ready to ask GPT-3.5 for an answer. We do this by sending the prompt to the completion endpoint:

const completionOptions: CreateCompletionRequest = {
  model: "text-davinci-003",
  prompt,
  max_tokens: 512,
  temperature: 0, // keep the answers as focused and reproducible as possible
  stream: false,
};

const res = await openai.createCompletion(completionOptions);
const { choices } = res.data;

const answer = choices[0].text;
console.log(answer); // or display it in the UI of your choice

At last, we have an answer! Keep in mind that the GPT-3.5-generated responses are non-deterministic, meaning they may vary slightly each time. However, GPT-3.5 is adept at generating accurate answers when given enough context to do so. Thus, asking good questions is only half the battle — proper preprocessing and indexing of the content beforehand is equally important.

So to summarise what we do when the user asks a question:

  1. Inspect the question for flagged content.
  2. Generate an embedding using the question text.
  3. Query the vector database for relevant handbook content.
  4. Create a natural language prompt containing the question and relevant content, providing sufficient context for GPT-3.5.
  5. Submit the prompt to GPT-3.5 to receive an answer.

And that is the very basics of how we built an LLM integration into our handbook, using Pinecone and the APIs from OpenAI.

I’ve left out a lot of details for the sake of brevity. But the full implementation, with all the details, can be viewed in the open source repository for our handbook. The most pertinent files are generate-embeddings.mjs, for details on how we do indexing and insertion into Pinecone, and openai-data.ts, for details on how we handle user queries.

