With all the attention that LLMs and ChatGPT have been receiving, a familiar principle re-emerges: “Garbage in, garbage out.” When using GPT-4, GPT-3.5, or any other large language model, this concept applies in two places: prompt engineering and fine-tuning. We should also consider when these models were trained, so that we know how recent the data they draw on is.
Most business data today sits within corporate data sources, inside or outside a firewall, rather than on the public internet. If we can leverage LLMs on this data, new possibilities emerge.
This article discusses building a chatbot with a custom knowledge base in 30 minutes so that your AI can answer questions about your data.
Pre-Requisites for Creating a Knowledge Bot
Before creating a custom knowledge base, we must consider how much data we have. In this blog, we will create an assistant using PubNub documentation. There is a lot of documentation, so we should consider using a production service such as Vectara, Pinecone, or Weaviate to manage our vector embeddings. For smaller amounts of data, you can use LangChain to create a local vector database. LangChain also supports mapping your custom embeddings to your LLM through a semantic or similarity search.
For this blog, we will go through setting up a vector database on Vectara, as it provides a simple drag-and-drop solution for all of your company data. A convenient way to interact with Vectara is through PubNub Functions, which provide a serverless JavaScript container that runs whenever a pre-defined event occurs. You control when the function fires, which lets you adjust how your AI Knowledge Bot operates.
Vectara will index your dataset into multiple embeddings. When you pass in text or user input, Vectara runs a semantic search on your data and summarizes the results it finds, answering your question with the relevant information from your custom data. Vectara supports multiple file types, such as TXT, HTML, PDF, and Word files. Gather all the documents you want to upload and add them to a single folder.
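To make the idea of a similarity search concrete, here is a minimal sketch in plain JavaScript. It does not use LangChain's API; it only illustrates the underlying technique of ranking stored text chunks by cosine similarity to a query embedding. The three-dimensional "embeddings" and the names `cosineSimilarity`, `topK`, and `toyStore` are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query embedding.
function topK(queryEmbedding, store, k) {
  return store
    .map((entry) => ({
      text: entry.text,
      score: cosineSimilarity(queryEmbedding, entry.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data, invented for illustration only.
const toyStore = [
  { text: "How to publish a message", embedding: [0.9, 0.1, 0.0] },
  { text: "Billing and invoices", embedding: [0.0, 0.2, 0.9] },
  { text: "Subscribing to a channel", embedding: [0.7, 0.6, 0.1] },
];

// The query vector points mostly along the first dimension, so the two
// messaging-related chunks rank above the billing chunk.
const hits = topK([1, 0.2, 0], toyStore, 2);
console.log(hits.map((h) => h.text));
```

A hosted vector database performs the same kind of ranking, but over far more vectors and with indexing that avoids comparing the query against every stored entry.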
How to set up Vectara
To set up Vectara:
Sign up or log in to Vectara
Once you are on the dashboard, click Create corpus
Give the corpus a name and a description, then, under the Data Ingestion header, drag and drop the files you want your LLM to know about into the Upload Files section
After your files are uploaded, note your corpus ID at the top of the webpage, as you will need it for the request we are about to write
Click on your email in the top right corner and save your Customer ID for later
Select API Keys and create an API key for your corpus by selecting your corpus in the drop-down menu. Save your API key for later
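The three values collected above (corpus ID, Customer ID, and API key) are what a query request needs. The sketch below shows one way to assemble such a request. It assumes the endpoint, header names, and body fields of Vectara's v1 REST query API (`https://api.vectara.io/v1/query`, `customer-id`, `x-api-key`, `corpusKey`, `summary`); check these against the current Vectara documentation before relying on them. The function name `buildVectaraQuery` and the placeholder credentials are hypothetical.

```javascript
// Build the URL and HTTP options for a Vectara query.
// ASSUMPTION: endpoint, headers, and body fields follow Vectara's v1 REST
// query API; verify them against the current Vectara docs.
function buildVectaraQuery({ customerId, corpusId, apiKey, question, numResults = 5 }) {
  return {
    url: "https://api.vectara.io/v1/query",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "customer-id": String(customerId),
        "x-api-key": apiKey,
      },
      body: JSON.stringify({
        query: [
          {
            query: question,
            numResults,
            // Which corpus to search.
            corpusKey: [
              { customerId: Number(customerId), corpusId: Number(corpusId) },
            ],
            // Ask for a generated summary of the top results.
            summary: [{ maxSummarizedResults: numResults, responseLang: "en" }],
          },
        ],
      }),
    },
  };
}

// Placeholder credentials, for illustration only.
const exampleRequest = buildVectaraQuery({
  customerId: "1234567890",
  corpusId: 1,
  apiKey: "YOUR_VECTARA_API_KEY",
  question: "How do I publish a message?",
});
console.log(exampleRequest.url);
```

Keeping request construction in a pure function makes it easy to inspect the payload before sending it with whatever HTTP client the runtime offers.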
High-Level Architecture
The architecture will be structured as follows:
The chat application will use PubNub to send and receive messages
A PubNub Function will listen to these messages on a specific channel
A PubNub signal will be fired to let the user know when the AI is thinking and when it is done.
The message will then be forwarded to Vectara using the Vectara Rest API
The PubNub Function will then parse one of the many results out of the response from Vectara
The response will then be published on a channel associated with your chatbot
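The parsing step in the list above can be sketched as follows. This is an illustration, not the article's original code. It assumes the response shape of Vectara's v1 query API, `{ responseSet: [ { summary: [{ text }], response: [{ text, score }] } ] }`, and prefers the generated summary, falling back to the highest-scoring result. The function name `extractAnswer` is hypothetical.

```javascript
// Pull a single answer string out of a Vectara query response.
// ASSUMPTION: the body has the shape
//   { responseSet: [ { summary: [{ text }], response: [{ text, score }] } ] }
// Returns null when no usable answer is present.
function extractAnswer(rawBody) {
  const data = typeof rawBody === "string" ? JSON.parse(rawBody) : rawBody;
  const first = data && data.responseSet && data.responseSet[0];
  if (!first) return null;

  // Prefer the generated summary when one was returned.
  const summary = first.summary && first.summary[0] && first.summary[0].text;
  if (summary) return summary;

  // Otherwise fall back to the highest-scoring search result.
  const results = first.response || [];
  if (results.length === 0) return null;
  const best = results.reduce((a, b) => (b.score > a.score ? b : a));
  return best.text;
}

// Hand-written sample response, for illustration only.
const sampleBody = JSON.stringify({
  responseSet: [
    {
      response: [
        { text: "Channels carry messages.", score: 0.61 },
        { text: "Use publish() to send a message.", score: 0.87 },
      ],
      summary: [],
    },
  ],
});
console.log(extractAnswer(sampleBody));
```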
Configuring your PubNub Function
Navigate to the admin dashboard
Select Functions on the left-hand menu and click on the key set you would like to use
Select + Create Module and enter a module name and description
Select the module you just created and click + Create a Function
Give the Function a name, such as Vectara Query, and select After Publish or Fire in the drop-down menu. This function will fire after a message has been published to the relevant channel, in this case, docs-pubnub-ai
Set the channel name to docs-pubnub-ai
Click on My Secrets and create two secrets called VECTARA_API_KEY and CUSTOMER_ID
The code for querying the Vectara database from a PubNub Function is defined as follows:
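As a minimal sketch of such a Function (not the article's original listing), the handler below follows the architecture described earlier: signal that the bot is thinking, read the two secrets, query Vectara, publish the answer, and signal that the bot is done. Dependencies are injected so the logic can be exercised outside the PubNub runtime. The names `handleQuestion`, `responseChannel`, and `queryVectara`, the `{ thinking: ... }` signal payload, and the `message.text` field are assumptions made for illustration.

```javascript
// Orchestrates one question/answer round trip.
// `deps` carries the runtime modules so the handler stays testable:
//   pubnub          - needs signal({channel, message}) and publish({channel, message})
//   vault           - needs get(name) returning a Promise of the secret
//   queryVectara    - HYPOTHETICAL async helper: ({apiKey, customerId, question}) => answer text
//   responseChannel - HYPOTHETICAL name of the channel the chatbot replies on
async function handleQuestion(request, deps) {
  const { pubnub, vault, queryVectara, responseChannel } = deps;

  // ASSUMPTION: chat messages look like { text: "..." }.
  const question = request.message && request.message.text;
  if (!question) return request.ok();

  // Let the UI show a typing indicator while the query runs.
  await pubnub.signal({ channel: responseChannel, message: { thinking: true } });
  try {
    const [apiKey, customerId] = await Promise.all([
      vault.get("VECTARA_API_KEY"),
      vault.get("CUSTOMER_ID"),
    ]);
    const answer = await queryVectara({ apiKey, customerId, question });
    await pubnub.publish({
      channel: responseChannel,
      message: { text: answer || "Sorry, I could not find an answer." },
    });
  } finally {
    // Always clear the typing indicator, even if the query failed.
    await pubnub.signal({ channel: responseChannel, message: { thinking: false } });
  }
  return request.ok();
}

// Inside a PubNub Function, the wiring might look roughly like this:
//   export default (request) => handleQuestion(request, {
//     pubnub: require("pubnub"),
//     vault: require("vault"),
//     responseChannel: "<your chatbot channel>",
//     queryVectara: /* wraps require("xhr").fetch(url, options) */,
//   });
```

Signals are meant for very small payloads, so the sketch sends only a boolean flag and reserves the regular publish for the answer itself.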
Connecting the PubNub Function to your UI
To connect the PubNub Function to a UI, use one of the many SDKs that PubNub provides. Publish and subscribe to the channel pubnub-docs-ai and wait for the Vectara query to finish running in the PubNub Function above. Connecting a typing indicator that listens for PubNub signals on the channel pubnub-docs-ai lets the user see when the PubNub Knowledge Bot is thinking, making for a smoother end-user experience.
The code for connecting the PubNub function in React:
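As a hedged illustration of the client side, the sketch below keeps the PubNub wiring in plain JavaScript so it can be dropped into a React component. It relies on the PubNub JavaScript SDK methods `addListener`, `removeListener`, `subscribe`, `unsubscribe`, and `publish`. The callbacks `onMessage` and `onTyping`, the `{ thinking: ... }` signal payload, and the `{ text }` message shape are assumptions made for illustration; in React the callbacks would typically be `useState` setters, and `connectBot` would be called from a `useEffect` whose cleanup is the function it returns.

```javascript
// Channel name as given in this section.
const CHANNEL = "pubnub-docs-ai";

// Build a PubNub listener that forwards chat messages and typing signals.
// ASSUMPTION: signals carry { thinking: boolean }.
function createBotListener({ onMessage, onTyping }) {
  return {
    message: (event) => {
      if (event.channel === CHANNEL) onMessage(event.message);
    },
    signal: (event) => {
      if (event.channel === CHANNEL) {
        onTyping(Boolean(event.message && event.message.thinking));
      }
    },
  };
}

// Attach the listener and subscribe. Returns a cleanup function, which
// matches what a React useEffect callback is expected to return.
function connectBot(pubnub, handlers) {
  const listener = createBotListener(handlers);
  pubnub.addListener(listener);
  pubnub.subscribe({ channels: [CHANNEL] });
  return () => {
    pubnub.removeListener(listener);
    pubnub.unsubscribe({ channels: [CHANNEL] });
  };
}

// Send the user's question to the bot.
// ASSUMPTION: chat messages look like { text: "..." }.
function askBot(pubnub, text) {
  return pubnub.publish({ channel: CHANNEL, message: { text } });
}
```

Filtering on `event.channel` keeps the typing indicator from reacting to signals on any other channel the same client happens to be subscribed to.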
In Summary
Using PubNub Functions with any vector database or vector store is a quick, production-ready way to create your own AI Knowledge Bot. You can use not only Vectara this way but also Pinecone, Weaviate, or any other production vector database. With PubNub Functions, it is easy to host the integration and to control how and when each message is sent, which enhances your vector database's functionality.
Sign up for our admin dashboard to start configuring your PubNub keyset. Also, check out the many tutorials and blogs we have for your specific use case.