RAG and LLM Integration

Applications of Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) give context-aware solutions for complex Natural Language Processing (NLP) tasks. Natural language processing (NLP) is a machine learning technology that gives computers the ability to interpret, manipulate, and interact with human language.

Combining RAG and LLMs enables personalized, multilingual, and context-aware systems. The objective of this tutorial is to implement RAG for user-specific data handling, develop multilingual RAG systems, use LLMs for content generation, and integrate LLMs in code development.

Retrieval-Augmented Generation (RAG)

RAG Similarity Search is a tutorial on ChromaDB to create a vector store from the Gekko Optimization Suite LLM training data train.jsonl file to retrieve questions and answers that are similar to a query. The tutorial is a guide to install necessary libraries, import modules, and prepare the Gekko training data to build the vector store. It emphasizes the significance of similarity search with k-Nearest Neighbors, with a vector store either in memory or on a local drive. It includes an exercise where participants create question-answer pairs on a topic of interest, construct a vector database, and perform similarity searches using ChromaDB.

LLM with Ollama Python Library

LLM with Ollama Python Library is a tutorial on Large Language Models (LLMs) with Python with the ollama library for chatbot and text generation. It covers the installation of the ollama server and ollama python package and uses different LLM models like mistral, gemma, phi, and mixtral that vary in parameter size and computational requirements.

RAG and LLM Integration

Combining Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) leads to context-aware systems. RAG optimizes the output of a large language model by referencing an external authoritative knowledge base outside of initial training data sources. These external references generate a response to provide more accurate, contextually relevant, and up-to-date information. In this architecture, the LLM is the reasoning engine while the RAG context provides relevant data. This is different than fine-tuning where the LLM parameters are augmented based on a specific knowledge database.

The synergy of RAG enhances the LLM ability to generate responses that are not only coherent and contextually appropriate but also enriched with the latest information and data, making it valuable for applications that require higher levels of accuracy and specificity, such as customer support, research assistance, and specialized chatbots. This combines the depth and dynamic nature of external data with the intuitive understanding and response generation of LLMs for more intelligent and responsive AI systems.

RAG with LLM (Local)

The Local RAG with LLM downloads the train.jsonl file to provide context-aware information about Gekko questions using the mistral model. The processing of the LLM may take substantial time (minutes) if there are insufficient GPU resources available to process the request.

import ollama
import pandas as pd
import chromadb

# Loading and preparing the ChromaDB with data
def setup_chromadb():
    # read Gekko LLM training data
    url='https://raw.githubusercontent.com'
    path='/BYU-PRISM/GEKKO/master/docs/llm/train.jsonl'
    qa = pd.read_json(url+path,lines=True)
    documents = []
    metadatas = []
    ids = []

    for i in range(len(qa)):
        s = f"### Question: {qa['question'].iloc[i]} ### Answer: {qa['answer'].iloc[i]}"
        documents.append(s)
        metadatas.append({'qid': f'qid_{i}'})
        ids.append(str(i))

    cc = chromadb.Client()
    cdb = cc.create_collection(name='gekko')
    cdb.add(documents=documents, metadatas=metadatas, ids=ids)
    return cdb

# Ollama LLM function
def ollama_llm(question, context):
    formatted_prompt = f"Question: {question}\n\nContext: {context}"
    response = ollama.chat(model='mistral', messages=[{'role': 'user', 'content': formatted_prompt}])
    return response['message']['content']

# Define the RAG chain
def rag_chain(question, cdb):
    context = cdb.query(query_texts=[question],
                        n_results=5, include=['documents'])
    formatted_context = "\n\n".join(x for x in context['documents'][0])
    formatted_context += "\n\nYou are a professional and technical assistant trained to answer questions about Gekko, which is a high-performance Python package for optimization, simulation, machine learning, data-science, model predictive control, and parameter estimation. In addition, you can also help with answering questions about programming in Python, particularly in relation to the aforementioned topics. Your primary goal is to assist users in finding solutions and gaining knowledge in these areas."
    result = ollama_llm(question, formatted_context)
    return result

# Setup ChromaDB
cdb = setup_chromadb()

# Create prompt for Local RAG LLM
question = 'What are you trained to do?'
out = rag_chain(question, cdb)
print(out)

Cloud Service RAG with LLM

The same RAG with LLM program is available in the Gekko support module. The question is sent to a server that has sufficient GPU resources to run the mixtral model with the RAG similarity search to provide question context.

# Use Gekko AI Assistant (cloud service)
from gekko import support
assistant = support.agent()
assistant.ask('How do I install gekko?')

Gekko AI Assistant

A dynamic platform hosts the Gekko AI Assistant as a support tool to transform the user experience beyond the Python environment. It is a chat web interface to harnesses the power of Large Language Models (LLMs) and cloud-based resources to deliver an interactive, intelligent, and responsive service. It is accessible on mobile or desktop environments for those seeking quick information or support with complex problem-solving.

Activity: RAG and LLM Chatbot

Use a JSONL template to generate at least 10 questions and answers based on a topic of your interest and save the file as mydb.jsonl.

{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}
{"question":"","answer":""}

Adapt the RAG with LLM code above to create a custom chatbot on the topic of your interest. Choose a topic where you are a domain expert and generate at least 10 question-answer pairs. Once done, you'll build a vector database with these pairs and perform a similarity search using ChromaDB that can be used to enhance the context of the LLM.

Streaming Chatbot
💬