Understanding RAG vs Function Calling for LLMs

7 min read
Deven J.
Published November 22, 2024

Unless you’ve been living under a rock, you probably know Large Language Models (LLMs) are all the rage right now. LLMs like OpenAI's ChatGPT and Google’s Gemini have redefined productivity and have more or less changed the world as we know it. However, their capabilities are not without limits. Static models trained on a fixed dataset don’t have the ability to stay updated with real-world events or execute specific actions by themselves. Don’t get me wrong - these models do exceptionally well even on this fixed dataset. However, ever since they came out, people have been trying to figure out how to tinker with the base model and add their own customizations on top.

When Diffusion models (used to generate images) first came out, I loved how they gave you the ability to change certain aspects of their behavior with Low-Rank Adaptations (LoRAs) or Hypernetworks. I spent quite a lot of time training these customizations or mixing different models to see the results. However, Diffusion models are relatively lightweight compared to large LLMs, and these techniques don’t offer the level of customization that people building on top of LLMs usually want. While there are some analogous concepts to LoRAs for LLMs, the size and proprietary nature of most of them make it nearly impossible to train them that way.

To bridge these gaps, two promising approaches have emerged: Retrieval-Augmented Generation (RAG) and Function Calling. These methods empower LLMs to access external knowledge or interact with systems, greatly enhancing their utility. In this article, we’re going to look into what these approaches entail and how to implement them with OpenAI models.

Looking Into RAG

Retrieval-augmented generation (RAG) is a technique that enhances the capabilities of LLMs by adding a retrieval system. It allows the model to access external knowledge sources, such as databases, documents, or APIs, in real time. This addresses one of the main limitations of LLMs: their inability to know anything beyond their training cut-off date or to use domain-specific or private data. In practice, this is often used when you need the model to work with your own repository of data (such as, say, the docs at Stream for a specific SDK), which the LLM does not know about but which can supply the extra context needed to produce a relevant answer.

In general, RAG has two steps associated with it:

  1. A retrieval step fetches relevant information from an external data source based on the user query.
  2. A generation step integrates the retrieved information to create a response.

Implementing RAG with OpenAI

In this section, we’re going to look into the steps of implementing RAG with an OpenAI model and what you need to do to provide your custom data to an LLM.

Step 1: Prepare your knowledge base

First, collect the information you want the system to use, such as documents, FAQs, or structured data. An LLM cannot search this data directly, as it does not work with words the way they are usually written. Instead, an LLM (much like Diffusion models) works with numerical representations of text called embeddings, which can be generated using models such as OpenAI’s text-embedding-ada-002. These embeddings then need to be stored in a special kind of database known as a vector database, which indexes the text in this encoded format and supports similarity search. You can use databases such as Pinecone or Weaviate for this purpose.
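
As a rough sketch of what this step can look like, the snippet below embeds a couple of made-up documents with text-embedding-ada-002 and keeps the vectors in a plain NumPy array as a stand-in for a real vector database (the documents and the embed helper are illustrative, not part of any SDK):

```python
import numpy as np
import openai  # assumes OPENAI_API_KEY is set in your environment

# Hypothetical knowledge base: in practice, these would be your docs or FAQs.
documents = [
    "The average mortgage rate for a 30-year fixed loan is 6.5% as of November 2024.",
    "A 15-year fixed loan usually has a lower rate but a higher monthly payment.",
]

def embed(texts):
    """Convert a list of strings into embedding vectors with OpenAI's embedding model."""
    response = openai.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in response.data]

# Keep the vectors next to the original text. A real setup would upsert these
# into a vector database such as Pinecone or Weaviate instead.
doc_embeddings = np.array(embed(documents))
```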

Step 2: Create a retrieval system

When a user submits a query, you need to convert it into an embedding using the same model used for encoding (here, text-embedding-ada-002). You must then search the vector database for the closest matches to the query embedding. These matches represent the most relevant data points for generating an answer.
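
Continuing that sketch (still using the in-memory NumPy store rather than a hosted vector database), the retrieval step can be as simple as embedding the query and ranking the stored documents by cosine similarity:

```python
def retrieve(query, top_k=2):
    """Embed the query and return the most similar stored documents."""
    query_embedding = np.array(embed([query])[0])
    # Cosine similarity between the query vector and every document vector.
    scores = doc_embeddings @ query_embedding / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]

context = retrieve("What are the current mortgage rates for a 30-year fixed loan?")
```

With Pinecone or Weaviate, this becomes a single query call against the index, but the idea is the same.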

Step 3: Create the prompt

Now, supply the user prompt and the retrieved information to the OpenAI API (alongside any system prompt) to generate the final answer. For example, if we want to make a financial assistant using GPT-4, we can use the Python API like this:

```python
import openai

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a financial assistant."},
        {"role": "user", "content": "What are the current mortgage rates for a 30-year fixed loan?"},
        {"role": "system", "content": "Relevant information: The average mortgage rate for a 30-year fixed loan is 6.5% as of November 2024."}
    ]
)
```

Here, the first message is the system prompt, the second is the user prompt, and the third contains the data retrieved from the vector database.

Diving into Function Calling

Function Calling is a feature introduced in newer models such as GPT-4 that allows an LLM to interact with external systems by calling predefined functions. Instead of generating plain text responses, the model decides when to invoke these functions based on the user prompt. This approach bridges the gap between language understanding and operational execution, making LLMs capable of executing structured tasks programmatically. As an example, if you want to ask your LLM to pay for something, you can create predefined functions to process payments. Stripe recently launched an AI SDK to allow agents to use pre-made Stripe functions to do this.


Implementing Function Calling with OpenAI

In this section, we’re going to look into the steps of implementing Function Calling with an OpenAI model and what you need to do to provide your custom functions to an LLM.

Step 1: Define the function schema

Let’s continue the financial assistant example. This time, let’s create a function that calculates your monthly mortgage payment. First, we define a schema for this function:

```python
{
    "name": "calculate_mortgage",
    "parameters": {
        "type": "object",
        "properties": {
            "loan_amount": {
                "type": "number",
                "description": "The total loan amount"
            },
            "interest_rate": {
                "type": "number",
                "description": "Annual interest rate as a decimal"
            },
            "loan_term_years": {
                "type": "number",
                "description": "The term of the loan in years"
            }
        },
        "required": ["loan_amount", "interest_rate", "loan_term_years"]
    }
}
```

Step 2: Pass the function schema to the OpenAI API call

We then need to let the model know that we have predefined functions that it can call if needed. We can pass it along with the user query and the system prompt. Using the Python API:

```python
import openai

tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate_mortgage",
            "parameters": {
                "type": "object",
                "properties": {
                    "loan_amount": {"type": "number"},
                    "interest_rate": {"type": "number"},
                    "loan_term_years": {"type": "number"}
                },
                "required": ["loan_amount", "interest_rate", "loan_term_years"]
            }
        }
    }
]

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a financial assistant."},
        {"role": "user", "content": "Calculate my monthly payment for a $300,000 loan at 6% interest over 30 years."}
    ],
    tools=tools,
)
```

An important thing to note here is that you do not actually provide the function’s implementation or an endpoint to OpenAI. Instead, the model only determines whether the function should be called. If it is, you are responsible for executing the function yourself, which brings us to our next step.

Step 3: Execute the function and pass the result back

If the LLM decides a function call is necessary, it returns the arguments for that call and waits for you to run the function and pass the output back. The relevant choice in the returned response looks like this:

```python
Choice(
    finish_reason='tool_calls',
    index=0,
    logprobs=None,
    message=ChatCompletionMessage(
        content=None,
        role='assistant',
        function_call=None,
        tool_calls=[
            ChatCompletionMessageToolCall(
                id='call_62136354',
                function=Function(
                    arguments='{"loan_amount": 300000, "interest_rate": 0.06, "loan_term_years": 30}',
                    name='calculate_mortgage'
                ),
                type='function'
            )
        ]
    )
)
```
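
The calculate_mortgage function itself is something you implement; the API only tells you that it should be called and with which arguments. As a rough sketch (the amortization formula is the standard one for fixed-rate loans, and the argument parsing assumes the response object shown above), it might look like this:

```python
import json

def calculate_mortgage(loan_amount, interest_rate, loan_term_years):
    """Monthly payment for a fixed-rate loan using the standard amortization formula."""
    monthly_rate = interest_rate / 12
    n_payments = loan_term_years * 12
    payment = loan_amount * monthly_rate / (1 - (1 + monthly_rate) ** -n_payments)
    return f"The monthly payment is ${payment:,.2f}."

# Pull the arguments out of the tool call the model returned above.
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
```

For the arguments above, calculate_mortgage(**args) works out to roughly $1,798.65 per month.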

Once you run this function, you can then supply the result back to the LLM:

```python
content = calculate_mortgage(...)

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a financial assistant."},
        {"role": "user", "content": "Calculate my monthly payment for a $300,000 loan at 6% interest over 30 years."},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_62136354",
                    "type": "function",
                    "function": {
                        "arguments": '{"loan_amount": 300000, "interest_rate": 0.06, "loan_term_years": 30}',
                        "name": "calculate_mortgage"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "content": content,
            "tool_call_id": "call_62136354"
        }
    ]
)
```

With this, the LLM now integrates with your custom-defined functions. For more information on function calling, see the OpenAI guide.

Choosing the Right Approach

When deciding between RAG and Function Calling for your application, the choice ultimately depends on the kind of task you are solving.

RAG is the best choice when your application needs access to large domain-specific datasets. It allows the model to retrieve information that might not exist within the model's training data. For example, a research assistant using RAG can fetch the latest medical studies to provide accurate answers, ensuring responses are grounded in up-to-date knowledge.

On the other hand, Function Calling is good when the task involves predefined operations or structured outputs. For instance, a financial assistant calculating mortgage payments (like in the previous section's examples) or booking a flight requires the LLM to execute clearly defined functions. This approach integrates the LLM with external systems like APIs or databases.

If your application requires both up-to-date knowledge and integration with your own systems, a hybrid approach can combine RAG for retrieving the necessary context and Function Calling for acting on it. For example, an e-commerce chatbot could retrieve real-time product availability using RAG and then call a function to process an order.
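
As a loose sketch of that hybrid pattern, reusing the hypothetical retrieve helper from earlier and a tools list defined the same way as calculate_mortgage above (the product name and the place_order function are made up for illustration), a single request can carry both the retrieved context and the callable tools:

```python
context = retrieve("ErgoChair Pro availability")  # RAG step: fetch relevant product data

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an e-commerce assistant."},
        {"role": "system", "content": "Relevant information: " + " ".join(context)},
        {"role": "user", "content": "Is the ErgoChair Pro in stock? If so, order one for me."},
    ],
    tools=tools,  # e.g., a hypothetical place_order function schema
)
```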

Conclusion

Both RAG and Function Calling offer new ways to make Large Language Models more capable and practical. RAG is ideal for retrieving real-time or domain-specific information by connecting the model to external knowledge sources, while Function Calling allows the model to perform specific tasks like calculations or interacting with other systems. Choosing the right approach depends on the needs of your application, and both can be combined when customizing your own LLM systems.

With a better understanding of the AI techniques behind popular LLMs, you might wonder how to present LLM responses in a way that is rich, interactive, and real-time for users. One option to explore could be using a Chat UI SDK designed with AI in mind, like the one developed by our team.
