Fine-tuning and RAG: 2 methods for specializing AI models
When it was released at the end of November 2022, ChatGPT gained over a million users in just 5 days. It took the chatbot just 2 months to reach 100 million users – 20 times faster than Netflix, which took 3.5 years!
Entertaining at first, this new technology immediately attracted a great deal of interest. Artificial intelligence is set to transform many fields, but to do so it must first become reliable. Today, AI is held back by hallucinations, a lack of emotional intelligence, shallow domain expertise and disconnection from current events. One way to improve reliability is to personalize and optimize models for specific tasks. This is where fine-tuning and Retrieval Augmented Generation (RAG) come into play. In this article, Alcimed explores these 2 methods, both capable of improving the relevance and quality of the responses generated by chatbots.
When we talk about AI models, we are generally talking about algorithms based on neural networks. This technology underpins Large Language Models (LLMs) such as ChatGPT or Mistral Large, as well as image generation and visual recognition algorithms. These models are trained on very large datasets, which gives them a solid grounding in the skills they are meant to master. Numerous parameters can be adjusted across the different layers of a network to modify the way information is processed; in the largest models there are billions of them, which leaves plenty of room for customization!
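To make that scale concrete, here is a minimal sketch of how the parameter count of a simple fully connected network grows with its layer widths. The layer sizes are illustrative, not those of any real model:

```python
# Each dense layer with n_in inputs and n_out outputs holds
# n_in * n_out weights plus n_out biases.

def count_parameters(layer_sizes):
    """Total trainable parameters of a dense network given its layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# A toy 4-layer network already approaches 2 million parameters;
# LLMs stack far wider layers, plus attention blocks, into the billions.
print(count_parameters([512, 1024, 1024, 256]))
```

Every one of those parameters is a knob that training, and later fine-tuning, can turn.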
Fine-tuning involves taking a pre-trained model and adapting it to a specific task or domain using a smaller, targeted dataset. This improves the model’s performance on that task by adjusting the weights of its neural connections to better meet the new requirements, while retaining the knowledge acquired during pre-training.
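The principle can be illustrated with a toy NumPy sketch: "pre-train" a linear model on a large general dataset, then continue training from those weights on a small task-specific dataset. The data, weights and learning rates are invented for illustration; real fine-tuning adjusts billions of neural-network weights, not five:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr, steps):
    """Plain gradient descent on mean squared error for a linear model."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

# "Pre-training": lots of data from a general underlying task.
X_big = rng.normal(size=(1000, 5))
w_general = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_big = X_big @ w_general
w = train(np.zeros(5), X_big, y_big, lr=0.1, steps=200)

# "Fine-tuning": a small, slightly shifted task-specific dataset,
# starting from the pre-trained weights rather than from scratch.
w_task = w_general + np.array([0.3, 0.0, -0.2, 0.1, 0.0])
X_small = rng.normal(size=(30, 5))
y_small = X_small @ w_task
w_ft = train(w, X_small, y_small, lr=0.05, steps=100)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# The fine-tuned weights fit the specific task far better than
# the pre-trained ones, while starting close to them.
print(mse(w, X_small, y_small), mse(w_ft, X_small, y_small))
```

Starting from the pre-trained weights is what lets the adapted model keep its general knowledge while specializing with little data.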
Retrieval Augmented Generation (RAG) is a technique in which a language model is combined with an information retrieval system. During response generation, the model uses a search engine to retrieve relevant information from a database or external documentary corpus, then integrates this information into the text generation process. The aim is to enrich the model’s responses with current, specific information not necessarily contained in its initial training. This enables the model to provide more informed, accurate and detailed responses than a traditional model could.
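The mechanism can be sketched in a few lines of Python. The corpus, query and prompt template below are invented for illustration, and a simple bag-of-words cosine similarity stands in for the dense embeddings and LLM call a real system would use:

```python
import math
from collections import Counter

# A tiny stand-in for an external documentary corpus.
corpus = [
    "Our 2024 vacation policy grants 25 days of paid leave per year.",
    "The cafeteria is open from 11:30 to 14:00 on weekdays.",
    "Expense reports must be submitted within 30 days of travel.",
]

def vectorize(text):
    """Bag-of-words term counts (a stand-in for dense embeddings)."""
    return Counter(text.lower().replace(".", "").replace(",", "").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

query = "How many days of paid leave do I get?"
context = retrieve(query, corpus)[0]

# The retrieved passage is injected into the prompt before generation,
# so the model answers from current, specific information.
prompt = f"Answer using this context:\n{context}\nQuestion: {query}"
print(context)
```

The key design choice is that the knowledge lives in the corpus, not in the model's weights: updating the documents updates the answers, with no retraining.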
In this way, fine-tuning trains a model on a specific domain, like a student who revises heat engines in depth before a thermodynamics exam. RAG, on the other hand, supplies information to the model at the moment it generates its answer, like an open-book exam where the course notes on heat engines are handed out alongside the questions.
Finally, there’s a combination of both methods: RAFT. The RAFT (Retrieval-Augmented Fine-Tuning) method combines the strengths of fine-tuning and RAG to further enhance the relevance of generative AI chatbots. By integrating real-time access to external information while refining the model with specific data, RAFT enables more precise adaptation to user needs. This innovative hybrid approach maximizes chatbot performance by leveraging the advantages of both techniques.
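As a rough sketch, a RAFT-style training example pairs a question with the relevant passage plus distractor documents, so that the fine-tuned model learns to ground its answers in retrieved context rather than memorize facts. The field names and documents below are illustrative assumptions, not the format of any particular framework:

```python
# Build one fine-tuning record that mixes the relevant ("oracle")
# passage with distractors, mimicking what retrieval will return at
# inference time.

def make_raft_example(question, oracle_doc, distractor_docs, answer):
    """Assemble one RAFT-style training record."""
    return {
        "prompt": "\n".join(
            ["Context:"]
            + [oracle_doc]
            + distractor_docs
            + [f"Question: {question}"]
        ),
        "completion": answer,
    }

example = make_raft_example(
    question="What is the boiling point of water at sea level?",
    oracle_doc="Water boils at 100 °C at standard atmospheric pressure.",
    distractor_docs=["Iron melts at 1538 °C.", "Ethanol boils at 78 °C."],
    answer="100 °C",
)
print(example["prompt"])
```

Fine-tuning on records like these teaches the model to pick the right passage out of noisy retrieval results, combining the strengths of both techniques.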
Fine-tuning is particularly useful in fields where collecting large quantities of annotated data is costly or impractical. A number of specialist models have emerged, such as LEGAL-BERT for the legal field, BloombergGPT for finance, Meditron, built on Meta’s Llama models, for healthcare, and Mistral’s Codestral for computer code. French player Mistral has released an interface called ‘La Plateforme’, where users can customize their own models.
Many large groups, such as Axa and Capgemini, have also adopted RAG internally, to capitalize on their numerous documents. The models are hosted on their own servers, which means that no documents are shared online. In this way, they have at their disposal a conversational agent specifically adapted to their knowledge, greatly facilitating information retrieval.
For 2024, investments in data centers by Amazon, Microsoft, Google and Meta are estimated at $200 billion¹. Much of this investment will be needed to improve the reliability of existing models and deepen their specialization.
The specialization of AI models is, of course, not without its challenges.
As AI becomes increasingly widespread, the specialization of trained models represents a significant advance in the field. It enables unprecedented customization and greater efficiency across many domains, by adapting models to the user’s specific needs. By overcoming the current challenges, we can look forward to a future in which fine-tuning and RAG open the door to even more targeted and impactful innovations, in healthcare or industry for example. Continued exploration of this field will only strengthen our ability to shape AI to our needs, marking an era of true collaboration between human and machine. Alcimed can support you in your projects, so don’t hesitate to contact our team!
About the author,
Paul-Emile, Data Scientist in Alcimed’s Life Sciences team in France.