Fine-Tuning Mistral 7B with Your Own Data

By Red Marble AI

May 2024

How to fine-tune an open-source LLM?

Mistral AI, founded in France in April 2023, is at the forefront of AI, specialising in open-source large language models (LLMs). By releasing its models as open source, Mistral AI provides a powerful alternative to proprietary models. Established by three former employees of Meta Platforms and Google DeepMind, the company quickly gained traction, raising $428 million by October 2023 and reaching a valuation of over $2 billion by December 2023. We are excited to explore the benefits of fine-tuning a pre-trained Mistral 7B model to enhance its understanding of our proprietary documents.

Key Concepts

Mistral has released two models with openly available weights, while three additional models, Small, Medium, and Large, are accessible exclusively via API.

Compared to fine-tuning an open-source LLM, training a large language model (LLM) from scratch is significantly more expensive. For instance, Google’s Gemini Ultra reportedly cost an estimated $191 million to train, OpenAI’s GPT-4 an estimated $78 million, and Databricks’ DBRX roughly $10 million.

Deep Dive

Mistral 7B is an LLM with 7.3 billion parameters, built on the transformer architecture. The model was officially released in September 2023 under the Apache 2.0 license, and Mistral reports that it surpasses Llama 2 13B on all tested benchmarks and matches Llama 34B on many of them. Mistral 7B employs grouped-query attention (GQA), a variant of multi-head attention in which groups of query heads share a single set of key and value heads, reducing memory use and speeding up inference. Mistral released two versions: a base model and an instruct model, the latter fine-tuned to respond to chat-style prompts.
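GQA is visible directly in the published model configuration: there are fewer key/value heads than query heads, so groups of query heads share each key/value head. A quick way to check this (assuming the Hugging Face transformers library and the published Mistral-7B-v0.1 config):

```python
# Inspect grouped-query attention in Mistral 7B's published config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(config.num_attention_heads)   # 32 query heads
print(config.num_key_value_heads)   # 8 key/value heads -> 4 query heads per group
```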

Setup & Testing

Before fine-tuning, the Mistral 7B base model answers questions using only its general pre-trained knowledge. Because its trained weights are openly available, we can fine-tune it directly; in our case, we want to fine-tune it on a 181-page contract.
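As a rough illustration, here is how one might query the base model with the Hugging Face transformers library before any fine-tuning. The model ID is Mistral’s published base checkpoint; the contract question is a hypothetical example, not one of our actual test questions.

```python
# A minimal sketch of querying the base model before fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical contract question; the base model can only answer from
# general pre-trained knowledge at this point.
prompt = "What is the termination notice period in the agreement?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```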

To prepare the training data, we split the contract text into chunks of at most 512 characters each. During fine-tuning, we monitor the training loss and validation loss, which measure the difference between the model’s predictions and the ground-truth labels in the dataset. At each training step, the model learns from a batch of examples. Our objective is to minimise both the training loss and the validation loss.
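A minimal sketch of this data preparation step, assuming the contract text has already been extracted to a plain-text file (the file name and the 90/10 train/validation split are illustrative assumptions, not details from our actual run):

```python
# Split the contract text into chunks of at most 512 characters and
# hold out a slice for computing validation loss during fine-tuning.
from datasets import Dataset

with open("contract.txt") as f:
    text = f.read()

chunk_size = 512
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

dataset = Dataset.from_dict({"text": chunks})
splits = dataset.train_test_split(test_size=0.1, seed=42)  # assumed 90/10 split
train_ds, val_ds = splits["train"], splits["test"]
print(len(train_ds), "training chunks,", len(val_ds), "validation chunks")
```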

Hitting the limits

Fine-tuning the model took only four hours in our case, at a cost of approximately 4 USD using an Nvidia A10G GPU on a cloud service; additional storage costs also apply. We selected the fine-tuned checkpoint at step 4500 for its low training loss. Although the checkpoint at step 525 offered a better balance of low training loss and low validation loss, its answer accuracy was lower than that of step 4500.
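The exact fine-tuning method is not the focus here, but assuming a LoRA-style fine-tune whose adapter checkpoints were saved with the Hugging Face peft library, selecting the step-4500 checkpoint might look like the sketch below (the output path is hypothetical):

```python
# Load the base model and attach the adapter weights saved at step 4500.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "outputs/checkpoint-4500")  # assumed path
model.eval()
```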

We evaluated the fine-tuned model with 11 test questions. The model answered them based on the trained document rather than its pre-trained general knowledge. However, the generated answers tended to be verbose, and the accuracy was not as good as what we achieved with Llama 2 7B.
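An evaluation pass in this spirit might look like the following, reusing the tokenizer and fine-tuned model from the sketches above; the questions listed are placeholders, not our actual test set.

```python
# Run each test question through the fine-tuned model and print the
# answer for manual accuracy review.
questions = [
    "Who are the parties to the agreement?",
    "What is the governing law of the contract?",
    # ... remaining test questions ...
]

for question in questions:
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=150, do_sample=False)
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Q: {question}\nA: {answer}\n")
```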

What’s next?

Recently, there has been a surge in open-source LLM releases. Meta introduced Llama 3 in April 2024 and Databricks released DBRX in March 2024. The performance of these open-source LLMs continues to improve, as evidenced by their standings on the LMSYS Chatbot Arena leaderboard.

What’s the verdict?

Fine-tuning an open-source LLM with proprietary documents offers numerous opportunities in the era of generative AI. The approach is both cost-effective and time-efficient. A fine-tuned LLM can perform a variety of language tasks at lower cost, reduce dependence on online LLM services, and enhance data security.

We appreciate your interest and look forward to sharing more with you!
