
The Cost of Fine Tuning an LLM

By Dave Timm

June 2024

Evaluating a low-cost fine tuned open source LLM

Training an LLM from scratch comes at significant cost. Google’s Gemini Ultra is reported to have cost $191M to train; OpenAI’s GPT-4 an estimated $78M. DBRX from Databricks is significantly less but still reportedly cost $10M to train from scratch. Clearly, this is out of reach for nearly all organisations. The viable alternatives are a technique called Retrieval Augmented Generation, or RAG, or training your own instance of an open-source model. We explored the cost to train an open-source model to reach a reasonable level of accuracy, using a model from the French company Mistral AI.

Key Concepts

Mistral AI, founded in April 2023, has quickly emerged as one of the leading providers of open-source LLMs. Established by three former employees of Meta Platforms and Google DeepMind, the company quickly gained traction, raising $428M by October 2023. We explored Mistral 7B, a 7.3 billion parameter model. Parameters are the learned values that define the model’s behaviour and capabilities. In LLM terms, 7B is a small model – larger models like GPT-4, Claude Opus and Google Gemini Ultra are reported to have trillions of parameters.

Deep Dive

Mistral 7B was officially released in September 2023 under the Apache 2.0 license, with full transparency of the weights used in the model – the learned values representing the probabilities in the model. When training the model, we monitor the training loss (the difference between the model’s predictions and known correct answers, as the model learns) and the validation loss (how well the model performs when applying what it has learned to unseen data). Lower training loss indicates that the model is learning well; lower validation loss suggests that it is generalising well to new data. So our goal is low training loss and low validation loss at minimum cost.
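The idea of training and validation loss can be sketched without an LLM at all. Below is a toy illustration in Python – the model, data and learning rate are all invented for illustration. It is the same gradient-descent loop that fine-tunes a 7-billion-parameter model, shrunk down to a single weight.

```python
# Toy illustration of training vs validation loss, assuming a made-up
# one-parameter model y = w * x trained by gradient descent. An LLM
# works the same way, just with billions of weights instead of one.

def mse(w, data):
    """Mean squared error of predictions w*x against known answers y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(train_data, val_data, lr=0.01, steps=100):
    w = 0.0  # start with an untrained weight
    for _ in range(steps):
        # Gradient of the training loss with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in train_data) / len(train_data)
        w -= lr * grad  # adjust the weight slightly, as in each training "step"
        train_loss = mse(w, train_data)  # loss on the data the model learns from
        val_loss = mse(w, val_data)      # loss on unseen, held-out data
    return w, train_loss, val_loss

train_data = [(1, 2.1), (2, 3.9), (3, 6.2)]  # roughly y = 2x
val_data = [(4, 8.1), (5, 9.8)]              # held-out examples
w, train_loss, val_loss = train(train_data, val_data)
```

After a hundred steps the weight settles near 2, and both losses are low – the pattern we look for when deciding a fine-tuning run has gone well.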

Setup & Testing

Before fine-tuning, the base model of Mistral 7B provides responses based solely on general knowledge. We then fine-tuned it with a 181-page commercial contract. To prepare the training data, we split the contract text into chunks, each with a maximum of 512 characters. We then trained the model in “steps”. During each step a small batch of data is fed into the model; the model processes the data and makes predictions; the predictions are compared to the correct answers, and the model’s error (loss) is calculated; finally, the model’s weights are adjusted slightly based on this error, so it can make better predictions next time. This process is repeated many times with different batches of data until the model’s performance on the specific task improves.
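The chunking step above can be sketched in a few lines of Python. This is a minimal illustration rather than the exact pipeline we used: it assumes plain extracted text, prefers splitting on paragraph breaks so chunks stay coherent, and hard-splits anything longer than the 512-character cap. Real pipelines often split on tokens instead of characters.

```python
# Minimal sketch of splitting a document into chunks of at most
# max_chars characters, keeping whole paragraphs together where possible.

def chunk_text(text, max_chars=512):
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Current chunk can't also hold this paragraph: close it off.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = ""
        # Oversized paragraph: hard-split it at the cap.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes one training example, fed to the model in small batches during the step loop described above.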

Hitting the limits

Fine-tuning the model took only four hours in our case, and cost less than US$10 in compute using a single mid-range GPU – an NVIDIA A10G – on a cloud service, excluding storage costs. This would obviously be higher for larger document sets, but still demonstrates what is possible. We stopped training after step 4,500 and selected that checkpoint as our fine-tuned model, as it showed low training loss and good accuracy in its answers.

We evaluated the fine-tuned model with 11 test questions. The model answered questions based on the specific trained knowledge rather than its pre-trained general knowledge, indicating that the fine-tuning had worked. However, the generated answers tended to be verbose, and the accuracy was lower than other fine-tuned open-source models such as Meta’s Llama 2 7B model.

What’s next?

Recently, there has been an increase in the release of open-source LLMs. Meta introduced Llama 3 in April and Databricks released DBRX in May. The performance of these open-source LLMs continues to improve, and fine-tuning them to perform specific roles is a very viable option with negligible cost.

What’s the verdict?

Fine-tuning an open-source LLM with proprietary documents should be a core capability for companies. It is both cost-effective and time-efficient. A fine-tuned LLM can perform a variety of language tasks at lower cost, reduce dependency on online services, and enhance data security.

Thanks for checking out our business articles.

If you want to learn more, feel free to reach out to Red Marble AI.

You can click on the "Let's Talk" button on our website or email Dave, our AI expert, at d.timm@redmarble.ai.

We appreciate your interest and look forward to sharing more with you!

