Emerging LLMs - AI Research From The Lab - Red Marble AI

By Dave Timm

July 2023

Emerging LLMs: Evaluating the Contenders

Overview

Our research team recently investigated the current state of the art in GPT technology and evaluated the performance of seven large language models (LLMs).

We then did a deep dive into two interesting competitors to OpenAI’s models: Google PaLM 2 and the Falcon 40B LLM.

We tested them against a series of complex workflows from our client work, including two examples (Python code generation and working with JSON records within a business workflow) where we have found OpenAI’s models struggle for accuracy.
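To illustrate what a JSON-record test can involve, here is a minimal sketch of a checker that decides whether a model's reply is usable by an automated workflow. The field names (`invoice_id`, `amount`, `currency`) and the function `check_json_reply` are hypothetical and are not taken from our client workflows.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type(s).
REQUIRED_FIELDS = {"invoice_id": str, "amount": (int, float), "currency": str}


def check_json_reply(reply: str) -> list[str]:
    """Return a list of problems found in a model reply; empty means usable."""
    try:
        record = json.loads(reply)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems
```

A reply that wraps the JSON in conversational text fails the first check, which is a common reason a model that looks accurate in a chat window is still hard to wire into automation.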

Key Concepts

Google AI has developed several large language models, including PaLM 2 and Flan-T5, which can be accessed through its Vertex AI API and Model Garden.

Falcon is a family of language models developed by the Technology Innovation Institute (TII) in Abu Dhabi, which includes Falcon-40B and Falcon-7B. Falcon is particularly interesting because it can be run on a relatively low-powered GPU on a local server, removing some data security and governance risks.

Deep Dive

LLMs work by predicting which word (or part-word, called a token) is likely to come next in a particular context. Initial training and subsequent fine-tuning both adjust the model’s internal connections (called weights) so that it better predicts the next token within a context.
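A toy sketch of that prediction step may help. A model assigns a raw score to each candidate token, the scores are converted into probabilities, and the most likely token is chosen. The scores below are invented for illustration; a real model computes them from its weights over a vocabulary of tens of thousands of tokens.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def predict_next(scores: dict[str, float]) -> str:
    """Pick the single most probable next token (greedy decoding)."""
    probs = softmax(scores)
    return max(probs, key=probs.get)


# Invented scores for the context "The cat sat on the"
scores = {" mat": 4.0, " sofa": 3.0, " moon": 0.5}
probs = softmax(scores)
next_token = predict_next(scores)
```

Training and fine-tuning change the weights so that, for contexts seen in the training data, the token that actually came next receives a higher score.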

The “40B” and “7B” descriptors in the Falcon model names refer to the number of parameters in each model (roughly 40 billion and 7 billion), which includes these weights.
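Parameter count translates fairly directly into the memory needed to load a model. The sketch below is back-of-envelope arithmetic only: it uses the nominal counts of 7 and 40 billion, counts the weights alone (not activations or any other runtime overhead), and assumes 2 bytes per parameter for 16-bit floats and 0.5 bytes for an idealised 4-bit quantised copy.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Nominal parameter counts; the exact counts differ slightly.
MODELS = {"Falcon-7B": 7e9, "Falcon-40B": 40e9}

for name, n_params in MODELS.items():
    fp16 = weight_memory_gb(n_params, 2)    # 16-bit floating point
    int4 = weight_memory_gb(n_params, 0.5)  # idealised 4-bit quantisation
    print(f"{name}: about {fp16:.1f} GB at 16-bit, about {int4:.1f} GB at 4-bit")
```

Estimates like these are a useful first check when deciding which model size fits a given local GPU.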

Setup & Testing

We ran Falcon 40B and 7B via a hosted version on Hugging Face, a platform that hosts machine learning models. Falcon models can be easily integrated into existing workflows using the Hugging Face ecosystem.

We ran Google PaLM 2 (Google’s counterpart to the models behind ChatGPT) via the Vertex AI APIs. Google has also integrated its LLMs into its cloud-based machine learning platform, making it easy for businesses to deploy and use them.
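This brief does not record exactly how the hosted models were called. One common route is Hugging Face's hosted Inference API, and the sketch below shows what such a call can look like using only the Python standard library. The endpoint pattern, the `tiiuae/falcon-7b-instruct` model id, and the response shape reflect our understanding of that API and should be checked against the current Hugging Face documentation. The helper names and the `YOUR_HF_TOKEN` placeholder are assumptions for illustration.

```python
import json
import urllib.request

# Assumed endpoint pattern for the hosted Inference API.
API_ROOT = "https://api-inference.huggingface.co/models/"


def build_request(model_id: str, prompt: str, token: str,
                  max_new_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) a text-generation request."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        API_ROOT + model_id,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def generate(model_id: str, prompt: str, token: str) -> str:
    """Send the request and return the generated text (needs network access)."""
    with urllib.request.urlopen(build_request(model_id, prompt, token)) as resp:
        result = json.load(resp)
    # Text-generation replies are expected to be a list of {"generated_text": ...}.
    return result[0]["generated_text"]
```

A call would then look like `generate("tiiuae/falcon-7b-instruct", "Write a haiku about JSON.", "YOUR_HF_TOKEN")`. Keeping request construction separate from sending makes the payload easy to inspect and test without network access.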

Hitting The Limits

OpenAI’s GPT-4 is still a leader on the quality and level of detail of its generated text, where reinforcement learning from the feedback of its large user base plays a significant role. However, it is relatively slow to respond and lacks reasoning capability (which OpenAI is working on).

The main rising competitor is Google’s PaLM 2, which has very strong logical reasoning capability and is around 4x quicker in response time. The ‘code-bison’ model performed worse than the ‘text-bison’ model in both direct code generation and complex problem solving (and ‘text-bison’ outperformed GPT-4 in some tasks). Negatives include the need for more precise prompts and reliability that is not yet sufficient to build into business automation.
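Response-time comparisons like this one depend on how the timing is done, and this brief does not describe the measurement method. As one possible approach, the sketch below times repeated calls and reports the median, which is less sensitive to an occasional slow request than the mean. The `call_model` argument is a hypothetical function that wraps whichever API is being tested.

```python
import statistics
import time
from typing import Callable


def median_latency(call_model: Callable[[str], str], prompt: str,
                   runs: int = 5) -> float:
    """Median wall-clock seconds over `runs` calls of call_model(prompt)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call_model(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(slow: Callable[[str], str], fast: Callable[[str], str],
            prompt: str, runs: int = 5) -> float:
    """How many times quicker `fast` is than `slow` on the same prompt."""
    return median_latency(slow, prompt, runs) / median_latency(fast, prompt, runs)
```

Using the same prompts and similar output lengths for both models matters, because generation time grows with the number of tokens produced.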

Falcon shows that there is a freely accessible GPT-4-level model that responds quickly, can be run locally and can be fine-tuned. It is early days, though: we failed to get adequate responses on some tests, in particular code generation and some complex query interpretation.
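For code generation tests, a cheap first filter is to check whether the model's output is even syntactically valid Python before judging whether it is correct. The sketch below shows one way to do that. It is an illustration rather than the scoring method used in our tests, and the function name is hypothetical.

```python
import ast


def parses_as_python(generated: str) -> bool:
    """True if the text is syntactically valid Python (it may still be wrong)."""
    try:
        ast.parse(generated)
    except (SyntaxError, ValueError):
        return False
    return True
```

Output that mixes commentary with code, such as a reply beginning "Here is the code:", fails this check. Code that passes still needs to be run against test cases to confirm that it does what was asked.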

What’s Next? 

We will install Falcon locally and fine-tune it as part of another research paper. This will enable us to adjust the weights in the model and explore its utility within a specific domain.

What’s the verdict?

We will continue to implement OpenAI-based models in our client engagements and will monitor the progress of these and other models.

Thanks for checking out our business articles.

If you want to learn more, feel free to reach out to Red Marble AI.

You can click on the "Let's Talk" button on our website or email Dave, our AI expert, at d.timm@redmarble.ai.

We appreciate your interest and look forward to sharing more with you!
