An Update on AI Agents - AI Research From The Lab - Red Marble AI

By Dave Timm

December 2023

An Update on AI Agents: The workforce of the future?

Overview

Back in July, we wrote a paper on AI-Powered Autonomous Agents. It proved to be one of our most popular papers, and in our view this will be one of the most disruptive applications of AI in the next 12 months. But a lot has changed since July, and this is an area of significant research for us, so we’re updating the paper. AI Agents, also known as Digital Employees or Digital Colleagues, can break down previously unseen objectives into executable steps and then complete the work, just like a junior employee would.

Key Concepts

An AI Agent needs memory, both short-term to remember the current conversation or task, and long-term to learn from human feedback. It needs the ability to convert the details from short-term memory into the salient facts for long-term recall. It needs a mechanism to plan its work, a way of acquiring domain knowledge, and the skills to interact with source systems.
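As a rough illustration of how these components fit together, here is a minimal sketch in Python. The class name `Agent`, its fields, and the placeholder planner are assumptions made for this example; they do not describe our implementation or any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def single_step_plan(objective: str) -> List[str]:
    # Placeholder planner: a real agent would ask an LLM to decompose the objective.
    return [objective]


@dataclass
class Agent:
    """Hypothetical container for the components an agent needs."""
    short_term: List[str] = field(default_factory=list)           # current conversation or task
    long_term: List[str] = field(default_factory=list)            # salient facts kept across tasks
    domain_knowledge: Dict[str, str] = field(default_factory=dict)
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    planner: Callable[[str], List[str]] = field(default=single_step_plan)

    def observe(self, message: str) -> None:
        """Record a message in short-term memory."""
        self.short_term.append(message)

    def use_skill(self, name: str, argument: str) -> str:
        """Call a skill (a wrapper around a source system) and remember the outcome."""
        result = self.skills[name](argument)
        self.observe(f"{name}({argument}) -> {result}")
        return result
```

Each skill here is just a function from a string to a string; in practice a skill would wrap a call to a source system such as an ERP or a document store.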

A recent release by OpenAI, the Assistants API, supports the development of AI Agents. Our team evaluated it to see whether it would help us in our work creating Digital Employees.

Deep Dive

We explored the Assistants API in the context of our work to create a Procurement Officer and a Health and Safety Analyst for a mining client, and a Commercial Analyst for a construction client.

Setup & Testing

We tested our agent using the Assistants API to evaluate its performance in three areas: Planning, Memory, and the use of Skills to complete work.

For Planning, the API showed limited capability in task decomposition and self-reflection: it failed to break tasks down effectively, which indicates that further work is needed.
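To make the two planning capabilities concrete, the sketch below shows one possible shape of a decompose, execute, and self-reflect loop. It is not how the Assistants API plans internally, which is not documented. The function `run_with_reflection` and its `decompose`, `execute`, and `critique` parameters are hypothetical; in a real agent each would typically be backed by an LLM call or a skill.

```python
from typing import Callable, List, Tuple


def run_with_reflection(
    objective: str,
    decompose: Callable[[str, str], List[str]],
    execute: Callable[[str], str],
    critique: Callable[[str, List[Tuple[str, str]]], str],
    max_rounds: int = 2,
) -> List[Tuple[str, str]]:
    """Break the objective into steps, run them, then let a critic request another round."""
    history: List[Tuple[str, str]] = []
    feedback = ""  # empty on the first round; later rounds carry the critic's notes
    for _ in range(max_rounds):
        steps = decompose(objective, feedback)
        results = [(step, execute(step)) for step in steps]
        history.extend(results)
        feedback = critique(objective, results)
        if not feedback:  # the critic is satisfied, so stop early
            break
    return history
```

The `max_rounds` cap is a deliberate design choice: without it, a critic that is never satisfied would keep the agent looping indefinitely.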

Memory-wise, the expanded 32K token limit improved short-term memory. We use a method called Reflection to transition short-term context into long-term memory. Once stored, memory can be retrieved along with domain knowledge.
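The Reflection step can be sketched roughly as follows. This is an illustration under stated assumptions, not our production method: `extract_facts` stands in for an LLM summarisation call, `keep_feedback` is a toy substitute for it, and retrieval uses naive keyword overlap where a real system would more likely use embeddings.

```python
from typing import Callable, List

FEEDBACK_PREFIX = "FEEDBACK: "  # hypothetical marker for human feedback


def reflect(short_term: List[str],
            extract_facts: Callable[[List[str]], List[str]],
            long_term: List[str]) -> List[str]:
    """Condense short-term context into salient facts and store the new ones."""
    for fact in extract_facts(short_term):
        if fact not in long_term:  # avoid storing duplicates
            long_term.append(fact)
    short_term.clear()  # the raw context is dropped once its facts are kept
    return long_term


def retrieve(query: str, long_term: List[str], limit: int = 3) -> List[str]:
    """Return the stored facts that share the most words with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(fact.lower().split())), fact) for fact in long_term]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [fact for _, fact in ranked[:limit]]


def keep_feedback(messages: List[str]) -> List[str]:
    """Toy extractor: keep only the lines marked as human feedback."""
    return [m[len(FEEDBACK_PREFIX):] for m in messages if m.startswith(FEEDBACK_PREFIX)]
```

Retrieved facts can then be placed in the prompt alongside domain knowledge before the agent starts its next task.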

For Skills, the API provided access to OpenAI tools such as the Code Interpreter and Knowledge Retrieval, plus the flexibility to build custom tools through its Function Calling feature.

Hitting The Limits

The Code Interpreter, while effective, exhibited longer generation and execution times for code, and it only provides the generated code upon specific user request. The Retrieval tool – used to seed domain knowledge – demonstrated strong performance when handling a single file upload, but lost accuracy with multiple files.

Lastly, the Function Calling feature allows users to define their own functions, but note that the results from these functions are returned exclusively in JSON format.
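The sketch below shows what a custom tool might look like on the application side. The tool description follows the JSON-schema style used by OpenAI function calling; the function `get_purchase_order_status`, its parameter, and the sample data are hypothetical. The example makes no API call. It assumes only that the function name and arguments arrive as a JSON string and that the result is handed back as a JSON string.

```python
import json
from typing import Any, Callable, Dict

# Tool description in the JSON-schema style used by OpenAI function calling.
# The function name and its parameter are hypothetical.
PO_STATUS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_purchase_order_status",
        "description": "Look up the status of a purchase order.",
        "parameters": {
            "type": "object",
            "properties": {"po_number": {"type": "string"}},
            "required": ["po_number"],
        },
    },
}


def get_purchase_order_status(po_number: str) -> Dict[str, Any]:
    # Stand-in for a call to a source system; the data is made up.
    fake_orders = {"PO-1001": "approved"}
    return {"po_number": po_number, "status": fake_orders.get(po_number, "unknown")}


HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_purchase_order_status": get_purchase_order_status,
}


def dispatch(name: str, arguments_json: str) -> str:
    """Run the requested function and serialise its result as a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        arguments = json.loads(arguments_json)
        return json.dumps(handler(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong arguments are reported back as JSON too.
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON, rather than raising, keeps every response in the same format, so the model can read a failure and decide how to recover.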

Overall, the cognitive architecture of the Assistants API is not transparent. The specific algorithms it employs remain unknown, as do the methods used for managing the context of chat histories and the processes involved in information retrieval, so we have no insight into how the system operates at a fundamental level.

What’s Next? 

The current release of the Assistants API is conceptually strong but remains too basic to support production deployment for our use cases. We expect it to improve quickly, however, and we will monitor developments closely.

What’s the verdict?

LLM-Powered AI Agents will cause huge changes to the way work is performed in most, if not all, organisations.

Change on this level brings opportunity and risk.

We believe 2024 will bring profound change as AI Agents / Digital Employees are introduced into corporate workforces.

Thanks for checking out our business articles.

If you want to learn more, feel free to reach out to Red Marble AI.

You can click on the "Let's Talk" button on our website or email Dave, our AI expert at d.timm@redmarble.ai.

We appreciate your interest and look forward to sharing more with you!
