An Update on AI Agents – AI Research From The Lab

In Research Briefs

By Dave Timm

December 2023



An Update on AI Agents: The workforce of the future?

Overview

Back in July, we wrote a paper on AI-Powered Autonomous Agents. It proved to be one of our most popular papers, and in our view agents will be one of the most disruptive applications of AI over the next 12 months. A lot has changed since July, and this is an area of significant research for us, so we are updating the paper. AI Agents, also known as Digital Employees or Digital Colleagues, can break down previously unseen objectives into executable steps and then complete the work, just as a junior employee would.

Key Concepts

An AI Agent needs memory: short-term memory to hold the current conversation or task, and long-term memory to learn from human feedback. It also needs a way to distil the details held in short-term memory into the salient facts worth keeping for long-term recall. Beyond memory, it needs a mechanism to plan its work, a way to acquire domain knowledge, and the skills to interact with source systems.
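To make these components concrete, here is a minimal Python sketch of how they might fit together in one object. Every name is hypothetical: token counting is a crude word count rather than a real tokenizer, the planner is a placeholder for what would normally be an LLM call, and long-term memory and domain knowledge are plain lists that a separate reflection or retrieval step would populate.

```python
from dataclasses import dataclass, field
from typing import Callable


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class Agent:
    max_context_tokens: int                                   # short-term memory budget
    short_term: list[str] = field(default_factory=list)       # current task / conversation
    long_term: list[str] = field(default_factory=list)        # salient facts kept across tasks
    knowledge: list[str] = field(default_factory=list)        # domain documents
    skills: dict[str, Callable[[str], str]] = field(default_factory=dict)  # tool name -> callable

    def remember(self, message: str) -> None:
        """Append to short-term memory, evicting the oldest messages once over budget."""
        self.short_term.append(message)
        while (sum(count_tokens(m) for m in self.short_term) > self.max_context_tokens
               and len(self.short_term) > 1):
            self.short_term.pop(0)

    def plan(self, objective: str) -> list[str]:
        """Placeholder planner (an LLM would normally decompose the objective).
        Here each ';'-separated clause is one step of the form 'skill: input'."""
        return [s.strip() for s in objective.split(";") if s.strip()]

    def run(self, objective: str) -> list[str]:
        """Plan the objective, execute each step with the named skill, and log to memory."""
        self.remember(f"objective: {objective}")
        outputs = []
        for step in self.plan(objective):
            skill_name, _, arg = step.partition(":")
            result = self.skills[skill_name.strip()](arg.strip())
            self.remember(f"{step} -> {result}")
            outputs.append(result)
        return outputs


agent = Agent(max_context_tokens=50,
              skills={"upper": str.upper, "reverse": lambda s: s[::-1]})
print(agent.run("upper: po-1234; reverse: abc"))  # ['PO-1234', 'cba']
```

Keeping short-term memory as a bounded buffer makes the context limit explicit: once the budget is exceeded, the oldest messages are dropped, which is exactly the information a reflection step would need to have captured beforehand.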

A recent release from OpenAI, the Assistants API, supports the development of AI Agents. Our team evaluated it to see whether it would help in our work creating Digital Employees.

Deep Dive

We explored the Assistants API in the context of our work to create a Procurement Officer and a Health and Safety Analyst for a mining client, and a Commercial Analyst for a construction client.

Setup & Testing

We built an agent on the Assistants API and evaluated its performance in three areas: Planning, Memory, and the use of Skills to complete work.

For Planning, the agent showed limited capability in task decomposition and self-reflection: it failed to break tasks down effectively, indicating that further work is needed.
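For readers unfamiliar with the terms, task decomposition with self-reflection can be illustrated as a recursive loop: break a task down until each step is directly executable, run it, and re-plan any step whose result fails a check. The Python sketch below is a simplified, generic version of that pattern, not what the Assistants API does internally. The hooks `decompose`, `execute` and `is_executable` are hypothetical (in practice the first would usually be an LLM prompt), and "reflection" is reduced to treating a `None` result as a failure that triggers further decomposition.

```python
from typing import Callable, Optional


def plan_and_execute(
    task: str,
    decompose: Callable[[str], list[str]],
    execute: Callable[[str], Optional[str]],
    is_executable: Callable[[str], bool],
    max_depth: int = 3,
) -> list[tuple[str, str]]:
    """Break `task` into executable steps, run them, and re-plan any step that fails."""
    if max_depth < 0:
        raise RuntimeError(f"gave up decomposing {task!r}: depth limit reached")
    if is_executable(task):
        result = execute(task)
        if result is not None:
            return [(task, result)]
        # Simplified self-reflection: a failed result falls through to re-planning.
    subtasks = decompose(task)
    if not subtasks:
        raise RuntimeError(f"{task!r} failed and could not be decomposed further")
    completed: list[tuple[str, str]] = []
    for subtask in subtasks:
        completed.extend(
            plan_and_execute(subtask, decompose, execute, is_executable, max_depth - 1)
        )
    return completed


# Hypothetical task tree standing in for an LLM-generated plan.
tree = {
    "prepare supplier spend report": ["fetch spend data", "summarise spend"],
    "summarise spend": ["total spend by supplier", "rank suppliers"],
}
steps = plan_and_execute(
    "prepare supplier spend report",
    decompose=lambda t: tree.get(t, []),
    execute=lambda t: f"done: {t}",
    is_executable=lambda t: t not in tree,
)
print([s for s, _ in steps])  # ['fetch spend data', 'total spend by supplier', 'rank suppliers']
```

The depth limit matters: without it, a planner that keeps returning unhelpful sub-steps would recurse indefinitely.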

On Memory, the expanded 32K token limit improved short-term memory. We use a method called Reflection to move short-term context into long-term memory. Once stored, these memories can be retrieved alongside domain knowledge.
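The following is a generic Python sketch of the reflect-then-retrieve pattern, not the Reflection method itself. It assumes a stand-in extractor that keeps only messages explicitly flagged as user feedback (in practice an LLM would summarise the conversation into salient facts), and it ranks long-term memories and domain knowledge together by simple word overlap (Jaccard similarity) rather than embeddings. All names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feedback_facts(messages: list[str]) -> list[str]:
    """Hypothetical stand-in for an LLM reflection prompt: keep explicit user feedback."""
    return [m.split(":", 1)[1].strip() for m in messages if m.lower().startswith("feedback:")]


@dataclass
class MemoryStore:
    long_term: list[str] = field(default_factory=list)
    knowledge: list[str] = field(default_factory=list)

    def reflect(self, short_term: list[str],
                extract: Callable[[list[str]], list[str]] = feedback_facts) -> list[str]:
        """Distil short-term context into salient facts and persist any new ones."""
        new_facts = [f for f in extract(short_term) if f not in self.long_term]
        self.long_term.extend(new_facts)
        return new_facts

    def retrieve(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        """Rank memories and domain knowledge together by word overlap with the query."""
        q = tokenize(query)
        candidates = ([("memory", t) for t in self.long_term]
                      + [("knowledge", t) for t in self.knowledge])
        scored = [(jaccard(q, tokenize(text)), source, text) for source, text in candidates]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(source, text) for _, source, text in scored[:k]]


store = MemoryStore(knowledge=["Purchase orders above 50000 need CFO approval",
                               "Safety incidents must be logged within 24 hours"])
store.reflect(["user: draft a PO for drill bits",
               "assistant: drafted PO-881",
               "feedback: always use supplier codes, not supplier names"])
print(store.retrieve("which supplier codes for purchase orders", k=2))
```

Tagging each result with its source ("memory" or "knowledge") lets the agent weigh learned user preferences differently from reference material when both are placed back into its context.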

For Skills, the API provided access to OpenAI tools such as the Code Interpreter and Knowledge Retrieval, plus the flexibility to build custom tools through its Function Calling feature.
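With Function Calling, a custom tool is described to the model as a JSON Schema; when the model decides to use it, it returns the function name and its arguments as a JSON string, and the developer's own code runs the function. The Python sketch below shows that shape with a hypothetical `get_po_status` procurement skill backed by illustrative in-memory data. It makes no API calls; the tool dictionary follows the general layout of an OpenAI function tool, and the registry and `dispatch` helper are our own assumptions rather than part of any client library.

```python
import json
from typing import Any, Callable

# Illustrative data only: a hypothetical purchase-order lookup for a procurement agent.
PURCHASE_ORDERS = {"PO-1001": "approved", "PO-1002": "awaiting CFO sign-off"}


def get_po_status(po_number: str) -> dict[str, Any]:
    status = PURCHASE_ORDERS.get(po_number)
    return {"po_number": po_number, "status": status or "not found"}


# Tool description in the JSON Schema style used for function tools.
PO_STATUS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_po_status",
        "description": "Look up the approval status of a purchase order.",
        "parameters": {
            "type": "object",
            "properties": {
                "po_number": {"type": "string", "description": "Purchase order number, e.g. PO-1001"},
            },
            "required": ["po_number"],
        },
    },
}

# Maps tool names the model may request to the local Python functions that implement them.
REGISTRY: dict[str, Callable[..., Any]] = {"get_po_status": get_po_status}


def dispatch(name: str, arguments_json: str) -> Any:
    """Run the requested function; the model supplies its arguments as a JSON string."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return REGISTRY[name](**kwargs)


print(dispatch("get_po_status", '{"po_number": "PO-1002"}'))
```

Keeping the schema and the registry side by side makes it easy to check that every tool advertised to the model has a matching implementation.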

Hitting The Limits

The Code Interpreter, while effective, was slow both to generate and to execute code, and it only shows the code it generated when the user specifically asks for it. The Retrieval tool, used to seed domain knowledge, performed strongly with a single uploaded file but lost accuracy when multiple files were uploaded.

Lastly, the Function Calling feature lets users define their own functions, but it is important to note that the results from these functions are returned exclusively in JSON format.
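Because function results travel as JSON, anything a tool returns that JSON cannot encode natively, such as dates, exact decimals or custom objects, has to be converted first. A minimal Python sketch, assuming a Python tool layer and a hypothetical `Invoice` result type:

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from datetime import date
from decimal import Decimal
from typing import Any


def _to_jsonable(obj: Any) -> Any:
    """Fallback for json.dumps: convert types that JSON has no native encoding for."""
    if isinstance(obj, Decimal):
        return str(obj)            # keep exact monetary values instead of rounding via float
    if isinstance(obj, date):
        return obj.isoformat()     # ISO 8601 string, e.g. "2023-12-01"
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)         # nested Decimal/date values are handled on the next pass
    raise TypeError(f"{type(obj).__name__} is not JSON serialisable")


def to_tool_output(result: Any) -> str:
    """Serialise a function result into the JSON string handed back to the model."""
    return json.dumps(result, default=_to_jsonable)


@dataclass
class Invoice:  # hypothetical result type for a Commercial Analyst skill
    number: str
    amount: Decimal
    due: date


print(to_tool_output(Invoice("INV-7", Decimal("1234.50"), date(2023, 12, 1))))
# {"number": "INV-7", "amount": "1234.50", "due": "2023-12-01"}
```

Returning decimals as strings is a deliberate choice: converting through float could silently alter monetary values, and raising on unknown types surfaces mistakes early instead of sending the model a malformed result.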

Overall, the cognitive architecture of the Assistants API is not transparent. The specific algorithms it uses are unknown, as are the methods for managing the context of chat histories and the processes behind information retrieval. This leaves us without insight into how the system works at a fundamental level.

What’s Next?

The current release of the Assistants API is conceptually strong but remains too basic to support production deployment for our use cases. We expect it to improve quickly, however, and we will monitor developments closely.