How secure is your LLM?
What measures can you take to enhance security?
Large Language Models (LLMs) offer huge potential but have a number of security challenges. Protecting against threats like data poisoning, model evasion, prompt injection as well as inappropriate use, is important to maintain LLM integrity and trustworthiness in business applications.
Red Marble AI recently collaborated with cybersecurity experts Kode-1 to evaluate the effectiveness of our LLM guardrails against a simulated attack with a range of LLM threat vectors, a process known as “red teaming”.
Key Concepts
Our test LLM was an AWS Bedrock-based instance of Mistral Large, a test instance of one of our client models. We implemented custom guardrails including input/output filters and graduated responses to counter threats such as prompt injections, data leakage, and adversarial attacks. This approach aimed to balance defence with performance, addressing the security concerns inherent in enterprise LLM deployment while maintaining a suitable level of usability.
Deep Dive
We implemented several defensive strategies to protect our LLM. These included rate limitations to prevent Distributed Denial of Service (DDoS) attacks and request size validation to prevent buffer overflow exploits. Additionally, conversation states were stored on both client and server sides to prevent session hijacking and message injection. Input validation was also enforced, allowing only a restricted ASCII character set to reduce the attack surface. Utilising both rules-based and LLM-based filters to screen incoming messages, we also applied 2 natural language classifiers to ensure requests and generated messages are within the defined scope of the specific use-case.
Setup & Testing
Having implemented the security measures to the Mistral Large model, the “red team” conducted simulated attacks to evaluate the resilience of the defensive systems. This included assessing request size validation and rate limitation mechanisms through the transmission of large payloads and high-frequency requests, respectively, and attempts to hijack the session. Additionally, the team conducted tests using malformed requests and rigorously examined both input and output filters through simulated benign and malicious interactions. Together, we quantified the effectiveness of each of the defence measures based on the system response time and the precision of threat detection algorithms.
Hitting the limits
All of these tests were run using either purpose built prompts or phrases from the Massively Multitask Language Understanding (MMLU) dataset, a common benchmark to evaluate LLMs across a wide range of tasks and knowledge domains.
Our guardrails significantly enhanced security, passing all test cases. However, the security came at a cost in terms of performance. Each guardrail increased latency by 1.5-2 seconds per query. For time-sensitive applications like customer service chatbots, this can impact user experience – a single guardrail may be a better balance of security and efficiency. In less time-sensitive scenarios, such as data analysis tasks, dual guardrails offer maximum protection with acceptable performance trade-offs.
What’s next?
We will continue to run “red teaming” exercises on both internal and client models, and will refine existing guardrails in our standard architecture. We aim to develop adaptive and real-time defense mechanisms which can evolve with emerging threats. Our collaboration with other industry partners will continue with a focus on creating efficient and context-aware security measures. These aim to maintain protection, while minimising performance impact, to ensure that LLMs remain secure and effective across various business applications.
What’s the verdict?
The exercise was useful in confirming the security of the target LLM, but we saw the performance cost. Model security is critical for preventing misuse and ensuring safe AI deployment in business environments, and maintaining stakeholder trust in AI systems. Red Marble will continue to partner with Kode-1 for “red teaming” tests on other LLM deployments.