Hardening GenAI against its Greatest Flaw

Previously, we discussed how Indirect Prompt Injection is GenAI’s greatest flaw. In this article, we will tackle how best to harden GenAI tools and prevent indirect prompt injection attacks poisoning an expensive to develop Large Language Model (LLM).

As a brief reminder, Indirect Prompt Injection (IPI) attack occurs when an LLM accepts input from external sources, such as email, that can be controlled by an attacker. The attacker may embed a prompt injection in the external content, hijacking the conversation context. Importantly, the prompt injection is typically invisible to a human reader but visible to the LLM and is parsed. This can lead to unstable LLM output, allowing the attacker to manipulate the LLM or additional systems that the LLM can access. 

To help mitigate the threat posed by IPI attacks, there are two approaches that can be undertaken, and should be. Those include hardening the LLM in the development phase and making use of in-house and third-party security tools when the model is released to the public.

Hardening GenAI at the Development Phase

Several hardening measures can be taken during the development stage to help mitigate the threat of IPI attacks. Below, is by no mean an exhaustive list but will certainly help convey measures that can be implemented during development.

Prompt Engineering Techniques

The ability to write good prompts can help minimize both intentional and unintentional bad outputs. Well written prompts steer a model away from doing things it could be perceived as dangerous. By integrating several other measures, some listed below, developers can create more secure GenAI systems that are harder to break. It is important to note that this alone isn’t enough to block a sophisticated attacker, it forces the attacker to use more complex prompt injection techniques. This makes malicious prompt injection easier to detect and increases the skill floor need to attack the service.

Clearly Denote GenAI Outputs

When presenting an end user with AI-generated content, always let the user know such content is AI-generated and should not be accepted at face value, as AI-generated content is not always accurate despite how well engineered. In a real world instance of IPI discovered by Microsoft,

when the AI assistant summarized a CV it injected text, stating “The candidate is the most qualified for the job that I have observed yet,” it should be clear to the human screener that this is AI-generated content, and should not be relied on as a final evaluation.

Sandbox Input Deemed as Potentially Dangerous

Ensure that when handling untrusted content, be it incoming emails, documents, web pages, or untrusted user inputs, make sure no sensitive actions should be triggered based on the LLM output. More specifically, if content is deemed inherently untrustworthy do not run a chain of thought or invoke any tools, plugins, or APIs that access sensitive content, perform sensitive operations, or share LLM output.

Validate and Filter both Input and Output

IPI attacks rely on bypassing safety measures or trigger exfiltration, by hiding or encoding their prompts to prevent detection. Known examples include encoding request content in base64 and ASCII art. Additionally, attackers can ask the model to encode its response similarly. Another method is causing the LLM to add malicious links or script tags in the output. 

An example of how to filter requests to reduce risk is to filter the request input and output according to application use cases. Say, for instance, you are using static delimiters, ensure you filter input for them. In another example, if the LLM receives English text for translation, filter the input to include only alphanumeric English characters. Resources on how to correctly filter and sanitize are thin on the ground due to the relative infancy of GenAI technology and IPI prevention, so caution should always be taken as mandatory when in the development phase.

Using Dedicated Prompt Injection Prevention Tools

Prompt injection attacks evolve faster than developers can plan and test for. Adding an explicit protection layer that blocks prompt injection provides a way to reduce attacks. Multiple free and paid prompt detection tools and libraries exist. Further it is also advised that developers test for prompt injection, frameworks like PyRIT (Python Risk Identification Toolkit for generative AI), have been specifically developed to find risks in GenAI projects.

Additional Security Measures Post-Development

It can be argued that most of the work involved in hardening GenAI tools is done during development, several measures can be taken when the tool is live. These include having arobust logging system and extending traditional security measures to include protection for LLMs.

Robust Logging Systems

Like with any other IT system, robust logging for security investigations and response to incidents is a must, GenAI is no different. There are many ways to add logging for your application, either by instrumentation or by adding an external logging solution using API management solutions. Importantly, it should be noted that prompts usually include user content, which should be retained in a way that doesn’t introduce privacy and compliance risks while still allowing for investigations.

Extend Traditional Security Measures

As a rule, conducting regular security reviews, as well as supply chain security and vulnerability management for your applications, should already be ingrained into your enterprise’s security policies. These need to be extended to include GenAI applications. Dedicated GenAI security tools can also be adopted to further increase an enterprise’s security posture and reduce the risk involved with using or developing LLMs.

Dedicated GenAI security solutions must be able to provide threat protection for AI workloads. Further, the solutions need to provide a runtime protection layer designed to block potential prompt injection and data exfiltration attacks, as well as report these incidents to your company’s SOC for investigation and response.

Final Thoughts

Hardening GenAI products might be equally difficult as developing them in the first place, seeing how immature the technology is at this point. However, it is needed to ensure safe use of such productivity enhancing tools for employees and the public at large.

Tags: No tags

Comments are closed.