GenAI’s Greatest Flaw

Since the release of ChatGPT to the public, the rise to prominence of Generative AI (GenAI) has been nothing short of remarkable. The technology rise is due in part to both the technology maturing and the massive funding and spending the sector has seen recently. S&P Global stated,

“Funding in GenAI exceeded $56 billion in 2024, according to data from S&P Global Market Intelligence. That was up almost double from 2023, when GenAI companies attracted approximately $29 billion…But while the overall total value of the investments grew, the record funding amount supported fewer companies than the previous year, indicating a trend of investors consolidating around select winners. There were 171 rounds of investment in GenAI in 2024, down from 273 in 2023. Additionally, the average funding round size in 2024 soared to $407 million, compared to $133 million in 2023 and $33 million in 2022.”

Unfortunately, with rapid expansion, cybersecurity flaws follow. Flaws that history judges as preventable more often than not. GenAI applications have already suffered from data leakage, adversarial attacks, prompt injection, bias, and the potential for generating malicious content like spam, phishing emails, or deepfakes, this despite efforts to place guardrails and other less successful security measures adopted in a hurry.

Of the threats faces by GenAI, one stands out as being a flaw needing to be addressed, that being prompt injection. In brief, prompt injection occurs when a threat actor manipulates a large language model (LLM) through crafted inputs, causing the LLM to unknowingly execute the attacker’s intentions. Since GenAI tools became widely available to the public, prompt injection has been a major concern, and sometimes termed GenAI the greatest security flaw. That said, now there appears to be strong competition to the crown, Indirect Prompt Injection.

Indirect Prompt Injection

The Alan Turing Institute defines Indirect Prompt Injection as,

Indirect prompt injection is the insertion of malicious information into the data sources of a GenAI system by hiding instructions in the data it accesses, such as incoming emails or saved documents. Unlike direct prompt injection, it does not require direct access to the GenAI system, instead presenting a risk across the range of data sources that a GenAI system uses to provide context.” 

It is the feature of not needing direct access to the GenAI system that is most troubling, as alluded to in the above definition. At the attack vectors core is a GenAI developers allowing the the AI tool in question access to emails, personal documents, organizational knowledge and other business applications, there is a marked increase in the scope to introduce malicious disinformation through indirect prompt injection using hidden instructions. This in turn also drastically increases the scope of possibilities for threat actors to add hidden prompts only the AI tool will read and address.

As GenAI tools like ChatGPT or Gemini, as examples, do not read data as a human would, this has opened the door for threat actors to use exceedingly simple techniques to potentially compromise an AI’s Large Language Model (LLM). While the malicious prompt is hidden to the human eye it still is crucial to how the AI tool will read the input.

Typically, LLM’s rely on a system called a system called Retrieval-Augmented Generation (RAG) that take a user’s initial query to a system, the RAG and the LMM them reaches into connected data sources (e.g., document stores, databases, internet services and emails) and retrieves the most relevant contextual information. This is then provided to an LLM as part of its prompt, combined with the user’s initial query, and allows the LLM to respond as if it understands the organization’s data.

Case Studies

As to how a malicious actor can poison the process, it begins with whether the malicious actor can insert information into any of the data sources provided without alerting the querying user. This allows them to influence the behavior of a RAG plus LLM system via the context it uses. 

At a basic level, it is possible to completely stop a system from responding. More nuanced attacks can prevent a system from responding to queries associated with certain key terms. At a higher level, far more dangerous than the others that are a nuisance in comparison, a threat actor can use indirect prompt injection to execute malicious code. Further high level dangers emerge when the threat actor can introduce disinformation or return incorrect banking details, so invoices are paid to the wrong account.

Bar the theoretical cases mentioned above, security researchers have shown the practical dangers associated with Indirect Prompt Injection. 

Disinformation via Email

First brought to the public’s attention at BlackHat 2024, security researchers showed that emails can be exploited as a route into a user’s knowledge base. Such an attack can be considered as spreading disinformation via email phishing. This was done via hiding a malicious prompt in an email, where an email address was correctly provided via an LLM regarding a company’s pension policy and who to contact. The attacker then hid a malicious prompt in a reply email which would be read by the LLM, and effectively poison data. The LLM in question would then provide other users incorrect contact information regarding the above-mentioned company pension scheme. Such a scenario would allow attackers to have sensitive information emailed directly to them. 

Disinformation via Documents

Similar in principle to the above case study, security researchers also found that Cloud-based document services like Google Drive and Sharepoint could suffer indirect prompt injection attacks. Researchers showed how the injection of disinformation via obfuscated data into a saved document can lead linked GenAI applications to misrepresent an organization’s stance on legal responsibility, and to repeat that disinformation when asked to draft a letter of engagement to other users.

Targeted Denial of Service

The last real world scenario deserving special mention allows the attacker to effectively break an LLM’s ability to respond to questions posed by users. This can be seen as a targeted denial of service, rather than rendering the entire GenAI tool useless. The attacker can introduce malicious information that triggers a system’s guardrails, forcing it to respond with a ‘“ can’t help with that” response when presented with a generic request. These attacks can be targeted at specific keywords and requests, making them harder to trace, as the system appears to function normally outside these requests.

Concluding Remarks

Security researchers have only begun to scratch the surface into the harm that such an attack can cause. This begs the question as to how one can mitigate this threat. That will be covered in part two, the good news is attacks can be mitigated, but it will not be an easy task.

Tags: No tags

Comments are closed.