AI Security Case Study · 2026

Adversarial Machine Learning · RAG

Source: University of Hong Kong Research

Confundo: Poisoning Any RAG System
With 40 Tokens

RAG systems were supposed to fix hallucination. By grounding LLM outputs in retrieved documents, they'd provide trustworthy answers you could actually cite. But what if the sources themselves are the attack vector?

VULNERABILITY RESEARCH

Black-Box Attack

Supply Chain

Knowledge Base Poisoning

95.3%

Success rate across frontier LLMs (vs 54% prior best)

Tokens of injected poison text required

6×

Improvement in opinion biasing on controversial topics

13.1

Perplexity score, easily bypassing fluency detection

100%

Works in fully black-box settings

01 — The Trust Problem

Why Grounding Is Failing Us

RAG systems have a fundamental value proposition: grounded answers. The system retrieves documents, conditions the LLM on that retrieved context, and generates an answer citing specific sources. This is supposed to prevent hallucination. The model can't make things up if it's constrained by retrieved content.

But Confundo shows that LLMs will happily generate incorrect answers if the poisoned content retrieved alongside legitimate documents suggests those answers are correct. And because the poison appears in the retrieved context, users trust it.

The very mechanism designed to prevent manipulation becomes the attack surface. In February 2026, researchers from the University of Hong Kong published Confundo, demonstrating how to systematically poison any RAG system through its knowledge base with just 40 tokens of text.

02 — Analysis

Why Prior Attacks Looked Less Dangerous

Existing RAG poisoning papers (PoisonedRAG, PR-Attack, Joint-GCG, AuthChain) reported impressive 50-88% attack success rates. But they were evaluated under idealized conditions that don't match how RAG systems actually work.

# Real-World Failure Points of Prior Attacks: 1. Document Chunking: When documents are long enough, they get tokenized and segmented into fixed-size chunks (e.g. 128 tokens) before indexing. Prior attacks broke completely when their poison was split across chunks. 2. Query Variation: Previous attacks assumed users submitted the exact anticipated questions. A simple rephrasing reduced effectiveness by 40-50%.

Practitioners read earlier RAG poisoning papers, saw these limitations, and developed false confidence. But when tested with realistic document processing and varied queries, Confundo proved that attacks still work flawlessly — if engineered properly.

03 — Methodology

The Confundo Mechanism

Rather than using ad-hoc prompt engineering, Confundo treats poison generation as a machine learning problem: fine-tune an LLM to generate optimal poison text.

01. Effectiveness (Surrogate Modeling)

Uses a surrogate RAG system (small retriever + small LLM) to simulate the target system. The generator is trained to produce text causing the surrogate to output the desired response. Crucially, a Qwen3-0.6b surrogate generates poison effective against Llama3-8b or Gemini.

02. Robustness (Anti-Fragmentation)

Makes poison resilient by training with paraphrased queries. It also simulates document chunking during training by randomly splitting poison text mid-sentence and optimizing both fragments independently.

03. Stealthiness (Fluency)

Keeps poison text under 40 tokens (~32 words) and highly fluent. This allows it to slip past content review and automated detection mechanisms easily.

04 — Impact

Unprecedented Success Rates

Metric	Prior Best Attack	Confundo
Factual Manipulation	54% (AuthChain)	88% (1.68× improvement)
Opinion Biasing	8-10%	60% (6× improvement)
Hallucination Induction	6-35%	95-98%
Perplexity (Lower is stealthier)	50-95	13.1

Why Defenses Fail

Reranking:Often assumes malicious content is irrelevant. Confundo's poison is thematically aligned, reducing effectiveness only from 88% to 78%.
Paraphrasing:Even with aggressive paraphrasing of questions and entries, Confundo's robustness training keeps success above 70%.

05 — Application

The Threat Landscape & A Defensive Twist

This isn't theoretical. RAG is deployed in Medical Decision Support, Legal Analysis, Enterprise Knowledge systems, and Financial Advisory. Poisoned documents could cause systems to recommend suboptimal treatments, cite non-existent precedents, or manipulate market-moving recommendations.

The Defensive Application

Interestingly, Confundo works defensively to protect content from unauthorized use. Organizations can generate Confundo poison tailored to their content, inject it invisibly into HTML (via CSS display: none;), and when scraped by competitors, the RAG systems built on that content return wrong answers.

06 — Recommendations

The Bottom Line

RAG was supposed to solve the hallucination problem by grounding outputs in sources. But grounding only works if sources are trustworthy. Confundo exposes that in practice, RAG systems accept sources uncritically.

For Organizations

• Audit knowledge sources and implement verification.
• Monitor for drift in consistent query outputs.
• Require human validation for high-stakes decisions.
• Assume poisoning is possible and design defensively.

For AI Safety

• Include document preprocessing and query variation in evaluations.
• Recognize that current defenses are insufficient.
• Treat unverified RAG as an architectural vulnerability.

Cronovex tracks critical vulnerabilities in frontier AI systems.
This article synthesizes peer-reviewed security research for practitioners.

Back to Case Studies

AI Security Case Study · 2026

Adversarial Machine Learning · RAG

Source: University of Hong Kong Research

Confundo: Poisoning Any RAG System
With 40 Tokens

VULNERABILITY RESEARCH

Black-Box Attack

Supply Chain

Knowledge Base Poisoning

95.3%

Success rate across frontier LLMs (vs 54% prior best)

Tokens of injected poison text required

6×

Improvement in opinion biasing on controversial topics

13.1

Perplexity score, easily bypassing fluency detection

100%

Works in fully black-box settings

01 — The Trust Problem

Why Grounding Is Failing Us

02 — Analysis

Why Prior Attacks Looked Less Dangerous

03 — Methodology

The Confundo Mechanism

Rather than using ad-hoc prompt engineering, Confundo treats poison generation as a machine learning problem: fine-tune an LLM to generate optimal poison text.

01. Effectiveness (Surrogate Modeling)

02. Robustness (Anti-Fragmentation)

03. Stealthiness (Fluency)

Keeps poison text under 40 tokens (~32 words) and highly fluent. This allows it to slip past content review and automated detection mechanisms easily.

04 — Impact

Unprecedented Success Rates

Metric	Prior Best Attack	Confundo
Factual Manipulation	54% (AuthChain)	88% (1.68× improvement)
Opinion Biasing	8-10%	60% (6× improvement)
Hallucination Induction	6-35%	95-98%
Perplexity (Lower is stealthier)	50-95	13.1

Why Defenses Fail

Reranking:Often assumes malicious content is irrelevant. Confundo's poison is thematically aligned, reducing effectiveness only from 88% to 78%.
Paraphrasing:Even with aggressive paraphrasing of questions and entries, Confundo's robustness training keeps success above 70%.

05 — Application

The Threat Landscape & A Defensive Twist

The Defensive Application

06 — Recommendations

The Bottom Line

For Organizations

• Audit knowledge sources and implement verification.
• Monitor for drift in consistent query outputs.
• Require human validation for high-stakes decisions.
• Assume poisoning is possible and design defensively.

For AI Safety

• Include document preprocessing and query variation in evaluations.
• Recognize that current defenses are insufficient.
• Treat unverified RAG as an architectural vulnerability.

Cronovex tracks critical vulnerabilities in frontier AI systems.
This article synthesizes peer-reviewed security research for practitioners.

Confundo: Poisoning Any RAG SystemWith 40 Tokens

Why Grounding Is Failing Us

Why Prior Attacks Looked Less Dangerous

The Confundo Mechanism

Unprecedented Success Rates

Why Defenses Fail

The Threat Landscape & A Defensive Twist

The Defensive Application

The Bottom Line

Confundo: Poisoning Any RAG SystemWith 40 Tokens

Why Grounding Is Failing Us

Why Prior Attacks Looked Less Dangerous

The Confundo Mechanism

Unprecedented Success Rates

Why Defenses Fail

The Threat Landscape & A Defensive Twist

The Defensive Application

The Bottom Line

Confundo: Poisoning Any RAG System
With 40 Tokens

Confundo: Poisoning Any RAG System
With 40 Tokens