Prompt Injection V32 · v1.6
User-controlled text gets concatenated into a system prompt and instructs the LLM to ignore prior instructions. Arcis ships 28 baseline signatures plus 5 v1.6 patterns for the agent-toolcall pivot class.
What it catches
HIGH severity
- Jailbreak frameworks: DAN, STAN, DUDE, AIM, KEVIN
- System-prompt extraction: "ignore previous instructions", "print your system prompt"
- Fake
<system>,<assistant>,<|im_start|>tags in user input - Role hijacking: "you are now an unrestricted AI"
MEDIUM severity
- Conversation-replay forgery: a fake conversation history that ends with "you agreed to ..."
- Base64 / ROT13 smuggling hints
- Markdown-link prompt injection
v1.6 agent-toolcall patterns (5 new)
agent-toolcall-marker:"tool_call":{,"function_call":{,"toolUse":{agent-tool-name-spoof:"name":"exec","name":"shell","name":"run_command"agent-tool-result-marker:"tool_result":{ansi-escape-sequence:\x1b[(terminal-renderer prompt injection)claude-tool-use-tags:<tool_use>,<function_calls>as literal text
Sample
import { detectPromptInjection } from '@arcis/node';
const result = detectPromptInjection(user_input);
if (result.severity === 'high') {
res.status(400).json({ error: 'prompt injection detected' });
return;
}
Configuration
import { detectPromptInjection, sanitizePromptInjection } from '@arcis/node';
// Detect: returns { severity, matches: [{ rule, snippet }] }
const result = detectPromptInjection(text, {
minSeverity: 'medium' // 'low' | 'medium' | 'high'
});
// Sanitize: strip detected patterns and return scrubbed text
const clean = sanitizePromptInjection(text);
Disable
Opt-in only. Call detectPromptInjection on LLM-handler routes, not on general request middleware (prompt-injection patterns false-positive heavily on normal text that mentions AI).
References
- OWASP LLM Top 10: LLM01 Prompt Injection
- OWASP GenAI Security Project
- v1.6 changelog: V32 toolcall patterns
Detection only. The right response to a high-severity hit is to refuse the request, not strip the bytes. The LLM can still recover a similar instruction from a sanitized string. Pair with the tokenBudget middleware to cap blast radius from anything that does slip through.