Prompt Injection V32 · v1.6

User-controlled text gets concatenated into a system prompt and instructs the LLM to ignore prior instructions. Arcis ships 28 baseline signatures plus 5 v1.6 patterns for the agent-toolcall pivot class.

What it catches

HIGH severity

Jailbreak frameworks: DAN, STAN, DUDE, AIM, KEVIN
System-prompt extraction: "ignore previous instructions", "print your system prompt"
Fake <system>, <assistant>, <|im_start|> tags in user input
Role hijacking: "you are now an unrestricted AI"

MEDIUM severity

Conversation-replay forgery: a fake conversation history that ends with "you agreed to ..."
Base64 / ROT13 smuggling hints
Markdown-link prompt injection

v1.6 agent-toolcall patterns (5 new)

agent-toolcall-marker: "tool_call":{, "function_call":{, "toolUse":{
agent-tool-name-spoof: "name":"exec", "name":"shell", "name":"run_command"
agent-tool-result-marker: "tool_result":{
ansi-escape-sequence: \x1b[ (terminal-renderer prompt injection)
claude-tool-use-tags: <tool_use>, <function_calls> as literal text

Sample

import { detectPromptInjection } from '@arcis/node';

const result = detectPromptInjection(user_input);

if (result.severity === 'high') {
  res.status(400).json({ error: 'prompt injection detected' });
  return;
}

Configuration

import { detectPromptInjection, sanitizePromptInjection } from '@arcis/node';

// Detect: returns { severity, matches: [{ rule, snippet }] }
const result = detectPromptInjection(text, {
  minSeverity: 'medium'  // 'low' | 'medium' | 'high'
});

// Sanitize: strip detected patterns and return scrubbed text
const clean = sanitizePromptInjection(text);

Disable

Opt-in only. Call detectPromptInjection on LLM-handler routes, not on general request middleware (prompt-injection patterns false-positive heavily on normal text that mentions AI).

References

Detection only. The right response to a high-severity hit is to refuse the request, not strip the bytes. The LLM can still recover a similar instruction from a sanitized string. Pair with the tokenBudget middleware to cap blast radius from anything that does slip through.

← PreviousHPP Next →Modern Deserialization