Docs / Detectors / Prompt Injection

Prompt Injection V32 · v1.6

User-controlled text gets concatenated into a system prompt and instructs the LLM to ignore prior instructions. Arcis ships 28 baseline signatures plus 5 v1.6 patterns for the agent-toolcall pivot class.

What it catches

HIGH severity

MEDIUM severity

v1.6 agent-toolcall patterns (5 new)

Sample

import { detectPromptInjection } from '@arcis/node';

const result = detectPromptInjection(user_input);

if (result.severity === 'high') {
  res.status(400).json({ error: 'prompt injection detected' });
  return;
}

Configuration

import { detectPromptInjection, sanitizePromptInjection } from '@arcis/node';

// Detect: returns { severity, matches: [{ rule, snippet }] }
const result = detectPromptInjection(text, {
  minSeverity: 'medium'  // 'low' | 'medium' | 'high'
});

// Sanitize: strip detected patterns and return scrubbed text
const clean = sanitizePromptInjection(text);

Disable

Opt-in only. Call detectPromptInjection on LLM-handler routes, not on general request middleware (prompt-injection patterns false-positive heavily on normal text that mentions AI).

References

Detection only. The right response to a high-severity hit is to refuse the request, not strip the bytes. The LLM can still recover a similar instruction from a sanitized string. Pair with the tokenBudget middleware to cap blast radius from anything that does slip through.