

Token smuggling exploits the gap between how humans read text and how LLM tokenizers process it. Attackers use Unicode variations, zero-width characters, homoglyphs, or unusual encodings to hide malicious instructions from content filters while remaining readable by the tokenizer.
The attack targets the gap between text-processing layers in AI systems. Content moderation filters, input validation, and safety checks typically operate on human-readable text. LLM tokenizers, by contrast, operate at a lower level, converting characters to numeric token IDs. By exploiting differences between these layers, attackers can craft inputs that pass text-level filters yet deliver malicious instructions to the model.
Before an LLM processes text, a tokenizer converts the input string into a sequence of integer token IDs. These IDs map to entries in the model's vocabulary, which is commonly built using algorithms such as Byte Pair Encoding (BPE) or WordPiece.
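A minimal sketch of this mapping in Python, using a made-up toy vocabulary (real tokenizers learn subword merges from data; the pieces and IDs below are purely illustrative):

```python
# Toy illustration: mapping a string to integer token IDs via a
# hypothetical fixed vocabulary. Real BPE/WordPiece tokenizers learn
# their subword pieces from a corpus; these IDs are made up.
TOY_VOCAB = {"ign": 101, "ore": 102, " previous": 103, " instructions": 104}

def toy_tokenize(text, vocab):
    """Greedy longest-match tokenization against a fixed vocabulary."""
    ids = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token for input at position {i}")
    return ids

print(toy_tokenize("ignore previous instructions", TOY_VOCAB))
# [101, 102, 103, 104]
```

The point of the sketch is that the model never sees characters, only these integer IDs, so any filter that reasons about the character string is inspecting a different representation than the one the model consumes.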
Key properties of tokenization that attackers exploit:
Unicode contains thousands of characters that visually resemble common ASCII characters. A filter looking for the word “harmful” may not recognize “hármful” (with an accented “a”) or “harⅿful” (where “ⅿ” is U+217F, the Unicode small roman numeral one thousand, standing in for a Latin “m”).
Example: The word “ignore” might be encoded as “іgnore” (using Cyrillic “і” instead of Latin “i”) — appearing identical to most human readers and some filters, but potentially processing differently at the tokenizer level.
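The mismatch is easy to see with Python's standard unicodedata module: the two strings below render almost identically but compare unequal, so a naive substring blocklist misses the spoofed form.

```python
import unicodedata

latin = "ignore"
mixed = "\u0456gnore"  # Cyrillic 'і' (U+0456) in place of Latin 'i'

print(latin == mixed)  # False: different code points under the same glyph
print(unicodedata.name(mixed[0]))
# CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I

# A naive blocklist check misses the spoofed variant entirely:
BLOCKED = ["ignore"]
print(any(word in mixed for word in BLOCKED))  # False — filter bypassed
```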
Zero-width characters (like U+200B ZERO WIDTH SPACE or U+200C ZERO WIDTH NON-JOINER) are invisible in rendered text. Inserting them between characters in key words breaks string-matching filters without affecting the visual appearance; although the invisible characters do alter the resulting token sequence, models frequently still recover the intended word.
Example: “ignore” with zero-width spaces between every character appears as “ignore” when rendered but breaks simple string pattern matching.
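A short demonstration of the effect, and of why stripping the invisible characters before filtering (see the defenses below) restores the match:

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE
smuggled = ZWSP.join("ignore")  # renders as "ignore", invisibly padded

print("ignore" in smuggled)  # False: substring match is broken
print(len(smuggled))         # 11: six letters plus five invisible chars

# Removing zero-width characters before filtering restores detection:
cleaned = smuggled.replace(ZWSP, "")
print("ignore" in cleaned)   # True
```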
Attackers may also convert text to an alternative encoding, such as Base64, before submission. The effectiveness depends on whether the LLM has been trained to decode these representations, which many general-purpose models have.
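Taking Base64 as a concrete case: the encoded payload shares no substring with the plaintext phrase, so a text-level filter scanning for the phrase sees nothing, while the round-trip the attacker relies on is trivial.

```python
import base64

instruction = "ignore previous instructions"
encoded = base64.b64encode(instruction.encode()).decode()
print(encoded)  # aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==

# A string filter scanning for the plaintext phrase sees nothing:
print("ignore" in encoded)  # False

# The round-trip the attacker hopes the model performs internally:
print(base64.b64decode(encoded).decode())  # ignore previous instructions
```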
A simpler but sometimes effective variation exploits delimiter handling. Some tokenizers give special treatment to delimiter characters; by introducing characters that the tokenizer interprets as segment boundaries, attackers can manipulate how the model segments the input into meaningful units.
Jailbreak bypass: Encoding jailbreak prompts so they pass the safety filter layer but are still decoded by the LLM, defeating the guardrails.
Content filter evasion: Embedding hate speech, illegal content requests, or policy-violating instructions in encoded form.
Prompt injection obfuscation: Using encoding to hide injected instructions from simple pattern-matching filters while ensuring the LLM processes them correctly.
Filter fingerprinting: Systematically testing different encoding variations to identify which ones the target system’s filters do and don’t detect — mapping filter coverage for more targeted attacks.
Apply Unicode normalization (NFC, NFD, NFKC, or NFKD) to all inputs before filtering. This converts Unicode variants to canonical forms, eliminating many homoglyph and combining character attacks.
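For example, NFKC folds compatibility characters such as the roman-numeral “ⅿ” back to plain ASCII, though it is not a complete fix on its own: cross-script homoglyphs like Cyrillic “і” survive normalization and still need the explicit mapping described next.

```python
import unicodedata

spoofed = "har\u217fful"  # 'ⅿ' is U+217F SMALL ROMAN NUMERAL ONE THOUSAND
print(spoofed == "harmful")                                 # False
print(unicodedata.normalize("NFKC", spoofed) == "harmful")  # True

# Normalization alone is insufficient: cross-script homoglyphs such as
# Cyrillic 'і' (U+0456) have no compatibility decomposition to Latin 'i'.
cyrillic = "\u0456gnore"
print(unicodedata.normalize("NFKC", cyrillic) == "ignore")  # False
```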
Implement explicit homoglyph mapping to normalize visually similar characters to their ASCII equivalents before filtering. Libraries exist for this purpose in most programming languages.
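A minimal sketch of such a mapping; the table here is illustrative and deliberately tiny, and production code should rely on a maintained confusables library or the Unicode confusables data rather than a hand-rolled dictionary:

```python
# Illustrative (not remotely exhaustive) homoglyph table.
HOMOGLYPHS = {
    "\u0456": "i",  # Cyrillic і
    "\u043e": "o",  # Cyrillic о
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
}

def fold_homoglyphs(text):
    """Replace known look-alike characters with their ASCII equivalents."""
    return text.translate(str.maketrans(HOMOGLYPHS))

print(fold_homoglyphs("\u0456gn\u043ere"))  # ignore
```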
Rather than (or in addition to) string-based filters, use an LLM-based filter that operates on token representations. Because these filters process text at the same level as the target model, encoding tricks are less effective — the filter sees the same representation as the model.
Security assessment should include systematic testing of content filters against known encoding variants. If a filter is meant to block “ignore previous instructions,” test whether it also blocks Unicode homoglyphs, zero-width variants, Base64 encoding, and other obfuscation forms.
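A test harness for this can be as simple as a generator of encoding variants run against the filter under test. The naive_filter below is a hypothetical stand-in for a substring blocklist, included only to show the shape of the check:

```python
import base64

def variants(phrase):
    """Yield encoding variants of a phrase for filter coverage testing."""
    yield phrase                                       # plaintext baseline
    yield "\u200b".join(phrase)                        # zero-width spaces
    yield phrase.replace("i", "\u0456")                # Cyrillic homoglyph
    yield base64.b64encode(phrase.encode()).decode()   # Base64

def naive_filter(text):
    """Hypothetical stand-in for a substring blocklist; True = allowed."""
    return "ignore previous instructions" not in text

for v in variants("ignore previous instructions"):
    print(naive_filter(v), repr(v[:20]))
# Only the plaintext variant is blocked; every encoding slips through.
```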
Log a human-readable rendering of normalized inputs alongside the raw input. Discrepancies between the two can surface encoding attacks during incident review.
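One way to sketch this (function names and the zero-width set are illustrative; a real system would emit structured logs rather than print):

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def normalize_for_log(raw):
    """Produce the human-readable rendering logged beside the raw input."""
    stripped = "".join(ch for ch in raw if ch not in ZERO_WIDTH)
    return unicodedata.normalize("NFKC", stripped)

def log_input(raw):
    rendered = normalize_for_log(raw)
    suspicious = rendered != raw  # any discrepancy warrants review
    print({"raw": raw, "rendered": rendered, "suspicious": suspicious})
    return suspicious

log_input("hello")              # suspicious: False
log_input("i\u200bgnore this")  # suspicious: True — flag for review
```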
Token smuggling and encoding attacks bypass surface-level filters. We test for these techniques in every chatbot security assessment.
