ZenML

Detecting and Mitigating Prompt Injection via Control Characters in ChatGPT

Dropbox 2023
View original source

Dropbox's security team discovered that control characters like backspace and carriage return can be used to circumvent prompt constraints in OpenAI's GPT-3.5 and GPT-4 models. By inserting large sequences of these characters, they were able to make the models forget context and instructions, leading to prompt injection vulnerabilities. This research revealed previously undocumented behavior that could be exploited in LLM-powered applications, highlighting the importance of proper input sanitization for secure LLM deployments.

Industry

Tech

Technologies

Summary

Dropbox’s security team conducted research into LLM security vulnerabilities as part of their broader effort to harden internal infrastructure for the secure use of large language models. The team discovered a previously undocumented technique for achieving prompt injection on OpenAI’s GPT-3.5 and GPT-4 models by exploiting how these models interpret control characters (like backspace and carriage return) in user input. This research is significant for any organization deploying LLM-powered applications in production, as it demonstrates that even carefully crafted prompt templates with explicit constraints can be circumvented through malformed input.

The context for this work is Dropbox’s experimentation with LLMs as potential backends for product and research initiatives, aligning with their AI principles. The security team was specifically focused on mitigating abuse of potential LLM-powered products and features via user-controlled input—a core concern for any production LLM deployment.

The Production Security Challenge

When deploying LLMs in production, organizations typically use prompt templates to control the context and output of queries. Dropbox experimented with a prompt template that included several safety measures:

This template represents a common pattern in LLMOps: using prompt engineering to constrain model behavior and prevent unauthorized information access or manipulation. The template was designed for use cases like analyzing document text from PDFs or audio transcriptions, where the context would come from server-controlled sources and questions from user input via web forms or API endpoints.

The Vulnerability Discovery

The security researchers discovered that control characters, when encoded properly in JSON payloads, can have unexpected effects on LLM behavior. The key finding was counter-intuitive: it required significantly more control characters than logically expected to achieve “model instruction betrayal.”

The team tested two specific control character encodings:

When testing with carriage returns, the team found that inserting 350 or more of these characters between two questions caused GPT-3.5 to completely forget the first question. For backspaces encoded as three-character JSON strings, at least 450 were needed to achieve similar effects.

Experimental Methodology

The Dropbox team developed a systematic black-box testing approach using Python scripts to query OpenAI’s Chat API. They used the gpt-3.5-turbo model with a fixed context (“Hello, this is a test.”) and tested various question types:

For each question, the script prepended increasing numbers of backspaces to test the effect on model behavior. The researchers calculated “prompt offsets” to understand how many backspaces would logically position the cursor at different points within the prompt, including negative positions.

Key Findings

The experimental results demonstrated several concerning behaviors as control character counts increased:

Context Forgetting: At around 1024 backspaces (offset -1024), the model would completely ignore its provided context and instructions. For the simple question “What is this?”, the model eventually produced hallucinated responses about cubic polynomials instead of referencing the test context.

Instruction Betrayal: Questions that should have triggered the “I don’t know” response instead received direct answers once enough control characters were prepended. For the factual question about the 1982 sci-fi film “Tron”, the model correctly answered the out-of-context question at offset -256, despite being instructed to only use the provided context.

Hallucinations: At extreme offset values (like -3500), the model would hallucinate responses to completely different questions. When asked about the prompt’s first 100 words, the model instead provided the first 100 digits of π. When asked about prompt instructions, it began calculating “10 choose 3” as a combinatorics problem.

Model Variations: GPT-4 showed greater resistance to these techniques at smaller context sizes (8K tokens), but became susceptible when using larger context windows (32K tokens with gpt-4-32k model). The team was able to trigger similar effects at higher relative prompt offsets (-10000 and greater magnitudes) with the larger context GPT-4 model.

Production Implications

This research has significant implications for LLMOps practitioners:

Input Sanitization Requirements: Any production LLM application accepting user input must implement proper sanitization of control characters. The fact that this behavior is not well-documented in OpenAI’s model documentation or API reference makes it a potential blind spot for developers.

Model Selection Trade-offs: While GPT-4 showed more resistance to these attacks at smaller context sizes, it comes with higher costs and potentially higher latency—important considerations for production deployments. Organizations must balance security requirements against performance and cost constraints.

Non-Deterministic Behavior: The researchers note that LLMs are non-deterministic, recommending that organizations conduct their own testing appropriate to their specific applications rather than relying solely on general security guidance.

Template Agnostic: The researchers experimented with variations of their prompt template and found that the injection technique worked regardless of instruction wording changes and formatting suggestions. This suggests that prompt engineering alone is insufficient as a security measure.

Mitigation Considerations

The Dropbox team identified several approaches to mitigation, while acknowledging the complexity of the problem:

Input Sanitization: The primary recommended approach involves sanitizing input appropriately for both the input type and the chosen model. Different control characters (carriage return vs. backspace) produced varying effectiveness, suggesting that comprehensive sanitization strategies are needed.

Valid Use Cases: The team acknowledges that there may be legitimate use cases for control characters in prompts—for example, when evaluating source code or binary formats. Production applications may need to support multiple modes of functionality to balance utility with security.

Risk-Based Approach: The researchers emphasize that risk tolerance, application design, and model choice will dictate required sanitization measures, suggesting there is no one-size-fits-all solution.

Responsible Disclosure

The Dropbox team followed responsible disclosure practices by sharing their findings with OpenAI and awaiting further mitigation guidance before publishing. They also published a GitHub repository with updated research on repeated character sequences that induce LLM instability.

Broader LLMOps Lessons

This case study illustrates several important principles for production LLM deployments:

The research demonstrates the importance of dedicated security expertise when deploying LLMs in production and highlights the need for the broader community to develop comprehensive prompt engineering and sanitization strategies that can block malicious prompt input across different models and use cases.

More Like This

AI Agent System for Automated Security Investigation and Alert Triage

Slack 2025

Slack's Security Engineering team developed an AI agent system to automate the investigation of security alerts from their event ingestion pipeline that handles billions of events daily. The solution evolved from a single-prompt prototype to a multi-agent architecture with specialized personas (Director, domain Experts, and a Critic) that work together through structured output tasks to investigate security incidents. The system uses a "knowledge pyramid" approach where information flows upward from token-intensive data gathering to high-level decision making, allowing strategic use of different model tiers. Results include transformed on-call workflows from manual evidence gathering to supervision of agent teams, interactive verifiable reports, and emergent discovery capabilities where agents spontaneously identified security issues beyond the original alert scope, such as discovering credential exposures during unrelated investigations.

fraud_detection content_moderation classification +27

Building Production AI Agents and Agentic Platforms at Scale

Vercel 2025

This AWS re:Invent 2025 session explores the challenges organizations face moving AI projects from proof-of-concept to production, addressing the statistic that 46% of AI POC projects are canceled before reaching production. AWS Bedrock team members and Vercel's director of AI engineering present a comprehensive framework for production AI systems, focusing on three critical areas: model switching, evaluation, and observability. The session demonstrates how Amazon Bedrock's unified APIs, guardrails, and Agent Core capabilities combined with Vercel's AI SDK and Workflow Development Kit enable rapid development and deployment of durable, production-ready agentic systems. Vercel showcases real-world applications including V0 (an AI-powered prototyping platform), Vercel Agent (an AI code reviewer), and various internal agents deployed across their organization, all powered by Amazon Bedrock infrastructure.

code_generation chatbot data_analysis +38

Building Production-Grade Agentic AI Analytics: Lessons from Real-World Deployment

Tellius 2025

Tellius shares hard-won lessons from building their agentic analytics platform that transforms natural language questions into trustworthy SQL-based insights. The core problem addressed is that chat-based analytics requires far more than simple text-to-SQL conversion—it demands deterministic planning, governed semantic layers, ambiguity management, multi-step consistency, transparency, performance engineering, and comprehensive observability. Their solution architecture separates language understanding from execution through typed plan artifacts that validate against schemas and policies before execution, implements clarification workflows for ambiguous queries, maintains plan/result fingerprinting for consistency, provides inline transparency with preambles and lineage, enforces latency budgets across execution hops, and treats feedback as governed policy changes. The result is a production system that achieves determinism, explainability, and sub-second interactive performance while avoiding the common pitfalls that cause 95% of AI pilot failures.

data_analysis question_answering structured_output +30