Dropbox's security team discovered a novel prompt injection vulnerability in OpenAI's GPT-3.5 and GPT-4 models where specially crafted control characters (like backspace and carriage return) could be used to circumvent prompt constraints and system instructions. Through systematic experimentation, they found that adding large numbers of these characters (350+ for carriage returns, 450+ for backspaces) could cause the models to forget context, ignore instructions, and even hallucinate responses.
# Dropbox's Investigation of Control Character Prompt Injection in Production LLM Systems
## Overview
Dropbox's security team conducted extensive research into potential vulnerabilities when deploying Large Language Models (LLMs) in production systems. Their investigation revealed a novel prompt injection technique using control characters that could bypass system instructions and prompt constraints in OpenAI's GPT-3.5 and GPT-4 models.
## Technical Architecture and Implementation
### Prompt Template Design
- Initial implementation used a structured prompt template for Q&A applications
- Template included:
### API Integration
- Utilized OpenAI's Chat API endpoints
- Implemented with Python using requests library
- Proper authentication and API key management
- JSON payload formatting for chat completions
### Test Framework
- Systematic testing approach with controlled experiments
- Python script to automate testing different scenarios
- Tracked prompt offsets and model responses
- Documented behavior changes with increasing control characters
## Security Vulnerabilities Discovered
### Control Character Interpretation
- Models interpret control characters like backspace () and carriage return (
) as tokens
- Two key encoding methods identified:
- Character interpretation not well documented in model specifications
### Prompt Injection Effects
- Varying numbers of control characters produced different behaviors:
- Key thresholds identified:
### Impact Assessment
- Model could be manipulated to:
- Vulnerability affects both GPT-3.5 and GPT-4, with GPT-4 showing more resistance
## Production Considerations and Mitigations
### Input Sanitization
- Recommend careful sanitization of user inputs
- Consider valid use cases for control characters (e.g., code evaluation)
- Balance functionality needs with security risks
### Model Selection
- GPT-4 showed better resistance to attacks at smaller context windows
- Trade-offs between:
### Implementation Guidelines
- Context window size impacts vulnerability
- Non-deterministic model behavior requires thorough testing
- Need for comprehensive prompt engineering strategies
- Input validation based on specific use cases
## Testing and Evaluation Methods
### Experimental Design
- Systematic testing of different question types:
### Monitoring and Analysis
- Tracked offset positions relative to prompt start
- Documented response patterns
- Analyzed model behavior changes
- Identified threshold points for instruction betrayal
## Production Deployment Recommendations
### Security Measures
- Implement robust input sanitization
- Regular security testing and monitoring
- Consider model-specific vulnerabilities
- Balance security with functionality needs
### Best Practices
- Use appropriate context window sizes
- Implement proper error handling
- Monitor for unexpected model behaviors
- Regular security assessments
### Risk Management
- Assess application-specific risks
- Document security measures
- Plan for incident response
- Regular security updates
## Future Considerations
### Ongoing Research
- Investigation of other control character combinations
- Testing across different model variants
- Development of comprehensive sanitization strategies
- Documentation of model-specific behaviors
### Industry Impact
- Need for improved model documentation
- Standardization of security practices
- Better understanding of model limitations
- Development of security frameworks
### Development Roadmap
- Follow-up research on mitigation strategies
- Testing with other LLM variants
- Development of secure prompt engineering guidelines
- Implementation of robust security measures
# Control Character-Based Prompt Injection Attack at Dropbox
## Overview
Dropbox's security team conducted extensive research into LLM security while exploring the integration of large language models into their products. This case study details their discovery of a novel prompt injection vulnerability affecting OpenAI's models, specifically focusing on how control characters can be weaponized to bypass security controls.
## Technical Implementation Details
### Testing Environment
- Models tested: OpenAI's GPT-3.5 and GPT-4
- Testing methodology: Systematic blackbox experiments using OpenAI's Chat API
- Primary focus: Control character interpretation and prompt injection possibilities
### Initial Prompt Template
The team used a structured prompt template designed for secure Q&A:
- Explicit truthfulness requirement
- Context boundaries
- Word limit constraints
- IDK (I don't know) fallback responses
- Clear question delimitation using triple backticks
### Discovery Process
- Control Character Investigation
) are interpreted as tokens
Start your new ML Project today with ZenML Pro
Join 1,000s of members already deploying models with ZenML.