Instacart shares their experience implementing various prompt engineering techniques to improve LLM performance in production applications. The article details both traditional and novel approaches including Chain of Thought, ReAct, Room for Thought, Monte Carlo brainstorming, Self Correction, Classifying with logit bias, and Puppetry. These techniques were developed and tested while building internal productivity tools like Ava and Ask Instacart, demonstrating practical ways to enhance LLM reliability and output quality in production environments.
# Advanced Prompt Engineering Techniques at Instacart
Instacart, a leading e-commerce platform, has been rapidly adopting LLMs and GenAI technologies for various internal and customer-facing applications. This case study explores their practical experience implementing and optimizing prompt engineering techniques for production LLM applications.
## Company Background and Use Cases
- Developed internal assistant called Ava
- Created AI-powered search feature "Ask Instacart"
- Focused on delivering value across multiple stakeholders:
## Technical Implementation Details
### Model Selection and Considerations
- Primary focus on GPT-4 implementation
- Some techniques also tested with GPT-3.5
- Recommendation to use GPT-4 when economically feasible due to superior performance
- Techniques developed through combination of:
### Traditional Prompt Engineering Techniques
### Chain of Thought (CoT)
- Implementation using simple phrases like "Let's take this step by step"
- Alternative approach using "Take a deep breath and come up with a plan"
- Particularly effective for complex tasks requiring structured thinking
### ReAct Pattern
- Implemented action-based prompting system
- Enabled external actions through specific commands:
- Similar to ChatGPT's plugin system but customized for internal use
### Novel Prompt Engineering Techniques
### Room for Thought
- Explicitly encourages LLM to create plans before answering
- Implementation requires careful prompt construction to prevent premature answers
- Example use case: Pull request title and description generation
- Can include pre-baked guidelines to save generation tokens
### Monte Carlo Technique
- Generates multiple different options before final answer
- Particularly effective for creative tasks
- Implementation details:
### Self Correction
- Enables model to critique its own outputs
- Often combined with Monte Carlo technique
- Process:
### Classifying Technique
- Uses logit_bias parameter to force specific token selection
- Implementation details:
- Supports "deep thought" mode for combining with other techniques
### Puppetry Technique
- Manipulates conversation state to guide model behavior
- Implementation approach:
## Production Considerations
### Performance Optimization
- Careful consideration of token usage
- Pre-baking static guidelines into prompts
- Using multiple rounds of prompting when needed
- Temperature adjustment for different use cases
### Output Reliability
- Implementation of forced choice responses
- Use of logit_bias for controlled outputs
- Multiple validation steps in complex tasks
- Combination of techniques for optimal results
### Integration Points
- OpenAI API integration
- Custom internal LLM proxy
- Conversation state management
- Format validation and processing
## Results and Impact
### Successful Applications
- Internal productivity tooling
- Code review automation
- Search functionality
- Employee assistance
### Best Practices Developed
- Technique selection based on use case
- Prompt optimization for token efficiency
- Output validation strategies
- Interactive refinement processes
## Technical Challenges and Solutions
### Common Issues Addressed
- Context size limitations
- Hallucination prevention
- Task completion reliability
- Output format consistency
### Implementation Solutions
- Multi-step prompting processes
- Forced output formatting
- Validation through multiple techniques
- Temperature and bias parameter optimization
## Future Considerations
### Scalability
- Token usage optimization
- Processing efficiency
- Response time management
- Cost considerations
### Maintenance
- Prompt template management
- Technique documentation
- Performance monitoring
- Continuous improvement processes
Start your new ML Project today with ZenML Pro
Join 1,000s of members already deploying models with ZenML.