When Wix needed to update over 2,000 code samples in their API reference documentation due to a syntax change, they implemented an LLM-based automation solution instead of manual updates. The team used GPT-4 for code classification and GPT-3.5 Turbo for code conversion, combined with TypeScript compilation for validation. This automated approach reduced what would have been weeks of manual work to a single morning of team involvement, while maintaining high accuracy in the code transformations.
This case study from Wix demonstrates a practical application of LLMs in a production environment to solve a significant technical documentation challenge. The company faced the need to update over 2,000 code samples in their Velo API reference documentation due to a syntax change in their backend code structure. Instead of pursuing a manual update process that would have consumed significant technical writer resources, they developed an automated solution leveraging LLMs.
The problem they faced was particularly complex due to several factors:
* The syntax changes only affected certain types of code samples, with no programmatic way to identify which ones needed updates
* Code samples existed in various formats due to evolving style guides
* Source files were distributed across multiple GitHub repositories
* The solution needed to be highly accurate, since publishing incorrect code samples in documentation was unacceptable
Their LLMOps implementation consisted of several sophisticated components working together:
* Code Classification System
The team used GPT-4 with a carefully crafted prompt to identify which code samples needed updating. The prompt was designed to recognize specific patterns in the code, such as import statements with version numbers and exported async functions, while explicitly excluding certain patterns like event handlers and HTTP functions. This classification step was crucial for focusing the conversion effort only on relevant code samples.
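The case study doesn't reproduce the prompt itself, but a minimal sketch of such a classification call, using the OpenAI Node SDK, might look like the following (the prompt wording and the `needsConversion` helper are illustrative assumptions, not Wix's actual code):

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical classification prompt; Wix's actual prompt is not published.
const CLASSIFY_PROMPT = `You are classifying Velo code samples.
Answer YES if the sample is a backend web module that exports async
functions and imports APIs using versioned paths.
Answer NO for event handlers, HTTP functions, and frontend code.
Reply with exactly YES or NO.`;

export async function needsConversion(sample: string): Promise<boolean> {
  const res = await openai.chat.completions.create({
    model: "gpt-4",
    temperature: 0, // deterministic answers suit a yes/no classifier
    messages: [
      { role: "system", content: CLASSIFY_PROMPT },
      { role: "user", content: sample },
    ],
  });
  return res.choices[0].message.content?.trim().toUpperCase() === "YES";
}
```

Pinning the temperature to 0 keeps the classifier's answers reproducible across a batch of 2,000+ samples.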
* Code Conversion Pipeline
For the actual code transformation, they chose GPT-3.5 Turbo after experimentation showed it gave the best results for their use case. The prompt engineering for this component was particularly demanding, as it needed to handle varied code formats while making very specific, consistent changes (illustrated in the sketch after this list), such as:
* Adding new import statements for Permissions and webMethod
* Converting function definitions to use the webMethod wrapper
* Updating file extension references
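Based on the publicly documented Velo web-module syntax change, the transformation was roughly of this shape (the `multiply` function is an invented sample, not one of Wix's actual documentation samples):

```typescript
// Before: legacy web module, e.g. backend/math.jsw
export async function multiply(a: number, b: number) {
  return a * b;
}
```

```typescript
// After: converted module, e.g. backend/math.web.js
import { Permissions, webMethod } from "wix-web-module";

export const multiply = webMethod(
  Permissions.Anyone, // permission level chosen for illustration
  async (a: number, b: number) => a * b
);
```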
* Validation Framework
A key aspect of their LLMOps implementation was the validation system. They developed a utility (sketched after this list) that:
* Retrieved type definitions for all Velo APIs
* Used the TypeScript compiler to validate the converted code samples
* Provided detailed error logging for manual review
* Tracked the correlation between original and converted samples
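A minimal sketch of such a compile check, using the TypeScript compiler API, might look like this (the `./velo-types` directory and the `validateSample` helper are assumptions for illustration; the case study does not describe the exact setup):

```typescript
import * as ts from "typescript";
import * as fs from "fs";
import * as path from "path";

// Compile-check a converted sample against the Velo API type definitions.
export function validateSample(samplePath: string): string[] {
  const program = ts.createProgram([samplePath], {
    noEmit: true,   // type-check only, never write output
    allowJs: true,  // samples may be plain JavaScript
    checkJs: true,
    typeRoots: [path.resolve("./velo-types")], // hypothetical .d.ts location
  });
  return ts.getPreEmitDiagnostics(program).map((d) =>
    ts.flattenDiagnosticMessageText(d.messageText, "\n")
  );
}

// Log failures for manual review, keyed by file so each converted sample
// can be traced back to its original.
const errors = validateSample("converted/math.web.js");
if (errors.length > 0) {
  fs.appendFileSync(
    "validation-errors.log",
    `converted/math.web.js:\n${errors.join("\n")}\n`
  );
}
```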
* Automated Workflow Integration
The team created a comprehensive workflow that integrated with their existing tools (a sketch of this glue code follows the list):
* Used GitHub CLI for searching and managing code samples across repositories
* Implemented mapping systems to track file locations and conversion status
* Created automated PR generation for the updated samples
* Built error handling and review processes into the workflow
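A sketch of this kind of glue, assuming the GitHub CLI's code-search and PR-creation subcommands, might look like the following (the search query, owner, and record shape are illustrative, not Wix's actual workflow):

```typescript
import { execSync } from "child_process";

interface SampleRecord {
  repo: string;
  path: string;
  status: "pending" | "converted" | "failed";
}

// Find candidate sample files across the organization's repositories.
// The query, owner, and limit are placeholders.
const raw = execSync(
  `gh search code "export async function" --owner wix --limit 200 --json path,repository`,
  { encoding: "utf8" }
);

// Build the mapping used to track file locations and conversion status.
const mapping: SampleRecord[] = JSON.parse(raw).map((hit: any) => ({
  repo: hit.repository.nameWithOwner,
  path: hit.path,
  status: "pending",
}));

// ...classification, conversion, and validation run per record here...

// Open a PR for the updated samples from the current working branch.
execSync(
  `gh pr create --title "Update code samples to new web module syntax" ` +
    `--body "Automated conversion; flagged samples are in validation-errors.log."`
);
```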
The results of this LLMOps implementation were impressive. What would have been weeks of manual work was reduced to a single morning of involvement from a team of six technical writers. The system proved highly reliable, with no reported conversion errors in the LLM output. The only issues encountered, token-limit cutoffs and some mismatches in the API type definitions, reflected infrastructure and data problems rather than the quality of the conversions themselves.
Some key lessons from this implementation:
* Prompt Engineering Best Practices
The team's approach to prompt engineering was methodical, creating specific prompts for different tasks and testing them against known samples. They recognized that different models (GPT-4 vs GPT-3.5 Turbo) might be better suited for different tasks in the pipeline.
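A minimal sketch of that kind of prompt regression test, reusing the hypothetical `needsConversion` helper from the earlier sketch, might look like this (the samples and expected labels are invented for illustration):

```typescript
import { needsConversion } from "./classify"; // hypothetical module from the earlier sketch

// Known samples with hand-labeled expected classifications (invented here).
const knownSamples = [
  { code: `export async function getOrders() { /* ... */ }`, expected: true },
  { code: `export function wixStores_onOrderPaid(event) { /* ... */ }`, expected: false }, // event handler: must be excluded
];

async function runPromptTests(): Promise<void> {
  for (const { code, expected } of knownSamples) {
    const got = await needsConversion(code);
    console.log(got === expected ? "PASS" : `FAIL: expected ${expected} for:\n${code}`);
  }
}

runPromptTests().catch(console.error);
```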
* Quality Assurance
Their approach to quality assurance was particularly noteworthy. Rather than trusting the LLM output directly, they implemented multiple validation layers:
* Automated TypeScript compilation checks
* Error logging and review processes
* Manual verification of error cases
* Source control integration for change tracking
* System Architecture
The solution architecture demonstrated good practices in LLMOps:
* Clear separation of concerns (classification, conversion, validation)
* Integration with existing development tools and workflows
* Automated but controlled deployment through PR generation
* Comprehensive error handling and logging
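Pulling the hypothetical helpers from the earlier sketches together, that separation of concerns might compose like this (`convertSample`, its prompt, and the file-naming logic are illustrative assumptions):

```typescript
import OpenAI from "openai";
import * as fs from "fs";
import { needsConversion } from "./classify"; // hypothetical helpers from the
import { validateSample } from "./validate";  // earlier sketches

const openai = new OpenAI();

// Convert one sample with GPT-3.5 Turbo; the prompt is illustrative.
async function convertSample(code: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "Rewrite this Velo web module to use webMethod and Permissions " +
          "from wix-web-module. Output only the converted code.",
      },
      { role: "user", content: code },
    ],
  });
  return res.choices[0].message.content ?? "";
}

// Each stage stays independent: classify, convert, validate, record.
export async function processSample(filePath: string): Promise<void> {
  const original = fs.readFileSync(filePath, "utf8");
  if (!(await needsConversion(original))) return; // classification gate
  const converted = await convertSample(original);
  const outPath = filePath.replace(/\.jsw$/, ".web.js");
  fs.writeFileSync(outPath, converted);
  const errors = validateSample(outPath); // compile check before any PR
  if (errors.length > 0) {
    fs.appendFileSync("validation-errors.log", `${outPath}:\n${errors.join("\n")}\n`);
  }
}
```

Note the gate: only samples the classifier flags ever reach the conversion and validation stages, which keeps both API spend and review load proportional to the actual problem size.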
* Resource Optimization
The team made intelligent choices about resource allocation:
* Using more expensive models (GPT-4) only where necessary
* Automating repetitive tasks while maintaining human oversight
* Creating reusable components that could be applied to similar future challenges
This case study represents a sophisticated example of LLMOps in practice, showing how LLMs can be effectively integrated into existing workflows to solve real-world problems. The success of this implementation suggests that similar approaches could be valuable for other documentation and code transformation tasks, particularly where patterns can be clearly defined but traditional rule-based automation would be too rigid or complex to implement.