Company: Databricks
Title: Building a Custom LLM for Automated Documentation Generation
Industry: Tech
Year: 2023

Summary: Databricks developed an AI-generated documentation feature for automatically documenting tables and columns in Unity Catalog. After initially using SaaS LLMs that faced challenges with quality, performance, and cost, they built a custom fine-tuned 7B-parameter model in just one month with two engineers and less than $1,000 in compute costs. The bespoke model achieved better quality than cheaper SaaS alternatives, a 10x cost reduction, and higher throughput, and now powers 80% of table metadata updates on their platform.
# Building a Custom LLM for Documentation Generation at Databricks

## Background and Initial Implementation

Databricks implemented an AI-generated documentation feature to automatically generate documentation for tables and columns in their Unity Catalog system. The initial implementation used off-the-shelf SaaS-based LLMs and was prototyped during a quarterly hackathon. The feature quickly gained traction, with over 80% of table metadata updates becoming AI-assisted.

## Production Challenges

The team encountered several significant challenges when moving to production:

- **Quality Control**
- **Performance Issues**
- **Cost Constraints**

## Custom Model Development

The team opted to build a bespoke model with these key characteristics:

- **Development Metrics**
- **Training Data Sources**

## Model Selection and Evaluation

- **Model Selection Criteria**
- **Selected Architecture**
- **Evaluation Framework**

## Production Architecture Components

- **Core Infrastructure**
- **Key Features**

## Performance Improvements

- **Quality**
- **Cost Efficiency**
- **Throughput**

## Production Optimization Techniques

- **Prompt Engineering**
- **Infrastructure Optimization**

## Monitoring and Maintenance

- **Quality Assurance**
- **Deployment Strategy**

## Key Learnings and Best Practices

- **Model Development**
- **Infrastructure**
- **Cost Management**

## Results and Impact

- **Business Impact**
- **Technical Achievements**

The case study demonstrates that building custom, fine-tuned models for specific use cases can be both practical and advantageous, offering better control, lower costs, and improved performance than general-purpose SaaS LLMs. The success of this implementation provides a blueprint for other organizations looking to deploy LLMs in production for specific use cases.
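The case study names prompt engineering as one of the production optimization techniques but does not show the prompts themselves. As a hedged illustration only, a documentation-generation prompt for a table schema might be assembled along these lines (the function, field names, and format below are hypothetical, not Databricks' actual implementation):

```python
# Hypothetical sketch of a documentation-generation prompt for a table
# schema. All names and the prompt format are illustrative assumptions,
# not Databricks' actual code.

def build_doc_prompt(table_name, columns, sample_rows=None):
    """Assemble a prompt asking an LLM to document a table and its columns.

    columns: list of (name, sql_type) tuples.
    sample_rows: optional list of dicts; a few example values can help
    ground the generated descriptions.
    """
    lines = [
        "Write a one-sentence description of the table and each column below.",
        f"Table: {table_name}",
        "Columns:",
    ]
    for name, dtype in columns:
        lines.append(f"  - {name} ({dtype})")
    if sample_rows:
        lines.append(f"Sample rows: {sample_rows[:3]}")
    return "\n".join(lines)


prompt = build_doc_prompt(
    "sales.orders",
    [("order_id", "BIGINT"), ("order_ts", "TIMESTAMP"), ("amount", "DECIMAL(10,2)")],
)
```

Keeping the prompt short and schema-driven matters for the throughput and cost figures the case study reports: shorter inputs mean fewer tokens per request against a self-hosted 7B model.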
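The study also mentions an evaluation framework and ongoing quality assurance without detailing the criteria. A minimal sketch of the kind of cheap, automated pre-filter that could sit in front of human review might look like this (the heuristics are assumptions for illustration, not the actual evaluation framework):

```python
# Hypothetical baseline checks for generated column descriptions.
# The thresholds and rules are illustrative assumptions only; the case
# study does not specify the actual evaluation criteria.

def passes_basic_checks(column_name, description):
    """Reject obviously unusable descriptions before human review."""
    desc = description.strip()
    if not desc:                      # empty output
        return False
    if len(desc) > 200:               # rambling output
        return False
    if desc.lower().startswith("this column"):  # boilerplate opener
        return False
    return True
```

Heuristic filters like this catch degenerate outputs cheaply; the harder quality comparisons between candidate models (the "Evaluation Framework" bullet above) typically still require side-by-side human judgments.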
