Adobe's Information Architect Jessica Talisman discusses how to build and maintain taxonomies for AI and search systems. The case study explores the challenges and best practices in creating taxonomies that bridge the gap between human understanding and machine processing, covering everything from metadata extraction to ontology development. The approach emphasizes the importance of human curation in AI systems and demonstrates how well-structured taxonomies can significantly improve search relevance, content categorization, and business operations.
This case study provides an in-depth look at how Adobe approaches taxonomy development for AI and search systems, with insights from their Information Architect Jessica Talisman. The discussion reveals crucial aspects of implementing AI systems in production, particularly focusing on the foundational data structures needed for effective AI operations.
The core challenge addressed is how to create and maintain taxonomies that can effectively bridge the gap between human understanding and machine processing of information. This is particularly crucial in the context of large language models and AI systems where ambiguity can significantly impact performance.
The technical implementation follows a structured progression:
Starting with metadata extraction, the process begins by collecting flat lists of tags or labels from various sources. These can be extracted using vector databases, web scraping, or internal stakeholder input. The emphasis here is on drawing from multiple data sources to ensure comprehensive coverage while keeping the resulting list practically useful.
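To make this step concrete, here is a minimal Python sketch of merging flat tag lists from several sources into a single frequency table before any clustering. The source names and toy data are hypothetical, since the case study does not describe the exact pipeline.

```python
from collections import Counter

def collect_candidate_tags(sources: dict[str, list[str]]) -> Counter:
    """Merge flat tag lists from multiple sources into one frequency table.

    `sources` maps a source name (e.g. "cms_export", "stakeholder_sheet",
    "scraped_labels") to the raw tags it contributed -- all hypothetical names.
    """
    counts: Counter = Counter()
    for tags in sources.values():
        for tag in tags:
            # Light normalization only: trim whitespace and lowercase.
            # Actual merging of variants happens in the clustering step.
            counts[tag.strip().lower()] += 1
    return counts

# Example usage with toy data
raw = {
    "cms_export": ["Photo Editing", "photo editing ", "Illustration"],
    "stakeholder_sheet": ["Photo-Editing", "Vector Art"],
}
print(collect_candidate_tags(raw).most_common())
```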
For the clustering and analysis phase, they employ multiple approaches (a simplified code sketch follows the list):
* Using OpenRefine with its eight different clustering algorithms to analyze and group related terms
* Implementing vector-based analysis to determine groupings and nomenclature
* Applying different clustering approaches based on data characteristics, whether it's Markov chains or nearest neighbor algorithms
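As an illustration of the key-collision style of clustering that tools like OpenRefine offer, the following sketch approximates fingerprint keying in plain Python to surface merge candidates. It is a simplified stand-in for the OpenRefine or vector-based workflows described above, not the actual implementation.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(term: str) -> str:
    """Approximation of fingerprint keying: normalize to ASCII, strip
    punctuation, then sort and dedupe whitespace-separated tokens."""
    normalized = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode()
    normalized = re.sub(r"[^\w\s]", "", normalized.lower()).strip()
    return " ".join(sorted(set(normalized.split())))

def cluster_by_fingerprint(terms: list[str]) -> dict[str, list[str]]:
    """Group raw terms whose fingerprints collide; each multi-member group
    is a merge candidate for human review."""
    clusters = defaultdict(list)
    for term in terms:
        clusters[fingerprint(term)].append(term)
    return {key: variants for key, variants in clusters.items() if len(variants) > 1}

print(cluster_by_fingerprint(["Photo Editing", "editing, photo", "Photo editing!", "Vector Art"]))
```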
The taxonomy development process emphasizes several critical technical considerations (a short SKOS sketch follows the list):
* Implementation of SKOS (Simple Knowledge Organization System) for taxonomy encoding
* Strict adherence to ISO standards for information retrieval systems
* Integration with validation frameworks to catch issues like recursive loops or relationship clashes
* Support for multilingual implementations through localization standards
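A minimal sketch of what the SKOS encoding and validation might look like with rdflib, assuming a hypothetical example.com namespace and illustrative concepts: each concept carries a preferred label and a definition, and a simple walk over skos:broader links flags the kind of recursive loops the validation frameworks are meant to catch.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.com/taxonomy/")  # hypothetical scheme namespace

g = Graph()
g.bind("skos", SKOS)

def add_concept(graph, slug, label, definition, broader=None):
    """Add a SKOS concept with a preferred label and a definition;
    definitions are what make the taxonomy disambiguable by machines."""
    uri = EX[slug]
    graph.add((uri, RDF.type, SKOS.Concept))
    graph.add((uri, SKOS.prefLabel, Literal(label, lang="en")))
    graph.add((uri, SKOS.definition, Literal(definition, lang="en")))
    if broader is not None:
        graph.add((uri, SKOS.broader, EX[broader]))
    return uri

add_concept(g, "imaging", "Imaging", "Tools and content for working with images.")
add_concept(g, "photo-editing", "Photo Editing",
            "Adjusting and retouching photographs.", broader="imaging")

def find_broader_cycles(graph):
    """Walk skos:broader links from each concept and flag recursive loops."""
    cycles = []
    for concept in graph.subjects(RDF.type, SKOS.Concept):
        seen, node = set(), concept
        while node is not None:
            if node in seen:
                cycles.append(concept)
                break
            seen.add(node)
            node = graph.value(node, SKOS.broader)  # None once a top concept is reached
    return cycles

print(g.serialize(format="turtle"))
print("cycles:", find_broader_cycles(g))
```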
A particularly interesting aspect of the production implementation is how they handle the display taxonomy versus the backend taxonomy. This dual-taxonomy approach allows for customer-facing labels to be managed separately from the technical implementation, with mappings handled through equivalency statements using SKOS or OWL.
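A short sketch of how such an equivalency mapping could be expressed, again with rdflib and hypothetical namespaces: the display concept keeps the customer-facing label, while skos:exactMatch ties it to the backend concept so either side can change without breaking the other.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Hypothetical namespaces: one for the customer-facing display taxonomy,
# one for the internal backend taxonomy.
DISPLAY = Namespace("https://example.com/display/")
BACKEND = Namespace("https://example.com/backend/")

g = Graph()
g.bind("skos", SKOS)

# The customer-facing label lives in the display scheme...
g.add((DISPLAY["photo-editing"], RDF.type, SKOS.Concept))
g.add((DISPLAY["photo-editing"], SKOS.prefLabel, Literal("Photo Editing", lang="en")))

# ...and an equivalency statement maps it to the technical backend concept.
g.add((DISPLAY["photo-editing"], SKOS.exactMatch, BACKEND["raster-image-manipulation"]))

print(g.serialize(format="turtle"))
```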
The case study reveals several key insights about LLMOps in production:
* The importance of human curation in AI systems - while automation is valuable, subject matter expertise remains crucial for maintaining data quality
* The need for continuous validation and updating of taxonomies as they grow
* The critical role of definitions and disambiguation in making taxonomies machine-readable
* The value of integrating with existing standards and frameworks rather than building everything from scratch
One concrete example of the business impact comes from a content company's author compensation model. By using taxonomies to detect duplicate or near-synonym content categories, they identified roughly $8 million in potential savings over five years, money previously lost to overpayments. This demonstrates how well-structured taxonomies can directly affect business operations.
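The case study does not describe the detection method itself, but a simple surface-similarity pass along these lines illustrates how near-synonym category labels might be flagged for human review; the labels and threshold here are illustrative only.

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicate_categories(labels: list[str], threshold: float = 0.85):
    """Flag category label pairs whose surface similarity suggests they may
    be duplicates or near-synonyms, as candidates for human review."""
    flagged = []
    for a, b in combinations(labels, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged

print(near_duplicate_categories([
    "How-To Guides", "How To Guide", "Tutorials", "Product Tutorials"
]))
```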
The implementation also shows careful consideration of failure modes and edge cases:
* Mixed granularity issues where inconsistent levels of detail can confuse machine learning systems
* The challenges of maintaining consistency across multiple languages and locales
* The need to balance automation with human oversight, particularly for legal and compliance requirements
For evaluation and monitoring, the system tracks several key metrics (a small example for one of them follows the list):
* Tag completeness across content
* Query response times
* Result accuracy and completeness
* Ambiguity detection in search results
* Customer satisfaction metrics for B2B integrations
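As an example of how one of these signals, tag completeness, might be computed, here is a small sketch; the required tag fields and catalog entries are hypothetical.

```python
def tag_completeness(items: list[dict],
                     required_fields=("topic", "audience", "format")) -> float:
    """Share of content items carrying every required tag field --
    one monitoring signal among several; field names are illustrative."""
    if not items:
        return 0.0
    complete = sum(
        1 for item in items
        if all(item.get("tags", {}).get(field) for field in required_fields)
    )
    return complete / len(items)

catalog = [
    {"id": "a1", "tags": {"topic": "photo-editing", "audience": "pro", "format": "tutorial"}},
    {"id": "a2", "tags": {"topic": "illustration"}},  # missing audience and format
]
print(f"tag completeness: {tag_completeness(catalog):.0%}")  # -> 50%
```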
The case study emphasizes that while AI and machine learning are powerful tools, they must be built on a foundation of well-structured data. This requires careful attention to data modeling, taxonomy development, and ongoing maintenance - aspects that cannot be completely automated away.
A particularly valuable insight is how they approach scaling taxonomies. Rather than starting with highly granular classifications, they begin with broader categories and add detail based on observed needs and behaviors. This approach allows for more sustainable growth and better alignment with actual usage patterns.
The implementation also demonstrates strong integration with existing knowledge bases and standards (a lookup sketch follows the list):
* Using DBPedia and Wikidata for ontology validation
* Leveraging industry-standard taxonomies as starting points
* Integrating with localization standards for multilingual support
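A sketch of one way to validate candidate concepts against Wikidata, using its public wbsearchentities endpoint: a hit supplies an external identifier and a vetted description that can anchor the internal definition. The case study does not specify the tooling, so this is an assumption about how such a lookup could work.

```python
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def lookup_wikidata(term: str, limit: int = 3) -> list[dict]:
    """Check whether a taxonomy term resolves to an established Wikidata entity."""
    resp = requests.get(WIKIDATA_API, params={
        "action": "wbsearchentities",
        "search": term,
        "language": "en",
        "format": "json",
        "limit": limit,
    }, timeout=10)
    resp.raise_for_status()
    return [
        {"id": hit["id"], "label": hit.get("label"), "description": hit.get("description")}
        for hit in resp.json().get("search", [])
    ]

for match in lookup_wikidata("taxonomy"):
    print(match)
```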
This case study provides valuable insights into how traditional information architecture practices need to evolve to support modern AI systems while maintaining the rigor and structure necessary for production environments. It emphasizes that successful AI implementations require careful attention to the underlying data structures and cannot rely solely on sophisticated algorithms.