KL3M Toxicity Research
modelComprehensive research examining toxicity and bias in legal language models, demonstrating KL3M's superior safety profile through rigorous testing
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
A systematic evaluation of toxicity and bias in the Kelvin Legal Large Language Model (KL3M) family, establishing new benchmarks for safety in professional AI models through careful training data curation and comprehensive testing.
Research Overview
This project demonstrates that careful dataset curation and training can produce language models with significantly reduced toxicity while maintaining strong performance on professional tasks.
Methodology
Experimental Design
The research conducted three distinct experiments:
- RealToxicityPrompts: 20 prompts testing standard toxicity benchmarks
- 4chan Discourse: 500 prompts simulating challenging internet discourse
- Protected Classes: 50 systematic prompts covering protected class descriptors
Models Tested
- 9 different language models including:
- KL3M-170m and KL3M-1.7b variants
- Comparison models from other families
- Baseline open-source models
Evaluation Framework
- Scoring: GPT-4 based evaluation
- Coding: Triple-coded completions for reliability
- Metrics: Toxicity and bias scales
- Analysis: Distinction between use vs. reference of potentially offensive language
Key Findings
Toxicity Performance
- Lowest toxicity rates: KL3M models consistently produced the least toxic completions
- Minimal βbadβ language: Significantly reduced use of problematic language
- Robust across experiments: Strong performance across diverse prompt types
Training Impact
The superior safety profile results from:
- Curated training data: ~350B tokens from 2T+ collected
- Legal/professional focus: High-quality corpus selection
- Careful filtering: Avoided problematic content sources
Technical Implementation
Data Curation Process
- Collected over 2 trillion tokens of text
- Filtered to ~350 billion high-quality tokens
- Excluded sources with:
- Contract breach risks
- Unclear licensing
- Potentially toxic content
Evaluation Pipeline
- Automated prompt generation
- Model inference across test suite
- GPT-4 based scoring system
- Statistical analysis of results
Broader Impact
This research establishes that:
- Safety is achievable: Careful data curation can dramatically reduce model toxicity
- No performance trade-off: Safety improvements donβt compromise capability
- Scalable approach: Methods apply to models of various sizes
Open Research
Released under permissive licenses:
- Code: Apache 2.0 License
- Data: CC-BY-4.0 License
This ensures reproducibility and enables the community to:
- Verify findings independently
- Apply methods to other models
- Build upon the safety research
Significance for Legal AI
For legal and professional applications, low toxicity is critical:
- Maintains professional standards
- Reduces liability risks
- Ensures appropriate client interactions
- Supports ethical AI deployment
The KL3M toxicity research sets new standards for responsible AI development in professional domains.