A systematic evaluation of toxicity and bias in the Kelvin Legal Large Language Model (KL3M) family, establishing new benchmarks for safety in professional AI models through careful training data curation and comprehensive testing.

Research Overview

This project demonstrates that careful dataset curation and training can produce language models with significantly reduced toxicity while maintaining strong performance on professional tasks.

Methodology

Experimental Design

The research conducted three distinct experiments:

RealToxicityPrompts: 20 prompts testing standard toxicity benchmarks
4chan Discourse: 500 prompts simulating challenging internet discourse
Protected Classes: 50 systematic prompts covering protected class descriptors

Models Tested

9 different language models including:
- KL3M-170m and KL3M-1.7b variants
- Comparison models from other families
- Baseline open-source models

Evaluation Framework

Scoring: GPT-4 based evaluation
Coding: Triple-coded completions for reliability
Metrics: Toxicity and bias scales
Analysis: Distinction between use vs. reference of potentially offensive language

Key Findings

Toxicity Performance

Lowest toxicity rates: KL3M models consistently produced the least toxic completions
Minimal “bad” language: Significantly reduced use of problematic language
Robust across experiments: Strong performance across diverse prompt types

Training Impact

The superior safety profile results from:

Curated training data: ~350B tokens from 2T+ collected
Legal/professional focus: High-quality corpus selection
Careful filtering: Avoided problematic content sources

Technical Implementation

Data Curation Process

Collected over 2 trillion tokens of text
Filtered to ~350 billion high-quality tokens
Excluded sources with:
- Contract breach risks
- Unclear licensing
- Potentially toxic content

Evaluation Pipeline

Automated prompt generation
Model inference across test suite
GPT-4 based scoring system
Statistical analysis of results

Broader Impact

This research establishes that:

Safety is achievable: Careful data curation can dramatically reduce model toxicity
No performance trade-off: Safety improvements don’t compromise capability
Scalable approach: Methods apply to models of various sizes

Open Research

Released under permissive licenses:

Code: Apache 2.0 License
Data: CC-BY-4.0 License

This ensures reproducibility and enables the community to:

Verify findings independently
Apply methods to other models
Build upon the safety research

Significance for Legal AI

For legal and professional applications, low toxicity is critical:

Maintains professional standards
Reduces liability risks
Ensures appropriate client interactions
Supports ethical AI deployment

The KL3M toxicity research sets new standards for responsible AI development in professional domains.