on this page

KL3M Toxicity Research

model

Comprehensive research examining toxicity and bias in legal language models, demonstrating KL3M's superior safety profile through rigorous testing

period: 2024-present
team: ALEA Institute
tech:
AI EthicsLegal Informatics
══════════════════════════════════════════════════════════════════

A systematic evaluation of toxicity and bias in the Kelvin Legal Large Language Model (KL3M) family, establishing new benchmarks for safety in professional AI models through careful training data curation and comprehensive testing.

Research Overview

This project demonstrates that careful dataset curation and training can produce language models with significantly reduced toxicity while maintaining strong performance on professional tasks.

Methodology

Experimental Design

The research conducted three distinct experiments:

  1. RealToxicityPrompts: 20 prompts testing standard toxicity benchmarks
  2. 4chan Discourse: 500 prompts simulating challenging internet discourse
  3. Protected Classes: 50 systematic prompts covering protected class descriptors

Models Tested

  • 9 different language models including:
    • KL3M-170m and KL3M-1.7b variants
    • Comparison models from other families
    • Baseline open-source models

Evaluation Framework

  • Scoring: GPT-4 based evaluation
  • Coding: Triple-coded completions for reliability
  • Metrics: Toxicity and bias scales
  • Analysis: Distinction between use vs. reference of potentially offensive language

Key Findings

Toxicity Performance

  • Lowest toxicity rates: KL3M models consistently produced the least toxic completions
  • Minimal β€œbad” language: Significantly reduced use of problematic language
  • Robust across experiments: Strong performance across diverse prompt types

Training Impact

The superior safety profile results from:

  • Curated training data: ~350B tokens from 2T+ collected
  • Legal/professional focus: High-quality corpus selection
  • Careful filtering: Avoided problematic content sources

Technical Implementation

Data Curation Process

  • Collected over 2 trillion tokens of text
  • Filtered to ~350 billion high-quality tokens
  • Excluded sources with:
    • Contract breach risks
    • Unclear licensing
    • Potentially toxic content

Evaluation Pipeline

  • Automated prompt generation
  • Model inference across test suite
  • GPT-4 based scoring system
  • Statistical analysis of results

Broader Impact

This research establishes that:

  • Safety is achievable: Careful data curation can dramatically reduce model toxicity
  • No performance trade-off: Safety improvements don’t compromise capability
  • Scalable approach: Methods apply to models of various sizes

Open Research

Released under permissive licenses:

  • Code: Apache 2.0 License
  • Data: CC-BY-4.0 License

This ensures reproducibility and enables the community to:

  • Verify findings independently
  • Apply methods to other models
  • Build upon the safety research

For legal and professional applications, low toxicity is critical:

  • Maintains professional standards
  • Reduces liability risks
  • Ensures appropriate client interactions
  • Supports ethical AI deployment

The KL3M toxicity research sets new standards for responsible AI development in professional domains.

on this page