GPT-4 Passes the Bar Exam
Research demonstrating GPT-4's ability to pass the Uniform Bar Examination, significantly outperforming both human test-takers and prior AI models
A groundbreaking research project evaluating GPT-4's performance on the Uniform Bar Examination (UBE), demonstrating that large language models can pass professional legal licensing exams by a comfortable margin.
Research Impact
This follow-up to "GPT Takes the Bar Exam" showed dramatic improvements in AI legal reasoning capabilities:
- GPT-4 scored ~297 points, significantly above the passing threshold in every UBE jurisdiction
- 26% improvement over ChatGPT's performance
- Beat human test-takers in 5 of 7 subject areas on the MBE
Publication
- Authors: Daniel Martin Katz, Michael James Bommarito, Shang Gao, Pablo Arredondo
- Published: March 15, 2023 (SSRN preprint); 2024 (Philosophical Transactions of the Royal Society A)
- Paper: Available on SSRN
Methodology
The research evaluated GPT-4 across all three components of the Uniform Bar Examination (a minimal prompting sketch follows the list):
- Multistate Bar Examination (MBE) - Multiple choice questions
- Multistate Essay Examination (MEE) - Essay responses
- Multistate Performance Test (MPT) - Practical legal tasks
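The paper's actual harness is not reproduced here; as a rough illustration, the sketch below shows how an MBE-style multiple-choice item might be posed to GPT-4 over the OpenAI chat API. The prompt wording, question schema, and answer parsing are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of an MBE-style multiple-choice evaluation loop.
# Not the authors' harness: the question schema and answer parsing are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are taking the Multistate Bar Examination. "
    "Answer with the single letter (A, B, C, or D) of the best choice.\n\n"
    "{question}\n\nA. {a}\nB. {b}\nC. {c}\nD. {d}\n\nAnswer:"
)

def ask_mbe(question: dict) -> str:
    """Send one multiple-choice question and return the model's letter choice."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic answers for reproducibility
        messages=[{"role": "user", "content": PROMPT.format(**question)}],
    )
    text = response.choices[0].message.content.strip()
    return text[0].upper()  # crude parse: take the leading letter

def score(questions: list[dict]) -> float:
    """Fraction of questions answered correctly."""
    correct = sum(ask_mbe(q) == q["answer"] for q in questions)
    return correct / len(questions)
```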
Key Findings
- MBE Performance: GPT-4 significantly outperformed both humans and prior models
- MEE/MPT Scores: Average of 4.2/6.0, well above ChatGPT (see the weighting sketch below for how components combine into the UBE total)
- Subject Expertise: Particularly strong in Evidence and Torts
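To make the headline ~297 concrete, the sketch below combines component scores using the NCBE's published UBE weighting (MBE 50%, MEE 30%, MPT 20%, on a 400-point scale). The individual component scores shown are hypothetical, not figures from the paper.

```python
# Back-of-the-envelope check of the headline UBE number.
# Weights are the NCBE's published UBE weighting; the component
# scores below are hypothetical, chosen to land near ~297.

UBE_WEIGHTS = {"MBE": 0.50, "MEE": 0.30, "MPT": 0.20}

def ube_total(scaled_scores: dict[str, float]) -> float:
    """Combine scaled component scores (each on a 400-point scale)."""
    return sum(UBE_WEIGHTS[c] * s for c, s in scaled_scores.items())

print(ube_total({"MBE": 298, "MEE": 296, "MPT": 297}))  # ~297.2
# Every UBE jurisdiction's passing threshold falls in the 260-280 range,
# so a ~297 total clears the cut score everywhere.
```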
Technical Implementation
The evaluation framework includes:
- Python scripts for running experiments
- Standardized prompting strategies
- Result analysis tools (a minimal sketch follows this list)
- Reproducible testing methodology
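As one hypothetical shape such an analysis tool could take, the sketch below aggregates per-subject MBE accuracy from a CSV of graded answers. The file name and column names are assumptions, not the repository's actual schema.

```python
# Hypothetical result-analysis step: per-subject MBE accuracy from a CSV
# with columns "subject" and "correct" (0 or 1). Schema is assumed.
import csv
from collections import defaultdict

def subject_accuracy(path: str) -> dict[str, float]:
    """Return accuracy per MBE subject from graded answer rows."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            hits, n = totals[row["subject"]]
            totals[row["subject"]] = [hits + int(row["correct"]), n + 1]
    return {s: hits / n for s, (hits, n) in totals.items()}

if __name__ == "__main__":
    for subject, acc in sorted(subject_accuracy("mbe_results.csv").items()):
        print(f"{subject:20s} {acc:.1%}")
```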
Broader Implications
This research demonstrates the rapid advancement in AI capabilities for professional reasoning tasks and raises important questions about the future of legal education, licensing, and practice.