
GPT Takes the Bar Exam

model

Groundbreaking research demonstrating GPT-3.5's performance on the Multistate Bar Examination, predicting AI's ability to pass professional legal licensing exams

period: 2022-present
tech: AI Evaluation, Legal Informatics
══════════════════════════════════════════════════════════════════

This study is the first comprehensive evaluation of a large language model’s performance on the bar exam. It tests OpenAI’s GPT-3.5 (text-davinci-003) on the Multistate Bar Examination (MBE), the multiple-choice section of the exam.

Research Impact

This pioneering study laid the groundwork for understanding AI capabilities in legal reasoning:

  • 50.3% accuracy - significantly above the 25% baseline guessing rate
  • Passing performance in the Evidence and Torts sections
  • 88% top-3 accuracy - the correct answer was among the model's top three choices, demonstrating strong partial understanding
  • Predicted that an LLM would soon pass the MBE component of the bar exam (confirmed by GPT-4 in 2023)
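One way to make "significantly above the 25% baseline guessing rate" concrete is an exact one-sided binomial tail probability: the chance of getting at least the observed number of questions right by uniform guessing among four choices. The sketch below is illustrative only. The function name is made up for this example, and the question count in the usage lines is a hypothetical placeholder, not a figure from the paper.

```python
from math import comb


def binomial_tail(n_questions: int, n_correct: int, p_guess: float = 0.25) -> float:
    """Return P(X >= n_correct) for X ~ Binomial(n_questions, p_guess)."""
    return sum(
        comb(n_questions, k) * p_guess**k * (1 - p_guess) ** (n_questions - k)
        for k in range(n_correct, n_questions + 1)
    )


# Hypothetical exam size for illustration only; substitute the real question count.
n_questions = 200
n_correct = round(0.503 * n_questions)
tail = binomial_tail(n_questions, n_correct)
print(f"P(at least {n_correct}/{n_questions} correct by guessing) = {tail:.3e}")
```

A very small tail probability means that pure guessing is an implausible explanation for the observed score.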

Publication

  • Authors: Michael James Bommarito, Daniel Martin Katz
  • Published: December 29, 2022
  • Paper: Available on arXiv and SSRN

Key Findings

Performance Analysis

  • Overall Score: 50.3% correct on a complete NCBE MBE practice exam
  • Response Quality: The correct answer was among the model's top two choices 71% of the time
  • Subject Strengths: Evidence and Torts at passing rates
  • Zero-shot Performance: Fine-tuning gave no benefit over zero-shot performance at the scale of the available training data
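The top-two and top-three figures are top-k accuracies: a question counts as a hit when the correct answer appears among the model's first k ranked choices. The following is a minimal sketch of that calculation, assuming each response has already been parsed into an ordered list of choice letters. The data format and the toy data are invented for illustration and are not results from the study.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_answers: Sequence[Sequence[str]],
    correct_answers: Sequence[str],
    k: int,
) -> float:
    """Fraction of questions whose correct answer is among the first k ranked choices."""
    if len(ranked_answers) != len(correct_answers):
        raise ValueError("each question needs exactly one ranking and one answer key")
    if not correct_answers:
        raise ValueError("no questions to score")
    hits = sum(
        truth in ranking[:k]
        for ranking, truth in zip(ranked_answers, correct_answers)
    )
    return hits / len(correct_answers)


# Invented toy data, not results from the study.
toy_rankings = [["A", "C", "B"], ["D", "B", "A"], ["B", "A", "C"], ["C", "D", "A"]]
toy_key = ["A", "B", "C", "B"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(toy_rankings, toy_key, k):.2f}")
```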

Technical Insights

  • Hyperparameter optimization improved zero-shot performance
  • Prompt engineering also improved zero-shot performance
  • The model's ranking of answer choices correlated strongly with correctness
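A joint search over prompts and sampling hyperparameters can be sketched as an exhaustive grid search. This is an assumption-laden illustration rather than the study's actual procedure: `grid_search` and `fake_evaluate` are hypothetical names, the parameter names and values are placeholders, and the scorer is a stand-in so the code runs without any model access.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, List, Tuple


def grid_search(
    prompt_templates: Iterable[str],
    param_grid: Dict[str, List[Any]],
    evaluate: Callable[[str, Dict[str, Any]], float],
) -> Tuple[float, str, Dict[str, Any]]:
    """Return (best_accuracy, best_prompt, best_params) over the full grid."""
    names = list(param_grid)
    best = None
    for prompt in prompt_templates:
        for values in product(*(param_grid[name] for name in names)):
            params = dict(zip(names, values))
            accuracy = evaluate(prompt, params)
            # Strict comparison keeps the first configuration among ties.
            if best is None or accuracy > best[0]:
                best = (accuracy, prompt, params)
    if best is None:
        raise ValueError("empty search space")
    return best


def fake_evaluate(prompt: str, params: Dict[str, Any]) -> float:
    # Stand-in scorer so the sketch runs offline; a real run would query
    # the model with these settings and grade its answers.
    bonus = 0.05 if "rank" in prompt else 0.0
    return 0.4 + bonus - 0.1 * params["temperature"]


best_accuracy, best_prompt, best_params = grid_search(
    ["Answer the question.", "Answer and rank your top three choices."],
    {"temperature": [0.0, 0.5, 1.0], "best_of": [1, 2]},
    fake_evaluate,
)
print(best_accuracy, best_prompt, best_params)
```

Passing the scorer in as a callable keeps the search loop independent of any particular model API.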

Methodology

The research evaluated:

  • Complete MBE practice examinations
  • Multiple question categories and difficulty levels
  • Various prompting strategies
  • Fine-tuning vs zero-shot approaches
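As a rough illustration of what one prompting strategy might look like, the sketch below formats a multiple-choice question into a prompt that asks for ranked choices and then parses the ranked letters from a reply. The template wording, function names, and sample question are hypothetical and are not the prompts used in the study. The parser is deliberately naive: it assumes the reply contains choice letters as standalone uppercase tokens, such as "B, A, D".

```python
import re
from typing import Dict, List


def build_prompt(question: str, choices: Dict[str, str]) -> str:
    """Format a multiple-choice question with a request for ranked answers."""
    lines = [
        "Please answer the following bar exam question.",  # hypothetical wording
        "",
        question.strip(),
        "",
    ]
    lines += [f"({letter}) {text}" for letter, text in sorted(choices.items())]
    lines += ["", "List your first, second, and third choices as letters, most likely first."]
    return "\n".join(lines)


def parse_ranked_letters(response: str, valid: str = "ABCD") -> List[str]:
    """Extract distinct standalone uppercase choice letters in order of appearance."""
    ranked: List[str] = []
    for letter in re.findall(rf"\b([{valid}])\b", response):
        if letter not in ranked:
            ranked.append(letter)
    return ranked


# Invented placeholder question, not an actual exam item.
prompt = build_prompt(
    "Which option is a placeholder?",
    {"A": "First option", "B": "Second option", "C": "Third option", "D": "Fourth option"},
)
print(prompt)
print(parse_ranked_letters("B, A, D"))
```

Asking for a ranking rather than a single letter is what makes top-2 and top-3 accuracy measurable from the same set of responses.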

Historical Significance

This research marked a turning point in AI evaluation on professional exams, establishing methodologies and baselines that would be used in subsequent studies. The prediction that “an LLM will pass the MBE component of the Bar Exam in the near future” was validated just months later with GPT-4’s success.

Resources

The repository includes:

  • Jupyter notebooks with analysis
  • Performance visualization charts
  • Prompt examples and optimization strategies
  • Complete session logs for reproducibility