GPT Takes the Bar Exam
Groundbreaking research evaluating GPT-3.5's performance on the Multistate Bar Examination and predicting that AI will soon be able to pass professional legal licensing exams
The first comprehensive evaluation of a large language model's performance on the bar exam, testing OpenAI's GPT-3.5 (text-davinci-003) on the Multistate Bar Examination (MBE) multiple-choice section.
Research Impact
This pioneering study laid the groundwork for understanding AI capabilities in legal reasoning:
- 50.3% accuracy - roughly double the 25% expected from random guessing on four-choice questions
- Passing performance in Evidence and Torts sections
- 88% top-3 accuracy - the correct answer was among the model's three highest-ranked choices 88% of the time
- Predicted that an LLM would soon pass the MBE component of the bar exam (borne out by GPT-4 in 2023)
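The 25% baseline follows from the MBE's four answer choices. A minimal sketch, assuming every question has exactly four options and a guesser orders them uniformly at random, shows how the chance baseline scales to top-k accuracy (k/4) and how far the figures quoted above sit above it. The function name `chance_top_k` is illustrative only.

```python
from fractions import Fraction


def chance_top_k(k: int, n_choices: int = 4) -> Fraction:
    """Chance that a uniformly random ranking of n_choices options places
    the single correct answer somewhere in its top k positions."""
    if not 1 <= k <= n_choices:
        raise ValueError("k must be between 1 and n_choices")
    # Every position is equally likely to hold the correct answer, so P = k / n.
    return Fraction(k, n_choices)


# Accuracy figures quoted in this README (top-1, top-2, top-3).
reported = {1: 0.503, 2: 0.71, 3: 0.88}

for k, accuracy in reported.items():
    baseline = float(chance_top_k(k))
    gain = (accuracy - baseline) * 100
    print(f"top-{k}: reported {accuracy:.1%}, chance {baseline:.0%}, gain {gain:.1f} points")
```

Because the chance baseline rises with k, the margin over guessing shrinks from roughly 25 points at top-1 to roughly 13 points at top-3.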
Publication
- Authors: Michael James Bommarito, Daniel Martin Katz
- Published: December 29, 2022
- Paper: Available on arXiv and SSRN
Key Findings
Performance Analysis
- Overall Score: 50.3% correct on complete NCBE MBE practice exam
- Ranking Quality: Correct answer among the model's top two choices 71% of the time
- Subject Strengths: Evidence and Torts at passing rates
- Fine-tuning: No improvement over zero-shot prompting at the available data scale
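The headline numbers above are top-k accuracies: a question counts as correct at k if the right letter appears among the model's k highest-ranked choices. A minimal sketch of that scoring, plus a per-subject breakdown in the spirit of the Evidence and Torts results, assuming each model response has already been parsed into an ordered list of letters. The `Item` record and the four sample questions are hypothetical toy data, not results from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One scored MBE question (hypothetical record layout)."""
    subject: str         # MBE subject, e.g. "Evidence" or "Torts"
    ranking: list[str]   # model's choices, most preferred first
    answer: str          # correct letter from the answer key


def top_k_accuracy(items: list[Item], k: int) -> float:
    """Fraction of items whose correct answer is in the model's top k choices."""
    hits = sum(item.answer in item.ranking[:k] for item in items)
    return hits / len(items)


def accuracy_by_subject(items: list[Item]) -> dict[str, float]:
    """Top-1 accuracy computed separately for each subject."""
    grouped: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        grouped[item.subject].append(item)
    return {subject: top_k_accuracy(group, 1) for subject, group in grouped.items()}


# Hypothetical toy data for illustration only.
items = [
    Item("Evidence", ["A", "B", "C", "D"], "A"),
    Item("Evidence", ["B", "A", "C", "D"], "A"),
    Item("Torts", ["C", "D", "A", "B"], "A"),
    Item("Torts", ["D", "C", "B", "A"], "A"),
]

for k in (1, 2, 3):
    print(f"top-{k}: {top_k_accuracy(items, k):.0%}")
print(accuracy_by_subject(items))
```

Top-1 accuracy is the score an exam grader would see; the top-2 and top-3 variants measure how close the model's ranking comes when its first choice is wrong.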
Technical Insights
- Hyperparameter optimization improved performance
- Prompt engineering significantly impacted results
- Strong correlation between model confidence and correctness
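One way to check whether a model's ranking tracks correctness, assuming "confidence" is read off the order in which the model ranks the four choices (the study may operationalize it differently), is to tabulate where the correct answer lands in each ranking. Under random guessing each of the four positions would hold the correct answer about 25% of the time; shares that fall off from rank 1 to rank 4 indicate the ranking carries signal. The function names and toy data below are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def correct_rank_shares(rankings: list[list[str]], answers: list[str]) -> list[float]:
    """Share of questions whose correct answer sits at each rank (index 0 = top choice).

    Assumes every ranking is a full ordering of the same choices, so the
    correct answer always appears somewhere in it.
    """
    n_choices = len(rankings[0])
    positions = Counter(ranking.index(answer) for ranking, answer in zip(rankings, answers))
    total = len(answers)
    return [positions[pos] / total for pos in range(n_choices)]


def is_non_increasing(shares: list[float]) -> bool:
    """True if each rank holds the correct answer no more often than the rank above it."""
    return all(a >= b for a, b in zip(shares, shares[1:]))


# Hypothetical toy rankings and answer key for illustration only.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["C", "A", "B", "D"],
    ["A", "C", "B", "D"],
    ["D", "A", "B", "C"],
]
answers = ["A", "A", "A", "A", "B"]

shares = correct_rank_shares(rankings, answers)
print("share by rank:", shares)
print("cumulative (top-k accuracy):", list(accumulate(shares)))
print("ranking tracks correctness:", is_non_increasing(shares))
```

The running sum of these shares is exactly the top-k accuracy, so this table and the top-2/top-3 figures are two views of the same data.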
Methodology
The research evaluated:
- Complete MBE practice examinations
- Multiple question categories and difficulty levels
- Various prompting strategies
- Fine-tuning vs zero-shot approaches
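To make "various prompting strategies" concrete, here is a minimal sketch of one hypothetical strategy: present the question with lettered choices, ask for a full ranking rather than a single letter, and parse the reply back into an ordered list. The template wording, `build_mbe_prompt`, and `parse_ranking` are illustrative assumptions, not the prompts used in the study, and the sketch does not call any model API.

```python
import re


def build_mbe_prompt(question: str, choices: dict[str, str], ask_for_ranking: bool = True) -> str:
    """Format a multiple-choice question as a plain-text prompt (hypothetical template)."""
    lines = ["Answer the following bar exam question.", "", question.strip(), ""]
    for letter in sorted(choices):
        lines.append(f"({letter}) {choices[letter].strip()}")
    lines.append("")
    if ask_for_ranking:
        lines.append("Rank all choices from most to least likely correct, e.g. 'B, A, D, C'.")
    else:
        lines.append("Reply with a single letter.")
    return "\n".join(lines)


def parse_ranking(completion: str, letters: str = "ABCD") -> list[str]:
    """Extract standalone choice letters in order of first appearance, dropping repeats."""
    ranking: list[str] = []
    for letter in re.findall(rf"\b([{letters}])\b", completion):
        if letter not in ranking:
            ranking.append(letter)
    return ranking


prompt = build_mbe_prompt(
    "Which doctrine best supports the plaintiff's claim?",
    {
        "A": "Res ipsa loquitur",
        "B": "Assumption of risk",
        "C": "Contributory negligence",
        "D": "Strict liability",
    },
)
print(prompt)
print(parse_ranking("D, A, C, B"))
```

Asking for a full ranking is what makes top-2 and top-3 scoring possible; a single-letter prompt only supports top-1 accuracy.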
Historical Significance
This research marked a turning point in AI evaluation on professional exams, establishing methodologies and baselines that would be used in subsequent studies. The prediction that "an LLM will pass the MBE component of the Bar Exam in the near future" was validated just months later with GPT-4's success.
Resources
The repository includes:
- Jupyter notebooks with analysis
- Performance visualization charts
- Prompt examples and optimization strategies
- Complete session logs for reproducibility