A pioneering study evaluating OpenAI’s GPT models on the Uniform CPA Examination, testing their capabilities as potential knowledge workers in accounting, legal, financial, and ethical domains.

Research Overview

This project systematically evaluates GPT models (text-davinci-001 through text-davinci-003) on CPA exam questions, providing insights into AI’s readiness for professional knowledge work.

Publication

Authors: Jillian Bommarito, Michael James Bommarito, Daniel Martin Katz, Jessica Katz
Published: January 11, 2023
Paper: Available on arXiv and SSRN

Key Findings

Performance Metrics

text-davinci-003: 14.4% correct on a sample REG exam section under zero-shot prompting
Best configuration: 57.6% of multiple-choice questions answered correctly
Top-2 accuracy: 82.1% (the correct answer was among the model's two highest-ranked choices)
Improvement across model versions: 30% (text-davinci-001) → 57% (text-davinci-003)
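
The accuracy and top-2 figures above are both instances of top-k accuracy. The following is a minimal sketch of that computation, not the project's actual scoring code: the function name `top_k_accuracy` and the toy data are assumptions made for illustration. With four answer choices, random guessing would give 25% top-1 and 50% top-2 accuracy, which is a useful baseline when reading the numbers.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_answers: Sequence[Sequence[str]],
    answer_key: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of questions whose correct choice is among the first k ranked answers."""
    if len(ranked_answers) != len(answer_key):
        raise ValueError("ranked_answers and answer_key must have the same length")
    if not answer_key:
        raise ValueError("at least one question is required")
    hits = sum(
        1 for ranked, key in zip(ranked_answers, answer_key) if key in ranked[:k]
    )
    return hits / len(answer_key)


# Made-up toy data (not from the study): each row lists the model's choices,
# most preferred first, for one four-option question.
ranked = [
    ["A", "C", "B", "D"],
    ["B", "A", "D", "C"],
    ["D", "B", "A", "C"],
    ["C", "D", "A", "B"],
]
key = ["A", "A", "C", "C"]

top1 = top_k_accuracy(ranked, key, k=1)
top2 = top_k_accuracy(ranked, key, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```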

Skill-Level Analysis

Strong performance: Remembering & Understanding, Application tasks
Weakness: Quantitative reasoning and calculation-heavy problems
Approaching human-level performance on questions that require no calculation
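
A breakdown like the one above can be produced by grouping graded answers by skill level. This is a hedged sketch only: the function `accuracy_by_skill`, the record layout, and the sample records are hypothetical and do not reproduce the study's data or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_skill(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the share of correct answers for each skill-level label."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for skill, is_correct in records:
        totals[skill] += 1
        hits[skill] += int(is_correct)
    return {skill: hits[skill] / totals[skill] for skill in totals}


# Illustrative records only: (skill level, whether the model's answer was correct).
records = [
    ("Remembering & Understanding", True),
    ("Remembering & Understanding", True),
    ("Application", True),
    ("Application", False),
    ("Analysis", False),
    ("Analysis", False),
]

by_skill = accuracy_by_skill(records)
for skill, accuracy in by_skill.items():
    print(f"{skill}: {accuracy:.0%}")
```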

Technical Implementation

The research framework includes:

Poetry-based Python environment
Scripts for exam administration and scoring
Session data export capabilities
Performance visualization tools
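
To make the components above concrete, here is a minimal sketch of how administering, scoring, and exporting a session could fit together. It is an assumption-laden illustration rather than the repository's code: `Question`, `Response`, `format_prompt`, `administer`, `export_csv`, and the stub model `always_a` are all hypothetical names, and the model is passed in as a plain callable so that no specific API client is implied.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Callable, Dict, Iterable, List, TextIO


@dataclass
class Question:
    qid: str
    stem: str
    choices: Dict[str, str]  # choice letter -> choice text
    answer: str  # correct choice letter


@dataclass
class Response:
    qid: str
    model_answer: str
    correct_answer: str
    is_correct: bool


def format_prompt(q: Question) -> str:
    """Render a question as a zero-shot multiple-choice prompt."""
    lines = [q.stem]
    lines += [f"{letter}. {text}" for letter, text in sorted(q.choices.items())]
    lines.append("Answer:")
    return "\n".join(lines)


def administer(
    questions: Iterable[Question], ask_model: Callable[[str], str]
) -> List[Response]:
    """Ask the model each question and grade the first letter of its reply."""
    responses = []
    for q in questions:
        reply = ask_model(format_prompt(q))
        letter = reply.strip()[:1].upper()
        responses.append(Response(q.qid, letter, q.answer, letter == q.answer))
    return responses


def export_csv(responses: Iterable[Response], fh: TextIO) -> None:
    """Write one row per graded response."""
    writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(Response)])
    writer.writeheader()
    for r in responses:
        writer.writerow(asdict(r))


def always_a(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical): always answers 'A'."""
    return " a"


# Placeholder questions, not real exam content.
questions = [
    Question("q1", "Placeholder question one?", {"A": "w", "B": "x", "C": "y", "D": "z"}, "A"),
    Question("q2", "Placeholder question two?", {"A": "w", "B": "x", "C": "y", "D": "z"}, "C"),
]

session = administer(questions, always_a)
score = sum(r.is_correct for r in session) / len(session)

buffer = io.StringIO()
export_csv(session, buffer)
print(f"score: {score:.0%}")
print(buffer.getvalue())
```

Passing the model as a callable keeps the grading and export logic testable without network access; a real run would swap `always_a` for a function that queries the model under evaluation.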

Evaluation Methodology

Tested on:

Sample Regulation (REG) exam sections
200+ multiple-choice questions covering:
- Legal concepts
- Financial analysis
- Accounting principles
- Technology applications
- Ethical considerations

Implications

This research suggests that GPT models show promise for knowledge work, particularly in conceptual understanding and application, but still struggle with quantitative reasoning. The improvement from 30% (text-davinci-001) to 57% (text-davinci-003) indicates that capabilities in professional domain tasks are advancing quickly from one model version to the next.