
GPT as Knowledge Worker

model

Research evaluating GPT models' capabilities on the Uniform CPA Examination, exploring AI's potential to transform knowledge work

period: 2023-present
tech: AI Evaluation
══════════════════════════════════════════════════════════════════

A study evaluating OpenAI’s GPT models on the Uniform CPA Examination, testing their capabilities as potential knowledge workers in accounting, legal, financial, and ethical domains.

Research Overview

This project evaluates GPT models (text-davinci-001 through text-davinci-003) on CPA exam questions in a zero-shot setting, measuring how ready these models are for professional knowledge work.

Publication

  • Authors: Jillian Bommarito, Michael James Bommarito, Daniel Martin Katz, Jessica Katz
  • Published: January 11, 2023
  • Paper: Available on arXiv and SSRN

Key Findings

Performance Metrics

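  • text-davinci-003: 14.4% correct on a sample REG exam section under zero-shot prompting, well below human performance on quantitative reasoning
  • Best prompt and parameter configuration: 57.6% of questions answered correctly, versus the 25% expected from random guessing on four-option questions
  • Top-2 accuracy: 82.1% (the correct answer is one of the model's two top-ranked choices, suggesting it often narrows down to the right answer)
  • Improvement across model generations: 30% (text-davinci-001) → 57% (text-davinci-003)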
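Top-1 and top-2 accuracy are easy to conflate, so a minimal sketch of how they differ may help. It assumes each model answer is available as a ranked list of choice letters (most likely first); the function name `top_k_accuracy` and the rankings below are hypothetical illustrations, not the project's code or data.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of questions whose gold letter is among the model's k highest-ranked choices."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for choices, answer in zip(ranked, gold) if answer in choices[:k])
    return hits / len(gold)


# Hypothetical model rankings (most likely choice first) and answer key, not study data.
ranked = [["A", "C", "B", "D"], ["B", "A", "D", "C"], ["D", "B", "A", "C"], ["C", "D", "A", "B"]]
gold = ["A", "A", "B", "D"]

print(top_k_accuracy(ranked, gold, k=1))  # 0.25
print(top_k_accuracy(ranked, gold, k=2))  # 1.0

# Random guessing on 4-option questions: expected top-1 accuracy 1/4, top-2 accuracy 2/4.
print(1 / 4, 2 / 4)  # 0.25 0.5
```

With four options, the 82.1% top-2 figure is best read against a 50% random baseline, just as the 57.6% top-1 figure is read against 25%.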

Skill-Level Analysis

  • Strong performance: Remembering & Understanding and Application skill levels
  • Weakness: quantitative reasoning and calculation-heavy problems
  • Approaching human-level performance on these skill levels for questions that require no calculation
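A skill-level analysis like this amounts to grouping graded answers and computing accuracy per group. The sketch below assumes each graded answer carries a skill level and a flag for whether it needs calculation; the `Graded` record, the `accuracy_by` helper, and all values are hypothetical, chosen only to show the grouping.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class Graded(NamedTuple):
    """One graded answer (hypothetical schema)."""
    skill_level: str
    requires_calculation: bool
    correct: bool


def accuracy_by(results: List[Graded], key: Callable[[Graded], str]) -> Dict[str, float]:
    """Group graded results by key(result) and return the accuracy of each group."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for r in results:
        group = key(r)
        totals[group] += 1
        hits[group] += int(r.correct)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical graded answers, not data from the study.
results = [
    Graded("Remembering & Understanding", False, True),
    Graded("Remembering & Understanding", False, True),
    Graded("Application", False, True),
    Graded("Application", True, False),
    Graded("Analysis", True, False),
    Graded("Analysis", False, True),
]
by_skill = accuracy_by(results, key=lambda r: r.skill_level)
by_calc = accuracy_by(results, key=lambda r: "calculation" if r.requires_calculation else "no calculation")
print(by_skill)  # {'Remembering & Understanding': 1.0, 'Application': 0.5, 'Analysis': 0.5}
print(by_calc)   # {'no calculation': 1.0, 'calculation': 0.0}
```

Passing the grouping key as a function keeps one helper usable for both the skill-level split and the calculation split.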

Technical Implementation

The research framework includes:

  • Poetry-based Python environment
  • Scripts for exam administration and scoring
  • Session data export capabilities
  • Performance visualization tools
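The administration, scoring, and export steps could fit together roughly as follows. This is a sketch under stated assumptions, not the project's scripts: `Question`, `administer_exam`, and `to_jsonl` are hypothetical names, the model is any callable that returns a choice letter (a stub here; no real API call is shown), and JSON Lines is an assumed export format.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Question:
    """One multiple-choice item (hypothetical schema)."""
    qid: str
    stem: str
    choices: Dict[str, str]  # letter -> option text
    answer: str              # gold letter, e.g. "B"


def administer_exam(questions: List[Question], ask_model: Callable[[Question], str]) -> List[dict]:
    """Pose each question, score the reply, and return one session record per question."""
    session = []
    for q in questions:
        predicted = ask_model(q)  # in practice this would wrap a GPT model call (not shown)
        session.append({
            "qid": q.qid,
            "predicted": predicted,
            "gold": q.answer,
            "correct": predicted == q.answer,
        })
    return session


def to_jsonl(session: List[dict]) -> str:
    """Serialize session records as JSON Lines for export."""
    return "\n".join(json.dumps(record) for record in session)


# Usage with a stub "model" that always answers "A".
options = {"A": "first", "B": "second", "C": "third", "D": "fourth"}
questions = [
    Question("q1", "Hypothetical question 1", options, "A"),
    Question("q2", "Hypothetical question 2", options, "C"),
]
session = administer_exam(questions, ask_model=lambda q: "A")
score = sum(r["correct"] for r in session) / len(session)
print(score)  # 0.5
print(to_jsonl(session))
```

Injecting the model as a callable keeps scoring and export testable without network access; a real run would swap the stub for a function that prompts the model and parses its reply.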

Evaluation Methodology

Tested on:

  • Sample Regulation (REG) exam sections
  • 200+ multiple-choice questions covering:
    • Legal concepts
    • Financial analysis
    • Accounting principles
    • Technology applications
    • Ethical considerations
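Zero-shot evaluation on multiple-choice items needs two small pieces: rendering a question as a prompt and pulling a choice letter back out of the completion. The template and parsing rules below are assumptions for illustration (`format_prompt` and `parse_answer` are hypothetical), not the prompt wording used in the paper.

```python
import re
from typing import Dict, Optional


def format_prompt(stem: str, choices: Dict[str, str]) -> str:
    """Render a multiple-choice question as a zero-shot prompt (hypothetical template)."""
    lines = [stem, ""]
    lines += [f"({letter}) {text}" for letter, text in sorted(choices.items())]
    lines += ["", "Answer with the letter of the correct choice."]
    return "\n".join(lines)


def parse_answer(completion: str) -> Optional[str]:
    """Extract the chosen letter: prefer "(X)", else a leading standalone letter A-D."""
    match = re.search(r"\(([A-D])\)", completion)
    if match:
        return match.group(1)
    match = re.match(r"\s*([A-D])\b", completion)
    return match.group(1) if match else None


prompt = format_prompt(
    "Hypothetical question stem?",
    {"B": "Second option", "A": "First option", "C": "Third option", "D": "Fourth option"},
)
print(prompt)
print(parse_answer("The answer is (B)."))  # B
print(parse_answer("C. Because ..."))      # C
print(parse_answer("I am not sure."))      # None
```

Preferring a parenthesized letter avoids misreading the article "A" at the start of a free-text explanation; replies that match neither rule return `None` and can be scored as incorrect.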

Implications

This research suggests that while GPT models show promise for knowledge work, particularly on conceptual and application-level questions, they still struggle with quantitative reasoning. The jump from 30% (text-davinci-001) to 57% (text-davinci-003) suggests that capabilities on professional domain tasks are improving quickly.
