projects

various software, data, or models i've built in an individual or affiliated capacity

books

This Is Server Country: AI, Power, and the Remaking of Rural America

book

A 402-page investigation into America's trillion-dollar AI data center buildout and its impact on rural communities

AI Infrastructure • Data Centers • Energy Policy • +2

The Math Inside the Machine: How Intelligence Emerges from Eleven Simple Operations

book

A pop-science book explaining that the math inside ChatGPT is the same math you learned between kindergarten and calculus

Artificial Intelligence • Large Language Models • Mathematics • +2

Artificial Intelligence for Law and Finance

book

An open-source textbook bridging AI technology with practical applications in legal and financial domains, covering LLMs, agents, and knowledge graphs

Artificial Intelligence • Large Language Models • Legal Informatics • +2

software

Moratorium Nation: A Survey of Data Center, Renewable Energy, and Battery Storage Moratoria in the United States

A 113-page survey of 116 moratoria across 30 states targeting data centers, solar farms, wind turbines, and battery storage facilities

Data Centers • Renewable Energy • Battery Storage • +3

NUPunkt

software

High-precision sentence boundary detection for legal text

Natural Language Processing • Legal Informatics

Parallel Iliad: Brainrot Edition

software

A parallel Greek-English reader of Homer's Iliad (Palles 1904/1917) with AI-generated 'brainrot' translations, margin notes, and a searchable character bible

AI Translation • Classical Studies • GPT-5.4 • +2

OpenGloss

software

A synthetic encyclopedic dictionary and semantic knowledge graph for English with 537K sense definitions, 9.1M semantic edges, and 60M words of encyclopedic content

Natural Language Processing • Knowledge Graphs • Rust • +1

KL3M Tokenizers

software

Domain-specific BPE tokenizers for legal, financial, and governmental text, achieving 9-17% efficiency improvements over GPT-4 and LLaMA3

Natural Language Processing

LexNLP

software

Natural language processing and information extraction for legal and regulatory text

Natural Language Processing • Legal Informatics

Binary BPE Tokenizers

software

Cross-platform Byte Pair Encoding tokenizers for binary executables, enabling 2-3× more efficient transformer-based binary analysis across ELF, PE, Mach-O, and APK formats

Machine Learning • Binary Analysis • Natural Language Processing • +1

IOCTLance

software

Windows driver vulnerability detection through symbolic execution and taint analysis

Security • Symbolic Execution • Static Analysis • +1

OpenEDGAR

software

Open source Python client for SEC EDGAR data access and analysis

Financial Technology

CharBoundary

software

Character-based text boundary detection for legal documents using Random Forest classifiers, achieving balanced precision and recall with an F1 score of 0.782

Natural Language Processing • Legal Informatics

USBills.ai

software

Open-source platform using AI and NLP to make US federal legislation accessible through plain language summaries, ELI5 explanations, and readability metrics

Legal Informatics • Natural Language Processing

ALEA Preprocess

software

High-performance data preprocessing library for large language model training, supporting pretraining, SFT, and DPO datasets with Rust-powered efficiency

Machine Learning

FOLIO Data Generator

software

Python library for generating synthetic legal data using the FOLIO knowledge graph, supporting both procedural templates and LLM-based generation

Legal Informatics

FOLIO API

software

Public RESTful API for the Federated Open Legal Information Ontology, providing programmatic access to 18,000+ legal concepts with multiple output formats

Legal Informatics

FOLIO Python Client

software

Python library for interacting with the Federated Open Legal Information Ontology, providing search, exploration, and format conversion capabilities

Knowledge Representation • Legal Informatics

leeky

software

Training data contamination detection library for black-box language models, implementing six testing methods to identify potential data leakage

AI Ethics • Machine Learning

rfcorr

software

Python library for Random Forest-based correlation measures, providing alternative approaches to traditional correlation analysis using tree-based ensemble methods

Machine Learning

pyghcn

software

Python 3 library for accessing and analyzing NOAA Global Historical Climatology Network (GHCN) weather and climate data

Climate Science

amos3

software

Python 3 client for the Archive of Many Outdoor Scenes (AMOS), enabling access to billions of outdoor webcam images for computer vision and environmental research

Computer Vision

Complex Systems 530 - Winter 2016

software

Course materials for "Computer Modeling of Complex Systems" at the University of Michigan, teaching agent-based modeling and computational approaches to complex systems

Complex Systems • Educational Technology

Complex Systems 530 - Winter 2015

software

First iteration of the "Computer Modeling of Complex Systems" course at the University of Michigan, establishing the foundation for computational complex systems education

Complex Systems • Educational Technology

Well-Settled Research

software

A computational legal research project analyzing Supreme Court decisions and legal precedents using natural language processing

Legal Analytics

quick-claude

software

Modular AI agent configuration system that eliminates repeated instructions and ensures consistent behavior across coding sessions

Python • Bash • AI Agents • +2

pyenvsearch

software

Python package exploration for AI agents that stops them from hallucinating APIs

Python • CLI Tools • AI Agents • +1

claude-interceptors

software

Command interceptors that teach AI agents and developers about modern Python tooling

Bash • Python Tooling • Developer Experience • +1

mysqlfuck.c

software

MySQL vulnerability exploit demonstrating default configuration flaws in 2002

C • Security • MySQL

bluebus.be

software

University of Michigan bus tracking system built on the original Google Maps and run from a dorm room server

Web • JavaScript • Google Maps API • +1

pystats

software

Python statistics library built on numarray in the pre-NumPy and SciPy era

Python • numarray • Statistics

PHP JAMA

software

PHP 4.x port of the Java Matrix package for linear algebra computations

PHP • Linear Algebra • Mathematics

db-xml

software

PHP 3.x/4.x library for serialization and deserialization between MySQL/PostgreSQL and XML

PHP • MySQL • PostgreSQL • +1

phpDesktop

software

Web-based personal organization suite predating Google Calendar and Trello by several years

PHP • Web • JavaScript • +1

datasets

Michigan Energy Infrastructure & Population Density Map

dataset

Multi-layer geospatial visualization of Michigan's energy infrastructure overlaid on Census population density, with a data center siting score for all 1,580 county subdivisions

Geospatial • Energy Infrastructure • Data Centers • +3

KL3M Dataset

dataset

Legal Informatics

KL3M Data Project

dataset

Large-scale copyright-clean dataset containing 132M+ documents and trillions of tokens for training legal language models

Legal Informatics

OpenMPSC Data

dataset

Open dataset of Michigan Public Service Commission regulatory proceedings, 1987–2026, with a public REST API at openmpsc.com

Legal Informatics

linux-drivers.com

dataset

Evidence-backed dossiers for 864 Linux kernel drivers, recommending which legacy drivers to keep, annotate, deprecate, or remove

Linux Kernel • Static Analysis • Codex • +2

Law on the Market

dataset

A comprehensive 15-year study examining how Supreme Court decisions impact stock market returns, finding significant abnormal returns in 37% of cases

Legal Analytics • Financial Analysis

Binary-30K Dataset

dataset

The first heterogeneous binary analysis dataset for deep learning research, featuring 30,000 diverse executables spanning multiple platforms, architectures, and file formats

Machine Learning • Binary Analysis • Malware Detection • +2

FOLIO - Federated Open Legal Information Ontology

dataset

Open-source legal data standard containing 18,000+ standardized legal concepts with multilingual support for improved legal industry interoperability

Knowledge Representation • Legal Informatics

Federal Bill Statistics

dataset

Original source code and data infrastructure that powered the initial version of the USBills.ai platform

Legal Informatics

Measuring and Modeling the U.S. Regulatory Ecosystem

dataset

Large-scale empirical analysis of regulatory complexity using 165,000+ SEC filings to map the evolution of the U.S. regulatory landscape

Computational Law • Complex Systems

The Race to the Bund

dataset

An innovative analysis of European financial integration using eigendecomposition of sovereign bond yield correlations from 1872 to 2010

Financial Analysis

U.S. Code Complexity

dataset

Computational analysis measuring the complexity of the United States Code using mathematical and network science approaches

Computational Law • Complex Systems

models

SCOTUS Predict

model

A machine learning model that predicts Supreme Court voting behavior with 70% accuracy, analyzing 60 years of decisions from 1953-2013

Legal Analytics • Machine Learning

SCOTUS Predict v2

model

An enhanced Supreme Court prediction model achieving 70.2% accuracy across 200 years of decisions (1816-2015), analyzing over 240,000 justice votes

Legal Analytics • Machine Learning

KL3M Toxicity Research

model

Comprehensive research examining toxicity and bias in legal language models, demonstrating KL3M's superior safety profile through rigorous testing

AI Ethics • Legal Informatics

Legal Sentence Boundary Detection Paper

model

Research presenting NUPunkt and CharBoundary libraries for high-precision sentence segmentation in legal text, achieving 29-32% improvement over general-purpose tools

Natural Language Processing • Legal Informatics

Linux as a Model

model

Training transformer models to memorize Linux kernel source code to demonstrate issues with AI training data licensing

Machine Learning • Open Source

KL3M Model Research

model

Research and development repository for advancing the Kelvin Legal Large Language Model family with new architectures and training approaches

Machine Learning • Legal Informatics

All the Patents

model

Generating and publishing obvious inventions using AI to challenge the patent system

Machine Learning • Patent Law • Legal Informatics

GPT-4 Passes the Bar Exam

model

Research demonstrating GPT-4's ability to pass the Uniform Bar Examination, significantly outperforming both human test-takers and prior AI models

AI Evaluation • Legal Informatics

GPT as Knowledge Worker

model

Research evaluating GPT models' capabilities on the Uniform CPA Examination, exploring AI's potential to transform knowledge work

AI Evaluation

GPT Takes the CPA Exam

model

Initial repository for research evaluating GPT models on the CPA exam, later developed into the comprehensive "GPT as Knowledge Worker" project

AI Evaluation

GPT Takes the Bar Exam

model

Groundbreaking research demonstrating GPT-3.5's performance on the Multistate Bar Examination, predicting AI's ability to pass professional legal licensing exams

AI Evaluation • Legal Informatics

FMLGen

model

A humorous AI project that generates absurd "F*** My Life" stories using modern language models, comparing current capabilities to 2013 n-gram approaches

Natural Language Processing