publications
this page contains my academic publications, talks, and research resources, organized by type:
- papers & articles: textbooks and peer-reviewed research
- talks & presentations: conferences, podcasts, and invited lectures
- datasets & resources: materials for research and collaboration
looking for bibtex?
here's a copy of all my publication BibTeX entries: publications.bbl
papers & articles
Textbooks
Legal informatics
Katz, D. M., Dolin, R., & Bommarito, M. J. • Cambridge University Press (2021)
Research Papers
2025
The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models
Bommarito II, M. J., Bommarito, J., & Katz, D. M. • arXiv preprint
KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications
Bommarito, M. J., Katz, D. M., & Bommarito, J. • arXiv preprint
Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary
Bommarito, M. J., Katz, D. M., & Bommarito, J. • arXiv preprint
2024
GPT-4 passes the bar exam
Katz, D. M., Bommarito, M. J., Gao, S., & Arredondo, P. • Philosophical Transactions of the Royal Society A
2023
GPT as knowledge worker: a zero-shot evaluation of (AI) CPA capabilities
Bommarito, J., Bommarito, M., Katz, D. M., & Katz, J. • arXiv preprint
Natural language processing in the legal domain
Katz, D. M., Hartung, D., Gerlach, L., Jana, A., & Bommarito II, M. J. • arXiv preprint
2022
LexGLUE: A benchmark dataset for legal language understanding in English
Chalkidis, I., Jana, A., Hartung, D., Bommarito, M., Androutsopoulos, I., Katz, D. M., & Aletras, N. • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)
2021
An empirical analysis of the R package ecosystem
Bommarito, M. J., & Bommarito, E. • arXiv preprint
LexNLP: Natural language processing and information extraction for legal and regulatory texts
Bommarito II, M. J., Katz, D. M., & Detterman, E. M. • Research Handbook on Big Data Law
Measuring law over time: A network analytical framework with an application to statutes and regulations in the United States and Germany
Coupette, C., Beckedorf, J., Hartung, D., Bommarito, M., & Katz, D. M. • Frontiers in Physics
Preprocessing Data
Bommarito II, M. J. • Legal Informatics (Chapter 2.3)
2020
Sensitivity of collective outcomes identifies pivotal components
Lee, E. D., Katz, D. M., Bommarito, M. J., & Ginsparg, P. H. • Journal of The Royal Society Interface
2019
An empirical analysis of the Python Package Index (PyPI)
Bommarito, E., & Bommarito, M. • arXiv preprint
2018
OpenEDGAR: Open source software for SEC EDGAR analysis
Bommarito, M. J., Katz, D. M., & Detterman, E. M. • MIT Computational Law Report
Spectral analysis of time-dependent market-adjusted return correlation matrix
Bommarito II, M. J., & Duran, A. • Physica A: Statistical Mechanics and its Applications
2017
A general approach for predicting the behavior of the Supreme Court of the United States
Katz, D. M., Bommarito, M. J., & Blackman, J. • PLOS ONE
Crowdsourcing accurately and robustly predicts Supreme Court decisions
Katz, D. M., Bommarito II, M. J., & Blackman, J. • arXiv preprint
Harnessing legal complexity
Ruhl, J. B., Katz, D. M., & Bommarito, M. J. • Science
Measuring and modeling the US regulatory ecosystem
Bommarito II, M. J., & Katz, D. M. • Journal of Statistical Physics
2016
Legal analytics course
Katz, D. M., & Bommarito, M. J. • Course Materials
2015
Law on the market? Evaluating the securities market impact of Supreme Court decisions
Katz, D. M., Bommarito, M. J., Soellinger, T., & Chen, J. M. • arXiv preprint
The Electronic World Treaty Index: Collecting the Population of International Agreements in the 20th Century
Poast, P., Bommarito, M. J., & Katz, D. M. • SSRN
Understanding the Federal Communication Commission's Policy-Making Using Big Data
Candeub, A., & Bommarito, M. J. • TPRC 43: The 43rd Research Conference on Communication, Information and Internet Policy
2014
Measuring the complexity of the law: the United States Code
Katz, D. M., & Bommarito, M. J. • Artificial Intelligence and Law
2013
Interactions between organizations and networks in common-pool resource governance
Agrawal, A., Brown, D. G., Rao, G., Riolo, R., Robinson, D. T., & Bommarito II, M. • Environmental Science & Policy
2011
A profitable trading and risk management strategy despite transaction costs
Duran, A., & Bommarito, M. J. • Quantitative Finance
Legal n-grams? A simple approach to track the 'evolution' of legal language
Katz, D. M., Bommarito, M. J., Seaman, J., Candeub, A., & Agichtein, E. • Proceedings of JURIX 2011
Reproduction of hierarchy? A social network analysis of the American law professoriate
Katz, D. M., Gubler, J. R., Zelner, J., & Bommarito, M. J. • Journal of Legal Education
2010
A mathematical approach to the study of the United States Code
Bommarito II, M. J., & Katz, D. M. • Physica A: Statistical Mechanics and its Applications
An Empirical Survey of the Population of US Tax Court Written Decisions
Bommarito, M. J. • Virginia Tax Review
Building the United States Supreme Court Disposition Corpus 1791-2009
Bommarito II, M. J., & Katz, D. M. • Linguistic Data Consortium
Distance measures for dynamic citation networks
Bommarito II, M. J., Katz, D. M., Zelner, J. L., & Fowler, J. H. • Physica A: Statistical Mechanics and its Applications
Intraday Correlation Patterns between the S&P 500 and Sector Indices
Bommarito, M. J. • SSRN
On the stability of community detection algorithms on longitudinal citation data
Bommarito II, M. J., Katz, D. M., & Zelner, J. L. • Procedia-Social and Behavioral Sciences
2009
Exploring Relationships between Legal Concepts in the United States Supreme Court
Bommarito, M. J. • SSRN
Law as a seamless web? Comparison of various network representations of the United States Supreme Court corpus (1791-2005)
Bommarito, M. J., Katz, D. M., & Zelner, J. • Proceedings of the 12th International Conference on Artificial Intelligence and Law
Properties of the United States Code citation network
Bommarito II, M. J., & Katz, D. M. • arXiv preprint
talks & presentations
The Landscape for Content Licensing for AI: Challenges and Opportunities
Book Industry Study Group • September 30, 2024
Legal certification 2.0: Are LLMs turning legal education upside down?
Law of Tech Podcast • November 14, 2023
Preparing Law Students for the Future
3 Geeks and a Law Blog • May 30, 2023
Can GPT Pass the Bar Exam? We Find Out
LawNext Podcast • January 9, 2023
From Law of the Sea to Legal Underwriting
Fin(Legal)Tech 2016 • November 4, 2016
Law's Future from Finance's Past
ReInventLaw Silicon Valley • March 8, 2013