bartz v. anthropic: settlement outline and implications
first, for all who i’ve argued with over the last three years:
told you so!
fair use is a real thing that deserves protection, and copyright is often unfairly weaponized, but this was never going to end any other way. you cannot pirate millions of works for a commercial activity without consequences.
will this be the end of ai? no, of course not. even anthropic and openai, despite their financial “independence,” can raise enough private capital to cover any settlement. meta and google were never at risk from a solvency perspective anyway.
it will, hopefully, be the beginning of a more fair future.
the “pirate” who settled
a detailed analysis of why anthropic settled bartz et al. v. anthropic pbc days after a cascade of adverse discovery rulings, avoiding a december 2025 trial that could have reshaped ai training law. the case centered on anthropic’s admitted use of “pirated” book datasets to train claude, with potential damages reaching $22.5 billion.
the original complaint pulled no punches: “anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books. rather than obtaining permission and paying a fair price for the creations it exploits, anthropic pirated them.”
on august 26, 2025, anthropic settled rather than proceed to a december 1 trial that would have exposed exactly how claude was trained on admittedly pirated materials. the settlement came hours before a noon deadline to produce documents that a special master ruled weren’t privileged.
case basics
- citation: bartz et al. v. anthropic pbc, no. 3:24-cv-05417-wha (n.d. cal. filed aug. 19, 2024)
- judge: william h. alsup (who can actually code)
- plaintiffs: andrea bartz (the lost night), charles graeber (the good nurse, the breakthrough), kirk wallace johnson (to be a friend is fatal)
- core allegation: “anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books”
- trial date avoided: december 1, 2025
why anthropic settled: the final week
five days of adverse rulings
august 22, 2025: special master orders production of 35 documents anthropic claimed were privileged, including one where the legal department was “intentionally excluded”
august 23, 2025: three adverse orders:
- anthropic must provide missing copyright agreements (doc 341)
- expert witness battle schedule set - damages could reach billions (doc 342)
- special master compels answers about “pirated books” list (doc 343)
august 25, 2025: discovery disputes escalate:
- 8am: sarah rodriguez fails as 30(b)(6) witness
- evening: three emergency motions expose anthropic’s obstruction
- missing slack channels about datasets never produced
- mysterious spreadsheet about deletions clawed back
august 26, 2025:
- noon: deadline to produce damaging documents
- afternoon: settlement announced to avoid production
the evidence anthropic couldn’t explain
- “pirated books” terminology: special master’s order used quotes, suggesting anthropic internally called the datasets “pirated”
- excluded legal department: document antpriv_0000642 showed “the legal department was intentionally excluded”
- missing slack channels: [redacted] channel with “obvious relevance given its title” never produced
- deletion spreadsheet: five copies of spreadsheet about [redacted] produced then all clawed back claiming privilege
- co-founder involvement: document questioning about jared kaplan (co-founder) immediately triggered clawback
what the december trial would have looked like
evidence the jury would have seen
if the case had proceeded to trial, the jury would have reviewed:
the books3 smoking gun:
- anthropic admitted using the pile dataset, which included books3
- books3 = 196,640 pirated books from bibliotik (“a notorious pirated collection”)
- created by shawn presser, who tweeted: “now you do. now everyone does. presenting ‘books3’, aka ‘all of bibliotik’”
- eleutherai’s own paper admitted: “we included bibliotik because books are invaluable for long-range context modeling”
anthropic’s admissions:
- july 2024: a company spokesperson confirmed use of the pile in a statement to proof news
- december 2021 paper: described dataset as “32% internet books” (code word for pirated copies)
- internal references to “pirated books” (per special master’s order)
- inability to explain retention or deletion practices
damages calculations:
- approximately 150,000 copyrighted works identified
- statutory damages range: $750 to $150,000 per work
- potential exposure: $112.5 million to $22.5 billion
- lost licensing revenue claims from author class
why fair use wouldn’t save anthropic
the complaint demolished anthropic’s potential fair use defense:
commercial purpose: anthropic projected $850 million revenue in 2024, valued at $18+ billion after raising $7.6 billion from amazon and google
wholesale copying: “anthropic downloaded known pirated versions of plaintiffs’ works, made copies of them, and fed these pirated copies into its models”
market harm:
- claude marketed to “draft everything from a text message or email to a screenplay or a novel”
- tim boucher “wrote” 97 books using claude in less than a year, selling them for $5.99 each
- each book took only “six to eight hours” to generate
- ai-generated copycats flooding amazon when real authors release books
judge alsup’s technical background would have enabled examination of:
- how training data is processed and stored
- whether copies persist after training
- technical necessity of using complete works
- alternatives to copyrighted material
witness testimony disasters
ben mann (anthropic’s principal technical witness):
- former openai vp who led gpt-2 and gpt-3 development
- designated as 30(b)(6) witness on dataset handling
- repeatedly claimed inability to answer basic questions
- 10,000+ documents produced afternoon before his deposition
- anthropic clawed back key document mid-deposition when plaintiffs tried to use it
sarah rodriguez (30(b)(6) discovery witness):
- unable to describe document retention systems
- could not identify repository structures
- no knowledge of deletion protocols
- deposed august 25, one day before settlement
these 30(b)(6) failures created serious litigation risk: corporate representatives who cannot answer basic questions about company operations suggest either inadequate preparation or problematic underlying facts.
the settlement terms (what we know)
what’s public (or probable)
- binding term sheet signed august 25, 2025
- class-wide settlement covering all authors whose works may have been used
- permanent resolution of all related claims involving the parties
- mediation by hon. layn phillips (renowned tech mediator)
what’s (currently) sealed
- dollar amount (likely substantial given liability exposure)
- any injunctive relief (probably none)
- individual allocations (how much each author gets)
- cy pres provisions (where unclaimed funds go)
i think i am eligible to join the class; will confirm when the process is public.
reasonable estimates
based on the complaint’s allegations and liability exposure:
maximum statutory damages: 150,000+ works × $150,000 = $22.5 billion
minimum statutory damages: 150,000+ works × $750 = $112.5 million
likely settlement range: $50-500 million (typical for tech settlements avoiding precedent)
per-author recovery: varies by number of works infringed
future impact: no injunction likely, anthropic continues using the data
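the arithmetic behind these figures is simple enough to sketch. a minimal example, assuming the per-work amounts are the statutory bands in 17 u.s.c. § 504(c) and using the roughly 150,000-work class size alleged in the complaint:

```python
# sketch of the statutory damages arithmetic above; the per-work
# amounts are the 17 u.s.c. § 504(c) bands, and the 150,000 work
# count is the approximate class size alleged in the complaint.
WORKS = 150_000

MIN_PER_WORK = 750      # statutory minimum per infringed work
ORDINARY_MAX = 30_000   # ordinary statutory maximum per work
WILLFUL_MAX = 150_000   # ceiling for willful infringement per work

def exposure(per_work: int, works: int = WORKS) -> int:
    """total statutory exposure at a given per-work award."""
    return per_work * works

print(f"floor (non-willful minimum): ${exposure(MIN_PER_WORK):,}")  # $112,500,000
print(f"ordinary ceiling:            ${exposure(ORDINARY_MAX):,}")  # $4,500,000,000
print(f"willful ceiling:             ${exposure(WILLFUL_MAX):,}")   # $22,500,000,000
```

the spread between the floor and the willful ceiling is a factor of 200, which is exactly why the statutory-damages threat works better as a negotiating lever than as a prediction of any actual verdict.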
earlier, eddie lee predicted “business-ending liability” for anthropic, calculating that even 100,000 works could result in $1-3 billion in damages at standard rates, with willful infringement potentially reaching $15 billion.
parallels to openai’s trajectory
the bartz case mirrors tensions in openai’s evolution:
the mission vs. money conflict
like openai’s 2019 capped-profit pivot, anthropic faces the fundamental tension between:
- original mission: ai safety and benefit to humanity
- commercial reality: need for capital and competitive pressure
- legal exposure: copyright liability from training methods
the transparency retreat
anthropic’s discovery obstruction parallels openai’s transparency decline:
- openai 2018: gpt-1 fully documented
- openai 2019: gpt-2 withheld for “safety”
- openai 2023: gpt-4 details secret
- anthropic 2025: can’t/won’t explain training data sources
settlement implications
why anthropic settled
the timing reveals anthropic’s calculus:
- discovery disaster: 30(b)(6) witnesses couldn’t answer basic questions
- document production issues: late dumps and clawback disputes exposed
- adverse rulings imminent: special master likely to rule against them
- december 1 trial looming: risk of precedent-setting verdict
what the settlement means
for anthropic
- prevents precedent that could threaten their other cases
- potentially undermines openai, which may now have to explain what these depositions revealed
- protects secrets about actual training data practices
- pays for peace
for the ai industry
- first major settlement in ai training litigation
- establishes value for copyright claims
- encourages more lawsuits from content creators
- no legal precedent on fair use question
for authors
- compensation for class members (amount unknown)
- validation that their claims have merit
- faster resolution than years of litigation
- but: no injunction against future use
predictions and precedent
forward-looking predictions
based on this settlement and typical patterns in tech litigation:
near term (3-6 months)
- copycat lawsuits: the class certification precedent will trigger similar cases
- settlement amounts leak: financial press will likely uncover the sealed terms
- statutory damage fear: other ai companies will rush to settle after seeing anthropic fold
- licensing deals emerge: preventive agreements become standard practice
medium term (6-18 months)
- legislative action: congress intervenes to prevent “business-ending” scenarios
- damage cap proposals: tech lobby pushes for statutory damage reform
- technical adaptations: shift to synthetic data becomes existential necessity
- insurance crisis: liability insurance for ai training becomes prohibitively expensive
long term (2+ years)
- the great relicensing: ai companies retroactively license training data?
- market bifurcation:
- us models constrained by copyright liability
- international models trained on “everything”
- judicial precedent: supreme court finally addresses ai fair use
- new business models: authors become data vendors, not just content creators
of course, there is always discussion of the white house’s opinion, but the fact is that this has very little to do with them. no executive order has the power to undo the actions of courts in such clear statutory situations. there are practically zero regulations and very little agency guidance involved here.
the exaggeration factor
while predictions of “business-ending” damages grab headlines, anthropic’s quick settlement suggests the real calculus is more nuanced. the company likely paid far less than the theoretical maximum to make the case disappear - probably in the hundreds of millions, not billions. the threat of massive statutory damages serves as a powerful negotiating tool, but actual business destruction remains unlikely for well-funded ai companies willing to settle.
lessons for ai companies
discovery preparedness
the 30(b)(6) failures reveal critical governance gaps:
- document retention policies must be clear and followed
- corporate representatives need actual knowledge
- technical details can’t be hidden forever
- courts expect transparency about data sources
the settlement calculation
anthropic’s decision matrix likely included:
- trial risk: potential for massive statutory damages
- precedent risk: adverse ruling affecting entire industry
- discovery exposure: damaging internal communications
- competitive impact: distraction from product development
strategic positioning
the case demonstrates that ai companies must:
- document data provenance defensibly
- prepare for litigation from day one
- balance growth with legal risk
- consider preventive licensing
connection to broader ai developments
the china factor
deepseek’s $6 million training cost changes the equation:
- if models can be trained cheaply, copyright damages matter less
- but u.s. companies face stricter legal environment
- creates competitive disadvantage for rule-following companies
technical details revealed
despite redactions, technical insights emerged:
data pipeline architecture
- acquisition: bulk downloads from specific sources
- storage: distributed across multiple repositories
- processing: [redacted] transformation steps
- integration: merged into training pipeline
- retention: unclear policies on raw data
organizational knowledge
- engineers knew about copyright concerns
- leadership involved in dataset decisions
- deletion policies changed over time
- documentation exists but heavily protected
conclusion
if it receives final approval, the bartz settlement will be a big moment. even if it doesn’t, it will still stand as the moment the cards began to fall.
ultimately, bartz v. anthropic demonstrates that the era of “ask forgiveness, not permission” is finally ending. it’s time to actually do things like adults who live in a political system descended from the magna carta.
i can’t wait.
sources and references
primary documents
- documents 341-355: complete case file analysis
- openai timeline: industry context and parallels
key filings analyzed
- motion to approve class notice (doc 350)
- letter motion re discovery failures (doc 348, 353)
- settlement notices (doc 354, 355)
- expert evidence orders (doc 346)
- privilege disputes (doc 344, 345)