data sources and methodology

published: October 16, 2025

overview

this document describes the data collection methodology, source types, reliability tiers, validation processes, and quality standards used to build and maintain the comprehensive us data center infrastructure database.

database scope

  • geographic coverage: all 50 us states
  • total projects: 604 documented facilities
  • total sources: 300+ primary sources
  • collection period: january 2024 - october 2025
  • ongoing updates: continuous monitoring and validation

data collection methodology

systematic search strategy

state-by-state coverage:

  1. structured web searches for each state using queries:

    • "{state} data center construction"
    • "{state} data center investment"
    • "hyperscale data center {state}"
    • "ai data center {state}"
    • "{major city} data center"
  2. government source review:

    • state economic development announcements
    • governor press releases
    • public utilities commission filings
    • tax incentive disclosures
    • environmental impact reports
  3. industry publication monitoring:

    • data center dynamics
    • data center frontier
    • data center knowledge
    • dc blox industry reports
    • mission critical magazine
  4. company disclosure review:

    • sec filings (10-k, 10-q, 8-k)
    • earnings call transcripts
    • investor presentations
    • press releases
    • corporate blogs
  5. local news monitoring:

    • regional business journals
    • local newspapers
    • construction trade publications
    • real estate development news
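
the query templates in step 1 can be expanded mechanically for each state. the following is a minimal sketch, not the tooling actually used for this database; `QUERY_TEMPLATES`, `CITY_TEMPLATE`, and `build_queries` are hypothetical names, and the state and city values are placeholders.

```python
# hypothetical helper: expands the query templates from step 1 for one state
QUERY_TEMPLATES = [
    "{state} data center construction",
    "{state} data center investment",
    "hyperscale data center {state}",
    "ai data center {state}",
]
CITY_TEMPLATE = "{city} data center"


def build_queries(state, major_cities=()):
    """Return the search queries for one state and its major cities."""
    queries = [template.format(state=state) for template in QUERY_TEMPLATES]
    queries += [CITY_TEMPLATE.format(city=city) for city in major_cities]
    return queries


if __name__ == "__main__":
    for query in build_queries("Texas", ["Abilene"]):
        print(query)
```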

validation process

multi-source verification:

  • tier 1 claims (>$10b investment or >1 gw capacity): minimum 3 independent sources required
  • tier 2 claims ($1-10b or 100-1000 mw): minimum 2 independent sources required
  • tier 3 claims (less than $1b or less than 100 mw): minimum 1 credible source required
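
the thresholds above can be expressed as a small lookup. this sketch rests on assumptions that the text does not state: a project that qualifies for different tiers by investment and by capacity gets the stricter tier, values exactly on a boundary ($10b, 1 gw) fall into tier 2, and a project with no disclosed size is treated as tier 3. the function names are illustrative.

```python
# minimum independent sources per claim tier, as listed above
MIN_SOURCES = {1: 3, 2: 2, 3: 1}


def claim_tier(investment_usd=None, capacity_mw=None):
    """Return the claim tier (1 = largest) for a project's disclosed size."""
    tiers = [3]  # assumption: undisclosed size is treated as tier 3
    if investment_usd is not None:
        if investment_usd > 10_000_000_000:
            tiers.append(1)
        elif investment_usd >= 1_000_000_000:
            tiers.append(2)
    if capacity_mw is not None:
        if capacity_mw > 1000:
            tiers.append(1)
        elif capacity_mw >= 100:
            tiers.append(2)
    # assumption: the stricter (lowest-numbered) tier wins
    return min(tiers)


def min_sources_required(investment_usd=None, capacity_mw=None):
    """Return how many independent sources the claim needs."""
    return MIN_SOURCES[claim_tier(investment_usd, capacity_mw)]
```

for example, a $500 million project with 150 mw of capacity would need two independent sources under these assumptions, because the capacity figure places it in tier 2.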

cross-reference validation:

  • compare against commercial databases (dc byte, dc map)
  • verify entity names against official registrations
  • confirm location details via property records
  • validate timeline against construction permits

conservative estimation:

  • when a source gives a range, use the lower bound
  • when sources conflict, use the most credible source
  • when a value is unclear, mark the field as unknown rather than guess
  • document all assumptions in notes field
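
a minimal sketch of the lower-bound and unknown-value rules, assuming a reported figure arrives as `None`, a single number, or a `(low, high)` pair. the function name and the wording of the generated note are hypothetical.

```python
def conservative_estimate(reported, unit=""):
    """Return (value, note) following the conservative estimation rules.

    reported: None (not disclosed), a single number, or a (low, high) pair.
    """
    if reported is None:
        # unknown stays unknown rather than being guessed
        return None, "value not disclosed; field left unknown"
    if isinstance(reported, (tuple, list)):
        low, high = min(reported), max(reported)
        # the assumption is recorded so it can go into the notes field
        return low, f"source gave range {low}-{high}{unit}; using lower bound"
    return reported, ""
```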

source types and reliability

tier 1 sources (highest reliability)

Source Type | Examples | Reliability
SEC Filings | Form 10-K, 10-Q, 8-K, S-1, proxy statements | Legally binding, audited, highest credibility
Official Company Press Releases | Corporate newsrooms, investor relations announcements | Official statements, legally reviewed
Government Announcements | Governor press releases, state economic development agencies | Official government records, politically verified
Earnings Call Transcripts | Quarterly earnings calls with Q&A | Executive statements under Regulation FD

citation format:

{
  "url": "https://www.sec.gov/Archives/edgar/data/789019/000095017024087843/msft-20240630.htm",
  "title": "Microsoft Corporation Form 10-K for Fiscal Year Ended June 30, 2024",
  "date": "2024-07-30",
  "publisher": "U.S. Securities and Exchange Commission",
  "type": "sec-filing"
}

tier 2 sources (high reliability)

Source Type | Examples | Reliability
Industry Publications | Data Center Dynamics, Data Center Frontier | Specialized journalists, industry expertise
Major Business News | Bloomberg, WSJ, Financial Times, Reuters | Professional journalism, editorial standards
Regional Business Journals | Bisnow, local business journals | Local expertise, development contacts
Utility Commission Filings | PUC rate case filings, load forecasts | Regulatory oversight, verified data

citation format:

{
  "url": "https://www.datacenterdynamics.com/en/news/article-title/",
  "title": "Article Title",
  "date": "2024-09-15",
  "publisher": "Data Center Dynamics",
  "type": "industry-publication"
}

tier 3 sources (moderate reliability)

Source Type | Examples | Reliability
General Tech News | TechCrunch, The Verge, Ars Technica | Tech journalism, varying depth
Local Newspapers | Regional daily newspapers | Local reporting, limited technical depth
Construction Trade Publications | ENR, Construction Dive | Construction focus, project-level detail
LinkedIn Posts | Executive announcements, company updates | Direct from source, informal

tier 4 sources (supplementary only)

Source Type | Examples | Use Case
Wikipedia | Company pages, technology articles | Background only, verify facts independently
Commercial Databases | Data Center Map, DC Byte | Cross-reference, not primary source
Social Media | Twitter/X, company social accounts | Breaking news, requires verification
Blog Posts | Company blogs, industry commentary | Context and analysis, not data

note: tier 4 sources must be supplemented with tier 1-3 sources for any factual claims

source documentation standards

required fields

all sources must include:

  1. url: full, permanent url to source

    • prefer permalink over homepage
    • use archive.org for unstable urls
    • verify url accessibility before inclusion
  2. title: exact article/document title

    • copy verbatim from source
    • include subtitle if significant
    • use title case
  3. publisher: official publication name

    • use consistent naming (e.g., “Data Center Dynamics” not “DCD”)
    • include parent organization if relevant
    • verify official publisher name
  4. type: source category from schema

    • use standardized enum values
    • select most specific type
    • default to “news” if unclear
  1. date: publication date

    • format: yyyy-mm-dd preferred
    • critical for news sources
    • use article date, not access date
  2. author: article author

    • full name if available
    • multiple authors separated by commas
    • omit if institutional authorship
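
the field rules above lend themselves to an automated check. the sketch below is one possible validator, not the schema validation actually used here. it assumes a source record is a plain dictionary shaped like the citation examples, treats `date` as optional but checks its format when present, and does not verify that the url is reachable. `check_source` is a hypothetical name.

```python
from datetime import datetime
from urllib.parse import urlparse

REQUIRED_FIELDS = ("url", "title", "publisher", "type")


def check_source(source):
    """Return a list of problems found in one source record (empty if none)."""
    record = dict(source)
    record.setdefault("type", "news")  # fallback type named in the guidelines
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing {field}")
    url = str(record.get("url") or "")
    parsed = urlparse(url)
    if url and (parsed.scheme not in ("http", "https") or not parsed.netloc):
        problems.append("url is not a full http(s) url")
    if "date" in record:
        try:
            datetime.strptime(record["date"], "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append("date is not yyyy-mm-dd")
    return problems
```

a record copied from the tier 2 citation example passes with no problems, while a bare domain or a date such as `09/15/2024` is reported.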

data quality guidelines

completeness standards

minimum viable project:

  • project name (unique identifier)
  • location (city and county)
  • status (current state)
  • at least one sponsor
  • at least one operator
  • at least one purpose
  • at least one source
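
a completeness check for the minimum viable project could look like the sketch below. the field names (`name`, `city`, `county`, `status`, `sponsors`, `operators`, `purposes`, `sources`) are assumptions made for illustration and may not match the database schema.

```python
# hypothetical field names; the real schema may differ
MINIMUM_SCALARS = ("name", "city", "county", "status")
MINIMUM_LISTS = ("sponsors", "operators", "purposes", "sources")


def missing_minimum_fields(project):
    """Return the minimum-viable-project fields that are absent or empty."""
    missing = [field for field in MINIMUM_SCALARS if not project.get(field)]
    # list-valued fields need at least one entry
    missing += [field for field in MINIMUM_LISTS if not project.get(field)]
    return missing
```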

comprehensive project (target standard):

  • all minimum fields plus:
  • investment or power capacity
  • announced date
  • construction timeline
  • sustainability info
  • multiple sources
  • detailed notes

accuracy verification

location validation:

  • verify city/county spelling
  • confirm geographic region
  • validate against property records
  • check for multiple campuses

size validation:

  • distinguish between it load and total utility capacity
  • verify square footage includes support space
  • confirm investment includes all phases
  • cross-check against comparable projects

timeline validation:

  • verify announcement dates against press releases
  • confirm construction start via permits
  • validate completion via operational evidence
  • note delays or changes in notes field

entity validation:

  • use official legal entity names
  • distinguish parent/subsidiary relationships
  • verify operator vs tenant distinction
  • confirm sponsor financial role

handling conflicting sources

when sources conflict:

  1. prefer tier 1 over tier 2-4: sec filings trump news articles
  2. prefer newer over older: more recent typically more accurate
  3. prefer specific over general: detailed reporting over summary
  4. prefer local over national: local sources often have better access
  5. document discrepancy: note conflict in notes field

example:

{
  "notes": "Investment reported as $10B by DCD (Sept 2024) and $12B by local news (Oct 2024). Using conservative $10B estimate pending official confirmation."
}
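
the first two preference rules can be encoded as a sort key. this is a sketch under stated assumptions: each report is a dictionary with hypothetical keys `tier` (1-4), `date` (a `datetime.date`), and `value`; rules 3 and 4 (specific over general, local over national) call for editorial judgment and are not modeled; the discrepancy still has to be written to the notes field by hand.

```python
from datetime import date


def preferred_report(reports):
    """Pick the report from the most reliable tier, breaking ties by recency."""
    # lower tier number wins first; within a tier, the newer date wins
    return min(reports, key=lambda r: (r["tier"], -r["date"].toordinal()))


if __name__ == "__main__":
    reports = [
        {"publisher": "Data Center Dynamics", "tier": 2,
         "date": date(2024, 9, 15), "value": 10_000_000_000},
        {"publisher": "local news", "tier": 3,
         "date": date(2024, 10, 1), "value": 12_000_000_000},
    ]
    print(preferred_report(reports)["publisher"])
```

with the two reports from the example note, the tier 2 report is selected even though the tier 3 report is newer, which matches the $10b figure used above.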

update frequency and procedures

continuous monitoring

daily monitoring (tier 1 sources):

  • sec edgar filings
  • major company press releases
  • governor announcements
  • utility commission filings

weekly monitoring (tier 2 sources):

  • data center dynamics
  • data center frontier
  • bloomberg/wsj data center coverage
  • regional business journals

monthly monitoring (tier 3 sources):

  • general tech news
  • construction publications
  • linkedin company updates
  • local newspapers

update triggers

immediate update required for:

  • mega-project announcements (>$10b or >1 gw)
  • major entity partnerships
  • project cancellations or delays
  • significant size revisions
  • status changes (announced → construction → operational)

monthly update cycle:

  • routine project progression
  • minor size adjustments
  • source additions
  • notes clarifications

version control

lastUpdated field:

  • state file level: lastUpdated: "2025-10-14"
  • updated whenever any project in state changes
  • iso 8601 date format required
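
a minimal sketch of stamping a state file after a project change, assuming the state file is loaded as a dictionary with a top-level `lastUpdated` key. `mark_state_updated` is a hypothetical helper; `date.isoformat()` produces the `yyyy-mm-dd` form of iso 8601.

```python
from datetime import date


def mark_state_updated(state_file, changed_on=None):
    """Set lastUpdated on a state file dict to an ISO 8601 date string."""
    changed_on = changed_on or date.today()
    state_file["lastUpdated"] = changed_on.isoformat()
    return state_file
```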

entity dossiers:

  • entity level: lastUpdated: "2025-10-15"
  • updated when material information changes
  • quarterly review minimum

specialized data categories

ai/ml projects

identification criteria:

  • explicit ai/ml purpose in announcements
  • gpu-focused infrastructure
  • high rack density (>50 kw/rack)
  • partnerships with ai companies
  • purpose includes “ai-ml” tag

additional validation:

  • verify gpu counts if disclosed
  • confirm cooling infrastructure (liquid cooling)
  • validate power density claims
  • document ai workload types

nuclear partnerships

documentation requirements:

  • partner entity (utility or smr vendor)
  • capacity commitment in mw
  • technology type (traditional/smr/microreactor)
  • timeline to deployment
  • tier 1 source required

validation steps:

  • distinguish between mou and binding ppa
  • verify regulatory pathway
  • confirm timeline feasibility
  • note technology readiness level

gigawatt-scale projects

enhanced validation:

  • minimum 2 tier 1 or tier 2 sources
  • verify utility capacity availability
  • confirm power sourcing strategy
  • validate construction timeline feasibility
  • document phasing plan

red flags requiring investigation:

  • no identified utility partner
  • unrealistic timeline (less than 18 months)
  • unclear power source
  • no construction permits
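
the red flags can be screened for automatically before manual investigation. the sketch below uses hypothetical field names (`utility_partner`, `power_source`, `permits`, `construction_start`, `target_completion`) and assumes the 18-month threshold is measured from construction start to target completion, which the list above does not specify.

```python
from datetime import date


def red_flags(project):
    """Return the red flags that apply to a gigawatt-scale project record."""
    flags = []
    if not project.get("utility_partner"):
        flags.append("no identified utility partner")
    start = project.get("construction_start")
    end = project.get("target_completion")
    if start and end:
        # whole calendar months between the two dates (assumed measure)
        months = (end.year - start.year) * 12 + (end.month - start.month)
        if months < 18:
            flags.append("unrealistic timeline (less than 18 months)")
    if not project.get("power_source"):
        flags.append("unclear power source")
    if not project.get("permits"):
        flags.append("no construction permits")
    return flags
```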

data limitations and caveats

known limitations

incomplete disclosure:

  • many projects don’t disclose investment
  • power capacity often not specified
  • exact locations sometimes confidential
  • tenant information typically private

timing challenges:

  • announcements may precede permits
  • construction schedules often delayed
  • completion dates frequently revised
  • cancellations may not be announced

definition ambiguity:

  • “data center” broadly defined
  • campus vs individual building unclear
  • total vs it power capacity varies
  • square footage gross vs net varies

appropriate use cases

database suitable for:

  • market sizing and trend analysis
  • competitive intelligence
  • investment research
  • policy analysis
  • academic research

database not suitable for:

  • real-time construction status
  • precise operational timelines
  • detailed technical specifications
  • private tenant information
  • investment returns analysis

source archive and preservation

url stability

archive.org integration:

  • all urls archived via wayback machine
  • archive date recorded for critical sources
  • broken links replaced with archive urls
  • permanent identifiers where available
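
wayback machine snapshot urls follow the pattern `https://web.archive.org/web/<timestamp>/<original url>`. the sketch below builds such a url from a recorded archive date; it does not check that a capture exists for that date, and `wayback_url` is a hypothetical helper name.

```python
from datetime import date


def wayback_url(original_url, archived_on):
    """Build a Wayback Machine snapshot url for a given archive date."""
    # timestamp is yyyymmdd; no check is made that a capture exists
    return f"https://web.archive.org/web/{archived_on:%Y%m%d}/{original_url}"


if __name__ == "__main__":
    print(wayback_url(
        "https://www.datacenterdynamics.com/en/news/article-title/",
        date(2024, 9, 15),
    ))
```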

preferred url formats:

  • direct article urls (not homepage)
  • doi links for academic sources
  • sec edgar direct filing links
  • press release permalink

source retention

minimum retention:

  • original source url preserved
  • publication date recorded
  • publisher name standardized
  • source type categorized

enhanced retention (tier 1 sources):

  • pdf download for sec filings
  • screenshot for critical claims
  • transcript for earnings calls
  • full text archive where permitted

citation best practices

citing this database

academic citation:

Bommarito, Michael J. (2025). US Data Center Infrastructure Database.
Retrieved from https://michaelbommarito.com/wiki/datacenters
[Last updated: October 16, 2025]

journalistic attribution:

According to the US Data Center Infrastructure Database compiled by
Michael J. Bommarito, there are 604 documented projects across all 50 states
representing $1.1+ trillion in disclosed investment.

data licensing:

  • database provided for research and analysis
  • attribution required for derivative works
  • commercial use requires permission
  • contact for licensing inquiries

citing individual projects

reference format:

Project Name, Location (Status). Investment/Capacity metrics.
Source: [Primary Source Title], [Publisher], [Date].
Database: Bommarito US Data Center Infrastructure Database.

example:

Stargate Project - Abilene Campus, Abilene, TX (Operational). $40B investment, 1.2 GW capacity.
Source: "Oracle to spend $40bn on Nvidia GPUs for OpenAI Texas data center,"
Data Center Dynamics, 2024.
Database: Bommarito US Data Center Infrastructure Database.
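
for anyone generating these references in bulk, the format can be produced with a small helper. this is an illustrative sketch; the function and parameter names are hypothetical, and the source line is emitted unwrapped rather than broken across two lines as in the example.

```python
def format_project_reference(name, location, status, metrics,
                             source_title, publisher, source_date):
    """Render a project citation in the reference format shown above."""
    return "\n".join([
        f"{name}, {location} ({status}). {metrics}.",
        f'Source: "{source_title}," {publisher}, {source_date}.',
        "Database: Bommarito US Data Center Infrastructure Database.",
    ])


if __name__ == "__main__":
    print(format_project_reference(
        "Stargate Project - Abilene Campus", "Abilene, TX", "Operational",
        "$40B investment, 1.2 GW capacity",
        "Oracle to spend $40bn on Nvidia GPUs for OpenAI Texas data center",
        "Data Center Dynamics", "2024",
    ))
```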

quality assurance process

initial entry validation

  • all required fields populated
  • location verified via google maps
  • entity names match official names
  • dates in iso 8601 format
  • numbers in full format (not abbreviated)
  • minimum sources met
  • source urls accessible
  • no obvious typos
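
the "numbers in full format" check often means expanding abbreviated figures found in sources. the sketch below is one way to do that, assuming the suffixes k, m, b, bn, and t mean thousand, million, billion, billion, and trillion. `expand_amount` is a hypothetical helper; it rejects unit suffixes such as mw rather than guessing.

```python
import re
from decimal import Decimal

# assumed meaning of common suffixes in source articles
_MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9, "BN": 10**9, "T": 10**12}


def expand_amount(text):
    """Convert an abbreviated figure such as "$10B" to a full integer."""
    match = re.fullmatch(r"\$?\s*([\d,]*\.?\d+)\s*([A-Za-z]*)", text.strip())
    if not match:
        raise ValueError(f"unrecognized amount: {text!r}")
    number = Decimal(match.group(1).replace(",", ""))
    suffix = match.group(2).upper()
    if suffix and suffix not in _MULTIPLIERS:
        raise ValueError(f"unrecognized suffix in: {text!r}")
    # Decimal avoids float rounding on values like 1.1 trillion
    return int(number * _MULTIPLIERS.get(suffix, 1))
```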

quarterly review checklist

  • verify project status accuracy
  • update timeline milestones
  • check for new sources
  • validate entity name consistency
  • review size estimates
  • update notes with new info
  • archive deprecated urls

annual audit procedures

  • comprehensive source link checking
  • entity dossier completeness review
  • geographic distribution analysis
  • size estimate recalibration
  • timeline accuracy assessment
  • competitive landscape update

contact and contributions

reporting errors

found an error? please report:

  • specific project or entity name
  • field with incorrect information
  • correct information with source
  • contact via website

suggesting additions

suggest new projects by providing:

  • project name and location
  • sponsor/operator entities
  • size metrics (investment/capacity)
  • status and timeline
  • minimum 1 tier 1-2 source

methodology feedback

suggestions for methodology improvements welcome, especially regarding:

  • source reliability assessment
  • validation procedures
  • data quality metrics
  • update frequency optimization

this database represents a best effort to document us data center infrastructure using publicly available sources. all data are subject to verification and correction as new information becomes available. last methodology review: october 2025.
