data sources and methodology
overview
this document describes the data collection methodology, source types, reliability tiers, validation processes, and quality standards used to build and maintain the comprehensive us data center infrastructure database.
database scope
- geographic coverage: all 50 us states
- total projects: 604 documented facilities
- total sources: 300+ primary sources
- collection period: january 2024 - october 2025
- ongoing updates: continuous monitoring and validation
data collection methodology
systematic search strategy
state-by-state coverage:
- structured web searches for each state using queries:
  - "{state} data center construction"
  - "{state} data center investment"
  - "hyperscale data center {state}"
  - "ai data center {state}"
  - "{major city} data center"
- government source review:
  - state economic development announcements
  - governor press releases
  - public utilities commission filings
  - tax incentive disclosures
  - environmental impact reports
- industry publication monitoring:
  - data center dynamics
  - data center frontier
  - data center knowledge
  - dc blox industry reports
  - mission critical magazine
- company disclosure review:
  - sec filings (10-k, 10-q, 8-k)
  - earnings call transcripts
  - investor presentations
  - press releases
  - corporate blogs
- local news monitoring:
  - regional business journals
  - local newspapers
  - construction trade publications
  - real estate development news
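The query templates above can be expanded mechanically for each state. The following is a minimal sketch, not the database's actual tooling: the function name `build_queries` is hypothetical, and the `{major city}` placeholder is renamed `{city}` so it works with Python string formatting.

```python
# Query templates copied from the search strategy above.
STATE_TEMPLATES = [
    "{state} data center construction",
    "{state} data center investment",
    "hyperscale data center {state}",
    "ai data center {state}",
]
CITY_TEMPLATE = "{city} data center"


def build_queries(state, major_cities=()):
    """Return the search queries for one state and its major cities."""
    queries = [template.format(state=state) for template in STATE_TEMPLATES]
    queries += [CITY_TEMPLATE.format(city=city) for city in major_cities]
    return queries


for query in build_queries("Texas", ["Dallas"]):
    print(query)
```

Running the same function over all 50 states yields a repeatable search checklist, which makes it easier to confirm that no state was skipped.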
validation process
multi-source verification (claim tiers rank claims by size and are separate from the source reliability tiers defined later):
- tier 1 claims (>$10b investment or >1 gw capacity): minimum 3 independent sources required
- tier 2 claims ($1b-$10b or 100-1,000 mw): minimum 2 independent sources required
- tier 3 claims (<$1b or <100 mw): minimum 1 credible source required
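The size thresholds above translate directly into a small lookup. This sketch assumes that when investment and capacity point to different tiers, the stricter tier applies; the source does not state that rule, and the function names are hypothetical.

```python
# Minimum number of independent sources per claim tier, as listed above.
MIN_SOURCES = {1: 3, 2: 2, 3: 1}


def claim_tier(investment_usd=None, capacity_mw=None):
    """Classify a claim by size; missing metrics are treated as zero."""
    investment = investment_usd or 0
    capacity = capacity_mw or 0
    # Tier 1 is strictly above $10B or 1 GW (1,000 MW).
    if investment > 10e9 or capacity > 1000:
        return 1
    # Tier 2 covers $1B-$10B or 100-1,000 MW, boundaries included.
    if investment >= 1e9 or capacity >= 100:
        return 2
    return 3


def has_enough_sources(investment_usd, capacity_mw, independent_sources):
    """Check a claim against the minimum source count for its tier."""
    tier = claim_tier(investment_usd, capacity_mw)
    return independent_sources >= MIN_SOURCES[tier]
```

For example, a $40B project backed by only two independent sources fails the check, while a $2B project with two sources passes.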
cross-reference validation:
- compare against commercial databases (dc byte, data center map)
- verify entity names against official registrations
- confirm location details via property records
- validate timeline against construction permits
conservative estimation:
- when a range is provided, use the lower bound
- when sources conflict, use the most credible source
- when unclear, mark fields as unknown rather than guess
- document all assumptions in notes field
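The conservative estimation rules can be sketched as follows. The field names (`capacity_mw`, `notes`) and the use of `None` for "unknown" are assumptions for illustration, not the database's published schema.

```python
def conservative_value(low=None, high=None):
    """Return the lower bound of a reported range, or None when nothing is reported."""
    reported = [value for value in (low, high) if value is not None]
    return min(reported) if reported else None


def apply_estimate(project, field, low=None, high=None):
    """Store the conservative value and document the assumption in the notes field."""
    value = conservative_value(low, high)
    project[field] = value  # None stands for "unknown" rather than a guess
    if low is not None and high is not None and low != high:
        note = f"{field}: reported range {low}-{high}; using lower bound {value}."
        project["notes"] = (project.get("notes", "") + " " + note).strip()
    return project


project = apply_estimate({}, "capacity_mw", low=300, high=500)
print(project["capacity_mw"], "|", project["notes"])
```

Keeping the note next to the value means a later reviewer can see that the stored figure is a floor, not a point estimate.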
source types and reliability
tier 1 sources (highest reliability)
| Source Type | Examples | Reliability |
| --- | --- | --- |
| SEC Filings | Form 10-K, 10-Q, 8-K, S-1, proxy statements | Legally mandated disclosures (annual financials audited), highest credibility |
| Official Company Press Releases | Corporate newsrooms, investor relations announcements | Official statements, legally reviewed |
| Government Announcements | Governor press releases, state economic development agencies | Official government records, politically verified |
| Earnings Call Transcripts | Quarterly earnings calls with Q&A | Executive statements under Regulation FD |
citation format:
{
"url": "https://www.sec.gov/Archives/edgar/data/789019/000095017024087843/msft-20240630.htm",
"title": "Microsoft Corporation Form 10-K for Fiscal Year Ended June 30, 2024",
"date": "2024-06-30",
"publisher": "U.S. Securities and Exchange Commission",
"type": "sec-filing"
}
tier 2 sources (high reliability)
| Source Type | Examples | Reliability |
| --- | --- | --- |
| Industry Publications | Data Center Dynamics (DCD), Data Center Frontier | Specialized journalists, industry expertise |
| Major Business News | Bloomberg, WSJ, Financial Times, Reuters | Professional journalism, editorial standards |
| Regional Business Journals | Bisnow, local business journals | Local expertise, development contacts |
| Utility Commission Filings | PUC rate case filings, load forecasts | Regulatory oversight, verified data |
citation format:
{
"url": "https://www.datacenterdynamics.com/en/news/article-title/",
"title": "Article Title",
"date": "2024-09-15",
"publisher": "Data Center Dynamics",
"type": "industry-publication"
}
tier 3 sources (moderate reliability)
| Source Type | Examples | Reliability |
| --- | --- | --- |
| General Tech News | TechCrunch, The Verge, Ars Technica | Tech journalism, varying depth |
| Local Newspapers | Regional daily newspapers | Local reporting, limited technical depth |
| Construction Trade Publications | ENR, Construction Dive | Construction focus, project-level detail |
| LinkedIn Posts | Executive announcements, company updates | Direct from source, informal |
tier 4 sources (supplementary only)
| Source Type | Examples | Use Case |
| --- | --- | --- |
| Wikipedia | Company pages, technology articles | Background only, verify facts independently |
| Commercial Databases | Data Center Map, DC Byte | Cross-reference, not primary source |
| Social Media | Twitter/X, company social accounts | Breaking news, requires verification |
| Blog Posts | Company blogs, industry commentary | Context and analysis, not data |
note: tier 4 sources must be supplemented with tier 1-3 sources for any factual claims
source documentation standards
required fields
all sources must include:
- url: full, permanent url to source
  - prefer permalink over homepage
  - use archive.org for unstable urls
  - verify url accessibility before inclusion
- title: exact article/document title
  - copy verbatim from source
  - include subtitle if significant
  - use title case
- publisher: official publication name
  - use consistent naming (e.g., “Data Center Dynamics” not “DCD”)
  - include parent organization if relevant
  - verify official publisher name
- type: source category from schema
  - use standardized enum values
  - select most specific type
  - default to “news” if unclear
optional but recommended
- date: publication date
  - format: yyyy-mm-dd preferred
  - critical for news sources
  - use article date, not access date
- author: article author
  - full name if available
  - multiple authors separated by commas
  - omit if institutional authorship
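A source record can be checked against these field rules before it is accepted. The sketch below is illustrative only: `validate_source` is a hypothetical helper, it does not test whether the url is reachable, and it does not know the schema's enum values for `type`.

```python
import re
from datetime import date
from urllib.parse import urlparse

REQUIRED_FIELDS = ("url", "title", "publisher", "type")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_source(source):
    """Return a list of problems found in one source record (empty list means valid)."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not source.get(field)]
    url = source.get("url", "")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url is not a full http(s) url")
    published = source.get("date")
    if published:
        try:
            # The pattern enforces yyyy-mm-dd; fromisoformat rejects impossible dates.
            if not DATE_PATTERN.fullmatch(published):
                raise ValueError(published)
            date.fromisoformat(published)
        except ValueError:
            problems.append("date is not yyyy-mm-dd")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer fix every issue in a record in a single pass.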
data quality guidelines
completeness standards
minimum viable project:
- project name (unique identifier)
- location (city and county)
- status (current state)
- at least one sponsor
- at least one operator
- at least one purpose
- at least one source
comprehensive project (target standard):
- all minimum fields plus:
- investment or power capacity
- announced date
- construction timeline
- sustainability info
- multiple sources
- detailed notes
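The two completeness standards can be expressed as a simple classifier. All field names here are hypothetical, since the source does not publish its schema; the sketch only mirrors the two checklists above.

```python
# Hypothetical field names standing in for the checklist items above.
MINIMUM_FIELDS = ("name", "location", "status", "sponsors", "operators", "purposes", "sources")
TARGET_FIELDS = ("announced_date", "construction_timeline", "sustainability", "notes")


def missing_minimum_fields(project):
    """List the minimum-viable fields that are absent or empty."""
    return [field for field in MINIMUM_FIELDS if not project.get(field)]


def completeness_level(project):
    """Return 'incomplete', 'minimum', or 'comprehensive' for one project record."""
    if missing_minimum_fields(project):
        return "incomplete"
    # The target standard asks for investment or power capacity, not both.
    has_size = any(project.get(key) is not None for key in ("investment_usd", "capacity_mw"))
    has_targets = all(project.get(field) for field in TARGET_FIELDS)
    multiple_sources = len(project["sources"]) > 1
    if has_size and has_targets and multiple_sources:
        return "comprehensive"
    return "minimum"
```

A record that returns `"incomplete"` would be held back, while `"minimum"` records are candidates for further research toward the target standard.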
accuracy verification
location validation:
- verify city/county spelling
- confirm geographic region
- validate against property records
- check for multiple campuses
size validation:
- distinguish between it load and total utility capacity
- verify square footage includes support space
- confirm investment includes all phases
- cross-check against comparable projects
timeline validation:
- verify announcement dates against press releases
- confirm construction start via permits
- validate completion via operational evidence
- note delays or changes in notes field
entity validation:
- use official legal entity names
- distinguish parent/subsidiary relationships
- verify operator vs tenant distinction
- confirm sponsor financial role
handling conflicting sources
when sources conflict:
- prefer tier 1 over tier 2-4: sec filings trump news articles
- prefer newer over older: more recent typically more accurate
- prefer specific over general: detailed reporting over summary
- prefer local over national: local sources often have better access
- document discrepancy: note conflict in notes field
example:
{
"notes": "Investment reported as $10B by Data Center Dynamics (Sept 2024) and $12B by local news (Oct 2024). Using conservative $10B estimate pending official confirmation."
}
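The first two preferences (reliability tier, then recency) are mechanical enough to sketch in code; the "specific over general" and "local over national" preferences need editorial judgment and are left out. The record fields and exact dates below are illustrative assumptions.

```python
from datetime import date


def pick_preferred(reports):
    """Pick the report from the most reliable tier; break ties with the newest date."""
    def sort_key(report):
        published = date.fromisoformat(report["date"])
        # Lower tier number wins first; a later date wins within the same tier.
        return (report["tier"], -published.toordinal())
    return min(reports, key=sort_key)


def conflict_note(reports, field):
    """Build a notes-field entry that documents the discrepancy and the choice made."""
    chosen = pick_preferred(reports)
    seen = "; ".join(f"{r['publisher']} ({r['date']}): {r[field]}" for r in reports)
    return f"Conflicting {field} values: {seen}. Using {chosen[field]} from {chosen['publisher']}."


reports = [
    {"publisher": "Data Center Dynamics", "tier": 2, "date": "2024-09-15", "investment_usd": 10_000_000_000},
    {"publisher": "local news", "tier": 3, "date": "2024-10-01", "investment_usd": 12_000_000_000},
]
print(conflict_note(reports, "investment_usd"))
```

In this illustration the tier 2 report wins over the newer tier 3 report, which matches the outcome of the notes example above.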
update frequency and procedures
continuous monitoring
daily monitoring (tier 1 sources):
- sec edgar filings
- major company press releases
- governor announcements
- utility commission filings (classified as tier 2, but monitored daily)
weekly monitoring (tier 2 sources):
- data center dynamics
- data center frontier
- bloomberg/wsj data center coverage
- regional business journals
monthly monitoring (tier 3 sources):
- general tech news
- construction publications
- linkedin company updates
- local newspapers
update triggers
immediate update required for:
- mega-project announcements (>$10b or >1gw)
- major entity partnerships
- project cancellations or delays
- significant size revisions
- status changes (announced → construction → operational)
monthly update cycle:
- routine project progression
- minor size adjustments
- source additions
- notes clarifications
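The update triggers can be sketched as a routing function. The event labels are hypothetical names for the triggers listed above, and routing every other change to the monthly cycle is an assumption.

```python
# Hypothetical event labels for the "immediate update" triggers.
IMMEDIATE_EVENTS = {
    "entity_partnership",
    "cancellation",
    "delay",
    "significant_size_revision",
    "status_change",
}


def update_priority(event, investment_usd=None, capacity_mw=None):
    """Return 'immediate' or 'monthly' for one change to a project."""
    # Mega-project thresholds: strictly above $10B or 1 GW (1,000 MW).
    is_mega = (investment_usd or 0) > 10e9 or (capacity_mw or 0) > 1000
    if event == "announcement" and is_mega:
        return "immediate"
    if event in IMMEDIATE_EVENTS:
        return "immediate"
    return "monthly"


print(update_priority("announcement", investment_usd=40e9))
print(update_priority("source_addition"))
```

What counts as a "significant" size revision is not defined in the text, so that decision stays with the person who labels the event.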
version control
lastUpdated field:
- state file level: lastUpdated: "2025-10-14"
- updated whenever any project in state changes
- iso 8601 date format required
entity dossiers:
- entity level: lastUpdated: "2025-10-15"
- updated when material information changes
- quarterly review minimum
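A minimal sketch of how the `lastUpdated` rules could be enforced. The helper names are hypothetical, and "quarterly" is approximated as 92 days, which is an assumption rather than a stated threshold.

```python
from datetime import date


def mark_state_updated(state_file, changed_on):
    """Stamp a state file with an ISO 8601 date whenever one of its projects changes."""
    state_file["lastUpdated"] = changed_on.isoformat()
    return state_file


def entity_review_due(entity, today, max_age_days=92):
    """True when an entity dossier has gone longer than roughly one quarter without an update."""
    last = date.fromisoformat(entity["lastUpdated"])
    return (today - last).days > max_age_days


state_file = mark_state_updated({"projects": []}, date(2025, 10, 14))
print(state_file["lastUpdated"])
print(entity_review_due({"lastUpdated": "2025-10-15"}, date(2026, 2, 1)))
```

Because ISO 8601 dates sort lexicographically, the same strings can also be compared directly when looking for the stalest records.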
specialized data categories
ai/ml projects
identification criteria:
- explicit ai/ml purpose in announcements
- gpu-focused infrastructure
- high rack density (>50 kw/rack)
- partnerships with ai companies
- purpose includes “ai-ml” tag
additional validation:
- verify gpu counts if disclosed
- confirm cooling infrastructure (liquid cooling)
- validate power density claims
- document ai workload types
nuclear partnerships
documentation requirements:
- partner entity (utility or smr vendor)
- capacity commitment in mw
- technology type (traditional/smr/microreactor)
- timeline to deployment
- tier 1 source required
validation steps:
- distinguish between mou and binding ppa
- verify regulatory pathway
- confirm timeline feasibility
- note technology readiness level
gigawatt-scale projects
enhanced validation:
- minimum 2 tier 1 or tier 2 sources (counted among the 3 independent sources required for claims over 1 gw)
- verify utility capacity availability
- confirm power sourcing strategy
- validate construction timeline feasibility
- document phasing plan
red flags requiring investigation:
- no identified utility partner
- unrealistic timeline (less than 18 months)
- unclear power source
- no construction permits
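The red flags and the source minimum for gigawatt-scale projects can be checked automatically before a manual investigation. The field names below are hypothetical, and a missing timeline is treated as "not yet known" rather than as a red flag.

```python
def red_flags(project):
    """Return the red flags that apply to a gigawatt-scale project record."""
    flags = []
    if not project.get("utility_partner"):
        flags.append("no identified utility partner")
    months = project.get("timeline_months")
    if months is not None and months < 18:
        flags.append("unrealistic timeline (less than 18 months)")
    if not project.get("power_source"):
        flags.append("unclear power source")
    if not project.get("construction_permits"):
        flags.append("no construction permits")
    return flags


def meets_source_minimum(source_tiers):
    """True when at least two of the sources are tier 1 or tier 2."""
    return sum(1 for tier in source_tiers if tier <= 2) >= 2


project = {"utility_partner": "Example Utility", "timeline_months": 12, "power_source": "grid"}
print(red_flags(project))
print(meets_source_minimum([1, 3, 3]))
```

A non-empty result does not invalidate a project; it only marks the record for the investigation described above.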
data limitations and caveats
known limitations
incomplete disclosure:
- many projects don’t disclose investment
- power capacity often not specified
- exact locations sometimes confidential
- tenant information typically private
timing challenges:
- announcements may precede permits
- construction schedules often delayed
- completion dates frequently revised
- cancellations may not be announced
definition ambiguity:
- “data center” broadly defined
- campus vs individual building unclear
- total vs it power capacity varies
- square footage gross vs net varies
appropriate use cases
database suitable for:
- market sizing and trend analysis
- competitive intelligence
- investment research
- policy analysis
- academic research
database not suitable for:
- real-time construction status
- precise operational timelines
- detailed technical specifications
- private tenant information
- investment returns analysis
source archive and preservation
url stability
archive.org integration:
- all urls archived via wayback machine
- archive date recorded for critical sources
- broken links replaced with archive urls
- permanent identifiers where available
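Replacing a broken link with an archive url can be sketched as below. The Wayback Machine serves captures at `https://web.archive.org/web/<timestamp>/<url>` and resolves a partial timestamp to the nearest capture; the `originalUrl` and `archiveDate` field names are assumptions, and the sketch does not confirm that a capture actually exists.

```python
from datetime import date


def wayback_url(original_url, archived_on):
    """Build a Wayback Machine url for the capture closest to the given date."""
    stamp = archived_on.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{stamp}/{original_url}"


def replace_broken_link(source, archived_on):
    """Return a copy of the source that points at the archive but keeps the original url."""
    updated = dict(source)
    updated["originalUrl"] = source["url"]  # hypothetical field name
    updated["archiveDate"] = archived_on.isoformat()  # hypothetical field name
    updated["url"] = wayback_url(source["url"], archived_on)
    return updated


source = {"url": "https://example.com/press/data-center", "title": "Example Title"}
print(replace_broken_link(source, date(2025, 10, 14))["url"])
```

Keeping the original url alongside the archive url preserves the provenance trail even after the live page disappears.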
preferred url formats:
- direct article urls (not homepage)
- doi links for academic sources
- sec edgar direct filing links
- press release permalink
source retention
minimum retention:
- original source url preserved
- publication date recorded
- publisher name standardized
- source type categorized
enhanced retention (tier 1 sources):
- pdf download for sec filings
- screenshot for critical claims
- transcript for earnings calls
- full text archive where permitted
citation best practices
citing this database
academic citation:
Bommarito, Michael J. (2025). US Data Center Infrastructure Database.
Retrieved from https://michaelbommarito.com/wiki/datacenters
[Last updated: October 16, 2025]
journalistic attribution:
According to the US Data Center Infrastructure Database compiled by
Michael J. Bommarito, there are 604 documented projects across all 50 states
representing $1.1+ trillion in disclosed investment.
data licensing:
- database provided for research and analysis
- attribution required for derivative works
- commercial use requires permission
- contact for licensing inquiries
citing individual projects
reference format:
Project Name, Location (Status). Investment/Capacity metrics.
Source: [Primary Source Title], [Publisher], [Date].
Database: Bommarito US Data Center Infrastructure Database.
example:
Stargate Project - Abilene Campus, Abilene, TX (Operational). $40B investment, 1.2 GW capacity.
Source: "Oracle to spend $40bn on Nvidia GPUs for OpenAI Texas data center,"
Data Center Dynamics, 2025.
Database: Bommarito US Data Center Infrastructure Database.
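The reference format can be generated from a project record so that citations stay consistent. This is a small sketch with a hypothetical function name and placeholder values; it reproduces only the three-line layout shown above.

```python
def format_project_citation(name, location, status, metrics, source_title, publisher, source_date):
    """Render a project citation in the three-line reference format."""
    return (
        f"{name}, {location} ({status}). {metrics}.\n"
        f'Source: "{source_title}," {publisher}, {source_date}.\n'
        "Database: Bommarito US Data Center Infrastructure Database."
    )


print(format_project_citation(
    name="Example Campus",
    location="Example City, TX",
    status="Announced",
    metrics="$2B investment, 300 MW capacity",
    source_title="Example Article Title",
    publisher="Data Center Dynamics",
    source_date="2025",
))
```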
quality assurance process
initial entry validation
- all required fields populated
- location verified via google maps
- entity names match official names
- dates in iso 8601 format
- numbers written out in full (e.g., 10000000000, not 10b)
- minimum sources met
- source urls accessible
- no obvious typos
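The "numbers in full format" check is easy to automate when sources quote abbreviated amounts. The sketch below expands common suffixes into full integers; the suffix list is an assumption and would need extending for other notations.

```python
import re
from decimal import Decimal

# Assumed abbreviations seen in announcements (e.g. "$10B", "$40bn").
MULTIPLIERS = {"k": 10**3, "m": 10**6, "b": 10**9, "bn": 10**9, "t": 10**12}
AMOUNT_PATTERN = re.compile(r"\$?\s*(\d+(?:\.\d+)?)\s*(k|m|bn|b|t)?")


def expand_amount(text):
    """Convert an abbreviated amount such as '$10B' into a full integer."""
    match = AMOUNT_PATTERN.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    number, suffix = match.groups()
    # Decimal avoids floating-point rounding on values like 1.1 trillion.
    return int(Decimal(number) * MULTIPLIERS.get(suffix, 1))


print(expand_amount("$10B"))
print(expand_amount("1.1T"))
```

Anything the pattern does not recognize raises an error instead of being guessed, in line with the rule to mark unclear fields as unknown.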
quarterly review checklist
- verify project status accuracy
- update timeline milestones
- check for new sources
- validate entity name consistency
- review size estimates
- update notes with new info
- archive deprecated urls
annual audit procedures
- comprehensive source link checking
- entity dossier completeness review
- geographic distribution analysis
- size estimate recalibration
- timeline accuracy assessment
- competitive landscape update
contact and contributions
reporting errors
found an error? please report:
- specific project or entity name
- field with incorrect information
- correct information with source
- contact via website
suggesting additions
suggest new projects by providing:
- project name and location
- sponsor/operator entities
- size metrics (investment/capacity)
- status and timeline
- minimum 1 tier 1-2 source
methodology feedback
suggestions for methodology improvements welcome, especially regarding:
- source reliability assessment
- validation procedures
- data quality metrics
- update frequency optimization
this database represents a best effort to document us data center infrastructure using publicly available sources. all data are subject to verification and correction as new information becomes available. last methodology review: october 2025.