china ai hardware decoupling notes

current observations (august 2025)

  • the format: ue8m0 emerges as the key technical differentiator - an 8-bit, exponent-only (zero-mantissa) float format used to scale fp8 values for ai inference

  • recent catalyst: lutnick’s july 2025 “addiction” comment accelerated existing regulatory shifts against nvidia

  • hardware ecosystem: moore threads (ex-nvidia china leadership) positioned as domestic ue8m0 partner after years of development

  • manufacturing reality: smic’s 7nm constraint through 2026 drives efficiency innovations like ue8m0

  • strategic split: training may still require nvidia hardware, but ue8m0 chips enable large-scale inference with existing or “borrowed” model weights

  • market response: nvidia’s august 2025 ue8m0 support suggests acceptance of parallel ecosystems

note: this is an evolving collection of observations on china-us ai hardware decoupling that began accelerating in 2019-2020. this page documents recent developments in a multi-year strategic divergence, with updates added as new information becomes available.

recent developments (july-august 2025)

the lutnick comment in july 2025 represents a continuation of tensions that have been building since the october 2022 export controls and earlier trade restrictions dating to 2019.

on july 15, 2025, u.s. commerce secretary howard lutnick stated: “you want to sell the chinese enough that their developers get addicted to the american technology stack.” [1]

this comment catalyzed existing chinese regulatory momentum [1]:

  • july 22: cyberspace administration issues guidance to halt h20 purchases [1]
  • july 31: cac summons nvidia executives over “serious security issues” [1]
  • august: ndrc requests tech groups refrain from nvidia chip purchases [1]

these actions build on years of preparation for technological independence, including substantial investments in domestic semiconductor capabilities beginning in 2020.

technical divergence strategy

format differentiation

the ue8m0 data format represents the latest phase in a multi-year effort to develop alternative technical standards. deepseek’s explicit statement that ue8m0 fp8 scale is “designed for the upcoming next-generation domestically produced chips” [6] reflects years of coordinated development between chinese ai companies and hardware manufacturers.

ue8m0 technical details

  • 8-bit exponent, 0 mantissa design differs from standard fp8 [3] (sketched below)

  • optimized for inference over training workloads [2]

  • reduces memory bandwidth by up to 75% [7]

  • nvidia added ue8m0 support in ptx isa 9.0 (august 2025) [3]
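
to make the format concrete, here is a minimal python sketch of an exponent-only scale encode/decode. it assumes the ocp microscaling (mx) style convention of an unsigned 8-bit exponent with bias 127; the exact bias and special-value handling used by the chips referenced above are not specified in the sources cited here.

```python
import math

BIAS = 127  # assumed bias, following the ocp microscaling (mx) e8m0 convention

def encode_ue8m0(scale: float) -> int:
    """quantize a positive scale factor to the nearest power of two,
    stored as an 8-bit unsigned exponent (0..254; 0xff is often reserved
    for nan in mx-style specs - an assumption here)."""
    if scale <= 0:
        raise ValueError("ue8m0 encodes positive scales only")
    exp = round(math.log2(scale))         # nearest power of two in log space
    return max(0, min(254, exp + BIAS))   # clamp to the representable range

def decode_ue8m0(byte: int) -> float:
    """recover the power-of-two scale from the stored exponent byte."""
    return 2.0 ** (byte - BIAS)

# no mantissa bits means decoding is a single shift / table lookup,
# which is the hardware simplification noted in the list above.
s = 0.0478
b = encode_ue8m0(s)
print(b, decode_ue8m0(b))  # stored byte and the nearest power-of-two scale

# storage comparison: an 8-bit scale vs. a 32-bit float scale is a 75%
# reduction per scale value - one illustrative reading of the bandwidth claim.
```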

parallel software ecosystems

the development of alternative frameworks has been ongoing since at least 2020:

  • moore threads musa (2021-present): cuda-compatible platform with musify migration tool [8] (sketched below)
  • huawei cann (2019-present): proprietary framework for ascend chips, accelerated after entity list addition
  • deepseek deepep (2024-present): hardware-specific optimizations showing ue8m0 regression issues on non-gb200 hardware [9]
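
purely as an illustration of the migration-tool idea, a toy source-rewriting pass is sketched below; the prefix rules and example symbols are hypothetical stand-ins, not musify’s actual rule set or the musa api surface.

```python
import re

# hypothetical prefix map; a real migration tool ships its own rule set
PREFIX_RULES = {
    r"\bcuda([A-Z]\w*)": r"musa\1",     # runtime-style calls, e.g. a hypothetical cudaMalloc -> musaMalloc
    r"\bcu([A-Z]\w*)":   r"mu\1",       # driver-style symbols
    r"cuda_runtime\.h":  "musa_runtime.h",
}

def migrate_source(src: str) -> str:
    """apply simple textual rewrites to move cuda-style code onto an
    alternative, source-compatible runtime. real tools also handle kernel
    launch syntax, build flags, and reporting of unsupported apis."""
    for pattern, repl in PREFIX_RULES.items():
        src = re.sub(pattern, repl, src)
    return src

example = "#include <cuda_runtime.h>\nvoid* p; cudaMalloc(&p, 1024);"
print(migrate_source(example))
```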

model-hardware co-evolution

deepseek v3.1 (released august 21, 2025), trained with the ue8m0 format, represents the culmination of multi-year collaboration [6]. the 840 billion additional training tokens and format-specific optimization create technical lock-in effects that reinforce ecosystem separation.

critically, while model training may still benefit from or require nvidia hardware for optimal performance, the ue8m0-optimized chips enable china to deploy large-scale inference infrastructure using model weights developed domestically or obtained through other channels. this decouples inference capability from training dependency.

key players

company | focus | ecosystem | key product
moore threads | consumer/research | musa (cuda-compatible) | mtt s4000 [10]
huawei | enterprise/government | cann (proprietary) | ascend 910c
biren technology | datacenter | traditional gpu | br100/104
cambricon | inference | specialized | mlu series

structural constraints

manufacturing limitations (2020-present)

china’s fabrication constraints have shaped strategy since the 2020 entity list additions:

  • smic limited to 7nm through at least 2026 due to euv equipment restrictions imposed in 2019 [4]
  • yields initially below 30% in 2023, improving to 40%+ by 2025 using double-patterning duv [11]
  • 5nm process developed in 2024 but with yields below commercial viability [11]
  • 3nm development ongoing, targeting 2026 tape-out without euv access [11]

these persistent limitations drove the strategic decision to optimize for efficiency (ue8m0) rather than pursue performance parity through advanced nodes.
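
one way to see why a constrained node pushes toward smaller, efficiency-first designs is the textbook poisson die-yield model sketched below; the defect density and die areas are made-up illustrative numbers, not smic data.

```python
import math

def poisson_yield(defect_density_per_cm2: float, die_area_cm2: float) -> float:
    """classic poisson approximation: yield = exp(-d0 * area)."""
    return math.exp(-defect_density_per_cm2 * die_area_cm2)

# illustrative numbers only: a reticle-sized accelerator die vs. a leaner,
# inference-oriented die on the same immature process.
d0 = 0.5          # defects per cm^2 (assumed)
big_die = 6.0     # cm^2 (assumed)
small_die = 2.5   # cm^2 (assumed)

print(f"big die yield:   {poisson_yield(d0, big_die):.0%}")
print(f"small die yield: {poisson_yield(d0, small_die):.0%}")
# smaller, simpler dies (no mantissa datapaths, lower bandwidth needs)
# recover yield on a constrained node - the tradeoff described above.
```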

key timeline (selected events)

date | event | context
may 2019 | huawei entity list addition | catalyst for domestic chip development
oct 2020 | moore threads founded | ex-nvidia china gm starts gpu company [10]
oct 2022 | us export controls on advanced chips | restricts nvidia a100/h100 to china
oct 2023 | moore threads entity list | blocks access to tsmc, design tools [10]
dec 2023 | mtt s4000 launch | notably lacks fp8 support [10]
feb 2025 | moore threads-deepseek partnership | hardware-software alignment [10]
jul 2025 | lutnick comments | accelerates existing tensions [1]
aug 2025 | deepseek v3.1 with ue8m0 | format designed for domestic chips [6]

market implications

global ai chip market projections

amd ceo lisa su expects the ai processor market to exceed $500 billion by 2028 [12]. the asia-pacific region led with a 33% market share in 2023, with china as a key driver [13].

china market impact

expected bifurcation by 2028:

  • ai chips in datacenters projected at $33 billion globally by 2028 [12]
  • asia-pacific expected to post the highest growth rate during the forecast period [13]
  • china added more chip capacity than the rest of the world combined in 2024 [11]

nvidia faces a strategic dilemma:

  1. support ue8m0 (validates china’s strategy)
  2. ignore ue8m0 (loses china market access)
  3. create compatibility bridges (undermines u.s. policy)

the addition of ue8m0 to ptx isa 9.0 suggests nvidia chose option 1 [3].

observations and analysis

technical tradeoffs (current state)

the ue8m0 approach reflects years of navigating constraints:

  • error tolerance: 7e-4 (ue8m0) vs 1e-5 (standard fp8) [9] (see the sketch after this list)
  • memory reduction: up to 75% [7]
  • simplified hardware: no mantissa circuits [3]
  • inference focus: targeting 90% of future ai workloads
  • training-inference split: accepts continued nvidia dependency for training while achieving inference independence
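
the direction of that error-tolerance gap can be reproduced with a small sketch: quantizing a block of values with a power-of-two (ue8m0-style) shared scale versus an unrestricted float scale. the block size, value distribution, and integer quantizer below are illustrative assumptions; the 7e-4 / 1e-5 figures themselves come from the deepep test suite [9].

```python
import math
import random

def block_quant_error(values, pow2_scale: bool, levels: int = 127) -> float:
    """quantize a block with a shared symmetric scale and return the worst
    absolute error normalized by the block maximum. with pow2_scale=True the
    scale is rounded up to a power of two, mimicking an exponent-only
    (ue8m0-style) scale; otherwise the scale is an unrestricted float."""
    amax = max(abs(v) for v in values)
    scale = amax / levels
    if pow2_scale:
        scale = 2.0 ** math.ceil(math.log2(scale))  # exponent-only scale
    deq = [round(v / scale) * scale for v in values]
    return max(abs(a - b) for a, b in zip(values, deq)) / amax

random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(128)]  # assumed block size / distribution

print("float scale worst err:", block_quant_error(block, pow2_scale=False))
print("pow-2 scale worst err:", block_quant_error(block, pow2_scale=True))
# the exponent-only scale is coarser, so per-block error grows: hardware
# simplicity and bandwidth savings are traded against numeric headroom.
```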

indicators to track

ongoing developments to monitor:

  • moore threads ipo prospectus (filed november 2024, expected q4 2025) [10]
  • smic yield improvements and 5nm/3nm progress [11]
  • deepseek model performance on ue8m0 vs standard hardware [9]
  • patent filings mentioning “8-bit exponent” or “microscaling”
  • ieee p3109 working group standards proposals
  • additional domestic chip announcements supporting ue8m0

evolving dynamics

the ue8m0 format and associated ecosystem represent one visible outcome of multi-year strategic decisions on both sides. what began as trade tensions in 2019 has evolved into technical divergence, with the august 2025 developments marking a new phase rather than an isolated event.

china’s approach - architectural divergence through format incompatibility - reflects constraints imposed since 2019 and investments made in response. the strategy optimizes for specific realities: persistent fabrication limitations [4], a large domestic market, and independence imperatives reinforced by successive policy actions.

the strategic insight is the decoupling of training from inference: while cutting-edge model training may continue to benefit from nvidia’s superior hardware, the ue8m0 ecosystem enables china to deploy these models at scale for inference. this creates a sustainable path where model weights - whether developed domestically on nvidia hardware, trained through international collaborations, or obtained through other means - can be efficiently deployed on domestic infrastructure.

nvidia’s addition of ue8m0 support in ptx isa 9.0 [3] suggests recognition that parallel ecosystems may be the new equilibrium, rather than a temporary divergence.

future updates: this page will be updated as new information becomes available about technical developments, policy changes, and market evolution in the china-us ai hardware landscape.

references

[1] financial times. (2025, august 20). china turns against nvidia’s ai chip after ‘insulting’ howard lutnick remarks.

[2] deepseek ai. (2025, august 21). deepseek-v3.1 model card. hugging face.

[3] nvidia. (2025, august 1). parallel thread execution isa version 9.0.

[4] wccftech. (2024). smic to limit huawei to 7nm chips until 2026.

[5] asia times. (2024, november). tsmc’s 7nm chip ban targets china’s ai chipmakers.

[6] investing.com. (2025, august 21). china’s deepseek upgrades ai model to support domestic chips.

[7] autogpt. (2025, august 21). deepseek launches new model with domestic chips.

[8] tom’s hardware. (2024). china’s moore threads polishes homegrown cuda alternative.

[9] deepseek ai. (2025). ue8m0(pr206) features cause severe regression issue. github issue #240.

[10] technode. (2024, november 15). chinese gpu unicorn moore threads files for ipo in china.

[11] granitefirm. (2025, march 8). how is smic after us embargo?

[12] bloomberg. (2025, june 12). amd ceo sees ai processor market exceeding $500 billion by 2028.

[13] globenewswire. (2024, october 28). ai chip market expected to reach usd 621.15 billion by 2032.
