china ai hardware decoupling notes
current observations (august 2025)
the format: ue8m0 emerges as key technical differentiator - an unsigned 8-bit, exponent-only format (no sign bit, zero mantissa bits) used as a power-of-two scale for fp8 data in ai inference
recent catalyst: lutnick’s july 2025 “addiction” comment accelerated existing regulatory shifts against nvidia
hardware ecosystem: moore threads (ex-nvidia china leadership) positioned as domestic ue8m0 partner after years of development
manufacturing reality: smic’s 7nm constraint through 2026 drives efficiency innovations like ue8m0
strategic split: training may still require nvidia hardware, but ue8m0 chips enable large-scale inference with existing or “borrowed” model weights
market response: nvidia’s august 2025 ue8m0 support suggests acceptance of parallel ecosystems
note: this is an evolving collection of observations on china-us ai hardware decoupling that began accelerating in 2019-2020. this page documents recent developments in a multi-year strategic divergence, with updates added as new information becomes available.
recent developments (july-august 2025)
the lutnick comment in july 2025 represents a continuation of tensions that have been building since the october 2022 export controls and earlier trade restrictions dating to 2019.
on july 15, 2025, u.s. commerce secretary howard lutnick stated: “you want to sell the chinese enough that their developers get addicted to the american technology stack.” [1]
this comment catalyzed existing chinese regulatory momentum [1]:
- july 22: cyberspace administration of china (cac) issues guidance to halt h20 purchases [1]
- july 31: cac summons nvidia executives over “serious security issues” [1]
- august: national development and reform commission (ndrc) requests tech groups refrain from nvidia chip purchases [1]
these actions build on years of preparation for technological independence, including substantial investments in domestic semiconductor capabilities beginning in 2020.
technical divergence strategy
format differentiation
the ue8m0 data format represents the latest phase in a multi-year effort to develop alternative technical standards. deepseek’s explicit statement that ue8m0 fp8 scale is “designed for the upcoming next-generation domestically produced chips” [6] reflects years of coordinated development between chinese ai companies and hardware manufacturers.
ue8m0 technical details
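a minimal sketch of what the format name implies, assuming the e8m0 scale encoding from the ocp microscaling (mx) specification: 8 unsigned exponent bits, no sign bit, no mantissa bits, a bias of 127, and the code 0xff reserved for nan. the function names `ue8m0_decode` and `ue8m0_encode` are illustrative and not taken from deepseek or nvidia code, and rounding a scale up to the next power of two is one possible choice, not a confirmed detail of any vendor's implementation.

```python
import math

UE8M0_BIAS = 127   # assumed exponent bias (OCP MX E8M0 scale type)
UE8M0_NAN = 0xFF   # assumed NaN code: all eight bits set


def ue8m0_decode(code: int) -> float:
    """Return the scale 2**(code - 127) represented by an 8-bit code."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in 8 bits")
    if code == UE8M0_NAN:
        return math.nan
    return math.ldexp(1.0, code - UE8M0_BIAS)


def ue8m0_encode(scale: float) -> int:
    """Round a positive scale up to the next power of two and return its code."""
    if not (scale > 0.0) or math.isinf(scale):
        raise ValueError("scale must be positive and finite")
    # frexp gives scale = mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(scale)
    # ceil(log2(scale)) without calling log2 (exact for powers of two)
    power = exponent - 1 if mantissa == 0.5 else exponent
    # clamp to the representable codes 0x00..0xFE
    return max(0, min(0xFE, power + UE8M0_BIAS))


if __name__ == "__main__":
    for s in (1.0, 0.3, 4.0):
        code = ue8m0_encode(s)
        print(s, hex(code), ue8m0_decode(code))
```

because every representable value is an exact power of two, the format can only express a scale, not a general number, which is why it is paired with a separate fp8 element type for the data itself.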
parallel software ecosystems
the development of alternative frameworks has been ongoing since at least 2020:
- moore threads musa (2021-present): cuda-compatible platform with musify migration tool [8]
- huawei cann (2019-present): proprietary framework for ascend chips, accelerated after entity list addition
- deepseek deepep (2024-present): hardware-specific optimizations showing ue8m0 regression issues on non-gb200 hardware [9]
model-hardware co-evolution
deepseek v3.1 (august 21, 2025), trained with the ue8m0 format, represents the culmination of multi-year collaboration [6]. the 840 billion additional training tokens and format-specific optimization create technical lock-in effects that reinforce ecosystem separation.
critically, while model training may still benefit from or require nvidia hardware for optimal performance, the ue8m0-optimized chips enable china to deploy large-scale inference infrastructure using model weights developed domestically or obtained through other channels. this decouples inference capability from training dependency.
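to make the idea of a "ue8m0 fp8 scale" more concrete, the sketch below quantizes one block of values to a simulated fp8 grid using a single shared power-of-two scale. everything here is an assumption for illustration: the e4m3 element type (maximum 448), the helper names, and the choice to round the scale up so nothing saturates. the source does not describe deepseek's actual recipe or block size.

```python
import math

E4M3_MAX = 448.0  # largest finite value of the FP8 E4M3 element type (assumed)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on a simulated E4M3 grid, saturating at 448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)            # a lies in [2**(e-1), 2**e)
    e = max(e, -5)                  # below 2**-6 the grid is subnormal: fixed step
    step = math.ldexp(1.0, e - 4)   # 3 mantissa bits -> 8 steps per binade
    return sign * round(a / step) * step  # round() breaks ties to even


def pow2_scale_exponent(block) -> int:
    """Smallest k such that max|x| / 2**k fits inside the E4M3 range."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0
    mantissa, exponent = math.frexp(amax / E4M3_MAX)
    return exponent - 1 if mantissa == 0.5 else exponent


def quantize_block(block):
    """Return (k, elements): one shared exponent plus FP8-rounded elements."""
    k = pow2_scale_exponent(block)
    return k, [round_to_e4m3(math.ldexp(v, -k)) for v in block]


def dequantize_block(k, elements):
    """Apply the shared scale 2**k; this is an exponent shift, not a multiply."""
    return [math.ldexp(c, k) for c in elements]


if __name__ == "__main__":
    k, q = quantize_block([0.02, -1.5, 3.0, 900.0])
    print(k, q, dequantize_block(k, q))
```

the shared exponent `k` is the only per-block metadata, and it is exactly the kind of value an exponent-only 8-bit code can store. small values in a block that also contains a large outlier lose precision, which is the tradeoff a coarse power-of-two scale accepts.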
key players
company | focus | ecosystem | key product |
---|---|---|---|
moore threads | consumer/research | musa (cuda-compatible) | mtt s4000 [10] |
huawei | enterprise/government | cann (proprietary) | ascend 910c |
biren technology | datacenter | traditional gpu | br100/104 |
cambricon | inference | specialized | mlu series |
structural constraints
manufacturing limitations (2020-present)
china’s fabrication constraints have shaped strategy since the 2020 entity list additions:
- smic limited to 7nm through at least 2026 due to euv equipment restrictions imposed in 2019 [4]
- yields initially below 30% in 2023, improving to 40%+ by 2025 using double-patterning duv [11]
- 5nm process developed in 2024 but with yields below commercial viability [11]
- 3nm development ongoing, targeting 2026 tape-out without euv access [11]
these persistent limitations drove the strategic decision to optimize for efficiency (ue8m0) rather than pursue performance parity through advanced nodes.
key timeline (selected events)
date | event | context |
---|---|---|
may 2019 | huawei entity list addition | catalyst for domestic chip development |
oct 2020 | moore threads founded | ex-nvidia china gm starts gpu company [10] |
oct 2022 | us export controls on advanced chips | restricts nvidia a100/h100 to china |
oct 2023 | moore threads entity list | blocks access to tsmc, design tools [10] |
dec 2023 | mtt s4000 launch | notably lacks fp8 support [10] |
feb 2025 | moore threads-deepseek partnership | hardware-software alignment [10] |
jul 2025 | lutnick comments | accelerates existing tensions [1] |
aug 2025 | deepseek v3.1 with ue8m0 | format designed for domestic chips [6] |
market implications
global ai chip market projections
amd ceo lisa su expects the ai processor market to exceed $500 billion by 2028 [12]. the asia-pacific region led with a 33% market share in 2023, with china as a key driver [13].
china market impact
expected bifurcation by 2028:
- ai chips in datacenters projected at $33 billion globally by 2028 [12]
- asia-pacific highest growth rate during forecast period [13]
- china adding more chip capacity than rest of world combined in 2024 [11]
nvidia faces strategic dilemma:
- support ue8m0 (validates china’s strategy)
- ignore ue8m0 (loses china market access)
- create compatibility bridges (undermines u.s. policy)
the addition of ue8m0 to ptx isa 9.0 suggests nvidia chose the first option [3].
observations and analysis
technical tradeoffs (current state)
the ue8m0 approach reflects years of navigating constraints:
- error tolerance: 7e-4 (ue8m0) vs 1e-5 (standard fp8) [9]
- memory reduction: up to 75% [7]
- simplified hardware: no mantissa circuits [3]
- inference focus: targeting 90% of future ai workloads
- training-inference split: accepts continued nvidia dependency for training while achieving inference independence
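the "up to 75%" memory figure in the list above can be sanity-checked with simple arithmetic. the source does not state the baseline, so the sketch assumes a comparison against 32-bit floats and a hypothetical block size of 32 elements sharing one 8-bit scale (the default block size in the ocp microscaling specification). both are assumptions, and the function names are illustrative.

```python
def bits_per_element(element_bits: int, scale_bits: int, block_size: int) -> float:
    """Average storage per value when block_size values share one scale."""
    return element_bits + scale_bits / block_size


def reduction_vs(baseline_bits: float, bits: float) -> float:
    """Fractional memory saving relative to a baseline width."""
    return 1.0 - bits / baseline_bits


if __name__ == "__main__":
    fp8_block = bits_per_element(element_bits=8, scale_bits=8, block_size=32)
    print(fp8_block)                      # 8.25 bits per value
    print(reduction_vs(32, fp8_block))    # 0.7421875 vs a 32-bit baseline
    print(reduction_vs(16, fp8_block))    # 0.484375 vs a 16-bit baseline
```

under these assumptions the saving is about 74% against fp32, consistent with "up to 75%", but only about 48% against the 16-bit formats commonly used for inference. the headline number therefore depends heavily on which baseline is meant.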
indicators to track
ongoing developments to monitor:
- moore threads ipo prospectus (filed november 2024, expected q4 2025) [10]
- smic yield improvements and 5nm/3nm progress [11]
- deepseek model performance on ue8m0 vs standard hardware [9]
- patent filings mentioning “8-bit exponent” or “microscaling”
- ieee p3109 working group standards proposals
- additional domestic chip announcements supporting ue8m0
evolving dynamics
the ue8m0 format and associated ecosystem represent one visible outcome of multi-year strategic decisions on both sides. what began as trade tensions in 2019 has evolved into technical divergence, with the august 2025 developments marking a new phase rather than an isolated event.
china’s approach - architectural divergence through format incompatibility - reflects constraints imposed since 2019 and investments made in response. the strategy optimizes for specific realities: persistent fabrication limitations [4], a large domestic market, and independence imperatives reinforced by successive policy actions.
the strategic insight is the decoupling of training from inference: while cutting-edge model training may continue to benefit from nvidia’s superior hardware, the ue8m0 ecosystem enables china to deploy these models at scale for inference. this creates a sustainable path where model weights - whether developed domestically on nvidia hardware, trained through international collaborations, or obtained through other means - can be efficiently deployed on domestic infrastructure.
nvidia’s addition of ue8m0 support in ptx isa 9.0 [3] suggests recognition that parallel ecosystems may be the new equilibrium, rather than temporary divergence.
future updates: this page will be updated as new information becomes available about technical developments, policy changes, and market evolution in the china-us ai hardware landscape.
references
[2] deepseek ai. (2025, august 21). deepseek-v3.1 model card. hugging face.
[3] nvidia. (2025, august 1). parallel thread execution isa version 9.0.
[4] wccftech. (2024). smic to limit huawei to 7nm chips until 2026.
[5] asia times. (2024, november). tsmc’s 7nm chip ban targets china’s ai chipmakers.
[6] investing.com. (2025, august 21). china’s deepseek upgrades ai model to support domestic chips.
[7] autogpt. (2025, august 21). deepseek launches new model with domestic chips.
[8] tom’s hardware. (2024). china’s moore threads polishes homegrown cuda alternative.
[9] deepseek ai. (2025). ue8m0(pr206) features cause severe regression issue. github issue #240.
[10] technode. (2024, november 15). chinese gpu unicorn moore threads files for ipo in china.
[11] granitefirm. (2025, march 8). how is smic after us embargo?
[12] bloomberg. (2025, june 12). amd ceo sees ai processor market exceeding $500 billion by 2028.
[13] globenewswire. (2024, october 28). ai chip market expected to reach usd 621.15 billion by 2032.