china ai hardware decoupling notes

current observations (august 2025)

  • the format: ue8m0 emerges as the key technical differentiator - an 8-bit, exponent-only (zero-mantissa) float format used to scale fp8 values for ai inference

  • recent catalyst: lutnick’s july 2025 “addiction” comment accelerated existing regulatory shifts against nvidia

  • hardware ecosystem: moore threads (ex-nvidia china leadership) positioned as domestic ue8m0 partner after years of development

  • manufacturing reality: smic’s 7nm constraint through 2026 drives efficiency innovations like ue8m0

  • strategic split: training may still require nvidia hardware, but ue8m0 chips enable large-scale inference with existing or “borrowed” model weights

  • market response: nvidia’s august 2025 ue8m0 support suggests acceptance of parallel ecosystems

note: this is an evolving collection of observations on china-us ai hardware decoupling that began accelerating in 2019-2020. this page documents recent developments in a multi-year strategic divergence, with updates added as new information becomes available.

recent developments (july-august 2025)

the lutnick comment in july 2025 represents a continuation of tensions that have been building since the october 2022 export controls and earlier trade restrictions dating to 2019.

on july 15, 2025, u.s. commerce secretary howard lutnick stated: “you want to sell the chinese enough that their developers get addicted to the american technology stack.” [1]

this comment catalyzed existing chinese regulatory momentum [1]:

  • july 22: cyberspace administration issues guidance to halt h20 purchases [1]
  • july 31: cac summons nvidia executives over “serious security issues” [1]
  • august: ndrc requests tech groups refrain from nvidia chip purchases [1]

these actions build on years of preparation for technological independence, including substantial investments in domestic semiconductor capabilities beginning in 2020.

technical divergence strategy

format differentiation

the ue8m0 data format represents the latest phase in a multi-year effort to develop alternative technical standards. deepseek’s explicit statement that ue8m0 fp8 scale is “designed for the upcoming next-generation domestically produced chips” [6] reflects years of coordinated development between chinese ai companies and hardware manufacturers.

ue8m0 technical details

  • 8-bit exponent, 0 mantissa design differs from standard fp8 [3] (sketched below)

  • optimized for inference over training workloads [2]

  • reduces memory bandwidth by up to 75% [7]

  • nvidia added ue8m0 support in ptx isa 9.0 (august 2025) [3]
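
to make the format concrete, here is a minimal python sketch of an exponent-only scale encode/decode. it assumes the ocp microscaling (mx) style convention of an unsigned 8-bit exponent with bias 127; the exact bias and special-value handling used by the chips referenced above are not specified in the sources cited here.

```python
import math

BIAS = 127  # assumed bias, following the ocp microscaling (mx) e8m0 convention

def encode_ue8m0(scale: float) -> int:
    """quantize a positive scale factor to the nearest power of two,
    stored as an 8-bit unsigned exponent (0..254; 0xff is often reserved
    for nan in mx-style specs - an assumption here)."""
    if scale <= 0:
        raise ValueError("ue8m0 encodes positive scales only")
    exp = round(math.log2(scale))         # nearest power of two in log space
    return max(0, min(254, exp + BIAS))   # clamp to the representable range

def decode_ue8m0(byte: int) -> float:
    """recover the power-of-two scale from the stored exponent byte."""
    return 2.0 ** (byte - BIAS)

# no mantissa bits means decoding is a single shift / table lookup,
# which is the hardware simplification noted in the list above.
s = 0.0478
b = encode_ue8m0(s)
print(b, decode_ue8m0(b))  # stored byte and the nearest power-of-two scale

# storage comparison: an 8-bit scale vs. a 32-bit float scale is a 75%
# reduction per scale value - one illustrative reading of the bandwidth claim.
```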

parallel software ecosystems

the development of alternative frameworks has been ongoing since at least 2020:

  • moore threads musa (2021-present): cuda-compatible platform with musify migration tool [8] (sketched below)
  • huawei cann (2019-present): proprietary framework for ascend chips, accelerated after entity list addition
  • deepseek deepep (2024-present): hardware-specific optimizations showing ue8m0 regression issues on non-gb200 hardware [9]
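
purely as an illustration of the migration-tool idea, a toy source-rewriting pass is sketched below; the prefix rules and example symbols are hypothetical stand-ins, not musify’s actual rule set or the musa api surface.

```python
import re

# hypothetical prefix map; a real migration tool ships its own rule set
PREFIX_RULES = {
    r"\bcuda([A-Z]\w*)": r"musa\1",     # runtime-style calls, e.g. a hypothetical cudaMalloc -> musaMalloc
    r"\bcu([A-Z]\w*)":   r"mu\1",       # driver-style symbols
    r"cuda_runtime\.h":  "musa_runtime.h",
}

def migrate_source(src: str) -> str:
    """apply simple textual rewrites to move cuda-style code onto an
    alternative, source-compatible runtime. real tools also handle kernel
    launch syntax, build flags, and reporting of unsupported apis."""
    for pattern, repl in PREFIX_RULES.items():
        src = re.sub(pattern, repl, src)
    return src

example = "#include <cuda_runtime.h>\nvoid* p; cudaMalloc(&p, 1024);"
print(migrate_source(example))
```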

model-hardware co-evolution

deepseek v3.1 (released august 21, 2025), trained with the ue8m0 format, represents the culmination of multi-year collaboration [6]. the 840 billion additional training tokens and format-specific optimization create technical lock-in effects that reinforce ecosystem separation.

critically, while model training may still benefit from or require nvidia hardware for optimal performance, the ue8m0-optimized chips enable china to deploy large-scale inference infrastructure using model weights developed domestically or obtained through other channels. this decouples inference capability from training dependency.

key players

company | focus | ecosystem | key product
moore threads | consumer/research | musa (cuda-compatible) | mtt s4000 [10]
huawei | enterprise/government | cann (proprietary) | ascend 910c
biren technology | datacenter | traditional gpu | br100/104
cambricon | inference | specialized | mlu series

structural constraints

manufacturing limitations (2020-present)

china’s fabrication constraints have shaped strategy since the 2020 entity list additions:

  • smic limited to 7nm through at least 2026 due to euv equipment restrictions imposed in 2019 [4]
  • yields initially below 30% in 2023, improving to 40%+ by 2025 using double-patterning duv [11]
  • 5nm process developed in 2024 but with yields below commercial viability [11]
  • 3nm development ongoing, targeting 2026 tape-out without euv access [11]

these persistent limitations drove the strategic decision to optimize for efficiency (ue8m0) rather than pursue performance parity through advanced nodes.
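
one way to see why a constrained node pushes toward smaller, efficiency-first designs is the textbook poisson die-yield model sketched below; the defect density and die areas are made-up illustrative numbers, not smic data.

```python
import math

def poisson_yield(defect_density_per_cm2: float, die_area_cm2: float) -> float:
    """classic poisson approximation: yield = exp(-d0 * area)."""
    return math.exp(-defect_density_per_cm2 * die_area_cm2)

# illustrative numbers only: a reticle-sized accelerator die vs. a leaner,
# inference-oriented die on the same immature process.
d0 = 0.5          # defects per cm^2 (assumed)
big_die = 6.0     # cm^2 (assumed)
small_die = 2.5   # cm^2 (assumed)

print(f"big die yield:   {poisson_yield(d0, big_die):.0%}")
print(f"small die yield: {poisson_yield(d0, small_die):.0%}")
# smaller, simpler dies (no mantissa datapaths, lower bandwidth needs)
# recover yield on a constrained node - the tradeoff described above.
```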

key timeline (selected events)

date | event | context
may 2019 | huawei entity list addition | catalyst for domestic chip development
oct 2020 | moore threads founded | ex-nvidia china gm starts gpu company [10]
oct 2022 | us export controls on advanced chips | restricts nvidia a100/h100 to china
oct 2023 | moore threads entity list | blocks access to tsmc, design tools [10]
dec 2023 | mtt s4000 launch | notably lacks fp8 support [10]
feb 2025 | moore threads-deepseek partnership | hardware-software alignment [10]
jul 2025 | lutnick comments | accelerates existing tensions [1]
aug 2025 | deepseek v3.1 with ue8m0 | format designed for domestic chips [6]

market implications

global ai chip market projections

amd ceo lisa su expects the ai processor market to exceed $500 billion by 2028 [12]. the asia-pacific region led with a 33% market share in 2023, with china as a key driver [13].

china market impact

expected bifurcation by 2028:

  • ai chips in datacenters projected at $33 billion globally by 2028 [12]
  • asia-pacific expected to post the highest growth rate during the forecast period [13]
  • china added more chip capacity than the rest of the world combined in 2024 [11]

nvidia faces a strategic dilemma:

  1. support ue8m0 (validates china’s strategy)
  2. ignore ue8m0 (loses china market access)
  3. create compatibility bridges (undermines u.s. policy)

the addition of ue8m0 to ptx isa 9.0 suggests nvidia chose option 1 [3].

observations and analysis

technical tradeoffs (current state)

the ue8m0 approach reflects years of navigating constraints:

  • error tolerance: 7e-4 (ue8m0) vs 1e-5 (standard fp8) [9] (see the sketch after this list)
  • memory reduction: up to 75% [7]
  • simplified hardware: no mantissa circuits [3]
  • inference focus: targeting 90% of future ai workloads
  • training-inference split: accepts continued nvidia dependency for training while achieving inference independence
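
the direction of that error-tolerance gap can be reproduced with a small sketch: quantizing a block of values with a power-of-two (ue8m0-style) shared scale versus an unrestricted float scale. the block size, value distribution, and integer quantizer below are illustrative assumptions; the 7e-4 / 1e-5 figures themselves come from the deepep test suite [9].

```python
import math
import random

def block_quant_error(values, pow2_scale: bool, levels: int = 127) -> float:
    """quantize a block with a shared symmetric scale and return the worst
    absolute error normalized by the block maximum. with pow2_scale=True the
    scale is rounded up to a power of two, mimicking an exponent-only
    (ue8m0-style) scale; otherwise the scale is an unrestricted float."""
    amax = max(abs(v) for v in values)
    scale = amax / levels
    if pow2_scale:
        scale = 2.0 ** math.ceil(math.log2(scale))  # exponent-only scale
    deq = [round(v / scale) * scale for v in values]
    return max(abs(a - b) for a, b in zip(values, deq)) / amax

random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(128)]  # assumed block size / distribution

print("float scale worst err:", block_quant_error(block, pow2_scale=False))
print("pow-2 scale worst err:", block_quant_error(block, pow2_scale=True))
# the exponent-only scale is coarser, so per-block error grows: hardware
# simplicity and bandwidth savings are traded against numeric headroom.
```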

indicators to track

ongoing developments to monitor:

  • moore threads ipo prospectus (filed november 2024, expected q4 2025) [10]
  • smic yield improvements and 5nm/3nm progress [11]
  • deepseek model performance on ue8m0 vs standard hardware [9]
  • patent filings mentioning “8-bit exponent” or “microscaling”
  • ieee p3109 working group standards proposals
  • additional domestic chip announcements supporting ue8m0

evolving dynamics

the ue8m0 format and associated ecosystem represent one visible outcome of multi-year strategic decisions on both sides. what began as trade tensions in 2019 has evolved into technical divergence, with the august 2025 developments marking a new phase rather than an isolated event.

china’s approach - architectural divergence through format incompatibility - reflects constraints imposed since 2019 and investments made in response. the strategy optimizes for specific realities: persistent fabrication limitations [4], a large domestic market, and independence imperatives reinforced by successive policy actions.

the strategic insight is the decoupling of training from inference: while cutting-edge model training may continue to benefit from nvidia’s superior hardware, the ue8m0 ecosystem enables china to deploy these models at scale for inference. this creates a sustainable path where model weights - whether developed domestically on nvidia hardware, trained through international collaborations, or obtained through other means - can be efficiently deployed on domestic infrastructure.

nvidia’s addition of ue8m0 support in ptx isa 9.0 [3] suggests recognition that parallel ecosystems may be the new equilibrium, rather than a temporary divergence.

future updates: this page will be updated as new information becomes available about technical developments, policy changes, and market evolution in the china-us ai hardware landscape.

references

[1] financial times. (2025, august 20). china turns against nvidia’s ai chip after ‘insulting’ howard lutnick remarks.

[2] deepseek ai. (2025, august 21). deepseek-v3.1 model card. hugging face.

[3] nvidia. (2025, august 1). parallel thread execution isa version 9.0.

[4] wccftech. (2024). smic to limit huawei to 7nm chips until 2026.

[5] asia times. (2024, november). tsmc’s 7nm chip ban targets china’s ai chipmakers.

[6] investing.com. (2025, august 21). china’s deepseek upgrades ai model to support domestic chips.

[7] autogpt. (2025, august 21). deepseek launches new model with domestic chips.

[8] tom’s hardware. (2024). china’s moore threads polishes homegrown cuda alternative.

[9] deepseek ai. (2025). ue8m0(pr206) features cause severe regression issue. github issue #240.

[10] technode. (2024, november 15). chinese gpu unicorn moore threads files for ipo in china.

[11] granitefirm. (2025, march 8). how is smic after us embargo?

[12] bloomberg. (2025, june 12). amd ceo sees ai processor market exceeding $500 billion by 2028.

[13] globenewswire. (2024, october 28). ai chip market expected to reach usd 621.15 billion by 2032.
