UE8M0 data format
UE8M0 is an 8-bit unsigned floating-point data format with 8 exponent bits and 0 mantissa bits, designed for microscaling applications in AI workloads.
Technical specification
Format characteristics
- 8 exponent bits, encoding power-of-two values from 2^-127 to 2^127
- 0 mantissa bits; the value is a pure scaling factor
- Unsigned format with no sign bit
- NaN represented as 0xff; no infinity support
- Must be used in the packed ue8m0x2 format (see the packing sketch below)
The format was documented in NVIDIA's PTX ISA version 9.0 (August 2025).[1]
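PTX handles these scales as packed pairs: two UE8M0 bytes carried in one 16-bit ue8m0x2 value. The sketch below shows one way to pack and unpack such a pair; the low-byte-first ordering is an assumption made for illustration, not taken from the PTX specification.

def pack_ue8m0x2(scale0, scale1):
    """Pack two ue8m0 scale bytes into a 16-bit ue8m0x2 value (assumed low byte first)."""
    return ((scale1 & 0xff) << 8) | (scale0 & 0xff)

def unpack_ue8m0x2(packed):
    """Split a 16-bit ue8m0x2 value back into its two scale bytes."""
    return packed & 0xff, (packed >> 8) & 0xff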
Binary structure
Standard FP8 (E5M2): [S][EEEEE][MM]
Standard FP8 (E4M3): [S][EEEE][MMM]
UE8M0 format:        [EEEEEEEE]
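For example, the scale 32 = 2^5 is stored as the biased exponent 5 + 127 = 132, i.e. 0b10000100 (0x84). The byte 0x00 therefore encodes 2^-127 and 0xfe encodes 2^127, while 0xff is reserved for NaN.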
Microscaling applications
UE8M0 serves as a shared scaling factor for blocks of 32 elements stored in lower-precision formats:[2]
- MXFP8: 8-bit element values with a shared UE8M0 scale
- MXFP6: 6-bit element values with a shared UE8M0 scale
- MXFP4: 4-bit element values with a shared UE8M0 scale
This approach reduces memory bandwidth by up to 75% while maintaining acceptable accuracy for inference.[3]
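As a rough illustration of how a shared scale is chosen, the sketch below quantizes one 32-element block for MXFP8, assuming E4M3 element values; the function name and the floor-based scale selection are illustrative assumptions rather than the exact algorithm of any particular library.

import math

E4M3_MAX = 448.0  # largest finite E4M3 magnitude

def quantize_block_mxfp8(block):
    """Return (ue8m0 scale byte, scaled elements) for one 32-element block.

    Sketch only: the final cast of the scaled elements to E4M3 is omitted.
    """
    assert len(block) == 32
    max_abs = max(abs(x) for x in block)
    if max_abs == 0:
        return 0x00, [0.0] * 32
    # Pick a power-of-two scale that brings the largest element near the top of
    # the E4M3 range; anything still above 448 would be clamped by the E4M3 cast.
    exponent = math.floor(math.log2(max_abs)) - math.floor(math.log2(E4M3_MAX))
    scale_byte = max(0, min(254, exponent + 127))  # UE8M0 encoding of 2**exponent
    scale = 2.0 ** (scale_byte - 127)
    return scale_byte, [x / scale for x in block]

One scale byte per 32 elements adds only 0.25 bits of overhead per value, which is why the savings stay close to the raw reduction from wider element types.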
For more on MXFP4 formats and GPT model requirements, see gpt-oss MXFP4 requirements.
Hardware benefits
The zero-mantissa design simplifies hardware (see the sketch after this list):
- Eliminates multiplication circuits
- Reduces silicon area
- Optimizes for scaling operations
- Aligns with 32-element tensor tiles
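To see why no multiplier is needed, note that applying a UE8M0 scale simply adds the unbiased exponent to the exponent of the value being scaled. A minimal sketch, using math.ldexp to stand in for the hardware exponent adder:

import math

def apply_ue8m0_scale(value, scale_byte):
    """Multiply value by 2**(scale_byte - 127) without a general-purpose multiplier."""
    if scale_byte == 0xff:
        return float('nan')                     # UE8M0 NaN propagates
    return math.ldexp(value, scale_byte - 127)  # pure exponent addition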
Platform support
NVIDIA GPUs
Support was introduced in PTX ISA 9.0 (August 2025):[1]
- Supported: GB200 and newer
- Not supported: H100, H20, A100, and older architectures
DeepSeek models
DeepSeek-V3.1 (August 21, 2025) was trained using the UE8M0 FP8 scale data format.[4]
Implementation
import math

def ue8m0_encode(value):
    """Encode a float as a UE8M0 byte (biased power-of-two exponent)."""
    if math.isnan(value):
        return 0xff                               # NaN is the single reserved encoding
    if value == 0:
        return 0x00                               # no true zero exists; clamp to the smallest scale, 2^-127
    exponent = math.floor(math.log2(abs(value)))  # largest power of two not exceeding |value|
    biased_exp = exponent + 127                   # apply the bias of 127
    return max(0, min(254, biased_exp))           # clamp to the valid range; 0xff stays reserved for NaN

def ue8m0_decode(byte_value):
    """Decode a UE8M0 byte back to its power-of-two scale."""
    if byte_value == 0xff:
        return float('nan')
    exponent = byte_value - 127                   # remove the bias
    return 2.0 ** exponent                        # 0x00 decodes to 2^-127, 0xfe to 2^127
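A quick round trip shows the power-of-two snapping behaviour of these helpers:

>>> hex(ue8m0_encode(32.0))
'0x84'
>>> ue8m0_decode(ue8m0_encode(0.3))   # 0.3 snaps down to 2**-2
0.25
>>> ue8m0_decode(0xff)
nan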
Precision considerations
Error tolerance increases from 1e-5 (standard FP8) to approximately 7e-4 with UE8M0.[5] This makes the format suitable primarily for inference rather than training.
References
[1] NVIDIA. (2025, August 1). Parallel Thread Execution ISA Version 9.0.
[2] Darvish Rouhani, B., et al. (2025, May 30). Microscaling Data Formats for Deep Learning.
[3] AutoGPT. (2025, August 21). DeepSeek Launches New Model with Domestic Chips.
[4] DeepSeek AI. (2025, August 21). DeepSeek-V3.1 Model Card.
[5] DeepSeek AI. (2025). UE8M0 features regression (issue #240).