ue8m0 data format

published: August 21, 2025

ue8m0 is an 8-bit unsigned floating-point format with 8 exponent bits and 0 mantissa bits, used as the scale factor in microscaling (mx) data types for ai workloads.

technical specification

format characteristics

  • 8 exponent bits enabling power-of-two values from 2^-127 to 2^127
  • 0 mantissa bits - every value is a pure power-of-two scaling factor
  • unsigned format with no sign bit
  • nan represented as 0xff, no infinity support
  • must be used in packed format as ue8m0x2

the format was documented in nvidia’s ptx isa version 9.0 (august 2025).1

binary structure

standard fp8 (e5m2): [S][EEEEE][MM]
standard fp8 (e4m3): [S][EEEE][MMM]
ue8m0 format:        [EEEEEEEE]
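reading the layout directly: a ue8m0 byte is nothing but a biased exponent, so decoding is a single subtraction. a minimal sketch (the helper name `byte_to_scale` is illustrative, assuming the bias of 127 implied by the range above):

```python
def byte_to_scale(b: int) -> float:
    """interpret a ue8m0 byte as 2 ** (b - 127); 0xff is reserved for nan"""
    assert 0 <= b <= 0xfe, "0xff encodes nan"
    return 2.0 ** (b - 127)

print(byte_to_scale(0x7f))  # 2^0 = 1.0
print(byte_to_scale(0x80))  # 2^1 = 2.0
print(byte_to_scale(0x00))  # 2^-127, the smallest representable scale
```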

microscaling applications

ue8m0 serves as a shared scaling factor for blocks of 32 elements in lower-precision formats:2

  • mxfp8: 8-bit values with ue8m0 scale
  • mxfp6: 6-bit values with ue8m0 scale
  • mxfp4: 4-bit values with ue8m0 scale

this approach reduces memory bandwidth by up to 75% while maintaining acceptable accuracy for inference.3
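the block-scaling scheme above can be sketched in a few lines of python. this is an illustration rather than the exact mx specification rule: for each 32-element block, derive a power-of-two scale from the block's maximum magnitude, then store elements relative to that scale (the helpers `block_scale` and `quantize_block` are hypothetical names):

```python
import math

BLOCK_SIZE = 32  # one shared ue8m0 scale per 32 elements, per the mx formats above

def block_scale(values):
    """pick a shared power-of-two scale for one block (illustrative heuristic)"""
    amax = max(abs(v) for v in values)
    if amax == 0:
        return 2.0 ** -127  # smallest ue8m0 scale; zero itself is not representable
    exp = max(-127, min(127, math.floor(math.log2(amax))))
    return 2.0 ** exp

def quantize_block(values):
    """divide each element by the shared scale; the quotients are what mxfp8/6/4 would store"""
    scale = block_scale(values)
    return scale, [v / scale for v in values]

scale, scaled = quantize_block([0.75] * 31 + [6.0])
# amax = 6.0, so the shared scale is 2^2 = 4.0 and elements are stored relative to it
```

only the scaled quotients need the narrow element format; the single ue8m0 byte per block is what keeps the metadata overhead small.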

for more on mxfp4 formats and gpt model requirements, see gpt-oss mxfp4 requirements.

hardware benefits

the zero-mantissa design simplifies hardware:

  • eliminates multiplication circuits
  • reduces silicon area
  • optimizes for scaling operations
  • aligns with 32-element tensor tiles
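the "no multiplication" point can be made concrete: because a ue8m0 scale is always 2^k, applying it to a float only shifts the float's exponent field. python exposes this exponent-only operation as math.ldexp (a software sketch of the hardware idea; `apply_ue8m0_scale` is an illustrative name):

```python
import math

def apply_ue8m0_scale(x: float, scale_byte: int) -> float:
    """multiply x by the ue8m0 scale 2 ** (scale_byte - 127) via exponent addition only"""
    assert scale_byte != 0xff, "0xff encodes nan"
    return math.ldexp(x, scale_byte - 127)  # adjusts the exponent; no mantissa multiply

# scaling by 0x81 (= 2^2) is a pure exponent bump:
print(apply_ue8m0_scale(1.5, 0x81))  # 6.0
```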

platform support

nvidia gpus

support introduced in ptx isa 9.0 (august 2025):1

  • supported: gb200 and newer
  • not supported: h100, h20, a100, older architectures

deepseek models

deepseek v3.1 (august 21, 2025) was trained using the ue8m0 fp8 scale data format.4

implementation

import math

def ue8m0_encode(value):
    """encode a float as a ue8m0 byte (nearest power of two)"""
    if math.isnan(value):
        return 0xff  # all-ones is the only nan encoding
    if value <= 0:
        return 0x00  # unsigned format: zero and negatives are unrepresentable; clamp to 2^-127

    # round to the nearest power of two, then bias the exponent by 127
    exponent = round(math.log2(value))
    return max(0, min(254, exponent + 127))

def ue8m0_decode(byte_value):
    """decode a ue8m0 byte to a float"""
    if byte_value == 0xff:
        return float('nan')

    # every remaining byte is a biased exponent: 0x00 -> 2^-127, 0xfe -> 2^127
    return 2.0 ** (byte_value - 127)

precision considerations

with ue8m0 scales, acceptable error tolerance loosens from 1e-5 (standard fp8 scaling) to approximately 7e-4.5 this loss of scale precision makes the format suitable primarily for inference rather than training.
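the looser tolerance reflects how coarse power-of-two scales are: snapping an arbitrary scale to the nearest ue8m0 value can move it by tens of percent, and that coarseness surfaces as numerical error downstream. a short sketch of the gap (`pow2_round` is an illustrative helper):

```python
import math

def pow2_round(x: float) -> float:
    """round a positive scale to the nearest power of two, as ue8m0 must"""
    return 2.0 ** round(math.log2(x))

# a scale of 1.5 snaps up to 2.0, a relative change of one third:
x = 1.5
err = abs(pow2_round(x) - x) / x
print(f"{err:.3f}")  # 0.333
```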

references

[1] nvidia. (2025, august 1). parallel thread execution isa version 9.0.

[2] darvish rouhani, b., et al. (2025, may 30). microscaling data formats for deep learning.

[3] autogpt. (2025, august 21). deepseek launches new model with domestic chips.

[4] deepseek ai. (2025, august 21). deepseek-v3.1 model card.

[5] deepseek ai. (2025). ue8m0 features regression issue #240.
