chunkloris: http/1 vs http/2 vs http/3

one line

per-chunk amplification is a parser/dispatcher boundary property, not an http/1-specific one: every protocol that delivers a body as a sequence of framed units inherits it. http/1 has chunks. http/2 has data frames. http/3 has data frames over quic streams. websocket has text/binary frames. the attack shape is the same.

figure: comparison of the four wire formats studied (http/1.1 chunks, http/2 DATA frames, http/3 DATA frames, websocket frames). each allows arbitrarily small per-unit payloads under its spec.

what changes per protocol

figure: per-frame server cpu cost across protocols, mode a, N = 250,000, 1 vcpu container.

http/1.1 — chunked transfer encoding

  • wire unit: chunk. <hex-size>\r\n<bytes>\r\n per rfc 9112 §7.1.
  • flow control: none. tcp backpressure only.
  • parser callback boundary: per chunk in most servers (httptools, llhttp, hyper’s httparse-driven parser, node’s _http_common, etc.). a few servers batch at the application boundary (kestrel via System.IO.Pipelines).
  • headline range, mode b N=250k: 3.6 µs/chunk (kestrel) to 113.6 µs/chunk (nginx as origin). median ≈ 12.4 µs/chunk.
  • upstream mitigation that works: nginx with its default proxy_request_buffering on collapses the N chunks into one content-length-framed upstream request. apache mod_proxy_http is documented to behave the same way.
  • upstream mitigation that does not work: haproxy streams request bodies by default. with option http-buffer-request it waits for up to one buffer’s worth of body (tune.bufsize, default 16 KiB) and then forwards the rest chunk by chunk.
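here is a rough python sketch of the http/1.1 arithmetic. it rests on a few assumptions. encode_chunked, count_chunks, and cpu_seconds are hypothetical helpers written for this illustration, not any measured server's parser. the decoder also assumes well-formed input with no trailers. the cpu totals multiply the mode b per-chunk costs above by N = 250,000, on the assumption that cost scales linearly with chunk count, which is what a µs/chunk figure implies.

```python
# Minimal sketch of HTTP/1.1 chunked framing (RFC 9112 §7.1) and the
# total-CPU arithmetic implied by a per-chunk cost. Helpers are hypothetical.


def encode_chunked(body: bytes, chunk_size: int) -> bytes:
    """Frame `body` as chunks of at most `chunk_size` bytes, then the last-chunk."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        piece = body[i:i + chunk_size]
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # zero-size last-chunk with an empty trailer section
    return bytes(out)


def count_chunks(wire: bytes) -> int:
    """Count data chunks, i.e. the callbacks a per-chunk parser would fire.

    Assumes well-formed input; chunk extensions are skipped, trailers ignored.
    """
    pos = chunks = 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol].split(b";")[0], 16)
        if size == 0:
            return chunks
        chunks += 1
        pos = eol + 2 + size + 2  # skip size line, data, trailing CRLF


def cpu_seconds(n_units: int, us_per_unit: float) -> float:
    """Total server CPU time for n_units at a fixed per-unit cost in µs."""
    return n_units * us_per_unit / 1e6


N = 250_000
wire = encode_chunked(b"x" * N, chunk_size=1)
print(len(wire))            # 1500005: 6 bytes per 1-byte chunk + 5-byte terminator
print(count_chunks(wire))   # 250000 parser callbacks
for label, cost in [("kestrel", 3.6), ("median", 12.4), ("nginx as origin", 113.6)]:
    print(f"{label}: {cpu_seconds(N, cost):.1f} s of cpu")
```

read this way, about 1.5 MB of chunk framing costs roughly 0.9 s of server cpu on kestrel and roughly 28 s on nginx as origin.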

http/2 — data frames over a single tcp connection

  • wire unit: DATA frame. rfc 9113 §6.1.
  • flow control: byte-level only. WINDOW_UPDATE bounds bytes; nothing bounds frames. the default 65,535-byte initial window alone admits 65,535 one-byte DATA frames before the sender needs a WINDOW_UPDATE, and every WINDOW_UPDATE the receiver sends extends the run. some implementations independently impose a “frame size lower bound” or count frames against a per-stream credit; the measurement matrix in the paper records which.
  • goaway-on-tiny-frames: three of the 17 measured h2c servers (cowboy / phoenix, spring-tomcat) abort the request with GOAWAY error code 11 (ENHANCE_YOUR_CALM) before fully consuming the tiny-frame body. this is the cleanest in-protocol mitigation we observed; it is not a spec requirement.
  • batches correctly: vertx-h2, rust-hyper-h2, actix-h2 complete the matrix below the paced cpu threshold under mode b. these implementations either apply a per-stream frame credit or coalesce frame payloads into per-window reads before waking the application.
  • headline range, mode b N=250k: 3.7 µs/frame (rust-hyper-h2) to 103.5 µs/frame (nginx-h2). 11 of 17 fall in the per-frame band.
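this minimal python sketch makes the frame-versus-byte asymmetry concrete. it builds raw http/2 DATA and GOAWAY frames following the rfc 9113 §4.1 frame layout. TinyFrameGuard is a hypothetical policy in the spirit of the GOAWAY-on-tiny-frames servers above. its thresholds (1,024 frames, 64-byte average payload) are illustrative assumptions, not values taken from cowboy, phoenix, or spring-tomcat.

```python
import struct

# Frame types, flag, and error code from RFC 9113.
DATA, GOAWAY = 0x0, 0x7
END_STREAM = 0x1
ENHANCE_YOUR_CALM = 0xB
DEFAULT_INITIAL_WINDOW = 65_535


def frame(ftype: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """9-byte header (24-bit length, type, flags, 31-bit stream id) + payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    header = len(payload).to_bytes(3, "big")
    header += struct.pack(">BBI", ftype, flags, stream_id & 0x7FFF_FFFF)
    return header + payload


def data_frame(stream_id: int, payload: bytes, end_stream: bool = False) -> bytes:
    return frame(DATA, END_STREAM if end_stream else 0, stream_id, payload)


def goaway(last_stream_id: int, error_code: int) -> bytes:
    """GOAWAY always travels on stream 0."""
    body = struct.pack(">II", last_stream_id & 0x7FFF_FFFF, error_code)
    return frame(GOAWAY, 0, 0, body)


class TinyFrameGuard:
    """Hypothetical connection policy: answer with GOAWAY(ENHANCE_YOUR_CALM)
    once enough DATA frames have arrived with a small average payload."""

    def __init__(self, min_frames: int = 1024, min_avg_bytes: int = 64):
        self.min_frames = min_frames
        self.min_avg_bytes = min_avg_bytes
        self.frames = 0
        self.payload_bytes = 0
        self.last_stream_id = 0

    def on_data(self, stream_id: int, payload_len: int):
        """Return a GOAWAY frame to send, or None to keep reading."""
        self.frames += 1
        self.payload_bytes += payload_len
        self.last_stream_id = max(self.last_stream_id, stream_id)
        tiny = self.payload_bytes < self.min_avg_bytes * self.frames
        if self.frames >= self.min_frames and tiny:
            return goaway(self.last_stream_id, ENHANCE_YOUR_CALM)
        return None


# Flow control counts bytes only: the default initial window admits
# 65,535 one-byte DATA frames before any WINDOW_UPDATE is needed.
frames = [data_frame(1, b"x") for _ in range(DEFAULT_INITIAL_WINDOW)]
print(len(frames), sum(len(f) for f in frames))  # 65535 655350

guard = TinyFrameGuard()
for i, f in enumerate(frames, start=1):
    action = guard.on_data(1, len(f) - 9)  # payload length = frame minus 9-byte header
    if action is not None:
        print(f"GOAWAY after {i} frames, error code {action[-1]}")  # 1024 frames, code 11
        break
```

the guard judges the average payload rather than any single frame, so a client that occasionally sends a small final DATA frame is not penalized.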

http/3 — data frames over quic streams

  • wire unit: DATA frame on a quic bidi stream. rfc 9114 §7.2.1.
  • flow control: quic has byte-level stream and connection flow control. as with http/2, there is no frame-level limit.
  • parser callback boundary: every measured h3 server (hypercorn / aioquic, raw aioquic, quic-go, kestrel over msquic) delivers data per frame.
  • headline range, mode a N=250k: 0.83 – 2.83 µs/frame.
  • headline range, mode b N=250k: 33 – 67 µs/frame. the spread between mode a and mode b is wider for h3 than for h1 because the per-frame work includes quic stream bookkeeping that runs at frame arrival time, and paced arrivals defeat the coalescing the quic stack would otherwise do.
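http/3 encodes the frame type and length as quic variable-length integers (rfc 9000 §16), so a one-byte DATA frame costs three bytes at the http/3 layer. the python sketch below shows that encoding. it also includes a frame iterator that, like the per-frame servers above, yields one event per DATA frame. helper names are hypothetical. the iterator assumes the stream holds only complete frames. the byte counts exclude quic STREAM frame, packet, and encryption overhead.

```python
# Minimal sketch: QUIC varints (RFC 9000 §16) and HTTP/3 DATA frames (RFC 9114 §7.2.1).
H3_DATA = 0x00


def encode_varint(v: int) -> bytes:
    """Encode v in the shortest QUIC varint form (1, 2, 4, or 8 bytes)."""
    if not 0 <= v < 1 << 62:
        raise ValueError("value out of range for a QUIC varint")
    if v < 1 << 6:
        return v.to_bytes(1, "big")
    if v < 1 << 14:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 1 << 30:
        return (v | 0x8000_0000).to_bytes(4, "big")
    return (v | 0xC000_0000_0000_0000).to_bytes(8, "big")


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_pos); the top two bits of the first byte give the length."""
    length = 1 << (buf[pos] >> 6)
    value = buf[pos] & 0x3F
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def h3_data_frame(payload: bytes) -> bytes:
    return encode_varint(H3_DATA) + encode_varint(len(payload)) + payload


def iter_h3_frames(stream: bytes):
    """Yield (frame_type, payload) for each frame on a request stream."""
    pos = 0
    while pos < len(stream):
        ftype, pos = decode_varint(stream, pos)
        length, pos = decode_varint(stream, pos)
        yield ftype, stream[pos:pos + length]
        pos += length


N = 250_000
stream = b"".join(h3_data_frame(b"x") for _ in range(N))
print(len(h3_data_frame(b"x")), len(stream))   # 3 750000
print(sum(1 for _ in iter_h3_frames(stream)))   # 250000 per-frame events
```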

websocket — text/binary frames

  • wire unit: ws data frame. rfc 6455 §5.6.
  • most implementations already batch: node-ws, gorilla, rust-tungstenite, kestrel-ws complete mode a at 0.09 – 0.39 µs/frame and mode b at 5 – 11 µs/frame. they are within the cpu-cost band you would expect from a per-recv batched delivery.
  • the outlier: python asgi via uvicorn. both websockets and wsproto back-ends deliver one frame per asgi receive() call, and the asgi spec does not provide a per-frame batching primitive at the application boundary.
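this python sketch of rfc 6455 framing shows what "batching" means at this layer. it has two parts. the first is a client-side encoder. client frames must be masked, so a one-byte payload costs 7 bytes on the wire. the second is a decoder that pulls every complete frame out of a single recv() buffer. a batched server can hand all of those payloads to the application in one wake, while a per-frame api wakes the application once per frame. ws_frame and parse_frames are hypothetical helpers that skip fragmentation, control frames, and close handling.

```python
import os
import struct

OP_TEXT, OP_BINARY = 0x1, 0x2


def ws_frame(payload: bytes, opcode: int = OP_BINARY, mask: bool = True) -> bytes:
    """Encode one final (FIN=1) frame; clients must mask (RFC 6455 §5.3)."""
    head = bytes([0x80 | opcode])
    mask_bit = 0x80 if mask else 0x00
    n = len(payload)
    if n < 126:
        head += bytes([mask_bit | n])
    elif n < 1 << 16:
        head += bytes([mask_bit | 126]) + struct.pack(">H", n)
    else:
        head += bytes([mask_bit | 127]) + struct.pack(">Q", n)
    if not mask:
        return head + payload
    key = os.urandom(4)
    return head + key + bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def parse_frames(buf: bytes):
    """Decode every complete frame in buf; return (payloads, leftover bytes)."""
    payloads, pos = [], 0
    while pos + 2 <= len(buf):
        b1 = buf[pos + 1]
        n, hdr = b1 & 0x7F, 2
        if n == 126:
            hdr = 4
        elif n == 127:
            hdr = 10
        key_len = 4 if b1 & 0x80 else 0
        if pos + hdr + key_len > len(buf):
            break  # header not complete yet
        if n == 126:
            n = struct.unpack_from(">H", buf, pos + 2)[0]
        elif n == 127:
            n = struct.unpack_from(">Q", buf, pos + 2)[0]
        start = pos + hdr + key_len
        if start + n > len(buf):
            break  # payload not complete yet; wait for the next recv()
        data = buf[start:start + n]
        if key_len:
            key = buf[start - 4:start]
            data = bytes(b ^ key[i % 4] for i, b in enumerate(data))
        payloads.append(data)
        pos = start + n
    return payloads, buf[pos:]


# One recv() that happens to contain 1,000 one-byte frames.
recv_buf = b"".join(ws_frame(b"x") for _ in range(1000))
payloads, leftover = parse_frames(recv_buf)
print(len(recv_buf), len(payloads), leftover)  # 7000 1000 b''

per_frame_wakes = len(payloads)        # per-frame api: one application wake each
batched_wakes = 1 if payloads else 0   # batched delivery: one wake for the whole recv
print(per_frame_wakes, batched_wakes)  # 1000 1
```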

why the http/1 mitigation does not generalize

deploying http/1 services behind nginx with proxy_request_buffering on collapses the request body into a single content-length-framed upstream request. that mitigation does not exist in the same form for http/2 or http/3 because:

  • the upstream from a typical h2 reverse proxy to the origin is itself h2 (or h2c). aggregating DATA frames before forwarding to the origin would require the proxy to re-frame, which most proxies do not do by default.
  • the same is true for h3 / quic upstreams.

so for h2 and h3 the mitigation has to live either:

  • at the origin server, by imposing a per-stream frame credit before waking the application, or
  • in a frontend that explicitly terminates the framing and re-frames upstream (e.g. a load balancer doing h2 → h1 with proxy_request_buffering on).
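below is one possible reading of "per-stream frame credit" at the origin, sketched in python. DATA payloads are buffered per stream, and the application is woken only after a budget of frames or bytes is used up or the stream ends. CoalescingStream and both thresholds (1,024 frames, 16 KiB) are hypothetical choices for illustration. the batching h2 implementations measured above may use different rules.

```python
class CoalescingStream:
    """Hypothetical per-stream body buffer with a frame budget and a byte budget.

    DATA payloads accumulate until the stream has used `frame_credit` frames,
    buffered `wake_bytes` bytes, or ended; only then is the application woken.
    """

    def __init__(self, on_body, frame_credit: int = 1024, wake_bytes: int = 16_384):
        self.on_body = on_body  # application callback: on_body(data, end_stream)
        self.frame_credit = frame_credit
        self.wake_bytes = wake_bytes
        self._buf = bytearray()
        self._frames = 0

    def on_data_frame(self, payload: bytes, end_stream: bool = False) -> None:
        """Called by the protocol layer once per DATA frame on this stream."""
        self._buf += payload
        self._frames += 1
        if end_stream or self._frames >= self.frame_credit or len(self._buf) >= self.wake_bytes:
            self.on_body(bytes(self._buf), end_stream)
            self._buf.clear()
            self._frames = 0


wakes = []
stream = CoalescingStream(lambda data, end: wakes.append(len(data)))
N = 250_000
for i in range(N):  # N one-byte DATA frames, the last one carrying END_STREAM
    stream.on_data_frame(b"x", end_stream=(i == N - 1))
print(len(wakes), sum(wakes))  # 245 250000
```

with full-size 16 KiB frames the byte budget fires on every frame, so well-behaved clients see no extra buffering delay. with one-byte frames the application's wake rate drops by the size of the frame budget.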

why the http/1 attack does not reach the application behind nginx

with the default nginx config, the upstream receives the headers and the entire content-length-framed body in one recv(). an application handler reading the request body sees one event, not N. this is the empirical reason this paper does not headline an end-to-end production exploit against, for example, a python web app deployed behind nginx: the N-chunk shape never reaches the python parser.

with proxy_request_buffering off (and proxy_http_version 1.1, since otherwise nginx buffers a chunked request body regardless of that directive), the same setup forwards the chunked stream unchanged and the per-chunk cost reaches the upstream.
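a few lines of python are enough to model the buffering behaviour: decode the chunked body, then emit a single request framed by Content-Length. this is a hypothetical model, not nginx's implementation. dechunk skips chunk extensions and ignores trailers. the HTTP/1.0 request line mirrors nginx's default proxy_http_version of 1.0.

```python
def dechunk(body: bytes) -> bytes:
    """Decode a well-formed chunked body; extensions skipped, trailers ignored."""
    out, pos = bytearray(), 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";")[0], 16)
        if size == 0:
            return bytes(out)
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip data and its trailing CRLF


def rebuffer(method: str, target: str, host: str, chunked_body: bytes,
             version: str = "1.0") -> bytes:
    """Hypothetical buffering-proxy model: one Content-Length-framed upstream request."""
    payload = dechunk(chunked_body)
    head = (
        f"{method} {target} HTTP/{version}\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + payload


N = 250_000
client_body = b"1\r\nx\r\n" * N + b"0\r\n\r\n"  # N one-byte chunks from the client
upstream = rebuffer("POST", "/upload", "backend", client_body)
head, _, body = upstream.partition(b"\r\n\r\n")
print(head.decode())
print(len(body), b"Transfer-Encoding" in head)  # 250000 False
```

whatever framing the client used, the upstream parser reads one header block and a fixed-length body, with no chunk boundaries left to dispatch on.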

summary table

protocol | wire unit | flow control unit | most-affected impls | least-affected impls
http/1.1 | chunk | none (tcp) | nginx as origin, waitress, tornado, gunicorn-sync, bandit | kestrel-9 (application-batched)
http/2 | DATA frame | bytes (WINDOW_UPDATE) | nginx-h2, kestrel-h2, granian-h2, hypercorn-h2 | vertx-h2, rust-hyper-h2, actix-h2
http/3 | DATA frame | bytes (quic stream/conn) | hypercorn-h3, kestrel-h3 | (none measured below paced cpu threshold)
websocket | text/binary frame | none | uvicorn + websockets / wsproto | node-ws, gorilla, rust-tungstenite, kestrel-ws

why this is consistent across protocols

the parser/dispatcher boundary is wherever decoded body bytes cross from the i/o-driven byte stream into the application’s read api. on most event-loop http servers measured here, that boundary is, by default, one event per wire unit; the exceptions are the batching implementations noted above. http/1, http/2, http/3, and websocket all have wire units that an attacker can make arbitrarily small under standards-compliant rules: none of the specs sets a lower bound on chunk size, DATA frame size, or ws frame payload size.

a per-frame application credit (or a System.IO.Pipelines-style application batching layer) is the only known design that decouples the application’s wake rate from the attacker’s framing rate. the paper argues every event-loop http server should expose this as an opt-in primitive.
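here is a python sketch of the pull-style batching idea, using asyncio. the protocol layer appends each decoded frame to a per-request buffer and sets an event. the application's read() then returns everything buffered since its last call, so repeated feeds within one event-loop turn coalesce into a single application wake. BatchingBody and simulate are hypothetical names. the design follows the spirit of System.IO.Pipelines but does not reproduce its api.

```python
import asyncio


class BatchingBody:
    """Hypothetical request-body reader: wake rate follows recv() batches, not frames."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._ready = asyncio.Event()
        self._eof = False

    def feed(self, data: bytes) -> None:
        """Called by the protocol layer once per decoded wire unit."""
        self._buf += data
        self._ready.set()  # setting an already-set event is a no-op

    def feed_eof(self) -> None:
        self._eof = True
        self._ready.set()

    async def read(self) -> bytes:
        """Return all bytes buffered since the last read, or b"" at end of body."""
        while not self._buf and not self._eof:
            self._ready.clear()
            await self._ready.wait()
        data = bytes(self._buf)
        self._buf.clear()
        return data


async def simulate(frames_per_recv: int, recvs: int) -> tuple[int, int]:
    """Feed `recvs` batches of one-byte frames; return (application wakes, bytes read)."""
    body = BatchingBody()
    wakes = total = 0

    async def app() -> None:
        nonlocal wakes, total
        while chunk := await body.read():
            wakes += 1
            total += len(chunk)

    task = asyncio.create_task(app())
    for _recv in range(recvs):
        for _frame in range(frames_per_recv):  # frames decoded from one recv()
            body.feed(b"x")
        await asyncio.sleep(0)  # return to the event loop between recvs
    body.feed_eof()
    await task
    return wakes, total


print(asyncio.run(simulate(frames_per_recv=1_000, recvs=250)))  # (250, 250000)
```

per-unit dispatch would wake the application 250,000 times on the same input, while this reader wakes once per recv() batch. if frames are paced so that each recv() carries a single frame, the reader on its own still wakes once per frame. a frame or byte threshold like the per-stream credit described earlier is what bounds that case.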
