chunkloris: http/1 vs http/2 vs http/3
one line
per-chunk amplification is a parser/dispatcher boundary property, not an http/1-specific one: every protocol that delivers a body as a sequence of framed units inherits it. http/1 has chunks. http/2 has data frames. http/3 has data frames over quic streams. websocket has text/binary frames. the attack shape is the same.
the four wire formats studied. each allows arbitrarily small per-unit payloads under the spec.
what changes per protocol
per-frame server cpu cost across protocols at mode a, N = 250,000, 1 vcpu container.
http/1.1 — chunked transfer encoding
- wire unit: chunk, <hex-size>\r\n<bytes>\r\n per rfc 9112 §7.1.
- flow control: none. tcp backpressure only.
- parser callback boundary: per chunk in most servers (httptools, llhttp, hyper’s httparse-driven parser, node’s _http_common, etc.). a few servers batch at the application boundary (kestrel via System.IO.Pipelines).
- headline range, mode b N=250k: 3.6 µs/chunk (kestrel) to 113.6 µs/chunk (nginx as origin). median ≈ 12.4 µs.
- upstream mitigation that works: default nginx proxy_request_buffering on collapses N chunks into one content-length-framed upstream request. apache mod_proxy_http is documented to behave the same way.
- upstream mitigation that does not work: haproxy by default streams. http-buffer-request buffers up to one buffer’s worth (tune.bufsize, default 16 kib) and then forwards the rest chunk by chunk.
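
to make the wire unit concrete, here is a minimal python sketch of a body split into fixed-size chunks, plus the arithmetic behind the headline per-chunk figures. the helper names encode_chunked and cpu_seconds are hypothetical and are not taken from the paper's harness; the cost helper assumes every chunk costs the same, which is a simplification.

```python
def encode_chunked(body: bytes, chunk_size: int = 1) -> bytes:
    """encode body as an http/1.1 chunked stream with fixed-size chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk)  # chunk size in hex, then CRLF
        out += chunk + b"\r\n"         # chunk data, then CRLF
    out += b"0\r\n\r\n"                # last-chunk, no trailer fields
    return bytes(out)


def cpu_seconds(n_units: int, micros_per_unit: float) -> float:
    """total cpu time if every wire unit costs micros_per_unit (assumed uniform)."""
    return n_units * micros_per_unit / 1_000_000


print(encode_chunked(b"abc"))  # b'1\r\na\r\n1\r\nb\r\n1\r\nc\r\n0\r\n\r\n'

# a one-byte chunk is 6 bytes on the wire: "1\r\n" + payload + "\r\n"
print(len(encode_chunked(b"x" * 250_000)))

# range implied by the headline figures above at N = 250,000
print(cpu_seconds(250_000, 3.6), cpu_seconds(250_000, 113.6))
```

at N = 250,000 the headline range works out to roughly 0.9 s to 28.4 s of server cpu for a 250 kB body that costs the sender about 1.5 MB of wire bytes.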
http/2 — data frames over a single tcp connection
- wire unit: DATA frame. rfc 9113 §6.1.
- flow control: byte-level only. WINDOW_UPDATE bounds bytes; nothing bounds frames. a peer can legally send N one-byte DATA frames within a default 65,535-byte window. some implementations independently impose a “frame size lower bound” or count frames against a per-stream credit; the measurement matrix in the paper records which.
- goaway-on-tiny-frames: three of the 17 measured h2c servers (cowboy / phoenix, spring-tomcat) abort the request with GOAWAY error code 11 (ENHANCE_YOUR_CALM) before fully consuming the tiny-frame body. this is the cleanest in-protocol mitigation we observed; it is not a spec requirement.
- batches correctly: vertx-h2, rust-hyper-h2, actix-h2 complete the matrix below the paced cpu threshold under mode b. these implementations either apply a per-stream frame credit or coalesce frame payloads into per-window reads before waking the application.
- headline range, mode b N=250k: 3.7 µs/frame (rust-hyper-h2) to 103.5 µs/frame (nginx-h2). 11 of 17 fall in the per-frame band.
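
the byte-only flow control described above can be checked with a short python sketch. it builds a DATA frame by hand from the 9-byte frame header layout in rfc 9113 and counts how many one-byte frames fit in the default window. the function name data_frame is hypothetical; no h2 library is used, and connection setup (preface, SETTINGS, HEADERS) is left out.

```python
import struct

DATA = 0x0                # frame type, rfc 9113 §6.1
END_STREAM = 0x1          # flag bit on DATA frames
DEFAULT_WINDOW = 65_535   # initial flow-control window in bytes


def data_frame(stream_id: int, payload: bytes, end_stream: bool = False) -> bytes:
    """serialize one http/2 DATA frame: 9-byte header followed by the payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit the 24-bit length field")
    length = struct.pack(">I", len(payload))[1:]  # low 3 bytes = 24-bit length
    flags = END_STREAM if end_stream else 0
    rest = struct.pack(">BBI", DATA, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


frame = data_frame(1, b"x")
print(frame.hex())  # 00000100000000000178

# flow control charges only the payload, so the window admits one frame per byte
frames_in_window = DEFAULT_WINDOW // 1
wire_bytes = frames_in_window * len(frame)
print(frames_in_window, wire_bytes)  # 65535 655350
```

the sender pays ten wire bytes per frame, and the receiver's window accounting sees only one of them.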
http/3 — data frames over quic streams
- wire unit: DATA frame on a quic bidi stream. rfc 9114 §7.2.1.
- flow control: quic has byte-level stream and connection flow control. again no frame-level limit.
- parser callback boundary: every measured h3 server (hypercorn / aioquic, raw aioquic, quic-go, kestrel over msquic) delivers data per frame.
- headline range, mode a N=250k: 0.83 – 2.83 µs/frame.
- headline range, mode b N=250k: 33 – 67 µs/frame. the spread between mode a and mode b is wider for h3 than for h1 because the per-frame work includes quic stream bookkeeping that runs at frame arrival time, and paced arrivals defeat the coalescing the quic stack would otherwise do.
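
for comparison with the other wire formats, an http/3 DATA frame is a type and a length, both quic variable-length integers (rfc 9000 §16), followed by the payload. the python sketch below encodes one by hand. the names quic_varint and h3_data_frame are hypothetical, and the quic STREAM frame and packet overhead that would wrap these bytes is not modelled.

```python
def quic_varint(value: int) -> bytes:
    """encode a quic variable-length integer (2-bit length prefix, big-endian)."""
    if value < 0:
        raise ValueError("varints are unsigned")
    if value < 1 << 6:
        return value.to_bytes(1, "big")                           # prefix 00
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")                # prefix 01
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")           # prefix 10
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big") # prefix 11
    raise ValueError("value does not fit in 62 bits")


def h3_data_frame(payload: bytes) -> bytes:
    """serialize one http/3 DATA frame: type 0x00, length, payload."""
    return quic_varint(0x00) + quic_varint(len(payload)) + payload


print(h3_data_frame(b"x"))       # b'\x00\x01x'
print(quic_varint(15293).hex())  # 7bbd, the two-byte example from rfc 9000
```

a one-byte DATA frame is three bytes at the http/3 layer, so the framing overhead the sender pays is even smaller than for http/1 chunks or http/2 frames.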
websocket — text/binary frames
- wire unit: ws data frame. rfc 6455 §5.6.
- most implementations already batch: node-ws, gorilla, rust-tungstenite, kestrel-ws complete mode a at 0.09 – 0.39 µs/frame and mode b at 5 – 11 µs/frame. they are within the cpu-cost band you would expect from a per-recv batched delivery.
- the outlier: python asgi via uvicorn. both websockets and wsproto back-ends deliver one frame per asgi receive() call, and the asgi spec does not provide a per-frame batching primitive at the application boundary.
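
the websocket wire unit can be sketched the same way. the python below serializes a single masked client-to-server frame following the base framing layout in rfc 6455 §5.2. the function name ws_client_frame is hypothetical, and the opening handshake, fragmentation, and control frames are out of scope.

```python
import os
import struct
from typing import Optional


def ws_client_frame(payload: bytes, opcode: int = 0x2,
                    mask_key: Optional[bytes] = None) -> bytes:
    """serialize one unfragmented, masked websocket frame (0x1 text, 0x2 binary)."""
    key = os.urandom(4) if mask_key is None else mask_key
    if len(key) != 4:
        raise ValueError("mask key must be 4 bytes")
    head = bytearray([0x80 | opcode])       # FIN bit set, then the opcode
    n = len(payload)
    if n < 126:
        head.append(0x80 | n)               # MASK bit plus 7-bit length
    elif n < 1 << 16:
        head.append(0x80 | 126)
        head += struct.pack(">H", n)        # 16-bit extended length
    else:
        head.append(0x80 | 127)
        head += struct.pack(">Q", n)        # 64-bit extended length
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes(head) + key + masked


# fixed key only so the output is reproducible; real clients use a random key
frame = ws_client_frame(b"x", mask_key=b"\x00\x00\x00\x00")
print(frame, len(frame))  # b'\x82\x81\x00\x00\x00\x00x' 7
```

a one-byte binary frame costs the client seven wire bytes, and a per-frame server delivers each one as its own application event.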
why the http/1 mitigation does not generalize
deploying http/1 services behind nginx with proxy_request_buffering on
collapses the request body into a single content-length-framed upstream
request. that mitigation does not exist in the same form for http/2 or
http/3 because:
- the upstream from a typical h2 reverse proxy to the origin is itself h2 (or h2c). aggregating DATA frames before forwarding to the origin would require the proxy to re-frame, which most proxies do not do by default.
- the same is true for h3 / quic upstreams.
so for h2 and h3 the mitigation has to live either:
- at the origin server, by imposing a per-stream frame credit before waking the application, or
- in a frontend that explicitly terminates the framing and re-frames upstream (e.g. a load balancer doing h2 → h1 with proxy_request_buffering on).
why the http/1 attack does not reach the application behind nginx
with default nginx config the upstream receives the headers and the entire
content-length-framed body in one recv(). an application handler reading the
request body sees one event, not N. this is the empirical reason this paper
does not headline an end-to-end production exploit against, for example, a
python web app deployed behind nginx: the N-chunk shape never reaches the
python parser.
with proxy_request_buffering off the same setup forwards the chunked stream
unchanged and the per-chunk cost reaches the upstream.
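
the effect of request buffering can be illustrated with a small python sketch that decodes a chunked body in full and re-emits it with a content-length header. this is not nginx's implementation; it only shows why the upstream no longer sees N units. the function names, the request line, and the host origin.example are all hypothetical, and trailers are ignored.

```python
def decode_chunked(stream: bytes) -> bytes:
    """decode a complete http/1.1 chunked body; trailers are ignored."""
    body = bytearray()
    pos = 0
    while True:
        eol = stream.index(b"\r\n", pos)
        size_field = stream[pos:eol].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)
        body += stream[pos:pos + size]
        if stream[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        pos += size + 2


def reframe_with_content_length(request_line: str, host: str,
                                chunked_body: bytes) -> bytes:
    """buffer the whole body, then emit one content-length-framed request."""
    body = decode_chunked(chunked_body)
    head = f"{request_line}\r\nHost: {host}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body


chunked = b"1\r\na\r\n1\r\nb\r\n1\r\nc\r\n0\r\n\r\n"
print(reframe_with_content_length("POST /upload HTTP/1.1", "origin.example", chunked))
```

the three one-byte chunks leave the sketch as a single request with Content-Length: 3, so the per-chunk cost is paid once at the proxy and not again at the origin.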
summary table
| protocol | wire unit | flow control unit | most-affected impls | least-affected impls |
|---|---|---|---|---|
| http/1.1 | chunk | none (tcp) | nginx as origin, waitress, tornado, gunicorn-sync, bandit | kestrel-9 (application-batched) |
| http/2 | data frame | bytes (WINDOW_UPDATE) | nginx-h2, kestrel-h2, granian-h2, hypercorn-h2 | vertx-h2, rust-hyper-h2, actix-h2 |
| http/3 | data frame | bytes (quic stream/conn) | hypercorn-h3, kestrel-h3 | (none measured below paced cpu threshold) |
| websocket | text/binary frame | none | uvicorn + websockets / wsproto | node-ws, gorilla, rust-tungstenite, kestrel-ws |
why this is consistent across protocols
the parser/dispatcher boundary is wherever decoded body bytes cross from the
i/o-driven byte stream into the application’s read api. on most event-loop
servers measured here that boundary is, by default, one event per wire
unit. http/1, http/2, http/3, and websocket all have wire units that the
attacker can make arbitrarily small under standards-compliant rules: the specs
give no lower bound on chunk size, on DATA frame size, or on ws frame payload size.
a per-frame application credit (or a System.IO.Pipelines-style application
batching layer) is the only known design that decouples the application’s
wake rate from the attacker’s framing rate. the paper argues every event-loop
http server should expose this as an opt-in primitive.
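
a minimal python sketch of such a batching layer, under stated assumptions: the class name CoalescingBody, the min_bytes threshold, and the callback shape are hypothetical and are not the paper's proposed api or the System.IO.Pipelines api. the parser still handles every wire unit; only the application wake-up is decoupled from the framing rate. a real server would also need a flush timer so a slow sender cannot stall delivery indefinitely, which is omitted here.

```python
from typing import Callable


class CoalescingBody:
    """buffer wire-unit payloads and wake the application once per min_bytes."""

    def __init__(self, on_data: Callable[[bytes], None], min_bytes: int = 16_384):
        self._on_data = on_data
        self._min_bytes = min_bytes
        self._buf = bytearray()
        self.wakeups = 0  # number of application callbacks made so far

    def feed(self, payload: bytes, end_of_body: bool = False) -> None:
        """called by the parser once per chunk or frame."""
        self._buf += payload
        if len(self._buf) >= self._min_bytes or (end_of_body and self._buf):
            self._flush()

    def _flush(self) -> None:
        data = bytes(self._buf)
        self._buf.clear()
        self.wakeups += 1
        self._on_data(data)


received = []
body = CoalescingBody(received.append)
n = 250_000
for i in range(n):
    body.feed(b"x", end_of_body=(i == n - 1))  # one-byte wire units

print(body.wakeups, sum(len(part) for part in received))  # 16 250000
```

with a 16 KiB threshold the application is woken 16 times for 250,000 one-byte units, so its wake rate follows the byte count and not the attacker's choice of framing.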