The Windows IOCTL Census: A Corpus-Scale, Multi-Architecture Database of the Driver Control-Code Surface

paper
authors: Bommarito, M. J., II
year: 2026
venue: Working paper (draft)
details: Draft manuscript. A Windows driver exposes its kernel through I/O control (IOCTL) codes, and a single unchecked length on the buffer behind one of them can turn an unprivileged call into a kernel write. The community has strong scanners for this surface and a curated list of known-bad drivers, but no map of the surface itself. The Windows IOCTL Census builds that map: a queryable DuckDB database of the control-code dispatch surface of 27,087 signed Windows drivers, recovered by one deterministic, architecture-neutral pass with no symbolic execution. Reading a lifted intermediate representation instead of running a symbolic engine lets it recover a dispatch surface for 80% of the corpus across x86 and x64, including the 32-bit half that existing scanners abort on. On the 64-bit lane it adds handler reachability, taint, and the call graph; an LLM ranks the reachable handlers for triage. Headline aggregates: 8.18M functions, 15.95M call edges, 3.1M decoded IOCTL codes, 848K handlers, 63,263 taint sinks. The structural tier is released as a public dataset on Hugging Face; a companion Linux IOCTL Census applies the same method to the Linux ioctl surface. Companion to Needles at Scale (function-level target selection). Not for citation.
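
For readers unfamiliar with what a "decoded IOCTL code" contains, the sketch below unpacks a 32-bit control code into the four fields defined by the standard Windows `CTL_CODE` layout (device type, required access, function number, transfer method). It is an illustration of that public layout only, not the Census's own decoder; the names `decode_ioctl` and `IoctlFields` are hypothetical.

```python
from typing import NamedTuple

# Field names follow the Windows CTL_CODE macro; index = raw field value.
METHODS = ("METHOD_BUFFERED", "METHOD_IN_DIRECT",
           "METHOD_OUT_DIRECT", "METHOD_NEITHER")
ACCESS = ("FILE_ANY_ACCESS", "FILE_READ_ACCESS",
          "FILE_WRITE_ACCESS", "FILE_READ_ACCESS|FILE_WRITE_ACCESS")


class IoctlFields(NamedTuple):
    """Hypothetical container for the four CTL_CODE fields."""
    device_type: int
    access: str
    function: int
    method: str


def decode_ioctl(code: int) -> IoctlFields:
    """Split a 32-bit IOCTL code into its CTL_CODE fields.

    Layout: bits 31-16 device type, 15-14 access,
    13-2 function number, 1-0 transfer method.
    """
    code &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit value
    return IoctlFields(
        device_type=(code >> 16) & 0xFFFF,
        access=ACCESS[(code >> 14) & 0x3],
        function=(code >> 2) & 0xFFF,
        method=METHODS[code & 0x3],
    )


if __name__ == "__main__":
    # 0x222003 = CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_NEITHER, FILE_ANY_ACCESS)
    print(decode_ioctl(0x222003))
```

The transfer method is the field most relevant to triage: with `METHOD_NEITHER` the driver receives raw user-mode pointers and has to validate them itself, which is where an unchecked length is most likely to matter.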


citation

Bommarito, M. J., II (2026). The Windows IOCTL Census: A Corpus-Scale, Multi-Architecture Database of the Driver Control-Code Surface. Working paper (draft). Not for citation.