live kernel-debugging windows drivers in qemu from linux (no windbg, no whpx, no exdi)

a note on how we actually breakpoint windows kernel drivers, because the question keeps coming up in this exact shape: someone runs a qemu windows guest, points windbg at it over kdnet or exdi, fights it for hours, finally gets symbols to resolve, and then no breakpoint ever fires. the usual follow-on guess is “is whpx the problem?”

the short answer is that the whole windbg-over-the-network path is the hard way, and the breakpoint that never fires is a specific, well-understood failure that has nothing to do with whpx or symbols. we debug windows drivers from a linux host with nothing but qemu and plain gdb, attached to the qemu gdbstub. no windbg, no kdnet, no exdi, no whpx. what follows is the recipe and the handful of rules that make it reliable, each of which we learned by getting it wrong first.

why not windbg / kdnet / exdi / whpx

nothing here is a knock on windbg. it is the better tool once it is attached. the issue is everything in front of “attached.”

whpx is the windows hypervisor platform accelerator. on consumer parts (the question that started this was an i7-9700-class part) the advanced virtualization settings whpx wants are frequently unavailable or half-exposed, and qemu either fails at launch or runs degraded. you end up debugging the accelerator instead of the driver. if you have a linux box, kvm has none of this friction.
kdnet ships kernel-debug packets over udp to a windbg client. it works, but it adds a second machine (or a wine windbg), a udp tunnel, a debug key, and a guest-side "bcdedit /debug on, then reboot" dance before you can set a single breakpoint.
exdi is the generic “bring your own debug transport” bridge. it can sit on top of the qemu gdbstub, but in practice it is where the “symbols resolved after hours, now nothing breaks” reports come from: you are debugging the bridge’s view of the target, not the target.

the qemu gdbstub is already a kernel debug transport. gdb speaks it natively. so skip the bridge and talk to it directly.

the setup: qemu/kvm on linux with the gdbstub on

run the windows 11 guest under kvm and add one flag: -gdb tcp:127.0.0.1:1234. the parts that matter for debugging are the accelerator (kvm, not whpx), a qmp control socket so you can resume/reset the vm out of band, and the gdbstub. a trimmed launch:

qemu-system-x86_64 \
  -enable-kvm -machine q35,accel=kvm,smm=on \
  -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,hv_runtime,hv_time,hv_synic,hv_stimer,hv_frequencies \
  -smp 4 -m 6144 \
  -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE_4M.fd \
  -drive if=pflash,format=raw,file=win11.OVMF_VARS.fd \
  -drive id=osdisk,file=win11.qcow2,if=none,format=qcow2,cache=none,discard=unmap \
  -device ide-hd,bus=ide.0,drive=osdisk \
  -device virtio-net-pci,netdev=net0 \
  -netdev user,id=net0,hostfwd=tcp::2231-:22,hostfwd=tcp::3396-:3389 \
  -device qemu-xhci,id=xhci -device usb-tablet,bus=xhci.0 -device usb-kbd,bus=xhci.0 \
  -qmp unix:/tmp/byovd-qmp.sock,server,nowait \
  -gdb tcp:127.0.0.1:1234 \
  -vnc 127.0.0.1:9 -serial file:guest-serial.log

a few operational choices that saved us grief, all learned from a vm that kept getting reaped or thrashing the host:

launch qemu as a transient systemd unit (sudo systemd-run --unit=... -p MemoryMax=10G -p MemorySwapMax=0 qemu-system-x86_64 ...) rather than nohup ... & from a tool shell. backgrounded from an ephemeral shell, qemu gets reaped with the shell’s cgroup; as a system unit it lives in its own cgroup owned by init and survives every shell. the memory cap with swap off means a runaway guest can never thrash the host.
control the vm over qmp, not in-guest shutdown. reset / powerdown / status / snapshot all go through the qmp unix socket. in-guest shutdown /r over ssh is unreliable; system_reset over qmp is not.
drop the emulated tpm for the debug vm. the tpm is an install-time gate; windows 11 boots fine without it at runtime, and a saved swtpm state that fails to re-init is just one more flaky dependency. install with the tpm, debug without it.
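
a minimal sketch of the qmp side, assuming the socket path from the launch line above. the helper names (qmp_request, qmp_read_reply, qmp_run, qmp) are ours for illustration and are not part of qemu; the commands themselves (qmp_capabilities, cont, system_reset, query-status) are standard qmp.

```python
import json
import socket


def qmp_request(name, **arguments):
    """Encode one QMP command as a newline-terminated JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode()


def qmp_read_reply(reader):
    """Return the next non-event message; raise if QEMU reports an error."""
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        msg = json.loads(line)
        if "event" in msg:
            continue  # asynchronous events can arrive between replies
        if "error" in msg:
            raise RuntimeError(msg["error"])
        return msg


def qmp_run(sock, name, **arguments):
    """On a freshly connected socket: greeting, capabilities, one command."""
    reader = sock.makefile("rb")
    qmp_read_reply(reader)                      # greeting: {"QMP": {...}}
    sock.sendall(qmp_request("qmp_capabilities"))
    qmp_read_reply(reader)                      # {"return": {}}
    sock.sendall(qmp_request(name, **arguments))
    return qmp_read_reply(reader).get("return")


def qmp(path, name, **arguments):
    """Connect to the QMP unix socket at `path` and run one command."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return qmp_run(sock, name, **arguments)
```

usage would look like qmp("/tmp/byovd-qmp.sock", "system_reset") or qmp("/tmp/byovd-qmp.sock", "query-status"). one connection per command keeps the helper stateless, which is the simplest shape for a harness that only pokes the vm occasionally.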

with that running, attaching is one line of gdb:

gdb -batch -nx -x your-script.gdb
# where the script begins:
#   set pagination off
#   set confirm off
#   target remote 127.0.0.1:1234

rule 1 (the big one): hardware breakpoints only

this is the fix for “symbols resolved, but no breakpoint ever hits.”

a normal gdb break is a software breakpoint: it writes a 0xCC byte into the instruction stream at the target address. on windows kernel drivers this silently fails, for a reason that has nothing to do with symbols or the debugger:

the PAGE-attributed section of a windows driver is pageable. break writes 0xCC into a page that may not be resident. when that page is later faulted in fresh from the image file, your 0xCC is gone. the breakpoint fires once at most, then never again, and usually never at all.

so it presents exactly as the symptom people report: you can resolve the symbol, you can set the breakpoint, gdb says it is set, and execution sails right past it. people blame whpx, or symbols, or the bridge. it is none of those. it is the 0xCC getting paged away.
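
a toy model of that failure, to make the mechanism concrete. this is not how the windows memory manager is implemented; the class and method names are invented for illustration, and the only behavior modeled is the one described above: a patch lives in the resident copy, and a fresh fault-in restores the bytes from the image file.

```python
INT3 = 0xCC


class ToyPageablePage:
    """One pageable code page backed by pristine bytes from the image file."""

    def __init__(self, image_bytes, resident=True):
        self.image = bytes(image_bytes)                      # bytes on disk
        self.ram = bytearray(self.image) if resident else None

    def software_break(self, offset):
        """Model gdb `break`: patch INT3 into RAM, only if the page is resident."""
        if self.ram is None:
            return False
        self.ram[offset] = INT3
        return True

    def trim(self):
        """The page leaves RAM; the patched copy is simply dropped."""
        self.ram = None

    def fault_in(self):
        """The page comes back fresh from the image file, without the patch."""
        self.ram = bytearray(self.image)

    def would_trap(self, offset):
        """True if executing at `offset` would hit an INT3 byte."""
        if self.ram is None:
            self.fault_in()
        return self.ram[offset] == INT3


page = ToyPageablePage(b"\x48\x89\xc8\xc3")
page.software_break(0)
print(page.would_trap(0))   # patched and still resident
page.trim()
print(page.would_trap(0))   # after trim and fault-in the patch is gone
```

the second print is the "breakpoint never hits" report in miniature: nothing about symbols or the transport changed, only the bytes under the breakpoint.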

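use hardware breakpoints instead. hbreak programs the cpu debug registers (DR0-DR3) to match on the virtual address of the instruction fetch. nothing is written into guest memory, so there is nothing to page away: the page can be trimmed and faulted back in as often as it likes, and the breakpoint still fires the moment that address executes. it just works on pageable kernel code.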

hbreak *0xfffff8025e9f1983

the one cost: there are only four debug registers, so four hardware breakpoints at a time. that is plenty for confirming a primitive; budget them.
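
one way to enforce that budget is to generate the gdb script and refuse a fifth breakpoint. a small sketch, assuming the stub address from the launch line; render_gdb_script is a hypothetical helper name, not a gdb feature.

```python
MAX_HW_BREAKPOINTS = 4  # DR0-DR3


def render_gdb_script(addresses, remote="127.0.0.1:1234"):
    """Build a gdb script that sets only hardware breakpoints, then continues."""
    addresses = list(addresses)
    if len(addresses) > MAX_HW_BREAKPOINTS:
        raise ValueError(
            f"{len(addresses)} breakpoints requested, "
            f"only {MAX_HW_BREAKPOINTS} debug registers available"
        )
    lines = [
        "set pagination off",
        "set confirm off",
        f"target remote {remote}",
    ]
    for addr in addresses:
        lines.append(f"hbreak *{addr:#x}")  # never plain `break`
    lines.append("continue")
    return "\n".join(lines) + "\n"
```

write the result to a file and pass it to gdb -batch -nx -x as shown earlier. failing at generation time is cheaper than discovering mid-capture that a fifth breakpoint was never set.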

rule 2: resolve the base without pdbs, and keep it stable

you do not need windows symbols to set hbreak *<address>. you need the driver’s load base, and you add the rva your disassembler shows (file va minus the pe imagebase). two wrinkles:

aslr re-randomizes the base every boot. if you cache a base across a reboot or an sc stop/start, your hbreaks point at stale addresses and silently never fire. re-resolve per capture.
keep the driver resident with a stable base for the boot by making it a system-start service (sc config <drv> start= system). no unload/reload churn to strand your breakpoints mid-session.

resolve the base from inside the guest with a tiny psapi agent over ssh, no symbols, no pdb, no race:

Add-Type @"
using System;using System.Runtime.InteropServices;using System.Text;
public class K{
 [DllImport("psapi")] public static extern bool EnumDeviceDrivers(IntPtr[] b,int c,out int n);
 [DllImport("psapi",CharSet=CharSet.Unicode)] public static extern int GetDeviceDriverBaseNameW(IntPtr i,StringBuilder n,int s);
}
"@
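# first call with an empty buffer only asks how many bytes the base array needs
$need=0;[K]::EnumDeviceDrivers($null,0,[ref]$need)|Out-Null
$cnt=[int]($need/[IntPtr]::Size);$arr=New-Object IntPtr[] $cnt
# second call fills $arr with the load base of every resident driver
[K]::EnumDeviceDrivers($arr,$need,[ref]$need)|Out-Null
# match on the driver file name and print its base as BASE=0x<hex>
foreach($a in $arr){$sb=New-Object Text.StringBuilder 260
 [K]::GetDeviceDriverBaseNameW($a,$sb,260)|Out-Null
 if($sb.ToString() -like '*YOURDRV*'){ "BASE=0x{0:x}" -f $a.ToInt64() }}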

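EnumDeviceDrivers returns the resident virtual base of every loaded driver. add your rva and you have the exact address for hbreak. if every base comes back as zero, the agent is running without enough privilege; run it from an elevated session. (note that the 0xCC paging problem affects software breakpoints only. resolving the base is unaffected; it is just an address.)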
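
the arithmetic on the host side, as a short sketch. parse_base and live_address are hypothetical helper names, and the numbers in the usage note are made up for illustration; they are not the base of any driver discussed here.

```python
import re

# The in-guest agent prints one line of the form BASE=0x<hex>.
BASE_RE = re.compile(r"^BASE=0x([0-9a-fA-F]+)\s*$", re.MULTILINE)


def parse_base(agent_output):
    """Extract the live load base from the agent's output."""
    match = BASE_RE.search(agent_output)
    if match is None:
        raise ValueError("no BASE= line in agent output")
    return int(match.group(1), 16)


def live_address(live_base, file_va, pe_image_base):
    """Live address = load base + rva, where rva = file va - PE ImageBase."""
    rva = file_va - pe_image_base
    if rva < 0:
        raise ValueError("file va is below the PE ImageBase")
    return live_base + rva
```

with a made-up agent line BASE=0xfffff80212340000, a disassembler va of 0x140001983 and a pe imagebase of 0x140000000, the rva is 0x1983 and the hbreak target is 0xfffff80212341983. call parse_base on fresh agent output for every capture rather than caching the result, for the aslr reason above.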

rule 3: gdbstub housekeeping that bites everyone

three small things, each of which cost a real afternoon:

the gdbstub is single-client. a stale gdb holds the stub and every later attach gets “packet error.” kill stale gdb before attaching.
killing gdb with -9 leaves the vm halted. detach cleanly, or always issue a qmp cont afterward. wrap it in a finally: so a timeout or crash in your harness still resumes the guest.
never raw-probe the gdb tcp port. a bare connect() to :1234 (a port scan, a nc, a liveness check) halts the vm. only gdb should ever touch that port.
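
the "always resume" rule as a harness sketch. run_capture is a hypothetical name, and resume_guest stands for whatever sends qmp cont in your setup; it is passed in as a callable so the sketch stays independent of any particular qmp client.

```python
import subprocess


def run_capture(gdb_argv, resume_guest, timeout=60):
    """Run one gdb capture; resume the guest afterwards no matter what.

    Returns (output, timed_out).
    """
    proc = subprocess.Popen(
        gdb_argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    timed_out = False
    try:
        try:
            output, _ = proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            timed_out = True
            proc.kill()                    # the kill that leaves the vm halted
            output, _ = proc.communicate()
        return output, timed_out
    finally:
        # Runs on success, timeout, or any exception raised above.
        resume_guest()
```

a call would look like run_capture(["gdb", "-batch", "-nx", "-x", "your-script.gdb"], resume_guest=send_cont), where send_cont is your own function. because the process is always reaped before the function returns, this shape also avoids leaving a stale gdb holding the single-client stub.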

rule 4: driving the path, and force-loading hardware-gated drivers

a breakpoint is useless if the code never runs. how you reach the vulnerable path depends on how the device is created:

device created in DriverEntry (IoCreateDevice + IoCreateSymbolicLink at load, no pnp gate): you can force-load the real signed driver with no hardware at all. sc create NAME type= kernel binPath= C:\path\drv.sys then sc start. shipped catalogs are often sha1-signed and rejected on win11, so re-sign with a test cert that is trusted in the debug vm (test-signing on). open \\.\NAME from a usermode probe and you are on the path.
device created in AddDevice (pnp / hardware-gated): force-load is not enough; the device only exists when a matching devnode appears. create a software devnode with the inf’s hardware id (SwDevice / pnputil /add-driver /install) so AddDevice runs and builds the genuine device context, no peripheral needed. for usb-attached stacks you can also hot-plug an emulated device over qmp (device_add usb-hub,bus=xhci.0) to drive AddDevice exactly once per boot.
a hidden runtime gate is the common last-mile snag: the sink may be guarded by a .data flag that only pnp START_DEVICE sets, which a force-load never gets. if that gated code reads only irp data, you can poke the flag to the value a real victim would have (a one-shot gdb write at base+rva, then detach) and stay honest about it. if it reads uninitialized device-context state, do not poke it; force the real AddDevice path instead, or your crash is un-attributable to the bug.

a caution on in-memory patching generally: nopping gates to “get past” device creation produces an uninitialized device whose later crash proves nothing. patch for inspection and self-contained checks, not to synthesize a device-context state machine.

where the addresses come from: glaurung on the static half

rule 2 says to add “the rva your disassembler shows.” that disassembler is glaurung, the binary-analysis toolkit, and it is worth being precise about what it does and does not contribute here, because the discipline matters.

glaurung owns the static half that decides what to breakpoint:

enumerate the driver’s ioctl dispatch surface and decode the control codes, so you know which handler an attacker can actually reach (a permissive device acl that grants Everyone read/write is what turns a kernel bug into an unprivileged one).
run structural bug-class scanners over the reachable handlers to flag the candidate sink, and lift the handler to pseudo-c so a reasoning pass can read it.
hand you the function and the rva of the copy/sink site — the number you add to the live base for hbreak.
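
for reference, "decode the control codes" means unpacking the standard CTL_CODE bit layout from the windows driver headers. the sketch below shows that layout only; it is not glaurung's implementation, and decode_ioctl is a name chosen for illustration.

```python
from collections import namedtuple

Ioctl = namedtuple("Ioctl", "device_type access function method")

METHODS = ("METHOD_BUFFERED", "METHOD_IN_DIRECT",
           "METHOD_OUT_DIRECT", "METHOD_NEITHER")
ACCESS = ("FILE_ANY_ACCESS", "FILE_READ_ACCESS", "FILE_WRITE_ACCESS",
          "FILE_READ_ACCESS|FILE_WRITE_ACCESS")


def decode_ioctl(code):
    """Split a 32-bit IOCTL code into its CTL_CODE fields.

    Layout: device type in bits 31..16, required access in 15..14,
    function number in 13..2, transfer method in 1..0.
    """
    return Ioctl(
        device_type=(code >> 16) & 0xFFFF,
        access=ACCESS[(code >> 14) & 0x3],
        function=(code >> 2) & 0xFFF,
        method=METHODS[code & 0x3],
    )
```

as an arbitrary example (not a code from any driver discussed here), 0x222004 decodes to device type 0x22, FILE_ANY_ACCESS, function 0x801, METHOD_BUFFERED. the access and method fields are the ones worth reading first: FILE_ANY_ACCESS says the i/o manager will not demand read or write access on the handle, and the method says how the buffers reach the handler.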

the rule, and it is non-negotiable: glaurung’s lifted pseudo-c is a lead, never the evidence. the lifter is not magic; it will hand a plausible-looking length or flags argument straight through. so the exact byte and offset we put a hardware breakpoint on is always re-grounded on capstone disassembly of the real shipped bytes before it counts. for the example below, that meant a per-driver disasm pass to locate the handler and sink, then an independent capstone disasm that reproduced the chain byte-for-byte. glaurung points; capstone confirms; the gdbstub then proves it fires.

so the full handoff is: glaurung lifted-c finds the candidate sink and its rva, capstone ground-truths the exact address, and the live gdbstub proves it executes. the static catalog of what this pipeline has surfaced lives on the glaurung windows driver findings page; the longer narrative on keeping an llm honest against ground truth is the notepad write-up. this page is the dynamic tail of that same pipeline.

a worked example: catching a ring-0 saved-rip overwrite live

to make this concrete: a roccat mouse filter driver (KovaPlusFltr.sys) creates its control device in DriverEntry, so we force-loaded the real signed binary with no hardware, then put two hardware breakpoints around the buffer copy in its ioctl handler: one before the copy, one after. the gdb script is exactly what rule 1 implies:

target remote 127.0.0.1:1234
hbreak *0xfffff8025e9f1983
commands 1
  printf "BEFORE-COPY count(r8)=%#lx dst(rcx)=%#lx saved-RIP[rsp+0xc68]=%#lx\n", $r8, $rcx, *(unsigned long*)($rsp+0xc68)
  continue
end
hbreak *0xfffff8025e9f1989
commands 2
  printf "AFTER-COPY  saved-RIP[rsp+0xc68]=%#lx\n", *(unsigned long*)($rsp+0xc68)
  continue
end
printf "CAP-ARMED\n"
continue

send an oversized ioctl from a usermode probe and the saved return address on the kernel stack is overwritten with attacker bytes:

Thread 3 hit Breakpoint 1   BEFORE-COPY count(r8)=0xbe0 dst(rcx)=0xffffc20505e73ac0 saved-RIP[rsp+0xc68]=0xfffff802c988697e
Thread 3 hit Breakpoint 2   AFTER-COPY  saved-RIP[rsp+0xc68]=0x4141414141414141

the saved rip went from a real tcpip/ntoskrnl-region address to 0x4141414141414141. that is a controlled ring-0 instruction-pointer overwrite, captured live, on the real shipped bytes. the benign control (an in-bounds count = 0x40) runs the same path and leaves the saved rip untouched:

BEFORE-COPY count(r8)=0x40 ... saved-RIP[rsp+0xc68]=0xfffff8019fd0697e
AFTER-COPY  saved-RIP[rsp+0xc68]=0xfffff8019fd0697e   (unchanged)

the control is not optional. it is what proves the trigger reached the vulnerable code and that the synthesized environment is not itself the cause. a single triggering case without a passing control is not a reproduction.
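
that rule is easy to check mechanically. a sketch, assuming the capture logs contain the BEFORE-COPY / AFTER-COPY lines printed by the gdb script above; saved_rip_pairs and is_reproduction are hypothetical helper names.

```python
import re

LINE_RE = re.compile(
    r"(BEFORE|AFTER)-COPY.*?saved-RIP\[[^\]]*\]=(0x[0-9a-fA-F]+)"
)


def saved_rip_pairs(log):
    """Return (before, after) saved-rip values for each copy in the log."""
    pairs, before = [], None
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        value = int(match.group(2), 16)
        if match.group(1) == "BEFORE":
            before = value
        elif before is not None:
            pairs.append((before, value))
            before = None
    return pairs


def is_reproduction(trigger_log, control_log):
    """True only if the trigger changed saved rip AND the control did not."""
    trigger = saved_rip_pairs(trigger_log)
    control = saved_rip_pairs(control_log)
    overwritten = any(b != a for b, a in trigger)
    # An empty control means the control never reached the copy: not a pass.
    control_clean = bool(control) and all(b == a for b, a in control)
    return overwritten and control_clean
```

an empty control log counts as a failure on purpose: a control that never reached the copy proves nothing about the path.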

two practical notes from sizing that overflow: aim the overwrite to land exactly on the saved-rip slot ([rsp+frame]) and stop before clobbering the incoming parameters, or the post-copy code dereferences garbage params and hangs before it ever returns. and if the function then forwards to a lower hardware stack that is absent in your no-hardware setup, it simply parks rather than bugchecking; the debugger-attributed overwrite is the splat-equivalent here. it demonstrates control, which is the thing that matters.

what about the kmdf-talks-to-a-pcie-device case

the original thread also asked about a kmdf driver communicating with a pcie device, using qemu’s edu educational pci device as the endpoint, and whether it can all run on a windows host. it can, but that is the configuration that runs straight into the whpx wall on consumer hardware. the smoother shape is the same as above: run the guest under kvm on a linux host, expose edu to the guest (-device edu), develop the kmdf driver normally inside the guest, and when you need to breakpoint the driver, attach gdb to the gdbstub with hardware breakpoints. you get a deterministic, scriptable kernel debugger for your own driver without standing up kdnet or fighting exdi, and the edu mmio/irq path behaves the same whether or not the debugger is attached.

the rules, condensed

run the guest under kvm on linux, not whpx; add -gdb tcp:127.0.0.1:1234 and attach plain gdb. no windbg/kdnet/exdi required.
hardware breakpoints only (hbreak). software 0xCC breakpoints die on pageable kernel code and present as “breakpoint never hits.” four slots.
resolve the load base per boot via in-guest EnumDeviceDrivers (no pdbs); keep the driver at start= system so the base stays stable within a boot; aslr moves it every reboot.
gdbstub is single-client; kill stale gdb first; always qmp cont on exit; never raw-probe :1234 (it halts the vm).
reach the path honestly: DriverEntry devices force-load with no hardware; AddDevice devices need a SwDevice hwid; always run a benign control on the same path.