the scrutiny gradient (2026-04-22 draft notes)

tl;dr

a mean-field survey of how security fixes, CVEs, and downstream enrichment flow through 22 linux base-system repositories. the unit of analysis is the security-relevant commit; the CVE id is a column on that unit, not the structural split.

  • 2.03% of the 1,139,828 commits across 22 repositories carry a security signal: 23,122 security-relevant commits.
  • only 5.6% of those security-relevant commits acquire a CVE number. 21,826 are non-CVE security-signal commits.
  • in the kernel slice the non-CVE-to-CVE ratio is 56:1 (19,705 vs 352).
  • 77.6% of the 1,051 CVE’d dossiers that have both a fix date and a public-disclosure date show the fix landing on or before public disclosure: a fix-first culture, not a coordinated-disclosure culture.
  • bugs live a median of 4.7 years (1,712 days) before a fix lands, across 888 reconstructable cases.
  • since february 2024 the linux kernel has acted as its own CNA and published 6,239 records; only 226 carry a complete downstream dossier in our corpus. 6,013 are unclaimed by any downstream consumer we could observe.
  • a naive linux-kernel NVD query is 44.5% contaminated by wrong-product records: 532 of the hard-negatives are adobe flash, the rest span IBM middleware, chrome, V8, vmware, oracle, apple, cisco, and firefox.
  • spectre v1, spectre v2, and meltdown do not appear under the kernel CPE at all — NVD tags them under intel and AMD hardware CPEs.
  • the load-bearing security-fix labor is borne by ~3,413 kernel authors, ~10 CNA institutions, and ~5 distribution families.
  • CVE-based measurement captures roughly 1 in 20 security fixes.

draft pdf →

what this version is

this page mirrors the 2026-04-22 revision freeze of the meanfield paper (papers/meanfield/ in the linux-security-paper repo). it is the snapshot that the manuscript’s metadata.tex is frozen against. later versions of the paper will get their own page so the numbers on this page stay tied to the snapshot they were computed from.

the snapshot is composed of the following sub-corpora and summary files, all keyed to the same commit identifiers:

sub-corpus                                    role                                        rows
data/cve-dossiers/by_package/                 CVE-anchored, full dossiers                1,418
data/cve-dossiers/out_of_scope/               hard negatives / NVD overmatch             1,138
data/cve-dossiers/ (orphans)                  extra packages outside the 11-pkg claim      340
data/kernel_commits/security_commits.jsonl    non-CVE security commits (kernel)         19,705
data/repo_analysis/<repo>/scan_summary.json   per-repo flux statistics                      22
data/cve/kernel_cna_records.jsonl             kernel CNA records (feb 2024–)             6,239
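
a minimal sketch for sanity-checking a local checkout against the row counts in the table above. it assumes the two .jsonl files hold one JSON object per line and that blank lines do not count as rows. the helper names count_jsonl_rows and check_snapshot are illustrative and not part of the release tooling.

```python
import json
from pathlib import Path

# expected row counts, copied from the sub-corpus table
EXPECTED_ROWS = {
    "data/kernel_commits/security_commits.jsonl": 19_705,
    "data/cve/kernel_cna_records.jsonl": 6_239,
}


def count_jsonl_rows(path):
    """Count non-blank lines in a JSONL file, parsing each one as JSON."""
    rows = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            json.loads(line)  # raises ValueError on a malformed row
            rows += 1
    return rows


def check_snapshot(root, expected=EXPECTED_ROWS):
    """Return {relative path: (expected, actual)} for every mismatching file."""
    mismatches = {}
    for rel, want in expected.items():
        got = count_jsonl_rows(Path(root) / rel)
        if got != want:
            mismatches[rel] = (want, got)
    return mismatches
```

calling check_snapshot(".") from the repository root would return an empty dict when both files match the table, under the assumptions above.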

the scrutiny gradient

the central object of the paper is a funnel — what fraction of fixes are scrutinized enough to acquire a CVE, then enrichment, then KEV prioritization:

all commits                          1,139,828    100%
security-relevant commits               23,122    2.03%
CVE-mentioning commits                   1,296    5.6%  of security
fully-dossiered CVE records              1,108    ~85%  of CVE'd
KEV-listed (actively exploited)             17    0.07% of security

the headline is the second hop: only ~5.6% of the security-relevant work ever acquires a CVE number. the remaining ~94% lands as fixes-of-record in commit history without any external identifier, and is invisible to any analysis that uses CVE as a denominator.
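
the funnel percentages can be re-derived from the counts alone. the sketch below only restates the arithmetic; the variable names are ours, and each denominator is named in its label.

```python
# counts copied from the funnel table
ALL_COMMITS = 1_139_828
SECURITY_COMMITS = 23_122
CVE_COMMITS = 1_296
FULL_DOSSIERS = 1_108
KEV_LISTED = 17


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


funnel = {
    "security share of all commits": pct(SECURITY_COMMITS, ALL_COMMITS),
    "CVE share of security commits": pct(CVE_COMMITS, SECURITY_COMMITS),
    "dossier share of CVE commits": pct(FULL_DOSSIERS, CVE_COMMITS),
    "KEV share of security commits": pct(KEV_LISTED, SECURITY_COMMITS),
}

# security-relevant commits that never acquire a CVE number
non_cve_security_commits = SECURITY_COMMITS - CVE_COMMITS

for label, value in funnel.items():
    print(f"{label:32s} {value:6.2f}%")
print(f"non-CVE security commits: {non_cve_security_commits:,}")
```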

the CVE is also often wrong

before the dark-matter story even starts, the CVE’d slice itself is contaminated. a naive linux-kernel NVD query returns 44.5% wrong-product records:

  • 532 records are adobe flash (a product that reached end of life in december 2020).
  • the rest spans IBM middleware, chrome, V8, vmware, oracle, apple, cisco, and firefox.

and three of the most famous kernel-security bugs in history are invisible to a naive NVD query for the kernel CPE — spectre v1, spectre v2, and meltdown are all tagged under intel and AMD hardware CPEs, not the linux kernel CPE.

CVE-based measurement of the linux kernel is not just undercounting; it is miscounting.
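
a hedged sketch of the scope audit. one reading that reproduces the 44.5% figure is hard negatives divided by in-scope plus hard-negative dossiers (1,138 of 2,556); the paper may define the denominator differently. the filter assumes each record carries a list of CPE 2.3 strings under a hypothetical "cpes" key, and the sample records are made up for illustration.

```python
IN_SCOPE = 1_418        # data/cve-dossiers/by_package/
HARD_NEGATIVES = 1_138  # data/cve-dossiers/out_of_scope/
FLASH_RECORDS = 532     # adobe flash records among the hard negatives

# assumed denominator: everything the naive query returned
contamination = round(100 * HARD_NEGATIVES / (IN_SCOPE + HARD_NEGATIVES), 1)
flash_share = round(100 * FLASH_RECORDS / HARD_NEGATIVES, 1)


def is_linux_kernel_cpe(cpe):
    """True if a CPE 2.3 string names vendor 'linux', product 'linux_kernel'.

    Naive split on ':'; escaped colons inside a field are not handled.
    """
    fields = cpe.split(":")
    return (
        len(fields) >= 5
        and fields[0] == "cpe"
        and fields[1] == "2.3"
        and fields[3] == "linux"
        and fields[4] == "linux_kernel"
    )


def in_scope(record):
    """Keep a record only if at least one of its CPEs is the kernel CPE."""
    return any(is_linux_kernel_cpe(c) for c in record.get("cpes", []))


# illustrative records, not taken from the corpus
kernel_record = {"cpes": ["cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*"]}
flash_record = {"cpes": ["cpe:2.3:a:adobe:flash_player:*:*:*:*:*:*:*:*"]}
```

a filter like this removes wrong-product records, but it would also drop records tagged only under hardware CPEs, which is the spectre and meltdown case described above.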

the february-2024 cliff

since february 2024 the linux kernel has acted as its own CNA. that single policy shift produced an enrichment cliff:

  • 6,239 records published by the kernel CNA in vulns.git since the pivot date.
  • 226 carry a complete downstream dossier (CPE, CVSS, weakness data, reference enrichment) in our corpus.
  • 6,013 are unclaimed by any downstream consumer we could observe.

NIST’s april 2026 NVD operations update formalizes this — it makes risk-prioritized enrichment the official policy, which means the distance between “listed in NVD” and “researchable from NVD” is now an explicit property of the data, not a transient backlog. the gap is widening fastest on the kernel slice that most downstream consumers care about.
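
a minimal sketch of how the complete-dossier test could be expressed. the field names "cpe", "cvss", "cwe", and "references" are hypothetical stand-ins for the four enrichment elements listed above; the corpus schema may differ.

```python
# hypothetical field names for the four enrichment elements
REQUIRED_FIELDS = ("cpe", "cvss", "cwe", "references")


def is_complete_dossier(record):
    """True when every enrichment field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def enrichment_gap(records):
    """Return (complete, incomplete, complete share in percent)."""
    records = list(records)
    complete = sum(1 for r in records if is_complete_dossier(r))
    incomplete = len(records) - complete
    share = round(100 * complete / len(records), 2) if records else 0.0
    return complete, incomplete, share


# the page's own counts: 226 complete dossiers out of 6,239 published records
PUBLISHED, COMPLETE = 6_239, 226
unclaimed = PUBLISHED - COMPLETE
complete_share = round(100 * COMPLETE / PUBLISHED, 2)
```

with the page's counts, complete_share comes out under 4%, which is the size of the cliff.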

fix-first, not disclose-first

of the 1,051 CVE’d dossiers in our corpus with both a fix date and a public-disclosure date:

  • 77.6% land the fix on or before the day of public disclosure.
  • median bug lifetime: 4.7 years (1,712 days), across 888 reconstructable cases.
  • red hat ships a distribution fix 25 days after upstream (n = 672 dossiers); ubuntu ships 101 days after upstream (n = 534).
  • both are typically earlier than NVD enrichment arrives for the same records.

so the working model “vulnerability disclosed → fix scheduled → distros patch → NVD enrichment ratifies” is empirically backwards for most of this slice. the fix landed first, often quietly, and the CVE / NVD machinery ratified afterward.
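
the fix-first share is a simple comparison over date pairs. the sketch below assumes each dossier has been reduced to a (fix date, disclosure date) tuple; the sample dates are invented for illustration and are not corpus rows.

```python
from datetime import date
from statistics import median


def fix_first_stats(pairs):
    """Summarise (fix_date, disclosure_date) pairs.

    Returns (percent of pairs with the fix on or before disclosure,
    median lead in days, positive when the fix came first).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one dossier with both dates")
    leads = [(disclosed - fixed).days for fixed, disclosed in pairs]
    fix_first = sum(1 for lead in leads if lead >= 0)
    return round(100 * fix_first / len(pairs), 1), median(leads)


def days_to_years(days):
    """Convert a day count to years using a 365.25-day year."""
    return round(days / 365.25, 1)


# invented sample: fix 9 days early, same day, and 19 days late
sample = [
    (date(2024, 1, 1), date(2024, 1, 10)),
    (date(2024, 3, 5), date(2024, 3, 5)),
    (date(2024, 6, 20), date(2024, 6, 1)),
]
share, median_lead = fix_first_stats(sample)
```

days_to_years(1_712) gives the 4.7-year median lifetime quoted above.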

actors census

the load-bearing population is small:

  • ~3,413 kernel authors touch security-signal commits across the snapshot window.
  • ~850 of them work on the security-relevant code paths specifically.
  • ~10 CNA institutions issue the bulk of records.
  • ~5 distribution families ship the bulk of downstream patches.

this is a small enough population that the gap between “the kernel” and “the people doing the load-bearing fix work” is a real, countable distinction. the paper’s actors section walks through the per-CNA issuance share and the per-distribution backport rate.

release

the integrated corpus is commit-keyed and released as three splits:

  1. CVE-dossiered split — 1,418 in-scope records with full dossier evidence (1,108 fully articulated).
  2. non-CVE security-signal split — 19,705 kernel commits with no CVE assignment, classified as security-relevant by the methodology in section 4 of the paper.
  3. hard-negatives split — 1,138 NVD-overmatch records, preserved as the scope-audit artifact.

the snapshot is frozen at the 2026-04-22 revision freeze and released under CC-BY-4.0. the snapshot id is the commit sha of the linux-security-paper repository at the freeze tag; rebuilds from the release manifest reproduce the manuscript’s tables and figures bit-for-bit.
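
a sketch of what a bit-for-bit check against a manifest could look like. the manifest layout (a mapping from relative path to sha256 hex digest) is an assumption made for illustration; the actual release manifest format is not described on this page.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose digest differs.

    `manifest` is assumed to map relative path -> expected sha256 hex digest.
    """
    failures = []
    for rel, expected in sorted(manifest.items()):
        target = Path(root) / rel
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel)
    return failures
```

an empty list from verify_manifest means every listed file matches its recorded digest.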

reading order

if you only have ten minutes, read:

  1. the abstract and section 1 — the funnel and the 1-in-20 claim.
  2. section 7 (the CVE gradient) — the contamination and dark-matter story.
  3. section 8 (resolution rate) — fix-first culture and distro lag.

if you have an hour, add:

  1. section 3 (the corpus) — what each sub-corpus is and is not.
  2. section 4 (methodology) — how a “security-relevant commit” is classified and how the dossier audit was run.
  3. section 10 (case studies) — 3-4 vignettes including one non-CVE case.

status

this is a working draft. the pdf linked above is the 2026-04-22 revision freeze that the wiki numbers on this page are computed against. later revisions of the paper will get their own page (and their own versioned pdf) rather than mutating this one in place, so the numbers stay tied to the snapshot they were measured from.

comments welcome — not for citation as a final result.
