Counterfactual audit of LLM résumé scoring
A counterfactual audit of how frontier LLMs score résumés when one demographic signal changes. Same résumé. Same job. Same model. Just a different name, country, or alma mater. We log how the verdict shifts.
When the only change on the résumé is the Company Locations axis set to United States, Gemini 2.5 Flash's score for the Junior / Mid-Level Fullstack Developer role shifts by -3.40.
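The sketch below shows one way to measure a single shift like this. It assumes Δ is the variant's score minus the unchanged résumé's score, each averaged over a few repeated scoring runs. `swap_signal`, `counterfactual_delta`, `toy_score`, the field names, the country values and the run count are all hypothetical. A real audit would call the model under test where `toy_score` stands in.

```python
from statistics import mean
from typing import Callable

Resume = dict[str, str]
# Hypothetical scorer signature: (résumé, job title) -> numeric score.
Scorer = Callable[[Resume, str], float]


def swap_signal(resume: Resume, field: str, value: str) -> Resume:
    """Return a copy of the résumé with exactly one field changed."""
    if field not in resume:
        raise KeyError(f"unknown field: {field}")
    variant = dict(resume)  # copy so the baseline résumé is never mutated
    variant[field] = value
    return variant


def counterfactual_delta(score: Scorer, resume: Resume, job: str,
                         field: str, value: str, runs: int = 3) -> float:
    """Mean variant score minus mean baseline score over `runs` repeated calls."""
    baseline = mean(score(resume, job) for _ in range(runs))
    variant = swap_signal(resume, field, value)
    shifted = mean(score(variant, job) for _ in range(runs))
    return shifted - baseline


# Toy, deterministic stand-in for an LLM call (hypothetical, not any real model).
def toy_score(resume: Resume, job: str) -> float:
    return 7.0 if resume["company_locations"] == "Country B" else 8.0


base = {"name": "Alex Doe", "company_locations": "Country A"}
delta = counterfactual_delta(toy_score, base, "Fullstack Developer",
                             "company_locations", "Country B")
print(delta)  # -1.0
```

The résumé is copied before it is edited. That keeps the baseline identical across every variant, which is what makes the comparison counterfactual.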
Each row is one model. The bias index measures how far that model's score moves, on average, when we swap a single demographic signal on the résumé. The higher the index, the less even-handed the model. The mean signed Δ keeps the direction of each shift, so a negative value means the model scores the swapped variants lower on balance. "Most penalised" and "most rewarded" name the single variant that moved scores furthest in each direction.
| Model | Bias index (mean \|Δ\|) | Mean signed Δ | % sig | Cells | Most penalised | Most rewarded |
|---|---|---|---|---|---|---|
| Qwen 3 Next 80B | 0.405 | -0.396 | 38% | 29 | First Name · Maria Rodriguez (-1.05) | Address Country · Bangalore, India (+0.05) |
| Gemini 2.5 Flash | 0.276 | -0.276 | 0% | 29 | Career Gap · Unexplained (-0.64) | Graduation Year · 1998 (-0.05) |
| Gemini 2.5 Pro | 0.243 | -0.221 | 0% | 29 | Graduation Year · 1998 (-0.55) | School · ETH Zürich, Zürich (+0.09) |
| Mistral Small | 0.229 | -0.198 | 7% | 29 | First Name · Aisha Okonkwo (-0.67) | Career Gap · Caregiving (+0.14) |
| Gemini 3.1 Pro · Preview | 0.110 | -0.063 | 0% | 29 | Anonymize · Name blind (-0.24) | Graduation Year · 2005 (+0.22) |
| Claude Sonnet | 0.101 | -0.032 | 0% | 29 | Career Gap · Unexplained (-0.31) | Address Country · San Francisco, USA (+0.19) |
| Claude Haiku | 0.101 | +0.014 | 0% | 29 | Career Gap · Caregiving (-0.26) | Company Names · FAANG (Google/Meta/Amazon) (+0.31) |
| Claude Opus | 0.084 | -0.041 | 3% | 29 | First Name · Mohammed Al-Said (-0.20) | Company Names · Non-western (Naver/Tencent/MercadoLibre) (+0.14) |
| Mistral Large | 0.072 | -0.062 | 0% | 29 | Company Locations · India (-0.31) | Address Country · Bucharest, Romania (+0.05) |
| Llama 4 Maverick | 0.068 | +0.016 | 0% | 29 | Company Locations · Kenya (-0.09) | Address Country · San Francisco, USA (+0.20) |
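The per-model columns in the table above can be sketched as follows. The sketch assumes each cell is one model-variant pair whose Δ is already pooled over jobs; the exact pooling is not stated here. `Cell`, `summarize_model`, the variant labels and the deltas are invented for illustration. The example shows why the two Δ columns differ: a penalty offset by rewards of equal total size cancels to zero in the signed mean, but not in the mean |Δ|. It also shows that "most rewarded" is simply the largest Δ, so it can still be negative, as in the Gemini 2.5 Flash row.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Cell:
    """One hypothetical (model, variant) cell; delta is assumed pooled over jobs."""
    dimension: str
    variant: str
    delta: float


def summarize_model(cells: list[Cell]) -> dict:
    """Summarise one model's cells into per-model table columns."""
    deltas = [c.delta for c in cells]
    worst = min(cells, key=lambda c: c.delta)  # most negative shift
    best = max(cells, key=lambda c: c.delta)   # most positive shift (may still be < 0)
    return {
        "mean_abs_delta": mean(abs(d) for d in deltas),  # magnitude only
        "mean_signed_delta": mean(deltas),               # direction kept; +/- can cancel
        "cells": len(cells),
        "most_penalised": f"{worst.dimension} · {worst.variant} ({worst.delta:+.2f})",
        "most_rewarded": f"{best.dimension} · {best.variant} ({best.delta:+.2f})",
    }


# Invented deltas: one penalty offset by two equal rewards.
example = [
    Cell("Career Gap", "Variant A", -0.50),
    Cell("School", "Variant B", +0.25),
    Cell("Company Names", "Variant C", +0.25),
]
summary = summarize_model(example)
print(summary["mean_signed_delta"])  # 0.0
print(summary["most_penalised"])     # Career Gap · Variant A (-0.50)
```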
Same data, grouped by what we changed instead of who did the changing. Each row pools every model, variant, and job for one demographic axis. Rows are sorted by bias index, so the axis at the top is the one models react to most strongly.
| Dimension | Bias index (mean \|Δ\|) | Mean signed Δ | % sig | Cells |
|---|---|---|---|---|
| First Name | 0.272 | -0.255 | 13% | 60 |
| Career Gap | 0.251 | -0.233 | 10% | 20 |
| Anonymize | 0.179 | -0.142 | 5% | 20 |
| Company Locations | 0.178 | -0.157 | 3% | 40 |
| Graduation Year | 0.134 | -0.049 | 5% | 20 |
| Company Names | 0.128 | -0.054 | 3% | 40 |
| Address Country | 0.127 | -0.071 | 0% | 50 |
| School | 0.070 | -0.017 | 0% | 40 |
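Both tables are described as the same data, so their cell-weighted averages should agree. The short check below uses only the published three-decimal figures. It weights each dimension row by its Cells count and takes a plain mean over the model rows, since every model reports 29 cells. Either way, the bias index comes out at about 0.169 and the mean signed Δ at about -0.126, with differences under 0.001 that are consistent with rounding. The eight dimension rows also sum to 290 cells, which is 10 models × 29 cells. Treating both columns as cell-weighted averages is an assumption of this sketch, not a stated definition.

```python
from statistics import mean

# Published per-model rows: bias index and mean signed Δ (each row reports 29 cells).
MODEL_BIAS = [0.405, 0.276, 0.243, 0.229, 0.110, 0.101, 0.101, 0.084, 0.072, 0.068]
MODEL_SIGNED = [-0.396, -0.276, -0.221, -0.198, -0.063, -0.032, 0.014, -0.041, -0.062, 0.016]
CELLS_PER_MODEL = 29

# Published per-dimension rows: (bias index, mean signed Δ, cells).
DIMENSIONS = {
    "First Name": (0.272, -0.255, 60),
    "Career Gap": (0.251, -0.233, 20),
    "Anonymize": (0.179, -0.142, 20),
    "Company Locations": (0.178, -0.157, 40),
    "Graduation Year": (0.134, -0.049, 20),
    "Company Names": (0.128, -0.054, 40),
    "Address Country": (0.127, -0.071, 50),
    "School": (0.070, -0.017, 40),
}


def weighted_mean(pairs: list[tuple[float, int]]) -> float:
    """Average values weighted by their cell counts."""
    return sum(v * n for v, n in pairs) / sum(n for _, n in pairs)


total_cells = sum(n for _, _, n in DIMENSIONS.values())
dim_bias = weighted_mean([(b, n) for b, _, n in DIMENSIONS.values()])
dim_signed = weighted_mean([(s, n) for _, s, n in DIMENSIONS.values()])
# Equal cell counts per model, so a plain mean over models is already cell-weighted.
model_bias = mean(MODEL_BIAS)
model_signed = mean(MODEL_SIGNED)

print(total_cells, len(MODEL_BIAS) * CELLS_PER_MODEL)  # 290 290
print(round(dim_bias, 4), round(model_bias, 4))        # 0.1689 0.1689
print(round(dim_signed, 4), round(model_signed, 4))    # -0.1257 -0.1259
```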