How to Write an Interview Scorecard with AI in 2026
A practical walkthrough for writing structured interview scorecards with AI — the right structure, what to never let AI invent, and the free tool that handles it. For recruiters and hiring managers who want defensible, EEOC-aware evaluations.
A strong interview scorecard does three things: it forces the interviewer to evaluate the specific competencies that matter for the role (not vibes), it captures evidence from the actual interview so the rationale is defensible later, and it produces a score that is comparable across candidates so the hiring decision doesn't hinge on who interviewed last. Structured scorecards are also the single best defense against bias creeping into the hiring decision, both legally (EEOC, NYC Local Law 144 AEDT, and similar) and practically (recency bias, halo effect, affinity bias). AI is excellent at producing the structural part in three minutes. The competencies to evaluate, the weighting, and the actual judgment of evidence are yours.
This is a practical walkthrough for writing an interview scorecard with AI that holds up under hiring committee review.
What a strong interview scorecard contains
Before you can use AI well, you need to know what good looks like:
- Header block — candidate name, role, interviewer, date, panel position (e.g., "Round 2 of 3, Technical Interview")
- Competency rubric — the 4–6 specific competencies tied to the role (not generic "communication" or "teamwork" — specific to the actual job)
- Per-competency rating scale — anchored (e.g., 1 = strong evidence against, 3 = no evidence either way, 5 = strong evidence for), not a vague 1–5
- Evidence requirement — every rating must be supported by specific behavioral evidence from the interview, not "felt smart" or "good culture fit"
- Overall recommendation — hire / no hire / strong hire / strong no hire, with the criterion that triggered the call
- Open questions — what you'd still want to verify in subsequent rounds
- Bias check — a brief self-check on whether the rating reflects evidence or impression
The scorecards that produce good hires are the ones where competencies are role-specific, evidence is required for every rating, and the interviewer commits to a recommendation before reading anyone else's scorecards. AI handles the structural and language layer; you provide the role-specific competencies and the evidence.
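The elements listed above can be captured as a small data structure, so a scorecard with a missing rating anchor or an unsupported rating is rejected before anyone reads it. This is a minimal sketch, not a prescribed schema: the class names `CompetencyRating` and `Scorecard`, the field names, and the validation rules are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Anchors for the 1-5 scale; the wording is illustrative.
RATING_ANCHORS = {
    1: "strong evidence against",
    2: "evidence against",
    3: "neutral / insufficient evidence",
    4: "evidence for",
    5: "strong evidence for",
}

RECOMMENDATIONS = ("strong no hire", "no hire", "hire", "strong hire")


@dataclass
class CompetencyRating:
    competency: str  # role-specific, e.g. "Stakeholder management"
    rating: int      # 1-5, see RATING_ANCHORS
    evidence: str    # behavioral evidence taken from the interview notes

    def __post_init__(self) -> None:
        if self.rating not in RATING_ANCHORS:
            raise ValueError(f"rating must be 1-5, got {self.rating}")
        # Only "insufficient evidence" (3) may be recorded without evidence.
        if self.rating != 3 and not self.evidence.strip():
            raise ValueError(f"{self.competency}: rating {self.rating} has no evidence")


@dataclass
class Scorecard:
    candidate: str
    role: str
    interviewer: str
    date: str
    panel_position: str            # e.g. "Round 2 of 3, Technical Interview"
    ratings: list[CompetencyRating]
    recommendation: str            # one of RECOMMENDATIONS
    recommendation_trigger: str    # the criterion that triggered the call
    open_questions: list[str] = field(default_factory=list)
    bias_check: str = ""

    def __post_init__(self) -> None:
        if self.recommendation not in RECOMMENDATIONS:
            raise ValueError(f"unknown recommendation: {self.recommendation!r}")
```

Constructing `CompetencyRating("Communication", 5, "")` raises a `ValueError`, which is the evidence requirement expressed as a hard rule rather than a convention.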
The right prompt structure
The mistake most interviewers make on the first try is asking AI to "score this candidate." The prompt that actually works gives the AI the role context, the competencies to evaluate, and the evidence from the interview:
<task>Generate a structured interview scorecard.</task>
<context>
Role: Senior Product Manager, Growth
Round: 2 of 3 (Cross-functional collaboration interview)
Interviewer: [name + role]
Candidate (first name only): Alex
Date: May 20, 2026
Competencies for THIS round (per role rubric):
1. Stakeholder management — handles competing priorities across PM/Eng/Design
2. Data-driven decision-making — uses metrics to inform tradeoffs
3. Communication — explains technical concepts to non-technical audiences
4. Conflict resolution — surfaces and resolves cross-functional disagreement
Evidence from the interview (behavioral, no personal characteristics):
- On stakeholder management: described a recent project where they had Engineering pushing for refactor-first and Sales pushing for new-feature-first. Reframed it as "what does the leading metric tell us about which moves growth?" — got data, then aligned both sides on the data-supported priority. Concrete example with specific outcome.
- On data-driven: when asked about a previous failed launch, walked through the leading indicators they'd watched, the threshold that triggered their concern, and how they pulled back the rollout. Quoted specific metric numbers.
- On communication: explained a complex experiment design to me (interviewer is non-PM) in 4 minutes without jargon. Asked good clarifying questions back.
- On conflict resolution: described a disagreement with their previous EM about timeline; walked through how they brought the disagreement to the surface, what each side conceded, and the eventual outcome. Was honest that the outcome was a compromise neither side fully owned.
Notable: candidate was articulate about gaps in their own previous decisions; not defensive when probed.
Rating scale (per role rubric): 1=strong evidence against, 2=evidence against, 3=neutral/insufficient, 4=evidence for, 5=strong evidence for
</context>
<instructions>
- Produce a scorecard with: per-competency rating (1-5) + evidence + rationale
- Rating must be supported by SPECIFIC behavioral evidence from interview notes above
- Overall recommendation: hire / no hire / strong hire / strong no hire
- Open questions section: what to verify in Round 3
- Bias self-check section: prompt to reflect on whether the rating reflects evidence vs impression
- 500 words maximum
- Avoid personal characteristics (age, race, gender, accent, attractiveness, family status)
</instructions>
<avoid>
- Inventing evidence not in the interview notes
- Ratings without supporting evidence
- Personal characteristics or proxies for them
- Vague language ("good fit," "strong culture match")
- Justifying a rating with anything other than behavioral evidence from this interview
</avoid>
The structure: role-specific competencies, behavioral evidence from the actual interview, and explicit instructions about what NOT to use as evidence. The AI produces the scorecard; you provide the competencies and the evidence.
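If you reuse this prompt across rounds, it can help to assemble it from structured inputs rather than editing it by hand. The sketch below is one possible way to do that, and it emits an abbreviated version of the prompt above. The function name `build_scorecard_prompt`, its parameters, and the `NO EVIDENCE RECORDED` marker are assumptions introduced here, not part of any tool.

```python
def build_scorecard_prompt(
    role: str,
    round_label: str,
    candidate_first_name: str,
    competencies: dict[str, str],
    evidence: dict[str, str],
    max_words: int = 500,
) -> str:
    """Assemble the tagged prompt from structured interviewer inputs."""
    unknown = set(evidence) - set(competencies)
    if unknown:
        raise ValueError(f"evidence for competencies not in the rubric: {sorted(unknown)}")

    lines = [
        "<task>Generate a structured interview scorecard.</task>",
        "<context>",
        f"Role: {role}",
        f"Round: {round_label}",
        f"Candidate (first name only): {candidate_first_name}",
        "Competencies for THIS round (per role rubric):",
    ]
    for number, (name, description) in enumerate(competencies.items(), start=1):
        lines.append(f"{number}. {name}: {description}")

    lines.append("Evidence from the interview (behavioral, no personal characteristics):")
    for name in competencies:
        note = evidence.get(name, "").strip()
        # State gaps explicitly so the model is not invited to fill them in.
        lines.append(f"- On {name}: {note or 'NO EVIDENCE RECORDED'}")

    lines += [
        "Rating scale: 1=strong evidence against, 2=evidence against, "
        "3=neutral/insufficient, 4=evidence for, 5=strong evidence for",
        "</context>",
        "<instructions>",
        "- Per-competency rating (1-5) + evidence + rationale",
        "- Rate 3 where the notes say NO EVIDENCE RECORDED",
        f"- {max_words} words maximum",
        "</instructions>",
        "<avoid>",
        "- Inventing evidence not in the interview notes",
        "- Personal characteristics or proxies for them",
        "</avoid>",
    ]
    return "\n".join(lines)
```

Writing the gap into the prompt as an explicit marker is a design choice: a competency with no notes shows up as missing evidence instead of silently disappearing from the context.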
What to never let AI do
Use personal characteristics as evidence. Age, race, gender, accent, family status, attractiveness, where they went to school, what they're wearing — none of these should appear in the scorecard. AI can produce evaluations that incorporate these signals if you give them as input. Don't give them as input. The scorecard evaluates behavioral evidence tied to job-relevant competencies.
Invent evidence not in the interview. If the candidate didn't actually demonstrate stakeholder management in the interview, the rating for that competency is "3 — insufficient evidence," not "4 — based on plausible inference." AI will sometimes produce confident ratings on competencies the candidate didn't actually demonstrate. Don't accept those ratings.
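A simple way to catch this is to compare the ratings the AI returned against the notes you actually supplied. The following is a rough sketch under the assumption that you have both as dictionaries keyed by competency name; the function name `flag_unsupported_ratings` is hypothetical.

```python
def flag_unsupported_ratings(ratings: dict[str, int], notes: dict[str, str]) -> list[str]:
    """Return competencies rated something other than 3 with no notes behind them."""
    flagged = []
    for competency, rating in ratings.items():
        has_notes = bool(notes.get(competency, "").strip())
        # A 3 means "insufficient evidence", so it is the only rating
        # that is acceptable when the interviewer recorded nothing.
        if rating != 3 and not has_notes:
            flagged.append(competency)
    return flagged


ai_ratings = {"Stakeholder management": 4, "Communication": 4, "Conflict resolution": 3}
interview_notes = {"Communication": "Explained an experiment design in 4 minutes without jargon."}
print(flag_unsupported_ratings(ai_ratings, interview_notes))
# → ['Stakeholder management']
```

This only checks that some note exists for the competency. Whether the note genuinely supports the rating is still the interviewer's judgment.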
Make hiring decisions. The scorecard is one interviewer's data point on one round. The hiring decision is made by the hiring committee with all rounds' scorecards together. AI produces the scorecard; humans produce the hire decision.
Use "culture fit" or "vibes" as competencies. These are the categories where bias enters hiring. If the rubric has "culture fit" as a competency, you can usually find behavioral evidence that supports whatever the interviewer already felt — which means the competency is doing no work and producing risk. Replace with specific behavioral competencies tied to the actual job.
Score candidates against each other. Each scorecard is the candidate's evaluation against the role's competency rubric — not against other candidates. Cross-candidate comparison happens at the hiring committee level, after independent scorecards are complete. AI should not be doing the comparison.
Common mistakes
Generic competencies. "Strong communicator" applies to almost everyone or no one. "Explains technical concepts to non-technical audiences without jargon" is testable. Role-specific competencies, evaluated with role-specific behavioral evidence.
Ratings without evidence. A "4 — strong communicator" without a specific example from the interview is a vibe, not evidence. Force the evidence requirement.
Lifting candidate quotes incorrectly. If you paraphrase what the candidate said in your notes and the AI then quotes the paraphrase as if it's verbatim, downstream review can't verify the evidence. Either use actual quotes or use clearly marked paraphrase.
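One way to check for this mechanically is to confirm that every quoted passage in the generated scorecard appears verbatim in your notes. The sketch below assumes both are available as plain text; the function name `unverified_quotes` is made up for this example, and the matching is a simple case-insensitive substring test after whitespace normalization.

```python
import re


def unverified_quotes(scorecard_text: str, notes: str) -> list[str]:
    """Return quoted passages in the scorecard that do not appear in the notes."""
    # Match text inside straight or curly double quotes.
    quoted = re.findall(r'["“]([^"”]+)["”]', scorecard_text)
    normalized_notes = " ".join(notes.split()).lower()
    return [
        quote
        for quote in quoted
        if " ".join(quote.split()).lower() not in normalized_notes
    ]


notes = 'Candidate asked "what does the leading metric tell us" and pulled back the rollout.'
scorecard = (
    'Rating 4. Candidate asked "what does the leading metric tell us" '
    'and said "I always trust the data".'
)
print(unverified_quotes(scorecard, notes))
# → ['I always trust the data']
```

Anything this returns is either an invented quote or a paraphrase dressed up as a quote, and should be corrected or marked as paraphrase before the scorecard is filed.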
Skipping the bias self-check. Asking yourself "is this rating based on behavioral evidence or based on impression" is a small step that catches a lot of drift. The scorecard should prompt this self-check.
Discussing candidates with other interviewers before scorecards are written. Pre-discussion contaminates the scorecard with someone else's view. Each interviewer writes the scorecard independently first, then the committee discusses.
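If your scorecards live in a shared system, the independence rule can be enforced rather than merely requested. The class below is an illustrative sketch only: `RoundReview` and its methods are hypothetical and do not describe any particular applicant tracking system.

```python
class RoundReview:
    """Holds one candidate's scorecards and hides them until all are submitted."""

    def __init__(self, interviewers: list[str]) -> None:
        self._expected = set(interviewers)
        self._submitted: dict[str, str] = {}

    def submit(self, interviewer: str, scorecard: str) -> None:
        if interviewer not in self._expected:
            raise ValueError(f"{interviewer} is not on this panel")
        if interviewer in self._submitted:
            # No edits after submission, so later discussion cannot rewrite it.
            raise ValueError(f"{interviewer} already submitted; scorecards are locked")
        self._submitted[interviewer] = scorecard

    def reveal(self) -> dict[str, str]:
        missing = self._expected - set(self._submitted)
        if missing:
            raise PermissionError(f"waiting on: {sorted(missing)}")
        return dict(self._submitted)
```

Calling `reveal()` before every panelist has submitted raises `PermissionError`, so nobody reads another interviewer's view while their own scorecard is still open.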
What to never put in the scorecard
- Personal characteristics or proxies (school, neighborhood, hobbies that signal demographic group)
- Speculation about candidate's family situation, family planning, or personal life
- Comparisons to other candidates being interviewed for the same role
- Confidential business information that could leak in a discovery process if hiring is challenged
- Anything that wouldn't hold up if the candidate, or a regulator in an investigation, read the scorecard
These aren't AI-specific risks; they apply to any interview scorecard. EEOC guidance and, where applicable, NYC Local Law 144 (AEDT), the Illinois AI Video Interview Act, the Colorado AI Act, and similar regulations make the discipline non-optional. AI doesn't flag these risks unless you tell it to, so the recruiter or hiring manager does the final review.
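A crude keyword screen over your notes, run before they go into the prompt, can support that final review. The sketch below is a hedged illustration and not a compliance control: the term list `FLAG_TERMS` is a deliberately incomplete assumption, the function name `flag_personal_terms` is hypothetical, and a clean result does not mean the text is free of proxies.

```python
import re

# Illustrative, incomplete term list; an assumption, not a legal standard.
FLAG_TERMS = (
    "age", "accent", "married", "pregnant", "kids",
    "children", "attractive", "neighborhood", "religion",
)


def flag_personal_terms(text: str, terms: tuple[str, ...] = FLAG_TERMS) -> list[str]:
    """Return the listed terms that occur as whole words in the text."""
    found = []
    for term in terms:
        # Word boundaries keep "age" from matching inside "managed".
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            found.append(term)
    return found


print(flag_personal_terms("Strong accent but managed stakeholders well; has two kids."))
# → ['accent', 'kids']
```

A hit is a prompt to reread the sentence and rewrite it as behavioral evidence, or drop it. It is not an automatic verdict either way.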
The free tool that handles this for you
If you don't want to engineer the prompt every time, the Interview Scorecard Generator on AI Career Lab is pre-configured for the structure that produces defensible, EEOC-aware evaluations. It produces scorecards with the elements above, in a format that holds up under hiring committee and regulatory review.
Pair it with the Job Description Generator for the upstream role spec and the Candidate Outreach Generator for the sourcing pipeline.
Free with an AI Career Lab account, capped at five runs per day on the free tier.
Try it on your next interview round
Pick a candidate you're interviewing this week. Lock in the role-specific competencies before the interview. Take behavioral evidence notes during the interview. Run the inputs through the tool above. Compare the result to scorecards you've written by hand, and note the difference in structure, evidence specificity, and bias-check completeness.
Create your free AI Career Lab account and try the recruiter tools today. No credit card.
This article is general guidance for recruiters and hiring managers. AI-generated interview scorecards are starting drafts requiring interviewer review for accuracy, EEOC compliance, and bias mitigation. NYC Local Law 144 (AEDT bias audit), Illinois AI Video Interview Act, Colorado AI Act, EEOC AI guidance, and similar regulations may apply to AI-assisted hiring workflows; consult your legal counsel and HR compliance team on applicability in your jurisdiction.