During the filming of the original Planet of the Apes (released in 1968), something happened that nobody planned. The actors in chimp costumes and the actors in gorilla costumes started eating lunch at separate tables. No one assigned seating or suggested it. The sorting just happened.
Robert Sapolsky tells this story in his book Behave. The actors weren't hostile toward each other. They weren't even aware they were doing it. Their brains had already decided who was who.
Something like this happens in hiring debriefs constantly. And it probably explains why the structured-versus-unstructured interview debate has been stuck in the same loop for twenty years.
Most organizations run one hiring process for every candidate regardless of context. The debate is about whether that process should be structured or unstructured. One side says hiring managers should trust their gut, because experienced people develop good judgment. The other side says gut feeling is biased and you need scorecards, rubrics, independent evaluations. Both sides have evidence, and both are partly right. The reason neither side wins is that they're arguing about the wrong variable. The question that actually matters is which of two fundamentally different problems the hiring team is solving when they sit down with a candidate.
Sapolsky draws on work by Joshua Greene, a moral psychologist at Harvard, who argues that our moral wiring solves two different problems. The first is what he calls the Me-vs-Us problem. Can I resist being selfish within my own group? Can I cooperate, share credit, contribute rather than free-ride? For this problem, fast intuition works well. The default is generosity toward people you recognize as one of us.
The second problem is Us-vs-Them. Can I extend fair treatment to someone my brain has already categorized as an outsider? Here, the same speed that produced cooperation flips. Fast reactions toward out-group members default to suspicion.
When your team evaluates a candidate whose background looks like everyone already on the team (same industry, familiar company names, a career path people recognize), their quick reads are reasonably calibrated. The Me-vs-Us problem is what's operating. Will these evaluators share their honest observations? Will someone hold back to avoid disagreeing with the hiring manager? Mostly, they cooperate. They build on each other's assessments, fill gaps, and arrive at something workable. The process can stay lighter.
Put the same team in front of a candidate from a different industry, an unconventional background, or a demographic that doesn't match the room. Now Us-vs-Them is operating. The gut reactions that produced collaboration five minutes ago produce something closer to closing ranks.
You've probably seen this if you've sat through enough debriefs. Three candidates from well-known competitors get discussed with warmth and specificity. "His two years running that crusher line in Minas Gerais is exactly the operational credibility we need." Then a fourth candidate, someone from an adjacent sector with a less conventional path, draws vaguer objections.
The objections come with a higher burden of proof that nobody states openly. It sounds like "I'm not sure how that experience maps to our environment." What's actually happening is that the evaluators' brains have sorted the candidate into the out-group and are building reasonable-sounding justifications after the fact.
You can't prove this in real time. Each individual objection sounds like professional judgment. "I'm not sure how that experience maps" is a legitimate concern in plenty of contexts. On a case-by-case basis, it looks indistinguishable from rigor. You'd need a pattern across dozens of decisions to make the asymmetry visible, and no single debrief gives you that.
This is probably why the structured-versus-unstructured debate never resolves. Both sides are right, but about different situations.
The economist Samuel Bowles found something that adds a useful wrinkle. When you formalize cooperation that was already happening voluntarily, you can crowd out the motivation behind it. Put rigid rules around something people were doing well on their own, and some of the willingness drains out. The hiring manager who has been told to fill out independent scorecards before she can discuss a candidate she's deeply qualified to assess may comply. But her engagement with the process drops. You've added friction to the one context where friction wasn't needed. Within-group hiring still has problems. People who look alike mistake familiarity for competence. But the cost of imposing structure there is higher, and the cost of skipping it is lower.
Before any evaluation, ask whether the candidate's profile looks meaningfully different from the evaluating team's. What counts as different here is what the evaluators' brains actually detect, which is often background rather than title. A candidate from a different industry who shares the team's demographic profile in other ways may not trip the alarm, but someone who matches the team's industry and differs in gender, nationality, or educational path often will.
The evaluating team is probably the wrong group to answer this question about their own slate; they've already internalized what normal looks like for their group. A team that has always hired from the same three competitor companies may genuinely not perceive a candidate from a fourth competitor as different, while perceiving someone from an adjacent sector as radically so. The skill overlap might be equivalent. It doesn't matter. The sorting happened before anyone asked the question.
A talent acquisition (TA) partner is a better first answer: they have their own biases, but they're at least outside the team's specific in-group frame. Better still is something structural. Map the candidate's background against the existing team's profile before the evaluation starts. If the candidate's industry, educational institution, or career path has no overlap with anyone on the evaluating panel, the structured protocol kicks in. Same if the candidate's gender, nationality, or age visibly differs from the majority of the panel. The trigger should be concrete enough that no one has to decide whether it applies, as in the sketch below. It either matches or it doesn't.
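Here is what a mechanical trigger might look like, as a minimal Python sketch. Every field name, the overlap rule, and the majority test are illustrative assumptions, not part of any real system; a real version would pull these fields from your applicant tracking system and tune the rule to your own panels.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    """Illustrative profile fields (invented for this sketch)."""
    industry: str
    school: str
    career_path: str   # e.g. "plant operations -> regional ops director"
    gender: str
    nationality: str
    age_band: str      # e.g. "35-44"

def needs_structured_protocol(candidate: Profile, panel: list[Profile]) -> bool:
    """Return True when the structured protocol should kick in.

    Trigger 1: the candidate's industry, school, and career path have no
    overlap with anyone on the evaluating panel (one reading of "no
    overlap with anyone").
    Trigger 2: the candidate's gender, nationality, or age band differs
    from the majority of the panel.
    Both triggers are deliberately mechanical: the rule either matches
    or it doesn't, so no one has to judge whether it applies.
    """
    # Trigger 1: zero background overlap with any panelist.
    no_overlap = all(
        candidate.industry != p.industry
        and candidate.school != p.school
        and candidate.career_path != p.career_path
        for p in panel
    )
    if no_overlap:
        return True

    # Trigger 2: visible difference from the panel majority.
    majority = len(panel) / 2
    for attr in ("gender", "nationality", "age_band"):
        matches = sum(1 for p in panel if getattr(p, attr) == getattr(candidate, attr))
        if matches < majority:  # most of the panel differs from the candidate
            return True

    return False
```

The design choice worth noting is that the rule errs toward triggering. A false positive only costs some extra structure on a candidate who didn't need it; a false negative quietly reinstates the unstated higher bar.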
When the answer is yes (the candidate does differ), ask for independent written evaluations before any group discussion. No verbal anchoring, no reading the room before committing to a position. This forces people to think slowly and deliberately instead of defaulting to whatever feels familiar.
When the answer is no, let the team talk. Trust the cooperative instinct. Spend the process overhead where it's actually needed.
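The routing and the no-anchoring gate could be enforced in whatever tooling hosts your debriefs. A hypothetical sketch, with invented class and method names, of what that gate might look like:

```python
class Debrief:
    """Hypothetical debrief gate (names invented for this sketch): when
    the structured protocol applies, the group discussion stays locked
    until every panelist has submitted an independent written
    evaluation."""

    def __init__(self, panel: list[str], structured: bool):
        self.panel = set(panel)
        self.structured = structured
        self.written_evals: dict[str, str] = {}

    def submit_evaluation(self, panelist: str, text: str) -> None:
        """Record a panelist's independent take, before any discussion."""
        if panelist not in self.panel:
            raise ValueError(f"{panelist} is not on this panel")
        # First submission is final: no revising after reading the room.
        self.written_evals.setdefault(panelist, text)

    def can_open_discussion(self) -> bool:
        if not self.structured:
            return True  # within-group slate: let the team talk
        # Cross-group slate: everyone commits in writing first.
        return set(self.written_evals) == self.panel
```

Feeding it the output of the trigger check above keeps the decision out of anyone's hands in the moment: the structured path switches on because the rule matched, not because someone argued for it.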
Gary Klein, a psychologist who spent his career studying expert intuition, would probably push back on this framework. He has argued convincingly that experienced professionals develop pattern-recognition abilities that formal procedures can't replicate. A hiring manager who has evaluated hundreds of candidates in her domain has genuine expertise that a scorecard can't fully capture. Klein's objection is worth taking seriously, and it's partly why the "always structure everything" advice fails. The framework says expert intuition is well-calibrated for within-group evaluation and poorly calibrated for cross-group evaluation, not that it should be dismissed. The expertise is real. Where it applies is narrower than most people assume.
I think there's a deeper problem hiding in most organizations' hiring data. When you look at your outcomes and see that the process works (hires perform well, teams are satisfied, retention is stable), you're mostly seeing the results of within-group hiring where intuition was reasonably calibrated to begin with. The cross-group candidates who were screened out by a bar nobody acknowledged never entered the data. There is no row in your applicant tracking system for the person who would have been a strong hire but faced a higher burden of proof than the three familiar candidates on the same slate. Good outcomes reflect which candidates the process let through, not whether the process itself is fair.
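A toy simulation makes the shape of this selection effect visible. All the numbers here are invented: candidate quality is uniform for both groups, and the only difference is a higher, unstated bar for out-group candidates.

```python
import random
from statistics import mean

random.seed(0)

def simulate(n: int = 100_000, bar_in: float = 0.60, bar_out: float = 0.75) -> None:
    """Toy model, all numbers invented: quality ~ Uniform(0, 1) for both
    groups, but out-group candidates face a higher, unstated bar."""
    hired_in, hired_out, screened_out_strong = [], [], 0
    for _ in range(n):
        quality = random.random()
        if random.random() < 0.5:        # in-group candidate
            if quality > bar_in:
                hired_in.append(quality)
        else:                             # out-group candidate
            if quality > bar_out:
                hired_out.append(quality)
            elif quality > bar_in:
                # Would have cleared the bar the in-group faced; leaves
                # no trace in the hiring data at all.
                screened_out_strong += 1

    print(f"avg quality of in-group hires:  {mean(hired_in):.2f}")
    print(f"avg quality of out-group hires: {mean(hired_out):.2f}")
    print(f"strong out-group candidates invisibly screened out: {screened_out_strong}")

simulate()
```

Run it and the hired-only data looks healthy for both groups; the out-group hires even look slightly stronger on average, which is the classic signature of a higher bar. Meanwhile thousands of candidates who would have cleared the in-group bar appear nowhere, which is exactly why outcome data alone can't tell you whether the process is fair.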
The organizations that do vary their approach tend to calibrate by seniority or function. More structure for VP searches, more testing for engineers. Almost none calibrate by the type of problem operating in the evaluation room. The sorting will keep happening. It's how brains work. The question is whether your process accounts for it or pretends it isn't there.
Models in this article
Me-vs-Us vs. Us-vs-Them Moral Challenge Distinction — Fast intuition produces cooperation within groups but parochial bias across group boundaries, meaning the same decision speed that helps evaluate familiar candidates hurts when evaluating unfamiliar ones.
Discipline: Moral psychology / Behavioral economics
Key research: Rand, Greene & Nowak (2012); Greene (2013); Bowles (2008)
Source: Robert M. Sapolsky, Behave: The Biology of Humans at Our Best and Worst (2017)
The Recruiting Lattice applies mental models from diverse disciplines to the daily work of talent acquisition. Each article introduces one idea and shows where it's already operating in your hiring process.
