Why Giving Your Team a Puzzle to Solve Tells You More Than a Personality Test Ever Could

The test result looked nothing like the person in the room.

A senior manager at a mid-sized Malaysian company had recently gone through her organisation’s annual psychometric assessment cycle. On paper, she was rated as highly collaborative, comfortable with ambiguity, and strong on interpersonal communication. Her DISC profile suggested she was a natural relationship-builder.

Three weeks later, when the team hit a genuine obstacle in a group challenge session, a problem that didn’t have an obvious answer, she went completely silent.

Not thoughtfully quiet. Absent. She had physically withdrawn from the group, stopped contributing, and was visibly uncomfortable until someone else resolved the situation.

The test had captured who she believed herself to be. The experience revealed who she was under mild pressure.

This gap between self-reported personality and observed behaviour is one of the most consequential blind spots in how organisations understand their people.

The fundamental problem with personality assessments

Psychometric assessments are not useless. Used well, tools like DISC, the Big Five, and the Myers-Briggs Type Indicator offer useful frameworks for opening conversations about communication styles and working preferences. They give teams a shared language.

But there is a structural limitation that most HR teams don’t fully account for: when people know an assessment may impact their employment or professional standing, this knowledge influences their responses, consciously or not. We naturally want to present ourselves favourably, and the conditions of a personality test make completely honest responses difficult to achieve.

Research published in 2025 comparing personality test results across high-stakes and low-stakes settings found that faking substantially reduces personality test validity in selection contexts, with validity in high-stakes settings running at less than half the level seen in low-stakes research environments.

In plain terms: the more the result matters, the less accurate it tends to be.

This doesn’t mean personality assessments should be abandoned. It means they should be understood for what they are: a starting point, not a conclusion. And it means organisations need something else alongside them: a way to observe how people actually behave when something real is at stake.

What happens when you give a team a genuine problem

In experiential programme design, we use structured challenges (puzzles, simulations, unfamiliar scenarios) not as games, but as behavioural observation tools.

The key word is genuine. A challenge that feels artificial or low-stakes produces artificial, low-stakes behaviour. But a problem that is genuinely unfamiliar, where the group doesn’t know the answer and can’t rely on their job title to guide them, produces something far more revealing.

Watch a leadership team work through an obstacle they haven’t encountered before and you will typically see the following within the first twenty minutes:

Who leads when the hierarchy isn’t obvious. In a meeting room, seniority usually determines who speaks first and whose ideas get pursued. Remove the meeting room and give everyone the same information deficit, and the actual informal leaders of the group become visible. Sometimes they’re the same people. Often they’re not.

Who asks for help. This is one of the most undervalued data points in team assessment. The person who says “I don’t understand this” in a group challenge is demonstrating something important about their relationship with ego and learning.

Someone may sincerely believe they enjoy collaboration, yet their actual responses in collaborative settings suggest discomfort. Personality tests, which rely on self-report, often fail to capture this discrepancy.

Who gets frustrated, and how they express it. Mild stress is a reliable amplifier of default behaviour. A person who manages their frustration well under pressure is showing you something about their emotional regulation that no questionnaire item can fully replicate.

Who defers, and to whom. In Asian workplace contexts particularly, the social dynamics of deference are complex and often invisible in formal settings. A shared challenge creates conditions where these patterns surface naturally, without anyone intending to reveal them.

Where the thinking actually happens. Some teams talk loudly and move quickly but generate poor solutions. Others seem disorganised but produce something unexpected and effective. The process of working toward a solution reveals how a team’s collective intelligence actually functions, something that individual personality profiles, by definition, cannot show.

The BEP session that changed how we design programmes

Several years ago, we were running a Murder Mystery Dinner as part of a half-day programme for a financial services leadership team. The group was uniformly high-performing by any standard metric — experienced, senior, used to being the people with answers.

We introduced a systems-based challenge: a multi-part problem where each piece of information was held by a different participant, and the solution required sharing, synthesising, and prioritising under a time constraint. There was no single right answer. The goal was to construct the most coherent response from incomplete data.

One participant, quiet throughout the morning’s opening session, largely overlooked in a group that contained several dominant personalities, came into his own the moment the group hit its first real obstacle. He began asking questions that nobody else had thought to ask. He drew a rough map on a piece of paper to visualise the information held by different people. He spotted a pattern in the data that redirected the whole group’s approach.

His psychometric profile had described him as introverted and process-oriented. Accurate enough. But it had said nothing about the particular kind of intelligence he carried: a systems-level thinking that only became visible when the system was under strain.

His manager, who had been in the room all day, told us afterwards that in three years of working together, she had never seen him operate that way. She had, without realising it, been consistently underutilising him.

That is the cost of relying on self-reported personality data without observational evidence alongside it.

What this means for HR leaders and programme designers

The argument here isn’t that experiential programmes should replace psychometric assessments. It’s that the two serve fundamentally different purposes and work better in combination.

A personality assessment captures how someone understands themselves. A well-designed shared experience captures how they behave when something is genuinely at stake. Both pieces of information are valuable. Only one of them requires the person to report their own behaviour accurately.

For HR leaders designing leadership development programmes, this distinction has practical implications. If your current programme cycle consists primarily of assessments, coaching conversations, and workshop content — all of which are self-referential by design — you may be missing the most reliable source of behavioural data available to you: watching people work through something real, together, with consequences that matter enough to produce authentic behaviour.

The 90-minute challenge that surfaces a hidden systems thinker, or reveals that your most confident communicator shuts down under ambiguity, is not a nice-to-have addition to your leadership programme. It is often the most important 90 minutes in it.

Designing challenges that actually reveal something

Not every team building activity qualifies. There is an important distinction between a challenge that is genuinely revealing and one that is simply unusual.

A revealing challenge has three characteristics. First, it is unfamiliar enough that no one can rely on existing expertise alone: everyone starts from a similar position of not-knowing. Second, it requires genuine interdependence: the problem cannot be solved by one person working alone, no matter how capable they are. Third, the stakes are real enough to produce authentic behaviour: mild pressure, a genuine time constraint, a problem that doesn’t have an obvious answer.

Activities that offer novelty for its own sake tend to produce novelty behaviour. People perform their personality rather than reveal it. The design of the challenge is everything.