If You Cannot Measure the Behaviour, You Did Not Change It

There is a conversation that happens at the end of every leadership development program. It involves a room full of participants who are energized, a sponsor who is relieved, and a facilitator who is closing the loop. The feedback forms are positive. The ratings are high. Someone says, genuinely and not sarcastically, that this was the best program they have ever attended.

Six months later, most of the behaviour is gone.

Not all of it. Some things stick: a phrase someone picked up, a framework they reference occasionally, a moment from a simulation that they still think about. But the difficult feedback conversations they said they would have? Still not happening. The executive who went quiet in every cross-functional meeting? Still quiet. The pattern of escalation avoidance that had been costing the team for two years? Intact.

This is not a failure of intent. The leaders who attended that program wanted to change. The sponsor believed in it. The facilitator was skilled. The problem is that the program was evaluated on the wrong thing, and so the organization never got the signal that the change had not held.

Satisfaction is not behaviour change. It has never been behaviour change. But the leadership development industry has built its entire business model around measuring the former and calling it evidence of the latter.

Why most leadership measurement fails

The smile sheet, the end-of-workshop feedback form, however sophisticated it has become, measures one thing reliably: how participants felt at the end of the program. In good hands, it also captures perceived relevance and stated intent. What it cannot capture is whether any of that intent translated into different behaviour in a real conversation, under real pressure, with real stakes.

The problem is not that organizations are naive. It is that the easier measurement is persuasive. A 4.7 out of 5 on a post-program survey feels like evidence. It is something to show a steering committee. And because the alternative, measuring actual behaviour change over a twelve-month horizon, is expensive and uncomfortable, most organizations settle.

They settle even when the costs of the unmeasured failure are enormous. A senior leader who still cannot give direct performance feedback is not a training problem. She is a retention problem, a transformation problem, and eventually a regulatory problem. The cost of her unchanged behaviour is orders of magnitude higher than the cost of a more rigorous measurement system. But the cost is distributed, invisible, and not attributed to the program. The high smile-sheet score is easy to attribute to the program. So that is what gets measured.

There is also a structural dynamic at work. Leadership development vendors are not incentivized to measure at 90 days. The measurement would surface the reversion. It would complicate renewals. It would require a different kind of engagement, one built around sustained behaviour change rather than periodic program delivery. That model is harder to sell and harder to run. The path of least resistance is to finish the workshop, collect the forms, and move on.

This is not cynicism. It is a description of how market incentives shape professional practice. Most vendors doing this work are not deliberately avoiding accountability. They have simply built their practices around the measurement systems that clients accept, and clients accept smile sheets because that is what they have always received.

What observable behaviour actually looks like

When we talk about measuring behaviour, we mean something specific. Not attitude surveys. Not self-reported confidence ratings. Not 360-degree feedback forms that measure perception rather than pattern. We mean observable changes in what leaders actually say and do in their real conversations: the ones that have always been available to observation and have simply not been watched.

The questions that produce useful behavioural data are operational, not abstract. Does the director who completed the coaching program now open her one-on-ones with an open-ended question, or does she still lead with her own agenda? When the VP receives pushback on a decision, does he engage with it directly in the meeting, or does he acknowledge it generically and route the real response to a side-channel conversation afterward? When the executive team is discussing a high-risk initiative, are the three people who privately believe it is underfunded saying so in the room, or is the conversation happening in the hallway later?

These are not hypothetical examples. They are the behavioural patterns we document in every pre-program behavioural audit. And they are the patterns we return to measure at 30, 90, and 365 days.

Thirty days is too early for deep change, but it tells you whether anything has shifted at all. Whether someone tried the framework. Whether a different kind of meeting is beginning to take shape. It is a leading indicator, not a proof point.

Ninety days is where the real signal lives. By ninety days, the initial motivation from the program has faded. The comfortable habits are reasserting themselves. The pressure of daily operations is back. If a leader is still having different conversations at ninety days, still giving the direct performance feedback, still asking the open question before offering the answer, that behaviour has a reasonable chance of becoming permanent.

Three hundred and sixty-five days is the accountability measure. It is the question the organization should be asking when it evaluates whether its investment produced a return. Not "did participants rate the program highly?" but "is this conversation pattern still different from what it was before we started?"

The success metric is not satisfaction. It is whether the conversations are still happening twelve months after we leave.

Nicole North · President, Whiteboard Learning

Why organizations revert after workshops

Reversion is not a character failure. It is a systems failure. Leaders revert because the systems around them have not changed, and behaviour that is not reinforced by the operating environment gets extinguished.

Consider what a leader returns to after a well-designed leadership program. She goes back to a calendar full of meetings structured for information transfer, not conversation. She goes back to a manager who evaluates her on deliverables, not on how she develops the people below her. She goes back to a culture that implicitly rewards avoiding the uncomfortable truth and calls it staying positive.

The program gave her new tools. The system gave her no reason to use them.

Psychological safety is the variable that determines whether new leadership behaviours survive the return to operations. Not the vague, aspirational version: the survey category that reads "I feel safe speaking up." The operational version: does this leader believe that using the new tools she learned will be received well enough to be worth the risk?

That belief is not created by a program. It is created by how her own manager responds the first time she tries something different. If she gives direct developmental feedback to a peer and the peer complains, and the response from above is silence, she will not give that feedback again. The measurement system that would have captured this dynamic is not a post-program survey. It is a conversation with that leader at day thirty, asking what she tried and what happened.

This is why single-event programs fail at a rate that should embarrass the industry. A two-day workshop cannot override years of environmental conditioning. Behaviour change at any depth requires a sustained engagement with the operating context, not parallel to it but embedded in it.

This is not a new insight. The organizational behaviour literature has documented the conditions for sustained change for decades. What is new is how consistently the leadership development industry ignores those conditions, and how consistently organizations accept programs that are structurally unlikely to produce durable change.

How behaviour change is sustained

The distinction between a program and a system is not semantic. A program has a start date and an end date. A system is operational. It runs. It measures. It adjusts. When the new behaviour degrades, the system surfaces it and addresses it, rather than waiting until the next program cycle to revisit the topic.

Sustaining behaviour change requires three things most leadership development engagements do not provide.

The first is post-program measurement. Not a day-thirty survey sent as an administrative closing task, but a structured observation process that documents whether specific conversational behaviours are present, absent, or degraded, and feeds that information back to leaders and managers in a form they can act on.

The second is a reinforcement structure embedded in the operating cadence. Not an extra meeting. A redesign of existing touchpoints (one-on-ones, team meetings, performance conversations) so the new behaviours have a natural habitat. A coaching question built into every one-on-one agenda. A feedback moment in every post-project debrief. A meeting norm that creates space for disagreement that would previously have surfaced only in the hallway afterward.

The third, and the most frequently absent, is visible modelling from the layer above the cohort. Leaders watch their own leaders more carefully than they watch any training program. If the executive sponsor does not model the behaviours the program teaches, the message is clear: this is for you, not for us.

The hardest moment in leadership development is not the workshop. It is not the simulation, however uncomfortable that is designed to be. The hardest moment is the live conversation, four months after the program ends, in an actual meeting, with an actual human being who is resistant or senior or both, where the leader has to choose whether to use what she learned or revert to the pattern that has always been safer.

That moment is invisible to a smile-sheet evaluation. It is not invisible to a behavioural measurement system. And whether the leader makes the right choice in that moment depends almost entirely on whether the organization built the conditions that make the right choice sustainable, or whether it handed her a framework and called it a day.

The diagnostic work we do before any program begins is partly about understanding what conversations are being avoided. It is equally about understanding whether the conditions for sustaining change are present. If the senior leadership layer is not prepared to model the change, if the operating cadence has no room for the new behaviours, or if measurement is not going to happen past day one, we name that directly. Some of those conditions can be built. Some cannot. The ones that cannot are the reason we do not accept every engagement we are offered.

Behaviour change is engineered, not exhorted. The engineering happens in the operating environment, not in the workshop room, and the measurement is how you know whether the engineering held.

The question worth asking every vendor

If you are a CHRO, an executive sponsor, or a leader responsible for developing other leaders, there is one question worth asking every vendor you consider: How do you measure behaviour change at ninety days?

Not "what does your evaluation framework look like." Specifically: what observable behavioural data do you collect, from whom, and at what points after the program ends?

If the answer involves a post-program survey, that is an answer about satisfaction. Useful. Not what you asked.

If the answer involves structured observation, peer validation, or any mechanism that documents whether specific conversational patterns changed in actual operations, that is a design model aligned with what produces change.

The organizations that close the behaviour gap are rarely the ones that ran the most programs. They are the ones that decided, at some point, that satisfaction was not enough. They built measurement systems that made the change visible, and therefore fundable, sustainable, and real.

That decision begins with a different question at the end of the program. Not "did they enjoy it?" but "is the behaviour different, and can you prove it?"

Start the conversation

If your next leadership program has to prove it worked, this is where it starts.

A 45-minute discovery call to understand what conversations your leadership team is avoiding, what a measurement framework for your context would look like, and whether this is the right engagement for your organization.

If you cannot measure the behaviour, you did not change it.