We have been struck by the passionate discussion over the article published by McKinsey last month. It should not be surprising: developer productivity has been debated for as long as developers have existed, and there is a long line of articles preceding this one attempting to make sense of the problem. Our MD tackled the same question 11 years ago, and as prescient as he was then, it still carries weight across all walks of software development life.
What is most interesting to us is that the question of why measure at all sits at the centre of the conversation. Many conclude that measuring productivity is impossible, will fall foul of Goodhart’s law (“When a measure becomes a target, it ceases to be a good measure”), or is antithetical to the continuous-improvement focus of the agile manifesto. As a result there seems to be a cold war between the executives who seek productivity measurement and the engineers concerned it will lead to wrong-headed conclusions based on imperfect data. We have some sympathy on the engineering front: many of those executives do not have an engineering background, or it was so long ago that their frame of reference for understanding the individual performance of a software developer is null and void. But those executives are there for a reason and have demonstrated skill in order to become the decision makers they are today.
However, this conversation rarely examines why measuring productivity is important to executives in the first place.
There is little controversy in the idea that if you can measure something you can improve it; the controversy comes when people disagree on the measure, or on the insight and the action taken to resolve it. What are the biggest problems solved by a measurement of productivity? We think of them in the following terms:
Resource Management & Competition
In Beck & Orosz’s article they give four reasons to want to measure productivity, the second of which is comparing two investment opportunities: how to allocate headcount, or where new headcount would be deployed most efficiently. This is a genuinely difficult problem, and there is no single way of attacking it; the right approach depends on the variables. Many executives make these decisions while competing with other departments, or between vying priorities inside their own function. Measurements of productivity would then allow those executives to make the most of their resources and capacity. Beck and Orosz’s first reason is to identify engineers to fire. Arguably this is a political problem: layoffs rarely happen in isolation, but as the result of discussion at the senior leadership level comparing the same investment opportunities to maximise capacity and revenue. It’s two sides of the same coin.
They go on to make the interesting point that sales and recruitment teams can measure productivity accurately. Ultimately, at the C-level the desire for metrics stems from the need to objectively identify how resources are being used. When the CTO is advocating for a bigger investment in technology than in the revenue organisation, a level-setting of productivity can aid that justification with objective, data-led metrics. This is doubly important when non-technical leadership is keen to shoot down technology spend based on its own ignorance; in those circumstances the CTO / CIO is bringing a knife to a gun fight.
Scale & Improvement
In our current work the majority of our clients have thousands of developers working within their business. The executives we work with, and their senior managers, are managing engineering teams significantly above Dunbar’s number (the group size at which an individual knows who each person is and how each person relates to every other person). They want to understand and improve development outcomes at the team and organisational level. If we are questioning signs of sub-optimal organisational productivity we might invoke the 5 whys (to determine the root cause of a defect or problem, repeatedly ask “why?” to dig deeper). This is very common in production and manufacturing (namely Kaizen and Six Sigma) but less so in engineering, despite the software development lifecycle being organised in a manufacturing manner. That is because the method requires an understanding of productivity at aggregations going down to the individual level; that is what provides the flexibility to pivot, drill down, step back and turn data on its head to solve bottlenecks and business problems.
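To make the drill-down concrete, here is a minimal sketch (in Python, with invented team names, developers and cycle times) of how per-change data collected at the individual level can be rolled up and then unpicked again, org → team → individual, in the spirit of the 5 whys:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-change records: each change carries the individual,
# their team, and a cycle time in hours. All values are illustrative.
changes = [
    {"team": "payments", "dev": "a", "cycle_hours": 30},
    {"team": "payments", "dev": "b", "cycle_hours": 70},
    {"team": "search",   "dev": "c", "cycle_hours": 20},
    {"team": "search",   "dev": "d", "cycle_hours": 24},
]

def avg_by(records, key):
    """Average cycle time aggregated along an arbitrary dimension."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["cycle_hours"])
    return {k: mean(v) for k, v in groups.items()}

# Why is the org slow?  -> org-level average
org_avg = mean(r["cycle_hours"] for r in changes)
# Why? -> which team drags the average up
by_team = avg_by(changes, "team")
slowest_team = max(by_team, key=by_team.get)
# Why? -> drill into that team at the individual aggregation
by_dev = avg_by([r for r in changes if r["team"] == slowest_team], "dev")
```

The point is not the arithmetic but the shape: the same individual-level records serve every aggregation, so an executive can step between levels without ever needing to fixate on one person’s numbers.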
What is common is the concern that measuring at the individual level will inhibit productivity through Goodhart’s law. Yet in order to improve in this way, the executive is likely not interested in the individual but in the aggregations an individual-level metric provides. We don’t think it’s contentious to expect a team leader to be less productive on an individual basis than their team when their role is to enhance team productivity by removing obstacles and blockers. We therefore think people primarily confuse this problem with the following one.
Attribution & Correspondence Bias
Software engineering is by no means the only function where senior members of staff do not have to be people managers (and it’s old hat to pretend that software developers do not need to form social relationships with their colleagues to excel). Yet in almost any organisation, very senior individual contributors in software engineering have an outsized voice in organisational direction compared with senior enterprise salespeople.
In our experience this is because they provide a bulwark against inexperienced or naive managers trying to drive poorly-thought-out KPIs. Martin Fowler’s article expresses this well: poor measurements will only make things worse. Yet companies have to make use of proxy measures all the time (Martin’s example wonders how you might manage a legal department). When the measurements that relate to performance are unclear (note: performance ≠ productivity), correspondence and attribution bias (overemphasising personal characteristics and ignoring situational factors) become more dangerous. In theory, a perfect metric (or set of metrics) that cannot be gamed would solve the problem of attribution bias. Because this is a fundamental cognitive bias it is impossible to disregard entirely (which is part of the proof of Goodhart’s law), but with objective data it is easier to manage, and it can also help hold those senior individual contributors to account.
It is an incredibly difficult management problem to ensure middle management (especially at the second-line level: close to the front lines but a step removed from strategy) can handle these sorts of biases. No large organisation can count on 100% compliance with management and performance norms. The appeal of measurement in the productivity space is that it can be a constant through the levels of an organisation, minimising attribution errors as they relate to performance.
That’s why we think the work Google has done with DORA is so important. While tools and companies that measure the DORA metrics have proliferated, the initial focus of the research was the concept of measuring in the first place. This was the big mindset shift: measurement can drive improvement. Coupled with scientific research, you could begin to isolate the types of performance metrics that have the least variance when it comes to matching capabilities to outcomes.
Even the four metrics in the core model are not defined consistently (in our experience we have seen no consistent definition of Lead Time To Change, for instance).
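To illustrate how much the definition matters, here is a small sketch (the timestamps are invented, and both definitions are readings we have encountered rather than any official standard) comparing two common interpretations of Lead Time To Change: first commit to production deploy, versus merge to production deploy:

```python
from datetime import datetime

# Hypothetical timeline for a single change.
first_commit = datetime(2023, 9, 1, 9, 0)   # work begins on a branch
merged       = datetime(2023, 9, 4, 15, 0)  # change lands on mainline
deployed     = datetime(2023, 9, 5, 11, 0)  # change reaches production

# Definition 1: lead time measured from the first commit.
lttc_from_commit = deployed - first_commit
# Definition 2: lead time measured from the merge to mainline.
lttc_from_merge = deployed - merged

hours_1 = lttc_from_commit.total_seconds() / 3600  # 98 hours (~4 days)
hours_2 = lttc_from_merge.total_seconds() / 3600   # 20 hours
```

The same change yields roughly a five-fold difference depending on where the clock starts, which is why comparing LTTC figures between companies (or even teams) without agreeing the definition first is largely meaningless.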
This is why what was initially a military model (the OODA loop) has become so ubiquitous in large businesses: executives are paid (or sometimes pay consultancies!) large sums to cover the Orient and Decide portions, and those are only as good as the information they can Observe. We could argue this is why you have seen such growth in products like Snowflake and Amazon Redshift in the last 10 years.
Source: John Boyd’s OODA Loop
As a result the search for a productivity metric will never cease. The way we have seen most companies approach this intensifying problem is to look at it from a project perspective. Ultimately all software development is project management, hence the whole agile vs waterfall debate from 20+ years ago. And in any project management you are always managing a triple constraint:
Do you trade off velocity, quality or cost in order to achieve your goals? This is part of the decision-making process for executives, and you really cannot have it all. As it relates to the problem of measurement: if you can insert a productivity measurement in place of Time / Velocity, you create the space for a reasonable understanding of the tradeoffs in your project, and therefore for maximising business value (something also controversial to measure, but that might be another blog post).
One important part to consider is that business value is not just the delivery of a valuable product that generates revenue; it also means delivery of a safe product. Given how many endpoints software has, business value would need to account for regulatory fines avoided or attacks prevented, and quality must consider not only the product itself but its security surface too (no doubt impacting cost as well). Likewise, we think part of the success of SPACE as a framework is that it encompasses not only individual excellence but organisational excellence, stretching from a more perfect system (say, deployment frequency as a potential measure) to developer satisfaction and retention (which would also include the Wellbeing portion of DORA).
From this perspective productivity really only encompasses one third of an organisational focus on efficiency, and in framing it that way we can minimise the concern around “bad measurements”. The William Bartlett article references a tweet (or X post, or whatever) about a comic with an excellent example of perverse incentives and their expected negative outcomes: bonuses are paid for every bug fix, leading directly to the cobra effect (an anecdotal story of bounties paid out for dead cobras leading to the breeding and farming of cobras…). The cobra effect is a slightly darker version of the prevention-vs-cure discussion (aptly visualised by the Work Chronicles comic). Yet in large organisations people’s fears form from attribution bias, which means that when measurement rears its head we focus only on the worst possible motivations when it comes to incentives as they relate to measures.
Source: Work Chronicles
It should be obvious that the outcomes of such a bug bounty in a software development organisation using the triple constraint would show failure on quality and on cost: the former dropping under a sudden cascade of new (even if short-lived) bugs, and the latter rising (with all the bonuses being paid out…), especially if you consider the time invested in bug fixing compared with new feature development.
A unified focus on measuring efficiency using productivity metrics in combination with those of quality and cost can alleviate some of the pitfalls of the McKinsey report: bad management is more difficult if you’re asking people to manage the tradeoffs of the triple constraint. It also addresses a danger in Beck and Orosz’s mental model of Effort > Output > Outcome > Impact: they classify Deployment Frequency and LTTC as Outcomes, when they are in fact Outputs of Effort.
The danger of treating deployment frequency as an outcome of software development only encourages metric gaming: you could be deploying software every minute, but if the changes are minimal, how much more productive have you really become? This further incentivises bad management to focus on KPIs with no grounding in principles of project delivery, business value or outcomes. If instead you take the same mental model and consider LTTC as an output of effort, for example: a team member comes up with an idea (effort) to merge changes faster (output) in order to iterate faster and thus deliver a project on time (outcome) that delivers on a key business objective (impact). You are now ensuring at least a project-focused outcome, and also lessening the danger of creating incorrect incentives. If you think back to the OODA loop, you are also providing further feedback and observations to iterate even faster.
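A small sketch of that gaming risk (team names, deploy logs and change sizes are all invented): if deployment frequency is read in isolation, a team shipping trivial changes can appear far more productive than one shipping substantive ones:

```python
from statistics import median

# Hypothetical deploy logs as (day_of_month, lines_changed) pairs.
team_a = [(1, 400), (3, 350), (5, 500)]   # 3 deploys, substantive changes
team_b = [(d, 5) for d in range(1, 11)]   # 10 daily deploys, trivial changes

def deploys_in_first_week(log, days=7):
    """Count deploys landing in the first `days` days (a frequency proxy)."""
    return len([d for d, _ in log if d <= days])

def median_change_size(log):
    """Median lines changed per deploy (a crude substance proxy)."""
    return median(size for _, size in log)

freq_a = deploys_in_first_week(team_a)   # 3
freq_b = deploys_in_first_week(team_b)   # 7
size_a = median_change_size(team_a)      # 400
size_b = median_change_size(team_b)      # 5
```

On frequency alone team B looks more than twice as productive; paired with even a crude size measure, the picture inverts. Lines changed is itself a gameable proxy, of course, which is the article’s point: single metrics invite gaming, combinations of tradeoff-aware metrics resist it.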
At scale, your ability to ask the 5 whys is easier with a variety of different measures to allow for root cause analysis. On the attribution bias front, an efficiency focus within this multi-faceted approach means there is a deeper discussion of the relative goals of different departments. Even in our company, the sales team will look at the speed at which they acquire new business knowing that slowing down can be a good thing if, at the same time, the percentage of opportunities won as new business is increasing; they might only be able to optimise so far before the tradeoffs stop making sense. (Our own research shows that LTTC can be over-optimised: we see a decrease in productivity when going too fast.) This is also openly shared with our engineering team, making for a significantly more constructive approach at the exec level when our sales and engineering leaders are level-setting.
For reference we see the triple constraint in this manner:
Ultimately this is why passions are so high: the ability to pass on principles for problem solving or goal setting in an organisation is genuinely difficult. This goes some way to explaining the proliferation of OKRs in the last 15 years, and it’s not as if Google (as instrumental in this space as it is with DORA) is a bastion of efficiency, with its aborted product launches and discontinued business lines. Then there is the surveillance angle that gets wrapped into the argument: we’re in an inescapable new mode of work post-Covid, and what if you can prove developers are more productive at home? GitLab certainly seems to have hit the remote mark, but there are also many companies where this isn’t the case. If we come back to the original DORA research, measurement is the start, but to expect consistency across different companies is an illusion. None of this is reason to trivialise the aim of measuring and improving developer productivity just because of the difficulty, or the potential pitfalls.
There will always be ideas for improving productivity that are universally true and deemed best practice (“commit early and commit often” being one example). Current rhetoric seems to consider any measurement nefarious because it does not fit such a universal truth. This neglects the importance of tradeoffs and constraints. Constraints are what can be managed with measurements, and the better this is communicated and understood in an organisation, the faster, safer and leaner the software it delivers.